Max Bain

Email / GitHub / G-Scholar / LinkedIn / Twitter

Bio

I am a member of technical staff at Reka,
where I work on multimodal large language models.
Before that, I had the pleasure of completing my PhD at VGG,
under the supervision of Prof. Andrew Zisserman.

Publications

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
B. Smith*, M. Farinha*, S. M. Hall, H. R. Kirk, A. Shtedritski, M. Bain
Technical report, 2023.
[Paper] [Code]
AutoAD II: The Sequel – Who, When, and What in Movie Audio Description
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
ICCV, 2023.
[Paper] [Code]
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman
Interspeech, 2023.
[Paper] [Code]
AutoAD: Movie Description in Context
Tengda Han*, Max Bain*, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR, 2023. [Highlight]
[Paper] [Code]
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
H. Berg, S. Hall, Y. Bhalgat, W. Yang, H. R. Kirk, A. Shtedritski, M. Bain
AACL, 2022.
[Paper] [Code]
The CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Technical report, 2022.
[Paper] [Code]
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ICCV, 2021.
[Paper] [Code] [Project] [Dataset] [Demo]
Automated Audiovisual Behaviour Recognition in Wild Primates
M. Bain, A. Nagrani, D. Schofield, S. Berdugo, J. Bessa, J. Owen, K. J. Hockings, T. Matsuzawa, M. Hayashi, D. Biro, S. Carvalho, A. Zisserman
Science Advances, 2021.
[Paper] [Press]
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ACCV, 2020. [Oral]
[Paper] [Code] [Challenge]
Count, Crop and Recognise: Fine-Grained Recognition in the Wild
Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
ICCVW, 2019. [Oral]
[Paper]
Useful links

1. WebVid. Dataset of 10 million captioned short videos.
https://github.com/m-bain/webvid

2. WhisperX. Efficient and accurate speech transcription (and diarization).
https://github.com/m-bain/whisperX