Max Bain

Bio

I am a member of technical staff at Reka,
where I work on multimodal large language models.
Before that, I had the pleasure of completing my PhD at VGG,
under the supervision of Prof. Andrew Zisserman.

News

[2024-04-15] Reka Core released
[2023-09-18] I join Reka
[2023-09-05] WhisperX hits 5k stars
[2023-08-18] Defended PhD thesis, no corrections ☺

Publications

2024

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Reka Team (Aitor Ormazabal, Che Zheng, Cyprien de Masson d’Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhihui Xie, Zhongkai Zhu)
Technical report, 2024.
[Paper] [Chat] [Showcase] [Blog]

2023

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
B. Smith*, M. Farinha*, S. M. Hall, H. R. Kirk†, A. Shtedritski†, M. Bain†
Technical report, 2023.
[Paper] [Code]

AutoAD II: The Sequel – Who, When, and What in Movie Audio Description
Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
ICCV, 2023.
[Paper] [Code]

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman
Interspeech, 2023.
[Paper] [Code]

AutoAD: Movie Description in Context
Tengda Han*, Max Bain*, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
CVPR, 2023. [Highlight]
[Paper] [Code]

2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
H. Berg, S. Hall, Y. Bhalgat, W. Yang, H. R. Kirk, A. Shtedritski, M. Bain
AACL, 2022.
[Paper] [Code]

The CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Technical report, 2022.
[Paper] [Code]

2021

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ICCV, 2021.
[Paper] [Code] [Project] [Dataset] [Demo]

Automated Audiovisual Behaviour Recognition in Wild Primates
M. Bain, A. Nagrani, D. Schofield, S. Berdugo, J. Bessa, J. Owen, K. J. Hockings, T. Matsuzawa, M. Hayashi, D. Biro, S. Carvalho, A. Zisserman
Science Advances, 2021.
[Paper] [Press]

2020

Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
ACCV, 2020. [Oral]
[Paper] [Code] [Challenge]

2019

Count, Crop and Recognise: Fine-Grained Recognition in the Wild
Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
ICCVW, 2019. [Oral]
[Paper]

Useful links

1. WebVid. Dataset of 10 million captioned short videos.
https://github.com/m-bain/webvid.

2. WhisperX. Efficient and accurate speech transcription (& diarization).
https://github.com/m-bain/whisperX.