I am Bonaventure F. P. Dossou, a third-year Ph.D. candidate in Computer Science at McGill University, AI researcher, author, United Nations University scholar, and co-founder of Lanfrica, a platform dedicated to documenting and connecting African language resources. At McGill, I work in the NLP group under the supervision of Professor Jackie Cheung, specializing in Natural Language Processing (NLP) and Healthcare. I am renowned for integrating Fon, a traditionally underrepresented African language, into Google Translate and for championing the inclusion of marginalized languages on global digital platforms.

My background bridges mathematics, healthcare, and language technology. I hold a Bachelor of Science with honours in Mathematics from Kazan Federal University and a Master of Science with honours in Computer Science and Data Engineering (with a minor in Bioinformatics) from Jacobs University Bremen.

I am widely recognized for my contributions to African language technology. I have created several Afro-centric NLP systems such as FFRTranslate, AfroLM, and the Okwugbe ASR Python library for low-resource speech recognition. My work consistently centers marginalized languages and aims to democratize access to language technologies for communities historically excluded from global digital platforms.

As a researcher, I am deeply committed to bridging advanced technology with real-world impact. My interests span Machine Learning for Healthcare — including drug discovery, gene and phage therapy, and medical imaging — as well as NLP for low-resourced African languages. I am known for delivering high-impact talks at leading AI conferences across academic and industry spheres and for driving research that addresses global challenges through equitable, accessible AI.

Previously, I was a Machine Learning Consultant and Research Scientist at Phagos Biotech, where I developed large-scale language models for genomic sequencing and bacteriophages. Before my Ph.D., I was a research intern at the Mila Quebec AI Institute, working with Yoshua Bengio and Dianbo Liu on drug discovery using Deep Learning and GFlowNets, particularly on Biological Sequence Design and GFlowOut. I also worked as an NLP Data Scientist at Roche Canada and as a Research Scientist at ModelisLabs on Health & Pharma-related challenges.

Alongside my academic work, I contribute to the development of NLP technologies for Sub-Saharan African languages at the Masakhane Research Foundation and have collaborated with Google Research on equitable language technologies.

You can find my CV here.

Past Work and Research Experiences

1. Mila Scientist in Residence [Probe Medical]
2. Mila Scientist in Residence (AI for Chemical Compound Discovery) [Modelis]
3. ML Student Researcher [Google Research]
4. Research Intern (AI for Drug Discovery) [Mila Quebec AI Institute]
5. ML Research Scientist Consultant (AI for Drug Discovery) [Phagos]
6. NLP Research Intern [Roche Canada]
7. Senior Machine Learning Engineer [Omdena]
8. African NLP Researcher & Core Member [Masakhane]
9. Part-time Senior Data Scientist [Speeqo]
10. Fundamental Research Scientist [Lelapa AI]

Selected Publications

All publications can be accessed through my Semantic Scholar and Google Scholar pages.

1. Towards Open-Ended Discovery for Low-Resource NLP. Bonaventure F. P. Dossou, Henri Aïdasso (EMNLP 2025)
2. Early Prediction of Postpartum Mood Disorders from Longitudinal Wearable Biometrics using deep learning and times series generative adversarial network. Bonaventure F. P. Dossou, Mercy Nyamewaa Asiedu, Maja Mataric, Katherine A Heller, Belen Lafon, Nichole Young-Lin (MLHC 2025)
3. Rethinking Full Finetuning from Pretraining Checkpoints in Active Learning for African Languages. Bonaventure F. P. Dossou, Ines Arous, Jackie C. K. Cheung (ACL 2025)
4. AfriMed-QA: Towards A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset. Tobi Olatunji, Charles Nimo, ..., Bonaventure F. P. Dossou, ... (ACL 2025)
5. Adapting Pretrained ASR Models to Low-resource Clinical Speech using Epistemic Uncertainty-based Data Selection. Bonaventure F. P. Dossou (ACL 2025)
6. A Study of Acquisition Functions for Medical Imaging Deep Active Learning. Bonaventure F. P. Dossou (Deep Learning Indaba 2023)
7. FonMTL: Towards Multitask Learning for the Fon Language. Bonaventure F. P. Dossou et al. (EMNLP 2023)
8. AfriSpeechNames: Most ASR models "butcher" African Names. Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou et al. (Interspeech 2023)
9. Pretrained Vision Models for Predicting High-Risk Breast Cancer Stage. Bonaventure F. P. Dossou et al. (ICLR 2023)
10. GFlowOut: Dropout with Generative Flow Networks. Dianbo Liu, Moksh Jain, Bonaventure F. P. Dossou et al. (ICML 2023)
11. AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages. Bonaventure F. P. Dossou et al. (EMNLP 2022)
12. Biological Sequence Design with GFlowNets. Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou et al. (ICML 2022)
13. MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications. Houcemeddine Turki, Bonaventure F. P. Dossou et al. (ECIR 2022)
14. GraphCC for Diverse and Novel Antimicrobial Peptides Generation and Selection. Bonaventure F. P. Dossou et al. (preprint)
15. OkwuGbé: End-to-End Speech Recognition for Fon and Igbo. Bonaventure F. P. Dossou et al. (EMNLP 2021)
16. MMTAfrica: Multilingual Machine Translation for African Languages. Chris C. Emezue and Bonaventure F. P. Dossou (EMNLP 2021)
17. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition. Bonaventure F. P. Dossou et al. (ICCV 2021)
18. Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language. Bonaventure F. P. Dossou et al. (EACL 2021)
19. AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin. Bonaventure F. P. Dossou et al. (EACL 2021)
20. FFR v1.1: Fon-French Neural Machine Translation. Bonaventure F. P. Dossou et al. (ACL 2020)

Awards, Honours, Grants & Services

1. Borealis AI PhD Fellowship Award (2024)
2. University Scholars Leadership Symposium Delegate (2024)
3. Honorable Mention Solution for the Nightingale Contest for Detecting Active Tuberculosis Bacilli (2024)
4. Two Best Poster Awards at the Deep Learning Indaba Conference (2023)
5. Winner of AIM-AHEAD Health Equity Data Challenge (2023)
6. Winner of Nightingale Predicting High-Risk Breast Cancer Contexts (2022, 2023)
7. Mila Quebec AI Institute's Impact Annual Reports (2022, 2023)
8. McGill Engineering Doctoral Award (2022)
9. Innovation Award 2022 of the German African Diaspora (2022)
10. Jacobs University's Dean's Prize for outstanding Master's Thesis (2022)
11. Shuttleworth Flash Grant (2021)
12. Winner of the ViVaTech-Unesco Challenge for Cracking Language Barriers through Data and AI (2021)
13. Wikimedia Foundation Research of the Year Award 2021 (2021)
14. Grant "Lacuna Fund" for Named Entity Recognition for Fon with Masakhane Community (2021)
15. Jacobs University Community Award for Innovation, Cultural Understanding, and Diversity (2021)
16. Jacobs University Mobility Area's Scholarship & Jacobs University Faces (2020-2022)
17. Global Nominee and Benin's finalist with «Afro Num» - NASA's World Space Apps Challenge (2020)
18. Winner of the National Russian AI Hackathon (2019)
19. International interviews and articles on BBC, Voice of America, and German and Russian newspapers and TV channels (2020-)
20. Scientific presentations and publications, workshop organization, and reviewing services at ACL, EACL, NAACL, AACL, EMNLP, ICML, ICLR, and NeurIPS (2020-2024)