My name is Bonaventure F. P. Dossou. I am a Computer Science Ph.D. student at McGill University, in the NLP group, specializing in Natural Language Processing (NLP) and Healthcare. I am advised by Professor Jackie Cheung. I hold a Bachelor of Science with honours in Mathematics from Kazan Federal University, Russia, and a Master of Science with honours in Computer Science and Data Engineering from Jacobs University Bremen, Germany. Recently, I was a Machine Learning Consultant and Research Scientist at Phagos Biotech, where I worked on building large-scale language models for genomic sequencing and bacteriophages.

My interests are in Natural Language Processing (Machine Translation, Large Language Modeling, Speech Recognition, Information Retrieval) for low-resourced languages and Machine Learning for Healthcare (Drug Discovery, small-molecule generation, gene therapy). I am the creator of many Afro-centric NLP systems, such as FFRTranslate, AfroLM, and the Okwugbe ASR (Automatic Speech Recognition for low-resourced languages) Python library, to name a few. My research on my native language, Fongbé, has contributed significantly towards its integration into Google Translate (in July 2024). You can find my CV here.

Before my PhD, I was a research intern at the Mila Quebec AI Institute, working on Drug Discovery projects using Deep Learning (and GFlowNets) under the supervision of Yoshua Bengio and Dianbo Liu. More specifically, I worked on leveraging GFlowNets for Biological Sequence Design and for learning the posterior distribution over binary multimodal dropout masks (GFlowOut). Previously, I was also an NLP Data Scientist at Roche Canada and a Research Scientist at ModelisLabs, working on Health & Pharma-related challenges. In parallel, I work on NLP technologies, with a focus on low-resourced Sub-Saharan languages, at Masakhane Research Foundation (and previously at Google Research).

Past Work and Research Experiences

1. Mila Scientist in Residence [Probe Medical]
2. Mila Scientist in Residence (AI for Chemical Compound Discovery) [Modelis]
3. ML Student Researcher [Google Research]
4. Research Intern (AI for Drug Discovery) [Mila Quebec AI Institute]
5. ML Research Scientist Consultant (AI for Drug Discovery) [Phagos]
6. NLP Research Intern [Roche Canada]
7. Senior Machine Learning Engineer [Omdena]
8. African NLP Researcher & Core Member [Masakhane]
9. Part-time Senior Data Scientist [Speeqo]
10. Fundamental Research Scientist [Lelapa AI]

Selected Publications

All publications can be accessed through my Semantic Scholar and Google Scholar pages.

1. AfriMed-QA: Towards A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset. Tobi Olatunji, Charles Nimo,..., Bonaventure F. P. Dossou,... (under review at NeurIPS 2024)
2. Adapting Pretrained ASR Models to Low-resource Clinical Speech using Epistemic Uncertainty-based Data Selection. Bonaventure F. P. Dossou (SIGUL 2024, under review at NeurIPS 2024)
3. A Study of Acquisition Functions for Medical Imaging Deep Active Learning. Bonaventure F. P. Dossou (Deep Learning Indaba 2023)
4. FonMTL: Towards Multitask Learning for the Fon Language. Bonaventure F. P. Dossou (EMNLP 2023)
5. AfriSpeechNames: Most ASR models "butcher" African Names. Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou (Interspeech 2023)
6. Pretrained Vision Models for Predicting High-Risk Breast Cancer Stage. Bonaventure F. P. Dossou (ICLR 2023)
7. GFlowOut: Dropout with Generative Flow Networks. Dianbo Liu, Moksh Jain, Bonaventure F. P. Dossou (ICML 2023)
8. AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages. Bonaventure F. P. Dossou (EMNLP 2022)
9. Biological Sequence Design with GFlowNets. Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou (ICML 2022)
10. MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications. Houcemeddine Turki, Bonaventure F. P. Dossou (ECIR 2022)
11. GraphCC for Diverse and Novel Antimicrobial Peptides Generation and Selection. Bonaventure F. P. Dossou (preprint)
12. OkwuGbé: End-to-End Speech Recognition for Fon and Igbo. Bonaventure F. P. Dossou (EMNLP 2021)
13. MMTAfrica: Multilingual Machine Translation for African Languages. Chris C. Emezue and Bonaventure F. P. Dossou (EMNLP 2021)
14. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition. Bonaventure F. P. Dossou (ICCV 2021)
15. Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language. Bonaventure F. P. Dossou (EACL 2021)
16. AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin. Bonaventure F. P. Dossou (EACL 2021)
17. FFR v1.1: Fon-French Neural Machine Translation. Bonaventure F. P. Dossou (ACL 2020)

Awards, Honours, Grants & Services

1. University Scholars Leadership Symposium Delegate (2024)
2. Honorable Mention Solution for the Nightingale Contest for Detecting Active Tuberculosis Bacilli (2024)
3. Two Best Poster Awards at the Deep Learning Indaba Conference (2023)
4. Winner of AIM-AHEAD Health Equity Data Challenge (2023)
5. Winner of Nightingale Predicting High-Risk Breast Cancer Contexts (2022, 2023)
6. Mila Quebec AI Institute's Impact Annual Reports (2022, 2023)
7. McGill Engineering Doctoral Award (2022)
8. Innovation Award 2022 of the German African Diaspora (2022)
9. Jacobs University's Dean's Prize for outstanding Master's Thesis (2022)
10. Shuttleworth Flash Grant (2021)
11. Winner of the ViVaTech-Unesco Challenge for Cracking Language Barriers through Data and AI (2021)
12. Wikimedia Foundation Research of the Year Award 2021 (2021)
13. Grant "Lacuna Fund" for Named Entity Recognition for Fon with Masakhane Community (2021)
14. Jacobs University Community Award for Innovation, Cultural Understanding, and Diversity (2021)
15. Jacobs University Mobility Area's Scholarship & Jacobs University Faces (2020-2022)
16. Global Nominee and Benin's finalist with «Afro Num» - NASA's World Space Apps Challenge (2020)
17. Winner of the National Russian AI Hackathon (2019)
18. International interviews and articles on the BBC, Voice of America, and German and Russian newspapers and TV (2020-)
19. Scientific presentations and publications, workshop organization, and reviewing services at ACL, EACL, NAACL, AACL, EMNLP, ICML, ICLR, NeurIPS (2020, 2021, 2022, 2023, 2024)