My name is Bonaventure F. P. Dossou. I am a Computer Science Ph.D. student at McGill University, in the NLP group, specializing in Natural Language Processing (NLP) and Healthcare. I am advised by Professor Jackie Cheung. I hold a Bachelor of Science with honours in Mathematics from Kazan Federal University, Russia, and a Master of Science with honours in Computer Science and Data Engineering from Jacobs University Bremen, Germany. Recently, I was a Machine Learning Consultant and Research Scientist at Phagos Biotech, where I worked on building large-scale language models for genomic sequencing and bacteriophages.

My interests are in Natural Language Processing (Machine Translation, Large Language Modeling, Speech Recognition, Information Retrieval) for low-resourced languages and Machine Learning for Healthcare (Drug Discovery, small-molecule generation, gene therapy). I am the creator of many Afro-centric NLP systems, such as FFRTranslate, AfroLM, and the Okwugbe ASR (Automatic Speech Recognition for low-resourced languages) Python library, to name but a few. You can find my CV here.

Before my Ph.D., I was a research intern at the Mila Quebec AI Institute, working on Drug Discovery projects using Deep Learning (and GFlowNets) under the supervision of Yoshua Bengio and Dianbo Liu. More specifically, I worked on leveraging GFlowNets for Biological Sequence Design and for learning the posterior distribution over binary multimodal dropout masks (GFlowOut). Previously, I was also an NLP Data Scientist at Roche Canada and a Research Scientist at ModelisLabs, working on Health & Pharma-related challenges. In parallel, I work on NLP technologies, with a focus on low-resourced Sub-Saharan languages, at Masakhane Research Foundation (and previously at Google Research).

Past Work and Research Experiences

1. NLP Student Researcher, Google Research
2. Research Intern (AI for Drug Discovery), Mila Quebec AI Institute
3. AI for Drug Discovery Research Scientist, Phagos
4. NLP Data Scientist, Roche Canada
5. Scientist in Residence (AI for Chemical Compound Discovery), Modelis
6. Senior Machine Learning Engineer, Omdena
7. African NLP Researcher & Core Member, Masakhane
8. Part-time Senior Data Scientist, Speeqo

Selected Publications

All publications can be accessed through my Semantic Scholar and Google Scholar pages.

1. A Study of Acquisition Functions for Medical Imaging Deep Active Learning. Bonaventure F. P. Dossou (Deep Learning Indaba 2023)
2. FonMTL: Towards Multitask Learning for the Fon Language. Bonaventure F. P. Dossou et al. (EMNLP 2023)
3. Adapting Pretrained ASR Models to Low-resource Clinical Speech using Epistemic Uncertainty-based Data Selection. Bonaventure F. P. Dossou et al. (EMNLP 2023)
4. AfriSpeechNames: Most ASR models "butcher" African Names. Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou et al. (Interspeech 2023)
5. Pretrained Vision Models for Predicting High-Risk Breast Cancer Stage. Bonaventure F. P. Dossou et al. (ICLR 2023)
6. GFlowOut: Dropout with Generative Flow Networks. Dianbo Liu, Moksh Jain, Bonaventure F. P. Dossou et al. (ICML 2023)
7. AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages. Bonaventure F. P. Dossou et al. (EMNLP 2022)
8. Biological Sequence Design with GFlowNets. Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou et al. (ICML 2022)
9. MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications. Houcemeddine Turki, Bonaventure F. P. Dossou et al. (ECIR 2022)
10. GraphCC for Diverse and Novel Antimicrobial Peptides Generation and Selection. Bonaventure F. P. Dossou et al. (preprint)
11. OkwuGbé: End-to-End Speech Recognition for Fon and Igbo. Bonaventure F. P. Dossou et al. (EMNLP 2021)
12. MMTAfrica: Multilingual Machine Translation for African Languages. Chris C. Emezue and Bonaventure F. P. Dossou (EMNLP 2021)
13. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition. Bonaventure F. P. Dossou et al. (ICCV 2021)
14. Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language. Bonaventure F. P. Dossou et al. (EACL 2021)
15. AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin. Bonaventure F. P. Dossou et al. (EACL 2021)
16. FFR v1.1: Fon-French Neural Machine Translation. Bonaventure F. P. Dossou et al. (ACL 2020)

Awards, Honours, Grants & Services

1. Two Best Poster Awards at the Deep Learning Indaba Conference
2. Winner of AIM-AHEAD 2023 Health Equity Data Challenge
3. Winner of Nightingale Predicting High-Risk Breast Cancer Contests 2022 & 2023
4. Mila Quebec AI Institute's 2022 & 2023 Impact Annual Reports
5. McGill Engineering Doctoral Award (MEDA) 2022
6. Innovation Award 2022 of the German African Diaspora
7. Jacobs University's Dean's Prize for outstanding Master's Thesis
8. Shuttleworth Flash Grant
9. Winner of the ViVaTech-Unesco Challenge for Cracking Language Barriers through Data and AI
10. Wikimedia Foundation Research of the Year Award 2021 with Masakhane Community
11. Grant "Lacuna Fund" for Named Entity Recognition for Fon with Masakhane Community
12. Jacobs University Community Award 2021 for Innovation, Cultural Understanding, and Diversity
13. Jacobs University Mobility Area's Scholarship & Jacobs University Faces
14. Global Nominee and Benin's finalist with «Afro Num» - NASA's 2020 World Space Apps Challenge
15. Winner of the National Russian AI Hackathon 2019
16. International interviews and articles on BBC, Voice of America, and German and Russian newspapers and TV
17. Scientific presentations and publications, workshop organization, and reviewing services at ACL, EACL, NAACL, AACL, EMNLP, ICML, ICLR, NeurIPS (2020, 2021, 2022, 2023)