My name is Bonaventure Dossou.
I hold a Bachelor of Science with honours in Mathematics from Kazan Federal University, Russia, and a Master of Science with honours in Computer Science and Data Engineering from Jacobs University Bremen, Germany. I am a Ph.D. student at McGill University, Canada, at the Research Center of Intelligent Machines. I will be working specifically in the Probabilistic Vision Group with Prof. Tal Arbel.

My interests are in Deep Learning for Computer Vision, and in NLP with a focus on low-resource languages and healthcare.

I am working on Drug Discovery projects using Deep Learning (and GFlowNets) at Mila Quebec AI Institute, under the supervision of Yoshua Bengio and Dianbo Liu. Previously, I was also an NLP Data Scientist at Roche Canada and a Research Scientist at ModelisLabs, working on Health/Pharma-related challenges.

In parallel, I am working on NLP technologies, with a focus on low-resourced Sub-Saharan languages, at Masakhane Research Foundation (and previously at Google Research).

I am the creator of many Afro-centric NLP systems, such as FFRTranslate, AfroLM, and the Okwugbe ASR (Automatic Speech Recognition for low-resourced languages) Python library, to name but a few.

Read about my personal story and how I got into research (it also includes a short list of the scientific talks I have given), and here is my most recent CV.

Work and Research Experiences

1. NLP Researcher, Google Research
2. Deep Learning for Drug Discovery Researcher, Mila Quebec AI Institute
3. NLP Research Scientist, Lelapa AI
4. Drug Discovery Research Scientist, Phagos
5. NLP Data Scientist, Roche Canada
6. Scientist in Residence - Deep Learning for Chemical Compound Discovery, Modelis
7. Senior Machine Learning Engineer, Omdena
8. African NLP Researcher & Core Member, Masakhane
9. Part-time Senior Data Scientist, Speeqo

Research and Scientific Publications

All publications can be accessed through my Semantic Scholar and Google Scholar pages. Here is a short list:
1. AfricaPOS: Part-of-Speech Tagging for Typologically Diverse African languages (ACL 2023)
2. AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR (under review, TACL 2023)
3. Adapting Pretrained ASR Models to Low-resource Clinical Speech using Epistemic Uncertainty-based Data Selection (ACL 2023)
4. Pretrained Vision Models for Predicting High-Risk Breast Cancer Stage (ICLR 2023)
5. AfricaNEWS: News Topic Classification for African languages (AfricaNLP, ICLR 2023)
6. AfriSpeechNames: Most ASR models "butcher" African Names (Interspeech 2023)
7. GFlowOut: Dropout with Generative Flow Networks (ICML 2023)
8. MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition (EMNLP 2022)
9. GraphCC for Diverse and Novel Antimicrobial Peptides Generation and Selection (preprint)
10. Self-Active Learning for Multilingual Language Models: Case Study of 23 African Languages (EMNLP 2022)
11. A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation (NAACL 2022)
12. Biological Sequence Design with GFlowNets (ICML 2022)
13. MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications (BIR, ECIR 2022)
14. MMTAfrica: Multilingual Machine Translation for African Languages (WMT, EMNLP 2021)
15. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition (Affective Behavior Analysis In-the-Wild (ABAW), ICCV 2021)
16. OkwuGbé: End-to-End Speech Recognition for Fon and Igbo (WideningNLP, EMNLP 2021 & AfricaNLP, EACL 2021)
17. Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language (AfricaNLP, EACL 2021)
18. MasakhaNER: Named Entity Recognition for African Languages (TACL 2021)
19. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets (AfricaNLP, EACL 2021)
20. AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin (AfricaNLP, EACL 2021)
21. An Approach to Intelligent Pneumonia Detection and Integration
22. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages (Findings of EMNLP 2021)
23. Lanfrica: A Participatory Approach to Documenting Machine Translation Research on African Languages
24. Masakhane: Machine Translation For Africa (AfricaNLP, ICLR 2021)
25. FFR v1.1: Fon-French Neural Machine Translation (WideningNLP, ACL 2021)
26. FFR v1.0: Fon-French Neural Machine Translation (AfricaNLP, ICLR 2021)

Awards, Honours and Grants

1. Winner of Nightingale Predicting High Risk Breast Cancer 2022 & 2023
2. Mila Quebec AI Institute's 2021-2022 Impact Annual Report
3. McGill Engineering Doctoral Award (MEDA)
4. Innovation Award 2022 of the German African Diaspora
5. Dean's Prize for outstanding Master's Thesis
6. Shuttleworth Flash Grant
7. Winner of the ViVaTech-Unesco Challenge for Cracking Language Barriers through Data and AI
8. Wikimedia Foundation Research of the Year Award 2021, with the Masakhane community
9. Lacuna Fund grant for Named Entity Recognition for Fon, with the Masakhane community
10. Jacobs University Community Award 2021 for Innovation, Cultural Understanding, and Diversity
11. Jacobs University Hall of Fame
12. Jacobs University Mobility Area’s Scholarship & Jacobs University Faces
13. Academic and scientific paper reviewer at the AfricaNLP workshop, EACL 2021
14. Global Nominee and Benin’s finalist with «Afro Num» - NASA's 2020 World Space Apps Challenge
15. Winner of the National Russian AI Hackathon 2019
16. International interviews and articles on BBC, Voice of America, and German and Russian newspapers and TV
17. Scientific presentations and publications, workshop organization, and reviewing services at ACL, EACL, NAACL, AACL, EMNLP, ICML, ICLR, and NeurIPS (2020, 2021, 2022)