I am currently a Computer Science PhD candidate at the University of Toronto, supervised by Dr. Bo Wang.
My research focuses on multimodal integration, biological foundation models, and LLMs in biomedicine. I employ advanced AI techniques to harness diverse biological data, aiming to develop models that drive precision medicine and foster innovative approaches to drug discovery.
Update September 2025: I recently joined Xaira Therapeutics as an AI Scientist intern!
Education
University of Toronto
Ph.D. in Computer Science
Advisor: Prof. Bo Wang; Supervisory committee: Prof. Bo Wang, Prof. Anna Goldenberg, Prof. Benjamin Haibe-Kains
University of Toronto
B.A.Sc. in Computer Engineering
Publications

NeurIPS Spotlight (top 3.2%) 2025
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
Xingyu Chen*, Shihao Ma*, Runsheng Lin, Jiecong Lin, Bo Wang
We have many foundation models and language models for DNA, but can we control them? Introducing Ctrl-DNA, a reinforcement learning framework for controllable cis-regulatory sequence generation.

NeurIPS 2025
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Adibvafa Fallahpour*, Andrew Magnuson*, Purav Gupta*, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J Maddison, Bo Wang
BioReason is the first model to successfully integrate DNA foundation models (e.g., Evo 2) with LLMs (e.g., Qwen3) for biological reasoning!

Nature Machine Intelligence 2025
Moving towards genome-wide data integration for patient stratification with Integrate Any Omics
Shihao Ma, Andy G.X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang
IntegrAO (Integrate Any Omics) is an unsupervised platform designed to tackle the challenges of incomplete multi-omics data. Crucially, it can be seamlessly transformed into a prediction model after integration, enabling robust classification of new patient samples—even when only partial omics data is available.

Nature Communications 2024
AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery
Yue Xu*, Shihao Ma*, Haotian Cui*, Jingan Chen, Shufen Xu, Fanglin Gong, Alex Golubovic, Muye Zhou, Kevin Chang Wang, Andrew Varley, Rick Xing Ze Lu, Bo Wang, Bowen Li
The AGILE (AI-Guided Ionizable Lipid Engineering) platform streamlines the iterative development of ionizable lipids, crucial components for LNP-mediated mRNA delivery.

American Heart Journal 2024
Comparison of machine learning and conventional statistical modeling for predicting readmission following acute heart failure hospitalization
Karem Abdul-Samad, Shihao Ma, David E Austin, Alice Chong, Chloe X Wang, Xuesong Wang, Peter C Austin, Heather J Ross, Bo Wang, Douglas S Lee
Developing accurate models for predicting the risk of 30-day readmission is a major healthcare interest.

International Journal of Cardiology 2022
Comparison of machine learning and the regression-based EHMRG model for predicting early mortality in acute heart failure
David E Austin, Douglas S Lee, Chloe X Wang, Shihao Ma, Xuesong Wang, Joan Porter, Bo Wang
We developed ML algorithms to predict 7-day and 30-day mortality in patients with acute HF and compared these with an existing logistic regression model at the same timepoints.

The Lancet Regional Health–Americas 2022
Factors associated with SARS-CoV-2 test positivity in long-term care homes: a population-based cohort analysis using machine learning
Douglas S Lee, Chloe X Wang, Finlay A McAlister, Shihao Ma, Anna Chu, Paula A Rochon, Padma Kaul, Peter C Austin, Xuesong Wang, Sunil V Kalmady, Jacob A Udell, Michael J Schull, Barry B Rubin, Bo Wang
We used machine learning to identify resident and community characteristics predictive of SARS-CoV-2 infection.

Journal of the American Geriatrics Society 2021
Predictors of mortality among long‐term care residents with SARS‐CoV‐2 infection
Douglas S Lee*, Shihao Ma*, Anna Chu, Chloe X Wang, Xuesong Wang, Peter C Austin, Finlay A McAlister, Sunil V Kalmady, Moira K Kapral, Padma Kaul, Dennis T Ko, Paula A Rochon, Michael J Schull, Barry B Rubin, Bo Wang, CORONA Collaboration
We studied residents living in LTC homes in Ontario, Canada and examined predictors of all-cause death within 30 days after a positive test for SARS-CoV-2.

The Lancet Digital Health 2021
Long-term mortality risk stratification of liver transplant recipients: real-time application of deep learning algorithms on longitudinal data
Osvald Nitski, Amirhossein Azhie, Fakhar Ali Qazi-Arisar, Xueqi Wang, Shihao Ma, Leslie Lilly, Kymberly D Watt, Josh Levitsky, Sumeet K Asrani, Douglas S Lee, Barry B Rubin, Mamatha Bhat, Bo Wang
We propose deep learning models designed for longitudinal data that reliably predict an updated clinical outlook for individual patients.
Experience
AI Scientist Intern — Xaira Therapeutics
Machine Learning Researcher — Vector Institute
Advisor: Bo Wang
AI for biology, multimodal integration, biological foundation models.
Research Scientist Intern — Fable Therapeutics
Advisor: Philip Kim
De novo antibody protein design, 3D geometric deep learning, protein structure generation.
Machine Learning Intern — University Health Network
Advisor: Bo Wang
Prognosis prediction of patients with heart failure, deep learning for single-cell RNA-seq data.
Software Engineering Intern — IBM
Worked on the DB2 Availability & Recovery domain.