I am currently a Computer Science PhD candidate at the University of Toronto, supervised by Dr. Bo Wang.
My research focuses on multimodal integration, biological foundation models, and LLMs in biomedicine. I apply AI techniques to diverse biological data, aiming to develop models that advance precision medicine and open new approaches to drug discovery.
Education
University of Toronto
Ph.D. in Computer Science
Advisor: Prof. Bo Wang; Supervisory committee: Prof. Bo Wang, Prof. Anna Goldenberg, Prof. Benjamin Haibe-Kains
University of Toronto
B.A.Sc. in Computer Engineering
Publications

arXiv 2025
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
Xingyu Chen*, Shihao Ma*, Runsheng Lin, Jiecong Lin, Bo Wang
We have many foundation models and language models for DNA, but can we control them? Introducing Ctrl-DNA, a reinforcement learning framework for controllable cis-regulatory sequence generation.

arXiv 2025
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Adibvafa Fallahpour*, Andrew Magnuson*, Purav Gupta*, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J Maddison, Bo Wang
BioReason is the first model to successfully integrate DNA foundation models (e.g., Evo 2) with LLMs (e.g., Qwen3) for biological reasoning!

Nature Machine Intelligence 2025
Moving towards genome-wide data integration for patient stratification with Integrate Any Omics
Shihao Ma, Andy G.X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang
IntegrAO (Integrate Any Omics) is an unsupervised platform designed to tackle the challenges of incomplete multi-omics data. Crucially, it can be seamlessly transformed into a prediction model after integration, enabling robust classification of new patient samples—even when only partial omics data is available.

Nature Communications 2024
AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery
Yue Xu*, Shihao Ma*, Haotian Cui*, Jingan Chen, Shufen Xu, Fanglin Gong, Alex Golubovic, Muye Zhou, Kevin Chang Wang, Andrew Varley, Rick Xing Ze Lu, Bo Wang, Bowen Li
The AGILE (AI-Guided Ionizable Lipid Engineering) platform streamlines the iterative development of ionizable lipids, crucial components for LNP-mediated mRNA delivery.

American Heart Journal 2024
Comparison of machine learning and conventional statistical modeling for predicting readmission following acute heart failure hospitalization
Karem Abdul-Samad, Shihao Ma, David E Austin, Alice Chong, Chloe X Wang, Xuesong Wang, Peter C Austin, Heather J Ross, Bo Wang, Douglas S Lee
Developing accurate models for predicting the risk of 30-day readmission is of major interest in healthcare.

International Journal of Cardiology 2022
Comparison of machine learning and the regression-based EHMRG model for predicting early mortality in acute heart failure
David E Austin, Douglas S Lee, Chloe X Wang, Shihao Ma, Xuesong Wang, Joan Porter, Bo Wang
We developed ML algorithms to predict 7-day and 30-day mortality in patients with acute HF and compared these with an existing logistic regression model at the same timepoints.

The Lancet Regional Health–Americas 2022
Factors associated with SARS-CoV-2 test positivity in long-term care homes: a population-based cohort analysis using machine learning
Douglas S Lee, Chloe X Wang, Finlay A McAlister, Shihao Ma, Anna Chu, Paula A Rochon, Padma Kaul, Peter C Austin, Xuesong Wang, Sunil V Kalmady, Jacob A Udell, Michael J Schull, Barry B Rubin, Bo Wang
We used machine learning to identify resident and community characteristics predictive of SARS-CoV-2 infection.

Journal of the American Geriatrics Society 2021
Predictors of mortality among long‐term care residents with SARS‐CoV‐2 infection
Douglas S Lee*, Shihao Ma*, Anna Chu, Chloe X Wang, Xuesong Wang, Peter C Austin, Finlay A McAlister, Sunil V Kalmady, Moira K Kapral, Padma Kaul, Dennis T Ko, Paula A Rochon, Michael J Schull, Barry B Rubin, Bo Wang, CORONA Collaboration
We studied residents living in LTC homes in Ontario, Canada and examined predictors of all-cause death within 30 days after a positive test for SARS-CoV-2.

The Lancet Digital Health 2021
Long-term mortality risk stratification of liver transplant recipients: real-time application of deep learning algorithms on longitudinal data
Osvald Nitski, Amirhossein Azhie, Fakhar Ali Qazi-Arisar, Xueqi Wang, Shihao Ma, Leslie Lilly, Kymberly D Watt, Josh Levitsky, Sumeet K Asrani, Douglas S Lee, Barry B Rubin, Mamatha Bhat, Bo Wang
We propose deep learning models designed for longitudinal data that reliably predict an updated clinical outlook for individual patients.
Experience
Machine Learning Researcher — Vector Institute
Advisor: Bo Wang
AI for biology, multimodal integration, biological foundation models.
Research Scientist Intern — Fable Therapeutics
Advisor: Philip Kim
De novo antibody protein design, 3D geometric deep learning, protein structure generation.
Machine Learning Intern — University Health Network
Advisor: Bo Wang
Prognosis prediction of patients with heart failure, deep learning for single-cell RNA-seq data.
Software Engineering Intern — IBM
Worked on the DB2 Availability & Recovery domain.