I am currently a Computer Science PhD candidate at the University of Toronto, supervised by Dr. Bo Wang.


My research focuses on multimodal integration, biological foundation models, and LLMs in biomedicine. I apply AI techniques to diverse biological data, aiming to develop models that advance precision medicine and support new approaches to drug discovery.

Education

2020—Present

University of Toronto

Ph.D. in Computer Science

Advisor: Prof. Bo Wang; Supervisory committee: Prof. Bo Wang, Prof. Anna Goldenberg, Prof. Benjamin Haibe-Kains

2015—2020

University of Toronto

B.A.Sc. in Computer Engineering

Publications

Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL

arXiv 2025

Xingyu Chen*, Shihao Ma*, Runsheng Lin, Jiecong Lin, Bo Wang

We have many foundation models and language models for DNA, but can we control them? Introducing Ctrl-DNA, a reinforcement learning framework for controllable cis-regulatory sequence generation.

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

arXiv 2025

Adibvafa Fallahpour*, Andrew Magnuson*, Purav Gupta*, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J Maddison, Bo Wang

BioReason is the first model to successfully integrate DNA foundation models (e.g., Evo 2) with LLMs (e.g., Qwen3) for biological reasoning!

Moving towards genome-wide data integration for patient stratification with Integrate Any Omics

Nature Machine Intelligence 2025

Shihao Ma, Andy G.X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E. Dick, Bo Wang

IntegrAO (Integrate Any Omics) is an unsupervised platform designed to tackle the challenges of incomplete multi-omics data. Crucially, it can be seamlessly transformed into a prediction model after integration, enabling robust classification of new patient samples—even when only partial omics data is available.

AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery

Nature Communications 2024

Yue Xu*, Shihao Ma*, Haotian Cui*, Jingan Chen, Shufen Xu, Fanglin Gong, Alex Golubovic, Muye Zhou, Kevin Chang Wang, Andrew Varley, Rick Xing Ze Lu, Bo Wang, Bowen Li

The AGILE (AI-Guided Ionizable Lipid Engineering) platform streamlines the iterative development of ionizable lipids, crucial components for LNP-mediated mRNA delivery.

Comparison of machine learning and conventional statistical modeling for predicting readmission following acute heart failure hospitalization

American Heart Journal 2024

Karem Abdul-Samad, Shihao Ma, David E Austin, Alice Chong, Chloe X Wang, Xuesong Wang, Peter C Austin, Heather J Ross, Bo Wang, Douglas S Lee

Developing accurate models for predicting the risk of 30-day readmission is a major healthcare interest.

Comparison of machine learning and the regression-based EHMRG model for predicting early mortality in acute heart failure

International Journal of Cardiology 2022

David E Austin, Douglas S Lee, Chloe X Wang, Shihao Ma, Xuesong Wang, Joan Porter, Bo Wang

We developed ML algorithms to predict 7-day and 30-day mortality in patients with acute HF and compared these with an existing logistic regression model at the same timepoints.

Factors associated with SARS-CoV-2 test positivity in long-term care homes: a population-based cohort analysis using machine learning

The Lancet Regional Health–Americas 2022

Douglas S Lee, Chloe X Wang, Finlay A McAlister, Shihao Ma, Anna Chu, Paula A Rochon, Padma Kaul, Peter C Austin, Xuesong Wang, Sunil V Kalmady, Jacob A Udell, Michael J Schull, Barry B Rubin, Bo Wang

We used machine learning to identify resident and community characteristics predictive of SARS-CoV-2 infection.

Predictors of mortality among long‐term care residents with SARS‐CoV‐2 infection

Journal of the American Geriatrics Society 2021

Douglas S Lee*, Shihao Ma*, Anna Chu, Chloe X Wang, Xuesong Wang, Peter C Austin, Finlay A McAlister, Sunil V Kalmady, Moira K Kapral, Padma Kaul, Dennis T Ko, Paula A Rochon, Michael J Schull, Barry B Rubin, Bo Wang, CORONA Collaboration

We studied residents living in LTC homes in Ontario, Canada and examined predictors of all-cause death within 30 days after a positive test for SARS-CoV-2.

Long-term mortality risk stratification of liver transplant recipients: real-time application of deep learning algorithms on longitudinal data

The Lancet Digital Health 2021

Osvald Nitski, Amirhossein Azhie, Fakhar Ali Qazi-Arisar, Xueqi Wang, Shihao Ma, Leslie Lilly, Kymberly D Watt, Josh Levitsky, Sumeet K Asrani, Douglas S Lee, Barry B Rubin, Mamatha Bhat, Bo Wang

We propose deep learning models designed for longitudinal data that reliably predict an updated clinical outlook for individual patients.

Experience

Sept. 2020 - Present

Machine Learning Researcher, Vector Institute

Advisor: Bo Wang

AI for biology, multimodal integration, biological foundation models.

Summer 2022

Research Scientist Intern, Fable Therapeutics

Advisor: Philip Kim

De novo antibody protein design, 3D geometric deep learning, protein structure generation.

Summer 2019

Machine Learning Intern, University Health Network

Advisor: Bo Wang

Prognosis prediction of patients with heart failure, deep learning for single-cell RNA-seq data.

May 2017 - May 2018

Software Engineering Intern, IBM

Worked in the DB2 Availability & Recovery domain.