Overview: Accelerating disease research with AI and Bayesian modeling
At the University of Texas at Arlington, researchers are linking artificial intelligence with statistics to extract usable findings from massive biological datasets. By combining AI-driven pattern discovery with transparent Bayesian modeling, UT Arlington data scientists are developing tools that help scientists understand how diseases begin, how the immune system responds, and which treatments might work best. The goal is not just speed but interpretable results that researchers can trust in the lab and the clinic.
The team and their mission
Leading this effort is Xinlei (Sherry) Wang, Jenkins Garrett professor of statistics and data science in UT Arlington’s Department of Mathematics. Wang, who also serves as the founding director for research in the Division of Data Science, received a four-year, $1.28 million federal grant to advance the project titled “Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery.” Her work centers on creating AI models that can analyze complex biomedical data and translate it into actionable insights for researchers and clinicians.
CyTOF and single-cell data: a wealth of detail
A core part of the research involves CyTOF, a state-of-the-art lab technology that profiles thousands of individual cells in a single run and measures dozens of proteins in each cell. Combined with single-cell transcriptomics, which measures gene expression by sequencing RNA, CyTOF adds a rich layer of protein-level information. Integrating these data streams provides a fuller picture of cellular behavior in health and disease, enabling researchers to identify cell types, track disease-related changes, and understand immune responses at single-cell resolution.
From data deluge to interpretable results: the Bayesian framework
The challenge, Wang explains, is to present this torrent of data in a form other scientists can use. The team’s approach is to build a single, scalable Bayesian model that makes the data-generating process transparent and interpretable. In practical terms, a model parameter might represent how much a protein’s expression increases in diseased tissue relative to healthy controls, and its posterior distribution shows how certain that estimate is. By keeping uncertainty quantification at the core, the framework helps researchers gauge their confidence in a finding and account for the variability inherent in biological data.
“AI is powerful, but it’s often a black box,” Wang notes. “We are designing user-friendly, open-source software so end users can run it on their laptops. Our goal is to combine statistical rigor, uncertainty quantification, and scalability—all in one framework.”
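To make the idea of an interpretable parameter concrete, here is a minimal sketch in Python. It is not the team's model. It assumes a deliberately simple setup: one protein, a normal prior on the shift in mean expression, and a plug-in normal approximation for the observed difference between groups. The function name `posterior_shift`, the prior width, and the expression values are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist, fmean, variance


def posterior_shift(diseased, healthy, prior_sd=1.0):
    """Posterior for delta = mean(diseased) - mean(healthy).

    Illustrative assumptions (not from the article):
      prior:      delta ~ Normal(0, prior_sd^2)
      likelihood: observed difference ~ Normal(delta, se^2), where se^2 is
                  estimated from the data and then treated as known.
    """
    observed_diff = fmean(diseased) - fmean(healthy)
    se2 = variance(diseased) / len(diseased) + variance(healthy) / len(healthy)

    # Conjugate normal-normal update: precisions add, and the posterior mean
    # is a precision-weighted average of the prior mean (0) and the data.
    post_precision = 1.0 / prior_sd**2 + 1.0 / se2
    post_mean = (observed_diff / se2) / post_precision
    post_sd = post_precision ** -0.5

    posterior = NormalDist(post_mean, post_sd)
    return {
        "observed_diff": observed_diff,
        "mean": post_mean,
        "ci95": (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)),
        "prob_increase": 1.0 - posterior.cdf(0.0),
    }


# Made-up expression values for a single protein (arbitrary units).
diseased = [2.9, 3.4, 3.1, 3.8, 3.3, 3.6]
healthy = [2.1, 2.5, 2.2, 2.7, 2.4, 2.0]

summary = posterior_shift(diseased, healthy)
print(summary)
```

The output is a single number with a direct reading (the estimated increase in expression), a 95% credible interval around it, and the posterior probability that the increase is real. A production model for CyTOF data would be far richer, but the summary it reports to an end user could take the same form.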
Speed, scalability, and real-world impact
One of the standout advantages of integrating AI with Bayesian statistics is speed. Traditional analysis of millions of cells across dozens of proteins could take days or longer. With AI-augmented Bayesian modeling, researchers can obtain statistically rigorous results in seconds, which allows rapid hypothesis testing and iteration. That speed matters for translating complex biological signals into potential treatments for diseases like cancer.
Notable milestones and collaborations
The project has already drawn attention across the research community. Preliminary results from Wang’s group earned a Best PhD Poster Award at the 2025 Conference of Texas Statisticians, and doctoral graduates from the group are now moving into tenure-track roles. A Nature Communications publication by Wang, postdoctoral researcher Zeyu Lu, and Lin Xu introduced BIT, a tool that improves the identification of transcriptional regulators from epigenomics data, demonstrating the reach of the team’s methods. Collaborators span UT Arlington’s Division of Data Science and Department of Mathematics, with cross-institutional partners at UT Southwestern.
Looking ahead: accessible, open science
Beyond the technical advances, the team aims to democratize access to these sophisticated tools. By prioritizing open-source software and making computational workflows portable, the researchers hope to empower scientists who may not have access to massive compute resources. The emphasis on interpretability and usability is meant to ensure that the models can support real-world decision-making in laboratories and clinics while maintaining scientific transparency.
Conclusion
UT Arlington’s fusion of AI and Bayesian modeling demonstrates how disciplined statistical thinking, robust uncertainty handling, and scalable computation can accelerate discovery in disease research. As data streams from cutting-edge technologies like CyTOF converge with deep generative modeling, the path from cellular detail to clinical insight becomes both clearer and shorter.