AI and Bayesian modeling accelerate disease research at UT Arlington
Artificial intelligence is transforming how researchers interpret vast biological datasets, but the real breakthroughs come from the people designing the algorithms. At the University of Texas at Arlington, a team of data scientists is building sophisticated statistical tools that let AI uncover how diseases begin, how the immune system responds, and which treatments may work best. The effort centers on integrating cutting-edge technologies like CyTOF with Bayesian inference to deliver fast, reliable insights from millions of single-cell measurements.
A strong foundation: Bayesian statistics meets deep generative modeling
Xinlei (Sherry) Wang, the Jenkins Garrett professor of statistics and data science in UT Arlington’s Department of Mathematics, leads a four-year, $1.28 million federal grant project titled “Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery.” The work aims to create AI-powered models that explain complex biomedical data in human terms rather than as opaque calculations. By adopting a Bayesian framework, the team is developing a single, interpretable model that describes how CyTOF data, which captures thousands of cells and dozens of proteins, is generated and what distinguishes diseased cells from healthy ones.
“AI is powerful, but it’s often a black box,” Wang said. “We’re designing user-friendly, open-source software so end users can run it on their laptops. Our framework balances statistical rigor, uncertainty quantification, and scalability.”
CyTOF and the single-cell revolution
CyTOF (cytometry by time of flight) is a state-of-the-art lab technology that analyzes samples at the single-cell level, measuring the expression of dozens of proteins across millions of cells. When it is paired with next-generation single-cell transcriptomics, researchers can assemble a more complete molecular portrait of cellular states. The challenge has been translating this flood of high-dimensional data into actionable insights. Wang’s team is building tools that make the data interpretable, enabling scientists to see which cell types or protein signatures are linked to disease progression and treatment response.
From data to discovery: turning numbers into actionable biology
The Bayesian approach provides a transparent map of uncertainty, so researchers can trust the inferences drawn from millions of measurements. A core idea is to model what is known about the data while remaining open to new patterns that emerge from the analysis. For example, a model parameter might quantify increased protein expression in disease versus control groups, offering a concrete target for further study or therapeutic development.
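To make the idea of a disease-versus-control parameter concrete, here is a minimal sketch in Python. It is not the team's model: it assumes simulated, roughly normal expression values for one hypothetical protein, a standard noninformative prior on each group's mean and variance, and illustrative function names (`posterior_mean_draws`, `summarize_shift`). Under those assumptions the posterior for each group mean is a Student-t distribution, so the shift in expression can be summarized with an estimate, a 95% credible interval, and the posterior probability that expression is higher in disease.

```python
import numpy as np


def posterior_mean_draws(x, n_draws, rng):
    """Draw from the posterior of a group mean.

    Assumes normal data and the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2,
    under which mu given the data follows a Student-t distribution with
    n - 1 degrees of freedom, centered on the sample mean with scale s / sqrt(n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    scale = x.std(ddof=1) / np.sqrt(n)
    return x.mean() + scale * rng.standard_t(df=n - 1, size=n_draws)


def summarize_shift(disease, control, n_draws=20_000, seed=0):
    """Summarize the posterior of mean(disease) - mean(control)."""
    rng = np.random.default_rng(seed)
    diff = (posterior_mean_draws(disease, n_draws, rng)
            - posterior_mean_draws(control, n_draws, rng))
    low, high = np.percentile(diff, [2.5, 97.5])
    return {
        "mean_shift": float(diff.mean()),            # posterior mean of the shift
        "interval_95": (float(low), float(high)),    # 95% credible interval
        "prob_increase": float((diff > 0).mean()),   # P(shift > 0 | data)
    }


# Simulated expression values for one hypothetical protein (not real CyTOF data).
sim = np.random.default_rng(42)
control = sim.normal(loc=2.0, scale=1.0, size=500)
disease = sim.normal(loc=2.5, scale=1.0, size=500)

summary = summarize_shift(disease, control)
print(summary)
```

The credible interval is what carries the "transparent map of uncertainty": a narrow interval that excludes zero suggests a shift worth following up, while a wide one signals that the data do not yet settle the question.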
By combining CyTOF with single-cell transcriptomics, the team can trace how genetic programs translate into protein expression and cellular behavior. This integrated view helps identify distinct cell populations that drive disease, potential regulatory mechanisms, and candidate interventions that could be tested in the lab or clinic.
Timely wins, growing impact
The UT Arlington initiative is already earning recognition. A recent award to a doctoral student who contributed to the project highlights the strength of the team’s interdisciplinary approach. In addition, a Nature Communications study co-authored by Wang and colleagues introduced BIT, a Bayesian tool for identifying transcriptional regulators from epigenomics-based queries, demonstrating that the methodology also reaches into gene-regulation research, where it can improve accuracy.
The team extends beyond UT Arlington’s math department to include members of the Division of Data Science, other UT campuses, and collaborators at UT Southwestern. This collaborative network underpins a scalable platform that can handle the data deluge produced by modern biology while remaining accessible to researchers who may not be bioinformatics specialists.
Open science for broad accessibility
Wang emphasizes that the ultimate goal is usable software with clear outputs. The project prioritizes open-source development, enabling researchers to run tools directly on personal devices. This approach supports rapid iteration, reproducibility, and wider adoption, moving beyond narrow “black-box” AI systems toward transparent, trustworthy analytics that can inform experimental design and therapeutic strategies.
Why this matters for disease research
As biology becomes increasingly data-rich, the combination of AI speed, Bayesian interpretability, and multi-omics integration offers a path to faster, more reliable discoveries. For diseases like cancer, where subtle shifts in cell states can have major consequences, the UT Arlington effort aims to provide a framework that translates complex cellular portraits into concrete hypotheses for treatment and prevention. The collaboration also serves as a model for how universities can foster interdisciplinary teams of statisticians, data scientists, and domain experts to drive real-world impact through responsible AI.