Data in Public Health: Equity, AI, and Community-Based Health Innovation

Introduction: Data as a Catalyst for Public Health Reform

The public health landscape is undergoing rapid change driven by the vast amounts of data generated through the digital transformation of health and related fields. Advances in data science, novel biostatistical methods, genomics, and artificial intelligence (AI)—coupled with multimodal datasets from imaging and wearables—are accelerating the pace of health innovation. Yet this transformation also raises concerns about widening health disparities and unintended consequences. Framing the discussion through data equity helps ensure innovations in public health promote human rights, opportunity, and dignity for all communities.

This perspective synthesizes insights from a Yale School of Public Health conference held in April 2024, focusing on how data equity intersects with public health research, policy, and practice. The aim is to offer practical, interdisciplinary recommendations that help researchers, policymakers, and practitioners embed equity into data collection, analysis, and insight generation while advancing health outcomes.

Three Interconnected Themes: SDOH, AI, and Community-Based Data

The conference structured its exploration around three core themes: how social determinants of health (SDOH) are represented in data, how AI affects data equity in health, and how community-based models can be used to collect and govern data. This framework reflects the reality that health outcomes are shaped by economic stability, education, healthcare access, neighborhood environments, and social context, and that data systems must capture these factors to guide effective interventions.

SDOH data, often collected outside traditional clinical settings, require interoperable, privacy-preserving approaches to be truly actionable. Integrating electronic health record (EHR) data with nonclinical sources, such as social services, housing, and education data, could illuminate root causes of disease and inequity, but it also raises questions about consent, governance, and data sharing across jurisdictions.
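One way to picture a privacy-preserving link between clinical and nonclinical records is a keyed-hash join, sketched below. This is only an illustration under stated assumptions, not a method described at the conference: the field names (`person_id`, `diagnosis`, `housing_status`), the toy records, and the key are all hypothetical, and a real deployment would also need consent, governance, and protection against re-identification that a hash alone does not provide.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a normalized identifier."""
    normalized = identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(ehr_rows, social_rows, secret_key):
    """Join two record lists on a pseudonymized ID.

    The raw identifier is never copied into the linked output.
    """
    social_by_token = {
        pseudonymize(row["person_id"], secret_key): row for row in social_rows
    }
    linked = []
    for row in ehr_rows:
        token = pseudonymize(row["person_id"], secret_key)
        match = social_by_token.get(token)
        if match is not None:
            linked.append({
                "token": token,
                "diagnosis": row["diagnosis"],
                "housing_status": match["housing_status"],
            })
    return linked


# Hypothetical toy data, for illustration only.
SECRET_KEY = b"replace-with-a-managed-secret"
ehr = [
    {"person_id": "A123", "diagnosis": "asthma"},
    {"person_id": "B456", "diagnosis": "diabetes"},
]
housing = [
    {"person_id": " a123 ", "housing_status": "unstable"},
    {"person_id": "C789", "housing_status": "stable"},
]

linked = link_records(ehr, housing, SECRET_KEY)
for row in linked:
    print(row["token"][:8], row["diagnosis"], row["housing_status"])
```

The design choice here is that both data holders apply the same keyed hash, so records can be matched without either side exchanging raw identifiers. Who holds the key is itself a governance question.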

AI and machine learning (ML) bring powerful tools to model complex relationships between social factors and health outcomes. However, they also risk amplifying biases if training data underrepresent certain groups or if models fail to generalize across languages and cultures. The conference highlighted strategies to mitigate these risks, including bias auditing, transparent reporting, and community engagement throughout AI development.
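A bias audit can take many forms. The sketch below shows one simple version, assuming the audit compares a model's sensitivity (true-positive rate) across demographic groups. The record fields (`group`, `label`, `prediction`) and the toy data are hypothetical, and the choice of metric is an assumption rather than something the conference prescribed.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: true-positive rate} over records whose true label is 1."""
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for r in records:
        if r["label"] == 1:
            actual_pos[r["group"]] += 1
            if r["prediction"] == 1:
                true_pos[r["group"]] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def largest_gap(rates):
    """Return the difference between the best- and worst-served groups."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


# Hypothetical toy predictions, for illustration only.
records = (
    [{"group": "A", "label": 1, "prediction": 1}] * 3
    + [{"group": "A", "label": 1, "prediction": 0}] * 1
    + [{"group": "B", "label": 1, "prediction": 1}] * 1
    + [{"group": "B", "label": 1, "prediction": 0}] * 3
    + [{"group": "B", "label": 0, "prediction": 0}] * 2
)

rates = sensitivity_by_group(records)
gap = largest_gap(rates)
print(rates, gap)
```

A large gap means the model misses true cases more often in one group than in another. Transparent reporting would publish these per-group figures rather than a single overall accuracy.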

Community-Based Data: Local Voice, Local Vision

Community-based data models emphasize participatory engagement where communities help define priorities, collect data, and shape interventions. Such approaches aim to reflect lived experiences, improve trust, and ensure that results translate into tangible benefits, such as enhanced services or targeted resources. A key takeaway is that “community” is not monolithic—definitions should emerge from cocreation with community members to reflect diverse identities and intersecting experiences.

Challenges remain, including historical power imbalances and mistrust in institutions. Addressing them requires transparent practices for data ownership, consent, and benefit sharing. Mixed-methods research, which combines quantitative data with qualitative insights, provides a fuller picture of community needs and supports equitable, locally relevant health interventions. Decentralized data models, where communities retain control over their data, can strengthen sovereignty and foster sustained collaboration at the local level.
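As a rough illustration of a decentralized model, the sketch below assumes each community site keeps its raw records and shares only aggregate counts, withholding any count below a minimum cell size. The site data, the `condition` field, and the threshold of 5 are hypothetical choices made for this example, not recommendations from the conference.

```python
def local_summary(records, min_cell_size=5):
    """Count records per condition, withholding counts below min_cell_size."""
    counts = {}
    for r in records:
        counts[r["condition"]] = counts.get(r["condition"], 0) + 1
    return {k: v for k, v in counts.items() if v >= min_cell_size}


def combine(summaries):
    """Add up the aggregate counts that sites chose to share."""
    total = {}
    for summary in summaries:
        for condition, count in summary.items():
            total[condition] = total.get(condition, 0) + count
    return total


# Hypothetical site-level records; these never leave the site.
site_a = [{"condition": "asthma"}] * 6 + [{"condition": "diabetes"}] * 2
site_b = [{"condition": "asthma"}] * 5 + [{"condition": "diabetes"}] * 7

shared_a = local_summary(site_a)
shared_b = local_summary(site_b)
combined = combine([shared_a, shared_b])
print(combined)
```

Only `shared_a` and `shared_b` cross the site boundary, so each community decides what is released. The trade-off is that suppressed small counts make the combined totals an undercount for rarer conditions.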

AI, Equity, and Responsible Innovation

AI’s potential to transform health care hinges on equitable design. Participants urged governance that includes diverse voices beyond technologists—encompassing social sciences, ethics, and community representatives—so AI tools reflect broader social realities. Open-source models offer innovation opportunities for low- and middle-income countries, but raise considerations about security, quality, and long-term support.

Key concepts discussed include “machine unlearning,” the idea that a trained model could be adjusted to forget biased knowledge, alongside governance frameworks that promote transparency and accountability. Fine-tuning AI systems with diverse health datasets can reduce bias, but the benefit must be weighed against the cost and the risk of introducing new inaccuracies. Importantly, data equity must be linked to health outcomes through frameworks that address bias in data collection, analysis, and application, rather than focusing solely on technical fixes.

Policy Implications and Practical Recommendations

The conference proposed five cross-cutting recommendations for policymakers and practitioners:

1. Enable big data and interoperability to connect SDOH with health outcomes, while improving data standardization and consent processes.
2. Include diverse, nontechnical voices in AI and health discussions to ensure governance reflects broader societal values and needs.
3. Foster collaboration among academia, industry, and local health systems to protect and expand data for public health and health innovation, emphasizing trust and privacy.
4. Modernize the Health Insurance Portability and Accountability Act (HIPAA) and develop AI-specific guidelines that address transparency, bias mitigation, and data equity.
5. Develop new conceptual frameworks that explicitly connect data equity, AI, and improved health outcomes to guide ethical and effective application.

Conclusion: Toward a More Equitable Data-Driven Public Health

The Yale conference underscored that data equity should drive the future of health innovation. Real progress requires breaking down data silos, enhancing interoperability, and ensuring AI applications are transparent, interpretable, and inclusive. By centering community voices, adopting mixed-methods approaches, and building robust governance, stakeholders can translate data-driven insights into equitable health benefits. The path forward demands sustained, cross-sector collaboration and scalable strategies that acknowledge resource constraints while prioritizing health justice for all communities.