Data Equity in Public Health: SDOH, Community Data, AI

Introduction: A Data-Driven Shift in Public Health

The public health landscape is undergoing a transformation driven by the flood of health-related data—from electronic health records to wearable devices, imaging, and digital sources. This shift, accelerated by data science, novel biostatistics, genomics, and artificial intelligence (AI), holds promise for advancing health at population and community levels. Yet it also raises concerns about widening disparities and unintended consequences. Centering data equity—defined as fair data practices that respect human rights and dignity—offers a compass for navigating opportunities while protecting vulnerable communities.

Conference Foundations: Toward Equity in Data and Health Innovation

This viewpoint draws on structured expert dialogue from a Yale School of Public Health conference in April 2024. The event brought together public health leaders, community advocates, and researchers to discuss data equity, social determinants of health (SDOH), AI, and community-based data models. The goal was to generate actionable recommendations that researchers, policymakers, and practitioners can use to embed equity in data collection, analysis, and insight generation, with the aim of improving health outcomes for diverse populations.

Three Interconnected Thematic Areas

1) Social Determinants of Health (SDOH) in Data: SDOH (economic stability, education, healthcare access, neighborhood context, and social supports) shape health outcomes far more than clinical care alone. Integrating longitudinal, nonclinical data with traditional health data helps illuminate root causes of disparities and informs targeted interventions.

2) AI and Health Data Equity: AI offers powerful tools to map and mitigate inequities, but biases in data and models can amplify disparities if not carefully managed. The conference emphasized community involvement, model transparency, and governance frameworks to ensure AI benefits are equitable.

3) Community-Based Data Approaches: Local, participatory data models empower communities to define priorities, collect data, and shape health initiatives. Mixed-methods approaches, which combine quantitative and qualitative data, capture lived experiences and promote trust, accountability, and relevance in health programs.

Key Findings on SDOH Data and Interoperability

SDOH data remain largely outside traditional clinical settings, posing challenges for collection, standardization, and analysis. Initiatives like the Colorado Social Health Information Exchange (CoSHIE) illustrate how secure, centralized platforms can integrate SDOH with health care data to coordinate care and services while protecting privacy. Participants highlighted the need to break data silos, enhance interoperability across EHRs, social services, and community organizations, and address consent and purpose clarity when using SDOH data in decision-making.
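The consent and purpose-clarity point above can be made concrete with a small sketch. The code below is an illustration only, not a description of how CoSHIE or any real exchange works: the record types, field names, and purpose labels (such as "care_coordination") are all hypothetical. It joins clinical and SDOH records on a shared identifier and releases SDOH fields only when the person's recorded consent covers the stated purpose.

```python
from dataclasses import dataclass

# Hypothetical record shapes, assumed for illustration only.
@dataclass(frozen=True)
class ClinicalRecord:
    person_id: str
    diagnosis: str

@dataclass(frozen=True)
class SdohRecord:
    person_id: str
    housing_insecure: bool
    food_insecure: bool
    consented_purposes: frozenset  # purposes the person agreed to

def link_records(clinical, sdoh, purpose):
    """Join clinical and SDOH records on person_id.

    SDOH fields are included only when the person's consent covers
    `purpose`; otherwise they are left as None.
    """
    sdoh_by_id = {record.person_id: record for record in sdoh}
    linked = []
    for c in clinical:
        s = sdoh_by_id.get(c.person_id)
        allowed = s is not None and purpose in s.consented_purposes
        linked.append({
            "person_id": c.person_id,
            "diagnosis": c.diagnosis,
            "housing_insecure": s.housing_insecure if allowed else None,
            "food_insecure": s.food_insecure if allowed else None,
            "sdoh_linked": allowed,
        })
    return linked

# Made-up example data.
clinical = [
    ClinicalRecord("p1", "asthma"),
    ClinicalRecord("p2", "diabetes"),
    ClinicalRecord("p3", "hypertension"),
]
sdoh = [
    SdohRecord("p1", True, False, frozenset({"care_coordination"})),
    SdohRecord("p2", False, True, frozenset({"research"})),
]

linked = link_records(clinical, sdoh, purpose="care_coordination")
for row in linked:
    print(row)
```

In this made-up data, only p1 has SDOH fields released: p2 consented to a different purpose, and p3 has no SDOH record at all. A real system would also need identity matching, audit logging, and access controls, which this sketch omits.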

Big Data, AI, and Ethical Considerations

Big data and AI can model complex, bidirectional links between social factors and health outcomes, enabling precision public health and more effective interventions. However, risks include biased data, model drift, and unequal access to AI benefits. Strategies discussed included developing bias indexes for AI models, incorporating community perspectives in AI development, and adhering to transparent reporting standards to build trust and accountability.
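The source does not say how a bias index would be computed. As one hedged illustration, the sketch below uses a common fairness measure, the gap in true positive rate between demographic groups (sometimes called an equal-opportunity gap). The function names, group labels, and data are assumptions made for this example, not something proposed at the conference.

```python
from collections import defaultdict

def true_positive_rates(y_true, y_pred, groups):
    """Return the true positive rate per group.

    Only groups with at least one actual positive case appear in the result.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                hits[group] += 1
    return {group: hits[group] / positives[group] for group in positives}

def tpr_gap(y_true, y_pred, groups):
    """Illustrative 'bias index': largest minus smallest group TPR.

    0.0 means the model finds true cases equally well in every group.
    Raises ValueError if no group has any actual positive case.
    """
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Made-up labels and predictions for two hypothetical groups, A and B.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = true_positive_rates(y_true, y_pred, groups)
gap = tpr_gap(y_true, y_pred, groups)
print(rates)  # {'A': 0.75, 'B': 0.5}
print(gap)    # 0.25
```

Here the model detects 3 of 4 true cases in group A but only 2 of 4 in group B, giving a gap of 0.25. A single number like this is a starting point for discussion with affected communities rather than a full audit, and other metrics (for example, calibration or false positive rates) can point in different directions.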

Toward Inclusive AI and Trustworthy Data Practices

Discussions emphasized that AI models are only as inclusive as the data they are trained on. Concepts like the A.C.C.E.S.S. AI framework (affirm your aims, consider your communities, cultivate your conversation, embrace your essentials, specify your scope, and scrutinize your space) were highlighted as guiding tools for equitable AI deployment. Open-source versus proprietary models sparked debate about innovation, security, and accessibility, with a shared call for governance that includes nontechnical voices such as social scientists, ethicists, and community advocates.

Community-Based Models: Defining, Engaging, and Empowering Communities

Community-based data models emphasize cocreation, local control, and reciprocal data sharing. Defining “community” is context-specific; inclusive definitions built with community members ensure data reflect diverse identities and experiences. Trust-building, transparency about data ownership, consent, and use, and ensuring communities benefit from data insights are recurring themes. Decentralized data approaches—where communities retain control—are seen as promising pathways to sustainable, locally relevant health interventions.

Implications for Policy and Practice

The conference outlined five cross-cutting recommendations:

(1) Enable big data and interoperability to connect SDOH and health outcomes while safeguarding privacy.

(2) Include diverse, nontechnical voices in AI and health discussions to shape ethical guidelines.

(3) Foster collaboration among academia, the private sector, and local systems to protect and steward data for health equity.

(4) Modernize HIPAA and develop AI-specific regulatory guidelines focused on transparency and bias mitigation.

(5) Develop new conceptual frameworks that explicitly link data equity with health outcomes and AI governance.

These steps aim to ensure that data-driven health innovations advance equity rather than entrench disparities.

Conclusion: A Pathway to Equitable Health Innovation

Data equity is not a peripheral concern but a guiding framework for modern public health. The Yale conference underscores the importance of interdisciplinary collaboration, meaningful community engagement, and adaptable governance to translate data and AI innovations into tangible health benefits for all communities. Realizing this vision will require ongoing attention to interoperability, trust, representation, and accountability as data-enabled public health evolves.