Categories: Health Technology / Cardiovascular Nutrition

Enhancing Cardiovascular Nutrition Guidance: A Cross-Sectional Evaluation of LLMs with Retrieval-Augmented Generation

Introduction: The Promise and Peril of AI in Cardiovascular Nutrition

As digital health tools proliferate, large language models (LLMs) and generative AI offer the potential to scale evidence-based nutrition education for cardiovascular disease (CVD) prevention. Grounded in the American Heart Association’s (AHA) guideline framework, these technologies aim to improve health literacy while ensuring information reliability. Yet the breadth of training data means LLMs can generate inconsistent, off-guideline, or even harmful nutrition guidance if not properly anchored to trusted sources.

What is Retrieval-Augmented Generation (RAG) and Why It Matters

RAG combines a neural language model with an external knowledge retrieval system. By grounding responses in vetted reference materials—such as AHA dietary guidelines—RAG can reduce misinformation and enhance adherence to clinical standards. This study extends prior work by benchmarking a RAG-enhanced Llama 3 model against several off-the-shelf options to assess performance in CV nutrition guidance.
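The retrieve-then-ground loop described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the chunk texts, the bag-of-words cosine scorer, and the prompt template are all hypothetical stand-ins (a production system would use a proper embedding model and the full AHA-derived knowledge base).

```python
from collections import Counter
import math

# Hypothetical mini knowledge base; the study used a much larger
# AHA-derived corpus split into retrievable chunks.
CHUNKS = [
    "Limit sodium intake; the AHA advises moving toward a lower daily sodium target.",
    "Choose liquid plant oils such as olive or canola over tropical oils and animal fats.",
    "Emphasize fruits, vegetables, whole grains, and minimally processed foods.",
]

def _bag(text: str) -> Counter:
    """Bag-of-words term frequencies (toy stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 5) -> list[str]:
    """Return the top-k chunks most similar to the question."""
    q = _bag(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, _bag(c)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved
    context and asks it to cite the numbered chunks it used."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieve(question)))
    return (
        "Answer using ONLY the sources below, citing them as [n].\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The grounded prompt is then passed to the generator (Llama 3 in this study); because the model is told to answer only from the supplied context, off-guideline content and fabricated citations become much less likely.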

Methods in Brief: How the Benchmark Was Built

Researchers developed a 30-question bank covering CV nutrition topics from cooking oils to sodium intake and major dietary patterns. A registered dietitian specializing in preventive cardiology validated the questions, and the AHA 2021 dietary framework served as the gold standard. Models tested included OpenAI GPT-4o, Perplexity, and Llama 3-70B, alongside a RAG-enhanced Llama 3 grounded in a 15,074-word knowledge base built from AHA content. For each query, the RAG system retrieved the top five relevant knowledge chunks, which were used to ground the model's answers with citations back to the source material.

What Was Measured and How

Responses were evaluated by expert assessment for reliability, potential harm, guideline adherence, and overall appropriateness. Readability was gauged with established metrics to balance accessibility against clinical precision. Each question was posed three times with a zero-shot prompt to gauge consistency, and three expert reviewers resolved any disagreements.
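The article does not name the readability metrics used; a common choice for patient-facing health text is Flesch Reading Ease, sketched below with a heuristic syllable counter. Treat this as an illustration of the scoring approach, not the study's exact instrument.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count via vowel groups (heuristic, not dictionary-exact)."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:  # silent trailing 'e'
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text
    (roughly 60-70 corresponds to plain English)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores like this make the readability trade-off concrete: a guideline-dense, technically phrased answer scores markedly lower than a plain-language one, which is the gap the study observed for the RAG-enhanced model.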

Key Findings: RAG-Enhanced Llama 3 Outperforms Off-the-Shelf Models

The RAG-enhanced Llama 3 model demonstrated higher guideline adherence and appropriateness, with no observed harmful outputs. While readability lagged behind some off-the-shelf models due to more technical phrasing, the outputs were consistently aligned with AHA recommendations and properly cited. In contrast, the three off-the-shelf models exhibited more variable performance and introduced instances of misleading or overly prescriptive guidance, particularly on topics like sodium targets and caloric needs.

Reliability and Safety

Across repeated prompts, the RAG-enhanced model showed less drift and fewer harmful responses than its peers. The ability to ground answers in the knowledge bank reduced the incidence of guideline misinterpretation and fabricated citations—a critical concern when AI-generated dietary advice influences real-world behavior.
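Drift across repeated prompts can be quantified by comparing the three responses to each question pairwise. The token-set Jaccard measure below is one simple, assumed approach (the study does not specify its consistency metric; expert review or semantic similarity would be alternatives).

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two responses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def mean_pairwise_consistency(responses: list[str]) -> float:
    """Average pairwise similarity over all response pairs;
    1.0 means the repeated answers were identical."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A model whose three answers to the same question score near 1.0 is stable; lower scores flag the kind of answer-to-answer drift reported for the off-the-shelf models.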

Implications for Healthcare Delivery

For health systems considering AI-assisted nutrition guidance, RAG-based approaches offer a pathway to safer, guideline-consistent patient education. Deployment strategies must address operational realities, including latency from retrieval steps, HIPAA considerations, and the need for ongoing validation against current CV nutrition guidelines. A tiered information approach—combining plain-language summaries with detailed, evidence-based explanations—can improve readability without sacrificing accuracy.

Limitations and Future Directions

Limitations include evaluating only a subset of models and relying on qualitative expert assessments. Future work should explore broader model families, longitudinal performance tracking to capture drift, and patient-centered usability studies to assess comprehension and behavior change. Advancing evaluation metrics that weigh readability alongside guideline fidelity will help standardize comparisons across clinical domains.

Conclusion: A Path Forward for Evidence-Based AI Nutrition Tools

The study indicates that RAG-enhanced LLMs grounded in CV dietary guidelines can outperform off-the-shelf models in delivering safe, guideline-adherent nutrition information for CVD prevention. While readability may improve with user-centric prompt design and content structuring, the core value lies in reliable, evidence-based guidance supported by verified sources.