Categories: Health Tech & Nutrition

Evaluating LLMs and Retrieval-Augmented Generation for Guideline-Adherent Cardiovascular Nutrition Guidance

Introduction: Digital Diet Guidance Meets Cardiovascular Health

As digital health tools proliferate, large language models (LLMs) and retrieval-augmented generation (RAG) offer promising avenues to deliver scalable, guideline-based nutrition information for cardiovascular disease (CVD) prevention. Grounded in the American Heart Association’s (AHA) dietary recommendations, these technologies aim to translate complex nutrition science into accessible advice that supports heart-healthy choices. Yet the quality and reliability of AI-generated guidance remain a critical concern, given the potential for misinterpretation of guidelines or reliance on unverified sources.

What RAG Brings to Nutrition Advice

RAG enhances LLMs by incorporating external knowledge retrieval, allowing responses to be anchored in vetted sources such as AHA statements and other credible guidelines. By retrieving relevant content before generating an answer, RAG helps ensure that recommendations reflect current evidence and reduces the risk of fabricating or misquoting sources. In cardiovascular nutrition, where even small misstatements can affect public health outcomes, such grounding is essential.
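
To make the retrieve-then-generate pattern concrete, the sketch below pairs a small TF-IDF retriever with a prompt builder that grounds the answer in retrieved guideline text. The snippets, the scikit-learn retriever, and the prompt wording are illustrative assumptions, not the study's actual implementation.

```python
# A minimal sketch of retrieve-then-generate grounding, assuming a tiny
# illustrative knowledge bank and a TF-IDF retriever (not the study's setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical excerpts standing in for a vetted AHA-derived knowledge bank.
GUIDELINE_SNIPPETS = [
    "Choose liquid plant oils such as olive or canola in place of tropical oils.",
    "Limit sodium intake; most adults should aim for less than 2,300 mg per day.",
    "Emphasize whole grains, vegetables, fruits, legumes, nuts, and lean proteins.",
]

vectorizer = TfidfVectorizer().fit(GUIDELINE_SNIPPETS)
snippet_vectors = vectorizer.transform(GUIDELINE_SNIPPETS)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k guideline snippets most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), snippet_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [GUIDELINE_SNIPPETS[i] for i in top]

def build_prompt(question: str) -> str:
    """Anchor the model's answer in retrieved guideline text before generation."""
    context = "\n".join(f"- {s}" for s in retrieve(question))
    return (
        "Answer using only the guideline excerpts below and cite them.\n"
        f"Guideline excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("Which cooking oils are best for heart health?"))
```

Because generation is conditioned on retrieved text, the model's answer can point back to the specific guideline passage it used, which is what makes citations verifiable.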

Study Overview: A Comparative Benchmark

Researchers conducted a cross-sectional evaluation comparing a RAG-enhanced Llama 3-70B model (Llama 3+RAG) against three off-the-shelf models: OpenAI’s GPT-4o, Perplexity AI, and Meta AI’s Llama 3-70B. A 30-question bank, curated and reviewed by a preventive cardiology dietitian, covered key CV dietary topics, including oils, sodium, macronutrient ranges, and dietary patterns such as the ketogenic diet and intermittent fasting. The AHA CV dietary guidelines served as the benchmark for adherence and reliability. The RAG framework integrated a 15,074-word knowledge bank extracted from the AHA’s 2021 dietary and lifestyle recommendations and related sources, enabling the model to cite credible references in its responses.
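
As a rough illustration of how a guideline corpus of that size might be prepared for retrieval, the sketch below splits a long document into overlapping word-window passages. The chunk size, overlap, and file name are assumptions; the study's exact preprocessing is not specified here.

```python
# A minimal sketch of preparing a guideline knowledge bank for retrieval.
# Chunk size, overlap, and the file name are illustrative assumptions.
def chunk_text(text: str, chunk_words: int = 200, overlap: int = 40) -> list[str]:
    """Split a long guideline document into overlapping word-window passages."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_words]))
        start += chunk_words - overlap
    return chunks

# Hypothetical usage: load an AHA-derived corpus and index its passages.
with open("aha_2021_dietary_guidance.txt", encoding="utf-8") as f:
    knowledge_bank = f.read()

passages = chunk_text(knowledge_bank)
print(f"Indexed {len(passages)} passages from a {len(knowledge_bank.split())}-word corpus.")
```

Overlapping windows are a common default because they keep a recommendation intact even when it straddles a chunk boundary.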

How the Evaluation Was Conducted

Each model answered every question three times using a standardized prompt that encouraged guideline-cited responses. Three expert reviewers rated the replies for reliability, appropriateness, potential harm, readability, and strict guideline adherence. The framework also assessed the presence and accuracy of citations, the functionality of linked sources, and whether answers remained consistent across attempts. The goal was to identify not just accuracy, but also the safety and usability of AI-generated CV nutrition guidance.
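
A minimal sketch of that repeated-query protocol follows: each model answers every question three times under one standardized prompt, and reviewer-assigned rubric scores are aggregated per model. The model interface, rubric fields, and scoring scale are assumptions; in the study, ratings came from human expert reviewers rather than code.

```python
# A minimal sketch of the repeated-query benchmark loop, with hypothetical
# rubric fields; actual scoring in the study was done by expert reviewers.
from dataclasses import dataclass
from statistics import mean

@dataclass
class RatedResponse:
    model: str
    question: str
    attempt: int
    text: str
    reliability: float = 0.0        # reviewer-assigned score, e.g. 0-1 scale
    appropriateness: float = 0.0
    harmful: bool = False
    cites_guideline: bool = False

def run_benchmark(models: dict, questions: list[str], attempts: int = 3) -> list[RatedResponse]:
    """Query every model on every question three times with the same prompt."""
    prompt = "Answer per AHA dietary guidelines and cite your sources: {q}"
    results = []
    for name, ask in models.items():            # ask: callable(str) -> str
        for q in questions:
            for i in range(attempts):
                results.append(RatedResponse(name, q, i + 1, ask(prompt.format(q=q))))
    return results

def summarize(results: list[RatedResponse]) -> dict:
    """Aggregate per-model means for a few rubric dimensions."""
    by_model: dict[str, list[RatedResponse]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)
    return {
        m: {
            "reliability": mean(x.reliability for x in rs),
            "harm_rate": mean(1.0 if x.harmful else 0.0 for x in rs),
            "citation_rate": mean(1.0 if x.cites_guideline else 0.0 for x in rs),
        }
        for m, rs in by_model.items()
    }
```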

Key Findings: RAG-Enhanced Llama 3 Shines in Safety and Adherence

The Llama 3+RAG model outperformed the off-the-shelf models on the critical metrics of guideline adherence and appropriateness, with no harmful responses observed. While its readability tended to be lower due to more formal, guideline-heavy language, the model offered reliable, evidence-based guidance aligned with AHA recommendations. In contrast, the off-the-shelf models demonstrated higher readability but a greater incidence of partially appropriate or harmful guidance and inconsistent citation practices. These results highlight a trade-off: enhancing factual grounding through RAG can improve safety and fidelity to guidelines, albeit at some cost to readability.

Implications for Health Systems and Clinicians

For health systems considering AI-powered nutrition tools, the study underscores the value of RAG-based customization when deploying CV dietary guidance. Hospitals and clinics can reduce the risk of misinterpretation or misrepresentation by anchoring AI responses to vetted guidelines and ensuring references are verifiable. This approach supports clinical workflows with consistent, evidence-based information while preserving the capacity to explain complex nutrition concepts in a structured, patient-friendly way through tiered content presentation and plain-language summaries.

Challenges and Future Directions

Despite the improvements seen with RAG, challenges remain. Model drift, citation fidelity, and the trade-off between guideline adherence and readability all require ongoing monitoring and governance. The study also notes the importance of patient-facing evaluation to understand how end users interpret AI-generated nutrition guidance. Future work should explore broader model comparisons, more diverse knowledge banks, and governance frameworks that balance transparency, accountability, and patient safety in clinical nutrition AI.

Conclusion: A Path Forward for Evidence-Based AI in Heart Health

The evaluated RAG-enhanced Llama 3 architecture demonstrates clear advantages for delivering guideline-adherent CV nutrition information, reducing the risk of harm, and improving reliability relative to off-the-shelf models. While readability remains an area for refinement, integrating domain-specific guidelines with robust citation practices represents a promising pathway for AI-assisted cardiovascular health education and decision support.