Enhancing Cardiovascular Nutrition Guidance with Retrieval-Augmented LLMs: A Cross-Sectional Evaluation
Introduction

As digital health tools proliferate, large language models (LLMs) and retrieval-augmented generation (RAG) offer promise for delivering accessible, guideline-based nutrition information aimed at preventing cardiovascular disease (CVD). This cross-sectional study investigates how different model architectures perform in providing nutrition guidance aligned with American Heart Association (AHA) recommendations, and whether grounding LLMs in vetted guidelines via RAG can reduce misinformation and potential harm.

Study Rationale

LLMs trained on vast, heterogeneous sources may generate reliable advice, yet they can also reproduce unverified, sometimes misleading content. Grounding responses in external reference materials through RAG helps ensure that generated guidance remains anchored to evidence-based sources. Prior work suggests performance varies across domains, with controlled, domain-specific systems showing improved consistency and safety. This study extends that work to cardiovascular (CV) nutrition, a field where precise guidance matters for prevention strategies.

Methods in Brief

Guidelines from the AHA were used as the benchmark for CV dietary recommendations. A dietitian specializing in preventive cardiology developed 30 nutrition questions spanning cooking practices, dietary patterns, and nutrient specifics relevant to heart health. The study compared one RAG-enhanced Llama 3 model (Llama 3+RAG) against three off-the-shelf models: OpenAI’s GPT-4o, Perplexity AI, and Meta AI’s Llama 3-70B. Each model answered questions three times using standardized prompts, and expert reviewers evaluated responses for reliability, harm, guideline adherence, readability, and overall appropriateness. A 15,074-word knowledge base grounded the RAG system, drawing from the AHA 2021 dietary statements and related resources, with citations formatted to enable traceability.
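The repeated-prompting protocol described above can be sketched as a simple collection loop. This is an illustrative outline only: the model names are shorthand, and the `ask()` stub stands in for the study's actual API calls, which are not specified in the source.

```python
# Minimal sketch of the evaluation protocol: every model answers every
# question three times using one standardized prompt template.
# ask() is a placeholder, not the study's real model interface.

STANDARD_PROMPT = (
    "You are answering a patient question about heart-healthy nutrition. "
    "Question: {question}"
)

MODELS = ["Llama3+RAG", "GPT-4o", "Perplexity", "Llama3-70B"]
REPETITIONS = 3  # each model answered every question three times

def ask(model: str, prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"[{model}] response to: {prompt}"

def collect_responses(questions):
    """Return one record per (question, model, repetition) for expert review."""
    records = []
    for q in questions:
        prompt = STANDARD_PROMPT.format(question=q)
        for model in MODELS:
            for rep in range(1, REPETITIONS + 1):
                records.append({
                    "question": q,
                    "model": model,
                    "repetition": rep,
                    "response": ask(model, prompt),
                })
    return records

records = collect_responses(["How much sodium per day is heart-healthy?"])
# 1 question x 4 models x 3 repetitions = 12 records for reviewers to score
```

In the study itself, each of the resulting records would then be rated by expert reviewers on reliability, harm, guideline adherence, readability, and overall appropriateness.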

What Was Grounded in Guidelines?

The knowledge base was constructed from AHA sources and high-quality educational materials, explicitly mapping guidance on topics such as sodium limits, recommended fat types, protein intake, beverages, and meal patterns. The RAG framework used a vector database to retrieve the five most relevant knowledge chunks per query, grounding the model's outputs in source material with formal citations. Temperature settings were tuned to balance strict guideline fidelity with conversational, user-friendly clarity.
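The retrieve-top-five-then-ground pattern can be sketched as follows. This is a toy illustration, not the study's implementation: the word-overlap scorer stands in for the actual vector-database similarity search, and the chunk texts and citation labels are invented examples.

```python
# Illustrative sketch of RAG grounding: score knowledge-base chunks
# against a query, keep the five most relevant, and prepend them
# (with citations) to the prompt so answers stay traceable to sources.

TOP_K = 5  # the study retrieved the five most relevant chunks per query

KNOWLEDGE_BASE = [
    {"text": "Limit sodium intake to support healthy blood pressure.",
     "citation": "AHA 2021 Dietary Guidance, sec. 8"},
    {"text": "Choose liquid plant oils over tropical oils and animal fats.",
     "citation": "AHA 2021 Dietary Guidance, sec. 5"},
    # ... remaining chunks of the knowledge base would follow
]

def score(query: str, chunk_text: str) -> float:
    """Toy relevance score: fraction of query words found in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk_text.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def retrieve(query: str, kb=KNOWLEDGE_BASE, k=TOP_K):
    """Return the k highest-scoring chunks for the query."""
    ranked = sorted(kb, key=lambda ch: score(query, ch["text"]), reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """Prepend retrieved chunks, with citations, to the user question."""
    context = "\n".join(
        f"- {ch['text']} [{ch['citation']}]" for ch in retrieve(query)
    )
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"
```

A production system would replace the overlap scorer with embedding similarity over a vector index; the citation-carrying structure is what enables the traceability the study emphasizes.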

Key Findings

The RAG-enhanced Llama 3 model consistently outperformed off-the-shelf models in reliability, guideline adherence, and safety. It showed fewer inappropriate or harmful responses and demonstrated the most credible alignment with AHA recommendations. Readability, while strong in the commercial models, lagged in the RAG-enhanced system due to more formal, guideline-focused language. Importantly, no harmful outputs were observed from the Llama 3+RAG model, whereas some off-the-shelf models produced prescriptive, potentially risky advice or misrepresented guidelines.

Implications for Practice

For health systems considering AI-assisted nutrition guidance, RAG-enhanced models grounded in evidence-based CV nutrition guidelines offer a safer, more trustworthy option than generic LLMs. The study underlines the need for robust citation verification and transparency around sources to prevent citation hallucinations. Moreover, it highlights a critical trade-off: higher guideline fidelity may come with reduced readability. Tiered information delivery—coupling plain-language summaries with detailed guideline language—could address diverse user needs and health literacy levels.

Limitations and Future Directions

Limitations include evaluating a limited set of models at a single time point and relying on qualitative expert assessments. Future work should expand model comparisons, incorporate longitudinal evaluations to monitor drift, and involve patient testing to assess comprehension and behavior change. The integration of end-user feedback, clinician workflows, and governance frameworks will be essential as AI tools transition toward routine clinical use.

Conclusion

Our cross-sectional study demonstrates that a RAG-enhanced Llama 3 model, grounded in CV dietary guidelines, delivers more guideline-adherent, reliable nutrition guidance for CVD prevention compared with several off-the-shelf LLMs. While readability lags slightly behind, the potential for safer, evidence-based AI-driven nutrition support in cardiovascular health remains compelling, warranting further development and careful clinical integration.