Categories: Healthcare AI, Lung Cancer

Large Language Models in Lung Cancer: A Comprehensive Systematic Review

Introduction: The Rise of LLMs in Lung Cancer Care

Large language models (LLMs) are increasingly explored as tools to assist in the full cycle of lung cancer (LC) management, from prevention and screening to diagnosis, treatment planning, and supportive care. This systematic review synthesizes recent evidence on how LLMs are being applied to LC, what they promise, and where their limitations lie. By collating studies from 2023–2024 and beyond, we aim to clarify where clinicians and researchers should focus efforts to safely integrate LLMs into routine LC care.

Methods at a Glance

Following PRISMA-inspired methods, we screened multiple databases up to January 1, 2025, selecting studies that examined LLMs in LC contexts. Data were extracted on model type, application scenario, input/output formats, and outcomes. Quality appraisal used tools adapted to clinical AI tasks, including QUADAS-2 for diagnostic work, ROBINS-I for interventions, and PROBAST for predictive modeling, with input from oncology experts to tailor criteria to LC-specific challenges. While meta-analysis was not planned due to heterogeneity, descriptive statistics and narrative synthesis highlight patterns across domains and prompts.

Key Findings: Seven Application Domains

Across the 28 included studies (published in 2023–2024, including several preprints), LLMs touched seven domains in LC care: auxiliary diagnosis, information extraction, question answering, scientific research, medical education, nursing support, and treatment decision-making. Most work focused on non–small cell lung cancer (NSCLC) and leveraged LLMs to interpret reports, extract relevant facts, and provide decision support. General-purpose models such as GPT-3.5/4, LLaMA-2, and Bard were commonly evaluated, with several studies noting that domain-tuned models (e.g., ClinicalBERT, Med-PaLM 2) can outperform broad, general models on specialized LC tasks in some contexts. Multimodal capabilities emerged in a subset of studies, with image interpretation and cross-modal reasoning showing promise for integrated LC workflows.

Diagnostic Support and Screening

LLMs aided LC staging, interpretation of CT findings (including low-dose CT for screening), and assignment of standardized categories (e.g., Lung-RADS) from radiology reports. These applications can streamline triage, standardize reporting, and support population-level screening initiatives, though they require robust prospective validation to ensure accuracy and safety across diverse populations.

Information Extraction and Knowledge Management

Many studies used LLMs to extract structured data from text-heavy sources—electronic medical records, pathology reports, radiology notes—and to convert free-text findings into computable inputs for downstream analyses, such as trial eligibility or survival predictions. Prompt engineering, templates, and few-shot learning were common strategies to improve extraction quality.
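The template and few-shot strategies described above can be sketched as a prompt-construction step: worked examples are prepended so the model imitates the desired structured output. The field names, example notes, and JSON schema below are illustrative assumptions, not drawn from any specific study in the review.

```python
# Hypothetical few-shot examples mapping pathology notes to structured JSON.
FEW_SHOT_EXAMPLES = [
    {
        "note": "Left lower lobe biopsy: adenocarcinoma, EGFR exon 19 deletion.",
        "output": '{"histology": "adenocarcinoma", "biomarker": "EGFR exon 19 deletion"}',
    },
    {
        "note": "RUL mass, squamous cell carcinoma. PD-L1 TPS 60%.",
        "output": '{"histology": "squamous cell carcinoma", "biomarker": "PD-L1 TPS 60%"}',
    },
]

def build_extraction_prompt(note: str) -> str:
    """Assemble instruction + few-shot examples + query; the LLM call is external."""
    parts = ["Extract histology and biomarker from the pathology note as JSON."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Note: {ex['note']}\nJSON: {ex['output']}")
    parts.append(f"Note: {note}\nJSON:")  # model completes this final slot
    return "\n\n".join(parts)

prompt = build_extraction_prompt("RML nodule: small cell carcinoma, no targetable mutation.")
print(prompt)
```

Keeping the examples in a list makes it easy to experiment with zero-shot (empty list) versus few-shot variants, one of the comparisons the reviewed studies commonly report.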

Question Answering, Education, and Research Support

LLMs served as knowledge bases for clinicians and patients, offering explanations of LC concepts, treatment options, and study results. In research settings, they helped generate annotations for clinical trials, extract safety/efficacy data, and summarize conference proceedings, accelerating information dissemination and synthesis for busy oncologists.

Treatment Planning and Patient Follow-up

Early demonstrations show LLMs proposing treatment considerations and drafting plan outlines, including chemotherapy/radiation sequences and symptom management. However, these applications require careful alignment with guidelines and patient-specific factors, with ongoing emphasis on human oversight to prevent bias or misinterpretation.

Prompts, Training, and Model Selection

Prompt engineering is central to performance. About half of the studies explicitly describe prompting strategies, using templates, instructional prompts, and zero-/few-shot approaches. Training data were predominantly text-based (reports, abstracts, guidelines), with a minority addressing image data. Some evidence suggests lightweight or domain-specialized models can outperform general LLMs on narrow LC tasks, while larger, multimodal models often excel in broader reasoning and cross-domain tasks. Retrieval-augmented generation (RAG) is highlighted as a way to ground answers in external, trusted LC knowledge bases, improving reliability and interpretability.
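The RAG pattern mentioned above can be reduced to two steps: retrieve relevant passages from a trusted knowledge base, then assemble a prompt that constrains the model to that context. The toy sketch below uses keyword-overlap retrieval over a tiny hypothetical LC snippet store; a production system would use embedding search over a vetted guideline corpus.

```python
# Hypothetical mini knowledge base of LC guideline snippets (illustrative only).
KNOWLEDGE_BASE = [
    "Lung-RADS 4B findings warrant chest CT at 3 months or PET/CT workup.",
    "Annual low-dose CT screening is recommended for eligible high-risk adults.",
    "EGFR-mutant NSCLC is typically treated first-line with an EGFR TKI.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank passages by how many lowercase words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Ground the answer in retrieved context rather than parametric memory."""
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

print(build_rag_prompt("What is first-line treatment for EGFR-mutant NSCLC?"))
```

Because the retrieved snippet is quoted verbatim in the prompt, a clinician can audit which source text the answer was grounded in, which is the interpretability benefit the review highlights.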

Safety, Ethics, and Data Privacy

Privacy and data security are central concerns, especially when cloud-based LLMs process protected health information. Only a subset of studies explicitly discuss mitigations such as HIPAA-compliant workflows, anonymization, federated learning, and differential privacy. Human oversight, or a “human-on-the-loop” approach, is repeatedly emphasized to manage risk, ensure transparency, and maintain clinician accountability in decision-making.
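Anonymization as a mitigation can be illustrated with a pre-processing step that redacts obvious identifiers before text leaves the local environment. The patterns below (MRN, date, phone) are a simplified sketch for illustration only, not a complete HIPAA Safe Harbor de-identification solution.

```python
import re

# Illustrative identifier patterns; real de-identification needs far broader
# coverage (names, addresses, ages over 89, device IDs, etc.).
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "MRN: 804512. Seen 03/14/2024, callback 555-867-5309."
print(redact(note))  # → [MRN]. Seen [DATE], callback [PHONE].
```

Running redaction locally, before any cloud-based LLM sees the text, is one concrete way the workflows discussed above keep protected health information out of third-party systems.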

Limitations and Future Directions

Current LC-related LLM work often relies on retrospective, single-center data and small samples. Prospective, multicenter validation is needed to establish generalizability and safety. Key priorities include: (1) rigorous prospective trials for diagnostic and treatment-planning tools; (2) development of patient-follow-up and supportive-care applications; (3) stronger emphasis on model interpretability, bias mitigation, and integration into diverse health systems. Multimodal LLMs that combine imaging, pathology, and genomics with text are poised to advance LC care, provided privacy and governance frameworks keep pace with technical progress.

Conclusion: Toward Responsible, Beneficial LC Care

LLMs hold real promise to enhance LC diagnosis, screening, education, and research, and to support clinicians across the care continuum. Realizing this potential requires robust prospective validation, careful attention to safety and ethics, and human-centered design that prioritizes patient welfare. As multimodal capabilities mature and governance frameworks strengthen, LLMs could become integral tools for more precise, timely, and humane LC care.