A human-LLM collaborative annotation approach for screening articles on precision oncology randomized controlled trials
Why a human-LLM collaborative approach matters

Systematic reviews in precision oncology require screening thousands of articles to identify randomized controlled trials (RCTs) that illuminate biomarker-driven therapies and targeted interventions. Manual screening, while thorough, is time-consuming and resource-intensive. Large language models (LLMs) can accelerate triage by quickly categorizing relevance and extracting key trial details, but their reliability in nuanced clinical contexts can vary. A human-in-the-loop annotation approach blends the speed and scale of LLMs with the discernment and domain knowledge of clinical researchers to achieve high-quality screening results without sacrificing rigor.

How the workflow works

The workflow combines automated, model-driven screening with expert adjudication in a structured, auditable process. It typically includes data collection, automated screening with LLMs, targeted human review, and an iterative feedback loop that tunes prompts and improves consistency over time.

Phase 1: data collection and annotation schema

Assemble a corpus of precision oncology articles, emphasizing studies that report randomized designs, biomarkers, and targeted therapies. Define a clear annotation schema (e.g., relevance, trial type, intervention, biomarker, outcomes) and publish guidance with concrete examples. A well-documented schema supports reproducibility and helps different reviewers maintain alignment across rounds of screening.
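As a concrete illustration, the schema above can be encoded as a typed record so every reviewer and every model output uses the same fields. This is a minimal sketch: the class and field names (ArticleAnnotation, Relevance, etc.) and the example values are hypothetical, not part of any published schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Relevance(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    UNCERTAIN = "uncertain"

@dataclass
class ArticleAnnotation:
    """One screening record per article, mirroring the schema fields above."""
    article_id: str
    relevance: Relevance
    trial_type: Optional[str] = None       # e.g. "phase III RCT"
    intervention: Optional[str] = None     # e.g. a targeted therapy name
    biomarker: Optional[str] = None        # e.g. "EGFR mutation"
    outcomes: list = field(default_factory=list)  # e.g. ["PFS", "OS"]
    notes: str = ""                        # reviewer justification, free text

# Example record for an ambiguous article awaiting human review
ann = ArticleAnnotation(
    article_id="PMID:00000000",
    relevance=Relevance.UNCERTAIN,
    trial_type="phase II RCT",
    biomarker="HER2 amplification",
)
```

Keeping the schema in code (rather than only in a guidance document) also makes it easy to validate model outputs automatically before they reach a reviewer.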

Phase 2: automated screening with LLMs

Deliver task-focused prompts to an LLM to estimate relevance to precision oncology RCTs and extract structured attributes (population, intervention, comparison, outcomes, trial phase). The model returns a relevance score and a provisional annotation. Crucially, items with lower confidence or ambiguous wording are flagged for human review, ensuring that uncertain cases receive expert attention. Thresholds can be adjusted to trade speed for precision as needed.
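The confidence-based routing described here can be sketched as a simple thresholding step. This assumes the LLM call returns a numeric relevance score in [0, 1]; the function names, dataclass, and threshold values are illustrative choices, not prescribed by the workflow.

```python
from dataclasses import dataclass

@dataclass
class ScreeningResult:
    article_id: str
    relevance_score: float   # model-estimated probability of a relevant RCT
    annotation: dict         # provisional structured attributes (PICO, phase)

def route(result: ScreeningResult,
          include_threshold: float = 0.85,
          exclude_threshold: float = 0.15) -> str:
    """Route a screened article: confident scores are handled automatically,
    while mid-range (ambiguous) scores are flagged for human review.
    Widening the review band trades speed for precision."""
    if result.relevance_score >= include_threshold:
        return "auto_include"
    if result.relevance_score <= exclude_threshold:
        return "auto_exclude"
    return "human_review"

# A borderline score lands in the human-review queue
decision = route(ScreeningResult("PMID:00000000", 0.55, {}))
```

Adjusting the two thresholds is the lever mentioned above: tightening them sends more items to reviewers; loosening them increases automation.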

Phase 3: human adjudication and feedback

Clinical researchers and methodologists review flagged items and model outputs, making inclusion/exclusion decisions and refining annotation labels. Corrections and justifications are captured to inform subsequent model updates and prompt refinements. This feedback loop improves consistency, reduces drift, and builds an auditable record of why decisions were made, which is essential for transparency in systematic reviews.
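One way to capture corrections and justifications in an auditable form is an append-only JSON Lines log, where each adjudication records both the model's label and the reviewer's decision. This is a sketch under that assumption; the function and field names are hypothetical.

```python
import datetime
import json
import tempfile

def record_adjudication(article_id, model_label, reviewer_label,
                        justification, log_path):
    """Append one auditable adjudication record as a JSON line.
    Records where agreed is False are the raw material for prompt
    refinement and model updates in the next round."""
    entry = {
        "article_id": article_id,
        "model_label": model_label,
        "reviewer_label": reviewer_label,
        "agreed": model_label == reviewer_label,
        "justification": justification,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Example: the model included an article the reviewer rejected
log_file = tempfile.NamedTemporaryFile(suffix=".jsonl", delete=False).name
entry = record_adjudication(
    "PMID:00000000", "include", "exclude",
    "Single-arm study; no randomization reported", log_file,
)
```

Because each line carries a justification and timestamp, the log doubles as the transparency record systematic reviews require.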

Benefits for precision oncology research

The human-LLM collaborative approach increases screening throughput while preserving, and often enhancing, accuracy. By documenting decisions and providing structured annotations, it supports robust meta-analyses, enables reproducible evidence synthesis, and streamlines updating living reviews as new RCT data emerge in the precision oncology landscape.

Quality control and safeguards

Quality hinges on clear guidelines and ongoing monitoring. Key components include inter-annotator agreement checks, frequently updated instruction sets, and monitoring for model drift as the oncology literature evolves. Evaluate performance with standard metrics—precision, recall, and F1—and maintain an auditable trail of decisions and model prompts. Privacy and data integrity controls should be in place for any patient-level or sensitive data that may appear in trial reports.
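The metrics named here are standard and can be computed directly from screening counts and paired annotator labels; the sketch below implements precision/recall/F1 from true-positive, false-positive, and false-negative counts, plus Cohen's kappa as one common inter-annotator agreement statistic (the choice of kappa is an assumption, not specified in the workflow).

```python
def precision_recall_f1(tp, fp, fn):
    """Standard screening metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the
    same items; 1.0 is perfect agreement, 0.0 is chance level."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0

p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
kappa = cohen_kappa(["inc", "inc", "exc", "exc"],
                    ["inc", "exc", "exc", "exc"])
```

Tracking these numbers per screening round makes model drift visible: a falling recall or kappa signals that prompts or guidelines need revisiting.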

Implementation tips and best practices

Practical steps to succeed include starting with a small pilot, iterating prompts with domain-specific examples, and establishing an escalation path for difficult cases. Calibrate LLM outputs with explicit uncertainty indicators and refit prompts based on reviewer feedback. Use active learning to prioritize the most informative items for human adjudication and implement version control for prompts and annotation schemas to ensure reproducibility across review cycles.
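The active-learning step mentioned above can be as simple as ranking unreviewed items by model uncertainty, so reviewers label the most informative cases first. A minimal sketch, assuming each item carries the relevance score from screening (the field name is hypothetical):

```python
def review_priority(items):
    """Rank items for human adjudication by uncertainty: scores nearest
    0.5 (the decision boundary) are the most informative to label first."""
    return sorted(items, key=lambda item: abs(item["relevance_score"] - 0.5))

queue = review_priority([
    {"id": "a", "relevance_score": 0.90},   # confident include: low priority
    {"id": "b", "relevance_score": 0.50},   # maximally uncertain: first
    {"id": "c", "relevance_score": 0.55},   # borderline: second
])
```

Uncertainty sampling is only one selection strategy; disagreement between prompt variants or between model and reviewer history can serve the same prioritizing role.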

Looking ahead

As LLMs become more specialized, future work will focus on tighter integration with trial registries, standardized reporting for oncology RCTs, and open benchmarks for screening tasks that enable broader adoption of human-LLM collaboration in precision oncology evidence synthesis.