Challenge Tasks
The AMOS Challenge consists of three main tasks in multi-modal medical image analysis: organ segmentation, medical report generation, and visual question answering. Together, these tasks aim to advance the development of comprehensive medical image understanding systems.
Task 1: Organ Segmentation
Task Description
Accurately segment 15 abdominal organs from CT and MRI scans. This task requires robust algorithms that can handle multi-modal medical images and provide precise organ delineation.
Dataset Split
- Training set: 200 CT + 40 MRI
- Validation set: 100 CT + 20 MRI
- Test set: 100 CT + 40 MRI
Evaluation Metrics
- Dice Score: Measures the spatial overlap between predicted and ground-truth segmentations
- Normalized Surface Distance (NSD): Measures boundary accuracy as the fraction of segmentation surface lying within a given tolerance of the ground-truth surface (a computation sketch for both metrics follows this list)
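For orientation, below is a minimal sketch of both metrics for a single organ, assuming boolean NumPy masks and known voxel spacing. The official evaluation may use a dedicated surface-distance implementation, and the tolerance value here is an illustrative assumption.

# Sketch: Dice and NSD for one organ (pred, gt: boolean np.ndarray volumes).
# The tolerance and spacing defaults are illustrative, not the official values.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_score(pred, gt):
    """Dice = 2|A n B| / (|A| + |B|) for two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())

def nsd(pred, gt, spacing=(1.0, 1.0, 1.0), tolerance_mm=1.0):
    """Fraction of each surface lying within tolerance_mm of the other surface."""
    # Boundary voxels: the mask minus its erosion.
    pred_surf = pred & ~binary_erosion(pred)
    gt_surf = gt & ~binary_erosion(gt)
    # Distance from every voxel to the nearest boundary voxel of each mask,
    # in millimeters via the voxel spacing.
    dist_to_gt = distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    close_pred = (dist_to_gt[pred_surf] <= tolerance_mm).sum()
    close_gt = (dist_to_pred[gt_surf] <= tolerance_mm).sum()
    return (close_pred + close_gt) / (pred_surf.sum() + gt_surf.sum())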
Task 2: Medical Report Generation
Task Description
Generate comprehensive radiology reports (Findings & Impressions) for CT scans covering Chest/Abdomen/Pelvis regions. The system should produce accurate, clinically relevant, and well-structured reports.
Dataset Split
- Training set: 1,287 CT scans with reports
- Validation set: 400 CT scans with reports
- Test set: 400 CT scans with reports
Evaluation Metrics
- GreenScore: Evaluates both the clinical accuracy and linguistic quality of generated reports. Example code is shown below.
Example Code
# pip install green-score
from green_score import GREEN

# Ground-truth and predicted reports, keyed by case ID, then section and region
gt_report = {
    "amos_0001.nii.gz": {
        "findings": {
            "abdomen": "The liver demonstrates normal size and contour..."
        }
    }
}
pred_report = {
    "amos_0001.nii.gz": {
        "findings": {
            "abdomen": "The liver is normal in size with smooth contour..."
        }
    }
}

# Initialize the GREEN model
model = GREEN(
    model_id_or_path="StanfordAIMI/GREEN-radllama2-7b",
    do_sample=False,
    batch_size=1,
    cuda=True
)

# Compute the GREEN score for one case
refs = [gt_report["amos_0001.nii.gz"]["findings"]["abdomen"]]
hyps = [pred_report["amos_0001.nii.gz"]["findings"]["abdomen"]]
mean_score, scores, explanations = model(refs=refs, hyps=hyps)
print(f"GREEN Score: {mean_score:.3f}")
print(f"Explanation: {explanations[0]}")

# Assemble a per-case result record
result = {
    "amos_id": "amos_0001.nii.gz",
    "green_score": float(scores[0]),
    "explanation": explanations[0]
}
Task 3: Visual Question Answering
Task Description
Answer medical questions about CT scans across 6 clinical scopes. The system should demonstrate understanding of medical imaging and provide accurate answers to clinical queries.
Dataset Split
- Training set: 13,751 VQA pairs
- Validation set: 2,787 VQA pairs
- Test set: 2,787 VQA pairs
Evaluation Metrics
- Accuracy: The fraction of visual questions for which the model's answer matches the ground truth (a minimal sketch follows)
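For illustration, a minimal exact-match accuracy sketch, assuming answers are plain strings; the official scorer may normalize answers differently (for example, for multiple-choice options).

# Sketch: exact-match accuracy, ignoring case and surrounding whitespace
def vqa_accuracy(preds, gts):
    """Fraction of predicted answers that exactly match the ground truth."""
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(preds, gts))
    return correct / len(gts)

# Example: two of three answers match, so accuracy is about 0.667
print(vqa_accuracy(["Liver", "no", "3 cm"], ["liver", "yes", "3 cm"]))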