Challenge Tasks
The AMOS Challenge consists of three main tasks in multi-modal medical image analysis: organ segmentation, medical report generation, and visual question answering. Together, these tasks aim to advance the development of comprehensive medical image understanding systems.
Task 1: Organ Segmentation
Task Description
Accurately segment 15 abdominal organs from CT and MRI scans. This task requires robust algorithms that can handle multi-modal medical images and provide precise organ delineation.
Dataset Split
- Training set: 200 CT + 40 MRI
- Validation set: 100 CT + 20 MRI
- Test set: 100 CT + 40 MRI
Evaluation Metrics
- Dice Score: Measures the spatial overlap between predicted and ground-truth segmentations
- Normalized Surface Distance (NSD): Measures boundary accuracy as the fraction of segmentation surface lying within a given tolerance of the ground-truth surface (a computation sketch for both metrics follows this list)
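For orientation, below is a minimal sketch of both metrics for a single organ, assuming boolean NumPy masks and known voxel spacing. The official evaluation may use a dedicated surface-distance implementation, and the tolerance value here is an illustrative assumption.

# Sketch: Dice and NSD for one organ (pred, gt: boolean np.ndarray volumes).
# The tolerance and spacing defaults are illustrative, not the official values.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_score(pred, gt):
    """Dice = 2|A n B| / (|A| + |B|) for two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())

def nsd(pred, gt, spacing=(1.0, 1.0, 1.0), tolerance_mm=1.0):
    """Fraction of each surface lying within tolerance_mm of the other surface."""
    # Boundary voxels: the mask minus its erosion.
    pred_surf = pred & ~binary_erosion(pred)
    gt_surf = gt & ~binary_erosion(gt)
    # Distance from every voxel to the nearest boundary voxel of each mask,
    # in millimeters via the voxel spacing.
    dist_to_gt = distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    close_pred = (dist_to_gt[pred_surf] <= tolerance_mm).sum()
    close_gt = (dist_to_pred[gt_surf] <= tolerance_mm).sum()
    return (close_pred + close_gt) / (pred_surf.sum() + gt_surf.sum())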
Task 2: Medical Report Generation
Task Description
Generate comprehensive radiology reports (Findings & Impressions) for CT scans covering Chest/Abdomen/Pelvis regions. The system should produce accurate, clinically relevant, and well-structured reports.
Dataset Split
- Training set: 1,287 CT scans with reports
- Validation set: 400 CT scans with reports
- Test set: 400 CT scans with reports
Evaluation Metrics
- GreenScore: Evaluates both the clinical accuracy and linguistic quality of generated reports. Example code is shown below.
Example Code
# pip install green-score
from green_score import GREEN

# Ground-truth and predicted reports, keyed by case ID, then section and region
gt_report = {
    "amos_0001.nii.gz": {
        "findings": {
            "abdomen": "The liver demonstrates normal size and contour..."
        }
    }
}
pred_report = {
    "amos_0001.nii.gz": {
        "findings": {
            "abdomen": "The liver is normal in size with smooth contour..."
        }
    }
}

# Initialize the GREEN model
model = GREEN(
    model_id_or_path="StanfordAIMI/GREEN-radllama2-7b",
    do_sample=False,
    batch_size=1,
    cuda=True
)

# Compute the GREEN score for one case
refs = [gt_report["amos_0001.nii.gz"]["findings"]["abdomen"]]
hyps = [pred_report["amos_0001.nii.gz"]["findings"]["abdomen"]]
mean_score, scores, explanations = model(refs=refs, hyps=hyps)
print(f"GREEN Score: {mean_score:.3f}")
print(f"Explanation: {explanations[0]}")

# Assemble a per-case result record
result = {
    "amos_id": "amos_0001.nii.gz",
    "green_score": float(scores[0]),
    "explanation": explanations[0]
}
Task 3: Visual Question Answering
Task Description
Answer medical questions about CT scans across 6 clinical scopes. The system should demonstrate understanding of medical imaging and provide accurate answers to clinical queries.
Dataset Split
- Training set: 13,751 VQA pairs
- Validation set: 2,787 VQA pairs
- Test set: 2,787 VQA pairs
Evaluation Metrics
- Accuracy: The fraction of visual questions for which the model's answer matches the ground truth (a minimal sketch follows)
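For illustration, a minimal exact-match accuracy sketch, assuming answers are plain strings; the official scorer may normalize answers differently (for example, for multiple-choice options).

# Sketch: exact-match accuracy, ignoring case and surrounding whitespace
def vqa_accuracy(preds, gts):
    """Fraction of predicted answers that exactly match the ground truth."""
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(preds, gts))
    return correct / len(gts)

# Example: two of three answers match, so accuracy is about 0.667
print(vqa_accuracy(["Liver", "no", "3 cm"], ["liver", "yes", "3 cm"]))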