Today's capstone integrates everything from Domain IV. You'll build a monitoring and incident response runbook for a deployed AI system, then answer 10 scenario-based practice questions.
Background: MedTech Solutions has deployed SafeScreen AI, a high-risk AI system that screens mammography images for potential breast cancer. The system provides a risk score (0–100) and a recommendation (further review, routine follow-up, or clear). It operates in hospitals across the EU and US.
Deployment model: Human-on-the-loop — radiologists review all cases flagged as "further review" (score > 70) before patient notification. Cases scored "clear" (score < 30) proceed through the standard workflow. Cases in the middle range (30–70) are queued for radiologist review within 48 hours.
Current state: The system has been deployed for 3 months. No formal monitoring framework or incident response plan exists.
Define the monitoring framework for SafeScreen AI:
Performance KPIs:
- Sensitivity (true positive rate): Baseline 94.5%, threshold: never below 92%
- Specificity (true negative rate): Baseline 88.2%, threshold: never below 85%
- False negative rate (1 − sensitivity): Baseline 5.5%, threshold: alert above 6%, halt above 8% (the 8% halt level is the same boundary as the 92% sensitivity floor)
- Processing time: Baseline 12 seconds, threshold: alert above 30 seconds
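The KPI thresholds above can be sketched as a simple check over confusion-matrix counts. This is a minimal illustration, not MedTech Solutions' implementation: the `ConfusionCounts` structure, the function names, and the example counts are all hypothetical, and it assumes outcome-confirmed labels (for example, biopsy or follow-up results) are available for the reporting window.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome-confirmed case counts for one reporting window (hypothetical structure)."""
    tp: int  # cancers the system flagged
    fn: int  # cancers the system missed
    tn: int  # healthy cases the system cleared
    fp: int  # healthy cases the system flagged


def performance_kpis(c: ConfusionCounts) -> dict:
    # Assumes the window contains at least one confirmed positive and one negative.
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    return {
        "sensitivity": c.tp / positives,
        "specificity": c.tn / negatives,
        "false_negative_rate": c.fn / positives,
    }


def check_kpis(kpis: dict, processing_seconds: float) -> list:
    """Compare KPIs against the thresholds listed in the framework."""
    findings = []
    if kpis["sensitivity"] < 0.92:
        findings.append("sensitivity below 92% floor")
    if kpis["specificity"] < 0.85:
        findings.append("specificity below 85% floor")
    if kpis["false_negative_rate"] > 0.08:
        findings.append("false negative rate above 8%: halt")
    elif kpis["false_negative_rate"] > 0.06:
        findings.append("false negative rate above 6%: alert")
    if processing_seconds > 30:
        findings.append("processing time above 30 seconds: alert")
    return findings


# Illustrative counts chosen to reproduce the baseline figures (94.5% / 88.2%).
baseline = performance_kpis(ConfusionCounts(tp=945, fn=55, tn=882, fp=118))
print(check_kpis(baseline, processing_seconds=12))  # prints []

# Illustrative degraded window: sensitivity falls to 91%.
degraded = performance_kpis(ConfusionCounts(tp=910, fn=90, tn=882, fp=118))
print(check_kpis(degraded, processing_seconds=12))
```

In practice, confirmed outcomes for screening cases arrive with a delay, so sensitivity and false negative rate would likely be computed over a lagged window, while processing time can be checked in near real time.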
Fairness metrics:
- Sensitivity by age group (under 40, 40–60, over 60): maximum gap between groups ≤ 3 percentage points
- Sensitivity by ethnicity: maximum gap between groups ≤ 3 percentage points
- False negative rate by demographics: maximum gap between groups ≤ 2 percentage points
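One way to operationalize the fairness metrics is to compute, for each metric, the difference between the best-performing and worst-performing group and compare it to the limit. The sketch below assumes the gaps are measured in percentage points and uses made-up group rates; the function names, dictionary keys, and the placeholder groups `group_a` and `group_b` are hypothetical.

```python
def max_gap(rate_by_group: dict) -> float:
    """Largest difference in a rate between any two groups."""
    rates = list(rate_by_group.values())
    return max(rates) - min(rates)


def fairness_breaches(metrics: dict, limits: dict) -> dict:
    """Return {metric_name: gap} for every metric whose gap exceeds its limit."""
    breaches = {}
    for name, rate_by_group in metrics.items():
        gap = max_gap(rate_by_group)
        if gap > limits[name]:
            breaches[name] = gap
    return breaches


# Limits from the framework, expressed as fractions (0.03 = 3 percentage points).
limits = {
    "sensitivity_by_age": 0.03,
    "sensitivity_by_ethnicity": 0.03,
    "fnr_by_age": 0.02,
}

# Illustrative rates only, not real SafeScreen AI results.
metrics = {
    "sensitivity_by_age": {"under_40": 0.91, "40_to_60": 0.945, "over_60": 0.95},
    "sensitivity_by_ethnicity": {"group_a": 0.94, "group_b": 0.95},
    "fnr_by_age": {"under_40": 0.09, "40_to_60": 0.055, "over_60": 0.05},
}

breaches = fairness_breaches(metrics, limits)
for name, gap in breaches.items():
    print(f"{name}: gap of {gap:.3f} exceeds limit of {limits[name]:.2f}")
```

Small subgroups produce noisy rates, so a real framework would probably pair each gap with a minimum sample size or a confidence interval before raising an alert.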
Drift detection:
- Input data distribution comparison: weekly statistical tests
- Score distribution monitoring: flag if the mean risk score shifts by more than 10% from its baseline
- Confidence score monitoring: flag if average confidence drops below 80%
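The three drift checks can be sketched as follows. The framework does not name a specific statistical test, so this example substitutes a two-sample Kolmogorov–Smirnov test, implemented in plain Python with the standard large-sample critical value at a 5% significance level. It also assumes the 10% score shift is relative to the baseline mean, and that one numeric input feature (for instance, mean pixel intensity) is being tracked. All names and data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n: int, m: int, c_alpha: float = 1.358) -> float:
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * sqrt((n + m) / (n * m))


def drift_flags(base_feature, cur_feature, base_scores, cur_scores, cur_confidences):
    flags = []
    d = ks_statistic(base_feature, cur_feature)
    if d > ks_critical_value(len(base_feature), len(cur_feature)):
        flags.append("input distribution differs from baseline")
    base_mean = fmean(base_scores)
    # Assumption: the 10% shift is relative to the baseline mean score.
    if abs(fmean(cur_scores) - base_mean) / base_mean > 0.10:
        flags.append("mean risk score shifted more than 10%")
    if fmean(cur_confidences) < 0.80:
        flags.append("average confidence below 80%")
    return flags


# Illustrative data: the input feature shifts upward by 30 units.
base_feature = list(range(100))
cur_feature = [x + 30 for x in range(100)]
flags = drift_flags(
    base_feature, cur_feature,
    base_scores=[20, 40, 60], cur_scores=[25, 45, 65],
    cur_confidences=[0.75, 0.80, 0.78],
)
print(flags)
```

With weekly tests across many features, some false alarms are expected by chance, so a production setup would likely adjust for multiple comparisons or require a drift signal to persist before escalating.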
Alert levels:
- Green: all metrics within normal parameters
- Yellow: one or more metrics approaching thresholds — investigate within 24 hours
- Red: thresholds exceeded — escalate immediately, consider system halt
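The alert levels can be mapped onto metric values with a small classifier. The framework does not define what "approaching" a threshold means, so the sketch below assumes a hypothetical warning margin around each threshold; the margin values, function names, and metric readings are illustrative only.

```python
from enum import IntEnum


class Alert(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


def classify_floor(value: float, floor: float, margin: float) -> Alert:
    """For metrics that must stay above a floor (e.g. sensitivity)."""
    if value < floor:
        return Alert.RED
    if value < floor + margin:  # hypothetical "approaching" band
        return Alert.YELLOW
    return Alert.GREEN


def classify_ceiling(value: float, ceiling: float, margin: float) -> Alert:
    """For metrics that must stay below a ceiling (e.g. false negative rate)."""
    if value > ceiling:
        return Alert.RED
    if value > ceiling - margin:  # hypothetical "approaching" band
        return Alert.YELLOW
    return Alert.GREEN


def overall_level(levels) -> Alert:
    """The system-wide level is the worst level across all metrics."""
    return max(levels, default=Alert.GREEN)


# Illustrative readings; the 1-point margins are assumptions, not part of the framework.
levels = [
    classify_floor(0.925, floor=0.92, margin=0.01),    # sensitivity near its floor
    classify_floor(0.882, floor=0.85, margin=0.01),    # specificity at baseline
    classify_ceiling(0.075, ceiling=0.08, margin=0.01),  # FNR near its halt level
]
print(overall_level(levels).name)  # prints YELLOW
```

Taking the maximum means a single red metric drives the whole system to Red, which matches the "one or more metrics" wording used for the Yellow level.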