AI systems fail in ways traditional systems don't. Today we cover AI-specific incident response, business continuity planning for AI dependencies, and the regulatory reporting obligations that come with AI incidents.
AI incidents are broader than traditional security incidents. Your incident taxonomy must include:
Model failure — Model produces incorrect or harmful outputs. Could be sudden (software bug) or gradual (data drift). Example: a credit model starts approving high-risk applicants due to distribution shift in input data.
Adversarial attack — Deliberate manipulation of AI systems. Includes prompt injection, data poisoning, model evasion, and model extraction. These are intentional security events.
Bias and fairness events — AI system produces discriminatory outcomes. May not be a "security" incident in the traditional sense, but is a governance incident requiring response. Example: hiring AI disproportionately rejects candidates from certain demographics.
Safety events — AI system causes or could cause physical or psychological harm. Content generation that's harmful, autonomous system malfunction, or medical AI misdiagnosis.
Data incidents — Training data breach, unauthorized access to model weights, or exposure of PII through model outputs (memorization/extraction attacks).
Availability events — AI system becomes unavailable when business processes depend on it. Different from traditional outages because a ready fallback may not exist.
Each category needs its own detection methods, severity criteria, and response procedures.
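A taxonomy like this is easier to operationalize when each category routes to a named playbook. The sketch below is illustrative, assuming hypothetical category and playbook names; it is not a standard schema.

```python
from enum import Enum, auto

# Categories mirroring the taxonomy above (names are illustrative).
class AIIncidentCategory(Enum):
    MODEL_FAILURE = auto()
    ADVERSARIAL_ATTACK = auto()
    BIAS_FAIRNESS = auto()
    SAFETY = auto()
    DATA_INCIDENT = auto()
    AVAILABILITY = auto()

# Hypothetical mapping from category to a default response playbook.
PLAYBOOKS = {
    AIIncidentCategory.MODEL_FAILURE: "rollback-and-retrain",
    AIIncidentCategory.ADVERSARIAL_ATTACK: "security-ir",
    AIIncidentCategory.BIAS_FAIRNESS: "governance-review",
    AIIncidentCategory.SAFETY: "safety-shutdown",
    AIIncidentCategory.DATA_INCIDENT: "data-breach-ir",
    AIIncidentCategory.AVAILABILITY: "bcp-failover",
}

def route_incident(category: AIIncidentCategory) -> str:
    """Return the name of the playbook to trigger for a category."""
    return PLAYBOOKS[category]
```

Keeping the mapping explicit makes it auditable: every category provably has a response procedure, which is the point of the taxonomy.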
Extend your existing IR framework for AI. The phases remain the same — the content changes.
Detection — Traditional SIEM won't catch AI-specific incidents. You need model monitoring (performance drift, output anomalies), bias detection tools, and user feedback channels. Alert on unusual patterns in model inputs and outputs.
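One simple form of output-anomaly detection is comparing a recent window of model outputs against a baseline window. The sketch below uses a rolling z-score on the output mean; the threshold and window sizes are assumptions to tune per model, not recommended values.

```python
import statistics

def output_drift_alert(baseline: list[float], recent: list[float],
                       z_threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean output deviates from the
    baseline mean by more than z_threshold standard errors.
    (Illustrative sketch; production systems often use PSI or KS tests.)"""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any change at all is anomalous.
        return statistics.mean(recent) != mu
    standard_error = sigma / (len(recent) ** 0.5)
    z = abs(statistics.mean(recent) - mu) / standard_error
    return z > z_threshold
```

An alert like this would have caught the credit-model example above: a distribution shift in inputs shows up as a sustained shift in the approval-score mean long before anyone reads a dashboard.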
Containment — For AI, containment may mean: disable the model, switch to a fallback system, add manual review to model outputs, restrict access to the model API, or isolate the training environment if poisoning is suspected.
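Several of these containment actions can be pre-built as runtime switches so responders don't need a redeploy mid-incident. A minimal sketch, with class and attribute names that are assumptions rather than any real library's API:

```python
from typing import Callable

class ContainedModel:
    """Serving wrapper with containment switches: disable the model
    (route to fallback) or flag every output for manual review."""

    def __init__(self, model: Callable, fallback: Callable):
        self.model = model
        self.fallback = fallback
        self.disabled = False       # containment: switch to fallback
        self.manual_review = False  # containment: human review of outputs

    def predict(self, x):
        if self.disabled:
            return self.fallback(x)
        result = self.model(x)
        if self.manual_review:
            # Output still flows, but is marked for human review.
            return {"prediction": result, "needs_review": True}
        return result
```

The design choice worth noting: containment is a configuration change, not a code change, so it can be executed in minutes and rolled back just as fast once eradication is complete.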
Eradication — Root cause analysis is more complex for AI. Was the issue in the model, the training data, the serving infrastructure, or the input data? Model behavior is harder to debug than traditional software.
Recovery — Recovery may require model rollback to a previous version, data cleansing and retraining, or human review of decisions made during the incident period. Consider downstream impact: if the model made bad decisions, those decisions may need to be reversed.
Lessons learned — Update the AI risk register. Adjust monitoring thresholds. Revise training data quality controls. Document what worked and what didn't.
AI systems create new BCP challenges:
Dependency mapping — Identify which business processes depend on AI systems. Many organizations have AI dependencies they don't realize until the system fails.
Fallback procedures — What happens when the AI system is unavailable? Options include manual processes, rule-based systems, previous model versions, or graceful degradation (partial functionality).
Graceful degradation — Design AI systems to degrade gracefully rather than fail completely. A recommendation engine that can't personalize should still show popular items. A fraud detection system that's degraded should increase manual review, not stop all transactions.
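The recommendation-engine example above can be sketched in a few lines. Here `personalizer` is a hypothetical callable standing in for the personalization service; the fallback list is the degraded-mode answer.

```python
def recommend(user_id: str, personalizer, popular_items: list[str]) -> list[str]:
    """Return personalized recommendations, degrading to globally
    popular items if the personalization service fails.
    (Sketch; `personalizer` is an assumed callable that may raise.)"""
    try:
        return personalizer(user_id)
    except Exception:
        # Degraded mode: still show something useful rather than nothing.
        return popular_items
```

The same pattern applies to the fraud-detection example, except the degraded branch should tighten controls (queue for manual review) rather than loosen them.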
Manual overrides — Ensure human operators can override AI decisions at all times. This is both a safety requirement and a regulatory expectation (EU AI Act requires human oversight for high-risk AI).
Recovery time and point objectives — Define RTO and RPO for AI systems just as you would for traditional systems. Note: model retraining can take hours or days, which may exceed typical RTO targets.
The EU AI Act introduces specific reporting obligations for AI incidents:
Serious incidents must be reported to the relevant market surveillance authority. Under the Act, a serious incident is one that leads to death or serious harm to a person's health, serious and irreversible disruption of the management or operation of critical infrastructure, serious harm to property or the environment, or infringement of obligations under Union law intended to protect fundamental rights.
Timeline — Report immediately after establishing a causal link between the AI system and the incident, and in any event within 15 days of becoming aware of it. Shorter deadlines apply in severe cases: 10 days where a person has died, and 2 days for a widespread infringement or serious and irreversible disruption of critical infrastructure.
Documentation — Detailed incident documentation including the AI system involved, nature of the incident, corrective measures taken, and impact assessment.
Ongoing obligations — Provide updates as investigation progresses. Final report when investigation is complete.
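Capturing the documentation fields the reporting obligations above call for in a structured record makes initial reports, updates, and the final report consistent. Field names below are illustrative, not a regulator's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SeriousIncidentReport:
    """Minimal internal record for a reportable AI incident.
    (Sketch with assumed field names, not an official format.)"""
    ai_system: str                 # which AI system was involved
    nature_of_incident: str        # what happened
    corrective_measures: list[str] # actions taken so far
    impact_assessment: str         # who/what was affected, how badly
    reported_at: datetime          # when the initial report was filed
    updates: list[str] = field(default_factory=list)  # ongoing updates
    final_report: bool = False     # set when investigation closes
```

Appending to `updates` as the investigation progresses gives you the running narrative regulators expect, and the record doubles as input to the lessons-learned review.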
Even if your organization isn't directly subject to the EU AI Act, understand that similar reporting obligations are emerging globally. Design your incident reporting procedures to meet the most stringent requirements you might face.
Regular tabletop exercises prepare your team for AI incidents. Design scenarios that test:
Cross-functional coordination — AI incidents require security, engineering, legal, communications, and business stakeholders. Does everyone know their role?
Decision-making under uncertainty — AI incidents often have unclear root causes. Can your team make containment decisions with incomplete information?
Regulatory notification — Does your team know when and how to notify regulators? Is the process documented and tested?
Communication — How do you communicate to affected users that AI-made decisions may have been incorrect? What's the remediation process?
Recommended scenarios: biased hiring AI discovered by media, adversarial attack on customer-facing chatbot, training data breach exposing PII, and AI-generated content that causes reputational damage.