AI systems fail in ways traditional systems don't. Today we cover AI-specific incident response, business continuity planning for AI dependencies, and the regulatory reporting obligations that come with AI incidents.
AI incidents are broader than traditional security incidents. Your incident taxonomy must include:
Model failure — Model produces incorrect or harmful outputs. Could be sudden (software bug) or gradual (data drift). Example: a credit model starts approving high-risk applicants due to distribution shift in input data.
Adversarial attack — Deliberate manipulation of AI systems. Includes prompt injection, data poisoning, model evasion, and model extraction. These are intentional security events.
Bias and fairness events — AI system produces discriminatory outcomes. May not be a "security" incident in the traditional sense, but is a governance incident requiring response. Example: hiring AI disproportionately rejects candidates from certain demographics.
Safety events — AI system causes or could cause physical or psychological harm. Content generation that's harmful, autonomous system malfunction, or medical AI misdiagnosis.
Data incidents — Training data breach, unauthorized access to model weights, or exposure of PII through model outputs (memorization/extraction attacks).
Availability events — AI system becomes unavailable when business processes depend on it. Different from traditional outages because a ready fallback may not exist.
Each category needs its own detection methods, severity criteria, and response procedures.
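A taxonomy like this is easier to operationalize when each category routes to a named playbook. The sketch below is illustrative, assuming hypothetical category and playbook names; it is not a standard schema.

```python
from enum import Enum, auto

# Categories mirroring the taxonomy above (names are illustrative).
class AIIncidentCategory(Enum):
    MODEL_FAILURE = auto()
    ADVERSARIAL_ATTACK = auto()
    BIAS_FAIRNESS = auto()
    SAFETY = auto()
    DATA_INCIDENT = auto()
    AVAILABILITY = auto()

# Hypothetical mapping from category to a default response playbook.
PLAYBOOKS = {
    AIIncidentCategory.MODEL_FAILURE: "rollback-and-retrain",
    AIIncidentCategory.ADVERSARIAL_ATTACK: "security-ir",
    AIIncidentCategory.BIAS_FAIRNESS: "governance-review",
    AIIncidentCategory.SAFETY: "safety-shutdown",
    AIIncidentCategory.DATA_INCIDENT: "data-breach-ir",
    AIIncidentCategory.AVAILABILITY: "bcp-failover",
}

def route_incident(category: AIIncidentCategory) -> str:
    """Return the name of the playbook to trigger for a category."""
    return PLAYBOOKS[category]
```

Keeping the mapping explicit makes it auditable: every category provably has a response procedure, which is the point of the taxonomy.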
Extend your existing IR framework for AI. The phases remain the same — the content changes.
Detection — Traditional SIEM won't catch AI-specific incidents. You need model monitoring (performance drift, output anomalies), bias detection tools, and user feedback channels. Alert on unusual patterns in model inputs and outputs.
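One simple form of output-anomaly detection is comparing a recent window of model outputs against a baseline window. The sketch below uses a rolling z-score on the output mean; the threshold and window sizes are assumptions to tune per model, not recommended values.

```python
import statistics

def output_drift_alert(baseline: list[float], recent: list[float],
                       z_threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean output deviates from the
    baseline mean by more than z_threshold standard errors.
    (Illustrative sketch; production systems often use PSI or KS tests.)"""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any change at all is anomalous.
        return statistics.mean(recent) != mu
    standard_error = sigma / (len(recent) ** 0.5)
    z = abs(statistics.mean(recent) - mu) / standard_error
    return z > z_threshold
```

An alert like this would have caught the credit-model example above: a distribution shift in inputs shows up as a sustained shift in the approval-score mean long before anyone reads a dashboard.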
Containment — For AI, containment may mean: disable the model, switch to a fallback system, add manual review to model outputs, restrict access to the model API, or isolate the training environment if poisoning is suspected.
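Several of these containment actions can be pre-built as runtime switches so responders don't need a redeploy mid-incident. A minimal sketch, with class and attribute names that are assumptions rather than any real library's API:

```python
from typing import Callable

class ContainedModel:
    """Serving wrapper with containment switches: disable the model
    (route to fallback) or flag every output for manual review."""

    def __init__(self, model: Callable, fallback: Callable):
        self.model = model
        self.fallback = fallback
        self.disabled = False       # containment: switch to fallback
        self.manual_review = False  # containment: human review of outputs

    def predict(self, x):
        if self.disabled:
            return self.fallback(x)
        result = self.model(x)
        if self.manual_review:
            # Output still flows, but is marked for human review.
            return {"prediction": result, "needs_review": True}
        return result
```

The design choice worth noting: containment is a configuration change, not a code change, so it can be executed in minutes and rolled back just as fast once eradication is complete.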
Eradication — Root cause analysis is more complex for AI. Was the issue in the model, the training data, the serving infrastructure, or the input data? Model behavior is harder to debug than traditional software.
Recovery — Recovery may require model rollback to a previous version, data cleansing and retraining, or human review of decisions made during the incident period. Consider downstream impact: if the model made bad decisions, those decisions may need to be reversed.
Lessons learned — Update the AI risk register. Adjust monitoring thresholds. Revise training data quality controls. Document what worked and what didn't.
AI systems create new BCP challenges:
Dependency mapping — Identify which business processes depend on AI systems. Many organizations have AI dependencies they don't realize until the system fails.
Fallback procedures — What happens when the AI system is unavailable? Options include manual processes, rule-based systems, previous model versions, or graceful degradation (partial functionality).
Graceful degradation — Design AI systems to degrade gracefully rather than fail completely. A recommendation engine that can't personalize should still show popular items. A fraud detection system that's degraded should increase manual review, not stop all transactions.
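The recommendation-engine example above can be sketched in a few lines. Here `personalizer` is a hypothetical callable standing in for the personalization service; the fallback list is the degraded-mode answer.

```python
def recommend(user_id: str, personalizer, popular_items: list[str]) -> list[str]:
    """Return personalized recommendations, degrading to globally
    popular items if the personalization service fails.
    (Sketch; `personalizer` is an assumed callable that may raise.)"""
    try:
        return personalizer(user_id)
    except Exception:
        # Degraded mode: still show something useful rather than nothing.
        return popular_items
```

The same pattern applies to the fraud-detection example, except the degraded branch should tighten controls (queue for manual review) rather than loosen them.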
Manual overrides — Ensure human operators can override AI decisions at all times. This is both a safety requirement and a regulatory expectation (EU AI Act requires human oversight for high-risk AI).
Recovery time and point objectives — Define RTO and RPO for AI systems just as you would for traditional systems. Note: model retraining can take hours or days, which may exceed typical RTO targets.
The EU AI Act introduces specific reporting obligations for AI incidents:
Serious incidents must be reported to the relevant market surveillance authority. Under the Act, a serious incident is one that leads to death or serious harm to a person's health, serious and irreversible disruption of the management or operation of critical infrastructure, serious harm to property or the environment, or infringement of obligations under Union law intended to protect fundamental rights.
Timeline — Report immediately after establishing a causal link between the AI system and the incident, and in any event within 15 days of becoming aware of it. Shorter deadlines apply in severe cases: 10 days where a person has died, and 2 days for a widespread infringement or serious and irreversible disruption of critical infrastructure.
Documentation — Detailed incident documentation including the AI system involved, nature of the incident, corrective measures taken, and impact assessment.
Ongoing obligations — Provide updates as investigation progresses. Final report when investigation is complete.
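Capturing the documentation fields the reporting obligations above call for in a structured record makes initial reports, updates, and the final report consistent. Field names below are illustrative, not a regulator's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SeriousIncidentReport:
    """Minimal internal record for a reportable AI incident.
    (Sketch with assumed field names, not an official format.)"""
    ai_system: str                 # which AI system was involved
    nature_of_incident: str        # what happened
    corrective_measures: list[str] # actions taken so far
    impact_assessment: str         # who/what was affected, how badly
    reported_at: datetime          # when the initial report was filed
    updates: list[str] = field(default_factory=list)  # ongoing updates
    final_report: bool = False     # set when investigation closes
```

Appending to `updates` as the investigation progresses gives you the running narrative regulators expect, and the record doubles as input to the lessons-learned review.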
Even if your organization isn't directly subject to the EU AI Act, understand that similar reporting obligations are emerging globally. Design your incident reporting procedures to meet the most stringent requirements you might face.
Regular tabletop exercises prepare your team for AI incidents. Design scenarios that test:
Cross-functional coordination — AI incidents require security, engineering, legal, communications, and business stakeholders. Does everyone know their role?
Decision-making under uncertainty — AI incidents often have unclear root causes. Can your team make containment decisions with incomplete information?
Regulatory notification — Does your team know when and how to notify regulators? Is the process documented and tested?
Communication — How do you communicate to affected users that AI-made decisions may have been incorrect? What's the remediation process?
Recommended scenarios: biased hiring AI discovered by media, adversarial attack on customer-facing chatbot, training data breach exposing PII, and AI-generated content that causes reputational damage.