14 Comprehensive OT Incident Response Playbooks
Operational technology environments do not forgive slow responses or unsafe remediation choices. A misconfigured containment step in a power substation, a premature return to automated control after a PLC compromise, or an uncoordinated vendor escalation during active ransomware deployment can each produce consequences far more serious than the original incident. OT incident response is a different discipline from IT response, and it requires different playbooks.
Industrial teams face a converging threat environment: ransomware groups now specifically target OT-attached systems [source: year], nation-state actors conduct long-dwell reconnaissance in industrial networks, and regulatory frameworks including NIS2, NERC CIP, and sector-specific standards are imposing response time and notification obligations that demand documented, tested procedures.
These 14 comprehensive OT incident response playbooks address the most common and consequential scenarios across power, water, manufacturing, and oil and gas. Each follows a consistent structure: detection triggers, safe immediate containment, evidence capture, lab-validated investigation, phased recovery, communications escalation, and measurable KPIs. Every playbook can be piloted in a tabletop exercise before any live deployment.
- Safety-critical PLC compromise: Immediate containment and safe-mode transition for a suspected live PLC integrity failure.
- Ransomware on OT-attached workstations: Isolation and recovery for encryption events affecting SCADA, historian, or HMI hosts.
- Unauthorized remote vendor access: Detection and revocation procedures for anomalous or unapproved third-party sessions.
- Suspicious firmware or PLC logic change: Integrity verification and safe rollback procedures for unauthorized controller modifications.
- Supply chain compromise (software/firmware): Triage and containment for vendor-introduced malicious code or compromised update packages.
- Data exfiltration / unusual northbound traffic: Detection and blocking procedures for unexpected outbound data flows from OT segments.
- Physical security breach affecting OT assets: Response to unauthorized physical access with direct OT asset exposure risk.
- Power/UPS failure with cyber suspicion: Cyber-informed power event triage distinguishing equipment failure from cyber manipulation.
- Alarm floods / DoS on control networks: Response to denial-of-service or alarm storm events degrading operator situational awareness.
- ICS protocol manipulation (Modbus/DNP3/IEC-104 anomalies): Detection and isolation of anomalous industrial protocol command sequences.
- Insider threat / malicious operator activity: Containment and evidence preservation for suspected intentional misuse by authorized personnel.
- Third-party cloud/IoT service misconfiguration: Remediation of cloud-connected OT integration exposures and IoT device access failures.
- Environmental sensor spoofing (safety sensors): Validation and recovery procedures for suspected manipulation of process safety sensor inputs.
- Post-incident forensics and learning: Structured postmortem, digital forensic collection, and disaster recovery validation procedures.
Playbook 1 – Safety-Critical PLC Compromise
Purpose & scope: Covers suspected or confirmed unauthorized access to, or modification of, safety-critical PLCs, RTUs, or DCS controllers in any sector. Priority: protect life safety and process integrity above all response activities.
Detection triggers:
- Unexpected changes in process variable setpoints without operator confirmation
- Unplanned PLC mode changes (RUN→STOP or program mode activation)
- OT monitoring platform alert for unauthorized configuration read or write
- Field operator reporting unexpected equipment behavior inconsistent with current commands
Immediate safe actions (0–72 hours):
- Immediately notify the plant safety officer and operations manager; do not initiate containment actions without their authorization.
- Switch affected process segments to manual control using documented manual override procedures; coordinate with field operators before switching.
- Isolate the affected PLC at the network layer via ACL modification; do not physically disconnect without operations engineering sign-off.
- Preserve volatile evidence: capture current PLC diagnostic data, historian logs, and OT monitoring platform alerts without altering device state.
- Activate the safety-critical incident bridge: operations, OT security, vendor, and executive sponsor on the same communication channel.
Investigation & validation:
- Download the PLC logic backup to a forensic workstation and compare its hash against the approved golden image (a comparison sketch follows this list).
- Reconstruct suspected command sequences in the lab replica; do not validate logic assumptions against the live controller.
- Request vendor emergency technical support; confirm whether vendor remote access was active during the event window.
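The golden-image comparison above is worth scripting so it runs identically every time under incident pressure. A minimal sketch, assuming the captured logic and the golden image are exported as binary files; the paths and file names are placeholders for your own repository layout.

```python
"""Compare a captured PLC logic backup against the approved golden image."""
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in chunks so large logic images don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical paths: capture from the forensic workstation and the approved
# image from the (offline, access-controlled) golden-image repository.
captured = Path("evidence/plc07_logic_capture.bin")
golden = Path("golden_images/plc07_approved.bin")

if sha256_of(captured) == sha256_of(golden):
    print("MATCH: captured logic is identical to the approved golden image")
else:
    print("MISMATCH: escalate to safety engineering; do not roll back yet")
```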
Recovery actions:
- Restore approved logic to the controller from the verified golden image backup during a controlled maintenance window with plant safety sign-off.
- Conduct a full process startup validation before returning to automated control; confirm all setpoints, interlocks, and safety functions operate correctly.
- Run a 72-hour monitored observation period in manual-assisted mode before full automation restoration.
Communications & escalation: Notify the plant manager within the first hour; notify the executive sponsor within 4 hours. Regulator notification obligations depend on sector and jurisdiction; confirm with legal before any external disclosure. Vendor notification must follow the contractual SLA and access audit requirements.
KPIs: Safety incidents attributable to the response = zero; MTTD from first anomaly to OT security team alert under 2 hours; full process restoration within defined RTO.
Test cadence: Quarterly tabletop with the operations manager, OT security lead, safety officer, and vendor escalation contact. Scenario: unexpected PLC mode change with no corresponding work order.
Playbook 2 – Ransomware on OT-Attached Workstations
Purpose & scope: Covers ransomware encryption events on SCADA servers, historian workstations, HMI hosts, and engineering workstations with OT network connectivity. Scope does not extend to live controller isolation without separate safety review.
Detection triggers:
- Mass file encryption indicators on OT-zone workstations (file extension change, CPU/disk I/O spike)
- OT monitoring platform alert for anomalous east-west SMB or RDP traffic
- Historian or SCADA application failure with file-not-found errors
- Ransom note files appearing in SCADA application directories
Immediate safe actions (0–72 hours):
- Isolate affected workstations at the VLAN level by disabling switch ports or modifying ACLs; do not power off affected hosts (preserve volatile memory for forensics). A port-isolation sketch follows this list.
- Verify that PLC and RTU control functions remain operational and on known-good logic; controllers typically continue operating even after SCADA host loss.
- Switch to manual HMI or field operator control for process monitoring during SCADA host isolation.
- Preserve memory images and forensic disk images of affected workstations before any remediation.
- Block all outbound connectivity from OT segments pending investigation; ransomware frequently maintains active C2 communication.
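Where port isolation is scripted, it should still run only with operations awareness. A hedged sketch using the netmiko library against a Cisco IOS access switch; the host address, credential handling, and interface names are all placeholders, and your switch platform may differ.

```python
"""Shut down access-switch ports serving infected OT workstations."""
from netmiko import ConnectHandler

INFECTED_PORTS = ["GigabitEthernet1/0/12", "GigabitEthernet1/0/14"]  # hypothetical

switch = ConnectHandler(
    device_type="cisco_ios",
    host="10.20.0.5",          # placeholder OT access-switch address
    username="ir_responder",   # dedicated, logged incident-response account
    password="<from-vault>",   # pull from a secrets manager in practice
)

for port in INFECTED_PORTS:
    # 'shutdown' isolates the host at layer 2 while leaving it powered on,
    # preserving volatile memory for forensics.
    output = switch.send_config_set([f"interface {port}", "shutdown"])
    print(output)

switch.save_config()
switch.disconnect()
```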
Investigation & validation:
- Analyze encrypted file timestamps and event logs to establish patient zero and the lateral movement timeline (see the triage sketch after this list).
- Test backup restoration on an isolated forensic workstation before applying to production systems.
- Confirm controller integrity via golden image comparison; ransomware actors have used workstation footholds to modify PLC logic in multiple documented OT incidents [source: year].
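Establishing patient zero often starts with a crude sweep for the earliest encrypted files across mounted forensic images. A minimal triage sketch, assuming the disk images are mounted read-only and the ransomware appends a known extension; ".locked" and the mount paths are hypothetical.

```python
"""Rough patient-zero triage: find the earliest encrypted file per host image."""
import os
from datetime import datetime, timezone

MOUNTED_IMAGES = {"scada01": "/mnt/forensics/scada01",
                  "hist01": "/mnt/forensics/hist01"}
RANSOM_EXT = ".locked"  # replace with the extension actually observed

for host, root in MOUNTED_IMAGES.items():
    earliest = None
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(RANSOM_EXT):
                mtime = os.path.getmtime(os.path.join(dirpath, name))
                if earliest is None or mtime < earliest:
                    earliest = mtime
    if earliest is not None:
        stamp = datetime.fromtimestamp(earliest, tz=timezone.utc)
        print(f"{host}: earliest encrypted file at {stamp.isoformat()}")
```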
Recovery actions:
- Restore SCADA and historian from verified, pre-incident backups, not from snapshots taken after the infection window.
- Reimage affected workstations from hardened baseline before reconnecting to OT network.
- Implement application whitelisting before returning workstations to production.
Communications & escalation: Notify CISO and executive sponsor immediately. Preserve all encrypted files and ransom communications for law enforcement and legal review. Do not pay ransom without executive, legal, and law enforcement consultation.
KPIs: MTTD from first encryption indicator to isolation; MTTR from isolation to SCADA restoration; zero recurrence within 90 days.
Test cadence: Tabletop twice annually including IT, OT security, operations, and legal. Scenario: historian encryption with simultaneous attempt to move laterally to engineering workstations.
Playbook 3 – Unauthorized Remote Vendor Access
Purpose & scope: Covers detection and response to remote access sessions that occur outside approved windows, from unrecognized endpoints, or with anomalous activity patterns, including both current sessions and evidence of recent unauthorized access.
Detection triggers:
- Session broker alert for access outside defined maintenance window
- VPN or jump host log showing connection from unrecognized IP or endpoint
- OT monitoring alert for device configuration reads not associated with a work order
- Vendor account used without a corresponding support ticket (a cross-referencing sketch follows this list)
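The session-versus-ticket check in the last trigger is straightforward to automate. A sketch against assumed CSV exports from the session broker and the ticketing system; all column names are hypothetical.

```python
"""Flag vendor remote sessions that have no corresponding support ticket."""
import csv

with open("vendor_sessions.csv", newline="") as fh:
    # Assumed columns: session_id, account, start_time, ticket_id
    sessions = list(csv.DictReader(fh))
with open("support_tickets.csv", newline="") as fh:
    valid_tickets = {row["ticket_id"] for row in csv.DictReader(fh)}

for s in sessions:
    if s["ticket_id"] not in valid_tickets:
        print(f"REVIEW: session {s['session_id']} by {s['account']} "
              f"at {s['start_time']} has no matching ticket")
```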
Immediate safe actions (0–72 hours):
- Terminate the active session immediately via the session broker or by revoking the VPN credential.
- Preserve all session logs, connection metadata, and screen recordings before any credential revocation.
- Audit all other active vendor accounts for anomalous access in the preceding 30 days.
- Place the affected access pathway in review status; do not re-enable until the investigation is complete.
- Notify the vendor’s security contact and request confirmation of whether their credentials have been compromised.
Investigation & validation:
- Review session recordings to determine whether any configuration reads, writes, or file transfers occurred during the unauthorized session.
- Cross-reference OT monitoring platform alerts for any device behavior anomalies coinciding with the session window.
- Verify device configuration integrity against golden image for any controller accessed during the session.
Recovery actions:
- Rotate all vendor credentials associated with the compromised access pathway.
- Implement JIT access with reduced session duration for the affected vendor going forward.
- Require vendor to provide endpoint security attestation before re-enabling access.
Communications & escalation: Notify the vendor security team and escalate to contract account manager. Legal review required if evidence suggests deliberate unauthorized access. Regulatory notification depends on whether control systems were accessed or modified.
KPIs: Time from anomalous session detection to session termination (target: under 15 minutes); zero vendor sessions without corresponding work order post-remediation.
Test cadence: Quarterly tabletop. Scenario: vendor account used from foreign IP during off-hours with configuration reads on safety relay.
Playbook 4 – Suspicious Firmware or PLC Logic Change
Purpose & scope: Covers detection and response to unauthorized modifications of PLC ladder logic, function block programs, firmware versions, or device configuration files, including both malicious changes and unauthorized maintenance actions.
Detection triggers:
- OT monitoring platform or file integrity tool alert for logic file hash change
- Change management system shows no approved work order for the affected device
- Field operator reporting unexpected equipment behavior correlated with recent maintenance activity
- Firmware version mismatch detected during scheduled integrity scan
Immediate safe actions (0–72 hours):
- Do not attempt to revert the logic change without safety engineering review; the change may have introduced safety-critical modifications that require analysis before rollback.
- Isolate the affected controller at the network layer to prevent any further remote modification.
- Capture the current logic image to a forensic workstation using read-only methods.
- Suspend all remote access to the affected device and adjacent controllers.
- Engage the control system vendor for technical review of the modified logic.
Investigation & validation:
- Compare the captured logic image against the approved golden image using vendor tools; document all differences line by line.
- Assess whether any differences affect safety interlocks, emergency shutdown logic, or process operating limits.
- Reconstruct the change in the lab replica and evaluate process impact before approving rollback.
Recovery actions:
- Restore the approved golden image during a controlled maintenance window with safety engineering sign-off.
- Conduct a full functional test of all safety interlocks and process controls before returning to automatic operation.
- Update the golden image repository record and re-verify all adjacent controllers.
KPIs: Time from logic change detection to engineering review initiation; zero logic changes without associated approved work order post-remediation.
Test cadence: Annually, combined with firmware integrity audit. Scenario: unauthorized firmware update applied during third-party maintenance visit.
Playbook 5 – Supply Chain Compromise (Software/Firmware)
Purpose & scope: Covers response to suspected malicious code, backdoors, or unauthorized modifications introduced through vendor software updates, firmware packages, or third-party integration libraries deployed in OT environments.
Detection triggers:
- Threat intelligence or vendor advisory identifying compromise in a software/firmware product in use
- OT monitoring alert for unexpected outbound connections following an update deployment
- Anomalous process behavior beginning shortly after scheduled software update
- Hash mismatch between deployed package and vendor-published hash
Immediate safe actions (0–72 hours):
- Halt all further deployment of the suspected package across the OT estate immediately.
- Isolate systems that have received the update; check OT monitoring for anomalous behavior since the deployment date.
- Preserve forensic images of affected systems before any rollback.
- Obtain vendor confirmation of package integrity and request an emergency advisory if not already published.
- Notify sector ISAC and CISA ICS-CERT (or national equivalent) per coordinated vulnerability disclosure obligations [source: CISA, year].
Investigation & validation:
- Analyze the deployed package in a sandbox environment; check for unexpected network calls, file modifications, or scheduled tasks.
- Reconstruct the deployment in the lab replica to observe behavior without production risk.
- Cross-reference update deployment timeline with any anomalies in OT monitoring alerts.
Recovery actions:
- Roll back to the last verified-clean software version from an authenticated backup.
- Re-verify package hash against vendor-published values before any future deployment.
- Establish a mandatory pre-deployment sandbox test for all vendor updates going forward.
KPIs: Time from threat intelligence receipt to exposure assessment completion; percentage of OT software with verified hash on record.
Test cadence: Annual supply chain tabletop including procurement, IT, OT security, and vendor management.
Playbook 6 – Data Exfiltration / Unusual Northbound Traffic
Purpose & scope: Covers detection and response to unexpected or unauthorized data flows from OT segments toward enterprise networks, the internet, or cloud services, including process data, configuration files, and historian exports.
Detection triggers:
- OT monitoring alert for new outbound flow from control VLAN to external destination
- Significant volume spike on historian-to-enterprise data path outside scheduled transfer windows
- DNS queries from OT-zone hosts to external or unrecognized domains (see the allowlist-check sketch after this list)
- OT SIEM alert for large file transfer from engineering workstation
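The DNS trigger above reduces to an allowlist check over resolver logs. A sketch over a generic CSV query-log export; the OT subnet, the approved domain suffixes, and the column names are assumptions to adapt to your resolver.

```python
"""Flag DNS queries from OT-zone hosts to domains outside the approved set."""
import csv
import ipaddress

OT_SUBNET = ipaddress.ip_network("10.30.0.0/16")  # placeholder OT address space
ALLOWED_SUFFIXES = (".historian.example.com",
                    ".ntp.example.com")           # approved destinations only

with open("dns_queries.csv", newline="") as fh:
    for row in csv.DictReader(fh):  # assumed columns: client_ip, query
        client = ipaddress.ip_address(row["client_ip"])
        domain = row["query"].rstrip(".")
        if client in OT_SUBNET and not domain.endswith(ALLOWED_SUFFIXES):
            print(f"ANOMALY: {client} queried {domain}")
```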
Immediate safe actions (0–72 hours):
- Block the identified exfiltration pathway at the OT DMZ firewall without disrupting process control traffic.
- Preserve firewall and DNS logs immediately; these rotate or age out rapidly.
- Identify the source host and isolate at VLAN level if exfiltration is confirmed active.
- Assess what data categories have been exposed: process parameters, safety logic, network diagrams, operator credentials.
- Engage legal and executive sponsor if personally identifiable or classified operational data is involved.
Investigation & validation:
- Reconstruct exfiltration pathway and data volume from preserved flow records.
- Determine whether the exfiltration pathway existed previously as an undocumented flow.
- Test the blocked pathway in the lab to confirm no legitimate process function was disrupted.
Recovery actions:
- Harden OT DMZ rules to allow only explicitly documented and approved northbound flows.
- Conduct a full data flow audit for all OT segments against approved flow register.
KPIs: Time from first anomalous flow to blocking action; zero undocumented northbound flows post-remediation.
Test cadence: Semi-annual tabletop with IT security, OT security, and legal participation.
Playbook 7 – Physical Security Breach Affecting OT Assets
Purpose & scope: Covers response to unauthorized physical access to control rooms, substation panels, switch rooms, or field devices, including scenarios where physical access may have enabled device tampering, credential theft, or USB-based attack delivery.
Detection triggers:
- Physical access control system alert for unauthorized entry to OT-restricted area
- CCTV motion alert in control room or substation outside business hours
- Field operator discovering tampered enclosure seals or unfamiliar hardware
- USB connection event logged on an OT-zone workstation without a corresponding work order (a cross-check sketch follows this list)
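The USB trigger can be checked in bulk by cross-referencing device events against approved maintenance windows. A sketch over assumed CSV exports; it presumes Windows PnP auditing is enabled (Security event ID 6416, "a new external device was recognized by the system"), and the work-order format is hypothetical.

```python
"""Cross-check USB device events on OT workstations against work orders."""
import csv
from datetime import datetime

with open("approved_work_orders.csv", newline="") as fh:
    # Assumed columns: workstation, window_start, window_end (ISO 8601)
    windows = [(r["workstation"],
                datetime.fromisoformat(r["window_start"]),
                datetime.fromisoformat(r["window_end"]))
               for r in csv.DictReader(fh)]

with open("usb_events.csv", newline="") as fh:
    for ev in csv.DictReader(fh):  # assumed columns: workstation, timestamp, device_name
        ts = datetime.fromisoformat(ev["timestamp"])
        covered = any(w == ev["workstation"] and start <= ts <= end
                      for w, start, end in windows)
        if not covered:
            print(f"UNAUTHORIZED USB: {ev['device_name']} "
                  f"on {ev['workstation']} at {ts}")
```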
Immediate safe actions (0–72 hours):
- Secure the affected physical area; permit no personnel access pending a forensic sweep.
- Review all OT-zone device access logs and USB connection events for the relevant time window.
- Check all controllers and workstations in the affected area for configuration integrity.
- Preserve CCTV footage, badge logs, and physical access system records immediately.
- Notify site security, plant manager, and executive sponsor; assess law enforcement notification requirement.
Investigation & validation:
- Conduct a physical device inspection in coordination with the control system vendor.
- Check for unauthorized hardware additions (network taps, USB devices, cellular-connected modules).
- Validate firmware and logic integrity for all physically accessible controllers.
Recovery actions:
- Replace compromised access credentials for all systems in the affected area.
- Upgrade physical access controls and CCTV coverage where gaps are identified.
- Implement tamper-evident seals on all controller enclosures and network equipment.
KPIs: Time from physical breach detection to area securing and OT device integrity audit completion; zero unauthorized hardware found unaddressed.
Test cadence: Annual combined physical and cyber tabletop with site security, OT security, and operations.
Playbook 8 – Power/UPS Failure with Cyber Suspicion
Purpose & scope: Covers power loss events at OT facilities where circumstantial indicators suggest cyber involvement, drawing on the documented pattern of power sector targeting by advanced threat actors [source: year].
Detection triggers:
- Unexpected power loss coinciding with unusual network activity in the preceding hours
- UPS or PDU management interface showing configuration changes without work order
- OT monitoring alert for device communication loss across multiple segments simultaneously
- SCADA historian showing setpoint or protection relay changes shortly before power event
Immediate safe actions (0–72 hours):
- Follow standard power restoration procedures; safety and restoration take precedence over cyber investigation.
- Simultaneously preserve all OT monitoring, SIEM, and control system logs from the period before the event.
- Check UPS and PDU management interfaces for unauthorized access or configuration changes.
- Assess protection relay settings for any changes not matching approved configurations.
- Do not return automatic control until controller integrity is verified.
Investigation & validation:
- Analyze the timeline of network events preceding the power loss for indicators of deliberate action (a filtering sketch follows this list).
- Compare protection relay configurations against approved baselines.
- Engage vendor for emergency protection systems integrity assessment.
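Timeline analysis starts with pulling every OT event from a lookback window before the power loss. A simple filtering sketch; the event export format and the six-hour window are assumptions.

```python
"""Pull OT network events from the hours preceding a power-loss timestamp."""
import csv
from datetime import datetime, timedelta

POWER_LOSS = datetime.fromisoformat("2024-01-15T03:42:00")  # placeholder timestamp
LOOKBACK = timedelta(hours=6)

with open("ot_events.csv", newline="") as fh:
    for row in csv.DictReader(fh):  # assumed columns: timestamp, source, message
        ts = datetime.fromisoformat(row["timestamp"])
        if POWER_LOSS - LOOKBACK <= ts <= POWER_LOSS:
            print(f"{ts}  {row['source']}  {row['message']}")
```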
Recovery actions:
- Restore to manual operation, verify all protection settings before re-energizing.
- Implement enhanced monitoring on power management interfaces going forward.
KPIs: Time from power event to cyber involvement assessment; zero return to automatic control without integrity verification.
Playbook 9 – Alarm Floods / DoS on Control Networks
Purpose & scope: Covers denial-of-service events and alarm flooding on OT networks that overwhelm operator situational awareness, whether from external attack, misconfigured equipment, or deliberate manipulation.
Detection triggers:
- Alarm management system reporting an alarm rate exceeding the defined threshold (a rate-check sketch follows this list)
- OT monitoring alert for broadcast storm or excessive traffic on control VLAN
- Operators reporting inability to distinguish process alarms from system-generated noise
- SCADA historian gaps indicating communication loss with multiple devices simultaneously
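The alarm-rate trigger is a sliding-window count. A minimal sketch; the threshold of roughly ten alarms per ten minutes per operator follows commonly cited alarm-management guidance (e.g., EEMUA 191), but tune it to your own alarm philosophy.

```python
"""Detect alarm-flood conditions with a sliding-window rate check."""
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
THRESHOLD = 10  # alarms per operator per window; adjust per alarm philosophy

def check_flood(alarm_times: list[datetime]) -> None:
    window: deque[datetime] = deque()
    for ts in sorted(alarm_times):
        window.append(ts)
        # Drop alarms that have aged out of the 10-minute window.
        while window and window[0] < ts - WINDOW:
            window.popleft()
        if len(window) > THRESHOLD:
            print(f"FLOOD: {len(window)} alarms in the 10 min ending {ts}")
```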
Immediate safe actions (0–72 hours):
- Enable alarm suppression filters for non-critical systems to restore operator situational awareness for safety-critical alarms.
- Identify the traffic source via the OT monitoring platform; do not modify switch configurations without operations awareness.
- Isolate the identified flooding source at the network layer.
- Brief field operators on the event and place additional field staff at critical process areas.
- Capture network traffic from the affected segments for forensic analysis.
Recovery actions:
- Implement rate limiting on control VLANs to bound future flooding impact.
- Review and adjust alarm philosophy to prioritize safety-critical alarm classes.
KPIs: Time from alarm flood detection to operator situational awareness restoration; zero safety events during alarm flood period.
Playbook 10 – ICS Protocol Manipulation (Modbus/DNP3/IEC-104 Anomalies)
Purpose & scope: Covers detection and response to anomalous or unauthorized Modbus, DNP3, IEC-104, or similar industrial protocol command sequences, including function code abuse, unauthorized register writes, and replay attacks.
Detection triggers:
- OT monitoring alert for unexpected function codes (e.g., a Modbus function code 16, Write Multiple Registers, command from an unrecognized source)
- DNP3 Direct Operate command issued outside normal operator workflow
- Process variable change inconsistent with issued commands (mismatch between command and physical outcome)
- New master-slave communication pair identified in baseline deviation alert
Immediate safe actions (0–72 hours):
- Apply ACLs to block the identified anomalous source from reaching affected controllers; do not modify controller configurations.
- Alert field operators to monitor affected process segments manually.
- Capture full PCAP from the affected segment for forensic protocol analysis.
- Review process variable histories for any unexpected setpoint changes in the preceding 24 hours.
Investigation & validation:
- Analyze the captured PCAP for anomalous function codes, unauthorized address targets, and command replay patterns (see the scan sketch after this list).
- Reproduce the protocol sequence in the lab replica to assess potential process impact.
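PCAP review for Modbus/TCP can begin with a raw scan of function codes in the MBAP payload (the function code is byte 7 of the TCP payload on port 502). A sketch using scapy; the approved master address and the capture file name are placeholders.

```python
"""Scan a PCAP for Modbus/TCP write commands from unexpected masters."""
from scapy.all import rdpcap, IP, TCP, Raw

APPROVED_MASTERS = {"10.30.1.10"}  # placeholder SCADA master address
WRITE_CODES = {5, 6, 15, 16}       # coil/register write function codes

for pkt in rdpcap("ot_segment_capture.pcap"):
    if (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)
            and pkt[TCP].dport == 502):
        payload = bytes(pkt[Raw].load)
        if len(payload) >= 8:
            # MBAP header: transaction(2) protocol(2) length(2) unit(1) FC(1)
            function_code = payload[7]
            src = pkt[IP].src
            if function_code in WRITE_CODES and src not in APPROVED_MASTERS:
                print(f"UNAUTHORIZED WRITE: FC{function_code} "
                      f"from {src} to {pkt[IP].dst}")
```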
Recovery actions:
- Implement protocol whitelisting rules at the application-layer firewall for affected communication pairs.
- Verify controller configurations against golden images before removing manual monitoring.
KPIs: Time from anomalous protocol detection to ACL blocking; zero unauthorized protocol commands reaching live controllers post-remediation.
Playbook 11 – Insider Threat / Malicious Operator Activity
Purpose & scope: Covers response to suspected intentional misuse of authorized access, including deliberate process interference, unauthorized data access or export, and sabotage, by current or recently departed personnel.
Detection triggers:
- Operator-initiated commands inconsistent with current process requirements or shift instructions
- Access to sensitive process data or configuration files outside normal role scope
- Anomalous changes to access controls or audit log settings
- Behavioral reports from colleagues or supervisors
Immediate safe actions (0–72 hours):
- Preserve all authentication, command, and access logs immediately; do not alert the suspect before evidence preservation is complete.
- Engage HR, legal, and the executive sponsor before taking any visible response action.
- Reassign the suspect away from OT system access using a non-suspicious operational pretext if possible.
- Conduct a quiet audit of all recent access and commands attributed to the suspect account (an extraction sketch follows this list).
- Verify process integrity for any areas the suspect had access to.
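The quiet account audit is better done by extracting the suspect's commands to a hashed evidence file than by reviewing them interactively. A sketch over an assumed CSV command-audit export; the account name and columns are hypothetical.

```python
"""Extract every command attributed to a suspect account from log exports."""
import csv
import hashlib
import os

SUSPECT_ACCOUNT = "operator_jdoe"  # placeholder
os.makedirs("evidence", exist_ok=True)

with open("command_audit.csv", newline="") as src, \
     open("evidence/suspect_commands.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)  # assumed columns include 'account'
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["account"] == SUSPECT_ACCOUNT:
            writer.writerow(row)

# Record a hash of the extract so its integrity can be demonstrated later.
with open("evidence/suspect_commands.csv", "rb") as fh:
    print("extract sha256:", hashlib.sha256(fh.read()).hexdigest())
```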
Investigation & validation:
- Cross-reference commanded actions against process outcomes for the relevant period.
- Preserve evidence under chain-of-custody protocols suitable for potential legal proceedings.
Recovery actions:
- Revoke access immediately once legal and HR have authorized the action.
- Review and tighten role-based access controls for the affected functions.
KPIs: Evidence preserved under defensible chain of custody; zero additional unauthorized access post-revocation.
Test cadence: Annual tabletop with HR, legal, and operations leadership included.
Playbook 12 – Third-Party Cloud/IoT Service Misconfiguration
Purpose & scope: Covers security incidents arising from misconfigured cloud-connected OT integrations, IoT device default credentials, or unintended internet exposure of OT data interfaces.
Detection triggers:
- External scan (Shodan/Censys) identifies internet-facing OT management interface
- Cloud security alert for anomalous access to OT data pipeline
- IoT device management console showing unauthorized access or configuration change
- OT monitoring alert for new outbound connection from IoT-connected segment
Immediate safe actions (0–72 hours):
- Disable the exposed integration or interface immediately; re-enable only after a secure, reviewed configuration is in place.
- Rotate all credentials associated with the affected cloud service and IoT devices.
- Assess what OT data has been accessible and for how long.
- Notify the cloud service provider’s security team.
Recovery actions:
- Re-architect the cloud integration with explicit data minimization and one-way export controls.
- Implement a regular external exposure scan for OT-adjacent cloud services (a sketch follows).
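A recurring exposure scan can be scripted with the official shodan Python library. A sketch only; the API key, the public netblock, and the port list (502 Modbus, 20000 DNP3, 44818 EtherNet/IP) are placeholders, and the check should run from outside your perimeter so results reflect the true external view.

```python
"""Recurring external-exposure check for OT-adjacent services."""
import shodan

api = shodan.Shodan("YOUR_API_KEY")  # placeholder key
# Placeholder public range plus common ICS service ports.
QUERY = "net:203.0.113.0/24 port:502,20000,44818"

results = api.search(QUERY)
for match in results["matches"]:
    print(f"EXPOSED: {match['ip_str']}:{match['port']} - "
          f"{match.get('product', 'unknown')}")
if not results["matches"]:
    print("No OT-adjacent services visible externally for this query.")
```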
KPIs: Time from exposure identification to remediation; zero OT management interfaces internet-exposed post-remediation.
Playbook 13 – Environmental Sensor Spoofing (Safety Sensors)
Purpose & scope: Covers suspected manipulation of safety sensor inputs, including temperature, pressure, flow, chemical concentration, or flame detection, that could cause safety systems to fail to actuate or actuate falsely.
Detection triggers:
- Safety sensor reading inconsistent with physical process state confirmed by field operator
- Multiple redundant sensors diverging without a corresponding physical cause (a divergence-check sketch follows this list)
- OT monitoring alert for anomalous configuration read on safety sensor communication
- Safety system spurious trip or unexplained failure to trip
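Redundant-channel divergence is a simple spread check, but the output should drive escalation, never an automatic override. A minimal sketch; the tolerance and the example readings are hypothetical and process-specific.

```python
"""Flag divergence between redundant safety-sensor channels."""
TOLERANCE_PCT = 2.0  # acceptable channel-to-channel spread; process-specific

def check_divergence(tag: str, readings: list[float]) -> None:
    lo, hi = min(readings), max(readings)
    spread_pct = (hi - lo) / max(abs(lo), 1e-9) * 100
    if spread_pct > TOLERANCE_PCT:
        print(f"DIVERGENCE on {tag}: channels read {readings}, "
              f"spread {spread_pct:.1f}% exceeds {TOLERANCE_PCT}%")

# Example: three redundant pressure transmitters on the same process tap.
check_divergence("PT-4711", [102.4, 102.6, 97.1])
```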
Immediate safe actions (0–72 hours):
- Immediately escalate to the safety officer and operations manager; this is a potential safety emergency.
- Switch to independent physical verification of the affected parameter using portable instruments.
- Do not override the safety system based on the suspect sensor reading without independent verification.
- Isolate the sensor communication network at the network layer while maintaining process monitoring via independent instrumentation.
Investigation & validation:
- Physical calibration check of the affected sensor by the instrument engineer.
- Review sensor communication network for anomalous traffic targeting sensor communication protocols.
- Engage safety system vendor for integrity assessment.
Recovery actions:
- Replace or re-certify suspect sensors before returning to automated safety operation.
- Verify full safety system functional test before returning process to automatic mode.
KPIs: Time from sensor anomaly to safety officer notification; zero return to automatic safety mode without full functional test.
Playbook 14 – Post-Incident Forensics and Learning
Purpose & scope: Covers the structured postmortem, digital forensics package assembly, and disaster recovery validation that follows any significant OT incident, ensuring captured evidence supports legal and regulatory needs and that lessons improve future response.
Detection triggers (initiating conditions):
- Completion of any Playbooks 1–13
- End of incident response phase declared by incident commander
- Regulatory or legal requirement for documented postmortem
Immediate actions (0–72 hours post-incident declaration):
- Assemble the complete evidence package: PCAPs, system logs, HMI screenshots, forensic images, session recordings, and chain-of-custody documentation (a hash-manifest sketch follows this list).
- Conduct initial timeline reconstruction with the response team while details are fresh.
- Identify preliminary root cause category: technical failure, human error, or deliberate attack.
- Confirm all regulatory notification obligations have been met or are in progress.
- Brief executive sponsor on preliminary findings within 24 hours of incident closure.
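Chain of custody is easier to defend when the evidence package ships with a hash manifest generated at collection time. A sketch that walks the evidence directory and records a SHA-256 per file; the directory and manifest names are placeholders.

```python
"""Build a hash manifest for the evidence package to support chain of custody."""
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_DIR = Path("evidence/incident_2024_001")  # placeholder

manifest = {
    "collected_at": datetime.now(timezone.utc).isoformat(),
    "files": {},
}
for path in sorted(EVIDENCE_DIR.rglob("*")):
    if path.is_file():
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest["files"][str(path.relative_to(EVIDENCE_DIR))] = digest

# Store the manifest in immutable storage alongside the evidence itself.
Path("evidence/manifest_incident_2024_001.json").write_text(
    json.dumps(manifest, indent=2)
)
print(f"{len(manifest['files'])} files hashed into manifest")
```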
Investigation & validation:
- Conduct detailed technical forensic analysis of collected evidence in the forensic lab.
- Validate that all indicators of compromise have been remediated; use the lab replica to confirm the attack pathway is closed.
- Cross-reference findings against current threat intelligence for attribution or pattern context.
Recovery actions:
- Produce a structured incident report suitable for executive, legal, and regulatory audiences.
- Translate findings into specific, prioritized control improvements with owners and timelines.
- Schedule a full DR test for affected systems within 90 days.
Communications & escalation: Final incident report delivered to executive sponsor and CISO within 30 days. Regulatory submissions completed per applicable framework. All evidence maintained in immutable storage per legal hold requirements.
KPIs: Incident report delivered within 30 days; control improvements with owners and target dates documented; DR test scheduled and completed within 90 days.
Test cadence: Annual full DR test including backup restoration validation. Post-incident review tabletop within 2 weeks of any significant incident.
Conclusion
Effective OT incident response is a program, not a document. These 14 comprehensive OT incident response playbooks provide the procedural foundation, but they deliver value only when tested, trained, and continuously improved. Start by selecting the three playbooks most relevant to your highest-risk scenarios, run tabletop exercises with cross-functional teams including operations and safety, and measure MTTD and MTTR against baseline.
Safety is non-negotiable throughout. The OT responder’s primary obligation, before evidence collection, before attribution, before any remediation, is to prevent the incident from becoming a physical safety event. Build that principle into every response decision and every playbook test.
