The 12 Best OT Visibility KPIs Every SOC Should Track
In the high-stakes world of Industrial Control Systems (ICS) and Operational Technology (OT), “you cannot protect what you cannot see” is no longer just a catchy phrase; it is a survival imperative. As we move through 2025, the air gap is a myth, and the convergence of IT, OT, and IIoT has created a massive, often invisible attack surface.
For a modern Security Operations Center (SOC), applying traditional IT metrics to an industrial environment is a recipe for disaster. OT systems prioritize safety and availability over data confidentiality. When a PLC (Programmable Logic Controller) goes offline, it’s not just a “server down” alert; it’s a potential life-safety event or a multi-million dollar production halt.
This guide provides the 12 essential OT Visibility KPIs that every industrial SOC should track to move from reactive firefighting to proactive resilience.
The 12 Essential OT Visibility KPIs for 2025
1. Asset Discovery Depth (By Purdue Level)
In OT, a simple IP count is insufficient. This KPI measures the percentage of assets identified at each level of the Purdue Model.
- Why it matters: It is common for a SOC to have around 90% visibility at Level 3 (Operations Systems) but less than 20% at Level 1 (Basic Control/PLCs).
- The Goal: Achieve >95% visibility at Levels 3 and 2, and >80% at Level 1.
- Actionable Insight: If your discovery stops at the gateway, you are blind to the “east-west” traffic where most industrial attacks propagate.
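As a rough sketch, discovery depth can be computed per Purdue level once there is a reference count to compare against. The reference counts here are an assumption (for example, taken from engineering records or a site walk-down); the function name, asset IDs, and numbers are hypothetical.

```python
from collections import Counter

def discovery_depth(expected_by_level, discovered_assets):
    """Percent of expected assets seen by monitoring, per Purdue level.

    expected_by_level: {level: count of assets believed to exist} (assumed input)
    discovered_assets: iterable of (asset_id, level) pairs reported by sensors
    """
    # De-duplicate so an asset reported twice is only counted once.
    seen = Counter(level for _, level in set(discovered_assets))
    return {
        # Cap at 100%: more discovered than expected means the reference
        # inventory is incomplete, not that visibility exceeds 100%.
        level: round(100.0 * min(seen[level], expected) / expected, 1)
        for level, expected in expected_by_level.items()
        if expected > 0
    }

expected = {1: 50, 2: 20, 3: 10}
discovered = (
    [(f"plc-{i}", 1) for i in range(9)]
    + [(f"hmi-{i}", 2) for i in range(19)]
    + [(f"srv-{i}", 3) for i in range(10)]
)
depth = discovery_depth(expected, discovered)
print(depth)
```

Reporting the figure per level, rather than as one site-wide number, keeps a strong Level 3 score from hiding a weak Level 1 score.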
2. Firmware and OS Vulnerability Coverage
This KPI tracks the percentage of OT assets whose firmware and OS versions have been identified and mapped against known CVEs (Common Vulnerabilities and Exposures).
- Why it matters: Industrial assets often have lifespans of 20+ years. Tracking this KPI helps prioritize “virtual patching” via firewalls when physical patching is impossible.
- The Goal: 100% mapping of critical assets to their respective firmware versions and known vulnerabilities.
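One way to express this coverage figure, assuming each inventory record carries a criticality flag, a firmware version, and a marker showing that the version was checked against a CVE feed. All field names below are hypothetical.

```python
def vuln_mapping_coverage(assets):
    """Return (coverage percent, ids of unmapped critical assets).

    An asset counts as mapped only if its firmware version is known AND
    that version has been checked against a vulnerability source.
    """
    critical = [a for a in assets if a["critical"]]
    if not critical:
        return None, []
    unmapped = [
        a["id"] for a in critical
        if not (a.get("firmware") and a.get("cve_checked"))
    ]
    coverage = 100.0 * (len(critical) - len(unmapped)) / len(critical)
    return coverage, unmapped

inventory = [
    {"id": "plc-01", "critical": True, "firmware": "2.8.1", "cve_checked": True},
    {"id": "plc-02", "critical": True, "firmware": None, "cve_checked": False},
    {"id": "hmi-01", "critical": True, "firmware": "7.4", "cve_checked": True},
    {"id": "sis-01", "critical": True, "firmware": "1.2", "cve_checked": True},
    {"id": "printer-01", "critical": False, "firmware": None, "cve_checked": False},
]
coverage, unmapped = vuln_mapping_coverage(inventory)
print(coverage, unmapped)
```

Returning the list of unmapped assets alongside the percentage gives the SOC a work queue, not just a score.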
3. Percentage of Encrypted vs. Cleartext OT Protocols
This KPI measures the share of industrial traffic (such as Modbus, S7comm, and CIP) that runs without encryption.
- Why it matters: Many legacy protocols are “insecure by design.” Knowing what percentage of your traffic is unencrypted allows the SOC to implement better network segmentation.
- The Goal: Identification of all cleartext protocols and a roadmap to wrap them in secure tunnels or migrate to protocols with built-in security (e.g., OPC UA).
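A minimal sketch of the calculation, assuming the monitoring sensor exports flow summaries that already label each flow as encrypted or not. The record layout is hypothetical, and the share is weighted by bytes rather than by flow count.

```python
from collections import defaultdict

def cleartext_share(flows):
    """Return (percent of bytes sent in cleartext, cleartext bytes per protocol)."""
    total = sum(f["bytes"] for f in flows)
    by_protocol = defaultdict(int)
    for f in flows:
        if not f["encrypted"]:
            by_protocol[f["protocol"]] += f["bytes"]
    clear = sum(by_protocol.values())
    share = 100.0 * clear / total if total else 0.0
    return share, dict(by_protocol)

flows = [
    {"protocol": "modbus", "bytes": 400, "encrypted": False},
    {"protocol": "s7comm", "bytes": 200, "encrypted": False},
    {"protocol": "opcua", "bytes": 400, "encrypted": True},
]
share, breakdown = cleartext_share(flows)
print(share, breakdown)
```

The per-protocol breakdown is what feeds the roadmap: it shows which cleartext protocols carry the most traffic and are worth tunnelling or migrating first.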
4. Mean Time to Detect (MTTD) Industrial Anomalies
The time elapsed from the start of an unauthorized process change or protocol anomaly to SOC notification.
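- Why it matters: In IT, detection within 24 hours is often considered acceptable. In OT, an unauthorized “Stop” command to a turbine needs to be detected within seconds to minutes.
- The Goal: Under 15 minutes for critical process-impact alerts, and as close to real time as the tooling allows for direct control commands.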
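MTTD is simply the average gap between when an anomaly started and when the SOC was notified. The sketch below assumes both timestamps are available per incident (in practice the start time often has to be reconstructed after the fact); the sample incidents are invented.

```python
from datetime import datetime
from statistics import mean

TARGET_MINUTES = 15  # the goal stated above for critical process-impact alerts

def detection_minutes(started, detected):
    """Minutes between the start of an anomaly and SOC notification."""
    return (detected - started).total_seconds() / 60

incidents = [
    (datetime(2025, 1, 10, 8, 0), datetime(2025, 1, 10, 8, 10)),
    (datetime(2025, 2, 3, 14, 0), datetime(2025, 2, 3, 14, 20)),
]
delays = [detection_minutes(s, d) for s, d in incidents]
mttd = mean(delays)
over_target = [d for d in delays if d > TARGET_MINUTES]
print(mttd, over_target)
```

Tracking the individual delays that exceed the target matters as much as the mean, since one very slow detection can hide behind several fast ones.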
5. Unauthorized External-to-Internal Connections
Tracking any connection from the corporate IT network or the Internet directly to the Industrial Zone (Level 2 or 1).
- Why it matters: This is a common path for ransomware to reach the plant floor. Any breach of the “conduit” between IT and OT is a high-severity event.
- The Goal: Zero unauthorized bypasses of the DMZ.
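To illustrate, a DMZ bypass can be flagged from flow records by checking which zone each endpoint belongs to. The address ranges below are hypothetical placeholders; a real deployment would load its own zone definitions.

```python
import ipaddress

# Hypothetical zone definitions, for illustration only.
INDUSTRIAL = [ipaddress.ip_network("10.10.0.0/16")]  # Level 1-2 networks
DMZ = [ipaddress.ip_network("10.20.0.0/24")]         # industrial DMZ

def in_any(ip, networks):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

def dmz_bypasses(flows):
    """Flows that reach the industrial zone from outside it without
    originating in the DMZ. flows: iterable of (src_ip, dst_ip)."""
    return [
        (src, dst)
        for src, dst in flows
        if in_any(dst, INDUSTRIAL)
        and not in_any(src, INDUSTRIAL)
        and not in_any(src, DMZ)
    ]

flows = [
    ("10.20.0.5", "10.10.1.7"),     # DMZ jump host to PLC: allowed path
    ("192.168.1.50", "10.10.1.7"),  # corporate workstation to PLC: bypass
    ("10.10.1.3", "10.10.1.7"),     # traffic inside the industrial zone
    ("203.0.113.9", "10.20.0.5"),   # internet to DMZ: not an OT bypass
]
bypasses = dmz_bypasses(flows)
print(bypasses)
```

The KPI is then the count of such flows per reporting period, with a target of zero.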
6. Baseline Deviation Frequency
Industrial networks are deterministic; they do the same thing every day. This KPI tracks how often “new” communication patterns appear.
- Why it matters: A new communication path between an HMI and a PLC that has never existed before is a leading indicator of reconnaissance or lateral movement.
- The Goal: Low “noise” baseline with immediate investigation of any new asset-to-asset (East-West) traffic.
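Because the traffic is so repetitive, a baseline can be modeled as a set of known communication paths, and a deviation is any path outside that set. This is a simplified sketch: the (source, destination, protocol) tuple and the device names are assumptions, and commercial sensors build richer baselines.

```python
def new_paths(baseline, observed):
    """Paths seen in the observation window that are absent from the baseline.

    Each path is a (source, destination, protocol) tuple.
    """
    return sorted(set(observed) - set(baseline))

baseline = {
    ("hmi-01", "plc-01", "modbus"),
    ("hmi-01", "plc-02", "modbus"),
    ("historian", "plc-01", "opcua"),
}
observed_today = [
    ("hmi-01", "plc-01", "modbus"),
    ("hmi-02", "plc-01", "modbus"),   # HMI that never talked to this PLC
    ("historian", "plc-01", "opcua"),
]
deviations = new_paths(baseline, observed_today)
print(len(deviations), deviations)
```

Counting deviations per day gives the frequency; each individual entry is what an analyst would investigate.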
7. OT Sensor Uptime and Health
The percentage of time your passive monitoring sensors (e.g., Dragos, Nozomi, Claroty) are online and ingesting traffic.
- Why it matters: If a TAP or SPAN port fails, the SOC is “blind” to that segment. An attacker can operate in that dark zone indefinitely.
- The Goal: 99.9% sensor uptime.
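For scale, a 99.9% target over a 30-day period allows about 43 minutes of total outage per sensor. A minimal sketch of the calculation, assuming outage durations are already known (for example, from gaps in sensor heartbeats); the numbers are invented.

```python
TARGET_PCT = 99.9  # the goal stated above

def sensor_uptime_pct(period_minutes, outage_minutes):
    """Percent of the period during which the sensor was ingesting traffic."""
    return 100.0 * (period_minutes - sum(outage_minutes)) / period_minutes

period = 30 * 24 * 60                          # 43,200 minutes in 30 days
uptime = sensor_uptime_pct(period, [12, 45])   # two outages, 57 minutes total
meets_target = uptime >= TARGET_PCT
print(round(uptime, 3), meets_target)
```

Measuring this per sensor rather than as a fleet average is the safer choice, because a single dead sensor means a fully blind segment.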
8. Remote Access Session Auditing Rate
The percentage of remote vendor or maintenance sessions that are fully logged, recorded, and tied to a specific user.
- Why it matters: Third-party vendors are a major risk vector. You must see exactly what a vendor changed in a PLC logic file.
- The Goal: 100% of remote access sessions must be proxied and recorded.
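The auditing rate can be sketched as the share of sessions that satisfy all three conditions at once: proxied, recorded, and attributed to a named user. The session fields and IDs are hypothetical.

```python
def audit_rate(sessions):
    """Return (percent of fully audited sessions, ids of sessions with gaps)."""
    if not sessions:
        return None, []
    gaps = [
        s["id"] for s in sessions
        if not (s["proxied"] and s["recorded"] and s["user"])
    ]
    rate = 100.0 * (len(sessions) - len(gaps)) / len(sessions)
    return rate, gaps

sessions = [
    {"id": "s1", "proxied": True, "recorded": True, "user": "vendor.alice"},
    {"id": "s2", "proxied": True, "recorded": False, "user": "vendor.bob"},
    {"id": "s3", "proxied": True, "recorded": True, "user": None},  # shared login
    {"id": "s4", "proxied": True, "recorded": True, "user": "plant.eng1"},
]
rate, gaps = audit_rate(sessions)
print(rate, gaps)
```

A session that is recorded but tied to a shared account still counts as a gap here, since the change cannot be attributed to a specific person.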
9. Dormant or “Ghost” Asset Ratio
The share of assets that appear in the inventory but haven’t communicated on the wire in over 30 days.
- Why it matters: These could be decommissioned devices that still have active network ports, which makes them perfect hiding spots for attackers.
- The Goal: Less than 5% dormant assets; regular pruning of the “ghost” inventory.
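A small sketch of the ratio, assuming the inventory stores a last-seen timestamp per asset (or nothing, if the asset has never been observed on the wire). Asset names and dates are made up.

```python
from datetime import datetime, timedelta

def ghost_ratio(last_seen_by_asset, now, max_idle_days=30):
    """Return (percent of dormant assets, sorted list of dormant asset ids)."""
    cutoff = now - timedelta(days=max_idle_days)
    ghosts = sorted(
        asset for asset, seen in last_seen_by_asset.items()
        if seen is None or seen < cutoff
    )
    total = len(last_seen_by_asset)
    ratio = 100.0 * len(ghosts) / total if total else 0.0
    return ratio, ghosts

now = datetime(2025, 6, 30)
inventory = {
    "plc-01": datetime(2025, 6, 29),
    "plc-02": datetime(2025, 4, 1),   # silent for about three months
    "hmi-01": datetime(2025, 6, 15),
    "rtu-07": None,                   # in the inventory, never seen
}
ratio, ghosts = ghost_ratio(inventory, now)
print(ratio, ghosts)
```

Before pruning, it is worth confirming with the plant whether a silent device is truly decommissioned or only runs seasonally or as a standby unit.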
10. PLC Logic Change Frequency
This KPI tracks how often the code (logic) within a controller is updated or modified.
- Why it matters: Unexpected logic changes are the hallmark of advanced threats like Stuxnet or Triton.
- The Goal: Every logic change should map back to a scheduled maintenance ticket.
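Mapping changes to tickets can be sketched as a window check: a logic change is accounted for only if a maintenance ticket for the same controller covers the time of the change. The ticket and change record layouts are assumptions.

```python
from datetime import datetime

def unticketed_changes(changes, tickets):
    """Logic changes not covered by any maintenance window.

    changes: list of (controller, timestamp)
    tickets: list of (controller, window_start, window_end)
    """
    def covered(controller, ts):
        return any(
            c == controller and start <= ts <= end
            for c, start, end in tickets
        )
    return [(c, ts) for c, ts in changes if not covered(c, ts)]

tickets = [("plc-01", datetime(2025, 3, 1, 8), datetime(2025, 3, 1, 12))]
changes = [
    ("plc-01", datetime(2025, 3, 1, 9, 30)),  # inside the window
    ("plc-01", datetime(2025, 3, 2, 2, 15)),  # outside any window
    ("plc-02", datetime(2025, 3, 1, 9, 30)),  # no ticket for this controller
]
suspicious = unticketed_changes(changes, tickets)
print(suspicious)
```

The total number of changes gives the frequency; the unticketed subset is the part that warrants an alert.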
11. Asset Ownership Accuracy
The percentage of discovered assets that have a designated “Operational Owner” (e.g., a specific Plant Engineer).
- Why it matters: When an alert fires, the SOC needs to know whom to call on the plant floor. If there is no owner, response time (MTTR) skyrockets.
- The Goal: 100% ownership mapping for critical/high-impact assets.
12. MTTR (Mean Time to Recover) – Safety First
In OT, “R” stands for Recovery/Resilience, not just Resolution. This measures how long it takes to return the process to a “Safe State” after an incident.
- Why it matters: The SOC must coordinate with the plant floor to ensure that “killing a process” to stop an attack doesn’t cause a physical explosion or environmental spill.
- The Goal: Documented recovery times that align with the business’s Maximum Tolerable Downtime (MTD).
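As an illustration, recovery times can be checked against the MTD both on average and in the worst case. The hours below are invented, and the MTD value is assumed to come from the business impact analysis.

```python
from statistics import mean

def recovery_report(recovery_hours, mtd_hours):
    """Summarize time-to-safe-state figures against the Maximum Tolerable Downtime."""
    return {
        "mttr_hours": mean(recovery_hours),
        "worst_case_hours": max(recovery_hours),
        # Compare the worst case, not the mean: one long outage can breach
        # the MTD even when the average looks healthy.
        "within_mtd": max(recovery_hours) <= mtd_hours,
    }

report = recovery_report([2.0, 4.0, 6.0], mtd_hours=8)
print(report)
```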
Background: The Shift from IT SOC to OT SOC
The historical approach to industrial security was simple: air-gapping. Because the plant was physically disconnected from the internet, it was assumed to be safe. However, the rise of Industry 4.0 and the need for real-time data analytics have bridged these gaps.
Modern factories now use IIoT sensors, cloud-based predictive maintenance, and remote vendor support. While these drive efficiency, they also mean that a threat actor in a corporate office in one country can potentially manipulate a valve in a refinery in another.
The metrics listed above are designed to bridge the gap between the Digital (the packets on the wire) and the Physical (the spinning motor or the flowing chemical). A mature SOC in 2025 doesn’t just look for “malware”; it looks for operational deviations.
How to Implement These KPIs in Your SOC
To make these KPIs effective, follow these three steps:
- Deploy Passive Monitoring: Avoid active scanning (like Nmap) in a production OT environment unless it has been tested and approved for the devices involved, as it can crash legacy PLCs. Use passive TAPs to “listen” to traffic.
- Establish a Baseline: Spend 30 days monitoring “normal” operations before turning on blocking or high-severity alerts.
- Cross-Train Analysts: Ensure your SOC analysts understand the difference between a “TCP Reset” and a “CIP Service Request.”
Final Thoughts
OT security maturity starts with visibility, and visibility starts with measurement. These 12 KPIs provide a practical, strategic framework for SOCs looking to protect industrial environments without disrupting operations.
For CyberSec Magazine readers (CISOs, SOC leaders, OT engineers, and industrial decision-makers), this KPI-driven approach offers a clear path from blind spots to control, from alerts to insight, and from risk to resilience.
