The average water utility in the United States runs SCADA infrastructure built between 1990 and 2010. These systems were designed to control treatment processes and log operational data — not to enable machine learning or cloud connectivity. Integrating modern AI analytics into this infrastructure does not require replacing it. It requires understanding its architecture and building around its constraints.
Understanding What SCADA Actually Is
SCADA — Supervisory Control and Data Acquisition — is not a single technology but a category of industrial control system. In water utility contexts, SCADA encompasses the programmable logic controllers (PLCs) that directly control pumps, valves, and chemical feed systems; the human-machine interface (HMI) workstations where operators view system status and issue commands; the historian databases that log sensor readings and operational events; and the communication infrastructure that connects field devices to the control center.
Legacy SCADA systems present several integration challenges. Communication protocols may be proprietary or obsolete — DNP3, Modbus, and early OPC implementations are common in water utility infrastructure, alongside vendor-specific protocols with limited documentation. Security architectures typically follow an air-gap model that was designed to prevent connectivity to external networks, making it difficult to extract data without redesigning the security boundary. Historian databases may use proprietary formats and APIs that are not designed for the high-volume, analytics-oriented data access patterns that AI platforms require.
The Integration Architecture
The key to successful SCADA integration is building a data extraction layer that sits between the SCADA system and the AI analytics platform without modifying the SCADA architecture itself. This approach preserves the operational integrity of the control system — which must remain reliable regardless of what the analytics layer is doing — while enabling data flow to the cloud or on-premises analytics infrastructure.
This integration layer typically uses one or more of the following approaches. OPC-UA, the modern successor to the original OPC standard, provides a standardized, secure data exchange interface that most SCADA vendors now support, either natively or through adapter software. For older systems that support only legacy OPC DA or OPC HDA, protocol bridge software can translate to OPC-UA without changes to the SCADA configuration. Modbus and DNP3 can be bridged via industrial IoT gateways that connect to field devices directly and forward data via standard interfaces.
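A concrete flavor of the protocol-bridging work: analog values read over Modbus often arrive as pairs of 16-bit holding registers that must be reassembled into a 32-bit float before they mean anything to an analytics pipeline. The sketch below shows one common decoding, assuming big-endian word order; register and word order vary by device, so this must be checked against each vendor's register map.

```python
import struct

def registers_to_float(high: int, low: int) -> float:
    """Combine two 16-bit Modbus holding registers into one
    IEEE 754 32-bit float, assuming big-endian word order
    (high word first). Many devices use this encoding for
    analog values such as flow or pressure, but not all."""
    raw = struct.pack(">HH", high, low)  # two unsigned shorts -> 4 bytes
    return struct.unpack(">f", raw)[0]   # reinterpret as a float

# Example: registers (0x4248, 0x0000) decode to 50.0
value = registers_to_float(0x4248, 0x0000)
```

Getting this word order wrong produces values that look plausible at a glance but are numerically garbage, which is one reason a pilot with known reference readings (below) is worth the effort.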
The integration layer should operate in read-only mode — it extracts data from the SCADA system but does not write commands back. This maintains a strict separation between the operational control layer and the analytics layer, ensuring that a failure or compromise of the analytics platform cannot affect system operation. Only after extensive testing and formal change management processes should any AI-driven automation be introduced that could affect control system behavior.
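The read-only constraint is worth enforcing in software as well as in network design, so that a misconfigured analytics job cannot issue a write even if a path exists. A minimal sketch of that idea, using a hypothetical `read_tag`/`write_tag` client interface (not any specific vendor's API):

```python
class ReadOnlyTagGateway:
    """Wrap an underlying protocol client so the analytics side
    can only read tag values. Any write attempt fails loudly.
    The client interface (read_tag/write_tag) is a hypothetical
    stand-in for whatever driver the integration layer uses."""

    def __init__(self, client):
        self._client = client

    def read_tag(self, tag: str):
        # Reads pass straight through to the underlying client.
        return self._client.read_tag(tag)

    def write_tag(self, tag: str, value):
        # Writes are refused unconditionally at this layer.
        raise PermissionError(
            f"write to {tag!r} blocked: integration layer is read-only")
```

This is defense in depth, not a substitute for network-level controls: the gateway assumes the analytics platform has no other path to the control network.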
Data Historians: The Bridge to the Past
Most SCADA systems include an operational data historian — a time-series database that stores sensor readings at regular intervals, often at 1-minute to 15-minute resolution going back years or decades. This historical data is the foundation for training predictive models. Getting access to it is often harder than it should be.
OSIsoft PI (now AVEVA PI) is the most common historian platform in water utility infrastructure. It has a well-documented REST API that enables data extraction, but access requires PI Server credentials and network connectivity that may not be provisioned for external systems. Wonderware Historian, Ignition, and various vendor-specific historians present similar data access challenges.
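As an illustration of what PI data extraction looks like in practice, the sketch below builds a request URL for the PI Web API's recorded-values route. The `/streams/{webId}/recorded` path follows AVEVA's published PI Web API reference, but route availability and authentication details depend on the server version and should be verified against your installation; the host and WebId here are placeholders.

```python
from urllib.parse import urlencode

def recorded_values_url(base_url: str, web_id: str,
                        start: str, end: str) -> str:
    """Build a PI Web API request URL for the recorded values of
    one stream (tag). PI accepts relative time expressions such
    as '*-7d' (seven days ago) and '*' (now)."""
    query = urlencode({"startTime": start, "endTime": end})
    return f"{base_url}/streams/{web_id}/recorded?{query}"

# Placeholder host and WebId for illustration only.
url = recorded_values_url("https://pi.example.org/piwebapi",
                          "F1DP-example-webid", "*-7d", "*")
```

The hard part is rarely the HTTP call itself; it is obtaining the credentials, the WebIds for the tags of interest, and a network route from the analytics environment to the PI server.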
A practical first step in any SCADA AI integration project is a data extraction pilot: identify the specific historian tags required for the target use case, negotiate the network access and credentials needed to query the historian, and verify that the historical data meets the volume and quality requirements for model training. This pilot typically reveals data quality issues — gaps, calibration offsets, unit inconsistencies — that must be addressed before modeling can proceed.
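One of the quickest quality checks in such a pilot is scanning extracted tag histories for logging gaps. A minimal sketch, assuming samples arrive at a nominally fixed interval (5 minutes here; the interval and tolerance are parameters to adjust per tag):

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected=timedelta(minutes=5),
              tolerance=timedelta(seconds=30)):
    """Return (start, end) pairs where consecutive historian
    samples are further apart than the expected logging interval
    plus a tolerance. Assumes timestamps are sorted ascending."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > expected + tolerance:
            gaps.append((prev, curr))
    return gaps

# Five-minute data with one missing stretch between 00:10 and 00:45.
readings = [datetime(2024, 1, 1, 0, m) for m in (0, 5, 10, 45, 50)]
gaps = find_gaps(readings)  # one gap: (00:10, 00:45)
```

Note that historians with exception/compression deadbands legitimately record irregular intervals, so "gap" thresholds need to reflect how each tag is actually archived, not just its scan rate.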
Cybersecurity Considerations
Any integration that introduces network connectivity to a previously air-gapped SCADA environment must be evaluated carefully from a cybersecurity perspective. Water and wastewater systems are a designated critical infrastructure sector under Presidential Policy Directive 21, and America's Water Infrastructure Act of 2018 requires community water systems serving more than 3,300 people to conduct risk and resilience assessments that cover their control systems. Any modification to SCADA network architecture should be coordinated with the state primacy agency, and many utilities also engage the Water Information Sharing and Analysis Center (WaterISAC) for sector-specific threat intelligence.
The recommended architecture for SCADA-to-cloud connectivity uses a one-way data diode at the SCADA network boundary — hardware devices that physically enforce unidirectional data flow. Data can leave the SCADA network but nothing can enter it from the external network. This eliminates the network connectivity attack surface while enabling data extraction for analytics purposes.
For utilities that cannot implement a data diode architecture, an intermediate demilitarized zone (DMZ) with a historian replication server provides an alternative. Data is replicated from the operational historian to a DMZ historian using a strictly controlled, one-way replication configuration. The analytics platform connects only to the DMZ historian, with no network path to the operational SCADA network.
Building the Analytics Layer
With data flowing reliably from the SCADA historian to an accessible data store, the analytics layer can be built. Cloud-based platforms that specialize in industrial time-series data — including Nyad — handle the storage, processing, and model deployment infrastructure, allowing utility staff to focus on configuring use cases rather than managing data pipelines.
The initial use cases for AI analytics in a newly integrated SCADA environment should be chosen carefully. Start with use cases where the relevant historical data is abundant and well-documented, where the failure mode or optimization opportunity is well understood, and where the business value is clear enough to justify the effort. Anomaly detection in treatment process data and predictive maintenance for high-value pump assets are typically good starting points.
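To make the anomaly-detection starting point concrete: a rolling z-score over a trailing window is a deliberately simple baseline that flags readings deviating sharply from recent behavior, before investing in more sophisticated models. A minimal sketch (window and threshold are tuning assumptions, not recommendations):

```python
from statistics import mean, stdev

def zscore_anomalies(values, window=20, threshold=4.0):
    """Flag indices where a reading deviates from the mean of
    the trailing `window` samples by more than `threshold`
    standard deviations. A simple baseline for process data;
    it will false-alarm on legitimate setpoint changes."""
    flagged = []
    for i in range(window, len(values)):
        hist = values[i - window:i]
        mu, sigma = mean(hist), stdev(hist)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```

A baseline like this also serves the trust-building goal discussed below: operators can inspect exactly why a point was flagged, which is much harder with opaque models.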
As the integration matures and trust in the analytics layer grows, more sophisticated use cases become accessible: real-time optimization of chemical dosing, automated compliance reporting from SCADA operational data, and eventually AI-assisted control recommendations that operators review before implementation. Each step should be validated thoroughly before the next is introduced.
Staff and Change Management
Technology integration in water utilities fails at least as often from organizational factors as from technical ones. SCADA operators who have built their expertise around the existing control system may be skeptical of AI recommendations that conflict with their operational intuition. Water quality staff may resist dashboards that surface data they did not previously have access to.
Successful integration programs invest as heavily in training and change management as in technology. Operators should understand how the models work at a conceptual level, should be involved in validating model outputs against their operational knowledge, and should have clear channels to report when model recommendations seem wrong. Building operator trust is a multi-year process that requires consistent follow-through on the promise that the AI system is a tool for them, not a replacement for their judgment.