How a Security by Design Approach Might Have Stopped the Florida Water Facility HMI Attack

March 1, 2021

Now that the dust has settled a bit around the cyber attack at the Florida water facility, and everyone has made their sales pitch, I want to highlight some of the more fundamental issues that haven’t really been discussed to date. I hinted at it in our original blog with my quote but want to dive deeper into how a security by design approach might have lessened the severity of this attack, and perhaps even prevented it altogether.

The really scary thing about the Florida water attack is not that there was an external access point, nor that it was configured poorly and allowed someone in. Those are the obvious control failures, and the easiest to fix as well. No, the real issue at play in the Florida water hack, at least as far as I can tell, is the lack of defense in depth with the HMI itself, which presented some fundamental flaws. These flaws open the door for a malicious insider, or even a really tired operator, to do something they shouldn’t. This is, of course, not a shock to the veterans of the field, but it is something we cannot stop saying until it changes.

The veteran ICS security community frequently discusses how these systems are designed without security in mind, but that is usually in reference to protocols or the PLCs themselves. Less often is the criticism pointed at HMI design, perhaps (cynically) because that is more of a project issue and less of a product issue. Who knows, really?

However, there is no reason for this. There are some really good HMI designers out there who take this seriously and know how to do this right. Spotting these oversights from simple observation is easy precisely because I’ve seen it done correctly. I’m not proposing some engineering revolution, or even an evolution, just that HMI design gets the attention it’s due as critical infrastructure organizations commission new control systems or upgrade existing ones.

“There should always be input validation on checking critical inputs such as setpoints, alarm limits and safety limits.  In fact, I was taught to implement those limits in the PLC, if possible, because if they’re implemented in the HMI (alone) it’s too easy for someone to modify them accidentally or deliberately if they gain access to the HMI computer.  It requires more skill and privileges to modify the PLC logic. Unfortunately, these are exactly the type of assessment findings we bring to system owners all over the world still today. We help them build programs and control systems with strong fundamentals by using standards like IEC 62443 which include sections on access control, remote access and change management.”
- John Cusimano, VP of Industrial Cybersecurity at aeSolutions

One of the easily referenceable frameworks that addresses some of the major issues with the design of the Florida HMI that was hacked, as described in publicly available information, is the OWASP Top 10 Proactive Controls. We won’t go through all 10 of these today, since I can only call out two based on the evidence in this particular case study, but this should shed some light on how picking fundamental proactive design controls can save everyone some major headaches down the road.

C5: Validate All Inputs

OWASP describes this as: “Input validation is a programming technique that ensures only properly formatted data may enter a software system component”.

This is further split into two types of validity checking: syntactic and semantic. Syntax does not apply in this case, as the attacker entered a numerical value into a numerical field. Perhaps they tried other data types and were stopped, but we simply do not know one way or the other. We do know there was likely a semantic failure. Semantic validation is about ensuring only valid values or ranges are accepted.

On the surface, there is no apparent reason to allow a 100x normal level increase in the HMI for a caustic chemical of this type. Some well-known experts in the field, like Joe Weiss as quoted in Krebs on Security, were quick to point out that simple physics and downstream alarming systems would have prevented this from being an actual concern, so then why allow it at all? A mistake like this points to the likelihood of other poorly designed input controls where maybe we would not have been so lucky.

Also, do those same physics apply in drought conditions or when the normal thermal assumptions no longer apply? Many engineering disasters have happened when engineers assumed something could never happen. It’s pretty easy to see how poor input control in the HMI design could lead to far worse incidents. Validation not only makes things harder on an attacker, but also saves the normal operators from human performance incidents.
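To make the syntax versus semantic split concrete, here is a minimal sketch of what validating a single setpoint entry might look like. Everything in it is an assumption for illustration: the names (SetpointLimits, validate_setpoint, NAOH_DOSE) are hypothetical, and the 50 to 150 ppm band is a made-up range, not the facility's actual configuration. As the quote above suggests, the same limits ideally live in the PLC as well, not in the HMI alone.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SetpointLimits:
    """Allowed range for one operator-entered setpoint (illustrative only)."""
    name: str
    low: float
    high: float
    unit: str


class SetpointRejected(ValueError):
    """Raised when an entered value fails syntax or semantic validation."""


def validate_setpoint(raw: str, limits: SetpointLimits) -> float:
    # Syntax check: the entry must parse as a finite number.
    try:
        value = float(raw)
    except ValueError:
        raise SetpointRejected(f"{limits.name}: {raw!r} is not a number") from None
    if not math.isfinite(value):
        raise SetpointRejected(f"{limits.name}: {raw!r} is not a finite number")

    # Semantic check: the number must fall inside the engineered range.
    if not (limits.low <= value <= limits.high):
        raise SetpointRejected(
            f"{limits.name}: {value} {limits.unit} is outside "
            f"{limits.low} to {limits.high} {limits.unit}"
        )
    return value


# Hypothetical limits for a caustic dosing setpoint; the numbers are made up.
NAOH_DOSE = SetpointLimits(name="naoh_dose", low=50.0, high=150.0, unit="ppm")
```

With these assumed limits, validate_setpoint("100", NAOH_DOSE) passes, while an entry 100 times larger is rejected before it ever reaches the control system. The same check catches the tired operator who adds two extra zeros.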

C7: Enforce Access Controls

OWASP describes this as: “Access Control (or Authorization) is the process of granting or denying specific requests from a user, program or process.”

This one requires a bit more assuming, but bear with me.

To understand why I find fault with this control in this particular case, we need to first explain a bit about how control rooms typically operate, along with what layers of controls one can typically assume. First, I’m going to assume this is a typical 24-hour manned operations center, so the HMI in question is probably logged in all the time. It obviously was during the actual attack. In a best-case scenario, operators log out and in at shift changes but that is not necessarily the norm, since in a lot of cases it’s a single shared account logged into every HMI at the OS level.

Second, one can assume the HMI is set up with a couple of levels. Let’s say there is a read level, where screens can be seen but not manipulated, which should be the default authorization context. Then there is an operator level, where normal levels and inputs can be entered, and an engineer level, where the screens themselves can be changed and default boundaries can be overridden. Finally, there is usually an administrator role, although the system owner may or may not know that level exists. It is often the domain of the integrator who built the system and is used only for the most extreme cases. These roles are usually also shared, especially in anything not commissioned in the last 5-7 years.

The way this is supposed to work is that the HMI screen is left in the default read level. This allows the operator to leave the PC logged in without a screen saver or the other usual corporate controls in place, so the operator can do their job, which is to observe the system at all times. Only when action needs to be taken should the operator elevate themselves into the actual operator level. Remember, operators live in an event horizon measured in minutes. Anything requiring a quicker response should be handled by the control system itself, or by the safety systems as a last resort. This means anything an operator needs to do can withstand a quick password entry. The HMI should then return to read-only promptly after inactivity, similar to what happens with a lock screen on the corporate side.
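The read-by-default, elevate-on-demand, time-out-back-to-read behavior described above can be sketched in a few lines. This is a rough model and not any vendor's HMI API: the HmiSession class, the verify callback (standing in for whatever credential check the platform actually provides) and the 120-second default timeout are all assumptions of mine.

```python
import time
from enum import IntEnum
from typing import Callable


class Role(IntEnum):
    READ = 0       # default: screens visible, nothing writable
    OPERATOR = 1   # normal setpoints and inputs
    ENGINEER = 2   # screen changes and boundary overrides


class HmiSession:
    """Hypothetical HMI session that drops back to READ after inactivity."""

    def __init__(
        self,
        verify: Callable[[Role, str], bool],  # assumed credential check
        timeout_s: float = 120.0,             # assumed inactivity timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._verify = verify
        self._timeout_s = timeout_s
        self._clock = clock
        self._role = Role.READ
        self._last_activity = clock()

    def current_role(self) -> Role:
        # An elevated session that has sat idle too long reverts to READ.
        idle = self._clock() - self._last_activity
        if self._role is not Role.READ and idle > self._timeout_s:
            self._role = Role.READ
        return self._role

    def elevate(self, role: Role, password: str) -> bool:
        # Elevation always goes through the credential check.
        if not self._verify(role, password):
            return False
        self._role = role
        self._last_activity = self._clock()
        return True

    def require(self, role: Role) -> None:
        # Call before every write; raises unless the session is elevated enough.
        if self.current_role() < role:
            raise PermissionError(f"{role.name} level required")
        self._last_activity = self._clock()
```

The point of the design is that the OS session can stay logged in for monitoring while every write path calls require() first, so a remote visitor who lands on an idle screen gets a read-only view.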

Now, it is entirely possible that the attacker waited for the system to be elevated into operator mode, which still should not have been sufficient to raise chemical levels that high (it ideally should have required an engineer role, if it was needed at all), or maybe the operator is continuously logged in because there are just a lot of alarm conditions that need constant clearing (again, not great design in that case). However, it is just as likely that either proper role-based authorization was not implemented in this system at all, or that protective controls like the timeout back to the read level were not in use.
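One way to picture the "operator level should not have been enough" point is to tie the size of a change to the role allowed to make it. The sketch below is illustrative only: the two bands and the function names are hypothetical, and real limits would come from the process engineers, not from a blog post.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    READ = 0
    OPERATOR = 1
    ENGINEER = 2


# Hypothetical bands, in ppm, for a single dosing setpoint.
OPERATOR_BAND = (50.0, 150.0)   # routine adjustments
ENGINEER_BAND = (0.0, 500.0)    # wider range, engineer sign-off needed


def role_required(value: float) -> Optional[Role]:
    """Return the lowest role allowed to enter value, or None if nobody may."""
    if OPERATOR_BAND[0] <= value <= OPERATOR_BAND[1]:
        return Role.OPERATOR
    if ENGINEER_BAND[0] <= value <= ENGINEER_BAND[1]:
        return Role.ENGINEER
    return None  # outside every engineered range: reject for all roles


def may_write(role: Role, value: float) -> bool:
    """True only if the session's role is high enough for this value."""
    needed = role_required(value)
    return needed is not None and role >= needed
```

Under these assumed bands, an operator-level session can nudge the setpoint within its routine range, an engineer can go somewhat further, and a value far outside both bands is refused no matter who is logged in.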

It is unnerving that something as precious as a municipal water supply system was so easily accessed. Even so, had these two controls been in place, with proper system design a fundamental component from the start, the level of panic would have been much lower. Something as simple as validating all inputs would have prevented the attacker from setting the value so high. Or, if the operator panel had been designed with appropriate enforcement of access controls, requiring authorization before any modification at all, let alone a change to dangerous levels, this seemingly unsophisticated hacker would have been stopped.

Perhaps with more in-depth knowledge and interviews, we could glean even more lessons from the OWASP Top 10 Proactive Controls in this particular cyber security failure, but even with limited data, it’s pretty easy to see how picking a straightforward and well-designed set of principles to aid a security by design program can save you a lot of headaches down the road.