Creating a Threat-Informed Defense with the MITRE ATT&CK for ICS Matrix

March 10, 2021

As we’ve discussed in previous posts, the MITRE ATT&CK for ICS Matrix provides a common nomenclature to allow asset owners, security researchers and consultants, internal defenders and product vendors to better communicate about adversary techniques. Nothing in it is new to experienced individuals, but what is revolutionary is the taxonomy of techniques and helpful guidance like data sources for detection. It helps normalize the discussion between these different groups to aid with the automation of detection & response, and more generally, the overall risk discussion. Using the threat scenario outlined below, we’ll demonstrate how an asset owner can benefit from leveraging diverse data collection methods to create a threat-informed defense using one of the MITRE ATT&CK for ICS Techniques as an example.

Scenario Synopsis

Observed:

A third party is onsite to update some control logic due to an AC motor drive replacement in a plant cooling process. As part of this update, the third party brings in a piece of removable media. As per policy, a virus scan is run, but a quick scan is settled on for expediency, as the outage is almost over and the physical drive replacement took longer than anticipated. The files are mostly in a proprietary control system executable format that the AV engine does not understand, and they are exempted from actual scanning as per the Original Manufacturer’s recommendation due to concerns about false positives causing outages. The malware scan comes back clean with no warnings.

The technician logs in as a domain administrator on the primary engineering workstation where all the control code is kept, as is required by the control system operations manual. They then plug the USB drive in and begin to copy files from the USB drive into the primary code repository file share. Again, all directories and file extensions are exempted from normal malware scanning, and this process is exempted from any whitelisting, as it is the sole method of updating the system. This has been risk reviewed/accepted with all the proper authority as part of both Factory and Site Acceptance Testing.

The technician then updates all the controllers via the system software, which really just uses FTP/SMB to copy files out to other workstations and the controllers. They then start system initialization and run through all the normal startup sequences. At this point, everyone is happy. The technician is off to the airport, and the plant starts up and comes online.

As is customary, the plant maintenance was done a calendar quarter ahead of peak season. This gave the plant time to run and be vetted prior to being placed into a must-run condition by the load management group during critical weather periods. No issues were noted during this time.

Then, on the hottest day of the year, things go terribly wrong. The grid is already having trouble balancing generation with the ever-increasing load demand. The grid supervising authority has already leveraged its capacity management under voluntary load-shedding agreements.

At the site, all four units are running along smoothly. In fact, the lead plant operator, a seasoned veteran of hot summers, notices that the water plant is running cooler than expected given all the environmental stresses. That upgrade a few months ago must be working beautifully. Suddenly, all four units go into emergency shutdown. A complete loss of cooling is alarmed, and safety systems turn on to prevent catastrophic failure of the units.

Unfortunately, the sudden loss of generation and inertia causes a voltage collapse on an already stressed grid and results in a regional blackout. It will be days or weeks before the units can be brought back online, because the cooling system needs repairs.

Unobserved:

The third-party vendor has an undetected compromise of their network; more specifically, their central code vault system has been compromised. As a major Value-Added Reseller (VAR) of the most popular control systems for this application, they are a major target. The bad actors have been in there for months and have been going through the code vault looking for high-value customer files. They know the standard procedure is to release whole code branches when doing system updates, so many more files are copied than needed, and technicians never vet the actual files copied.

They have leveraged this to plant fake files that are really malware but masquerade as legitimate code. Again, no one ever counts or checks the actual file names. The advantage this gives is that all normal checks and balances are bypassed. There is no need to worry about malware checks because the files are always exempted. There is no reason to worry about privilege escalation because the code already runs as a domain administrator. There is no reason to worry about being copied around, because the code is already copied out under a white-listed network process that comes from a trusted source using a trusted protocol. The AI algorithms either don’t scream or are suppressed as part of the maintenance work. Again, this is normal, approved work. The malware files are tiny compared to the larger dataset, so no one will notice the couple of extra kilobytes on the transfer.

This malware runs on a timer. Since being installed in the system, it has been waiting to do its thing. The first thing it does is reach out to the control system’s Active Directory Domain Controller (DC), which just so happens to be running on the engineering server (which isn’t monitored by IT, because IT has been kept out of these systems), and begin to query the groups. It automatically creates a couple of familiar-looking groups and accounts, all with highly privileged access. It also looks at the local firewall rules on the host and determines that port 53 is allowed outbound access. As this is a DC, it is also a forwarding Domain Name System (DNS) look-up server, and as such has access through DNS to the internet. Again, nothing too suspect here. Since there is no firewall violation due to an unfortunate firewall rules setting, no alerts are sent to the Security Operations Center.

Once the malware contacts the fake DNS host, it begins to leverage DNS to import new code and export critical system data. The first thing it exports is an encrypted packet of all the AD groups. The bad actors leverage this to determine that there is a group called “Radius-FWadmin”. They look into that group and note that there is no need to create a new user, as there is an account that hasn’t changed its password or logged on in 180 days. They reset the password on this account and use it to make a few more paths out. Most notably, these include access for the control protocol used on the network, as well as the controller administrative protocol used to load firmware for this cooling system.

The actors also discover during this time that while each unit has its own cooling setup, they are all conveniently tied back into the same supervisory infrastructure from the same vendor. This allows the operators to manage all four units from a single HMI, but also allows bad actors to easily replicate their attack across the entire site.

Over the next couple of months, the bad actors take their time replacing executables on the engineering server with new ones, changing the operating mode of the programmable logic controllers (PLCs) to accept new code and firmware, and slowly pushing new firmware that can be used to mask actual conditions. Routine operations actually do most of the heavy lifting of pushing the new code out under normal routines, so very little high-risk activity has to be carried out by the bad actors themselves.

The critical day hits, and the bad actors are ready. They’ve been watching the heat wave affect other utility grids as it migrates towards your region. Even without any extra help, older, less sophisticated equipment has been failing. The attack begins by sending a command for the controllers to start hiding temperature and pressure readings and reducing cooling system performance. Fans on the mechanical cooling towers are told to operate at a percentage of what they should. Pumps are told to push harder and faster than they should. All of this contributes to a slow and steady rise in the water temperature and pressure. Eventually this pushes the weakest point on all the systems to fail in very short order.

Initial Access Failure

Engineering Workstation Compromise (T818)

Analysis

In this scenario, the attackers specifically went after the engineering workstation because of its trusted and critical role in the process. It is especially vulnerable and valuable because it’s one of the few places in an ICS where both machine and human critically interact, and because it serves as a communication hub from the engineers to the actual process components.

How To Detect It

As per MITRE’s guidance, there are a few data sources that would have exposed this particular vector. The most valuable are File Integrity Management (FIM) and Software Inventory detection technology. Between these two technologies, the extra software would have been caught and a red flag raised when all the new files were added. They would have been clearly marked as net new files or software items and should have created a dialogue between the engineering team and the third party. It would not have been hard to identify the new files as not belonging, and ultimately, as malware.
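To make the FIM idea concrete, here is a minimal sketch in Python of a baseline comparison. It is an illustration only, not how any particular FIM product works: the function names `snapshot` and `compare` are hypothetical, and it assumes the code repository is a directory tree that can be read and hashed in full.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Report files that are new, changed, or missing versus the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "changed": sorted(
            p for p in current if p in baseline and current[p] != baseline[p]
        ),
        "removed": sorted(set(baseline) - set(current)),
    }
```

Taking a snapshot before the maintenance window and another after the copy would put every planted file in the "added" list, which is the net new file report that should start the conversation with the third party.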

Replication Through Removable Media (T847)

Analysis

This is still one of the most common forms of malware ingestion in the ICS world. It is how “air gapped” networks have been compromised for decades. Stuxnet is the most well-known exploiter of this technique.

How To Detect It

An advantage of having agents capable of running in low-resource environments is that they expose events that are otherwise impossible to catch. It’s common for our customers to monitor for removable media events, as they are rare in OT systems and therefore worthy of additional attention. Agent-based collectors would have noticed the event and forwarded it to the SOC, where the security team should have received a copy of the files for deeper inspection. Using more aggressive malware scanning techniques, the additional files likely would have been exposed.

Supply Chain Compromise (T862)

Analysis

In the OT world, we are more reliant on our supply chain than ever before to effectively maintain the safe and reliable operation of the process systems. There are code and systems that take years to master, with experience in the widest of operating environments. This is why OEMs and VARs are so important for ICS. However, as the same people work together on the same projects and systems for years, familiarity and trust form that often allow for human performance breakdowns that otherwise might not exist, as was the case in this scenario.

How To Detect It

Here again, it’s the combined notices of the removable media events and FIM technology that could have preemptively avoided this fault.
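The value of combining the two notices can be sketched as a simple correlation rule. The Python below is a hedged illustration under stated assumptions: the `Event` record, the event labels `usb_inserted` and `file_added`, and the 30-minute window are all hypothetical choices, not fields or defaults from any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    host: str
    kind: str          # hypothetical labels: "usb_inserted" or "file_added"
    time: datetime
    detail: str = ""   # for "file_added", the path of the new file


def correlate(events, window=timedelta(minutes=30)):
    """Pair each USB insertion with files added on the same host soon after."""
    inserts = [e for e in events if e.kind == "usb_inserted"]
    adds = [e for e in events if e.kind == "file_added"]
    findings = []
    for usb in inserts:
        related = [
            a.detail
            for a in adds
            if a.host == usb.host
            and timedelta(0) <= a.time - usb.time <= window
        ]
        if related:
            findings.append((usb.host, usb.time, sorted(related)))
    return findings
```

Each finding lists the files that appeared on a host shortly after removable media was connected, which is the kind of context an analyst could compare against the vendor's expected file list.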

When engineering and security teams use these ATT&CK Techniques to guide their systems and event collection priorities, a platform like Industrial Defender can add helpful context that they otherwise might not have had, which also helps asset owners increase the ROI of their current security investments.

If you want to learn how to use the MITRE ATT&CK for ICS Matrix to create a threat-informed defense, join our live demonstration on April 13 at 11 AM EDT, where we will take a deep dive into the different detection techniques you can use to map the rest of the ATT&CK Tactics and Techniques in this scenario.