Your K8s Environment Was Attacked: What to Do in the First 24 Hours

Cyberattacks are every enterprise’s worst fear, and it seems there is always a new headline about a company that has been compromised and left to deal with the fallout. While everyone from CISOs to DevOps practitioners is focused on securing Kubernetes environments, there is no guarantee that attacks can be prevented, and when (not if) one occurs, there is no guarantee that a company won’t suffer downtime or data loss. Compounding the problem, many leaders aren’t even sure what to do in the event of an attack, especially during the critical 24 hours that follow.

While some companies have a clear security playbook that outlines what to do after an attack (whether ransomware, a data breach or something else), others are still creating one. For those putting the final touches on their cyber resilience plans, consider the following steps to prepare for the first 24 hours after an attack.

Immediately After an Attack

When an attack hits, a few important steps apply regardless of whether the compromised environment runs on Kubernetes. First and foremost, practitioners should alert the proper authorities, stakeholders and leaders so that the right people know about the compromise. In some cases, that means submitting a ticket to a security or IT team; in others, it means a full-scale escalation to leadership. Each enterprise’s structure and hierarchy determine who should be notified of a possible attack, but the key is to make sure everyone who needs to know is informed.

Once the right people have been alerted, it’s time to assess the extent of the compromise. A hacked user account may affect only one employee, while larger-scale attacks can leave an entire organization exposed. To identify the scope of the compromise as quickly and smoothly as possible, an organization should draw on every additional resource available to it.

Within the First 12 Hours of an Attack

While some enterprises may be equipped to recover lost data and restore capabilities on their own, most will need outside help. Within the first 12 hours, it may be clear what went wrong; however, you may still be far from understanding the full extent of the damage. Outside consultants can explore different angles and identify compromised areas you might not have considered during the attack. Outside help can take several forms, including:

  • Disaster recovery service providers who are already trained on the solution you use and can offer additional resources to initiate a large-scale recovery operation
  • Incident response consultants who can help plan remediation to prevent future compromises
  • Forensics analysts to investigate the indicators of compromise to better understand the incident’s root cause

With outside help engaged, an organization’s security team can work to find the root cause of the compromise and address the vulnerabilities that were exploited. Operating in Kubernetes likely means the compromise took place in a cloud-native application, so there are more ways a breach could have occurred. Did an end user receive suspicious or phishing emails they did not report? That could point to a compromised credential for a privileged admin account. Did platform engineers overlook vulnerabilities found in container images? That could point to a gap in the CI/CD pipeline.

Determining the root cause of the breach guides teams into the containment phase: stopping the incident from spreading before it affects other end users or customers. Containment could involve isolating affected workloads or even temporarily reducing privileged access, if necessary.
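To illustrate the isolation step, here is a minimal Python sketch that builds a Kubernetes NetworkPolicy blocking all traffic to and from pods that carry a quarantine label. It is one possible approach, not a prescribed procedure. The `quarantine: "true"` label, the `payments` namespace and the function name are hypothetical, and the policy only takes effect if the cluster’s network plugin enforces NetworkPolicy. Because NetworkPolicies are additive allow-lists, any other policy that also selects the quarantined pods can still permit traffic, so review the existing policies in the namespace as well.

```python
import json

# Hypothetical label used to mark pods under investigation (not a Kubernetes default).
QUARANTINE_LABEL = {"quarantine": "true"}


def quarantine_policy(namespace: str, name: str = "quarantine-deny-all") -> dict:
    """Build a NetworkPolicy that selects quarantined pods and allows no traffic.

    Both policy types are listed but no ingress/egress rules are given, so this
    policy allows nothing in either direction for the selected pods. Other
    policies that also select these pods can still allow traffic.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": dict(QUARANTINE_LABEL)},
            "policyTypes": ["Ingress", "Egress"],
            # No "ingress" or "egress" keys: no connections are allowed.
        },
    }


if __name__ == "__main__":
    # "payments" is a hypothetical namespace; kubectl accepts JSON manifests.
    print(json.dumps(quarantine_policy("payments"), indent=2))
```

After saving the output to a file and applying it with `kubectl apply -f`, a suspect pod can be marked with `kubectl label pod <pod-name> -n payments quarantine=true`. Isolating the pod instead of deleting it keeps its state available for forensic analysis.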

Within the First 24 Hours

Once executives have approved a containment plan and it is in motion, it is important to notify those who were impacted. You may be tempted to bury the incident for fear of losing credibility, customers or trust. However, a security incident that is visible to customers will eventually become public, and if an organization doesn’t get ahead of it, the resulting narrative could be misconstrued.

Make your internal communications team aware of the incident immediately and ask them to help draft internal and external communications as necessary. There is no need to assign blame or disclose confidential details of the incident; instead, explain how your organization is responding swiftly with the right level of resources, and describe the steps being taken and those still to come. Reassurance and clear calls to action are your most valuable assets during a security incident because they reinforce existing relationships and invite others to help.

A Call to Action: Ensure You Have a Last Line of Defense

While the first 24 hours after a security incident are the most critical, there is still work to be done to rebuild your Kubernetes systems and harden them against future attacks. That includes securing nodes, access, networks, containers and data.

Treat nodes like hosts in any traditional application stack: use endpoint security and remove unnecessary packages from the operating system distribution. Secure access by scoping role-based access control (RBAC) permissions to least privilege, which reduces the blast radius if a credential is compromised. On top of this, make sure each service can communicate only with the services it needs to function, and minimize the number of containers that run as root; together, these measures help keep networks and containers protected. Finally, enable the Kubernetes audit log (it is not enabled by default), use immutable backups to combat ransomware, enable runtime monitoring to capture exfiltration events and encrypt data at rest.
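To make a few of these controls concrete, here is a minimal Python sketch that generates Kubernetes manifests as plain dictionaries and serializes them as JSON, a format kubectl accepts. It covers a least-privilege Role, a NetworkPolicy that lets only one client reach a service, a non-root container securityContext and an example audit policy. Every name in it (namespace `shop`, apps `orders-api` and `orders-db`, port 5432, UID 10001, the role name) is a hypothetical placeholder, and the audit policy is only a starting point: it is a file passed to the kube-apiserver with `--audit-policy-file` (plus a backend such as `--audit-log-path`), and managed Kubernetes services usually expose audit logging as a provider setting instead.

```python
import json


def read_only_role(namespace: str, name: str = "pod-reader") -> dict:
    """Least-privilege RBAC Role: read-only access to pods in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # "" is the core API group, where pods live.
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
        ],
    }


def allow_only_from(namespace: str, target_app: str, client_app: str, port: int) -> dict:
    """NetworkPolicy: pods labeled app=<target_app> accept ingress only from
    pods labeled app=<client_app> in the same namespace, on one TCP port.
    Once an Ingress policy selects a pod, any ingress it does not list is denied."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{target_app}-allow-{client_app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


def non_root_security_context(uid: int = 10001) -> dict:
    """Container securityContext: refuse to run as root and drop extra privileges."""
    if uid == 0:
        raise ValueError("UID 0 is root; choose a non-zero UID")
    return {
        "runAsNonRoot": True,
        "runAsUser": uid,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }


def audit_policy() -> dict:
    """Example kube-apiserver audit policy; the first matching rule sets the level."""
    return {
        "apiVersion": "audit.k8s.io/v1",
        "kind": "Policy",
        "rules": [
            # Metadata only for Secrets, so secret values never reach the log.
            {"level": "Metadata", "resources": [{"group": "", "resources": ["secrets"]}]},
            # Full request and response bodies for RBAC changes.
            {
                "level": "RequestResponse",
                "resources": [
                    {
                        "group": "rbac.authorization.k8s.io",
                        "resources": ["roles", "rolebindings", "clusterroles", "clusterrolebindings"],
                    }
                ],
            },
            # Everything else: who did what, to which resource, and when.
            {"level": "Metadata"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_role("shop"), indent=2))
    print(json.dumps(allow_only_from("shop", "orders-db", "orders-api", 5432), indent=2))
    print(json.dumps({"securityContext": non_root_security_context()}, indent=2))
    print(json.dumps(audit_policy(), indent=2))
```

Each printed object would be saved to its own file: the Role and NetworkPolicy can be applied with `kubectl apply -f`, the securityContext fragment belongs under each container in a pod spec, and the audit policy is read by the API server rather than applied as an object. Note that a Role grants nothing until a RoleBinding attaches it to a user or service account.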

For further protection, it may be worth investing in new technology and software as well, starting with detection and response tools. Monitoring provides visibility, while security information and event management (SIEM) and security orchestration, automation and response (SOAR) platforms offer a scalable way to let software filter noise, correlate findings and escalate only critical security alerts. Consider pairing these with a cloud-native application protection platform (CNAPP), which often bundles many of these capabilities into one platform.

No matter how much or how little you invest in preventing or detecting future cyberattacks, always make sure you have a last line of defense in place. Data is the number-one target for many adversaries, so protecting its confidentiality, integrity and availability is paramount. Through data immutability, a resilient data protection platform acts as an insurance policy against attacks and keeps the door open for a swift recovery. While it is nearly impossible to fully prevent a ransomware event, being prepared and knowing how to respond are key to weathering the storm. The more prepared your business is, especially for the first 24 hours, the faster you can recover and get back to doing what you do best.

Joey Lei

Joey is a Principal Product Manager for Kasten by Veeam. Joey comes to Kasten with over a decade and a half of experience managing product and services portfolios worth $100M-$500M annually. Previously, Joey was a Lead Product Manager for the Data Protection and Availability products at Dell EMC. He was also Director of Product Management at Synoptek. Joey excels at operating up and down the technology stack and forecasting a vision for emerging markets, which has resulted in significant revenue growth for his organizations.
