Just-in-Time Permissions in Microservices-Based Applications

In a previous article, we discussed keeping microservices secure, even from themselves.

But what else can you do to keep your application free of vulnerabilities that could be exploited by bad actors? While there are many things you can do, one of the most essential is to understand who in your organization needs access to your application’s production operating environment.

It is tempting to assume that anyone who manages the application in production needs access to production systems. But the principle of least privilege says that each person should have only the access their job requires, which means most people involved in managing your systems should not have standing access to production. Arguably, nobody requires direct access to everything in a production environment. The principle of separation of responsibility (often called separation of duties) goes one step further and says that no single person should be able to access all of production.

But eventually, your application will have problems. Servers will need to be restarted, processes on those servers will need to be terminated, containers will have to be launched, files will have to be trimmed and runaway services will need to be stopped.

How do you perform privileged production activities when those working on the issue do not have privileged access?

This is where permission escalation comes into play.

Permission escalation, sometimes known as just-in-time permissions, gives an engineer extra permissions above their normal access rights to perform emergency operations in special circumstances, such as while an incident is ongoing.

On the surface, this seems contrary to the original goal of keeping the application secure by limiting permissions to only those who absolutely need them. However, the goal of permission escalation is to give an engineer a well-defined process for obtaining additional permissions, but only within the specific confines of an incident response process.

There are several models for doing this, each with a different set of advantages and disadvantages.

Escalation Model #1. Break the Glass

In the Break the Glass (BTG) model, an incident responder working an emergency issues a request to an emergency management system, which automatically grants them escalated permissions for a limited period of time. The request is accepted only while an incident is active, and the responder must state the reason for it.

After the incident is over, the break-the-glass request is examined by upper management as part of the postmortem to make sure it was properly utilized. The review prevents bad actors from using the technique to get extra access without being noticed.
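As a rough illustration, the request-and-review flow described above might be sketched as follows. This is a minimal sketch and not a real emergency management system: the BreakGlass and Grant classes, the one-hour default lifetime and the incident IDs are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Grant:
    engineer: str
    incident_id: str
    reason: str
    expires_at: datetime


@dataclass
class BreakGlass:
    # Hypothetical set of incident IDs that are currently open.
    active_incidents: set[str]
    # Assumed lifetime of an escalation; pick whatever your process allows.
    ttl: timedelta = timedelta(hours=1)
    # Every accepted request is kept so the postmortem can examine it.
    audit_trail: list[Grant] = field(default_factory=list)

    def request(self, engineer, incident_id, reason, now=None):
        """Grant time-limited escalation if an incident is active and a reason is given."""
        now = now or datetime.now(timezone.utc)
        if incident_id not in self.active_incidents:
            raise PermissionError("no active incident with that ID")
        if not reason.strip():
            raise ValueError("a reason is required")
        grant = Grant(engineer, incident_id, reason, now + self.ttl)
        self.audit_trail.append(grant)
        return grant

    def is_elevated(self, engineer, now=None):
        """True while the engineer holds at least one unexpired grant."""
        now = now or datetime.now(timezone.utc)
        return any(
            g.engineer == engineer and now < g.expires_at
            for g in self.audit_trail
        )
```

Note that the grant is issued immediately and the audit trail is only read afterward. The sketch records who asked, for which incident and why, but nothing about what the engineer did with the permissions.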

However, since the engineer making the request gets the permissions before the eventual postmortem review, there is still a chance that this mechanism could be used by bad actors. The review does, though, deter covert and deceptive misuse, because every request is examined after the fact.

This model is straightforward to implement in a production system. But because the review is purely reactive, the mechanism is only suitable for moderately secure environments. It also records only that an engineer requested escalated permissions. It doesn’t record how those permissions were actually used. Hence, on its own, it is insufficient for high-security systems, and it doesn’t protect against a disgruntled employee’s bad actions taken during an incident.

Escalation Model #2. Logged Escalation

In logged escalation, when an engineer needs to perform certain privileged activities, they use a special tool that executes the commands at the escalated permissions level and creates a log of all activity performed. An example of such a tool is a bastion host, which is used to access production resources from outside production while still logging all activities for later review.
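To make the idea concrete, here is a minimal sketch of a command wrapper that logs everything it runs. The run_logged function, the record fields and the JSON-lines log file are assumptions for illustration and do not describe any particular bastion product. In a real deployment the wrapper would run under a privileged service account and ship its log somewhere the engineer cannot edit.

```python
import json
import subprocess
from datetime import datetime, timezone


def _append(log_path, record):
    """Append one timestamped JSON record per line to the audit log."""
    record["time"] = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def run_logged(engineer, incident_id, argv, log_path):
    """Run a command on the engineer's behalf and log the attempt and the result."""
    base = {"engineer": engineer, "incident_id": incident_id, "argv": list(argv)}
    # Log the intent first, so the attempt is recorded even if the
    # command itself takes the host down.
    _append(log_path, {**base, "event": "start"})
    result = subprocess.run(argv, capture_output=True, text=True)
    _append(log_path, {**base, "event": "end", "returncode": result.returncode})
    return result
```

A call such as run_logged("alice", "INC-1", ["systemctl", "restart", "nginx"], "audit.jsonl") would leave two records behind, one before the command runs and one after it exits.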

The intent of this model is to ensure that no bad actor can get in and perform inappropriate actions undetected. It does not prevent anything that the BTG model would not also prevent. Still, it does provide usage data that can be critical during a breach postmortem to determine how an attack occurred. This can help with post-attack mitigation strategies.

Logged escalation does help identify a disgruntled employee’s bad actions, because everything they do is logged and examined later. Hence, covert activity by employees is deterred.

Escalation Model #3. Incident Tooling

Both previous models work and are relatively easy to implement, but they don’t, on their own, provide sufficient protections for highly secure production systems. In these environments, a better solution is Incident Tooling. These are specialized operational management tools built to perform tasks that an engineer might not normally be able to perform.

As a simple example, it’s often necessary to reboot a production server to resolve an issue. Normally, to reboot a server, you need superuser access to the server itself. This is an inappropriately high level of permission to give to all on-call engineers. Instead, you can create a specialized tool that can, with a single click, reboot a production server. Then on-call engineers need only permission to click the button.

The advantage of this type of tooling is that it can have security business logic built in. This business logic limits the scope of actions to just what an engineer needs to perform the required task (such as rebooting a server) without giving them excess permissions (such as the ability to log in to the server itself). The tool logic can even require information, such as an incident ID, ticket or request comment, before the command can be performed.
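The reboot button could be sketched roughly like this. The RebootTool class and its fields are hypothetical, and the privileged reboot action is passed in as a plain callable so the example stays self-contained. In practice it would be whatever your platform uses to restart a server, invoked with credentials that only the tool holds.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RebootTool:
    # Engineers allowed to press the button (assumed to come from the on-call roster).
    on_call: set[str]
    # The privileged action itself; only the tool holds the ability to call it.
    reboot: Callable[[str], None]
    # (engineer, server, incident_id) for every reboot performed.
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def press(self, engineer, server, incident_id):
        """Reboot one server on behalf of an on-call engineer."""
        if engineer not in self.on_call:
            raise PermissionError("only on-call engineers may reboot servers")
        if not incident_id:
            raise ValueError("an incident ID is required")
        self.reboot(server)
        self.history.append((engineer, server, incident_id))
```

The engineer never receives superuser access. The only thing they can do through the tool is reboot a named server, and only after supplying an incident ID.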

Two-Person Check and Balance

To provide an even higher level of production security, all of the above mechanisms can be combined with a two-person permission mechanism.

In two-person permission systems, two people must be actively involved in the process and approve the action before any of the above escalation processes can be used. The second person provides a proactive check and balance on top of otherwise reactive mechanisms. Because no one can act alone, this requirement helps prevent damage from a disgruntled employee.

This check and balance can be applied to the business logic of any of the above escalation models, converting them from reactive mechanisms into proactive checks.
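A two-person gate placed in front of any of the escalation models might look something like the following sketch. The TwoPersonGate class, its method names and the request IDs are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPersonGate:
    # People allowed to approve someone else's request (hypothetical roster).
    approvers: set[str]
    # Outstanding requests: request ID -> the engineer who asked.
    pending: dict[str, str] = field(default_factory=dict)

    def request(self, request_id, requester):
        """Record that an engineer is asking for an escalated action."""
        self.pending[request_id] = requester

    def approve(self, request_id, approver):
        """Return True once a second, different person signs off."""
        requester = self.pending.get(request_id)
        if requester is None:
            raise KeyError(f"no pending request {request_id!r}")
        if approver == requester:
            raise PermissionError("requesters cannot approve their own requests")
        if approver not in self.approvers:
            raise PermissionError("approver is not authorized")
        del self.pending[request_id]
        return True
```

An escalation tool would call approve and proceed only when it returns True, so the check happens before the privileged action rather than in a later review.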

Whatever method you use, just-in-time security mechanisms provide additional protection beyond simply giving certain employees standing special access. Elevated permissions exist only when they are needed, and their use is reviewed, which significantly improves the operational security of your application.

Lee Atchison

Lee Atchison is an author and recognized thought leader in cloud computing and application modernization with more than three decades of experience, working at modern application organizations such as Amazon, AWS, and New Relic. Lee is widely quoted in many publications and has been a featured speaker across the globe. Lee’s most recent book is Architecting for Scale (O’Reilly Media). https://leeatchison.com
