If you're building modern cloud software on Kubernetes, you've probably heard the term "zero trust." This security model has risen to the forefront of best practices because it addresses some of the new challenges of cloud native software.
Zero trust has become so important that even the US federal government is getting involved.
With that kind of buzz, zero trust has also attracted a lot of marketing attention and noise. But despite the hype, zero trust isn't just an empty term—it represents some concrete, profound, and transformative ideas for network security.
And while a zero trust approach can be difficult to adopt, the good news for Kubernetes users, at least, is that much of zero trust network security can be accomplished relatively easily just by using a service mesh like Linkerd. A sidecar-based service mesh provides a clear security model without the resource cost or complexity of non-sidecar approaches, and adds powerful zero trust primitives in a way that fits into existing Kubernetes concepts and patterns.
But first: what is zero trust, and why is it suddenly so important? And how do we "get it" in Kubernetes? Let's dig into the details.
As you might expect, the zero trust model is fundamentally about trust. It is a way to answer one of the most important questions in network security: is X allowed to access Y?
The "zero" in zero trust, of course, is a bit of a conceit. For software to work, obviously something needs to trust something else. So zero trust isn't about removing trust entirely so much as reducing it to the bare minimum necessary, and making the trust explicit rather than implicit.
Phrased that way, zero trust may not sound particularly revolutionary. Of course we should use the bare minimum trust and make things explicit. So what's all the fuss about?
As with many new ideas in technology, the best way to understand zero trust is to understand what it's a reaction to. In short, zero trust is the rejection of the perimeter security approach that has dominated network security in the past. In perimeter security, you implement a hard shell around your sensitive components—for example, a firewall around your datacenter. This model is sometimes called the "castle approach" because the firewall acts like castle walls. If you're outside the castle, you might be a bad actor, and we need to carefully vet you; if you're inside the castle, then you're a good actor and we trust you.
The zero trust model says that this is no longer enough. According to zero trust, even within any security perimeter established by a firewall, you must still treat users, systems, and network traffic as fundamentally untrusted. The DoD's Zero Trust Reference Architecture sums it up nicely:
"[N]o actor, system, network, or service operating outside or within the security perimeter is trusted. Instead, we must verify anything and everything attempting to establish access. It is a dramatic paradigm shift in philosophy of how we secure our infrastructure, networks, and data, from verify once at the perimeter to continual verification of each user, device, application, and transaction.”
Of course, zero trust doesn't mean throwing away your firewalls. Defense in depth is an important component of any security strategy. Nor does it mean we get to ignore all the other important components of security like event logging and supply chain management.
But zero trust does require a fundamental change for us: we must move our trust checking from "once, at the perimeter" to "everywhere, every time".
The reason zero trust has become so important is that it addresses some of the security challenges in modern cloud software. To see why, let's compare the "good ol' days" of running software with the modern, cloud way.
In the good ol' days, we had:

- machines that we owned and controlled, running in our own datacenter;
- a network that we ran ourselves; and
- a firewall at the perimeter to keep the bad actors out.
Life was so simple back then! And in that world, the perimeter security approach was, well... not perfect, but in many ways sufficient. Within our firewall, we could trust our machines, we could trust our network, and all we had to do was make sure bad actors couldn't get in.
Fast forward to the modern world. We now have:

- machines we don't own, running in datacenters we'll never see;
- networks we don't control and can't trust; and
- a perimeter that no longer maps cleanly onto anything physical.
In short, much of the trust we used to have in the physical layers of our infrastructure is no longer possible. Instead, we can only regain this trust through what we control—our software. That's why the zero trust model is important today.
This shift to zero trust has some profound implications for the way we think about identity, policy, and enforcement.
Zero trust requires that we rework the way we think about identity, especially system identity.
In the perimeter model, your network location was effectively your identity. If you were inside the firewall, you were trusted; if you were outside it, you weren't trusted. Perimeter-based systems could thus allow access to sensitive systems based on the IP address of the client.
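To make that concrete, here's a minimal sketch of perimeter-style thinking expressed as a Kubernetes NetworkPolicy (the namespace, labels, and CIDR below are illustrative): anything calling from an "internal" IP range may reach the sensitive workload, and nothing else about the caller is ever checked.

```yaml
# Perimeter-style access control: trust is granted purely by source IP range.
# Namespace, labels, and CIDR are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-internal-ips
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/8   # "inside the firewall" addresses are implicitly trusted
```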
In the zero trust world, we can no longer trust the network. At all. This means that your IP address is now an indication of location, nothing more. (And even that cannot really be trusted—there are many ways that IP addresses can be spoofed and forged!)
For zero trust, we need another form of identity: one tied to a workload, user, or system in some intrinsic way. And this identity also needs to be verifiable in a way that doesn't itself require trusting the network.
This is a big requirement with many implications. Even systems that provide security but rely on network identifiers such as IP addresses (IPsec and WireGuard, for example) are not sufficient for zero trust.
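To see what a workload-tied identity can look like in practice, here's a hedged sketch using Kubernetes and Linkerd (all names are illustrative): the pod runs under a dedicated ServiceAccount, and the mesh issues the pod's sidecar proxy a short-lived mTLS certificate whose name is derived from that ServiceAccount rather than from any network address.

```yaml
# Identity anchored in the workload, not the network: the pod runs as a
# dedicated ServiceAccount, and Linkerd issues its proxy an mTLS certificate
# named for that ServiceAccount, along the lines of
#   emoji-svc.emojivoto.serviceaccount.identity.linkerd.cluster.local
# ("cluster.local" is the default trust domain; all names here are illustrative).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: emoji-svc
  namespace: emojivoto
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: emoji
  namespace: emojivoto
spec:
  selector:
    matchLabels:
      app: emoji-svc
  template:
    metadata:
      labels:
        app: emoji-svc
    spec:
      serviceAccountName: emoji-svc   # this, not the pod's IP, is the basis of identity
      containers:
        - name: emoji
          image: registry.example.com/emoji-svc:v1   # hypothetical image
          ports:
            - name: grpc
              containerPort: 8080
```

Because the certificate is tied to the ServiceAccount, the identity survives pod rescheduling and IP churn, and it can be verified cryptographically on every connection rather than inferred from the network.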
Armed with our new model of identity, we also need a way of capturing the access each identity has. In the perimeter approach, it was common to grant full access to a sensitive resource to a range of IP addresses. For example, we might set up IP address filtering to ensure that only IP addresses from within the firewall are allowed to access a sensitive service. In zero trust, we instead need to enforce the minimum level of access necessary. Access to a resource should be as restricted as possible, based on identity as well as any other relevant factors.
While our application code could make these authorization decisions itself, we typically capture them instead in some form of policy specified outside the application. Having an explicit policy allows us to audit and change access without modifying application code.
In service of our zero trust goals, these policies can be very sophisticated. We may have a policy that restricts access to a service to only those calling services that need to access it (i.e. using the workload identity on both sides). We may refine that further and allow only access to certain interfaces (HTTP routes, gRPC methods) on that service. We may refine that even further and restrict access based on the user identity responsible for the request. The goal, in all cases, is the "least privilege" principle—systems and data should be accessible only when absolutely necessary.
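Here's roughly what such a policy can look like using Linkerd's Server and ServerAuthorization resources (a sketch, not the only way to express it; names are illustrative, with `web` standing in for the calling workload's ServiceAccount): only the `web` workload's mesh identity may call the emoji service's gRPC port, and once the Server exists, unauthorized traffic to that port is denied.

```yaml
# Describe the thing being protected: the gRPC port on pods labeled app=emoji-svc.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: emoji-grpc
  namespace: emojivoto
spec:
  podSelector:
    matchLabels:
      app: emoji-svc
  port: grpc
  proxyProtocol: gRPC
---
# Authorize only the "web" workload's mesh identity to reach that server;
# anything else is rejected by the sidecar before it reaches the application.
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: emoji-grpc-from-web
  namespace: emojivoto
spec:
  server:
    name: emoji-grpc
  client:
    meshTLS:
      serviceAccounts:
        - name: web
          namespace: emojivoto
```

Route-level refinements (restricting particular HTTP routes or gRPC methods to particular callers) follow the same pattern, with additional policy resources layered on top.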
Finally, zero trust requires that we perform both authentication (confirmation of identity) and authorization (validating that the policy allows the action) at the most granular level possible. Every system that is granting access to data or computation should be enforcing a security boundary, from the perimeter on down to individual components.
Similar to policy, this enforcement is ideally done uniformly across the stack. Rather than each component using its own custom enforcement code, using a uniform enforcement layer allows for auditing, and decouples the concerns of application developers from those of operators and security teams.
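With a sidecar mesh, that uniform layer is the proxy itself. A minimal sketch, assuming Linkerd is already installed in the cluster (namespace name illustrative): annotating a namespace opts every pod in it into sidecar injection, so mTLS, authentication, and authorization happen in the proxy next to each workload rather than in each application's code.

```yaml
# Every pod created in this namespace gets a Linkerd sidecar proxy injected,
# making the proxy the uniform enforcement point for identity and policy.
apiVersion: v1
kind: Namespace
metadata:
  name: emojivoto
  annotations:
    linkerd.io/inject: enabled
```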
Faced with the requirement that we must rethink identity from first principles, reify trust in the form of policies of arbitrary expressiveness, and permeate our infrastructure with new enforcement mechanisms at every level, it is only natural to experience a moment of panic. And did I mention we need to do this by FY 2024?
The good news is that, for Kubernetes users at least, some aspects of adopting zero trust get significantly easier. Kubernetes's gift to the world is a platform with an explicit scope, a well-defined security model, and clear mechanisms for extension, all of which makes it particularly fruitful ground for zero trust.
One of the most direct ways to tackle zero trust at the network level in Kubernetes is with a service mesh. A sidecar-based service mesh like Linkerd can provide a lightweight mechanism for applying workload identity and policy to a Kubernetes deployment with a minimal cost in resource usage and configuration. Linkerd's approach is well-suited to zero trust adopters:

- it gives every workload a cryptographic identity rather than relying on network identifiers like IP addresses;
- it enforces authentication and authorization in a sidecar proxy running right next to each workload, at the most granular level possible; and
- it does all of this with minimal resource cost and configuration, in a way that fits existing Kubernetes concepts and patterns.
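As a hedged sketch of how little configuration that can take (assuming Linkerd 2.11 or later, which introduced these policy features), a single additional annotation on the namespace from the earlier sketch sets a default inbound policy, so that workloads only accept traffic from authenticated, meshed clients unless a Server and ServerAuthorization pair like the one above grants something more specific.

```yaml
# A minimal zero trust posture for one namespace: inject the sidecar and
# require authenticated, mutually-TLS'd clients by default. Finer-grained
# Server/ServerAuthorization resources then grant exactly the access needed.
apiVersion: v1
kind: Namespace
metadata:
  name: emojivoto
  annotations:
    linkerd.io/inject: enabled
    config.linkerd.io/default-inbound-policy: all-authenticated
```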
Zero trust is a powerful security model that's at the forefront of modern security practices. If you can cut through the marketing noise around it, there are some profound and important benefits to adopting zero trust. And while zero trust requires some radical changes to core ideas such as identity, Kubernetes users at least have a big leg up if they are able to adopt a service mesh like Linkerd and shift from purely perimeter-based network security to "continual verification of each user, device, application, and transaction."