Zero trust network security in Kubernetes with the service mesh

William Morgan

Introduction

If you're building modern cloud software on Kubernetes, you've probably heard the term "zero trust." This security model has risen to the forefront of security best practices because it addresses some of the new challenges of cloud native software.

Zero trust has become so important that even the US federal government is getting involved.

With that kind of buzz, zero trust has also attracted a lot of marketing attention and noise. But despite the hype, zero trust isn't just an empty term—it represents some concrete, profound, and transformative ideas for network security.

And while a zero trust approach can be difficult to adopt, the good news for Kubernetes users, at least, is that much of zero trust network security can be accomplished relatively easily just by using a service mesh like Linkerd. A sidecar-based service mesh provides a clear security model without the resource cost or complexity of non-sidecar approaches, and adds powerful zero trust primitives in a way that fits into existing Kubernetes concepts and patterns.

But first: what is zero trust, and why is it suddenly so important? And how do we "get it" in Kubernetes? Let's dig into the details.

What is zero trust?

As you might expect, the zero trust model is fundamentally about trust. It is a way to answer one of the most important questions in network security: is X allowed to access Y? 

The "zero" in zero trust, of course, is a bit of a conceit. For software to work, obviously something needs to trust something else. So zero trust isn't about removing trust entirely so much as reducing it to the bare minimum necessary, and making the trust explicit rather than implicit.

Phrased that way, zero trust may not sound particularly revolutionary. Of course we should use the bare minimum trust and make things explicit. So what's all the fuss about? 

As with many new ideas in technology, the best way to understand zero trust is to understand what it's a reaction to. In short, zero trust is the rejection of the perimeter security approach that has dominated network security in the past. In perimeter security, you implement a hard shell around your sensitive components—for example, a firewall around your datacenter. This model is sometimes called the "castle approach" because the firewall acts like castle walls. If you're outside the castle, you might be a bad actor, and we need to carefully vet you; if you're inside the castle, then you're a good actor and we trust you.

Perimeter security vs. zero trust security

The zero trust model says that this model is no longer enough. According to zero trust, even within any security perimeter established by a firewall, you must still treat users, systems, and network traffic as fundamentally untrusted. The DoD's Zero Trust Reference Architecture sums it up nicely:

"[N]o actor, system, network, or service operating outside or within the security perimeter is trusted. Instead, we must verify anything and everything attempting to establish access. It is a dramatic paradigm shift in philosophy of how we secure our infrastructure, networks, and data, from verify once at the perimeter to continual verification of each user, device, application, and transaction."

Of course, zero trust doesn't mean throwing away your firewalls. Defense in depth is an important component of any security strategy. Nor does it mean we get to ignore all the other important components of security like event logging and supply chain management.

But zero trust does require a fundamental change for us: we must move our trust checking from "once, at the perimeter", to "everywhere, every time".

Why is zero trust suddenly important?

The reason zero trust has become so important is that it addresses some of the security challenges in modern cloud software. To see why, let's compare the "good ol' days" of running software with the modern, cloud way.

In the good ol' days, we had:

  1. Physical machines, which we owned. These machines were in a datacenter, sitting in locked cages, behind locked doors, and staffed by security personnel. 
  2. A physical network, which we owned. Our physical network cabling was also within those secure datacenters, protected by cages, doors, and guards.
  3. 100% control over the machines and the network. Everything that ran on those machines was there because we put it there.
  4. Low expectations for our software! We could get away with quarterly releases. We could have regular planned downtime. There weren't even that many people on the Internet!
  5. And many other simplifying factors.

Life was so simple back then! And in that world, the perimeter security approach was, well... not perfect, but in many ways sufficient to meet those constraints. Within our firewall, we could trust our machines, we could trust our network, and all we had to do was make sure bad actors couldn't get in.

Fast forward to the modern world. We now have:

  1. No ownership of physical machines, which are instead rented from a cloud provider and provided through a layer of virtualization.
  2. No ownership of the network, which is also rented and provided to us virtually.
  3. No control over the actual network or machines, which are shared with all of our cloud providers' other tenants.
  4. Extremely high demands on our software, including daily releases, zero downtime, and scaling to massive amounts of traffic.
  5. And many other complicating factors.

In short, much of the trust we used to have in the physical layers of our infrastructure is no longer possible. Instead, we can only regain this trust through what we control—our software. That's why the zero trust model is important today.

This shift to zero trust has some profound implications for the way we think about identity, policy, and enforcement. 

What is identity?

Zero trust requires that we rework the way we think about identity, especially system identity.

Network identity vs. cert-based identity

In the perimeter model, your network location was effectively your identity. If you were inside the firewall, you were trusted; if you were outside it, you weren't trusted. Perimeter-based systems could thus allow access to sensitive systems based on the IP address of the client.

In the zero trust world, we can no longer trust the network. At all. This means that your IP address is now an indication of location, nothing more. (And even that cannot really be trusted—there are many ways that IP addresses can be spoofed and forged!)

For zero trust, we need another form of identity: one tied to a workload, user, or system in some intrinsic way. And this identity needs to also be verifiable in some way that doesn't itself require trusting the network.

This is a big requirement with many implications. Even systems such as IPsec or WireGuard, which provide security but rely on network identifiers like IP addresses, are not sufficient for zero trust.
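To make the contrast concrete, here is a minimal Python sketch of the two kinds of identity check. It is an illustration, not how any particular product works. The 10.0.0.0/8 "inside the firewall" range and the workload name in the usage note are made up; the certificate dictionary is assumed to have the shape that Python's ssl.SSLSocket.getpeercert() returns once a TLS handshake has validated the peer's certificate.

```python
import ipaddress
from typing import Optional

# Hypothetical "inside the firewall" range, for illustration only.
TRUSTED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def perimeter_allows(client_ip: str) -> bool:
    """Perimeter-style check: network location stands in for identity."""
    return ipaddress.ip_address(client_ip) in TRUSTED_NETWORK


def identity_from_peer_cert(peer_cert: dict) -> Optional[str]:
    """Zero-trust-style check: read identity from a validated certificate.

    `peer_cert` is assumed to look like the dict returned by
    ssl.SSLSocket.getpeercert() after a successful, verified handshake.
    Returns the first DNS subjectAltName, or None if there is none.
    """
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "DNS":
            return value
    return None
```

For example, identity_from_peer_cert({"subjectAltName": (("DNS", "web.shop.example.internal"),)}) yields the workload name regardless of which IP address the connection came from, while perimeter_allows only ever answers the question "where are you?"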

What is policy?

Armed with our new model of identity, we also need a way of capturing the access each identity has. In the perimeter approach, it was common to grant full access to a sensitive resource to a range of IP addresses. For example, we might set up IP address filtering to ensure that only IP addresses from within the firewall are allowed to access a sensitive service. In zero trust, we instead need to enforce the minimum level of access necessary. Access to a resource should be as restricted as possible, based on identity as well as any other relevant factors.

While our application code could make these authorization decisions itself, we typically instead capture them with some form of policy specified outside the application. Having an explicit policy allows us to audit and change access without modifying application code.

In service of our zero trust goals, these policies can be very sophisticated. We may have a policy that restricts access to a service to only those calling services that need to access it (i.e. using the workload identity on both sides). We may refine that further and allow only access to certain interfaces (HTTP routes, gRPC methods) on that service. We may refine that even further and restrict access based on the user identity responsible for the request. The goal, in all cases, is the "least privilege" principle—systems and data should be accessible only when absolutely necessary.
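As a rough sketch of what such a policy might look like when expressed as data rather than application code, consider the Python below. Everything in it is hypothetical: the Rule fields, the service names, and the routes are invented for illustration and do not correspond to any real policy API. The point is the shape: access is denied unless some rule explicitly names the calling workload, the target service, and the interface being called.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    """One explicit grant: `client` may call `method` on paths under `path_prefix` of `server`."""
    server: str
    client: str
    method: str
    path_prefix: str


# Hypothetical policy: only the "web" workload may read orders.
POLICY: FrozenSet[Rule] = frozenset({
    Rule(server="orders", client="web", method="GET", path_prefix="/orders/"),
})


def is_allowed(policy: FrozenSet[Rule], server: str, client: str,
               method: str, path: str) -> bool:
    """Default deny: allow only if at least one rule matches every field."""
    return any(
        rule.server == server
        and rule.client == client
        and rule.method == method
        and path.startswith(rule.path_prefix)
        for rule in policy
    )
```

With this policy, is_allowed(POLICY, "orders", "web", "GET", "/orders/42") is true, while the same call from a "batch" workload, or a DELETE from "web", is refused because no rule grants it. Because the policy is plain data, it can be reviewed and changed without touching the application.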

Enforcement

Finally, zero trust requires that we perform both authentication (confirmation of identity) and authorization (validating that the policy allows the action) at the most granular level possible. Every system that is granting access to data or computation should be enforcing a security boundary, from the perimeter on down to individual components.

Similar to policy, this enforcement is ideally done uniformly across the stack. Rather than each component using its own custom enforcement code, using a uniform enforcement layer allows for auditing, and decouples the concerns of application developers from those of operators and security teams.
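One way to picture a uniform enforcement layer is as a single wrapper that every handler passes through, rather than checks scattered across application code. The Python sketch below is a toy under stated assumptions: the enforced decorator, the AUDIT_LOG list, and the "orders" resource are all hypothetical names, and in a service mesh this job is done by a proxy next to the application rather than by a decorator inside it.

```python
from functools import wraps
from typing import Callable, List, Optional, Tuple

# Every decision is recorded as (identity, resource, outcome) so it can be audited.
AUDIT_LOG: List[Tuple[str, str, str]] = []


class AccessDenied(Exception):
    """Raised when authentication or authorization fails."""


def enforced(resource: str, authorize: Callable[[str, str], bool]):
    """Wrap a handler so each call is authenticated, then authorized."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(client_identity: Optional[str], *args, **kwargs):
            # Authentication: the caller must present a verified identity.
            if client_identity is None:
                AUDIT_LOG.append(("<unauthenticated>", resource, "deny"))
                raise AccessDenied("no verified identity")
            # Authorization: the policy must allow this identity on this resource.
            if not authorize(client_identity, resource):
                AUDIT_LOG.append((client_identity, resource, "deny"))
                raise AccessDenied(f"{client_identity} may not access {resource}")
            AUDIT_LOG.append((client_identity, resource, "allow"))
            return handler(client_identity, *args, **kwargs)
        return wrapper
    return decorator


def only_web_reads_orders(identity: str, resource: str) -> bool:
    """Hypothetical policy callback used for the example handler."""
    return identity == "web" and resource == "orders"


@enforced("orders", only_web_reads_orders)
def list_orders(client_identity: str) -> List[str]:
    # The handler contains business logic only; no security checks live here.
    return ["order-1"]
```

Calling list_orders("web") returns the data, while list_orders("batch") or list_orders(None) raises AccessDenied, and all three decisions end up in AUDIT_LOG. The handler itself never mentions security, which is the decoupling the uniform layer is meant to buy.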

Zero trust for Kubernetes

Faced with the requirement that we must rethink identity from first principles, reify trust in the form of policies of arbitrary expressiveness, and permeate our infrastructure with new enforcement mechanisms at every level, it is only natural to experience a moment of panic. And did I mention we need to do this by FY 2024?

The good news is that for Kubernetes users, at least, Kubernetes can make some aspects of adopting zero trust significantly easier. Kubernetes's gift to the world is a platform with an explicit scope, a well-defined security model, and clear mechanisms for extension, and all of this makes Kubernetes particularly fruitful ground for zero trust.

One of the most direct ways to tackle zero trust at the network level in Kubernetes is with a service mesh. A sidecar-based service mesh like Linkerd can provide a lightweight mechanism for applying workload identity and policy to a Kubernetes deployment with a minimal cost in resource usage and configuration. Linkerd's approach is well-suited to zero trust adopters:

  1. Workload identity is drawn directly from Kubernetes ServiceAccounts, building on existing Kubernetes security primitives without requiring extra configuration or departing from standard Kubernetes best practices.
  2. Connection authentication is performed via mutual TLS, an industry standard. Beyond validating identity on both sides of the connection using cryptographic proof, mTLS also provides encryption of data in transit and protection against corrupted payloads.
  3. Authorization policy is configured via a set of CRDs (Custom Resource Definitions), making policy explicit and allowing for compatibility with "gitops" approaches.
  4. Enforcement is done at the level of individual pods uniformly across the stack. Each pod does its own authentication and authorization, meaning that the network is never trusted. These same guarantees can be extended across cluster boundaries to apply to multi-cluster communication.

Many of these feats are made possible by Kubernetes's powerful sidecar model, which allows a form of late binding of operational functionality at runtime. By injecting lightweight "micro-proxies" into pods at runtime and wiring them to handle incoming connections, a service mesh can deliver many of the security requirements of zero trust in a way that is entirely decoupled from application code, allowing security teams and developers to iterate independently in delivering a functional but secure application.
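To give a feel for how a ServiceAccount becomes a workload identity, the sketch below builds the kind of name that appears in Linkerd's mTLS certificates and authorization policies. Treat the exact format as an assumption based on Linkerd's documented defaults (a control plane installed in the "linkerd" namespace with the "cluster.local" trust domain); the helper function itself is hypothetical and not part of Linkerd.

```python
def linkerd_identity(service_account: str, namespace: str,
                     control_plane_ns: str = "linkerd",
                     trust_domain: str = "cluster.local") -> str:
    """Build a workload identity name from a Kubernetes ServiceAccount.

    Assumed format (verify against your own installation):
    <service-account>.<namespace>.serviceaccount.identity.<control-plane-ns>.<trust-domain>
    """
    return (
        f"{service_account}.{namespace}"
        f".serviceaccount.identity.{control_plane_ns}.{trust_domain}"
    )
```

Under those assumptions, a pod running as the "web" ServiceAccount in the "emojivoto" namespace would be identified as web.emojivoto.serviceaccount.identity.linkerd.cluster.local. Note that nothing in the name refers to an IP address or a node: the identity follows the workload wherever it is scheduled.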

Of course, adopting a service mesh is not a panacea. Kubernetes security is a complex topic that requires a variety of tools and techniques and, most importantly, a clear understanding of the threats you are protecting against, and what Kubernetes does and does not provide in each case. A "pure" zero trust environment may never actually be fully achievable, and that's ok: the value of zero trust is as a rubric by which security decisions can be measured and evaluated.

Wrapping it up

Zero trust is a powerful security model that's at the forefront of modern security practices. If you can cut through the marketing noise around it, there are some profound and important benefits to adopting zero trust. And while zero trust requires some radical changes to core ideas such as identity, Kubernetes users at least have a big leg up if they are able to adopt a service mesh like Linkerd and shift from purely perimeter-based network security to "continual verification of each user, device, application, and transaction."