A deep dive into Kubernetes mTLS with Linkerd
Mutual TLS (mTLS) is a hot topic in the Kubernetes world, especially for anyone tasked with getting “encryption in transit” for their applications. In this workshop, we give you a solid understanding of what mTLS is, how it works, and how it compares to alternatives. We also walk you through how to set up, monitor, and understand mTLS between your services on a Kubernetes cluster with Linkerd, the CNCF service mesh.
For the hands-on portions, it’s important that you arrive prepared. Please have a Kubernetes cluster ready (e.g. k3d, Civo, or any other managed K8s distribution), and the Linkerd CLI installed on your machine—check out the first few steps of our Linkerd Getting Started Guide if you want some specific instructions on how to do this.
(Note: this transcript has been automatically generated with light editing. It may contain errors! When in doubt, please watch the original talk!)
Welcome, intros, and logistics
Jason: Hello, everyone, and welcome. Today, we’re going to do a deep dive into mTLS, just in general, and then how it works in Kubernetes and with Linkerd. I’m your host, Jason Morgan. I do evangelism for the Linkerd project. And here we’ve got one of the maintainers of Linkerd, Matei David, and we’ll do a bit more introduction in a minute. So, just a little bit of administrative stuff before we go. This is a webinar format, so you’re not going to be talking in Zoom. We do have a chat going on in the Zoom chat, and we also have the Linkerd Slack, and so, that’s slack.linkerd.io. And I’ll post some links in the chat in a minute. We have a workshops channel, and on the workshops channel, you’ll be able to talk directly with a couple of folks from Linkerd that are ready to answer questions and help you through the workshop material.
Speaking of which, there is going to be a Git repo that I will also put in the chat. Thank you, Catherine, for that link. You can follow along with the live demo. Love to have you there. And yeah, go in, pop in, and give a wave as you join, be so grateful to have you. Next slide. A little bit of a commercial pitch, and then, I’ll lay off it.
If you are using Linkerd and you love it, check out Buoyant Cloud. It is a SaaS offering that runs on top of Linkerd that gives you some additional features that you may find interesting, including things around mTLS that could be valuable. If you’d like, please request the Buoyant Cloud demo at buoyant.io/demo, and someone from Buoyant will walk you through what it is in a one-on-one session. All right. Thank you so much. Matei, back to you.
Matei: Right. Well, hello everyone. I am super excited to present this workshop, and I’m very glad that you could join us. So, Jason already introduced me. My name is Matei. I am a Linkerd maintainer, and I work full-time at Buoyant. So, pretty much all day, every day I work on Linkerd when I’m not walking people through workshops. You can follow me on Twitter at @__mateidavid. You can find me on GitHub and on Slack. So, if you have any questions after the workshop, or, if you want to say hi, please feel free to do so. And just again, we have a Git repository for the interactive part. If you want to follow along, it would be very helpful to clone this repo locally, just because we have some manifests in there that we’re going to be using. There’s also an optional assignment in there.
I don’t really want to call it homework, but it’s homework. So, it’s designed to give you a bit more practice after we’re done with the workshop. And finally, for the workshop, I want it to be pretty hands-on. I don’t want it to be very theoretical, although I am quite theoretical in nature. I’m a big nerd, but I won’t really go into details about implementation. So, if you do want to have a look or if you want to know more, either send me a message or I linked the relevant RFC here, have a look at the RFC because that’s always a good source of truth. Okay. So, let’s have a look at what we will be doing today. So, obviously, we’ll talk about mTLS, it’s the highlight of this workshop, but before we get to mTLS, we’ll quickly go over authentication and why it’s important in this context. We’ll talk a bit about TLS, the what’s and the why’s, and how it works from a high-level overview. We’ll cover mTLS and the differences between mTLS and TLS.
Authentication, the foundation of secure communication
And then finally do a bit of identity management theory, and get hands-on with mTLS and Linkerd. Cool. So, I’m ready to start. Once again, if you do have any questions please feel free to interrupt me and ask away. So, we’re going to start with authentication and the reason why I want to start with authentication is because, in my opinion, it’s the foundation of secure communication. So, that’s exactly why it’s a building block. Authentication as a concept, allows us to verify the identity of users. And, as we’ll see, it’s really necessary for the context of TLS and communication security. But without getting ahead of myself, what does communication security actually mean? A lot of people, not a lot of people, but some people make the mistake of thinking that security, and specifically communication security, is just about encryption. And that’s a bit far from the truth because generally when we talk about a communication channel, and about communication security, we mean that a connection or a channel has three guarantees.
CIA: confidentiality, integrity, and authenticity
We want those guarantees to be confidentiality, integrity, and authenticity. So, confidentiality typically refers to both parties that participate in the secure connection, having the ability to read the data, and no one else can do it. So, the data’s confidential. We want the data to have integrity. So, whatever you send on the wire in a connection, it’s also what you’ll receive on the other end. And we want it to have authenticity. And authenticity is a bit subtle here because you can take it as a synonym for authentication, but authenticity in this context is more of a property. It means that you can verify the identity of the parties that take part in the secure communication. So, I’m inspired by one of my colleagues. He always has a lot of handy mnemonics that help you memorize stuff. So, the mnemonic here is CIA, confidentiality, integrity, and authenticity.
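To make the integrity guarantee concrete, here is a minimal Python sketch using a keyed hash (HMAC). It is only an illustration of the idea that a receiver can detect a modified message; TLS uses its own negotiated algorithms, and the key and messages below are made up for the example.

```python
import hashlib
import hmac

# Hypothetical shared secret. In TLS, the equivalent key material is derived
# during the handshake rather than hard-coded.
SHARED_KEY = b"demo-key-not-for-production"


def tag(message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def is_intact(message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag(message), received_tag)


original = b"transfer 10 credits to bob"
sent_tag = tag(original)
tampered = b"transfer 99 credits to bob"

print(is_intact(original, sent_tag))   # the untouched message passes
print(is_intact(tampered, sent_tag))   # the modified message fails
```

Note that integrity alone says nothing about who sent the message, which is why authenticity is listed as a separate guarantee.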
Authn vs authz
And in practice, confidentiality is usually done through encryption. Integrity is done through hashing algorithms, and authenticity and authentication in the case of TLS are done through certificates. Before we go to TLS though, there’s another misconception that I want to quickly iron out. Again and again, I see people mistake authentication for authorization, and that’s a bit far from the truth. Authentication, or authn, as you see it on the internet, is more related to “are you who you say you are?” It’s more related to your identity. Is there a way that I can trust your identity and verify your identity? And in practice, and in the industry, we do this through tokens and certificates. Authorization, on the other hand, authz, is more around permissions. Are you allowed to do what you want to do? Are you allowed to access this route? Are you allowed to access this file?
So, the real-world example is access control lists and policies. ACLs, if you’re familiar with Linux, you’ll definitely recognize them. And before I go on, there’s another point that I want to make, authentication is a prerequisite for authorization. So, we can’t really mistake one for the other because in order to do authorization and say, “Hey, are you allowed to actually access this resource,” we need to have a way to authenticate that user, or system, or device first. And with that being said, I don’t know if we have any questions, Jason? But if we don’t, I’m going to get into TLS.
Jason: Yeah. You’re good to go so far. Folks, if you’re listening, pop into the Slack, we posted the link in the chat a couple of times. Love to hear from you and love to answer questions either on Slack, in the chat, or here live with Matei. All right, sorry. Go ahead, Matei.
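The authn versus authz distinction Matei describes can be sketched in a few lines of Python. This is a toy model under stated assumptions: the tokens, identities, and ACL entries are hypothetical, and a real system would verify tokens or certificates cryptographically rather than look them up in a dictionary.

```python
from typing import Optional

# Hypothetical data for illustration only.
TOKENS = {"token-abc": "alice", "token-xyz": "bob"}          # credential -> identity
ACL = {"/reports": {"alice"}, "/public": {"alice", "bob"}}   # resource -> allowed identities


def authenticate(token: str) -> Optional[str]:
    """Authn: map a presented credential to a verified identity, or None."""
    return TOKENS.get(token)


def authorize(identity: str, resource: str) -> bool:
    """Authz: check whether an already-verified identity may access the resource."""
    return identity in ACL.get(resource, set())


def handle(token: str, resource: str) -> str:
    """Authentication always runs first; authorization needs its result."""
    identity = authenticate(token)
    if identity is None:
        return "401 unauthenticated"
    if not authorize(identity, resource):
        return "403 forbidden"
    return "200 ok"


print(handle("token-abc", "/reports"))
print(handle("token-xyz", "/reports"))
print(handle("bad-token", "/public"))
```

The ordering inside `handle` reflects the point that authentication is a prerequisite for authorization: `authorize` has nothing to check until `authenticate` has produced an identity.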
What is TLS?
Matei: Right, TLS. So, what is TLS? TLS is a communication protocol. It’s also a security protocol, and it’s connection-oriented. TLS, as a protocol runs on top of TCP. And this gives it a lot of flexibility because, if it runs on top of TCP, it means it can integrate with a bunch of application protocols, so you can use it with HTTP/1, HTTP/2, gRPC, and so on. Chances are if you have a protocol, or if you use plain TCP, you can do TLS on top of it, but that doesn’t answer the question: what is it? Well, it’s a communication protocol that does two things, but it does them very well. First, it does authentication. It verifies the identity of the server that you’re connecting to. And second, it secures communication. So, if we look at this diagram here, it’s very simple in nature.
We have a client that connects to a server, and first, the client will verify the server’s identity. And after that, we have secure communication. Now, naturally, it’s not as easy in practice, but yeah, as an overview this is how it works. And just to give you a real-world example, even if you don’t have experience implementing or using TLS, you probably saw it out in the real world. So, the browser, whenever it connects to a server endpoint, like buoyant.io or google.com, generally verifies the server. Whenever we use HTTPS, the old SSL, we are actually doing TLS. The client, in this case, the browser, will connect to the server, and before the connection is actually established and you see your webpage served, we verify the identity of the server to make sure that we connect to who we want to connect to.
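As a rough illustration of the client side of this, Python’s standard `ssl` module builds a client context that behaves much like the browser in the example: it requires a server certificate and checks it against the requested hostname. This is a sketch that inspects the defaults only; it does not open a network connection.

```python
import ssl

# Default client-side context: loads the system's trusted CA certificates.
client_ctx = ssl.create_default_context()

# The server must present a certificate that chains to a trusted CA...
requires_cert = client_ctx.verify_mode == ssl.CERT_REQUIRED
# ...and that certificate must match the hostname the client asked for.
checks_hostname = client_ctx.check_hostname

print(requires_cert, checks_hostname)
```

To actually perform the handshake, a program would wrap a connected TCP socket with `client_ctx.wrap_socket(sock, server_hostname=...)`, passing the name of the server it intends to reach.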
The TLS handshake
And just to really hammer home a point that I made earlier on, authentication is the foundation of secure communication, because without identifying who we are talking to encryption is practically meaningless. We can encrypt, and we can have integrity and hash messages. But if we don’t know who we’re talking to, then obviously we do not have secure communication. So, just to get a bit more in-depth, how does TLS actually do it? Well, when a client connects to a server, you’ll notice that it’s exactly the same image, but we just have a few more steps here. We’ll have what’s known as a handshake. So, this is an inspiration maybe, from TCP. If you’re familiar with the protocol, in TCP we always have a three-way handshake whenever we establish a connection. Well, TLS also has a handshake.
So, whenever a TCP connection is established, we will establish a TLS session. And as part of this TLS session, the first step will be the client connecting to the server and being like, “Hey, hello, I want to do TLS. So, let’s start this session.” The server will reply with a certificate. It’ll give it to the client and will say, “Okay, well, here you go, authenticate me.” If the client authenticates the server, then it gives it an encryption secret. And based on that, they reach a consensus and encrypt the connection, and voilà, you achieve secure communication, because you have all of the CIA guarantees.
Cool. Now a bit more of a text-heavy slide, but just to recap on protocol details, TLS is implemented on top of TCP. This is really helpful because you can use a variety of application-level protocols in order to do authentication. It uses asymmetric cryptography. So, this is public-key crypto. If you’re not really familiar with public-key crypto, I’m going to cover it in a bit more detail later. But what you need to know for now is that we basically generate a key pair, and this key pair has a private key and a public key. One key can encrypt and the other can decrypt. It’s not necessary that the public key encrypts, and the private key decrypts. Both can be used for the same purpose, but out of convention the public key is generally accessible to everyone and you share it with everyone and the private key you keep private.
So, it’s in the name there. The TLS setup is known as the handshake, and the handshake basically also agrees on some configuration parameters because TLS has a cipher spec. It agrees on what hashing algorithms … Well, not hashing, what security algorithms to use, and so on. There’s a bit of a performance cost. So, there’s a latency cost that’s associated with TLS, and that you should probably be aware of, but this is generally paid at connection establishment. So, again, if you’re familiar with TCP, TCP has a bit of overhead with the way it does its handshake because it’s a stateful protocol. And TLS also has to negotiate this session. So, this will obviously set you back and cost some resources, but it usually flattens out after the connection, after the session is established. You will see CPU spikes because you will constantly encrypt and decrypt, but this is generally negligible. And finally, authentication in TLS relies on certificates and public keys. So, the public keys should be available to all parties.
What is mutual TLS (mTLS)?
All right. So, moving on, we’re ready to jump into mTLS. And as you’ll notice, this is the same exact diagram as before, because mTLS is essentially the same protocol, but with just one added step. Instead of just verifying the server, now we also verify the client. So, basically, mTLS is TLS, but in both ways. That’s where the mutual comes from. So, the client will connect to the server. It will verify its identity. The server will then ask the client for its certificate and it will verify its identity. And after that, the communication is secure. Now, one question that I had when I was learning about all of this stuff and that you might have is okay, well, why don’t we just use TLS everywhere? Or why don’t we just use mTLS everywhere? If they’re so similar, why do we need a new version?
And the answer is a bit more complicated, but it generally depends on what your use case is. With TLS, when your browser is connected to a server, we need to validate the server because we might have sensitive data and we want the communication to be secure. But generally, the server does not care who the client is. The server has to serve web pages. It has to do some static stuff. It does not care who the client is. And it does not care about the browser. As sad as that sounds. But with mTLS we generally tend to use it in different use cases, mostly related to microservices and distributed systems. So, if you have, for example, an API server and you have clients connecting to your API server, you probably want a way to identify them, because once you identify them, you can first send them the bill.
If you want to charge them for using your services, you can do rate limiting. You can do a bunch of stuff, but you need to know what the client’s identity is. So, the use case for mTLS is you use it when you also want to know who the client is, and what client connects to you. But mTLS has some more interesting properties, at least in theory. So, authentication is symmetric. And this has a huge implication when it comes to cryptography because now mTLS also provides cryptographic proof of your clients. So, now you have a way to trust and verify clients, and this trust and verification also set the basis for client-based authorization. So, moving back to my example of having an API server that people can connect to, if you want to make sure that no unauthenticated clients connect to you, you can do that through mTLS.
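The one added step that separates mTLS from TLS shows up as a single setting on the server side. The sketch below uses Python’s standard `ssl` module; the helper name `make_server_context` is made up for this example, and a real server would also load its own certificate and the CA used to verify clients.

```python
import ssl


def make_server_context(mutual: bool) -> ssl.SSLContext:
    """Build a server-side TLS context.

    With mutual=True the server also demands and verifies a client certificate.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # A real server would additionally call ctx.load_cert_chain(...) with its own
    # certificate and ctx.load_verify_locations(...) with the CA that signs
    # client certificates. Both are omitted so the sketch needs no files.
    ctx.verify_mode = ssl.CERT_REQUIRED if mutual else ssl.CERT_NONE
    return ctx


tls_ctx = make_server_context(mutual=False)   # plain TLS: only the server is verified
mtls_ctx = make_server_context(mutual=True)   # mTLS: the client must prove its identity too

print(tls_ctx.verify_mode, mtls_ctx.verify_mode)
```

With `CERT_REQUIRED` set on the server, a client that presents no certificate, or one the server cannot validate, fails the handshake before any application data is exchanged.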
mTLS: The best way to do zero-trust security in cloud native
And then if you want to restrict per path access, for example, and you want to say only a subset of the clients can do a get, only a subset of clients can do a post on this path, then you can do that with client-based authorization. So, all that is to say, is that mTLS is the best way to do zero-trust security, at least in a cloud native environment. What does zero-trust mean? Well, zero-trust means that you literally do not trust anything that goes in your cluster. You do not have the assumption that the network is safe. So, then you do not trust anything that goes on in your network. And this is generally useful when you do not own the network, or when you’re not overly concerned with it. And well, you want things to always be secure. And what makes mTLS the best way to do zero-trust security, at least in cloud native, is that identity is very fine-grained.
It’s very granular, it’s at a pod level. So, when everything is at a pod level, you enforce basically verification at a lot of different points in your system. Obviously, running mTLS also will come at a cost, and I’m going to cover that later in identity management, but it also has the advantage of being extended, or extendable, to arbitrary cluster topologies. So, you’re not really just tied onto the network, you can also do mTLS and preserve identity across cluster boundaries. There’s also a mechanism for secret loss at almost any level. There’s a bit of an asterisk there, but again, we’ll cover that in a bit.
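The per-path restriction mentioned above (some clients may GET, fewer may POST) can be sketched as a default-deny lookup keyed on method and path. Everything here is hypothetical: the identity strings, the paths, and the `POLICY` table are invented for illustration, and the sketch assumes the client identity was already established by a verified client certificate.

```python
# Hypothetical policy: (method, path) -> client identities allowed to call it.
POLICY = {
    ("GET", "/orders"): {"web.default", "billing.default"},
    ("POST", "/orders"): {"web.default"},
}


def is_allowed(client_identity: str, method: str, path: str) -> bool:
    """Default-deny check.

    client_identity is assumed to come from a certificate that mTLS has
    already verified; this function does no verification of its own.
    """
    return client_identity in POLICY.get((method, path), set())


print(is_allowed("billing.default", "GET", "/orders"))   # read access granted
print(is_allowed("billing.default", "POST", "/orders"))  # write access denied
print(is_allowed("web.default", "DELETE", "/orders"))    # no rule, so denied
```

The design choice worth noting is the empty-set default: a route with no rule admits nobody, which matches the zero-trust stance of not assuming anything is safe.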
mTLS, protecting from man-in-the-middle attacks
But is mTLS all you need? And obviously, it is not. mTLS is just a tool in your toolbox. And if I would say anything else than that, I know I would have a bunch of angry security experts knocking on my door. But mTLS is really just a tool, and it’s meant to be used in the right cases, and in the right scenarios. So, it’s really good to protect against a certain class of vulnerabilities and to offer you this way of verifying identity. So, generally with mTLS, you’re going to be protected from on-path attacks. So, this is when you have someone that wants to intercept your connection, man-in-the-middle, as it’s also called, because you have confidentiality. You have the CIA guarantees, nobody can actually read your message. It protects against spoofing attacks because you cannot have someone impersonating your server or your client because you verify identity. And then it also protects against malicious requests. Like we talked about the API server example, you can prevent unauthenticated clients from ever reaching your server. Questions?
Jason: So, we’ve got a lot going on in the chat, but in general, it seems like it’s mostly what you’re about to cover. So, we’ve got Jay, who’s asking a little bit about what’s going to happen when it comes to multi-cluster? And how do you do that trust in a couple of different ways? And, sorry to spoil it folks, but that’s actually going to be something Matei talks about really, really soon. So, I’ll let him get to that naturally. Oh yeah, actually, Eric has a fantastic question for you. “Does mTLS protect the service mesh, the control plane, or both?” So, I think he means the data plane, the control plane, or both. Is there a control plane in Linkerd, as Eric is new to Linkerd?
The Linkerd control plane
Matei: Yeah. So, that is actually a great question. So, Linkerd does ship with a control plane, and I’m going to talk a bit about it in the Linkerd section in the workshop. And mTLS is used both for the control plane components and for the data plane, but as far as what protects what, the data plane mostly protects your application. So, the data plane does mTLS, so your applications can talk securely to each other without actually implementing TLS on their own. I hope that answers the question. If not, there’s probably going to be a follow-up.
Jason: It sounded good. Long story short, in Linkerd the control plane and the data plane are all protected by mTLS. Semet, I’m sorry if I said your name incorrectly. “Will mTLS add significant latency?”
Matei: Not significant. So, as I said, whenever your connection is first established, you have to do this handshake. So, you will incur a bit of latency, but I wouldn’t say significant. Obviously, all of this stuff would have to be benchmarked, and I would feel bad to speak out without having some numbers in front of me. But you’ll generally see just some small CPU spikes throughout the connection’s lifetime whenever you encrypt and decrypt stuff. So, the performance cost should be minimal is what I’m trying to say.
Jason: All right. And so, we have two questions or two or three questions that talk about, or ask about certificate rotation, expiring, and how do we get certificates to the clients in a manageable way? And we’re just going to go ahead and not answer any of that because Matei’s going to get to it directly in a minute.
Matei: All right.
Jason: And so, with that Matei, I’d say it’s probably a great time to move on. And thanks for all the questions. So, keep them coming, and if you can join Slack and ask in Slack if that’s possible. Yeah. José has another great question that you’re about to answer directly. So, I’ll let you go.
Matei: All right. Cool. So, as we’ve seen, mTLS provides those CIA guarantees. We have confidentiality through encryption. We have integrity through hashing, and then we have authenticity because we do authentication with certificates. But, like most of you have noticed, it requires identity management. So, identity management relates to certificate issuance, certificate rotations, revoking certificates, and everything that has to do with certificates and public-key cryptography. And I’m not sure if any of you have experience with writing something like this from scratch, but even using it in your application is a bit complicated. You need to integrate with a PKI, because I think that the rule of thumb in the industry is not to write your own PKI, or your own identity management system, similar to how you wouldn’t write your own cryptographic libraries. And then you also need to make sure, as stated in the question, that you pretty much issue these to all of your clients and servers, and you keep them in your clusters.
Identity management is hard to do, and to add to that, trust can be hierarchical as we will shortly see. So, before we get a bit more in-depth into identity management, let’s talk a bit about the foundation of mTLS, what makes it all possible, and what makes authentication possible. And I’m going to defer back to myself and say authentication is the foundation of secure communication. Hopefully, by the time you walk away from the workshop, you’ll know that. And the way we do authentication in mTLS is through certificates. So, certificates and public key crypto are pretty much the foundation of mTLS and what it relies on. All right. So, let’s see what a certificate is. When we have a public key and a private key, when we have a cryptographic key pair, we generally want to encrypt with a public key, and then decrypt with a private key. But then there’s a bit of a problem here because we need to share all of those public keys with everyone.
Distributing keys with certificates
Normally, if you just have a client and a server, it’s pretty easy to manage these on your own. You generate them, and you know that a client can trust the server because you pretty much bootstrapped everything yourself. But when you have a big Kubernetes cluster, when you start adding 20 workloads, 10 workloads, 50 workloads, you suddenly need to distribute all of these public keys to everyone. So, how can you distribute keys in a way that’s trustable and verifiable? Well, through certificates. And this is my really poor sketch of what a certificate would look like. I don’t think I’ve been certified in anything in my entire life.
So, I don’t know what it actually looks like, but this is how I picture it. And you’ll see that the certificate has a name and a public key, because fundamentally and theoretically, that’s all a certificate is. It’s a data structure, kind of like a list, a structure, a class that has an entity name, an SNI, and a public key that’s associated with that name. The certificate is then signed by another certificate, by another entity, and that entity basically says, “Okay, you can use this certificate. I certified that the public key indeed belongs to the subject of this certificate,” which in this case would be Bob. Now, to make matters a bit complicated, when you sign stuff, you sign it with your private key from your key pair.
And you sign it with your private key because only you have access to the private key, and you can use the public key, which is widely accessible, to decrypt the signature and see if it was indeed signed by that entity. So, if you trust that entity, you trust the certificate if you can decode its signature. That makes sense in a way, but I think I can settle this in an easier way with an analogy in a real-world example. So, if I want to travel and if I want to go to a different country, I usually need to provide my identification whenever I leave and whenever I enter the new country.
Now, my identification usually is my name, my date of birth, maybe my address. I have a social code, social security, I guess, it’s in the U.S. Here, we use something else. But anyway, if I write them all on a paper and I go to the airport, and I try to leave the country and they ask for my ID, and I give them the piece of paper, in the best case scenario they’re just going to laugh at me and turn me away. And that’s because they don’t actually trust any of that. And how can you trust someone that just walks up to the airport with a piece of paper with their name on it? You need a passport, you need something that’s verifiable. And that’s exactly what a certificate is.
You can think of it as a passport. When I have a passport, it has the same exact things. It has my identity. In real life that’s not my public key name. It’s just my name and some other data. And this is issued by a government agency. This is issued by someone that everybody universally and implicitly trusts. So, if you go to the border and people want to decide whether they will allow you in, they will look at your ID and they know they can trust your ID because it was issued by an authority. And it’s the same with certificates in a way. You want to make sure that the public key you have is valid. So, you use a certificate to distribute this public key, but then you ask yourself, “Can I actually trust this person? Can I trust that that’s Bob’s public key?”
Well, who signed Bob’s certificate? Oh, this entity. Well, I trust that entity. So, then I trust Bob, I can use Bob’s public key. But trust can be hierarchical to make things complicated. And as we’ve seen, you already have a hierarchy, even if you just have two entities involved. So, you have a certificate and someone else that signs it.
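The idea that an issuer signs with its private key and anyone can check the signature with the public key can be shown with textbook RSA on tiny numbers. This is a deliberately insecure toy for illustration: the primes, exponents, and the `cert_body` format are assumptions made for the sketch, and real certificates use large keys, padding schemes, and the X.509 format.

```python
import hashlib

# Textbook RSA with tiny primes (p = 61, q = 53). Insecure on purpose.
N = 61 * 53   # modulus, part of both keys (3233)
E = 17        # public exponent, shared with everyone
D = 2753      # private exponent, kept by the signer; (E * D) % (60 * 52) == 1


def digest(data: bytes) -> int:
    """Hash the data and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer side: apply the private exponent to the digest."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier side: apply the public exponent and compare with a fresh digest."""
    return pow(signature, E, N) == digest(data)


# Hypothetical certificate body: a name bound to a public key.
cert_body = b"subject=bob;public_key=bob-public-key"
signature = sign(cert_body)

print(verify(cert_body, signature))             # genuine signature is accepted
print(verify(cert_body, (signature + 1) % N))   # altered signature is rejected
```

In passport terms, `sign` is the government agency stamping the document and `verify` is the border officer checking the stamp: the officer never needs the agency’s private key, only its widely known public one.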
The issuer, an additional trust layer
Now, you have layers, you have trust layers, and it’s very common in the industry to really layer trust and layer your identity. And in this diagram, I am going to use the … Well, an operational model that’s inspired by Linkerd. So, basically what we have here is a cluster. It doesn’t matter where it’s running. We have cluster one and cluster two, and each cluster has some workloads that are running inside.
So, in this case, the application connects to the server, and the server, which is still an application, will connect to a gateway across clusters and send something to the gateway, which then sends data to another application. And so on. Now, if we want to manage certificates on our own, we can do that. We can just have a certificate for each app, and that works fine. But then we need to distribute all of these public keys, and we need everything to trust each other. And that’s hard to do between cluster boundaries, and across different topologies. So, we introduce another layer. We introduce an issuer that we implicitly trust because we bootstrapped, and this issuer will instead sign certificates. When mTLS happens, and we authenticate, we don’t just authenticate the client certificate, or the server certificate. We authenticate the whole chain.
We look at the certificate and we ask ourselves, “Who signed this?” We need to find out who signed this and determine whether we can trust that entity. So, it’s the same with an issuer. The issuer signs the certificates and issues them. We can trust the issuer. But then, if this application here sends a cross cluster call to cluster two, the gateway won’t trust it because the gateway has no notion of this issuer. It knows its own issuer. So, then we add another layer to this, and we add what’s known as a trust anchor. This trust anchor, or a certificate authority, is usually your root certificate. It’s the thing that really bootstraps your identity. So, now the problem is solved because if we have a certificate authority that’s present in both clusters, or simply sits outside, you’d need to have it distributed in the clusters.
But for this purpose, it was easier to put it outside. And this certificate authority signs and issues all of your issuers. You now have a cross-boundary identity management system that can actually work, because when we verify everything, for example, this application sends a request to the gateway, we’ll ask, “Well, who signed the gateway? This issuer did. And who signed the issuer? Our trust anchor.” So, hopefully, that makes sense. I’m sure this will probably warrant some questions. So, anyway, when it comes to identity management, obviously it’s a problem, and mTLS is hard to implement in your application. But good news, Linkerd solves it all for you. So, with Linkerd, mTLS is an out-of-the-box feature. We generate certificates for you if you install it with the command-line tool, or you can also integrate with an existing certificate infrastructure.
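The “who signed this, and who signed that?” walk up to the trust anchor can be modeled with a small Python sketch. It is a toy under stated assumptions: certificates are reduced to a subject and an issuer name, all names are hypothetical, and no signatures or expiry dates are checked, which real chain validation always does.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject name of the certificate that signed this one


def chains_to_anchor(leaf: Cert, known: Dict[str, Cert], anchor: str,
                     max_depth: int = 5) -> bool:
    """Follow issuer links from the leaf until the trust anchor is reached.

    Toy model: real verification also checks signatures, validity periods,
    and certificate constraints at every step.
    """
    current = leaf
    for _ in range(max_depth):
        if current.issuer == anchor:
            return True
        parent = known.get(current.issuer)
        if parent is None:
            return False  # signed by someone we have never heard of
        current = parent
    return False


# Hypothetical names mirroring the two-cluster diagram.
issuers = {
    "issuer-cluster-1": Cert("issuer-cluster-1", "trust-anchor"),
    "issuer-cluster-2": Cert("issuer-cluster-2", "trust-anchor"),
}
app = Cert("app.cluster-1", "issuer-cluster-1")
gateway = Cert("gateway.cluster-2", "issuer-cluster-2")
rogue = Cert("rogue", "unknown-issuer")

print(chains_to_anchor(app, issuers, "trust-anchor"))
print(chains_to_anchor(gateway, issuers, "trust-anchor"))
print(chains_to_anchor(rogue, issuers, "trust-anchor"))
```

Because both issuers chain to the same anchor, workloads in either cluster can validate each other even though they were signed by different issuers, which is the cross-cluster property the diagram is meant to show.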
Zero-trust with Linkerd
And each pod gets its own certificate, getting us close to that zero-trust thing that I talked about earlier. What does Linkerd do as a whole? Well, it’s a service mesh, and it does really three things. It provides observability, reliability, and security. Linkerd is focused on operational simplicity. So, if you had a chance to look at the project, we wanted to keep things simple, not just in the code, but also in operating Linkerd. And how does Linkerd work? And this is probably my favorite part of the workshop. So, in this case, in this diagram, I split it in half. It’s a bit convoluted, but it’ll hopefully make sense by the end. So, when you install Linkerd, and when you use Linkerd, you’re basically concerned with what’s in the bottom half of this diagram.
So, Linkerd ships with a control plane and with an issuer and a certificate authority. So, these are the two certificates that you need, and Linkerd actually generates them for you. Linkerd has a control plane, and this control plane is used to bootstrap identity and also to do service discovery, and other things. But basically, each application gets a proxy that sits in front of it as a sidecar in the same pod, it’s just a different container. And this forms your data plane. Now, your data plane is actually the one that secures communication, and applies mTLS to everything. Your application will want to talk to a server. Like you have a client, you talk to a server, it will open up the connection, but actually, the proxy will intercept your connection. It will intercept your network calls, and it will secure the communication to the other proxy, which is hopefully sitting in front of your application.
So, this is what you see. mTLS is magical. It’s out of the box. Everything is encrypted, but what actually happens to the proxy? Well, when it first starts up, the proxy starts up with the root CA in its memory. And we use this for validation, for certificate validation. The first step that it does is generate its own private key. It does this only once in its lifetime, and it will use this private key to decrypt stuff. It will connect to the identity service. This is part of the control plane. The identity service will make use of the issuer certificate to validate itself against the proxy, and the proxy can validate it with its certificate authority. The proxy will send a certificate signing request with its service account identity, and service account token.
Identity tied to the workload
So, in Linkerd, we tie identity to the service account, to the workload. So, the identity service will basically receive the service account identity. So, it knows which pod it’s supposed to talk to. And it also receives the token so that it can verify it’s talking to the right pod. After it validates this token, it sends what’s known as a trust bundle. A trust bundle is one or more certificates bundled together. And in this case, it will send the issuer certificate and the proxy certificate. So, the proxy basically will have all of the mechanisms it needs to verify the authenticity of whoever it is talking to. And then, you just rinse and repeat every 24 hours, without actually even noticing. So, it’s pretty cool. Awesome. And I am finally ready for the demo, but do we have any questions before we go into the demo?
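The bootstrap flow described above (key generation, a signing request carrying the service account identity and token, token validation, a trust bundle in return, and rotation every 24 hours) can be sketched as a toy model. This is not Linkerd's implementation: there is no real cryptography here, and the class names, the string stand-ins for keys and certificates, and the identity name format are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(hours=24)  # rotation period mentioned in the talk


@dataclass(frozen=True)
class SigningRequest:
    """Toy stand-in for a certificate signing request."""
    identity: str    # derived from the service account
    public_key: str  # the matching private key never leaves the proxy
    token: str       # service account token, presented as proof


@dataclass(frozen=True)
class IssuedCert:
    identity: str
    public_key: str
    issuer: str
    not_after: datetime


def workload_identity(service_account: str, namespace: str) -> str:
    # Assumed name format; check your own cluster before relying on it.
    return f"{service_account}.{namespace}.serviceaccount.identity.linkerd.cluster.local"


class IdentityService:
    """Hypothetical model of the control plane's identity service."""

    def __init__(self, issuer_name: str, valid_tokens: dict):
        self.issuer_name = issuer_name
        self._valid_tokens = valid_tokens  # identity -> expected token

    def sign(self, csr: SigningRequest, now: datetime) -> list:
        # Validate the token before issuing anything.
        expected = self._valid_tokens.get(csr.identity)
        if expected is None or not secrets.compare_digest(expected, csr.token):
            raise PermissionError(f"token rejected for {csr.identity}")
        leaf = IssuedCert(csr.identity, csr.public_key, self.issuer_name,
                          now + CERT_LIFETIME)
        # Trust bundle: the issuer certificate plus the new proxy certificate.
        return [self.issuer_name, leaf]


def needs_rotation(cert: IssuedCert, now: datetime) -> bool:
    """True once the certificate has reached the end of its lifetime."""
    return now >= cert.not_after


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    identity = workload_identity("curl", "default")
    service = IdentityService("issuer-cert", {identity: "sa-token"})
    bundle = service.sign(SigningRequest(identity, "proxy-public-key", "sa-token"), now)
    print(bundle[1].identity, "valid until", bundle[1].not_after.isoformat())
```

The point of the model is the ordering: the token is checked first, and only then does the proxy get a certificate bound to its workload identity, with an expiry that forces the whole exchange to repeat.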
Jason: You know, it’s been pretty active. So, there were a lot of questions around multi-cluster, which hopefully got answered, but folks in the chat, please feel free to sound off if you still have questions. There’s a bit about certificate rotation. So, this is a great one from Cade: the internal CA could be a single point of failure for the cluster. What redundancy and availability might we expect? Just to answer this real quick: what you have is a root CA that you might share between clusters. And ideally, you’ve either got a setup to rotate that, or you’ve got a significantly long-lived root CA, so that’s not an issue for you. And I’m not sure if that will get covered in the demo, but you’re holding the public key for the root CA in your cluster. So, you don’t have to worry about it going down and then all communication failing.
Matei: Cool. I see another interesting question in the workshop channel that I might want to take on. So, how do you respond to critics that argue that it is insecure or invalid to have an unencrypted channel between containers running within a pod? Right. So, this is actually a great question because mTLS actually does not protect against localhost attacks. So, if you get access to the pod’s network namespace, you can pretty much see everything in plain text. And you’re right about that. But generally, if it comes to that, that’s out of the scope of mTLS. mTLS just protects and secures the communication between your pods, but it does not secure the actual loopback interface. And generally, the guarantee and the assumption that people make is that localhost is secure. If someone gets access to the loopback interface, and they can actually sniff the packets that go over the loopback interface, then …
Well, there are a couple of problems associated with that. So, generally, as I said, mTLS is just a tool in the toolbox and you should use it with other security practices to make sure that everything is locked down. It’s not a one-stop solution. And it does not cover encrypting loopback packets. And then there’s another question. Any issue with long-lived connections like gRPC? No, there shouldn’t necessarily be an issue. Or at least not that I know of.
But anyway, I digress. So, I hope all of you had a bit of time to set up your environment. So, these are the steps that we’re going to do. We’re first going to install Linkerd. We’re going to deploy some example applications. We’re going to verify that there’s no TLS between them.
Then we’re going to deploy Linkerd, and we’re going to check TLS again. And we’re going to do all of this with TShark to capture and decode packets. So, if you’re all ready, I’m using k3d locally. So, if I do k3d get nodes … Oh, sorry. That’s kubectl get nodes. You’ll see that I have just one node, the workshop server. That’s my control plane and my master. And right now, and this is a shorthand for kubectl get pods, I only have whatever is in the kube-system namespace. So, in the workshop repository, we have a manifest directory. And in here we have two deployments. We have a curl deployment, and an Nginx deployment. So, I’m just going to look at both of them in turn. The curl deployment is really simple. It just deploys a curl image.
And we’re going to use this as our client. Now, the Nginx deployment has a few more things. So, it has a service and we’re going to use this service to send requests from the curl pod to the Nginx pod. And then it includes the actual Nginx container. And it also includes a Linkerd debug sidecar. So, this is me cheating a bit, but basically, I want to use TShark to show you how we can sniff and decode packets. And the easiest way was to use the Linkerd debug sidecar, even though we actually don’t use Linkerd yet. So, the debug sidecar just has a couple of traffic and networking tools. So yeah, it’ll just make everything a bit easier. So, I am going to deploy both of these things.
So, just because I’m using aliases, for me, k stands for kubectl. So, that’s just kubectl, and if we get pods now we should see that in the default namespace, we have both a curl pod and an Nginx pod. So, I’m going to exec into both pods. Hopefully, you can follow along. If I’m going a bit too fast, you can feel free to slow me down, but I’m going to first exec into the curl pod. And I’m going to start a shell. Then I’m going to do the exact same thing, but with the Nginx pod. But here we need to also specify the container, which is linkerd-debug. And I’m also going to start a bash session. So, as I said, we’re going to use TShark to sniff traffic. And the command here is pretty simple. With TShark, we’ll pass in -i, which means the interface that we want to listen to. For the purpose of simplicity, I’m going to choose any. We can also do just ethernet, but I think it’s just simpler to use any.
Decoding packets that go to TCP port 80
And then we’re going to decode any packets that go to TCP port 80 as HTTP. So, we can see exactly what’s being sent from the curl pod. So, this is going to run. It’s going to say that it’s capturing on any, but we actually haven’t sent any packets through yet. So, I’m going to curl the fully qualified domain name of our Nginx service. So, it’s Nginx deploy, because that’s the name of the service, dot default, which is the namespace the pod is running in. And then we’re going to pass in the cluster domain and port 80. So, the response is coming back. This is what Nginx sends us. But if we look at everything that’s happening here, we can have a better idea of exactly what packets have been sent. And if you’re not familiar with TCP’s handshake, basically …
Or with TShark, for that matter, basically here we have the timestamp, here we are going to have the client, here we’re going to have the server. So, this is pretty much the sender and the receiver. When we sniff the packets, this will make a bit more sense in a minute, but basically, our receiver, in this case, is Nginx, and our sender is the curl pod. So, 0.9 is curl, 0.10 is Nginx. And the way TCP starts is with the three-way handshake. So, we’re going to have a SYN sent from the client to the server. We’re going to have a SYN-ACK sent from the server to the client. We can see it’s from 0.10 to 0.9. And finally, we’re going to have an ACK. And after that, we can start sending stuff on the wire.
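The three-way handshake described above can be checked mechanically from the TCP flag bits and the direction of each segment. The sketch below is only an illustration: the `Segment` type and the two pod addresses are hypothetical (the talk only gives the last two octets, 0.9 for curl and 0.10 for Nginx), while the flag values are the standard TCP header bits.

```python
from dataclasses import dataclass

# Standard TCP header flag bits.
SYN = 0x02
ACK = 0x10


@dataclass(frozen=True)
class Segment:
    src: str
    dst: str
    flags: int


def label(flags: int) -> str:
    """Name a segment by its SYN/ACK bits, the way a capture summary would."""
    if flags & SYN and flags & ACK:
        return "SYN-ACK"
    if flags & SYN:
        return "SYN"
    if flags & ACK:
        return "ACK"
    return "OTHER"


def is_three_way_handshake(segments: list) -> bool:
    """True if the first three segments are SYN, SYN-ACK, ACK in the right directions."""
    if len(segments) < 3:
        return False
    first, second, third = segments[:3]
    return (
        label(first.flags) == "SYN"
        and label(second.flags) == "SYN-ACK"
        and label(third.flags) == "ACK"
        # The SYN-ACK travels back from server to client.
        and (second.src, second.dst) == (first.dst, first.src)
        # The final ACK travels in the same direction as the SYN.
        and (third.src, third.dst) == (first.src, first.dst)
    )


if __name__ == "__main__":
    curl, nginx = "10.42.0.9", "10.42.0.10"  # hypothetical pod IPs
    capture = [
        Segment(curl, nginx, SYN),
        Segment(nginx, curl, SYN | ACK),
        Segment(curl, nginx, ACK),
    ]
    print([label(s.flags) for s in capture], is_three_way_handshake(capture))
```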
Running the Linkerd script
And you’ll see that basically here we can see exactly what was sent. HTTP is a protocol, and whenever you send an HTTP request, the first thing that you send is the method, the path, and the version of HTTP. So, we can see everything here in plain text. We can see it receives a 200 OK. So, pretty much there’s no mTLS here. And of course, there isn’t, because we’re not using mTLS. So, now I’m going to install Linkerd. So, I already have it installed. I’m not going to run the script. If you’d like to have a look at the script instead of just randomly installing stuff from the internet, and I think that’s good practice, you can just remove this bit. But yeah, just curl this endpoint, run.linkerd.io/install. And after that, you’re going to have the Linkerd CLI. So, if I do linkerd --help, we can see all of the commands that are available.
Obviously, I know a lot of them by heart, so I won’t be going through them now, but basically what we want to do is just linkerd install. This is a favorite of mine. So, basically, when we do linkerd install, we don’t actually operate on your cluster at all. We just render some manifests. These are the control plane manifests, and you can have a look at them, go through them, and pretty much verify that everything is up to your standard, and we’re not doing anything dodgy. So, we’re not going to directly operate on your cluster. So, instead, we have to pipe this to kubectl and apply it directly into our cluster. So, we’re going to do that.
And in a minute, we’re going to see all of our pods spinning up. So, the control plane in Linkerd, actually in 2.11, is even lighter now. And I think the data plane footprint is around 15 megabytes for the proxy. So, we’re trying to keep it as light as possible. And for this purpose, we only have three pods. We have the identity service that we talked about, the destination service, which provides us with service discovery, and then injector, which injects the data plane next to your application in the pod. I’m going to see, yep, everything is ready. So, the next thing that we want to do, if we have a look at our current deployments, I’m just going to zoom this in, if we have a look at our current deployments we only have one container in curl and two containers in Nginx. The Nginx container and the debug sidecar that we’re using.
But now we also want to add Linkerd to these workloads. So, the easiest way to do this is to actually get the deployments. This is already filled in for me. We’re going to get the deployments in YAML format. This is hopefully going to render them all in a list. We’re going to inject them with Linkerd, and then we’re going to reapply them. And linkerd inject, really all it does is add an inject annotation. So, if we get, for example, the curl deploy, we’ll see exactly what Linkerd injected: it just added, in the pod spec specifically, this inject annotation. And the proxy injector, which is a mutating webhook, will now add a sidecar to all of our workloads. So, you can see curl, for example, now has two out of two containers, and Nginx deploy has three out of three.
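What the inject step does to a manifest can be illustrated with a plain dictionary. This is a minimal sketch of the idea rather than the CLI's actual code: the `add_inject_annotation` helper and the sample deployment are hypothetical, and the annotation key and value shown (`linkerd.io/inject: enabled`) are the ones the proxy injector webhook looks for on the pod template.

```python
import copy

INJECT_ANNOTATION = "linkerd.io/inject"


def add_inject_annotation(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with the inject annotation
    set on its pod template; the input manifest is left untouched."""
    injected = copy.deepcopy(deployment)
    template = injected["spec"]["template"]
    metadata = template.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[INJECT_ANNOTATION] = "enabled"
    return injected


if __name__ == "__main__":
    # Hypothetical, heavily trimmed stand-in for the curl deployment.
    curl_deploy = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "curl"},
        "spec": {
            "template": {
                "metadata": {"labels": {"app": "curl"}},
                "spec": {"containers": [{"name": "curl"}]},
            }
        },
    }
    injected = add_inject_annotation(curl_deploy)
    print(injected["spec"]["template"]["metadata"]["annotations"])
```

Note that the manifest still lists a single container after this step. The sidecar only appears once the annotated manifest is applied and the webhook mutates the pod at admission time, which is why the container count changes from one to two on the running pod.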
All traffic is mTLSed
We’re going to do the exact same thing that we did before, but this time we are going to see that everything is mTLS. So, we’re going to exec into the curl pod, which curl pod is it? This one’s terminating at 86. We now have to also specify the container, because we have more than one container. And then we’re going to start a shell session. And then here we are going to do the same thing. We’re going to go into linkerd-debug and start a bash session. So, again, I always write TCP instead of TShark there, we’re going to listen on any interface and we’re going to decode the packets, but this time, instead of HTTP, I am going to pass in SSL so you can also see the handshake. And I am also going to exclude any packets that have the IP set to the loopback address, because this is going to be plain text anyway.
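The loopback exclusion matters because traffic between the application and its own proxy stays inside the pod and is not encrypted, so leaving it in would clutter the capture with plaintext. A small sketch of that filter follows, assuming packets are represented as hypothetical `(source, destination, summary)` tuples with made-up pod addresses; it uses Python's standard `ipaddress` module to recognize loopback addresses.

```python
import ipaddress


def is_loopback(address: str) -> bool:
    """True for 127.0.0.0/8 and ::1."""
    return ipaddress.ip_address(address).is_loopback


def drop_loopback(packets: list) -> list:
    """Keep only packets where neither endpoint is a loopback address."""
    return [
        packet for packet in packets
        if not (is_loopback(packet[0]) or is_loopback(packet[1]))
    ]


if __name__ == "__main__":
    # Hypothetical capture: (source, destination, summary).
    capture = [
        ("127.0.0.1", "127.0.0.1", "HTTP GET / (app to local proxy, plaintext)"),
        ("10.42.0.9", "10.42.0.10", "TLS Client Hello"),
        ("10.42.0.10", "10.42.0.9", "TLS Server Hello"),
    ]
    for packet in drop_loopback(capture):
        print(packet)
```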
Jason: Sorry Matei, to pause you. For the folks in the chat, the recording will be available after this. And also if you haven’t seen it, I’ll paste it back in, but there’s a step-by-step guide in the GitHub link we posted earlier. So, we’ll share that again.
Matei: Oh, sorry. I had no idea that I’m going a bit too fast. So, if I can slow down and help people out …
Jason: Well, just again, if you’re stuck on a particular command, please do ask, ask right there in that slack channel and we can talk you through it. Yeah. Time’s the big constraint. Absolutely, Michelle. But again, there’ll be a recording and then we’re happy to answer your questions as you work through the material directly in Slack. So, thank you so much.
Matei: Yeah. We can wait a bit more, and just get people on the same page. So, if I would have a bit of an indication of where people are currently at, that would be super helpful.
Jason: Yeah. Really grateful to the folks that are following along. So, there were a couple of people asking about the TShark commands. Are we confident those are in the mTLS workshop?
Matei: They should be. So, I have them in here, and I can update this. I can update it with a comment: decode TCP packets as HTTP. And we will also decode them as SSL. So, this one is to decode TCP packets as HTTP packets, and this one as TLS packets.
Jason: All right, so folks in the chat are happy with you continuing to go fast, as long as it’s recorded and they can get it later. So, thanks so much for slowing down for a minute, Matei.
Matei: Yeah, no worries.
Jason: And thanks to everyone in the chat. Sorry. Go ahead.
Matei: It would be helpful to know where people are at now because I can go back and redo a couple of things. So, I can wait a bit longer until people inject stuff, and answer a couple of questions maybe.
Jason: I think they’re okay. I think you’re all right. So, there are a couple of folks who were asking about Viz and stuff, but that’s fine. They’ll get to that as they get to that material. So, he hasn’t installed Viz yet, right?
Matei: Yeah, I haven’t installed Viz. So, for the purpose of this demo, I only installed Linkerd. Well, I can get all of my namespaces. So, I only have Linkerd installed.
Matei: Viz is great to look at the dashboard, but for the purpose of mTLS it doesn’t necessarily add much. So, that’s why I skipped it in this demo.
Jason: Yeah. If you’re following along and you’ve run linkerd viz dashboard, unless you installed it yourself, you’re not going to see it. So, it’s omitted from these instructions. Go ahead, Matei.
Certificates and secrets
Matei: Charles, I see you asked a good question. Do you suggest short-lived certificates, and how are certificates stored? A vault or a key store? So, usually, we recommend that people implement … Well, not implement, but use their own certificate management systems. So, you can use, for example, cert-manager, Vault, or anything like that. As far as the longevity of certificates goes, I think generally something like one year might be good, and the issuer and CA certificates, the way we store them in Linkerd is in a secret. So, the issuer certificate is in a secret, and the private key of the issuer certificate is in a secret. But certificates are not meant to be secret. Even the root CA, we just store it in the memory of the proxies, for example, just the certificate itself. Not its private key, because the certificate is meant to be shared. So, there isn’t really a problem with making your certificates publicly available.
A few more questions maybe? Are people keeping up now? Should I go ahead?
Jason: Yeah, please continue.
Matei: All right. Well, yeah. So, Chad also had a good addition in the chat. You can download certs at any time. The secret key is the secret part. Exactly. So, usually, you want to keep your root CA’s secret key offline. It’s the best thing you can do. And yeah, always keep the secret key secret. Anyway. So, what I did here, let me just do it again. Let me start from the top. I’m going to exit both of these pods. So, I installed Linkerd, and I have all of my Linkerd deployments up and running. And then in my default namespace, I’m going to go to the default namespace, I have an Nginx deployment and a curl deployment. And both of these have been injected with Linkerd, and we can verify that by getting the pods in the default namespace. So, we can see that the Nginx deployment has three out of three containers.
And the curl pod has two out of two containers. So, this is an indication that there’s another sidecar running in there. And since the only other thing we did was install Linkerd, well, that’s a good indication. So, what I’m going to do now is I’m going to exec … Sorry, wrong command. I’m going to exec into the curl pod first. We need to pass -it because it’s interactive. And -c, because we want to target just the curl container. We want to start a shell session there. And then I’m going to do the exact same thing, but with the Nginx deployment. So, again, we’re going to make it interactive. We’re going to go into linkerd-debug, because that’s the debug sidecar with all of our networking tools, and we are going to start up a bash session.
Setting up TShark
And in here I’m going to set up TShark to again sniff out all of our packets. So, we’re going to pass it the interface. And that’s what the -i argument stands for. And then we’re also going to pass in -d, which means decode any TCP packets that go to port 80 as SSL. And finally, we’re going to … Excuse me, we’re going to remove any packets that are sent or received by localhost from the output. So, this again will show us that it’s capturing. And then, what we’re going to do here is we’re just going to curl the Nginx service. So, this is going to be the fully qualified name of the service. And we can verify it by doing kubectl get services. Sorry, I’m doing it in a different window. We have this Nginx deploy service. So, this is exactly the one we want to target.
And when you use the fully qualified domain name, you usually also pass in the namespace. So, it’ll be the service name dot namespace dot svc, and then cluster dot local. And then we’re going to pass the port, and this should have sent something. Now, we also received some data while I was talking. So, we’re going to go and look for SYN ACK. SYN, SYN ACK, ACK. Okay. So, I think this is what we’re looking for. The proxy is also sending some stuff. So, we’re going to see a bit more data in here, but basically, we can verify that the connection is TLS now. So, we have the initial TCP handshake, SYN, SYN ACK, ACK. And then we can verify that we started a TLS session because we see a Client Hello.
So, if you remember, after the client first sends its hello, the server also sends its hello and it sends its certificate, some secrets, and some configuration parameters. And after that, the data is pretty much encrypted. So, I’m going to start a capture again, just to have clean output, and I’m going to issue the same curl again. Yeah, there we go. And you can see that everything basically shows up as TLS. We have the TLS version, and again, everything is application data. All of the data that’s meant to be hidden, and that’s what gives us confidentiality, we can just see as application data. So, we don’t actually know what’s being sent. We just know that we have a TLS session, and that it has been established. So, that concludes this part of the demo. Do we have any questions so far? Have people been able to follow along?
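The difference between the two captures comes down to the first bytes on the wire. A plaintext HTTP request begins with a readable method, while a TLS record begins with a one-byte content type (22 for handshake, 23 for application data) followed by a two-byte version whose first byte is 3. The sketch below classifies a payload on that basis. The function name and the sample byte strings are made up for illustration, and it is a rough heuristic, not a replacement for a real dissector like TShark.

```python
# TLS record content types and the Client Hello handshake message type.
TLS_HANDSHAKE = 0x16
TLS_APPLICATION_DATA = 0x17
CLIENT_HELLO = 0x01

HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ", b"OPTIONS ", b"PATCH ")


def classify_payload(payload: bytes) -> str:
    """Guess what a TCP payload is from its leading bytes."""
    if payload.startswith(HTTP_METHODS):
        return "plaintext HTTP request"
    if payload.startswith(b"HTTP/"):
        return "plaintext HTTP response"
    # TLS record header: type (1 byte), version (2 bytes), length (2 bytes).
    if len(payload) >= 3 and payload[1] == 0x03:
        if payload[0] == TLS_HANDSHAKE:
            # The handshake message type follows the 5-byte record header.
            if len(payload) >= 6 and payload[5] == CLIENT_HELLO:
                return "TLS Client Hello"
            return "TLS handshake"
        if payload[0] == TLS_APPLICATION_DATA:
            return "TLS application data"
    return "unknown"


if __name__ == "__main__":
    samples = [
        b"GET / HTTP/1.1\r\nHost: example\r\n\r\n",      # before injection
        bytes([0x16, 0x03, 0x01, 0x00, 0x05, 0x01, 0, 0, 0, 1]),  # start of a handshake
        bytes([0x17, 0x03, 0x03, 0x00, 0x02, 0xAA, 0xBB]),        # encrypted payload
    ]
    for sample in samples:
        print(classify_payload(sample))
```

This mirrors what the capture shows: before injection the method and path are readable, and after injection an observer learns only that a TLS session exists and that opaque application data is flowing.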
Demo ends and Q&A
Jason: Yeah, we’ve had fantastic questions in the chat from folks, and in the Slack workshop. Thanks so much everybody. But they seem to actually be Linkerd questions. So, I think a lot of folks are excited to follow along with your recording later.
Matei: Yeah. Sorry I went too fast, but there’s only so much time.
Jason: Think you’re doing great.
Verifying mTLS with Buoyant Cloud
Matei: Well, the last thing that I want to show is related to Buoyant Cloud. So, I’m actually going to share a different screen. So, many screen shares. Let’s see. You should hopefully be able to see this, and I hope you’re not seeing the wrong thing. So, Buoyant Cloud is a different way to verify TLS. For those of you that don’t know, Buoyant Cloud is the SaaS offering that the folks from Buoyant have, the creators and makers of Linkerd, and it’s a really nice UI tool that can pretty much do the same thing as TShark when it comes to verifying your mTLS connection. So, it’s a bit easier to use, and probably and possibly a bit easier to follow along with. Basically, after you register, you have this dashboard and you have to add a cluster. I’m going to go ahead and add my own cluster here. I’m just going to call it a workshop. You have to pick an environment. I think this might be a bit hard for people to follow along with, just because you need an account. But then basically you add your own cluster.
You add your own environment, you can tag it with different things. Just going to skip that for the time being. And then you apply an agent into your Kubernetes cluster that will scrape all of the Linkerd-specific metrics and all of your application metrics, and just make them available in the dedicated dashboard. So, I pretty much just pasted this manifest and applied it into my cluster. And I’m going to click I’ve done this. The agent will then connect to the UI, and we’ll just wait for everything to come online on our side. Sadly, you cannot see my terminal just because it’s hard for me to share both of them. Or maybe I’m just a bit of a Zoom noob and I don’t really know how to do it, but we’re still initializing the Buoyant Cloud agent. I can probably switch back here. So, just so you can see what I’ve been doing. I’ve just been getting the pods in the Buoyant Cloud namespace. So, the manifest that I applied created a Buoyant Cloud namespace, and it installed the agent. And I’m waiting for this to be initialized. And I hope that my cluster isn’t messed up.
Any questions so far? I see the chat keeps growing and growing, but I’m sorry I don’t have enough time to follow through with everything.
Jason: Brendan. Yes, it does. So, there are a couple of Buoyant Cloud questions, which I’ll just answer in the chat.
Matei: All right. I’m still waiting for this to initialize, and I’m getting very upset with k3d at the moment. But this happens every once in a while. Live demos, I guess, bit of a pitfall there. Still initializing. Let me just do a rollout. So, rollout restart deploy. So, what I’m going to do is just restart all of the pods in the namespace. And I hope that my demo will work after this, but … And yeah, moving unanswered questions to Slack would be great. Dwayne, I see you have the same problem. You know what? It worked fine in my day-to-day work this morning. So, I’m a bit bummed that it stopped working just now.
Jason: Well, folks, if you’re looking for a more in-depth Buoyant Cloud demo, buoyant.io/demo, and we’ll be happy to do a one-on-one demo with you and show you how to get it working.
Matei: Well, I’m sorry that my environment started to act up all of a sudden. So, what I wanted to show you was that you can verify basically the same thing but from the UI. And with that, I’ll just conclude the demo and go back to the camera. Any final questions?
Okay. Well, I see a bunch of stuff here. Jason, do you have any questions that you want to take on, or should we just take them on in Slack?
Jason: Just because we’ve got a minute left, it would be amazing to just show how they can sign up for the rest of the things in the workshop series.
Matei: Yeah. So, let me share my screen again. You are completely right.
Jason: No, no problem. You’re doing great. So, for folks who enjoyed it, we’re going to just move on. There’s a Service Mesh Academy. So, go back a couple of slides, Service Mesh Academy. So, there are more workshops in the same vein. One, if you say things in the Linkerd workshop slack about the things that you want to see, we’d love to produce it. And really thanks so much for attending, really hoping to see you at the next one.
Matei: Right. Well, thank you so much, everyone. And that concludes the workshop. Thank you for sticking through to the end. And yeah, if you have more questions, feel free to ask me privately or directly in Slack. I’ll be happy to answer any questions I can get to. All right, bye-bye, and thank you again for coming.
Jason: Have a great day, folks.