Certificate management with Linkerd

March 17, 2022

In this workshop, we cover the basics of TLS certificate management in Linkerd. While Linkerd issues, rotates, and validates per-pod TLS certificates automatically, the treatment of per-cluster issuer credentials and global trust root credentials can differ based on security goals and organizational policies.

Transcript

(Note: this transcript has been automatically generated with light editing. It may contain errors! When in doubt, please watch the original talk!)

Welcome and logistics

Charles: Okay. Before we get started, a couple of quick housekeeping notes. As I’ve already mentioned, please make sure that, if you’re using the Zoom chat, you have it set to send messages to everyone. Zoom defaults to hosts and panelists, and we want to make sure that everyone gets the message so that they can read what you have to say.

And then for questions, in particular, we’ve got some folks who are helping out in the Linkerd Slack. So, if you go to slack.linkerd.io or maybe you already have an account there, there’s the workshops channel. We’ve got a couple of folks in there who are helping out.

Matei and I will be answering as many questions as we can during the course of the webinar. But some of those questions may have longer answers or details that other folks in the Linkerd chat can help out with. So Slack is the best place to ask questions, but we will still answer the questions that you post in the Zoom channel.

So welcome! This is the official welcome to the certificate management with Linkerd starring Matei David, our current Linkerd influencer. This is part of the ongoing series of our Service Mesh Academy courses that we are designing and will continue to design. We’ve got a few in the past that cover various topics as well. They’re all recorded and you can view them I believe if you go to the buoyant.io website. We’ve got a media resources dropdown that you can select and see some of the past webinars.

Today, we’re going to talk about certificate management with Linkerd. So if you’re familiar with Linkerd, if you’ve kicked the tires on it, you’ve tested it out. I see Hujwall is talking about using a K3d cluster running Emojivoto. That’s great!

When you deploy Linkerd in that kind of environment, it’s going to generate some certificates for you, and Matei is going to go into great detail about what these certificates are used for. Those certificates are great for a dev sandbox environment where you’re just going to test things out, maybe deploy a few of your own services, take a look at the dashboard.

What Matei is going to talk about today are certificates for use in a long-lived, more mature environment. So these are your staging, UAT, production, whatever kind of clusters you have assigned for specific environments that you expect multiple folks to use and that you want to keep around for a long time and manage in an automated way. That’s what we’re going to focus on today.

So I think I’ve covered all my stuff. That’s enough talking from me, so I will hand it off to Matei and he’ll get us started on this journey with certificate management with Linkerd. So take it away.

About the speaker

Matei: Thank you, Charles, and also I’m off to a great start. I didn’t realize that I’ve been talking for the past four minutes but I was on mute. Anyway, before we dive into certificates, a little bit about myself.

So I see some familiar faces in the chat. Hello, everyone. If we haven’t met before, I’m a Linkerd maintainer. I work full-time at Buoyant where I’m also a software engineer. Most of my days are spent hacking on Linkerd, and yeah, that’s all you need to know about myself for now.

You can reach me on Twitter, GitHub, or Slack. So if you have any questions after the presentation or, if more generally you want to chat about Linkerd or about the open source space, feel free to get in touch. Charles did a good explanation of the agenda and overall direction for our workshop, but I’m just going to recap some of this stuff.

Today’s Agenda

So, for the agenda today, we’re first and foremost going to talk about certificates. What is a certificate and what are certificates used for? What does a certificate look like? I think it’s really helpful to not assume any prior knowledge here and just get everybody on the same page.

We’re going to talk a bit about Linkerd’s identity operational model. So we’re going to talk about why it is that we have so many certificates. We have an issuer certificate, a trust anchor, and then leaf certificates for the proxies. If you don’t know what any of these things mean, don’t worry. We’re going to cover all of it. But yeah, I think it really helps to grok certificates, and grok certificates in the Linkerd context, if you know how we use them and what our plan is.

We’re going to talk a bit about the benefits of certificate rotation and whether you should do it in shorter periods or if you should let your certificate validity just go on for 10, 20 years. And finally, we’re going to have a hands-on part. So my aim for this workshop and for people, in general, is to just get comfortable with operating Linkerd. So this is strictly about operating on a day-to-day basis.

What do I need to know about certificates? How can we make this simpler? Do I need to rotate certificates? Can I just do everything automatically? Do I even need to generate certificates myself? I think docs do a great job at specifying the goal of the details around certificates, but obviously doing it together and asking questions, and engaging in conversation is much more helpful for people to understand.

So this is my only ask out of all of you. If you have questions, please just go ahead and ask them. I think we can all just help each other out on Slack or in the Zoom chat and just complement our understanding of what certificates are used for and how to use them in a better way with Linkerd.

What is a service mesh?

All right. So for people who are not familiar with service meshes, you might have heard this buzzword term. It’s been going around for a few years now, and service meshes, simplified, are just tools that you use at a platform level, not at an application level, and they give you three major things. They give you observability, reliability, and security, and all of this is out of the box.

With Linkerd, we have a strict philosophy of keeping things very simple; simple to code and simple to use. And I know some people who are in the chat now have contributed in the past to Linkerd and this operational simplicity philosophy extends to the way we approach contributions in code as well.

Linkerd is also security-focused. So our main priority with Linkerd is to make things as secure as we can. That’s not a side effect of using a service mesh, that’s one of our priorities. Let’s make things secure, let’s make things reliable, and let’s make things fast.

Public key crypto

We’re going to talk more about Linkerd in a second, but before we actually dive into certificates and into Linkerd’s operational model and all of that, we need to talk about public key crypto. So I’m not sure if people are super familiar with public key crypto, I’m going to do my best to simplify it. But essentially, it’s a really neat way of proving that you have knowledge of a secret or of some information without actually sharing that secret.

So if you think about passwords in general and tokens, whenever you want to prove that you have access to a password or that you own a password, you generally have to send it, right? You go on a website, you type in your password, you send it along with the form. And, in a similar way, tokens are embedded in requests, right? So in order to prove that you have the token, and that the token’s been issued to you, you have to send it with your requests. Tokens are not really secret, but the point still stands.

Public key crypto does things a little bit different. So generally, it’s a key pair. You have a public key and a private key. The public key encrypts stuff, the private key decrypts stuff. The public key is supposed to be shared around, so everybody should have access to your public key and the private key is supposed to stay secret.

And this combination basically allows us to prove that we have access to this private key, we have access to this secret, without actually sharing it around, right? So if, for example, somebody wants to encrypt stuff using my public key, they do it. They send the information over to me and I use my private key to decrypt the stuff, gain access to the information, and I prove that I have access to it without actually sharing. You don’t need to see my private key in order for me to prove that I have access to it. If I don’t have access, I just don’t see the information.
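As a rough illustration of the public/private split described above, here is a toy textbook RSA example in Python. The tiny primes, the variable names, and the message value are assumptions chosen for readability; real systems use vetted libraries and much larger keys, so treat this only as a sketch of the idea.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent (shared with everyone)
d = pow(e, -1, phi)          # private exponent (kept secret); needs Python 3.8+

public_key = (e, n)
private_key = (d, n)


def encrypt(message, key):
    """Anyone holding the public key can encrypt a small integer message."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Only the holder of the private key can recover the message."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


secret = 65
ciphertext = encrypt(secret, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

The sender never sees `private_key`; being able to recover `secret` is what demonstrates possession of it.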

So this is a really neat thing. It’s all powered by super mathematics. I don’t really understand all of the math behind it, but I’m told it’s a mathematics gift to computer science. And yeah, that covers what we need to know. There’s also another buzzword that’s super commonly used throughout the industry, and that’s a PKI. It’s not really a buzzword, but it’s more of a catch-all term that’s used to refer to all sorts of tools and scripts and actions that we can do with key pairs and certificates and stuff like that.

A good use of public key crypto is in communication security. So some of you may have heard of the TLS protocol or the mTLS protocol. If you haven’t, I also held a workshop about that. But if you’ve heard about it, then mTLS is basically powered by public key crypto and certificates. And of course, public key crypto feeds into certificates. So yeah, it’s a really powerful system for us to gain access to cryptographic identity and just do a bunch of cool stuff.

Public key crypto downsides

But it also has two major downsides. First of all, it’s hard to share public keys around. So if you think about a system where we just have two applications, it’s pretty easy to share the public key around. Each one of these applications will generate a public key, and we just bootstrap each other with each other’s public keys.

But if you have a system that has tens of microservices or hundreds of microservices, it’s hard to share all of the stuff around, right? There’s a lot of bootstrapping involved. And also, if you have a service that has to access external services and you want to do it for secure communication and TLS and all of that, then you need to have a way to prove that the public key that you’re going to be using actually belongs to the device or person that you’re trying to connect to.

So, just to give you a really trivial example. If you want to talk to a bank over mTLS or whatever and you want to use their public key, then you need to know the public key belongs to the bank. Otherwise, if it belongs to someone else and you think it belongs to the bank, then you’re in deep trouble.

How certificates help

Certificates fundamentally came out of a need to solve these two problems, right? So first of all, certificates are really easy. They’re just a data structure that contains a name and a public key. This name is bound to the public key when the certificate is signed. A certificate is usually signed by another certificate or another entity. So, usually, when we talk about public key crypto and certificates, we talk about entities. Entities are the pieces of code or other certificates that participate in this whole thing.

And yeah, it’s a data structure that just binds a name to a public key. This is signed by someone else to confer a degree of trust, and we’ll come back to this in a little bit. But this is what solves both of these problems, because first of all, if we just share certificates, which are like a string of bytes, it’s really easy to share them around. You can put them in a shared trust store, you can put them online and you can download them.

There are a bunch of ways in which you can share these certificates. And because they’re signed by another trusted authority, we have this guarantee that they belong to who we think they belong to. So as soon as the name is bound to a public key, we know that we can trust it if we trust whoever signed this.

Signatures also rely on public key crypto. So, in the first slide, I mentioned that usually when you have a key pair, a public key encrypts stuff and a private key decrypts stuff, and that’s mainly true. But we can also use it the other way around, because of the way mathematics works in this context. One key can encrypt and the other one can decrypt. So if we, for example, have a signature, we can sign that with our private key. So we encrypt it with our private key and whoever has access to our public key can decrypt it and verify the signature.

And this is really useful in the context of certificates, because if we trust the person who signed this and we trust their public key, then we can make sure it has indeed been signed by them. So, all of this is to say that certificates provide this very neat way of authenticating and making sure that a public key really belongs to someone that we want to communicate with. Yeah. Signatures prevent the certificates from being spoofed and they provide us with this degree of trust.
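The "sign with the private key, verify with the public key" idea can be sketched with the same kind of toy RSA numbers. The key values, function names, and the sample payload below are assumptions for illustration; a real signature scheme adds padding and uses far larger keys.

```python
import hashlib

# Toy RSA key (tiny on purpose): modulus, public exponent, private exponent.
N, E, D = 3233, 17, 2753


def digest(data: bytes) -> int:
    """Hash the data and squeeze it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Only the private-key holder can produce this value."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone with the public key can check the signature."""
    return pow(signature, E, N) == digest(data)


payload = b"name=bank.example;public_key=..."
signature = sign(payload)

print(verify(payload, signature))             # genuine signature
print(verify(payload, (signature + 1) % N))   # forged signature
```

Because the modulus is so small, different payloads can collide here; that is a limitation of the toy numbers, not of the technique.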

Before I move on, are there any questions so far? Anything that we can chat through and explain a bit better? How are we doing with the questions Charles? I’m monitoring the Zoom chat, but nothing here so far. I’m going to wait two more seconds before moving on.

Charles: Yeah, there’s a question that says, “Overheard that one of the downsides of mTLS is accessing external services. Is this somehow related to the multi-cluster functionality that Linkerd provides?”

Matei: Well, we’re going to talk a bit about multi-cluster. Well, through mTLS you can still access external services, but it means that the external service needs to be implemented in such a way that it can accept the mTLS protocol. So mostly, when we want to access external services, they don’t necessarily care about the identity of a client. So mTLS as a protocol basically ensures that we can authenticate both ends that take part in secure communication. And that’s fundamental to having things being secure. You need to know who you’re talking to, right?

In most cases, when you access an external service or an external website, even when you’re on a browser, the server doesn’t really need to know who you are. You need to introduce the concept of client identity only when you want to have metrics or when you want to do authorization or just finer grained things where the identity of the client is really important. But for example, an API service like GitHub or Twilio would generally rely on tokens or something else instead of mTLS. And we’ll cover the mTLS in multi cluster context soon too. But I hope that answers the question, I’m rambling at this point.

Charles: You made a good point there talking about the client. And I think when I was first learning about TLS and mTLS, there was often the comparison between when you use your browser and you go to some site and it’s using HTTPS, your client is just accepting that certificate from the server just like, “I trust you.” With mTLS, the nuance there that was not obvious to me at the beginning was that the server is also asking for the client’s identity and saying, “Do I trust you? Can we establish this trust together?” Does that sound about right?

Matei: Yeah, that’s the gist of it. Yeah, you’re right.

Charles: Great. As I said, that was something that tripped me up so I want to make sure that other folks don’t stumble on that as well. And I want to take just one quick step back, I have a question for you. Do you have off the top of your head any, and we can answer this later as well on Slack. Off the top of your head, do you have any resources that you would recommend for learning more about PKI, the stuff that you’ve just discussed?

Matei: Well, there’s a good blog post by William, who also works at Buoyant. If you don’t know him, you can look him up on Twitter. He’s got a lot of hot takes on Linkerd. I think William’s post is really well written and simplified for understanding mTLS in general. And then for PKIs, what I normally suggest, and this applies to any technical concept that I try to learn, is to read the RFCs. I don’t think I have the RFCs linked here, but I would be more than happy to link them after the presentation.

So specifically, if you want to understand mTLS, read the specification. Read on how it’s supposed to be implemented, what prompted people to first come up with this solution. And it’s the same with certificates. If you want to learn more about certificates, then just look at how they’re implemented.

It’s really hard to read through it. It’s super dry, it’s not easy to grok this type of a document, but it’s also the source of truth when it comes to it. So it’s much more helpful to read something like this instead of reading answers on Reddit or something else, just go straight to the source.

Charles: I think that’s a great recommendation. No, it’s a great recommendation. I recently had to read through the WebSockets RFC, so I grabbed a cup of coffee and just started poring through it. So that’s a really good recommendation. So thanks for that.

Matei: Yeah, no worries. Anyway, just a quick recap. So we have a certificate, we have a subject, which is the name of the certificate. It’s the entity that claims the certificate. We have the entity’s public key, the subject’s public key, and then a signature that’s signed by someone else. So whenever we deal with certificates, we deal with this hierarchical system of trust.

And okay, one more thing that I do want to say is how certificate validation works. And I’m not going to spend too much time because this can get complicated really fast. But basically, when you deal with certificates, you take the certificate and you want to make sure that whoever’s signed it is trusted. And this all forms a chain. You can think of it as a chain where you go through each link and you check whoever signed it until you get to the topmost part.
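The recap and the chain walk above can be sketched as a small Python model: a certificate holds a subject, a public key, an issuer, and a signature, and validation follows the issuers upward until it reaches a root that is already trusted. The class, the function names, the toy RSA primes, and the subject names are all assumptions for illustration; real X.509 validation also checks validity periods, extensions, and more.

```python
import hashlib
from dataclasses import dataclass


def make_keypair(p, q, e):
    """Return toy RSA (public, private) keys as (exponent, modulus) pairs."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)


def digest(data: str, modulus: int) -> int:
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % modulus


@dataclass
class Certificate:
    subject: str       # name of the entity claiming the certificate
    public_key: tuple  # the subject's public key
    issuer: str        # name of whoever signed it
    signature: int = 0

    def payload(self) -> str:
        return f"{self.subject}|{self.public_key}|{self.issuer}"


def sign(cert, issuer_private):
    d, n = issuer_private
    cert.signature = pow(digest(cert.payload(), n), d, n)
    return cert


def signed_by(cert, issuer_public):
    e, n = issuer_public
    return pow(cert.signature, e, n) == digest(cert.payload(), n)


def validate_chain(chain, trust_store):
    """chain is ordered leaf first; trust_store maps root subject -> root cert."""
    for position, cert in enumerate(chain):
        if cert.issuer in trust_store:
            # Reached the top: the root is trusted implicitly.
            return signed_by(cert, trust_store[cert.issuer].public_key)
        if position + 1 == len(chain):
            return False
        parent = chain[position + 1]
        if parent.subject != cert.issuer or not signed_by(cert, parent.public_key):
            return False
    return False


root_pub, root_priv = make_keypair(61, 53, 17)
inter_pub, inter_priv = make_keypair(89, 97, 5)
leaf_pub, _ = make_keypair(101, 113, 3)

root = sign(Certificate("root", root_pub, "root"), root_priv)  # self-signed
intermediate = sign(Certificate("intermediate", inter_pub, "root"), root_priv)
leaf = sign(Certificate("web.example", leaf_pub, "intermediate"), inter_priv)

print(validate_chain([leaf, intermediate], {"root": root}))  # root is trusted
print(validate_chain([leaf, intermediate], {}))              # nobody is trusted
```

The second call fails even though every signature is genuine, which is the point: validation only succeeds if the chain ends at something you already trust.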

Root certificates

When you get to the topmost part, you generally implicitly trust whoever signed the first certificate. We call these certificates root certificates, and there’s a bit of a nuance here with root certificates in web PKIs. So, public key infrastructures that are used mainly for static websites or websites that you access in your browser.

The topmost certificates, the root certificates, are generally bootstrapped in your host’s trust store. So you use a Mac or Windows or Linux, and this comes with some trust stores pre-installed. And in these trust stores, you have some root certificates that have been generated by certain companies and certain certified bodies. And then every single certificate that you see on the internet is signed by one of those. So whenever you perform this certificate validation, you go all the way to the top. Now, this is for web PKIs.

We also have the concept of internal PKIs, and this is what you would use in a system that you run yourself. So in your own platform, you don’t need to really rely on Let’s Encrypt or any of this web PKI stuff. You can just generate your own root certificate that’s going to bootstrap your whole PKI. And I’m saying this because it feeds into the next concept that I want to talk about, and that’s Linkerd’s operational model.

So I have here a really poorly written diagram with two clusters. So it doesn’t matter where these clusters run, it just matters that we have two clusters. And in these two clusters, we have an ingress request here. It goes to an application, this application talks to another, which needs for whatever reason to talk to another application in the second cluster. And to do that, it goes through a gateway.

So in this scenario, for example, we want to achieve secure communication. So we want to make sure that we have certificates for these four applications. The simplest way to do this is to just create some self-signed certificates for each individual application. And in this theoretical scenario, that kind of works. We just have four applications, we trust them, we know them. They’re ours, we can just generate four certificates, share them between them, and be done with it.

Revoking or rotating certificates

There’s a bit of a problem here. Whenever we want to revoke a certificate or whenever we want to rotate a certificate or do something with it, there’s a lot of labor involved. And then if we want to do it across clusters, it’s a bit hard to do because who do we trust? For example, in the gateway, if another request comes through and they have a certificate, do we trust it? Do we not trust it?

So typically, in most PKIs, it’s very common to have this root certificate that signs everything, an issuer certificate. The problem in our system is though that if we want to have arbitrary topologies where we can have as many clusters as we want and have this really hierarchical trust system, if we have one issuer certificate per cluster, we, first of all, cannot talk between clusters.

Because if these two are to have a secure communication channel, they need to validate each other’s certificates. And if they’re not signed by the same issuer, then that’s going to fail. And if we reuse the same issuer for all of our clusters, then whenever this issuer expires or needs to be rotated, we have to do that for all of the clusters that we have. And again, this is simple in this theoretical scenario because I have four applications, two clusters, that’s an hour’s work.

The trust anchor

But when you deal in an enterprise or a company with slightly more complicated scenarios, this becomes such a pain that it’s almost not worth doing it. So that’s why with Linkerd we introduced another layer of trust. So we have this trust anchor, that’s what we called it. This is a root certificate.

And generally, when we generate it, we keep the private key offline. So we generate the certificate, this is going to bootstrap our internal PKI, going to keep the private key private. We’re not going to put it on the cluster. And then with this, we’re going to generate a bunch of issuer certificates, one per cluster. And then these issuer certificates will generate new keys for all of our applications.

So the advantage to that is that it gives you more flexibility. You have a more robust system where if something breaks or if you need to revoke something, it’s much easier to replace that link in the chain. You don’t have to replace the whole chain, right? So say, for example, we have two issuer certificates. One of them expires, well, we just swap the certificate out. We put a new one in, and that will automatically generate new certificates for our leaves, or applications.

So I hope that makes sense, but the more hierarchical trust is, the more robustness you have in your system. But then of course, if you introduce too much complexity, things can also go south. So with Linkerd in our operational model, we found that having three layers of trust generally works out really well, both in theory and in practice.
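To make the three layers concrete, here is a purely structural Python sketch (no real cryptography) of one trust anchor, one issuer per cluster, and leaf certificates under each issuer. The cluster and application names are hypothetical, loosely modelled on the two-cluster diagram described above.

```python
# Who signed whom. Names are made up for illustration.
SIGNED_BY = {
    "issuer-cluster-a": "trust-anchor",
    "issuer-cluster-b": "trust-anchor",
    "app-1.cluster-a": "issuer-cluster-a",
    "app-2.cluster-a": "issuer-cluster-a",
    "gateway.cluster-b": "issuer-cluster-b",
    "app-3.cluster-b": "issuer-cluster-b",
}


def chains_to(name, anchor, signers):
    """Follow issuers upward and report whether we end at the given anchor."""
    seen = set()
    while name in signers and name not in seen:
        seen.add(name)
        name = signers[name]
    return name == anchor


def affected_by_rotation(rotated, signers):
    """Certificates that sit below `rotated` and would need to be reissued."""
    affected = []
    for name in signers:
        current = name
        while current in signers:
            current = signers[current]
            if current == rotated:
                affected.append(name)
                break
    return sorted(affected)


# Leaves in different clusters share a root, so they can validate each other.
print(chains_to("app-2.cluster-a", "trust-anchor", SIGNED_BY))
print(chains_to("gateway.cluster-b", "trust-anchor", SIGNED_BY))

# Swapping one issuer only touches that cluster's leaves.
print(affected_by_rotation("issuer-cluster-a", SIGNED_BY))
# Swapping the trust anchor touches everything.
print(affected_by_rotation("trust-anchor", SIGNED_BY))
```

The contrast between the last two calls is the flexibility argument: an issuer is one replaceable link, while the trust anchor underpins the whole chain.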

But then, trying to, oh, nevermind. I see a question from Miguel. Miguel, I’m going to answer that really soon. I’m glad you asked because that’s another question that I often get. So how often should we rotate our certificates is the first question, and how to pull in a certificate from an enterprise PKI is another one that I get.

Rotating certificates

So I’m going to talk about that in the, no need to apologize. I’m going to talk about that in the hands-on section. But with rotation, generally, it’s very subjective, right? So I’m not a security expert, I’m a maintainer on an open source project. And I know a little bit about security after burning through RFCs and lots of cups of coffee. But what I do know is that when it comes to certificate rotation, it generally really depends on how your system works, how your policies around security are drafted, and what your threat model looks like.

Generally, shorter validity periods for certificates, so rotating certificates more often, is definitely better for security. But it also introduces the complexity of having to rotate stuff and having to bootstrap new certificates, so it’s a bit of a trade-off. But shorter periods mean that if you have a long-running window, if your certificate is compromised, then it’s not going to be compromised for a long time. If your certificate is expiring in 10 years, and it’s been compromised, then somebody can impersonate that application, that certificate, for 10 years. You can have them all without knowing that you have had them all for a very long time. Reestablishing the identity when you have shorter periods of time before a certificate needs to be rotated, also greatly reduces the risks of the certificate being pulled out, or being used by someone else.
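The trade-off above is simple arithmetic: a stolen certificate stays usable until it expires. A minimal sketch, assuming a hypothetical issue date and a compromise 12 hours after issuance:

```python
from datetime import datetime, timedelta


def exposure_window(issued_at, validity, compromised_at):
    """How long a stolen certificate remains valid after the compromise."""
    not_after = issued_at + validity
    return max(not_after - compromised_at, timedelta(0))


issued = datetime(2022, 3, 17)
stolen = issued + timedelta(hours=12)

short_lived = exposure_window(issued, timedelta(hours=24), stolen)
long_lived = exposure_window(issued, timedelta(days=3650), stolen)

print("24-hour certificate:", short_lived)
print("10-year certificate:", long_lived)
```

This ignores revocation entirely; it only shows how the validity period bounds the damage when nobody notices the compromise.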

And finally, this is kind of a practical thing, but the more you rotate your certificates, the more confidence you get into your system, and into your practices. So that’s also something important, because all of this time we’re talking about the theoretical nuances of security in the system and stuff, but we also need to talk about the human aspect of it, right? If we do this more often, then we get more practice doing it, and we sort of minimize the risk of something going wrong when we do need to rotate certificates. But obviously, if you leave it on there for longer, then you can experiment with your work more, and just not focus on your platform as much.

Time to roll up your sleeves

Cool. That was a lot to say. Any final questions before we get hands-on with certificates? And there are two things here that I’m going to say before I take questions. First of all, I think this might be shared on Slack. Clone this repository, it’s going to be way easier for us to work through it together. So that’s one. And two, we are going to use step to generate certificates. So step is a CLI tool that’s going to help us sort of interact with certificates, and generate them. I can post a link on Slack for the binaries if people don’t have them. I’ll wait a little while, while I answer questions, for people to get set up, so start a k3d cluster, get step installed, and clone the repository. And Charles, if we have any questions. I’m going to move over to my terminal.

Charles: Yeah. Nothing at the moment. I thought of one and then got distracted posting messages into Slack. But if I can remember it, I will pose the question to you. But yeah, I’m excited about this demo. I think it is important to dive into reading through these certificates, and understanding how they interact with the entire system. So… Oh, you know what? I remembered it was more of a comment than a question. By default, you mentioned that rotating certificates frequently is a form of… Well, I interpret it as a form of good security. And by default, Linkerd rotates the certificates for the proxies every 24 hours.

Matei: Yeah, that’s right.

Charles: What are your… Is that, just sticking with the default? Do you have any opinions on whether…

Proxy certificates

Matei: So, with the proxies, that’s sort of something that they do inherently, that’s what they’re programmed to do. And the reason why we do it, and, well, why we do the specific rotation so often, is because we want to minimize the risk of them being compromised. It’s also sort of an inexpensive operation for us to do. So, if people are curious, generally, what the proxies do is, when they first start up, they establish a connection to the identity service, Linkerd ships with a control plane, they establish a connection to the identity service, and they send a certificate signing request. They basically get their certificate signed, they send a time-bound token as well. It’s time-bound now, I think. Or at least, with the new service account tokens, it kind of improves security.

And then, every 24 hours, they just generate a new certificate, so that the private keys don’t somehow get extracted, or they don’t get impersonated. So it is a form of good security, we’re very security conscious. But there’s also a difference between the leaf certificates that proxies use and intermediate certificates, and certificate authorities. So, what we do in the proxy doesn’t necessarily translate to what you would do with your CA, for example. Because we’re very low down in the chain with the proxies, so we can do this operation. But if you would rotate your trust anchor every 24 hours, I mean, that would be great. It means you really have a low risk of it being taken out, or being messed with, but it’s also a pretty expensive operation. There’s a lot of stuff that you have to roll out, there’s a bit of downtime involved.
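The flow described above (send a signing request plus a token, get back a short-lived certificate, repeat roughly daily) can be sketched as follows. This is not Linkerd’s actual API: every class, field, and function name here is hypothetical, the renewal margin is an assumption, and no real cryptography is performed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LEAF_LIFETIME = timedelta(hours=24)  # default mentioned in the talk


@dataclass
class SigningRequest:
    name: str                   # identity the proxy is asking for
    public_key: str             # placeholder for the proxy's public key
    token_expires_at: datetime  # the time-bound token's expiry


@dataclass
class LeafCertificate:
    name: str
    public_key: str
    not_before: datetime
    not_after: datetime


class IdentityService:
    """Stand-in for the control plane component that signs leaf certificates."""

    def issue(self, request: SigningRequest, now: datetime) -> LeafCertificate:
        if request.token_expires_at <= now:
            raise PermissionError("token has expired")
        return LeafCertificate(request.name, request.public_key,
                               now, now + LEAF_LIFETIME)


def needs_renewal(cert, now, margin=timedelta(hours=1)):
    """Renew slightly before expiry; the one-hour margin is an assumption."""
    return now >= cert.not_after - margin


service = IdentityService()
start = datetime(2022, 3, 17, 9, 0)
request = SigningRequest("web.default", "pubkey-1", start + timedelta(hours=1))
cert = service.issue(request, start)

print(needs_renewal(cert, start + timedelta(hours=2)))   # still fresh
print(needs_renewal(cert, start + timedelta(hours=23)))  # time to rotate
```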

Charles: Gotcha.

Matei: As with everything, always use common sense and good judgment. But there isn’t a one-step solution for everything. It’s all depending on your context, and your requirements, and so on, and so forth.

Charles: Gotcha. And I do apologize, I realize that I added that extra layer of the proxy certificates when today we’re talking about the root certificates and the intermediate certificates, which the proxy certificates are derived from. So, I will try to avoid confusing folks, and keep things as clear as you’re already making them.

Matei: So I seem to have misplaced the README. Have I? Yeah, so, let’s see. It should be there. I don’t know what happened. Anyway, do people have this README, can you access it, are you in the directory? So once you clone the Service Mesh Academy repository that I linked on Slack, you’re going to just cd into it.

Well, I’m kind of failing at this so far, but bear with me. And then, we’re going to go to Linkerd certificate management, cd into that directory. And over here, you should see a directory for binaries, a directory for some manifests that we’re going to use to show how to automate certificate rotation with cert-manager, and how you can pull in a certificate already from Vault. And finally, your README file.

So the README file is going to have all of the steps that we’re going to do, and most of the stuff that we’re going to talk through. This is so that if you get lost, or if for some reason you have to drop out, or whatever, you can come back to it and follow these exact same steps, to reach the exact same conclusion. So most of these will explain what we’re doing. I’m going to just be going through them as they are in the README. And let me know if I need to increase my font, or if it’s all right as it is.

But yeah, I’m going to wait just two more minutes, I think we have plenty of time to go through all of this. I’m going to wait for two more minutes. Once people are sort of up and running with k3d and with step, once you have step installed, we can carry on.

And I’m going to create a cluster. Well, one already exists. So I’m going to delete my clusters, and then I’m going to recreate them.

Charles: Yeah, so I think this ties into Adriel’s question that he asked earlier, whether… Oh, and we have a comment, “One or two clicks larger would be nice in terms of the font size.” Adriel’s question was…

It looks good to me. I also have a decent-sized monitor, so. But yeah, Adriel asked if Emojivoto with Linkerd was sufficient for this. And you’ve mentioned a couple of pieces already, the step-CLI and cert-manager, that we’ll be using, in addition to other parts.

Matei: So, for cert-manager, we’re going to have manifests. The only thing that you really need is to have the step-CLI installed. I didn’t really have time to get a binary together, but for k3d, if you don’t have it locally, and you need to spin up a cluster, there’s a script in the binaries directory that you can use. So for example, I can go to k3d, cluster, create, random. Well, apparently not, but I’m going to fix that script as well. But not now, it’s beside the point. Anyway, are we ready to sort of get started?

Charles: Yeah, let’s do it.

Demo time

Matei: All right, cool. So the first thing that we’re going to do is generate a certificate. So I have all of the steps here. First of all, we’re going to experiment with step to create a certificate, we’re going to be looking a little bit at the certificate, see how we can inspect it, see what it looks like, and so on. The second step will be to install Linkerd with a self-signed CA and an issuer. And we’re going to pass in some really long expiry dates here, just to sort of emulate what you would do in a staging environment. So, whenever you have to install Linkerd, you have to think a bit through your requirements: am I going to be working in a staging environment, where maybe security isn’t really the priority, and what really matters is that we get up and running, and we don’t have to do a lot of maintenance on it?

Or, if it’s a production environment, it’s the complete opposite. Well, not really with maintenance, because you have just as much maintenance to do. But you want to be security conscious, and you want to make sure that this is updated as soon as possible. So, for staging environments, we’re going to use a really long expiry date, just to show what that would look like when you don’t want to rotate as often. And then for our production environment, we’re going to create some really short validity-period certificates and install Linkerd with them. We’re going to rotate the issuer and the CA in this sort of production environment. And then, finally, we’re going to reinstall Linkerd using a CA managed by cert-manager.

Self-signed certificates and CA

So, the first three steps have us doing a bunch of work using self-signed certificates and a self-signed CA that we’re keeping off the cluster. And with cert-manager, we’re going to see how to do all of that automatically, from bootstrapping your CA, to distributing and installing Linkerd without passing any sort of arguments to it. And this is really helpful for people who have a GitOps flow, where you don’t really want to have this manual step of generating a certificate and passing it in through the args, you just want to sort of have a certificate in your cluster and make use of it.

Cool. So the first thing that I want to show you is the step command. So step is a tool that’s used to generate certificates, inspect certificates, and so on. If you, for whatever reason, get lost, just do step --help. As you can see, it’s plumbing for distributed systems. And we’re going to work a lot with the certificate command. So most of the stuff that we’re doing today is step-certificate-based.

Creating a certificate and private key

Cool. So first we’re going to create a certificate. So you can see here, like I said, when in doubt, just do --help. But creating a certificate is pretty easy. We’re just going to type in step certificate create, and we’re going to pass in the subject. And remember, the subject is the name of the entity that’s going to hold the certificate. We’re going to pass the name of the output file where the certificate will be written to, and also the name of the file the key is going to be written to.

So what step certificate create does, basically, is it creates a public key here. It will bind the name to the public key because this one will be self-signed. And this binding will basically be done by the signature. But because it’s self-signed, I mean, there isn’t really more explaining to do there. But anyway, we’ll see in a second. So for example, I’m going to write a certificate. You need to pass in this .crt extension. Going to write the certificate, going to say the name is Matei, you can put whatever name you want in there. And then, we’re also going to put in the key output, so crt key, .crt.

We are going to then set a profile. So the profile is the type of certificate that we have. I forgot to mention this a bit earlier on. But basically, the profile means: is this a CA, is this an intermediate certificate, is this a leaf certificate? And so on, and so forth. For this one, we’re going to say root CA, and then we’re going to pass in another argument, which is no-password. And no-password means we’re not going to have a password to encrypt our private key with. This is something, if you’ve used SSH, you might know: with your private key you can have a passphrase to encrypt it with, so whenever you have to sign stuff, you have to also pass in your password. But we don’t want that, it just complicates things. So we’re going to pass in this no-password flag, and also insecure. We need to use this in order to use the no-password flag.

Cool. So this created a certificate and a private key. So ll, in my case, is just an alias for ls. So if you see me using this alias, it just means ls -la. And that’s just going to list all of the stuff that I have in my directory. So I’m going to do that often, just so I can show you how the file system changed after all of these operations. And we can see here that we have basically the key and the certificate. Think I messed up the key extension, but such is life when you do live demos. Anyway, let’s look at this certificate.
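For readers following along without the video, the command described above might look roughly like the sketch below. The subject `matei` comes from the talk; the file names `matei.crt` and `matei.key` are assumptions chosen for illustration (the demo used a different key file name).

```shell
# Create a self-signed certificate plus an unencrypted private key.
# "matei.crt" and "matei.key" are illustrative file names, not the ones from the demo.
step certificate create matei matei.crt matei.key \
  --profile root-ca \
  --no-password --insecure

# List the directory to confirm both files were written.
ls -la
```

The `--insecure` flag is required alongside `--no-password` because step otherwise refuses to write an unencrypted private key.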

Encoded certificates

And, if you’ve worked with certificates before, you probably expected this, but if we look at this, this is all bytes. We have no idea what this means, right? We just know that the certificate begins here, and it ends here, with these two equals signs. But there’s nothing more that we can infer based on this. That’s because certificates use an encoding. So most certificates that we deal with in web PKIs and internal PKIs are X.509-encoded, and this is basically what we see here. You can think of it as protobuf. You know how, basically, with protobuf you define your objects, and over the wire, they get serialized into bytes, or whatever. This is kind of the same concept. It’s not the same thing, but the same concept.

So in order to actually look at the certificate, and be able to tell what it means, and what it’s supposed to do, we can just inspect it using step. So if we type step certificate inspect, and we pass in our certificate, we can see all of the data. And you might notice that the data is pretty much the same as what we had in the slides.

So we have a version here and a serial number. We’re not necessarily concerned with this. We have a signature algorithm, which is used to compute the signature. No surprise there. We have an issuer. The issuer is ourselves, because we’re self-signed. We have a validity period, so this is when the certificate expires. Mine’s going to expire in 10 years, not bad. And also, it can’t be used before this date, so it’s the date that we just generated the certificate. We have a subject, and then we have the public key. All of the other information is just used to sort of add more metadata to the certificate, but it’s not something that we’re concerned with right now.
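As a minimal sketch of the inspection step, assuming the certificate was written to a file named `matei.crt` (an illustrative name, not necessarily the one used in the demo):

```shell
# Print the decoded certificate: version, serial number, signature algorithm,
# issuer, validity period, subject, and public key.
step certificate inspect matei.crt

# A more compact summary of the same certificate.
step certificate inspect matei.crt --short
```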

Are there any questions so far? Have you all managed to generate certificates and stuff? Can I get a quick confirmation on Slack or Zoom?

Charles: I don’t have any questions, but just as a comment, that step certificate inspect has been a really, really useful tool as I’ve worked with folks in the Linkerd community, and with Buoyant customers, just to understand what’s going on with certificates. And it can be used to inspect any certificate that’s generated. So if you’re using OpenSSL, or if you’ve already got a certificate somewhere, you can use that inspect command to take a look at it, and see those same details that Matei just showed in his terminal.

Matei: Yeah, and the certificate can be generated, or the public key can be generated, with any algorithm, it’s a really flexible tool. This step certificate thing is a bit like a Swiss army knife. And I like it because it’s a bit more intuitive than using OpenSSL. But anyway, if this all made sense to everybody up until now, we’re going to be moving on to how to generate a self-signed certificate authority, and an issuer.

Generating a self-signed certificate authority and issuer

So, with Linkerd, we mentioned that we have hierarchical trust in this operational model that we have, right? So we have different layers of trust. And in order to achieve this, we need to generate a certificate authority, whose key we’re going to be keeping private on our computer. And then, we are going to generate an issuer certificate that’s going to be used to sign other certificates.

So you can go ahead and work through this. And you don’t have to wait on me to go through all of this stuff, so if you’re feeling confident in your step commands, you can give this a try. But what I wanted to say is that in order to make this a bit more interesting, we’re going to experiment with changing the validity periods for our certificates. So I mentioned at the start of the sort of hands-on phase that, depending on what environment you use, you might want to use longer-lived certificates. So for example, if you have a staging environment, you might want to just set the expiry date to be 20 years for the CA, 10 years for the issuer certificate, and then just sort of be done with it and not have to worry about it, unless 10 years pass, which they might, if you’re still working the same job, and working on the same system.

So yeah, one thing that you need to know is that, in order to pass this expiry date, we can use a duration. So we can express it in seconds, minutes, and hours. Or, we can pass in the time. But the time must be RFC 3339 compliant. So this looks a bit like this. If you’re on a Unix system, you can basically get it using the date command. So it’s the calendar date today, the time, and then the time zone. It’s going to make sense in a second. It’s going to be a bit harder to type these commands out at first, but you’re going to get used to it in a day or two. Well, not even. Just give it a few tries, and you’re going to get used to it, is what I wanted to say.

Anyway, if you have more questions about this date command or dates in general, you can just do man date, and you’re going to be able to see all of the stuff there. So yeah, we can definitely share the repo, I’m just looking over Zoom. And while I do that, I have a question for the audience. Does everyone, well, does anyone know why, when we generate our CA and issuer, we should have the CA be longer-lived than the issuer? What happens if, for example, the CA expires in a year, and the issuer expires in two years, would that be a problem? I know nobody can speak to me, but I’m hoping one of you will answer this on Slack or Zoom.
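To make the two formats concrete, here is a small sketch that prints the current time as an RFC 3339 timestamp with the standard `date` command. The variable name `now` is an illustrative choice, and the sample values in the comments are examples only.

```shell
# Current UTC time as an RFC 3339 timestamp: calendar date, "T", time, then the zone.
# "Z" is the RFC 3339 shorthand for a +00:00 offset.
now="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
echo "$now"

# An expiry can then be written in either form, for example:
#   an absolute RFC 3339 time such as 2060-01-01T16:00:00+00:00
#   a duration relative to now, such as 8760h (units like s, m, and h)
```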

What happens if the CA expires in a year and the issuer in two?

I’ll leave the question there. I hope someone can take it on. But I think it would really help sort of solidify some knowledge if we make this… “No root certificate left if it expires.” Well, sort of. Yeah, exactly. So the issuer will stop working when the upstream trust chain breaks. Because if the CA basically expires before the issuer, then how are we going to do the whole certificate-validation path? So all of the answers were sort of correct. It’s a combination of everything. Thank you for being interactive, it helps me see exactly how much you’re understanding based on what I’m saying. It stops me from rambling too much.

Generating trust anchors

Anyway, let’s create the certificate. So what I’m going to do first, I’m going to remove all of the stuff that I just generated, and we’re going to first generate our trust anchors. Okay? So we’re going to be passing in this subject name. So root.linkerd.cluster.local, where cluster.local is going to be our cluster domain. So this will basically coincide with the service account tokens that we use for identity in Linkerd. We’re going to call this ca.crt, ca.key. I missed a format before. Then, we’re going to say, for a profile, this is a root certificate authority, we don’t want any password, and we want it to be insecure.

And finally, we’ll say that this certificate will not work after this date. So I’m going to pass in something super long, 2060, let’s hope we won’t make it until then. And I’m just going to have it expire at four o’clock GMT, because that’s my time zone. Invalid value. Okay, I think I have too many zeros there. There we go. And over here, a plus.

Okay, it seems this is still failing. It’s much harder to do all of this stuff with people watching you. Let’s see. So, okay, I see where I went wrong. Okay, this’ll work.

Cool. So what I did here is generate a certificate authority. This is going to be our trust anchor. So we can verify it’s a trust anchor because it’s been signed by itself. It’s really easy, it has this signature, it’s signed itself, and a public key.
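The trust anchor command described above could be sketched as follows. The talk only mentions the year 2060 and a four o’clock GMT expiry, so the exact month and day in the timestamp are assumptions for illustration.

```shell
# Generate a long-lived, self-signed trust anchor (root CA).
# The exact date is hypothetical; the demo only specified the year 2060 at 16:00 GMT.
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca \
  --no-password --insecure \
  --not-after 2060-01-01T16:00:00+00:00

# Confirm that issuer and subject match, which is what makes it self-signed.
step certificate inspect ca.crt --short
```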

Generating the issuer

Now, let’s generate an issuer. So for the issuer, we generally use a different subject name. In Linkerd we use identity, because the issuer certificate is mainly used by the identity service. And then it’s used by everything else to sort of do certificate validation, but it’s mainly used by the identity service, so we’ll just name that identity, and we will make sure we save the key and the certificate. The profile will be an intermediate certificate authority, which means it will not be signed by itself. And you might have guessed it, but because it’s not signed by itself, it needs to be signed by a certificate authority, it needs to be signed by someone else. And in this case, it’ll be signed by the CA certificate we just created. So we’re going to pass in the CA key and the actual CA certificate. No help topic. Oh. Move. Bear with me. And then, in the end, we can also pass in a different expiry date for this one, if we want to. I think by default, certificates are generated with a 10-year expiry date.

But we can go even further than that and say, this will expire in 2050. So a long time from now. And then we’re going to say that it expires at 4:00 just like our CA, and we’re going to pass in the time zone. And again, I want to sort of drill this mechanic of inspecting your certificates. If we inspect our certificate, we’ll see that we have the subject name and the issuer is our root certificate.
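A sketch of the issuer command follows. The subject `identity.linkerd.cluster.local` and the file names `identity.crt` and `identity.key` are assumptions based on the naming used in the talk and in Linkerd’s documentation; the month and day of the expiry are hypothetical, since only the year 2050 was mentioned.

```shell
# Generate an intermediate CA (the issuer), signed by the trust anchor created earlier.
# ca.crt and ca.key are the root certificate and its private key.
step certificate create identity.linkerd.cluster.local identity.crt identity.key \
  --profile intermediate-ca \
  --ca ca.crt --ca-key ca.key \
  --no-password --insecure \
  --not-after 2050-01-01T16:00:00+00:00

# The issuer field should now show the root's subject rather than the issuer's own.
step certificate inspect identity.crt --short

# Optionally check that the issuer chains back to the trust anchor.
step certificate verify identity.crt --roots ca.crt
```

Note that the issuer’s expiry (2050) falls before the trust anchor’s (2060), which matches the ordering discussed in the audience question.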

Charles: Cool. We’ve got a question. Sorry to interrupt you.

Matei: That’s okay.

Updating root and intermediate certs

Charles: So once the root certificate expires, I would need to update the root and all intermediate certs. How do you automate that?

Matei: Yeah, that’s a good question. So generally you either automate it through scripts, if you trust bash. And if that’s too much work, then certain tools like cert-manager can definitely make this a bit easier. And we’re going to cover a bit on how to automatically provision certificates. We won’t cover automatic rotation but it’s kind of in the same vein. You use cert-manager to sort of create your certificates for you. Cert-manager will be responsible for renewing the certificates and reissuing new certificates based on the declarative config that you put in. So everything that we’re doing now in these three steps that we’re tackling at first is manual work. How can I manually start my own internal PKI to work with Linkerd? The last step will be on how to integrate with an existing PKI so that everything is done automatically and declaratively.

So more of a GitOps-style deployment and bootstrapping. I hope that answers the question, but stick around until the end and we’ll talk about cert-manager, and I’m actually kind of excited to talk about it. It’s really interesting and we discovered a new tool that works well with cert-manager, so you can install Linkerd without passing any args. And I’m glad it helps.

Okay. Anyway, I left some questions here. I noticed the additional flags. Why do we need the additional flags? So the additional flags mean that we pass in the CA and CA key. I sort of went through that and you’ve all been nice enough to answer this question, is it necessary for the trust anchor to expire after the issuer?

Installing Linkerd

Cool. So now we can basically install Linkerd. Yeah. Why don’t I just do that right now? So let’s get all of the nodes. I’m going to use kubectl. I use a k alias but I’m going to, yeah, help you. At least, I don’t think I did. It’s not going to take too long.

Anyway, I’m going to use this K alias from time to time. I’m trying not to use it too much but if you see it, just know that means kubectl. I mean, you can pronounce it as kubecontrol but kubectl, it just rolls off the tongue. Anyway, we can start the debate a bit later on after the workshop along with tabs versus spaces, but for now, let’s go back to certificates.

Okay. So let’s install Linkerd. So kubectl get nodes. We have one node there, it’s our control plane and our main node. It’s just k3d. And we’re going to install Linkerd. We’re going to pass a few flags. So the first flag that we need to pass when we install Linkerd is the trust anchors file. So if you have experience with Linkerd but you never had to sort of pass these things, then it means you relied on Linkerd to generate the certificates for you.

And Charles sort of alluded to this at the start of the workshop. When you want more control over your environment and when you want it to be long-lived and you want it to be more robust from an operational point of view, it’s definitely a good idea to supply your own certificates. So that’s how we sort of do it from the command line. You can also do it with Helm. I think the flags might be slightly different. You can look over our documentation for the exact flags. But basically, the first thing that we do is we pass in that root CA certificate. Right? And we use this identity trust anchors file flag to do that. Remember that the private key for the certificate authority in our case will stay offline because we manually bootstrapped these certificates. We’re going to pass in the issuer certificate file. And then we’re going to pass in… That’s the issuer key file.

The issuer

So with the issuer, we want the private key to actually exist on the cluster because we’re going to use that to sign other certificates, the proxy certificates for chain validation. So that sort of needs to be in there. And then we’re going to just apply this. We can see that everything is created.
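Put together, the install described above might look like the sketch below. It assumes the root certificate is in `ca.crt` and the issuer pair is in `identity.crt` and `identity.key`; those file names are illustrative.

```shell
# Render the Linkerd manifests with our own trust anchor and issuer, then apply them.
# Only the root *certificate* is passed in; the root private key stays off the cluster.
linkerd install \
  --identity-trust-anchors-file ca.crt \
  --identity-issuer-certificate-file identity.crt \
  --identity-issuer-key-file identity.key \
  | kubectl apply -f -

# Watch the control plane pods come up.
kubectl get pods -n linkerd
```

Newer Linkerd releases split CRD installation into a separate `linkerd install --crds` step that runs first, so it is worth checking the documentation for the version in use.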

Cool. If we get all of the pods in the Linkerd namespace, we can see that they’re all initializing still. But what I want to show you is, if we do kubectl get secrets. Now I feel very conscious about saying kubectl. Thank you, everyone. In the Linkerd namespace, we’re going to notice that first of all, we have an identity issuer certificate. And if we have a closer look at the secret, oops.

If I output it as YAML, we’ll notice that we basically have this certificate and this key. So basically when we pass in our certificate and key files for the identity issuer, the CLI will generate a secret manifest where our key and the certificate are going to be base64 encoded. These are going to be mounted to the identity issuer pod and the identity issuer pod will use them to sign requests for the proxies.

So basically the certificate that you just used lives here. And then for the trust anchors, because we do not have a private key, we have a config map. And that’s going to be in a different namespace, and it’s going to be called… No, sorry, not a different namespace, a different name. But basically, this is going to be in our Linkerd namespace, it’s going to be a config map, and it’s called identity trust roots.

So if we output this as YAML in the Linkerd namespace. So many things that you have to type in there. We can see that this sort of closely resembles what we see when we just open up the file in our shell. It’s just a certificate and it’s X.509 encoded. We don’t need it to be base64 encoded. So someone has a question on Zoom. I realized that for people sort of watching the recording it might not be super intuitive when we go for the question. So I’m just going to repeat the question. “I already installed Linkerd on the cluster beforehand. Would changing the certificates to the current ones be equivalent to rotating them?” Yeah, pretty much.

So if you do a Linkerd upgrade now, I think you can’t do an install but you can do an upgrade. If you do an upgrade and pass your new certificates, that is essentially rotating your certificates. But there’s a bit of a caveat to this and we’ll see it in a little bit. When you rotate your certificates, you want to do it with no downtime if possible.
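The two objects described above can be inspected with commands along these lines. The resource names `linkerd-identity-issuer` and `linkerd-identity-trust-roots` are the ones a CLI install normally creates.

```shell
# List the secrets in the linkerd namespace.
kubectl get secrets -n linkerd

# The issuer certificate and key live in a Secret, base64 encoded.
kubectl get secret linkerd-identity-issuer -n linkerd -o yaml

# The trust anchors have no private key on the cluster, so they live in a ConfigMap
# as plain PEM text.
kubectl get configmap linkerd-identity-trust-roots -n linkerd -o yaml
```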

So that’s the only sort of difference. If you do it now, just upgrading, you’re going to have to restart your Linkerd workloads. If you do it, well in the way that we’re going to do it shortly, you’ll see that you can sort of limit the downtime that you have. Anyway, we’ll get there in a second but that’s pretty much it. Yeah. Okay, cool.

Generating self-signed certificates

So the next step. Step number three is generating self-signed certificates for the security-conscious operator. And I already sort of went through this. Why do we want shorter validity periods? It’s because they make our environment more secure and robust. Key compromises might happen more often than you think. So that’s something that people sort of ask us from time to time. Why do I need mTLS? Why do I need to be so secure about my environment?

Well, the truth is you never know when your key’s going to get compromised. You could, for example, pull an image off a bad Docker repository or pull the wrong image, and suddenly someone has access to your network. That’s not unheard of. It has happened quite a few times. So in these situations, it’s better to be paranoid than to be very relaxed and make the assumption that your system will not be broken, because it can definitely be. So we saw how we can generate some certificates with a long validity period and now we’re going to do the exact opposite.

So if you’re feeling confident enough, feel free to sort of just breeze through and generate your own certificate. We’re going to do the same thing as before. We’re going to generate a root certificate. We’re going to call it, again, CA for the output file. We’re going to pass in a root CA profile, no password, insecure, and then we’re going to do this in hours now. So we’re going to say 8700 hours, I think that might be a year. And we would like to overwrite these, yes.

And now we can inspect our root. Oops, it’s CA. We can inspect our root. And if we look at the expiry date, this is going to expire in one year. Not even one year, it’s less than a year, I guess by two days. And then finally, we can also create an identity, well, sort of issuer certificate. Okay. And we’re going to pass in a profile, intermediate certificate authority. We’re going to pass in the key and certificate for the CA so it can be signed, no password, insecure. And we’re going to put 170 hours. And I forgot to type in certificate create again.
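As a quick sanity check on those durations, the hour values quoted in the talk can be converted to days with shell arithmetic. The variable names are illustrative.

```shell
# Durations quoted in the demo, in hours.
root_hours=8700
issuer_hours=170

# A full 365-day year, for comparison.
year_hours=$(( 365 * 24 ))

# Integer days plus leftover hours for each certificate.
root_days=$(( root_hours / 24 ))
root_extra_hours=$(( root_hours % 24 ))
issuer_days=$(( issuer_hours / 24 ))
issuer_extra_hours=$(( issuer_hours % 24 ))

echo "one year:     ${year_hours}h"
echo "trust anchor: ${root_days}d ${root_extra_hours}h"
echo "issuer:       ${issuer_days}d ${issuer_extra_hours}h"
```

So 8700 hours is 362 days and 12 hours, a couple of days short of a year, and 170 hours is just over one week.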

And hide those everywhere. We would like to overwrite this. Yes. And now if we inspect this identity certificate, we can see that it will expire in one week. So for people who are security conscious, this is sort of the recommendation that we have. Keep your issuer’s validity period as short as possible. One week is definitely super short. You’re going to have to rotate a lot, but it’s the best practice if you want to be super secure. If you’re dealing with compliance and auditing, it makes it easier to see this audit trail.

If that’s too much and it’ll just put too much of a maintenance burden on your production environment, then it’s also good to set it to one month, two months, three months, even a year, but definitely do not surpass a year. I think a year is sensible but definitely try to go as short as you can always. And for our certificate authority, because the private key is kept off cluster, we’re not as worried. And generally, when you want to rotate your root, it’s a bit more of an involved process. So we think maybe one year is sensible here. I think personally that one year is sensible here.
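The short-lived variant might be sketched as below, using the hour values from the talk. File names and subjects are the same illustrative ones as before. The `--force` flag is an addition not used in the demo, where the overwrite prompt was answered interactively.

```shell
# Short-lived trust anchor: 8700 hours, slightly under one year.
# --force overwrites existing files without prompting, so keep backups of old keys.
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca \
  --no-password --insecure \
  --not-after 8700h --force

# Short-lived issuer: 170 hours, roughly one week, signed by the trust anchor above.
step certificate create identity.linkerd.cluster.local identity.crt identity.key \
  --profile intermediate-ca \
  --ca ca.crt --ca-key ca.key \
  --no-password --insecure \
  --not-after 170h --force

# Check the resulting validity periods.
step certificate inspect ca.crt --short
step certificate inspect identity.crt --short
```

Overwriting `ca.key` in place discards the previous root key for good, which is the situation the demo runs into later, so writing to new file names is the safer habit.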

Cool. So now that we went through all of this, do we have any questions? Are people still keeping up?

Charles: There are no new questions. Though I just want to point out that you’ve gone through and you’ve created the root certificate and the issuer certificate a couple of times. And those were just examples for showing how to create certificates with different dates, different durations. Normally when you would go through this process, you would create one root CA, one issuer certificate, and then that would lead into, I think what you’re about to talk about, which is the rotation and management of those.

Certificate rotation

Matei: Yeah. So I’m going to talk about rotation. And I see that people have already caught on that Linkerd check is going to pretty much scream at you if the certificate’s going to expire within 90 days. And I’m actually super glad that people have already caught on and they started experimenting with Linkerd check. Linkerd check, in general, is not going to error out if your certificate… Well, mine doesn’t expire in 90 or 60 days, but it’s not going to error out. It’s just going to pretty much scream at you and give you a warning, “Hey, within 90 days, you need to rotate this.” So I think it’s good to be mindful of it. Generally, if you want to get rid of the warnings, you can just put in 90 days or whatever for the validity period.

But I think it’s useful when it screams at you because it just makes you sort of realize that, “Okay. Within the next month, I need to rotate my certificates.” And again, with certificate rotation, you want to be in a place where you can rotate your certificates before they expire. Because once they expire, the whole process becomes more complicated and there’s no guarantee that you can do it without downtime. So yeah. Really happy most of you have caught onto that. I also caught onto it when I was writing all of this, but I think it’s good to have a warning. If it’s annoying, then you can always just keep within that timeframe or over that timeframe of 90 days. Anyway, let’s see. So we’re going to go into rotating your certificates manually. So this is a bit of a headache and I know a lot of people have asked us questions about this.

It’s unsurprising that it’s a headache. It’s not an easy process, but we’re going to do it in such a way that we don’t have any downtime. And hopefully, you’re going to see how going through these steps is more intimidating in theory than in practice. So I’m going to rotate the certificates that I initially generated. Those that have a 10, 20, 30 year validity period, but that does not matter. As long as the certificates are not expired, then we can rotate them safely this way. So the first thing that I’m going to do is I’m going to create a new certificate, and I’m not going to overwrite the certificate that I had before. So I’m going to give it a different name this time. We’re going to say this is the new CA. This new CA will have a new CA key. Again, we have to type in the profile. No password and insecure. And I think going through these commands, you can also sort of get a feel for how you would automate this with bash or something similar.

Charles: You forgot to create.

Matei: Yeah. I forgot to create. See a bit of a theme here. Cool. So, what I’m left with now, so I’ll do ls -la. I have my old certificate authority. Well, I think I messed up because I’ve overwritten that already. And I have my new certificate authority. And just because I no longer have my old certificate authority, I’ll show you what you can do when that happens. So say your certificates are due to expire. You don’t have access to the CA certificate on your local machine. You used a different machine to bootstrap it. It’s just not on your file system. And it wasn’t checked into GitHub or anything else. So what I’m going to do is I’m going to get this config map and I’m going to sort of copy this bundle. So if I’m just going to do old CA, I’m just going to go into… I think this should do the trick. Let’s see if we can inspect it.

Matei: If we can inspect, and yeah. This will basically let me view my old certificate. Now, unfortunately, I’ve also committed the mistake of overwriting my key. And if you overwrite your key, you’re pretty much done with it. So I definitely do not recommend replacing your key. So I’m going to do this. I’m just going to install Linkerd with my old generated CA. So basically just to sort of recap, I, unfortunately, have overwritten my CA, the CA that I installed Linkerd with initially. And because of that, rotating is not really possible because we no longer sort of have access to that. I think it would’ve been good either way. But just for the sake of being consistent here, sorry if I’m confusing anyone.
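One way to pull the existing trust anchor certificate back out of the cluster, instead of copying it by hand from the YAML output, is sketched below. The output file name `old-ca.crt` is an assumption; the config map name and its `ca-bundle.crt` key are what Linkerd normally uses.

```shell
# Extract the current trust anchor bundle from the ConfigMap into a local file.
# "old-ca.crt" is an illustrative file name.
kubectl -n linkerd get cm linkerd-identity-trust-roots \
  -o=jsonpath='{.data.ca-bundle\.crt}' > old-ca.crt

# Confirm that the recovered file is a readable certificate.
step certificate inspect old-ca.crt --short
```

This recovers only the certificate. The root private key was never stored on the cluster, so it cannot be recovered this way.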

Charles: This is a good example of how to do disaster recovery, everybody.

Matei: Yeah. Sort of. I mean, I guess the best thing is to not get into this situation. So don’t be like me. Save your private key somewhere else and be careful not to overwrite it. Learn from my mistakes. And definitely make sure to not do it during a live demo.

Matei: I’m just quickly going to verify that my Linkerd pods are sort of getting started and I’m just going to go through the process now. So I still have my new certificate authority. Right? I have my certificate authority and my identity certificate. So we want to rotate both: the trust anchor and the identity issuer. So I’ve generated a new certificate authority and now I’m going to create a new issuer certificate. It’s for local, going to call it new issuer. Well, new identity, just so we’re consistent with the names. New identity key. Profile, intermediate CA. I’m going to sign it with the new certificate authority by passing in the relevant files. No password, too much of a headache. And make it insecure. Now I’m also going to check if my Linkerd pods have started and they have. Cool. So now the first step in rotating is to generate the certificates. We have the certificates. Now, what do we do?
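The new trust anchor and new issuer described above could be generated roughly as follows. The file names `new-ca.*` and `new-identity.*` follow the naming used in the talk; the subjects are assumptions based on Linkerd’s usual conventions, and no expiry flags are shown because the talk does not specify them for this step.

```shell
# New trust anchor, written to new file names so the old files are left untouched.
step certificate create root.linkerd.cluster.local new-ca.crt new-ca.key \
  --profile root-ca \
  --no-password --insecure

# New issuer, signed by the new trust anchor.
step certificate create identity.linkerd.cluster.local new-identity.crt new-identity.key \
  --profile intermediate-ca \
  --ca new-ca.crt --ca-key new-ca.key \
  --no-password --insecure
```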

Upgrading Linkerd

Well, it’s kind of easy actually. We’re just going to upgrade Linkerd. The first thing that we want to do is well sort of put our all certificate already in a bundle. So let’s go through it. The certificate bundle is basically kind of like a bundle of flowers in real life. You just sort of put things together into the same document. So we’re going to use that to create a bundle. And the way we do this, we can see it here, is say certificate step, certificate bundle. And we’re going to bundle our new CA with the original CA and put it into a bundle certificate file type, or certificate file type named bundle. So now if we inspect the bundle, certificate inspect bundle, we’re going to basically see both of our CA’s are linked together.
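A sketch of the bundling step, assuming the new trust anchor is in `new-ca.crt` and the original one is in `old-ca.crt` (both file names are illustrative):

```shell
# Concatenate the new and old trust anchors into a single bundle file.
step certificate bundle new-ca.crt old-ca.crt bundle.crt

# Inspect every certificate in the bundle; two root certificates should be listed.
step certificate inspect bundle.crt --bundle --short
```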

So the first one is the one that we have just generated, and the second one is the old one or the other way around. The point is both of our CA’s are in the same bundle. Okay. So I see that. I have a question and I think we could maybe take a step back and redo all of this. So Anil says, not sure I understand if you override it, could you have just rotated with a new set of search and keys. Right? Right. Sort of. So the thing that we want to do first is create a new certificate authority, get access to our old certificate authority, and bundle both of these in the same file. We’re going to upgrade Linkerd first with the new CA bundle. And this means that basically all of the old workloads that we’ll have will do certificate validation again but when they do it, they’ll have both of the certificates in the bundle.

So whenever they get to the CA, they have both of the CAs available to use instead of just one. So that’s the first step. The second step will be to upgrade Linkerd again with the new identity issuer. And we need to have both the old and the new CA to ensure that all of the previous workloads that had certificates continue to work, because they can do the certificate validation with the old CA. And once we upgrade with the new issuer, we want to make sure that once workloads acquire new certificates, they can do validation with the new CA. Does that sort of make sense?

I hope it’s going to make a bit more sense once we deploy this. Okay. Cool. I’m glad. So Anil said it does make more sense. Yeah. It is kind of like a rollout, but we basically do it in two steps. Right? So just because we don’t want to restart the Linkerd control plane or rather not restart all of it, we sort of just do it incrementally first, put the CA in. Once the CA is in, we know the old way works and the new way will work. Then we put in the new issuer and now we are making sure that the new way works. And finally, we take the old way out. We take the old certificate out so we’re just left with the new certificates. So it’s a three-step process. A three-step rollout, let’s say. Okay. Well, if there are any more questions, please feel free to ask me.
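The reasoning behind the three-step rollout can be sketched as a toy model. This is an illustration only, under the simplifying assumption that each workload is represented by nothing more than the name of the CA its certificate chains to; the names `old-ca`, `new-ca`, and `all_valid` are hypothetical.

```python
def all_valid(trust_anchors: set[str], workload_signers: list[str]) -> bool:
    """True when every workload certificate chains to a trusted CA."""
    return all(signer in trust_anchors for signer in workload_signers)


# Each workload is represented only by the CA its certificate chains to.
workloads = ["old-ca", "old-ca", "old-ca"]

# Step 1: trust both CAs; existing certificates still validate.
anchors = {"old-ca", "new-ca"}
step1_ok = all_valid(anchors, workloads)

# Step 2: switch the issuer; workloads gradually pick up new certificates.
workloads = ["new-ca", "old-ca", "new-ca"]
step2_ok = all_valid(anchors, workloads)

# Step 3: once every workload has a new certificate, drop the old CA.
workloads = ["new-ca", "new-ca", "new-ca"]
anchors = {"new-ca"}
step3_ok = all_valid(anchors, workloads)

# Swapping the CA directly, without the bundle step, breaks old workloads.
direct_swap_ok = all_valid({"new-ca"}, ["old-ca", "old-ca"])

print(step1_ok, step2_ok, step3_ok, direct_swap_ok)
```

In this model every intermediate state validates, while the direct swap does not, which is the motivation for carrying both CAs in the bundle during the transition.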

Okay. Where were we? So we have our bundle. We’re just going to say linkerd upgrade, and we’re going to override the identity trust anchors. We’re going to override them with the bundle, and the bundle contains both of our certificates. So now Linkerd has the new CA and the old CA. We can see that the proxy injector and destination are re-rolling. So the identity service stays still, but the proxy injector and the destination service will roll out on their own because now they have a new CA in the environment.

Generating a new identity certificate

Cool. Everything works so far. So let’s see what the next steps are. Once we put it in the bundle, we generate the new identity certificate, and I believe we’ve already done that. We have the new identity certificate. And now we’re going to upgrade Linkerd again using the identity issuer certificate file and the new issuer private key. So now we can basically sign new proxy certificates using the new issuer instead of the old one. So my new issuer, oh, it’s because it’s a new identity.

I think I diverted a little bit from those readme files and changed the naming. So instead of a new issuer, it’s a new identity. No block type certificate. It’s because the key file is a key, not a certificate. So again, if we look at the pods, the proxy injector has just been rolled out and so has the destination service, because now they need to acquire new certificates. And you’ll see here that I put a bit of a caveat. We also need to restart the identity service so that it can use the new issuer instead of the old one. It generally reads from the config map every time it changes, but we just need to nudge it a little bit here. So I’m going to say, I want to restart this Linkerd identity deployment in the Linkerd namespace. And that’s going to roll itself out. And we can check that the identity service has successfully updated the issuer by looking for this issuer updated event. So what we’re going to do is kubectl. We’re going to get events.

We’re going to use a field selector, and the field is reason, and the reason must be equal to issuer updated. And apparently, that’s not found for one reason or another. So the issuer update failed because the certs could not be read from the disk: certificate signed by unknown. Well, that is a bit of a problem. So you see how manually rotating your certificates can get you into trouble. So what we should do now is get all of the pods. Everything seems to be running correctly. So let’s just check the Linkerd identity events, and this plays out much better when we actually have a failure. Can we get the events straight for the pod? We should be able to, or I guess not. We can describe the pod though. Nothing in the events so far. So let’s see. Maybe it’s the fact that we have it in a bundle that’s going to change everything.
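The check described here is a server-side filter, typically written as `kubectl -n linkerd get events --field-selector reason=IssuerUpdated`. As a rough sketch of the same filtering done client-side, the Python below filters JSON event output by its `reason` field. The sample data is hypothetical and heavily trimmed, and the helper name `events_with_reason` is an assumption made for illustration.

```python
import json


def events_with_reason(events_json: str, reason: str) -> list[dict]:
    """Filter a Kubernetes event list (as JSON text) by its `reason` field."""
    items = json.loads(events_json).get("items", [])
    return [event for event in items if event.get("reason") == reason]


# Hypothetical, trimmed stand-in for `kubectl -n linkerd get events -o json`.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"reason": "Scheduled", "message": "placeholder message"},
        {"reason": "IssuerUpdated", "message": "placeholder message"},
    ],
})

updated = events_with_reason(sample, "IssuerUpdated")
print(len(updated))
```

An empty result, as in the demo, does not by itself say why the update was not recorded; the other events in the namespace and the identity pod’s description are the next places to look.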

Trust anchors new CA

So we’re going to upgrade Linkerd again. And this time we’re going to just change the trust anchors to be the new certificate authority. So I think what was happening there, as we checked the events, is that when the identity service wanted to do certificate validation it found this bundle and couldn’t reload the certs, but that doesn’t mean that our current communication is not working. So we’re just going to upgrade this and check for the event again. You’ll see a little bit of a warning here: rotating the trust anchors will affect existing proxies. So something that you should also know: once you want to make the rotation official and you know that your changes have been picked up by the identity service, you’ll want to roll out your data plane so the proxies can acquire new certificates.
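Putting the upgrades from this section in order, the whole rotation can be sketched as a dry-run plan. The Python below only renders command strings and runs nothing. The file names (`bundle.crt`, `new-identity.crt`, `new-identity.key`, `new-ca.crt`) and the helper `linkerd_upgrade` are hypothetical, and the flag names are written as I understand the Linkerd CLI of that period, so they should be checked against `linkerd upgrade --help` for the version in use.

```python
import shlex


def linkerd_upgrade(**files: str) -> str:
    """Render `linkerd upgrade <flags> | kubectl apply -f -` as a string."""
    flags = [f"--{name.replace('_', '-')}={path}" for name, path in files.items()]
    return shlex.join(["linkerd", "upgrade", *flags]) + " | kubectl apply -f -"


plan = [
    # 1. Trust both the new and the old CA.
    linkerd_upgrade(identity_trust_anchors_file="bundle.crt"),
    # 2. Start issuing from the new issuer, then restart the identity service.
    linkerd_upgrade(
        identity_issuer_certificate_file="new-identity.crt",
        identity_issuer_key_file="new-identity.key",
    ),
    "kubectl -n linkerd rollout restart deploy linkerd-identity",
    # 3. Drop the old CA from the trust anchors.
    linkerd_upgrade(identity_trust_anchors_file="new-ca.crt"),
]

for command in plan:
    print(command)
```

After the last step, the data plane workloads still need to be rolled so that their proxies pick up certificates from the new issuer.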

If we try to get the events now, we still don’t see any events. Let’s just get all of the events in this Linkerd namespace. Someone has a question: “Was the rollout restart done after upgrading the root certificate or the issuer certificate? Slow internet, missed it due to lag.” So the rollout restart we did after upgrading the issuer certificate, for the identity service. When we upgrade the root, our destination and proxy injector services are going to be rolled out on their own because we have new trust anchors in there. Cool.

Anyway, you can see how rotating your certificates can be a bit problematic. Manually rotating them should generally be possible with zero downtime, as nothing is going to break all at the same time, but you will run into issues as I’ve just run into now. So there’s an easier way to do all of these things, and that’s using cert-manager to automate all of your identity bootstrapping. And just another thing that I want to mention before I move on: if you want to rotate expired certificates, then you will most certainly incur some downtime, because at that point it’s hard for connections to still do all of the certificate validation without failing. So I’ve left it as homework. We do have a doc that explains this process. So if you want to try your hand at it, you’re more than welcome to.

Using cert-manager

Anyway, let’s look at how we can use cert-manager. So what I’m going to do is delete my cluster and start fresh. And while I create a new cluster, I’m going to go into the manifest folder to look at what we have here. So cert-manager is a tool that lets you declaratively create and bootstrap certificates. The way it works is you pass in a couple of CRDs and you say you want an issuer certificate or a CA certificate and you want to sign other certificates. And cert-manager creates them for you and puts them in the cluster. There’s a bit of an issue here: in the past, this worked really well for bootstrapping your identity service, but not necessarily for your root certificate. And that’s because cert-manager distributes all of the certificates it creates as secrets in Kubernetes. And you might have noticed that our certificate authority is a config map, and that’s because we want to keep the key separate from the rest of it.

And we basically don’t want the certificate authority to be secret. We want it to be mounted to pods. It needs to exist everywhere so that other certificates have access to it for certificate validation. I know I have said certificate a lot of times in there, but what you need to know with cert-manager is that standalone it will not work with Linkerd for your CA. It will work well for identity certificates, but not for the CA. So what we do is we bring in another tool from the folks over at Jetstack, which is called Trust. And Trust will be responsible for redistributing our CA certificate as a config map without giving us access to its private key. So then we can safely mount it to pods without actually giving pods access to the CA’s private key because that’s one of the biggest problems. We couldn’t just mount a secret with the CA certificate and private key because then all of the pods have access to the CA’s private key.

And that’s a huge no-no from a security perspective. So let’s look a bit at what CRDs we’re going to be dealing with. First off, when we want to bootstrap identity, of course, we’re going to start with a CA certificate. And in cert-manager, we’re going to be using basically two types of resources throughout all of this. We’re going to be using a resource of type cluster issuer and a cluster issuer resource means that a certificate can be used to sign another certificate. And then we’re going to be using a certificate resource, which will create a certificate itself. So in this example, we have a cluster issuer named Linkerd self-signed issuer. So this will be our CA issuer. And then we’ll create the certificate itself, the CA certificates.

And we pass in a common name, we pass in the name for the secret, we pass in the algorithm and then we say, “Well this is going to be signed by the self-signed issuer we just created.” And the self-signed issuer is this, which is self-signed. Then finally, after we get our certificate authority bootstrapped, we’re going to create another issuer. That’s going to use this CA we just created to sign other issuers. Lots of issuers, lots of signing, but basically what you need to know is that the first two resources help us create the certificate authority itself. And the last resource allows us to use the certificate authority to sign our issuer certificates.
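The three resources described above might look roughly like the following. This is a sketch written as Python dictionaries and printed as JSON, which `kubectl apply -f` accepts; it is not the workshop’s actual manifest. The resource names, the common name, and the key algorithm are assumptions based on what is said in the talk and on Linkerd’s usual naming.

```python
import json

API = "cert-manager.io/v1"

# 1. A self-signed ClusterIssuer, used only to sign the CA certificate.
self_signed_issuer = {
    "apiVersion": API,
    "kind": "ClusterIssuer",
    "metadata": {"name": "linkerd-self-signed-issuer"},
    "spec": {"selfSigned": {}},
}

# 2. The CA (trust anchor) certificate, signed by the issuer above.
trust_anchor_cert = {
    "apiVersion": API,
    "kind": "Certificate",
    "metadata": {"name": "linkerd-trust-anchor", "namespace": "cert-manager"},
    "spec": {
        "isCA": True,
        "commonName": "root.linkerd.cluster.local",  # assumed common name
        "secretName": "linkerd-identity-trust-roots",  # assumed secret name
        "privateKey": {"algorithm": "ECDSA", "size": 256},
        "issuerRef": {
            "name": self_signed_issuer["metadata"]["name"],
            "kind": "ClusterIssuer",
            "group": "cert-manager.io",
        },
    },
}

# 3. A second ClusterIssuer that signs with the CA created in step 2.
trust_anchor_issuer = {
    "apiVersion": API,
    "kind": "ClusterIssuer",
    "metadata": {"name": "linkerd-trust-anchor"},
    "spec": {"ca": {"secretName": trust_anchor_cert["spec"]["secretName"]}},
}

manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [self_signed_issuer, trust_anchor_cert, trust_anchor_issuer],
}
print(json.dumps(manifest, indent=2))
```

Building the later resources from the earlier ones keeps the cross-references (the issuer name and the secret name) from drifting apart, which is the most common mistake when writing these by hand.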

So that’s the certificate issuer or the certificate authority issuer, and then we’re going to be moving on to the identity issuer. This is going to be a bit simpler. We’re first going to create a Linkerd namespace. And then we’re just going to create an issuer certificate. And we’re going to say, “This certificate will be signed by our trust anchor certificate.” So this is the issuer ref, the trust anchor issuer we created last, this is the name of the secret, and then we put the secret in the Linkerd namespace. So we want cert-manager to create this secret in the Linkerd namespace.
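As a sketch of the identity issuer step, the namespace and the issuer certificate could be expressed as follows, again as Python dictionaries printed as JSON. The names `linkerd-identity-issuer` and `linkerd-trust-anchor` and the common name are assumptions, and a real manifest would likely also set a duration, a renewal window, and key usages, which are left out here because the talk does not spell them out.

```python
import json

# The namespace has to exist before cert-manager can write the secret into it.
namespace = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "linkerd"},
}

# Intermediate (issuer) certificate, signed by the trust anchor ClusterIssuer.
identity_issuer_cert = {
    "apiVersion": "cert-manager.io/v1",
    "kind": "Certificate",
    "metadata": {"name": "linkerd-identity-issuer", "namespace": "linkerd"},
    "spec": {
        "isCA": True,
        "commonName": "identity.linkerd.cluster.local",  # assumed common name
        "secretName": "linkerd-identity-issuer",  # secret lands in `linkerd`
        "privateKey": {"algorithm": "ECDSA"},
        "issuerRef": {
            "name": "linkerd-trust-anchor",  # assumed ClusterIssuer name
            "kind": "ClusterIssuer",
            "group": "cert-manager.io",
        },
    },
}

manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [namespace, identity_issuer_cert],
}
print(json.dumps(manifest, indent=2))
```

cert-manager writes the resulting secret into the namespace of the Certificate resource, which is why the certificate is created in `linkerd` rather than in `cert-manager`.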

If we look at the first certificate, the CA certificate, bear with me while I type all of this stuff. The namespace is going to be cert-manager because, again, we do not want the certificate to be publicly available to pods or be mounted. We don’t want it to be in the Linkerd namespace. So then the final step is to actually distribute the CA certificate into our Linkerd namespace as a config map. And that’s what Trust does. Trust as a tool is used to redistribute a certificate from a secret in a specific namespace into a config map in another namespace. So what we say here is: take this secret, take only this field of the secret, just the certificate, not the key, and then put this in a config map. And that’s it. This is the theoretical view of it, but we’re going to get into specifics in just a minute. So let’s install Linkerd. No, we’ll install Linkerd last, sorry.
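To make the “only this field of the secret” point concrete, here is a toy model of the redistribution step. The `bundle` dictionary follows the shape of the Trust `Bundle` resource as I understand it from around the time of this talk, so the API version and field names are assumptions to verify against the Trust documentation; `render_config_map` is a hypothetical helper that imitates the behaviour and is not Trust’s implementation.

```python
bundle = {
    "apiVersion": "trust.cert-manager.io/v1alpha1",  # assumed API version
    "kind": "Bundle",
    "metadata": {"name": "linkerd-identity-trust-roots"},
    "spec": {
        # Read only the `ca.crt` field of the CA secret.
        "sources": [
            {"secret": {"name": "linkerd-identity-trust-roots", "key": "ca.crt"}}
        ],
        # Write the result under this key of the target config map.
        "target": {"configMap": {"key": "ca-bundle.crt"}},
    },
}


def render_config_map(bundle: dict, secrets: dict[str, dict[str, str]]) -> dict[str, str]:
    """Toy model: copy only the selected secret fields into config map data."""
    target_key = bundle["spec"]["target"]["configMap"]["key"]
    parts = []
    for source in bundle["spec"]["sources"]:
        ref = source["secret"]
        parts.append(secrets[ref["name"]][ref["key"]])
    return {target_key: "\n".join(parts)}


# Placeholder secret contents; a real secret holds PEM data.
secrets = {
    "linkerd-identity-trust-roots": {
        "ca.crt": "PLACEHOLDER CA CERT",
        "tls.crt": "PLACEHOLDER CA CERT",
        "tls.key": "PLACEHOLDER PRIVATE KEY",
    }
}

config_map_data = render_config_map(bundle, secrets)
print(config_map_data)
```

The private key never appears in the output because the source names a single field, which is the property that makes the config map safe to mount into pods.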

Let’s install the Jetstack stuff. So I’m going to be doing this through Helm. You have the steps here. First, you have to add the Jetstack repository, and then you need to install cert-manager. We can still install through the install flag when we do the upgrade: it’s going to say the release doesn’t exist yet, so it’s going to install it. And then over here in this other shell window, I’m going to install cert-manager trust. And we’re just going to wait a little bit, wait for this thing to go through, and then apply our manifests. So both of them have been installed successfully, going to get rid of that window. Oops, wrong one. Cool. Let’s start applying stuff from the manifest directory.

So the first thing that we want to do is create the CA. So cert-manager, the CA issuer, I’m going to apply, and we are going to get the secrets from the cert-manager namespace. So we’re going to see that here we have this identity trust roots secret. This is Base64 encoded, and then we’ll have our CA certificate and our CA key. The next thing that we want to do is create an issuer certificate based on this CA. So we’re going to create our identity issuer. We’re also going to create a Linkerd namespace. So now if I get secrets in the Linkerd namespace, we’re going to see we have this identity issuer secret. And this will basically be the same as the one we created when we installed through the CLI with our own certificate. So if we output it as YAML, we have the key and the certificate. Cool.
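Because secret values are Base64 encoded, they have to be decoded before the certificate can be inspected. A minimal sketch of that decoding, using placeholder values in place of real PEM data (the helper name `decode_secret_data` is hypothetical):

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the Base64 values of a Kubernetes Secret's `data` map."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }


# Placeholder stand-in for the `data` field of `kubectl get secret -o json`.
sample_data = {
    "tls.crt": base64.b64encode(b"placeholder certificate").decode("ascii"),
    "tls.key": base64.b64encode(b"placeholder key").decode("ascii"),
}

decoded = decode_secret_data(sample_data)
print(sorted(decoded))
```

The decoded certificate can then be written to a file and inspected in the same way as the hand-generated certificates earlier in the workshop.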

Now the next thing I said is we want to make sure that we have this identity trust roots config map in the Linkerd namespace. So we’re going to apply the last manifest, which is that Trust object that I talked about earlier. Now in the Linkerd namespace, you can see that we have this identity trust roots config map. And if we inspect it, it’s just our certificate. Now we can pull this out and we can inspect it if we want to. Cert-manager CA. Just need to get rid of ads there. Just like before, we can inspect this. It’s been issued by itself, it has the subject name that we expect and that Linkerd expects, and it will expire in June. So just a few months. While we passed in the expiry date before, cert-manager will have different defaults. You should consult the docs for this. But finally, we are ready to install Linkerd.

And instead of passing in our certificates like before, we’re going to say we want to install, but we are going to be using… I think I’m around here somewhere. We’re going to be using an external issuer. And this flag tells the control plane to expect an issuer secret in the Linkerd namespace. And then we’re going to use Helm flags to say that we also expect an external CA. So now the Linkerd control plane will also expect the CA config map in the Linkerd namespace. And now if we get the pods in the Linkerd namespace, well, they’re not ready yet, but they’ll be installed without us having to generate any certificates in the CLI or to pass in any certificates explicitly on the command line.
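Since the control plane now expects both objects to exist already, a small pre-flight check can catch a missing one before installing. The sketch below assumes the conventional names `linkerd-identity-issuer` (secret) and `linkerd-identity-trust-roots` (config map); the helper `missing_prerequisites` is hypothetical. The install command itself is probably along the lines of `linkerd install --identity-external-issuer --set identity.externalCA=true | kubectl apply -f -`, but the exact flags should be checked against the Linkerd version in use.

```python
EXPECTED_SECRET = "linkerd-identity-issuer"
EXPECTED_CONFIG_MAP = "linkerd-identity-trust-roots"


def missing_prerequisites(secrets: set[str], config_maps: set[str]) -> list[str]:
    """List the objects an external-issuer install expects but cannot find."""
    missing = []
    if EXPECTED_SECRET not in secrets:
        missing.append(f"secret/{EXPECTED_SECRET}")
    if EXPECTED_CONFIG_MAP not in config_maps:
        missing.append(f"configmap/{EXPECTED_CONFIG_MAP}")
    return missing


# Names as they might be listed in the `linkerd` namespace (illustrative).
ready = missing_prerequisites(
    {"linkerd-identity-issuer"}, {"linkerd-identity-trust-roots"}
)
not_ready = missing_prerequisites({"linkerd-identity-issuer"}, set())
print(ready, not_ready)
```

An empty list means both prerequisites are present; anything else names the object that cert-manager or Trust has not produced yet.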

So this is basically one way of automating your PKI, where you rely on cert-manager to provision these CA and issuer certificates. You rely on cert-manager to rotate them for you as well. I think it can do that; again, you need to check the docs for specific details, but it really reduces the manual labor and the involvement that you need to have as an operator. And that’s it. You have automated cluster certificate bootstrapping. It’s still initializing. I think my k3d is just moving a bit slow. You can get a random pod in this Linkerd namespace and just inspect it. Go down to the conditions at the end, I guess, and see if there’s anything wrong, but there shouldn’t be. Everything should be working fine.

Charles: Awesome! You’ve shown us a ton of stuff in a short amount of time. And as somebody who’s done live demos, and live demos of the certificate management stuff in Linkerd, you end up using the word certificate a lot because there are a lot of certificates involved. You did a great job of explaining things. I think it’s very helpful to understand the installation of Linkerd, then take that dev sandbox environment, bring in cert-manager and the other open source project, Jetstack’s Trust, and see how we can apply this automated process to our mature, long-lived clusters. It takes an investment in time to get this set up, but once it’s automated and it’s up and running, you don’t have to think about it anymore, which is really great.

Q&A

Matei: For sure. What I want people to get from this is that all of my mistakes in spelling step certificate create and all of that are tangential; the main point is, here’s how you work with certificates. If for some reason you try to rotate trust anchors and it doesn’t seem to work, this is what you need to do: go and look at the events, go and look at the pods. And what I really hope that people get out of it is, okay, I can work with certificates, they’re not as intimidating as I think they are. All I need is these free tools, and I can just drill down into it. So since this is Q&A, I have a couple of questions that maybe I can answer really quickly.

So the first one is “Where would Vault fit in for managing certificates? Would Vault replace cert-manager or work with it?” Vault would work with it. So if you have an external PKI, and it’s very common for enterprises, for example, to have a trust store like Vault, you can instruct cert-manager, instead of provisioning certificates for you, to pull them out of Vault. So it’s the same declarative config, and you tell cert-manager, “Hey, go into Vault, pull these certificates and use them for Linkerd as a CA.” And then using Trust you distribute that root certificate in the Linkerd namespace, you install Linkerd, and it works well with it. So you would use it in conjunction with Vault. And then another question: can you introduce cert-manager and still do an upgrade of Linkerd with no downtime?

Yes. You just have to think through the pods that you need to roll out. So I haven’t had enough time to cover it as part of this workshop. It’s a homework exercise, but if you need help in figuring it out, definitely send me a message or Charles or Alejandro on Slack and we can work through it. But I think you should be able to do it if you basically do a rollout strategy here where you can progressively roll out your pods. And then Priya has an issue, any tips for this particular error, we’re getting this in all of our environments after an Argo restart. And it has to do with the Prometheus chart requiring some identity trust anchors. I don’t think I can help with that now, but post that message in the Linkerd Slack and we’ll take a look. And that’s about it from me.

Charles: Yeah. We can definitely help you with that, Priya. Jump into either the workshops channel or the linkerd2 channel on Slack and we will work through that particular error with you. Just as we wrap up here, I want to let you know again, this is part of our Service Mesh Academy, teaching you everything that you need to know about service mesh. The next workshop that’s coming up is secure multi-cluster Kubernetes with Linkerd. This is another big topic, and we’re going to reserve enough time to be able to cover all the different aspects of how you would manage your services across multiple clusters in Linkerd. This is a topic that’s near and dear to my heart, so I hope you’ll all join us for it. Miguel, I see you’ve got a question or a note that you put in there. We can follow up with you on the workshops channel in the Linkerd Slack, which will be ongoing.

You can post in there anytime you want. Even if you are watching this video many years in the future and your certificates have expired and you’re trying to figure out how to rotate them, jump into Linkerd Slack. We’ll be there. We’ll help you out. So thanks again, Matei. Great job. Again, even though I have gone through this stuff myself, I still learned some valuable things today, so thank you again and we’ll see you all in Linkerd Slack.

Matei: All right. Thank you all. Thank you for being an amazing audience and for asking questions, if there’s anything else that I can do to fill in the gaps, let me know. And have a good one.

Charles: All right. Take care, everybody. Thank you.
