Kubernetes traffic management with Emissary-ingress and Linkerd

Transcript

Note: this transcript has been automatically generated with light editing and cuts. It may contain errors! When in doubt, please watch the original talk!

Welcome and intro [0 min]

Daniel Bryant: Welcome, everyone, to the Run module of Summer of Kubernetes! Welcome along if it’s your first time, and welcome back if you’re returning after the Code and Ship modules!

If you haven’t looked at the previous modules, no worries, you can go back and look at them in your own time. Each of the modules is standalone. The topic we’re going to cover today, ingress and service mesh, will build on some of the things we’ve talked about in previous modules but it’s not essential to have been there. And again, it’s all on YouTube, it’s all on the website, so you can pop back and look at that.

I am super excited today to be joined by Jason Morgan from Buoyant. Jason is a legend; he’s helped us out so many times. Awesome presentations, and magical CLI skills! I’m constantly impressed that you can talk while typing on the CLI. I can do one thing at a time: I can speak or I can be on the CLI. You can multitask, so super excited!

Jason is with the Buoyant team working on Linkerd, all that great technology! I’ve followed Linkerd literally from the day it emerged. Fascinating tech! The folks there are all super smart. It’s a CNCF project, of course, and now graduated. A fantastic achievement! Kudos to you, Jason, everyone in the community, and all the Buoyant folks. An amazing achievement!

Do pop along to the Linkerd site; I know Jason’s going to mention the various places to check out all the info. Check it out if you’re looking for a service mesh. It’s super easy to get started; we’ve done some CNCF webinars, and Jason’s going to go into this today as well. Out of all the service meshes, I think it’s the easiest to get up and running. There are enough challenges in the Kubernetes world, so make your life simpler by choosing Linkerd.

Jason Morgan: Thanks, that was a super nice intro! I really appreciate it. Hey folks, really appreciate y’all taking the time to join. Let’s kick off.

The setup [2:13 min]

Feel free to follow along. I’m going to use a tool called k3d to spin up an in-memory cluster. I’m also going to borrow a cluster from the folks at Civo to do some cool blue-green stuff live. With that, let’s get started and talk a little bit about Kubernetes, applications, and how a service mesh and ingress work.

We’ve got our application, with a frontend and two backends, but on its own this is unlikely to be sufficient. I’m not gonna use kubectl port-forward to give live traffic access to my frontend. I need something that will bring traffic in from the internet, secure it, and let me do intelligent routing. That tool is an ingress.

What is ingress? [3:14 min]

Think of an ingress as the front door to your cluster. This is where http://myapp.example.com turns into a cluster entry point that gets you to the frontend. The frontend talks to the various backend services. Ingresses work with what we call north-south traffic. When you draw it, you typically draw the internet up here (as a cloud, not a person) and get this up and down flow of traffic from the world to my app, and from my app to the world. That’s where north-south comes from.

The ingress is the front door. It handles the mapping from URL to pod and allows you to build sticky sessions.
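To make the URL-to-pod mapping concrete, here is a toy Python model of what an ingress rule does. It matches a request’s host and path prefix, then forwards the request to a Kubernetes Service, which fronts the pods. It is not how Emissary or Ambassador work internally. The route list, the hypothetical api.example:8080 service, and the longest-prefix-wins rule are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """A hypothetical ingress rule: match host and path prefix, forward to a Service."""
    prefix: str
    service: str
    host: Optional[str] = None  # None means "match any host"


def pick_route(routes: list[Route], host: str, path: str) -> Optional[Route]:
    """Return the matching route with the longest prefix, or None if nothing matches."""
    candidates = [
        r for r in routes
        if (r.host is None or r.host == host) and path.startswith(r.prefix)
    ]
    # Longest prefix wins, so "/api/" beats "/" for "/api/vote".
    return max(candidates, key=lambda r: len(r.prefix), default=None)


routes = [
    Route(prefix="/", service="web-svc.emojivoto:80"),
    Route(prefix="/api/", service="api.example:8080", host="myapp.example.com"),
]

print(pick_route(routes, "myapp.example.com", "/api/vote").service)  # api.example:8080
print(pick_route(routes, "localhost", "/api/vote").service)          # web-svc.emojivoto:80
```

The catch-all "/" route plays the same role as the “any host, at the root” rule used later in the demo.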

Let’s step back for a second and talk about service mesh.

What is a service mesh? [4:36 min]

Here’s our app on a service mesh. Before, the components were talking to each other directly. With a service mesh, we take advantage of a Kubernetes pattern called a sidecar: another container that runs beside your application container and can do all sorts of things.

In this case, our container runs a little load balancer or proxy. In Linkerd, it’s called the Linkerd2-proxy. It’s a purpose-built Rust proxy that is very fast, small, and will handle the traffic in and out of your application. It’ll do things like add mTLS, so all connections are encrypted and authenticated. That way I know the call from frontend to backend B is coming from frontend.

In addition to the encryption, it gives you standard metrics for every single call. Because the proxy is handling gRPC or HTTP requests, we can get a bunch of details like: Was this call successful? How fast did it go? What API are you calling?
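The proxy gets these metrics almost for free because every call passes through it. It can time each call and record whether it succeeded, per path. The Python sketch below illustrates this. The MeteringProxy class, the fake app, and its paths are hypothetical. The real Linkerd proxy exposes its metrics for Prometheus to scrape rather than keeping Python dicts.

```python
from __future__ import annotations

import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PathStats:
    successes: int = 0
    failures: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 1.0


class MeteringProxy:
    """Toy sidecar: forwards each call to the app and records outcome and latency."""

    def __init__(self, app):
        self.app = app  # callable: path -> HTTP status code
        self.stats: dict[str, PathStats] = defaultdict(PathStats)

    def call(self, path: str) -> int:
        start = time.perf_counter()
        status = self.app(path)
        elapsed_ms = (time.perf_counter() - start) * 1000
        stats = self.stats[path]
        # Count 5xx responses as failures and everything else as success.
        if status >= 500:
            stats.failures += 1
        else:
            stats.successes += 1
        stats.latencies_ms.append(elapsed_ms)
        return status


def fake_app(path: str) -> int:
    """Hypothetical app with one deliberately broken endpoint."""
    return 500 if path == "/api/broken" else 200


proxy = MeteringProxy(fake_app)
for p in ["/api/list", "/api/list", "/api/broken", "/api/list"]:
    proxy.call(p)

for path, s in proxy.stats.items():
    print(f"{path}: success rate {s.success_rate:.0%} over {len(s.latencies_ms)} calls")
```

The application code is untouched. All the measuring happens in the wrapper, which is the point of the sidecar model.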

The proxies make up what we call the data plane of the service mesh. We also have a control plane: how humans interact with the service mesh and give instructions to the proxies.

Daniel: Adrian asked: “You mentioned HTTP and gRPC. Does it also cover messaging like AMQP?”

Jason: I’m not super familiar with it but, if it’s like RabbitMQ, you can encrypt that TCP connection between your app and that instance. Depending on the protocol, you may or may not get higher-level metrics.

If you’re talking to a MySQL database, it won’t tell you which API call you made inside MySQL. Instead, you get info like “this is an encrypted layer 4 connection from your application to this database, and here’s what’s happening over that link.”
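A proxy can only report per-request metrics for protocols it understands. This toy Python sketch illustrates that decision by peeking at the first bytes a client sends:

- HTTP/2 (and therefore gRPC) connections open with a fixed preface.
- HTTP/1 requests start with a method name.
- Anything else, such as an AMQP session, is treated as opaque TCP. Opaque TCP can still be encrypted but is only measured at layer 4.

This illustrates the idea only; it is not Linkerd’s actual detection logic, and the method list is deliberately partial.

```python
# HTTP/2 connections (and so gRPC) always open with this fixed client preface.
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
# A few HTTP/1.x request methods; a request line starts with one of these.
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")


def classify(first_bytes: bytes) -> str:
    """Guess the protocol from the first bytes a client sends on a new connection."""
    if first_bytes.startswith(H2_PREFACE):
        return "http2"       # per-request metrics and request-level balancing possible
    if first_bytes.startswith(HTTP1_METHODS):
        return "http1"       # per-request metrics possible
    return "opaque-tcp"      # encrypt, but only count bytes and connections


print(classify(b"GET /api/list HTTP/1.1\r\nHost: web-svc\r\n\r\n"))  # http1
print(classify(H2_PREFACE + b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"))  # http2
print(classify(b"AMQP\x00\x00\x09\x01"))  # opaque-tcp (AMQP 0-9-1 protocol header)
```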

Going back: we’ve got our app with an ingress, and our app with a service mesh. How do we combine them? Essentially, you add your ingress and then add a proxy to that ingress so it joins the mesh. And that’s exactly what we’re going to do today.

The Linkerd service mesh [7:58 min]

Depending on what service mesh you’re familiar with, you may have different expectations. Linkerd is designed to do only service mesh functionality. It isn’t an ingress and doesn’t do ingress stuff; it isn’t an API gateway and doesn’t do API gateway stuff. It’s only concerned with traffic inside the cluster between services, generally referred to as east-west traffic, rather than north-south traffic from the internet into our cluster.
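One way to pin down the north-south/east-west distinction is to classify a connection by whether both ends are inside the cluster’s address ranges. This Python sketch assumes k3s’s default pod and service CIDRs (10.42.0.0/16 and 10.43.0.0/16). Your cluster’s ranges may differ, and real traffic is rarely this easy to classify.

```python
import ipaddress

# Assumption: k3s defaults (pod CIDR 10.42.0.0/16, service CIDR 10.43.0.0/16).
# Replace these with your own cluster's ranges.
CLUSTER_NETWORKS = [
    ipaddress.ip_network("10.42.0.0/16"),
    ipaddress.ip_network("10.43.0.0/16"),
]


def in_cluster(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLUSTER_NETWORKS)


def traffic_direction(src_ip: str, dst_ip: str) -> str:
    """East-west: both ends inside the cluster. North-south: one end outside."""
    if in_cluster(src_ip) and in_cluster(dst_ip):
        return "east-west"    # service-to-service: the mesh's job
    return "north-south"      # crossing the cluster edge: the ingress's job


print(traffic_direction("203.0.113.7", "10.42.0.15"))   # north-south (browser -> ingress pod)
print(traffic_direction("10.42.0.15", "10.43.12.200"))  # east-west (ingress -> web service)
```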

We’ll show how to get an end-to-end TLS connection for traffic coming from the internet into our cluster without hand-configuring any TLS within our environment. We’ll rely only on Ambassador and Linkerd.

You’ll see all the complex integration points between them. Quick spoiler alert: there are no complex integration points.

When to Envoy, and when not to Envoy [9:21 min]

Daniel: There’s a question on Slack: “Is everything using Envoy proxy?”

Jason: It is not. Linkerd differs from most service meshes in that we wrote our own custom-built proxy called Linkerd2-proxy. We wrote a whole article on why Linkerd doesn’t use Envoy, but long story short: Linkerd’s micro-proxy is not a general-purpose proxy. We’ve also got a video and a blog on Linkerd and Emissary, the best of both worlds.

Emissary is built on the Envoy proxy because Envoy is really powerful. It can do all kinds of intelligent things and is somewhat complicated on account of that.

You can’t use Linkerd’s proxy as an ingress; it doesn’t know how to do anything beyond service mesh work. That’s also why you don’t have to become a Linkerd proxy expert.

If you’ve used an Envoy-based mesh, you might be familiar with Envoy filter chains, etc. — there’s no equivalent within Linkerd. The only configuration required by the Linkerd proxy is ensuring it has the right resources for the app.

Integrating an ingress with a mesh sounds complicated, but you’ll see that it’s straightforward. We’ll use Envoy at the edge, since Emissary is our Envoy-based ingress, and integrate it with the Linkerd mesh.

Demo time [11:20 min]

Let’s go ahead and install Ambassador. I have a cluster, so let’s just do k get nodes. Here’s my Emissary server and, if we look at all the pods, we see that nothing is going on — it’s a little in-memory k3s cluster. We’ll create an Ambassador namespace, do an Ambassador Helm install, and set one replica. Then, we’ll wait for it to get ready.

The first thing I’ll do (going back to my diagram) is add the Ambassador ingress. That’s where we are now; we don’t have the app yet, that’ll come next. I’m using Ambassador, but you can also use Emissary. The big difference is that Ambassador allows you to auto-generate certs and do…

Daniel: …auth, rate-limiting. You can still do that with the Emissary ingress but you have to code a lot of it yourself — the APIs are available. I’m a big fan of the Ambassador Edge Stack (AES) out of the box.

Jason: I finished the Emissary install. Let’s do k get pods -n ambassador. I’ve got my Ambassador redis, my agent, and the cluster. This is my load balancer, so let’s do the Linkerd getting started guide. Then, we’ll do the Ambassador install.

I’ll install the Linkerd CLI. I’m cheating a bit because I already have it installed. Linkerd check is built into the CLI: run as linkerd check --pre, it tells you whether you have the right permissions and whether your cluster is valid for installing Linkerd. I’m running k3s in Docker on WSL2 on Windows, so there’s a lot of weirdness.

It’s giving us a warning because we’re still using pod security policies, which are deprecated. Now let’s do a Linkerd install.

The Linkerd install command outputs a bunch of YAML, and it auto-generates a trust root certificate for your environment. Let’s pipe that output to kubectl apply and follow it with linkerd check.

GitOps flow [15:18 min]

GitOps is a big theme. So let’s cover that real quick. The Linkerd install command uses the same templates as our Helm chart so, whether you use Helm, Kustomize, or the Linkerd CLI, you’re using the same arguments and getting the same YAML. If you hold that Linkerd YAML in version control, you can use it with a full GitOps flow.

Today, I’ll demo how cool our install command is. We’re going to apply its output and pair it with a linkerd check command that tells me the status. kubectl apply creates the resources, and then linkerd check blocks the rest of my script until Linkerd is up and running.

The Linkerd difference [16:17 min]

We’re using Linkerd 2.10.2, but we have another release coming up in September which will include policy. Looking at service meshes, you’ll see different features and technologies. Our view of why Linkerd is the best service mesh on the market (I work for Buoyant and promote Linkerd exclusively, so take that for what it’s worth) is that it’s the easiest to use while providing the key service mesh features with the least work. We continue to add features without adding complexity for users. We want users who just want mTLS, gRPC load balancing, or standard metrics for their apps to get those capabilities easily.

We installed core Linkerd. That’s what gets us mTLS, gRPC load balancing — all the goodies we want from a service mesh. We’ll also add the dashboard. In 2.10, we broke up Linkerd into various extensions so you can choose what you want to install in your cluster and have an easier experience.

Since I demo stuff, I need a shiny dashboard. I’m going to do a Linkerd viz install.

Adding the application [18:30 min]

We’ve run two commands so far: Linkerd install and Linkerd viz install. This is what we need to get that second diagram. Although we’re still missing the app, we’ve got our control plane and we’re ready to add our application.

I’m doing my demo backwards, so I’ll add the app next and then add Emissary or Ambassador to the mesh. We’ve got viz installed and can see from a check that everything’s healthy. I’ll do k get pods on all namespaces. I see core Linkerd is installed, and we have Linkerd’s viz components in their own namespace.

Everything you see, all the Linkerd components, shows two of two containers. Linkerd runs its own proxy alongside its components to encrypt traffic and provide monitoring details between them; that’s the second container. Looking at Ambassador, there’s only one container, and we’ll fix that in a minute. First, I’ll add our demo app.

With this long command, I’m curling the emojivoto YAML. I’ll pipe that to the Linkerd CLI and inject it. It’s a fearsome-sounding word, but all we’re doing is adding an annotation to the deployment that tells Linkerd to add a proxy alongside these containers.

We won’t modify the application code. If you look at emojivoto, there’s nothing Linkerd-specific in there. We grab the YAML, modify it, and hand it to Kubernetes. That’s the sequence of this command, so let’s run it.
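Conceptually, injecting leaves the app alone. It adds the linkerd.io/inject: enabled annotation to the pod template, and Linkerd’s proxy injector adds the proxy container when pods are created. Below is a minimal Python sketch of that transformation on a parsed manifest (a dict). It assumes only Deployments need handling; the real CLI covers more workload kinds and options. The trimmed-down manifest and its image are hypothetical.

```python
import copy
import json


def inject(manifest: dict) -> dict:
    """Return a copy of a Deployment with Linkerd's inject annotation on its pod template.

    Rough sketch only: the proxy container itself is added later, at pod
    creation time, by Linkerd's proxy injector.
    """
    out = copy.deepcopy(manifest)
    if out.get("kind") != "Deployment":
        return out  # Services, ServiceAccounts, etc. pass through unchanged
    template_meta = out["spec"]["template"].setdefault("metadata", {})
    template_meta.setdefault("annotations", {})["linkerd.io/inject"] = "enabled"
    return out


# Hypothetical, trimmed-down Deployment (real emojivoto manifests have more fields).
web = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "emojivoto"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "web-svc"}},
            "spec": {"containers": [{"name": "web-svc", "image": "example/web:v1"}]},
        }
    },
}

injected = inject(web)
print(json.dumps(injected["spec"]["template"]["metadata"], indent=2))
```

The containers list comes out identical to the input, which matches the claim that nothing in the application changes.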

Let’s switch our namespace to emojivoto. There are a couple of pods initializing; we can watch them with kubectl get pods.

Injecting in a GitOps flow [21:40 min]

Daniel: Quick question, how would injecting work in a GitOps flow? Because the original YAML gets checked into git, not the injected one I assume.

Jason: Essentially, you put the namespace or the deployment with that annotation in git. Let’s take a look at one of these deployments.

k get deploy web -o yaml. This is the actual deployment, so let’s see what Linkerd injected: we took the YAML and added this annotation. You can also set the annotation at the namespace level, and set it to disabled on apps that you don’t want in the mesh. In a GitOps flow, you add these annotations to the objects that go through the GitOps flow.
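In a GitOps repo, you would commit the annotation itself. For example, you can put it on the namespace and opt individual workloads out with linkerd.io/inject: disabled. Here is a Python sketch of how the two levels combine. It assumes the workload’s own annotation takes precedence over the namespace’s, which is what makes the opt-out work. The helper names are hypothetical, and injection modes other than "enabled" are not modeled.

```python
from typing import Optional

INJECT = "linkerd.io/inject"


def effective_inject(namespace_annotations: dict, pod_annotations: dict) -> Optional[str]:
    """The workload's own annotation wins; otherwise inherit the namespace's."""
    if INJECT in pod_annotations:
        return pod_annotations[INJECT]
    return namespace_annotations.get(INJECT)


def gets_proxy(namespace_annotations: dict, pod_annotations: dict) -> bool:
    # Only "enabled" is modeled here; other values are treated as "no proxy".
    return effective_inject(namespace_annotations, pod_annotations) == "enabled"


# A namespace committed to git with mesh-by-default...
emojivoto_ns = {INJECT: "enabled"}
# ...and one hypothetical workload that opts out.
batch_job_pod = {INJECT: "disabled"}

print(gets_proxy(emojivoto_ns, {}))             # True
print(gets_proxy(emojivoto_ns, batch_job_pod))  # False
```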

Daniel: Thank you, Jason. We’ve had similar challenges with Telepresence and have a mutating webhook to resolve some of this as well. Folks can have a look at our website to learn more about it. When something modifies the Kubernetes config after it’s been committed to git, it does get a bit complicated, depending on what tool you’re using. So it’s a great question and your answer was spot on.

There is some nuance as well, depending on whether you’re using Argo or Flux, but there are good patterns folks can look up online.

Connecting Ambassador with the service mesh [23:36 min]

Jason: We’ve got our application, it’s injected and part of the mesh, and Ambassador. Now we have to connect Ambassador to the mesh. To inject Ambassador, I’ll get the Ambassador deployment, output it as YAML (I could also do an edit just as easily), and skip inbound ports 80 and 443.

Let’s talk about this for a quick sec. Ambassador is the front door to your cluster and Linkerd handles east-west service-to-service traffic. Linkerd doesn’t add any value to things coming from the internet through the ingress. It doesn’t care until you’re inside the cluster.

When we skip inbound ports 80 and 443 on Ambassador, the Linkerd proxy ignores incoming traffic on those ports and only gets involved once Ambassador sends traffic into the mesh. So, after skipping inbound ports 80 and 443, we inject the Ambassador deployment and send it back to the Kubernetes API.

It gives me a warning because I deployed it with Helm and am doing a live edit now. Again, for a GitOps flow, you just add the annotation to the Emissary or Ambassador instance as you go.
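For the GitOps version, linkerd inject --skip-inbound-ports 80,443 records the setting as a pod-template annotation, config.linkerd.io/skip-inbound-ports, next to the inject annotation. The Python sketch below adds both annotations and computes which container ports the proxy would still intercept. The container ports are illustrative: 80 and 443 from the talk plus a hypothetical admin port. Check which ports your Ambassador or Emissary version actually listens on, since skipping applies to the pod’s own ports.

```python
from __future__ import annotations

import copy

SKIP_INBOUND = "config.linkerd.io/skip-inbound-ports"


def mesh_ingress(deploy: dict, skip_inbound: list[int]) -> dict:
    """Annotate an ingress Deployment for injection, skipping the given inbound ports."""
    out = copy.deepcopy(deploy)
    annotations = out["spec"]["template"].setdefault("metadata", {}).setdefault("annotations", {})
    annotations["linkerd.io/inject"] = "enabled"
    annotations[SKIP_INBOUND] = ",".join(str(p) for p in skip_inbound)
    return out


def proxied_inbound_ports(deploy: dict) -> list[int]:
    """Container ports whose inbound traffic the Linkerd proxy would intercept."""
    template = deploy["spec"]["template"]
    annotations = template.get("metadata", {}).get("annotations", {})
    if annotations.get("linkerd.io/inject") != "enabled":
        return []  # no proxy at all, so nothing is intercepted
    skipped = {int(p) for p in annotations.get(SKIP_INBOUND, "").split(",") if p.strip()}
    ports = [p["containerPort"] for c in template["spec"]["containers"] for p in c.get("ports", [])]
    return [p for p in ports if p not in skipped]


# Hypothetical ingress pod: listener ports from the talk plus an admin port.
ambassador = {"spec": {"template": {"spec": {"containers": [
    {"name": "ambassador",
     "ports": [{"containerPort": 80}, {"containerPort": 443}, {"containerPort": 8877}]}
]}}}}

meshed = mesh_ingress(ambassador, [80, 443])
print(proxied_inbound_ports(ambassador))  # []
print(proxied_inbound_ports(meshed))      # [8877]
```

Internet traffic arriving on 80 and 443 goes straight to Ambassador. Ambassador’s outbound calls into the cluster still go through its proxy, which is where mTLS begins.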

Setting up the Linkerd viz dashboard [25:18 min]

Let’s do a dashboard: linkerd viz dashboard

When I run the Linkerd viz dashboard command, I get this dashboard, which gives me a view of my cluster. Going back to our picture, what we did in that last step was inject the proxy into the ingress. Next, a mapping (my ingress rule) will get traffic from the ingress to the various components.

I’m going to do one more thing: k apply -f to add a mapping. It’s in git_repos/jasonmorgan/linkerd-demos/emissary/. This mapping tells Emissary (or, in our case, Ambassador) to route traffic from the front door, the traffic hitting the ingress, to my service. I’m going to call it emoji. If you hit any host at the root, I want you to send me to the web service of the emojivoto app. That’s all.
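Here is a sketch of what such a mapping could look like, built as a Python dict and printed as JSON. The demo repo’s actual file isn’t reproduced here. Several details are assumptions:

- The apiVersion: getambassador.io/v2 matches Ambassador 1.x, while Emissary 2.x and later use getambassador.io/v3alpha1, with hostname rather than host.
- The namespace.
- The service name web-svc.emojivoto:80, which is taken from emojivoto’s stock web Service.

```python
import json
from typing import Optional


def emissary_mapping(name: str, namespace: str, prefix: str, service: str,
                     host: Optional[str] = None,
                     api_version: str = "getambassador.io/v2") -> dict:
    """Build a Mapping: requests whose path starts with `prefix` go to `service`."""
    spec = {"prefix": prefix, "service": service}
    if host is not None:
        # Tie the route to one hostname, e.g. when real DNS and a certificate exist.
        spec["host"] = host
    return {
        "apiVersion": api_version,
        "kind": "Mapping",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Local demo: any host, root prefix, to emojivoto's web service.
emoji = emissary_mapping("emoji", "emojivoto", "/", "web-svc.emojivoto:80")
print(json.dumps(emoji, indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the printed output could be saved to a file and applied with kubectl apply -f. Leaving host unset mirrors the local demo. Setting it mirrors the Civo setup later in the talk, where a real hostname is needed to get a certificate.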

Again, I didn’t do a bunch of special integrations — these two CNCF projects work smoothly together. You get all that Envoy goodness in terms of rate-limiting, special header rules — things you can do at that layer in the ingress. Plus, you’re getting the benefits of the Linkerd mesh without a ton of work.

Let’s go to localhost 8443… I’ve routed traffic into emojivoto, and I’m gonna spam it to get some traffic. We want to see how it looks inside Linkerd, so let’s run the Linkerd dashboard: linkerd viz dashboard

Exploring the Linkerd viz dashboard [29:00 min]

Now I can take a look at my environment and see what’s going on, even though nothing has been specially instrumented. I have a fully encrypted connection from my web browser to the front door at Ambassador, through to the backends of emojivoto. I can see my cluster by namespace, the number of pods per namespace, and how many of those pods are meshed.

There’s a built-in Prometheus and Grafana within Linkerd viz, but you can also use your external Prometheus and Grafana without any issues. There are tons of docs on how to make that work.

I can sort by success rate to see which namespace has a success rate below 100%. Surprise! Emojivoto is broken. There are about 10 requests a second, and we can go see why. There’s a lot we can do, but we won’t get into diagnostics because that’s not really what we’re here for. Today, we’ll focus on the integration between Linkerd and Ambassador.

But long story short: the default behavior of Emissary/Ambassador is Kubernetes-native and, because everyone is honoring standard Kubernetes concepts and APIs, it works really seamlessly. We recently updated our docs for using the Ambassador ingress and it’s super simple!

Let’s click on the web deployment. We can see that my web deployment is getting traffic from our vote bot, the built-in traffic generator, as well as from Ambassador. Ambassador isn’t having any issues.

We see live calls coming into this environment. For traffic from Ambassador, I can see which endpoints inside the API are exercised and the results for each endpoint. We can also see that all the connections are encrypted. We’ve got TLS everywhere for this example, which is pretty cool!

Don’t proxies slow things down? [32:06 min]

Daniel: Sharon asked: do all the proxies involved slow things down?

Jason: Adding a proxy is gonna have a computational and time tax. For each raw request, it’s going to be more expensive in terms of time and compute power. But that changes if the proxy makes your app faster.

Say we have a couple of applications using gRPC. gRPC is a protocol that makes calls between apps really efficient in terms of data. But if I have five frontend applications making gRPC calls to a backend application… Actually, we can see it right in the dashboard.

In this environment, web is talking to voting and emoji over a gRPC connection. No matter how many frontends I have, when I make a connection over to one of the backends, what I get in Kubernetes is called connection-level load balancing. gRPC will build one connection and multiplex or run a bunch of requests over that connection. It’s super bandwidth-efficient and a really smart way to do it.

The problem is that with Kubernetes you get connection-level load balancing. If I had five emoji pods and one long-lived gRPC connection, all the requests on that connection would go to one pod by default. If that pod falls over, I’ll go to another pod. Your transactions could slow down if you have a lot of requests to any given backend with gRPC.
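The difference between connection-level and request-level balancing can be shown with a small Python simulation. It uses five hypothetical emoji pods and 1,000 gRPC-style requests multiplexed over a single connection. The pod names are made up, and so is the in-order assignment of connections to pods; kube-proxy’s real choice of pod per connection is not modeled. Request-level balancing uses plain round-robin only to keep the example short.

```python
import itertools
from collections import Counter

PODS = [f"emoji-{i}" for i in range(5)]


def connection_level(num_requests: int, connections: int = 1) -> Counter:
    """Kubernetes-style: each connection is pinned to one pod when it opens.

    Which pod a connection lands on is kube-proxy's choice; here we simply
    assign connections to pods in order.
    """
    pinned = [PODS[i % len(PODS)] for i in range(connections)]
    # gRPC multiplexes every request over the already-open connection(s).
    return Counter(pinned[r % connections] for r in range(num_requests))


def request_level(num_requests: int) -> Counter:
    """Mesh-style: the proxy picks a pod for every request (round-robin here)."""
    pods = itertools.cycle(PODS)
    return Counter(next(pods) for _ in range(num_requests))


print(connection_level(1000))  # Counter({'emoji-0': 1000})
print(request_level(1000))     # 200 requests on each of the five pods
```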

The exponentially weighted moving average (EWMA) magic [34:27 min]

With Linkerd, you get request-level load balancing. Linkerd opens that multiplexed connection from every frontend to every backend and load-balances the requests across those backends. We get faster response times than you’d otherwise get, and more resiliency. There’s compute overhead for using proxies, but with gRPC that overhead is offset by the better load balancing. That’s just one of many examples. Kubernetes’ built-in load balancing is simple (random or round-robin, depending on the kube-proxy mode): it’s fine but dumb.

Linkerd’s routing is called EWMA, or exponentially weighted moving average. It looks for the pods that are going to respond fastest from where you are. One Linkerd user we’ve talked to has so much traffic that their cross-AZ bandwidth cost was in the thousands of dollars a day. Just by adding Linkerd to their environment, they were picking pods within their own availability zones, which shaved thousands of dollars a day off their bandwidth costs.

So, the answer is, yes, there’s a cost associated, but depending on your use case, the benefits may offset it and you may end up being a lot faster with those proxies.
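The EWMA idea itself is simple: each new latency sample is blended into a running estimate, so recent behavior counts most. The Python sketch below is a minimal illustration of it. It assumes a fixed smoothing factor (alpha = 0.3) and two hypothetical pods with fixed latencies of 2 ms and 9 ms. It sends each request to the pod with the lowest estimate. Linkerd’s actual balancer is more sophisticated: it pairs a peak-EWMA latency estimate with power-of-two-choices selection.

```python
from typing import Dict, Iterable, Optional


class EwmaBalancer:
    """Toy latency-aware balancer: track an EWMA of each pod's latency and
    send the next request to the pod with the lowest estimate."""

    def __init__(self, pods: Iterable[str], alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest latency sample
        self.ewma: Dict[str, Optional[float]] = {p: None for p in pods}

    def observe(self, pod: str, latency_ms: float) -> None:
        old = self.ewma[pod]
        if old is None:
            self.ewma[pod] = latency_ms
        else:
            # new estimate = alpha * sample + (1 - alpha) * previous estimate
            self.ewma[pod] = self.alpha * latency_ms + (1 - self.alpha) * old

    def pick(self) -> str:
        unmeasured = [p for p, v in self.ewma.items() if v is None]
        if unmeasured:
            return unmeasured[0]  # probe every pod at least once
        return min(self.ewma, key=lambda p: self.ewma[p])


# Hypothetical latencies: one pod in the caller's zone, one in another zone.
latency = {"emoji-same-zone": 2.0, "emoji-other-zone": 9.0}
lb = EwmaBalancer(latency)
picks = []
for _ in range(100):
    pod = lb.pick()
    lb.observe(pod, latency[pod])
    picks.append(pod)

print(picks.count("emoji-same-zone"), picks.count("emoji-other-zone"))  # 99 1
```

A pure lowest-estimate rule never re-probes the slower pod once it looks worse. That limitation is one reason production balancers add randomized selection. The sketch still shows the effect from the story: the nearby, faster pod ends up taking almost all the traffic.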

Daniel: I hope, Sharon, that addresses all your questions. It’s always about trade-offs and you gave a great example there, Jason.

Argo Rollouts [36:30 min]

Jason: Do you want to see some Argo Rollouts? Let’s plug the folks over at Civo, because they give me extremely cheap Kubernetes clusters. I paid five dollars this month to run a Kubernetes cluster!

Let’s switch clusters and check out what we have in this other environment. I’ve got a link you can go to and see the blue-green as it happens. Let’s do it: k ctx infra

k get pods --all-namespaces. There’s a lot running, including the dashboard. You can check out dashboard.civo.59s.io to see this and various other apps.

I’ve got Argo Rollouts already installed in the cluster. There was a previous lesson on how to do Argo Rollouts. Long story short: I applied a YAML manifest and, if you’re using Argo to install rollouts, then you’ve already got a flow.

We have another app called podinfo.civo. Let’s talk through what objects we have and what’s going on. We’re going to do an update to our Civo environment and see what happens.

k get pods. I’ve got a frontend, a traffic generator, and this podinfo backend. Podinfo is another demo app that allows you to do all sorts of neat stuff. We made podinfo into an Argo Rollouts object. A Rollout replaces the deployment with a new construct that you can manage with the Argo Rollouts CLI commands.

k get rollouts. I’ve got podinfo, and if I do k get deploy in this namespace, we see the frontend and generator…

Let’s stop for a second and take a look at some YAMLs. Argo Rollouts is great but one thing I don’t always love about Argo Rollouts is that the documentation can get a bit wonky.

SMI-based rollout with Argo [39:48 min]

We are using the SMI-based rollout mechanism with Argo. SMI stands for Service Mesh Interface, a project led by a working group within the CNCF that tries to make service mesh concepts more universal, so they work the same way across different meshes.

Specifically, we’re using the traffic split object: k get crd. There’s a bunch of stuff related to k3s, but if we look for Linkerd custom resource definitions, we’ve only got two. We have service profiles (we won’t get into those, but they’re a way to improve your monitoring within Linkerd). And we have traffic splits, from the SMI specification, which let you split traffic; that’s what we’ll use for Argo Rollouts. You’d use the same thing for Flagger, by the way, with very similar concepts.

Let’s talk about the code. This one was written with GitOps in mind, so when I created the namespace, I added the Linkerd inject enabled annotation. This is how I’d do it in a flow, and then I use Kustomize to apply it.

Here is a simple config map for our frontend; you don’t have to worry about it. And this one is for my traffic generator, which we’ll ignore, too. With SMI, you have to create all the services yourself, so there’s a lot of upfront work with Argo Rollouts and SMI-based rollouts.

We have our podinfo service, our backend, the thing that we’re gonna manage with a rollout. We also have podinfo stable, a second copy of that service that we have to create manually. You’ll see that the config is the same, but you have to have three different services: I also need podinfo canary. So I have three services: podinfo, podinfo stable, and podinfo canary. This will allow me to do a traffic split. I’ve got my deployment, which we’ll ignore, I’ve got another deployment we’ll ignore, and we’ve got the rollout, which is important.
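Here is a Python sketch of the objects described here, built as plain dicts. It contains three Services that are identical apart from their names, plus the kind of SMI TrafficSplit that routes the apex service between the stable and canary backends. Several values are assumptions rather than the demo’s actual manifests, which aren’t reproduced here:

- the hyphenated names
- port 9898
- the app label
- the split.smi-spec.io/v1alpha1 apiVersion

```python
def service(name: str, app: str = "podinfo", port: int = 9898) -> dict:
    """A plain Service; the three podinfo Services start out identical apart from the name."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": app},
            "ports": [{"name": "http", "port": port, "targetPort": port}],
        },
    }


services = [service(n) for n in ("podinfo", "podinfo-stable", "podinfo-canary")]


def traffic_split(canary_weight: int) -> dict:
    """Apex service 'podinfo' split between the stable and canary backends."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return {
        "apiVersion": "split.smi-spec.io/v1alpha1",  # assumption: check your mesh's SMI version
        "kind": "TrafficSplit",
        "metadata": {"name": "podinfo"},
        "spec": {
            "service": "podinfo",  # clients keep calling the apex (root) service
            "backends": [
                {"service": "podinfo-stable", "weight": 100 - canary_weight},
                {"service": "podinfo-canary", "weight": canary_weight},
            ],
        },
    }


print(traffic_split(10)["spec"]["backends"])
```

In the demo, Argo Rollouts creates and updates the TrafficSplit itself, so you only create the three Services. The controller also adjusts the stable and canary Services’ selectors so that each one targets only its own set of pods. That is why they can start out with the same config.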

In the rollout, I specify which service to use for the canary pods and which for the stable pods, and then what the root service is. For this one, I had to dig in a little to find where it was. When using Argo Rollouts with Linkerd, you want to specify all three of these services: the canary, the stable, and the root (or apex) service. I know this is weird, but we’re going to talk about it and it’s gonna make more sense.

These steps set how much traffic we send to the canary at any one time. Then, I added the label to be thorough.
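The rollout fields mentioned above can be sketched as a Python dict in Argo Rollouts’ Rollout shape. They are the canary service, the stable service, and the root service under SMI traffic routing, plus the traffic steps. The image, labels, and replica count of three are placeholders, not the demo’s actual values. So are the 10% and 50% weights.

```python
from typing import Sequence


def podinfo_rollout(image: str, weights: Sequence[int] = (10, 50)) -> dict:
    """Canary Rollout using SMI traffic routing across the three podinfo Services."""
    steps = []
    for w in weights:
        steps.append({"setWeight": w})  # send w% of traffic to the canary
        steps.append({"pause": {}})     # no duration: wait for a manual promote
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Rollout",
        "metadata": {"name": "podinfo"},
        "spec": {
            "replicas": 3,
            "selector": {"matchLabels": {"app": "podinfo"}},
            "template": {
                "metadata": {"labels": {"app": "podinfo"}},
                "spec": {"containers": [{"name": "podinfo", "image": image}]},
            },
            "strategy": {
                "canary": {
                    "canaryService": "podinfo-canary",
                    "stableService": "podinfo-stable",
                    "trafficRouting": {"smi": {"rootService": "podinfo"}},
                    "steps": steps,
                }
            },
        },
    }


rollout = podinfo_rollout("example/podinfo:v2")
print(rollout["spec"]["strategy"]["canary"]["steps"])
```

A pause with no duration waits until someone promotes the rollout, which matches the manual promotion process used in the demo.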

Blue-green demo [43:45 min]

Let’s do a blue-green deploy. I’m pretty literal, you’re going to see my blue-green here in a minute. I’ll change the podinfo UI color and then we’re going to reapply this.

kubectl apply -f git_repo/jasonmorgan/linkerd-demos/emissary/manifest/podinfo.yaml

When demoing locally, I don’t want to tie the route to a host. When I’m demoing in my Civo cluster, because I have proper DNS, I do. This host allows me to get my certificate. This is my podinfo certificate request — that’s pretty darn easy!

Let’s recreate that and unbreak my cluster: k get pods. We can see that, where I should have three pods, I now have four, and we’ve got a traffic split.

Let’s see more: Linkerd viz stat ts. This allows me to get the statistics for my traffic splits in this namespace from the Linkerd viz CLI.

We can see now that 10% of my requests are going to canary instead of stable. If you’re on that page, you’ll see that now and then, the UI will turn blue. We can start upping that: kubectl argo rollouts dashboard.

Localhost 3100. I’ve got my canary and we can use Argo Rollouts. To be clear, this is a plain manual promotion process. Let me make this a little bit more complicated because I want the dashboard, too.
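The page only turns blue “now and then” because, with a 90/10 split, roughly one request in ten lands on the canary. A small seeded Python simulation of per-request weighted choice shows the effect. The backend names and weights are assumptions, and this is not how Linkerd implements weighted routing internally.

```python
import random


def route(backends: list, rng: random.Random) -> str:
    """Pick a backend for one request, with probability proportional to its weight."""
    names = [b["service"] for b in backends]
    weights = [b["weight"] for b in backends]
    return rng.choices(names, weights=weights, k=1)[0]


backends = [
    {"service": "podinfo-stable", "weight": 90},
    {"service": "podinfo-canary", "weight": 10},
]
rng = random.Random(42)  # seeded so the run is repeatable
requests = 10_000
canary_hits = sum(route(backends, rng) == "podinfo-canary" for _ in range(requests))
print(f"canary share: {canary_hits / requests:.3f}")  # roughly 0.10
```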

Buoyant Cloud in action [47:20 min]

Linkerd viz dashboard, we’re going to spin that up. Daniel, do I have time for a quick Buoyant Cloud plug?

Daniel: Yeah, feel free. Always good to hear about Buoyant Cloud!

Jason: We’ve got a managed version of Linkerd, and I also added this cluster to my Buoyant Cloud account.

I’ve got one cluster. I can see the topology, here’s my Ambassador ingress. It’s talking to my web, frontend, and all the various components. We can see that a bunch of them are broken. Clearly, these are broken on purpose. We can get the metrics between these components in a number of views. I can also see the health of my control plane and get any alerts, so I know that my environment is healthy and what version I’m using.

Say I had a particular problem with podinfo: I could send a diagnostic bundle for any of my clusters. Check out Buoyant Cloud; it’s free for two clusters and up to 50 workloads. There’s no reason not to, and hopefully it’ll make your Linkerd life even easier.

Argo Rollout promotions [49:13 min]

I see the traffic split and the success rate from the CLI, and I can also see them in the dashboard. I’ve got metrics like success rate, requests per second, and latency. I can build Argo Rollouts rules that look at Prometheus, either your own Prometheus or the one included with Linkerd, and use them to programmatically decide whether to do the promotion.

I didn’t do that, so we’ll go ahead and do a promotion manually. We’ll see the amount of load change and some more blue-greens.
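The programmatic promotion mentioned above boils down to a decision rule over those metrics. In Argo Rollouts, that rule would live in an AnalysisTemplate, for example one using a Prometheus provider with a successCondition. The Python function below only sketches the decision logic. Its thresholds (100 requests, 99% success, 250 ms p99) and the CanaryMetrics type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    requests: int           # requests observed in the analysis window
    success_rate: float     # 0.0 to 1.0
    p99_latency_ms: float


def decide(m: CanaryMetrics, min_requests: int = 100,
           min_success: float = 0.99, max_p99_ms: float = 250.0) -> str:
    """Return 'wait', 'abort', or 'promote' for the current canary step."""
    if m.requests < min_requests:
        return "wait"      # not enough traffic yet to judge
    if m.success_rate < min_success or m.p99_latency_ms > max_p99_ms:
        return "abort"     # send traffic back to stable
    return "promote"       # move on to the next setWeight step


print(decide(CanaryMetrics(requests=50, success_rate=1.0, p99_latency_ms=20)))     # wait
print(decide(CanaryMetrics(requests=600, success_rate=0.93, p99_latency_ms=20)))   # abort
print(decide(CanaryMetrics(requests=600, success_rate=0.999, p99_latency_ms=40)))  # promote
```

The minimum-request guard matters for canaries: at a 10% weight, the canary may see too little traffic early on for its success rate to mean much.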

Linkerd does intelligent stuff around traffic split, you can do it with Ambassador, and you can do it with both. You’re not limited and can use native features from either of these components because they work together seamlessly.

I’m using Argo Rollouts to handle my canary. Now we’re seeing the pod numbers change because we’ve gone beyond whatever limit, so it’s adding the next one.

If you’ve never seen the Argo Rollouts dashboard, it shows a bunch of stuff. One thing that confused me for a long time is these steps: it always seems to increment two steps at a time. That’s because you’ve got the step and then the pause, and the pause is a step.

That really got me, so when you see it going up unevenly, that’s why.
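Here is a toy Python model of the step counter. It assumes each canary weight is written as a setWeight step followed by an indefinite pause. It shows why a single manual promotion moves the step index by two: from the pause, past the next setWeight, to the next pause. It is an illustration, not Argo Rollouts’ controller logic.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[int]]


def expand(weights: List[int]) -> List[Step]:
    """Each canary weight becomes two steps: setWeight, then an indefinite pause."""
    steps: List[Step] = []
    for w in weights:
        steps += [("setWeight", w), ("pause", None)]
    return steps


def state_after(steps: List[Step], promotions: int) -> Tuple[int, int]:
    """Return (current step index, canary weight %) after `promotions` manual promotes."""
    index, weight, remaining = 0, 0, promotions
    while index < len(steps):
        kind, value = steps[index]
        if kind == "setWeight":
            weight = value  # applied immediately, then move on
            index += 1
        elif remaining > 0:
            remaining -= 1  # a promote releases this pause
            index += 1
        else:
            break  # still paused here
    if index == len(steps):
        weight = 100  # all steps done: the canary becomes the new stable
    return index, weight


steps = expand([10, 50])
for n in range(3):
    print(n, state_after(steps, n))
# 0 (1, 10)
# 1 (3, 50)
# 2 (4, 100)
```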

Daniel: That’s a great point, because I’ve often wondered what’s going on there!

Jason: Yeah, it totally throws you but that’s the story. I hope this was informative. We’ve got a couple of minutes, I’m happy to play around with anything, dig in, I’d love to hear from those listening.

Q&A not captured in this transcript