Sep 8, 2023
Guest post by Michael Levan
In my previous two blog posts, I introduced the Linkerd service mesh and showed you how to deploy it in Azure, AWS, or GCP. Now that we understand what a service mesh is and why you need one, and have deployed it in our environment, we'll focus on a core day two concern: how to effectively monitor your network with the CNCF-graduated service mesh.
A service mesh is key to network, security, and reliability success. Without one, you can't really understand what’s happening inside your cluster at the network layer; you'd basically be deploying applications while flying blind. Service meshes give you deep insight into cluster health through powerful observability capabilities. Seems like a no-brainer, right? So, let's dive right in!
To get started, be sure to have the following available: a running Kubernetes cluster with Linkerd installed (as covered in the previous posts), the linkerd CLI, kubectl, and Helm (we'll use it to install Grafana later on).
To see the power of Linkerd's monitoring capabilities in action, you'll need various microservices. The Sockshop demo app is a great option and includes various Kubernetes resources. You can find the Sockshop repo at https://github.com/microservices-demo/microservices-demo. Now, let’s deploy it.
First, clone the Sockshop application repository.
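Using the repo linked above, that's just:

```bash
git clone https://github.com/microservices-demo/microservices-demo.git
```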
Change the directory (cd) into the microservices-demo directory.
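For example:

```bash
cd microservices-demo
```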
Then, change the directory to the Kubernetes deployment directory.
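At the time of writing, the manifests live under deploy/kubernetes (the repo layout may change, so adjust the path if needed):

```bash
cd deploy/kubernetes
```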
The Sockshop demo expects to be installed in the sock-shop Namespace, so we need to create that.
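That's a one-liner with kubectl:

```bash
kubectl create namespace sock-shop
```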
Lastly, deploy the Sockshop application.
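The deployment directory bundles everything into a single manifest, typically named complete-demo.yaml (check the directory listing if the file name differs in your copy of the repo):

```bash
kubectl apply -f complete-demo.yaml
```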
You should see output on the terminal confirming each resource as it's created.
It will take a few minutes for the sock shop to be running; you can run the following kubectl command to wait for it:
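One option is to watch the Namespace's Deployments with kubectl rollout status:

```bash
# blocks until the Deployments in sock-shop have finished rolling out
kubectl rollout status -n sock-shop deploy --timeout=300s
```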
Once it’s all ready, let’s take a look inside the sock-shop namespace:
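For example:

```bash
kubectl get pods,services -n sock-shop
```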
You should see a long list of Pods and Services.
Note that most Pods have only one container (with the exception of the rabbitmq Pod). Go ahead and inject Linkerd’s sidecar into the Deployment Pods, so they’re fully managed by the service mesh:
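A common pattern is to pull the Deployment manifests back out of the cluster, pipe them through linkerd inject, and re-apply them:

```bash
kubectl get deploy -n sock-shop -o yaml \
  | linkerd inject - \
  | kubectl apply -f -
```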
Again, it will take a few minutes for everything to be ready, so rerun:
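That is, the same kubectl rollout status as before:

```bash
kubectl rollout status -n sock-shop deploy --timeout=300s
```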
Once that’s done, you can re-fetch the Pods, and you’ll see an additional container in each Pod — that's the Linkerd sidecar.
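For example:

```bash
# most Pods should now show 2/2 containers ready (the app plus linkerd-proxy);
# rabbitmq will show one more, since it already had two containers
kubectl get pods -n sock-shop
```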
Now that we've deployed the Sockshop application, let's monitor it with Viz — Linkerd's built-in monitoring tool. Viz lets you see all service mesh resources and provides deep insights into each pod (more on that in the next section). So let's get it installed.
To install Viz, use the Linkerd CLI with the following command:
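On a default installation, that looks like this:

```bash
linkerd viz install | kubectl apply -f -
# optionally run `linkerd check` afterwards to verify the Viz extension is healthy
```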
You’ll need to restart the sock-shop deployments for the full benefit of Linkerd Viz:
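kubectl rollout restart handles all the Deployments in the Namespace at once:

```bash
kubectl rollout restart deploy -n sock-shop
```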
Finally, launch a browser to look at the Viz dashboard. This actually starts a port-forward, which is why we’re running it in the background:
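That looks like this; the trailing & keeps the port-forward running in the background:

```bash
linkerd viz dashboard &
```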
Once Viz is installed and running, we can use it to observe the meshed resources. On the Linkerd Viz dashboard, click the Namespace section and select the sock-shop Namespace.
Under the HTTP metrics section, you’ll see the sock-shop Namespace. Click on it.
As you can see, all Deployments within the sock-shop Namespace are now successfully meshed. You can also see the success rate and latency of each Deployment.
Let's see what else we can find out about a particular deployment. Click on the carts Deployment. Now we can see the success rate and latency for each Pod in that Deployment.
On the same page, scroll down, and you'll see inbound and outbound routing to the Deployment. This allows you to understand what's happening within the pod network of your application. Understanding what's “talking to” your app from a networking perspective is crucial for troubleshooting and understanding pod communication.
Let's dive a little deeper. Click on the carts Pod.
Now you'll see communication flow to and from the pod, both inbound and outbound. If you look at the Edges display, toward the bottom, you can see markers on the right side of the page that show that mTLS is actually happening.
For a deeper look into the various application routes, use the linkerd viz routes command.
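For example, to look at the routes of a single Deployment (here the front-end Deployment from Sockshop):

```bash
linkerd viz routes -n sock-shop deploy/front-end
```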
The Linkerd Viz dashboard has a lot of great features, but it’s not great for long-term monitoring, and it’s not very customizable. For production scenarios, you’ll want a monitoring tool that keeps historical data about your environment, and works with other monitoring and graphing solutions. That’s where Prometheus and Grafana come in.
Linkerd Viz is really a front end that displays metrics from a Prometheus data store. This means that all the data that Linkerd supplies and Linkerd Viz displays are available to any visualization, monitoring, or alerting system that uses Prometheus metrics.
Installing Linkerd Viz automatically installs a Prometheus instance, which we’ll use to demonstrate the functionality that Prometheus brings to the table. It’s important to note that you should never use the Prometheus that Linkerd Viz installs in production: it uses volatile storage for its metrics, so every time its Pod restarts, you’ll lose all your historical data. Instead, install your own Prometheus and use that, as described in Linkerd’s documentation.
Prometheus ships with its own dashboard, so we’ll start with that. To view the Prometheus dashboard, use the following command for port forwarding.
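Assuming the default linkerd-viz Namespace, where the Viz-installed Prometheus is exposed as a Service named prometheus on port 9090:

```bash
kubectl port-forward -n linkerd-viz svc/prometheus 9090:9090 &
```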
Once the port-forward is running, open http://localhost:9090/ to see the Prometheus dashboard. Go to Status (in the upper left of the window) and select Targets from its pulldown:
Within the targets, you’ll see that all Linkerd resources are automatically scraped by Prometheus. For each target you can see its metrics endpoint, its labels, and its status, confirming that it’s up and operational.
The raw Prometheus dashboard is honestly less useful than the Linkerd Viz dashboard. That’s OK: the real point here is that since we have all this information in Prometheus, we can easily bring any tool based on Prometheus into play – for example, Grafana.
Grafana is one of the most popular monitoring UIs in the Kubernetes space, so let’s take a look at it. We’ll install it per Linkerd’s instructions at https://linkerd.io/2/tasks/grafana/:
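The docs boil down to a Helm install of the official Grafana chart, fed with a values.yaml published in the Linkerd repo (the exact URL below is taken from the docs and may change between Linkerd versions):

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana -n grafana --create-namespace grafana/grafana \
  -f https://raw.githubusercontent.com/linkerd/linkerd2/main/grafana/values.yaml
```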
The values.yaml we’re supplying as input configures Grafana with several default charts published by the Linkerd team at https://grafana.com/orgs/linkerd – we’ll take a look at one of the charts in this article, but you can always refer to the official list for a bit more information.
Done? You should see output similar to this:
(PAY CAREFUL ATTENTION to that warning at the bottom! Remember, we’re doing this with the Prometheus installed by Linkerd Viz: you do not want to do that in production, which is why that warning appears.)
Wait a few minutes for the Grafana installation to complete. To confirm it’s done, you can use kubectl rollout status again:
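Assuming the Helm release and Namespace names from the install step above:

```bash
kubectl rollout status -n grafana deploy/grafana
```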
An additional very important note is that we need to make sure that Grafana has access to its Prometheus instance. If you’re using your own Prometheus instance (as you should!) you’ll need to verify this for your own environment. Since we’re using the one installed by Linkerd Viz, we can install a Linkerd AuthorizationPolicy that allows access from Grafana’s ServiceAccount to the prometheus-admin Server that’s created by Linkerd Viz:
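The sketch below shows roughly what that policy looks like, applied via a heredoc; it assumes Grafana runs under the grafana ServiceAccount in the grafana Namespace (as in the Helm install above). Prefer the exact YAML linked from the docs:

```bash
kubectl apply -f - <<EOF
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: prometheus-admin-grafana
  namespace: linkerd-viz
spec:
  # allow traffic to the prometheus-admin Server created by Linkerd Viz...
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: prometheus-admin
  # ...from Grafana's ServiceAccount
  requiredAuthenticationRefs:
    - kind: ServiceAccount
      name: grafana
      namespace: grafana
EOF
```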
(This YAML is linked from the docs at https://linkerd.io/2/tasks/grafana/.)
Once that’s done, we should have a functioning Grafana! While you can use port forwarding to access Grafana directly, it’s much simpler to connect it to Linkerd Viz and just use the links shown on the Linkerd Viz dashboard. To make that happen, we’ll rerun linkerd viz install to connect it with Grafana:
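Per the docs, you pass Grafana's in-cluster address to the Viz install (grafana.grafana:3000 here, given the Helm release and Namespace used above):

```bash
linkerd viz install --set grafana.url=grafana.grafana:3000 \
  | kubectl apply -f -
```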
Reload the Linkerd Viz dashboard and go back to the “Namespaces” display. Now you’ll see little Grafana icons on the right of the dashboard:
Click on the Grafana icon at the right of the “sock-shop” namespace and you’ll see a Grafana dashboard showing statistics for that namespace! Though a full discussion of everything Grafana brings to the table is out of scope for this blog, the dashboard is a great way to get a quick overview of how your application is performing.
There’s also Buoyant Cloud, a paid dashboard from the creators of Linkerd, which automates Linkerd and provides users with deep insights into the health of Linkerd and the behavior of their applications. Its free tier makes it easy to try out (see https://buoyant.io/cloud). It’s fully hosted, can be used by multiple users, and provides a simple cross-cluster view.
We deployed the Sockshop app, a demo application with multiple microservices, and meshed all of its services. We used Viz, Linkerd's built-in monitoring tool, to see the success rate and latency of each meshed Deployment and Pod. Viz is a great tool for dev environments, but if you're running in production, you'll need a more powerful monitoring tool, which is why Linkerd metrics are based on Prometheus: any of the great tools in the Prometheus ecosystem (like Grafana) can consume Linkerd metrics directly. In our case, we installed Grafana and used the freely available Linkerd Grafana dashboards to get great visualizations easily and quickly. (Just remember: in production, install your own Prometheus instance!)
Prometheus and Grafana give you powerful open-source monitoring tools for your whole environment, and for deeper, cross-cluster data, you can use the free tier of Buoyant Cloud or the paid version for additional enterprise features.