Buoyant’s Linkerd Production Runbook
A guide to running a service mesh in production

Last update: Sep 7, 2022 / Linkerd 2.12.0

Monitoring Linkerd

Monitoring Linkerd means monitoring both its control plane components and its data plane components.

Note: Buoyant Cloud provides a comprehensive suite of monitoring and diagnostics for Linkerd, including monitoring of TLS certificate expiration, control plane and data plane health, and more. If you're using Buoyant Cloud, you may be able to skip this section.

Screenshot of Buoyant Cloud's monitoring alerts

Monitoring Linkerd control plane metrics

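Central to monitoring Linkerd’s health is monitoring its metrics. Since the Linkerd control plane runs on the data plane (each control plane pod carries its own linkerd-proxy container), its components report the same metrics as any other meshed workload, and you can use the same metrics pipeline you’ve already set up.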

As a starting point, we recommend monitoring:

  • Existence of control plane components. Each component needs to be running in order for Linkerd to function.
  • Success rate of control plane components. This should never drop below 100%; any failure responses are a sign that something is going wrong.
  • Latency of control plane components. Alert thresholds should be set empirically, based on the latencies you observe under normal operation, and unexpected changes should be investigated.
  • Optionally, resource consumption of control plane components. This also requires tuning, as some components scale in memory and CPU usage with the overall level of traffic passing through the mesh. However, rapid changes are worth investigating, and consumption that approaches any resource limit should be addressed before it becomes a problem.
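As an illustration of how the first three checks above fit together, here is a minimal Python sketch. It is not part of Linkerd or Buoyant Cloud: the ComponentSample fields, the check_component function, and the latency_factor threshold are all hypothetical names chosen for this example, and the numbers would come from whatever metrics pipeline you already run.

```python
from dataclasses import dataclass


@dataclass
class ComponentSample:
    """One observation of a control plane component (hypothetical shape)."""
    name: str
    ready_replicas: int
    success_count: int
    failure_count: int
    p99_latency_ms: float
    baseline_p99_ms: float  # assumption: a baseline you measured empirically


def success_rate(sample: ComponentSample) -> float:
    """Fraction of successful responses; treated as 1.0 when there is no traffic."""
    total = sample.success_count + sample.failure_count
    return 1.0 if total == 0 else sample.success_count / total


def check_component(sample: ComponentSample, latency_factor: float = 2.0) -> list:
    """Return a list of alert strings for a single component."""
    alerts = []
    # Existence: the component must be running.
    if sample.ready_replicas < 1:
        alerts.append(f"{sample.name}: no ready replicas")
    # Success rate: anything below 100% is worth a look.
    rate = success_rate(sample)
    if rate < 1.0:
        alerts.append(f"{sample.name}: success rate {rate:.2%}")
    # Latency: compare against the empirical baseline (factor is an assumption).
    if sample.p99_latency_ms > latency_factor * sample.baseline_p99_ms:
        alerts.append(
            f"{sample.name}: p99 latency {sample.p99_latency_ms:.0f} ms exceeds baseline"
        )
    return alerts
```

A healthy component yields an empty list, so the result can be passed directly to whatever alerting mechanism you use. The factor of 2.0 is only a placeholder and should be tuned to your environment.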

Monitoring Linkerd’s data plane

Monitoring of Linkerd’s proxies should focus primarily on resource usage, since the golden metrics they report are those of the application pod rather than of the proxy itself. As with control plane components, the exact thresholds depend on the traffic to the pod, so alerting should focus on rapid changes, or on situations where consumption approaches resource limits.
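The two alert conditions described above (rapid change, and consumption approaching a limit) can be sketched in a few lines of Python. This is an illustrative example only: the resource_alerts function and its near_limit and max_growth defaults are hypothetical and should be tuned to your own traffic patterns.

```python
def resource_alerts(samples, limit, near_limit=0.8, max_growth=0.5):
    """Flag proxy resource readings that need attention.

    samples: chronological usage readings (e.g. memory in MiB), oldest first.
    limit: the configured resource limit, in the same unit as the samples.
    near_limit: fraction of the limit considered "approaching" (assumption).
    max_growth: relative change between the last two readings that counts
        as "rapid" (assumption).
    """
    alerts = []
    if not samples:
        return alerts
    latest = samples[-1]
    # Condition 1: consumption approaches the resource limit.
    if latest >= near_limit * limit:
        alerts.append("near_limit")
    # Condition 2: rapid change relative to the previous reading.
    if len(samples) >= 2:
        previous = samples[-2]
        if previous > 0 and abs(latest - previous) / previous > max_growth:
            alerts.append("rapid_change")
    return alerts
```

For example, readings of [100, 210] against a limit of 250 trigger both conditions, while [100, 110] against the same limit trigger neither.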

Accessing Linkerd’s logs

You can view logs from Linkerd’s control plane or data plane through the usual kubectl logs command. For the control plane, both the main container and the linkerd-proxy container for each pod may deliver usable information.
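If you want to collect these logs from a script, a thin wrapper around kubectl logs is usually enough. The sketch below is a hedged example, not an official tool: the function names are hypothetical, and it assumes that Linkerd is installed in the linkerd namespace and that kubectl is on your PATH with access to the cluster.

```python
import subprocess


def kubectl_logs_args(workload, container, namespace="linkerd", tail=100):
    """Build the argument list for a `kubectl logs` invocation.

    workload: a resource reference such as "deploy/linkerd-destination".
    container: the container to read, e.g. "linkerd-proxy" or the main container.
    namespace: assumed to be "linkerd" (the usual install namespace).
    """
    return [
        "kubectl", "logs",
        "--namespace", namespace,
        workload,
        "--container", container,
        f"--tail={tail}",
    ]


def fetch_logs(workload, container, **kwargs):
    """Run kubectl and return the log text; raises if kubectl fails."""
    args = kubectl_logs_args(workload, container, **kwargs)
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling fetch_logs once with the main container name and once with "linkerd-proxy" gives you both views of a control plane pod, as described above.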

By default the control plane’s log level is set at the INFO level, which surfaces various events of interest, plus warnings and errors. For diagnostic purposes, it may be helpful to raise log levels to DEBUG; this can be accomplished with the linkerd upgrade --controller-log-level debug command.

Similarly, by default, the proxy’s log level is set to INFO. The log level of a proxy can be modified at runtime if necessary. Note that the DEBUG level can be extremely verbose, especially for high-traffic proxies. Care should be taken to change the level back to INFO after debugging, especially in environments where increased log volume has a financial impact.
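One way to change a proxy’s level at runtime is through the proxy’s admin endpoint. The Python sketch below only builds the HTTP request, so it can be inspected before anything is sent. It rests on several assumptions that you should verify against the Linkerd documentation for your version: that the proxy admin port is 4191, that it has been forwarded to localhost (for example with kubectl port-forward), that the endpoint path is /proxy-log-level, and that the body is a filter string such as linkerd=debug. The function name is hypothetical.

```python
import urllib.request


def proxy_log_level_request(level, host="localhost", port=4191):
    """Build (but do not send) a PUT request that sets a proxy's log level.

    level: a log filter string, e.g. "linkerd=debug" (assumed syntax).
    host, port: where the proxy admin port is reachable; 4191 on localhost
        assumes an active port-forward to the pod.
    """
    url = f"http://{host}:{port}/proxy-log-level"
    return urllib.request.Request(url, data=level.encode("utf-8"), method="PUT")


# Usage (requires an active port-forward, so it is left commented out):
#   urllib.request.urlopen(proxy_log_level_request("linkerd=debug"))
#   ... reproduce the problem and collect logs ...
#   urllib.request.urlopen(proxy_log_level_request("linkerd=info"))
```

Keeping the reset call next to the debug call in the same script makes it harder to forget to return the proxy to INFO afterwards.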

Sending proxy diagnostics to Buoyant

Buoyant Cloud users can use the Send Diagnostics feature to send metrics and log information directly to Buoyant for debugging purposes.