Buoyant’s Linkerd Production Runbook
A guide to running a service mesh in production
Note: Buoyant Cloud provides a comprehensive suite of monitoring and diagnostics for Linkerd, including monitoring of TLS certificate expiration, control plane and data plane health, and more. If you're using Buoyant Cloud, you may be able to skip this section.
Central to monitoring Linkerd’s health is monitoring its metrics. Since the Linkerd control plane itself runs on the data plane (each control plane pod carries its own linkerd-proxy container), you can use the same metrics pipeline you’ve already set up.
As a starting point, we recommend monitoring:
Monitoring of Linkerd’s proxies should focus primarily on resource usage, since the golden metrics they report will be those of the application pod. As with control plane components, the exact thresholds depend on the traffic to the pod, so alerting should focus on rapid changes in consumption, or on situations where consumption approaches resource limits.
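The two alert conditions above (a rapid change, or consumption approaching a limit) can be sketched as a small shell check. This is only an illustration: the function name proxy_usage_alert and the 80% and 50% thresholds are assumptions chosen for the example, not values recommended by this runbook, and in practice the same logic would usually live in your metrics system's alerting rules.

```shell
# Hypothetical helper: classify one proxy resource sample.
# Args: previous_usage current_usage limit (integers in the same unit,
# for example MiB of memory or millicores of CPU).
# Assumed thresholds: alert at >= 80% of the limit, or when usage moves
# by >= 50% of the previous sample in either direction.
proxy_usage_alert() {
  pua_prev=$1
  pua_cur=$2
  pua_limit=$3

  # Absolute change between the two samples.
  pua_delta=$((pua_cur - pua_prev))
  if [ "$pua_delta" -lt 0 ]; then
    pua_delta=$((0 - pua_delta))
  fi

  if [ $((pua_cur * 100)) -ge $((pua_limit * 80)) ]; then
    echo "near-limit"
  elif [ "$pua_prev" -gt 0 ] && [ $((pua_delta * 100)) -ge $((pua_prev * 50)) ]; then
    echo "rapid-change"
  else
    echo "ok"
  fi
}
```

For example, proxy_usage_alert 100 160 250 reports a rapid change (usage grew by 60% between samples) even though the proxy is still well under its limit of 250.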
You can view logs from Linkerd’s control plane or data plane through the usual kubectl logs command. For the control plane, both the main container and the linkerd-proxy container for each pod may deliver usable information.
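As a sketch, the helpers below wrap kubectl logs for the two kinds of container in a control plane pod. They assume the control plane is installed in the linkerd namespace; the workload and container names you pass in (for instance deploy/linkerd-destination and destination) vary by Linkerd version and are not specified in this runbook, so check them against your own cluster.

```shell
# Assumed control plane namespace; override with LINKERD_NS=... if yours differs.
LINKERD_NS="${LINKERD_NS:-linkerd}"

# Logs from a named container of a control plane workload.
# $1: workload, e.g. deploy/linkerd-destination (assumed name)
# $2: container name, e.g. destination (assumed name)
control_plane_logs() {
  kubectl logs -n "$LINKERD_NS" "$1" -c "$2"
}

# Logs from the linkerd-proxy container of the same workload.
# $1: workload, e.g. deploy/linkerd-destination (assumed name)
control_plane_proxy_logs() {
  kubectl logs -n "$LINKERD_NS" "$1" -c linkerd-proxy
}
```

Calling control_plane_proxy_logs deploy/linkerd-destination would then print the proxy's side of the story for that component, which is often where connection and TLS problems show up.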
By default, the control plane’s log level is INFO, which surfaces various events of interest, plus warnings and errors. For diagnostic purposes, it may be helpful to raise the log level to DEBUG; this can be accomplished with the linkerd upgrade --controller-log-level debug command.
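Note that linkerd upgrade renders manifests to standard output rather than changing the cluster itself, so its output is normally piped to kubectl apply. A minimal sketch follows, assuming the control plane was installed with the linkerd CLI (not Helm) and that your CLI version accepts this flag; the function name is hypothetical.

```shell
# Hypothetical helper: re-render the control plane manifests with the
# requested controller log level and apply them to the cluster.
# $1: log level, e.g. debug (for diagnosis) or info (to restore the default)
set_controller_log_level() {
  linkerd upgrade --controller-log-level "$1" | kubectl apply -f -
}
```

Run set_controller_log_level debug while diagnosing, then set_controller_log_level info once you are done.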
Similarly, by default, the proxy’s log level is set to INFO. The log level of a proxy can be modified at runtime if necessary. Note that debug mode can be extremely verbose, especially for high-traffic proxies. Care should be taken to change the level back to INFO after debugging, especially in environments where increased log usage has a financial impact.
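One way to change a proxy's log level at runtime is through the proxy's admin endpoint, sketched below. The sketch assumes the admin port is the default 4191 and that the proxy exposes a /proxy-log-level endpoint, as described in the Linkerd documentation; the function names are hypothetical, and the exact log filter strings accepted may differ between versions.

```shell
# Step 1 (run in a separate terminal; blocks until interrupted):
# forward the proxy's admin port to localhost.
# $1: namespace of the meshed pod, $2: pod name
proxy_admin_forward() {
  kubectl port-forward -n "$1" "pod/$2" 4191:4191
}

# Step 2: send the new log filter to the forwarded admin port.
# $1: log filter, e.g. linkerd=debug (verbose) or linkerd=info (restore)
set_proxy_log_level() {
  curl -sS -X PUT --data "$1" http://localhost:4191/proxy-log-level
}
```

With the port-forward running, set_proxy_log_level linkerd=debug raises verbosity for that one proxy, and set_proxy_log_level linkerd=info puts it back afterwards. The change applies only to the running proxy, so it is lost when the pod restarts.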
Buoyant Cloud users can use the Send Diagnostics feature to send metrics and log information directly to Buoyant for debugging purposes.