Buoyant’s Linkerd Production Runbook

A guide to running a service mesh in production

Last update: Feb 9, 2021 / Linkerd 2.9.3

Welcome to Buoyant’s Linkerd Production Runbook! We’re thrilled that you’re taking Linkerd to production. Today, organizations around the globe rely on Linkerd for their mission-critical systems—including us. You’ll be in great company.

The goal of this guide is to provide concrete, practical advice for deploying and operating Linkerd in production environments. We’ve written this guide based on our experience helping organizations large and small adopt Linkerd, as well as our experience running Linkerd ourselves.

Note: this doc covers 2.x versions of Linkerd only. It does not cover versions 1.x or earlier.

How to use this guide

This runbook is meant to be used in conjunction with the official Linkerd docs. While this guide will give you our advice for getting to (and staying in!) production, the official docs contain the majority of the information you’ll actually need to understand to be successful with Linkerd. Whenever possible, we’ll point to the relevant docs section from this guide.

This is a “living document.” Linkerd moves fast. We’ll update this guide for every release and add our release commentary to the Upgrade Notes.

Finally, please read the important Disclaimer below. We do our best to ensure this doc is accurate, but mistakes, omissions, and inaccuracies do happen. Ultimately, you are responsible for your production systems, not us. (But if you do find an error in the guide, please tell us!)

Let’s get started!

Before going to production

Before you can be ready to deploy Linkerd to production, there are some things you should do to prepare.

Join the community

If you’re serious about operating Linkerd in production, you should join the open source community channels around it. This is important for staying aware of important updates and announcements, and for learning from other users who are doing the same thing.

We recommend you join the Linkerd Slack and the linkerd-announce mailing list (both referenced later in this guide).

You can also join GitHub Discussions, where longer-form conversations take place.

All Linkerd development happens on GitHub. That’s also the best place to submit bug reports and pull requests. Please also star the repo to inflate our vanity metric.

Understand how to get help

If you need help with Linkerd, you have a couple of channels available to you. Our recommendations are:

  • One-off question? Start with the Slack room.
  • Complex troubleshooting? Spin up a thread on the mailing list or use GitHub Discussions.
  • Bug report? File a GitHub issue.

Of course, open source support is provided by the community on a best-effort basis. And don’t forget to help others—this is often the best way to give back!

(Note that if you are a Linkerd commercial support customer of Buoyant, you also have other, dedicated support channels available. Please consult the support onboarding instructions you received.)

Understand how to report a security disclosure

In the unlikely event that you discover a security vulnerability in Linkerd, please email the private [email protected] list. We’ll send a confirmation email to acknowledge your report, and a follow-up email once we’ve determined whether the issue is in fact a vulnerability.

To receive notifications of vulnerabilities and critical updates, please subscribe to the linkerd-announce mailing list.

Understand where to get Linkerd

Linkerd is 100% open source, and the open source project contains everything you need to run Linkerd at scale and in production. Linkerd’s code is hosted in the GitHub repo. You may choose to build your own binaries or images from this code, or simply to use one of the published releases.

Open source releases are published in a split fashion: GitHub hosts the CLI binaries, and GitHub Container Registry hosts the container images. These are the canonical binaries and images for Linkerd.

(Note: Prior to version 2.9.0, container images were hosted on GCR rather than GHCR.)

Understand Linkerd’s versioning scheme

The 2.x branch of Linkerd follows two versioning schemes: one for stable releases and one for “edge” releases.

Stable releases

Linkerd stable releases follow a modified form of semantic versioning. Linkerd version numbers are of the form stable-2.<major>.[<minor>[.<patch>]]. Breaking changes (typically, configuration incompatibilities) and significant changes to functionality are denoted by changes in major version. Non-breaking changes and minor feature improvements are denoted by changes in minor version. Occasionally, we will release critical bugfixes to stable releases by incrementing the patch version.

For example:

  • 2.3.6 -> 2.4: major improvements, possible breaking changes
  • 2.3.6 -> 2.3.7: improvements or bugfixes, no breaking changes

Edge releases

Linkerd 2.x is also published in “edge” releases, typically on a weekly cadence. In contrast to stable releases, Linkerd edge releases follow a flat versioning scheme, of the form edge-<year>.<month>.<number>. Edge releases are provided to the community as a way of getting early access to feature work, and may introduce breaking changes at any point. Sometimes, edge releases are designated informally as a “release candidate” for the upcoming stable release; this designation also provides no guarantees about feature compatibility.

For example:

  • edge-20.4.5: the fifth edge release published in April 2020
  • edge-20.11.1: the first edge release published in November 2020

Understand feature denotations

Sometimes, Linkerd features are denoted as experimental in the documentation. This designation means that, while we feel confident in the viability of the feature, it hasn’t seen enough production use for us to recommend it unreservedly. Caution should be exercised before using an experimental feature in a production environment. The documentation for each experimental feature will describe why it has been classified this way; for example, “this feature has not been tested on all major cloud providers”.

Sometimes, Linkerd features are denoted as deprecated. This means that, while currently supported, we expect to remove the corresponding configuration in an upcoming release.

Rarely, we may denote features as not for production in the documentation. These features may be useful for debugging and getting started, but have known issues when applied to production. The documentation for each not for production feature will describe why it has been classified this way; for example, “this feature has known scaling issues above 10 services”.

Understand which environments are supported

The only requirement for Linkerd is a modern, functioning Kubernetes cluster. Regardless of whether the cluster is on-premises or in the cloud, and regardless of Kubernetes distribution or provider, if it’s running Kubernetes, generally speaking, Linkerd should work. (Of course, Linkerd does require specific capabilities and features of the Kubernetes cluster in order to function. See Preparing your environment for more on this topic.)

Understand which versions of Kubernetes are supported

Generally speaking, Linkerd follows the published policy for “supported” Kubernetes releases: effectively, the three most recent minor Kubernetes versions are supported. Of course, earlier Kubernetes versions may still work.

Going to production

With our preparations out of the way, let’s get into the details. In this section, we cover the basic recommendations for preparing Linkerd for production use.

Your preflight checklist

So you’re ready to take Linkerd into production. Great! Here are the basic steps we suggest for your production deploy:

  1. Prepare your Kubernetes environment. (See Preparing your environment below.)
  2. Configure Linkerd in a production-ready way. (See Configuring Linkerd for Production Use below.)
  3. Set up monitoring and alerting so that you’re informed if Linkerd’s behavior falls outside its normal operating range. (See Monitoring Linkerd’s Health below.)
  4. Have fun! Linkerd can be pretty amazing, in our opinion! :)

We’ll go over each of these in turn.

Preparing your environment

In this section, we cover how to configure your Kubernetes environment for Linkerd production use. The good news is that much of this preparation can be verified automatically.

Run the automated environment verification

Linkerd automates as much as possible. This includes verifying the pre-installation and post-installation environments. The linkerd check command will automatically validate most aspects of the cluster against Linkerd’s requirements, including operating system, Kubernetes deployment, and network requirements.

To run the automated verification, follow these steps:

  1. Validate that you have the linkerd CLI binary installed locally and that its version matches the version of Linkerd you want to install, by running linkerd version and checking the “Client version” section.
  2. Validate that the kubectl command in your terminal is configured to use the Kubernetes cluster you wish to validate.
  3. Run linkerd check --pre.
  4. Correct any issues in your environment highlighted in the output. Each failing check should contain a URL pointing to a more detailed explanation with possible solutions.

Once linkerd check --pre is passing, you can start to plan the control plane installation using the information in the following sections.
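
As a concrete sketch of this flow (command names as of Linkerd 2.9; output will vary by version and cluster):

  # 1. Confirm the CLI is installed and matches the release you intend to install
  linkerd version --client

  # 2. Confirm kubectl is pointed at the cluster you intend to validate
  kubectl config current-context

  # 3. Run the pre-installation checks and fix anything that fails
  linkerd check --pre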

Validate certain exceptional conditions

There are some conditions that may prevent Linkerd from operating properly that linkerd check --pre cannot currently validate. The most common such conditions are those that prevent the Kubernetes API server from contacting Linkerd’s tap and proxy-injector control plane components.

Environments that can exhibit these conditions include:

  1. Private GKE clusters
  2. Private AKS clusters
  3. EKS clusters with custom CNI plugins

If you are in one of these environments, you may wish to do a “dry run” installation first to flush out any issues. Remediations include updating internal firewall rules to allow certain ports (see the example in the Cluster Configuration documentation), or, for EKS clusters, switching to the AWS CNI plugin.

Provide sufficient system resources for Linkerd

Production users should use the high availability, or “HA”, installation path. We’ll get into all the details about HA mode later, in Configuring Linkerd for Production Use. In this section, we’ll describe our recommendations for resource allocation to Linkerd installed in HA mode. These values represent best practices, but you may need to tune them based on the specifics of your traffic and workload.

Control plane resources

The control plane of an HA Linkerd installation requires three separate nodes for replication. As a rule of thumb, we suggest planning for 512MB of memory and 0.5 CPU cores of consumption on each such node. Additionally, the Linkerd control plane, in its basic configuration, contains a single Prometheus instance; we suggest planning for at least 512MB of memory for this instance.

The Prometheus instance is worth some extra attention. At scale, control plane resource consumption is typically dominated by this Prometheus. A rough starting point is to expect 5MB of memory per meshed pod, but this can vary wildly based on traffic patterns. Our guidance here is to take an empirical approach and monitor carefully.

(There are configurations of Linkerd that avoid running this control plane Prometheus instance. We’ll discuss these below as well.)

Data plane resources

Generally speaking, for Linkerd’s data plane proxies, resource requirements are a function of network throughput. Our conservative rule of thumb is that, for each 1,000 RPS of traffic expected to an individual proxy (i.e. to a single pod), you should ensure that 0.25 CPU cores and 20MB of memory are available. Very high throughput pods (>5k RPS per individual pod), such as ingress controller pods, may require setting custom proxy limits/requests (e.g. via the --proxy-cpu-* configuration flags).

In practice, proxy resource consumption is affected by the nature of the traffic, including payload sizes, level of concurrency, protocol, etc. Our guidance here, again, is to take an empirical approach and monitor carefully.
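
For example, the following is a minimal sketch of a high-throughput workload that overrides its proxy’s resources via Linkerd’s proxy configuration annotations. The workload name, image, and resource values are illustrative only; tune them to your measured traffic.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: web                                      # hypothetical high-traffic workload
  spec:
    replicas: 3
    selector:
      matchLabels: {app: web}
    template:
      metadata:
        labels: {app: web}
        annotations:
          linkerd.io/inject: enabled
          config.linkerd.io/proxy-cpu-request: "0.5"
          config.linkerd.io/proxy-cpu-limit: "2"
          config.linkerd.io/proxy-memory-request: 64Mi
          config.linkerd.io/proxy-memory-limit: 512Mi
      spec:
        containers:
          - name: web
            image: ghcr.io/example/web:latest      # hypothetical image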

Restrict NET_ADMIN permissions if necessary

By default, the act of creating pods with the Linkerd data plane proxies injected requires NET_ADMIN capabilities. This is because, at pod injection time, Linkerd uses a Kubernetes InitContainer to automatically reroute all traffic to the pod through the pod’s Linkerd data plane proxy. This rerouting is done via iptables, which requires the NET_ADMIN capability.

In some environments this is undesirable, as NET_ADMIN privileges grant access to all network traffic on the host. As an alternative, Linkerd provides a CNI plugin which allows Linkerd to run iptables commands within the CNI chain (which already has elevated privileges) rather than in InitContainers.

Using the CNI plugin adds some complication to installation, but may be required by the security context of the Kubernetes clusters. See the CNI documentation for more.
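
If you do opt for the CNI plugin, the flow looks roughly like the following (flags as of Linkerd 2.9; consult the CNI documentation for the Helm-based variant and any provider-specific settings):

  # Install the linkerd-cni DaemonSet first, then install Linkerd with CNI support enabled
  linkerd install-cni | kubectl apply -f -
  linkerd install --linkerd-cni-enabled | kubectl apply -f -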

Ensure that time is synchronized across nodes in the cluster

Clock drift is surprisingly common in cloud environments. When the nodes in a cluster have different timestamps, output such as log lines and metrics won’t line up across nodes. Clock skew can also break Linkerd’s ability to validate mutual TLS certificates, which can cause a critical outage of Linkerd’s ability to service requests.

Ensuring clock synchronization in networked computers is outside the scope of this document, but we will note that cloud providers often provide their own dedicated services on top of NTP to help with this problem, e.g. AWS’s Amazon Time Sync Service. A production environment should use those services. Similarly, if you are running on a private cloud, we suggest ensuring that all the servers use the same NTP source.

Ensure the kube-system namespace can function without proxy-injector

One sometimes surprising detail of Linkerd’s HA mode is that the proxy-injector component of Linkerd (which adds the proxy to scheduled pods) is configured so that it must be present and functioning before any pod on the system can be scheduled. In other words, in HA mode, if all proxy-injector instances are down, no pod can be scheduled. HA mode adds this restriction in order to guarantee that all application pods have access to mTLS: creation of any application pods that cannot have a proxy injected could lead to an insecure system.

To avoid having this behavior affect core Kubernetes components and possibly rendering the cluster itself inoperable in the presence of complete proxy-injector failure, the kube-system namespace should have the config.linkerd.io/admission-webhooks=disabled Kubernetes label applied. This will allow system pods to be scheduled even in the absence of a functioning proxy-injector.
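
Applying the label is a one-line operation; run it before (or as part of) installing Linkerd in HA mode:

  kubectl label namespace kube-system config.linkerd.io/admission-webhooks=disabled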

See the Linkerd HA documentation for more.

Configure your ingress for Linkerd

Adding Linkerd’s data plane proxies to ingress pods allows Linkerd to provide end-to-end mTLS and metrics. While in most respects this is identical to adding Linkerd’s data plane proxies to any other pod, it is different in one way: due to the specifics of how Linkerd routes requests, ingress pods will typically need to modify their Host header. Otherwise, an infinite routing loop will occur.

Follow the instructions in the Ingress configuration documentation to configure your ingress appropriately.
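
As one illustration, here is roughly what this looks like for the NGINX ingress controller, where the l5d-dst-override header tells Linkerd which Kubernetes service to route to. The resource names, namespace, host, and backend service are hypothetical; other ingress controllers use different mechanisms, so follow the ingress documentation for yours.

  apiVersion: networking.k8s.io/v1beta1
  kind: Ingress
  metadata:
    name: web-ingress                              # hypothetical
    namespace: my-app                              # hypothetical
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/configuration-snippet: |
        proxy_set_header l5d-dst-override $service_name.$namespace.svc.cluster.local:$service_port;
        grpc_set_header l5d-dst-override $service_name.$namespace.svc.cluster.local:$service_port;
  spec:
    rules:
      - host: example.com
        http:
          paths:
            - backend:
                serviceName: web-svc               # hypothetical backend service
                servicePort: 80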

Configuring Linkerd for Production Use

Having configured our environment for Linkerd, we now turn to Linkerd itself. In this section, we outline our recommendations for configuring Linkerd for production environments.

Choose your deployment tool

Linkerd supports two basic installation flows: using a Helm chart, and using the CLI to generate a manifest. For a production deployment, either is acceptable, but we recommend a strategy that allows for a repeatable, automated approach. See the Helm docs and the CLI installation docs for more details.
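
As a rough sketch of the two flows (commands and chart names as of Linkerd 2.9; production installs will also include the HA and certificate options covered later in this section):

  # CLI flow: render the manifest, review or commit it, then apply it
  linkerd install > linkerd.yaml
  kubectl apply -f linkerd.yaml

  # Helm flow: add the stable repo, then install the linkerd2 chart
  # (the chart also requires the mTLS trust anchor and issuer credentials
  # described below, passed via --set-file; see the Helm docs)
  helm repo add linkerd https://helm.linkerd.io/stable
  helm install linkerd2 linkerd/linkerd2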

Decide on your metrics pipeline

Each data plane proxy contains an endpoint for reporting metrics data for traffic seen by that proxy. By default, Linkerd’s control plane includes a “short term” Prometheus instance which scrapes these endpoints and aggregates this data for reporting. This instance powers the dashboard as well as the output of certain commands such as linkerd stat. This short-term, built-in Prometheus instance defaults to a limit of 6 hours of data, and will lose metrics data if restarted.

For production use, however, we recommend that you store this metrics data somewhere external to the cluster rather than relying on this short term instance. This provides several benefits: you can store data for more than 6 hours; you can avoid losing data on restart (including Linkerd upgrades); and you can aggregate data across clusters.

There are three basic ways to set up a metrics pipeline to handle Linkerd’s metrics.

  1. Use Prometheus federation. The simplest option is to federate data from the short-term Prometheus instance to a separate, off-cluster Prometheus instance. Refer to the docs on metrics federation for this approach; a sketch of the federation scrape configuration appears after this list.
  2. Use an external Prometheus instance to scrape Linkerd proxies directly. This option avoids running the per-cluster Prometheus entirely. This is a more resource-efficient option, at the expense of some complexity. Refer to the “Bring your own Prometheus” docs for this approach.
  3. Use a third-party metrics provider. In this option, you use an external metrics provider (e.g. DataDog) to host the metrics. We provide no specific guidance here; should you choose this route, most metrics providers are able to consume data from Prometheus, and either of the two approaches above may serve as a viable template.
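
For the federation approach (option 1), the scrape job on the external Prometheus looks roughly like the example below, adapted from the Linkerd metrics-export documentation; verify the job names and labels against the docs for your Linkerd version.

  scrape_configs:
    - job_name: 'linkerd'
      kubernetes_sd_configs:
        - role: pod
          namespaces:
            names: ['linkerd']
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_container_name]
          action: keep
          regex: ^prometheus$
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
          - '{job="linkerd-proxy"}'
          - '{job="linkerd-controller"}'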

Enable HA mode

Production deployments of Linkerd should use the “high availability”, or HA, configuration. This mode enables several production-grade behaviors of Linkerd, including:

  • Running three replicas of critical control plane components.
  • Setting production-ready CPU and memory resource requests on control plane components.
  • Setting production-ready CPU and memory resource requests on data plane proxies.
  • Requiring that the proxy auto-injector be functional for any pods to be scheduled.
  • Setting anti-affinity policies on critical control plane components so that they are scheduled on separate nodes and in separate zones.

This mode can be enabled by using the --ha flag to linkerd install or by using the values-ha.yaml Helm file. See the HA docs for more.
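
For example (chart and flag names as of Linkerd 2.9):

  # CLI:
  linkerd install --ha | kubectl apply -f -

  # Helm: fetch the chart to obtain its bundled HA values file, then install with it
  # (the certificate material described in the next section must also be supplied)
  helm fetch --untar linkerd/linkerd2
  helm install linkerd2 linkerd/linkerd2 -f linkerd2/values-ha.yaml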

Create certificates and a rotation schedule for mTLS

For mutual TLS, Linkerd uses three basic sets of certificates: the trust anchor, shared across clusters; the issuer certificate, stored per cluster; and the proxy certificates, issued per pod.

Proxy certificates are automatically rotated every 24 hours without any intervention on your part. However, both the trust anchor and issuer certificate require configuration for production use. Without configuration, at install time, Linkerd will generate one-year self-signed certificates for both trust anchor and issuer certificates and will discard the trust anchor key. This allows for an automated installation, but it is rarely the configuration you want in production.

For production environments, we recommend manually generating trust roots and issuer certificates for clusters, and setting up automatic rotation for issuer certificates. We suggest:

  1. Creating a trust anchor with a 10-year expiration period, and storing the key in a safe location.
  2. Create issuer certificates with 1-year expiration periods.
  3. Setting up automatic rotation for control plane credentials with cert-manager. (See the rotation docs).

Certificate management can be subtle. Be sure to read through Linkerd’s full mTLS documentation, including the sections on rotating certificates.
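
As a sketch of steps 1 and 2 using the step CLI (the tool the Linkerd mTLS docs use; openssl works equally well), followed by an install that supplies the resulting credentials. Verify the exact flags against the mTLS documentation for your version.

  # Trust anchor (root CA), ~10-year expiry; store ca.key somewhere safe
  step certificate create root.linkerd.cluster.local ca.crt ca.key \
    --profile root-ca --no-password --insecure --not-after 87600h

  # Issuer certificate and key, 1-year expiry, signed by the trust anchor
  step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
    --profile intermediate-ca --not-after 8760h --no-password --insecure \
    --ca ca.crt --ca-key ca.key

  # Supply all three files at install time
  linkerd install --ha \
    --identity-trust-anchors-file ca.crt \
    --identity-issuer-certificate-file issuer.crt \
    --identity-issuer-key-file issuer.key \
    | kubectl apply -f -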

Set up automatic rotation of webhook credentials

Linkerd relies on a set of TLS credentials for its webhooks. (These credentials are independent of the ones used for mTLS.) These credentials are used when Kubernetes calls the webhook endpoints of Linkerd’s control plane, which are secured with TLS.

As above, by default these credentials expire after one year, at which point Linkerd becomes inoperable. To avoid last-minute scrambles, we recommend using cert-manager to automatically rotate the webhook TLS credentials for each cluster. See the Webhook TLS documentation.

Annotate any “server-speaks-first” protocols in use

The Linkerd proxies perform protocol detection to automatically identify the protocol used by applications. However, there are some “server-speaks-first” protocols which Linkerd cannot detect, because the client does not send the initial bytes of the connection. If any of these protocols are in use, you should configure Linkerd to skip the ports for these protocols.

(Note: in upcoming versions of Linkerd, you will be able to configure these ports as “opaque ports”, so that Linkerd can still apply connection-level features such as mTLS even in the absence of protocol detection.)

Known server-speaks-first protocols include:

  • 25 - SMTP
  • 3306 - MySQL
  • 4222 - NATS
  • 27017 - MongoDB

These ports and the ports of any other server-speaks-first protocols in use should be added to Linkerd’s configuration as skip-inbound-ports and skip-outbound-ports settings. See the Protocol Detection documentation for more.
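
These settings can be applied mesh-wide at install time or per workload via annotations; a rough sketch of both (flag and annotation names as of Linkerd 2.9):

  # Mesh-wide defaults at install time
  linkerd install --ha \
    --skip-inbound-ports 25,3306,4222,27017 \
    --skip-outbound-ports 25,3306,4222,27017 \
    | kubectl apply -f -

  # Or per workload, as annotations on the pod template, e.g.:
  #   config.linkerd.io/skip-outbound-ports: "3306"   # a client that connects to MySQL
  #   config.linkerd.io/skip-inbound-ports: "3306"    # the MySQL workload itself, if meshed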

Secure your tap command if necessary

Linkerd’s tap command allows users to view live request metadata, including request and response gRPC and HTTP headers (but not bodies). In some organizations, this data may contain sensitive information that should not be available to operators.

If this applies to you, follow the instructions in the Securing your Cluster documentation to restrict or remove access to tap.

Validate your installation

After installing the control plane, we recommend running linkerd check to ensure everything is set up correctly. Correct any issues in your environment highlighted in the output. Each failing check should contain a URL pointing to a more detailed explanation with possible solutions.

Once linkerd check passes, congratulations! You have successfully installed Linkerd.

Monitoring

Since Linkerd’s control plane also runs on its data plane, the same metrics pipeline for applications can be used to understand the health of Linkerd itself.

Monitoring Linkerd’s health

Monitoring Linkerd’s health is best done by monitoring the success rate and latency of its control plane components.

As a starting point, we recommend looking at the built-in Grafana dashboards for control plane health, e.g. the control plane namespace graphs at http://<local dashboard>/grafana/d/linkerd-namespace/linkerd-namespace. The Grafana dashboard for the Linkerd control plane will display success rates, latencies, and request volumes for all control plane components.

Configure your monitoring to alert on drops in success rate. You may also consider lower-severity alerting on spikes in latency or changes in request volume, which may be indicators of upcoming issues.
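
As a sketch of what such an alert can look like, here is a Prometheus alerting rule on control plane success rate, assuming the response_total proxy metric with its classification and direction labels is available in your metrics pipeline. The threshold, window, and label set are illustrative; adjust them to your environment.

  groups:
    - name: linkerd-control-plane
      rules:
        - alert: LinkerdControlPlaneSuccessRateLow
          expr: |
            sum(rate(response_total{namespace="linkerd", direction="inbound", classification="success"}[5m]))
              /
            sum(rate(response_total{namespace="linkerd", direction="inbound"}[5m]))
            < 0.95
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Linkerd control plane success rate has dropped below 95%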

Accessing control plane logs

If necessary, you can view logs from Linkerd’s control plane through the usual kubectl logs command. Both the main container and the linkerd-proxy container in each pod may contain useful information.

By default the control plane’s log level is set at the INFO level, which surfaces various events of interest, plus warnings and errors. For diagnostic purposes, it may be helpful to raise log levels to DEBUG; this can be accomplished with the linkerd upgrade --controller-log-level debug command.
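
For example (deployment and container names vary somewhat across Linkerd versions, so list them first):

  # List the control plane components present in your installation
  kubectl -n linkerd get deploy

  # Tail logs from a component's main container and from its sidecar proxy
  kubectl -n linkerd logs deploy/linkerd-destination -c destination
  kubectl -n linkerd logs deploy/linkerd-destination -c linkerd-proxy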

Upgrading Linkerd

Generally speaking, Linkerd is designed for safe, in-place upgrades with no application downtime, when upgraded between consecutive stable versions—for example, from 2.8.1 to 2.9. (Upgrades that skip stable versions are sometimes possible, but are not always guaranteed; see the version-specific release notes for details.)

Note that, due to constraints that Kubernetes imposes, true zero-downtime upgrades are only possible if application components can themselves be “rolled” with zero downtime, as upgrading the data plane involves rolling injected workloads.

Upgrading Linkerd is done in two stages: control plane first, then data plane. To accomplish this, Linkerd’s data plane proxies are compatible with a control plane that is one stable version ahead; e.g. 2.8.1 data plane proxies can safely function with a 2.9 control plane.

Upgrading the control plane is typically done via the linkerd upgrade command. This will trigger a rolling deploy of control plane components, which should allow critical components to be upgraded without downtime. (Note that, in the event something does go wrong, Linkerd’s data plane proxies will continue functioning even if the control plane is unreachable; however, they will not receive service discovery updates.)

Once the control plane has been updated, the proxy-injector component will start injecting data plane proxies from the corresponding (newer) version. Since Kubernetes treats pods as immutable, upgrading the data plane thus requires rolling application components. Fortunately, because of the forward compatibility between data plane proxy and control plane described above, these data plane upgrades can be done “lazily”: it is not necessary to immediately roll data plane deployments after upgrading the control plane.
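
In practice, the two stages look roughly like this (commands as of Linkerd 2.9; the namespace is a placeholder for one of your application namespaces):

  # Stage 1: upgrade the CLI, then the control plane, and re-run the checks
  linkerd version --client
  linkerd upgrade | kubectl apply -f -
  linkerd check

  # Stage 2: later, roll injected workloads so they pick up the new proxy,
  # then confirm data plane versions
  kubectl -n <app-namespace> rollout restart deploy
  linkerd check --proxy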

Our recommended steps for upgrading are:

  • Thoroughly read through the version-specific upgrade notes for the new release.
  • Survey the data plane versions at play in the cluster (e.g. via linkerd check --proxy) and ensure that existing data plane versions are within parameters for the new control plane version. Typically, they should all be the same version corresponding to one stable release prior to the version to which you want to upgrade.
  • Upgrade the control plane by following the documentation, monitoring carefully for any changes in behavior.
  • Upgrade the data plane by rolling application components when possible.

As with all modifications of critical system software, extreme care should be taken during the upgrade process. In our experience, human error is almost always the source of software failures.

Version-specific upgrade notes are published in the Linkerd Upgrade documentation.

Good luck!

We know that productionizing and being on-call for critical systems can be difficult, stressful, and often thankless. We’ve done our best to make Linkerd as simple as possible to operate, but successfully operating a Kubernetes platform is by no means an easy task. We hope Linkerd treats you well, and from one group of engineers to another: we wish you the best of luck.

(And once you’re up and running, add yourself to ADOPTERS.md and we will send you some Linkerd swag!)

Disclaimer

Buoyant has made best efforts to confirm the accuracy and reliability of the information provided in this document. However, the information is provided “as is” without representation or warranty of any kind. Buoyant does not accept any responsibility or liability for the accuracy, completeness, legality, or reliability of the information in this document. Importantly, the information in this document is of a general nature; applicability and effectiveness will vary user by user according to use case, technical environment, traffic patterns, and integration, among many other factors.

No warranties, promises, or representations of any kind, expressed or implied, are given as to the nature, standard, accuracy, or otherwise of the information provided on this document, nor to the suitability or otherwise of the information to your particular circumstances.

Buoyant shall not be liable for any loss or damage of whatever nature (direct, indirect, consequential, or other) whether arising in contract, tort or otherwise, which may arise as a result of the use of (or failure to use) the information in this document.

Copyright © 2021 Buoyant, Inc. All rights reserved. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Appendix: Upgrade notes

2.9.3

Release summary: This stable release fixes an issue that prevented the proxy from speaking HTTP/1 with proxies from older versions. It also fixes an issue where the linkerd-config-overrides secret would be deleted during upgrade, and provides a linkerd repair command for restoring it if it has been deleted.

Who should upgrade: Several classes of users should upgrade to this release. First, all users who upgraded from 2.8.x to 2.9.x should upgrade to this release prior to upgrading to the (future) 2.10 release. Second, 2.8.x users who were unable to upgrade to 2.9.x due to errors with communication between 2.9.x and 2.8.x proxies over HTTP/1 should upgrade. Finally, users who used cert-manager to automatically rotate webhook certificates (as opposed to mTLS certificates) should upgrade.

Before upgrading: Please review the 2.9.3 release notes.

2.9.2

Release summary: This stable release fixes an issue that stops traffic to a pod when there is an IP address conflict with another pod that is not in a running state.

It also fixes an upgrade issue when using HA that would lead to values being overridden.

Who should upgrade: Users who are experiencing unexpected traffic stops with Linkerd 2.9.1.

Before upgrading: Please review the 2.9.2 release notes.

2.9.1

Release summary: This stable release contains a number of proxy enhancements: better support for high-traffic workloads, improved performance by eliminating unnecessary endpoint resolutions for TCP traffic and properly tearing down server-side connections when errors occur, and reduced memory consumption on proxies which maintain many idle connections (such as Prometheus’ proxy).

On the CLI and control plane sides, it relaxes checks on root and intermediate certificates (following X509 best practices), and fixes two issues: one that prevented installation of the control plane into a custom namespace and one which failed to update endpoint information when a headless service was modified.

Who should upgrade: Users with high-traffic workloads or who are experiencing issues with the 2.9 release.

Before upgrading: Please review the 2.9.1 release notes.

2.9.0

Release summary: This release extends Linkerd’s zero-config mutual TLS (mTLS) support to all TCP connections, allowing Linkerd to transparently encrypt and authenticate all TCP connections in the cluster the moment it’s installed. Other notable features in this release are: support for ARM architectures, a new multi-core proxy runtime for higher throughput, and support for Kubernetes service topologies.

Who should upgrade: This is a feature release.

Before upgrading: Please review the 2.9.0 upgrade notice and release notes.

2.8.1

Release summary: This release fixes multicluster gateway support on EKS.

Who should upgrade: EKS users who desire cross-cluster connectivity.

Before upgrading: Please review the 2.8.1 release notes.

2.8.0

Release summary: This release introduces a new multi-cluster extension to Linkerd, allowing it to establish connections across Kubernetes clusters that are secure, transparent to the application, and work with any network topology.

Who should upgrade: This is a feature release. However, support for multi-cluster connectivity in EKS is a known issue. Users who desire this feature on EKS should delay upgrading until 2.8.1, expected within a few weeks.

Before upgrading: Please review the 2.8.0 upgrade notice and release notes.

2.7.1

Release summary: This release introduces substantial proxy improvements resulting from continued profiling and performance analysis. It also improves support for Kubernetes 1.17.

Who should upgrade: Users of Kubernetes 1.17, and users who are experiencing missing updates from service discovery (often manifesting as 503 errors).

Before upgrading: Please review the 2.7.1 release notes.

2.7.0

Release summary: This release adds support for integrating Linkerd’s PKI with an external certificate issuer such as cert-manager as well as streamlining the certificate rotation process in general. For more details about cert-manager and certificate rotation, see the documentation. This release also includes performance improvements to the dashboard, reduced memory usage of the proxy, various improvements to the Helm chart, and much much more.

Who should upgrade: This is a feature release.

Before upgrading: Please review the 2.7.0 upgrade notice and release notes.

2.6.1

Release summary: This release improves proxy stability by fixing a bug where the proxy could stop receiving service discovery updates, resulting in 503 errors.

Who should upgrade: Users of Linkerd 2.6 who are experiencing 503 errors not caused by the underlying application should upgrade.

Before upgrading: Please review the 2.6.1 release notes.

2.6.0

Release summary: This release introduces distributed tracing support, adds request and response headers to linkerd tap, dramatically improves the performance of the dashboard on large clusters, adds traffic split visualizations to the dashboard, adds a public Helm repo, and many more improvements!

Who should upgrade: This is a feature release.

Before upgrading: Please review the 2.6.0 upgrade notice and release notes.