Aug 13, 2024
Note: this is an expanded version of the post on linkerd.io with additional details.
Today we're happy to announce Linkerd 2.16, a major step forward for Linkerd that adds a whole host of new features, including support for IPv6; an "audit mode" for Linkerd's zero trust security policies; a new implementation of retries, timeouts, and per-route metrics for HTTPRoute and GRPCRoute resources; and much more.
The 2.16 release also introduces two features to Buoyant Enterprise for Linkerd: automation for external workloads (e.g. VM applications) at scale, and a "send a flare" CLI tool to improve remote debugging and support.
See the full release notes or read on for details!
The 2.16 release continues our goal of ensuring Linkerd is the truly future-proof service mesh. We expect the Gateway API to emerge as the standard for traffic configuration in the Kubernetes space, and when that happens, Linkerd users will be ready. To this end, Linkerd 2.16 now publishes metrics for Gateway API HTTPRoute and GRPCRoute resources, so you can capture granular per-route success rates, latencies, request volumes, and other metrics without changing any application code.
Linkerd 2.16 also adds retry and timeout configuration to these same Gateway API resources, bringing the feature sets for Gateway API and ServiceProfiles to parity (as promised in our February Linkerd 2.15 announcement). This configuration is backed by a new implementation that improves upon Linkerd's earlier retry and timeout logic in two key ways:
Enabling Linkerd's new retry and timeout support is as simple as adding annotations to Gateway API resources. For example:
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: myapp-default-route
  namespace: myns
  annotations:
    retry.linkerd.io/http: 5xx
    retry.linkerd.io/limit: "2"
    retry.linkerd.io/timeout: 300ms
spec:
  parentRefs:
    - name: myapp
      kind: Service
      group: core
      port: 80
  rules:
    - matches:
      - path:
          type: PathPrefix
          value: "/foo/"
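Timeouts are set through the same annotation mechanism. As a minimal sketch, the fragment below puts a request timeout on a route. The `timeout.linkerd.io/request` annotation name is an assumption based on the Linkerd documentation and should be checked against the docs for your version; the route name and the 2s value are purely illustrative.

```yaml
# Illustrative sketch only: the route name and timeout value are hypothetical.
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: myapp-timeout-route   # hypothetical route name
  namespace: myns
  annotations:
    # Assumed annotation name: bounds how long a request on this route may take.
    timeout.linkerd.io/request: 2s
spec:
  parentRefs:
    - name: myapp
      kind: Service
      group: core
      port: 80
  rules:
    - matches:
      - path:
          type: PathPrefix
          value: "/foo/"
```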
In short, per-route metrics, retries, and timeouts in Linkerd are now provided in a principled, future-proof way: the new implementation composes with existing features such as circuit breaking, and it is configured using the Gateway API resources that we believe are the future of service mesh configuration. Learn more.
In Linkerd 2.15 we introduced mesh expansion capabilities, allowing Linkerd users to add non-Kubernetes applications, e.g. apps running on VMs, to the mesh and take advantage of Linkerd's secure, reliable, and observable communication without changing application code.
In Buoyant Enterprise for Linkerd 2.16, we've further improved this story through automation, adding several new components designed to make managing these external workloads easier:
These components combine to make meshing external applications with Buoyant Enterprise for Linkerd significantly easier, especially at scale. Learn more.
Linkerd's "zero trust" authorization policies provide a powerful and expressive mechanism for controlling which network traffic is allowed. They support a wide variety of approaches to network security, including micro-segmentation and "deny by default" policies. In contrast to ambient or host-proxy approaches, Linkerd's sidecar design provides a clear security boundary that fits directly into the zero trust model, where each pod makes its own authorization decisions independently, maintains its (and only its) TLS keys, and makes policy decisions based on cryptographic workload identity, not IP addresses.
However, introducing authorization policy in a live system can be tricky. To address this, Linkerd 2.16 introduces a new audit mode for policies. In this mode, policy violations are logged but the traffic is not denied. This allows policies to be rolled out in a lower-risk fashion: they can start in audit mode and move to enforcement only once fully vetted. Audit mode can be enabled cluster-wide, per-namespace, or on specific Server resources by setting the new `accessPolicy` field to `audit` instead of its default, `deny`.
For example:
apiVersion: policy.linkerd.io/v1beta3
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
spec:
  accessPolicy: audit
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: HTTP/1
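The Server example above covers the per-resource case. For the per-namespace case, a rough sketch might look like the following. It assumes that the namespace-level default is controlled through the `config.linkerd.io/default-inbound-policy` annotation and that `audit` is an accepted value for it; treat both as assumptions to confirm in the Linkerd policy documentation before relying on them.

```yaml
# Sketch under stated assumptions: annotation name and value should be verified.
apiVersion: v1
kind: Namespace
metadata:
  name: emojivoto
  annotations:
    # Assumed annotation: default inbound policy for meshed workloads in this namespace.
    config.linkerd.io/default-inbound-policy: audit
```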
Similarly, the `linkerd policy generate` command in Buoyant Enterprise for Linkerd, which watches live traffic to a system and generates policy scaffolding that accounts for observed traffic, has been updated to use audit mode by default. Learn more.
Linkerd 2.16 adds support for IPv6 on IPv6-only and dual-stack clusters. (When enabled on dual-stack clusters, Linkerd will only use IPv6 endpoints.) For backwards compatibility, this feature is disabled by default, but enabling it takes only a single boolean setting. Learn more.
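As a hedged illustration of that boolean, a Helm values fragment might look like the sketch below. The key name `disableIPv6` is an assumption about how the Linkerd chart exposes this setting and is not stated in this post; check the chart's values reference for the exact name and for any matching setting your CNI configuration may need.

```yaml
# Hypothetical values.yaml fragment for the Linkerd control plane chart.
# Assumed key: IPv6 stays off unless this is set to false.
disableIPv6: false
```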
See the full Linkerd 2.16 changelog for more.
In May, cloud consultancy LiveWyre published a set of service mesh benchmarks showing that Linkerd resulted in lower latency and less resource consumption than either Istio or Cilium. This has been the consistent result of service mesh benchmarks since 2021, and we were happy to see this confirmed by another third party.
Momentum compounds, and right now Linkerd's momentum is at an all-time high. We've shipped six stable point releases for Buoyant Enterprise for Linkerd 2.15 and 29 hotpatch releases, each with extensive documentation and release notes, all designed to give our customers safe and stable upgrades along with the latest bugfixes and CVE remediations. On the open source side, we've merged 250+ pull requests in Linkerd and published an average of 5 edge releases a month, which is more than one a week. There is a lot more great news to report, and early next month we'll publish a deeper retrospective of the past six months. In short, it's an incredibly exciting time to be involved with Linkerd!
We're hard at work on egress functionality, which will provide both visibility into all traffic leaving the cluster and the authorization policies necessary to control it. Our original plan was to deliver this feature in Linkerd 2.16, but we ultimately decided some of the features we had already shipped were too good to delay any longer. Egress is now slated for the upcoming Linkerd 2.17 release, which should follow relatively quickly after 2.16. After egress we have our sights on ingress, plus a couple of other exciting multi-cluster features to make managing clusters at scale a lot easier.
We've discussed deprecating ServiceProfiles in the past. Based on the extensive use within the Linkerd community, we've decided to continue supporting them for the foreseeable future. However, the new Gateway API retry and timeout logic is a separate implementation, and that's where our active development will be focused. We expect the feature gap to grow over time, and encourage you to migrate to the new types.
Many of the maintainers will be at KubeCon NA this November in Salt Lake City, UT, where you'll find a great lineup of Linkerd talks as well as many of your fellow Linkerd users. If you're attending the conference, please stop by the Linkerd booth in the Project Pavilion and say hi!
BEL is our production-ready distribution of Linkerd. It is a complete distribution of Linkerd plus a set of additional tools, features, and testing designed for sustained, production use, including a dynamic zone-aware load balancer which can dramatically cut cloud spend without the reliability sacrifices of Topology-Aware Routing; a Kubernetes operator that automates installs and upgrades; tools for managing external workloads at scale; and much more.
BEL is the distribution of Linkerd that we run in production ourselves. It's free for anyone to use in non-production environments and free for production use at companies with fewer than 50 employees. Get started with BEL in under five minutes!