DevOps teams with Linkerd in production have bypassed the pain that plagues many Istio deployments.
The service mesh architecture helps IT teams manage complex networks of microservices through specialized components called sidecar proxies, deployed alongside each application instance. It has gained popularity over the last two years among Kubernetes and microservices users because it offers detailed observability and security for connections between distributed applications.
Istio, a project established by Google, IBM and Lyft in 2017, boasts the largest contributor community and broadest production use so far among enterprises, but the Cloud Native Computing Foundation’s (CNCF) service mesh project Linkerd is a strong second.
Of the 1,324 respondents to a 2020 CNCF survey who used a service mesh in production, 47% used Istio, followed by Linkerd and HashiCorp Consul, both at 41%. In 2019, Linkerd trailed Consul, according to the 2020 survey report, though specific 2019 production use numbers were not published. However, Linkerd was being evaluated by 64% of respondents in 2019, putting it second only to Istio at 69%.
Istio still offers more advanced features and flexibility, especially in network policy management for security. But for some early adopters choosing sides in the Linkerd vs. Istio battle, some of the management headaches discussed by Istio users at last week’s IstioCon virtual event have been non-issues.
“We are using the auto-injection feature, so that we can just annotate some [Kubernetes] namespaces and then proxies will be automatically injected,” said Fredrik Klingenberg, senior consultant at Aurum AS, which helped design and set up a Linkerd service mesh for Norwegian consumer electronics firm Elkjøp last year. “Something we’ve enjoyed with Linkerd is that … you can [also] opt out of it quite easily.”
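The workflow Klingenberg describes boils down to a single annotation. A minimal sketch, assuming a hypothetical namespace name — `linkerd.io/inject` is the real annotation Linkerd watches for, and setting it to `disabled` on an individual workload opts that workload back out:

```yaml
# Annotating a namespace turns on Linkerd's automatic proxy injection
# for every pod subsequently created there. The namespace name below
# is illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: storefront
  annotations:
    linkerd.io/inject: enabled
```

Removing the annotation (or setting it to `disabled`) stops injection for new pods, which is the easy opt-out Klingenberg refers to.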
By contrast, the Istio discovery mechanism watches all services in a Kubernetes cluster, regardless of whether they are part of the mesh. This means services outside the mesh can potentially disrupt its performance. A proposed change that would limit Istio discovery to specific Kubernetes namespaces was recently submitted upstream.
In large multi-cluster deployments, Istio sometimes requires platform engineers to set up configuration tools such as Intuit’s Admiral or Helm starter templates to abstract its complex configuration from developers. By contrast, Linkerd is simple enough for developers to add – and remove – on their own.
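The add-and-remove cycle that makes Linkerd approachable for individual developers can be sketched with the standard `linkerd` CLI — the commands below are real CLI verbs, though the deployment name is illustrative and a live cluster is assumed:

```shell
# Verify the cluster is ready, then install the control plane.
linkerd check --pre
linkerd install | kubectl apply -f -

# Mesh an existing workload by injecting the sidecar proxy...
kubectl get deploy webapp -o yaml | linkerd inject - | kubectl apply -f -

# ...and back it out just as easily.
kubectl get deploy webapp -o yaml | linkerd uninject - | kubectl apply -f -
linkerd uninstall | kubectl delete -f -
```

The symmetry of `inject`/`uninject` is what lets a team pull the mesh in and out without a dedicated platform-engineering effort.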
“Kubernetes is complicated enough,” Klingenberg said. “There’s a lot of other moving parts, and we wanted to solve [specific] problems really, really well.”
The problems Elkjøp needed to solve were primarily troubleshooting and network monitoring within Kubernetes clusters shared by hundreds of apps and multiple development teams. Here, the ability not only to add Linkerd to Kubernetes clusters easily, but also to remove it quickly, helped the company resolve an issue that arose just as it put the mesh into production.
“We got a really, really sneaky error that once we start to get more traffic, more load [onto Linkerd], suddenly, outbound calls from Kubernetes started failing, and just disappearing,” said Henry Hagnäs, formerly an enterprise cloud architect at Elkjøp, now an Azure data center lead at Microsoft.
The issue turned out to be the way the company’s apps interacted with network address translation in the Microsoft Azure cloud, not a problem with the Linkerd service mesh. The ability to quickly add and remove the mesh helped eliminate it as a root cause almost immediately, Hagnäs said. Now that Elkjøp engineers understand the issue, Linkerd’s Prometheus integration and observability dashboards help them ensure it doesn’t recur.
“We use the [Linkerd] dashboard to check that code is using the network in an efficient way,” Hagnäs said. “Part of our QA process now is to look at the Linkerd dashboards to make sure that the code is actually doing what it’s supposed to.”
Linkerd telemetry has helped eliminate finger-pointing between developer teams at Elkjøp by pinpointing problems within the complex microservices network more accurately than would be possible without a mesh, according to Hagnäs.
“You can get some telemetry from the Azure platform, but you just get, ‘Oh, your Kubernetes platform is using lots of ports,'” he said. “And that’s not useful if you have 200 applications.”
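The per-workload telemetry Hagnäs contrasts with platform-level metrics is also available from the Linkerd CLI. A hedged sketch, assuming a live meshed cluster; the deployment and namespace names are illustrative, and in Linkerd 2.10+ these commands moved under the `viz` extension (e.g., `linkerd viz stat`):

```shell
# Per-deployment golden metrics: success rate, request rate, latency.
linkerd stat deploy -n storefront

# Which workloads talk to which, and whether the connection is mTLS-secured.
linkerd edges deploy -n storefront

# Live view of the busiest request paths for one workload.
linkerd top deploy/webapp -n storefront
```

This is the kind of per-service breakdown that turns "your Kubernetes platform is using lots of ports" into an answer about which of 200 applications is misbehaving.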
Any service mesh carries these observability benefits, including Istio, but Linkerd’s simplicity is better suited to Elkjøp, which has many microservices to manage but only two primary Kubernetes clusters. It has no need to connect the mesh across those clusters or extend the service mesh environment to VMs, two areas where the Istio project has made recent progress. Istio also retains a lead over Linkerd in network policy management for security, but Elkjøp is more focused on service mesh observability features.
Linkerd was not without some snags as Elkjøp moved it from a proof-of-concept deployment to production, Klingenberg said.
“[We needed] a little bit of help on how we should run the Linkerd control plane in high-availability mode, and there was another issue where we wanted to do [a] gRPC [connection] to a Thanos store API,” he said. “But the documentation was there; we just needed to be pointed in that direction [by the community].”
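For reference, the high-availability setup Klingenberg mentions is exposed in Linkerd's documentation as a single install-time flag — a minimal sketch, with cluster specifics omitted:

```shell
# Install the Linkerd control plane in high-availability mode:
# multiple control plane replicas with anti-affinity, suitable for production.
linkerd install --ha | kubectl apply -f -
```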
Like many large enterprises, HP Inc. runs a little bit of everything within its various departments – the company has a close relationship with Google, and some HP teams use Istio with Google’s Apigee API management tool. But for one team in HP’s print services division, Linkerd was the best fit for a customer-facing cloud platform launched in March 2020.
“My concern was, I don’t have a large team with a lot of resources dedicated to infrastructure,” said Chris Campbell, a cloud platform architect at HP. The platform serves 800 developers, but 10 people work on Campbell’s team, with two focused on Kubernetes and service mesh.
“The experience of operating Linkerd … reminded me of my first time using Docker… [which] made it easy enough that the average developer could do containers,” Campbell said.
Prior to its shift to a monolithic architecture in version 1.5, Istio required the installation of five separate control plane components, while Linkerd required just one Kubernetes Custom Resource Definition (CRD).
By contrast, when Campbell installed Istio as a test, his application immediately stopped working.
“I had to go and figure out all the ins and outs of what I needed to add, trying to kind of handle some of their advanced use cases, and had to navigate quite a few CRDs,” he said.
The fact that Linkerd is governed by the CNCF is also important to Campbell, who joined the project’s newly formed steering committee in January as one of its founding members to help the project catch up to some of Istio’s more advanced features.
“One thing I’ve been bugging [them] about for maybe a year now is getting network policies into Linkerd,” he said. Linkerd users can already segment traffic at Layers 3 and 4 of the OSI network model with standard Kubernetes network policies, expressed in terms of IP addresses and ports, but the mesh doesn’t yet support Layer 7 policies, which restrict application-specific access to sensitive data on the network.
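The Layer 3/4 segmentation available today is expressible with a standard Kubernetes NetworkPolicy, independent of the mesh. A hedged sketch with illustrative names: only pods labeled `app: webapp` may reach the payments service, and only on TCP port 8080 — exactly the IP-and-port granularity described above, with no awareness of HTTP routes or other application-level attributes:

```yaml
# Kubernetes-native L3/L4 policy; all names are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-webapp
  namespace: storefront
spec:
  podSelector:
    matchLabels:
      app: payments      # policy applies to the payments pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: webapp  # only webapp pods may connect
      ports:
        - protocol: TCP
          port: 8080       # and only on this port
```

A Layer 7 policy, by contrast, could distinguish a `GET` on a public route from a `POST` to a sensitive endpoint — the capability Campbell is asking for.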
“That’s something that’s going to come up for us here pretty soon as we add more services and want to consolidate Kubernetes environments,” Campbell said. “We need that network security control at an infrastructure layer.”
Linkerd officials confirmed network policies will be added in Linkerd version 2.11, due out in the second quarter of 2021.