The State Of Service Mesh

Do you need a service mesh? Is service mesh getting easier? What is the current state of the service mesh ecosystem? Join us for a live interactive session where our panel of service mesh experts from various service mesh projects (Istio, Linkerd, Consul Connect and Kuma) are eager to discuss with you the latest service mesh technology improvements, best practices for adopting service mesh, challenges that lie ahead for these projects, and what’s next for service mesh in the coming year.

Transcript

Note: this transcript has been automatically generated with light editing. It may contain errors! When in doubt, please watch the original talk!

Hi everyone. This is the state of the service mesh panel. My name is Lin Sun and I’m the director of open source with Solo.io. I am super excited to moderate the panel for you. We have an excellent lineup of panelists! Idit, can you introduce yourself?

Sure, my name is Idit Levine and I’m the founder and CEO of Solo. At Solo, we are trying to make service mesh easier to adopt and operate, focusing mainly on Istio right now.

I’m Nic Jackson, I work as a developer advocate at HashiCorp and I’m also writing a book on service mesh with O’Reilly.

Hi everybody, my name is Marco. I’m the co-founder and CEO of Kong. Kong is the creator of Kong Mesh and one of the maintainers for Kuma.

Hi, I’m Louis Ryan. I work at Google and I spend a lot of time working on Istio and Google’s Istio-related products.

Hi, I’m William Morgan, CEO of Buoyant. I’m also one of the creators of Linkerd.

Excellent! A lot of service mesh knowledge here. I would like to ask our panelists:

How should a user decide if they need service mesh?

Idit?

Like everything in life, there is a trade-off. I mean, there are a lot of advantages that service meshes bring to the table. Specifically, all service meshes focus on observability, security, as well as routing, traffic, and policy. That brings a lot of benefits. Although, if you have one application with two microservices, maybe it’s overkill.

I think the trade-off is where people really need to play. It’s the volume of microservices, the number of applications you have, and the teams working with them, versus the complexity of operating the mesh.

Marco, anything to add?

I guess, perhaps another way to frame this question would be to ask ourselves: when should our application team stop building and reinventing the wheel when it comes to service connectivity, every time they create a new service or a new application?

Service mesh has always been positioned as yet another thing that we have to build, implement, and deploy. But perhaps it is an opportunity for us to stop doing the hundreds of things application teams are doing whenever they want to make a request or receive a request over the network.

The things that a service mesh solves are things we still need without a service mesh. We still need security and observability. The difference is how we implement them. A service mesh allows us to do it from the infrastructure. Otherwise, we are forfeiting very precious time from the application teams.

I’ll just chime in and say, we help a lot of organizations adopt Linkerd, and typically there’s the value prop of the service mesh and then there’s the “are you set up for this?” And the “are you set up for this” component means are you already operating in a cloud native way? Do you have a platform team that owns the underlying platform that can take on the service mesh as part of their responsibility? If you don’t have that, then all the technology in the world is not going to help you. Are you operating in a world where developers are able to own their services and build on top of the platform without having to understand every detail of the platform? If you don’t have that, then the technology is not going to help you.

So, there are some organizational prerequisites we typically look for first, before even having the conversation about, say, whether we are going to improve your observability or not.

These are really [great] insights. Nic or Louis, …?

I very much agree with Marco. I think it’s a trade-off between running a platform and having to write code. But I do think that the problem of writing the code isn’t necessarily going to go away. The problem around managing the platform will get easier as time goes on and various vendors come to the market providing managed service meshes.

Louis, anything to add?

Yeah, I think talking about platform and platform management, William, Nic, and Marco covered that pretty well. There’s also the top-down constraint. If you’re in a regulated industry and you have to do zero-trust networking, your options are moderately limited. They range from open source to the eye-wateringly expensive commercial solutions. Those are the things that often drive these decisions outside the core “are you ready to engage with the value prop of the mesh.” Is it meaningful to you or do you need to get to a different level of maturity?

Yeah, absolutely! The next question I would like to know the answer to is, from your perspective:

Is service mesh getting easier for the enterprise to adopt?

William, do you want to start that one?

Is it getting easier? I guess. It’s not getting harder. I think for enterprises, especially, there’s a new swath of vendors and some of the landscape becomes more complicated. Now your feature matrix has 100 rows, whereas last year it might have had 50 rows. In some ways, it gets a lot more complicated. But tooling is being built up, so it’s a little bit of a mix, honestly.

Nic, anything to share?

No, I pretty much would echo what William just said.

Cool! Idit?

So, I think it’s getting easier, but we’re just scratching the surface of how to make it even easier. Right now, we are getting to the point where the service mesh is mature enough that it is easier to adopt. But, as I said, there is way more we can do, such as building easier tools, making the user experience easier, fitting it better to the organization that’s going to run it, etc. So, I think it’s getting easier but it’s still hard. Hopefully, we will work as a community to make it simpler.

Marco, anything to add?

I’m a big believer in simplicity. Simplicity is a feature, good documentation is a feature. It’s easier to adopt, easier for the platform teams to deploy, easier for the application teams to write their software knowing that there is a service mesh.

There are two angles to this question. Service mesh is certainly getting easier to use, easier to deploy. I guess the industry can do better educating the application teams on how to operate with the assumption that there is a service mesh running in the underlying infrastructure.

Louis, anything you want to add?

I think that the service mesh products that are out there have gotten collectively better at day zero and day one. So, what you see now in enterprises is that day-two operations are starting to dominate conversations, at least in the ones I have been in.

If you ship four releases a year and the company has the manpower to absorb one update per year, how are they supposed to engage with the product? What are their costs to perform upgrades? You’re seeing a growth in the number of managed service mesh offerings, not just installed offerings, in response to that. So, the conversation has shifted. The early adopters are now a little bit later in their maturity cycle and are dealing with those day-two issues, and those are becoming known to the buying side of the market, so they’re asking the same questions in RFCs and RFPs.

There’s still plenty of room for growth and to make it easier for enterprises to adopt and maintain. They won’t be willing to engage if they don’t feel they can maintain it long-term.

Totally. But it’s exciting that we’re making at least onboarding easier.

What is the current state of service mesh?

Nic, anything you want to share?

I think it’s really good. Louis and Marco touched on this, which is around knowledge. The fact that a practitioner can now more easily find information on “how do I do X with product Y” is a massive advancement for the successful use and adoption of the tools. As time goes forward, more and more people are creating tutorials, videos, blog posts, and things like that, and that kind of community contribution of knowledge really helps the adoption of service mesh. It can always get better and it will, but I think right now it’s really good. It’s not something where you question whether it’s a production-ready technology anymore, and that’s awesome.

Marco, anything to add?

I agree with Nic. Service mesh is one of those things that, five years from now, looking back, will feel inevitable. We are distributing our applications, we are decoupling them so we can deploy them faster in a highly available way and, the more services we create, the more applications we create, the more connections we have among all of these moving parts. It is impossible to think that any organization can be successful with this transformation without having something in place at the infrastructure level that takes care of all of these connections, so we don’t have to worry about them anymore.

Without a service mesh, I really cannot see how that could possibly be successful. Lots of enterprise organizations and practitioners are seeing that. As the adoption of service mesh increases, service mesh products will get more mature, and we’re going to hear more and more success stories on how they enabled these transformations. So very interesting times ahead of us!

Idit, anything to add?

Yeah, when looking at the current state and the roadmap, for most meshes, everybody’s talking about making it boring. It’s done, the features are there, now it’s relatively boring. I think this is a very interesting time because it shows a huge maturity in the market and definitely, there is a market fit, right? That’s why we have this conference, that’s why we are here talking about it. Obviously, service mesh has a market fit. The interesting thing that will come after this — which I’m personally extremely excited about — is how can we push the boundary?

Now we have this great platform we all agree should be there, and everybody in the organization will agree, as Marco said, that this fibrous thing is just going to become part of the platform. What’s interesting is, what can you do with this right now? You have a platform, how can you extend it? How can you make it more interesting and customized to your use case?

I think that’s what we, as an ecosystem, are going to do: try to push the boundaries, which is pretty exciting!

Yeah, totally. Louis or William?

I think there are a couple of things. Along with the boring that Idit just referred to, there is also platformization. Service meshes will become a little bit less about the features they ship, and more about how easy it is to enable that last mile of integration that customers need. That’s a transformation that takes time; it will take as long as it took to build service meshes. So that’s one trend.

The other trend is that the platforms on which people are deploying service meshes are starting to incorporate service mesh features themselves. So there’s the bottom-up market validation. You see some of that even in Kubernetes, where Kubernetes has multi-cluster services now. That value proposition is starting to sink down into the infrastructure, which just makes it even more boring, which, in my opinion, is just good.

Yeah, totally! William?

Yeah, I can only really speak to the Linkerd perspective, but Linkerd was the first service mesh and the one that introduced the term to the lexicon, and we’ve been asked this question every year since 2016 or whatever, the ancient days. Linkerd is in a particularly exciting state. Even at this conference, we have end users talking about using Linkerd for scheduling COVID-19 tests for their students, for doing rapid experimentation at big financial institutions, for doing chaos engineering, or for adding FIPS 140 compliance. It’s just stuff that I never imagined Linkerd would be used for. So that feels awesome!

We’re up for CNCF graduation, and there’s a whole bunch of cool stuff going on, but the thing that we keep coming back to, maybe to Louis’s point, is that ultimately service mesh is going to be absorbed; maybe that’s what “become boring” means. It’s going to be part of the ecosystem, whether we call it a special name or not. The exciting things, especially as we think towards the future, are what can we build on top of this? Because we know that building a big cloud native application, subject to all the demands that we place on software today, is a hard thing to do, and service mesh can solve one critical part of that. There’s a lot more that has to be done. To me that’s the most exciting bit: now, where do we go from here? What are we building on top of the service mesh?

Yeah, excellent. I think this leads nicely into our next question:

What’s next for your service mesh project?

William, do you want to start?

We’re just shutting it down. We’re done, no more service mesh. No, what’s next for us?

Actually, it is going to sound pretty boring because for Linkerd it will be largely around policy features. Over the next couple of releases… Well, I should say the releases leading up to the most recent one, 2.10, have been heavily focused on mTLS and getting identity wired all the way through. All that has been in service of setting us up to do policy and tackle some of the difficult challenges that we know people have, especially in multi-tenant environments. That is the concrete answer.

More generally, what’s next for Linkerd? We have a sense for this project that it has to be a platform on top of which people build things. We recently introduced this idea of extensions, which are very easy ways of plugging into Linkerd. We’ve already got some interesting extensions built on top of that and, to me, that serves Linkerd’s core vision here, which is: we want the service mesh itself to be really small and tightly contained. But we want people to build on top of it and have a modular approach. That’s what’s on the roadmap for Linkerd and I’m excited to see how that evolves over the next six months or so.

That’s great! Marco, do you want to tell us about Kuma?

Kuma comes out of the work and the efforts that Kong is doing with our enterprise customers; it’s the fruit of that work. Service mesh is an important piece of the broader connectivity puzzle of how enterprise architects are going to be providing connectivity to their application teams. When we built Kuma, that was our starting point.

We’re going to have teams that are far along in their Kubernetes journey, and we’re going to have teams that are not on Kubernetes yet. How can we provide a connectivity layer that creates an overlay and abstraction across not only Kubernetes but anything, including virtual machines that the organization may be running?

Obviously, that’s a very complicated problem: being able to run a service mesh in a multi-zone capacity, upgrading it while making sure that we always know if something goes wrong, where it goes wrong, the operations of running a multi-zone service mesh across Kubernetes and VMs, across multiple clouds…

An easy upgrade button that allows us to upgrade the service mesh and the data plane proxies — that’s certainly something that I’m very excited about.

We’ve been doing lots of work building a foundation for our adaptive routing features that would allow us to improve high availability of our applications without necessarily having to have human intervention every time. The more services, the more applications, the more connections, and obviously the harder it’s going to be to stay on top of all of this. The infrastructure, at the end of the day, service mesh even, is a means to an end. And that end is reliability and security. And we are working towards automating all of that, so we can remove the human factor from the equation.

Yeah, Nic, what about you? I saw you nodding your head.

Yeah, the operational aspect, making it easier to operate, is one of the key things we’re going to continue to do with HashiCorp Cloud. You’ll have a managed Consul across more cloud vendors. In the coming year, the operational aspects of configuring multi-cluster capabilities, or of connecting a Kubernetes workload to a virtual machine, are going to be simpler, and it will be easier to manage the security and the actual elements of the configuration. And predominantly it’s around the Kubernetes story as well, and delivering a great experience to a Kubernetes practitioner. Treating things like… it should feel native: you’re just using an extension of Kubernetes and not a different product. Those are some of the goals that we have.

Yeah, makes sense. Louis, anything you want to share from an Istio perspective?

Well, I guess I’ll share two perspectives. With my Istio hat on, it’s a lot of what Nic just said, plus the day-two operations work around upgrades, maintenance, and lifecycle management. Istio has lots of features. Most of our future roadmap is just incremental, customer-driven stuff, probably with a focus on compliance and security.

With my Google hat on, it’s enabling Google’s customers to easily adopt and absorb service mesh. We recently launched a fully managed Istio-based solution for customers. They don’t manage the control plane aspects of Istio anymore; we do that for them.

And yeah, that’s aligned with the goals around day-two operations as well, just to try and lighten the load for people as much as we can. That’s also aligned with the trend Nic just talked about, with HashiCorp going to provide a managed Consul Connect.

Yeah, makes sense.

Idit, anything you want to share about Istio, Gloo Mesh?

Of course! We are a little bit different because each of the people who talked here is associated with one project. At Solo, we’re working mainly with Istio today, but we started our journey a little bit differently. We started with Envoy. That was the thing that we were based on. Therefore, the first thing we built was an API gateway; we built the building blocks to create this platform. What we have is an API gateway built on top of Envoy, and now it’s going to shift on top of Istio.

The second thing we have is Gloo Mesh which is helping to manage and focus on day two operations, as well as a very big focus on multi-cluster and managing a lot of instances of Istio, failover between them, and so on.

The third thing we’re focusing on is extending the mesh with WebAssembly. This is something we worked very, very hard on, and we brought a WebAssembly Hub to the community and our product.

And the last one was a developer portal, which is the next step to manage all of this: bringing it to the developer and being able to expose all the APIs that are running. So these are the building blocks we have. Now we have this platform that’s working specifically on Istio and Envoy, and we have the knowledge of doing Istio and Envoy, and that’s why, again, we are pushing the boundaries.

So, the next thing that we’re going to do is build on top of it, and we’re going to have some crazy and awesome announcements, so stay tuned.

That’s excellent! These are all the questions I have. I would love to hear the questions from the audience.