Operating high-density bare-metal clusters in the highly regulated financial industry
Finleap Connect operates various high-density bare-metal Kubernetes clusters with up to 5,000 pods — keeping their customers’ highly sensitive financial data safe is business-critical.
The cloud team migrated their platform to a cloud native architecture and secured all service-to-service communication with mutual TLS (mTLS) to comply with strict EU regulatory requirements.
"It was a huge undertaking — especially considering our tight deadline of five months! To our surprise, the mTLS aspect was fairly easy. Linkerd was installed in an hour and running in production within a week without impacting our developer team."
— Christian Hüning, Director of Cloud Technologies
Their five-month deadline was driven by two things. First, there was the European PSD2 payment directive, a new EU law requiring payment service providers to improve their customer authentication processes. Second, Connect's legacy system was hard to maintain: something seemed to break every night, changes were difficult, and failovers were mostly manual. That's why Hüning's team decided to migrate all customers to a new cloud native infrastructure.
Next-gen financial services
Finleap Connect is a leading independent European open banking platform. Its full-stack platform enables Connect customers across banking, accounting, and lending to offer next-gen, mobile-first financial services to their own customers. Services include data and analytics enrichment, default financial data accessibility, seamless payments across a range of applications, and much more.
Connect understands how customers transact and interact, and that know-how is embedded in their platform. The platform allows their clients to access their customers' financial transactions in a compliant way and enrich that data with analytics tools, while also providing the digital banking services needed to deliver high-quality digital products.
The engineering team
The Connect engineering team includes a handful of cloud engineers and about 60 developers spread across multiple teams. The cloud team is responsible for over 50 microservices running on 10 Kubernetes clusters in 3 geographic regions across GCP, AWS, and a bare-metal private cloud. Their largest cluster runs 5,000 pods on 52 nodes.
To operate Connect Cloud, the company’s cloud-agnostic private setup, Connect uses SAP Gardener. Linkerd runs across all clusters, representing an integral part of their infrastructure. Today, Linkerd — including its metrics — is centrally managed through Buoyant Cloud.
A regulatory requirement: mTLS across all services
Because Connect operates in a highly regulated environment and handles highly sensitive financial customer data, its security requirements are strict. In 2018, they had to implement mTLS across all services in their clusters, independent of the business code (i.e., solved at the infrastructure layer rather than inside each application).
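To make "solved at the infrastructure layer" concrete, here is a minimal sketch, using Python's standard `ssl` module, of the mutual TLS setup each service would otherwise have to carry in its own code. The function names and the certificate/key/CA file parameters are hypothetical; this is not how Linkerd is implemented, it only illustrates the per-service work that a mesh proxy takes off the application's plate.

```python
import ssl
from typing import Optional


def mtls_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mTLS: present our own certificate AND demand one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: the handshake fails if the client has no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # Trust only client certificates signed by the (hypothetical) internal CA.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def mtls_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server AND present our own certificate."""
    # PROTOCOL_TLS_CLIENT already enables certificate and hostname verification.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The client identity the server checks against its trusted CA.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In a real service these contexts would wrap every listening and outgoing socket (for example via `wrap_socket(..., server_side=True)` on the server), and every team would also have to distribute, trust, and rotate the certificates. Multiplied across 50+ microservices, that is the configuration burden a mesh avoids.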
To address that challenge, they evaluated a variety of available solutions. One of the options was Istio. They installed it on their test cluster, and although it worked fine, it also required quite a bit of configuration. When they realized they’d need to configure each service, Istio became less feasible. This was in 2018, when Connect was migrating their stack to a cloud native architecture with a very ambitious roadmap (they needed to finalize the migration within five months). Their dev teams were already dealing with many transformation assignments, so they concluded that Istio would become an additional config burden and decided against it.
The other service mesh they evaluated was Linkerd (at the time known as Conduit). Connect appreciated Linkerd's focus on simplicity, its roadmap, and the many Slack discussions they had with the project maintainers. The service mesh gave them the confidence they needed to move forward with the project.
Installed within an hour, in production within a week
Installing Linkerd took less than an hour, and the full production setup took about a week. Some work, such as contributing to Linkerd's certificate management feature, took a few more weeks (more on that below).
Overall, Connect liked the entire Linkerd experience. They did run into some trouble when they started scaling Linkerd, but it was mostly resolved by scaling Linkerd's own components up (more resources) and out (more replicas).
A few things were on the Linkerd roadmap but not yet implemented, and certificate management was one of them. Certificates expired after a year, so Connect had one year to solve rotation. To speed things up, they decided to contribute to the Linkerd project and help develop the feature. Today, certificate rotation is fully automated and no longer something Connect needs to worry about. Support for server-speaks-first protocols was another feature they needed; Linkerd has supported it since version 2.10.
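At its core, automated rotation means a job repeatedly asking one question: is this certificate close enough to expiry that it should be renewed now? The sketch below is hypothetical and only illustrates that decision; it is not code from Linkerd or from the feature Connect contributed to. It assumes the one-year validity described above, and the `renew_fraction` policy (renew once less than a third of the lifetime remains) is an assumed knob, not a Linkerd setting.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(not_before: datetime, not_after: datetime,
                   now: datetime, renew_fraction: float = 1 / 3) -> bool:
    """Return True once less than `renew_fraction` of the certificate lifetime remains.

    `renew_fraction` is a hypothetical policy parameter chosen for illustration.
    An already-expired certificate (negative remaining time) always returns True.
    """
    lifetime = not_after - not_before
    remaining = not_after - now
    return remaining < lifetime * renew_fraction


# Example: a certificate valid for 365 days, checked at two points in its life.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=365)
print(needs_rotation(issued, expires, issued + timedelta(days=30)))   # False: 335 days left
print(needs_rotation(issued, expires, issued + timedelta(days=300)))  # True: 65 days left
```

Renewing well before expiry, rather than on the last day, leaves room for a failed renewal to be retried before any mTLS handshake starts failing.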
End-to-end encryption with minimal impact on dev productivity
Connect was able to implement mTLS across all their services at scale while minimizing the impact on dev productivity. The process was fairly quick and allowed them to meet their critical go-live deadline for the new platform. Without Linkerd, meeting that deadline would have been hard.
Additionally, Linkerd's four golden metrics are well suited to uniform, application-agnostic debugging at the platform level and to observing service health. They gave Connect immediate insight when migrating workloads to Kubernetes. The team gets all this data without digging deep into app specifics, which saves them a lot of time and is a great way to get started with cloud native application management for new developments.
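As a rough illustration of what these platform-level figures summarize, the sketch below computes success rate, request rate, and latency percentiles for one time window of a single workload. Everything here is an assumption for illustration: the `Request` record, the nearest-rank percentile method, and the rule that HTTP 5xx responses count as failures. Linkerd derives its metrics from counters and histograms its proxies export to Prometheus, not from code like this.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, List


@dataclass
class Request:
    status: int        # HTTP status code of the response
    latency_ms: float  # observed request latency in milliseconds


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_metrics(window: List[Request], window_seconds: float) -> Dict[str, float]:
    """Summarize one time window of requests for a single workload."""
    if not window:
        raise ValueError("need at least one request in the window")
    # Assumed classification: 5xx responses are failures, everything else succeeds.
    successes = sum(1 for r in window if r.status < 500)
    latencies = sorted(r.latency_ms for r in window)
    return {
        "success_rate": successes / len(window),
        "rps": len(window) / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


window = [Request(200, 12.0), Request(200, 15.0), Request(503, 250.0), Request(200, 9.0)]
print(golden_metrics(window, window_seconds=2.0))
```

Because the same few numbers are produced for every workload regardless of what it does internally, an engineer can spot the unhealthy service (low success rate, high tail latency) before opening any application logs.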
Connect also implemented canary deployments through Linkerd and Flagger, and they can now deliver features faster and with more confidence.
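The canary loop can be pictured as a simple simulation. Flagger's Canary analysis is configured with fields such as `stepWeight`, `maxWeight`, and `threshold` (the number of failed metric checks tolerated before rollback); the sketch below borrows those ideas but is a simplified, hypothetical model, not Flagger's implementation. The `success_rate_at` callback stands in for querying the canary's success rate from Linkerd metrics while a given percentage of traffic is routed to it.

```python
from typing import Callable, List, Tuple


def run_canary(success_rate_at: Callable[[int], float],
               min_success_rate: float = 0.99,
               step_weight: int = 10,
               max_weight: int = 50,
               threshold: int = 2) -> Tuple[str, List[int]]:
    """Simplified progressive rollout: returns the outcome and the weights applied."""
    weight = step_weight
    failed_checks = 0
    history: List[int] = []
    while True:
        history.append(weight)  # route `weight` percent of traffic to the canary
        if success_rate_at(weight) >= min_success_rate:
            if weight >= max_weight:
                return "promoted", history  # canary becomes the new primary
            weight = min(weight + step_weight, max_weight)
        else:
            # A failed check holds the current weight; too many failures roll back.
            failed_checks += 1
            if failed_checks >= threshold:
                return "rolled back", history  # all traffic returns to the primary


print(run_canary(lambda w: 0.999))
print(run_canary(lambda w: 0.999 if w < 30 else 0.95))
```

The first call models a healthy release that is promoted after reaching 50% of traffic; the second models a regression that only shows up under more load and is rolled back after two failed checks at 30%, so most users never see it.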
"All this was almost automatically enabled by deploying and activating Linkerd across our applications. Linkerd helped us avoid more complex TLS setups for certain services, saving my team lots of backlog time. This is all pretty neat and one of the reasons I’ve been so outspoken about this project."
— Christian Hüning
The community around Linkerd
“The Linkerd community is the best! Everyone is incredibly welcoming. The Slack channel is a great way to get valuable input and collaborate with others. You can literally find solutions to any kind of problem,” states Hüning. “In fact, you can find me there regularly. I’ve been active in the community, and because I enjoy jumping on any opportunity to help educate others, I was invited to become a Linkerd Ambassador along with some other fantastic Linkerd end users.”
Zero-trust doesn’t have to be hard
Zero-trust is a requirement for companies like Connect operating in the fintech industry. But even in other industries, zero-trust is increasingly becoming a must-have. With microservices, a firewall won’t do the trick anymore. While many cloud or enterprise architects are concerned about the complexity they might be adding to an already complex system, zero-trust doesn’t have to be hard. At least not if you’re using Linkerd.
Connect mTLSed all services in five months while minimizing the impact on dev productivity. The platform metrics have also proven a time-saver when debugging and keeping a pulse on service health.
Connect chose Linkerd because it requires minimal configuration while automatically enabling key features. Features that were not yet available, they helped build. “Today, there is no reason not to mTLS all your services,” said Hüning.