AI and ML: Let’s Talk About the Boring (yet Critical!) Operational Side

December 6, 2024

AI and ML are becoming increasingly prevalent, so it's worth taking a harder look at the operational side of running these applications. They need a lot of compute and access to GPU workloads. They need to be reliable while providing rock-solid separation between datasets and training processes. They need great observability in case things go wrong, and they must be simple to operate. Rather than spending resources reinventing the wheel (or, worse, the flat tire), let's build our ML applications on top of a service mesh.

Watch this lively, informative, and entertaining talk on how a service mesh can solve real-world issues with ML applications while making it simpler and faster to actually get things done in the world of ML. Rob and Milad demonstrate how to use Linkerd together with multiple clusters to develop, debug, and deploy an ML application in Kubernetes (including IPv6 and GPUs), with special attention to multitenancy and scaling.
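As a minimal sketch of the kind of setup the talk describes, a Kubernetes workload can opt into the Linkerd mesh with the standard `linkerd.io/inject: enabled` annotation while requesting a GPU through the NVIDIA device plugin. The names, namespace, and image below are illustrative assumptions, not taken from the talk:

```yaml
# Hypothetical training workload (names and image are illustrative assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
  namespace: ml-team-a            # per-tenant namespace for dataset/process separation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
      annotations:
        linkerd.io/inject: enabled   # Linkerd injects its sidecar proxy (mTLS, metrics)
    spec:
      containers:
        - name: trainer
          image: example.com/ml/trainer:latest   # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 1    # schedule onto a GPU node via the NVIDIA device plugin
```

With this pattern, the mesh handles encryption and observability between tenants' workloads, while Kubernetes namespaces and resource limits provide the separation and GPU scheduling the blurb calls out.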