Tech Week Singapore 2025
Scaling AI and Autonomous Agents: A Production Platform on Kubernetes
As enterprises increasingly integrate AI into their applications and develop autonomous agents, deploying Large Language Models demands a robust, scalable infrastructure. This talk explores a production-grade AI platform built on Amazon EKS, demonstrating how Kubernetes orchestrates complex AI workloads while ensuring operational excellence. We'll showcase an innovative architecture that leverages vLLM for high-performance inference, LiteLLM for unified access management, and Langfuse for end-to-end observability. The platform effectively addresses hybrid deployment requirements, resolving critical challenges in privacy, GPU optimization, and latency. Through EKS Auto Mode and intelligent scaling mechanisms, organizations can shift their focus from infrastructure management to innovation. Whether you're deploying basic language models or sophisticated AI agents with advanced reasoning capabilities, this architecture provides a solid foundation for your AI journey on Kubernetes.