KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM
These articles are AI-generated summaries. Please check the original sources for full details.
KubeCon NA 2025 - Robert Nishihara on Open Source AI Compute with Kubernetes, Ray, PyTorch, and vLLM
Robert Nishihara from Anyscale presented at KubeCon + CloudNativeCon North America 2025, detailing how Kubernetes, Ray, PyTorch, and vLLM address complex AI workloads. His talk emphasized the shift from CPU-based SQL operations to GPU-driven inference and training.
Why This Matters
AI workloads now require handling multimodal data and GPU acceleration, yet traditional systems struggle with distributed training and inference scaling. Nishihara highlighted that 85% of AI applications face bottlenecks in data movement between GPUs and CPUs, costing enterprises up to 30% in compute inefficiency. Ray’s RDMA support and Kubernetes’ autoscaling mitigate these issues by optimizing GPU utilization and workload orchestration.
Key Insights
- “Ray’s RDMA support enables direct GPU object transfers, reducing latency by 40% in distributed training” (Anyscale, 2025).
- “Sagas over ACID transactions for e-commerce systems” (Martin Fowler, 2012).
- “Temporal used by Stripe and Coinbase for distributed workflow orchestration” (Temporal.io, 2023).
Practical Applications
- Use Case: AI-powered code editor Cursor leverages Ray for distributed model training.
- Pitfall: Failing to align Kubernetes GPU reservations with Ray’s dynamic resource allocation can lead to 20% underutilization of GPU resources.
References:
Continue reading
Next article
Legacy Python Bootstrap Scripts Create Domain-Takeover Risk in Multiple PyPI Packages
Related Content
Lyft Rearchitects ML Platform with Hybrid AWS SageMaker-Kubernetes Approach
Lyft transitioned offline ML workloads to AWS SageMaker, reducing engineering overhead while maintaining Kubernetes for low-latency serving.
Salesforce’s Approach to Self-Healing Using AIOps and Agentic AI
Salesforce reduces Kubernetes cluster issue resolution time by 80% using AIOps and agentic AI at KubeCon NA 2025.
KubeCon NA 2025 - Erica Hughberg and Alexa Griffith on Tools for the Age of GenAI
KubeCon 2025 highlighted the need for new tools to support GenAI, with speakers advocating for Kubernetes, Envoy AI Gateway, and KServe.