CNCF [Cloud Native Computing Foundation]
2016-08-30T22:10:40Z
Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from November 12 - 15, 2024. Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at https://kubecon.io
Scaling Kubernetes: Best Practices for Managing Large-Scale Batch Jobs with Spark and Argo Workflow | 扩展Kubernetes:管理大规模批处理作业的最佳实践与Spark和Argo工作流 - Yu Zhuang & Liu Jiaxu, Alibaba Cloud
Are you managing large-scale batch jobs on Kubernetes, like data processing with Spark applications or genomics computing with Argo workflows? To complete these jobs promptly, a significant number of pods have to be scaled out/in quickly for parallel computation. It means a big pressure to Kubernetes control plane. In this talk, we will use Spark and Argo workflows as example, guiding you how to build a Kubernetes cluster which supports creating/deleting 20000 of pods frequently. Our focus will be on tuning the Kubernetes control plane, including optimizing the list-watch mechanism, service broadcasting, environment variable attachments, API server configurations. Additionally, we'll share some of the best practices for configuring Spark operator and Argo workflows controller.
您是否正在Kubernetes上管理大规模的批处理作业,比如使用Spark应用程序进行数据处理或使用Argo工作流进行基因组计算?为了及时完成这些作业,需要快速地扩展/缩减大量的Pod以进行并行计算,这给Kubernetes控制平面带来了巨大压力。 在本次演讲中,我们将以Spark和Argo工作流为例,指导您如何构建一个支持频繁创建/删除20000个Pod的Kubernetes集群。我们将重点放在调优Kubernetes控制平面上,包括优化列表-观察机制、服务广播、环境变量附加、API服务器配置等。此外,我们还将分享一些配置Spark操作员和Argo工作流控制器的最佳实践。