Python-based Open Source LLM Cloud Native MLOps Deployment Practices

Keynote

Abstract

In the context of the rapid development of open-source large models, how to quickly convert open-source large models such as Qwen, DeepSeek, and Llama into usable API services has become a key challenge faced by developers. Traditional deployment often needs to handle complex steps including containerization, infrastructure configuration, and inference engine optimization. This not only requires in-depth operation and maintenance knowledge, but also involves a large amount of manual configuration.

Details

This speech will share practical experience in building a unified open-source model deployment system, and introduce a Python-based multi-engine abstraction layer architecture: trigger the deployment process through simple CLI commands, realize "build once, deploy anywhere" by using unified container design, optimize the inference engine for different models through dynamic configuration, and finally automatically deploy them as OpenAI-compatible API services.

Combined with practical cases of open-source models such as GPT-OSS, it will demonstrate the complexity management of multi-engine deployment, performance parameter optimization, cross-platform adaptation, and the construction of standardized interfaces.