TKE Deployment Guide
📚 Overview
This document describes how to unpack and deploy ModelKits in a TKE cluster for fast model deployment and management. It covers several deployment patterns to help you choose the one that best fits your workload.
🎯 Document Metadata
- Applicable products: TKE standard clusters / TKE Serverless
- Applicable scenarios: model inference service deployment, batch prediction jobs
- Agent friendliness: ⭐⭐⭐⭐⭐
📋 Deployment Method Comparison
| Method | Use case | Pros | Cons |
|---|---|---|---|
| Init Container | Load at startup | Simple and direct, low resource overhead | Updates require a Pod restart |
| Sidecar | Update at runtime | Supports hot updates | Higher resource overhead |
| CronJob | Periodic sync | Automated, batch updates | Higher update latency |
| PV preloading | Shared models | Shared across Pods, saves bandwidth | More complex to configure |
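All four methods reduce to the same two kit CLI steps: `kit login` against the registry, then `kit unpack` of an OCI reference of the form `<registry>/<namespace>/<name>:<tag>`. A minimal sketch of assembling that reference (the registry and repository names are illustrative placeholders, and `model_ref` is a hypothetical helper, not part of the kit CLI):

```shell
# Assemble the OCI reference that `kit login` / `kit unpack` consume.
# model_ref is a hypothetical helper; all values below are placeholders.
model_ref() {
  registry="$1"; namespace="$2"; name="$3"; tag="$4"
  printf '%s/%s/%s:%s\n' "$registry" "$namespace" "$name" "$tag"
}

# Every method below then boils down to the same two steps, e.g.:
#   kit login "$TCR_REGISTRY" -u "$TCR_USERNAME" -p "$TCR_PASSWORD"
#   kit unpack "$(model_ref "$TCR_REGISTRY" ml-models bert-sentiment v1.2.0)" \
#     --filter=model -d /models -o
```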
🚀 Method 1: Init Container (Recommended)
The Init Container pattern is the most common deployment method: the model is fully loaded before the main container starts.
Basic configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
  labels:
    app: model-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-inference
  template:
    metadata:
      labels:
        app: model-inference
    spec:
      # Credentials for pulling images from TCR
      imagePullSecrets:
        - name: tcr-secret
      # Init container loads the model
      initContainers:
        - name: model-loader
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              kit unpack $MODEL_REFERENCE --filter=model -d /models -o
          env:
            - name: TCR_REGISTRY
              value: "ml-registry-xxxx.tencentcloudcr.com"
            - name: MODEL_REFERENCE
              value: "ml-registry-xxxx.tencentcloudcr.com/ml-models/bert-sentiment:v1.2.0"
            - name: TCR_USERNAME
              valueFrom:
                secretKeyRef:
                  name: tcr-credentials
                  key: username
            - name: TCR_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: tcr-credentials
                  key: password
          volumeMounts:
            - name: model-volume
              mountPath: /models
      # Main container runs the inference service
      containers:
        - name: inference-server
          image: your-inference-image:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: model-volume
              mountPath: /app/models
              readOnly: true
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
      volumes:
        - name: model-volume
          emptyDir: {}
```
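One failure mode with init-container loading is starting the inference server against an empty model directory. A minimal sanity check the init script could run after `kit unpack` (this `verify_model_dir` helper is hypothetical, not part of kit):

```shell
# Hypothetical post-unpack check: return non-zero when the target directory
# is missing or empty, so the init container fails and is retried instead of
# letting the server start without a model.
verify_model_dir() {
  dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage inside the init script:
#   kit unpack "$MODEL_REFERENCE" --filter=model -d /models -o
#   verify_model_dir /models || exit 1
```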
Managing Model Versions with a ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
data:
  MODEL_VERSION: "v1.2.0"
  MODEL_NAME: "bert-sentiment"
  MODEL_NAMESPACE: "ml-models"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
spec:
  template:
    spec:
      initContainers:
        - name: model-loader
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              MODEL_REF="${TCR_REGISTRY}/${MODEL_NAMESPACE}/${MODEL_NAME}:${MODEL_VERSION}"
              echo "Loading model: $MODEL_REF"
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              kit unpack $MODEL_REF --filter=model -d /models -o
              echo "Model loaded successfully"
          envFrom:
            - configMapRef:
                name: model-config
          env:
            - name: TCR_REGISTRY
              value: "ml-registry-xxxx.tencentcloudcr.com"
            # ... credential configuration
```
Loading Components Selectively
```yaml
initContainers:
  - name: model-loader
    image: ghcr.io/kitops-ml/kit:latest
    command:
      - sh
      - -c
      - |
        # Load the model only
        kit unpack $MODEL_REF --filter=model -d /models -o
        # Load the model plus docs
        # kit unpack $MODEL_REF --filter=model --filter=docs -d /models -o
        # Load the model plus a specific dataset
        # kit unpack $MODEL_REF --filter=model --filter=datasets:validation -d /models -o
```
🔄 Method 2: Sidecar (Hot Updates)
The sidecar pattern updates the model at runtime without restarting the main container.
Basic configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference-with-sidecar
spec:
  template:
    spec:
      containers:
        # Main inference container
        - name: inference-server
          image: your-inference-image:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: model-volume
              mountPath: /app/models
              readOnly: true
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    # Block until the sidecar has finished the initial model load
                    while [ ! -f /app/models/.ready ]; do sleep 1; done
        # Sidecar container: model updater
        - name: model-updater
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              # Initial load
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              kit unpack $MODEL_REF --filter=model -d /models -o
              touch /models/.ready
              # Periodically check for updates
              while true; do
                sleep 300  # check every 5 minutes
                # Compare the local digest against the remote one
                LOCAL_DIGEST=$(cat /models/.digest 2>/dev/null || echo "")
                REMOTE_DIGEST=$(kit info $MODEL_REF --format '{{.Digest}}' 2>/dev/null || echo "")
                if [ "$LOCAL_DIGEST" != "$REMOTE_DIGEST" ] && [ -n "$REMOTE_DIGEST" ]; then
                  echo "New model version detected, updating..."
                  kit unpack $MODEL_REF --filter=model -d /models -o
                  echo "$REMOTE_DIGEST" > /models/.digest
                  touch /models/.updated
                  echo "Model updated successfully"
                fi
              done
          env:
            - name: MODEL_REF
              value: "ml-registry-xxxx.tencentcloudcr.com/ml-models/bert-sentiment:latest"
            # ... credential configuration
          volumeMounts:
            - name: model-volume
              mountPath: /models
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "200m"
      volumes:
        - name: model-volume
          emptyDir: {}
```
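The sidecar signals a completed update by touching `/models/.updated`, but something in the inference container still has to consume that marker. A minimal sketch of the consumer side, assuming a hypothetical `reload_model` hook (replace it with your server's actual reload mechanism, e.g. an admin HTTP endpoint or a signal to the server process):

```shell
# Consumer side of the sidecar's update signal. reload_model is a
# hypothetical hook; swap in your server's real reload mechanism.
MODEL_DIR="${MODEL_DIR:-/app/models}"

reload_model() {
  echo "reloading model from $MODEL_DIR"
}

check_for_update() {
  if [ -f "$MODEL_DIR/.updated" ]; then
    # Consume the marker so we reload exactly once per update
    rm -f "$MODEL_DIR/.updated"
    reload_model
  fi
}

# In the container, poll the marker forever:
#   while true; do check_for_update; sleep 10; done
```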
⏰ Method 3: Scheduled Updates with a CronJob
Use a CronJob to periodically sync models to shared storage; this suits scenarios where many Pods share the same models.
CronJob configuration
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-sync-job
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: model-syncer
              image: ghcr.io/kitops-ml/kit:latest
              command:
                - sh
                - -c
                - |
                  set -e
                  # Log in to TCR
                  kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
                  # Sync several models
                  MODELS="bert-sentiment:v1.2.0 image-classifier:v2.0.0 text-generator:v1.0.0"
                  for model in $MODELS; do
                    MODEL_REF="${TCR_REGISTRY}/ml-models/${model}"
                    MODEL_NAME=$(echo $model | cut -d: -f1)
                    echo "Syncing model: $MODEL_REF"
                    kit unpack $MODEL_REF --filter=model -d /models/$MODEL_NAME -o
                    echo "Model $MODEL_NAME synced successfully"
                  done
                  # Record the sync timestamp
                  date > /models/.last_sync
              env:
                - name: TCR_REGISTRY
                  value: "ml-registry-xxxx.tencentcloudcr.com"
                - name: TCR_USERNAME
                  valueFrom:
                    secretKeyRef:
                      name: tcr-credentials
                      key: username
                - name: TCR_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: tcr-credentials
                      key: password
              volumeMounts:
                - name: model-storage
                  mountPath: /models
          volumes:
            - name: model-storage
              persistentVolumeClaim:
                claimName: model-pvc
```
Shared PVC Configuration
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
    - ReadWriteMany  # allows concurrent reads from many Pods
  storageClassName: cfs  # CFS shared storage
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
spec:
  replicas: 10
  template:
    spec:
      containers:
        - name: inference-server
          image: your-inference-image:latest
          volumeMounts:
            - name: model-storage
              mountPath: /app/models
              readOnly: true
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: model-pvc
```
🎯 Method 4: Integrating with Inference Frameworks
Integrating with Triton Inference Server
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: triton-server
  template:
    metadata:
      labels:
        app: triton-server
    spec:
      initContainers:
        - name: model-loader
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              # Unpack models into Triton's model-repository layout:
              # /models/<model-name>/<version>/model.xxx
              kit unpack $TCR_REGISTRY/ml-models/bert-classifier:v1.0.0 \
                --filter=model -d /model-repository/bert-classifier/1 -o
              kit unpack $TCR_REGISTRY/ml-models/resnet50:v2.0.0 \
                --filter=model -d /model-repository/resnet50/1 -o
              # Write the model configuration file
              cat > /model-repository/bert-classifier/config.pbtxt << EOF
              name: "bert-classifier"
              platform: "pytorch_libtorch"
              max_batch_size: 32
              input [
                {
                  name: "input_ids"
                  data_type: TYPE_INT64
                  dims: [ -1 ]
                }
              ]
              output [
                {
                  name: "logits"
                  data_type: TYPE_FP32
                  dims: [ -1, 2 ]
                }
              ]
              EOF
          env:
            - name: TCR_REGISTRY
              value: "ml-registry-xxxx.tencentcloudcr.com"
            # ... credential configuration
          volumeMounts:
            - name: model-repository
              mountPath: /model-repository
      containers:
        - name: triton-server
          image: nvcr.io/nvidia/tritonserver:24.01-py3
          args:
            - tritonserver
            - --model-repository=/models
            - --strict-model-config=false
          ports:
            - containerPort: 8000
              name: http
            - containerPort: 8001
              name: grpc
            - containerPort: 8002
              name: metrics
          volumeMounts:
            - name: model-repository
              mountPath: /models
              readOnly: true
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
      volumes:
        - name: model-repository
          emptyDir: {}
```
Integrating with vLLM (LLM Inference)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      initContainers:
        - name: model-loader
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              # Load the LLM (including LoRA weights)
              kit unpack $TCR_REGISTRY/ml-models/qwen-7b-chat:v1.0.0 \
                --filter=model -d /models -o
              echo "Model loaded successfully"
              ls -la /models/
          env:
            - name: TCR_REGISTRY
              value: "ml-registry-xxxx.tencentcloudcr.com"
            # ... credential configuration
          volumeMounts:
            - name: model-volume
              mountPath: /models
      containers:
        - name: vllm-server
          image: vllm/vllm-openai:latest
          args:
            - --model=/models
            - --host=0.0.0.0
            - --port=8000
            - --tensor-parallel-size=1
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: model-volume
              mountPath: /models
              readOnly: true
          resources:
            requests:
              nvidia.com/gpu: 1
              memory: "32Gi"
            limits:
              nvidia.com/gpu: 1
              memory: "64Gi"
      volumes:
        - name: model-volume
          emptyDir:
            medium: Memory  # RAM-backed for faster loading (counts against Pod memory)
            sizeLimit: "50Gi"
```
📊 Performance Optimization
Model caching strategy
Cache models on node-local storage to avoid repeated downloads:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: model-cache
spec:
  selector:
    matchLabels:
      app: model-cache
  template:
    metadata:
      labels:
        app: model-cache
    spec:
      containers:
        - name: cache-manager
          image: ghcr.io/kitops-ml/kit:latest
          command:
            - sh
            - -c
            - |
              # Pre-pull frequently used models onto each node
              kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
              MODELS="bert-sentiment:v1.2.0 image-classifier:v2.0.0"
              for model in $MODELS; do
                MODEL_REF="${TCR_REGISTRY}/ml-models/${model}"
                kit pull $MODEL_REF  # pull into the local cache
              done
              # Keep the Pod running
              sleep infinity
          volumeMounts:
            - name: kit-cache
              mountPath: /root/.kitops
      volumes:
        - name: kit-cache
          hostPath:
            path: /var/lib/kitops
            type: DirectoryOrCreate
```
Parallel Unpacking
Note that Kubernetes runs init containers sequentially, so splitting the work across several init containers does not actually parallelize it. Instead, run the unpacks in the background of a single init container:
```yaml
initContainers:
  - name: model-loader
    image: ghcr.io/kitops-ml/kit:latest
    command:
      - sh
      - -c
      - |
        # Unpack model and docs concurrently, then wait for both to finish
        kit unpack $MODEL_REF --filter=model -d /models -o &
        kit unpack $MODEL_REF --filter=docs -d /config -o &
        wait
```
Incremental Updates
```shell
# Re-unpack only when the remote digest has changed
CURRENT_DIGEST=$(kit info $MODEL_REF --format '{{.Digest}}')
CACHED_DIGEST=$(cat /models/.digest 2>/dev/null || echo "")
if [ "$CURRENT_DIGEST" != "$CACHED_DIGEST" ]; then
  kit unpack $MODEL_REF --filter=model -d /models -o
  echo "$CURRENT_DIGEST" > /models/.digest
fi
```
🔒 Secret Configuration
Creating the TCR Credentials Secret
```shell
# Create a Secret holding the TCR credentials
kubectl create secret generic tcr-credentials \
  --from-literal=username=<TCR-username> \
  --from-literal=password=<TCR-password> \
  -n <namespace>
```
Using a ServiceAccount's imagePullSecrets
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: model-inference-sa
imagePullSecrets:
  - name: tcr-secret
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-inference
spec:
  template:
    spec:
      serviceAccountName: model-inference-sa
      # No need to set imagePullSecrets on the Pod spec itself
```
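Note that the manifests in this guide reference two different secrets: `tcr-credentials` (a generic Secret read by kit inside containers) and `tcr-secret` (an image pull secret, which must be of the docker-registry type). Assuming placeholder values for the server and credentials, the latter can be created with:

```shell
# Create the docker-registry Secret referenced by imagePullSecrets.
# Server, username, and password values are placeholders.
kubectl create secret docker-registry tcr-secret \
  --docker-server=ml-registry-xxxx.tencentcloudcr.com \
  --docker-username=<TCR-username> \
  --docker-password=<TCR-password> \
  -n <namespace>
```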