跳转至

CI/CD 集成指南

📚 概述

本文档介绍如何将 KitOps 集成到 CI/CD 流水线中,实现模型的自动打包、测试和部署。通过自动化流水线,可以确保模型版本的一致性和可追溯性。

🎯 文档元信息

  • 适用平台: GitHub Actions、GitLab CI、腾讯云 CODING、蓝盾流水线
  • 适用场景: MLOps、模型持续集成/部署
  • Agent 友好度: ⭐⭐⭐⭐⭐

📋 CI/CD 流程概览

graph LR
    A[代码/模型提交] --> B[构建测试]
    B --> C[模型验证]
    C --> D[kit pack]
    D --> E[kit push]
    E --> F{环境}
    F -->|Dev| G[开发环境部署]
    F -->|Staging| H[预发布环境部署]
    F -->|Prod| I[生产环境部署]
    G --> J[集成测试]
    H --> K[性能测试]
    I --> L[监控告警]

🔧 GitHub Actions 集成

使用官方 Action

KitOps 提供了官方的 GitHub Action:setup-kit-cli

基本配置

# .github/workflows/model-ci.yml
name: Model CI/CD

on:
  push:
    branches: [main, develop]
    paths:
      - 'models/**'
      - 'data/**'
      - 'Kitfile'
  pull_request:
    branches: [main]
    paths:
      - 'models/**'
      - 'data/**'
      - 'Kitfile'

env:
  TCR_REGISTRY: ml-registry-xxxx.tencentcloudcr.com
  MODEL_NAMESPACE: ml-models

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          lfs: true  # 如果使用 Git LFS 存储大文件

      - name: Setup Kit CLI
        uses: jozu-ai/gh-kit-setup@v1.0.0
        with:
          version: latest  # 或指定版本如 v1.11.0

      - name: Verify Kit installation
        run: kit version

      - name: Validate Kitfile
        run: kit info .

      - name: Run model tests
        run: |
          # 运行模型验证脚本
          python -m pytest tests/test_model.py -v

  package-and-push:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          lfs: true

      - name: Setup Kit CLI
        uses: jozu-ai/gh-kit-setup@v1.0.0

      - name: Login to TCR
        run: |
          kit login ${{ env.TCR_REGISTRY }} \
            -u ${{ secrets.TCR_USERNAME }} \
            -p ${{ secrets.TCR_PASSWORD }}

      - name: Build version tag
        id: version
        run: |
          # 基于 Git 信息生成版本号
          VERSION=$(cat VERSION || echo "0.0.0")
          SHORT_SHA=$(git rev-parse --short HEAD)
          echo "version=${VERSION}" >> $GITHUB_OUTPUT
          echo "tag=${VERSION}-${SHORT_SHA}" >> $GITHUB_OUTPUT

      - name: Pack ModelKit
        run: |
          kit pack . \
            -t ${{ env.TCR_REGISTRY }}/${{ env.MODEL_NAMESPACE }}/my-model:${{ steps.version.outputs.tag }} \
            -t ${{ env.TCR_REGISTRY }}/${{ env.MODEL_NAMESPACE }}/my-model:latest

      - name: Push ModelKit
        run: |
          kit push ${{ env.TCR_REGISTRY }}/${{ env.MODEL_NAMESPACE }}/my-model:${{ steps.version.outputs.tag }}
          kit push ${{ env.TCR_REGISTRY }}/${{ env.MODEL_NAMESPACE }}/my-model:latest

      - name: Output ModelKit info
        run: |
          echo "### ModelKit Published 🚀" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "- **Registry**: ${{ env.TCR_REGISTRY }}" >> $GITHUB_STEP_SUMMARY
          echo "- **Image**: ${{ env.MODEL_NAMESPACE }}/my-model" >> $GITHUB_STEP_SUMMARY
          echo "- **Tag**: ${{ steps.version.outputs.tag }}" >> $GITHUB_STEP_SUMMARY

完整的多阶段流水线

# .github/workflows/model-pipeline.yml
name: Model Pipeline

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      deploy_env:
        description: 'Deployment environment'
        required: true
        default: 'staging'
        type: choice
        options:
          - dev
          - staging
          - production

env:
  TCR_REGISTRY: ml-registry-xxxx.tencentcloudcr.com
  MODEL_NAMESPACE: ml-models
  MODEL_NAME: sentiment-classifier

jobs:
  # 阶段 1: 验证
  validate:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.version.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - uses: jozu-ai/gh-kit-setup@v1.0.0

      - name: Validate Kitfile syntax
        run: kit info . --format json

      - name: Generate version
        id: version
        run: |
          VERSION=$(date +%Y%m%d)-$(git rev-parse --short HEAD)
          echo "version=$VERSION" >> $GITHUB_OUTPUT

  # 阶段 2: 测试
  test:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run unit tests
        run: pytest tests/unit/ -v --junitxml=test-results.xml

      - name: Run model validation
        run: python scripts/validate_model.py

  # 阶段 3: 打包
  package:
    needs: [validate, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true

      - uses: jozu-ai/gh-kit-setup@v1.0.0

      - name: Login to TCR
        run: |
          kit login ${{ env.TCR_REGISTRY }} \
            -u ${{ secrets.TCR_USERNAME }} \
            -p ${{ secrets.TCR_PASSWORD }}

      - name: Pack and Push
        run: |
          VERSION=${{ needs.validate.outputs.version }}
          IMAGE=${{ env.TCR_REGISTRY }}/${{ env.MODEL_NAMESPACE }}/${{ env.MODEL_NAME }}

          kit pack . -t ${IMAGE}:${VERSION}
          kit push ${IMAGE}:${VERSION}

          # 同时更新 latest 标签
          kit pack . -t ${IMAGE}:latest
          kit push ${IMAGE}:latest

  # 阶段 4: 部署到 Dev
  deploy-dev:
    needs: package
    runs-on: ubuntu-latest
    environment: development
    steps:
      - name: Deploy to Dev cluster
        run: |
          echo "Deploying to development environment..."
          # 更新 Kubernetes Deployment
          # kubectl set image deployment/model-inference ...

  # 阶段 5: 部署到 Staging
  deploy-staging:
    needs: deploy-dev
    runs-on: ubuntu-latest
    environment: staging
    if: github.event.inputs.deploy_env != 'dev'
    steps:
      - name: Deploy to Staging
        run: |
          echo "Deploying to staging environment..."

  # 阶段 6: 部署到 Production
  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    if: github.event.inputs.deploy_env == 'production'
    steps:
      - name: Deploy to Production
        run: |
          echo "Deploying to production environment..."

🦊 GitLab CI 集成

基本配置

# .gitlab-ci.yml
stages:
  - validate
  - test
  - package
  - deploy

variables:
  TCR_REGISTRY: ml-registry-xxxx.tencentcloudcr.com
  MODEL_NAMESPACE: ml-models
  MODEL_NAME: my-model

# 全局配置
default:
  image: ubuntu:22.04
  before_script:
    # 安装 Kit CLI
    - curl -fsSL https://get.kitops.org | sh
    - kit version

# 验证阶段
validate:
  stage: validate
  script:
    - kit info .
  rules:
    - changes:
        - Kitfile
        - models/**/*
        - data/**/*

# 测试阶段
test:
  stage: test
  image: python:3.10
  script:
    - pip install -r requirements.txt
    - pytest tests/ -v
  rules:
    - changes:
        - "**/*.py"
        - requirements.txt

# 打包阶段
package:
  stage: package
  script:
    - kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD
    - |
      VERSION="${CI_COMMIT_SHORT_SHA}"
      IMAGE="${TCR_REGISTRY}/${MODEL_NAMESPACE}/${MODEL_NAME}"

      kit pack . -t ${IMAGE}:${VERSION}
      kit push ${IMAGE}:${VERSION}

      if [ "$CI_COMMIT_BRANCH" == "main" ]; then
        kit pack . -t ${IMAGE}:latest
        kit push ${IMAGE}:latest
      fi
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_BRANCH == "develop"
  artifacts:
    reports:
      dotenv: deploy.env

# 部署到开发环境
deploy:dev:
  stage: deploy
  environment:
    name: development
    url: https://dev.example.com
  script:
    - echo "Deploying to dev environment"
    - |
      # 使用 kubectl 或其他部署工具
      # kubectl set image deployment/model-inference ...
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

# 部署到生产环境
deploy:prod:
  stage: deploy
  environment:
    name: production
    url: https://api.example.com
  script:
    - echo "Deploying to production environment"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  when: manual  # 需要手动触发

使用 GitLab Container Registry

# 使用 GitLab 内置的容器镜像仓库
package:
  stage: package
  script:
    - kit login $CI_REGISTRY -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD
    - |
      IMAGE="${CI_REGISTRY_IMAGE}/model"
      kit pack . -t ${IMAGE}:${CI_COMMIT_SHORT_SHA}
      kit push ${IMAGE}:${CI_COMMIT_SHORT_SHA}

🔵 腾讯云 CODING 集成

Jenkinsfile 配置

// Jenkinsfile
pipeline {
    agent {
        docker {
            image 'ubuntu:22.04'
        }
    }

    environment {
        TCR_REGISTRY = 'ml-registry-xxxx.tencentcloudcr.com'
        MODEL_NAMESPACE = 'ml-models'
        MODEL_NAME = 'my-model'
        TCR_CREDENTIALS = credentials('tcr-credentials')
    }

    stages {
        stage('Setup') {
            steps {
                sh '''
                    curl -fsSL https://get.kitops.org | sh
                    kit version
                '''
            }
        }

        stage('Validate') {
            steps {
                sh 'kit info .'
            }
        }

        stage('Test') {
            steps {
                sh '''
                    pip install -r requirements.txt
                    pytest tests/ -v
                '''
            }
        }

        stage('Package') {
            steps {
                sh '''
                    kit login $TCR_REGISTRY -u $TCR_CREDENTIALS_USR -p $TCR_CREDENTIALS_PSW

                    VERSION="${GIT_COMMIT:0:7}"
                    IMAGE="${TCR_REGISTRY}/${MODEL_NAMESPACE}/${MODEL_NAME}"

                    kit pack . -t ${IMAGE}:${VERSION}
                    kit push ${IMAGE}:${VERSION}
                '''
            }
        }

        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh '''
                    echo "Deploying to production..."
                    # 部署逻辑
                '''
            }
        }
    }

    post {
        success {
            echo 'Pipeline completed successfully!'
        }
        failure {
            echo 'Pipeline failed!'
        }
    }
}

🛡️ 蓝盾流水线集成

YAML 流水线配置

# .ci/pipeline.yml
version: v2.0

stages:
  - name: "构建测试"
    jobs:
      - name: "验证和测试"
        runs-on: 
          pool-name: docker
          container:
            image: python:3.10
        steps:
          - checkout: self

          - script: |
              curl -fsSL https://get.kitops.org | sh
              kit version
            name: "安装 Kit CLI"

          - script: kit info .
            name: "验证 Kitfile"

          - script: |
              pip install -r requirements.txt
              pytest tests/ -v
            name: "运行测试"

  - name: "打包推送"
    jobs:
      - name: "构建 ModelKit"
        runs-on:
          pool-name: docker
          container:
            image: ubuntu:22.04
        steps:
          - checkout: self

          - script: curl -fsSL https://get.kitops.org | sh
            name: "安装 Kit CLI"

          - script: |
              kit login ${TCR_REGISTRY} -u ${TCR_USERNAME} -p ${TCR_PASSWORD}

              VERSION="$(date +%Y%m%d)-${BK_CI_BUILD_ID}"
              IMAGE="${TCR_REGISTRY}/${MODEL_NAMESPACE}/${MODEL_NAME}"

              kit pack . -t ${IMAGE}:${VERSION}
              kit push ${IMAGE}:${VERSION}
            name: "打包并推送"
            env:
              TCR_REGISTRY: ml-registry-xxxx.tencentcloudcr.com
              MODEL_NAMESPACE: ml-models
              MODEL_NAME: my-model

  - name: "部署"
    jobs:
      - name: "部署到 TKE"
        runs-on:
          pool-name: docker
        steps:
          - script: |
              echo "部署到 TKE 集群..."
              # 使用 kubectl 或 helm 部署
            name: "执行部署"

📊 自动化测试策略

模型验证测试

# 在 CI 中运行模型验证
test-model:
  stage: test
  script:
    - |
      # 解包模型进行测试
      kit unpack . --filter=model -d ./test-model

      # 运行模型验证脚本
      python scripts/validate_model.py \
        --model-path ./test-model \
        --test-data ./test_data.csv \
        --metrics accuracy,f1,auc \
        --threshold 0.85

性能基准测试

benchmark:
  stage: test
  script:
    - |
      # 运行推理性能测试
      python scripts/benchmark.py \
        --model-path ./models \
        --batch-sizes 1,8,32,64 \
        --iterations 1000 \
        --output benchmark_results.json
  artifacts:
    paths:
      - benchmark_results.json

集成测试

integration-test:
  stage: test
  services:
    - name: model-server
      image: ${TCR_REGISTRY}/${MODEL_NAMESPACE}/${MODEL_NAME}:${CI_COMMIT_SHORT_SHA}
  script:
    - |
      # 等待服务启动
      sleep 30

      # 运行集成测试
      pytest tests/integration/ -v \
        --server-url http://model-server:8080

🔐 Secret 管理

GitHub Secrets 配置

在 GitHub 仓库设置中添加以下 Secrets:

Secret 名称 说明
TCR_USERNAME TCR 用户名
TCR_PASSWORD TCR 密码
KUBECONFIG Kubernetes 配置(Base64 编码)

使用 Vault 管理敏感信息

# GitHub Actions 集成 HashiCorp Vault
jobs:
  package:
    steps:
      - name: Import secrets from Vault
        uses: hashicorp/vault-action@v2
        with:
          url: https://vault.example.com
          method: jwt
          role: github-actions
          secrets: |
            secret/data/tcr username | TCR_USERNAME ;
            secret/data/tcr password | TCR_PASSWORD

      - name: Login to TCR
        run: kit login $TCR_REGISTRY -u $TCR_USERNAME -p $TCR_PASSWORD

📈 监控和通知

发送通知

# 发送企业微信通知
notify:
  stage: .post
  script:
    - |
      curl -X POST "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=${WECHAT_KEY}" \
        -H "Content-Type: application/json" \
        -d '{
          "msgtype": "markdown",
          "markdown": {
            "content": "## ModelKit 发布通知\n\n- **模型**: '${MODEL_NAME}'\n- **版本**: '${VERSION}'\n- **状态**: ✅ 发布成功\n- **时间**: '$(date)'"
          }
        }'
  when: on_success

🔗 相关资源