This page was translated by the Cloud Translation API.

Deploy Gemma with Ollama and Open-WebUI

The GDC Sandbox AI Optimized SKU includes enterprise-grade NVIDIA GPUs, so you can develop and test resource-intensive AI training and inference applications such as generative AI.

Gemma is a lightweight large language model built on Gemini technology. This tutorial shows how to deploy Gemma with Ollama and Open-WebUI on GDC Sandbox to accomplish the following goals:

- Deploy Ollama and the Gemma model on GPUs in the AI-optimized GDC Sandbox.
- Send prompts to the Ollama service on a private endpoint through the Open-WebUI interface.

Before you begin

The GPUs in GDC Sandbox are in the org-infra cluster.

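To run commands against the org infrastructure cluster, make sure that you have the kubeconfig for the org-1-infra cluster, as described in "Work with clusters":
- Configure and authenticate with the gdcloud command line.
- Generate the kubeconfig file for the org infrastructure cluster and assign its path to the KUBECONFIG environment variable.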
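Make sure that the user has the sandbox-gpu-admin role in the sandbox-gpu-project project. By default, this role is assigned to the platform-admin user. To assign the role to another user, sign in as platform-admin and run the following command, where NAME is a name for the role binding and USER is the user to grant the role to. Most shells already set USER to your own login name, so set it explicitly before you run the command: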
```
kubectl --kubeconfig ${KUBECONFIG} create rolebinding ${NAME} --role=sandbox-gpu-admin \
--user=${USER} --namespace=sandbox-gpu-project
```
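Because most shells already set USER to your own login name, running the command above without setting USER first grants the role to yourself. A minimal sketch, assuming a POSIX shell, that takes the binding name and target user as explicit arguments instead; grant_gpu_admin and the DRY_RUN switch are hypothetical names, not part of gdcloud or kubectl:

```shell
# Hypothetical helper (not part of gdcloud or kubectl): bind the
# sandbox-gpu-admin role to one user, passed explicitly as an argument.
# DRY_RUN=1 (the default in this sketch) only prints the kubectl command.
grant_gpu_admin() {
  local binding_name="${1:-}" target_user="${2:-}"
  if [ -z "$binding_name" ] || [ -z "$target_user" ]; then
    echo "usage: grant_gpu_admin BINDING_NAME TARGET_USER" >&2
    return 2
  fi
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "error: KUBECONFIG is not set" >&2
    return 1
  fi
  # Reuse the positional parameters to hold the full, correctly quoted command.
  set -- kubectl --kubeconfig "$KUBECONFIG" \
    create rolebinding "$binding_name" --role=sandbox-gpu-admin \
    --user="$target_user" --namespace=sandbox-gpu-project
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=0 grant_gpu_admin gpu-admin-alice alice` creates a binding named gpu-admin-alice for a hypothetical user alice; leaving DRY_RUN unset only prints the command so you can check it first.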
Set up an Artifact Registry repository as described in "Use Artifact Registry", and sign in so that you can push images to and pull images from Artifact Registry.

Deploy the Gemma model with Ollama and Open-WebUI

The deployment is orchestrated through a set of Kubernetes configuration files (YAML manifests), each of which defines a specific component or service.

Create a Dockerfile that pre-downloads Gemma.

```
FROM ubuntu

# Install Ollama with its official installation script, which adds the
# ollama binary to /usr/local/bin. Everything runs in one RUN step so the
# apt cache cleanup actually reduces the image size.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    curl -fsSL https://ollama.com/install.sh -o install.sh && \
    sh install.sh && \
    rm -f install.sh && \
    rm -rf /var/lib/apt/lists/*

# Listen on all interfaces so the server is reachable from outside the container.
ENV OLLAMA_HOST="0.0.0.0"
# To store models somewhere other than the default (/root/.ollama/models for
# the root user), set OLLAMA_MODELS before the pull step below:
# ENV OLLAMA_MODELS="/usr/local/ollama/models"

# --- Pre-download the Gemma model ---
# A RUN step cannot leave a service running, so this step starts the Ollama
# server in the background, waits for it, pulls the model, and then stops
# the server. The && chain makes the build fail if the pull fails, instead
# of silently producing an image without the model.
RUN ollama serve & OLLAMA_PID=$! && \
    sleep 5 && \
    curl --retry 10 --retry-connrefused -s http://localhost:11434 > /dev/null && \
    echo "Attempting to pull gemma:7b..." && \
    ollama pull gemma:7b && \
    echo "Model pull complete. Stopping the background Ollama process." && \
    kill "$OLLAMA_PID"

# Expose Ollama's default port
EXPOSE 11434

# Run the Ollama server when the container starts
CMD ["ollama", "serve"]
```
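The pre-download step waits a fixed five seconds and then relies on curl --retry. A minimal sketch of a more explicit polling loop, assuming a POSIX shell such as the /bin/sh that runs RUN instructions; wait_until is a hypothetical helper, and the commented usage assumes that Ollama answers a plain GET on its root path once the server is up:

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_until MAX_ATTEMPTS COMMAND [ARGS...]
wait_until() {
  [ "$#" -ge 2 ] || return 2
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the final failed one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep 1
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Inside a single RUN step, this could replace the fixed "sleep 5":
#   ollama serve & OLLAMA_PID=$!
#   wait_until 30 curl -sf http://localhost:11434/ > /dev/null
#   ollama pull gemma:7b
#   kill "$OLLAMA_PID"
```

Polling on an explicit readiness check keeps the build fast when the server starts quickly and still tolerates a slow start, up to the attempt budget.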

Build the Docker image and push it to the Artifact Registry repository.

```
docker build -t ollama-gemma .
docker tag ollama-gemma REGISTRY_REPOSITORY_URL/ollama-gemma:latest
docker push REGISTRY_REPOSITORY_URL/ollama-gemma:latest
```

Replace the following:

REGISTRY_REPOSITORY_URL: the repository URL.
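The build, tag, and push commands use the literal placeholder REGISTRY_REPOSITORY_URL, which is easy to leave unreplaced. A minimal sketch that composes the full image reference once and rejects an empty value or the unreplaced placeholder; image_ref is a hypothetical helper, and the repository URL in the comment is an example only:

```shell
# Hypothetical helper: print REPOSITORY/ollama-gemma:TAG, refusing an empty
# repository or the literal placeholder from this tutorial.
# Usage: image_ref REPOSITORY_URL [TAG]
image_ref() {
  local registry="${1:-}"
  local tag="${2:-latest}"
  registry="${registry%/}"   # drop a trailing slash, if any
  case "$registry" in
    "" | REGISTRY_REPOSITORY_URL)
      echo "error: set the Artifact Registry repository URL" >&2
      return 1
      ;;
  esac
  printf '%s/ollama-gemma:%s\n' "$registry" "$tag"
}

# Example with a made-up repository URL:
#   IMAGE=$(image_ref "registry.example.com/sandbox-gpu-project/repo") &&
#   docker build -t "$IMAGE" . && docker push "$IMAGE"
```

Tagging the image with the full reference at build time, as in the commented example, makes the separate docker tag step unnecessary.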

Create a secret to store your Docker credentials.


```
export SECRET=DOCKER_REGISTRY_SECRET
export DOCKER_TEST_CONFIG=~/.docker/config.json
kubectl --kubeconfig ${KUBECONFIG} create secret docker-registry ${SECRET} --from-file=.dockerconfigjson=${DOCKER_TEST_CONFIG} -n sandbox-gpu-project
```

Replace the following:

DOCKER_REGISTRY_SECRET: the name of the secret.
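The secret is built from ~/.docker/config.json as-is, so it only carries credentials if that file actually contains them. When Docker is configured with a credential store (the credsStore key, common with Docker Desktop), the entries under auths are empty and the real credentials live outside the file. A minimal pre-flight check, assuming grep is available; check_docker_config is a hypothetical helper name:

```shell
# Hypothetical pre-flight check for the file passed to --from-file.
# Returns 0 if the file looks usable as a pull secret, 1 otherwise.
check_docker_config() {
  local config="${1:-}"
  if [ ! -r "$config" ]; then
    echo "error: $config is not readable; run docker login first" >&2
    return 1
  fi
  if ! grep -q '"auths"' "$config"; then
    echo "error: $config has no \"auths\" section" >&2
    return 1
  fi
  # With a credential store, config.json keeps only empty auths entries.
  if grep -q '"credsStore"' "$config"; then
    echo "error: credentials are kept in a credential store, not in $config" >&2
    return 1
  fi
  return 0
}
```

For example, run `check_docker_config "$DOCKER_TEST_CONFIG"` before the kubectl create secret command; if it fails because of a credential store, the resulting secret would not let the cluster pull the image.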

Create the ollama-deployment.yaml file to define the Ollama AI engine deployment.

The deployment requests one GPU for the Ollama server.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: sandbox-gpu-project
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ollama
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: ollama
        egress.networking.gke.io/enabled: "true"
    spec:
      containers:
        - name: ollama
          image: REGISTRY_REPOSITORY_URL/ollama-gemma:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 11434
              protocol: TCP
          resources:
            limits:
              nvidia.com/gpu-pod-NVIDIA_H100_80GB_HBM3: "1"
            requests:
              nvidia.com/gpu-pod-NVIDIA_H100_80GB_HBM3: "1"
          env:
            - name: OLLAMA_HOST
              value: 0.0.0.0
            - name: OLLAMA_ORIGINS
              value: http://localhost:8080,http://ollama-webui.sandbox-gpu-project.svc.cluster.local:8080,http://ollama-webui:8080
          securityContext:
            seLinuxOptions:
              type: unconfined_t
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      imagePullSecrets:
      - name: DOCKER_REGISTRY_SECRET
      dnsConfig:
        nameservers:
          - 8.8.8.8
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
```

Replace the following:

REGISTRY_REPOSITORY_URL: the repository URL.
DOCKER_REGISTRY_SECRET: the name of the secret.
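A minimal sketch for substituting these placeholders without editing the manifest by hand, assuming a POSIX shell with sed, placeholder names made only of letters, digits, and underscores, and single-line values; render_manifest and the output file name in the example are hypothetical:

```shell
# Hypothetical helper: copy a manifest and replace each NAME=VALUE pair.
# Usage: render_manifest IN_FILE OUT_FILE NAME=VALUE [NAME=VALUE...]
render_manifest() {
  local in_file="$1" out_file="$2" pair name value
  shift 2
  cp "$in_file" "$out_file" || return 1
  for pair in "$@"; do
    name="${pair%%=*}"
    value="${pair#*=}"
    # Escape \, |, and & so sed treats them literally in the replacement
    # ("|" is the s command delimiter here, so "/" in URLs needs no escaping).
    value=$(printf '%s' "$value" | sed -e 's/[\\|&]/\\&/g')
    sed -e "s|${name}|${value}|g" "$out_file" > "$out_file.tmp" &&
      mv "$out_file.tmp" "$out_file" || return 1
  done
}
```

For example, `render_manifest ollama-deployment.yaml ollama-deployment.rendered.yaml "REGISTRY_REPOSITORY_URL=$REPO_URL" "DOCKER_REGISTRY_SECRET=$SECRET"` writes a filled-in copy that you can apply instead, where REPO_URL is a hypothetical variable holding your repository URL and SECRET is the variable exported when you created the secret.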

Create the ollama-service.yaml file to expose the Ollama server through a LoadBalancer Service.

```
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: sandbox-gpu-project
spec:
  type: LoadBalancer
  selector:
    app: ollama
  ports:
    - port: 11434
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4
```

Apply the manifests:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-deployment.yaml
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-service.yaml
```

Verify that the ollama Deployment is running and its Service exists:

```
kubectl --kubeconfig ${KUBECONFIG} get deployments -n sandbox-gpu-project
kubectl --kubeconfig ${KUBECONFIG} get service -n sandbox-gpu-project
```

Note the external IP address of the Ollama service. You need it for OLLAMA_BASE_END_POINT in a later step. To print only that address, run:

```
kubectl --kubeconfig ${KUBECONFIG} get service ollama \
    -n sandbox-gpu-project -o jsonpath='{.status.loadBalancer.ingress[*].ip}'
```
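A LoadBalancer Service can show no external IP for a short time after it is created, so the command above may print nothing at first. A minimal polling sketch, assuming kubectl picks up the cluster from the KUBECONFIG environment variable (which it does by default); wait_for_lb_ip and the KUBECTL override are hypothetical names:

```shell
# Hypothetical helper: poll a Service in sandbox-gpu-project until its load
# balancer IP address is assigned, then print it.
# KUBECTL is a hypothetical override for the kubectl command (default: kubectl).
# Usage: wait_for_lb_ip SERVICE [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_lb_ip() {
  local service="${1:-}" attempts="${2:-30}" interval="${3:-5}" ip
  while [ "$attempts" -gt 0 ]; do
    # "|| true" keeps a not-yet-ready lookup from aborting under set -e.
    ip=$("${KUBECTL:-kubectl}" get service "$service" -n sandbox-gpu-project \
      -o jsonpath='{.status.loadBalancer.ingress[*].ip}' 2>/dev/null || true)
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  echo "error: no external IP assigned to $service yet" >&2
  return 1
}
```

For example, `OLLAMA_IP=$(wait_for_lb_ip ollama)` waits up to about two and a half minutes with the defaults before giving up.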

Create the openweb-ui-deployment.yaml file to deploy the Open-WebUI interface.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-webui
  namespace: sandbox-gpu-project
  labels:
    app: ollama-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-webui
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  progressDeadlineSeconds: 600
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: ollama-webui
    spec:
      containers:
        - name: ollama-webui
          image: ghcr.io/open-webui/open-webui:main
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: OLLAMA_BASE_URL
              value: OLLAMA_BASE_END_POINT
            - name: PORT
              value: "8080"
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      restartPolicy: Always
      dnsPolicy: ClusterFirst
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
```

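Replace the following:

OLLAMA_BASE_END_POINT: the base URL of the Ollama service, built from the external IP address you noted earlier and the Ollama port, for example http://OLLAMA_EXTERNAL_IP:11434.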
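Open-WebUI treats OLLAMA_BASE_URL as a full URL (its default is http://localhost:11434), so it needs a scheme and port as well as the IP address. A minimal sketch that builds that URL from the external IP and rejects values that do not look like IPv4 addresses; ollama_base_url is a hypothetical helper, and port 11434 matches the ollama Service:

```shell
# Hypothetical helper: turn the Ollama service's external IPv4 address into
# a base URL with scheme, host, and port.
# Usage: ollama_base_url IP [PORT]
ollama_base_url() {
  local ip="${1:-}" port="${2:-11434}"
  # Coarse IPv4 shape check: four dot-separated groups of 1-3 digits.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "error: '$ip' does not look like an IPv4 address" >&2
    return 1
  fi
  printf 'http://%s:%s\n' "$ip" "$port"
}
```

For example, `ollama_base_url 203.0.113.10` prints http://203.0.113.10:11434, which is the form to use in place of OLLAMA_BASE_END_POINT.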

Create the ollama-webui-service.yaml file to expose the Open-WebUI interface externally.

```
apiVersion: v1
kind: Service
metadata:
  name: ollama-webui
  namespace: sandbox-gpu-project
spec:
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv4
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: ollama-webui
```

Apply the openweb-ui-deployment.yaml and ollama-webui-service.yaml manifests to the cluster:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f openweb-ui-deployment.yaml
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-webui-service.yaml
```

Create a project network policy that allows inbound traffic from external IP addresses:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f - <<EOF
apiVersion: networking.global.gdc.goog/v1
kind: ProjectNetworkPolicy
metadata:
  namespace: sandbox-gpu-project
  name: allow-inbound-traffic-from-external
spec:
  policyType: Ingress
  subject:
    subjectType: UserWorkload
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
EOF
```

Run the following command and find the external IP address of the ollama-webui service. Note this value; you substitute it for OPEN_WEB_UI_ENDPOINT in the next step.
```
kubectl --kubeconfig ${KUBECONFIG} get service -n sandbox-gpu-project
```
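Open Google Chrome and enter the following URL, using the external IP address you found in the previous step. You can now use the Open-WebUI interface to interact with the Gemma model.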
```
http://OPEN_WEB_UI_ENDPOINT/
```
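Open-WebUI talks to Ollama over Ollama's REST API, and you can call the same API yourself to check the model without a browser. A minimal sketch, assuming curl is installed and the Ollama endpoint is reachable from where you run it; generate_body, ask_ollama, OLLAMA_URL, and MODEL are hypothetical names, while the /api/generate endpoint and its model, prompt, and stream fields belong to Ollama's API:

```shell
# Build a JSON body for POST /api/generate. This sketch escapes only
# backslashes and double quotes, so keep prompts to one line of plain text.
generate_body() {
  local model="${1:-}" prompt="${2:-}"
  prompt=$(printf '%s' "$prompt" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Send one non-streaming prompt and print Ollama's JSON reply.
# OLLAMA_URL (hypothetical) holds the Ollama base URL, e.g. http://IP:11434.
ask_ollama() {
  if [ -z "${OLLAMA_URL:-}" ]; then
    echo "error: set OLLAMA_URL first" >&2
    return 1
  fi
  curl -sf "${OLLAMA_URL}/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(generate_body "${MODEL:-gemma:7b}" "${1:-}")"
}
```

For example, `OLLAMA_URL=http://OLLAMA_EXTERNAL_IP:11434 ask_ollama 'Why is the sky blue?'` prints a single JSON object whose response field contains the model's answer.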
