This page was translated by the Cloud Translation API.

Deploy Gemma using Ollama and Open-WebUI

With the enterprise-grade NVIDIA GPUs included in the AI-optimized SKU of GDC Sandbox, you can develop and test demanding AI training and inference applications, such as generative AI.

Gemma is a lightweight large language model built on Gemini technology. This tutorial shows how to deploy Gemma with Ollama and Open-WebUI in GDC Sandbox. It has the following objectives:

- Deploy Ollama with the Gemma model in an AI-optimized GDC Sandbox with GPUs.
- Send prompts to the Ollama service on its private endpoint through the Open-WebUI interface.

Before you begin

The GPUs in GDC Sandbox are included in the org-infra cluster.

To run commands on the org infrastructure cluster, make sure that you have the kubeconfig file of the org-1-infra cluster, as described in Use clusters:
- Configure and authenticate with the gdcloud command line.
- Generate the kubeconfig file for the org infrastructure cluster and assign its path to the KUBECONFIG environment variable.
Make sure that the user is granted the sandbox-gpu-admin role for the sandbox-gpu-project project. By default, this role is granted to the platform-admin user. To grant the role to other users, sign in as platform-admin and run the following command:
```
kubectl --kubeconfig ${KUBECONFIG} create rolebinding ${NAME} --role=sandbox-gpu-admin \
--user=${USER} --namespace=sandbox-gpu-project
```
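
Replace the following:

- NAME: a name for the role binding.
- USER: the user to grant the role to. Your shell session usually already defines USER as your local login name, so set it explicitly before you run the command.

Make sure that you set up the Artifact Registry repository as described in Use Artifact Registry, and sign in so that you can push images to and pull images from the registry.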
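Before you continue, you can check that your current credentials let you create workloads in sandbox-gpu-project. The following is a minimal sketch that wraps `kubectl auth can-i`; the function name `verify_gpu_access` is hypothetical, and it assumes that the sandbox-gpu-admin role grants the `create` verb on Deployments, which this page does not state explicitly.

```shell
# Hypothetical helper: check that the current kubeconfig user can create
# Deployments in the target namespace (default: sandbox-gpu-project).
# Assumes KUBECONFIG points at the org-1-infra cluster kubeconfig, and that
# sandbox-gpu-admin allows creating Deployments (an assumption).
verify_gpu_access() {
  local ns="${1:-sandbox-gpu-project}"
  # `kubectl auth can-i` prints "yes" or "no" and exits non-zero for "no".
  if kubectl --kubeconfig "${KUBECONFIG}" auth can-i create deployments \
      --namespace "${ns}" >/dev/null 2>&1; then
    echo "ok: can create deployments in ${ns}"
  else
    echo "missing permission in ${ns}: ask a platform-admin to bind sandbox-gpu-admin" >&2
    return 1
  fi
}

# Usage, after exporting KUBECONFIG:
# verify_gpu_access sandbox-gpu-project
```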

Deploy a Gemma model with Ollama and Open-WebUI

The deployment is orchestrated with a set of Kubernetes configuration files (YAML manifests), each of which defines a specific component or service.

Create a Dockerfile with Gemma pre-downloaded.

```
FROM ubuntu

# Install Ollama
# This uses Ollama's official installation script, which adds Ollama to /usr/local/bin
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates
RUN curl -fsSL https://ollama.com/install.sh -o install.sh
RUN chmod +x install.sh
RUN ./install.sh && \
    rm -rf /var/lib/apt/lists/*

# Set environment variables for Ollama (optional, but good practice)
ENV OLLAMA_HOST="0.0.0.0"
# ENV OLLAMA_MODELS="/usr/local/ollama/models" # Default is /root/.ollama/models
# If you want to customize the model storage path within the container, set OLLAMA_MODELS
# and then make sure you create and populate that directory. The default is usually fine for pre-downloaded models.

# --- Predownload Gemma Model ---
# This step starts the Ollama server in the background, pulls the model,
# and then stops the server so that the Docker build can continue.
# This works around the fact that a RUN step cannot keep a service running.

RUN ollama serve & \
    sleep 5 && \
    # Give the Ollama server a moment to start up
    # Use --retry and --retry-connrefused to handle startup delays
    curl --retry 10 --retry-connrefused -s http://localhost:11434 || true && \
    echo "Attempting to pull gemma:7b..." && \
    ollama pull gemma:7b && \
    echo "Model pull complete. Cleaning up background Ollama process." && \
    # The parentheses keep "|| true" from masking a failed pull; pkill may be
    # missing from the base image, which must not fail the build either.
    (pkill ollama || true)

# Expose Ollama's default port
EXPOSE 11434

# Command to run Ollama server when the container starts
CMD ["ollama", "serve"]
```

Build the Docker image and push it to the Artifact Registry repository.

```
docker build -t ollama-gemma .
docker tag ollama-gemma REGISTRY_REPOSITORY_URL/ollama-gemma:latest
docker push REGISTRY_REPOSITORY_URL/ollama-gemma:latest
```

Replace the following:

- REGISTRY_REPOSITORY_URL: the URL of the repository.
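If you rebuild the image often, you can wrap the three commands above in a small function. This is a sketch rather than part of the documented procedure: `image_ref` and `build_and_push` are hypothetical names, and the `latest` default tag simply mirrors the commands above.

```shell
# Hypothetical helper: compose the full image reference from the
# repository URL (REGISTRY_REPOSITORY_URL) and an optional tag.
image_ref() {
  if [ -z "${1:-}" ]; then
    echo "usage: image_ref REGISTRY_REPOSITORY_URL [TAG]" >&2
    return 1
  fi
  # Drop a trailing slash so that "repo/" and "repo" give the same result.
  local repo="${1%/}"
  printf '%s/ollama-gemma:%s\n' "${repo}" "${2:-latest}"
}

# Hypothetical helper: build the image from the Dockerfile in the current
# directory, tag it for the registry, and push it. Stops at the first
# failing step.
build_and_push() {
  local ref
  ref="$(image_ref "$@")" || return 1
  docker build -t ollama-gemma . &&
    docker tag ollama-gemma "${ref}" &&
    docker push "${ref}"
}
```

For example, `build_and_push REGISTRY_REPOSITORY_URL` runs the same three commands as above, and `build_and_push REGISTRY_REPOSITORY_URL v2` pushes a `v2` tag instead.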

Create a secret to store the Docker credentials.

```
# Name of the image pull secret and path of the Docker config written by docker login
export SECRET=DOCKER_REGISTRY_SECRET
export DOCKER_TEST_CONFIG=~/.docker/config.json
kubectl --kubeconfig ${KUBECONFIG} create secret docker-registry ${SECRET} --from-file=.dockerconfigjson=${DOCKER_TEST_CONFIG} -n sandbox-gpu-project
```

Replace the following:

- DOCKER_REGISTRY_SECRET: the name of the secret.
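The command above assumes that `~/.docker/config.json` exists and actually contains the registry credentials. The sketch below checks both first; `create_registry_secret` is a hypothetical name, and the `"auth"` check is only a heuristic: when Docker stores credentials through a credential helper (`credsStore`), `config.json` has no inline `auth` entries, so the resulting secret would not carry usable credentials.

```shell
# Hypothetical helper: create the image pull secret in sandbox-gpu-project
# from a Docker config file, after a few sanity checks.
create_registry_secret() {
  local secret="${1:-}" cfg="${2:-${HOME}/.docker/config.json}"
  if [ -z "${secret}" ]; then
    echo "usage: create_registry_secret SECRET_NAME [DOCKER_CONFIG]" >&2
    return 1
  fi
  if [ ! -f "${cfg}" ]; then
    echo "${cfg} not found: run docker login against the registry first" >&2
    return 1
  fi
  # Heuristic: with a credential helper, config.json holds no inline
  # "auth" entries and the secret would carry no credentials.
  if ! grep -q '"auth"' "${cfg}"; then
    echo "warning: no inline \"auth\" entry in ${cfg}; credentials may be in a credential helper" >&2
  fi
  kubectl --kubeconfig "${KUBECONFIG}" create secret docker-registry "${secret}" \
    --from-file=.dockerconfigjson="${cfg}" -n sandbox-gpu-project
}

# Usage:
# create_registry_secret DOCKER_REGISTRY_SECRET
```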

Create an ollama-deployment.yaml file to define the deployment of the Ollama AI engine:

The Ollama server deployment requires a GPU.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "9"
  name: ollama
  namespace: sandbox-gpu-project
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ollama
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ollama
        egress.networking.gke.io/enabled: "true"
    spec:
      containers:
        - name: ollama
          image: REGISTRY_REPOSITORY_URL/ollama-gemma:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 11434
              protocol: TCP
          resources:
            limits:
              nvidia.com/gpu-pod-NVIDIA_H100_80GB_HBM3: "1"
            requests:
              nvidia.com/gpu-pod-NVIDIA_H100_80GB_HBM3: "1"
          env:
            - name: OLLAMA_HOST
              value: 0.0.0.0
            - name: OLLAMA_ORIGINS
              value: http://localhost:8080,http://ollama-webui.sandbox-gpu-project.svc.cluster.local:8080,http://ollama-webui:8080
          securityContext:
            seLinuxOptions:
              type: unconfined_t
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      imagePullSecrets:
      - name: DOCKER_REGISTRY_SECRET
      dnsConfig:
        nameservers:
          - 8.8.8.8
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
```

Replace the following:

- REGISTRY_REPOSITORY_URL: the URL of the repository.
- DOCKER_REGISTRY_SECRET: the name of the secret.
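Instead of editing the placeholders by hand, you can substitute them at apply time. This sketch uses `sed`; `render_manifest` is a hypothetical helper, and it deliberately rejects values that contain `|`, `&`, or `\`, because `sed` would interpret those characters in the replacement text rather than copy them.

```shell
# Hypothetical helper: print FILE with each KEY replaced by VALUE for every
# KEY=VALUE argument, for example REGISTRY_REPOSITORY_URL=registry.example.com/repo.
render_manifest() {
  local file="$1" script="" pair key value
  shift
  for pair in "$@"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # "|" is the sed delimiter here, and "&" and "\" are special in the
    # replacement text, so refuse them instead of escaping.
    case "${value}" in
      *'|'* | *'&'* | *'\'*)
        echo "unsupported character in value for ${key}" >&2
        return 1
        ;;
    esac
    # One global substitution command per line of the sed script.
    script="${script}s|${key}|${value}|g
"
  done
  # With no KEY=VALUE arguments the script is empty and sed prints FILE unchanged.
  sed -e "${script}" "${file}"
}
```

For example, `render_manifest ollama-deployment.yaml REGISTRY_REPOSITORY_URL=... DOCKER_REGISTRY_SECRET=... | kubectl --kubeconfig ${KUBECONFIG} apply -f -` applies the rendered manifest without modifying the file on disk.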

Create the ollama-service.yaml file to expose the Ollama server internally.

```
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: sandbox-gpu-project
  annotations:
    metallb.universe.tf/ip-allocated-from-pool: lb-address-pool-0-ptleg
spec:
  type: LoadBalancer
  selector:
    app: ollama
  ports:
    - port: 11434
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4
```

Apply the manifest files:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-deployment.yaml
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-service.yaml
```

Make sure that the Ollama pods are running:

```
kubectl --kubeconfig ${KUBECONFIG} get deployments -n sandbox-gpu-project
kubectl --kubeconfig ${KUBECONFIG} get service -n sandbox-gpu-project
```
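`kubectl get deployments` only shows a snapshot. To block until a Deployment has finished rolling out, you can use `kubectl rollout status`. In this sketch, `wait_for_rollout` is a hypothetical name, and the 600-second default is an arbitrary choice that matches the `progressDeadlineSeconds` value in ollama-deployment.yaml, since pulling an image that already contains the model can take a while.

```shell
# Hypothetical helper: wait until a Deployment in sandbox-gpu-project has
# rolled out, or fail after the timeout (default 600s).
wait_for_rollout() {
  local name="${1:-ollama}" timeout="${2:-600s}"
  kubectl --kubeconfig "${KUBECONFIG}" rollout status "deployment/${name}" \
    -n sandbox-gpu-project --timeout="${timeout}"
}

# Usage:
# wait_for_rollout ollama && kubectl --kubeconfig "${KUBECONFIG}" get pods -n sandbox-gpu-project -l app=ollama
```

The same function works for the Open-WebUI Deployment later, with `wait_for_rollout ollama-webui`.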

Note the external IP address of the Ollama service (OLLAMA_BASE_END_POINT) in the output of the following command:

```
kubectl --kubeconfig ${KUBECONFIG} get service ollama \
      -n sandbox-gpu-project -o jsonpath='{.status.loadBalancer.ingress[*].ip}'
```
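The external IP can stay empty for a short time after the Service is created. The sketch below polls the same jsonpath query until an address appears, and then queries Ollama's `/api/tags` endpoint, which lists the models available on the server, to confirm that `gemma:7b` is present. The function names, the retry count, and the polling interval are illustrative assumptions.

```shell
# Hypothetical helper: poll a LoadBalancer Service in sandbox-gpu-project
# until it has an external IP, then print the first address.
wait_for_lb_ip() {
  local svc="$1" tries="${2:-30}" interval="${3:-5}" i=0 ip=""
  while [ "${i}" -lt "${tries}" ]; do
    ip="$(kubectl --kubeconfig "${KUBECONFIG}" get service "${svc}" \
      -n sandbox-gpu-project \
      -o jsonpath='{.status.loadBalancer.ingress[*].ip}' 2>/dev/null || true)"
    if [ -n "${ip}" ]; then
      # The jsonpath can return several space-separated IPs; keep the first.
      echo "${ip%% *}"
      return 0
    fi
    i=$((i + 1))
    sleep "${interval}"
  done
  echo "no external IP for service ${svc} after ${tries} attempts" >&2
  return 1
}

# Hypothetical helper: succeed if the Ollama server at BASE_URL lists MODEL
# (default gemma:7b, the model pulled in the Dockerfile) in GET /api/tags.
check_ollama_model() {
  local base_url="$1" model="${2:-gemma:7b}"
  curl -fsS --max-time 10 "${base_url}/api/tags" | grep -qF "\"${model}\""
}

# Usage:
# OLLAMA_IP="$(wait_for_lb_ip ollama)" &&
#   check_ollama_model "http://${OLLAMA_IP}:11434" &&
#   echo "gemma:7b is served at http://${OLLAMA_IP}:11434"
```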

Create the openweb-ui-deployment.yaml file to deploy the Open-WebUI interface.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-webui
  namespace: sandbox-gpu-project
  labels:
    app: ollama-webui
  annotations:
    deployment.kubernetes.io/revision: "5"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-webui
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  progressDeadlineSeconds: 600
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: ollama-webui
      creationTimestamp: null
    spec:
      containers:
        - name: ollama-webui
          image: ghcr.io/open-webui/open-webui:main
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: OLLAMA_BASE_URL
              value: OLLAMA_BASE_END_POINT
            - name: PORT
              value: "8080"
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      restartPolicy: Always
      dnsPolicy: ClusterFirst
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
```

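Replace the following:

- OLLAMA_BASE_END_POINT: the URL of the Ollama service, made of the http:// scheme, the external IP address of the Ollama service that you noted earlier, and port 11434 (for example, http://203.0.113.10:11434).

Create an ollama-webui-service.yaml file to expose the Open-WebUI interface externally.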

```
apiVersion: v1
kind: Service
metadata:
  name: ollama-webui
  namespace: sandbox-gpu-project
  annotations:
    metallb.universe.tf/ip-allocated-from-pool: lb-address-pool-0-ptleg
spec:
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv4
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: ollama-webui
```

Apply the openweb-ui-deployment.yaml and ollama-webui-service.yaml manifests to the cluster:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f openweb-ui-deployment.yaml
kubectl --kubeconfig ${KUBECONFIG} apply -f ollama-webui-service.yaml
```

Create a project network policy to allow inbound traffic from external IP addresses:

```
kubectl --kubeconfig ${KUBECONFIG} apply -f - <<EOF
apiVersion: networking.global.gdc.goog/v1
kind: ProjectNetworkPolicy
metadata:
  namespace: sandbox-gpu-project
  name: allow-inbound-traffic-from-external
spec:
  policyType: Ingress
  subject:
    subjectType: UserWorkload
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
EOF
```

Identify the external IP address of the Open-WebUI service (ollama-webui) by running the following command. Note it for the next step, where you replace OPEN_WEB_UI_ENDPOINT with this value.
```
kubectl --kubeconfig ${KUBECONFIG} get service -n sandbox-gpu-project
```
Open Google Chrome and enter the following URL, using the external IP address that you found in the previous step. You can now interact with the Gemma model through the Open-WebUI interface.
```
http://OPEN_WEB_UI_ENDPOINT/
```
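To check from a terminal that the Open-WebUI page responds, you can read the external IP of the ollama-webui Service with the same jsonpath query used for the Ollama Service and request the page with `curl` on port 80, the Service port defined in ollama-webui-service.yaml. This is a sketch: the function name is hypothetical, and treating any 2xx or 3xx status as "reachable" is an assumption.

```shell
# Hypothetical helper: print http://IP/ for the ollama-webui Service and
# check that the page answers with a 2xx or 3xx HTTP status.
check_webui() {
  local ip code
  ip="$(kubectl --kubeconfig "${KUBECONFIG}" get service ollama-webui \
    -n sandbox-gpu-project \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}' 2>/dev/null || true)"
  ip="${ip%% *}"
  if [ -z "${ip}" ]; then
    echo "ollama-webui has no external IP yet" >&2
    return 1
  fi
  # -w '%{http_code}' prints only the status code (000 if the connection
  # fails); -o /dev/null discards the page body.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://${ip}/" || true)"
  echo "http://${ip}/ -> HTTP ${code}"
  case "${code}" in
    2?? | 3??) return 0 ;;
    *) return 1 ;;
  esac
}
```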
