Créer un chatbot RAG avec GKE et Cloud Storage


Ce tutoriel vous explique comment intégrer une application de grand modèle de langage (LLM) basée sur la génération augmentée par récupération (RAG) à des fichiers PDF que vous importez dans un bucket Cloud Storage.

Ce guide utilise une base de données comme moteur de stockage et de recherche sémantique qui contient les représentations (embeddings) des documents importés. Vous utilisez le framework Langchain pour interagir avec les embeddings et les modèles Gemini disponibles via Vertex AI.

Langchain est un framework Python Open Source populaire qui simplifie de nombreuses tâches de machine learning et qui dispose d'interfaces permettant de s'intégrer à différentes bases de données vectorielles et services d'IA.

Ce tutoriel est destiné aux administrateurs et architectes de plate-forme cloud, aux ingénieurs en ML et aux professionnels du MLOps (DevOps) qui souhaitent déployer des applications LLM RAG sur GKE et Cloud Storage.

Objectifs

Dans ce tutoriel, vous allez apprendre à effectuer les opérations suivantes :

  • Créez et déployez une application pour créer et stocker des embeddings de documents dans une base de données vectorielle.
  • Automatisez l'application pour déclencher l'importation de nouveaux documents dans un bucket Cloud Storage.
  • Déployez une application de chatbot qui utilise la recherche sémantique pour répondre aux questions en fonction du contenu du document.

Architecture de déploiement

Dans ce tutoriel, vous allez créer un bucket Cloud Storage, un déclencheur Eventarc et les services suivants :

  • embed-docs : Eventarc déclenche ce service chaque fois qu'un utilisateur importe un nouveau document dans le bucket Cloud Storage. Le service démarre un Job Kubernetes qui crée des représentations vectorielles continues pour le document importé et les insère dans une base de données vectorielle.
  • chatbot : ce service répond aux questions en langage naturel sur les documents importés à l'aide de la recherche sémantique et de l'API Gemini.

Le schéma suivant illustre le processus d'importation et de vectorisation des documents :

Dans le diagramme, l'utilisateur importe des fichiers dans le bucket Cloud Storage. Eventarc s'abonne aux événements metadataUpdated d'objet pour le bucket et utilise le redirecteur d'événements d'Eventarc, qui est une charge de travail Kubernetes, pour appeler le service embed-docs lorsque vous importez un nouveau document. Le service crée ensuite des embeddings pour le document importé. Le service embed-docs stocke les embeddings dans une base de données vectorielles à l'aide du modèle d'embedding Vertex AI.

Le schéma suivant montre comment poser des questions sur le contenu d'un document importé à l'aide du service chatbot :

Les utilisateurs peuvent poser des questions en langage naturel, et le chatbot génère des réponses basées uniquement sur le contenu des fichiers importés. Le chatbot récupère le contexte de la base de données vectorielle à l'aide de la recherche sémantique, puis envoie la question et le contexte à Gemini.

Coûts

Dans ce document, vous utilisez les composants facturables de Google Cloudsuivants :

Vous pouvez obtenir une estimation des coûts en fonction de votre utilisation prévue à l'aide du simulateur de coût.

Les nouveaux utilisateurs de Google Cloud peuvent bénéficier d'un essai gratuit.

Une fois que vous avez terminé les tâches décrites dans ce document, supprimez les ressources que vous avez créées pour éviter que des frais vous soient facturés. Pour en savoir plus, consultez la section Effectuer un nettoyage.

Avant de commencer

Dans ce tutoriel, vous utilisez Cloud Shell pour exécuter des commandes. Cloud Shell est un environnement shell permettant de gérer les ressources hébergées sur Google Cloud. Cloud Shell est préinstallé avec les outils de ligne de commande Google Cloud CLI, kubectl et Terraform. Si vous n'utilisez pas Cloud Shell, installez la Google Cloud CLI.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. Si vous utilisez un fournisseur d'identité (IdP) externe, vous devez d'abord vous connecter à la gcloud CLI avec votre identité fédérée.

  4. Pour initialiser la gcloud CLI, exécutez la commande suivante :

    gcloud init
  5. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Make sure that billing is enabled for your Google Cloud project.

  7. Enable the Vertex AI, Cloud Build, Eventarc, Artifact Registry APIs:

    gcloud services enable aiplatform.googleapis.com cloudbuild.googleapis.com eventarc.googleapis.com artifactregistry.googleapis.com
  8. Install the Google Cloud CLI.

  9. Si vous utilisez un fournisseur d'identité (IdP) externe, vous devez d'abord vous connecter à la gcloud CLI avec votre identité fédérée.

  10. Pour initialiser la gcloud CLI, exécutez la commande suivante :

    gcloud init
  11. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  12. Make sure that billing is enabled for your Google Cloud project.

  13. Enable the Vertex AI, Cloud Build, Eventarc, Artifact Registry APIs:

    gcloud services enable aiplatform.googleapis.com cloudbuild.googleapis.com eventarc.googleapis.com artifactregistry.googleapis.com
  14. Grant roles to your user account. Run the following command once for each of the following IAM roles: eventarc.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
    • Replace PROJECT_ID with your project ID.
    • Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.

    • Replace ROLE with each individual role.
  15. Créer un cluster

    Créez un cluster Qdrant, Elasticsearch ou Postgres :

    Qdrant

    Suivez les instructions de la section Déployer une base de données vectorielle Qdrant sur GKE pour créer un cluster Qdrant s'exécutant sur un cluster GKE en mode Autopilot ou Standard.

    Elasticsearch

    Suivez les instructions de la section Déployer une base de données vectorielle Elasticsearch sur GKE pour créer un cluster Elasticsearch s'exécutant sur un cluster GKE en mode Autopilot ou Standard.

    PGVector

    Suivez les instructions de la section Déployer une base de données vectorielle PostgreSQL sur GKE pour créer un cluster Postgres avec PGVector exécuté sur un cluster GKE en mode Autopilot ou Standard.

    Weaviate

    Suivez les instructions pour déployer une base de données vectorielle Weaviate sur GKE afin de créer un cluster Weaviate s'exécutant sur un cluster GKE en mode Autopilot ou Standard.

    Configurer votre environnement

    Configurez votre environnement avec Cloud Shell :

    1. Définissez les variables d'environnement pour votre projet :

      Qdrant

      export PROJECT_ID=PROJECT_ID
      export KUBERNETES_CLUSTER_PREFIX=qdrant
      export REGION=us-central1
      export DB_NAMESPACE=qdrant
      

      Remplacez PROJECT_ID par l'ID de votre projetGoogle Cloud .

      Elasticsearch

      export PROJECT_ID=PROJECT_ID
      export KUBERNETES_CLUSTER_PREFIX=elasticsearch
      export REGION=us-central1
      export DB_NAMESPACE=elastic
      

      Remplacez PROJECT_ID par l'ID de votre projetGoogle Cloud .

      PGVector

      export PROJECT_ID=PROJECT_ID
      export KUBERNETES_CLUSTER_PREFIX=postgres
      export REGION=us-central1
      export DB_NAMESPACE=pg-ns
      

      Remplacez PROJECT_ID par l'ID de votre projetGoogle Cloud .

      Weaviate

      export PROJECT_ID=PROJECT_ID
      export KUBERNETES_CLUSTER_PREFIX=weaviate
      export REGION=us-central1
      export DB_NAMESPACE=weaviate
      

      Remplacez PROJECT_ID par l'ID de votre projetGoogle Cloud .

    2. Vérifier que votre cluster GKE est en cours d'exécution :

      gcloud container clusters list --project=${PROJECT_ID} --region=${REGION}
      

      Le résultat ressemble à ce qui suit :

      NAME                                    LOCATION        MASTER_VERSION      MASTER_IP     MACHINE_TYPE  NODE_VERSION        NUM_NODES STATUS
      [KUBERNETES_CLUSTER_PREFIX]-cluster   us-central1   1.30.1-gke.1329003  <EXTERNAL IP> e2-standard-2 1.30.1-gke.1329003   6        RUNNING
      
    3. Clonez l'exemple de dépôt de code depuis GitHub :

      git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
      
    4. Accédez au répertoire databases :

      cd kubernetes-engine-samples/databases
      

    Préparer votre infrastructure

    Créez un dépôt Artifact Registry, compilez des images Docker et transmettez-les à Artifact Registry :

    1. Créer un dépôt Artifact Registry :

      gcloud artifacts repositories create ${KUBERNETES_CLUSTER_PREFIX}-images \
          --repository-format=docker \
          --location=${REGION} \
          --description="Vector database images repository" \
          --async
      
    2. Définissez les autorisations storage.objectAdmin et artifactregistry.admin sur le compte de service Compute Engine pour utiliser Cloud Build afin de compiler et d'envoyer des images Docker pour les services embed-docs et chatbot.

      export PROJECT_NUMBER=PROJECT_NUMBER
      
      gcloud projects add-iam-policy-binding ${PROJECT_ID}  \
      --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
      --role="roles/storage.objectAdmin"
      
      gcloud projects add-iam-policy-binding ${PROJECT_ID}  \
      --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
      --role="roles/artifactregistry.admin"
      

      Remplacez PROJECT_NUMBER par le numéro de votre projetGoogle Cloud .

    3. Compilez des images Docker pour les services embed-docs et chatbot. L'image embed-docs contient le code Python de l'application qui reçoit les requêtes de redirection Eventarc et le job d'intégration.

      Qdrant

      export DOCKER_REPO="${REGION}-docker.pkg.dev/${PROJECT_ID}/${KUBERNETES_CLUSTER_PREFIX}-images"
      gcloud builds submit qdrant/docker/chatbot --region=${REGION} \
        --tag ${DOCKER_REPO}/chatbot:1.0 --async
      gcloud builds submit qdrant/docker/embed-docs --region=${REGION} \
        --tag ${DOCKER_REPO}/embed-docs:1.0 --async
      

      Elasticsearch

      export DOCKER_REPO="${REGION}-docker.pkg.dev/${PROJECT_ID}/${KUBERNETES_CLUSTER_PREFIX}-images"
      gcloud builds submit elasticsearch/docker/chatbot --region=${REGION} \
        --tag ${DOCKER_REPO}/chatbot:1.0 --async
      gcloud builds submit elasticsearch/docker/embed-docs --region=${REGION} \
        --tag ${DOCKER_REPO}/embed-docs:1.0 --async
      

      PGVector

      export DOCKER_REPO="${REGION}-docker.pkg.dev/${PROJECT_ID}/${KUBERNETES_CLUSTER_PREFIX}-images"
      gcloud builds submit postgres-pgvector/docker/chatbot --region=${REGION} \
        --tag ${DOCKER_REPO}/chatbot:1.0 --async
      gcloud builds submit postgres-pgvector/docker/embed-docs --region=${REGION} \
        --tag ${DOCKER_REPO}/embed-docs:1.0 --async
      

      Weaviate

      export DOCKER_REPO="${REGION}-docker.pkg.dev/${PROJECT_ID}/${KUBERNETES_CLUSTER_PREFIX}-images"
      gcloud builds submit weaviate/docker/chatbot --region=${REGION} \
        --tag ${DOCKER_REPO}/chatbot:1.0 --async
      gcloud builds submit weaviate/docker/embed-docs --region=${REGION} \
        --tag ${DOCKER_REPO}/embed-docs:1.0 --async
      
    4. Vérifiez les images :

      gcloud artifacts docker images list $DOCKER_REPO \
          --project=$PROJECT_ID \
          --format="value(IMAGE)"
      

      Le résultat ressemble à ce qui suit :

      $REGION-docker.pkg.dev/$PROJECT_ID/${KUBERNETES_CLUSTER_PREFIX}-images/chatbot
      $REGION-docker.pkg.dev/$PROJECT_ID/${KUBERNETES_CLUSTER_PREFIX}-images/embed-docs
      
    5. Déployez un compte de service Kubernetes avec des autorisations pour exécuter des jobs Kubernetes :

      Qdrant

      sed "s/<PROJECT_ID>/$PROJECT_ID/;s/<CLUSTER_PREFIX>/$KUBERNETES_CLUSTER_PREFIX/" qdrant/manifests/05-rag/service-account.yaml | kubectl -n qdrant apply -f -
      

      Elasticsearch

      sed "s/<PROJECT_ID>/$PROJECT_ID/;s/<CLUSTER_PREFIX>/$KUBERNETES_CLUSTER_PREFIX/" elasticsearch/manifests/05-rag/service-account.yaml | kubectl -n elastic apply -f -
      

      PGVector

      sed "s/<PROJECT_ID>/$PROJECT_ID/;s/<CLUSTER_PREFIX>/$KUBERNETES_CLUSTER_PREFIX/" postgres-pgvector/manifests/03-rag/service-account.yaml | kubectl -n pg-ns apply -f -
      

      Weaviate

      sed "s/<PROJECT_ID>/$PROJECT_ID/;s/<CLUSTER_PREFIX>/$KUBERNETES_CLUSTER_PREFIX/" weaviate/manifests/04-rag/service-account.yaml | kubectl -n weaviate apply -f -
      
    6. Lorsque vous utilisez Terraform pour créer le cluster GKE et que create_service_account est défini sur "true", un compte de service distinct est créé et utilisé par le cluster et les nœuds. Attribuez le rôle artifactregistry.serviceAgent à ce compte de service Compute Engine pour permettre aux nœuds d'extraire l'image du registre Artifact Registry créé pour embed-docs et chatbot.

      export CLUSTER_SERVICE_ACCOUNT=$(gcloud container clusters describe ${KUBERNETES_CLUSTER_PREFIX}-cluster \
      --region=${REGION} \
      --format="value(nodeConfig.serviceAccount)")
      
      gcloud projects add-iam-policy-binding ${PROJECT_ID}  \
      --member="serviceAccount:${CLUSTER_SERVICE_ACCOUNT}" \
      --role="roles/artifactregistry.serviceAgent"
      

      Si vous n'accordez pas l'accès au compte de service, vos nœuds peuvent rencontrer des problèmes d'autorisation lorsqu'ils tentent d'extraire l'image d'Artifact Registry lors du déploiement des services embed-docs et chatbot.

    7. Déployez un déploiement Kubernetes pour les services embed-docs et chatbot. Un déploiement est un objet de l'API Kubernetes qui vous permet d'exécuter plusieurs instances dupliquées de pods répartis entre les nœuds d'un cluster :

      Qdrant

      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" qdrant/manifests/05-rag/chatbot.yaml | kubectl -n qdrant apply -f -
      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" qdrant/manifests/05-rag/docs-embedder.yaml | kubectl -n qdrant apply -f -
      

      Elasticsearch

      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" elasticsearch/manifests/05-rag/chatbot.yaml | kubectl -n elastic apply -f -
      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" elasticsearch/manifests/05-rag/docs-embedder.yaml | kubectl -n elastic apply -f -
      

      PGVector

      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" postgres-pgvector/manifests/03-rag/chatbot.yaml | kubectl -n pg-ns apply -f -
      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" postgres-pgvector/manifests/03-rag/docs-embedder.yaml | kubectl -n pg-ns apply -f -
      

      Weaviate

      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" weaviate/manifests/04-rag/chatbot.yaml | kubectl -n weaviate apply -f -
      sed "s|<DOCKER_REPO>|$DOCKER_REPO|" weaviate/manifests/04-rag/docs-embedder.yaml | kubectl -n weaviate apply -f -
      
    8. Activez les déclencheurs Eventarc pour GKE :

      gcloud eventarc gke-destinations init
      

      Lorsque vous y êtes invité, saisissez y.

    9. Déployez le bucket Cloud Storage et créez un déclencheur Eventarc à l'aide de Terraform :

      export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)
      terraform -chdir=vector-database/terraform/cloud-storage init
      terraform -chdir=vector-database/terraform/cloud-storage apply \
        -var project_id=${PROJECT_ID} \
        -var region=${REGION} \
        -var cluster_prefix=${KUBERNETES_CLUSTER_PREFIX} \
        -var db_namespace=${DB_NAMESPACE}
      

      Lorsque vous y êtes invité, saisissez yes. L'exécution de la commande peut prendre plusieurs minutes.

      Terraform crée les ressources suivantes :

      • Un bucket Cloud Storage pour importer les documents
      • Déclencheur Eventarc
      • Un compte de service Google Cloud nommé service_account_eventarc_name avec l'autorisation d'utiliser Eventarc.
      • Un compte de service Google Cloud nommé service_account_bucket_name avec l'autorisation de lire le bucket et d'accéder aux modèles Vertex AI.

      Le résultat ressemble à ce qui suit :

      ... # Several lines of output omitted
      
      Apply complete! Resources: 15 added, 0 changed, 0 destroyed.
      
      ... # Several lines of output omitted
      

    Charger des documents et exécuter des requêtes de chatbot

    Importez les documents de démonstration et exécutez des requêtes pour les rechercher à l'aide du chatbot :

    1. Importez l'exemple de document carbon-free-energy.pdf dans le bucket :

      gcloud storage cp vector-database/documents/carbon-free-energy.pdf gs://${PROJECT_ID}-${KUBERNETES_CLUSTER_PREFIX}-training-docs
      
    2. Vérifiez que le job d'intégration de documents a bien été effectué :

      kubectl get job -n ${DB_NAMESPACE}
      

      Le résultat ressemble à ce qui suit :

      NAME                            COMPLETIONS   DURATION   AGE
      docs-embedder1716570453361446   1/1           32s        71s
      
    3. Obtenez l'adresse IP externe de l'équilibreur de charge :

      export EXTERNAL_IP=$(kubectl -n ${DB_NAMESPACE} get svc chatbot --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
      echo http://${EXTERNAL_IP}:80
      
    4. Ouvrez l'adresse IP externe dans votre navigateur :

      http://EXTERNAL_IP
      

      Le chatbot répond par un message semblable à celui-ci :

      How can I help you?
      
    5. Posez des questions sur le contenu des documents importés. Si le chatbot ne trouve rien, il répond I don't know. Par exemple, vous pouvez lui poser les questions suivantes :

      You: Hi, what are Google plans for the future?
      

      Voici un exemple de réponse du chatbot :

      Bot: Google intends to run on carbon-free energy everywhere, at all times by 2030. To achieve this, it will rely on a combination of renewable energy sources, such as wind and solar, and carbon-free technologies, such as battery storage.
      
    6. Posez au chatbot une question qui n'a aucun rapport avec le document importé. Par exemple, vous pouvez poser les questions suivantes :

      You: What are Google plans to colonize Mars?
      

      Voici un exemple de réponse du chatbot :

      Bot: I don't know. The provided context does not mention anything about Google's plans to colonize Mars.
      

    À propos du code d'application

    Cette section explique le fonctionnement du code. Les images Docker contiennent trois scripts :

    • endpoint.py : reçoit les événements Eventarc lors de chaque importation de document et lance les jobs Kubernetes pour les traiter.
    • embedding-job.py : télécharge les documents depuis le bucket, crée des embeddings et les insère dans la base de données vectorielle.
    • chat.py : exécute des requêtes sur le contenu des documents stockés.

    Le schéma montre le processus de génération de réponses à l'aide des données des documents :

    Dans le diagramme, l'application charge un fichier PDF, le divise en blocs, puis en vecteurs, et envoie ensuite les vecteurs à une base de données vectorielle. Plus tard, un utilisateur pose une question au chatbot. La chaîne RAG utilise la recherche sémantique pour rechercher dans la base de données vectorielle, puis renvoie le contexte avec la question au LLM. Le LLM répond à la question et l'enregistre dans l'historique du chat.

    À propos de endpoint.py

    Ce fichier traite les messages d'Eventarc, crée un job Kubernetes pour intégrer le document et accepte les requêtes depuis n'importe où sur le port 5001.

    Qdrant

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from flask import Flask, jsonify
    from flask import request
    import logging
    import sys,os, time
    from kubernetes import client, config, utils
    import kubernetes.client
    from kubernetes.client.rest import ApiException
    
    
    app = Flask(__name__)
    @app.route('/check')
    def message():
        return jsonify({"Message": "Hi there"})
    
    
    @app.route('/', methods=['POST'])
    def bucket():
        request_data = request.get_json()
        print(request_data)
        bckt = request_data['bucket']
        f_name = request_data['name']
        id = request_data['generation'] 
        kube_create_job(bckt, f_name, id)
        return "ok"
    
    # Set logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    
    # Setup K8 configs
    config.load_incluster_config()
    def kube_create_job_object(name, container_image, bucket_name, f_name, namespace="qdrant", container_name="jobcontainer", env_vars={}):
    
        body = client.V1Job(api_version="batch/v1", kind="Job")
        body.metadata = client.V1ObjectMeta(namespace=namespace, name=name)
        body.status = client.V1JobStatus()
    
        template = client.V1PodTemplate()
        template.template = client.V1PodTemplateSpec()
        env_list = [
            client.V1EnvVar(name="QDRANT_URL", value=os.getenv("QDRANT_URL")),
            client.V1EnvVar(name="COLLECTION_NAME", value="training-docs"), 
            client.V1EnvVar(name="FILE_NAME", value=f_name), 
            client.V1EnvVar(name="BUCKET_NAME", value=bucket_name),
            client.V1EnvVar(name="APIKEY", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="api-key", name="qdrant-database-apikey"))), 
        ]
    
        container = client.V1Container(name=container_name, image=container_image, env=env_list)
        template.template.spec = client.V1PodSpec(containers=[container], restart_policy='Never', service_account='embed-docs-sa')
    
        body.spec = client.V1JobSpec(backoff_limit=3, ttl_seconds_after_finished=60, template=template.template)
        return body
    def kube_test_credentials():
        try: 
            api_response = api_instance.get_api_resources()
            logging.info(api_response)
        except ApiException as e:
            print("Exception when calling API: %s\n" % e)
    
    def kube_create_job(bckt, f_name, id):
        container_image = os.getenv("JOB_IMAGE")
        namespace = os.getenv("JOB_NAMESPACE")
        name = "docs-embedder" + id
        body = kube_create_job_object(name, container_image, bckt, f_name)
        v1=client.BatchV1Api()
        try: 
            v1.create_namespaced_job(namespace, body, pretty=True)
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
        return
    
    if __name__ == '__main__':
        app.run('0.0.0.0', port=5001, debug=True)
    

    Elasticsearch

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from flask import Flask, jsonify
    from flask import request
    import logging
    import sys,os, time
    from kubernetes import client, config, utils
    import kubernetes.client
    from kubernetes.client.rest import ApiException
    
    
    app = Flask(__name__)
    @app.route('/check')
    def message():
        return jsonify({"Message": "Hi there"})
    
    
    @app.route('/', methods=['POST'])
    def bucket():
        request_data = request.get_json()
        print(request_data)
        bckt = request_data['bucket']
        f_name = request_data['name']
        id = request_data['generation'] 
        kube_create_job(bckt, f_name, id)
        return "ok"
    
    # Set logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    
    # Setup K8 configs
    config.load_incluster_config()
    
    def kube_create_job_object(name, container_image, bucket_name, f_name, namespace="elastic", container_name="jobcontainer", env_vars={}):
    
        body = client.V1Job(api_version="batch/v1", kind="Job")
        body.metadata = client.V1ObjectMeta(namespace=namespace, name=name)
        body.status = client.V1JobStatus()
    
        template = client.V1PodTemplate()
        template.template = client.V1PodTemplateSpec()
        env_list = [
            client.V1EnvVar(name="ES_URL", value=os.getenv("ES_URL")),
            client.V1EnvVar(name="INDEX_NAME", value="training-docs"), 
            client.V1EnvVar(name="FILE_NAME", value=f_name), 
            client.V1EnvVar(name="BUCKET_NAME", value=bucket_name),
            client.V1EnvVar(name="PASSWORD", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="elastic", name="elasticsearch-ha-es-elastic-user"))), 
        ]
    
        container = client.V1Container(name=container_name, image=container_image, image_pull_policy='Always', env=env_list)
        template.template.spec = client.V1PodSpec(containers=[container], restart_policy='Never', service_account='embed-docs-sa')
    
        body.spec = client.V1JobSpec(backoff_limit=3, ttl_seconds_after_finished=60, template=template.template)
        return body
    
    def kube_test_credentials():
        try: 
            api_response = api_instance.get_api_resources()
            logging.info(api_response)
        except ApiException as e:
            print("Exception when calling API: %s\n" % e)
    
    def kube_create_job(bckt, f_name, id):
        container_image = os.getenv("JOB_IMAGE")
        namespace = os.getenv("JOB_NAMESPACE")
        name = "docs-embedder" + id
        body = kube_create_job_object(name, container_image, bckt, f_name)
        v1=client.BatchV1Api()
        try: 
            v1.create_namespaced_job(namespace, body, pretty=True)
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
        return
    
    if __name__ == '__main__':
        app.run('0.0.0.0', port=5001, debug=True)
    

    PGVector

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from flask import Flask, jsonify
    from flask import request
    import logging
    import sys,os, time
    from kubernetes import client, config, utils
    import kubernetes.client
    from kubernetes.client.rest import ApiException
    
    
    app = Flask(__name__)
    @app.route('/check')
    def message():
        return jsonify({"Message": "Hi there"})
    
    
    @app.route('/', methods=['POST'])
    def bucket():
        request_data = request.get_json()
        print(request_data)
        bckt = request_data['bucket']
        f_name = request_data['name']
        id = request_data['generation'] 
        kube_create_job(bckt, f_name, id)
        return "ok"
    
    # Set logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    
    # Setup K8 configs
    config.load_incluster_config()
    def kube_create_job_object(name, container_image, bucket_name, f_name, namespace="pg-ns", container_name="jobcontainer", env_vars={}):
    
        body = client.V1Job(api_version="batch/v1", kind="Job")
        body.metadata = client.V1ObjectMeta(namespace=namespace, name=name)
        body.status = client.V1JobStatus()
    
        template = client.V1PodTemplate()
        template.template = client.V1PodTemplateSpec()
        env_list = [
            client.V1EnvVar(name="POSTGRES_HOST", value=os.getenv("POSTGRES_HOST")),
            client.V1EnvVar(name="DATABASE_NAME", value="app"), 
            client.V1EnvVar(name="COLLECTION_NAME", value="training-docs"), 
            client.V1EnvVar(name="FILE_NAME", value=f_name), 
            client.V1EnvVar(name="BUCKET_NAME", value=bucket_name),
            client.V1EnvVar(name="PASSWORD", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="password", name="gke-pg-cluster-app"))), 
            client.V1EnvVar(name="USERNAME", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="username", name="gke-pg-cluster-app"))), 
        ]
    
        container = client.V1Container(name=container_name, image=container_image, image_pull_policy='Always', env=env_list)
        template.template.spec = client.V1PodSpec(containers=[container], restart_policy='Never', service_account='embed-docs-sa')
    
        body.spec = client.V1JobSpec(backoff_limit=3, ttl_seconds_after_finished=60, template=template.template)
        return body
    def kube_test_credentials():
        try: 
            api_response = api_instance.get_api_resources()
            logging.info(api_response)
        except ApiException as e:
            print("Exception when calling API: %s\n" % e)
    
    def kube_create_job(bckt, f_name, id):
        container_image = os.getenv("JOB_IMAGE")
        namespace = os.getenv("JOB_NAMESPACE")
        name = "docs-embedder" + id
        body = kube_create_job_object(name, container_image, bckt, f_name)
        v1=client.BatchV1Api()
        try: 
            v1.create_namespaced_job(namespace, body, pretty=True)
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
        return
    
    if __name__ == '__main__':
        app.run('0.0.0.0', port=5001, debug=True)
    

    Weaviate

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from flask import Flask, jsonify
    from flask import request
    import logging
    import sys,os, time
    from kubernetes import client, config, utils
    import kubernetes.client
    from kubernetes.client.rest import ApiException
    
    
    app = Flask(__name__)
    @app.route('/check')
    def message():
        return jsonify({"Message": "Hi there"})
    
    
    @app.route('/', methods=['POST'])
    def bucket():
        request_data = request.get_json()
        print(request_data)
        bckt = request_data['bucket']
        f_name = request_data['name']
        id = request_data['generation'] 
        kube_create_job(bckt, f_name, id)
        return "ok"
    
    # Set logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    
    # Setup K8 configs
    config.load_incluster_config()
    def kube_create_job_object(name, container_image, bucket_name, f_name, namespace, container_name="jobcontainer", env_vars={}):
    
        body = client.V1Job(api_version="batch/v1", kind="Job")
        body.metadata = client.V1ObjectMeta(namespace=namespace, name=name)
        body.status = client.V1JobStatus()
    
        template = client.V1PodTemplate()
        template.template = client.V1PodTemplateSpec()
        env_list = [
            client.V1EnvVar(name="WEAVIATE_ENDPOINT", value=os.getenv("WEAVIATE_ENDPOINT")),
            client.V1EnvVar(name="WEAVIATE_GRPC_ENDPOINT", value=os.getenv("WEAVIATE_GRPC_ENDPOINT")),
            client.V1EnvVar(name="FILE_NAME", value=f_name), 
            client.V1EnvVar(name="BUCKET_NAME", value=bucket_name),
            client.V1EnvVar(name="APIKEY", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="AUTHENTICATION_APIKEY_ALLOWED_KEYS", name="apikeys"))), 
        ]
    
        container = client.V1Container(name=container_name, image=container_image, image_pull_policy='Always', env=env_list)
        template.template.spec = client.V1PodSpec(containers=[container], restart_policy='Never', service_account='embed-docs-sa')
    
        body.spec = client.V1JobSpec(backoff_limit=3, ttl_seconds_after_finished=60, template=template.template)
        return body
    def kube_test_credentials():
        try: 
            api_response = api_instance.get_api_resources()
            logging.info(api_response)
        except ApiException as e:
            print("Exception when calling API: %s\n" % e)
    
    def kube_create_job(bckt, f_name, id):
        container_image = os.getenv("JOB_IMAGE")
        namespace = os.getenv("JOB_NAMESPACE")
        name = "docs-embedder" + id
        body = kube_create_job_object(name, container_image, bckt, f_name, namespace)
        v1=client.BatchV1Api()
        try: 
            v1.create_namespaced_job(namespace, body, pretty=True)
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
        return
    
    if __name__ == '__main__':
        app.run('0.0.0.0', port=5001, debug=True)
    

    À propos de embedding-job.py

    Ce fichier traite les documents et les envoie à la base de données vectorielle.

    Qdrant

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import ChatVertexAI
    from langchain.prompts import ChatPromptTemplate
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain.memory import ConversationBufferWindowMemory
    from langchain_community.vectorstores import Qdrant
    from qdrant_client import QdrantClient
    import streamlit as st
    import os
    
    vertexAI = ChatVertexAI(model_name=os.getenv("VERTEX_AI_MODEL_NAME", "gemini-2.5-flash-preview-04-17"), streaming=True, convert_system_message_to_human=True)
    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant who helps in finding answers to questions using the provided context."),
            ("human", """
            The answer should be based on the text context given in "text_context" and the conversation history given in "conversation_history" along with its Caption: \n
            Base your response on the provided text context and the current conversation history to answer the query.
            Select the most relevant information from the context.
            Generate a draft response using the selected information. Remove duplicate content from the draft response.
            Generate your final response after adjusting it to increase accuracy and relevance.
            Now only show your final response!
            If you do not know the answer or context is not relevant, response with "I don't know".
    
            text_context:
            {context}
    
            conversation_history:
            {history}
    
            query:
            {query}
            """),
        ]
    )
    
    embedding_model = VertexAIEmbeddings("text-embedding-005")
    
    client = QdrantClient(
        url=os.getenv("QDRANT_URL"),
        api_key=os.getenv("APIKEY"),
    )
    collection_name = os.getenv("COLLECTION_NAME")
    vector_search = Qdrant(client, collection_name, embeddings=embedding_model)
    def format_docs(docs):
        return "\n\n".join([d.page_content for d in docs])
    
    st.title("🤖 Chatbot")
    if "messages" not in st.session_state:
        st.session_state["messages"] = [{"role": "ai", "content": "How can I help you?"}]
    if "memory" not in st.session_state:
        st.session_state["memory"] = ConversationBufferWindowMemory(
            memory_key="history",
            ai_prefix="Bob",
            human_prefix="User",
            k=3,
        )
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])
    if chat_input := st.chat_input():
        with st.chat_message("human"):
            st.write(chat_input)
            st.session_state.messages.append({"role": "human", "content": chat_input})
    
        found_docs = vector_search.similarity_search(chat_input)
        context = format_docs(found_docs)
    
        prompt_value = prompt_template.format_messages(name="Bob", query=chat_input, context=context, history=st.session_state.memory.load_memory_variables({}))
        with st.chat_message("ai"):
            with st.spinner("Typing..."):
                content = ""
                with st.empty():
                    for chunk in vertexAI.stream(prompt_value):
                        content += chunk.content
                        st.write(content)
                st.session_state.messages.append({"role": "ai", "content": content})
    
        st.session_state.memory.save_context({"input": chat_input}, {"output": content})
    
    

    Elasticsearch

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from elasticsearch import Elasticsearch
    from langchain_community.vectorstores.elasticsearch import ElasticsearchStore
    from google.cloud import storage
    import os
    
    bucketname = os.getenv("BUCKET_NAME")
    filename = os.getenv("FILE_NAME")
    
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucketname)
    blob = bucket.blob(filename)
    blob.download_to_filename("/documents/" + filename)
    
    loader = PyPDFLoader("/documents/" + filename)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = loader.load_and_split(text_splitter)
    
    embeddings = VertexAIEmbeddings("text-embedding-005")
    
    client = Elasticsearch(
        [os.getenv("ES_URL")], 
        verify_certs=False, 
        ssl_show_warn=False,
        basic_auth=("elastic", os.getenv("PASSWORD"))
    )
    
    db = ElasticsearchStore.from_documents(
        documents,
        embeddings,
        es_connection=client,
        index_name=os.getenv("INDEX_NAME")
    )
    db.client.indices.refresh(index=os.getenv("INDEX_NAME"))
    
    print(filename + " was successfully embedded") 
    print(f"# of vectors = {len(documents)}")
    
    

    PGVector

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores.pgvector import PGVector
    from google.cloud import storage
    import os
    bucketname = os.getenv("BUCKET_NAME")
    filename = os.getenv("FILE_NAME")
    
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucketname)
    blob = bucket.blob(filename)
    blob.download_to_filename("/documents/" + filename)
    
    loader = PyPDFLoader("/documents/" + filename)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = loader.load_and_split(text_splitter)
    for document in documents:
        document.page_content = document.page_content.replace('\x00', '')
    
    embeddings = VertexAIEmbeddings("text-embedding-005")
    
    CONNECTION_STRING = PGVector.connection_string_from_db_params(
        driver="psycopg2",
        host=os.environ.get("POSTGRES_HOST"),
        port=5432,
        database=os.environ.get("DATABASE_NAME"),
        user=os.environ.get("USERNAME"),
        password=os.environ.get("PASSWORD"),
    )
    COLLECTION_NAME = os.environ.get("COLLECTION_NAME")
    
    db = PGVector.from_documents(
        embedding=embeddings,
        documents=documents,
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING,
        use_jsonb=True
    )
    
    print(filename + " was successfully embedded") 
    print(f"# of vectors = {len(documents)}")
    
    

    Weaviate

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain_community.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    import weaviate
    from weaviate.connect import ConnectionParams
    from langchain_weaviate.vectorstores import WeaviateVectorStore
    from google.cloud import storage
    import os
    bucketname = os.getenv("BUCKET_NAME")
    filename = os.getenv("FILE_NAME")
    
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucketname)
    blob = bucket.blob(filename)
    blob.download_to_filename("/documents/" + filename)
    
    loader = PyPDFLoader("/documents/" + filename)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    documents = loader.load_and_split(text_splitter)
    
    embeddings = VertexAIEmbeddings("text-embedding-005")
    
    auth_config = weaviate.auth.AuthApiKey(api_key=os.getenv("APIKEY"))
    client = weaviate.WeaviateClient(
        connection_params=ConnectionParams.from_params(
            http_host=os.getenv("WEAVIATE_ENDPOINT"),
            http_port="80",
            http_secure=False,
            grpc_host=os.getenv("WEAVIATE_GRPC_ENDPOINT"),
            grpc_port="50051",
            grpc_secure=False,
        ),
        auth_client_secret=auth_config
    )
    client.connect()
    if not client.collections.exists("trainingdocs"):
        collection = client.collections.create(name="trainingdocs")
    db = WeaviateVectorStore.from_documents(documents, embeddings, client=client, index_name="trainingdocs")
    
    print(filename + " was successfully embedded") 
    print(f"# of vectors = {len(documents)}")
    
    

    À propos de chat.py

    Ce fichier configure le modèle pour qu'il réponde aux questions en utilisant uniquement le contexte fourni et les réponses précédentes. Si le contexte ou l'historique des conversations ne correspondent à aucune donnée, le modèle renvoie I don't know.

    Qdrant

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from flask import Flask, jsonify
    from flask import request
    import logging
    import sys,os, time
    from kubernetes import client, config, utils
    import kubernetes.client
    from kubernetes.client.rest import ApiException
    
    
    app = Flask(__name__)
    @app.route('/check')
    def message():
        return jsonify({"Message": "Hi there"})
    
    
    @app.route('/', methods=['POST'])
    def bucket():
        request_data = request.get_json()
        print(request_data)
        bckt = request_data['bucket']
        f_name = request_data['name']
        id = request_data['generation'] 
        kube_create_job(bckt, f_name, id)
        return "ok"
    
    # Set logging
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    
    # Setup K8 configs
    config.load_incluster_config()
    def kube_create_job_object(name, container_image, bucket_name, f_name, namespace="qdrant", container_name="jobcontainer", env_vars={}):
    
        body = client.V1Job(api_version="batch/v1", kind="Job")
        body.metadata = client.V1ObjectMeta(namespace=namespace, name=name)
        body.status = client.V1JobStatus()
    
        template = client.V1PodTemplate()
        template.template = client.V1PodTemplateSpec()
        env_list = [
            client.V1EnvVar(name="QDRANT_URL", value=os.getenv("QDRANT_URL")),
            client.V1EnvVar(name="COLLECTION_NAME", value="training-docs"), 
            client.V1EnvVar(name="FILE_NAME", value=f_name), 
            client.V1EnvVar(name="BUCKET_NAME", value=bucket_name),
            client.V1EnvVar(name="APIKEY", value_from=client.V1EnvVarSource(secret_key_ref=client.V1SecretKeySelector(key="api-key", name="qdrant-database-apikey"))), 
        ]
    
        container = client.V1Container(name=container_name, image=container_image, env=env_list)
        template.template.spec = client.V1PodSpec(containers=[container], restart_policy='Never', service_account='embed-docs-sa')
    
        body.spec = client.V1JobSpec(backoff_limit=3, ttl_seconds_after_finished=60, template=template.template)
        return body
    def kube_test_credentials():
        try: 
            api_response = api_instance.get_api_resources()
            logging.info(api_response)
        except ApiException as e:
            print("Exception when calling API: %s\n" % e)
    
    def kube_create_job(bckt, f_name, id):
        container_image = os.getenv("JOB_IMAGE")
        namespace = os.getenv("JOB_NAMESPACE")
        name = "docs-embedder" + id
        body = kube_create_job_object(name, container_image, bckt, f_name)
        v1=client.BatchV1Api()
        try: 
            v1.create_namespaced_job(namespace, body, pretty=True)
        except ApiException as e:
            print("Exception when calling BatchV1Api->create_namespaced_job: %s\n" % e)
        return
    
    if __name__ == '__main__':
        app.run('0.0.0.0', port=5001, debug=True)
    

    Elasticsearch

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import ChatVertexAI
    from langchain.prompts import ChatPromptTemplate
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain.memory import ConversationBufferWindowMemory
    from elasticsearch import Elasticsearch
    from langchain_community.vectorstores.elasticsearch import ElasticsearchStore
    import streamlit as st
    import os
    
    vertexAI = ChatVertexAI(model_name=os.getenv("VERTEX_AI_MODEL_NAME", "gemini-2.5-flash-preview-04-17"), streaming=True, convert_system_message_to_human=True)
    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant who helps in finding answers to questions using the provided context."),
            ("human", """
            The answer should be based on the text context given in "text_context" and the conversation history given in "conversation_history" along with its Caption: \n
            Base your response on the provided text context and the current conversation history to answer the query.
            Select the most relevant information from the context.
            Generate a draft response using the selected information. Remove duplicate content from the draft response.
            Generate your final response after adjusting it to increase accuracy and relevance.
            Now only show your final response!
            If you do not know the answer or context is not relevant, response with "I don't know".
    
            text_context:
            {context}
    
            conversation_history:
            {history}
    
            query:
            {query}
            """),
        ]
    )
    
    embedding_model = VertexAIEmbeddings("text-embedding-005")
    
    client = Elasticsearch(
        [os.getenv("ES_URL")], 
        verify_certs=False, 
        ssl_show_warn=False,
        basic_auth=("elastic", os.getenv("PASSWORD"))
    )
    vector_search = ElasticsearchStore(
        index_name=os.getenv("INDEX_NAME"),
        es_connection=client,
        embedding=embedding_model
    )
    
    def format_docs(docs):
        return "\n\n".join([d.page_content for d in docs])
    
    st.title("🤖 Chatbot")
    if "messages" not in st.session_state:
        st.session_state["messages"] = [{"role": "ai", "content": "How can I help you?"}]
    
    if "memory" not in st.session_state:
        st.session_state["memory"] = ConversationBufferWindowMemory(
            memory_key="history",
            ai_prefix="Bot",
            human_prefix="User",
            k=3,
        )
    
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])
    
    if chat_input := st.chat_input():
        with st.chat_message("human"):
            st.write(chat_input)
            st.session_state.messages.append({"role": "human", "content": chat_input})
    
        found_docs = vector_search.similarity_search(chat_input)
        context = format_docs(found_docs)
    
        prompt_value = prompt_template.format_messages(name="Bot", query=chat_input, context=context, history=st.session_state.memory.load_memory_variables({}))
        with st.chat_message("ai"):
            with st.spinner("Typing..."):
                content = ""
                with st.empty():
                    for chunk in vertexAI.stream(prompt_value):
                        content += chunk.content
                        st.write(content)
                st.session_state.messages.append({"role": "ai", "content": content})
    
        st.session_state.memory.save_context({"input": chat_input}, {"output": content})
    
    

    PGVector

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import ChatVertexAI
    from langchain.prompts import ChatPromptTemplate
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain.memory import ConversationBufferWindowMemory
    from langchain_community.vectorstores.pgvector import PGVector
    import streamlit as st
    import os
    
    vertexAI = ChatVertexAI(model_name=os.getenv("VERTEX_AI_MODEL_NAME", "gemini-2.5-flash-preview-04-17"), streaming=True, convert_system_message_to_human=True)
    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant who helps in finding answers to questions using the provided context."),
            ("human", """
            The answer should be based on the text context given in "text_context" and the conversation history given in "conversation_history" along with its Caption: \n
            Base your response on the provided text context and the current conversation history to answer the query.
            Select the most relevant information from the context.
            Generate a draft response using the selected information. Remove duplicate content from the draft response.
            Generate your final response after adjusting it to increase accuracy and relevance.
            Now only show your final response!
            If you do not know the answer or context is not relevant, response with "I don't know".
    
            text_context:
            {context}
    
            conversation_history:
            {history}
    
            query:
            {query}
            """),
        ]
    )
    
    embedding_model = VertexAIEmbeddings("text-embedding-005")
    
    CONNECTION_STRING = PGVector.connection_string_from_db_params(
        driver="psycopg2",
        host=os.environ.get("POSTGRES_HOST"),
        port=5432,
        database=os.environ.get("DATABASE_NAME"),
        user=os.environ.get("USERNAME"),
        password=os.environ.get("PASSWORD"),
    )
    COLLECTION_NAME = os.environ.get("COLLECTION_NAME"),
    
    vector_search = PGVector(
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING,
        embedding_function=embedding_model,
    )
    
    def format_docs(docs):
        return "\n\n".join([d.page_content for d in docs])
    
    st.title("🤖 Chatbot")
    if "messages" not in st.session_state:
        st.session_state["messages"] = [{"role": "ai", "content": "How can I help you?"}]
    
    if "memory" not in st.session_state:
        st.session_state["memory"] = ConversationBufferWindowMemory(
            memory_key="history",
            ai_prefix="Bot",
            human_prefix="User",
            k=3,
        )
    
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])
    
    if chat_input := st.chat_input():
        with st.chat_message("human"):
            st.write(chat_input)
            st.session_state.messages.append({"role": "human", "content": chat_input})
    
        found_docs = vector_search.similarity_search(chat_input)
        context = format_docs(found_docs)
    
        prompt_value = prompt_template.format_messages(name="Bot", query=chat_input, context=context, history=st.session_state.memory.load_memory_variables({}))
        with st.chat_message("ai"):
            with st.spinner("Typing..."):
                content = ""
                with st.empty():
                    for chunk in vertexAI.stream(prompt_value):
                        content += chunk.content
                        st.write(content)
                st.session_state.messages.append({"role": "ai", "content": content})
    
        st.session_state.memory.save_context({"input": chat_input}, {"output": content})
    
    

    Weaviate

    # Copyright 2024 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     https://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    from langchain_google_vertexai import ChatVertexAI
    from langchain.prompts import ChatPromptTemplate
    from langchain_google_vertexai import VertexAIEmbeddings
    from langchain.memory import ConversationBufferWindowMemory
    import weaviate
    from weaviate.connect import ConnectionParams
    from langchain_weaviate.vectorstores import WeaviateVectorStore
    import streamlit as st
    import os
    
    vertexAI = ChatVertexAI(model_name=os.getenv("VERTEX_AI_MODEL_NAME", "gemini-2.5-flash-preview-04-17"), streaming=True, convert_system_message_to_human=True)
    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant who helps in finding answers to questions using the provided context."),
            ("human", """
            The answer should be based on the text context given in "text_context" and the conversation history given in "conversation_history" along with its Caption: \n
            Base your response on the provided text context and the current conversation history to answer the query.
            Select the most relevant information from the context.
            Generate a draft response using the selected information. Remove duplicate content from the draft response.
            Generate your final response after adjusting it to increase accuracy and relevance.
            Now only show your final response!
            If you do not know the answer or context is not relevant, response with "I don't know".
    
            text_context:
            {context}
    
            conversation_history:
            {history}
    
            query:
            {query}
            """),
        ]
    )
    
    embedding_model = VertexAIEmbeddings("text-embedding-005")
    
    auth_config = weaviate.auth.AuthApiKey(api_key=os.getenv("APIKEY"))
    client = weaviate.WeaviateClient(
        connection_params=ConnectionParams.from_params(
            http_host=os.getenv("WEAVIATE_ENDPOINT"),
            http_port="80",
            http_secure=False,
            grpc_host=os.getenv("WEAVIATE_GRPC_ENDPOINT"),
            grpc_port="50051",
            grpc_secure=False,
        ),
        auth_client_secret=auth_config
    )
    client.connect()
    
    vector_search = WeaviateVectorStore.from_documents([],embedding_model,client=client, index_name="trainingdocs")
    
    def format_docs(docs):
        return "\n\n".join([d.page_content for d in docs])
    
    st.title("🤖 Chatbot")
    if "messages" not in st.session_state:
        st.session_state["messages"] = [{"role": "ai", "content": "How can I help you?"}]
    
    if "memory" not in st.session_state:
        st.session_state["memory"] = ConversationBufferWindowMemory(
            memory_key="history",
            ai_prefix="Bot",
            human_prefix="User",
            k=3,
        )
    
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])
    
    if chat_input := st.chat_input():
        with st.chat_message("human"):
            st.write(chat_input)
            st.session_state.messages.append({"role": "human", "content": chat_input})
    
        found_docs = vector_search.similarity_search(chat_input)
        context = format_docs(found_docs)
    
        prompt_value = prompt_template.format_messages(name="Bot", query=chat_input, context=context, history=st.session_state.memory.load_memory_variables({}))
        with st.chat_message("ai"):
            with st.spinner("Typing..."):
                content = ""
                with st.empty():
                    for chunk in vertexAI.stream(prompt_value):
                        content += chunk.content
                        st.write(content)
                st.session_state.messages.append({"role": "ai", "content": content})
    
        st.session_state.memory.save_context({"input": chat_input}, {"output": content})
    
    

    Effectuer un nettoyage

    Pour éviter que les ressources utilisées lors de ce tutoriel soient facturées sur votre compte Google Cloud, supprimez le projet contenant les ressources, ou conservez le projet et supprimez les ressources individuelles.

    Supprimer le projet

    Le moyen le plus simple d'empêcher la facturation est de supprimer le projet que vous avez créé pour ce tutoriel.

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

    Si vous avez supprimé le projet, le nettoyage est terminé. Si vous n'avez pas supprimé le projet, suivez les étapes ci-après afin de supprimer les ressources individuelles.

    Supprimer des ressources individuelles

    1. Supprimez le dépôt Artifact Registry :

      gcloud artifacts repositories delete ${KUBERNETES_CLUSTER_PREFIX}-images \
          --location=${REGION} \
          --async
      

      Lorsque vous y êtes invité, saisissez y.

    2. Supprimez le bucket Cloud Storage et le déclencheur Eventarc :

      export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)
      terraform -chdir=vector-database/terraform/cloud-storage destroy \
        -var project_id=${PROJECT_ID} \
        -var region=${REGION} \
        -var cluster_prefix=${KUBERNETES_CLUSTER_PREFIX} \
        -var db_namespace=${DB_NAMESPACE}
      

      Lorsque vous y êtes invité, saisissez yes.

      Eventarc exige que vous disposiez d'une cible de point de terminaison valide à la fois lors de la création et de la suppression.

    Étape suivante