Return the response from the LLM

This sample demonstrates how to run a retrieval query to get a response from the LLM.

Code sample

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
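
Optionally, you can verify that Application Default Credentials are configured by loading the default credentials with the google-auth library, which is installed as a dependency of the Vertex AI SDK. This check is a minimal sketch and is not part of the sample itself:

import google.auth

# Raises DefaultCredentialsError if ADC has not been configured.
credentials, project = google.auth.default()
print(f"Authenticated for project: {project}")

The full code sample follows.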


from vertexai.preview import rag
import vertexai

# TODO(developer): Update and uncomment the lines below
# project_id = "PROJECT_ID"
# rag_corpus_id = "9183965540115283968" # Only one corpus is supported at this time

# Initialize Vertex AI API once per session
vertexai.init(project=project_id, location="us-central1")

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=rag_corpus_id,
            # Supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text="What is RAG and why is it helpful?",
    similarity_top_k=10,  # Optional: number of top contexts to retrieve
    vector_distance_threshold=0.5,  # Optional: only return contexts within this vector distance
)
# Print the retrieved contexts.
print(response)
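
The retrieval query returns the retrieved contexts rather than a generated answer, so you will typically want to read the individual contexts from the response. The sketch below assumes the response exposes a contexts.contexts list with source_uri and text fields, as in the RetrieveContextsResponse message; verify the field names against the API reference for your SDK version.

# A minimal sketch of inspecting the retrieved contexts.
# The `contexts.contexts` structure and field names are assumptions
# based on the RetrieveContextsResponse message.
for context in response.contexts.contexts:
    print(context.source_uri)
    print(context.text)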

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.