Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
ENDPOINT_ID: Your endpoint ID.
# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool]
)
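The sample references a rag_retrieval_tool that must be built beforehand. The following is a minimal sketch, assuming the vertexai preview SDK and an existing RAG corpus; the RAG_CORPUS_ID here is a placeholder:

from vertexai.preview import rag
from vertexai.preview.generative_models import Tool

# Wrap the RAG corpus in a retrieval tool the model can call during generation.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                # Full resource name of the RAG corpus (placeholder ID).
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
            similarity_top_k=10,  # Number of top-matching contexts to retrieve
        ),
    )
)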
Models with managed APIs on Vertex AI
The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:
Mistral on Vertex AI
Llama 3.1 and 3.2
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, is found in the model card.
Replace the variables used in the code sample:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
# Create a model instance with Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
    tools=RAG_RETRIEVAL_TOOL
)
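Once the model instance exists, generating a grounded response works like any other GenerativeModel call; retrieval runs automatically through the attached tool. A minimal sketch (the prompt text is illustrative):

# Ask a question that the documents in the RAG corpus can answer.
response = rag_model.generate_content("Summarize the key points of the indexed documents.")
print(response.text)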
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
Replace the variables used in the code sample:
PROJECT_ID: Your project ID.
LOCATION: The region to process your request.
MODEL_ID: The LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
RAG_CORPUS_ID: The ID of the RAG corpus resource.
ROLE: Your role.
USER: Your username.
CONTENT: Your content.
# Generate a response with Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"ROLE": "USER", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10
                }
            }
        }
    },
)
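The sample assumes an already-initialized client. One way to construct it, sketched under the assumption that the openai Python package and Application Default Credentials are available (the v1beta1 OpenAI-compatible endpoint path follows the pattern in the Vertex AI docs; verify it for your region):

import google.auth
import google.auth.transport.requests
import openai

# Obtain a short-lived access token via Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)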
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Vertex AI RAG Engine supported models\n\n| The [VPC-SC security controls](/vertex-ai/generative-ai/docs/security-controls) and\n| CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't\n| supported.\n\nThis page lists Gemini models, self-deployed models, and models with\nmanaged APIs on Vertex AI that support Vertex AI RAG Engine.\n\nGemini models\n-------------\n\nThe following table lists the Gemini models and their versions that\nsupport Vertex AI RAG Engine:\n\n- [Gemini 2.5 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite)\n- [Gemini 2.5 Pro](/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)\n- [Gemini 2.5 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)\n- [Gemini 2.0 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash)\n\nFine-tuned Gemini models are unsupported when the Gemini\nmodels use Vertex AI RAG Engine.\n\nSelf-deployed models\n--------------------\n\nVertex AI RAG Engine supports all models in\n[Model Garden](/vertex-ai/generative-ai/docs/model-garden/explore-models).\n\nUse Vertex AI RAG Engine with your self-deployed open model endpoints.\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eENDPOINT_ID\u003c/var\u003e**: Your endpoint ID.\n\n # Create a model instance with your self-deployed open model endpoint\n rag_model = GenerativeModel(\n \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/endpoints/\u003cvar translate=\"no\"\u003eENDPOINT_ID\u003c/var\u003e\",\n tools=[rag_retrieval_tool]\n )\n\nModels with managed APIs on Vertex AI\n-------------------------------------\n\nThe models with managed APIs on Vertex AI that support\nVertex AI RAG Engine include the following:\n\n- [Mistral on Vertex AI](/vertex-ai/generative-ai/docs/partner-models/mistral)\n- [Llama 3.1 and 3.2](/vertex-ai/generative-ai/docs/partner-models/llama)\n\nThe following code sample demonstrates how to use the Gemini\n`GenerateContent` API to create a generative model instance. 
The model ID,\n`/publisher/meta/models/llama-3.1-405B-instruct-maas`, is found in the\n[model card](/vertex-ai/generative-ai/docs/model-garden/explore-models).\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eRAG_RETRIEVAL_TOOL\u003c/var\u003e**: Your RAG retrieval tool.\n\n # Create a model instance with Llama 3.1 MaaS endpoint\n rag_model = GenerativeModel(\n \"projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/publisher/meta/models/llama-3.1-405B-instruct-maas\",\n tools=\u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-n\"\u003eRAG_RETRIEVAL_TOOL\u003c/span\u003e\u003c/var\u003e\n )\n\nThe following code sample demonstrates how to use the OpenAI compatible\n`ChatCompletions` API to generate a model response.\n\nReplace the variables used in the code sample:\n\n- **\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e**: Your project ID.\n- **\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e**: The region to process your request.\n- **\u003cvar translate=\"no\"\u003eMODEL_ID\u003c/var\u003e** : LLM model for content generation. For example, `meta/llama-3.1-405b-instruct-maas`.\n- **\u003cvar translate=\"no\"\u003eINPUT_PROMPT\u003c/var\u003e**: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.\n- **\u003cvar translate=\"no\"\u003eRAG_CORPUS_ID\u003c/var\u003e**: The ID of the RAG corpus resource.\n- **\u003cvar translate=\"no\"\u003eROLE\u003c/var\u003e**: Your role.\n- **\u003cvar translate=\"no\"\u003eUSER\u003c/var\u003e**: Your username.\n- **\u003cvar translate=\"no\"\u003eCONTENT\u003c/var\u003e**: Your content.\n\n # Generate a response with Llama 3.1 MaaS endpoint\n response = client.chat.completions.create(\n model=\"\u003cvar translate=\"no\"\u003eMODEL_ID\u003c/var\u003e\",\n messages=[{\"\u003cvar translate=\"no\"\u003eROLE\u003c/var\u003e\": \"\u003cvar translate=\"no\"\u003eUSER\u003c/var\u003e\", \"content\": \"\u003cvar translate=\"no\"\u003eCONTENT\u003c/var\u003e\"}],\n extra_body={\n \"extra_body\": {\n \"google\": {\n \"vertex_rag_store\": {\n \"rag_resources\": {\n \"rag_corpus\": \"\u003cvar translate=\"no\"\u003eRAG_CORPUS_ID\u003c/var\u003e\"\n },\n \"similarity_top_k\": 10\n }\n }\n }\n },\n )\n\nWhat's next\n-----------\n\n- [Use Embedding models with Vertex AI RAG Engine](/vertex-ai/generative-ai/docs/use-embedding-models)."]]