Developers & Practitioners

Build an AI agent for trip planning with Gemini 1.5 Pro: A step-by-step guide

November 22, 2024

Kelci Mensah

Cloud Architect, Google

Dagmawe Legesse

Cloud Engineer

Join us at Google Cloud Next

April 9-11 in Las Vegas

Gemini 1.5 Pro is creating new possibilities for developers to build AI agents that streamline the customer experience. In this post, we'll focus on a practical application that has emerged in the travel industry – building an AI-powered trip planning agent. You'll learn how to connect your agent to external data sources like event APIs, enabling it to generate personalized travel itineraries based on real-time information.

Understanding the core concepts

Function calling: Allows developers to connect Gemini models (all Gemini models except Gemini 1.0 Pro Vision) with external systems, APIs, and data sources. This enables the AI to retrieve real-time information and perform actions, making it more dynamic and versatile.
Grounding: Enhances Gemini' model’s ability to access and process information from external sources like documents, knowledge bases, and the web, leading to more accurate and up-to-date responses.

By combining these features, we can create an AI agent that can understand user requests, retrieve relevant information from the web, and provide personalized recommendations.

Step-by-step: Function calling with grounding

Let’s run through a scenario:

Let’s say you’re an AI engineer tasked with creating an AI agent that helps users plan trips by finding local events and potential hotels to stay at. Your company has given you full creative freedom to build a minimal viable product using Google’s generative AI products, so you’ve chosen to use Gemini 1.5 Pro and loop in other external APIs.

The first step is to define potential queries that any user might enter into the Gemini chat. This will help clarify development requirements and ensure the final product meets the standards of both users and stakeholders. Here are some examples:

“I’m bored, what is there to do today?”
“I would like to take me and my two kids somewhere warm because spring break starts next week. Where should I take them?”
“My friend will be moving to Atlanta soon for a job. What fun events do they have going on during the weekends?”

From these sample queries, it looks like we’ll need to use an events API and a hotels API for localized information. Next, let’s set up our development environment.

Notebook setup

To use Gemini 1.5 Pro for development, you’ll need to either create or use an existing project in Google Cloud. Follow the official instructions that are linked here before continuing. Working in a Jupyter notebook environment is one of the easiest way to get started developing with Gemini 1.5 Pro. You can either use Google Colab or follow along in your own local environment.

First, you’ll need to install the latest version of the Vertex AI SDK for Python, import the necessary modules, and initialize the Gemini model:

1. Add a code cell to install the necessary libraries. This demo notebook requires the use of the google-cloud-aiplatform>=1.52 Python module.

lang-py

2. Add another code cell to import the necessary Python packages.

lang-py

3. Now we can initialize Vertex AI with your exact project ID. Enter your information in between the variable quotes so you can reuse them. Uncomment the gcloud authentication commands if necessary.

lang-py

API key configuration

For this demo, we will also be using an additional API to generate information for the events and hotels. We'll be using Google’s SerpAPI for both, so be sure to create an account and select a subscription plan that fits your needs. This demo can be completed using their free tier. Once that’s done, you'll find your unique API key in your account dashboard.

Once you have the API keys, you can pass them to the SDK in one of two ways:

Put the key in the GOOGLE_API_KEY environment variable (where the SDK will automatically pick it up from there)
Pass the key using genai.configure(api_key = . . .)

Navigate to https://serpapi.com and replace the contents of the variable below between the quotes with your specific API key:

lang-py

Defining custom functions for function calling

In this step, you'll define custom functions in order to pass them to Gemini 1.5 Pro and incorporate the API outputs back into the model for more accurate responses. We'll first define a function for the events API.

To use function calling, pass a list of functions to the tools parameter when creating a generative model. The model uses the function name, docstring, parameters, and parameter type annotations to decide if it needs the function to best answer a prompt.

lang-py

Now we will follow the same format to define a function for the hotels API.

lang-py

Declare the custom function as a tool

The function declaration below describes the function for the events API. It lets the Gemini model know this API retrieves event information based on a query and optional filters.

lang-py

event_function = FunctionDeclaration(
    name = "event_api",
    description = "Retrieves event information based on a query and optional filters.",
    parameters = {
        "type":"object",
        "properties": {
            "query":{
                "type":"string",
                "description":"The query you want to search for (e.g., 'Events in Austin, TX')."
            },
            "htichips":{
                "type":"string",
                "description":"""Optional filters used for search. Default: 'date:today'.
                
                Options:
                - 'date:today' - Today's events
                - 'date:tomorrow' - Tomorrow's events
                - 'date:week' - This week's events
                - 'date:weekend' - This weekend's events
                - 'date:next_week' - Next week's events
                - 'date:month' - This month's events
                - 'date:next_month' - Next month's events
                - 'event_type:Virtual-Event' - Online events
                """,
            }
    },
    "required": [
            "query"
        ]
    },
)

Again, we will follow the same format for the hotels API.

lang-py

hotel_function = FunctionDeclaration(
    name="hotel_api",
    description="Retrieves hotel information based on location, dates, and optional preferences.",
    parameters= {
        "type":"object",
        "properties": {
            "query":{
                "type":"string",
                "description":"Parameter defines the search query. You can use anything that you would use in a regular Google Hotels search."
            },
            "check_in_date":{
                "type":"string",
                "description":"Check-in date in YYYY-MM-DD format (e.g., '2024-04-30')."
            },
           "check_out_date":{
               "type":"string",
               "description":"Check-out date in YYYY-MM-DD format (e.g., '2024-05-01')."
           },
           "hotel_class":{
               "type":"integer",
                "description":"""hotel class.

Options:
                  - 2: 2-star
                  - 3: 3-star
                  - 4: 4-star
                  - 5: 5-star
                
                  For multiple classes, separate with commas (e.g., '2,3,4')."""
           },
           "adults":{
               "type": "integer",
               "description": "Number of adults. Only integers, no decimals or floats (e.g., 1 or 2)"
           }
    },
    "required": [
            "query",
            "check_in_date",
            "check_out_date"
        ]
    },
)

Consider configuring safety settings for the model

Safety settings in Gemini exist to prevent the generation of harmful or unsafe content. They act as filters that analyze the generated output and block or flag anything that might be considered inappropriate, offensive, or dangerous. This is good practice when you’re developing using generative AI content.

lang-py

Pass the tool and start a chat

Here we’ll be passing the tool as a function declaration and starting the chat with Gemini. Using the chat.send_message(“ . . . “) functionality, you can send messages to the model in a conversation-like structure.

lang-py

Build the agent

Next we will create a callable hashmap to map the tool name to the tool function so that it can be called within the agent function. We will also implement prompt engineering (mission prompt) to better prompt the model to handle user inputs and equip the model with the datetime.

lang-py

CallableFunctions = {
    "event_api": event_api,
    "hotel_api": hotel_api
}

today = date.today()

def mission_prompt(prompt:str):
    return f"""
    Thought: I need to understand the user's request and determine if I need to use any tools to assist them.
    Action: 
    
    - If the user's request needs following APIs from available ones: weather, event, hotel, and I have all the required parameters, call the corresponding API.
    - Otherwise, if I need more information to call an API, I will ask the user for it.
    - If the user's request doesn't need an API call or I don't have enough information to call one, respond to the user directly using the chat history.
    - Respond with the final answer only

[QUESTION] 
    {prompt}

[DATETIME]
    {today}

""".strip()

def Agent(user_prompt):
    prompt = mission_prompt(user_prompt)
    response = chat.send_message(prompt)
    tools = response.candidates[0].function_calls
    while tools:
        for tool in tools:
            function_res = CallableFunctions[tool.name](**tool.args)
            response = chat.send_message(Content(role="function_response",parts=[Part.from_function_response(name=tool.name, response={"result": function_res})]))
        tools = response.candidates[0].function_calls
    return response.text

Test the agent

Below are some sample queries you can try to test the chat capabilities of the agent. Don’t forget to test out a query of your own!

lang-py

Wrapping up

That’s all! Gemini 1.5 Pro’s function calling and grounding features enhances its capabilities, enabling developers to connect to external tools and improve model results. This integration enables Gemini models to provide up-to-date information while minimizing hallucinations.

If you’re looking for more hands-on tutorials and code examples, check out some of Google’s Codelabs (such as How to Interact with APIs Using Function Calling in Gemini) to guide you through examples of building a beginner function calling application.

Posted in

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_GenAI_Roadshow.max-600x600.png

Developers & Practitioners

Deep dive into AI with Google Cloud’s global generative AI roadshow

By Christina Lin • 4-minute read

https://storage.googleapis.com/gweb-cloudblog-publish/images/0-hero-hpc.max-700x700.png

Networking

Networking support for AI workloads

By Ammett Williams • 4-minute read

AI & Machine Learning

How to build a strong brand logo with Imagen 3 and Gemini

By Layolin Jesudhass • 4-minute read

Financial Services

Getting started with Swift’s Alliance Connect Virtual on Google Cloud

By Maria Alejandra Emmanuelli • 11-minute read

Build an AI agent for trip planning with Gemini 1.5 Pro: A step-by-step guide

Kelci Mensah

Dagmawe Legesse

Join us at Google Cloud Next

Understanding the core concepts

Step-by-step: Function calling with grounding

Notebook setup

API key configuration

Defining custom functions for function calling

Declare the custom function as a tool

Consider configuring safety settings for the model

Pass the tool and start a chat

Build the agent

Test the agent

Wrapping up

Related articles

Deep dive into AI with Google Cloud’s global generative AI roadshow

Networking support for AI workloads

How to build a strong brand logo with Imagen 3 and Gemini

Getting started with Swift’s Alliance Connect Virtual on Google Cloud