Build a data agent using the Python SDK

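This page shows you how to use the Python SDK to make requests to the Conversational Analytics API. The sample Python code demonstrates how to authenticate and set up your environment, connect to a data source, create a data agent and a conversation, and ask questions in stateful or stateless chat.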

Authenticate and set up your environment

To use the Python SDK for the Conversational Analytics API, follow the instructions in the Conversational Analytics API SDK Colaboratory notebook to download and install the SDK. Note that the download method and the contents of the SDK Colab are subject to change.

After you've completed the setup instructions in the notebook, you can use the following code to import the required SDK libraries, authenticate your Google Account within a Colaboratory environment, and initialize a client for making API requests:

from google.colab import auth
auth.authenticate_user()

from google.cloud import geminidataanalytics

data_agent_client = geminidataanalytics.DataAgentServiceClient()
data_chat_client = geminidataanalytics.DataChatServiceClient()

Specify the billing project and system instructions

The following sample Python code defines the billing project and system instructions that are used throughout your script:

# Billing project
billing_project = "my_project_name"

# System instructions
system_instruction = "Help the user analyze their data."

Replace the sample values as follows:

  • my_project_name: The ID of your billing project that has the required APIs enabled.
  • Help the user analyze their data.: System instructions to guide the agent's behavior and customize it for your data needs. For example, you can use system instructions to define business terms, control response length, or set data formatting. Ideally, define system instructions by using the recommended YAML format in Write effective system instructions to provide detailed and structured guidance.
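As a rough illustration of structured instructions, the following sketch builds a YAML-style string in Python. The key names (role, glossary, response_style) are placeholders made up for this example and are not the recommended schema; refer to Write effective system instructions for the actual format.

```python
import textwrap

# Illustrative only: these keys are placeholders, not the documented YAML schema.
system_instruction = textwrap.dedent("""\
    role: Help the user analyze their data.
    glossary:
      - term: prevalent
        definition: Having the highest row count.
    response_style:
      max_sentences: 3
    """)

print(system_instruction)
```

Keeping the instructions in one multi-line string means the same system_instruction variable can be passed to the context objects that are set up later on this page.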

Connect to a data source

The following Python code samples show how to define the connection details for the Looker, BigQuery, or Looker Studio data source that your agent will query to answer questions.

Connect to Looker data

The following code examples show how to define the details for a connection to a Looker Explore with either API keys or an access token.

API keys

You can establish a connection with a Looker instance by using generated Looker API keys, as described in Authenticate and connect to a data source with the Conversational Analytics API.

looker_client_id = "my_looker_client_id"
looker_client_secret = "my_looker_client_secret"
looker_instance_uri = "https://my_company.looker.com"
lookml_model = "my_model"
explore = "my_explore"

looker_explore_reference = geminidataanalytics.LookerExploreReference()
looker_explore_reference.looker_instance_uri = looker_instance_uri
looker_explore_reference.lookml_model = lookml_model
looker_explore_reference.explore = explore

credentials = geminidataanalytics.Credentials()
credentials.oauth.secret.client_id = looker_client_id
credentials.oauth.secret.client_secret = looker_client_secret

datasource_references = geminidataanalytics.DatasourceReferences()
datasource_references.looker.explore_references = [looker_explore_reference]

Replace the sample values as follows:

  • my_looker_client_id: The client ID of your generated Looker API key.
  • my_looker_client_secret: The client secret of your generated Looker API key.
  • https://my_company.looker.com: The complete URL of your Looker instance.
  • my_model: The name of the LookML model that includes the Explore that you want to connect to.
  • my_explore: The name of the Looker Explore that you want the data agent to query.

Access token

You can establish a connection with a Looker instance by using an access token, as described in Authenticate and connect to a data source with the Conversational Analytics API.

looker_access_token = "my_access_token"
looker_instance_uri = "https://my_company.looker.com"
lookml_model = "my_model"
explore = "my_explore"

looker_explore_reference = geminidataanalytics.LookerExploreReference()
looker_explore_reference.looker_instance_uri = looker_instance_uri
looker_explore_reference.lookml_model = lookml_model
looker_explore_reference.explore = explore

credentials = geminidataanalytics.Credentials()
credentials.oauth.token.access_token = looker_access_token

datasource_references = geminidataanalytics.DatasourceReferences()
datasource_references.looker.explore_references = [looker_explore_reference]

Replace the sample values as follows:

  • my_access_token: The access_token value that you generate to authenticate to Looker.
  • https://my_company.looker.com: The complete URL of your Looker instance.
  • my_model: The name of the LookML model that includes the Explore that you want to connect to.
  • my_explore: The name of the Looker Explore that you want the data agent to query.

Connect to BigQuery data

With the Conversational Analytics API, you can connect to and query up to 10 BigQuery tables at a time.

The following sample code defines a connection to a single BigQuery table.

bq_project_id = "my_project_id"
bq_dataset_id = "my_dataset_id"
bq_table_id = "my_table_id"

bigquery_table_reference = geminidataanalytics.BigQueryTableReference()
bigquery_table_reference.project_id = bq_project_id
bigquery_table_reference.dataset_id = bq_dataset_id
bigquery_table_reference.table_id = bq_table_id

# Connect to your data source
datasource_references = geminidataanalytics.DatasourceReferences()
datasource_references.bq.table_references = [bigquery_table_reference]

Replace the sample values as follows:

  • my_project_id: The ID of the Google Cloud project that contains the BigQuery dataset and table that you want to connect to. To connect to a public dataset, specify bigquery-public-data.
  • my_dataset_id: The ID of the BigQuery dataset. For example, san_francisco.
  • my_table_id: The ID of the BigQuery table. For example, street_trees.
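Because a connection can include up to 10 tables, it can help to validate a list of table paths before you build the references. The following is a minimal sketch under stated assumptions: TableRef is a hypothetical stand-in for geminidataanalytics.BigQueryTableReference so that the snippet runs without the SDK, parse_table_refs is a helper invented for this example, and the IDs are assumed not to contain dots.

```python
from dataclasses import dataclass

MAX_TABLES = 10  # Limit stated in this section


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for geminidataanalytics.BigQueryTableReference."""
    project_id: str
    dataset_id: str
    table_id: str


def parse_table_refs(table_paths):
    """Turn 'project.dataset.table' strings into TableRef objects."""
    if len(table_paths) > MAX_TABLES:
        raise ValueError(
            f"At most {MAX_TABLES} tables are supported, got {len(table_paths)}"
        )
    refs = []
    for path in table_paths:
        parts = path.split(".")
        if len(parts) != 3:
            raise ValueError(f"Expected 'project.dataset.table', got {path!r}")
        refs.append(TableRef(*parts))
    return refs


table_refs = parse_table_refs([
    "bigquery-public-data.san_francisco.street_trees",
    "my_project_id.my_dataset_id.my_table_id",
])
```

With the SDK installed, each TableRef would map one-to-one onto the project_id, dataset_id, and table_id fields of a BigQueryTableReference, and the resulting list would be assigned to datasource_references.bq.table_references.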

Connect to Looker Studio data

The following sample code defines a connection to a Looker Studio data source.

studio_datasource_id = "my_datasource_id"

studio_references = geminidataanalytics.StudioDatasourceReference()
studio_references.datasource_id = studio_datasource_id

# Connect to your data source
datasource_references = geminidataanalytics.DatasourceReferences()
datasource_references.studio.studio_references = [studio_references]

In the previous example, replace my_datasource_id with the ID of your Looker Studio data source.

Set up context for stateful or stateless chat

The Conversational Analytics API supports multi-turn conversations, which let users ask follow-up questions that build on previous context. The following sample Python code demonstrates how to set up context for either stateful or stateless chat:

  • Stateful chat: Google Cloud stores and manages the conversation history. Stateful chat is inherently multi-turn, as the API retains context from previous messages. You only need to send the current message for each turn.
  • Stateless chat: Your application manages the conversation history. You must include the entire conversation history with each new message. For detailed examples on how to manage multi-turn conversations in stateless mode, see Create a stateless multi-turn conversation.
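The difference between the two modes comes down to what your application sends on each turn. The following sketch uses plain strings in place of SDK Message objects, and both helper functions are hypothetical names introduced only for illustration:

```python
def stateless_payload(history, new_message):
    """Stateless: the caller resends every earlier message plus the new one."""
    return history + [new_message]


def stateful_payload(new_message):
    """Stateful: only the new message is sent; the history is stored server-side."""
    return [new_message]


# Placeholder history: one user question and one agent reply
history = ["Which species of tree is most prevalent?", "<agent reply>"]
follow_up = "Can you show me the results as a bar chart?"

stateless_messages = stateless_payload(history, follow_up)
stateful_messages = stateful_payload(follow_up)
```

In stateless mode the list grows with every turn, so your application owns the storage and ordering of the history. In stateful mode the request stays the same size regardless of conversation length.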

Stateful chat

The following code sample sets up context for stateful chat, where Google Cloud stores and manages the conversation history. The sample also enables advanced analysis with Python through the optional line published_context.options.analysis.python.enabled = True; omit that line if you don't need advanced analysis.

# Set up context for stateful chat
published_context = geminidataanalytics.Context()
published_context.system_instruction = system_instruction
published_context.datasource_references = datasource_references
# Optional: To enable advanced analysis with Python, include the following line:
published_context.options.analysis.python.enabled = True

Stateless chat

The following sample code sets up context for stateless chat, where you must send the entire conversation history with each message. The sample also enables advanced analysis with Python through the optional line inline_context.options.analysis.python.enabled = True; omit that line if you don't need advanced analysis.

# Set up context for stateless chat
# Looker only: uncomment the next line to attach the credentials defined in Connect to Looker data
# datasource_references.looker.credentials = credentials
inline_context = geminidataanalytics.Context()
inline_context.system_instruction = system_instruction
inline_context.datasource_references = datasource_references
# Optional: To enable advanced analysis with Python, include the following line:
inline_context.options.analysis.python.enabled = True

Create a data agent

The following sample Python code makes an API request to create a data agent, which you can then use to have a conversation about your data. The data agent is configured with the specified data source, system instructions, and context.

data_agent_id = "data_agent_1"

data_agent = geminidataanalytics.DataAgent()
data_agent.data_analytics_agent.published_context = published_context
data_agent.name = f"projects/{billing_project}/locations/global/dataAgents/{data_agent_id}" # Optional

request = geminidataanalytics.CreateDataAgentRequest(
    parent=f"projects/{billing_project}/locations/global",
    data_agent_id=data_agent_id, # Optional
    data_agent=data_agent,
)

try:
    data_agent_client.create_data_agent(request=request)
    print("Data Agent created")
except Exception as e:
    print(f"Error creating Data Agent: {e}")

In the previous example, replace the value data_agent_1 with a unique identifier for the data agent.

Retrieve a data agent

The following sample Python code demonstrates how to make an API request to retrieve a data agent that you previously created.

# Initialize request arguments
data_agent_id = "data_agent_1"
request = geminidataanalytics.GetDataAgentRequest(
    name=f"projects/{billing_project}/locations/global/dataAgents/{data_agent_id}",
)

# Make the request
response = data_agent_client.get_data_agent(request=request)

# Handle the response
print(response)

In the previous example, replace the value data_agent_1 with the unique identifier for the data agent that you want to retrieve.

Create a conversation

The following sample Python code makes an API request to create a conversation.

# Initialize request arguments
data_agent_id = "data_agent_1"
conversation_id = "conversation_1"

conversation = geminidataanalytics.Conversation()
conversation.agents = [f'projects/{billing_project}/locations/global/dataAgents/{data_agent_id}']
conversation.name = f"projects/{billing_project}/locations/global/conversations/{conversation_id}"

request = geminidataanalytics.CreateConversationRequest(
    parent=f"projects/{billing_project}/locations/global",
    conversation_id=conversation_id,
    conversation=conversation,
)

# Make the request
response = data_chat_client.create_conversation(request=request)

# Handle the response
print(response)

Replace the sample values as follows:

  • data_agent_1: The ID of the data agent, as defined in the sample code block in Create a data agent.
  • conversation_1: A unique identifier for the conversation.
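The samples on this page repeat the same resource name patterns in several places. As a convenience, the patterns could be wrapped in small helpers like the following sketch. The function names are hypothetical and are not part of the SDK; the path formats are copied from the samples on this page.

```python
LOCATION = "global"  # The samples on this page all use the global location


def parent_path(project):
    """Parent resource used by the create and chat requests."""
    return f"projects/{project}/locations/{LOCATION}"


def data_agent_path(project, data_agent_id):
    """Full resource name of a data agent."""
    return f"{parent_path(project)}/dataAgents/{data_agent_id}"


def conversation_path(project, conversation_id):
    """Full resource name of a conversation."""
    return f"{parent_path(project)}/conversations/{conversation_id}"


print(data_agent_path("my_project_name", "data_agent_1"))
print(conversation_path("my_project_name", "conversation_1"))
```

Building the names in one place keeps the project, location, and ID consistent across the create, get, and chat requests.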

Use the API to ask questions

After you create a data agent and a conversation, you can use the following sample Python code to send a query to the agent. The code uses the context that you set up for stateful or stateless chat. The API returns a stream of messages that represent the steps that the agent takes to answer the query. The samples display each message with the show_message function, which is defined in Define helper functions.

Stateful chat

Send a stateful chat request with a Conversation reference

You can send a stateful chat request to the data agent by referencing a Conversation resource that you previously created.

# Create a request that contains a single user message (your question)
question = "Which species of tree is most prevalent?"
messages = [geminidataanalytics.Message()]
messages[0].user_message.text = question

data_agent_id = "data_agent_1"
conversation_id = "conversation_1"

# Create a conversation_reference
conversation_reference = geminidataanalytics.ConversationReference()
conversation_reference.conversation = f"projects/{billing_project}/locations/global/conversations/{conversation_id}"
conversation_reference.data_agent_context.data_agent = f"projects/{billing_project}/locations/global/dataAgents/{data_agent_id}"
# Looker only: uncomment the next line to pass the credentials defined in Connect to Looker data
# conversation_reference.data_agent_context.credentials = credentials

# Form the request
request = geminidataanalytics.ChatRequest(
    parent=f"projects/{billing_project}/locations/global",
    messages=messages,
    conversation_reference=conversation_reference,
)

# Make the request
stream = data_chat_client.chat(request=request)

# Handle the response
for response in stream:
    show_message(response)

Replace the sample values as follows:

  • Which species of tree is most prevalent?: A natural language question to send to the data agent.
  • data_agent_1: The unique identifier for the data agent, as defined in Create a data agent.
  • conversation_1: The unique identifier for the conversation, as defined in Create a conversation.

Stateless chat

The following code samples demonstrate how to send a query to the data agent when you've set up context for stateless chat. You can send stateless queries by referencing a previously defined DataAgent resource or by using inline context in the request.

Send a stateless chat request with a DataAgent reference

You can send a query to the data agent by referencing a DataAgent resource that you previously created.

# Create a request that contains a single user message (your question)
question = "Which species of tree is most prevalent?"
messages = [geminidataanalytics.Message()]
messages[0].user_message.text = question

data_agent_id = "data_agent_1"

data_agent_context = geminidataanalytics.DataAgentContext()
data_agent_context.data_agent = f"projects/{billing_project}/locations/global/dataAgents/{data_agent_id}"
# Looker only: uncomment the next line to pass the credentials defined in Connect to Looker data
# data_agent_context.credentials = credentials

# Form the request
request = geminidataanalytics.ChatRequest(
    parent=f"projects/{billing_project}/locations/global",
    messages=messages,
    data_agent_context=data_agent_context,
)

# Make the request
stream = data_chat_client.chat(request=request)

# Handle the response
for response in stream:
    show_message(response)

Replace the sample values as follows:

  • Which species of tree is most prevalent?: A natural language question to send to the data agent.
  • data_agent_1: The unique identifier for the data agent, as defined in Create a data agent.

Send a stateless chat request with inline context

The following sample code demonstrates how to use the inline_context parameter to provide context directly within your stateless chat request.

# Create a request that contains a single user message (your question)
question = "Which species of tree is most prevalent?"
messages = [geminidataanalytics.Message()]
messages[0].user_message.text = question

request = geminidataanalytics.ChatRequest(
    inline_context=inline_context,
    parent=f"projects/{billing_project}/locations/global",
    messages=messages,
)

# Make the request
stream = data_chat_client.chat(request=request)

# Handle the response
for response in stream:
    show_message(response)

In the previous example, replace Which species of tree is most prevalent? with a natural language question to send to the data agent.

Create a stateless multi-turn conversation

To ask follow-up questions in a stateless conversation, your application must manage the conversation's context by sending the entire message history with each new request. The following example shows how to create a multi-turn conversation by referencing a data agent or by using inline context to provide the data source directly.

# List that is used to track previous turns and is reused across requests
conversation_messages = []

data_agent_id = "data_agent_1"

# Use data agent context
data_agent_context = geminidataanalytics.DataAgentContext()
data_agent_context.data_agent = f"projects/{billing_project}/locations/global/dataAgents/{data_agent_id}"
# Looker only: uncomment the next line to pass the credentials defined in Connect to Looker data
# data_agent_context.credentials = credentials

# Helper function for calling the API
def multi_turn_conversation(msg):
    message = geminidataanalytics.Message()
    message.user_message.text = msg

    # Send a multi-turn request by including previous turns and the new message
    conversation_messages.append(message)

    request = geminidataanalytics.ChatRequest(
        parent=f"projects/{billing_project}/locations/global",
        messages=conversation_messages,
        # Use data agent context
        data_agent_context=data_agent_context,
        # Use inline context
        # inline_context=inline_context,
    )

    # Make the request
    stream = data_chat_client.chat(request=request)

    # Handle the response and record each agent message for the next turn
    for response in stream:
        show_message(response)
        conversation_messages.append(response)

# Send the first turn request
multi_turn_conversation("Which species of tree is most prevalent?")

# Send follow-up turn request
multi_turn_conversation("Can you show me the results as a bar chart?")

In the previous example, replace the sample values as follows:

  • data_agent_1: The unique identifier for the data agent, as defined in the sample code block in Create a data agent.
  • Which species of tree is most prevalent?: A natural language question to send to the data agent.
  • Can you show me the results as a bar chart?: A follow-up question that builds on or refines the previous question.

Define helper functions

The following sample code contains helper function definitions that are used in the previous code samples. These functions help to parse the response from the API and display the results.

import json as json_lib

import altair as alt
import pandas as pd
import proto
from google.protobuf.json_format import MessageToDict
from IPython.display import display, HTML

def handle_text_response(resp):
  parts = getattr(resp, 'parts')
  print(''.join(parts))

def display_schema(data):
  fields = getattr(data, 'fields')
  df = pd.DataFrame({
    "Column": map(lambda field: getattr(field, 'name'), fields),
    "Type": map(lambda field: getattr(field, 'type'), fields),
    "Description": map(lambda field: getattr(field, 'description', '-'), fields),
    "Mode": map(lambda field: getattr(field, 'mode'), fields)
  })
  display(df)

def display_section_title(text):
  display(HTML('<h2>{}</h2>'.format(text)))

def format_looker_table_ref(table_ref):
  return 'lookmlModel: {}, explore: {}, lookerInstanceUri: {}'.format(table_ref.lookml_model, table_ref.explore, table_ref.looker_instance_uri)

def format_bq_table_ref(table_ref):
  return '{}.{}.{}'.format(table_ref.project_id, table_ref.dataset_id, table_ref.table_id)

def display_datasource(datasource):
  source_name = ''
  if 'studio_datasource_id' in datasource:
    source_name = getattr(datasource, 'studio_datasource_id')
  elif 'looker_explore_reference' in datasource:
    source_name = format_looker_table_ref(getattr(datasource, 'looker_explore_reference'))
  else:
    source_name = format_bq_table_ref(getattr(datasource, 'bigquery_table_reference'))

  print(source_name)
  display_schema(datasource.schema)

def handle_schema_response(resp):
  if 'query' in resp:
    print(resp.query.question)
  elif 'result' in resp:
    display_section_title('Schema resolved')
    print('Data sources:')
    for datasource in resp.result.datasources:
      display_datasource(datasource)

def handle_data_response(resp):
  if 'query' in resp:
    query = resp.query
    display_section_title('Retrieval query')
    print('Query name: {}'.format(query.name))
    print('Question: {}'.format(query.question))
    print('Data sources:')
    for datasource in query.datasources:
      display_datasource(datasource)
  elif 'generated_sql' in resp:
    display_section_title('SQL generated')
    print(resp.generated_sql)
  elif 'result' in resp:
    display_section_title('Data retrieved')

    fields = [field.name for field in resp.result.schema.fields]
    d = {}
    for el in resp.result.data:
      for field in fields:
        if field in d:
          d[field].append(el[field])
        else:
          d[field] = [el[field]]

    display(pd.DataFrame(d))

def handle_chart_response(resp):
  def _value_to_dict(v):
    if isinstance(v, proto.marshal.collections.maps.MapComposite):
      return _map_to_dict(v)
    elif isinstance(v, proto.marshal.collections.RepeatedComposite):
      return [_value_to_dict(el) for el in v]
    elif isinstance(v, (int, float, str, bool)):
      return v
    else:
      return MessageToDict(v)

  def _map_to_dict(d):
    out = {}
    for k in d:
      if isinstance(d[k], proto.marshal.collections.maps.MapComposite):
        out[k] = _map_to_dict(d[k])
      else:
        out[k] = _value_to_dict(d[k])
    return out

  if 'query' in resp:
    print(resp.query.instructions)
  elif 'result' in resp:
    vega_config = resp.result.vega_config
    vega_config_dict = _map_to_dict(vega_config)
    alt.Chart.from_json(json_lib.dumps(vega_config_dict)).display()

def show_message(msg):
  m = msg.system_message
  if 'text' in m:
    handle_text_response(getattr(m, 'text'))
  elif 'schema' in m:
    handle_schema_response(getattr(m, 'schema'))
  elif 'data' in m:
    handle_data_response(getattr(m, 'data'))
  elif 'chart' in m:
    handle_chart_response(getattr(m, 'chart'))
  print('\n')
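The handle_data_response helper converts row-oriented results into a column-oriented dictionary before it builds a DataFrame. The following sketch shows the same pivot on plain Python dictionaries so that the step can be tried without the SDK or pandas. The function name rows_to_columns and the sample rows are made up for illustration. Unlike the helper above, this version creates every column up front, so an empty result still lists its column names.

```python
def rows_to_columns(rows, field_names):
    """Pivot a list of row dicts into a dict that maps column name to values."""
    columns = {name: [] for name in field_names}
    for row in rows:
        for name in field_names:
            columns[name].append(row[name])
    return columns


# Made-up rows that stand in for resp.result.data
sample_rows = [
    {"species": "species_a", "tree_count": 3},
    {"species": "species_b", "tree_count": 1},
]
columns = rows_to_columns(sample_rows, ["species", "tree_count"])
print(columns)
```

The resulting dictionary has the shape that pd.DataFrame accepts directly, which is why the helper builds it before calling display.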