Guide agent behavior with authored context

This page describes the recommended structure for writing effective prompts for your Conversational Analytics API data agents. These prompts are authored context that you define as strings by using the system_instruction parameter. Well-structured system instructions can improve the accuracy and relevance of the responses that the API provides.

What are system instructions?

System instructions are user-defined guidance that developers can provide to shape the behavior of a data agent and to refine the API's responses. System instructions are part of the context that the API uses to answer questions. This context also includes connected data sources (BigQuery tables, Looker Explores, Looker Studio data sources), and conversation history (for multi-turn conversations).

By providing clear, structured guidance through system instructions, you can improve the agent's ability to interpret user questions and generate useful and accurate responses. Well-defined system instructions are especially useful if you're connecting to data such as BigQuery tables, where there may not be a predefined semantic layer like there is with a Looker Explore.

For example, you can use system instructions to provide the following types of guidance to an agent:

  • Business-specific logic: Define a "loyal" customer as one who has made more than five purchases within a certain timeframe.
  • Response formatting: Summarize all responses from your data agent in 20 words or fewer to save your users time.
  • Data presentation: Format all numbers to match the company's style guide.

Provide system instructions

You can provide system instructions to the Conversational Analytics API as a YAML-formatted string by using the system_instruction parameter. While the system_instruction parameter is optional, providing well-structured system instructions is recommended for accurate and relevant responses.

You can define the YAML-formatted string in your code during initial setup, as shown in Configure initial settings and authentication (HTTP) or Specify the billing project and system instructions (Python SDK). You can then include the system_instruction parameter in the following API calls:

YAML template for system instructions

The following template shows the recommended YAML structure for the string that you provide to the system_instruction parameter, including the available keys and expected data types. For a sample system instruction with filled-out values, see Example: System instructions.

You can use the following YAML template to provide your own system instructions:

- system_instruction: str # A description of the expected behavior of the agent. For example: You are a sales agent.
- tables: # A list of tables to describe for the agent.
    - table: # Details about a single table that is relevant for the agent.
        - name: str # The name of the table.
        - description: str # A description of the table.
        - synonyms: list[str] # Alternative terms for referring to the table.
        - tags: list[str] # Keywords or tags that are associated with the table.
        - fields: # Details about columns (fields) within the table.
            - field: # Details about a single column within the current table.
                - name: str # The name of the column.
                - description: str # A description of the column.
                - synonyms: list[str] # Alternative terms for referring to the column.
                - tags: list[str] # Keywords or tags that are associated with the column.
                - sample_values: list[str] # Sample values that are present within the column.
                - aggregations: list[str] # Commonly used or default aggregations for the column.
        - measures: # A list of calculated metrics (measures) for the table.
            - measure: # Details about a single measure within the table.
                - name: str # The name of the measure.
                - description: str # A description of the measure.
                - exp: str # The expression that is used to construct the measure.
                - synonyms: list[str] # Alternative terms for referring to the measure.
        - golden_queries: # A list of important or popular ("golden") queries for the table.
            - golden_query: # Details about a single golden query.
                - natural_language_query: str # The natural language query.
                - sql_query: str # The SQL query that corresponds to the natural language query.
        - golden_action_plans: # A list of suggested multi-step plans for answering specific queries.
            - golden_action_plan: # Details about a single action plan.
                - natural_language_query: str # The natural language query.
                - action_plan: # A list of the steps for this action plan.
                    - step: str # A single step within the action plan.
    - relationships: # A list of join relationships between tables.
        - relationship: # Details about a single join relationship.
            - name: str # The name of this join relationship.
            - description: str # A description of the relationship.
            - relationship_type: str # The join relationship type: one-to-one, one-to-many, many-to-one, or many-to-many.
            - join_type: str # The join type: inner, outer, left, right, or full.
            - left_table: str # The name of the left table in the join.
            - right_table: str # The name of the right table in the join.
            - relationship_columns: # A list of columns that are used for the join.
                - left_column: str # The join column from the left table.
                - right_column: str # The join column from the right table.
- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
    - glossary: # The definition for a single glossary item.
        - term: str # The term, phrase, or abbreviation to define.
        - description: str # A description or definition of the term.
        - synonyms: list[str] # Alternative terms for the glossary entry.
- additional_descriptions: # A list of any other general instructions or content.
    - text: str # Any additional general instructions or context not covered elsewhere.

Key components of system instructions

The following sections describe the key components of system instructions and how to use them to improve the quality of your agent's responses.

Define the agent's role and persona with system_instruction

Define the agent's role and persona by using the system_instruction key. This initial instruction sets the tone and style for the API's responses and helps the agent understand its core purpose.

The following YAML code block shows the basic structure for the system_instrucction key:

- system_instruction: str

For example, you can define an agent as a sales analyst for a fictitious ecommerce store as follows:

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.

Describe your data with tables

The tables key contains a list of table descriptions for the agent and provides details about the specific data that is available to the agent for answering questions. Each table object within this list contains the metadata for a specific table, including that table's name, description, synonyms, tags, fields, measures, golden queries, and golden action plans.

The following YAML code block shows the basic structure for the tables key:

- tables:
    - table:
      - name: str # The name of the table.
      - description: str # A description of the table.
      - synonyms: list[str] # Alternative terms for referring to the table.
      - tags: list[str] # Keywords or tags that are associated with the table.
      - fields: # A list of the fields in the table.
      - measures: # A list of the measures in the table.
      - golden_queries: # A list of golden queries for the table.
      - golden_action_plans: # A list of golden action plans for the table.

For example, the following YAML code block shows the basic structure for the tables key for the table bigquery-public-data.thelook_ecommerce.orders:

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - description: Data for customer orders in The Look fictitious e-commerce store.
        - synonyms:
            - sales
            - orders_data
        - tags:
            - ecommerce
            - transaction

Describe commonly used fields with fields

The fields key, which is nested within a table object, takes a list of field objects to describe individual columns. Not all fields need additional context; however, for commonly used fields, including additional details can help improve the agent's performance.

The following YAML code block shows the basic structure for the fields key:

- fields: # Details about columns (fields) within the table.
    - field: # Details about a single column within the current table.
        - name: str # The name of the column.
        - description: str # A description of the column.
        - synonyms: list[str] # Alternative terms for referring to the column.
        - tags: list[str] # Keywords or tags that are associated with the column.
        - sample_values: list[str] # Sample values that are present within the column.
        - aggregations: list[str] # Commonly used or default aggregations for the column.

For example, the following sample YAML code describes key fields such as order_id, status, created_at, num_of_items, and earnings for the orders table:

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - fields:
            - field:
                - name: order_id
                - description: The unique identifier for each customer order.
            - field:
                - name: user_id
                - description: The unique identifier for each customer.
            - field:
                - name: status
                - description: The current status of the order.
                - sample_values:
                    - complete
                    - shipped
                    - returned
            - field:
                - name: created_at
                - description: The timestamp when the order was created.
            - field:
                - name: num_of_items
                - description: The total number of items in the order.
                - aggregations:
                    - sum
                    - avg
            - field:
                - name: earnings
                - description: The sales amount for the order.
                - aggregations:
                    - sum
                    - avg

Define business metrics with measures

The measures key, which is nested within a table object, defines custom business metrics or calculations that aren't directly present as columns in your tables. Providing context for measures helps the agent answer questions about key performance indicators (KPIs) or other calculated values.

The following YAML code block shows the basic structure for the measures key:

- measures: # A list of calculated metrics (measures) for the table.
    - measure: # Details about a single measure within the table.
        - name: str # The name of the measure.
        - description: str # A description of the measure.
        - exp: str # The expression that is used to construct the measure.
        - synonyms: list[str] # Alternative terms for referring to the measure.

For example, you can define a profit measure as a calculation of the earnings minus the cost as follows:

- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - measures:
            - measure:
                - name: profit
                - description: Raw profit (earnings minus cost).
                - exp: earnings - cost
                - synonyms:
                    - gains

Improve accuracy with golden_queries

The golden_queries key, which is nested within a table object, takes a list of golden_query objects. Golden queries help the agent provide more accurate and relevant responses to common or important questions. By providing the agent with both a natural language query and the corresponding SQL query for each golden query, you can guide the agent to provide higher quality and more consistent results.

The following YAML code block shows the basic structure for the golden_queries key:

- golden_queries: # A list of important or popular ("golden") queries for the table.
    - golden_query: # Details about a single golden query.
        - natural_language_query: str # The natural language query.
        - sql_query: str # The SQL query that corresponds to the natural language query.

For example, you can define golden queries for common analyses for the data in the orders table as follows:

- tables:
    - table:
        - golden_queries:
            - golden_query:
                - natural_language_query: How many orders are there?
                - sql_query: SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
            - golden_query:
                - natural_language_query: How many orders were shipped?
                - sql_query: >-
                    SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
                    WHERE status = 'shipped'

Outline multi-step tasks with golden_action_plans

The golden_action_plans key, which is nested within a table object, takes a list of golden_action_plan objects. You can use golden action plans to provide the agent with examples of how to handle multi-step requests, such as to fetch data and then create a visualization.

The following YAML code block shows the basic structure for the golden_action_plans key:

- golden_action_plans: # A list of suggested multi-step plans for answering specific queries.
    - golden_action_plan: # Details about a single action plan.
        - natural_language_query: str # The natural language query.
        - action_plan: # A list of the steps for this action plan.
            - step: str # A single step within the action plan.

For example, you can define an action plan for showing order breakdowns by age group and include details about the SQL query and visualization-related steps:

- tables:
    - table:
        - golden_action_plans:
            - golden_action_plan:
                - natural_language_query: Show me the number of orders broken down by age group.
                - action_plan:
                    - step: >-
                        Run a SQL query that joins the table
                        sqlgen-testing.thelook_ecommerce.orders and
                        sqlgen-testing.thelook_ecommerce.users to get a
                        breakdown of order count by age group.
                    - step: >-
                        Create a vertical bar plot using the retrieved data,
                        with one bar per age group.

Define table joins with relationships

The relationships key contains a list of join relationships between tables. By defining join relationships, you can help the agent understand how to join data from multiple tables when answering questions.

The following YAML code block shows the basic structure for the relationships key:

- relationships: # A list of join relationships between tables.
    - relationship: # Details for a single join relationship.
        - name: str # A unique name for this join relationship.
        - description: str # A description of the relationship.
        - relationship_type: str # The join relationship type (e.g., "one-to-one", "one-to-many", "many-to-one", "many-to-many").
        - join_type: str # The join type (e.g., "inner", "outer", "left", "right", "full").
        - left_table: str # The name of the left table in the join.
        - right_table: str # The name of the right table in the join.
        - relationship_columns: # A list of columns that are used for the join.
            - left_column: str # The join column from the left table.
            - right_column: str # The join column from the right table.

For example, you can define an orders_to_user relationship between the bigquery-public-data.thelook_ecommerce.orders table and the bigquery-public-data.thelook_ecommerce.users table as follows:

- relationships:
    - relationship:
        - name: orders_to_user
        - description: >-
            Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
        - relationship_type: many-to-one
        - join_type: left
        - left_table: bigquery-public-data.thelook_ecommerce.orders
        - right_table: bigquery-public-data.thelook_ecommerce.users
        - relationship_columns:
            - left_column: user_id
            - right_column: id

Explain business terms with glossaries

The glossaries key lists definitions for business terms, jargon, and abbreviations that are relevant to your data and use case. By providing glossary definitions, you can help the agent accurately interpret and answer questions that use specific business language.

The following YAML code block shows the basic structure for the glossaries key:

- glossaries: # A list of definitions for glossary business terms, jargon, and abbreviations.
    - glossary: # The definition for a single glossary item.
        - term: str # The term, phrase, or abbreviation to define.
        - description: str # A description or definition of the term.
        - synonyms: list[str] # Alternative terms for the glossary entry.

For example, you can define terms like common business statuses and "OMPF" according to your specific business context as follows:

- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: shipped
        - description: Represents an order status where the order has been shipped to the customer.
    - glossary:
        - term: returned
        - description: Represents an order status where the customer has returned the order.
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment

Include further instructions with additional_descriptions

The additional_descriptions key lists any additional general instructions or context that is not covered elsewhere in the system instructions. By providing additional descriptions, you can help the agent better understand the context of your data and use case.

The following YAML code block shows the basic structure for the additional_descriptions key:

- additional_descriptions: # A list of any other general instructions or content.
    - text: str # Any additional general instructions or context not covered elsewhere.

For example, you can use the additional_descriptions key to provide information about your organization as follows:

- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
    - text: 'Orders can be of three categories: food, clothes, and electronics.'

Example: System instructions

The follow example shows sample system instructions for a fictitious sales analyst agent.

- system_instruction: >-
    You are an expert sales analyst for a fictitious ecommerce store. You will answer questions about sales, orders, and customer data. Your responses should be concise and data-driven.
- tables:
    - table:
        - name: bigquery-public-data.thelook_ecommerce.orders
        - description: Data for orders in The Look, a fictitious ecommerce store.
        - synonyms: sales
        - tags: 'sale, order, sales_order'
        - fields:
            - field:
                - name: order_id
                - description: The unique identifier for each customer order.
            - field:
                - name: user_id
                - description: The unique identifier for each customer.
            - field:
                - name: status
                - description: The current status of the order.
                - sample_values:
                    - complete
                    - shipped
                    - returned
            - field:
                - name: created_at
                - description: >-
                    The date and time at which the order was created in timestamp
                    format.
            - field:
                - name: returned_at
                - description: >-
                    The date and time at which the order was returned in timestamp
                    format.
            - field:
                - name: num_of_items
                - description: The total number of items in the order.
                - aggregations: 'sum, avg'
            - field:
                - name: earnings
                - description: The sales revenue for the order.
                - aggregations: 'sum, avg'
            - field:
                - name: cost
                - description: The cost for the items in the order.
                - aggregations: 'sum, avg'
        - measures:
            - measure:
                - name: profit
                - description: Raw profit (earnings minus cost).
                - exp: earnings - cost
                - synonyms: gains
        - golden_queries:
            - golden_query:
                - natural_language_query: How many orders are there?
                - sql_query: SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
            - golden_query:
                - natural_language_query: How many orders were shipped?
                - sql_query: >-
                    SELECT COUNT(*) FROM sqlgen-testing.thelook_ecommerce.orders
                    WHERE status = 'shipped'
        - golden_action_plans:
            - golden_action_plan:
                - natural_language_query: Show me the number of orders broken down by age group.
                - action_plan:
                    - step: >-
                        Run a SQL query that joins the table
                        sqlgen-testing.thelook_ecommerce.orders and
                        sqlgen-testing.thelook_ecommerce.users to get a
                        breakdown of order count by age group.
                    - step: >-
                        Create a vertical bar plot using the retrieved data,
                        with one bar per age group.
    - table:
        - name: bigquery-public-data.thelook_ecommerce.users
        - description: Data for users in The Look, a fictitious ecommerce store.
        - synonyms: customers
        - tags: 'user, customer, buyer'
        - fields:
            - field:
                - name: id
                - description: The unique identifier for each user.
            - field:
                - name: first_name
                - description: The first name of the user.
                - tag: person
                - sample_values: 'alex, izumi, nur'
            - field:
                - name: last_name
                - description: The first name of the user.
                - tag: person
                - sample_values: 'warmer, stilles, smith'
            - field:
                - name: age_group
                - description: The age demographic group of the user.
                - sample_values:
                    - 18-24
                    - 25-34
                    - 35-49
                    - 50+
            - field:
                - name: email
                - description: The email address of the user.
                - tag: contact
                - sample_values: '222larabrown@gmail.com, cloudysanfrancisco@gmail.com'
        - golden_queries:
            - golden_query:
                - natural_language_query: How many unique customers are there?
                - sql_query: >-
                    SELECT COUNT(DISTINCT id) FROM
                    bigquery-public-data.thelook_ecommerce.users
            - golden_query:
                - natural_language_query: How many users in the 25-34 age group have a cymbalgroup email address?
                - sql_query: >-
                    SELECT COUNT(DISTINCT id) FROM
                    bigquery-public-data.thelook_ecommerce.users WHERE users.age_group =
                    '25-34' AND users.email LIKE '%@cymbalgroup.com';
    - relationships:
        - relationship:
            - name: orders_to_user
            - description: >-
                Connects customer order data to user information with the user_id and id fields to allow an aggregated view of sales by customer demographics.
            - relationship_type: many-to-one
            - join_type: left
            - left_table: bigquery-public-data.thelook_ecommerce.orders
            - right_table: bigquery-public-data.thelook_ecommerce.users
            - relationship_columns:
                - left_column: user_id
                - right_column: id
- glossaries:
    - glossary:
        - term: complete
        - description: Represents an order status where the order has been completed.
        - synonyms: 'finish, done, fulfilled'
    - glossary:
        - term: shipped
        - description: Represents an order status where the order has been shipped to the customer.
    - glossary:
        - term: returned
        - description: Represents an order status where the customer has returned the order.
    - glossary:
        - term: OMPF
        - description: Order Management and Product Fulfillment
- additional_descriptions:
    - text: All the sales data pertains to The Look, a fictitious ecommerce store.
    - text: 'Orders can be of three categories: food, clothes, and electronics.'