Aggregate data using a parallel loop

Stay organized with collections Save and categorize content based on your preferences.

Separate queries to a public BigQuery dataset each return the number of words in a document, or set of documents. A shared variable allows the count of the words to accumulate and be read after all the iterations complete.

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

YAML

main:
  params: [input]
  steps:
    - init:
        assign:
          - numWords: 0
          - corpuses:
              - sonnets
              - various
              - 1kinghenryvi
              - 2kinghenryvi
              - 3kinghenryvi
              - comedyoferrors
              - kingrichardiii
              - titusandronicus
              - tamingoftheshrew
              - loveslabourslost
    - runQueries:
        parallel:  # numWords is shared so it can be written within the parallel loop
          shared: [numWords]
          for:
            value: corpus
            in: ${corpuses}
            steps:
              - runQuery:
                  call: googleapis.bigquery.v2.jobs.query
                  args:
                    projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                    body:
                      useLegacySql: false
                      query: ${
                        "SELECT COUNT(DISTINCT word) FROM `bigquery-public-data.samples.shakespeare` " +
                        " WHERE corpus='" + corpus + "' "
                        }
                  result: query
              - add:
                  assign:
                    - numWords: ${numWords + int(query.rows[0].f[0].v)}  # first result is the count
    - done:
        return: ${numWords}

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.