并行运行多个 BigQuery 查询作业

并行运行多个 BigQuery 查询作业,与依次串行运行作业相比,性能得到了提升。

深入探索

如需查看包含此代码示例的详细文档,请参阅以下内容:

代码示例

YAML

main:
    steps:
    - init:
        assign:
            - results : {} # result from each iteration keyed by table name
            - tables:
                - 201201h
                - 201202h
                - 201203h
                - 201204h
                - 201205h
    - runQueries:
        parallel:
            shared: [results]
            for:
                value: table
                in: ${tables}
                steps:
                - logTable:
                    call: sys.log
                    args:
                        text: ${"Running query for table " + table}
                - runQuery:
                    call: googleapis.bigquery.v2.jobs.query
                    args:
                        projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
                        body:
                            useLegacySql: false
                            useQueryCache: false
                            timeoutMs: 30000
                            # Find top 100 titles with most views on Wikipedia
                            query: ${
                                "SELECT TITLE, SUM(views)
                                FROM `bigquery-samples.wikipedia_pageviews." + table + "`
                                WHERE LENGTH(TITLE) > 10
                                GROUP BY TITLE
                                ORDER BY SUM(VIEWS) DESC
                                LIMIT 100"
                                }
                    result: queryResult
                - returnResult:
                    assign:
                        # Return the top title from each table
                        - results[table]: {}
                        - results[table].title: ${queryResult.rows[0].f[0].v}
                        - results[table].views: ${queryResult.rows[0].f[1].v}
    - returnResults:
        return: ${results}

后续步骤

如需搜索和过滤其他 Google Cloud 产品的代码示例,请参阅 Google Cloud 示例浏览器