[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-07-26。"],[[["\u003cp\u003eTable sampling allows querying random subsets of data from large BigQuery tables, providing a representative sample without the cost of scanning the entire table.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eTABLESAMPLE\u003c/code\u003e clause, when used in a query, selects a specified percentage of data blocks from a table, returning all rows within those selected blocks.\u003c/p\u003e\n"],["\u003cp\u003eUnlike \u003ccode\u003eLIMIT\u003c/code\u003e, \u003ccode\u003eTABLESAMPLE\u003c/code\u003e returns a random selection, and results are not cached, which means subsequent queries may return different subsets of data.\u003c/p\u003e\n"],["\u003cp\u003eTable sampling can be combined with other SQL clauses like \u003ccode\u003eWHERE\u003c/code\u003e and \u003ccode\u003eJOIN\u003c/code\u003e, as well as with parameterized queries to dynamically adjust the sampling percentage.\u003c/p\u003e\n"],["\u003cp\u003eTable sampling is supported for external tables, but may behave differently based on file type or structure, and it's subject to several limitations, including restrictions on views, subqueries, array scans, and row-level security.\u003c/p\u003e\n"]]],[],null,["# Table sampling\n==============\n\n|\n| **Preview**\n|\n|\n| This product or feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA products and features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nTable sampling lets you query random subsets of data from large\nBigQuery tables. Sampling returns a variety of records while avoiding\nthe costs associated with scanning and processing an entire table.\n\nUsing table sampling\n--------------------\n\nTo use table sampling in a query, include the\n[`TABLESAMPLE`](/bigquery/docs/reference/standard-sql/query-syntax#tablesample_operator)\nclause. For example, the following query selects approximately 10% of a table's\ndata: \n\n SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)\n\nUnlike the `LIMIT` clause, `TABLESAMPLE` returns a random subset of data from a\ntable. Also, BigQuery does not cache the results of queries that\ninclude a `TABLESAMPLE` clause, so the query might return different results each\ntime.\n\nYou can combine the `TABLESAMPLE` clause with other selection conditions. The\nfollowing example samples about 50% of the table and then applies a `WHERE`\nclause: \n\n SELECT *\n FROM dataset.my_table TABLESAMPLE SYSTEM (50 PERCENT)\n WHERE customer_id = 1\n\nThe next example combines a `TABLESAMPLE` clause with a `JOIN` clause: \n\n SELECT *\n FROM dataset.table1 T1 TABLESAMPLE SYSTEM (10 PERCENT)\n JOIN dataset.table2 T2 TABLESAMPLE SYSTEM (20 PERCENT) USING (customer_id)\n\nFor smaller tables, if you join two samples and none of the sampled rows meet\nthe join condition, then you might receive an empty result.\n\nYou can specify the percentage as a\n[query parameter](/bigquery/docs/parameterized-queries). The next example shows\nhow to pass the percentage to a query by using the bq command-line tool: \n\n bq query --use_legacy_sql=false --parameter=percent:INT64:29 \\\n 'SELECT * FROM `dataset.my_table` TABLESAMPLE SYSTEM (@percent PERCENT)`\n\nBigQuery tables are organized into data blocks. The `TABLESAMPLE`\nclause works by randomly selecting a percentage of data blocks from the table\nand reading all of the rows in the selected blocks. The sampling granularity\nis limited by the number of data blocks.\n\nTypically, BigQuery splits tables or table partitions into blocks\nif they are larger than about 1 GB. Smaller tables might consist of a single\ndata block. In that case, the `TABLESAMPLE` clause reads the entire table. If\nthe sampling percentage is greater than zero and the table is not empty, then\ntable sampling always returns some results.\n\nBlocks can be different sizes, so the exact fraction of rows that are sampled\nmight vary. If you want to sample individual rows, rather than data blocks, then\nyou can use a `WHERE rand() \u003c K` clause instead. However, this approach requires\nBigQuery to scan the entire table. To save costs but still\nbenefit from row-level sampling, you can combine both techniques.\n\nThe following example reads approximately 20% of the data blocks from storage\nand then randomly selects 10% of the rows in those blocks: \n\n SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (20 PERCENT)\n WHERE rand() \u003c 0.1\n\nExternal tables\n---------------\n\nYou can use the `TABLESAMPLE` clause with external tables that store data in a\ncollection of files. BigQuery samples a subset of the external\nfiles that the table references. For some file formats, BigQuery\ncan split individual files into blocks for sampling. Some external data, such as\ndata in Google Sheets, consists of a single file that is sampled as one block\nof data.\n\nSampling from the write-optimized storage\n-----------------------------------------\n\nIf you use table sampling with\n[streaming inserts](/bigquery/docs/streaming-data-into-bigquery), then\nBigQuery samples data from the write-optimized storage. In some cases,\nall the data in the write-optimized storage is represented as a single block. When that happens,\neither all the data in the write-optimized storage appears in the results, or none of it does.\n\nPartitioned and clustered tables\n--------------------------------\n\nPartitioning and clustering produce blocks where all rows within a specific\nblock have either the same partitioning key or have clustering attributes with\nclose values. Therefore, sample sets from these tables tend to be more biased\nthan sample sets from non-partitioned, non-clustered tables.\n\nLimitations\n-----------\n\n- A sampled table can only appear once in a query statement. This restriction includes tables that are referenced inside view definitions.\n- Sampling data from views is not supported.\n- Sampling the results of subqueries or table-valued function calls is not supported.\n- Sampling from an array scan, such as the result of calling the `UNNEST` operator, is not supported.\n- Sampling inside an `IN` subquery is not supported.\n- Sampling from tables with row-level security applied is not supported.\n\nTable sampling pricing\n----------------------\n\nIf you use [on-demand billing](/bigquery/pricing#on_demand_pricing), then you\nare charged for reading the data that is sampled. BigQuery does\nnot cache the results of a query that includes a `TABLESAMPLE` clause, so each\nexecution incurs the cost of reading the data from storage."]]