[[["わかりやすい","easyToUnderstand","thumb-up"],["問題の解決に役立った","solvedMyProblem","thumb-up"],["その他","otherUp","thumb-up"]],[["わかりにくい","hardToUnderstand","thumb-down"],["情報またはサンプルコードが不正確","incorrectInformationOrSampleCode","thumb-down"],["必要な情報 / サンプルがない","missingTheInformationSamplesINeed","thumb-down"],["翻訳に関する問題","translationIssue","thumb-down"],["その他","otherDown","thumb-down"]],["最終更新日 2025-09-04 UTC。"],[[["\u003cp\u003eDenormalizing data using nested and repeated fields in BigQuery can significantly improve query performance by reducing the need for table joins and data shuffling.\u003c/p\u003e\n"],["\u003cp\u003eNesting data with \u003ccode\u003eSTRUCT\u003c/code\u003e types allows for representing foreign entities inline, while repeated data with \u003ccode\u003eARRAY\u003c/code\u003e types preserves one-to-many relationships without high cardinality, both improving query performance.\u003c/p\u003e\n"],["\u003cp\u003eUsing nested and repeated fields can reduce computation overhead from operations like \u003ccode\u003eGROUP BY\u003c/code\u003e, which can occur when data is completely flattened and not using nested fields.\u003c/p\u003e\n"],["\u003cp\u003eNested and repeated data, represented as an \u003ccode\u003eARRAY\u003c/code\u003e of \u003ccode\u003eSTRUCT\u003c/code\u003es, offers a way to maintain data relationships and avoid the performance penalties associated with extensive network communication when data is completely flattened.\u003c/p\u003e\n"],["\u003cp\u003eThe provided example demonstrated that using nested fields can result in a query taking less time to run, as well as processing less data, compared to the version that does not use them.\u003c/p\u003e\n"]]],[],null,["# Use nested and repeated fields\n==============================\n\nBigQuery can be used with many different data modelling methods, and\ngenerally provides high performance across many data model methodologies. To further\ntune a data model for performance, one method you might consider is data denormalization,\nwhich means adding columns of data to a single table to reduce or remove table joins.\n\n**Best practice:** Use nested and repeated fields to denormalize data\nstorage and increase query performance.\n\nDenormalization is a common strategy for increasing read performance for\nrelational datasets that were previously normalized. The recommended way to\ndenormalize data in BigQuery is to use nested and repeated\nfields. It's best to use this strategy when the relationships are hierarchical\nand frequently queried together, such as in parent-child relationships.\n\nThe storage savings from using normalized data has less of an effect in modern\nsystems. Increases in storage costs are worth the performance gains of using\ndenormalized data. Joins require data coordination (communication\nbandwidth). Denormalization localizes the data to individual\n[slots](/bigquery/docs/slots), so that execution can be done in parallel.\n\nTo maintain relationships while denormalizing your data, you can use nested\nand repeated fields instead of completely flattening your data. When relational\ndata is completely flattened, network communication (shuffling) can negatively\nimpact query performance.\n\nFor example, denormalizing an orders schema without using nested and repeated\nfields might require you to group the data by a field like `order_id`\n(when there is a one-to-many relationship). Because of the shuffling involved,\ngrouping the data is less effective than denormalizing the data by using\nnested and repeated fields.\n\nIn some circumstances, denormalizing your data and using nested and repeated\nfields doesn't result in increased performance. For example, star schemas are\ntypically optimized schemas for analytics, and as a result, performance might\nnot be significantly different if you attempt to denormalize further.\n\nUsing nested and repeated fields\n--------------------------------\n\nBigQuery doesn't require a completely flat denormalization. You\ncan use nested and repeated fields to maintain relationships.\n\n- Nesting data (`STRUCT`)\n\n - Nesting data lets you represent foreign entities inline.\n - Querying nested data uses \"dot\" syntax to reference leaf fields, which is similar to the syntax using a join.\n - Nested data is represented as a [`STRUCT` type](/bigquery/docs/reference/standard-sql/data-types#struct_type) in GoogleSQL.\n- Repeated data (`ARRAY`)\n\n - Creating a field of type `RECORD` with the mode set to `REPEATED` lets you preserve a one-to-many relationship inline (so long as the relationship isn't high cardinality).\n - With repeated data, shuffling is not necessary.\n - Repeated data is represented as an `ARRAY`. You can use an [`ARRAY` function](/bigquery/docs/reference/standard-sql/array_functions) in GoogleSQL when you query the repeated data.\n- Nested and repeated data (`ARRAY` of `STRUCT`s)\n\n - Nesting and repetition complement each other.\n - For example, in a table of transaction records, you could include an array of line item `STRUCT`s.\n\nFor more information, see\n[Specify nested and repeated columns in table schemas](/bigquery/docs/nested-repeated).\n\nFor more information about denormalizing data, see\n[Denormalization](/bigquery/docs/migration/schema-data-overview#denormalization).\n\nExample\n-------\n\nConsider an `Orders` table with a row for each line item sold:\n\nIf you wanted to analyze data from this table, you would need to use a\n`GROUP BY` clause, similar to the following: \n\n```googlesql\nSELECT COUNT (Item_Name)\nFROM Orders\nGROUP BY Order_Id;\n```\n\nThe `GROUP BY` clause involves additional computation overhead, but this can be\navoided by nesting repeated data. You can avoid using a `GROUP BY` clause\nby creating a table with one order per row, where the order line items are in a\nnested field:\n\nIn BigQuery, you typically specify a nested schema as an `ARRAY`\nof `STRUCT` objects. You use the\n[`UNNEST` operator](/bigquery/docs/reference/standard-sql/query-syntax#unnest_operator)\nto [flatten the nested data](/bigquery/docs/arrays#flattening_arrays),\nas shown in the following query: \n\n```googlesql\nSELECT *\nFROM UNNEST(\n [\n STRUCT('001' AS Order_Id, ['A1', 'B1'] AS Item_Name),\n STRUCT('002' AS Order_Id, ['A1', 'C1'] AS Item_Name)\n ]\n);\n```\n\nThis query yields results similar to the following:\n\nIf this data wasn't nested, you could potentially have several rows for\neach order, one for each item sold in that order, which would result in a\nlarge table and an expensive `GROUP BY` operation.\n\nExercise\n--------\n\nYou can see the performance difference in queries that use nested fields as\ncompared to those that don't by following the steps in this section.\n\n1. Create a table based on the `bigquery-public-data.stackoverflow.comments`\n public dataset:\n\n ```googlesql\n CREATE OR REPLACE TABLE `\u003cvar translate=\"no\"\u003ePROJECT\u003c/var\u003e.\u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e.stackoverflow`\n AS (\n SELECT\n user_id,\n post_id,\n creation_date\n FROM\n `bigquery-public-data.stackoverflow.comments`\n );\n ```\n2. Using the `stackoverflow` table, run the following query to see the\n earliest comment for each user:\n\n ```googlesql\n SELECT\n user_id,\n ARRAY_AGG(STRUCT(post_id, creation_date AS earliest_comment) ORDER BY creation_date ASC LIMIT 1)[OFFSET(0)].*\n FROM\n `\u003cvar translate=\"no\"\u003ePROJECT\u003c/var\u003e.\u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e.stackoverflow`\n GROUP BY user_id\n ORDER BY user_id ASC;\n ```\n\n This query takes about 25 seconds to run and processes 1.88 GB of data.\n3. Create a second table with identical data that creates a `comments` field\n using a `STRUCT` type to store the `post_id` and `creation_date` data, instead\n of two individual fields:\n\n ```googlesql\n CREATE OR REPLACE TABLE `\u003cvar translate=\"no\"\u003ePROJECT\u003c/var\u003e.\u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e.stackoverflow_nested`\n AS (\n SELECT\n user_id,\n ARRAY_AGG(STRUCT(post_id, creation_date) ORDER BY creation_date ASC) AS comments\n FROM\n `bigquery-public-data.stackoverflow.comments`\n GROUP BY user_id\n );\n ```\n4. Using the `stackoverflow_nested` table, run the following query\n to see the earliest comment for each user:\n\n ```googlesql\n SELECT\n user_id,\n (SELECT AS STRUCT post_id, creation_date as earliest_comment FROM UNNEST(comments) ORDER BY creation_date ASC LIMIT 1).*\n FROM\n `\u003cvar translate=\"no\"\u003ePROJECT\u003c/var\u003e.\u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e.stackoverflow_nested`\n ORDER BY user_id ASC;\n ```\n\n This query takes about 10 seconds to run and processes 1.28 GB of data.\n5. [Delete](/bigquery/docs/samples/bigquery-delete-table) the `stackoverflow`\n and `stackoverflow_nested` tables when you are finished with them."]]