[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-04。"],[[["\u003cp\u003eFor string comparisons, favor using \u003ccode\u003eLIKE\u003c/code\u003e over \u003ccode\u003eREGEXP_CONTAINS\u003c/code\u003e whenever possible, as it offers faster execution, especially if you don't require the full capabilities of regular expressions.\u003c/p\u003e\n"],["\u003cp\u003eIf your requirements allow for it, utilize approximate aggregation functions like \u003ccode\u003eAPPROX_COUNT_DISTINCT\u003c/code\u003e instead of \u003ccode\u003eCOUNT(DISTINCT)\u003c/code\u003e to improve query performance.\u003c/p\u003e\n"],["\u003cp\u003eReplace the \u003ccode\u003eNTILE\u003c/code\u003e function with \u003ccode\u003eAPPROX_QUANTILES\u003c/code\u003e when dealing with large datasets to avoid \u003ccode\u003eResources exceeded\u003c/code\u003e errors and improve query efficiency, recognizing that \u003ccode\u003eAPPROX_QUANTILES\u003c/code\u003e provides approximate results.\u003c/p\u003e\n"],["\u003cp\u003eWhen creating user-defined functions (UDFs), opt for SQL UDFs over JavaScript UDFs for simple calculations, as SQL UDFs are more readily optimized by the query optimizer; reserve JavaScript UDFs for more complex calculations.\u003c/p\u003e\n"],["\u003cp\u003eIt is better to create persistent UDFs, and make use of them in a centralized dataset to reduce the amount of resources needed, and optimize performance.\u003c/p\u003e\n"]]],[],null,["# Best practices for functions\n============================\n\nThis document describes how to optimize queries that use SQL functions.\n\nOptimize string comparison\n--------------------------\n\n**Best practice:** When possible, use `LIKE` instead of `REGEXP_CONTAINS`.\n\nIn BigQuery, you can use the\n[`REGEXP_CONTAINS`](/bigquery/docs/reference/standard-sql/string_functions#regexp_contains)\nfunction or the [`LIKE`](/bigquery/docs/reference/standard-sql/operators#comparison_operators)\noperator to compare strings. `REGEXP_CONTAINS` provides more functionality,\nbut also has a slower execution time. Using `LIKE` instead of `REGEXP_CONTAINS`\nis faster, particularly if you don't need the full power of regular expressions\nthat `REGEXP_CONTAINS` provides, for example wildcard matching.\n\nConsider the following use of the `REGEXP_CONTAINS` function: \n\n```googlesql\nSELECT\n dim1\nFROM\n `dataset.table1`\nWHERE\n REGEXP_CONTAINS(dim1, '.*test.*');\n```\n\nYou can optimize this query as follows: \n\n```googlesql\nSELECT\n dim1\nFROM\n `dataset.table`\nWHERE\n dim1 LIKE '%test%';\n```\n\nOptimize aggregation functions\n------------------------------\n\n**Best practice:** If your use case supports it, use an approximate aggregation\nfunction.\n\nIf the SQL aggregation function you're using has an equivalent approximation\nfunction, the approximation function yields faster query performance. For\nexample, instead of using\n[`COUNT(DISTINCT)`](/bigquery/docs/reference/standard-sql/aggregate_functions#count),\nuse\n[`APPROX_COUNT_DISTINCT`](/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_count_distinct).\nFor more information, see\n[approximate aggregation functions](/bigquery/docs/reference/standard-sql/approximate_aggregate_functions).\n\nYou can also use `HyperLogLog++` functions to do approximations (including custom\napproximate aggregations). For more information, see\n[HyperLogLog++ functions](/bigquery/docs/reference/standard-sql/hll_functions)\nin the GoogleSQL reference.\n\nConsider the following use of the `COUNT` function: \n\n```googlesql\nSELECT\n dim1,\n COUNT(DISTINCT dim2)\nFROM\n `dataset.table`\nGROUP BY 1;\n```\n\nYou can optimize this query as follows: \n\n```googlesql\nSELECT\n dim1,\n APPROX_COUNT_DISTINCT(dim2)\nFROM\n `dataset.table`\nGROUP BY 1;\n```\n\n### Optimize quantile functions\n\n**Best practice:** When possible, use `APPROX_QUANTILE` instead of `NTILE`.\n\nRunning a query that contains the\n[`NTILE`](/bigquery/docs/reference/standard-sql/numbering_functions#ntile)\nfunction can fail with a\n[`Resources exceeded`](/bigquery/troubleshooting-errors#resourcesExceeded)\nerror if there are too many\nelements to `ORDER BY` in a single partition, which causes data volume to grow.\nThe analytic window isn't partitioned, so the `NTILE` computation requires\na global `ORDER BY` for all rows in the table to be processed\nby a single worker/slot.\n\nTry using\n[`APPROX_QUANTILES`](/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_quantiles)\ninstead. This function allows the query to run more efficiently because it\ndoesn't require a global `ORDER BY` for all rows in the table.\n\nConsider the following use of the `NTILE` function: \n\n```googlesql\nSELECT\n individual_id,\n NTILE(nbuckets) OVER (ORDER BY sales desc) AS sales_third\nFROM\n `dataset.table`;\n```\n\nYou can optimize this query as follows: \n\n```googlesql\nWITH QuantInfo AS (\n SELECT\n o, qval\n FROM UNNEST((\n SELECT APPROX_QUANTILES(sales, nbuckets)\n FROM `dataset.table`\n )) AS qval\n WITH offset o\n WHERE o \u003e 0\n)\nSELECT\n individual_id,\n (SELECT\n (nbuckets + 1) - MIN(o)\n FROM QuantInfo\n WHERE sales \u003c= QuantInfo.qval\n ) AS sales_third\nFROM `dataset.table`;\n```\n\nThe optimized version gives similar but not identical results to the original\nquery, because `APPROX_QUANTILES`:\n\n1. Provides an approximate aggregation.\n2. Places the remainder values (the remainder of the number of rows divided by buckets) in a different way.\n\nOptimize UDFs\n-------------\n\n**Best practice:** Use SQL UDFs for simple calculations because the query optimizer can apply optimizations to SQL UDF definitions. Use Javascript UDFs for complex calculations that are not supported by SQL UDF.\n\nCalling a JavaScript UDF requires the instantiation of a subprocess.\nSpinning up this process and running the UDF directly impacts query performance.\nIf possible, use a\n[native (SQL) UDF](/bigquery/docs/user-defined-functions#sql-udf-structure)\ninstead.\n\n### Persistent UDFs\n\nIt is better to create persistent user-defined SQL and JavaScript functions\nin a centralized BigQuery dataset that can be invoked across\nqueries and in logical views, as opposed to creating and calling a UDF in code\neach time. Creating org-wide libraries of business logic within shared datasets\nhelps optimize performance and use fewer resources.\n\nThe following example shows how a temporary UDF is invoked in a query: \n\n```googlesql\nCREATE TEMP FUNCTION addFourAndDivide(x INT64, y INT64) AS ((x + 4) / y);\n\nWITH numbers AS\n (SELECT 1 as val\n UNION ALL\n SELECT 3 as val\n UNION ALL\n SELECT 4 as val\n UNION ALL\n SELECT 5 as val)\nSELECT val, addFourAndDivide(val, 2) AS result\nFROM numbers;\n```\n\nYou can optimize this query by replacing the temporary UDF with a\npersistent one: \n\n```googlesql\nWITH numbers AS\n (SELECT 1 as val\n UNION ALL\n SELECT 3 as val\n UNION ALL\n SELECT 4 as val\n UNION ALL\n SELECT 5 as val)\nSELECT val, `your_project.your_dataset.addFourAndDivide`(val, 2) AS result\nFROM numbers;\n```"]]