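You can use the [`TRANSFORM` clause](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#transform) of the `CREATE MODEL` statement together with manual preprocessing functions to define custom data preprocessing. You can also use these manual preprocessing functions outside of the `TRANSFORM` clause.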
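
As a minimal sketch of both usages, the statements below first define preprocessing inside a `TRANSFORM` clause and then call one of the same functions in an ordinary query. The dataset `mydataset`, the table `rides`, the model name `fare_model`, and all column names are hypothetical and chosen only for illustration; the split points and the model type are likewise assumptions.

```sql
-- Hypothetical names: `mydataset.rides`, `mydataset.fare_model`, and the
-- columns trip_distance, passenger_count, and fare_amount are assumptions.
CREATE OR REPLACE MODEL `mydataset.fare_model`
  TRANSFORM(
    -- Assign each row to a bucket using fixed split points.
    ML.BUCKETIZE(trip_distance, [1, 5, 10]) AS distance_bucket,
    -- Scale using the minimum and maximum found across the training rows.
    ML.MIN_MAX_SCALER(passenger_count) OVER () AS passenger_count_scaled,
    -- Pass the label column through unchanged.
    fare_amount
  )
  OPTIONS (
    model_type = 'LINEAR_REG',
    input_label_cols = ['fare_amount']
  )
AS
SELECT trip_distance, passenger_count, fare_amount
FROM `mydataset.rides`;

-- The same function used outside of the TRANSFORM clause, in a plain query.
SELECT
  trip_distance,
  ML.BUCKETIZE(trip_distance, [1, 5, 10]) AS distance_bucket
FROM `mydataset.rides`;
```

Only the columns listed in the `TRANSFORM` clause are passed to training, which is why the label column is listed explicitly in this sketch.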
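If you want to decouple data preprocessing from model training, you can create a [transform-only model](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-transform) that only performs data transformations by using the `TRANSFORM` clause.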
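
A transform-only model might be declared roughly as follows. This is a sketch under stated assumptions: the dataset, table, model, and column names are hypothetical, and the two preprocessing functions were picked arbitrarily to show the shape of the statement.

```sql
-- Hypothetical names: `mydataset.rides`, `mydataset.rides_preprocessor`,
-- and the columns trip_distance and payment_type are assumptions.
CREATE OR REPLACE MODEL `mydataset.rides_preprocessor`
  TRANSFORM(
    -- Standardize a numerical column using statistics from all input rows.
    ML.STANDARD_SCALER(trip_distance) OVER () AS trip_distance_scaled,
    -- Encode a string column as integer labels.
    ML.LABEL_ENCODER(payment_type) OVER () AS payment_type_encoded
  )
  OPTIONS (model_type = 'TRANSFORM_ONLY')
AS
SELECT trip_distance, payment_type
FROM `mydataset.rides`;
```

Because the model type is `TRANSFORM_ONLY`, the statement stores the transformations without training a predictive model.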
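You can use the [`ML.TRANSFORM` function](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-transform) to increase the transparency of feature preprocessing. This function returns the preprocessed data from a model's `TRANSFORM` clause, so that you can see the actual training data that goes into model training, as well as the actual prediction data that goes into model serving.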
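
For instance, inspecting the preprocessed output of a model could look like the query below. The model `mydataset.fare_model` and the table `mydataset.rides` are hypothetical; the sketch assumes the model was created with a `TRANSFORM` clause that reads the three selected columns.

```sql
-- Hypothetical names: `mydataset.fare_model` and `mydataset.rides` are
-- assumptions. The input query must supply the columns that the model's
-- TRANSFORM clause reads.
SELECT *
FROM ML.TRANSFORM(
  MODEL `mydataset.fare_model`,
  (
    SELECT trip_distance, passenger_count, fare_amount
    FROM `mydataset.rides`
    LIMIT 10
  )
);
```

The result contains the columns produced by the model's `TRANSFORM` clause, so it can be compared side by side with the raw input rows.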
For information about feature preprocessing support in BigQuery ML, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).

For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey).

Types of preprocessing functions
--------------------------------

There are several types of manual preprocessing functions:

- Scalar functions operate on a single row. For example, [`ML.BUCKETIZE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-bucketize).
- Table-valued functions operate on all rows and output a table. For example, [`ML.FEATURES_AT_TIME`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-feature-time).
- Analytic functions operate on all rows, and output the result for each
  row based on the statistics collected across all rows. For example,
  [`ML.QUANTILE_BUCKETIZE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-quantile-bucketize).

  You must always use an empty `OVER()` clause with ML analytic functions.

  When you use ML analytic functions inside the `TRANSFORM` clause
  during training, the same statistics are automatically applied to
  the input in prediction.

The following sections describe the available preprocessing functions.

### General functions

Use the following function on string or numerical expressions to do data cleanup:

- [`ML.IMPUTER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-imputer)

### Numerical functions

Use the following functions on numerical expressions to regularize data:

- [`ML.BUCKETIZE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-bucketize)
- [`ML.MAX_ABS_SCALER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-max-abs-scaler)
- [`ML.MIN_MAX_SCALER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-min-max-scaler)
- [`ML.NORMALIZER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-normalizer)
- [`ML.POLYNOMIAL_EXPAND`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-polynomial-expand)
- [`ML.QUANTILE_BUCKETIZE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-quantile-bucketize)
- [`ML.ROBUST_SCALER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-robust-scaler)
- [`ML.STANDARD_SCALER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-standard-scaler)

### Categorical functions

Use the following functions to categorize data:

- [`ML.FEATURE_CROSS`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-feature-cross)
- [`ML.HASH_BUCKETIZE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-hash-bucketize)
- [`ML.LABEL_ENCODER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-label-encoder)
- [`ML.MULTI_HOT_ENCODER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-multi-hot-encoder)
- [`ML.ONE_HOT_ENCODER`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-one-hot-encoder)

### Text functions

Use the following functions on text string expressions:

- [`ML.NGRAMS`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ngrams)
- [`ML.BAG_OF_WORDS`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-bag-of-words)
- [`ML.TF_IDF`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-tf-idf)

### Image functions

Use the following functions on image data:

- [`ML.CONVERT_COLOR_SPACE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-convert-color-space)
- [`ML.CONVERT_IMAGE_TYPE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-convert-image-type)
- [`ML.DECODE_IMAGE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-decode-image)
- [`ML.RESIZE_IMAGE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-resize-image)

Known limitations
-----------------

- BigQuery ML supports both automatic preprocessing and manual preprocessing in [model export](/bigquery/docs/exporting-models). See the [supported data types](/bigquery/docs/exporting-models#export-transform-types) and [functions](/bigquery/docs/exporting-models#export-transform-functions) for exporting models trained with the [BigQuery ML `TRANSFORM` clause](/bigquery/docs/bigqueryml-transform).

Last updated: 2025-09-04 (UTC)