이 문서에서는 Feature Transform Engine이 특성 추출을 수행하는 방법을 설명합니다.
Feature Transform Engine에서 특성 선택과 특성 변환을 수행합니다.
특성 선택이 사용 설정되면 Feature Transform Engine에서 순위가 지정된 중요한 특성 집합을 만듭니다. 특성 변환이 사용 설정되면 Feature Transform Engine에서 모델 학습과 모델 서빙에 필요한 입력이 일관되도록 특성을 처리합니다. Feature Transform Engine은 단독으로 또는 테이블 형식 학습 워크플로와 함께 사용할 수 있습니다.
TensorFlow 및 비TensorFlow 프레임워크를 모두 지원합니다.
입력
Feature Transform Engine에 다음 입력을 제공해야 합니다.
원시 데이터(BigQuery 또는 CSV 데이터 세트)
데이터 분할 구성
특성 선택 구성
특성 변환 구성
출력
Feature Transform Engine에서 다음과 같은 출력을 생성합니다.
dataset_stats: 원시 데이터 세트를 설명하는 통계. 예를 들어 dataset_stats은 데이터 세트의 행 수를 제공합니다.
feature_importance: 특성의 중요도 점수. 특성 선택이 사용 설정되면 이 출력이 생성됩니다.
materialized_data: 학습 분할, 평가 분할, 테스트 분할이 포함된 데이터 분할 그룹의 변환된 버전
training_schema: 학습 데이터의 데이터 유형을 설명하는 OpenAPI 사양의 학습 데이터 스키마
instance_schema: 예측 데이터의 데이터 유형을 설명하는 OpenAPI 사양의 인스턴스 스키마
transform_output: 변환 메타데이터. 변환에 TensorFlow를 사용하는 경우 메타데이터에 TensorFlow 그래프가 포함됩니다.
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Feature engineering\n\n| **Preview**\n|\n|\n| This feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nThis document describes how Feature Transform Engine performs feature\nengineering.\nFeature Transform Engine performs feature selection and feature transformations.\nIf feature selection is enabled, Feature Transform Engine creates a ranked set of important\nfeatures. If feature transformations are enabled, Feature Transform Engine\nprocesses the features to ensure that the input for model training and model\nserving is consistent. Feature Transform Engine can be used on its own or together with any of\nthe [tabular training workflows](/vertex-ai/docs/tabular-data/tabular-workflows/overview).\nIt supports both TensorFlow and non-TensorFlow frameworks.\n\n\u003cbr /\u003e\n\nInputs\n------\n\nProvide the following inputs to Feature Transform Engine:\n\n- Raw data (BigQuery or CSV dataset).\n- Data split configuration.\n- Feature selection configuration.\n- Feature transformation configuration.\n\nOutputs\n-------\n\nFeature Transform Engine generates the following outputs:\n\n- `dataset_stats`: Statistics that describe the raw dataset. For example, `dataset_stats` gives the number of rows in the dataset.\n- `feature_importance`: The importance score of the features. This output is generated if [feature selection](#feature-selection) is enabled.\n- `materialized_data`, which is the transformed version of a data split group containing the training split, the evaluation split, and the test split.\n- `training_schema`: Training data schema in OpenAPI specification, which describes the data types of the training data.\n- `instance_schema`: Instance schema in OpenAPI specification, which describes the data types of the inference data.\n- `transform_output`: Metadata of the transformation. If you use TensorFlow for transformation, the metadata includes the TensorFlow graph.\n\nProcessing steps\n----------------\n\nFeature Transform Engine performs the following steps:\n\n- Generate [dataset splits](/vertex-ai/docs/tabular-data/data-splits) for training, evaluation, and testing.\n- Generate input dataset statistics `dataset_stats` that describe the raw dataset.\n- Perform [feature selection](#feature-selection).\n- Process the transform configuration using the dataset statistics, resolving automatic transformation parameters into manual transformation parameters.\n- [Transform raw features into engineered features](/vertex-ai/docs/datasets/data-types-tabular). Different transformations are done for different types of features.\n\nFeature selection\n-----------------\n\nThe main purpose of feature selection is to reduce the number of features used\nin the model. The reduced feature set captures most of the label's\ninformation in a more compact manner. Feature selection allows you to reduce the\ncost of training and serving models without significantly impacting model quality.\n\nIf you enable feature selection, Feature Transform Engine assigns an importance\nscore to each feature. You can choose to output the importance scores of the\nfull set of features or of a reduced subset of the most important features.\n\nVertex AI offers the following feature selection algorithms:\n\n- [Adjusted Mutual Information (AMI)](#ami)\n- [Conditional Mutual Information Maximization (CMIM)](#cmim)\n- [Joint Mutual Information Maximization (JMIM)](#jmim)\n- [Maximum Relevance Minimum Redundancy (MRMR)](#mrmr)\n\nNote that no feature selection algorithm always works best on all\ndatasets and for all purposes. If possible, run all the algorithms and combine\nthe results.\n\n### Adjusted Mutual Information (AMI)\n\nAMI is an adjustment of the Mutual Information (MI) score to account for chance.\nIt accounts for the fact that the MI is generally higher for two clusterings\nwith a larger number of clusters, regardless of whether there is actually more\ninformation shared.\n\nAMI is good at detecting the relevance of features and the label, but it is\ninsensitive to feature redundancy. Consider AMI if there are many\nfeatures (for example, more than 2000) and not much feature redundancy. It is\nfaster than the other algorithms described here, but it could pick up redundant\nfeatures.\n\n### Conditional Mutual Information Maximization (CMIM)\n\nCMIM is a greedy algorithm that chooses features iteratively based on conditional mutual information of candidate features with respect to selected features. In each iteration, it selects the feature that maximizes the minimum mutual information with the label that hasn't been captured by selected features yet.\n\nCMIM is robust in dealing with feature redundancy, and it works well in typical cases.\n\n### Joint Mutual Information Maximization (JMIM)\n\nJMIM is a greedy algorithm that is similar to CMIM. JMIM selects the feature that\nmaximizes the joint mutual information of the new one and pre-selected features\nwith the label, while CMIM takes redundancy more into account.\n\nJMIM is a high-quality feature selection algorithm.\n\n### Maximum Relevance Minimum Redundancy (MRMR)\n\nMRMR is a greedy algorithm that works iteratively. It is similar to CMIM. Each\niteration chooses the feature that maximizes relevance with respect to the label\nwhile minimizing pair-wise redundancy with respect to the selected features in\nprevious iterations.\n\nMRMR is a high-quality feature selection algorithm.\n\nWhat's next\n-----------\n\nAfter performing feature engineering, you can train a model for classification\nor regression:\n\n- Train a model with [End-to-End AutoML](/vertex-ai/docs/tabular-data/tabular-workflows/e2e-automl).\n- Train a model with [TabNet](/vertex-ai/docs/tabular-data/tabular-workflows/tabnet).\n- Train a model with [Wide \\& Deep](/vertex-ai/docs/tabular-data/tabular-workflows/wide-and-deep)."]]