[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-08-18 (世界標準時間)。"],[],[],null,["# Prepare training data\n\nThis page shows you how to prepare your tabular data for training classification\nand regression models in Vertex AI. The quality of your training data\nimpacts the effectiveness of the models you create.\n\nThis document covers the following topics:\n\n1. [Data structure requirements](#data-structure)\n2. [Prepare your import source](#import-source)\n3. [Add weights to your training data](#weight)\n\nBy default, Vertex AI uses a\n[random split](/vertex-ai/docs/tabular-data/data-splits#classification-random)\nalgorithm to separate your data into three data splits. Vertex AI\nrandomly selects 80% of your data rows for the training set, 10% for the\nvalidation set, and 10% for the test set. Alternatively, you can use a\n[manual split](/vertex-ai/docs/tabular-data/data-splits#classification-manual)\nor a [chronological split](/vertex-ai/docs/tabular-data/data-splits#classification-time),\nbut this requires you to prepare a data split column or a time column.\n[Learn more](/vertex-ai/docs/tabular-data/data-splits) about data splits.\n\nData structure requirements\n---------------------------\n\nYour training data must conform to the following basic requirements:\n\nPrepare your import source\n--------------------------\n\nYou can provide model training data to Vertex AI in two formats:\n\n- BigQuery tables\n- Comma-separated values (CSV)\n\nWhich source you use depends on how your data is stored, and the size and\ncomplexity of your data. If your dataset is small, and you don't need more\ncomplex data types, CSV might be easier. 
For larger datasets that include arrays and structs, use BigQuery.

### BigQuery

Your BigQuery table or view must conform to the [BigQuery location requirements](/vertex-ai/docs/general/locations#bq-locations).

If your BigQuery table or view is in a different project than the project where you're creating your Vertex AI dataset, or your BigQuery table or view is backed by an external data source, add one or more roles to the Vertex AI Service Agent. See [Role addition requirements for BigQuery](/vertex-ai/docs/general/access-control#bq-roles).

You do not need to specify a schema for your BigQuery table. Vertex AI automatically infers the schema for your table when you import your data.

Your BigQuery URI (specifying the location of your training data) must conform to the following format:

```
bq://<project_id>.<dataset_id>.<table_id>
```

The URI cannot contain any other special characters.

For information about BigQuery data types and how they map to Vertex AI, see [BigQuery tables](/vertex-ai/docs/datasets/data-types-tabular#bq). For more information about using BigQuery external data sources, see [Introduction to external data sources](/bigquery/external-data-sources).

### CSV

CSV files can be in Cloud Storage or on your local computer. They must conform to the following requirements:

- The first line of the first file must be a header containing the names of the columns. If the first row of a subsequent file is the same as the header, it is also treated as a header; otherwise, it is treated as data.
- Column names can include any alphanumeric character or an underscore (_). The column name cannot begin with an underscore.
- Each file must not be larger than 10 GB. You can include multiple files, up to a combined maximum of 100 GB.
- The delimiter must be a comma (",").

You do not need to specify a schema for your CSV data.
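The per-file CSV requirements above can be checked locally before you upload. A minimal sketch, assuming hypothetical file and column names (Vertex AI performs its own validation on import):

```python
import csv
import os
import re

MAX_FILE_BYTES = 10 * 1024**3  # each file must be at most 10 GB
# Alphanumeric characters or underscores, but no leading underscore.
COLUMN_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_]*$")

def check_csv(path):
    """Return a list of problems found in the CSV file (empty means OK)."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 10 GB")
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    for name in header:
        if not COLUMN_NAME.match(name):
            problems.append(f"invalid column name: {name!r}")
    return problems
```

If you split your data across multiple files, run the check on each one and also confirm that their combined size stays under 100 GB.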
Vertex AI automatically infers the schema for your table when you import your data, and uses the header row for column names.

For more information about CSV file format and data types, see [CSV files](/vertex-ai/docs/datasets/data-types-tabular#csv).

If you import your data from Cloud Storage, it must be in a bucket that meets the following requirements:

- It conforms to the [Vertex AI bucket requirements](/vertex-ai/docs/general/locations#buckets).
- If the bucket is not in the same project as Vertex AI, add one or more roles to the Vertex AI Service Agent. See [Role addition requirements for Cloud Storage](/vertex-ai/docs/general/access-control#storage-roles).

If you import your data from your local computer, you must have a Cloud Storage bucket that meets the following requirements:

- It conforms to the [Vertex AI bucket requirements](/vertex-ai/docs/general/locations#buckets).
- If the bucket is not in the same project as Vertex AI, add one or more roles to the Vertex AI Service Agent. See [Role addition requirements for Cloud Storage](/vertex-ai/docs/general/access-control#storage-roles).

Vertex AI uses this bucket as a staging area before importing your data.

Add weights to your training data
---------------------------------

By default, Vertex AI weighs each row of your training data equally. For training purposes, no row is considered more important than another.

Sometimes, you might want some rows to have more importance for training. For example, if you use spending data, you might want the data associated with higher spenders to have a larger impact on the model. If you want to avoid missing a specific outcome, weight rows with that outcome more heavily.

Give rows a relative weight by adding a weight column to your dataset. The weight column must be a numeric column with values from 0 to 10,000. Higher values indicate that the row is more important when training the model.
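For example, a weight column derived from spending data could be computed as in this sketch. The column name and the scaling rule are hypothetical; any numeric scheme that maps rows into the 0 to 10,000 range works.

```python
def add_weight_column(rows, spend_key="spend", cap=10_000):
    """Scale each row's spend linearly into the allowed 0-10,000 weight range,
    so that higher spenders have a larger impact on training (illustrative only)."""
    max_spend = max(float(r[spend_key]) for r in rows)
    for r in rows:
        r["weight"] = round(float(r[spend_key]) / max_spend * cap)
    return rows

rows = [{"spend": "10"}, {"spend": "50"}, {"spend": "100"}]
add_weight_column(rows)
print([r["weight"] for r in rows])  # [1000, 5000, 10000]
```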
A weight of 0 causes the row to be ignored. If you include a weight column, it must contain a value for every row.

Later, when you train your model, specify this column as the `Weight` column.

Custom weighting schemes are used only for training the model; they do not affect the test set used for model evaluation.

What's next
-----------

- [Create your dataset](/vertex-ai/docs/tabular-data/classification-regression/create-dataset).
- Learn about [best practices for creating tabular training data](/vertex-ai/docs/tabular-data/bp-tabular).
- Learn how [Vertex AI works with different types of tabular data](/vertex-ai/docs/datasets/data-types-tabular).