Cloud Data Loss Prevention (Cloud DLP) is now part of Sensitive Data Protection. The API name remains the same: Cloud Data Loss Prevention API (DLP API). For information about the services that make up Sensitive Data Protection, see the Sensitive Data Protection overview.
[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[],[],null,["# Supported file types and scanning modes\n\nThis page lists the types of files that Sensitive Data Protection can scan\nand describes the scanning modes that Sensitive Data Protection uses to\nanalyze files.\n\nSupported file types in inspection and de-identification operations\n-------------------------------------------------------------------\n\nThe following table shows the types of files that Sensitive Data Protection can\ninspect and transform (*de-identify*).\n\n\nSensitive Data Protection relies on file extensions and media (MIME) types to identify the types\nof the files to be scanned and the [scanning modes](/sensitive-data-protection/docs/supported-file-types#scanning_modes) to\napply. For example, Sensitive Data Protection scans a `.txt` file in\nplain text mode, even if the file is structured as a CSV file, which is normally\nscanned in structured parsing mode.\n\nSupported file clusters in discovery operations\n-----------------------------------------------\n\nDuring discovery, Sensitive Data Protection organizes the detected files into\n*file clusters*. These clusters are groups of similar file types. The following\ntable shows the supported file clusters and file extensions. Not all detected\nfiles are scannable.\n\nFiles might move between file clusters as Sensitive Data Protection adds\nsupport for more file clusters. As scanning support expands, the discovery\nservice might begin to scan files that were previously not scanned. You are\nbilled as described in [Discovery\npricing](/sensitive-data-protection/pricing#data_profiling_pricing).\n\nUnrecognized file types in Cloud Storage\n----------------------------------------\n\nIf a file is not recognized during a\n[storage scan](/sensitive-data-protection/docs/inspecting-storage), the system will, by default, scan\nit as a binary file. It attempts to convert the content to UTF_8, and then scans\nit as plain text.\n\nIf a file is not recognized during a\n[discovery scan](/sensitive-data-protection/docs/data-profiles), the system\ndoesn't scan it.\n\nIf you have a collection of files you want to skip because Sensitive Data Protection\ndoesn't recognize them, you can specify an exclusion list using\n[`CloudStorageOptions.file_set.regex_file_set.exclude_regex`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageRegexFileSet).\n\nLimits on bytes scanned per file\n--------------------------------\n\nIn general, you can limit the number of bytes scanned per file. In the\nGoogle Cloud console, you do so by [turning on\nsampling](/sensitive-data-protection/docs/inspecting-storage#dlp-inspect-storage-console). In the\nCloud Data Loss Prevention API, you set the\n[`bytes_limit_per_file`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageOptions.FIELDS.bytes_limit_per_file)\nor [`bytesLimitPerFilePercent`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageOptions.FIELDS.bytes_limit_per_file_percent)\nfield.\n\nSampling isn't supported in OCR and intelligent parsing modes. 
Limits on bytes scanned per file
--------------------------------

In general, you can limit the number of bytes scanned per file. In the
Google Cloud console, you do so by
[turning on sampling](/sensitive-data-protection/docs/inspecting-storage#dlp-inspect-storage-console).
In the Cloud Data Loss Prevention API, you set the
[`bytesLimitPerFile`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageOptions.FIELDS.bytes_limit_per_file)
or
[`bytesLimitPerFilePercent`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageOptions.FIELDS.bytes_limit_per_file_percent)
field.

Sampling isn't supported in OCR and intelligent parsing modes. That is, when the
following file types are scanned in OCR or intelligent document parsing mode,
Sensitive Data Protection ignores any settings that you apply to limit the bytes
scanned per file:

- Image
- Microsoft Excel
- Microsoft PowerPoint
- Microsoft Word
- PDF

If you scan these files in binary mode, the limits apply.

Scanning modes
--------------

Each scanning mode provides additional
[location details](https://cloud.google.com/sensitive-data-protection/docs/reference/rest/v2/InspectResult#Location)
in [inspection findings](https://cloud.google.com/sensitive-data-protection/docs/reference/rest/v2/InspectResult#Finding).

Scanning structured files in structured parsing mode
----------------------------------------------------

When you scan a structured file---such as an Avro, CSV, or TSV
file---Sensitive Data Protection attempts to scan the file in
[structured parsing mode](/sensitive-data-protection/docs/supported-file-types#structured-parsing).
This scanning mode provides better detection quality than
[binary scanning](/sensitive-data-protection/docs/supported-file-types#binary)
because structured parsing searches for correlations between rows and columns
in the structured data. Findings are returned with additional metadata that
indicates the location of the finding, including the
[`fieldId`](/sensitive-data-protection/docs/reference/rest/v2/InspectResult#recordlocation).

However, in the following cases, Sensitive Data Protection might revert to
binary scanning mode, which doesn't include the enhancements of structured
parsing mode:

- The file or header is corrupted.
- The inspection job configuration has size limits---such as
  [`bytesLimitPerFile` and `bytesLimitPerFilePercent`](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#cloudstorageoptions)---that
  are too small. For example, if the `bytesLimitPerFile` limit isn't large
  enough to include a full block header and at least one row of valid data,
  then Sensitive Data Protection might scan that file in binary scanning mode.

The selection of data that is scanned depends on whether
[sampling](/sensitive-data-protection/docs/reference/rest/v2/InspectJobConfig#CloudStorageOptions.FIELDS.sample_method)
is set to start from the top of the file or from a random position.

For example, suppose that you have an Avro file that has 50 KB block headers
and 2 MB data blocks. In general, starting the sample from the top ensures
that the block header is always included in the sample that
Sensitive Data Protection takes. If you start sampling from a random position
in the file and the sample size is smaller than a data block, there's a chance
that the block header isn't included in the sample. In this example,
increasing the sample size (specified by `bytesLimitPerFile` or
`bytesLimitPerFilePercent`) to 2.05 MB helps prevent the inspection from
reverting to binary parsing mode.

![Example: When a sample size is too small, the inspection might not include the block header.](/static/sensitive-data-protection/docs/images/random-sampling-avro.svg)
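As a sketch of that Avro scenario in the DLP API, the following Python snippet
samples from the top of each file and sizes the per-file byte limit to cover
one 50 KB block header plus one 2 MB data block. The project ID, bucket, and
infoType are hypothetical placeholders.

```python
from google.cloud import dlp_v2

# A sketch of the Avro example above: sample from the top of each file and
# size bytes_limit_per_file to cover one 50 KB block header plus one 2 MB
# data block (~2.05 MB). The project and bucket are hypothetical.
dlp = dlp_v2.DlpServiceClient()

cloud_storage_options = {
    "file_set": {"url": "gs://example-avro-bucket/**"},
    "file_types": ["AVRO"],
    # TOP keeps the block header in the sample; RANDOM_START might miss it.
    "sample_method": "TOP",
    # 50 KB header + 2 MB data block = 51,200 + 2,097,152 bytes.
    "bytes_limit_per_file": 51_200 + 2_097_152,
}

job = dlp.create_dlp_job(
    request={
        "parent": "projects/example-project/locations/global",
        "inspect_job": {
            "storage_config": {"cloud_storage_options": cloud_storage_options},
            "inspect_config": {"info_types": [{"name": "PERSON_NAME"}]},
        },
    }
)
print(f"Created job: {job.name}")
```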