`CloudStorageRegexFileSet(mapping=None, *, ignore_unknown_fields=False, **kwargs)`

Message representing a set of files in a Cloud Storage bucket. Regular expressions are used to allow fine-grained control over which files in the bucket to include.
Included files are those that match at least one item in `include_regex` and do not match any items in `exclude_regex`. Note that a file that matches items from both lists will not be included. For a match to occur, the entire file path (i.e., everything in the URL after the bucket name) must match the regular expression.
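The inclusion rule can be sketched in Python as follows. This is a minimal illustration, not the service's implementation: it assumes Python's `re.fullmatch` is an adequate stand-in for RE2 full matching (true for simple patterns like the ones on this page, although RE2 and Python's `re` differ in supported syntax), and the helpers `object_path` and `is_included` are hypothetical names, not part of the client library.

```python
import re
from typing import Sequence


def object_path(url: str, bucket_name: str) -> str:
    """Hypothetical helper: return everything in a gs:// URL after the bucket name."""
    prefix = f"gs://{bucket_name}/"
    if not url.startswith(prefix):
        raise ValueError(f"{url!r} is not in bucket {bucket_name!r}")
    return url[len(prefix):]


def is_included(path: str, include_regex: Sequence[str], exclude_regex: Sequence[str]) -> bool:
    """Hypothetical helper: matches some include pattern and no exclude pattern."""
    # re.fullmatch (not re.search) because each pattern must match the entire path.
    # An empty include_regex matches every file (equivalent to [".*"]).
    included = not include_regex or any(re.fullmatch(p, path) for p in include_regex)
    excluded = any(re.fullmatch(p, path) for p in exclude_regex)
    # A path that matches both lists is left out.
    return included and not excluded


path = object_path("gs://mybucket/data/report.csv", "mybucket")
print(path)                                        # data/report.csv
print(is_included(path, ["data/.*"], []))          # True
print(is_included(path, ["report"], []))           # False: "report" matches only part of the path
print(is_included(path, ["data/.*"], [r".*\.csv"]))  # False: the path matches both lists
```

Using a full match rather than a substring search is what makes a pattern such as `report` select nothing here, even though the path contains that text.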
For example, given the input
`{bucket_name: "mybucket", include_regex: ["directory1/.*"], exclude_regex: ["directory1/excluded.*"]}`:
- `gs://mybucket/directory1/myfile` will be included
- `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches across `/`)
- `gs://mybucket/directory0/directory1/myfile` will not be included (the full path doesn't match any items in `include_regex`)
- `gs://mybucket/directory1/excludedfile` will not be included (the path matches an item in `exclude_regex`)
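The four outcomes listed for this example can be checked with a short Python sketch. It again uses Python's `re.fullmatch` as a stand-in for RE2 full matching (an assumption that holds for these simple patterns), and `is_included` is a hypothetical helper rather than anything provided by the client library.

```python
import re

BUCKET = "mybucket"
INCLUDE = ["directory1/.*"]
EXCLUDE = ["directory1/excluded.*"]


def is_included(url: str) -> bool:
    """Hypothetical helper: True if the gs:// URL belongs to the example file set."""
    prefix = f"gs://{BUCKET}/"
    if not url.startswith(prefix):
        return False  # objects in other buckets are never part of this set
    path = url[len(prefix):]  # everything after the bucket name
    # An empty INCLUDE would match everything; here it holds one pattern.
    included = not INCLUDE or any(re.fullmatch(p, path) for p in INCLUDE)
    excluded = any(re.fullmatch(p, path) for p in EXCLUDE)
    return included and not excluded


for url in [
    "gs://mybucket/directory1/myfile",              # True
    "gs://mybucket/directory1/directory2/myfile",   # True: ".*" also matches "/"
    "gs://mybucket/directory0/directory1/myfile",   # False: full path must match
    "gs://mybucket/directory1/excludedfile",        # False: matches exclude_regex
]:
    print(url, is_included(url))
```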
If `include_regex` is left empty, it will match all files by
default (this is equivalent to setting `include_regex: [".*"]`).
Some other common use cases:
- `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all files in `mybucket` except for .pdf files
- `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will include all files directly under `gs://mybucket/directory/`, without matching across `/`
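Both use cases can be sketched in Python, assuming Python's `re.fullmatch` behaves like RE2 full matching for these patterns. The first case relies on the empty-`include_regex` default; the `.pdf` pattern is written as a raw string so the backslash reaches the regex engine. The helper `is_included` and the sample paths are hypothetical.

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Hypothetical helper: apply the include/exclude rule to a path within the bucket."""
    # An empty include_regex behaves like [".*"]: every path is a candidate.
    included = not include_regex or any(re.fullmatch(p, path) for p in include_regex)
    excluded = any(re.fullmatch(p, path) for p in exclude_regex)
    return included and not excluded


# Use case 1: everything except .pdf files (include_regex left empty).
no_pdfs = {"exclude_regex": [r".*\.pdf"]}
print(is_included("reports/2023.pdf", **no_pdfs))          # False
print(is_included("reports/2023.csv", **no_pdfs))          # True

# Use case 2: only files directly under directory/ ([^/]+ cannot match "/").
top_level = {"include_regex": ["directory/[^/]+"]}
print(is_included("directory/myfile", **top_level))         # True
print(is_included("directory/subdir/myfile", **top_level))  # False
```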
Attributes

| Name | Description |
|---|---|
| `bucket_name` | `str`: The name of a Cloud Storage bucket. Required. |
| `include_regex` | `MutableSequence[str]`: A list of regular expressions matching file paths to include. All files in the bucket that match at least one of these regular expressions will be included in the set of files, except for those that also match an item in `exclude_regex`. Leaving this field empty will match all files by default (this is equivalent to including `.*` in the list). Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub. |
| `exclude_regex` | `MutableSequence[str]`: A list of regular expressions matching file paths to exclude. All files in the bucket that match at least one of these regular expressions will be excluded from the scan. Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub. |