Message representing a set of files in a Cloud Storage bucket. Regular
expressions are used to allow fine-grained control over which files in
the bucket to include.

Included files are those that match at least one item in
``include_regex`` and do not match any items in ``exclude_regex``.
Note that a file that matches items from both lists will not be
included. For a match to occur, the entire file path (i.e., everything
in the URL after the bucket name) must match the regular expression.

For example, given the input
``{bucket_name: "mybucket", include_regex: ["directory1/.*"], exclude_regex: ["directory1/excluded.*"]}``:

-  ``gs://mybucket/directory1/myfile`` will be included
-  ``gs://mybucket/directory1/directory2/myfile`` will be included
   (``.*`` matches across ``/``)
-  ``gs://mybucket/directory0/directory1/myfile`` will not be included
   (the full path doesn't match any items in ``include_regex``)
-  ``gs://mybucket/directory1/excludedfile`` will not be included (the
   path matches an item in ``exclude_regex``)

If ``include_regex`` is left empty, it will match all files by default
(this is equivalent to setting ``include_regex: [".*"]``).

Some other common use cases:

-  ``{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}`` will
   include all files in ``mybucket`` except for .pdf files
-  ``{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}``
   will include all files directly under ``gs://mybucket/directory/``,
   without matching across ``/``
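
The include/exclude rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DLP service's implementation: the helpers ``object_path`` and ``is_included`` are hypothetical, and Python's ``re`` module stands in for RE2. The two behave the same on simple patterns like the ones above, but RE2 does not support backreferences or lookaround assertions, so a pattern that ``re`` accepts may still be rejected by the service.

```python
import re
from typing import List, Optional


def object_path(url: str, bucket_name: str) -> str:
    """Return everything in a gs:// URL after the bucket name.

    Hypothetical helper: strips the "gs://<bucket_name>/" prefix so the
    result can be matched against include_regex / exclude_regex.
    """
    prefix = f"gs://{bucket_name}/"
    if not url.startswith(prefix):
        raise ValueError(f"{url!r} is not in bucket {bucket_name!r}")
    return url[len(prefix):]


def is_included(
    path: str,
    include_regex: Optional[List[str]] = None,
    exclude_regex: Optional[List[str]] = None,
) -> bool:
    """Hypothetical sketch of the include/exclude selection rule.

    Assumption: Python's re module is used as a stand-in for RE2.
    """
    # An empty include list is treated like [".*"], which matches every path.
    includes = include_regex or [".*"]
    excludes = exclude_regex or []
    # re.fullmatch succeeds only if the pattern matches the entire path.
    if not any(re.fullmatch(p, path) for p in includes):
        return False
    # A path that also matches any exclude pattern is dropped.
    return not any(re.fullmatch(p, path) for p in excludes)


# The example from the description.
include = ["directory1/.*"]
exclude = ["directory1/excluded.*"]
for url in [
    "gs://mybucket/directory1/myfile",             # True
    "gs://mybucket/directory1/directory2/myfile",  # True
    "gs://mybucket/directory0/directory1/myfile",  # False
    "gs://mybucket/directory1/excludedfile",       # False
]:
    print(url, is_included(object_path(url, "mybucket"), include, exclude))

# The other common use cases.
print(is_included("report.pdf", exclude_regex=[r".*\.pdf"]))                 # False
print(is_included("directory/a.txt", include_regex=["directory/[^/]+"]))     # True
print(is_included("directory/sub/b.txt", include_regex=["directory/[^/]+"]))  # False
```

Using ``re.fullmatch`` rather than ``re.search`` is what enforces the whole-path rule: with ``re.search``, ``directory1/.*`` would also match inside ``directory0/directory1/myfile``, which the description says must not be included.
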

.. attribute:: bucket_name

   The name of a Cloud Storage bucket. Required.

.. attribute:: exclude_regex

   A list of regular expressions matching file paths to exclude.
   All files in the bucket that match at least one of these
   regular expressions will be excluded from the scan. Regular
   expressions use RE2
   `syntax <https://github.com/google/re2/wiki/Syntax>`__; a guide can
   be found under the google/re2 repository on GitHub.