Package dataflux provides an easy way to parallelize object listing in Google Cloud Storage.
More information about Google Cloud Storage is available at https://cloud.google.com/storage/docs.
See https://pkg.go.dev/cloud.google.com/go for authentication, timeouts, connection pooling and similar aspects of this package.
NOTE: This package is in preview. It is not stable, and is likely to change.
Lister
type Lister struct {
// contains filtered or unexported fields
}
Lister is used for interacting with Dataflux fast-listing. Callers should create it with NewLister rather than constructing it directly.
func NewLister
func NewLister(c *storage.Client, in *ListerInput) *Lister
NewLister creates a new dataflux Lister to list objects in the bucket named by in.BucketName.
func (*Lister) Close
func (c *Lister) Close()
Close closes the range channel of the Lister.
func (*Lister) NextBatch
func (c *Lister) NextBatch(ctx context.Context) ([]*storage.ObjectAttrs, error)
NextBatch runs the worksteal algorithm and sequential listing in parallel to quickly return a list of objects in the bucket. For smaller datasets, sequential listing is expected to be faster; for larger datasets, worksteal listing is expected to be faster.
ListerInput
type ListerInput struct {
// BucketName is the name of the bucket to list objects from. Required.
BucketName string
// Parallelism is the number of parallel workers to use for listing.
// Default value is 10 times the number of available CPUs. Optional.
Parallelism int
// BatchSize is the number of objects to return in each batch. Default
// value returns all objects at once. The number of objects returned is
// rounded up to a multiple of the GCS page size. Optional.
BatchSize int
// Query is the query used to filter objects for listing. Default value
// is an empty Query, which lists all objects. Set Projection to
// storage.ProjectionNoACL for faster listing; including ACLs increases
// latency while fetching objects. Optional.
Query storage.Query
// SkipDirectoryObjects indicates whether to omit directory objects from
// the results. Default value is false (directory objects are listed). Optional.
SkipDirectoryObjects bool
}
ListerInput contains options for listing objects.
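The documented defaults and effects of these options can be sketched as small helpers. The sketch assumes that a zero Parallelism means "use the default" and that directory objects are those whose names end in "/"; the GCS page size the package actually uses is not stated here, so it is a parameter. The helper names are hypothetical; the package applies these options internally.

```go
package main

import (
	"runtime"
	"strings"
)

// defaultParallelism applies the documented default of 10x the available
// CPUs, assuming a zero (or negative) Parallelism means "unset".
func defaultParallelism(p int) int {
	if p <= 0 {
		return 10 * runtime.NumCPU()
	}
	return p
}

// roundUpToPage rounds a requested BatchSize up to the next multiple of
// pageSize. Non-positive inputs are returned unchanged, so 0 keeps its
// documented meaning of "return all objects at once".
func roundUpToPage(batchSize, pageSize int) int {
	if batchSize <= 0 || pageSize <= 0 {
		return batchSize
	}
	return (batchSize + pageSize - 1) / pageSize * pageSize
}

// isDirectoryObject treats a name ending in "/" as a directory placeholder,
// a common GCS convention assumed here.
func isDirectoryObject(name string) bool {
	return strings.HasSuffix(name, "/")
}

// skipDirectories drops directory objects when skip is true, in the spirit
// of SkipDirectoryObjects; otherwise it returns names unchanged.
func skipDirectories(names []string, skip bool) []string {
	if !skip {
		return names
	}
	out := make([]string, 0, len(names))
	for _, n := range names {
		if !isDirectoryObject(n) {
			out = append(out, n)
		}
	}
	return out
}
```

For example, with an illustrative page size of 1000, roundUpToPage(2500, 1000) returns 3000, matching the "rounded up to a multiple of the GCS page size" behavior described for BatchSize.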