Optimize performance in buckets with hierarchical namespace enabled

This page provides guidance on how you can optimize performance in buckets with hierarchical namespace enabled.

Listing objects

The following are the performance considerations for listing objects:

  • In buckets with hierarchical namespace enabled, listing all objects in the entire bucket or under a prefix is resource-intensive because the operation must traverse each folder and subfolder, similar to the ls -R command in a file system. Consequently, the more folders your bucket contains, the slower object listing becomes. A large number of empty folders can also degrade object listing performance. To avoid this, we recommend that you maximize the number of objects in each folder and regularly delete empty folders.
  • Listing or retrieving objects and subfolders within a specific folder using a delimiter and a specific prefix is more efficient in buckets with hierarchical namespace enabled because the objects are organized within a folder structure. To optimize listing performance when using a delimiter and a specific prefix, set the includeFoldersAsPrefixes parameter. Otherwise, Cloud Storage performs additional checks to exclude empty folders, which can slow down the operation. For more information about using the includeFoldersAsPrefixes parameter when listing objects, see Listing objects.
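
As an illustration of the delimiter-and-prefix listing described above, the following is a minimal sketch that calls the JSON API objects.list method with curl. The function name list_folder, the bucket and prefix values, and the CURL override variable are assumptions made for this example, not part of any official tooling.

```shell
#!/bin/sh
# Sketch: list the direct children of one folder through the JSON API
# objects.list method. The bucket name, prefix, and token are supplied by
# the caller; list_folder is a hypothetical helper name.

# Usage: list_folder BUCKET PREFIX ACCESS_TOKEN
# Set CURL=echo to print the request arguments instead of sending them.
list_folder() {
  "${CURL:-curl}" --silent --get \
    --header "Authorization: Bearer $3" \
    --data-urlencode "prefix=$2" \
    --data-urlencode "delimiter=/" \
    --data-urlencode "includeFoldersAsPrefixes=true" \
    "https://storage.googleapis.com/storage/v1/b/$1/o"
}

# Example (requires gcloud credentials):
#   list_folder my-bucket "logs/2024/" "$(gcloud auth print-access-token)"
```

In the JSON response, objects directly under the prefix appear in items and subfolders appear in prefixes. Setting CURL=echo before calling the function gives a dry run that shows the query parameters without contacting the service.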

Folder management

For efficient folder management, we recommend the following:

  • Pre-create the folder structure: Instead of relying on automatic folder creation during object upload, rewrite, and compose operations, use the create folder operation to build your intended folder structure in advance. Pre-creating the folder structure makes performance more consistent and predictable.
  • Maximize the objects-per-folder ratio: Aim for a high object-to-folder ratio because it reduces the overhead associated with folder creation and management.
  • Limit folder creation and deletion requests: Creating or deleting folders is more resource-intensive than working with individual objects because of the hierarchical nature of folders. To ensure smooth performance, Cloud Storage limits these operations to 1000 requests per second for each bucket. Requests exceeding this limit are not explicitly restricted, but resource availability determines whether they can be processed successfully.
  • Regularly delete empty folders: Empty folders can accumulate, especially when you use Object Lifecycle Management or delete objects without explicitly deleting their parent folders. The accumulated folders can degrade the performance of object listing operations and other folder-related operations. The following are some of the methods that you can use to delete empty folders:

    • When you use Cloud Storage FUSE or the Cloud Storage connector to interact with a bucket with hierarchical namespace enabled, deleting a directory deletes the corresponding folder in your bucket.
    • You can use a recursive delete to delete folders automatically when using the Google Cloud console or Google Cloud CLI.
    • You can use a script or an automated process to periodically delete empty folders. The following script provides a basic approach to deleting empty folders. The script deletes folders sequentially, which can be slow for large buckets, so consider optimizing it for production environments. Additionally, the script deletes all empty folders, whether created implicitly or explicitly, including managed folders and their associated IAM policies. If you need to retain specific folders and managed folders, adjust the script based on the resources that you want to retain.

      # List all the folders under <bucket>/<prefix> and export results into
      # folders.txt
      gcloud storage folders list gs://<bucket>/<prefix> | grep storage_url | sed 's/storage_url: //' > folders.txt
      
      # Reverse the folder list and export results into folders-reverse.txt
      sed '1!G;h;$!d' folders.txt > folders-reverse.txt
      
      # Try deleting each folder in the reverse order (to guarantee child
      # folders are deleted before parent folders). This will fail for
      # non-empty folders, so only empty folders will be deleted
      xargs -I{} gcloud storage folders delete "{}" < folders-reverse.txt
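
The recommendation above to pre-create the folder structure can be sketched in the same style. The following example derives the folders implied by a list of planned object names and creates them parents first with gcloud storage folders create. The helper names folders_for_objects and create_folders, the objects.txt file, and the GCLOUD override variable are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch: derive the folder hierarchy implied by a list of object names so
# the folders can be created before any upload starts.

# Reads object names (one per line) on stdin and prints each ancestor
# folder once, parents before children, without a trailing slash.
folders_for_objects() {
  awk -F/ '{
    path = ""
    for (i = 1; i < NF; i++) {
      path = (i == 1) ? $i : path "/" $i
      if (!(path in seen)) { seen[path] = 1; print path }
    }
  }'
}

# Usage: create_folders BUCKET   (folder names are read from stdin)
# Set GCLOUD=echo for a dry run that only prints the commands.
create_folders() {
  bucket="$1"
  while IFS= read -r folder; do
    "${GCLOUD:-gcloud}" storage folders create "gs://${bucket}/${folder}" </dev/null
  done
}

# Example (objects.txt is a hypothetical file of planned object names):
#   folders_for_objects < objects.txt | create_folders my-bucket
```

Like the deletion script, this sketch issues requests sequentially, so it stays far below the per-bucket limit on folder creation requests but can be slow for large hierarchies.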
      

What's next