This page provides an overview of hierarchical namespace, including its key features, common use cases, benefits, and limitations.
Overview
Hierarchical namespace is a capability offered by Cloud Storage that lets you organize objects into folders. With hierarchical namespace, you can store your data in a logical file system structure. Organizing your data in a file system structure enhances performance, ensures consistency, and simplifies the management of data-intensive and file-oriented workloads.
The folder management operations provide reliability and management capabilities, including creating, deleting, listing, and renaming folders. The hierarchical organization of objects simplifies data organization and streamlines data management tasks. A folder in a bucket with hierarchical namespace enabled can contain objects, other folders, or a combination of both.
You must choose whether or not to use hierarchical namespace when you create the bucket; your bucket's hierarchical namespace setting can't be changed after the bucket is created. For information about enabling hierarchical namespace for your bucket, see Create and manage buckets with hierarchical namespace enabled.
The following diagram shows an example of a bucket with hierarchical namespace enabled where objects are organized in a hierarchical structure of folders.
Key features
Hierarchical namespace provides the following features:
Higher initial queries per second (QPS): Buckets with hierarchical namespace enabled offer up to 8 times higher initial QPS limits for reading and writing objects compared to buckets without hierarchical namespace enabled. The higher initial QPS makes it easier to scale data-intensive workloads and provides enhanced throughput. For information about performance optimization methods while using folders in buckets with hierarchical namespace enabled, see Folder management.
Folders: Folders act as containers for objects and other folders, with support for operations such as creating, deleting, and getting folders.
Rename folders: The rename folders operation lets you atomically rename the path of a folder and its underlying folders without deleting any objects. This technique is efficient and saves time, especially for large folders that contain many objects.
List folders: The list folders operation lists all folders in the bucket or underneath a specific folder, helping you to manage and understand the structure of the data stored within a bucket.
When should you enable hierarchical namespace for your bucket
You should consider enabling hierarchical namespace when using applications that expect a file system-like hierarchy and semantics. Hierarchical namespace is beneficial for data-intensive tasks like analytics, AI, and ML workloads. Here are some common scenarios where you should consider using hierarchical namespace:
Hadoop-based processing: Hadoop and Spark workloads traditionally expect a file system-like storage structure and time-based naming for files and folders. Hierarchical namespace integrates with the Cloud Storage connector to provide enhanced throughput and atomic folder renames, improving data integrity and consistency for many data processing pipelines.
File-oriented workload processing: Workloads such as batch analytics processing, financial services, or high performance computing are structured into partitions based on a hierarchy of folders and files. Hierarchical namespace helps to manage these environments with a dedicated API for folder management. Additionally, hierarchical namespace simplifies managing folders that contain other folders and objects. With a single API command, you can rename a folder along with all its contents, saving time and resources.
AI and ML processing: AI and ML tools such as TensorFlow, Pandas, and PyTorch expect file system-like access and semantics. Hierarchical namespace, especially when combined with Cloud Storage FUSE, delivers increased throughput and efficient data access. As a result, hierarchical namespace enhances the performance and reliability of ML model iteration.
Before enabling hierarchical namespace for your bucket, you should consider the limitations of hierarchical namespace. For information about hierarchical namespace limitations, see Limitations.
Benefits of hierarchical namespace
When you enable hierarchical namespace for your buckets, you can do the following:
Optimize organization: You can organize your data into a hierarchical folder structure that helps you to manage and locate files or datasets.
Establish a file system-like ecosystem: Hierarchical namespace introduces file system-like features such as folders, folder renaming, and folder listing, which are beneficial for file-oriented applications, including the Hadoop ecosystem and AI and ML workloads.
Improve performance: By scaling data-intensive workloads to handle higher throughput, you can enhance the overall performance of your application.
Platform support
Buckets with hierarchical namespace support the following Cloud Storage platform capabilities:
All the Cloud Storage object APIs and widely-used Cloud Storage features. For details about any unsupported features, see Limitations.
Data transfer from a standard bucket to a bucket with hierarchical namespace using Storage Transfer Service.
Integration with the following products:
Cloud Storage Connector, maintained by Dataproc for Hadoop workloads. For more information, see Use hierarchical namespace enabled buckets for Hadoop workloads.
Cloud Storage FUSE for file system-like bucket access using clients.
Compatibility with Cloud Storage operations and features
Buckets with hierarchical namespace enabled have the following interactions with other Cloud Storage operations:
Object operations
Buckets with hierarchical namespace enabled handle object operations in the following ways:
- Operations like Upload, Rewrite, and Compose automatically create any missing parent folders, as long as you have the necessary permissions. As a result, you don't need to pre-create folders before uploading objects.
- While folders can be created automatically during object operations, you need to delete them explicitly using the DeleteFolder operation.
- When using the ListObjects operation with the delimiter parameter, buckets return each child folder as a prefix. However, empty folders are excluded by default. To include empty folders, similar to a typical file system listing, you must set the includeFoldersAsPrefixes parameter. For information about performance optimization methods while listing objects in buckets with hierarchical namespace enabled, see Listing objects.
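The object operation behavior described above can be sketched with a small in-memory model. This is an illustration only, not the Cloud Storage API: ToyBucket, its methods, and the include_folders_as_prefixes argument are hypothetical names chosen to mirror the operations and parameters named in the list, and the model assumes "/" as the delimiter.

```python
# Toy in-memory model of a bucket with hierarchical namespace enabled.
# ToyBucket and its methods are hypothetical; they are not part of any
# Cloud Storage client library. The model assumes "/" as the delimiter.

class ToyBucket:
    def __init__(self):
        self.folders = set()   # folder paths, each ending in "/"
        self.objects = set()   # full object names

    def _add_folder_chain(self, parts):
        # Register every ancestor folder, for example "a/" and "a/b/".
        for depth in range(1, len(parts) + 1):
            self.folders.add("/".join(parts[:depth]) + "/")

    def upload(self, name):
        """Store an object and create any missing parent folders."""
        self._add_folder_chain(name.split("/")[:-1])
        self.objects.add(name)

    def create_folder(self, path):
        """Explicitly create a folder together with its parents."""
        self._add_folder_chain(path.rstrip("/").split("/"))

    def list_objects(self, prefix="", include_folders_as_prefixes=False):
        """Return (object names, prefixes) found directly under prefix."""
        names, prefixes = [], set()
        for obj in self.objects:
            if not obj.startswith(prefix):
                continue
            rest = obj[len(prefix):]
            if "/" in rest:
                # The object lives in a child folder: report that folder.
                prefixes.add(prefix + rest.split("/")[0] + "/")
            else:
                names.append(obj)
        if include_folders_as_prefixes:
            # Also report child folders that contain no objects.
            for folder in self.folders:
                if folder.startswith(prefix) and folder != prefix:
                    rest = folder[len(prefix):]
                    prefixes.add(prefix + rest.split("/")[0] + "/")
        return sorted(names), sorted(prefixes)


bucket = ToyBucket()
bucket.upload("logs/2024/app.log")   # creates logs/ and logs/2024/
bucket.upload("readme.txt")
bucket.create_folder("tmp/")         # an empty folder

default_listing = bucket.list_objects()
full_listing = bucket.list_objects(include_folders_as_prefixes=True)
print(default_listing)
print(full_listing)
```

In this model, the empty tmp/ folder appears only in the second listing, which mirrors the default exclusion of empty folders described above.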
Managed folder operations
Buckets with hierarchical namespace enabled handle managed folder operations in the following ways:
- Buckets with hierarchical namespace enabled offer granular access control through managed folders. To manage access within a folder, you must create a managed folder with the same name as the folder and then apply IAM policies to it. A managed folder cannot exist without the corresponding folder.
- Creating a managed folder automatically creates any missing parent folders, including the folder with the same name.
- Deleting a folder automatically deletes the associated managed folder.
- Renaming a folder automatically renames associated managed folders.
- Buckets with hierarchical namespace must follow the managed folder name rules and the folder name rules. Although folder names can be nested up to 50 levels deep, managed folder names can only be nested up to 15 levels deep. The maximum managed folder name size is limited by the maximum folder name size, which is 512 bytes when UTF-8 encoded.
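As a rough illustration of the nesting and size limits above, the following sketch checks a name against them. The function names are hypothetical, the way depth is counted (one level per path segment) is an assumption, and the full folder and managed folder name rules contain requirements that are not checked here.

```python
# Limits taken from the text above.
MAX_FOLDER_DEPTH = 50
MAX_MANAGED_FOLDER_DEPTH = 15
MAX_NAME_BYTES = 512  # measured on the UTF-8 encoded name


def folder_depth(name):
    # Assumption: each non-empty path segment counts as one nesting level.
    return len([part for part in name.strip("/").split("/") if part])


def check_folder_name(name, managed=False):
    """Return a list of limit violations; an empty list means none found."""
    problems = []
    size = len(name.encode("utf-8"))
    if size > MAX_NAME_BYTES:
        problems.append(f"name is {size} bytes, limit is {MAX_NAME_BYTES}")
    limit = MAX_MANAGED_FOLDER_DEPTH if managed else MAX_FOLDER_DEPTH
    depth = folder_depth(name)
    if depth > limit:
        problems.append(f"name is nested {depth} levels deep, limit is {limit}")
    return problems


# A path 20 levels deep fits a folder but not a managed folder.
deep_name = "/".join(["d"] * 20) + "/"
print(check_folder_name(deep_name))
print(check_folder_name(deep_name, managed=True))
```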
Bucket operations
You can delete a bucket with hierarchical namespace enabled in the same manner as any other bucket. A bucket with hierarchical namespace enabled can be deleted if it contains only empty folders and no objects or managed folders.
Object Lifecycle Management
Object Lifecycle Management lets you automate actions on objects based on conditions, such as age or prefix. However, Object Lifecycle Management rules can behave differently in buckets with hierarchical namespace and in buckets with a flat namespace due to the RenameFolder operation:
Object Lifecycle Management rules for buckets with a flat namespace: Renaming a folder means that a tool copies every object to a destination location and then deletes the original object from the source location. As a result, new objects are created with new creation times at the destination location. If age-based Object Lifecycle Management rules are applied for the destination location, they won't apply to the new objects immediately because their creation times are reset.
Object Lifecycle Management rules for buckets with hierarchical namespace enabled: Renaming a folder operates at the folder level, without having to rename every single object. As a result, the creation time of the objects is preserved, meaning the age-based Object Lifecycle Management rules are applied to renamed objects immediately if they meet the age criteria.
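The difference can be sketched with a simplified model in which a bucket is a mapping from object name to creation time. The function names are hypothetical, and the age check is a simplification of how Object Lifecycle Management evaluates the age condition.

```python
from datetime import datetime, timedelta, timezone


def rename_flat(objects, old_prefix, new_prefix, now):
    """Flat namespace: copy each object and delete the original.

    Each copy is a new object, so its creation time is reset to now.
    """
    renamed = {}
    for name, created in objects.items():
        if name.startswith(old_prefix):
            renamed[new_prefix + name[len(old_prefix):]] = now
        else:
            renamed[name] = created
    return renamed


def rename_hierarchical(objects, old_prefix, new_prefix):
    """Hierarchical namespace: the folder is renamed, objects keep their creation time."""
    renamed = {}
    for name, created in objects.items():
        if name.startswith(old_prefix):
            renamed[new_prefix + name[len(old_prefix):]] = created
        else:
            renamed[name] = created
    return renamed


def matches_age_rule(created, now, min_age_days):
    # Simplified age condition: the object is at least min_age_days old.
    return now - created >= timedelta(days=min_age_days)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
objects = {"raw/a.csv": now - timedelta(days=40)}

flat = rename_flat(objects, "raw/", "archive/", now)
hierarchical = rename_hierarchical(objects, "raw/", "archive/")

# A 30-day age rule matches only the object whose creation time was preserved.
print(matches_age_rule(flat["archive/a.csv"], now, 30))
print(matches_age_rule(hierarchical["archive/a.csv"], now, 30))
```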
Pricing
For pricing information, refer to Cloud Storage pricing.
Limitations
The following are the limitations of hierarchical namespace:
You must choose whether or not to use hierarchical namespace when you create the bucket; your bucket's hierarchical namespace setting can't be changed after the bucket is created.
To enable hierarchical namespace, a bucket must also have uniform bucket-level access enabled.
The following Cloud Storage capabilities are not supported for buckets that use hierarchical namespace:
- Autoclass
- Object versioning
- Object retention lock
- Bucket lock
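As a planning aid, the limitations above can be expressed as a small pre-creation check. This is a sketch only: the settings dictionary and its key names are hypothetical and do not correspond to Cloud Storage API fields.

```python
# Hypothetical setting names used for illustration; they are not
# Cloud Storage API fields.
UNSUPPORTED_WITH_HNS = (
    "autoclass",
    "object_versioning",
    "object_retention_lock",
    "bucket_lock",
)


def check_hns_bucket_settings(settings):
    """Return conflicts between planned settings and the limitations listed above."""
    if not settings.get("hierarchical_namespace", False):
        return []  # the limitations apply only when hierarchical namespace is on
    conflicts = []
    if not settings.get("uniform_bucket_level_access", False):
        conflicts.append("uniform_bucket_level_access must be enabled")
    for feature in UNSUPPORTED_WITH_HNS:
        if settings.get(feature, False):
            conflicts.append(f"{feature} is not supported")
    return conflicts


planned = {
    "hierarchical_namespace": True,
    "uniform_bucket_level_access": True,
    "object_versioning": True,
}
conflicts = check_hns_bucket_settings(planned)
print(conflicts)
```

Because the hierarchical namespace setting can't be changed after the bucket is created, a check like this is most useful before creation.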
What's next
- Create and manage buckets with hierarchical namespace enabled.
- Create and manage folders.
- Rename folders.
- Use hierarchical namespace for Hadoop workloads.
- Optimize performance.