File Storage on Compute Engine

File storage, also known as network-attached storage (NAS), provides file-level access so that applications can read and update information that is shared across multiple machines. Some on-premises file storage solutions have a scale-up architecture and simply add storage to a fixed amount of compute resources. Other file storage solutions have a scale-out architecture where capacity and compute (performance) can be incrementally added to an existing file system as needed. In both storage architectures, one or multiple virtual machines (VMs) can access the storage.

Although some file systems use a native POSIX client, many storage systems use a protocol that enables client machines to mount a file system and access the files as if they were hosted locally. The most common protocols for exporting file shares are Network File System (NFS) for Linux (and in some cases Windows) and Server Message Block (SMB) for Windows.

This solution describes the following options for sharing files:

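  • Compute Engine persistent disks and local SSDs
  • Managed file storage solutions: Filestore, Filestore High Scale, NetApp Cloud Volumes, and Dell Technologies Cloud PowerScale for Google Cloud
  • Supported filer solutions in Cloud Marketplace: Panzura and Nasuni Cloud File Storage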

An underlying factor in the performance and predictability of all of the Google Cloud services is the network stack that Google evolved over many years. With the Jupiter Fabric, Google built a robust, scalable, and stable networking stack that can continue to evolve without affecting your workloads. As Google improves and bolsters its network abilities internally, your file-sharing solution benefits from the added performance. For more details on the Jupiter Fabric, see the 2015 paper that describes its evolution.

One feature of Google Cloud that can help you get the most out of your investment is the ability to specify custom machine types. When choosing the size of your filer, you can pick exactly the right mix of memory and CPU, so that your filer operates at optimal performance without being oversubscribed.

It is also important to choose the correct Compute Engine persistent disk capacity and number of vCPUs so that your file server's storage devices receive the required storage bandwidth and IOPS as well as network bandwidth. A VM receives 2 Gb/s of network throughput for every vCPU, up to a maximum. For tuning persistent disks, see Optimizing persistent disk and local SSD performance.
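The per-vCPU rule above can be turned into a quick sizing estimate. The following is a minimal sketch, not an official formula: the function names and the `max_gbps` parameter are hypothetical, and the cap must be supplied by the caller because the text states only that a maximum exists.

```python
def estimated_network_gbps(vcpus: int, max_gbps: float,
                           per_vcpu_gbps: float = 2.0) -> float:
    """Estimate VM network throughput in Gb/s from the per-vCPU rule.

    max_gbps is a caller-supplied cap (an assumption of this sketch);
    look up the real limit for your machine type.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return min(vcpus * per_vcpu_gbps, max_gbps)


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0
```

For example, with a placeholder cap of 16 Gb/s, a 4-vCPU filer would be estimated at 8 Gb/s, or about 1 GB/s of file traffic, which is useful to compare against the throughput that the attached disks can deliver.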

Note that Cloud Storage is also a great way to store petabytes of data with high levels of redundancy at a low cost, but Cloud Storage has a different performance profile and API than the file servers discussed here.

Summary of file server options

The following table summarizes the features of persistent disks and the filer options:

Filer solution       | Optimal data set     | Throughput          | Managed support                                       | Export protocols
---------------------|----------------------|---------------------|-------------------------------------------------------|--------------------------------------------------
Filestore            | 1 TB to 64 TB        | Up to 1.2 GB/s      | Fully managed service by Google                       | NFSv3
Filestore High Scale | 10s of TB to 320 TB  | Up to 16 GB/s       | Fully managed service by Google                       | NFSv3
NetApp Cloud Volumes | 1 TB to 100 TB       | Up to 4.5 GB/s      | Fully managed service by Google and NetApp            | NFSv3, NFSv4, SMB2, SMB3
Dell PowerScale      | 108 TiB up to 50 PiB | Up to 100s of GB/s  | Fully managed service by Google and Dell Technologies | NFSv3, NFSv4, SMB1, SMB2, SMB3, HDFS
Panzura              | 10s of TB to > 1 PB  | Up to several GB/s  | Panzura                                               | NFSv3, NFSv4, SMB1, SMB2, SMB3
Nasuni               | 10s of TB to > 1 PB  | Up to 1.2 GB/s      | Nasuni and customer managed                           | NFSv3, NFSv4, NFSv4.1, NFSv4.2, SMB2, SMB3
Read-only PD         | < 64 TB              | 240 to 1,200 MB/s   | No                                                    | Direct attachment
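One way to use the table is to shortlist options by the protocols that your clients need. The sketch below transcribes only the export-protocol column for the filer options; the dictionary and function names are hypothetical, and read-only persistent disks are omitted because they are attached directly rather than exported.

```python
# Export protocols per filer option, transcribed from the summary table.
EXPORT_PROTOCOLS = {
    "Filestore": {"NFSv3"},
    "Filestore High Scale": {"NFSv3"},
    "NetApp Cloud Volumes": {"NFSv3", "NFSv4", "SMB2", "SMB3"},
    "Dell PowerScale": {"NFSv3", "NFSv4", "SMB1", "SMB2", "SMB3", "HDFS"},
    "Panzura": {"NFSv3", "NFSv4", "SMB1", "SMB2", "SMB3"},
    "Nasuni": {"NFSv3", "NFSv4", "NFSv4.1", "NFSv4.2", "SMB2", "SMB3"},
}


def options_supporting(required):
    """Return the options that export every protocol in `required`, sorted by name."""
    needed = set(required)
    return sorted(
        name for name, protocols in EXPORT_PROTOCOLS.items()
        if needed <= protocols  # subset test: all required protocols are offered
    )
```

Calling `options_supporting(["NFSv3", "SMB3"])` narrows the list to the options that can serve both Linux and Windows clients; capacity and throughput still need to be checked against the other columns.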

Compute Engine persistent disks and local SSDs

If you have data that only needs to be accessed by a single VM or doesn't change over time, you might use Compute Engine's persistent disks, and avoid a file server altogether. With persistent disks, you can format them with a file system such as Ext4 or XFS and attach volumes in either read-write or read-only mode. This means that you can first attach a volume to an instance, load it with the data you need, and then attach it as a read-only disk to hundreds of VMs simultaneously. Employing read-only persistent disks does not work for all use cases, but it can greatly reduce complexity, compared to using a file server.
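The load-once, read-many workflow described above can be sketched as a sequence of `gcloud` commands. The helper below only builds the command lines and does not run them; the function name and all resource names are hypothetical. It assumes that the disk is detached from the loader VM before it is attached read-only elsewhere, because a persistent disk in read-write mode can be attached to only one VM.

```python
from typing import List


def read_only_fanout_plan(disk: str, zone: str, loader_vm: str,
                          reader_vms: List[str]) -> List[List[str]]:
    """Build gcloud command lines for loading a disk once and sharing it read-only."""
    base = ["gcloud", "compute", "instances"]
    plan = [
        # 1. Attach read-write to a single VM, then format the disk and copy data onto it.
        base + ["attach-disk", loader_vm, f"--disk={disk}", "--mode=rw", f"--zone={zone}"],
        # 2. Detach after loading so that the disk can be shared in read-only mode.
        base + ["detach-disk", loader_vm, f"--disk={disk}", f"--zone={zone}"],
    ]
    # 3. Attach the same disk read-only to every VM that needs the data.
    for vm in reader_vms:
        plan.append(
            base + ["attach-disk", vm, f"--disk={disk}", "--mode=ro", f"--zone={zone}"]
        )
    return plan
```

Each entry in the returned plan is an argument list that could be passed to `subprocess.run` or printed for review. Formatting the disk and mounting it inside each guest are separate steps that this sketch leaves out.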

Persistent disks deliver consistent performance. All disks of the same size (and for SSD persistent disks, the same number of vCPUs) that you attach to your instance have the same performance characteristics. You don't need to pre-warm or test your persistent disks before using them in production.

The cost of persistent disks is easy to determine because there are no I/O costs to consider after provisioning your volume. Persistent disks can also be resized on the fly, allowing you to start with a low-cost and low-capacity volume, and not requiring you to spin up additional instances or disks to scale your capacity.

If total storage capacity is the main requirement, you can use low-cost standard persistent disks. For the best performance while continuing to be durable, you can use SSD persistent disks.

If your data is ephemeral and requires sub-millisecond latency and high I/O operations per second (IOPS), you can use up to 9 TB of local SSDs for extreme performance. Local SSDs provide GB/s of bandwidth and millions of IOPS without using up your instances' allotted network bandwidth. Keep in mind, however, that local SSDs have trade-offs in availability, durability, and flexibility.

For a comparison of the many disk types available to Compute Engine instances, see the documentation for block storage.

Considerations when choosing a file storage solution

Choosing a file storage solution requires you to make tradeoffs regarding manageability, cost, performance, and scalability. Making the decision is easier if you have a well-defined workload, which isn't often the case. Where workloads evolve over time or are highly variable, it's prudent to trade cost savings for flexibility and elasticity, so you can grow into your solution. On the other hand, if you have a short-lived and well-understood workload, you can create a purpose-built file storage architecture that you can easily tear down and rebuild to meet your immediate storage needs.

One of the first decisions to make is whether you want to pay for a managed storage service, a solution that includes product support, or an unsupported solution.

  • Managed file storage services are the easiest to operate, because either Google or a partner is handling all operations. These services might even provide an SLA for availability like most other Google Cloud services.
  • Unmanaged, yet supported, solutions provide additional flexibility. Partners can help with any issues, but the day-to-day operation of the storage solution is left to the user.
  • Unsupported solutions require the most effort to deploy and maintain, leaving all issues to the user. These solutions are not covered in this document.

Your next decision involves determining the solution's durability and availability requirements. Most file solutions are zonal solutions and do not by default provide protection if the zone fails. So it's important to consider if a disaster recovery (DR) solution that protects against zonal failures is required. It's also important to understand the application requirements for durability and availability. For example, the choice of local SSDs or persistent disks in your deployment has a big impact, as does the configuration of the file solution software. Each solution requires careful planning to achieve high durability, availability, and even protection against zonal and regional failures.

Finally, consider the locations (that is, zones, regions, and on-premises data centers) from which you need to access the data. The locations of the compute farms that access your data influence your choice of filer solution because only some solutions allow hybrid on-premises and in-cloud access.

Managed file storage solutions

Filestore

Filestore is Google's fully managed NAS solution.

You can easily mount Filestore file shares on Compute Engine VMs. Filestore is also tightly integrated with Google Kubernetes Engine so your containers can reference the same shared data.

Filestore offers two performance tiers, Standard and Premium. Both tiers provide consistent performance and predictable costs.

Filestore High Scale

Filestore High Scale simplifies enterprise storage and data management on Google Cloud and across hybrid clouds. Filestore High Scale delivers cost-effective, high-performance parallel access to global data while maintaining strict consistency powered by a dynamically scalable, distributed file system. With High Scale, existing NFS applications and NAS workflows can run in the cloud without requiring refactoring, yet retain the benefits of enterprise data services (high availability, compression, deduplication, and so on). Cloud-based integration with Google Kubernetes Engine allows seamless data persistence, portability, and sharing for containerized workloads.

High Scale is deployable and scalable at the push of a button. It lets you create and expand file system infrastructure easily and on-demand, ensuring that storage performance and capacity always align with your dynamic workflow requirements. As a High Scale cluster expands, both metadata and I/O performance scale linearly. This scaling allows you to enhance and accelerate a broad range of data-intensive workflows, including high-performance computing, analytics, cross-site data aggregation, DevOps, and many more. As a result, High Scale is a great fit for use in data-centric industries such as life sciences (for example, genome sequencing), financial services, and media and entertainment.

NetApp Cloud Volumes

NetApp Cloud Volumes Service for Google Cloud is a fully-managed cloud-based storage service that is integrated in the Google Cloud console, with seamless billing and support from Google.

The service allows you to quickly mount persistent shared storage to your compute instances. This storage delivers high throughput to your applications at low latency, with robust data-protection capabilities (snapshots and copies). With enterprise-grade architecture, the service provides high performance for both sequential and random workloads, which can scale across hundreds or thousands of Compute Engine compute instances. In seconds, volumes that range in size from 1 TB to 100 TB can be provisioned and protected with automated space-efficient snapshots. Commands to mount the created volumes to compute instances are available in the Cloud Console, further enhancing the user experience.

Figure: Architecture of NetApp Cloud Volumes Service.

There's no need to rewrite applications because Cloud Volumes provides the POSIX-compliant shares required by a broad range of file-based workloads, including web and rich media content, across industries such as electronic design automation (EDA) and media and entertainment.

With three service levels—standard, premium, and extreme—that you can change on demand, Cloud Volumes Service for Google Cloud delivers the right performance fit for your workload, without impacting availability of your workloads. NetApp can also help sync your data between on-premises and Cloud Volumes Service for Google Cloud.

Dell Technologies Cloud PowerScale for Google Cloud

Dell Technologies Cloud PowerScale for Google Cloud is an integrated cloud-native file service for Google Cloud users powered by the Dell EMC PowerScale family that includes PowerScale and Isilon nodes, the industry’s #1 scale-out NAS storage system. This turnkey offering, managed by Dell Technologies Services, combines the performance and capacity at scale of PowerScale OneFS and the flexibility and cost economics of Google Cloud.

PowerScale for Google Cloud is a simple, easy-to-use service with annual subscriptions and guaranteed, predictable pricing. Customers order it from Cloud Marketplace and, once it's provisioned, can configure and manage their OneFS clusters directly from the Google Cloud Console. Google sends a single monthly bill and provides support, while Dell Technologies experts provide complete lifecycle management of the environment.

With PowerScale for Google Cloud, organizations can deploy a dedicated, secure PowerScale instance that has sub-millisecond-latency access to Google Cloud on-demand compute and analytics services, and they retain the value of PowerScale without changing their applications. PowerScale for Google Cloud scales out to 50 PB in a single namespace and provides enterprise-class features such as multi-protocol access, native replication, and snapshots. Backed by enterprise-level uptime and performance SLAs, customers can expand existing file storage capabilities and add new ones, all without additional investment in their data center, facilities, people, hardware, engineering, or integration.

Google Cloud provides a broad range of compute and analytics services for on-demand, cost-effective processing and analysis of high-throughput, file-based workloads. Together, PowerScale and Google Cloud enable enterprises to run the most demanding file-based workloads in the cloud, from big data analytics, artificial intelligence, and machine learning to genome sequencing and media and entertainment, while taking advantage of flexible cloud consumption models and cloud economics.

Figure: Architecture of Dell Technologies Cloud PowerScale for Google Cloud.

Supported filer solutions in Cloud Marketplace

The following solutions are available in Cloud Marketplace.

Panzura

Panzura is a leader in managing unstructured data in the cloud. Enterprises in media and entertainment, genomics, life sciences, healthcare, financial services, and more choose Panzura Freedom NAS to consolidate their data islands into a single source of truth in Google Cloud without sacrificing performance or rewriting applications. By consolidating unstructured data (NFS, SMB, and object) into Google Cloud, you can access all your data, collaborate on it, and analyze and control it for compliance.

Panzura CloudFS underpins the Freedom Family and is a scale-out, distributed file system built for the cloud. It incorporates intelligent file services backed by 26 patents. The Freedom product family lets you address the following use cases cost effectively: cloud migration, global collaboration, and search and analytics.

Together, Panzura Freedom and Google Cloud enable IT leaders to:

  • Migrate thousands of legacy applications to Google Cloud without rewriting, changing workflows, or sacrificing performance.
  • Eliminate copy data sprawl for backup and secondary storage by consolidating data into a single source of truth.
  • Collaborate globally on large-scale projects to improve productivity and time to market.
  • Modernize your legacy NAS while realizing a 70% cost saving and reducing your file infrastructure in your data center by 90%.
  • Rehydrate legacy tape data for advanced analytics and machine learning.

Nasuni Cloud File Storage

Nasuni replaces enterprise file servers, NAS devices, and all associated infrastructure, including backup and DR hardware, with a simpler, low-cost cloud alternative. Nasuni uses Google Cloud object storage to deliver a more efficient software-as-a-service (SaaS) storage solution that scales easily to handle rapid growth in unstructured file data. Nasuni is designed to handle department, project, and organizational file shares and application workflows for every employee, wherever they work.

Figure: Nasuni Cloud File Storage.

Nasuni offers three packages, with pricing for companies and organizations of all sizes so they can grow and expand as needed.

Its benefits include the following:

  • Cloud-based primary file storage for up to 70% less. Nasuni's architecture takes advantage of Google Cloud's native object lifecycle management policies. These policies allow complete flexibility of use with Cloud Storage object storage classes, including Standard, Nearline, Coldline, and Archive. By using Google's unique immediate-access Archive class object storage for primary storage with Nasuni, you can realize cost savings as dramatic as 70%.
  • Departmental and organizational file shares in the cloud. Nasuni's cloud-based architecture offers a single global namespace across Google Cloud regions, with no limits on the number of files, file sizes, or snapshots, letting you store files directly from your desktop into Google Cloud through standard NAS (SMB) drive-mapping protocols.
  • Built-in backup and disaster recovery. Nasuni's "set it and forget it" operations make it easier to manage global file storage. Built-in backup and DR are included, along with a single management console where you can oversee and control the environment anywhere, anytime.
  • Replaces aging file servers. Nasuni makes it easy to migrate Microsoft Windows file servers and other existing file storage systems to Google Cloud, reducing costs and management complexity of these environments.

Additional resources:

  • Overview video
  • Nasuni Google Cloud web page
  • Solution brief
  • Nasuni Marketplace listing
  • Nasuni Google Cloud blog post