A file server, also called a storage filer, provides a way for applications to read and update files that are shared across machines. Some filer solutions are scale-up: they consist of storage attached to a single VM. Others are scale-out: they consist of a cluster of VMs with attached storage that presents a single file-system namespace to applications.
Although some file systems use a native POSIX client, many file servers use a protocol that enables client machines to mount a file system and access the files as if they were hosted locally. The most common protocols for exporting file shares are Network File System (NFS) for Linux and the Common Internet File System (CIFS) or Server Message Block (SMB) for Windows.
This solution describes the following options for sharing files:
- Managed filer solutions: Cloud Filestore, NetApp Cloud Volumes, and Elastifile
- Supported filer solutions in GCP Marketplace: Panzura and Quobyte
- Supported filer solutions from partners: Avere
An underlying factor in the performance and predictability of all Google Cloud Platform (GCP) services is the network stack that Google has evolved over many years. With the Jupiter Fabric, Google built a robust, scalable, and stable networking stack that can continue to evolve without affecting your workloads. As Google improves its network internally, your file-sharing solution benefits from the added performance. For more details on the Jupiter Fabric, see the 2015 paper that describes its evolution.
One feature of GCP that can help you get the most out of your investment is custom machine types. When sizing your filer, you can pick exactly the right mix of memory and CPU so that your filer operates at optimal performance without being oversubscribed.
Further, it's important to choose the right Compute Engine persistent disk capacity and number of vCPUs so that your file server's storage devices receive the required storage bandwidth and IOPS, as well as network bandwidth. A VM receives 2 Gb/s of network throughput for every vCPU, up to a per-VM maximum. For tuning persistent disks, see Optimizing persistent disk and local SSD performance.
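Because network bandwidth grows with vCPU count, a quick calculation helps check whether a self-managed filer VM can actually deliver the disk throughput you provision. The following is a minimal sketch assuming the 2 Gb/s-per-vCPU rule stated above; the per-VM cap (`max_gbps`) is a hypothetical default for illustration only, because the real limit depends on machine type and changes over time.

```python
import math


def vm_network_gbps(vcpus: int, per_vcpu_gbps: float = 2.0, max_gbps: float = 16.0) -> float:
    """Estimated network bandwidth in Gb/s: per_vcpu_gbps per vCPU, capped at max_gbps.

    max_gbps is an illustrative assumption, not a published limit; check the
    current Compute Engine documentation for your machine type.
    """
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    return min(vcpus * per_vcpu_gbps, max_gbps)


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def min_vcpus_for(target_mb_per_s: float, per_vcpu_gbps: float = 2.0) -> int:
    """Fewest vCPUs whose uncapped network allowance covers target_mb_per_s.

    This ignores the per-VM cap, so compare the result against
    vm_network_gbps() before relying on it.
    """
    per_vcpu_mb_per_s = gbps_to_mb_per_s(per_vcpu_gbps)
    return max(1, math.ceil(target_mb_per_s / per_vcpu_mb_per_s))
```

For example, a filer that must push 1,200 MB/s to its clients needs at least five vCPUs by this rule, because each vCPU contributes 250 MB/s (5 × 250 MB/s = 1,250 MB/s).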
Note that Cloud Storage is also a great way to store petabytes of data with high levels of redundancy at a low cost, but Cloud Storage has a different performance profile and API than the file servers discussed here.
Summary of file server options
The following table summarizes the features of persistent disks and the filer options:
| Filer solution | Optimal data set size | Throughput | Managed support | Export protocols | Highly available | Hybrid |
|---|---|---|---|---|---|---|
| Cloud Filestore | 1 TB to 63.9 TB | 100 MB/s to 1.2 GB/s | Fully managed service by Google | NFSv3 | Yes | No |
| NetApp Cloud Volumes | 1 TB to 1 PB | 10s to 100s of Gb/s | Fully managed service by Google and NetApp | NFSv3, SMB2, SMB3 | Yes | Yes |
| Elastifile (acquired by Google Cloud) | 10s of TB to > 1 PB | 10s to 100s of Gb/s | Elastifile (acquired by Google Cloud) | NFSv3 | Yes | Yes |
| Panzura | 10s of TB to > 1 PB | 10s to 100s of Gb/s | Panzura | NFSv3, NFSv4, SMB1, SMB2, SMB3 | Yes | Yes |
| Quobyte | 10s of TB to > 1 PB | 100s to 1000s of Gb/s | Quobyte | Native Linux and Windows clients, Amazon S3, HDFS, NFSv4/3, SMB | Yes | Yes |
| Read-only PD | < 64 TB | 240 to 1,200 MB/s | No | Direct attachment | No | No |
| Avere | 10s to 100s of TB | 10s to 100s of Gb/s | Avere | NFSv3, SMB2 | Yes | Yes |
Compute Engine persistent disks
If you have data that only needs to be accessed by a single VM or doesn't change over time, you might use Compute Engine persistent disks and avoid a file server altogether. You can format persistent disks with a file system such as ext4 or XFS and attach them in either read-write or read-only mode. This means that you can first attach a volume to one instance, load it with the data you need, and then attach it as a read-only disk to hundreds of virtual machines simultaneously. Read-only persistent disks don't work for every use case, but where they fit, they can greatly reduce complexity compared to using a file server.
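The fan-out step of that workflow is repetitive, so it is usually scripted. The sketch below builds one `gcloud compute instances attach-disk` command per VM with `--mode=ro`; it assumes the disk has already been loaded and detached from the instance that wrote it (a disk can't stay attached read-write to one VM while attached read-only to others). The disk, zone, and instance names are hypothetical, and the script only prints the commands as a dry run.

```python
import shlex
from typing import Iterable, List


def attach_read_only_commands(disk: str, instances: Iterable[str], zone: str) -> List[List[str]]:
    """Build one `gcloud compute instances attach-disk` command per instance.

    --mode=ro attaches the disk read-only, which is what allows the same
    persistent disk to be attached to many VMs at the same time.
    """
    return [
        ["gcloud", "compute", "instances", "attach-disk", name,
         f"--disk={disk}", "--mode=ro", f"--zone={zone}"]
        for name in instances
    ]


if __name__ == "__main__":
    # Hypothetical disk, zone, and instance names for illustration only.
    workers = [f"worker-{i}" for i in range(3)]
    for cmd in attach_read_only_commands("reference-data", workers, "us-central1-a"):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

To run the commands instead of printing them, you could pass each list to `subprocess.run(cmd, check=True)`.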
Persistent disks deliver consistent performance. All disks of the same size (and for SSD persistent disks, the same number of vCPUs) that you attach to your instance have the same performance characteristics. You don't need to pre-warm or test your persistent disks before using them in production.
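Since performance is determined by disk type and size, you can estimate it before provisioning. The sketch below models that "linear with size until a cap" behavior. The per-GB rates and caps in `DiskProfile` are hypothetical placeholders, not published Compute Engine limits, and the vCPU-dependent limits that apply to SSD persistent disks are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiskProfile:
    """Hypothetical per-GB rates and per-VM caps for one disk type.

    Substitute values from the current Compute Engine documentation.
    """
    iops_per_gb: float
    mb_per_s_per_gb: float
    max_iops: float
    max_mb_per_s: float


def expected_performance(size_gb: int, profile: DiskProfile) -> Tuple[float, float]:
    """Return (IOPS, MB/s): performance grows linearly with size until it hits a cap."""
    iops = min(size_gb * profile.iops_per_gb, profile.max_iops)
    throughput = min(size_gb * profile.mb_per_s_per_gb, profile.max_mb_per_s)
    return iops, throughput
```

One consequence of this model is that, below the cap, buying a larger disk is the way to buy more performance, even if you don't need the extra capacity.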
The cost of persistent disks is easy to determine because there are no I/O costs to consider after provisioning your volume. Persistent disks can also be resized on the fly, so you can start with a low-cost, low-capacity volume and grow it without spinning up additional instances or disks.
If total capacity is the main requirement, you can use low-cost standard persistent disks. For the best performance while retaining durability, you can use SSD persistent disks.
If your data is ephemeral and requires sub-millisecond latency and high I/O operations per second (IOPS), you can use up to 3 TB of local SSDs for extreme performance. Local SSDs allow for up to ~700k IOPS with speeds similar to DDR2 RAM, all without using up your instance's allotted network capacity.
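Local SSD capacity is added in fixed 375 GB devices, so the 3 TB ceiling above corresponds to eight devices. The helper below, a minimal sketch, converts a capacity requirement into a device count; the 3,000 GB limit is taken from the figure quoted above and is an assumption that may not hold for every machine type.

```python
import math

LOCAL_SSD_DEVICE_GB = 375   # each local SSD device provides 375 GB
MAX_LOCAL_SSD_GB = 3000     # the ~3 TB per-VM limit cited above (assumed)


def local_ssd_devices(required_gb: float) -> int:
    """Number of 375 GB local SSD devices needed to hold required_gb."""
    if required_gb <= 0:
        raise ValueError("required_gb must be positive")
    if required_gb > MAX_LOCAL_SSD_GB:
        raise ValueError(f"{required_gb} GB exceeds the {MAX_LOCAL_SSD_GB} GB per-VM limit")
    return math.ceil(required_gb / LOCAL_SSD_DEVICE_GB)
```

Because local SSD data is ephemeral, size it for the working set you can afford to regenerate, not for data that must survive the VM.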
For a comparison of the many disk types available to Compute Engine instances, see the documentation for block storage.
Considerations when choosing a filer solution
Choosing a filer solution requires you to make tradeoffs regarding manageability, cost, performance, and scalability. Making the decision is easier if you have a well-defined workload, which often isn't the case. Where workloads evolve over time or are highly variable, it's prudent to trade cost savings for flexibility and elasticity, so you can grow into your solution. On the other hand, if you have a temporary and well-understood workload, you can create a purpose-built filer architecture that you can easily tear down and rebuild to meet your immediate storage needs.
One of the first decisions to make is whether you want to pay for a managed filer service, a filer solution that includes product support, or an unsupported solution.
- Managed filer services are the easiest to operate, because either Google or a partner handles all operations. These filer services might even provide an SLA for availability, like most other GCP services.
- Unmanaged yet supported solutions provide additional flexibility. Partners can help with any issues, but the day-to-day operation of the filer is left to the user.
- Unsupported solutions require the most effort to deploy and maintain, leaving all issues to the user. These solutions are not covered in this document.
Your next decision involves figuring out the filer's durability and availability requirements. Most filer solutions are zonal and, by default, don't provide protection if the zone fails. So it's important to consider whether you need a disaster recovery solution that protects against zonal failures. Further, it's important to understand your application's requirements for durability and availability. For example, the choice of local SSDs or persistent disks in your deployment has a big impact, as does the configuration of your filer solution's software. Each solution requires careful planning to achieve high durability, availability, and protection against zonal and regional failures.
Finally, consider the locations (that is, zones, regions, and on-premises data centers) from which you need to access the data. The locations of the compute farms that access your data influence your choice of filer solution, because only some solutions allow hybrid on-premises and in-cloud access.
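The support-model and location decisions act as filters on the summary table. The sketch below transcribes the filer rows of that table (protocols reduced to families such as NFS and SMB) and the document's three solution groups, then shortlists candidates. It is an illustration of the filtering logic only, under the simplifying assumption that protocol, hybrid access, and support model are the only constraints; it ignores data set size, throughput, and cost.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Filer:
    name: str
    category: str              # "managed", "marketplace", or "partner"
    protocols: FrozenSet[str]  # protocol families, simplified from the table
    hybrid: bool


# Rows transcribed from the summary table (read-only PD omitted: it is not a filer).
FILERS = [
    Filer("Cloud Filestore", "managed", frozenset({"NFS"}), False),
    Filer("NetApp Cloud Volumes", "managed", frozenset({"NFS", "SMB"}), True),
    Filer("Elastifile", "managed", frozenset({"NFS"}), True),
    Filer("Panzura", "marketplace", frozenset({"NFS", "SMB"}), True),
    Filer("Quobyte", "marketplace", frozenset({"NFS", "SMB", "S3", "HDFS", "native"}), True),
    Filer("Avere", "partner", frozenset({"NFS", "SMB"}), True),
]


def shortlist(protocol: str, needs_hybrid: bool = False,
              categories: Optional[Iterable[str]] = None) -> List[str]:
    """Names of filers that export `protocol` and meet the other constraints."""
    allowed = set(categories) if categories is not None else None
    return [
        f.name for f in FILERS
        if protocol in f.protocols
        and (f.hybrid or not needs_hybrid)
        and (allowed is None or f.category in allowed)
    ]


if __name__ == "__main__":
    # A hybrid SMB workload that should run on a managed service:
    print(shortlist("SMB", needs_hybrid=True, categories={"managed"}))
```

Running the example prints `['NetApp Cloud Volumes']`, the only managed option in the table that offers both SMB and hybrid access.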
Managed filer solutions
Cloud Filestore
Cloud Filestore is Google's fully managed Network Attached Storage (NAS) solution.
You can easily mount Cloud Filestore file shares on Compute Engine VMs. Cloud Filestore is also tightly integrated with Google Kubernetes Engine so your containers can reference the same shared data.
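Because Cloud Filestore exports shares over NFSv3 (see the summary table), a Compute Engine VM mounts them with the standard Linux NFS client. The helpers below sketch the `mount` invocation and the matching `/etc/fstab` line; the instance IP address, share name, and mount point are hypothetical placeholders, and the sketch assumes the NFS client package is installed and the mount-point directory exists.

```python
from typing import List


def nfs_mount_command(server_ip: str, share: str, mount_point: str) -> List[str]:
    """argv for mounting an NFSv3 export (the VM needs an NFS client installed)."""
    return ["sudo", "mount", "-t", "nfs", "-o", "vers=3",
            f"{server_ip}:/{share}", mount_point]


def nfs_fstab_entry(server_ip: str, share: str, mount_point: str) -> str:
    """/etc/fstab line that mounts the share at boot.

    _netdev tells the system to wait for networking before mounting.
    """
    return f"{server_ip}:/{share} {mount_point} nfs defaults,_netdev 0 0"


if __name__ == "__main__":
    # Hypothetical instance IP, share name, and mount point.
    print(" ".join(nfs_mount_command("10.0.0.2", "vol1", "/mnt/filestore")))
    print(nfs_fstab_entry("10.0.0.2", "vol1", "/mnt/filestore"))
```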
Cloud Filestore offers two performance tiers, Standard and Premium. Both tiers provide consistent performance and predictable costs.
For more information, follow these links:
- Dynamically provision GKE storage from Cloud Filestore using the NFS-Client provisioner
- Cloud Filestore powers high-performance storage for ClioSoft's design management platform
NetApp Cloud Volumes
NetApp Cloud Volumes Service for Google Cloud Platform is a fully managed, cloud-native storage service that is integrated into the GCP Console, with seamless billing and support from Google.
The service lets you quickly mount persistent shared storage on your compute instances. This storage delivers high throughput to your applications at low latency, with robust data-protection capabilities (snapshots and copies). With an enterprise-grade architecture, the service provides high performance for both sequential and random workloads, which can scale across hundreds or thousands of Compute Engine instances. Volumes that range in size from 1 TB to 100 TB can be provisioned in seconds and protected with automated, space-efficient snapshots. Commands for mounting the volumes on your instances are available in the GCP Console.
There's no need to rewrite apps, because Cloud Volumes provides the POSIX-compliant shares required by a broad range of file-based workloads, including web and rich media content, in industries such as oil and gas, EDA, and media and entertainment.
With three service levels (standard, premium, and extreme) that you can change on demand, Cloud Volumes Service for Google Cloud Platform lets you match performance to your workload without affecting its availability. NetApp can also help sync your data between on-premises systems and Cloud Volumes Service for Google Cloud Platform.
For more information, follow these links:
Elastifile (acquired by Google Cloud)
Elastifile (acquired by Google Cloud) simplifies enterprise storage and data management on GCP and across hybrid clouds. Elastifile delivers cost-effective, high-performance parallel access to global data while maintaining strict consistency powered by a dynamically scalable, distributed file system. With Elastifile, existing NFS applications and NAS workflows can run in the cloud without requiring refactoring, yet retain the benefits of enterprise data services (high availability, compression, deduplication, replication, and so on). Native integration with Google Kubernetes Engine allows seamless data persistence, portability, and sharing for containerized workloads.
Elastifile is deployable and scalable at the push of a button. It lets you create and expand file system infrastructure easily and on-demand, ensuring that storage performance and capacity always align with your dynamic workflow requirements. As an Elastifile cluster expands, both metadata and I/O performance scale linearly. This scaling allows you to enhance and accelerate a broad range of data-intensive workflows, including high-performance computing, analytics, cross-site data aggregation, DevOps, and many more. As a result, Elastifile is a great fit for use in data-centric industries such as life sciences, electronic design automation (EDA), oil and gas, financial services, and media and entertainment.
Elastifile’s CloudConnect capability enables granular, bidirectional data transfer between any POSIX file system and Cloud Storage. To optimize performance and minimize costs, CloudConnect compresses and deduplicates data before transfer and, after the initial data synchronization, sends only changes. In hybrid cloud deployments, CloudConnect lets you efficiently load data into Cloud Storage from any on-premises NFS file system, a cost-effective way to bring data to GCP. Within GCP, CloudConnect enables cost-optimized data tiering between an Elastifile file system and Cloud Storage.
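To make "compressed, deduplicated, and changes only" concrete, the sketch below shows one common way such transfers work: split data into fixed-size chunks, identify each chunk by its SHA-256 digest, and upload (compressed) only chunks the destination doesn't already hold. This is a generic content-addressed technique chosen for illustration, not a description of how CloudConnect is actually implemented; the chunk size is hypothetical and a Python dict stands in for the Cloud Storage bucket.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunk size


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size chunks and pair each with its SHA-256 digest."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


def sync(data: bytes, remote: Dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> Tuple[List[str], int]:
    """Upload only chunks the remote store lacks, compressed.

    Returns the manifest (ordered digests that rebuild the file) and the
    number of compressed bytes actually sent.
    """
    manifest: List[str] = []
    sent = 0
    for digest, chunk in chunk_digests(data, chunk_size):
        manifest.append(digest)
        if digest not in remote:              # dedup: skip chunks already stored
            payload = zlib.compress(chunk)    # compress before "transfer"
            remote[digest] = payload
            sent += len(payload)
    return manifest, sent


def restore(manifest: List[str], remote: Dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from the manifest and stored chunks."""
    return b"".join(zlib.decompress(remote[d]) for d in manifest)
```

After the first `sync`, running it again on unchanged data sends zero bytes, and a file with repeated content stores each distinct chunk only once.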
For more information, follow these links:
- Deploy Elastifile File System by using GCP Marketplace
- Deploy Elastifile Service by using GCP Marketplace
Supported filer solutions in GCP Marketplace
Panzura
Panzura is a leader in managing unstructured data in the cloud. Enterprises in media and entertainment, genomics, life sciences, healthcare, oil and gas, financial services, and more choose Panzura Freedom NAS to consolidate their data islands into a single source of truth in Google Cloud Platform (GCP) without sacrificing performance or rewriting applications. By consolidating unstructured data (NFS, SMB, and object) into GCP, you can access all your data, collaborate on it, and analyze and control it for compliance.
Panzura CloudFS underpins the Freedom Family and is a scale-out, distributed file system built for the cloud. It incorporates intelligent file services backed by 26 patents. The Freedom product family lets you cost-effectively address the following use cases: cloud migration, global collaboration, and search and analytics.
Together, Panzura Freedom and GCP enable IT leaders to:
- Migrate thousands of legacy applications to GCP without rewriting them, changing workflows, or sacrificing performance.
- Eliminate copy data sprawl for backup and secondary storage by consolidating data to a single source of truth.
- Collaborate globally on large-scale projects to improve productivity and time to market.
- Modernize your legacy NAS while realizing a 70% cost saving and reducing your file infrastructure in your data center by 90%.
- Rehydrate legacy tape data for advanced analytics and machine learning.
For more information, follow these links:
- Download the white paper to learn more about how Panzura and Google work together.
- Read the press release: Panzura Freedom Cloud NAS Now Available in Google Cloud Platform Marketplace.
- Read the blog post: Panzura Freedom Hybrid Cloud NAS Comes to the Google Cloud Platform Marketplace.
- Download the datasheet on the Panzura Freedom NAS Filer.
- Download the white paper on the Panzura Freedom NAS Filer: Technology in Detail.
- Visit the Panzura website.
Quobyte
Quobyte is a parallel, distributed, POSIX-compatible file system that runs in the cloud and on-premises to provide petabytes of storage and millions of IOPS. The company was founded by ex-Google engineers who designed and architected the Quobyte file system by drawing on their deep technical understanding of the cloud.
Customers use Quobyte in demanding, large-scale production environments in industries ranging from life sciences, financial services, aerospace engineering, broadcasting and digital production, and electronic design automation (EDA) to traditional high-performance computing (HPC) research projects.
Quobyte natively supports all Linux, Windows, and NFS applications. Existing applications, new applications, and developers can all work in the same environment, whether in the cloud or on-premises. Quobyte offers optional cache consistency for applications that need stronger guarantees than NFS provides or that were not designed for distributed setups. HPC applications can take advantage of the fact that Quobyte is a parallel file system that supports concurrent reads and writes from multiple clients at high speed.
As a distributed file system, Quobyte scales IOPS and throughput linearly with the number of nodes, avoiding the performance bottlenecks of clustered or single-filer solutions. Through its native client software, Quobyte gives thousands of Linux and Windows client virtual machines (VMs) or containerized applications access to high IOPS, low latency, and several GB/s of throughput. This native client communicates directly with all storage VMs and can even read from multiple replicas of the data, avoiding the additional latency and performance bottlenecks of NFS gateways.
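The benefit of a client that talks to every storage VM is that reads can be spread across nodes instead of funneling through one gateway. The sketch below illustrates the general idea only: it assigns chunk reads round-robin across replicas and fetches them concurrently. Replicas are modeled as hypothetical Python callables; this is not Quobyte's client protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A replica is modeled as a function that returns the bytes of chunk `index`.
Replica = Callable[[int], bytes]


def parallel_read(replicas: Sequence[Replica], num_chunks: int, max_workers: int = 8) -> bytes:
    """Read chunks concurrently, spreading them round-robin across replicas."""
    if not replicas:
        raise ValueError("need at least one replica")

    def fetch(index: int) -> bytes:
        replica = replicas[index % len(replicas)]  # round-robin assignment
        return replica(index)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so chunks are reassembled correctly.
        return b"".join(pool.map(fetch, range(num_chunks)))
```

Adding a replica (or storage node) in this model adds another source that reads can be spread across, which is the intuition behind throughput growing with node count.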
You can create and extend Quobyte clusters on Compute Engine in a matter of minutes, allowing admins to run entire workloads in the cloud or to burst peak workloads. Start with a single storage VM and add additional capacity and VMs on the fly; also, dynamically downsize the deployment when resources are no longer needed.
Standard Linux VMs are the foundation for a Quobyte cluster on Compute Engine. The interactive installer makes for a quick and effortless setup. Data is stored on the attached persistent disks, which can be HDD or SSD based. You can use both types in a single installation, for example, as different performance tiers. The volume mirroring feature enables geo-replicated disaster recovery (DR) copies of volumes, which you can also use for read-only access in the remote region.
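Asynchronous mirroring means the primary acknowledges a write without waiting for the remote copy, so the DR copy can lag behind and may be missing the most recent writes if the primary site fails. The toy model below shows that trade-off; it is a hypothetical illustration of asynchronous replication in general, not Quobyte's actual mechanism.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class MirroredVolume:
    """Toy model of asynchronous volume mirroring.

    Writes are acknowledged as soon as the primary applies them; a queue of
    pending changes is shipped to the remote copy later, so the mirror can lag.
    The remote copy is read-only.
    """

    def __init__(self) -> None:
        self.primary: Dict[str, bytes] = {}
        self.mirror: Dict[str, bytes] = {}
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def write(self, path: str, data: bytes) -> None:
        self.primary[path] = data
        self._pending.append((path, data))  # replicate later, not inline

    def replicate(self) -> int:
        """Apply queued changes to the mirror in order; return how many were applied."""
        applied = 0
        while self._pending:
            path, data = self._pending.popleft()
            self.mirror[path] = data
            applied += 1
        return applied

    def read_remote(self, path: str) -> bytes:
        """Read-only access at the remote site; may return stale data."""
        return self.mirror[path]
```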
Monitoring and automation are built into Quobyte, making it easy to maintain a cluster of several hundred storage VMs. With a single click, you can add or remove VMs and disks, and new resources are available in less than a minute. Built-in real-time analytics help to identify the top storage consumers and the application's access patterns.
Quobyte offers a free 45-day test license directly from www.quobyte.com/get-quobyte.
Quobyte supports thousands of clients that communicate directly with all storage VMs, so no intermediate gateway becomes a performance bottleneck. With optional volume mirroring between availability zones or on-premises clusters, you can asynchronously replicate volumes between multiple sites, for example for disaster recovery or for read-only data access.
Supported filer solutions from partners
Avere
For workloads that require the utmost read performance, Avere Systems provides a best-of-breed solution. With Avere’s cloud-based vFXT clustered file system, you can provide your users with petabytes of storage and millions of IOPS.
The Avere vFXT is not only a filer, but also a read/write cache that allows for minimal changes to your existing workflow by putting working data sets as close to your compute cluster as possible. With Avere, you can employ the cost effectiveness of Cloud Storage as a backing store, along with the performance, scalability and per-second pricing of Compute Engine.
Avere also allows you to make the most of your current on-premises footprint. In addition to being able to leverage GCP with the vFXT, you can use Avere's on-premises FXT series to unify the storage of your legacy devices and storage arrays into an extensible filer with a single namespace.
If you are considering a transition away from your on-premises storage footprint, you can use Avere's FlashCloud technology to migrate to Cloud Storage with zero downtime to your clients. If you find yourself in need of a large amount of storage for a brief period of time, you can use Cloud Storage to burst your workload into the cloud. You can use as much storage and compute as you need, and then deprovision it without paying any ongoing costs.
Avere uses fast local devices, like SSDs and RAM, to cache the currently active data set as close to your compute devices as possible. With the vFXT, you can use the global redundancy and immense scale of Cloud Storage, while still providing your users with the illusion that their data is local to their compute cluster.
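The caching idea described above can be illustrated with a least-recently-used (LRU) cache sitting in front of a slower backing store. In the sketch below, a Python dict stands in for Cloud Storage and the in-memory `OrderedDict` stands in for the SSD/RAM cache tier. For simplicity it writes through to the backing store on every write; that policy is an assumption of this sketch and does not describe how the vFXT actually handles writes.

```python
from collections import OrderedDict
from typing import Dict


class ReadThroughCache:
    """LRU cache in front of a slower backing store (read-through, write-through)."""

    def __init__(self, backing: Dict[str, bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.backing = backing
        self.capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self.backing[key]         # slow path: fetch from the backing store
        self._put(key, value)
        return value

    def write(self, key: str, value: bytes) -> None:
        self.backing[key] = value         # write-through keeps the backing store authoritative
        self._put(key, value)

    def _put(self, key: str, value: bytes) -> None:
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
```

As long as the active data set fits in `capacity`, repeated reads are served from the fast tier, which is what gives clients the impression that their data is local.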