A file server, also called a storage filer, provides a way for applications to read and update files that are shared across machines. Typically, a file server uses a protocol that enables client machines to mount a filesystem and access the files as if they were hosted locally. The most common protocols for exporting file shares are Network File System (NFS) for Linux and the Common Internet File System (CIFS) for Windows.
An underlying factor in the performance and predictability of all of the Google Cloud Platform services is the network stack that Google has evolved over many years. With the Jupiter Fabric, Google has built a robust, scalable, and stable networking stack that can continue to evolve without affecting your workloads. As Google improves and bolsters its network abilities internally, your file-sharing solution will benefit from the added performance. For more details on the Jupiter Fabric, see the 2015 paper that describes its evolution.
One feature of GCP that can help you get the most out of your investment is the ability to specify custom VM types. When choosing the size of your filer, you can pick exactly the right mix of memory and CPU, so that your filer is operating at optimal performance without being oversubscribed. Custom VM types allow you to allocate up to 624 GB of memory and up to 96 cores per machine. If you are replicating or moving your existing filer over to GCP, you can use the exact same specifications to ensure parity between environments.
Cloud Storage
Cloud Storage is a great way to store your data with high levels of redundancy at a low cost, while enabling you to scale to petabytes and beyond. With Cloud Storage, you upload objects to, and download them from, namespaces called buckets, which are similar to folders.
Objects uploaded to Cloud Storage can be terabytes in size and can be uploaded in parallel by using composite objects. When your upload is successful, your object is available globally to all readers, thanks to Cloud Storage's strong consistency. You can configure access control to provide fine-grained policies scoped at the user, account, and even domain levels. To control your costs, Cloud Storage lets you store data in different storage classes that are designed to provide different levels of availability and latency.
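The idea behind a parallel composite upload can be sketched as: split the file into parts, upload the parts concurrently as separate component objects, then compose them in order into one object. The sketch below is an in-memory simulation, assuming a plain `dict` stands in for a bucket; `upload_component` and `parallel_composite_upload` are hypothetical helpers, not Cloud Storage client calls. The one real constraint it models is that a single compose request accepts at most 32 source objects.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_COMPONENTS = 32  # a single Cloud Storage compose request accepts at most 32 sources


def split_into_parts(data: bytes, part_count: int) -> list:
    """Split data into at most part_count contiguous chunks."""
    if not 1 <= part_count <= MAX_COMPONENTS:
        raise ValueError(f"part_count must be between 1 and {MAX_COMPONENTS}")
    if not data:
        return [b""]
    size = -(-len(data) // part_count)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_component(store: dict, name: str, chunk: bytes) -> str:
    """Hypothetical stand-in for uploading one component object."""
    store[name] = chunk
    return name


def parallel_composite_upload(store: dict, name: str, data: bytes, part_count: int = 4) -> bytes:
    parts = split_into_parts(data, part_count)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        futures = [pool.submit(upload_component, store, f"{name}.part{i}", chunk)
                   for i, chunk in enumerate(parts)]
        component_names = [f.result() for f in futures]  # preserves part order
    # "Compose": concatenate components in order, then delete the temporary parts.
    store[name] = b"".join(store[n] for n in component_names)
    for n in component_names:
        del store[n]
    return store[name]
```

In practice, tools such as `gsutil` can perform parallel composite uploads for you; the sketch only shows why the parts must be composed in their original order.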
Cloud Storage has a powerful set of features, and in many situations has benefits over using a file server. In some cases, however, a file server might be more appropriate for your situation. Here are some things to consider if you choose to share files using Cloud Storage:
- Objects can't be modified in place: writes replace the entire object rather than updating it at an offset, so changing any part of a file means uploading the whole file again.
- When multiple writers are operating at the same time, the last write wins and overwrites the other changes to the file, unless you provide your own synchronization mechanism.
- If your application requires access to POSIX file metadata attributes, like last-modified timestamps, you must use the Cloud Storage API rather than a `stat` call on your host.
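One way to provide your own synchronization is optimistic concurrency: read the object's generation number, and write only if the generation hasn't changed since the read. Cloud Storage supports this through generation-match preconditions (`ifGenerationMatch`). The sketch below is an in-memory model of that pattern, assuming a hypothetical `InMemoryBucket` class in place of the real API; it contrasts an unconditional write (last write wins) with a conditional read-modify-write that retries on conflict.

```python
from dataclasses import dataclass, field


class PreconditionFailed(Exception):
    """Raised when the object's generation no longer matches the caller's expectation."""


@dataclass
class InMemoryBucket:
    # Maps object name -> (generation, data). Generation 0 means "does not exist".
    objects: dict = field(default_factory=dict)
    next_generation: int = 1

    def read(self, name):
        return self.objects.get(name, (0, b""))

    def write(self, name, data, if_generation_match=None):
        current_gen, _ = self.read(name)
        if if_generation_match is not None and if_generation_match != current_gen:
            raise PreconditionFailed(name)
        # Every write replaces the whole object and receives a new generation.
        generation = self.next_generation
        self.objects[name] = (generation, data)
        self.next_generation += 1
        return generation


def append_line(bucket, name, line, retries=5):
    """Read-modify-write that retries instead of silently losing a concurrent update."""
    for _ in range(retries):
        gen, data = bucket.read(name)
        try:
            return bucket.write(name, data + line, if_generation_match=gen)
        except PreconditionFailed:
            continue  # another writer got there first; reread and try again
    raise RuntimeError("too much contention")
```

A write made without a precondition always succeeds and can clobber someone else's change; a write that carries a stale generation fails instead, so the caller can reread and merge.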
Compute Engine persistent disks
If you have data that doesn't change over time, you might be able to use Compute Engine's persistent disks, and avoid hosting a file server altogether. With persistent disks, you can attach volumes in both read-write and read-only modes. This means that you can first attach a volume to an instance, load it with the data you need, and then attach it as a read-only disk to hundreds of virtual machines simultaneously. Employing read-only persistent disks does not work for all use cases, but it can greatly reduce complexity, compared to using a file server.
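The attach rule behind this workflow is that a disk is attached read-write to one instance, or read-only to many, but not both at once. The toy model below, a hypothetical `PersistentDiskModel` class rather than any Compute Engine API, walks through the load-then-share sequence. On a real deployment, the read-only step corresponds to `gcloud compute instances attach-disk` with `--mode=ro`.

```python
class DiskAttachmentError(Exception):
    pass


class PersistentDiskModel:
    """Toy model of attach modes: one read-write attachment, or many read-only ones."""

    def __init__(self, name):
        self.name = name
        self.attachments = {}  # instance name -> "rw" or "ro"

    def attach(self, instance, mode):
        if mode not in ("rw", "ro"):
            raise ValueError("mode must be 'rw' or 'ro'")
        if mode == "rw" and self.attachments:
            raise DiskAttachmentError("a read-write disk can't be attached anywhere else")
        if mode == "ro" and "rw" in self.attachments.values():
            raise DiskAttachmentError("detach the read-write instance first")
        self.attachments[instance] = mode

    def detach(self, instance):
        self.attachments.pop(instance, None)


# 1. Load the data through a single read-write attachment.
disk = PersistentDiskModel("reference-data")
disk.attach("loader", "rw")
# 2. Detach the loader, then share the disk read-only with many workers.
disk.detach("loader")
for i in range(100):
    disk.attach(f"worker-{i}", "ro")
```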
Compute Engine's persistent disks are a great way to store data in Google Cloud Platform (GCP), because they give you flexibility in balancing scale and performance against cost. Persistent disks can also be resized on the fly, so you can start with a low-cost, low-capacity volume and grow it as needed, without spinning up additional instances or disks. Persistent disk throughput and IOPS scale linearly with disk size, up to per-instance limits. That means you can scale your performance by resizing the disk, which requires little to no downtime. You no longer have to stripe together a set of disks with software-based RAID to get the aggregate performance you want.
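The linear-scaling rule can be written as performance = min(size × per-GB rate, cap). The sketch below applies that formula; the per-GB rates and caps in `HYPOTHETICAL_RATES` are illustrative assumptions, not current published figures, so check the block storage documentation before sizing a real disk.

```python
# Illustrative numbers only: per-GB rates and caps change over time.
HYPOTHETICAL_RATES = {
    # disk type: (read IOPS per GB, read MB/s per GB, IOPS cap, MB/s cap)
    "pd-standard": (0.75, 0.12, 3_000, 180),
    "pd-ssd": (30, 0.48, 40_000, 800),
}


def disk_performance(disk_type, size_gb):
    """Read IOPS and MB/s grow linearly with size until they hit the cap."""
    iops_per_gb, mbps_per_gb, iops_cap, mbps_cap = HYPOTHETICAL_RATES[disk_type]
    return (min(size_gb * iops_per_gb, iops_cap),
            min(size_gb * mbps_per_gb, mbps_cap))


before = disk_performance("pd-ssd", 500)   # small volume
after = disk_performance("pd-ssd", 1000)   # same disk after a resize
```

Under these assumed rates, doubling the disk from 500 GB to 1,000 GB doubles both IOPS and throughput, which is why a resize can replace a software RAID stripe until the cap is reached.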
Another advantage of persistent disks is their consistent performance. All disks of the same size and type that you attach to your instance have the same performance characteristics. You don't need to pre-warm or test your persistent disks before using them in production.
Performance is not the only thing that is easy to predict: the cost of persistent disks is easy to determine, because there are no I/O costs to consider after you provision your volume. You can easily balance cost and performance, because you can choose among three types of disks with different costs and performance characteristics.
For some workloads, total capacity is the main scaling factor; for those, you can use cheaper spinning disks, called standard persistent disks, rather than paying for the additional IOPS of SSD persistent disks. If your data is ephemeral and requires sub-millisecond latency and high IOPS, you can use up to 3 TB of local SSDs for extreme performance. Local SSDs allow for up to ~700k IOPS with speeds similar to DDR2 RAM, all without using up your instance's allotted network capacity. For a comparison of the many disk types available to Compute Engine instances, see the documentation for block storage.
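The disk-type guidance above can be condensed into a small decision function, paired with a cost estimate that depends only on provisioned size because there are no per-I/O charges. Everything here is a sketch: the `Workload` fields are hypothetical, and the prices in `HYPOTHETICAL_PRICE_PER_GB_MONTH` are placeholders, not actual Compute Engine pricing.

```python
from dataclasses import dataclass

LOCAL_SSD_MAX_GB = 3_000  # the 3 TB local SSD limit quoted in this article

# Placeholder prices in USD per GB-month, for illustration only.
HYPOTHETICAL_PRICE_PER_GB_MONTH = {"pd-standard": 0.04, "pd-ssd": 0.17, "local-ssd": 0.08}


@dataclass
class Workload:
    capacity_gb: float
    needs_high_iops: bool
    ephemeral: bool
    needs_sub_ms_latency: bool


def choose_disk_type(w: Workload) -> str:
    # Ephemeral, latency-critical data that fits on local SSD.
    if w.ephemeral and w.needs_sub_ms_latency and w.capacity_gb <= LOCAL_SSD_MAX_GB:
        return "local-ssd"
    # Durable data where IOPS, not capacity, is the constraint.
    if w.needs_high_iops:
        return "pd-ssd"
    # Capacity is the main scaling factor.
    return "pd-standard"


def monthly_cost(disk_type: str, capacity_gb: float) -> float:
    # No per-I/O charges: cost depends only on the provisioned size.
    return HYPOTHETICAL_PRICE_PER_GB_MONTH[disk_type] * capacity_gb
```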
Considerations when choosing a filer solution
Choosing a filer solution requires you to make tradeoffs regarding cost, performance, and scalability. Making the decision is easier if you have a well-defined workload, but unfortunately that often isn't the case. In situations where workloads evolve over time or are highly variable, it is prudent to trade cost savings for flexibility and elasticity, so you can grow into your solution. On the other hand, if you have a workload that is short-lived and well understood, you can create a purpose-built filer architecture that can easily be torn down and rebuilt to meet your immediate storage needs.
One of the first decisions to make is whether you want to pay for a supported filer solution, or whether you have the staff available to create your own solution and maintain it in the long run. Once you have decided on a support model, the next decision involves figuring out the durability requirements of your filer. Can you afford to lose any of the data? If not, how much are you willing to pay to ensure that the data is properly replicated to allow for disaster recovery? Next, consider the overall size of your present and future data sets, as this will heavily influence the cost of your filer. Finally, consider your mount locations. The locations of the compute farms you use to access your data will influence your choice of filer solution, as only some solutions allow hybrid on-premises and in-cloud access.
Single Node File Server
The easiest way to get a filer up and running on GCP is to use Single Node File Server, which you can deploy automatically by using Cloud Marketplace. The deployment includes monitoring via Grafana. In minutes you can have a fully functional filer.
When you use Cloud Marketplace to deploy Single Node File Server, you can configure the type of backing disk you'd like: standard or SSD. You can also configure the instance type and total data disk size. Keep in mind that the performance of your filer depends on the size and type of disk as well as on the instance type. The disk type and size determine disk throughput: the larger the disk, the more performance you get. The instance type determines how much network bandwidth is available to your filer: each core provides up to 2 Gb/s of network throughput. With those guidelines, you can decide on a starting configuration.
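Because clients reach the filer over the network, the throughput they see is bounded by the slower of the disk and the instance's network bandwidth. The sketch below compares the two, using the article's figure of 2 Gb/s per core; the optional `cap_gbps` argument is an assumption for modeling a per-instance network ceiling, and the function names are hypothetical.

```python
GBPS_PER_CORE = 2  # "up to 2 Gb/s of network throughput" per core, per this article


def network_mb_per_s(cores, cap_gbps=None):
    gbps = cores * GBPS_PER_CORE
    if cap_gbps is not None:
        gbps = min(gbps, cap_gbps)
    return gbps * 1000 / 8  # Gb/s -> MB/s (decimal units)


def filer_throughput_mb_per_s(cores, disk_mb_per_s, cap_gbps=None):
    """Return the achievable throughput and which resource limits it."""
    net = network_mb_per_s(cores, cap_gbps)
    bottleneck = "network" if net < disk_mb_per_s else "disk"
    return min(net, disk_mb_per_s), bottleneck
```

For example, with an 800 MB/s disk, a 2-core instance (about 500 MB/s of network) is network-bound, so adding disk would not help; a 4-core instance (about 1,000 MB/s) makes the disk the limit instead.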
After your filer is fully deployed, you can mount your shares by using NFS or SMB mounts from any host on the local subnet. Keep in mind that you can start with smaller disks and then resize them as necessary to scale with your performance or capacity needs.
If you can tolerate downtime, you can also scale up your filer by stopping the instance, changing the instance type, and then starting it again. Although Single Node File Server cannot scale horizontally to provide a shared pool of disks, you can create as many of the individual filers as you need. This approach could be useful if you are doing development or testing against a shared filesystem back end.
Although Single Node File Server does not provide redundancy for your data, you can create snapshots of your data disk in order to take periodic backups. There is no official paid support for Single Node File Server, so the costs of running it are tied directly to the instance, disk, and network costs. In general, this option should be very low maintenance and require little to no administration.
With Single Node File Server, you can scale up to:
- 64 TB of persistent disk SSD for data
- 800 MB/s disk read throughput per instance
- 400 MB/s disk write throughput per instance
- 40,000 read IOPS per instance
- 30,000 write IOPS per instance
- 3.0 TB of local SSD for caching, providing:
  - 680k+ read IOPS (similar speed to DDR2 RAM)
  - 360k+ write IOPS
- 8 Gb/s of network bandwidth
Elastifile
Elastifile simplifies enterprise storage and data management on GCP and across hybrid clouds. Elastifile delivers cost-effective, high-performance parallel access to global data while maintaining strict consistency powered by a dynamically scalable, distributed file system with intelligent object tiering. With Elastifile, existing NFS applications and NAS workflows can run in the cloud without requiring refactoring, yet retain the benefits of enterprise data services (high availability, compression, deduplication, replication, and so on). Native integration with Google Kubernetes Engine allows seamless data persistence, portability, and sharing for containerized workloads.
Elastifile is deployable and scalable at the push of a button. It lets you create and expand file system infrastructure easily and on-demand, ensuring that storage performance and capacity always align with your dynamic workflow requirements. As an Elastifile cluster expands, both metadata and I/O performance scale linearly. This scaling allows you to enhance and accelerate a broad range of data-intensive workflows, including high-performance computing, analytics, cross-site data aggregation, DevOps, and many more. As a result, Elastifile is a great fit for use in data-centric industries such as life sciences, electronic design automation (EDA), oil and gas, financial services, and media and entertainment.
Elastifile’s CloudConnect capability enables granular, bidirectional data transfer between any POSIX file system and Cloud Storage. To optimize performance and minimize costs, CloudConnect compresses and deduplicates data before transfer and, after the initial data synchronization, sends only changes. When used for hybrid cloud deployments, CloudConnect allows you to efficiently load data into Cloud Storage from any on-premises NFS file system, delivering a cost-effective way to bring data to the cloud. When used in the cloud, CloudConnect enables cost-optimized data tiering between an Elastifile file system and Cloud Storage.
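To illustrate the general technique of deduplicated, compressed, incremental transfer, here is a generic sketch: split data into fixed-size chunks, identify each chunk by its SHA-256 digest, and send compressed payloads only for chunks the destination doesn't already hold. This is not CloudConnect's actual algorithm; the chunk size, the `dict` standing in for the remote store, and the function names are all assumptions for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size for this sketch


def chunk_hashes(data: bytes):
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


def plan_transfer(data: bytes, remote_store: dict):
    """Return the file's chunk manifest plus compressed payloads for chunks the remote lacks."""
    manifest, payloads = [], {}
    for digest, chunk in chunk_hashes(data):
        manifest.append(digest)
        if digest not in remote_store and digest not in payloads:
            payloads[digest] = zlib.compress(chunk)  # deduplicate first, then compress
    return manifest, payloads


def apply_transfer(manifest, payloads, remote_store: dict) -> bytes:
    """Store new chunks on the remote side and rebuild the file from the manifest."""
    for digest, blob in payloads.items():
        remote_store[digest] = zlib.decompress(blob)
    return b"".join(remote_store[d] for d in manifest)
```

After the first sync, appending data to a file sends only the new chunk. A design caveat: with fixed-size chunks, an insertion near the start of a file shifts every later chunk boundary, which is why production systems often use content-defined chunking instead.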
For more information, follow these links:
- Deployment whitepaper - Elastifile on Google Cloud Platform
- Solutions brief - Elastifile on Google Cloud Platform
- Elastifile Overview on GCP Live - NFS and NAS on GCP (video)
- Demo - Scalable Molecular Dynamics Simulations on Google Cloud Platform (video)
- Contact Elastifile or deploy Elastifile by using Cloud Marketplace.
Quobyte
Quobyte is a parallel, distributed, POSIX-compatible file system that runs in the cloud and on-premises to provide petabytes of storage and millions of IOPS. The company was founded by ex-Google engineers who designed and architected the Quobyte file system by drawing on their deep technical understanding of the cloud.
Customers use Quobyte in demanding, large-scale production environments in industries ranging from life sciences, financial services, aerospace engineering, broadcasting and digital production, and electronic design automation (EDA) to traditional high-performance computing (HPC) research projects.
Quobyte natively supports all Linux, Windows, and NFS applications. Existing applications, new applications, and developers can all work in the same environment, whether in the cloud or on-premises. Quobyte offers optional cache consistency for applications that need stronger guarantees than NFS provides or that have not been designed for distributed setups. HPC applications can also take advantage of the fact that Quobyte is a parallel file system that supports concurrent reads and writes from multiple clients at high speed.
As a distributed file system, Quobyte scales IOPS and throughput linearly with the number of nodes—avoiding the performance bottlenecks of clustered or single filer solutions. Quobyte provides thousands of Linux and Windows client virtual machines (VMs) or containerized applications access to high IOPS, low latency, and several GB/s of throughput through its native client software. This native client directly communicates with all storage VMs and can even read from multiple replicas of the data, avoiding the additional latencies and performance bottlenecks of NFS gateways.
Quobyte clusters on Compute Engine can be created and extended in a matter of minutes, allowing admins to run entire workloads in the cloud or to burst peak workloads into it. You can start with a single storage VM, add capacity and VMs on the fly, and dynamically downsize the deployment when resources are no longer needed.
Standard Linux VMs are the foundation for a Quobyte cluster on Compute Engine. The interactive installer makes for a quick and effortless setup. Data is stored on the attached persistent disks, which can be HDD or SSD based. You can use both types in a single installation, for example, as different performance tiers. The volume mirroring feature enables georeplicated disaster recovery (DR) copies of volumes, which you can also use for read-only access in the remote region.
Monitoring and automation are built into Quobyte, making it easy to maintain a cluster of several hundred storage VMs. With a single click, you can add or remove VMs and disks, and new resources are available in less than a minute. Built-in real-time analytics help to identify the top storage consumers and the application's access patterns.
Quobyte is available as a 45-day test license at no cost directly from www.quobyte.com/get-quobyte.
Quobyte supports thousands of clients communicating directly with all storage VMs without any performance bottlenecks. By using optional volume mirroring between different availability zones or on-premises clusters, you can asynchronously replicate volumes between multiple sites, for example for disaster recovery or for read-only data access.
Avere vFXT
For workloads that require the utmost read performance, Avere Systems provides a best-of-breed solution. With Avere’s cloud-based vFXT clustered cloud filesystem, you can provide your users with petabytes of storage and millions of IOPS.
The Avere vFXT is not only a filer, but also a read/write cache that allows for minimal changes to your existing workflow by putting working data sets as close to your compute cluster as possible. With Avere, you can employ the cost-effectiveness of Cloud Storage as a backing store, along with the performance, scalability, and per-second pricing of Compute Engine.
Avere also allows you to make the most of your current on-premises footprint. In addition to being able to leverage GCP with the vFXT, you can use Avere’s on-premises FXT series to unify the storage of your legacy devices and storage arrays into an extensible filer with a single namespace.
If you are considering a transition away from your on-premises storage footprint, you can use Avere's FlashMove technology to migrate to Cloud Storage with zero downtime to your clients. If you want to provide a disaster recovery mechanism for your on-premises data, you can use the FlashMirror feature to replicate your on-premises storage in Cloud Storage. If you find yourself in need of a large amount of storage for a brief period of time, you can use Cloud Storage to burst your workload into the cloud. You can use as much storage and compute as you need, and then deprovision it without paying any ongoing costs.
Avere uses fast local devices, like SSDs and RAM, to cache the currently active data set as close to your compute devices as possible. With the vFXT, you can use the global redundancy and immense scale of Cloud Storage, while still providing your users with the illusion that their data is local to their compute cluster.
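The caching idea can be sketched as a least-recently-used (LRU) cache in front of a slower backing store: reads are served locally when possible, misses fetch from the backing store, and the least recently used entries are evicted when the cache is full. This is a generic illustration, not Avere's implementation; a `dict` stands in for Cloud Storage, and the write-through policy in `ReadThroughCache.write` is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class ReadThroughCache:
    """LRU cache in front of a slower backing store (a dict stands in for Cloud Storage)."""

    def __init__(self, backing_store: dict, capacity: int):
        self.backing = backing_store
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backing[key]  # slow path: fetch from the backing store
        self._insert(key, value)
        return value

    def write(self, key, value):
        # Write-through: update the durable copy, then keep the hot copy local.
        self.backing[key] = value
        self._insert(key, value)

    def _insert(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

Repeated reads of the active data set become local hits, while the backing store keeps the authoritative copy; a write-back policy would reduce write latency further at the cost of a window where the cache holds the only copy.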
Summary of file server options
The following table summarizes the features of read-only persistent disks and four file server options:
| Filer solution | Optimal data set | Throughput | Managed support | Export protocols | Highly available | Hybrid |
|---|---|---|---|---|---|---|
| Single Node File Server | < 64 TB | Up to 16 Gb/s | No | NFSv3, SMB3 | No | No |
| Elastifile | 10s of TB to > 1 PB | 10s to 100s of Gb/s | Elastifile | NFSv3 | Yes | Yes |
| Quobyte | 10s of TB to > 1 PB | 100s to 1000s of Gb/s | Quobyte | Native Linux and Windows clients, Amazon S3, HDFS, NFSv4/3, SMB | Yes | Yes |
| Avere | 10s to 100s of TB | 10s to 100s of Gb/s | Avere | NFSv3, SMB2 | Yes | Yes |
| Read-only PD | < 64 TB | 180 to 800 MB/s | No | Direct attachment | No | No |
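To make the table easier to apply, the sketch below encodes a rough numeric reading of it and filters options by data-set size, high availability, hybrid access, and whether clients need to write. The numeric boundaries are approximations of the table's ranges ("10s of TB" read as 10 TB, "> 1 PB" as unbounded, "100s of TB" as up to about 1,000 TB), and the `FilerOption` and `candidates` names are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class FilerOption:
    name: str
    min_tb: float  # rough lower end of the "Optimal data set" column
    max_tb: float  # rough upper end; INF for "> 1 PB"
    highly_available: bool
    hybrid: bool
    writable: bool


# Rough numeric reading of the summary table; boundaries are approximations.
OPTIONS = [
    FilerOption("Single Node File Server", 0, 64, False, False, True),
    FilerOption("Elastifile", 10, INF, True, True, True),
    FilerOption("Quobyte", 10, INF, True, True, True),
    FilerOption("Avere", 10, 1000, True, True, True),
    FilerOption("Read-only PD", 0, 64, False, False, False),
]


def candidates(data_tb, need_ha=False, need_hybrid=False, need_writes=True):
    """List the options whose table entries satisfy every stated requirement."""
    return [o.name for o in OPTIONS
            if o.min_tb <= data_tb <= o.max_tb
            and (o.highly_available or not need_ha)
            and (o.hybrid or not need_hybrid)
            and (o.writable or not need_writes)]
```

For example, a 200 TB data set that must be highly available narrows the field to Elastifile, Quobyte, and Avere; the remaining choice then comes down to support model, export protocols, and throughput.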
For a comparison of the many disk types available to Compute Engine instances, see the documentation for block storage.