By Ron Hogue, Training and Development Manager, Avere Systems
Avere Systems, a Google Cloud Technology Partner, offers high-performance storage solutions for hybrid cloud infrastructures that you can use to help move compute-intensive workloads with large datasets to Google Cloud.
vFXT is a virtual clustered storage solution that accelerates access to on-premises storage by caching data in the cloud. vFXT helps you:
- Reduce latency by positioning cached datasets in close proximity to compute environments.
- Optimize the volume of data transferred between on-premises and Google Cloud networks.
- Avoid expanding private data centers while supporting the same workloads.
- Liberate on-premises resources by shifting compute-intensive workloads to the cloud.
Avere vFXT capabilities: low-latency, high-bandwidth connections
Use vFXT to create enterprise-grade network-attached storage in the cloud, narrowing the differences between your on-premises storage and Cloud Storage.
Storage speed is often a bottleneck for HPC workloads. With vFXT you can leverage Compute Engine resources by accelerating access to on-premises storage.
With vFXT, you can:
- Leverage low-cost Cloud Storage for applications running on Compute Engine servers.
- Connect directly to Compute Engine instances in your Virtual Private Cloud (VPC) network, resulting in low latency and high bandwidth between instances and Cloud Storage.
- Enable your Compute Engine instances to access data for high-performance workloads faster than reading from your on-premises NAS.
Features: protocols, caching, cost reduction, scalability, security
The Avere vFXT provides the following features.
- Familiar protocols: Run applications in the cloud using familiar storage protocols, such as NFS, to access on-premises data. Apps use their native protocols rather than Cloud Storage APIs. Cloud instances communicate with vFXT as if they were communicating with on-premises storage.
- Read-through caching: Connect to your on-premises NAS and cache only the data requested by cloud compute resources. Instances access the data on vFXT as if it were local. Avere's global namespace (GNS) functionality ensures simple NAS access with a single mount point.
- Reduce cost: Only the working dataset is transferred to the cloud rather than the entire data pool, so storage costs and transfer times are reduced.
- Scalability: Choose between cluster sizes of 3 to 24 nodes.
- Security: vFXT data is encrypted using AES256 encryption.
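To illustrate the first two features above: because vFXT presents cached data over NFS, an application reads it with ordinary file I/O rather than Cloud Storage API calls. The sketch below assumes a hypothetical mount point; the temporary directory merely stands in for an NFS mount in this example.

```python
# Applications read vFXT-cached data through an NFS mount with ordinary
# file I/O -- no Cloud Storage client library is involved. The path layout
# here is a hypothetical example, not a required convention.
import os

def read_from_mount(mount_point: str, relative_path: str) -> bytes:
    """Read a file exactly as if it lived on a local or on-premises NAS."""
    with open(os.path.join(mount_point, relative_path), "rb") as f:
        return f.read()

if __name__ == "__main__":
    # Demo against a temporary directory standing in for the NFS mount.
    import tempfile
    with tempfile.TemporaryDirectory() as mount:
        with open(os.path.join(mount, "scene_001.exr"), "wb") as f:
            f.write(b"frame data")
        print(read_from_mount(mount, "scene_001.exr"))
```

Because the access path is a plain filesystem path under the single GNS mount point, existing applications need no code changes to run against vFXT.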
Avere vFXT on Google Cloud architecture
The Avere vFXT cluster sits between your on-premises NAS and cloud resources. vFXT transfers data on an as-needed basis from your on-premises NAS to its attached storage, making subsequent read requests for the same data faster, without synchronizing your entire dataset. If the working set is larger than what the current cluster can cache, you can add nodes to your vFXT cluster to increase vFXT storage capacity, letting it hold the entire working dataset.
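The as-needed transfer described above is read-through caching. The following sketch illustrates the general idea only; it is a conceptual model, not Avere's implementation.

```python
# Conceptual sketch of read-through caching: the first request for a file
# is fetched from the (slow, remote) origin NAS and cached; subsequent
# requests for the same data are served from the cache. This is an
# illustration of the pattern, not Avere's actual implementation.
class ReadThroughCache:
    def __init__(self, origin_reader):
        self._origin = origin_reader   # callable: path -> bytes
        self._cache = {}
        self.misses = 0

    def read(self, path):
        if path not in self._cache:    # cache miss: pull from origin NAS
            self._cache[path] = self._origin(path)
            self.misses += 1
        return self._cache[path]       # cache hit: served locally

origin_calls = []
def slow_nas_read(path):
    origin_calls.append(path)          # stands in for a WAN round trip
    return b"contents of " + path.encode()

cache = ReadThroughCache(slow_nas_read)
cache.read("datasets/genome.fa")
cache.read("datasets/genome.fa")       # second read never touches the NAS
print(cache.misses, len(origin_calls)) # -> 1 1
```

Only the working set ever crosses the WAN link, which is why the entire dataset never needs to be synchronized to the cloud.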
The following architecture diagram shows the vFXT cluster sitting between your on-premises NAS and cloud resources:
Google Cloud components
The vFXT solution uses the following Google Cloud components.
Compute Engine instances
vFXT nodes run as Compute Engine instances, using n1-highmem-32 machine types.
Persistent and Local SSDs
On creation, vFXT creates a persistent SSD as its boot disk. Data disks can be either persistent SSDs or local SSDs. For burst-type workloads, you can typically use a local SSD as your data disk.
You can choose to create a Cloud Storage bucket during the vFXT cluster creation process. This provides a low-cost option for longer-term storage of data, as well as faster retrieval of data no longer in the cache.
Disk type and size
The type of disk you choose determines the available disk sizes. Local SSDs have a fixed size of 375 GB per disk, and you can attach between 1 and 8 local SSDs to a single node.
vFXT allows persistent disk sizes to range from 250 GB to 8000 GB (8 TB). For testing, you might choose the lowest number at 250 GB per node. A three-node vFXT cluster provides 750 GB of cache for the cluster. If you determine that your working set requires 5 TB of cache, you might choose 2 TB per node for 6 TB of total cache in a three-node cluster.
You can create a vFXT cluster with between 3 and 24 nodes. After you create the initial nodes, you can add more, up to the maximum of 24.
Some example vFXT configurations are:
| Node count | Disk type | Cache per node | Aggregate cluster cache |
|---|---|---|---|
| 3 | Local SSD | 1125 GB (3 x 375 GB) | 3.375 TB |
| 12 | Local SSD | 3000 GB (8 x 375 GB) | 36 TB |
| 24 | Persistent SSD | 8000 GB | 192 TB |
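The sizing arithmetic behind these configurations can be sketched as follows; the function names are illustrative, but the constraints (375 GB local SSDs, 1-8 per node, 250-8,000 GB persistent SSDs, 3-24 nodes) come from the text above.

```python
# Cache-sizing arithmetic for a vFXT cluster, per the constraints above.
LOCAL_SSD_GB = 375  # local SSDs have a fixed size of 375 GB each

def cache_per_node(disk_type, *, local_ssd_count=0, persistent_gb=0):
    """Cache capacity of a single node, in GB."""
    if disk_type == "local":
        assert 1 <= local_ssd_count <= 8, "1-8 local SSDs per node"
        return LOCAL_SSD_GB * local_ssd_count
    assert 250 <= persistent_gb <= 8000, "250 GB to 8 TB per node"
    return persistent_gb

def aggregate_cache_gb(nodes, per_node_gb):
    """Total cluster cache, in GB."""
    assert 3 <= nodes <= 24, "clusters have 3-24 nodes"
    return nodes * per_node_gb

# The three rows of the table above:
print(aggregate_cache_gb(3,  cache_per_node("local", local_ssd_count=3)))        # 3375 GB
print(aggregate_cache_gb(12, cache_per_node("local", local_ssd_count=8)))        # 36000 GB
print(aggregate_cache_gb(24, cache_per_node("persistent", persistent_gb=8000)))  # 192000 GB
```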
Preparing to create the vFXT cluster
This section outlines the process for deploying the vFXT cluster.
Specifying a project ID
You must provide your Google Cloud project ID to Avere. Avere shares the vFXT custom image with that project. You can find your project ID and number in the Google Cloud Console Dashboard. Send this project ID to your Avere sales representative or to the general sales email at firstname.lastname@example.org.
Choosing a storage type
During vFXT creation, you must choose which storage type you want to use in the vFXT cluster. You can choose between persistent SSDs or local SSDs for the data drives.
Persistent SSDs let you stop and restart instances in the vFXT cluster. Persistent SSD is the recommended storage type, because the data on those drives remains after you restart the cluster.
Local SSD data doesn't persist after you restart the vFXT cluster. Clusters that use local SSDs for storage cannot be stopped or restarted, only terminated, and termination discards all stored data and configuration. Local SSDs are less expensive and faster than persistent SSDs, making them ideal for burst workloads where the stored data is ephemeral.
Calculating resource usage
Depending on how you configure your Avere vFXT cluster, you might have to increase your resource quota. Verify that your project has sufficient quota for the number of CPU cores and the type of storage you chose. If your project doesn't have enough resource quota to create the Avere vFXT cluster, request a quota increase.
The cluster size, disk size, and disk type you choose determine the required resource quota.
| Resource type | Quota calculation | Minimum quota required* |
|---|---|---|
| vCPUs | [Number of nodes] x [8 or 32] | 24 vCPUs |
| Local SSD | [Number of nodes] x [SSD amount] | 1125 GB |
| Persistent SSD | [Number of nodes] x [PD SSD amount] | 750 GB |
* 3-node vFXT cluster.
Enabling storage connectivity
vFXT must have connectivity to Google Cloud APIs, such as the Compute Engine API, as well as external endpoints, such as your on-premises NAS and the Avere API. Instances connected to vFXT are protected from external access by firewall policies you configure.
There are a number of ways to allow vFXT to communicate with your on-premises NAS. Choose the connectivity method which is best for your situation.
NAT gateway
A NAT gateway is an instance that can forward traffic on behalf of any instance on your VPC network, and it is the most commonly used way for vFXT to communicate with external resources. A NAT gateway allows your vFXT cluster to reach your on-premises NAS, as well as external APIs such as Avere's software portal.
The following diagram shows how to use a NAT gateway to communicate with external resources:
You can also use a NAT gateway instance to provide an SSH tunnel for vFXT Control Panel access.
If you configure a NAT gateway, you must add firewall rules to allow communication between your on-premises NAS, the NAT gateway, and your vFXT cluster.
You can improve access to on-premises NAS by using Cloud Interconnect options instead of a NAT gateway.
Private Google access
By default, your vFXT cluster is configured without external access. You can configure Private Google access so that the vFXT cluster can reach APIs, such as the Cloud Storage API, by using only internal IP addresses. See Configuring private Google access for more information.
Cloud Interconnect
Two connectivity methods allow direct access to RFC 1918 IP addresses in your VPC network: Dedicated Interconnect and IPsec VPN. See the Cloud Interconnect documentation for detailed information about these methods.
Dedicated Interconnect provides direct physical connections and RFC 1918 addressability between your on-premises network and Google's network. Dedicated Interconnect enables you to transfer large amounts of data between networks, which can provide higher bandwidth and lower cost than purchasing additional bandwidth over the public internet or by using VPN tunnels.
The following diagram shows how Dedicated Interconnect provides direct connections and RFC 1918 addressability between your on-premises network and Google's network:
Cloud VPN lets you connect your existing network to Google Cloud using IPsec tunnels through a VPN gateway device. Traffic is encrypted as it transmits over public links to the private IP interfaces of Avere vFXT.
The following diagram shows using IPsec tunnels through a VPN gateway device:
Creating the vFXT cluster
The vFXT cluster is created with a Python script (vfxt.py), which is included with the Avere vFXT. You can also download the script from Avere Systems. The script streamlines the creation process by calling the Google Cloud APIs through the Google API Client Library for Python.
You must be connected to Google Cloud to run this script. If you run the script from a cloud instance, make sure the instance's scopes allow the creation of storage and compute resources: add the storage-rw scope when you create the instance.
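To make the API-driven creation process concrete, the helper below sketches the kind of request body that the Compute Engine API's instances().insert() call accepts for one vFXT node. The function name, project, zone, and image path are illustrative placeholders, and this is not Avere's actual code; it only shows the shape of the request a script like vfxt.py assembles. Note the devstorage.read_write entry, which is the OAuth scope behind the storage-rw alias mentioned above.

```python
# Illustrative sketch (NOT vfxt.py itself) of a Compute Engine
# instances.insert request body for one vFXT node. All names below are
# hypothetical placeholders.
def vfxt_node_body(name, project, zone, image, boot_disk_gb=250):
    prefix = f"https://www.googleapis.com/compute/v1/projects/{project}"
    return {
        "name": name,
        "machineType": f"{prefix}/zones/{zone}/machineTypes/n1-highmem-32",
        "disks": [{
            "boot": True,
            "initializeParams": {
                "sourceImage": image,  # the vFXT image Avere shares with you
                "diskType": f"{prefix}/zones/{zone}/diskTypes/pd-ssd",
                "diskSizeGb": boot_disk_gb,
            },
        }],
        "networkInterfaces": [{"network": f"{prefix}/global/networks/default"}],
        "serviceAccounts": [{
            "email": "default",
            # The storage-rw scope, so the node can use Cloud Storage:
            "scopes": ["https://www.googleapis.com/auth/devstorage.read_write"],
        }],
    }

body = vfxt_node_body("vfxt-node-1", "my-project", "us-central1-a",
                      "projects/my-project/global/images/avere-vfxt")
print(body["machineType"].rsplit("/", 1)[-1])   # n1-highmem-32
```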
Accessing the vFXT cluster
After you create the cluster, you use the Avere vFXT Control Panel to configure the cluster's access to your on-premises NAS. To open the Control Panel, go to the cluster's management address in your browser:
- If you are directly connected to Google Cloud through VPN or Dedicated Interconnect, go to the IP address of any node in your vFXT cluster.
- If you are not directly connected, use an SSH tunnel through the NAT gateway instance, as described earlier, to reach the Control Panel.
Log in with the administrative username and the password generated during cluster creation.
If a Cloud Storage bucket was created during vFXT cluster creation, you can access it as additional storage within the same namespace as the local NAS. For more information about connecting to NAS and Cloud Storage, consult the Avere documentation.
Calculating costs
You can calculate a portion of the costs of running Avere vFXT on Google Cloud by using the Pricing Calculator. Consider the following components when you estimate costs:
- Number and machine type of the nodes in the vFXT cluster.
- Size of persistent SSD drive, if any.
- Number of local SSD drives, if any.
- Egress costs, if you are writing between zones or back to on-premises storage. For more information, see Network Pricing documentation.
- VPN costs (if any), which can vary depending on the number of VPN tunnels opened. For more information, see VPN Pricing documentation.
- Avere vFXT licensing and deployment costs. For more information, contact your Avere representative.
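The components above combine as a simple sum. The sketch below shows the arithmetic only; every rate is a hypothetical placeholder, so use the Pricing Calculator and your Avere representative for real numbers.

```python
# Illustrative cost arithmetic only. All rates are hypothetical
# placeholders, NOT actual Google Cloud or Avere prices.
PLACEHOLDER_RATES = {
    "node_hour": 2.00,        # per node per hour (placeholder)
    "pd_ssd_gb_month": 0.17,  # per persistent SSD GB per month (placeholder)
    "egress_gb": 0.08,        # per GB of network egress (placeholder)
}

def monthly_estimate(nodes, hours, pd_ssd_gb, egress_gb, rates=PLACEHOLDER_RATES):
    """Sum the per-component costs for one month of vFXT usage."""
    return (nodes * hours * rates["node_hour"]
            + pd_ssd_gb * rates["pd_ssd_gb_month"]
            + egress_gb * rates["egress_gb"])

# Example: 3-node cluster running 200 hours, 750 GB of persistent SSD,
# 500 GB of egress back to on-premises storage:
print(round(monthly_estimate(3, 200, 750, 500), 2))   # 1367.5
```

Licensing, VPN tunnels, and local SSD costs would be added as further terms in the same way.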
Use cases
Deploying workloads on the cloud with Avere vFXT lets you take advantage of cloud compute resources while pulling data from local storage. A wide variety of industries are embracing cloud compute resources without having to move their data.
Rendering for media and content creation
- Manage rendering workloads as production needs change.
- Optimize costs and boost productivity by deploying only the compute resources necessary for the duration of a project.
- Help maintain high levels of security as data moves between on-premises NAS and cloud resources.
- Avoid time-consuming hardware rentals.
- Free up local resources, allowing artists to be more productive.
For more information about the benefits of cloud rendering, see the Moonbot Studios case study.
Running simulations in financial management
- Use Google Cloud services to perform complex risk analysis and quant simulation.
- Avoid processing limitations by deploying as many cores as necessary for quicker decision making.
- Gain access to tens of thousands of compute cores without moving data from its data center location.
Processing genomics in life sciences
- Access Google Cloud resources in close proximity to large research datasets.
- Support research by eliminating bottlenecks in infrastructure to supply high-performance access to unlimited compute resources.
- Avoid data center expansions while continuing to support mission-critical work.
- Manage processing capacity during peaks in demand.