On-premises storage integration with Avere vFXT

By Ron Hogue, Training and Development Manager, Avere Systems

This article discusses how to integrate your on-premises network-attached storage (NAS) with the Avere vFXT Edge filer (vFXT), a compute and storage resource that runs on Compute Engine.

Avere Systems, a Google Cloud Platform (GCP) Technology Partner, offers high-performance storage solutions for hybrid cloud infrastructures that you can use to help move compute-intensive workloads with large datasets to GCP.

vFXT is a virtual clustered storage solution that accelerates access to on-premises storage by caching data in the cloud. vFXT helps you:

  • Reduce latency by positioning cached datasets in close proximity to compute environments.
  • Optimize the volume of data transferred between on-premises and GCP networks.
  • Avoid expanding private data centers while supporting the same workloads.
  • Liberate on-premises resources by shifting compute-intensive workloads to the cloud.

Avere vFXT capabilities: low-latency, high-bandwidth connections

Use vFXT to create enterprise-grade network-attached storage on the cloud, bridging the gap between your on-premises storage and Cloud Storage.

Storage speed is often a bottleneck for high-performance computing (HPC) workloads. vFXT lets you take advantage of Compute Engine resources by accelerating access to your on-premises storage.

With vFXT, you can:

  • Leverage low-cost Cloud Storage for applications running on Compute Engine servers.
  • Connect directly to Compute Engine instances in your Virtual Private Cloud (VPC) network, resulting in low latency and high bandwidth between instances and Cloud Storage.
  • Enable your Compute Engine instances to access data for high-performance workloads faster than reading from your on-premises NAS.

Features: protocols, caching, cost reduction, scalability, security

The Avere vFXT provides the following features.

  • Familiar protocols: Run applications in the cloud using familiar storage protocols, such as NFS, to access on-premises data. Apps use their native protocols rather than Cloud Storage APIs, and cloud instances communicate with vFXT as if they were communicating with on-premises storage.
  • Read-through caching: Connect to your on-premises NAS and cache only the data requested by cloud compute resources. Instances access the data on vFXT as if it were local. Avere's global namespace (GNS) functionality provides simple NAS access through a single mount point (see the example mount command after this list).
  • Cost reduction: Only the working dataset is transferred to the cloud rather than the entire data pool, reducing storage costs and transfer times.
  • Scalability: Choose a cluster size of 3 to 24 nodes.
  • Security: vFXT data is encrypted using AES-256 encryption.
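For example, a Compute Engine client can mount the vFXT cluster the same way it mounts any NFS filer. The following is a minimal sketch; the vserver IP address (10.0.0.10) and export path (/gns) are placeholders, and the Avere documentation lists the recommended mount options for your release.

    # Mount a vFXT vserver export over NFS (address and path are placeholders).
    sudo mkdir -p /mnt/vfxt
    sudo mount -t nfs -o hard,proto=tcp,mountproto=tcp,retry=30 \
        10.0.0.10:/gns /mnt/vfxt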

Avere vFXT on GCP architecture

The Avere vFXT cluster sits between your on-premises NAS and cloud resources. vFXT transfers data on an as-needed basis from your on-premises NAS to its attached storage, making subsequent read requests for the same data faster, without synchronizing your entire dataset. If the working set is larger than what the current cluster can cache, you can add nodes to your vFXT cluster to increase vFXT storage capacity, letting it hold the entire working dataset.

The following architecture diagram shows the vFXT cluster sitting between your on-premises NAS and cloud resources:

[Diagram: vFXT cluster between on-premises NAS and cloud resources]

GCP components

The vFXT solution uses the following GCP components.

Compute Engine

vFXT nodes run as Compute Engine instances, using either the n1-highmem-8 or n1-highmem-32 machine type.

Persistent and local SSDs

On creation, vFXT creates a persistent SSD as its boot disk. Data disks can be either persistent SSDs or local SSDs. For burst-type workloads, you can typically use a local SSD as your data disk.

Cloud Storage

Buckets

You can choose to create a Cloud Storage bucket during the vFXT cluster creation process. This provides a low-cost option for longer-term storage of data, as well as faster retrieval of data no longer in the cache.
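If you prefer to create the bucket ahead of time, a gsutil command such as the following works; the bucket name is a placeholder, and the storage class and location should match your deployment.

    # Create a regional bucket in the same region as the vFXT cluster.
    gsutil mb -c regional -l us-central1 gs://example-vfxt-bucket/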

Disk type and size

The type of disk you choose determines the available disk sizes. Local SSDs have a fixed size of 375 GB per disk, and you can attach between 1 and 8 local SSDs to a single node.

vFXT allows persistent disk sizes from 250 GB to 8000 GB (8 TB). For testing, you might choose the minimum of 250 GB per node, which gives a three-node vFXT cluster 750 GB of cache. If your working set requires 5 TB of cache, you might instead choose 2 TB per node, for 6 TB of total cache in a three-node cluster.

Cluster size

You can create a vFXT cluster with 3 to 24 nodes. After you create the initial nodes, you can add more, up to the 24-node maximum.

Some example vFXT configurations are:

Node count | Disk type      | Cache per node       | Aggregate cluster cache
3          | Local SSD      | 1125 GB (3 x 375 GB) | 3.375 TB
12         | Local SSD      | 3000 GB (8 x 375 GB) | 36 TB
24         | Persistent SSD | 8000 GB              | 192 TB

Preparing to create the vFXT cluster

This section outlines the process for deploying the vFXT cluster.

Specifying a project ID

You must provide your GCP project ID to Avere. Avere shares the vFXT custom image with that project. You can find your project ID and number in the Google Cloud Platform Console Dashboard. Send this project ID to your Avere sales representative or to the general sales email at sales@averesystems.com.
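If you have the Cloud SDK installed, you can also look up the project ID from the command line:

    # Print the project ID for the active gcloud configuration.
    gcloud config get-value project

    # Or list all projects you can access, with their IDs and numbers.
    gcloud projects list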

Choosing a storage type

During vFXT creation, you must choose which storage type you want to use in the vFXT cluster. You can choose between persistent SSDs or local SSDs for the data drives.

Persistent SSD

Persistent SSDs let you stop and restart instances in the vFXT cluster. Persistent SSD is the recommended storage type because data on those drives persists after you restart the cluster.

Local SSD

Local SSD data doesn't persist after you restart the vFXT cluster. Clusters that use local SSDs for storage can't be stopped or restarted, only terminated, and terminating a cluster destroys all stored data and configuration. Local SSDs are less expensive and faster than persistent SSDs, which makes them ideal for burst workloads where the stored data is ephemeral.

Calculating resource usage

Depending on how you configure your Avere vFXT cluster, you might have to increase your resource quota. Verify that your project has sufficient quota for the number of CPU cores and the type of storage you chose. If your project doesn't have enough resource quota to create the Avere vFXT cluster, request a quota increase.

The cluster size, disk size, and disk type you choose determine the required resource quota.

Resource type  | Quota calculation                   | Minimum quota required*
vCPUs          | [Number of nodes] x [8 or 32]       | 24 vCPUs
Local SSD      | [Number of nodes] x [SSD amount]    | 1125 GB
Persistent SSD | [Number of nodes] x [PD SSD amount] | 750 GB

* 3-node vFXT cluster.
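You can inspect your current quotas from the command line. For example, the following command prints the quota metrics for a region, including CPUs and SSD storage; us-central1 is a placeholder for your deployment region.

    # Show quota limits and current usage for a region.
    gcloud compute regions describe us-central1 --format="yaml(quotas)"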

Enabling storage connectivity

vFXT must have connectivity to GCP APIs, such as the Compute Engine API, as well as external endpoints, such as your on-premises NAS and the Avere API. Instances connected to vFXT are protected from external access by firewall policies you configure.

There are a number of ways to allow vFXT to communicate with your on-premises NAS. Choose the connectivity method that best fits your situation.

NAT gateway

This is the most commonly used method for vFXT to communicate with external resources.

A NAT gateway is an instance that can forward traffic on behalf of any instance on your VPC network. NAT gateways allow your vFXT cluster to communicate with your on-premises NAS, as well as external APIs such as Avere's software portal.
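A minimal sketch of this setup, following the standard Compute Engine NAT gateway pattern, appears below; the instance name, zone, route name, tag, and priority are illustrative values.

    # Create the NAT gateway instance; --can-ip-forward lets it forward
    # traffic on behalf of other instances.
    gcloud compute instances create nat-gateway \
        --zone=us-central1-a \
        --can-ip-forward

    # Route internet-bound traffic from tagged instances through the gateway.
    # You still need to enable IP forwarding in the gateway's OS and
    # configure iptables masquerading on the gateway itself.
    gcloud compute routes create no-ip-internet-route \
        --network=default \
        --destination-range=0.0.0.0/0 \
        --next-hop-instance=nat-gateway \
        --next-hop-instance-zone=us-central1-a \
        --tags=no-ip \
        --priority=800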

The following diagram shows how to use a NAT gateway to communicate with external resources:

[Diagram: using a NAT gateway to communicate with external resources]

You can also use a NAT gateway instance to provide an SSH tunnel for vFXT Control Panel access.

If you choose to configure a NAT gateway, you must add firewall rules to allow communication between your on-premises NAS, the NAT gateway, and your vFXT cluster.
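For example, assuming an on-premises network of 192.168.0.0/16 and a NAS serving NFS, a rule along these lines admits the NFS traffic; the rule name, network, ports, and CIDR range are placeholders for your environment.

    # Allow NFS traffic (portmapper and NFS ports shown) from the
    # on-premises network into the VPC network.
    gcloud compute firewall-rules create allow-onprem-nfs \
        --network=default \
        --allow=tcp:111,udp:111,tcp:2049,udp:2049 \
        --source-ranges=192.168.0.0/16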

You can improve access to on-premises NAS by using Cloud Interconnect options instead of a NAT gateway.

Private Google access

By default, your vFXT cluster is configured without external access. You can configure Private Google access so that the vFXT cluster can reach APIs, such as the Cloud Storage API, by using only internal IP addresses. See Configuring private Google access for more information.
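Private Google access is enabled per subnet. With the Cloud SDK, for example (the subnet name and region are placeholders):

    # Let instances with only internal IPs reach Google APIs such as
    # the Cloud Storage API.
    gcloud compute networks subnets update default \
        --region=us-central1 \
        --enable-private-ip-google-access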

Cloud Interconnect

Two connectivity methods allow direct access to RFC 1918 IP addresses in your VPC network: Dedicated Interconnect and IPsec VPN. See the Cloud Interconnect documentation for detailed information about these methods.

Dedicated Interconnect

Dedicated Interconnect provides direct physical connections and RFC 1918 addressability between your on-premises network and Google's network. Dedicated Interconnect enables you to transfer large amounts of data between networks, which can provide higher bandwidth and lower cost than purchasing additional bandwidth over the public internet or using VPN tunnels.

The following diagram shows how Dedicated Interconnect provides connections and addressability between your on-premises network and Google's network:

[Diagram: Dedicated Interconnect]

IPsec VPN

Cloud VPN lets you connect your existing network to GCP by using IPsec tunnels through a VPN gateway device. Traffic is encrypted as it travels over public links to the private IP interfaces of the Avere vFXT cluster.

The following diagram shows using IPsec tunnels through a VPN gateway device:

[Diagram: IPsec tunnels through a VPN gateway device]
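As a condensed sketch, creating the GCP side of an IPsec tunnel looks like the following. The gateway also needs a static external IP address and forwarding rules for ESP, UDP 500, and UDP 4500, which are omitted here, and all names and addresses are placeholders.

    # Create the Cloud VPN gateway.
    gcloud compute target-vpn-gateways create vfxt-vpn-gw \
        --network=default \
        --region=us-central1

    # Create an IKEv2 tunnel to the on-premises peer gateway.
    # A route for the on-premises CIDR via the tunnel is also required.
    gcloud compute vpn-tunnels create vfxt-tunnel-1 \
        --region=us-central1 \
        --target-vpn-gateway=vfxt-vpn-gw \
        --peer-address=203.0.113.10 \
        --shared-secret=EXAMPLE_SECRET \
        --ike-version=2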

Creating the vFXT cluster

You create the vFXT cluster with a Python script (vfxt.py), which is included in the Avere vFXT Python library. You can also download the script from the Avere Systems GitHub page. The script streamlines the creation process by using GCP APIs and the Google API Client Library for Python.

You must be connected to GCP to run this script.

If you run the script from a cloud instance, make sure that the instance's scopes allow it to create storage and compute resources. Add the compute-rw and storage-rw scopes when you create the instance.
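For example, you might create a small controller instance with those scopes and run the script from it. The vfxt.py flags below are illustrative assumptions; confirm them against vfxt.py --help for your release, and replace the names, zone, and password with your own values.

    # Controller instance whose scopes permit creating compute and
    # storage resources.
    gcloud compute instances create vfxt-controller \
        --zone=us-central1-a \
        --scopes=compute-rw,storage-rw

    # From the controller, create the cluster (illustrative flags;
    # check vfxt.py --help for your release).
    vfxt.py --cloud-type gce --create \
        --cluster-name avere-cluster \
        --admin-password 'ADMIN_PASSWORD' \
        --instance-type n1-highmem-8 \
        --node-cache-size 1000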

Accessing the vFXT cluster

After you create the cluster, use the Avere vFXT Control Panel to configure the cluster's access to your on-premises NAS.

  1. In your browser, go to https://127.0.0.1:8443/ through an SSH tunnel (see the example after these steps). If you are directly connected to GCP through VPN or Dedicated Interconnect, go to the IP address of any node in your vFXT cluster instead.
  2. For the username, enter admin.
  3. Enter the password generated by the vfxt.py script.
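The local address in step 1 assumes an SSH tunnel, for example through the NAT gateway; the node IP address, user, and gateway address below are placeholders.

    # Forward local port 8443 to the HTTPS interface of a vFXT node.
    ssh -L 8443:10.0.0.10:443 user@NAT_GATEWAY_EXTERNAL_IP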

If a Cloud Storage bucket was created during vFXT cluster creation, you can access it as additional storage within the same namespace as the local NAS. For more information about connecting to NAS and Cloud Storage, consult the Avere documentation.

Costs

You can calculate a portion of the costs of running an Avere vFXT on GCP using the Pricing Calculator. Consider the following components when you estimate costs:

  • Number and type of nodes in the vFXT cluster (either n1-highmem-8 or n1-highmem-32).
  • Size of persistent SSD drive, if any.
  • Number of local SSD drives, if any.
  • Egress costs, if you are writing between zones or back to on-premises storage. For more information, see Network Pricing documentation.
  • VPN costs (if any), which can vary depending on the number of VPN tunnels opened. For more information, see VPN Pricing documentation.
  • Avere vFXT licensing and deployment costs. For more information, contact your Avere representative.

Use cases

Deploying workloads on the cloud with Avere vFXT lets you take advantage of cloud compute resources with data pulled on demand from local storage. A wide variety of industries are embracing cloud compute resources without having to move their data.

Rendering for media and content creation

  • Manage rendering workloads as production needs change.
  • Optimize costs and boost productivity by deploying only the compute resources necessary for the duration of a project.
  • Help maintain high levels of security as data moves between on-premises NAS and cloud resources.
  • Avoid time-consuming hardware rentals.
  • Free up local resources, allowing artists to be more productive.

For more information about the benefits of cloud rendering, see the Moonbot Studios case study.

Running simulations in financial management

  • Use GCP services to perform complex risk analysis and quant simulation.
  • Avoid processing limitations by deploying as many cores as necessary for quicker decision making.
  • Gain access to tens of thousands of compute cores without moving data from its data center location.

Processing genomics in life sciences

  • Access GCP resources in close proximity to large research datasets.
  • Support research by eliminating bottlenecks in infrastructure to supply high-performance access to unlimited compute resources.
  • Avoid data center expansions while continuing to support mission-critical work.
  • Manage processing capacity during peaks in demand.

What's next
