Optimizing Data Ingestion Using BitSpeed Concurrency

By Doug Davis, CEO and Founder, BitSpeed

This article discusses BitSpeed Concurrency, a product aimed at Google Cloud Platform (GCP) users who need to manage large data flows between on-premises systems and their cloud environments.

High-speed data movement to the cloud requires the optimal use of network protocols and specialized techniques to manage the end-to-end workflow. As dataset sizes grow in the enterprise and more enterprises move some portion of their workloads to the cloud, data movement workflows quickly become a bottleneck in both performance and complexity.

Concurrency addresses these concerns by offering the following benefits:

  • Combines reliable and efficient use of network protocols.
  • Uses the latest research in maximizing protocol efficiency to control and manage the exchange of large-scale media objects.
  • Provides a turn-key package with professional support.
  • Replaces FTP or scp to accelerate and secure large file transfers to GCP.
  • Supports non-disruptive scripting and watch folders that automatically move new data.
  • Supports data movement to and from the cloud.

Concurrency combines the strength and ordered nature of TCP with new techniques to optimize the bandwidth connection using multiple concurrent streams. As a result, Concurrency provides increased utilization of available bandwidth and increased network packet processing, compared to traditional file transfer protocols.
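
To see why several concurrent streams can use more of a link than one stream, consider a general property of TCP: a single stream cannot send more than one window of data per round trip. The following sketch works through that arithmetic. It is not a description of BitSpeed's internals; the function names, the 64 KiB window, and the 80 ms round-trip time are assumptions chosen for illustration, and real stacks with window scaling or packet loss will behave differently.

```python
def single_stream_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput: window size / round-trip time."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1_000_000


def aggregate_ceiling_mbps(streams: int, window_bytes: int, rtt_ms: float,
                           link_mbps: float) -> float:
    """N independent streams raise the ceiling N-fold, capped by the link itself."""
    per_stream = single_stream_ceiling_mbps(window_bytes, rtt_ms)
    return min(link_mbps, streams * per_stream)


if __name__ == "__main__":
    window = 64 * 1024  # assumed window size in bytes (no window scaling)
    rtt = 80.0          # assumed round-trip time in milliseconds
    link = 1000.0       # assumed link capacity in Mbps
    for n in (1, 8, 32, 256):
        print(n, "streams:", round(aggregate_ceiling_mbps(n, window, rtt, link), 1), "Mbps")
```

Under these assumptions a single stream tops out at roughly 6.5 Mbps on a 1000 Mbps link, which is the kind of small fraction that multiple streams are meant to overcome.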

Capabilities

The advanced techniques used by Concurrency enable:

  • Full use of available bandwidth, rather than the small fraction typically used by other techniques.
  • Data movement 5–100 times faster than older file transport protocols such as FTP, rsync, and scp.
  • Data movement up to 500 percent faster than UDP-based solutions.
  • Auditable and guaranteed verifiable delivery.
  • Encrypted data transfer to ensure privacy of the data in transit.

Features

Concurrency provides the following features:

  • Security. Nine different encryption algorithms are available, of which AES-128 and AES-256 are the most commonly used. All key management is done within the BitSpeed process and is not accessible from the outside.
  • Configuration. Web-based configuration and administration.
  • Monitoring. Supports custom reporting to integrate into existing network management solutions and to receive email notifications. You can monitor ongoing operations from a GUI, through CLI reporting, or by using a shell manager.

Architecture

Concurrency is a point-to-point application distributed in two parts: a server instance that is installed in your GCP project, and an appliance that is installed in your data center. Once both are installed, an administrator connects the two instances to enable data movement between them. The cloud server instance is configured to push the received data to Google Cloud Storage.

The Concurrency administrator can either manually initiate data movement from the on-premises appliance to the cloud instance, or configure a set of folders that are monitored by the appliance. If any new files or file updates are discovered in the watch folders, they are automatically copied.
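
The watch-folder behavior described above can be pictured as a periodic scan that compares the current folder contents with the previous scan. The following is a minimal sketch of that idea, not BitSpeed's implementation; the function names and the use of file size and modification time to detect updates are assumptions.

```python
import os


def snapshot(folder: str) -> dict[str, tuple[int, int]]:
    """Map each file path under `folder` to (size, mtime_ns) for later comparison."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            info = os.stat(path)
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(previous: dict, current: dict) -> list[str]:
    """Return paths that are new, or whose size or mtime differs from the previous scan."""
    return sorted(path for path, sig in current.items() if previous.get(path) != sig)
```

A scheduler would call `snapshot` at a fixed interval and hand the result of `changed_files` to whatever performs the copy. A production watcher would also need to handle files that are still being written and files that disappear between the directory listing and the `os.stat` call.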


Figure 1. BitSpeed Concurrency's administration user interface to configure the endpoints and create watched folders.

Components

Concurrency uses several components to enhance speed and security.

Speed

  • Parallel TCP. Building on TCP provides several advantages over acceleration approaches based on other protocols, such as UDP.

    • TCP is ordered, so data is written in the same order in which it was sent. UDP does not guarantee ordering, so packets must be reordered at the destination before applications can use the files.
    • TCP can be used in LANs as well, unlike UDP, which floods the network with randomized packets.
    • Parallelizing TCP streams allows faster transfer of more packets over multiple parallel streams within the BitSpeed process. This helps file transfers to complete anywhere from 3 to 100 times faster.
    • Parallel streams mitigate adverse line conditions, including latency, packet loss, and jitter, which enables faster, more consistent connections.
  • Advanced File Replication (AFR). With AFR, you can begin transferring files as they are being created at the source. Completed files arrive at the destinations within seconds.

  • Compression. Concurrency allows the use of compression on top of its native acceleration architecture. If data files are compressible, transfers will be even faster, by a factor of the compression ratio of the data (for example, 2–4 times faster).
  • Multicasting and chain multicasting. Concurrency enables you to transfer source files to multiple destinations at the same time.
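
The idea behind parallel streams can be sketched by splitting a file into byte ranges and moving each range with its own worker, so that every range lands at its original offset. In the sketch below, local files stand in for network streams, and all names (`split_ranges`, `copy_range`, `parallel_copy`) are hypothetical; it illustrates the general technique only and says nothing about how Concurrency divides or schedules data.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, parts: int) -> list[tuple[int, int]]:
    """Divide [0, size) into up to `parts` contiguous (offset, length) ranges."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    base, extra = divmod(size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int) -> int:
    """Copy one byte range; each worker opens its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))  # reads the whole range into memory; fine for a sketch
    return length


def parallel_copy(src: str, dst: str, streams: int = 4) -> int:
    """Copy src to dst with one worker per byte range; return the bytes copied."""
    size = os.path.getsize(src)
    with open(dst, "wb") as fout:
        fout.truncate(size)  # pre-size the destination so workers can seek into it
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n)
                   for off, n in split_ranges(size, streams)]
        return sum(f.result() for f in futures)
```

Because each range is written at a fixed offset, the destination file is complete and in order once every worker finishes, regardless of which worker finishes first.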

Security

  • Encryption. Concurrency uses encryption with nine different user-selectable algorithms (for example, AES-128 and AES-256). This encryption makes transfers more secure, even if the original data is already encrypted.

  • Guaranteed delivery using checksums. Concurrency adds an auditable checksum capability so that the arrival of every packet is confirmed and guaranteed.
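
Checksum-based verification can be illustrated at the file level: compute a digest at the source, compute it again at the destination, and compare the two. The sketch below assumes SHA-256 and whole-file digests. The source does not say which checksum algorithm Concurrency uses or at what granularity it checks, so treat both choices, along with the function names, as illustrative.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(source_path: str, received_path: str) -> bool:
    """True if the received file's digest matches the source file's digest."""
    return file_digest(source_path) == file_digest(received_path)
```

Logging the digest alongside the file name and a timestamp is one simple way to produce an audit trail for each transfer.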

Value

Although file transfer protocols such as FTP, rsync, and scp are free, they are decades old and not optimized for today's bigger network connections and larger files. About 10 years ago, faster UDP-based protocols emerged as the next generation of file transfer solutions. However, UDP has inherent architectural weaknesses, such as randomized packets and transfer rate limits on faster lines. Concurrency's performance is, in general, better than that of UDP-based solutions, and Concurrency comes at 30–50% of the cost of the leading UDP-based solutions.

With Concurrency, you can speed up workflows or backups of repositories. You can send or upload large files faster, or even as they are being created, instead of gathering them up for later transmission. These capabilities make your organization more efficient, save time and money, and allow more effective use of costly bandwidth.

Costs

BitSpeed instances, including physical appliances, are charged on a monthly, per-instance basis. There is no limitation to the amount of data that can be transferred using Concurrency. Pricing varies by bandwidth connection, with three bandwidth pricing levels: 0–500 Mbps, 500–1000 Mbps, and above 1000 Mbps.

Performance

With BitSpeed Concurrency, you can expect an increase in transfer rates. Here are some factors that will help you determine how much performance increase to expect:

  • Connection. The greater the bandwidth between the user site and GCP, the greater the "acceleration factor" Concurrency can provide. Under 100 Mbps, you can reasonably expect 2–4 times faster transfer. At 1000 Mbps (GbE), you can expect anywhere between 2 and 10 times faster.
  • Line conditions. The worse the line conditions (latency, packet loss, jitter), the greater the relative improvement Concurrency provides, because it mitigates these conditions.
  • Storage I/O. Concurrency is typically limited only by the I/O of the storage. If a disk array or cloud storage is configured for lower read-write I/O speeds than the bandwidth connection, the storage will be the gating factor. BitSpeed can offer services to help tune performance.
  • Security. For greater security, Concurrency allows you to encrypt and decrypt data files during file transfer. The underlying BitSpeed architecture runs just as fast using encryption as without encryption.
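
The interaction between these factors can be approximated with a simple model: the link carries compressed bytes, storage handles uncompressed bytes, and the slower of the two sets the pace. The function below is a rough estimate under those assumptions, not a BitSpeed formula. It assumes the link is fully used and ignores protocol overhead, CPU cost, and varying line conditions; the name `transfer_seconds` and its parameters are hypothetical.

```python
def transfer_seconds(size_gb: float, link_mbps: float, storage_mbps: float,
                     compression_ratio: float = 1.0) -> float:
    """Estimate transfer time in seconds for `size_gb` of uncompressed data.

    The link moves compressed data, so in terms of original data its rate is
    link_mbps * compression_ratio. Storage reads and writes uncompressed data,
    so storage_mbps caps the overall rate.
    """
    effective_mbps = min(link_mbps * compression_ratio, storage_mbps)
    size_megabits = size_gb * 8_000  # decimal units: 1 GB = 8,000 megabits
    return size_megabits / effective_mbps
```

For example, with a 1000 Mbps link and storage that sustains only 500 Mbps, the estimate is bounded by the storage rate, which matches the gating-factor behavior described in the list above.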

With BitSpeed, GCP users will typically experience between 40 and 120 milliseconds of latency. The following chart shows an independent lab test comparing the transfer of the same file set using different file transfer solutions:

Comparison of transfer speeds

Use cases

BitSpeed's solution addresses batch or bulk data needs. Bulk data consists of large datasets where ingestion requires high aggregate bandwidth between a small number of sources and the target. The data could be stored in files, such as CSV, JSON, Avro, or Parquet files, or in a relational or NoSQL database. The source data could be located on-premises or on other cloud platforms. Consider the following examples:

  • Scientific workloads. Uploading genetics data stored in Variant Call Format (VCF) text files to Cloud Storage for later import into Google Genomics.
  • Migrating to the cloud. Moving data in an on-premises database to a fully managed Cloud SQL database.
  • Backing up data. Replicating data stored in an Amazon S3 bucket to Cloud Storage using Cloud Storage Transfer Service.
  • Importing legacy data. Copying ten years' worth of website log data into BigQuery for long-term trend analysis.

Here are examples of actual use cases of BitSpeed and how much faster file transfers ran:

  • ISO images – 4.5 GB files from Japan to Arizona:  17x faster
  • Video – 3D 4K film content from China to Los Angeles:  20x faster
  • Microscopy – over 10 Gbit connection:  22x faster
  • Genomics – 1 TB of daily genomics sequences:  32x faster
  • Video – TV content from New York City to Los Angeles, normally 10 Mbps:  86x faster

What's next

If you'd like to run a simple comparison test uploading files from your own facility, you are welcome to connect to the BitSpeed Test Center in El Segundo, CA. Contact info@bitspeed.com, and we will set up a browser-based demo for you. You can see an example test in this brief video.

If you have any issues regarding BitSpeed Concurrency for GCP, contact supportme@bitspeed.com.

For use cases and success stories spanning different implementations, visit the Contact Us tab at the BitSpeed website.

Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
