Using Komprise to Archive Cold Data to Cloud Storage

By Ranjana Bhadoria, Product Manager, Komprise

This article discusses using Komprise to analyze data across on-premises storage to identify cold data and then move it to Google Cloud Storage.

Typically, 60% to 80% of data is infrequently accessed within months of creation, yet consumes the same expensive resources as hot data.

Komprise is analytics-driven data management software that analyzes data usage and growth across on-premises storage. Komprise identifies cold data and then moves it transparently to the appropriate class of Cloud Storage based on customer-defined policies.

To support both existing on-premises use cases and new cloud-native ones, the moved data remains accessible as files, exactly as before, and is also accessible as files or objects in the cloud.

As data footprints expand rapidly, Komprise is working with customers across industries such as financial services, healthcare, and engineering who are streamlining costs, building a path to the cloud, and increasing the resiliency of their data in use cases like the following:

  • Active archiving: Based on Komprise research, 60% to 80% of data is infrequently accessed and cold within months of creation. Komprise identifies this cold data and moves it continuously to the cloud. Customers typically save over 70% of their storage costs through active archiving.
  • Replication in the cloud: Many businesses want to keep a copy of their data in the cloud for redundancy and disaster recovery purposes. The Komprise copy policy continuously copies data to the cloud to provide a replication site.
  • Disaster recovery: In a disaster recovery situation, this archived data needs to be accessed as quickly as possible. Komprise enables you to access the data directly from Google Compute Engine.

The Komprise software deploys in under 15 minutes, works across NFS, SMB/CIFS, and object storage without any storage agents, adapts to file-system and network loads to run non-intrusively in the background, and scales out on-demand.

You can work in the Komprise Console, as shown below, to analyze data usage and growth, plan policies to move and manage data, and automate the ongoing management across storage:

User interface of Komprise Console with Analysis tab selected.

Analyzing data for use, size, users, and growth

Komprise runs as two components:

  • One or more Observer virtual machines that run on the customer’s premises.
  • The Komprise Director console running in the cloud.

The Observer virtual machines connect to the Komprise Director. Komprise can analyze and move data from any storage that supports NFS or SMB/CIFS mounts – including NetApp, EMC Isilon, and Windows File Servers.

Komprise profiles the data across storage and provides analytics to answer the following questions:

  • What are the types of files?
  • What is file size distribution?
  • Who is accessing which files?
  • How fast is file storage growing?

Customers get a good view into what data can be moved and when:

Plan Analysis view with usage highlighted
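As an illustration of the kind of analysis described above, the following sketch walks a directory tree and buckets files by time since last access, the basic signal used to identify cold data. It is a simplified stand-in, not Komprise code: Komprise performs this analysis agentlessly and at scale across NFS and SMB/CIFS shares.

```python
import os
import time
from collections import Counter

# Age buckets for classifying data by time since last access.
AGE_BUCKETS = [(30, "under 30 days"), (180, "30-180 days"), (365, "180-365 days")]

def cold_data_profile(root):
    """Return bytes of data per last-access age bucket under `root`."""
    sizes = Counter()
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable files
            age_days = (now - st.st_atime) / 86400
            label = "over 1 year"
            for limit, bucket in AGE_BUCKETS:
                if age_days < limit:
                    label = bucket
                    break
            sizes[label] += st.st_size
    return sizes

if __name__ == "__main__":
    for bucket, total in cold_data_profile(".").items():
        print(f"{bucket}: {total / 1e9:.2f} GB")
```

Note that this relies on access times (`st_atime`), which some file systems mount with `noatime`; a production analysis also has to account for that, which is one reason a purpose-built tool is used here.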

Planning data-management objectives and estimating costs

Along with the analytics, Komprise provides interactive ROI projections based on different data management objectives that customers can set. As customers set policies on when data should be moved and copied, and to where, Komprise instantly projects the estimated capacity that will be freed up and the projected cost savings.

Customers can pick different Cloud Storage classes, including Coldline Storage, as targets for the data. Customers can modify the cost model that Komprise uses to reflect their own costs, yielding a customized ROI. Komprise also projects the three-year savings based on the historical growth rate of their data.
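The savings projection described above comes down to simple arithmetic. The per-GB monthly prices in this sketch are hypothetical placeholders, not published rates; as noted, Komprise lets customers substitute their own costs.

```python
# Hypothetical placeholder prices -- substitute your own figures.
ON_PREM_COST_PER_GB_MONTH = 0.10    # assumed fully loaded on-prem cost
COLDLINE_COST_PER_GB_MONTH = 0.007  # assumed archive-class cloud cost

def projected_savings(total_gb, cold_fraction, months=36, growth_rate=0.0):
    """Estimate savings from archiving the cold fraction of data over `months`.

    growth_rate is the assumed monthly growth of the total data footprint.
    """
    savings = 0.0
    data = total_gb
    for _ in range(months):
        cold_gb = data * cold_fraction
        savings += cold_gb * (ON_PREM_COST_PER_GB_MONTH - COLDLINE_COST_PER_GB_MONTH)
        data *= 1 + growth_rate
    return savings

# Example: 500 TB footprint, 70% cold, flat growth, 3-year horizon.
print(f"${projected_savings(500_000, 0.7):,.0f}")
```

Raising `growth_rate` above zero shows why the projected savings compound: the cold fraction of a growing footprint grows with it.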

You do this work in the Komprise Plan Editor, as shown below:

Plan Editor showing 3 groups in a plan.

Moving and managing data outside the hot data path

When ready, customers simply activate the plan and Komprise moves the data to Cloud Storage based on customer-defined objectives. The moved data still appears to exist on the source storage system as before. When a user or application accesses the data on the source system, Komprise transparently returns the data from the cloud. As illustrated in the following diagram, data is stored as objects in Cloud Storage. Komprise delivers file-based access to all the moved data so existing users and applications continue to access the data as before.

Architectural diagram of the hot data path.

The data is accessible both as files and as objects in Cloud Storage, so applications can run against the data natively in the cloud. Unlike legacy data management solutions that sit inline in the path of hot data, Komprise stays outside it. And unlike storage-array tiering solutions, which move cold blocks to the cloud but require that data be accessed only through the array, limiting customers from leveraging the true power of the cloud, Komprise moves files as objects and allows object-level access to the moved data directly from the cloud, as well as file-level access on premises.

Deploying Komprise

The high-level steps to deploy Komprise are as follows:

  1. Download the Observer, install it on a virtual server, set up the network configuration, and use the virtual appliance console to connect and authenticate with the Komprise Cloud Director.
  2. Configure the Observer to discover and analyze the shares on your file servers. After an Observer is authenticated with the Director, its web-based UI is used to discover and enable shares for analysis.

Use Komprise Director Console to connect with file servers and shares:

Komprise Director Console with the Sources tab selected.

Komprise hybrid cloud architecture

Komprise runs as a hybrid cloud service, with a grid of one or more Komprise Observer virtual appliances deployed on premises to analyze and move data, and a Director virtual machine that runs in the cloud and provides the management console.

Architectural diagram of Komprise as a hybrid cloud architecture.

Komprise does not require any dedicated hardware and runs as a scale-out grid of virtual machines. A typical challenge with traditional storage analytics software is that it may disrupt the performance of the storage. Komprise overcomes this issue by adaptively throttling back when the storage systems are in active use, so that Komprise runs non-disruptively in the background.

Another challenge with legacy data management solutions is their use of static stubs (pointers to the moved file), as these stubs can be deleted or corrupted, leaving the moved files orphaned. Komprise overcomes this challenge by not using static stubs, instead using resilient dynamic links that can be repopulated if they are accidentally deleted without losing access to the moved data.

Komprise is also invisible to the hot data path and does not sit inline, so the performance of active data is unchanged and may even improve as the load on primary storage decreases.

To scale effectively without centralized bottlenecks, Komprise avoids a traditional SQL database, which would limit scalability. Instead, the Komprise Observers analyze and aggregate metadata, and the Komprise Director presents an accumulation of these aggregates.


The following conditions must be met in order for you to deploy the Komprise Observer virtual appliance:

  • The Komprise Observer virtual appliance runs on the following hypervisors:

    • VMware ESXi 5.5 or higher (recommended)
    • KVM 1.2.0
  • A minimum configuration of the virtual appliance requires:

    • 4 to 8 CPUs
    • 8 GB to 16 GB of RAM
    • 100 GB of disk space (thin-provisioned)

A standard Observer can support approximately 250 to 300 shares. When you are ready to move to production, a Komprise sizing guide will help you configure the appropriate number of Observers for your environment.
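As a rough rule of thumb, the Observer count can be estimated from the share count. This sketch uses the conservative 250-share figure from the guidance above and is no substitute for the Komprise sizing guide.

```python
import math

# Conservative capacity assumption: one standard Observer per ~250 shares.
SHARES_PER_OBSERVER = 250

def observers_needed(share_count):
    """Rough estimate of Observers for a given number of shares (at least one)."""
    return max(1, math.ceil(share_count / SHARES_PER_OBSERVER))

print(observers_needed(900))  # e.g. 900 shares -> 4 Observers
```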

Information required for network configuration and internet access

  • Static IP for the Observer (recommended)
  • DNS server IP
  • Gateway IP
  • Proxy server IP (optional)

Browsers supported

Chrome or Firefox is required for accessing the console.

Credentials to discover and access shares

  • Administrator/root access to the management console of the file server to discover shares
  • Backup Operator privileges to access all files

Outbound internet access over ports 80 and 443

The Observer must access the Google Cloud Platform Console over TCP ports 80 and 443. Only aggregate metadata is sent to the Director to summarize the analysis results. Actual data is never sent to the Director; it stays on the customer's source and target storage.

Free assessment

For customers who want to reduce storage costs without compromising access, Komprise identifies infrequently accessed data and transparently moves it to the appropriate class of Cloud Storage for the best price/performance. Customers interested in understanding the ROI of using Cloud Storage and Komprise can sign up for an assessment at no charge.

What's next

Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.