By Eran Tamir, VP Product, NooBaa
This article discusses a solution for building a single data fabric with NooBaa's data platform and Google Cloud Platform (GCP).
NooBaa's data platform is a petabyte-scale solution for unstructured data. Install NooBaa on-premises or on GCP. Use storage nodes on any hardware, including existing or new dedicated hardware on-premises, and native cloud storage on GCP. Access workloads on-premises and in the cloud with a hybrid topology of NooBaa and GCP. Advanced erasure coding and a hardware-agnostic design provide flexibility and scalability.
Use NooBaa with native cloud storage.
- Data collaboration for cloud boost: Handle peak loads, collaborate on data, or extend existing workloads.
- Data tiering for local applications: Frequently accessed data is seamlessly served from the local storage nodes while colder data is served from GCP.
- Data tiering for archive: Migrate data from local storage nodes to any storage class on GCP with customized data flows.
- Cloud spillover: Automatically spill data over from local storage to GCP.
- Data distribution: Use local storage nodes for the day-to-day workloads. When data is ready for distribution, copy it automatically to GCP for storage.
- Multi-cloud mirroring: Mirror data between any storage resource including data centers, private cloud storage, AWS, Azure, and GCP.
Solutions built with NooBaa:
- Active archiving for video surveillance and media asset management
- De-identify data for healthcare and research
- Offer storage as a service to back up and archive data
For more information about these use cases and others, see NooBaa's main use cases.
Active archiving lets you keep the data available for your applications, while the data can be anywhere.
This use case is relevant mainly for workloads that produce a lot of data on-premises, usually on fast and expensive storage, while a large portion of that data might never be used.
Data is stored on NooBaa, while a lifecycle policy or a custom data flow function moves the local copies to GCP. The data is available regardless of its actual location, and NooBaa serves it with a locality preference.
Video surveillance is a classic example of data that isn't accessed regularly, but when the need arises, the data is required in real time. In such a use case, NooBaa is the primary storage for the video catalog. The deduplication algorithms keep the data storage efficient, especially during motionless periods.
Maintain multiple data resiliency policies in a single repository. Every class of data can have a different resiliency, such as a different level of erasure coding. Active archiving moves old videos to the cloud transparently, so any application that tries to read the data still gets it. Extract metadata on the fly to feed any database with important characteristics extracted from a video. When a certain video is required, no restore process is needed, because NooBaa serves the data transparently, regardless of its location.
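To make the trade-off between resiliency levels concrete, the following minimal sketch compares the raw capacity that two erasure coding settings would consume. It assumes a Reed-Solomon style code in which any fragment can be rebuilt as long as no more fragments are lost than there are parity fragments. The class name, the policy names, and the fragment counts are illustrative and are not NooBaa settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureCoding:
    """Hypothetical resiliency setting: data fragments plus parity fragments."""
    data_fragments: int
    parity_fragments: int

    def raw_bytes(self, logical_bytes: int) -> float:
        # Each object is split into data fragments and parity is added,
        # so raw usage grows by (data + parity) / data.
        total = self.data_fragments + self.parity_fragments
        return logical_bytes * total / self.data_fragments

    def tolerated_failures(self) -> int:
        # Assumption: up to `parity_fragments` fragments can be lost safely.
        return self.parity_fragments


# Illustrative classes of data; the names and numbers are made up.
policies = {
    "recent-video": ErasureCoding(data_fragments=4, parity_fragments=2),
    "old-video": ErasureCoding(data_fragments=8, parity_fragments=2),
}
```

In this sketch both classes survive the loss of two fragments, but the wider layout for older video uses less raw capacity per stored byte.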
Media and entertainment
Connect NooBaa to any Media Asset Management (MAM) system that works with the Amazon S3 API and serve the media catalog seamlessly. The data can be stored on-premises, easily replicated to multiple locations for the various production phases, and transferred for distribution later. Automate the entire data flow based on dates, extensions, or custom metadata.
Extract metadata information on the fly to feed back into the MAM system. NooBaa can invoke the API of any tools used for subtitle creation as part of this custom data flow.
De-identification of data
The Internet of Things (IoT) has a distributed architecture, where multiple devices send anywhere from a few bits to terabytes of data to centralized locations on a daily basis.
With NooBaa's technology, build multiple data centers close to clusters of devices and provide the primary storage that handles the data aggregation. NooBaa's independent serverless functions allow data manipulation, aggregation, and de-identification to take place automatically before the data is transferred for analytics and diagnostics.
Use NooBaa in any country to collect all the data and de-identify it per General Data Protection Regulation (GDPR) requirements, and then compress, encrypt, and transfer the data to a centralized location for analytics.
Medical imaging introduces regulatory and data gravity challenges.
Create on-premises storage with NooBaa that stores the medical imaging for the diagnostic phase. After the data is cold, a de-identification process automatically masks the Digital Imaging and Communications in Medicine (DICOM) records, updates a database with the connecting link, and moves the local copies of the encrypted data to the cloud using NooBaa's serverless functions.
Research projects are tricky because the volume of raw data is hard to analyze. In many cases, the same dataset is used for multiple projects, while on-premises computational resources are limited. NooBaa helps you automatically split the relevant chunk of data out of the raw data and move it to the cloud for analysis. This method reduces the amount of data you need to push to the cloud, de-identifies or masks the data if needed, and uses the right computational resources for the project's lifetime.
Use NooBaa and GCP to boost research, keep privacy in place, squeeze timeframes, and stay in budget while using only temporary resources instead of investing in computational resources that might never be used again.
Backup and archive
NooBaa supports the GCP API, an Amazon S3 compatible API, and an Azure Blob compatible API. Any backup or archiving software that uses these APIs works with NooBaa seamlessly. Using NooBaa in this way lets you use any mixture of on-premises, hybrid-cloud, or public cloud-native storage. In addition, you can set a unique data placement policy for every application or for the data you want to back up or archive. NooBaa is certified by Veritas and Commvault, and has also been tested successfully with Rubrik, CloudBerry, Synology, Cyberduck, and more. NooBaa offers data efficiency through compression and deduplication. It also provides an additional security tier and helps meet regulatory demands that involve multiple cloud providers.
Storage as a service
Managed service providers (MSPs) can offer storage as a service by using NooBaa to turn existing data centers into a cost-effective backup target, and can also offer Disaster Recovery (DR) services with Amazon S3 compatibility.
The GCP and Amazon S3 APIs let you use the cloud ecosystem, including backup applications, archival solutions, and a documented API. With its hardware-agnostic technology, NooBaa aggregates multiple storage silos of any size and vendor. Due to NooBaa's unique architecture, MSPs can quickly scale the storage anywhere. NooBaa has a rich API to enable a quick and smooth integration with billing, account creation, and permissions.
In such a case, MSPs provide a high availability tier by mirroring data to GCP.
NooBaa Core is a virtual machine (VM) that runs on GCP, the same way it can run on-premises. The VM includes the same components and handles the data in the same way, regardless of its location, with the added benefits of compression, encryption, and deduplication. In addition, the VM can be used to replicate data between multiple regions and across accounts.
A hybrid-cloud solution is a single data management solution, where NooBaa Core is deployed on-premises and connected to GCP, aggregating capacity from both the on-premises and cloud resources. You create mirroring and tiering policies based on metadata, time, and usage.
NooBaa can optionally spill data over to GCP when it is running out of space on-premises. The data migrates back when enough on-premises resources become available again.
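The spillover behavior described above can be pictured with a small placement model. This is only a sketch under simple assumptions: capacity is tracked in bytes, every object has a known size, and the `Pool`, `place`, and `migrate_back` names are hypothetical rather than part of NooBaa's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pool:
    """Hypothetical storage pool with a fixed capacity in bytes."""
    name: str
    capacity: int
    used: int = 0

    def free(self) -> int:
        return self.capacity - self.used


def place(size: int, local: Pool, cloud: Pool) -> str:
    """Prefer the local pool; spill over to the cloud pool when local is full."""
    target = local if local.free() >= size else cloud
    target.used += size
    return target.name


def migrate_back(spilled: List[int], local: Pool, cloud: Pool) -> List[int]:
    """Move spilled objects back while the local pool has room.

    Returns the sizes of the objects that remain in the cloud pool.
    """
    remaining = []
    for size in spilled:
        if local.free() >= size:
            cloud.used -= size
            local.used += size
        else:
            remaining.append(size)
    return remaining
```

A real system would track individual objects and move them asynchronously; the sketch shows only the decision rule of local first, cloud when full, and back again when space returns.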
The following figure shows how NooBaa Core is a VM that you can deploy on any local virtualized infrastructure.
By default, the VM includes the NooBaa Core component (3) and a REST service (1).
The NooBaa Core component manages the metadata, monitors storage nodes, and optimizes data placement. The REST service chunks the data and then runs deduplication, compression, and encryption algorithms. The REST service is a stateless scalable component that provides distributed endpoints locally and in the cloud.
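The order of operations in the REST service (chunk, then deduplicate, then compress) can be illustrated with a short sketch. It is not NooBaa's implementation: fixed-size chunks, SHA-256 content hashes, and zlib are assumptions chosen for brevity, the `ingest` and `read` names are hypothetical, and the encryption step is only marked with a comment because the Python standard library does not ship a symmetric cipher.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def ingest(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Chunk `data`, deduplicate by content hash, and compress new chunks.

    Returns the ordered list of chunk digests needed to rebuild the object.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Deduplication: each distinct chunk is stored only once.
            # An encryption step would follow compression here.
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def read(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the object by decompressing its chunks in order."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)
```

For example, ingesting `b"AAAABBBBAAAA"` produces three digests in the returned list but stores only two compressed chunks, because the first and last chunks are identical.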
Storage nodes (2) can be local storage based on any x86 servers, or native cloud storage like GCP.
A multi-cloud solution is a single data management solution, aggregating capacity from multiple cloud resources. Create mirroring and tiering policies based on metadata, time, and usage across cloud providers, regions, and accounts.
NooBaa Core runs in the cloud, and instance images are available for GCP, AWS, Azure, and Alibaba. The VM includes exactly the same components and handles data in a consistent way across clouds to compress, encrypt, and deduplicate. On top of that, NooBaa Core can mirror data between the cloud providers.
Personalized data flow with GCP
In both a hybrid-cloud and a multi-cloud configuration, NooBaa allows data tiering and data manipulation based on time or events. Data flow can be as simple as tiering data based on creation date, or more complex, such as masking the data for every newly created object. Move data from one location to another, update a database, or update an in-memory key value with metadata.
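As an illustration of an event-based flow, the following sketch masks sensitive metadata whenever an object is created. The event shape (`bucket`, `key`, `metadata`), the handler name, and the list of sensitive fields are all hypothetical; the format that NooBaa's serverless functions actually receive is not described here.

```python
import copy
from typing import Any, Dict

SENSITIVE_KEYS = {"patient_name", "patient_id"}  # illustrative only


def on_object_created(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a masked copy of the metadata carried by a 'created' event.

    The event layout is an assumption made for this example.
    """
    masked = copy.deepcopy(event)  # leave the caller's event untouched
    for field in masked["metadata"]:
        if field in SENSITIVE_KEYS:
            masked["metadata"][field] = "***"
    return masked
```

The same handler shape could update a database or a key-value store instead of returning the masked copy.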
Configure data flow as data placement policies:
- Use mirroring between the local data center and GCP.
- Use an Amazon S3 lifecycle policy.
- Use a serverless function.
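For the lifecycle option, a rule in the Amazon S3 lifecycle configuration layout might look like the following sketch. The rule ID, prefix, age, and storage class are placeholders, the `lifecycle_rule` helper is hypothetical, and which rule actions NooBaa honors is an assumption to verify against its documentation.

```python
import json
from typing import Any, Dict


def lifecycle_rule(rule_id: str, prefix: str, days: int, storage_class: str) -> Dict[str, Any]:
    """Build one rule in the Amazon S3 lifecycle configuration layout."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Placeholder values; pick a prefix, age, and storage class that fit your setup.
config = {"Rules": [lifecycle_rule("archive-old-video", "video/", 30, "STANDARD_IA")]}
print(json.dumps(config, indent=2))
```

An S3 client would submit this dictionary as the lifecycle configuration of a bucket.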