Backup & Restore Neo4j Graph Database via GKE Cronjob and Google Cloud Storage
Usama Ijaz
Strategic Cloud Engineer
David Ng
Cloud Database Migrations Engineer
In today's data-centric world, the integrity and availability of database information are critical to the success of any digital enterprise. Neo4j, a premier graph database known for its adept handling of intricate relationships and complex queries, stands at the forefront of this reality. Neo4j's scalability makes it worth optimizing the backup and restoration process, and the approach described here builds on Neo4j's existing capabilities by adding automation for demanding environments.
Background
In the dynamic realm of Neo4j database management, developers and IT professionals grapple with multiple challenges. Traditional backup methods are not only cumbersome but fraught with risks – manual interventions can lead to human errors, and local storage backups carry the threat of data loss from system failures or unforeseen disasters. Moreover, the diverse backup options offered by Neo4j – full, incremental, and differential – while beneficial, demand a strategic approach to balance comprehensiveness with efficiency.
Recognizing these complexities, Google Cloud Consulting has pioneered an automated, cloud-centric solution for the backup and restoration of Neo4j databases. This innovative approach utilizes the versatility of Google Kubernetes Engine (GKE) CronJobs coupled with the robustness of Google Cloud Storage (GCS) buckets. By harnessing the cloud's scalability and resilience, this solution not only streamlines the backup process but also significantly mitigates the risks associated with data loss and corruption.
This tool, initially custom-crafted for a specific client's needs, showcased such potential in enhancing data resilience that Google Cloud Consulting decided to open-source its design. This decision reflects our ongoing commitment to fostering a culture of innovation and sharing, where advanced, cloud-native solutions can be accessible to a broader community. By open-sourcing this tool, we aim to empower developers and organizations to not only safeguard their data but also to embrace the full potential of Neo4j’s capabilities in a secure and efficient manner.
As we step further into a future where data is the cornerstone of decision-making and operations, the ability to reliably backup and restore data becomes indispensable. With this guide and the tools provided by Google Cloud, organizations leveraging Neo4j can now navigate this path with greater confidence and capability.
Setting up the Environment
Before we begin, let's ensure our environment is properly configured:
- Google Cloud: You'll need an active Google Cloud account.
- Google Kubernetes Engine (GKE): Create a GKE cluster to deploy Neo4j and associated components.
- Google Cloud Storage (GCS): Set up a GCS bucket to store your Neo4j backups securely. You can follow the detailed setup instructions provided in the repository's README; a quick setup sketch also follows this list.
- Code Repository: In the provided repository, you'll find a well-organized example of the backup and restore procedure designed for simplicity and ease of use: Neo4j Backup & Restore Example
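If you don't already have a cluster and a bucket, a minimal setup sketch with the gcloud CLI might look like the following. The project, region, cluster, and bucket names are placeholders, and the repository's README remains the authoritative setup reference.

```bash
export PROJECT_ID=my-project        # placeholder project ID
export REGION=us-central1           # placeholder region

# Create a GKE cluster to host Neo4j and the backup CronJob.
gcloud container clusters create neo4j-cluster \
  --project "${PROJECT_ID}" \
  --region "${REGION}" \
  --num-nodes 1

# Create a GCS bucket to hold the Neo4j backups.
gcloud storage buckets create "gs://${PROJECT_ID}-neo4j-backups" \
  --project "${PROJECT_ID}" \
  --location "${REGION}"
```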
Backup
The backup procedure consists of the following steps for creating and managing backups. Let's break down each one:
1. Build and Push Backup Pod Image:
- Start by creating a special container image for backups (an example is provided in the repository).
- Make sure the settings in the backup/backup.env file are correct. These settings tell the backup where to put data and how to find your Neo4j database.
- Use the pod-image-exec.sh script to build this container image and push it to an image repository such as Google Artifact Registry (a manual equivalent is sketched below).
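For illustration only, the manual equivalent of what pod-image-exec.sh automates is a standard container build and push to Artifact Registry. The image path below is a placeholder, and the build context assumes the Dockerfile lives in backup/docker/ as described in step 3.

```bash
# Placeholder image path: adjust region, project, repository, and tag to your setup.
IMAGE=us-central1-docker.pkg.dev/my-project/neo4j-tools/neo4j-backup:latest

# Allow Docker to push to Artifact Registry (one-time setup).
gcloud auth configure-docker us-central1-docker.pkg.dev

# Build the backup image and push it to the registry.
docker build -t "${IMAGE}" backup/docker/
docker push "${IMAGE}"
```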
2. Deploy Backup Schedule:
- Decide how often you want to create backups, and set any other special settings, in the backup/deployment/backup-cronjob.yaml file.
- Use the deploy-exec.sh script to set up a schedule for creating backups on your Neo4j cluster (see the sketch below).
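deploy-exec.sh wraps the deployment details; as a rough sketch, a manual deployment and check with kubectl might look like this (the CronJob name is a placeholder taken from your manifest):

```bash
# Apply the CronJob manifest that defines the backup schedule.
kubectl apply -f backup/deployment/backup-cronjob.yaml

# Confirm the CronJob was created and inspect its schedule.
kubectl get cronjobs

# Optionally trigger a one-off run to test the backup without waiting for the schedule.
kubectl create job neo4j-backup-manual-test --from=cronjob/neo4j-backup
```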
3. Update Backup Container (if needed):
- If you ever want to change how the backup works, you can do so in the backup-via-admin.sh script or by modifying the Dockerfile in the backup/docker/ folder.
- After making changes, you'll need to rebuild and redeploy the backup container (one possible sequence is sketched below).
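One possible rebuild-and-redeploy sequence, assuming the scripts behave as described in the repository's README, looks like this:

```bash
# Rebuild and push the backup image after editing backup-via-admin.sh or the Dockerfile.
# (Script location and arguments may differ; check the repository's README.)
./pod-image-exec.sh

# Re-apply the CronJob so future scheduled runs pick up the updated image.
kubectl apply -f backup/deployment/backup-cronjob.yaml
```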
4. Delete Backup Schedule (if needed):
- If you no longer want to make automatic backups, you can remove the schedule with a simple command, shown below. Replace <CRONJOB_NAME> with the name of your schedule.
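Since the schedule is a Kubernetes CronJob, the removal command is along these lines (add -n with your namespace if you deployed into a dedicated one):

```bash
# Remove the scheduled backup; replace <CRONJOB_NAME> with the name of your schedule.
kubectl delete cronjob <CRONJOB_NAME>
```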
5. Re-deploy Backup Schedule (if needed):
- If you change your mind and want to start making backups again, you can easily set up the schedule again using the same configuration file as above (see the example below).
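Re-deploying amounts to applying the same manifest again (or re-running deploy-exec.sh), for example:

```bash
# Re-create the backup schedule from the existing configuration file.
kubectl apply -f backup/deployment/backup-cronjob.yaml
```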
This procedure allows you to automate the backup process for your Neo4j database running on Kubernetes using GCS for storage. It ensures that your data is regularly backed up and can be restored if needed, providing data resilience and reliability for your Neo4j-based applications.
Restore
1. Requirements:
Ensure you have one of the following before continuing with the restore process (a quick way to verify is sketched below):
- A sidecar container running the Google Cloud SDK on your Neo4j instance
- The Google Cloud SDK pre-installed on the servers where your Neo4j instance is running
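A quick, illustrative way to confirm the SDK is reachable (the pod and container names below are hypothetical):

```bash
# If using a sidecar, check for the Cloud SDK inside the sidecar container on a Neo4j pod.
kubectl exec neo4j-0 -c cloud-sdk-sidecar -- gcloud --version

# If the SDK is installed directly on the Neo4j servers, verify it locally instead.
gcloud --version
gsutil version
```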
2. Download and Restore from Google Cloud Storage Bucket:
- The restore process involves retrieving backup data from a Google Cloud Storage (GCS) bucket and using it to restore your Neo4j database.
- To simplify this process, the /restore/restore-exec.sh script coordinates the restore steps, handling them one server at a time (a conceptual sketch of those steps follows).
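Conceptually, the steps the script coordinates on each server are roughly as follows. The bucket, archive, and database names are placeholders, and the neo4j-admin syntax shown assumes Neo4j 5.x; the script itself remains the supported entry point.

```bash
# Download the backup artifact from the GCS bucket (placeholder names).
gcloud storage cp gs://my-neo4j-backups/neo4j-backup-latest.tar.gz /tmp/
mkdir -p /tmp/neo4j-restore
tar -xzf /tmp/neo4j-backup-latest.tar.gz -C /tmp/neo4j-restore

# Restore the database from the downloaded backup (the target database must be stopped first).
# Neo4j 5.x syntax shown; Neo4j 4.x uses `neo4j-admin restore` instead.
neo4j-admin database restore --from-path=/tmp/neo4j-restore --overwrite-destination=true neo4j
```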
3. Executing the Restore Script:
a. To initiate the restore process, first ensure you have permission to execute the script.
b. Next, run the restore script (both commands are sketched below).
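The two commands are likely of this form (adjust the script path to wherever the repository is checked out):

```bash
# a. Make the restore script executable.
chmod +x ./restore/restore-exec.sh

# b. Run the restore; the script processes one server at a time.
./restore/restore-exec.sh
```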
This restore procedure assumes you have the necessary Google Cloud tools available on your Neo4j instance or servers. It uses a script to download backup data from a Google Cloud Storage bucket, and then carefully restores your Neo4j database one server at a time. This process ensures that your Neo4j database can be recovered efficiently, in case of data loss or corruption, providing data reliability for your applications.
Conclusion
Safeguarding your Neo4j data is of utmost importance. The code repository we've explored in this blog post, combined with the capabilities of GKE and GCS, offers a robust solution for Neo4j backup and restore. By following the comprehensive instructions and best practices outlined here, you can ensure the resilience and availability of your Neo4j databases using the best of Google Cloud, in this case Google Kubernetes Engine and Cloud Storage, ultimately contributing to the success and reliability of your applications.
This guide provides a glimpse into the capabilities of Google Cloud Consulting and our commitment to developing solutions that not only solve immediate challenges, but also pave the way for future advancements. Embrace the power of Google Kubernetes Engine and Cloud Storage to secure your Neo4j databases. Contact us to learn more.