This guide explains different tasks associated with Cloud Datalab notebooks.
Users can use git to work on their notebooks in a Cloud Datalab instance.
Since Cloud Datalab is a GUI-first experience, the Cloud Datalab container includes a git web client, which allows users to make commits and push/pull notebook changes from the browser. Alternatively, users can choose to SSH to the VM running Cloud Datalab, start a shell session inside the running Cloud Datalab container, then work directly on their files from the command line.
Users can also use the SSH shell approach to set up their own source-control configuration. Example steps:
- SSH to the VM running Cloud Datalab:
gcloud compute ssh user@instance_name
- Find the ID of the Docker container running Cloud Datalab:
docker ps -qf ancestor=datalab
- Open an interactive shell session inside the container, using the ID from the previous step:
docker exec -it container_id bash
- Change directory to the location of user content (by default,
Copying notebooks to and from the VM
You can copy files to and from your Cloud Datalab instance using the
gcloud compute copy-files command. For example, to copy the contents of your
datalab/notebooks directory to your local machine, run the following (after
replacing instance_name with the name of your VM):

gcloud compute copy-files \
    datalab@instance_name:/mnt/disks/datalab-pd/datalab/notebooks \
    instance_name-notebooks
Cloud Datalab backup
Cloud Datalab instances periodically back up user content to a Google Cloud Storage bucket in the user's project, to guard against accidental loss of user content if a VM disk fails or is deleted. By default, a Cloud Datalab instance stores all of the user's content on an attached disk, and the backup utility works from this disk's root. The backup job runs every ten minutes: it creates a zip file of the entire disk, compares it to the last backup zip file, and uploads the new zip to Google Cloud Storage if the two differ and sufficient time has elapsed since the last backup.
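The compare-and-upload decision described above can be sketched in Python. This is an illustration of the idea only; the function names, hashing approach, and interval constant are assumptions, not Datalab's actual implementation:

```python
import hashlib
import time
import zipfile
from pathlib import Path
from typing import Optional


def zip_content(src_dir: str, zip_path: str) -> None:
    """Zip every file under src_dir (a stand-in for the disk root)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(src_dir).rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(src_dir))


def content_digest(zip_path: str) -> str:
    """Hash member names and contents, ignoring archive metadata like timestamps."""
    h = hashlib.sha256()
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            h.update(name.encode())
            h.update(zf.read(name))
    return h.hexdigest()


def should_upload(new_zip: str, last_zip: Optional[str],
                  last_backup_time: float, min_interval_s: float = 600) -> bool:
    """Upload only if enough time has passed and the content actually changed."""
    if last_zip is None:
        return True  # first backup: nothing to compare against
    if time.time() - last_backup_time < min_interval_s:
        return False
    return content_digest(new_zip) != content_digest(last_zip)
```

Comparing a content digest rather than the raw zip bytes avoids spurious uploads caused by archive metadata such as file modification times.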
Cloud Datalab retains the last 10 hourly backups, 7 daily backups, and 20 weekly
backups, and deletes older backup files to preserve space. Backups can be turned off
by passing the --no-backups flag when creating a Cloud Datalab instance
with the datalab create command.
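The retention policy (10 hourly, 7 daily, 20 weekly) can be sketched as a small pruning routine. The tag-timestamp name format is modeled on the naming scheme described in the next paragraph; everything else here is illustrative, not Datalab's actual pruning code:

```python
from typing import Dict, List

# How many backups to keep per tag, per the policy above.
RETENTION = {"hourly": 10, "daily": 7, "weekly": 20}


def prune(backups: List[str]) -> List[str]:
    """Given backup names containing '<tag>-<timestamp>', return names to delete.

    Backups are grouped by tag; only the newest N of each tag survive.
    """
    by_tag: Dict[str, List[str]] = {tag: [] for tag in RETENTION}
    for name in backups:
        for tag in RETENTION:
            if f"{tag}-" in name:
                by_tag[tag].append(name)
                break
    to_delete: List[str] = []
    for tag, names in by_tag.items():
        # Timestamps like 20170127102921 sort lexicographically, newest last.
        names.sort()
        to_delete.extend(names[:-RETENTION[tag]])
    return to_delete
```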
Each backup file is named using the VM instance zone, instance name, notebook
backup directory path within the instance, timestamp, and a tag that is either
hourly, daily, or weekly. By default, Cloud Datalab tries to create the backup path
$project_id.appspot.com/datalab_backups. If this path cannot be created or
the user does not have sufficient permissions, Cloud Datalab attempts to create a
$project_id/datalab_backups path instead. If that attempt also fails, backups
to Google Cloud Storage fail.
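Based on the naming scheme and fallback behavior just described, a backup destination might be assembled like this. This is a sketch: the field order follows the sample path shown later in this section, and the function names are hypothetical:

```python
from typing import List


def backup_object_name(zone: str, instance: str, content_dir: str,
                       tag: str, timestamp: str) -> str:
    """Build an object name from zone, instance, content path, tag, and timestamp."""
    return f"{zone}/{instance}/{content_dir}/{tag}-{timestamp}"


def backup_bucket_candidates(project_id: str) -> List[str]:
    """Candidate backup locations, tried in order; backups fail if neither works."""
    return [
        f"{project_id}.appspot.com/datalab_backups",  # preferred path
        f"{project_id}/datalab_backups",              # fallback path
    ]
```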
To restore a backup, the user selects the backup file from Google Cloud Storage by examining the VM zone, VM name, notebook directory, and the human-readable timestamp.
Sample backup file path:

gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921

This sample backup was created for the VM datalab0125 in zone
us-central1-b, and it contains all content under the notebooks
directory. It was created as a daily backup point on January 27, 2017
(timestamp 20170127102921).
A backup zip file can be downloaded from the browser or with the gsutil tool installed as part of the Google Cloud SDK.
To use the browser, navigate to the Google Cloud Platform Console, then select Storage from the left navigation sidebar. Browse to the Cloud Datalab backup bucket, then select and download the zip file to disk.
To use gsutil to download the backup file, run
gsutil cp gs://backup_path destination_path. For example, to download and extract the sample zip file discussed above:

gsutil cp \
    gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921 \
    /tmp/backup0127.zip
unzip -q /tmp/backup0127.zip -d /tmp/restore_location/
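If you prefer to script the extraction step, the unzip call can equally be done with Python's standard zipfile module. This sketch is equivalent in effect; the zip and destination paths are whatever you used when downloading:

```python
import zipfile
from pathlib import Path
from typing import List


def restore_backup(zip_path: str, dest: str) -> List[str]:
    """Extract a downloaded Datalab backup zip into dest; return extracted names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```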
Working with data
Cloud Datalab can access data located in any of the following places:
Google Cloud Storage: files and directories in Cloud Storage can be programmatically accessed using the
datalab.storage APIs (see the
/datalab/docs/tutorials/Storage/Storage APIs.ipynb notebook tutorial)
BigQuery: tables and views can be queried using SQL and the
datalab.bigquery APIs (see the
/datalab/docs/tutorials/BigQuery/BigQuery APIs.ipynb notebook tutorial)
Local file system on the persistent disk: you can create or copy files to the file system on the persistent disk attached to your Cloud Datalab VM.
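For the local file-system case, a notebook cell can read and write files on the persistent disk with ordinary Python I/O. A minimal sketch, in which the relative path under the content directory is hypothetical:

```python
from pathlib import Path

# Hypothetical path under the Datalab content directory on the persistent disk.
data_file = Path("sample_data/readings.csv")
data_file.parent.mkdir(parents=True, exist_ok=True)

# Write a small CSV, then read it back, as a notebook cell might.
data_file.write_text("sensor,value\na,1\nb,2\n")
rows = [line.split(",") for line in data_file.read_text().splitlines()[1:]]
print(rows)  # [['a', '1'], ['b', '2']]
```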