This guide explains different tasks associated with Cloud Datalab notebooks.
Source control
When you run datalab create VM-instance-name for the first time, it adds a datalab-notebooks Cloud Source Repository in the project (referred to below as the "cloud remote repo"). This is a remote repository for the /content/datalab/notebooks git repository created in the Docker container running in your Cloud Datalab VM instance (referred to below as the "Cloud Datalab VM repo"). You can browse the cloud remote repo from the Google Cloud Console Repositories page.
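For example, a minimal create command looks like the following (the instance name and zone are illustrative; run datalab create --help to see the full set of flags):

datalab create my-datalab-vm --zone us-central1-b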

You can use git or ungit to manage the notebooks in the Cloud Datalab VM repo.
Using ungit in your browser
The Cloud Datalab container includes ungit, a web-based git client, which allows you to make commits to the Cloud Datalab VM repo and push notebooks to the cloud remote repo from the Cloud Datalab browser UI.
To open ungit on the Cloud Datalab /content/datalab/notebooks repo, select the repository icon in the top-right section of the Cloud Datalab menu bar.

A browser window opens on the Cloud Datalab VM repo.

Adding a notebook to the cloud remote repo
- Navigate to the /datalab/notebooks folder in your Cloud Datalab notebook browser window.
- Open a new notebook from the /datalab/notebooks folder by selecting the "+ Notebook" icon.
- Add one or more cells to the notebook.
- Rename the notebook by clicking on "Untitled Notebook" in the menu bar and changing the name to "New Notebook".
- Select Notebook→Save and Checkpoint (Ctrl-s), or wait for the notebook to be autosaved.
- Return to the Cloud Datalab notebook browser window, and click on the ungit icon to open an ungit browser page (see Using ungit in your browser). After providing a commit title, New Notebook.ipynb is ready to be committed to the Cloud Datalab VM repo.
- After committing the notebook, push it to the datalab-notebooks cloud remote repo from the ungit browser page. Once pushed, the notebook can also be cloned from the cloud remote repo on another machine, as shown below.
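After pushing, you can clone the datalab-notebooks cloud remote repo onto another machine with the Cloud SDK (a sketch; my-project-id is a placeholder for your project ID):

gcloud source repos clone datalab-notebooks --project=my-project-id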
Using git from the command line
Instead of using ungit from the Cloud Datalab UI for source control (see Using ungit in your browser), you can SSH into the Cloud Datalab VM and run git from a terminal running in your VM or from Cloud Shell. Here are the steps:
- SSH to the Cloud Datalab VM using the gcloud command-line tool or the Cloud Console.
  gcloud command: run the following command after inserting project-id, zone, and instance-name.
    gcloud compute --project project-id ssh --zone zone instance-name
  Console: go to the Cloud Console VM instances section, expand the SSH menu at the right of your Cloud Datalab VM row, and then select View gcloud command.
- After SSHing into the Cloud Datalab VM, run the sudo docker ps command to list the Container ID of the Cloud Datalab Docker image running in the VM. Copy the Container ID that is associated with the /datalab/run.sh command and the datalab_datalab-server name.
    docker ps
    CONTAINER ID ... COMMAND           ... NAMES
    b228e3392374 ... "/datalab/run.sh" ... datalab_datalab-server-...
- Open an interactive shell session inside the container using the Container ID from the last step.
    docker exec -it container-id bash
    ...
    root@datalab-server-vm-name:/#
- Change to the /content/datalab/notebooks directory in the container.
    cd /content/datalab/notebooks
  This is the root Cloud Datalab VM git repo directory from which you can issue git commands (a short commit-and-push example follows this list).
    git status
    On branch master
    nothing to commit, working directory clean
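From this directory, a typical commit-and-push sequence looks like the following sketch. It assumes the cloud remote repo is configured as the origin remote of the Cloud Datalab VM repo and that you are on the master branch shown above; the notebook name and commit message are illustrative.

git add "New Notebook.ipynb"
git commit -m "Add New Notebook"
git push origin master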
Copying notebooks from the Cloud Datalab VM
You can copy files from your Cloud Datalab VM instance using the gcloud compute scp command. For example, to copy the contents of your Cloud Datalab VM's datalab/notebooks directory to an instance-name-notebooks directory on your local machine, run the following command after replacing instance-name with the name of your Cloud Datalab VM (the instance-name-notebooks directory will be created if it doesn't exist).

gcloud compute scp --recurse \
    datalab@instance-name:/mnt/disks/datalab-pd/content/datalab/notebooks \
    instance-name-notebooks
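For instance, with a hypothetical VM named datalab-server in zone us-central1-b (passing --zone avoids an interactive zone prompt; the local directory name is also illustrative):

gcloud compute scp --recurse --zone us-central1-b \
    datalab@datalab-server:/mnt/disks/datalab-pd/content/datalab/notebooks \
    datalab-server-notebooks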
Cloud Datalab backup
Cloud Datalab instances periodically back up user content to a Google Cloud Storage bucket in the user's project to prevent accidental loss in case of a failed or deleted VM disk. By default, a Cloud Datalab instance stores all of the user's content on an attached disk, and the backup utility works on this disk's root. The backup job runs every ten minutes: it creates a zip file of the entire disk, compares it to the most recent backup zip file, and uploads the new zip to Google Cloud Storage if the two differ and sufficient time has elapsed since the last backup.
Cloud Datalab retains the last 10 hourly backups, 7 daily backups, and 20 weekly
backups, and deletes older backup files to preserve space. Backups can be turned
off by passing the --no-backups
flag when creating a Cloud Datalab instance
with the datalab create command.
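For example, to create an instance with backups disabled (the instance name is illustrative):

datalab create my-datalab-vm --no-backups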
Each backup file is named using the VM instance zone, instance name, notebook backup directory path within the instance, timestamp, and a tag that is either hourly, daily, or weekly. By default, Cloud Datalab will try to create the backup path $project_id.appspot.com/datalab_backups. If this path cannot be created or the user does not have sufficient permissions, the creation of a $project_id/datalab_backups path is attempted. If that attempt fails, backups to Google Cloud Storage will fail.
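To see which backups currently exist, you can list the default backup path with gsutil (a sketch; it assumes the default $project_id.appspot.com bucket described above, with my-project standing in for your project ID):

gsutil ls gs://my-project.appspot.com/datalab_backups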
Restoring backups
To restore a backup, the user selects the backup file from Google Cloud Storage by examining the VM zone, VM name, notebook directory, and the human-readable timestamp.
Sample backup file path: gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921
This sample backup was created for the VM datalab0125 in zone us-central1-b, and it contains all content under the notebook's /content directory. It was created as a daily backup point on 01/27/2017 at 10:29:21.
A backup zip file can be downloaded from the browser or by using the gsutil tool that is installed as part of the Google Cloud SDK installation.
To use the browser, navigate to Google Cloud Console, then select Storage from the left navigation sidebar. Browse to the Cloud Datalab backup bucket, then select and download the zip file to disk.
To use gsutil to download the backup file, run gsutil cp gs://backup_path destination_path. For example, to download and extract the sample zip file discussed above:

gsutil cp gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921 \
    /tmp/backup0127.zip
unzip -q /tmp/backup0127.zip -d /tmp/restore_location/
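To place restored notebooks back on a Cloud Datalab VM, one option is to reverse the gcloud compute scp copy shown earlier (a sketch; it assumes the extracted backup contains a datalab/notebooks directory and uses the illustrative VM name datalab0125; adjust the local path to match the actual layout of your extracted zip):

gcloud compute scp --recurse \
    /tmp/restore_location/datalab/notebooks \
    datalab@datalab0125:/mnt/disks/datalab-pd/content/datalab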
Working with data
Cloud Datalab can access data located in any of the following places:
- Google Cloud Storage: files and directories in Cloud Storage can be programmatically accessed using the datalab.storage APIs (see the /datalab/docs/tutorials/Storage/Storage APIs.ipynb notebook tutorial).
- BigQuery: tables and views can be queried using SQL and the datalab.bigquery APIs (see the /datalab/docs/tutorials/BigQuery/BigQuery APIs.ipynb notebook tutorial).
- Local file system on the persistent disk: you can create or copy files to the file system on the persistent disk attached to your Cloud Datalab VM.
If your data is in a different location, on-premises or in another cloud, you can transfer the data to Cloud Storage using the gsutil tool or the Cloud Storage Transfer Service.
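For example, a gsutil copy of a local data directory into a Cloud Storage bucket (bucket and directory names are illustrative; the -m flag parallelizes the transfer):

gsutil -m cp -r ./local-data gs://my-bucket/datalab-data/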