This guide explains different tasks associated with Cloud Datalab notebooks.
When you run
datalab create VM-instance-name for the first time,
it adds a
datalab-notebooks Cloud Source Repository
in the project (referred to, below, as the "cloud remote repo"). This is a remote
repository for the
git repository created in the docker container running
in your Cloud Datalab VM instance (referred to, below, as the "Cloud Datalab
VM repo"). You can browse the cloud remote repo from the
Google Cloud Console Repositories page.
Using ungit in your browser
The Cloud Datalab container includes ungit, a web-based git client, which allows you to make commits to the Cloud Datalab VM repo and push notebooks to the cloud remote repo from the Cloud Datalab browser UI.
To open ungit on the Cloud Datalab VM repo,
select the repository icon in the top-right section of the Google Cloud
Datalab menu bar.
A browser window opens on the Cloud Datalab VM repo.
Adding a notebook to the cloud remote repo
Navigate to the
/datalab/notebooks folder in your Cloud Datalab notebook browser window.
Open a new notebook from the
/datalab/notebooks folder by selecting the "+ Notebook" icon.
- Add one or more cells to the notebook
- Rename the notebook by clicking on "Untitled Notebook" in the menu bar and changing the name to "New Notebook"
- Select Notebook→Save and Checkpoint (Ctrl-s), or wait for the notebook to be autosaved.
Return to the Cloud Datalab notebook browser window, and click on the ungit icon to open an ungit browser page (see Using ungit in your browser). After providing a commit title,
New Notebook.ipynb is ready to be committed to the Cloud Datalab VM repo.
After committing the notebook, push it to the
datalab-notebooks cloud remote repo from the ungit browser page.
Using git from the command line
Instead of using ungit from the Cloud Datalab UI for source control (see Using ungit in your browser), you can SSH into the Cloud Datalab VM and run git from a terminal running in your VM or from Cloud Shell. Here are the steps:
- SSH to the Cloud Datalab VM using the
gcloud command-line tool or the Cloud Console:
gcloud command: Run the following command after replacing project-id, zone, and instance-name.
gcloud compute --project project-id ssh \
    --zone zone instance-name
Console/Shell: Go to the Cloud Console VM instances section, expand the SSH menu at the right of your Cloud Datalab VM row, and then select View gcloud command. The gcloud command line window opens, showing the gcloud SSH command that you can copy and paste to run in a local terminal.
- After SSHing into the Cloud Datalab VM, run the
sudo docker ps command to list the Container ID of the Cloud Datalab docker image running in the VM. Copy the Container ID that is associated with the
/datalab/run.sh command and the
datalab_datalab-server-... name.
docker ps
CONTAINER ID   ...   COMMAND             ...   NAMES
b228e3392374   ...   "/datalab/run.sh"   ...   datalab_datalab-server-...
- Open an interactive shell session inside the container using the Container ID
from the previous step.
docker exec -it container-id bash
...
root@datalab-server-vm-name:/#
- Change to the
/content/datalab/notebooks directory in the container.
cd /content/datalab/notebooks
This is the root Cloud Datalab VM git repo directory from which you can issue git commands.
git status
On branch master
nothing to commit, working directory clean
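From this directory, the commit-and-push cycle that ungit performs can also be scripted. The following is a minimal Python sketch, not part of Cloud Datalab. The helper names, the remote name `origin`, and the commit message are assumptions for illustration; the `master` branch is taken from the `git status` output. The command runner is passed in as a parameter so that the command list can be inspected without touching a real repository.

```python
import subprocess

# Root of the Cloud Datalab VM repo inside the container (from the step above).
REPO_DIR = "/content/datalab/notebooks"


def build_git_commands(message, remote="origin", branch="master"):
    """Return the git commands that stage, commit, and push notebooks.

    `remote` is an assumed remote name; confirm it with `git remote -v`.
    """
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def commit_and_push(message, repo_dir=REPO_DIR, run=subprocess.run):
    """Run each command in repo_dir; check=True stops at the first failure.

    Note that `git commit` exits with an error when there is nothing to commit.
    """
    for cmd in build_git_commands(message):
        run(cmd, cwd=repo_dir, check=True)
```

Inside the container, a call such as `commit_and_push("Add New Notebook")` would run the three commands in order.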
Copying notebooks from the Cloud Datalab VM
You can copy files from your Cloud Datalab VM instance using the
gcloud compute scp command. For example,
to copy the contents of your Cloud Datalab VM's
datalab/notebooks directory to
an instance-name-notebooks directory on your local machine, run the
following command after replacing
instance-name with the name of your Cloud
Datalab VM (the instance-name-notebooks directory will be
created if it doesn't exist).
gcloud compute scp --recurse \
    datalab@instance-name:/mnt/disks/datalab-pd/content/datalab/notebooks \
    instance-name-notebooks
Cloud Datalab backup
Cloud Datalab instances periodically back up user content to a Google Cloud Storage bucket in the user's project to prevent accidental loss of user content in case of a failed or deleted VM disk. By default, a Cloud Datalab instance stores all of the user’s content in an attached disk, and the backup utility works on this disk’s root. The backup job runs every ten minutes. It creates a zip file of the entire disk and compares it to the last backup zip file. It uploads the new zip file to Google Cloud Storage only if the two files differ and sufficient time has elapsed between the new changes and the last backup.
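The upload decision described above can be pictured with a short Python sketch. It is an illustration only, not Cloud Datalab's implementation: the function name, the use of a SHA-256 digest to compare zip files, and the `min_interval_seconds` default are all assumptions, since the text says only that sufficient time must have elapsed.

```python
import hashlib


def should_upload(new_zip, last_digest, last_upload_time, now,
                  min_interval_seconds=3600):
    """Decide whether a freshly created backup zip should be uploaded.

    new_zip: bytes of the new zip file.
    last_digest: SHA-256 hex digest of the last uploaded zip, or None.
    last_upload_time, now: timestamps in seconds.
    min_interval_seconds: assumed threshold; the real value is not documented here.

    Returns (upload, new_digest).
    """
    new_digest = hashlib.sha256(new_zip).hexdigest()
    if last_digest is None:
        # No earlier backup exists to compare against.
        return True, new_digest
    changed = new_digest != last_digest
    waited = (now - last_upload_time) >= min_interval_seconds
    return (changed and waited), new_digest
```

A scheduler that runs every ten minutes could call this function on each pass and store the returned digest whenever an upload happens.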
Cloud Datalab retains the last 10 hourly backups, 7 daily backups, and 20 weekly
backups, and deletes older backup files to preserve space. Backups can be turned
off by passing the
--no-backups flag when creating a Cloud Datalab instance
with the datalab create command.
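As a rough illustration of this retention policy, the Python sketch below splits a list of backups into those to keep and those to delete. The representation of a backup as a `(timestamp, tag)` pair and the function name are assumptions made for the example; only the three limits come from the text.

```python
from datetime import datetime, timedelta

# Retention limits from the text: newest N backups kept per tag.
RETENTION = {"hourly": 10, "daily": 7, "weekly": 20}


def split_by_retention(backups):
    """Split (timestamp, tag) pairs into (keep, delete) lists.

    Within each tag, the newest RETENTION[tag] backups are kept
    and all older ones are marked for deletion.
    """
    keep, delete = [], []
    for tag, limit in RETENTION.items():
        tagged = sorted((b for b in backups if b[1] == tag), reverse=True)
        keep.extend(tagged[:limit])
        delete.extend(tagged[limit:])
    return keep, delete


# Twelve hourly backups: the ten newest are kept, the two oldest deleted.
start = datetime(2017, 1, 1)
hourly = [(start + timedelta(hours=i), "hourly") for i in range(12)]
keep, delete = split_by_retention(hourly)
print(len(keep), len(delete))  # 10 2
```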
Each backup file is named using the VM instance zone, instance name, notebook
backup directory path within the instance, timestamp, and a tag that is either
hourly, daily, or weekly. By default, Cloud Datalab will try to create the backup path
$project_id.appspot.com/datalab_backups. If this path cannot be created or
the user does not have sufficient permissions, the creation of a
$project_id/datalab_backups path is attempted. If that attempt fails, backups
to Google Cloud Storage will fail.
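The fallback order can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `try_create` is a hypothetical callable standing in for whatever check creates the path and verifies permissions, and the function name is invented for the example.

```python
def choose_backup_path(project_id, try_create):
    """Return the first backup path that try_create accepts, or None.

    try_create is a hypothetical callable that returns True when the path
    could be created and written to, and False otherwise.
    """
    candidates = [
        "{}.appspot.com/datalab_backups".format(project_id),
        "{}/datalab_backups".format(project_id),
    ]
    for path in candidates:
        if try_create(path):
            return path
    # Both attempts failed, so no backups are uploaded.
    return None
```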
To restore a backup, the user selects the backup file from Google Cloud Storage by examining the VM zone, VM name, notebook directory, and the human-readable timestamp.
Sample backup file path:
This sample backup was created for the VM
datalab0125 in zone
us-central1-b, and it contains all content under the notebook's
directory. It was created as a daily backup point on
A backup zip file can be downloaded from the browser or by using the gsutil tool that is installed as part of the Google Cloud SDK installation.
To use the browser, navigate to Google Cloud Console, then select Storage from the left navigation sidebar. Browse to the Cloud Datalab backup bucket, then select and download the zip file to disk.
To use gsutil to download the backup file, run
gsutil cp gs://backup_path destination_path. For example, to download and extract the sample zip file discussed above:
unzip -q /tmp/backup0127.zip -d /tmp/restore_location/
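If unzip is not available on the machine where the backup was downloaded, the extraction step can be done with Python's standard zipfile module. This is a minimal sketch; the function name `restore_backup` is an assumption, and the paths in the usage comment are the ones from the unzip example.

```python
import os
import zipfile


def restore_backup(zip_path, restore_dir):
    """Extract every file in zip_path into restore_dir.

    Returns the list of member names found in the archive.
    """
    os.makedirs(restore_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(restore_dir)
        return archive.namelist()


# Usage, mirroring the unzip command:
# restore_backup("/tmp/backup0127.zip", "/tmp/restore_location/")
```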
Working with data
Cloud Datalab can access data located in any of the following places:
- Google Cloud Storage: files and directories in Cloud Storage can be programmatically accessed using the
datalab.storage APIs (see the
/datalab/docs/tutorials/Storage/Storage APIs.ipynb notebook tutorial)
- BigQuery: tables and views can be queried using SQL and
datalab.bigquery APIs (see the
datalab/docs/tutorials/BigQuery/BigQuery/BigQuery APIs.ipynb notebook tutorial)
- Local file system on the persistent disk: you can create or copy files to the file system on the persistent disk attached to your Cloud Datalab VM.