Working with notebooks

This guide explains different tasks associated with Cloud Datalab notebooks.

Source control

You can use git to manage the notebooks in a Cloud Datalab instance.

Since Cloud Datalab is a GUI-first experience, the Cloud Datalab container includes a git web client, which allows users to make commits and push/pull notebook changes from the browser. Alternatively, users can choose to SSH to the VM running Cloud Datalab, start a shell session inside the running Cloud Datalab container, then work directly on their files from the command line.

You can also use an SSH shell session to set up your own source control configuration. For example:

  1. SSH to the VM running Cloud Datalab
    gcloud compute ssh user@instance-name
    
  2. Find the ID of the Docker container running Cloud Datalab
    docker ps -qf ancestor=datalab
    
  3. Open an interactive shell session inside the container, using the ID from the previous step
    docker exec -it container-id bash
    
  4. Change directory to the location of user content (by default, /content), then set up source control as sketched below.
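
A minimal sketch of configuring git by hand from that shell follows. The remote name, repository URL, and user details are placeholders, and the git command-line tool is assumed to be available in the container image; adapt the commands to your own workflow.

    # Inside the container (steps 1-4 above), working from /content.
    # If /content is not already a git repository, run `git init` first.
    git config --global user.name "Your Name"
    git config --global user.email "you@example.com"

    # Add your own remote (placeholder name and URL), then commit and push.
    git remote add mybackup https://github.com/your-org/your-notebooks.git
    git add .
    git commit -m "Snapshot of notebooks"
    git push mybackup master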

Copying notebooks to and from the VM

You can copy files to and from your Cloud Datalab instance using the gcloud compute copy-files command. For example, to copy the contents of your datalab/notebooks directory to your local machine, run the following (after replacing instance-name with the name of your VM):

gcloud compute copy-files \
  datalab@instance-name:/mnt/disks/datalab-pd/datalab/notebooks \
  instance-name-notebooks
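
Copying in the other direction works the same way, with the local path as the source and the instance path as the destination. For example, to upload a local directory (the directory name here is a placeholder):

gcloud compute copy-files \
  my-notebooks \
  datalab@instance-name:/mnt/disks/datalab-pd/datalab/notebooks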

Cloud Datalab backup

Cloud Datalab instances periodically back up user content to a Google Cloud Storage bucket in the user's project, which protects against accidental loss of content if a VM disk fails or is deleted. By default, a Cloud Datalab instance stores all of the user's content on an attached disk, and the backup utility works on the root of that disk. The backup job runs every ten minutes: it creates a zip file of the entire disk, compares it to the previous backup zip file, and uploads the new zip file to Google Cloud Storage if the two differ and sufficient time has elapsed since the last backup.

Cloud Datalab retains the last 10 hourly backups, 7 daily backups, and 20 weekly backups, and deletes older backup files to save space. Backups can be turned off by passing the --no-backups flag when creating a Cloud Datalab instance with the datalab create command.
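
For example, the following creates an instance with backups disabled (instance-name is a placeholder):

datalab create --no-backups instance-name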

Each backup file is named using the VM instance zone, instance name, notebook backup directory path within the instance, timestamp, and a tag that is either hourly, daily, or weekly. By default, Cloud Datalab will try to create the backup path $project_id.appspot.com/datalab_backups. If this path cannot be created or the user does not have sufficient permissions, the creation of a $project_id/datalab_backups path is attempted. If that attempt fails, backups to Google Cloud Storage will fail.
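
For example, assuming the default backup path was created successfully, you can list the available backup files with gsutil (myproject is a placeholder project ID):

gsutil ls -r gs://myproject.appspot.com/datalab_backups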

Restoring backups

To restore a backup, the user selects the backup file from Google Cloud Storage by examining the VM zone, VM name, notebook directory, and the human-readable timestamp.

Sample backup file path: gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921

This sample backup was created for the VM datalab0125 in zone us-central1-b, and it contains all content under the instance's /content notebook directory. It was created as a daily backup point on 01/27/2017 at 10:29:21.

A backup zip file can be downloaded from the browser or by using the gsutil tool that is installed as part of the Google Cloud SDK installation.

  • To use the browser, navigate to the Google Cloud Platform Console, then select Storage from the left navigation sidebar. Browse to the Cloud Datalab backup bucket, then select and download the zip file to disk.

  • To use gsutil to download the backup file, run gsutil cp gs://backup_path destination_path. For example, to download and extract the sample backup file discussed above:

       gsutil cp \
         gs://myproject/datalab-backups/us-central1-b/datalab0125/content/daily-20170127102921 \
         /tmp/backup0127.zip
       unzip -q /tmp/backup0127.zip -d /tmp/restore_location/
       
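After downloading and extracting a backup with either method, one way to return the restored notebooks to your instance is to copy them back with gcloud compute copy-files, as described earlier, and then move them into the content directory from an SSH session. The paths below are illustrative:

gcloud compute copy-files \
  /tmp/restore_location \
  datalab@instance-name:/tmp/restored-notebooks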

Working with data

Cloud Datalab can access data located in any of the following places:

  • Google Cloud Storage: files and directories in Cloud Storage can be programmatically accessed using the datalab.storage APIs (see the /datalab/docs/tutorials/Storage/Storage APIs.ipynb notebook tutorial)

  • BigQuery: tables and views can be queried using SQL and datalab.bigquery APIs (see the /datalab/docs/tutorials/BigQuery/BigQuery APIs.ipynb notebook tutorial)

  • Local file system on the persistent disk: you can create or copy files to the file system on the persistent disk attached to your Cloud Datalab VM.

If your data is in a different location (on-premises or in another cloud), you can transfer the data to Cloud Storage using the gsutil tool or the Cloud Storage Transfer Service.
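
For example, a one-time recursive copy with gsutil might look like the following; the local path and bucket name are placeholders:

gsutil -m cp -r /path/to/local-data gs://my-bucket/datalab-data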
