Managing the lifecycle of a Cloud Datalab instance

This page describes the lifecycle of a Cloud Datalab instance and the options available for managing and conserving compute resources.

Cloud Datalab runs inside a Google Compute Engine VM with an attached persistent disk that stores your notebooks. Cloud Datalab VMs are connected to a special network in your project, called datalab-network. The default configuration of this network limits incoming connections to SSH connections.

Prerequisites

Before you use the commands discussed below, complete the following steps:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Google Compute Engine and Cloud Source Repositories APIs.

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init

Creating an instance

You create a Cloud Datalab instance using the datalab create command.

datalab create instance-name

There are several command-line options available with this command. For example, if you want to create an instance with more memory than the default, you can pass in the --machine-type flag:

datalab create --machine-type n1-highmem-2 instance-name

To list all available options, run:

datalab create --help

By default, the datalab create command connects to the newly created instance. To create the instance but not connect to it, pass in the --no-connect flag:

datalab create --no-connect instance-name

The datalab create command also creates the following Google Cloud Platform resources (if not already available):

  • The datalab-network network
  • A firewall rule on the datalab-network allowing incoming SSH connections
  • The datalab-notebooks Google Cloud Source Repository
  • The persistent disk for storing Cloud Datalab notebooks

Note that creating some of these resources may require owner permission on the project (see Using Cloud Datalab in a team environment).
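The options above can be combined in a small wrapper. The following is a minimal sketch, not part of the datalab tool: the function name create_datalab_instance and the DRY_RUN variable are hypothetical, and only the --no-connect and --machine-type flags described above are used. With DRY_RUN=1 the function prints the command instead of running it, which is useful for checking the arguments before any resources are created.

```shell
# Hypothetical helper (not part of the datalab tool): create an instance
# without connecting to it, optionally with a specific machine type.
# Usage: create_datalab_instance instance-name [machine-type]
create_datalab_instance() {
  instance_name="$1"
  machine_type="${2:-}"          # optional, for example n1-highmem-2

  # Build the command line in the positional parameters.
  set -- datalab create --no-connect
  if [ -n "$machine_type" ]; then
    set -- "$@" --machine-type "$machine_type"
  fi
  set -- "$@" "$instance_name"

  # DRY_RUN is an assumption of this sketch: print instead of executing.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, running DRY_RUN=1 and then create_datalab_instance my-instance n1-highmem-2 prints the full datalab create command without contacting Google Cloud.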

Connecting to an instance

The datalab tool can create a persistent SSH tunnel to your Cloud Datalab instance. The tunnel lets you connect to the instance from your local browser as though Cloud Datalab were running on your local machine.

To create this connection, use the datalab connect command:

datalab connect instance-name

The datalab connect command restarts your instance if it is not running. The command continues to run until you stop it (the connection remains available for as long as the command is running).

By default, the local port used for the connection is 8081. To change to a different port, pass in the --port flag. For example, to use local port 8082, run the following:

datalab connect --port 8082 instance-name
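If you connect often, the port choice can be wrapped in a helper. This is a rough sketch under stated assumptions: connect_datalab and DRY_RUN are hypothetical names, and the http://localhost:PORT/ address in the message is inferred from the description of the local tunnel above rather than taken from the tool's output.

```shell
# Hypothetical helper (not part of the datalab tool): connect to an
# instance on a chosen local port, falling back to the default of 8081.
# Usage: connect_datalab instance-name [port]
connect_datalab() {
  instance_name="$1"
  port="${2:-8081}"              # 8081 is the default local port

  # Assumed address, inferred from the tunnel forwarding to a local port.
  echo "Once the tunnel is up, try http://localhost:${port}/" >&2

  set -- datalab connect --port "$port" "$instance_name"

  # DRY_RUN is an assumption of this sketch: print instead of executing.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"                         # runs until you stop it
  fi
}
```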

Stopping an instance

To avoid incurring unnecessary costs while you are not using Cloud Datalab, stop your instance by running the following command:

datalab stop instance-name

When you are ready to start using Cloud Datalab again, run the datalab connect command, which restarts the instance before connecting to it.

Updating the Cloud Datalab VM without deleting the notebooks disk

To update to a new Cloud Datalab version, or to change VM properties such as the machine type or the service account, you can delete and then re-create the Cloud Datalab VM without losing your notebooks stored on the persistent disk.

datalab delete --keep-disk instance-name
datalab create instance-name
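The two commands above can be chained so that the VM is re-created only if the delete step succeeds. The following is a minimal sketch: recreate_datalab_vm and DRY_RUN are hypothetical names, and the optional machine type simply reuses the --machine-type flag shown earlier.

```shell
# Hypothetical helper (not part of the datalab tool): delete the VM while
# keeping the notebooks disk, then create the VM again under the same name.
# Usage: recreate_datalab_vm instance-name [new-machine-type]
recreate_datalab_vm() {
  instance_name="$1"
  machine_type="${2:-}"

  # DRY_RUN is an assumption of this sketch: prefix commands with echo.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  else
    run=""
  fi

  # Stop here if the delete step fails, so create never runs on a bad state.
  $run datalab delete --keep-disk "$instance_name" || return 1

  if [ -n "$machine_type" ]; then
    $run datalab create --machine-type "$machine_type" "$instance_name"
  else
    $run datalab create "$instance_name"
  fi
}
```

Running the helper with DRY_RUN=1 first shows the exact delete and create commands, which makes it easy to confirm that --keep-disk is present before anything is removed.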

Deleting an instance and the notebooks disk

By default, the datalab delete command does not delete the persistent disk holding your notebooks. This allows you to change the VM without accidentally losing your data (see Updating the Cloud Datalab VM without deleting the notebooks disk).

If you want to delete both the VM and the attached persistent disk, then add the --delete-disk flag to the command:

datalab delete --delete-disk instance-name

Reducing usage of compute resources

Google Compute Engine VMs incur costs. You are charged for the time that a Cloud Datalab instance is running whether or not you are using it. You can reduce Cloud Datalab VM charges by stopping the instance when you are not using it. You will continue to incur charges for the resources attached to the VM (such as the persistent disk and the external IP address), but the VM instance itself will not incur charges while it is stopped.

When you need to use your stopped instance again, run datalab connect instance-name to connect to your instance, and the datalab tool will restart the instance before attempting to connect to it.

To stop incurring all charges associated with a Cloud Datalab instance, you must delete both the VM and the attached persistent disk by running the datalab delete command with the --delete-disk option.
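The two cost-saving choices described in this section can be sketched as one helper with an explicit mode, so that the destructive option is never the default. The names release_datalab, pause, remove, and DRY_RUN are hypothetical and exist only for this illustration; the underlying commands are the datalab stop and datalab delete --delete-disk commands shown above.

```shell
# Hypothetical helper (not part of the datalab tool).
#   pause  : stop the VM; attached resources such as the disk still cost money
#   remove : delete the VM and the notebooks disk; this is irreversible
# Usage: release_datalab pause|remove instance-name
release_datalab() {
  mode="${1:-}"
  instance_name="${2:-}"

  if [ -z "$instance_name" ]; then
    echo "usage: release_datalab pause|remove instance-name" >&2
    return 2
  fi

  case "$mode" in
    pause)  set -- datalab stop "$instance_name" ;;
    remove) set -- datalab delete --delete-disk "$instance_name" ;;
    *)
      echo "usage: release_datalab pause|remove instance-name" >&2
      return 2
      ;;
  esac

  # DRY_RUN is an assumption of this sketch: print instead of executing.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```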