This page describes the lifecycle of a Cloud Datalab instance and the options available for managing and conserving compute resources.
Cloud Datalab runs inside a Google Compute Engine VM with an attached persistent disk that stores your notebooks. Cloud Datalab VMs are connected to a special network in your project named datalab-network. The default configuration of this network limits incoming connections to SSH connections.
Prerequisites
Before you use the commands discussed below, complete the following steps:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- Enable the Google Compute Engine and Cloud Source Repositories APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
  gcloud init
Creating an instance
You create a Cloud Datalab instance using the datalab create command.
datalab create instance-name
There are several command-line options available with this command. For example, if you want to create an instance with more memory than the default, you can pass in the --machine-type flag:
datalab create --machine-type n1-highmem-2 instance-name
To list all available options, run:
datalab create --help
By default, the datalab create command connects to the newly created instance. To create the instance but not connect to it, pass in the --no-connect flag:
datalab create --no-connect instance-name
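The two flags shown above can be wrapped in a small helper. The following is a minimal sketch, assuming a Bash-compatible shell with the datalab tool on your PATH. The function name create_datalab_instance is hypothetical and not part of the datalab tool, and passing --machine-type together with --no-connect in one invocation is an assumption based on both being options of datalab create.

```shell
# Hypothetical helper (not part of the datalab tool): create an instance
# without connecting to it, optionally with a specific machine type.
create_datalab_instance() {
  local name="${1:-}"
  local machine_type="${2:-}"

  if [ -z "$name" ]; then
    echo "usage: create_datalab_instance INSTANCE_NAME [MACHINE_TYPE]" >&2
    return 2
  fi

  if [ -n "$machine_type" ]; then
    # Assumption: the two flags can be combined in a single command.
    datalab create --no-connect --machine-type "$machine_type" "$name"
  else
    datalab create --no-connect "$name"
  fi
}
```

For example, create_datalab_instance my-notebooks n1-highmem-2 would run datalab create --no-connect --machine-type n1-highmem-2 my-notebooks, where my-notebooks is a placeholder instance name.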
The datalab create command also creates the following Google Cloud Platform resources (if not already available):
- The datalab-network network
- A firewall rule on the datalab-network allowing incoming SSH connections
- The datalab-notebooks Google Cloud Source Repository
- The persistent disk for storing Cloud Datalab notebooks
Note that some of the above steps may require owner permission (see Using Cloud Datalab in a team environment).
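To see whether the shared resources listed above already exist in your project, you can query them with the gcloud CLI. The following is a sketch only, assuming the gcloud CLI is initialized against the project that hosts your instance. The function name show_datalab_resources is hypothetical, and the persistent disk is left out because its name is not stated on this page.

```shell
# Hypothetical helper: display the shared resources that datalab create sets up.
# Each command reports an error if the resource does not exist yet; the
# function continues to the next check either way.
show_datalab_resources() {
  # The network that Cloud Datalab VMs are connected to.
  gcloud compute networks describe datalab-network

  # Firewall rules attached to that network (expected: incoming SSH).
  gcloud compute firewall-rules list --filter="network:datalab-network"

  # The Cloud Source Repository created for notebooks.
  gcloud source repos describe datalab-notebooks
}
```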
Connecting to an instance
The datalab tool can create a persistent SSH tunnel to your Cloud Datalab instance that allows you to connect to the instance from your local browser as though Cloud Datalab were running on your local machine.
To create this connection, use the datalab connect command:
datalab connect instance-name
The datalab connect command restarts your instance if it is not running. The command continues to run until you stop it (the connection remains available for as long as the command is running).
By default, the local port used for the connection is 8081. To change to a different port, pass in the --port flag. For example, to use local port 8082, run the following:
datalab connect --port 8082 instance-name
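If you switch ports often, the port selection can be wrapped in a helper that falls back to the default. This is a minimal sketch rather than an official recipe: connect_datalab is a hypothetical function name, and the localhost URL it prints is an assumption based on the tunnel being reachable through the chosen local port.

```shell
# Hypothetical helper: connect to an instance on a chosen local port,
# falling back to 8081 (the default port described above).
connect_datalab() {
  local name="${1:-}"
  local port="${2:-8081}"

  # Reject anything that is not a plain number before calling datalab.
  case "$port" in
    ''|*[!0-9]*)
      echo "port must be a number: $port" >&2
      return 2
      ;;
  esac

  # Assumption: the notebook UI is served on localhost at the chosen port.
  echo "Once connected, browse to http://localhost:${port}/" >&2

  # Runs until you stop it; the connection lasts as long as the command runs.
  datalab connect --port "$port" "$name"
}
```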
Stopping an instance
To avoid incurring unnecessary costs when you pause your work in Cloud Datalab, stop the instance by running the following command:
datalab stop instance-name
When you are ready to start using Cloud Datalab again, run the datalab connect command to restart the instance.
Updating the Cloud Datalab VM without deleting the notebooks disk
To update to a new Cloud Datalab version, or to change VM properties such as the machine type or the service account, you can delete and then re-create the Cloud Datalab VM without losing your notebooks stored on the persistent disk.
datalab delete --keep-disk instance-name
datalab create instance-name
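Because the second command should run only if the first one succeeds, the two steps can be chained. The sketch below assumes a Bash-compatible shell. recreate_datalab_vm is a hypothetical function name, and forwarding extra options such as --machine-type to datalab create is shown for illustration.

```shell
# Hypothetical helper: delete the VM but keep the notebooks disk, then
# re-create the VM. Extra arguments are forwarded to datalab create.
recreate_datalab_vm() {
  if [ "$#" -lt 1 ]; then
    echo "usage: recreate_datalab_vm INSTANCE_NAME [CREATE_OPTIONS...]" >&2
    return 2
  fi
  local name="$1"
  shift

  # '&&' ensures the instance is re-created only if the delete succeeded.
  datalab delete --keep-disk "$name" && datalab create "$@" "$name"
}
```

For example, recreate_datalab_vm my-notebooks --machine-type n1-highmem-2 would re-create the placeholder instance my-notebooks with a larger machine type.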
Deleting an instance and the notebooks disk
By default, the datalab delete command does not delete the persistent disk holding your notebooks. This allows you to easily change the VM without accidentally losing your data (see Updating the Cloud Datalab VM without deleting the notebooks disk).
If you want to delete both the VM and the attached persistent disk, then add the --delete-disk flag to the command:
datalab delete --delete-disk instance-name
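Since deleting the disk also removes your notebooks, you may want a guard in any script that issues this command. The following is one possible sketch, not a feature of the datalab tool: the function delete_datalab_instance and its --yes-delete-notebooks argument are both hypothetical names introduced here.

```shell
# Hypothetical helper: delete the VM, and delete the notebooks disk only
# when the caller passes an explicit (made-up) confirmation argument.
delete_datalab_instance() {
  local name="${1:-}"
  local confirm="${2:-}"

  if [ -z "$name" ]; then
    echo "usage: delete_datalab_instance INSTANCE_NAME [--yes-delete-notebooks]" >&2
    return 2
  fi

  if [ "$confirm" = "--yes-delete-notebooks" ]; then
    # Removes both the VM and the attached persistent disk.
    datalab delete --delete-disk "$name"
  else
    echo "Keeping the notebooks disk; pass --yes-delete-notebooks to remove it too." >&2
    # Default behavior described above: the persistent disk is kept.
    datalab delete "$name"
  fi
}
```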
Reducing usage of compute resources
Google Compute Engine VMs incur costs. You are charged for the time that a Cloud Datalab instance is running whether or not you are using it. You can reduce Cloud Datalab VM charges by stopping the instance when you are not using it. You will continue to incur charges for the resources attached to the VM (such as the persistent disk and the external IP address), but the VM instance itself will not incur charges while it is stopped.
When you need to use your stopped instance again, run datalab connect instance-name to connect to your instance, and the datalab tool will restart the instance before attempting to connect to it.
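One way to make stopping the instance a habit is to tie it to the end of the connection. The sketch below assumes a Bash-compatible shell. datalab_session is a hypothetical function name, and the behavior on Ctrl-C depends on how your shell delivers the interrupt, so treat it as a starting point rather than a guarantee.

```shell
# Hypothetical helper: connect to an instance and stop it when the
# connection ends. The body runs in a subshell (note the parentheses),
# so the traps do not leak into your interactive shell.
datalab_session() (
  name="${1:-}"

  # Stop the VM when this subshell exits.
  trap 'datalab stop "$name"' EXIT
  # Turn an interrupt into a normal exit so the EXIT trap still runs.
  trap 'exit 130' INT

  # Restarts the instance if needed, then runs until you stop it.
  datalab connect "$name"
)
```

The persistent disk and other attached resources continue to incur charges after the VM stops, so this reduces VM charges only.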
To stop incurring all charges associated with a Cloud Datalab instance, you must delete both the VM and the attached persistent disk by running the datalab delete command with the --delete-disk option.