This page describes considerations for choosing the right Google Compute Engine machine type for your Cloud Datalab Virtual Machine (VM) instance.
Considerations when choosing a VM machine type
When you create a Cloud Datalab VM instance, you can specify a
Google Compute Engine machine type.
The default machine type is n1-standard-1. You can select a different
machine type based on performance and cost characteristics to suit your data
analysis needs. Here are a few key considerations for selecting a
machine type:
- Each notebook uses a Python kernel to run code in its own process. For example, if you have N notebooks open, there are at least N processes corresponding to those notebooks.
- Each kernel is single-threaded. Unless you run multiple notebooks at the same time, additional cores may not provide a significant benefit.
- You may benefit significantly by selecting a machine with additional memory depending on your usage pattern and the amount of data processed.
- Execution is cumulative: running three Cloud Datalab notebook cells in a row accumulates the state from all three, including memory allocated for data structures used in those cells.
- Processing large amounts of data in memory (for example, using Pandas DataFrames)
allocates a proportional amount of memory. When you finish running a notebook,
you can stop the session, which releases the memory its kernel holds: click the Running Sessions icon
in the top bar (you may need to resize the browser window to see the icon), then shut down the session.
- Cloud Datalab uses a disk-based swap file to absorb memory demand that exceeds the VM's physical memory, but relying on the swap file is likely to slow down processing. It's best to estimate your memory needs, then pick a machine type with at least the estimated amount of memory.
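The cumulative-execution point above can be observed with Python's standard tracemalloc module. The following is a minimal sketch, not part of Cloud Datalab itself: the three "cells" are simulated as consecutive statements, and the names rows, squares, and traced_mib are hypothetical. It only tracks memory allocated by Python, so treat the numbers as a rough guide, not an exact measure of what the VM needs.

```python
import gc
import tracemalloc


def traced_mib():
    """Return the memory currently traced by tracemalloc, in MiB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / (1024 * 1024)


tracemalloc.start()

# "Cell 1": build a large list. It stays alive because the name is still bound.
rows = [float(i) for i in range(200_000)]
after_cell_1 = traced_mib()

# "Cell 2": a derived structure adds to the total instead of replacing it.
squares = [x * x for x in rows]
after_cell_2 = traced_mib()

# "Cell 3": dropping names you no longer need lets Python reclaim the memory.
del rows, squares
gc.collect()
after_cleanup = traced_mib()

tracemalloc.stop()

print(f"after cell 1:  {after_cell_1:.1f} MiB")
print(f"after cell 2:  {after_cell_2:.1f} MiB")
print(f"after cleanup: {after_cleanup:.1f} MiB")
```

Running a similar measurement around the heaviest cells of a notebook is one way to arrive at the memory estimate mentioned above.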
Choosing a machine type
You choose a machine type for your Cloud Datalab VM instance when you create the instance. See the datalab create --machine-type flag for more information. Here's an example:
datalab create --machine-type n1-highmem-2 instance-name
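To connect a memory estimate to the --machine-type flag, the following sketch picks the smallest machine type that has at least the estimated amount of memory and prints the matching command. It is an illustration under stated assumptions: the N1_MEMORY_GB table is a small, hand-copied subset of n1 machine types that you should verify against the current Compute Engine documentation, and the helper names pick_machine_type and datalab_create_command are hypothetical. The sketch compares memory only and ignores cost and core count.

```python
# Illustrative subset of n1 machine types and their memory in GB.
# Assumption: verify these values against the Compute Engine documentation.
N1_MEMORY_GB = {
    "n1-standard-1": 3.75,
    "n1-standard-2": 7.5,
    "n1-highmem-2": 13,
    "n1-standard-4": 15,
    "n1-highmem-4": 26,
    "n1-standard-8": 30,
    "n1-highmem-8": 52,
}


def pick_machine_type(estimated_gb, headroom=1.0):
    """Return the listed machine type with the least memory that still fits.

    headroom is an optional multiplier (hypothetical parameter) for adding
    a safety margin on top of the estimate.
    """
    needed = estimated_gb * headroom
    candidates = [
        (memory, name) for name, memory in N1_MEMORY_GB.items() if memory >= needed
    ]
    if not candidates:
        raise ValueError(f"No listed machine type has {needed} GB of memory")
    # Tuples sort by memory first, so min() yields the smallest sufficient type.
    return min(candidates)[1]


def datalab_create_command(estimated_gb, instance_name="instance-name"):
    """Build the datalab create command line for the chosen machine type."""
    machine_type = pick_machine_type(estimated_gb)
    return f"datalab create --machine-type {machine_type} {instance_name}"


# An 8 GB estimate does not fit in n1-standard-2 (7.5 GB).
print(datalab_create_command(8))
# prints: datalab create --machine-type n1-highmem-2 instance-name
```

The sketch only builds the command string; it does not run the datalab tool.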