Choosing a VM machine type

This page describes considerations for choosing the right Google Compute Engine machine type for your Cloud Datalab Virtual Machine (VM) instance.

Considerations when choosing a VM machine type

When you create a Cloud Datalab VM instance, you can specify a Google Compute Engine machine type. The default machine type is n1-standard-1. You can select a different machine type, based on its performance and cost characteristics, to suit your data analysis needs. Here are a few key considerations for selecting a machine type:

  • Each notebook uses a Python kernel to run code in its own process. For example, if you have N notebooks open, there are at least N processes corresponding to those notebooks.
  • Each kernel is single-threaded. Unless you are running multiple notebooks at the same time, multiple cores may not provide significant benefit.
  • You may benefit significantly by selecting a machine with additional memory depending on your usage pattern and the amount of data processed.
  • Execution is cumulative: running three Cloud Datalab notebook cells in a row accumulates the corresponding state, including the memory allocated for data structures used in those cells.
  • Processing large amounts of data in memory (for example, using pandas DataFrames) causes proportional memory allocation. When you finish running a notebook, you can stop its session, which releases the memory held by its kernel process: click the Running Sessions icon in the top bar (you may need to resize the browser window to see the icon), then shut down the session.
  • Cloud Datalab uses a disk-based swap file to provide overhead for additional memory requirements, but relying on the swap file is likely to slow down processing. It's best to estimate your memory needs, then pick a machine type with at least the estimated amount of memory.
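
The last consideration, estimating memory and then picking a machine type that provides at least that much, can be sketched in Python as follows. This is a rough illustration, not an official sizing tool. The function names, the 8 bytes per value, and the overhead factor of 3 (headroom for intermediate copies) are assumptions made for this example. The memory figures are a partial list of N1 machine types and should be checked against the current Compute Engine documentation.

```python
# Memory, in GB, for a partial list of N1 machine types.
# Verify these figures against the Compute Engine documentation.
N1_MEMORY_GB = {
    "n1-standard-1": 3.75,
    "n1-standard-2": 7.5,
    "n1-highmem-2": 13.0,
    "n1-standard-4": 15.0,
    "n1-highmem-4": 26.0,
    "n1-standard-8": 30.0,
    "n1-highmem-8": 52.0,
}


def estimate_memory_gb(rows, columns, bytes_per_value=8, overhead_factor=3.0):
    """Roughly estimate the memory needed for a numeric table.

    bytes_per_value and overhead_factor are assumptions: 8 bytes matches
    a 64-bit number, and the factor leaves headroom for intermediate copies.
    """
    return rows * columns * bytes_per_value * overhead_factor / 1024**3


def pick_machine_type(required_gb):
    """Return the smallest listed machine type with enough memory, or None."""
    for name, memory_gb in sorted(N1_MEMORY_GB.items(), key=lambda kv: kv[1]):
        if memory_gb >= required_gb:
            return name
    return None


# Hypothetical workload: 50 million rows of 10 numeric columns.
needed = estimate_memory_gb(rows=50_000_000, columns=10)
print(round(needed, 1), pick_machine_type(needed))
```

The estimate ignores memory used by the operating system and by other open notebooks, so treat the result as a lower bound and round up when in doubt.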

Choosing a machine type

You choose a machine type for your Cloud Datalab VM instance when you create the instance. See the --machine-type flag of the datalab create command for more information. Here's an example:

datalab create --machine-type n1-highmem-2 instance-name