This page describes considerations for choosing the right Google Compute Engine machine type for your Cloud Datalab Virtual Machine (VM) instance.
Considerations when choosing a VM machine type
When you create a Cloud Datalab VM instance, you can specify a Google Compute Engine machine type. The default machine type is n1-standard-1. You can select a different machine type, based on performance and cost characteristics, to suit your data analysis needs. Here are a few key considerations for selecting a machine type:
- Each notebook uses a Python kernel to run code in its own process. For example, if you have N notebooks open, there are at least N processes corresponding to those notebooks.
- Each kernel is single-threaded. Unless you are running multiple notebooks at the same time, multiple cores may not provide significant benefit.
- You may benefit significantly by selecting a machine with additional memory depending on your usage pattern and the amount of data processed.
- Execution is cumulative: running three Cloud Datalab notebook cells in a row results in the accumulation of corresponding state, including memory allocated for data structures used in those cells.
- Processing large amounts of data in memory (for example, using pandas DataFrames) causes proportional memory allocation. When you finish running a notebook, you can stop the session and release its memory by clicking the Running Sessions icon in the top bar and shutting down the session.
- Cloud Datalab uses a disk-based swap file to accommodate additional memory requirements, but relying on the swap file is likely to slow down processing. It's best to estimate memory needs, then pick a machine type with at least the estimated amount of memory.
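
The memory estimate described in the last consideration can be sketched in Python as follows. This is an illustrative sketch, not part of Cloud Datalab: the helper names `estimate_memory_gb` and `pick_machine_type`, the `headroom` multiplier of 2.0, and the short machine-type table are assumptions made for this example. The table lists the published memory sizes of a few n1 machine types and is not exhaustive. The approach measures a small sample of the data, extrapolates to the full row count, and then looks for the smallest listed machine type that fits.

```python
# Memory (in GB) for a few n1 machine types. This table is deliberately
# short; extend it with any other machine types you are considering.
N1_MEMORY_GB = {
    "n1-standard-1": 3.75,
    "n1-standard-2": 7.5,
    "n1-highmem-2": 13.0,
    "n1-standard-4": 15.0,
    "n1-highmem-4": 26.0,
}

BYTES_PER_GB = 2 ** 30  # assumption: memory sizes are treated as 2^30-byte GB


def estimate_memory_gb(sample_bytes, sample_rows, total_rows, headroom=2.0):
    """Extrapolate in-memory size from a sample of the data.

    sample_bytes is the measured size of the sample. With pandas it could
    be obtained from df.memory_usage(deep=True).sum().
    headroom is a hypothetical safety multiplier for intermediate copies
    and accumulated notebook state; tune it for your workload.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    bytes_per_row = sample_bytes / sample_rows
    return bytes_per_row * total_rows * headroom / BYTES_PER_GB


def pick_machine_type(required_gb, table=N1_MEMORY_GB):
    """Return the smallest listed machine type with enough memory, or None."""
    candidates = [(gb, name) for name, gb in table.items() if gb >= required_gb]
    return min(candidates)[1] if candidates else None


# Example: a 100,000-row sample occupies 100 MiB; the full data has 5 million rows.
needed = estimate_memory_gb(100 * 2 ** 20, 100_000, 5_000_000)
print(round(needed, 2), pick_machine_type(needed))
```

Not all of a VM's memory is available to notebooks, since the operating system and Cloud Datalab itself also consume some, so treat the result as a lower bound rather than an exact fit.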