Adding Python libraries to a Cloud Datalab instance

This guide explains how to customize Cloud Datalab by adding Python libraries to your Cloud Datalab VM instance.

Cloud Datalab includes a set of libraries intended to support common data analysis, transformation, and visualization scenarios. You can add other Python libraries using one of the following mechanisms:

  • Option 1: Add a code cell to a notebook that installs the library with conda. Replace lib-name with the name of the library, then run the cell:

    !conda install -y lib-name
    
    This is the easiest way to customize an instance for individual needs. It also requires minimal maintenance: when the underlying Cloud Datalab image is updated, you only need to rerun the code cell.

  • Option 1.5: Use pip instead of conda. Libraries should be installed via conda when possible, but some libraries are only available via pip. In these cases, create a code cell as above, but change the content to the following:

    !pip install lib-name
    

  • Option 2: Create a new notebook and add a code cell with the following content, replacing conda with pip if necessary and substituting lib-name. Remember to remove -y if using pip.

    %%bash
    echo "conda install -y lib-name" >> /content/datalab/.config/startup.sh
    cat /content/datalab/.config/startup.sh
    
    Run the cell, then restart the Cloud Datalab instance: click the account icon in the top-right corner of the Cloud Datalab notebook or notebook listing page in your browser, select About Datalab, then click the Restart Server option in the About Google Cloud Datalab dialog.

  • Option 3: Inherit from the Cloud Datalab Docker container using a Docker customization mechanism. This option is much more heavyweight than the other options listed above, but it provides maximum flexibility for those who intend to significantly customize the container for use by a team or organization. To use this mechanism, build your own container (named "Dockerfile-extended-example" below) by following the Docker documentation. Also see the customization example in the Cloud Datalab GitHub repo.

    In Dockerfile-extended-example.in:

    FROM datalab
    ...
    RUN pip install lib-name
    ...
    
    This approach requires you to take on the additional work of building and maintaining your own image as the underlying datalab container evolves. Therefore, it is recommended that you use this approach only if the other mechanisms described above do not meet your needs.
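
Two small checks can make the notebook-based options easier to rerun. The following is a minimal sketch, assuming a Python 3 kernel; the helper names is_installed and append_once are hypothetical and are not part of Cloud Datalab. The first helper reports whether a module can already be imported, so you can skip an unnecessary install in Options 1 and 1.5. The second appends a command to a script only if that exact line is not already there, which avoids the duplicate entries that the >> redirection in Option 2 would add each time the cell is rerun.

```python
import importlib.util
from pathlib import Path


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported.

    Pass the import name, which can differ from the package name
    (for example, the scikit-learn package is imported as sklearn).
    """
    return importlib.util.find_spec(module_name) is not None


def append_once(script_path, command):
    """Append `command` to the script unless an identical line exists.

    Returns True if the line was added, False if it was already present.
    The file is created if it does not exist yet.
    """
    path = Path(script_path)
    text = path.read_text() if path.exists() else ""
    if command in text.splitlines():
        return False
    with path.open("a") as handle:
        # Keep the new command on its own line even if the existing
        # content does not end with a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(command + "\n")
    return True
```

In a notebook, you might call is_installed with the library's import name before running the install cell, and call append_once with the startup script path shown in Option 2 in place of the echo line. Restarting the instance is still required for the startup script change to take effect.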