Access environment variables in Jupyter notebook

Problem

Start a Dataproc cluster with an environment variable set to a predefined value and access the value of this variable in Jupyter Notebook.

Environment

  • Dataproc cluster
  • Jupyter notebook
    • Optional component enabled

Solution

Workaround
  1. In an initialization script, set the new environment variable and then restart the Jupyter service so that it picks up the value.
  2. Sample initialization script:
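    #!/bin/bash

    function set_env() {
      local role
      # Read this node's role from the instance metadata.
      role="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"

      # Apply the change only on the master node.
      if [[ "${role}" == 'Master' ]]; then
        # The heredoc body and the closing EOF must start at the first column of the script.
        cat <<EOF >>"/etc/environment"
    DATAPROC_TEST_ENV="my_env_var"
    EOF
        # Restart Jupyter so that it picks up the new variable.
        systemctl restart jupyter.service
      fi
    }

    set_env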
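
As a rough sketch of how the sample script above could be attached to a new cluster, the command below uses the gcloud flags for optional components and initialization actions. The cluster name, region, bucket, and the file name set_env.sh are placeholders assumed for illustration; they do not come from this article. The sketch only prints the command so that it can be reviewed first.

```shell
#!/bin/bash
# All values below are placeholders (assumptions); replace them with your own.
CLUSTER_NAME="my-cluster"
REGION="us-central1"
INIT_SCRIPT="gs://my-bucket/set_env.sh"   # hypothetical upload location of the init script

# Build the command as an array so it can be inspected before it is run.
create_cmd=(
  gcloud dataproc clusters create "${CLUSTER_NAME}"
  --region="${REGION}"
  --optional-components=JUPYTER
  --enable-component-gateway
  --initialization-actions="${INIT_SCRIPT}"
)

# Dry run: print the command instead of executing it.
printf '%s ' "${create_cmd[@]}"
echo

# To create the cluster for real, run: "${create_cmd[@]}"
```

Keeping the command in an array keeps each argument intact when it is eventually executed, and the dry run makes it easy to confirm the script path before any resources are created.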

Cause

The dataproc-startup script starts the Jupyter service, and the service reads its environment variables at that point. Initialization scripts run after the dataproc-startup script, so any variables they define are not available in the Jupyter notebook until the Jupyter service is restarted.
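
The behavior described here can be observed on any Linux machine: a process keeps the environment it was started with. The following is a minimal sketch, assuming a Linux /proc filesystem. The helper name proc_env_value is hypothetical, and a background sleep process stands in for the Jupyter service.

```shell
#!/bin/bash
# Print the value of variable $2 as seen by the process with PID $1.
# Returns non-zero if that process's start-up environment lacks the variable.
proc_env_value() {
  local pid="$1" name="$2"
  # /proc/<pid>/environ holds the NUL-separated NAME=value pairs
  # that the process received when it was started.
  tr '\0' '\n' < "/proc/${pid}/environ" |
    awk -F= -v n="${name}" '
      $1 == n { sub(/^[^=]*=/, ""); print; found = 1 }
      END { exit !found }'
}

# Stand-in for a service that is already running.
sleep 30 &
service_pid=$!

# Define the variable only after the "service" has started.
export DATAPROC_TEST_ENV="my_env_var"

if proc_env_value "${service_pid}" DATAPROC_TEST_ENV; then
  echo "variable is visible to PID ${service_pid}"
else
  echo "variable is not visible to PID ${service_pid}; a restart is needed"
fi

kill "${service_pid}"
```

On the master node, the same helper could be pointed at the Jupyter process after the restart, for example with a PID obtained from `systemctl show --property MainPID --value jupyter.service`. This assumes the main PID of the unit is the process whose environment matters. In an IPython-based kernel, running `!printenv DATAPROC_TEST_ENV` in a notebook cell is a quicker check of what the notebook itself sees.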