Using JupyterHub
JupyterHub is an open-source, multi-user version of Jupyter Notebook for performing analysis of Project files in the Core. More information can be found in the Project Jupyter documentation.
- 1 How it Works
- 2 Prerequisites
- 3 Data Stewardship
- 4 Getting Access to JupyterHub
- 5 Launching JupyterHub
- 6 Creating a Notebook
- 7 Launching the Terminal
- 8 Creating a Python Virtual Environment and Registering a Kernel
- 9 Installing New Python Packages
- 10 Launching the Command Line Interface in a JupyterHub Terminal
- 10.1 Launching Pilot Command Line Interface
- 10.2 Zone Restrictions when using Pilot Command Line Interface in JupyterHub
- 10.3 Downloading Project Data to JupyterHub using the Pilot Command Line Interface
- 10.3.1 Example
- 10.3.2 Unzipping Files
- 10.4 Uploading Project Data from JupyterHub using the Pilot Command Line Interface
- 10.4.1 Example
- 11 Troubleshooting
- 12 Related articles
How it Works
JupyterHub is a multi-user Hub that spawns and manages multiple instances of Jupyter notebooks. When deployed for a Project, JupyterHub spins up a new JupyterLab instance for each Project member.
JupyterHub allows Project members to create or import Jupyter Notebooks into the Project Workspace environment, retrieve Project files from the Project Core, perform computational workflows on the data, and write the outputs back to the Project Core where they can be accessed by other permissioned Project members.
Prerequisites
Project Collaborator role or higher.
JupyterHub has been configured for the Project by the Platform Administrator. See Getting Access to JupyterHub (below).
Data Stewardship
Users are reminded to abide by the Platform Terms of Use and any Project-specific restrictions when using Workspace tools to access data and code.
For data privacy, be sure to log out of JupyterHub when finished with the session.
Getting Access to JupyterHub
JupyterHub is configured for each Project by a Platform Administrator upon request.
If you click the JupyterHub icon and receive a notice that it hasn’t been deployed for your project, please contact your Platform Administrator.
If you don’t see the JupyterHub icon, your role doesn’t include the JupyterHub permission level. Please contact the Project Administrator to update your role.
Launching JupyterHub
Launch your Project and click the JupyterHub icon in the left menu bar.
Click Sign in with Keycloak to initiate your session. JupyterHub automatically authenticates with your existing username and password and launches your session - no additional sign-in is required.
You can choose to start either a Minimal environment, which comes with Python, or a Datascience environment, which includes R and Julia in addition to Python.
From the JupyterHub home page (a JupyterLab interface) you can now create and work on Jupyter Notebooks, import existing ones, and use the Pilot Command Line Interface in the terminal to retrieve, analyze, and re-upload Project Core data. You can also use the pre-deployed and configured package manager conda to download, install, and manage Python packages as needed (see the sections Installing New Python Packages and Creating a Python Virtual Environment and Registering a Kernel below for more details).
When finished using JupyterHub, click Logout to end your session.
After a period of session inactivity, you will be logged out automatically.
Creating a Notebook
Users can create a new Jupyter Notebook with Python 3 inside JupyterHub, with dedicated and persistent storage under the user's Home Directory.
In the Launcher, click the Python 3 Notebook icon, or click File > New > Notebook.
Create your Notebook.
Launching the Terminal
JupyterHub provides browser-based terminal access for advanced users to run commands directly in the system shell. Importantly, this allows users to transfer data between, for instance, the Project's Core and their JupyterHub home directory using pilotcli, or to download and manage Python packages.
In the Launcher, click the Terminal icon, or click File > New > Terminal.
The terminal window opens.
Ubuntu is used to host Jupyter Notebook. Use the command cat /etc/os-release to determine the current version of Ubuntu:
uname@jupyter-uname:/etc$ cat os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
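The same check can be made from inside a notebook. The sketch below is an illustration rather than part of the platform: it parses text in the KEY=value format shown above using a small helper whose name, parse_os_release, is hypothetical.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # shlex.split strips the optional shell-style quotes around the value.
        parts = shlex.split(raw_value)
        info[key] = parts[0] if parts else ""
    return info


# Sample lines copied from the terminal output above.
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nVERSION_CODENAME=focal\n'
release = parse_os_release(sample)
print(release["NAME"], release["VERSION_ID"])
```

In a notebook cell, the file itself can be passed instead of the sample, for example parse_os_release(open("/etc/os-release").read()).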
Creating a Python Virtual Environment and Registering a Kernel
You have full flexibility to use different virtual environment and/or package management systems. Examples using conda and Python's built-in venv module are described below. In either case, you have to register the new environment as a kernel using ipykernel to make it accessible from Jupyter Notebooks (see Registering the new Virtual Environment as Kernel for more details).
Option 1: Creating the Virtual Environment Using conda
The package management software conda by Anaconda has become one of the most popular package management systems, especially in the data and life sciences. Therefore, conda is pre-deployed and configured in each user’s JupyterHub. For full details, see the conda documentation, including its guide to managing virtual environments. The following steps provide a short example of how to use conda to create a new virtual environment from the JupyterHub terminal within the Platform.
To begin, activate conda. Since it is already pre-deployed and configured for you, all you need to do is launch a terminal within JupyterHub (see Launching the Terminal above) and execute the command source activate. Once conda is active, the name of the currently activated conda environment appears in parentheses at the beginning of the prompt, usually “base”:
username@jupyter-username:~$ source activate
(base) username@jupyter-username:~$
To create a new environment, run the following command in the terminal after activating conda:
(base) username@jupyter-username:~$ conda create --name your_env_name
Replace your_env_name with your preferred name for the environment. When prompted by conda to confirm the creation of the environment at the specified location (by default in the user's home directory; please do not change this location, to ensure persistence of your created environment), proceed by typing “y”, or abort by typing “n”. Once confirmed, conda completes the environment creation process and reminds you to activate the environment.
Please note that at the end of the environment creation process, you remain in the previously activated environment (“base” in this example). Therefore, remember to activate the new environment before installing any packages by running the command conda activate your_env_name, replacing “your_env_name” with the name you chose (“sample_env” in this example):
(base) username@jupyter-username:~$ conda activate sample_env
(sample_env) username@jupyter-username:~$
You can now install the desired packages in this new conda environment, for instance using the conda install command. For example, to install the latest version of Python, run:
(sample_env) username@jupyter-username:~$ conda install python
To see a list of all installed packages in the currently activated environment (indicated in parentheses at the beginning of the line, “base” in this case), run:
(base) username@jupyter-username:~$ conda list
To see a list of all existing conda environments, run:
(base) username@jupyter-username:~$ conda env list
Many more examples, and the full documentation on managing conda environments, are available in the conda documentation. Importantly, remember to follow the instructions in the Registering the new Virtual Environment as Kernel section below to make the virtual environment accessible from Jupyter Notebooks.
Option 2: Creating the Virtual Environment Using venv
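As an alternative to conda, you can use Python's built-in venv module. For full details, see the venv documentation. A short example of creating and activating a new virtual environment with venv:
username@jupyter-username:~$ python3 -m venv your_env_name
username@jupyter-username:~$ source your_env_name/bin/activate
(your_env_name) username@jupyter-username:~$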
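A virtual environment of this kind can also be created from Python itself through the standard-library venv module. The following is a minimal sketch, not the platform's prescribed method; the helper name create_environment and the demo directory are assumptions made for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir).expanduser()
    # EnvBuilder is the class that "python3 -m venv" uses internally.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_path)
    return env_path


# Demo in a throwaway directory; with_pip=False keeps it fast and offline.
demo_dir = Path(tempfile.mkdtemp()) / "demo_env"
created = create_environment(demo_dir, with_pip=False)
print((created / "pyvenv.cfg").is_file())
```

For real use, pass a location inside your home directory so that the environment persists between sessions, and keep with_pip=True so that packages can be installed into it.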
Registering the new Virtual Environment as Kernel
To make the newly created virtual environment accessible from Jupyter Notebooks, you have to register it using ipykernel. Importantly, make sure that the corresponding environment is currently active before running the following command:
(your_env_name) username@jupyter-username:~$ python -m ipykernel install --user --name=your_env_name
Replace your_env_name with the name of your newly created environment. Depending on which package and/or virtual environment management system you chose, you may have to install ipykernel in the newly created environment first. Activate the newly created environment and then run one of the following commands to install ipykernel, depending on your package management system of choice:
conda install ipykernel
or:
pip install ipykernel
Once you have installed ipykernel, re-run the command above to register your new environment via ipykernel.
Afterwards, the environment is listed in the Launcher when you create a new Jupyter Notebook, and it can also be selected from any open Notebook, e.g., via Kernel > Change Kernel…
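After switching kernels, it can be useful to confirm which interpreter a notebook is actually running. The sketch below is an illustrative check using only the standard library; describe_interpreter is a hypothetical helper name. Note that the in_venv flag only detects venv-style environments; for a conda environment, inspect the reported prefix path instead.

```python
import sys


def describe_interpreter():
    """Report which Python interpreter and environment this process uses."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv-style environment, prefix differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
    }


details = describe_interpreter()
for key, value in details.items():
    print(f"{key}: {value}")
```

If the kernel was registered from the intended environment, the executable path should point inside that environment's directory.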
Installing New Python Packages
We highly recommend the use of virtual environments when installing new packages (see Creating a Python Virtual Environment and Registering a Kernel above for more details). Consequently, we recommend installing new packages via commands in the JupyterHub terminal in the corresponding virtual environments, instead of installing packages from within Jupyter Notebooks.
Depending on your organization's IT policies, outbound traffic may need to go through a proxy. If so, you must supply the proxy settings to command-line tools such as pip, curl, and wget, typically as a command-line argument.
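For example, pip accepts a --proxy option (replace the placeholders with the values from your IT department):
pip install --proxy http://<proxy-host>:<port> package_name
If you are using conda to manage Python packages, the proxy can be set through the standard proxy environment variables before running conda:
export HTTP_PROXY=http://<proxy-host>:<port>
export HTTPS_PROXY=http://<proxy-host>:<port>
conda install package_name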
The above information is provided as examples only. Please refer to documentation provided by your IT department with respect to proxy configuration.
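As a further illustration, a pip command that routes through a proxy can be assembled in Python and printed for use in the terminal. This is a sketch under stated assumptions: the helper name pip_install_command, the package name, and the proxy address are all hypothetical, and only pip's documented --proxy option is relied on.

```python
import shlex
import sys


def pip_install_command(package, proxy=None):
    """Build a pip install command, optionally routed through a proxy."""
    # sys.executable keeps pip tied to the interpreter running this code.
    command = [sys.executable, "-m", "pip", "install"]
    if proxy:
        # --proxy is pip's own option for an HTTP(S) proxy URL.
        command += ["--proxy", proxy]
    command.append(package)
    return command


# Hypothetical proxy address; use the one given by your IT department.
command = pip_install_command("requests", proxy="http://proxy.example.org:3128")
print(shlex.join(command))
```

Building the command as a list and printing it with shlex.join avoids quoting mistakes when the proxy URL or package specifier contains special characters.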
Launching the Command Line Interface in a JupyterHub Terminal
The Pilot Command Line Interface (CLI) is deployed within JupyterHub. Project members can use the Pilot Command Line Interface in a JupyterHub terminal to download Project data from the Core for further analysis, and upload the derivative outputs back to the Green Room or Core.
The Home Directory is your default directory. When you download a copy of your Core files to JupyterHub, the files persist in the JupyterHub environment until you delete them, so you can return to the session and continue your work at a later time without retrieving the data from the Core again.
The following sections focus on getting started with basic pilotcli commands in JupyterHub. For additional pilotcli commands and usage, see the article https://indocconsortium.atlassian.net/wiki/spaces/JSDNXT/pages/4317709818.
Launching Pilot Command Line Interface
Launch your Project and click the JupyterHub icon in the left-hand sidebar.
Click the Terminal launcher icon to open the Terminal.
In the JupyterHub Terminal, type pilotcli to verify the Pilot Command Line Interface is functional.
Note: you will need to log in again to the pilotcli to utilize it.
For more information on logging in and using the pilotcli, see https://indocconsortium.atlassian.net/wiki/spaces/JSDNXT/pages/4317709818.
Zone Restrictions when using Pilot Command Line Interface in JupyterHub
When using the Pilot Command Line Interface in JupyterHub, the following actions are possible:
File Operation | Permitted to/from the Green Room | Permitted to/from the Core |
---|---|---|
File upload | Yes | Yes |
File download | No | Yes |
Downloading Project Data to JupyterHub using the Pilot Command Line Interface
After logging into the Pilot Command Line Interface, you can download data from the Project Core into the JupyterHub environment to start your data analyses.
File-related commands are grouped in the file category. To see more information on all of the file commands, see https://indocconsortium.atlassian.net/wiki/spaces/JSDNXT/pages/4317709086.
Example
Downloading a file from the Core to your Home Directory:
Reminder: Please follow Linux conventions for file management. If your filename contains spaces, wrap it in single or double quotes.
Filename: “Chemical Tracking Data.csv”
Source: Project “Indoc Test Project”, “Core” storage zone, “users” folder “collaborator4”
Destination: user's Home directory in JupyterHub
To confirm successful download, type ls and verify the file Chemical Tracking Data.csv is stored in the Home folder.
The file “Chemical Tracking Data.csv” can be viewed in the JupyterHub graphical user interface by clicking the folder icon in the left-hand sidebar.
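The quoting rule for filenames with spaces, and the check that the file arrived, can both be illustrated from a notebook. The sketch below uses only the standard library and the example filename from this section; it does not call pilotcli.

```python
import shlex
from pathlib import Path

filename = "Chemical Tracking Data.csv"

# shlex.quote adds the quoting a POSIX shell needs for names with spaces.
quoted = shlex.quote(filename)
print(quoted)

# Check whether the downloaded file is present in the home directory.
downloaded = Path.home() / filename
print(downloaded.exists())
```

The quoted form is what should be typed in the terminal; without the quotes, the shell would treat the name as three separate arguments.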
After downloading Project data, you can analyze the data using containerized pipelines or other computational workflows inside the JupyterHub environment, then upload the derivative files back into the Project using the Pilot Command Line Interface.
Unzipping Files
Archives can be extracted using either unzip or 7z x. Please note that if you use unzip, you may receive error messages.
These errors are expected and are caused by the underlying file system being used by JupyterHub. These can be ignored. Your files will still successfully extract.
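ZIP archives can alternatively be extracted from a notebook with Python's standard-library zipfile module. This is offered as a sketch, not as the platform's recommended method: the helper name extract_archive is hypothetical, it handles ZIP archives only, and whether it avoids the messages described above is not something this article states.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, destination):
    """Extract a ZIP archive into destination and return the member names."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
        return archive.namelist()


# Self-contained demo: build a small archive, then extract it.
work_dir = Path(tempfile.mkdtemp())
archive_file = work_dir / "example.zip"
with zipfile.ZipFile(archive_file, "w") as archive:
    archive.writestr("data/readme.txt", "hello")

members = extract_archive(archive_file, work_dir / "extracted")
print(members)
```

For a downloaded archive, pass its path in your home directory as archive_path and a new folder as destination.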
Uploading Project Data from JupyterHub using the Pilot Command Line Interface
After analyzing Project data inside the JupyterHub, you can log into the Pilot Command Line Interface and upload the generated outputs back into the Project.
File-related commands are grouped in the file category. To see more information on all of the file commands, see https://indocconsortium.atlassian.net/wiki/spaces/JSDNXT/pages/4317709086.
Example
Filename: Chemical Tracking Data rev.csv
Source: user's Home directory in JupyterHub
Destination: Project “Indoc Test Project”, “Core” storage zone, “users” folder “collaborator4”
When uploading data to the Core, you are reminded that you are bypassing the usual Green Room upload workflow. To confirm, type y at the prompt, or N to cancel.
After completing the upload, you can confirm that the new file “Chemical Tracking Data rev.csv” exists in the correct directory using the pilotcli file list command and/or in the Portal File Explorer.
Troubleshooting
If downloading or uploading failed, please check the following:
Session expiry: Your session may have expired after being inactive for 30 minutes.
Incorrect target file/folder path: Check the file path of the target file/folder and ensure it exists in the Portal File Explorer.
Access permissions: Check that you have access to the target file/folder. You must be a member of the Project where your target file/folder is stored, with the Project Collaborator or Project Administrator role (able to access all files/folders within the Project Core zone).