Using Apache Guacamole VMs

Apache Guacamole is a clientless remote desktop gateway that gives you access to Project-based Virtual Machines (VMs) to perform computational analyses on your Project data. To learn more about the application, see the Guacamole documentation: Apache Guacamole®.

How it Works

When a new Project is created, a Platform Administrator must set up the Guacamole application and VMs for that Project. Once set up, Project members can request access to the Guacamole-connected VMs. After access is granted, Project members can download data from the portal, perform computational workflows on the data inside the VM, then write the derivative outputs back to the portal where other Project members can access them.

Prerequisites

  • A Project user role with permission to access the Guacamole Workspace tool (e.g., custom roles or default roles with Collaborator permissions or higher).

  • Guacamole application and connected virtual machines pre-configured for the Project by the Platform Administrator. See Getting Access to Guacamole below.

Data Stewardship

Users are reminded to abide by the Platform Terms of Use and any Project-specific restrictions when using Workspace tools to access data and code.

Getting Access to Guacamole

  1. Log in to Pilot and navigate to your Project.

  2. Click Guacamole in the Workspace Tools icon group.

    1. If you launch Guacamole and receive a notice that it hasn't been deployed for your Project, please contact your Project Administrator to request setup.

    2. If you launch Guacamole and are directed to the workbench but don't see any items listed under the ALL CONNECTIONS heading, it means you do not yet have access to any VMs. Please contact your Project Administrator and request access to the necessary VMs.

Launching a Guacamole VM Connection

After the first-time setup has been completed, you can start using Guacamole to access your Project Virtual Machines (VMs). Both Desktop and Command Line Interface VM connections are possible, depending on which VMs have been deployed for the Project and which VMs you have access to.

  1. Launch your Project and click Guacamole in the workspace icon group.

    1. If you have access to BOTH Desktop AND Command Line Interface VM connections, the Connections screen lists the available VMs. Each VM connection is identified by a name and ID number indicating whether it is a Desktop or Command Line connection, and the deployment zone (Green Room or Core).

    2. Click a VM connection to initiate it.

    3. If you only have access to EITHER a Desktop connection OR a Command Line Interface connection, the relevant login screen (Desktop or Command Line Interface) appears.

  2. Follow the instructions for establishing a Desktop or Command Line Interface VM connection.

Establishing a Desktop VM Connection

  1. On the Desktop login screen, enter the username and password you use to log in to the platform, then click OK.

  2. The VM connection is established.

  3. After connecting, you can begin working in your Desktop VM. To launch a Linux terminal, see Launching a Linux Terminal inside a Desktop VM.

Establishing a Command Line VM Connection

  1. On the Command Line login screen, enter the username and password you use to log in to the platform.

  2. The VM connection is established and a welcome banner similar to the following is displayed:

    Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1019-azure x86_64)

     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage

Launching a Linux Terminal inside a Desktop VM

After successfully logging in to a Desktop VM, the default desktop is displayed. To launch a pre-installed Linux application such as a standard terminal:

  1. Click Activities in the upper left corner.

  2. Enter terminal in the search box.

  3. The terminal launches in a new window.

Launching the Command Line Interface in a Guacamole VM

The Pilot Command Line Interface (CLI) is deployed within a Guacamole VM. Project members can use the Pilot Command Line Interface in a terminal to download Project data from the Core for further analysis, and upload the derivative outputs back to the Green Room or Core.

The following sections focus on getting started with basic pilotcli commands in a Guacamole VM. For additional pilotcli commands and usage, see the article Working with Project Files in the Command Line Interface.

Launching Pilot Command Line Interface

  1. Launch your Project and click the Guacamole icon in the workspace icon group.

    1. If you have access to more than one type of VM, a list of available connections appears. Launch a Command Line Interface connection or a Desktop connection and open a Terminal window.

    2. If you only have access to a Command Line VM, the login screen is displayed immediately after launching Guacamole.

  2. Log in with your platform username and password.

  3. In the Terminal, type pilotcli to verify that the Pilot Command Line Interface is functional. The command prints its usage summary:

    uname@indoctestproject-demo:~$ pilotcli
    Usage: pilotcli [OPTIONS] COMMAND [ARGS]...

    Options:
      --help  Show this message and exit.

    Commands:
      container_registry  Container Registry Actions.
      dataset             Dataset Actions.
      file                File Actions.
      project             Project Actions.
      use_config          Config Actions.
      user                User Actions.

    Note: you must log in to pilotcli separately before you can use it.

  4. For more information on logging in and using the pilotcli, see Working with Project Files in the Command Line Interface.
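
The verification step above can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell inside the VM; the helper name check_cli is hypothetical and not part of pilotcli. It only calls pilotcli --help (the option shown in the usage summary) when the command is actually found on the PATH.

```shell
#!/usr/bin/env bash
# check_cli NAME  (hypothetical helper, not part of pilotcli)
# Prints "found: <path>" and returns 0 when NAME is on the PATH,
# otherwise prints "missing: <name>" and returns 1.
check_cli() {
  if cli_path="$(command -v "$1")"; then
    echo "found: $cli_path"
  else
    echo "missing: $1"
    return 1
  fi
}

# Only ask pilotcli for its usage summary when it is installed on this VM.
if check_cli pilotcli; then
  pilotcli --help
fi
```

If the sketch reports that pilotcli is missing, the VM may not have the Command Line Interface deployed; contact your Project Administrator.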

Zone Restrictions when using Pilot Command Line Interface in Green Room and Core VMs

The file operations permitted by the Command Line Interface depend on which zone (Green Room or Core) the VM is deployed in. If you are not sure, this is usually indicated in the name of your VM connection after you open Guacamole.

Green Room Virtual Machine

When using the Pilot Command Line Interface in a Green Room VM, the following actions are possible:

File Operation    Permitted in the Green Room    Permitted in the Core
File upload       Yes                            No
File download     Yes                            No

Core Virtual Machine

When using the Pilot Command Line Interface in a Core VM, the following actions are possible:

File Operation    Permitted in the Green Room    Permitted in the Core
File upload       Yes                            Yes
File download     No                             Yes
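
The two tables above can be expressed as a small lookup, which may be handy as a guard in scripts that run in more than one VM. This is only a sketch: the function name is_permitted and the zone label greenroom are assumptions made for illustration (the examples in this article only show the core label), and the function does not call pilotcli.

```shell
# is_permitted VM_ZONE OPERATION TARGET_ZONE  (hypothetical helper)
# VM_ZONE and TARGET_ZONE: "greenroom" or "core" (labels assumed for this sketch).
# OPERATION: "upload" or "download".
# Returns 0 when the tables above list the combination as "Yes", 1 otherwise.
is_permitted() {
  case "$1:$2:$3" in
    greenroom:upload:greenroom)   return 0 ;;
    greenroom:download:greenroom) return 0 ;;
    core:upload:greenroom)        return 0 ;;
    core:upload:core)             return 0 ;;
    core:download:core)           return 0 ;;
    *)                            return 1 ;;
  esac
}

# Example: a Core VM cannot download from the Green Room.
if is_permitted core download greenroom; then
  echo "permitted"
else
  echo "not permitted"
fi
```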

Downloading Project Data using the Command Line Interface

After logging into the Pilot Command Line Interface, you can download data from the Project into the Guacamole VM environment to start your data analyses.

File-related commands are grouped in the file category. To see more information on all of the file commands, see File Commands - pilotcli file.

Example

Downloading a file from the Core to a user's Home directory in the Guacamole Virtual Machine

  • Filename: “TEST_csv.csv”

  • Source: Project “Indoc Test Project”, “Core” storage zone, folder “uname”
    indoctestproject/uname/TEST_csv.csv -z core

  • Destination: user's Home directory in the Guacamole VM.

  • Command group/option: file download

  • Command: pilotcli file download indoctestproject/uname/TEST_csv.csv . -z core

    uname@indoctestproject-demo:~$ pilotcli file download indoctestproject/uname/TEST_csv.csv . -z core
    Preparing status: READY_FOR_DOWNLOADING
    start downloading...
    Downloading TEST_csv.csv |██████████████████████████████ 100% 00:00
    File has been downloaded successfully and saved to: ./TEST_csv.csv

  • To confirm successful download, type ls and verify the file TEST_csv.csv is stored in the Home folder.
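
The confirmation step above can be automated in a script. The following is a minimal sketch; confirm_download is a hypothetical helper name, and it only checks the local file system, not the Project.

```shell
# confirm_download FILE  (hypothetical helper)
# Prints a short status line and returns 0 if FILE exists and is not empty,
# otherwise returns 1.
confirm_download() {
  if [ -s "$1" ]; then
    echo "ok: $1"
  else
    echo "missing or empty: $1"
    return 1
  fi
}

# Typical use after the download command shown above:
#   pilotcli file download indoctestproject/uname/TEST_csv.csv . -z core
#   confirm_download ./TEST_csv.csv
```

Checking for a non-empty file, rather than relying on ls alone, also catches downloads that were interrupted before any data was written.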

After downloading Project data, users can analyze the data using containerized pipelines or other computational workflows inside the Guacamole-connected VM environment, then upload the derivative files back into the Project using the Command Line Interface.

Uploading Project Data from a VM to the Project using the Command Line Interface

After analyzing Project data inside the Workspace VM, users can upload the generated outputs back into the Project via the Command Line Interface (CLI). Two worked examples are provided:

  • the first example walks through the basic operation of file upload;

  • the second example extends the first by adding a lineage history to the CLI operations to capture the uploaded file's origin in the data lineage graph.

Example 1

Uploading a file from the user's Home directory of the Virtual Machine to the Core:

  • Filename: TEST_csv.csv

  • Source: user's Home directory in the Guacamole-connected VM ./TEST_csv.csv

  • Destination: Project “Indoc Test Project”, “Core” storage zone, folder uname
    indoctestproject/uname -z core

  • Command group/option: file upload

  • User message (for upload back to the Core): "my workbench output, no additional sensitive data"

  • Command: pilotcli file upload ./TEST_csv.csv -p indoctestproject/uname -z core -m "my workbench output, no additional sensitive data"

After completing the upload, you can confirm the file exists in the correct directory using the Command Line Interface or the Portal File Explorer.
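
When an analysis produces several output files, the upload command from Example 1 can be repeated in a loop. The sketch below is illustrative only: the helper names run and upload_outputs and the DRY_RUN variable are conventions of this sketch, not pilotcli features. With DRY_RUN=1 it prints each command instead of running it, so the commands can be reviewed first (the printed line omits the quotes around the message).

```shell
# run CMD [ARGS...]  (hypothetical helper)
# Prints the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# upload_outputs DEST MESSAGE FILE...  (hypothetical helper)
# Uploads each FILE to DEST in the Core zone with the same upload message,
# using the command form shown in Example 1. Paths that are not regular
# files are skipped with a warning on stderr.
upload_outputs() {
  dest="$1"
  message="$2"
  shift 2
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skipping (not a file): $f" >&2
      continue
    fi
    run pilotcli file upload "$f" -p "$dest" -z core -m "$message"
  done
}

# Example (dry run):
#   DRY_RUN=1
#   upload_outputs indoctestproject/uname "my workbench output, no additional sensitive data" ./TEST_csv.csv
```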

Example 2

Uploading a file from the user's Home directory of the Virtual Machine to the Core and building a data lineage with an existing origin file in the Core using a custom data lineage pipeline name.

  • Filename: TEST_csv.csv - the file generated in the VM

  • Source: user's Home directory in the Guacamole-connected VM ./TEST_csv.csv

  • Destination: Project “Indoc Test Project”, “Core” storage zone, folder uname
    indoctestproject/uname -z core

  • Data lineage origin file: testing_data.png - an origin file stored in the Core folder uname. Lineage will be built between TEST_csv.csv and testing_data.png

  • Data lineage pipeline name: testing_pipeline - the custom name of the pipeline that describes the lineage between the two files.

  • Command group/option: file upload

  • User message (for upload back to the Core): "my upload with lineage. No additional sensitive data"

  • Command: pilotcli file upload ./TEST_csv.csv -p indoctestproject/uname -z core -m "my upload with lineage. No additional sensitive data" --source-file uname/testing_data.png --pipeline "testing_pipeline"

The newly created Data Lineage Graph can be viewed in the File Properties. The lineage between the uploaded file (the derivative file from the VM) and the origin file (the original file in the Core) is displayed where:

  • testing_data.png is the Upstream file

  • testing_pipeline is the processing pipeline connecting the two files.

  • TEST_csv.csv is the Downstream file

Review the lineage graph to ensure the uploaded file and the origin file are correctly linked. If not, you can delete the file from the Core and upload again from the VM, re-checking the file paths.
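
Because a mistyped path means deleting and re-uploading the file, it can help to assemble the lineage command in one place and read it before running it. The following is a minimal sketch that only prints the command; upload_with_lineage is a hypothetical helper name, it uses the core zone label from Example 1, and it adds --pipeline only together with --source-file, mirroring Example 2.

```shell
# upload_with_lineage FILE DEST MESSAGE [ORIGIN [PIPELINE]]  (hypothetical helper)
# Prints the pilotcli upload command for review; nothing is executed.
# ORIGIN is the origin file path passed to --source-file.
# PIPELINE is the custom pipeline name passed to --pipeline.
upload_with_lineage() {
  file="$1"
  dest="$2"
  message="$3"
  origin="${4:-}"
  pipeline="${5:-}"
  set -- pilotcli file upload "$file" -p "$dest" -z core -m "\"$message\""
  if [ -n "$origin" ]; then
    set -- "$@" --source-file "$origin"
    if [ -n "$pipeline" ]; then
      set -- "$@" --pipeline "\"$pipeline\""
    fi
  fi
  echo "$*"
}

# Example, reproducing the command from Example 2:
upload_with_lineage ./TEST_csv.csv indoctestproject/uname \
  "my upload with lineage. No additional sensitive data" \
  uname/testing_data.png testing_pipeline
```

After checking the printed paths against the Portal File Explorer, copy the command into the terminal to run it.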

Troubleshooting

If a download or upload fails, check the following:

  • Session expiry: Your session expires after 30 minutes of inactivity. If it has expired, log in again.

  • Incorrect target file/folder path: Check the file path of the target file/folder and ensure it exists in the Portal File Explorer.

  • Access permissions: Check that you have access to the target file/folder. You must be a member of the Project where your target file/folder is stored, with a Project Collaborator or Project Administrator role (able to access all files/folders within the Project Core zone).

Considerations

This example describes working with VMs using data downloaded from and uploaded back to the Core storage. In special cases, Project members may be granted access to VMs in the project Green Room, for example, to perform interactive pseudonymization. In this case, the Command Line Interface deployed in the Green Room VM can only access Green Room data stores and does not support download of data from the Core. See Using Guacamole for more information.
