Using Apache Guacamole VMs
Apache Guacamole is a clientless remote desktop gateway that gives you access to Project-based Virtual Machines (VMs), where you can perform computational analyses on your Project data. To learn more about the application, see the Guacamole documentation: Apache Guacamole®.
How it Works
When a new Project is created, a Platform Administrator must set up the Guacamole application and VMs for that Project. Once set up, Project members can request access to the Guacamole-connected VMs. After access is granted, Project members can download data from the portal, run computational workflows on the data inside the VM, and then write the derivative outputs back to the portal, where other Project members can access them.
Prerequisites
A Project user role with permission to access the Guacamole Workspace tool (e.g., custom roles or default roles with Collaborator permissions or higher).
Guacamole application and connected virtual machines pre-configured for the Project by the Platform Administrator. See Getting Access to Guacamole below.
Data Stewardship
Users are reminded to abide by the Platform Terms of Use and any Project-specific restrictions when using Workspace tools to access data and code.
Getting Access to Guacamole
Log in to Pilot and navigate to your Project.
Click Guacamole in the Workspace Tools icon group.
If you launch Guacamole and receive a notice that it hasn't been deployed for your Project, please contact your Project Administrator to request setup.
If you launch Guacamole and are directed to the workbench but don't see any items listed under the ALL CONNECTIONS heading, it means you do not yet have access to any VMs. Please contact your Project Administrator and request access to the necessary VMs.
Launching a Guacamole VM Connection
After the first-time setup has been completed, you can start using Guacamole to access your Project Virtual Machines (VMs). Both Desktop and Command Line Interface VM connections are possible, depending on which VMs have been deployed for the Project and which VMs you have access to.
Launch your Project and click Guacamole in the workspace icon group.
If you have access to BOTH Desktop AND Command Line Interface VM connections, the Connections screen lists the available VMs. Each VM connection is identified by a name and ID number indicating whether it is a Desktop or Command Line connection, and the deployment zone (Green Room or Core).
Click a VM connection to initiate it.
If you only have access to EITHER a Desktop connection OR a Command Line Interface connection, the relevant login screen (Desktop or Command Line Interface) appears.
Follow the instructions for establishing a Desktop or Command Line Interface VM connection.
Establishing a Desktop VM Connection
On the Desktop login screen, enter the username and password you use to log into the platform, then click OK.
The VM connection is established.
After connecting, you can begin working in your Desktop VM. To launch a Linux terminal, see Launching a Linux Terminal inside a Desktop VM.
Establishing a Command Line VM Connection
On the Command Line login screen, enter the username and password you use to log into the platform.
The VM connection is established and a welcome message similar to the following is displayed:
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-1019-azure x86_64)
 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage
Launching a Linux Terminal inside a Desktop VM
After successfully logging into a Desktop VM, the default desktop is displayed. To launch a pre-installed Linux application such as a standard terminal:
Click Activities in the upper left corner.
Enter terminal in the search box.
The terminal launches in a new window.
Launching the Command Line Interface in a Guacamole VM
The Pilot Command Line Interface (CLI) is deployed within a Guacamole VM. Project members can use the Pilot Command Line Interface in a terminal to download Project data from the Core for further analysis, and upload the derivative outputs back to the Green Room or Core.
The following sections focus on getting started with basic pilotcli commands in a Guacamole VM. For additional pilotcli commands and usage, see the article Working with Project Files in the Command Line Interface.
Launching Pilot Command Line Interface
Launch your Project and click the Guacamole icon in the workspace icon group.
If you have access to more than one type of VM, a list of available connections appears. Launch a Command Line Interface connection or a Desktop connection and open a Terminal window.
If you only have access to a Command Line VM, the login screen is displayed immediately after launching Guacamole.
Log in with your platform username and password.
In the Terminal, type pilotcli to verify that the Pilot Command Line Interface is functional.
Note: Logging in to the VM does not log you in to pilotcli. You must log in to pilotcli separately before you can use it.
uname@indoctestproject-demo:~$ pilotcli
Usage: pilotcli [OPTIONS] COMMAND [ARGS]...
Options:
  --help  Show this message and exit.
Commands:
  container_registry  Container Registry Actions.
  dataset             Dataset Actions.
  file                File Actions.
  project             Project Actions.
  use_config          Config Actions.
  user                User Actions.
For more information on logging in and using the pilotcli, see Working with Project Files in the Command Line Interface.
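Before relying on pilotcli in a script, it can help to confirm that the command is available on your PATH. The following is a minimal sketch rather than part of the platform tooling: check_cli is a hypothetical helper name, and only the pilotcli command and its --help option are taken from the output shown above.

```shell
#!/bin/sh
# check_cli: hypothetical helper that reports whether a command is on PATH.
check_cli() {
  cli_name="$1"
  if command -v "$cli_name" >/dev/null 2>&1; then
    echo "$cli_name found"
  else
    echo "$cli_name not found"
    return 1
  fi
}

# Example use inside the VM (not executed here):
#   check_cli pilotcli && pilotcli --help
```

If the helper reports that pilotcli is not found, the VM may not have the Command Line Interface deployed; contacting your Project Administrator is the likely next step.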
Zone Restrictions when using Pilot Command Line Interface in Green Room and Core VMs
The file operations permitted by the Command Line Interface depend on which zone (Green Room or Core) the VM is deployed in. If you are not sure, the zone is usually indicated in the name of your VM connection after you open Guacamole.
Green Room Virtual Machine
When using the Pilot Command Line Interface in a Green Room VM, the following actions are possible:
File Operation | Permitted in the Green Room | Permitted in the Core |
File upload | Yes | No |
File download | Yes | No |
Core Virtual Machine
When using the Pilot Command Line Interface in a Core VM, the following actions are possible:
File Operation | Permitted in the Green Room | Permitted in the Core |
File upload | Yes | Yes |
File download | No | Yes |
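The two tables above can be condensed into a small lookup, which may be handy as a pre-flight check in analysis scripts. This is a sketch under stated assumptions: it assumes the two table columns refer to the Green Room and the Core, in that order, and the function name zone_allows and the labels greenroom and core are the sketch's own identifiers, not pilotcli options.

```shell
#!/bin/sh
# zone_allows: hypothetical lookup of the zone restriction tables.
# Arguments: <vm_zone> <operation> <target_zone>
#   vm_zone, target_zone: greenroom | core   (labels chosen for this sketch)
#   operation:            upload | download
zone_allows() {
  vm_zone="$1"
  operation="$2"
  target_zone="$3"
  case "$vm_zone:$operation:$target_zone" in
    # Green Room VM: upload and download within the Green Room only.
    greenroom:upload:greenroom|greenroom:download:greenroom) echo yes ;;
    greenroom:upload:core|greenroom:download:core) echo no ;;
    # Core VM: upload to either zone, download from the Core only.
    core:upload:greenroom|core:upload:core|core:download:core) echo yes ;;
    core:download:greenroom) echo no ;;
    *) echo unknown; return 2 ;;
  esac
}

# Example: zone_allows core download greenroom
```

The lookup only mirrors the documented tables; the platform itself remains the authority on what is permitted.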
Downloading Project Data using the Command Line Interface
After logging into the Pilot Command Line Interface, you can download data from the Project into the Guacamole VM environment to start your data analyses.
File-related commands are grouped in the file category. For more information on all of the file commands, see File Commands - pilotcli file.
Example
Downloading a file from the Core to a user's Home directory in the Guacamole Virtual Machine
Filename: "TEST_csv.csv"
Source: Project "Indoc Test Project", "Core" storage zone, folder "uname": indoctestproject/uname/TEST_csv.csv -z core
Destination: user's Home directory in the Guacamole VM (.)
Command group/option: file download
Command:
pilotcli file download indoctestproject/uname/TEST_csv.csv . -z core
uname@indoctestproject-demo:~$ pilotcli file download indoctestproject/uname/TEST_csv.csv . -z core
Preparing status: READY_FOR_DOWNLOADING
start downloading...
Downloading TEST_csv.csv |██████████████████████████████| 100% 00:00
File has been downloaded successfully and saved to: ./TEST_csv.csv
To confirm the download succeeded, type ls and verify that the file TEST_csv.csv is stored in the Home folder.
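The download and the follow-up check can be combined into one step. The wrapper below is a minimal sketch, assuming the download command form shown in the example above; download_and_check and the PILOTCLI variable are hypothetical names introduced here (PILOTCLI is not a pilotcli feature, it only lets the wrapper be tried with a stand-in executable).

```shell
#!/bin/sh
# download_and_check: hypothetical wrapper around the documented
# "pilotcli file download <path> <destination> -z core" command.
# PILOTCLI is an assumed override used only by this sketch.
download_and_check() {
  remote_path="$1"      # e.g. indoctestproject/uname/TEST_csv.csv
  dest_dir="${2:-.}"    # defaults to the current directory
  "${PILOTCLI:-pilotcli}" file download "$remote_path" "$dest_dir" -z core || return 1
  file_name="$(basename "$remote_path")"
  # Check the file on disk rather than trusting the exit status alone.
  if [ -f "$dest_dir/$file_name" ]; then
    echo "OK: $dest_dir/$file_name"
  else
    echo "MISSING: $dest_dir/$file_name"
    return 1
  fi
}

# Example use inside the VM (not executed here):
#   download_and_check indoctestproject/uname/TEST_csv.csv .
```

Checking for the file explicitly is a deliberate choice: the exit status pilotcli returns on a failed download is not described in this article, so the sketch does not depend on it.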
After downloading Project data, users can analyze the data using containerized pipelines or other computational workflows inside the Guacamole-connected VM environment, then upload the derivative files back into the Project using the Command Line Interface.
Uploading Project Data from a VM to the Project using the Command Line Interface
After analyzing Project data inside the Workspace VM, users can upload the generated outputs back into the Project via the Command Line Interface (CLI). Two worked examples are provided:
the first example walks through the basic file upload operation;
the second example extends the first by adding a lineage history to the CLI operation, capturing the uploaded file's origin in the data lineage graph.
Example 1
Uploading a file from the user's Home directory of the Virtual Machine to the Core:
Filename: TEST_csv.csv
Source: user's Home directory in the Guacamole-connected VM: ./TEST_csv.csv
Destination: Project "Indoc Test Project", "Core" storage zone, folder uname: indoctestproject/uname -z core
Command group/option: file upload
User message (for upload back to the Core): "my workbench output, no additional sensitive data"
Command:
pilotcli file upload ./TEST_csv.csv -p indoctestproject/uname -z core -m "my workbench output, no additional sensitive data"
After completing the upload, you can confirm the file exists in the correct directory using the Command Line Interface or the Portal File Explorer.
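When uploads are scripted, a small guard can catch a wrong local path before the CLI is invoked. The following is a sketch, assuming the upload command form from Example 1; upload_to_core and the PILOTCLI override are hypothetical names introduced here, and requiring a non-empty message is a choice of the sketch rather than a documented pilotcli rule.

```shell
#!/bin/sh
# upload_to_core: hypothetical wrapper around the documented
# "pilotcli file upload <file> -p <project/folder> -z core -m <message>" command.
# PILOTCLI is an assumed override used only by this sketch.
upload_to_core() {
  local_file="$1"   # e.g. ./TEST_csv.csv
  target="$2"       # e.g. indoctestproject/uname
  message="$3"      # upload message for the Core
  if [ ! -f "$local_file" ]; then
    echo "No such file: $local_file" >&2
    return 1
  fi
  if [ -z "$message" ]; then
    echo "An upload message is required by this wrapper" >&2
    return 1
  fi
  "${PILOTCLI:-pilotcli}" file upload "$local_file" -p "$target" -z core -m "$message"
}

# Example use inside the VM (not executed here):
#   upload_to_core ./TEST_csv.csv indoctestproject/uname \
#     "my workbench output, no additional sensitive data"
```

Setting PILOTCLI=echo prints the command that would be run instead of executing it, which is a convenient way to review the arguments first.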
Example 2
Uploading a file from the user's Home directory of the Virtual Machine to the Core and building a data lineage with an existing origin file in the Core, using a custom data lineage pipeline name.
Filename: TEST_csv.csv (the file generated in the VM)
Source: user's Home directory in the Guacamole-connected VM: ./TEST_csv.csv
Destination: Project "Indoc Test Project", "Core" storage zone, folder uname: indoctestproject/uname -z core
Data lineage origin file: testing_data.png (an origin file stored in the Core folder uname). Lineage will be built between TEST_csv.csv and testing_data.png.
Data lineage pipeline name: testing_pipeline (the custom name of the pipeline, describing the lineage between the two files)
Command group/option: file upload
User message (for upload back to the Core): "my upload with lineage. No additional sensitive data"
Command:
pilotcli file upload ./TEST_csv.csv -p indoctestproject/uname -z core -m "my upload with lineage. No additional sensitive data" --source-file uname/testing_data.png --pipeline "testing_pipeline"
The newly created Data Lineage Graph can be viewed in the File Properties. The lineage between the source file (the derivative file uploaded from the VM) and the origin file (the original file in the Core) is displayed, where:
testing_data.png is the Upstream file.
testing_pipeline is the processing pipeline connecting the two files.
TEST_csv.csv is the Downstream file.
Review the lineage graph to ensure the source and origin are correctly linked. If not, you can delete the file from the Core and upload again from the VM, re-checking the file paths.
Troubleshooting
If downloading or uploading failed, please check the following:
Session expiry: Your session expires after 30 minutes of inactivity. If it has expired, log in again and retry the operation.
Incorrect target file/folder path: Check the file path of the target file/folder and ensure it exists in the Portal File Explorer.
Access permissions: Check that you have access to the target file/folder. You must be a member of the Project where your target file/folder is stored, with the Project Collaborator or Project Administrator role (able to access all files/folders within the Project Core zone).
Considerations
This example describes working with VMs using data downloaded from and uploaded back to the Core storage. In special cases, Project members may be granted access to VMs in the project Green Room, for example, to perform interactive pseudonymization. In this case, the Command Line Interface deployed in the Green Room VM can only access Green Room data stores and does not support download of data from the Core. See Using Guacamole for more information.