
BaseSpace_Fetch

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
|---|---|---|---|---|
| Data Import | Any taxa | PHB v1.3.0 | Yes | Sample-level |

BaseSpace_Fetch_PHB

The BaseSpace_Fetch workflow facilitates the transfer of Illumina sequencing data from BaseSpace (a cloud location) to a workspace on the Terra.bio platform. Rather than downloading the files to a local drive and then re-uploading them to another location, we can perform a cloud-to-cloud transfer with the BaseSpace_Fetch workflow.

Setting up BaseSpace_Fetch

Some initial set-up is required to use the workflow. To access one's BaseSpace account from within a workflow on Terra.bio, it is necessary to retrieve an access token and the API server address using the BaseSpace command-line tool. The access token is unique to a BaseSpace account. If it is necessary to transfer data from multiple BaseSpace accounts, multiple access tokens will need to be retrieved. Please see the "Retrieving BaseSpace Access Credentials" section below.

Retrieving BaseSpace Access Credentials

This process must be performed on the command line before using the BaseSpace_Fetch workflow for the first time. It can be done in Terra, but it will work in any command-line environment with internet access that can install and run the BaseSpace command-line tool, bs.

If you already have a command-line environment available, you can skip ahead to Step 2.

Step 1: Create a command-line environment
  1. Select the "Environment Configuration" cloud icon on the right side of the workspace dashboard tab

    Figure: Click on the cloud icon to access the environment configuration

  2. Select the "Settings" button under Jupyter

    Figure: Click on Settings underneath the Jupyter icon

  3. Click "CREATE" at the bottom of the "Jupyter Cloud Environment" page. There is no need to alter the default environment configuration.

    Figure: Click on Create at the bottom of the page

    Environment customization

    The default environment should be sufficient for retrieval of BaseSpace credentials, but if performing other tasks in the environment please modify the resource allocations appropriately.

    You will be returned to the main page after clicking "Create". You will notice two new icons in your right-hand side bar as the environment is being created.

    Figure: Environment creation in progress

Step 2: Install the BaseSpace Command-Line Tool to Retrieve the Access Token and API Server Address
  1. When the environment is created and active, you should see a green dot in the bottom right corner of the Jupyter icon. Click on the "Terminal" icon in the right side-bar of the Terra dashboard to open the terminal.

    Figure: Open the terminal

    The open terminal will appear in a new tab in your browser and will look similar to this:

    Figure: The terminal window

  2. Download and set up the BaseSpace (BS) command-line interface (CLI) tool (as per the Illumina documentation) using the commands below. Lines beginning with # are comments; the lines that follow them are the commands to copy and paste into the terminal.

    BaseSpace Fetch Authentication Instructions
    # create bin directory
    mkdir -p ~/bin
    
    # download the basespace cli
    wget "https://launch.basespace.illumina.com/CLI/latest/amd64-linux/bs" -O $HOME/bin/bs
    
    # provide proper permissions to make the bs cli executable 
    chmod u+x $HOME/bin/bs
    
    # add the 'bs' command-line tool to the $PATH variable so that you can call the command-line tool from any directory
    export PATH="$PATH:$HOME/bin/"
    
    # authenticate with BaseSpace credentials
    bs auth
    
    # navigate to the link provided in stdout and accept the authentication request through BaseSpace
    
    # print the API server and access token to stdout (both are stored in the default config file)
    cat ~/.basespace/default.cfg
    
  3. Copy the access token and API server details from ~/.basespace/default.cfg and paste them into Terra as workspace data elements.

    1. Navigate to the Terra "DATA" tab, and select "Workspace Data" at the bottom of the left sidebar.
    2. Click on "Edit" and then "Add variable" to add the new workspace data elements as in the examples below.

    Figure: Create workspace data elements
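
As a convenience, the two values can be pulled out of the config file rather than copied by eye. The sketch below is only illustrative: it assumes default.cfg stores its entries as "key = value" lines named apiServer and accessToken, and the helper names get_cfg_value and is_valid_token are hypothetical. It runs against a throwaway demo file, so no real credentials are touched; point it at ~/.basespace/default.cfg once the layout has been confirmed with cat.

```shell
# Sketch only. ASSUMPTION: the config file holds "key = value" lines
# (e.g. "apiServer = ..." and "accessToken = ..."); check with `cat` first.

# get_cfg_value KEY FILE: print the value stored under KEY, whitespace trimmed
get_cfg_value() {
  awk -F'=' -v key="$1" '
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)            # trim the key
      if (k == key) {
        v = substr($0, index($0, "=") + 1)      # everything after the first "="
        gsub(/^[ \t]+|[ \t\r]+$/, "", v)        # trim the value
        print v
        exit
      }
    }
  ' "$2"
}

# is_valid_token TOKEN: succeed if TOKEN is exactly 32 alphanumeric characters,
# the shape described for access_token in the Inputs section
is_valid_token() {
  printf '%s' "$1" | grep -Eq '^[[:alnum:]]{32}$'
}

# Demo on a temporary file that mimics the assumed layout
demo_cfg="$(mktemp)"
printf 'apiServer   = https://api.basespace.illumina.com\n' >  "$demo_cfg"
printf 'accessToken = 9e08a96471df44579b72abf277e113b7\n'   >> "$demo_cfg"

api_server="$(get_cfg_value apiServer "$demo_cfg")"
access_token="$(get_cfg_value accessToken "$demo_cfg")"

echo "api_server:   $api_server"
if is_valid_token "$access_token"; then
  echo "access_token: looks well-formed (32 alphanumeric characters)"
else
  echo "access_token: unexpected format, re-check the config file"
fi
rm -f "$demo_cfg"
```

Treat the access token like a password: avoid echoing it in shared terminals or committing it to notebooks.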

Preparing to retrieve a run with BaseSpace_Fetch

Step 1: Create a Metadata Sheet from the BaseSpace SampleSheet

Best Practices for Sample Identifiers

  • Avoid underscores and whitespace in the BaseSpace Project/Run name and in the sample identifiers
  • Underscores in a sample name may cause BaseSpace_Fetch to fail
  • Do not re-use sample IDs; every sample ID must be unique
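
These rules are easy to check mechanically before a run is uploaded. The following is a minimal sketch, not part of the workflow: the function name check_ids is hypothetical, and it expects one identifier per line on standard input.

```shell
# check_ids: read identifiers (one per line) from stdin, print any that
# contain an underscore or whitespace or that repeat an earlier identifier,
# and return non-zero if a problem was found.
check_ids() {
  awk '
    /[_ \t]/   { printf "bad characters: %s\n", $0; bad = 1 }
    seen[$0]++ { printf "duplicate: %s\n", $0; bad = 1 }
    END        { exit bad + 0 }
  '
}

# Demo with made-up identifiers: one contains an underscore, one is repeated
if printf '%s\n' 2024-001 2024-002 2024_003 2024-001 | check_ids; then
  echo "identifiers look fine"
else
  echo "fix the identifiers listed above before uploading"
fi
```
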
  1. Download the sample sheet from BaseSpace.

    On the BaseSpace portal, you can navigate to this via: Runs → {run} → Files → SampleSheet.csv

    Figure: Example SampleSheet.csv

  2. In Excel or another spreadsheet program, set up a metadata sheet for Terra, with one row for each sample. Please feel free to use our BaseSpace_Fetch Template to help ensure the file is formatted correctly.

    1. In cell A1, enter the data table name with the "entity:TABLENAME_id" format
    2. Create a column called basespace_sample_name and populate this with the data found under the Sample_Name column in the BaseSpace sample sheet.

      Watch out

      If the contents of the Sample_Name and Sample_ID columns in the BaseSpace sample sheet are different, make a basespace_sample_id column in your spreadsheet and populate this with the data found under the Sample_ID column in the BaseSpace sample sheet.

    3. Create a basespace_collection_id column, and populate it with the BaseSpace Project or Run identifier

    4. Populate the rest of column A (below the header in cell A1) with the sample names as seen in the sample sheet

      Figure: Example Metadata Sheet
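
For larger runs, the same sheet can be generated from the command line instead of a spreadsheet. The sketch below follows the steps above under several assumptions that should be checked against your own file: the sample sheet has a [Data] section whose first row is a header containing Sample_ID and Sample_Name, no field contains a quoted comma, and rows with an empty Sample_Name can be skipped. The function name samplesheet_to_tsv, the table name "sample", and the collection ID in the demo are all hypothetical.

```shell
# samplesheet_to_tsv SAMPLESHEET COLLECTION_ID [TABLE_NAME]
# Print a tab-separated metadata sheet built from the [Data] section.
samplesheet_to_tsv() {
  awk -F',' -v OFS='\t' -v coll="$2" -v table="${3:-sample}" '
    { sub(/\r$/, "") }                         # tolerate Windows line endings
    /^\[Data\]/      { in_data = 1; next }     # start of the sample rows
    in_data && /^\[/ { in_data = 0 }           # any later section ends them
    !in_data         { next }
    !have_header {
      for (i = 1; i <= NF; i++) col[$i] = i    # map column names to positions
      if (!("Sample_ID" in col) || !("Sample_Name" in col)) {
        print "Sample_ID or Sample_Name column not found" > "/dev/stderr"
        exit 1
      }
      print "entity:" table "_id", "basespace_sample_name", "basespace_sample_id", "basespace_collection_id"
      have_header = 1
      next
    }
    # column A and basespace_sample_name both take the Sample_Name value
    $col["Sample_Name"] != "" {
      print $col["Sample_Name"], $col["Sample_Name"], $col["Sample_ID"], coll
    }
  ' "$1"
}

# Demo on a made-up sample sheet
demo_sheet="$(mktemp)"
cat > "$demo_sheet" <<'EOF'
[Header]
Experiment Name,demo-run
[Data]
Sample_ID,Sample_Name,index
1001,sampleA,ACGT
1002,sampleB,TGCA
EOF
samplesheet_to_tsv "$demo_sheet" 123456789
rm -f "$demo_sheet"
```

The basespace_sample_id column is only needed when Sample_Name and Sample_ID differ; it is emitted unconditionally here to keep the sketch short.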

Step 2: Upload the metadata spreadsheet to the destination workspace in Terra.bio
  1. In Terra, navigate to the "DATA" tab, click "IMPORT DATA" then "Upload TSV"

    Figure: Upload TSV

  2. Copy and paste the contents of the whole spreadsheet into the "TEXT IMPORT" tab and click "START IMPORT JOB"

    Figure: Import Metadata

You can now use the created table to run the BaseSpace_Fetch workflow.

Inputs

Call-Caching Disabled

In BaseSpace_Fetch workflow version 1.3.0 and later, Terra's call-caching feature is DISABLED so that the workflow always runs from the beginning and the data is downloaded fresh. Call-caching stays disabled even if the user checks the box ✅ in the Terra workflow interface.

Sample_Name and Sample_ID

If the Sample_Name and Sample_ID in the BaseSpace sample sheet are different, set the basespace_sample_id input attribute to "this.basespace_sample_id".

| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| basespace_fetch | access_token | String | The access token is used in place of a username and password to allow the workflow to access the user account in BaseSpace from which the data is to be transferred. It is an alphanumeric string that is 32 characters in length. Example: 9e08a96471df44579b72abf277e113b7 | | Required |
| basespace_fetch | api_server | String | The API server is the web address to which data transfer requests can be sent by the workflow. Use this API server if you are unsure: "https://api.basespace.illumina.com" (this is the default set by the command-line tool) | | Required |
| basespace_fetch | basespace_collection | String | The collection ID is the BaseSpace Run or Project where the data to be transferred is stored. | | Required |
| basespace_fetch | basespace_sample_name | String | The BaseSpace sample name is the sample identifier used in BaseSpace. This identifier is set on the sample sheet at the onset of an Illumina sequencing run. | | Required |
| basespace_fetch | sample_name | String | The sample name is the sample identifier used in the Terra.bio data table corresponding to the metadata associated with the sample to be transferred from BaseSpace. | | Required |
| basespace_fetch | basespace_sample_id | String | The BaseSpace sample ID is an optional additional identifier used in BaseSpace. If a sample has a BaseSpace sample ID, it should be available on the sample sheet and must be included in the metadata sheet upload prior to running BaseSpace_Fetch. | | Optional |
| fetch_bs | cpu | Int | The number of CPUs used in the data transfer. To facilitate the transfer of many files, this runtime parameter may be increased. | 2 | Optional |
| fetch_bs | disk_size | Int | The amount of storage in gigabytes (GB) requested for the VM that runs the data transfer task. | 100 | Optional |
| fetch_bs | docker_image | String | The Docker image used to run the BaseSpace_Fetch task. | us-docker.pkg.dev/general-theiagen/theiagen/basespace_cli:1.2.1 | Optional |
| fetch_bs | memory | Int | The amount of RAM/memory requested for running the data transfer task. | 8 | Optional |
| version_capture | docker | String | The Docker container to use for the task. | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default). | | Optional |
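
When the workflow is run from the command line rather than in Terra, WDL runners generally take these values as a JSON file keyed by "workflow.input". The sketch below writes such a file for the five required inputs. It assumes the workflow is named basespace_fetch and that the input names match the Variable column above exactly; both should be verified against the WDL before use. The function name make_inputs_json and the demo values are hypothetical, and values containing quotes or backslashes are not escaped.

```shell
# make_inputs_json TOKEN API_SERVER COLLECTION BS_SAMPLE_NAME SAMPLE_NAME
# Print an inputs JSON for the required inputs (key names are an assumption
# taken from the inputs table; no JSON escaping is performed).
make_inputs_json() {
  printf '{\n'
  printf '  "basespace_fetch.access_token": "%s",\n' "$1"
  printf '  "basespace_fetch.api_server": "%s",\n' "$2"
  printf '  "basespace_fetch.basespace_collection": "%s",\n' "$3"
  printf '  "basespace_fetch.basespace_sample_name": "%s",\n' "$4"
  printf '  "basespace_fetch.sample_name": "%s"\n' "$5"
  printf '}\n'
}

# Demo with placeholder values; substitute real ones and redirect to a file
make_inputs_json "REPLACE-WITH-ACCESS-TOKEN" \
  "https://api.basespace.illumina.com" \
  "REPLACE-WITH-COLLECTION-ID" \
  "sampleA" \
  "sampleA"
```

In Terra itself none of this is needed: the same inputs are filled in through the workflow interface, typically from workspace data elements and data table columns.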

Outputs

The workflow outputs the FASTQ files imported from BaseSpace; they are added to the data table to which the sample ID information was originally uploaded.

| Variable | Type | Description |
|---|---|---|
| basespace_fetch_analysis_date | String | The date the workflow was run |
| basespace_fetch_version | String | The version of the repository the BaseSpace_Fetch workflow is in |
| read1 | File | File with forward-facing reads |
| read2 | File | File with reverse-facing reads |