Create_Terra_Table

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
|---|---|---|---|---|
| Data Import | Any taxa | PHB v2.2.0 | Yes | Sample-level |

Create_Terra_Table_PHB

The manual creation of Terra tables can be tedious and error-prone. This workflow will automatically create your Terra data table when provided with the location of the files.

Inputs

Default Behavior

By default, the sample name is taken as the part of the file name before the first period or underscore, so files with periods and/or underscores in the sample name are not recognized correctly; please use dashes instead.

For example, name.banana.hello_yes_please.fastq.gz will become "name". This means that se-test_21.fastq.gz and se-test_22.fastq.gz will both become "se-test" and will not be recognized as separate samples.

This can be changed with the optional file_ending input parameter. See below for more information.
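The default naming rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the workflow's actual implementation; the function name default_sample_name is hypothetical.

```python
import re


def default_sample_name(filename: str) -> str:
    """Return the text before the first period or underscore.

    Illustrative sketch of the documented default only; the workflow
    itself may derive the name differently.
    """
    return re.split(r"[._]", filename, maxsplit=1)[0]


for name in (
    "name.banana.hello_yes_please.fastq.gz",
    "se-test_21.fastq.gz",
    "se-test_22.fastq.gz",
):
    print(name, "->", default_sample_name(name))
```

The last two files map to the same name, which is why they would collapse into a single row.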

| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| create_terra_table | assembly_data | Boolean | Set to true if your data is in FASTA format; set to false if your data is FASTQ format | | Required |
| create_terra_table | data_location_path | String | The full path to your data's Google bucket folder location, including the gs://; can be easily copied by right-clicking and copying the link address in the header after navigating to the folder in the "Files" section of the "Data" tab on Terra (see below for example) | | Required |
| create_terra_table | new_table_name | String | The name of the new Terra table you want to create | | Required |
| create_terra_table | paired_end | Boolean | Set to true if your data is paired-end FASTQ files; set to false if not | | Required |
| create_terra_table | terra_project | String | The name of the Terra project where your data table will be created | | Required |
| create_terra_table | terra_workspace | String | The name of the Terra workspace where your data table will be created | | Required |
| create_terra_table | file_ending | String | Use to provide file ending(s) to determine what should be dropped from the filename to determine the name of the sample (see below for more information) | | Optional |
| make_table | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| make_table | disk_size | Int | Amount of storage (in GB) to allocate to the task | 25 | Optional |
| make_table | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21" | Optional |
| make_table | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
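For command-line use, the required inputs could be collected into an inputs JSON along the following lines. This is a minimal sketch: the key prefix assumes the WDL workflow is named create_terra_table (as the Terra Task Name column suggests), the helper build_inputs is hypothetical, and every value shown is a placeholder rather than a real bucket, project, or workspace.

```python
import json


def build_inputs(data_location_path, new_table_name, terra_project,
                 terra_workspace, assembly_data, paired_end, file_ending=None):
    """Assemble a workflow inputs dictionary (illustrative sketch)."""
    # The documentation asks for the full path, including gs://
    if not data_location_path.startswith("gs://"):
        raise ValueError("data_location_path must include the gs:// prefix")
    prefix = "create_terra_table"  # assumed workflow name, not confirmed
    inputs = {
        f"{prefix}.assembly_data": assembly_data,
        f"{prefix}.data_location_path": data_location_path,
        f"{prefix}.new_table_name": new_table_name,
        f"{prefix}.paired_end": paired_end,
        f"{prefix}.terra_project": terra_project,
        f"{prefix}.terra_workspace": terra_workspace,
    }
    # file_ending is optional, so only include it when provided
    if file_ending is not None:
        inputs[f"{prefix}.file_ending"] = file_ending
    return inputs


# Placeholder values for illustration only
example = build_inputs(
    data_location_path="gs://example-bucket/uploads/example_upload",
    new_table_name="example_table",
    terra_project="example-project",
    terra_workspace="example-workspace",
    assembly_data=False,
    paired_end=True,
    file_ending="_R",
)
print(json.dumps(example, indent=2))
```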

Finding the data_location_path

Using the Terra data uploader

Once you have named your new collection, you will see the collection name directly above where you can drag-and-drop your data files, or on the same line as the Upload button. Right-click the collection name and select "Copy link address." Paste the copied link into the data_location_path variable, remembering to enclose it in quotes.

Note

If you click "Next" after uploading your files, it will ask for a metadata TSV. You do not have to provide this, and can instead exit the window. Your data will still be uploaded.

[Image: Data uploader]

Using the "Files" section in the Data tab

Navigate to the folder where your data is stored ("example_upload" in this example), right-click on the folder name, and select "Copy link address."

If you uploaded data with the Terra data uploader, your collection will be nested in the "uploads" folder.

[Image: Data tab]

How to determine the appropriate file_ending for your data

The file_ending should be a substring that your file names have in common and that immediately follows the desired sample name. See the following examples:

One or more elements in common

If you have the following files:

  • sample_01_R1.fastq.gz
  • sample_01_R2.fastq.gz
  • sample_02_R1.fastq.gz
  • sample_02_R2.fastq.gz

The default behavior would result in a single entry in the table called "sample", which is incorrect. You can rectify this by providing an appropriate file_ending for your samples.

In this group, the desired sample names are "sample_01" and "sample_02". In every file name, the text that follows the desired sample name begins with _R. If we provide "_R" as our file_ending, then "sample_01" and "sample_02" will appear in our table with the appropriate read files.

No elements in common

If you have the following files:

  • sample_01_1.fastq.gz
  • sample_01_2.fastq.gz
  • sample_02_1.fastq.gz
  • sample_02_2.fastq.gz

The default behavior would result in a single entry in the table called "sample", which is incorrect. You can rectify this by providing an appropriate file_ending for your samples.

In this group, the desired sample names are "sample_01" and "sample_02". However, in this example, there is no common text following the sample name. Providing "_" would result in the same behavior as the default. Instead, we can provide two different patterns in the file_ending variable, "_1,_2", to capture all possible options. By doing this, "sample_01" and "sample_02" will appear in our table with the appropriate read files.

To include multiple file endings, please separate them with commas, as shown in the "no elements in common" section.
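The two examples above can be reproduced with a short Python sketch. It assumes that the sample name is cut at the earliest position where any of the comma-separated endings appears, and that the default rule applies when no ending matches; both are assumptions made for illustration, and the function name sample_name is hypothetical. The workflow's real matching logic may differ.

```python
import re


def sample_name(filename, file_ending=None):
    """Derive a sample name from a file name (illustrative sketch).

    file_ending is a comma-separated string such as "_R" or "_1,_2".
    """
    if file_ending:
        endings = [e.strip() for e in file_ending.split(",") if e.strip()]
        # Ignore endings that are absent or that would leave an empty name
        positions = [p for p in (filename.find(e) for e in endings) if p > 0]
        if positions:
            return filename[:min(positions)]
    # Assumed fallback: text before the first period or underscore
    return re.split(r"[._]", filename, maxsplit=1)[0]


with_r = ["sample_01_R1.fastq.gz", "sample_01_R2.fastq.gz",
          "sample_02_R1.fastq.gz", "sample_02_R2.fastq.gz"]
numbered = ["sample_01_1.fastq.gz", "sample_01_2.fastq.gz",
            "sample_02_1.fastq.gz", "sample_02_2.fastq.gz"]

print(sorted({sample_name(f, "_R") for f in with_r}))
print(sorted({sample_name(f, "_1,_2") for f in numbered}))
print(sorted({sample_name(f) for f in numbered}))
```

Under this earliest-match assumption, a name such as sample_10_1.fastq.gz would be cut at the first "_1", so endings should be chosen so that they cannot occur inside the sample name itself.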

Outputs

Your table will automatically appear in your workspace with the following fields:

  • Sample name (under the new_table_name_id column), which will be the section of the file's name before the first period or underscore (unless file_ending is provided)
  • By default:
    • sample01.lane2_flowcell3.fastq.gz will be represented by sample01 in the table
    • sample02_negativecontrol.fastq.gz will be represented by sample02 in the table
  • See "How to determine the appropriate file_ending for your data" above to learn how to change this default behavior
  • Your data in the appropriate columns, dependent on the values of assembly_data and paired_end

    | table columns | assembly_data is true | paired_end is true | assembly_data AND paired_end are false |
    |---|---|---|---|
    | read1 | no | yes | yes |
    | read2 | no | yes | no |
    | assembly_fasta | yes | no | no |
  • The date of upload under the upload_date column

  • The name of the workflow under table_created_by, to indicate the table was made by the Create_Terra_Table_PHB workflow.
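
The way assembly_data and paired_end select the data columns can be summarized in a small Python sketch. It is inferred from the input descriptions rather than taken from the workflow's code. The helper expected_columns is hypothetical, the column order is an assumption, and the ID column is assumed to be the table name followed by _id. Because the case where both flags are true is not described here, the sketch rejects it instead of guessing.

```python
def expected_columns(new_table_name, assembly_data, paired_end):
    """List the columns expected in the new table (illustrative sketch)."""
    if assembly_data and paired_end:
        # Not covered by the documentation, so refuse rather than guess
        raise ValueError("assembly_data and paired_end are both true")
    if assembly_data:
        data_columns = ["assembly_fasta"]   # FASTA assemblies
    elif paired_end:
        data_columns = ["read1", "read2"]   # paired-end FASTQ
    else:
        data_columns = ["read1"]            # single-end FASTQ
    # Assumed order: ID column, data columns, then the two metadata columns
    return [f"{new_table_name}_id", *data_columns,
            "upload_date", "table_created_by"]


print(expected_columns("example_table", assembly_data=False, paired_end=True))
```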