
Getting Started with Terra

Our Approach

Theiagen’s approach to genomic analysis in public health typically uses the Terra platform to run bioinformatic analysis workflows, then uses other platforms to visualize the resulting data. This approach is described in more depth in our paper “Accelerating bioinformatics implementation in public health”, and its application to genomic surveillance of SARS-CoV-2 in California is described in the paper “Pathogen genomics in public health laboratories: successes, challenges, and lessons learned from California’s SARS-CoV-2 Whole-Genome Sequencing Initiative, California COVIDNet”.

When undertaking genomic analysis using Terra and other data visualization platforms, it is essential to choose the workflows and resources appropriate for your analysis. To help you make these choices, review how the most commonly used Theiagen workflows relate to one another, along with the descriptions of the major stages in genomic data analysis below.

Analysis Approaches for Genomic Data

Figure: The relationship between the various PHB workflows

This diagram shows the Theiagen workflows (green boxes) available for analysis of genomic data in public health and the workflows that may be used consecutively (arrows). The blue boxes describe the major functions that these workflows undertake. The yellow boxes show functions that may be undertaken independently of workflows on Terra.

Data Import to Terra

To start using Terra for data analysis, you will first need to import your data into your workspace. There are multiple ways to do this:

  • Using Terra’s native features to upload data from your local computer or link to data that’s already in a Google Cloud Storage bucket
  • Using data import workflows
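Terra data tables can be created by uploading a tab-separated (TSV) load file whose first header cell names the table, for example entity:sample_id for a table called sample. The Python sketch below builds such a file. The function name, the column names (read1, read2) and the gs:// paths are hypothetical placeholders, and the SOP on creating metadata tables and TSV files remains the authoritative reference.

```python
import csv
import io


def build_terra_tsv(table_name, rows, columns):
    """Build a Terra data-table load file as a TSV string (hypothetical helper).

    Each row is a dict with an "id" key plus any of the named columns;
    missing values are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # The first header cell tells Terra which table the rows belong to.
    writer.writerow([f"entity:{table_name}_id", *columns])
    for row in rows:
        writer.writerow([row["id"], *(row.get(column, "") for column in columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample and bucket paths, for illustration only.
    samples = [
        {
            "id": "sample01",
            "read1": "gs://example-bucket/sample01_R1.fastq.gz",
            "read2": "gs://example-bucket/sample01_R2.fastq.gz",
        },
    ]
    print(build_terra_tsv("sample", samples, ["read1", "read2"]), end="")
```

Saved to a .tsv file, the output can then be uploaded through the workspace’s Data tab.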

SOPs for importing data into a Terra workspace

SOP | SOP Version | PHB Version Compatibility
Uploading Data, Creating Metadata Tables and TSV files, and Importing Workflows | v3 | v1.3.0, v2+
Linking BaseSpace and Importing BaseSpace Reads to Terra | v3 | v1.3.0, v2+

Genome assembly, QC, and characterization

TheiaX workflows

The TheiaX workflows are used for genome assembly, quality control, and characterization. The TheiaCoV, TheiaProk, and TheiaEuk Workflow Series are intended for viral, bacterial, and fungal pathogens, respectively. The TheiaMeta Workflow Series is intended for the analysis of a single taxon from metagenomic data.

SOPs for the TheiaX workflows

For analyzing SARS-CoV-2
SOP | SOP Version | PHB Version Compatibility
Analyze SARS-CoV-2 using TheiaCoV_Illumina_PE_PHB | v3 | v2+
Analyze SARS-CoV-2 using TheiaCoV_Illumina_SE_PHB | v3 | v2+
Analyze SARS-CoV-2 using TheiaCoV_ClearLabs | v3 | v2+
Analyze SARS-CoV-2 using TheiaCoV_ONT | v2 | v1.x+
Analyzing SARS-CoV-2 using TheiaCoV_FASTA | v2 | v1.x+
For analyzing influenza
SOP | SOP Version | PHB Version Compatibility
Analyzing Flu Data in Terra using TheiaCoV_Illumina_PE and Augur Workflows | v1 | v1.x+

Quality evaluation

The TheiaX workflows generate various quality metrics. These should be evaluated against quality thresholds that have been agreed upon within your laboratory or sequencing program and that define the minimum quality a genome and its sequence data must meet to be used. For the TheiaCoV, TheiaProk, and TheiaEuk Workflow Series, this quality evaluation may be undertaken using the optional QC_check task; full instructions for using this task can be found on the relevant workflow page. Some quality metrics are not evaluated by the QC_check task and should be evaluated manually.

Genomes that fail to meet agreed quality thresholds should not be used. Results for characterization of these genomes may be inaccurate or unreliable. The inclusion of poor-quality genomes in downstream comparative analyses will bias their results. Samples that fail to meet QC thresholds will need to be re-sequenced and sample processing may need to be repeated (e.g. culture-based isolation of clonal bacteria, DNA/RNA extraction, and processing for sequencing).
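When metrics are evaluated outside the QC_check task (for example, those it does not cover), the comparison against agreed thresholds can be scripted. The Python sketch below is not the QC_check task’s implementation: the function and class names, metric names, threshold values and dictionary inputs are all hypothetical assumptions, standing in for columns exported from your Terra data table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Threshold:
    """A lab-agreed limit for one quality metric (either bound may be omitted)."""
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def evaluate_sample(metrics: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return the reasons a sample fails QC; an empty list means it passed."""
    failures = []
    for threshold in thresholds:
        value = metrics.get(threshold.metric)
        if value is None:
            # A missing metric cannot be assumed to pass; flag it for manual review.
            failures.append(f"{threshold.metric}: missing")
            continue
        if threshold.minimum is not None and value < threshold.minimum:
            failures.append(f"{threshold.metric}: {value} < {threshold.minimum}")
        if threshold.maximum is not None and value > threshold.maximum:
            failures.append(f"{threshold.metric}: {value} > {threshold.maximum}")
    return failures


def partition_samples(
    samples: Dict[str, Dict[str, float]], thresholds: List[Threshold]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split samples into those passing QC and those failing, with reasons."""
    passed: List[str] = []
    failed: Dict[str, List[str]] = {}
    for sample_id, metrics in samples.items():
        reasons = evaluate_sample(metrics, thresholds)
        if reasons:
            failed[sample_id] = reasons
        else:
            passed.append(sample_id)
    return passed, failed


if __name__ == "__main__":
    # Hypothetical thresholds; substitute the values agreed in your program.
    thresholds = [
        Threshold("percent_reference_coverage", minimum=90.0),
        Threshold("number_N", maximum=5000),
    ]
    samples = {
        "sample01": {"percent_reference_coverage": 98.7, "number_N": 210},
        "sample02": {"percent_reference_coverage": 71.2, "number_N": 8800},
    }
    passed, failed = partition_samples(samples, thresholds)
    print("passed:", passed)
    print("failed:", failed)
```

Samples in the failed group would be excluded from downstream analyses and considered for re-sequencing.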

Update workflows for SARS-CoV-2 genomes

Workflows are available for updating the Pangolin and VADR assignments made to SARS-CoV-2 genomes. The Pangolin_Update workflow accounts for the delay in naming newly emerging lineages: genomes you have already sequenced may belong to a lineage that had not yet been named when they were first analyzed. The VADR_Update workflow similarly accounts for features that have been newly identified in SARS-CoV-2 genomes since they were first assessed with VADR.
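After a lineage update, it can be useful to list which genomes were reassigned so that those changes can be reviewed. A minimal Python sketch, assuming you have exported the lineage assignments from before and after the update as sample-to-lineage mappings; the function name, input format and sample names are hypothetical.

```python
from typing import Dict, Tuple


def changed_assignments(
    previous: Dict[str, str], updated: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {sample_id: (old_lineage, new_lineage)} for reassigned samples.

    Samples present in only one of the two inputs are ignored.
    """
    return {
        sample_id: (previous[sample_id], new_lineage)
        for sample_id, new_lineage in updated.items()
        if sample_id in previous and previous[sample_id] != new_lineage
    }


if __name__ == "__main__":
    # Hypothetical exports of the lineage column before and after the update.
    before = {"sample01": "B.1.1.529", "sample02": "XBB.1.5"}
    after = {"sample01": "BA.1", "sample02": "XBB.1.5"}
    for sample_id, (old, new) in changed_assignments(before, after).items():
        print(f"{sample_id}: {old} -> {new}")
```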

Phylogenetics

Phylogenetic construction

Phylogenetic trees are constructed to assess the evolutionary relationships between sequences in the tree. These evolutionary relationships are often used as a proxy for epidemiological relationships, and sometimes for inferring transmission between isolation sources.

There are various methods for constructing phylogenetic trees, depending on the sequencing data being used, the organism being analyzed and how it evolves, what you would like to infer from the tree, and the computational resources available for tree construction. Theiagen has a number of workflows for constructing phylogenetic trees. For full details of these workflows, please see the Guide to Phylogenetics, which includes advice on the appropriate tree-building workflows and phylogenetic visualization approaches.
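Distance-based tree-building methods, such as neighbor joining, start from a matrix of pairwise distances between aligned sequences. As an illustration only, and not the method used by any particular Theiagen workflow, the Python sketch below computes pairwise SNP distances from an alignment held in memory. It assumes the sequences are already aligned to the same length and, as a simplifying assumption, skips positions where either sequence has an N or a gap; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict

AMBIGUOUS = set("N-")  # assumption: ignore Ns and alignment gaps


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, skipping N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Return a symmetric matrix of pairwise SNP distances keyed by sequence name."""
    matrix = {name: {name: 0} for name in alignment}
    for name_a, name_b in combinations(alignment, 2):
        distance = snp_distance(alignment[name_a], alignment[name_b])
        matrix[name_a][name_b] = distance
        matrix[name_b][name_a] = distance
    return matrix


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    alignment = {
        "sample01": "ACGTACGT",
        "sample02": "ACGTACGA",
        "sample03": "ACNTACTA",
    }
    for name, row in distance_matrix(alignment).items():
        print(name, row)
```

Neighbor joining would then build a tree by iteratively joining pairs of taxa chosen from this matrix; model-based methods such as maximum likelihood instead work from the alignment directly.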

Phylogenetic placement

Phylogenetic placement is used to place your own sequences onto an existing phylogenetic tree. This may be used to find the closest relatives of your sequence(s). More details, including phylogenetic visualization approaches, can be found in the Guide to Phylogenetics.
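Phylogenetic placement tools position a sequence on an existing tree using the tree itself. As a much cruder sanity check, and explicitly not phylogenetic placement, you can rank reference sequences by SNP distance to a query. The Python sketch below does this for sequences assumed to be pre-aligned to the same length; the function names and toy data are hypothetical.

```python
from typing import Dict, List, Tuple

AMBIGUOUS = set("N-")  # assumption: ignore Ns and alignment gaps


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, skipping N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def closest_relatives(
    query: str, references: Dict[str, str], top_n: int = 3
) -> List[Tuple[str, int]]:
    """Return up to top_n (reference_name, distance) pairs, nearest first.

    Ties in distance are broken alphabetically by reference name.
    """
    scored = sorted(
        (snp_distance(query, sequence), name) for name, sequence in references.items()
    )
    return [(name, distance) for distance, name in scored[:top_n]]


if __name__ == "__main__":
    # Toy reference set, for illustration only.
    references = {"ref01": "ACGTACGT", "ref02": "ACGTACGA", "ref03": "TCGTACGA"}
    print(closest_relatives("ACGTACGA", references, top_n=2))
```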

Public Data Sharing

SOPs for data submissions

SOP | SOP Version | PHB Version Compatibility
Submitting SC2 Sequence Data to GISAID using Theiagen’s Terra 2 GISAID Workflow | v2 | v2+

SARS-CoV-2 Metagenomic Analysis