Getting Started with Terra
Our Approach
Theiagen’s approach to genomic analysis in public health typically uses the Terra platform to run workflows that undertake bioinformatic analysis, then uses other platforms for visualization of the resulting data. This is described in more depth in our paper Accelerating bioinformatics implementation in public health, and the application of this approach for genomic surveillance of SARS-CoV-2 in California is described in the paper Pathogen genomics in public health laboratories: successes, challenges, and lessons learned from California’s SARS-CoV-2 Whole-Genome Sequencing Initiative, California COVIDNet.
When undertaking genomic analysis using Terra and other data visualization platforms, it is essential to consider which workflows and resources are necessary and appropriate for your analysis. To help you make these choices, consider how the most commonly used Theiagen workflows relate to one another, and read the descriptions of the major stages in genomic data analysis below.
Data Import to Terra
To start using Terra for data analysis, you will first need to import your data into your workspace. There are multiple ways to do this:
- Using Terra’s native features to upload data from your local computer or link to data that’s already in a Google Cloud Storage bucket
- Data import workflows
  - Using the SRA_Fetch workflow to import publicly available data from any repository in the INSDC (including SRA, ENA, and DRA)
  - Using the Assembly_Fetch workflow to import publicly available genome assemblies from NCBI
  - Using the BaseSpace_Fetch workflow to import data from your Illumina BaseSpace account
  - Using the Create_Terra_Table workflow to help create your data table after manually uploading data to your Terra workspace (or a Google Cloud Storage bucket)
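Terra data tables can be loaded from a tab-separated file whose first column header has the form `entity:<table_name>_id`. If you upload reads manually, the idea of building that file can be sketched in Python. This is a minimal illustration only, not the Create_Terra_Table workflow's implementation. It assumes a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme, hypothetical `read1`/`read2` column names (match these to the inputs of the workflow you plan to run), and a hypothetical bucket path.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def build_terra_tsv(uris, table_name="sample"):
    """Group paired-end read URIs by sample and return Terra load-table TSV text."""
    pairs = defaultdict(dict)
    for uri in uris:
        filename = uri.rsplit("/", 1)[-1]
        match = READ_PATTERN.match(filename)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        pairs[match["sample"]]["read" + match["mate"]] = uri

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Terra expects the first column header to be entity:<table_name>_id
    writer.writerow([f"entity:{table_name}_id", "read1", "read2"])
    for sample in sorted(pairs):
        reads = pairs[sample]
        # Only emit samples where both mates were found
        if "read1" in reads and "read2" in reads:
            writer.writerow([sample, reads["read1"], reads["read2"]])
    return out.getvalue()


uris = [
    "gs://my-bucket/reads/S1_R1.fastq.gz",  # hypothetical bucket path
    "gs://my-bucket/reads/S1_R2.fastq.gz",
    "gs://my-bucket/reads/S2_R1.fastq.gz",  # S2 has no R2, so it is skipped
]
print(build_terra_tsv(uris))
```

Samples missing one mate are left out deliberately, so incomplete uploads show up as absent rows instead of producing a table with empty read columns.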
SOPs for importing data into a Terra workspace
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Uploading Data, Creating Metadata Tables and TSV files, and Importing Workflows | v3 | v1.3.0, v2+ |
Linking BaseSpace and Importing BaseSpace Reads to Terra | v3 | v1.3.0, v2+ |
Genome assembly, QC, and characterization
TheiaX workflows
The TheiaX workflows are used for genome assembly, quality control, and characterization. The TheiaCoV, TheiaProk, and TheiaEuk Workflow Series are intended for viral, bacterial, and fungal pathogens, respectively. The TheiaMeta Workflow Series is intended for the analysis of a single taxon from metagenomic data.
SOPs for the TheiaX workflows
For analyzing SARS-CoV-2
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Analyze SARS-CoV-2 using TheiaCoV_Illumina_PE_PHB | v3 | v2+ |
Analyze SARS-CoV-2 using TheiaCoV_Illumina_SE_PHB | v3 | v2+ |
Analyze SARS-CoV-2 using TheiaCoV_ClearLabs | v3 | v2+ |
Analyze SARS-CoV-2 using TheiaCoV_ONT | v2 | v1.x+ |
Analyzing SARS-CoV-2 using TheiaCoV_FASTA | v2 | v1.x+ |
For analyzing influenza
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Analyzing Flu Data in Terra using TheiaCoV_Illumina_PE and Augur Workflows | v1 | v1.x+ |
Quality evaluation
The TheiaX workflows generate various quality metrics. These should be evaluated against quality thresholds that have been agreed upon within your laboratory or sequencing program and that define the minimum quality a genome and its sequence data must meet to be used. For the TheiaCoV, TheiaProk, and TheiaEuk Workflow Series, this quality evaluation may be undertaken using the optional QC_check task. Full instructions for using this task can be found on the relevant workflow page. Some quality metrics are not evaluated by the QC_check task and should be evaluated manually.
Genomes that fail to meet agreed quality thresholds should not be used. Characterization results for these genomes may be inaccurate or unreliable, and including poor-quality genomes in downstream comparative analyses will bias their results. Samples that fail to meet QC thresholds will need to be re-sequenced, and sample processing may also need to be repeated (e.g., culture-based isolation of clonal bacteria, DNA/RNA extraction, and processing for sequencing).
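The idea of comparing metrics against agreed thresholds can be sketched in Python. This is a minimal illustration, not the QC_check task's implementation or input format. The metric names and threshold values are hypothetical placeholders; substitute the metrics your workflow actually outputs and the thresholds agreed in your own laboratory or program.

```python
# Hypothetical thresholds: (direction, limit). Replace with your program's agreed values.
THRESHOLDS = {
    "percent_reference_coverage": ("min", 90.0),
    "number_N": ("max", 3000),
    "meanbaseq_trim": ("min", 30.0),
}


def evaluate_qc(metrics, thresholds=THRESHOLDS):
    """Return (passed, reasons) for one sample's metrics against min/max thresholds."""
    reasons = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is treated as a failure so it is reviewed manually
            reasons.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            reasons.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            reasons.append(f"{name}: {value} > {limit}")
    return (not reasons, reasons)


passed, reasons = evaluate_qc(
    {"percent_reference_coverage": 71.5, "number_N": 8400}  # illustrative values
)
print(passed, reasons)
```

Recording a reason for each failure, instead of returning a single pass/fail flag, makes it easier to decide whether a sample needs re-sequencing or a repeat of upstream processing.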
Update workflows for SARS-CoV-2 genomes
Workflows are available for updating the Pangolin and VADR assignments made to SARS-CoV-2 genomes. The Pangolin_Update workflow accounts for the delay in assigning names to newly emerging lineages that you may have already sequenced. The VADR_Update workflow similarly accounts for features that have been newly identified in SARS-CoV-2 genomes when assessing genome quality with VADR.
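After re-running lineage assignment, it can help to review which samples changed. The comparison can be sketched in Python as follows. This is a minimal illustration that assumes you have exported the previous and updated lineage calls into dictionaries keyed by sample name. The sample names and lineages are illustrative only, and no Pangolin_Update output column names are implied.

```python
def changed_lineages(previous, updated):
    """Return {sample: (old_call, new_call)} for samples whose lineage call differs.

    Samples present only in `updated` appear with old_call None; samples present
    only in `previous` are ignored.
    """
    changes = {}
    for sample, new_call in updated.items():
        old_call = previous.get(sample)
        if old_call != new_call:
            changes[sample] = (old_call, new_call)
    return dict(sorted(changes.items()))


previous = {"S1": "BA.2", "S2": "XBB.1.5", "S3": "Unassigned"}  # illustrative calls
updated = {"S1": "BA.2.86", "S2": "XBB.1.5", "S3": "JN.1"}
print(changed_lineages(previous, updated))
```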
Phylogenetics
Phylogenetic construction
Phylogenetic trees are constructed to assess the evolutionary relationships between sequences in the tree. These evolutionary relationships are often used as a proxy for epidemiological relationships, and sometimes for inferring transmission between isolation sources.
There are various methods for constructing phylogenetic trees, depending on the sequencing data being used, the organism being analyzed and how it evolved, what you would like to infer from the tree, and the computational resources available for the tree construction. Theiagen has a number of workflows for constructing phylogenetic trees. For full details of these workflows, see the Guide to Phylogenetics, which includes advice on the appropriate tree-building workflows and phylogenetic visualization approaches.
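Many tree-building approaches start from a multiple sequence alignment, and pairwise SNP distances are a common, simple way to quantify how closely sequences are related. The following Python snippet is a minimal sketch of that idea only. It is not a tree-building method and not how Theiagen's workflows are implemented. It assumes the sequences are already aligned, and it treats only `N` and `-` as uncalled, so other IUPAC ambiguity codes would be counted as differences.

```python
from itertools import combinations

# Positions with these characters in either sequence are not compared
UNCALLED = set("N-")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are called and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in UNCALLED and b not in UNCALLED and a != b
    )


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every unordered pair of sequences."""
    names = sorted(alignment)
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(names, 2)
    }


alignment = {  # toy aligned sequences
    "S1": "ACGTACGTAC",
    "S2": "ACGTACGTTC",
    "S3": "ACGAACNTTC",
}
print(distance_matrix(alignment))
```

For these toy sequences, S1 and S2 differ at one position, and S3's `N` is skipped when S3 is compared with the others.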
SOPs for phylogenetic construction
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Analyzing Flu Data in Terra using TheiaCoV_Illumina_PE and Augur Workflows | v1 | v1.x+ |
Analyzing Phylogenetic Relationships in Terra using Theiagen’s Augur Workflows | v1 | v1.x+ |
Phylogenetic placement
Phylogenetic placement is used to place your own sequences onto an existing phylogenetic tree. This may be used to find the closest relatives of your sequence(s). More details, including phylogenetic visualization approaches, can be found in the Guide to Phylogenetics.
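The goal of finding a query's closest relatives can be illustrated with a much simpler stand-in: ranking aligned reference sequences by SNP distance to the query. The following Python snippet is a minimal sketch under that assumption. It is a nearest-neighbour search, not phylogenetic placement, which positions a query on a branch of an existing tree. The query and reference sequences are toy examples and must already share one alignment.

```python
# Positions with these characters in either sequence are not compared
UNCALLED = set("N-")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are called and differ."""
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in UNCALLED and b not in UNCALLED and a != b
    )


def closest_relatives(query, reference_alignment, k=2):
    """Return the k reference sequences with the smallest SNP distance to query."""
    if any(len(seq) != len(query) for seq in reference_alignment.values()):
        raise ValueError("query must be aligned to the reference sequences")
    ranked = sorted(
        (snp_distance(query, seq), name) for name, seq in reference_alignment.items()
    )
    return [(name, dist) for dist, name in ranked[:k]]


references = {"R1": "ACGTACGTAC", "R2": "ACGTACGTTC", "R3": "TCGAACGTTC"}  # toy data
print(closest_relatives("ACGTACGTTA", references, k=2))
```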
Public Data Sharing
SOPs for data submissions
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Submitting SC2 Sequence Data to GISAID using Theiagen’s Terra 2 GISAID Workflow | v2 | v2+ |
SARS-CoV-2 Metagenomic Analysis
SOPs for SARS-CoV-2 metagenomic data analysis
SOP | SOP Version | PHB Version Compatibility |
---|---|---|
Analyzing SARS-CoV-2 Metagenomic Samples using Freyja FASTQ | v2 | v2+ |
Plotting SARS-CoV-2 Metagenomic Sample Data using Freyja Plot | v3 | v2+ |
Creating a Dashboard Visualization of SARS-CoV-2 Metagenomic Samples using Freyja Dashboard | v2 | v2+ |
Creating Static Reference Files for Freyja Analysis in Terra using Freyja Update | v2 | v2+ |