Freyja Workflow Series

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level | Dockstore |
|---|---|---|---|---|---|
| Genomic Characterization | SARS-CoV-2, Viral | v3.1.0 | Yes | Sample-level, Set-level | Freyja_FASTQ_PHB, Freyja_Plot_PHB, Freyja_Dashboard_PHB, Freyja_Update_PHB |

Freyja Overview

Freyja is a tool for analysing viral mixed sample genomic sequencing data. Developed by Joshua Levy from the Andersen Lab, it performs two main steps:

  1. Variant Frequency Estimation: Freyja calculates the frequencies of single nucleotide variants (SNVs) in the genomic sequencing data.
  2. Depth-Weighted Demixing: It separates mixed populations of viral subtypes using a depth-weighted statistical approach, estimating the proportional abundance of each subtype in the sample based on the frequencies of subtype-defining variants.
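
The demixing idea in step 2 can be illustrated with a toy sketch. This is not Freyja's solver: it handles only two lineages, uses a simple grid search, and assumes log-scaled depth weights purely for illustration. The function name, the barcodes, and all numbers are hypothetical.

```python
import math


def demix_two_lineages(barcode_a, barcode_b, freqs, depths, steps=1000):
    """Toy demixing: find the fraction of lineage A (0..1) that best explains
    the observed SNV frequencies, weighting each site by log(depth + 1).

    barcode_a / barcode_b: 0/1 flags saying whether each lineage carries the SNV.
    freqs: observed SNV frequency at each site.
    depths: sequencing depth at each site.
    """
    weights = [math.log(d + 1) for d in depths]  # assumed weighting scheme
    best_frac, best_cost = 0.0, float("inf")
    for i in range(steps + 1):
        frac = i / steps
        cost = 0.0
        for a, b, f, w in zip(barcode_a, barcode_b, freqs, weights):
            # Expected SNV frequency if the sample is frac*A + (1-frac)*B
            expected = frac * a + (1 - frac) * b
            cost += w * abs(f - expected)
        if cost < best_cost:
            best_frac, best_cost = frac, cost
    return best_frac


# Hypothetical sample: two SNVs specific to lineage A, two specific to lineage B.
fraction_a = demix_two_lineages(
    barcode_a=[1, 1, 0, 0],
    barcode_b=[0, 0, 1, 1],
    freqs=[0.7, 0.7, 0.3, 0.3],
    depths=[100, 200, 150, 50],
)
print(f"Estimated abundance of lineage A: {fraction_a:.2f}")
```

Sites with higher depth contribute more to the cost, so noisy low-coverage positions have less influence on the estimate.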

Additional post-processing steps can produce visualizations of aggregated samples.

Wastewater and more

The typical use case of Freyja is to analyze mixed SARS-CoV-2 samples from a sequencing dataset, most often wastewater, but the tool is not limited to this context. With the appropriate reference genomes and barcode files, Freyja can be adapted for other pathogens, including MPXV, Influenza, RSV, and Measles.

Default Values

The defaults included in the Freyja workflows reflect this use case but can be adjusted for other pathogens. See the Running Freyja on other pathogens section for more information. Please be aware this is an experimental feature and we cannot guarantee complete functionality at this time.

**Figure 1: Workflow diagram for Freyja Suite of workflows.**

Depending on the type of data (Illumina or Oxford Nanopore), the Read QC and Filtering steps and the Read Alignment steps use different software. The user can specify whether the barcode and lineage files should be updated with freyja update before running Freyja, and whether bootstrapping should be performed with freyja boot.

Four workflows have been created that perform different parts of Freyja:

The main workflow is Freyja_FASTQ_PHB (Figure 1). Depending on the type of input data (Illumina paired-end, Illumina single-end or ONT), it runs various QC modules before aligning the sample with either BWA (Illumina) or minimap2 (ONT) to the provided reference file, followed by iVar for primer trimming. After the preprocessing is completed, Freyja is run to generate relative lineage abundances (demix) from the sample. Optional bootstrapping may be performed.
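
As a rough sketch of the final Freyja steps described above, the following builds the command lines for variant calling, demixing, and optional bootstrapping from an aligned, primer-trimmed BAM. The helper name and output file names are hypothetical, and the exact arguments the workflow passes may differ; the commands are only constructed here, not executed.

```python
def build_freyja_commands(bam, reference, sample, bootstrap=False,
                          number_bootstraps=100, eps=0.001):
    """Return the Freyja command lines (as argument lists) for one sample."""
    variants = f"{sample}_variants.tsv"  # hypothetical output names
    depths = f"{sample}_depths.tsv"
    commands = [
        # Step 1: SNV frequencies and per-site depth from the BAM
        ["freyja", "variants", bam,
         "--variants", variants, "--depths", depths, "--ref", reference],
        # Step 2: relative lineage abundances
        ["freyja", "demix", variants, depths,
         "--output", f"{sample}_demixed.tsv", "--eps", str(eps)],
    ]
    if bootstrap:
        # Optional: bootstrap the abundance estimates
        commands.append(
            ["freyja", "boot", variants, depths,
             "--nb", str(number_bootstraps),
             "--output_base", f"{sample}_boot"]
        )
    return commands


for cmd in build_freyja_commands("sample.bam", "reference.fasta", "sample",
                                 bootstrap=True):
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run in an environment where Freyja is installed.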

Data Compatibility

The Freyja_FASTQ_PHB workflow is compatible with the following input data types:

- Illumina Single-End
- Illumina Paired-End
- Oxford Nanopore

Freyja_Update_PHB will copy the SARS-CoV-2 reference files that can then be used as input for the Freyja_FASTQ_PHB workflow.

Two options are available to visualize the Freyja results: Freyja_Plot_PHB and Freyja_Dashboard_PHB. Freyja_Plot_PHB aggregates multiple samples using output from Freyja_FASTQ_PHB to generate a plot that shows fractional abundance estimates for all samples, with the option to plot sample collection date information. Alternatively, Freyja_Dashboard_PHB aggregates multiple samples using output from Freyja_FASTQ_PHB to generate an interactive visualization. This workflow requires an additional input field called viral load, which is the number of viral copies per liter.

Freyja, Sequencing Platforms and Data Quality

The choice of sequencing platform and the quality of the data directly influence Freyja's performance. High-accuracy platforms such as Illumina provide reliable SNV detection, which improves the precision of lineage abundance estimates. Platforms with higher error rates, such as Oxford Nanopore, have improved greatly in recent years but may still introduce uncertainty in variant calling, which affects the deconvolution process. Sequencing depth requirements increase as the quality of the sequencing data decreases. A reasonable target depth is 100X coverage for sequencing data with Q-scores in the range of 25-30.
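
To make the Q-score range concrete, the standard Phred relationship Q = -10 log10(P) can be used to estimate how many erroneous base calls to expect at a site. The sketch below is a back-of-the-envelope calculation that assumes every read has the same quality and that errors are independent; the function names are illustrative.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors_per_site(q, depth):
    """Expected number of erroneous base calls at one site for a given depth."""
    return depth * phred_to_error_prob(q)


for q in (25, 30):
    p = phred_to_error_prob(q)
    errors = expected_errors_per_site(q, 100)
    print(f"Q{q}: error probability {p:.4f}, "
          f"about {errors:.2f} erroneous calls per site at 100X")
```

Under these assumptions, a true variant present at a few percent frequency still stands out from sequencing noise at 100X, while lower-quality data would need more depth to reach the same separation.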

Additionally, inadequate sequencing depth can hinder Freyja's ability to differentiate between lineages, leading to potential misestimations. Sequencing depth requirements will increase with the complexity of the sample composition and the diversity of lineages present. For samples containing multiple closely related lineages, higher sequencing depth is necessary to resolve subtle differences in genetic variation and accurately estimate lineage abundances. This is particularly important for pathogens with high mutation rates or a large number of cocirculating lineages, such as influenza, where distinguishing between lineages relies on detecting specific single nucleotide variants (SNVs) with high confidence.

Freyja Workflows

Freyja_Update_PHB

This workflow will copy the SARS-CoV-2 reference files (curated_lineages.json and usher_barcodes.feather) from the source repository to a user-specified Google Cloud Storage (GCS) location (often a Terra.bio workspace-associated bucket). These files can then be used as input for the Freyja_FASTQ_PHB workflow.

Warning

This workflow is compatible only with SARS-CoV-2 reference files! To download reference files for other organisms please see the following repository: Freyja Barcodes.

More information is available in the Running Freyja on other pathogens section.

Inputs

We recommend running this workflow with "Run inputs defined by file paths" selected since no information from a Terra data table is actually being used. We also recommend turning off call caching so new information is retrieved every time.

Terra Task Name Variable Type Description Default Value Terra Status
freyja_update gcp_uri String The path where you want the Freyja reference files to be stored. Include gs:// at the beginning of the string. Full example with a Terra workspace bucket: "gs://fc-87ddd67a-c674-45a8-9651-f91e3d2f6bb7" Required
freyja_update_refs cpu Int Number of CPUs to allocate to the task 1 Optional
freyja_update_refs disk_size Int Amount of storage (in GB) to allocate to the task 25 Optional
freyja_update_refs docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja_update_refs memory Int Amount of memory/RAM (in GB) to allocate to the task 10 Optional
transfer_files cpu Int Number of CPUs to allocate to the task 1 Optional
transfer_files disk_size Int Amount of storage (in GB) to allocate to the task 25 Optional
transfer_files docker String Docker image to use for the task us-docker.pkg.dev/general-theiagen/cloudsdktool/google-cloud-cli:427.0.0-alpine Optional
transfer_files memory Int Amount of memory (in GB) to allocate to the task 2 Optional
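
Since only gcp_uri is required, a file-path-based inputs JSON for this workflow can be very small. The sketch below assumes the usual WDL convention of keying inputs as workflow_name.input_name and takes the workflow name from the Terra Task Name column above; both the key and the helper name are assumptions, not confirmed against the workflow source.

```python
import json


def make_update_inputs(gcp_uri):
    """Build a minimal inputs dictionary for Freyja_Update_PHB.

    Raises ValueError if the destination does not start with gs://.
    """
    if not gcp_uri.startswith("gs://"):
        raise ValueError("gcp_uri must start with gs://")
    # Assumed key format: <workflow name>.<input name>
    return {"freyja_update.gcp_uri": gcp_uri}


# Bucket path taken from the example in the table above
inputs = make_update_inputs("gs://fc-87ddd67a-c674-45a8-9651-f91e3d2f6bb7")
print(json.dumps(inputs, indent=2))
```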

Outputs

This workflow does not produce any outputs that appear in a Terra data table. The reference files will appear at the location specified with the gcp_uri input variable.

Freyja_FASTQ_PHB

Freyja measures SNV frequency and sequencing depth at each position in the genome to return an estimate of the true lineage abundances in the sample. The method uses lineage-defining "barcodes" that, for SARS-CoV-2, are derived from the UShER global phylogenetic tree as a base set for demixing. Freyja_FASTQ_PHB returns as output a TSV file that includes the lineages present and their corresponding abundances, along with other values.
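
The output TSV can be read with a few lines of code. The sketch below assumes the layout commonly produced by freyja demix: a header line followed by key/value rows, where the lineages and abundances rows hold space-separated values in matching order. The sample content is made up for illustration, and the layout should be checked against real output before relying on it.

```python
def parse_demix(text):
    """Return {lineage: abundance} from the text of a demix-style TSV."""
    rows = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if key and sep:  # skips the header line, whose first field is empty
            rows[key] = value
    lineages = rows["lineages"].split()
    abundances = [float(x) for x in rows["abundances"].split()]
    return dict(zip(lineages, abundances))


# Made-up example content, not real results
EXAMPLE_TSV = (
    "\tsample_variants.tsv\n"
    "summarized\t[('Omicron', 0.8), ('Delta', 0.2)]\n"
    "lineages\tBA.2 B.1.617.2\n"
    "abundances\t0.8 0.2\n"
    "resid\t1.5\n"
    "coverage\t98.0\n"
)

for lineage, abundance in parse_demix(EXAMPLE_TSV).items():
    print(f"{lineage}: {abundance:.1%}")
```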

The Freyja_FASTQ_PHB workflow is compatible with multiple input data types: Illumina single-end, Illumina paired-end, and Oxford Nanopore. Depending on the type of input data, different input values are used.

Table 1: Freyja_FASTQ_PHB input configuration for different types of input data.

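| Table Columns | Illumina Paired-End | Illumina Single-End | Oxford Nanopore |
|---|---|---|---|
| read1 | provided | provided | provided |
| read2 | provided | not provided | not provided |
| ont | false | false | true |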
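
The input configuration can be summarized as a small decision function. This is a sketch of the selection logic implied by the table and the workflow description (BWA for Illumina, minimap2 for ONT), not the workflow's actual code; the function name, the mode labels, and the choice to reject read2 for ONT data are assumptions.

```python
def input_mode(read1, read2=None, ont=False):
    """Return (data type, aligner) for a given combination of inputs."""
    if not read1:
        raise ValueError("read1 is required for every data type")
    if ont:
        if read2:
            # Assumption: read2 has no meaning for ONT data
            raise ValueError("read2 should not be provided when ont is true")
        return ("ont", "minimap2")
    if read2:
        return ("illumina_pe", "bwa")
    return ("illumina_se", "bwa")


print(input_mode("sample_R1.fastq.gz", "sample_R2.fastq.gz"))
print(input_mode("sample.fastq.gz"))
print(input_mode("sample_ont.fastq.gz", ont=True))
```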

Inputs

This workflow runs on the sample level.

Terra Task Name Variable Type Description Default Value Terra Status
freyja_fastq read1 File FASTQ file containing read1 sequences (Illumina or ONT) Required
freyja_fastq reference_genome File The reference genome to use; should match the reference used for alignment (Wuhan-Hu-1) Required
freyja_fastq samplename String The name of the sample being analyzed Required
freyja_fastq freyja_lineage_metadata File File containing the lineage metadata; the "curated_lineages.json" file found at https://github.com/andersen-lab/Freyja/tree/main/freyja/data can be used for this variable. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional, Required
bwa cpu Int Number of CPUs to allocate to the task 6 Optional
bwa disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
bwa docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan Optional
bwa memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
freyja adapt Float adaptive lasso penalty parameter 0 Optional
freyja bootstrap Boolean Perform bootstrapping FALSE Optional
freyja confirmed_only Boolean Include only confirmed SARS-CoV-2 lineages FALSE Optional
freyja cpu Int Number of CPUs to allocate to the task 2 Optional
freyja disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
freyja docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja eps Float The minimum lineage abundance cut-off value 0.001 Optional
freyja memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
freyja number_bootstraps Int The number of bootstraps to perform (only used if bootstrap = true) 100 Optional
freyja update_db Boolean Updates the Freyja reference files (the usher barcodes and lineage metadata files) but will not save them as output (use Freyja_Update for that purpose). If set to true, the freyja_lineage_metadata and freyja_barcodes files are not required. FALSE Optional
freyja_fastq depth_cutoff Int The minimum coverage depth with which to exclude sites below this value and group identical barcodes -- THIS MAY NOT WORK FOR NON-SARS-COV-2 ORGANISMS! 10 Optional
freyja_fastq freyja_barcodes File Custom barcode file. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional
freyja_fastq freyja_pathogen String Pathogen to be used by Freyja; used if not providing the barcodes and lineage metadata files. Options: SARS-CoV-2, MPXV, H5NX, H1N1pdm, FLU-B-VIC, MEASLESN450, MEASLES, RSVa, RSVb SARS-CoV-2 Optional
freyja_fastq kraken2_target_organism String The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. Severe acute respiratory syndrome coronavirus 2 Optional
freyja_fastq ont Boolean Indicates if the input data is derived from an ONT instrument. FALSE Optional
freyja_fastq primer_bed File The bed file containing the primers used when sequencing was performed Optional
freyja_fastq read2 File Illumina reverse read file in FASTQ file format (compression optional) Optional
freyja_fastq reference_gff File The GFF file for reference; should match the reference used for alignment (Wuhan-Hu-1) Optional
freyja_fastq trimmomatic_min_length Int The minimum length cut-off when performing read cleaning 25 Optional
get_fasta_genome_size cpu Int Number of CPUs to allocate to the task 1 Optional
get_fasta_genome_size disk_size Int Amount of storage (in GB) to allocate to the task 10 Optional
get_fasta_genome_size docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/seqkit:2.4.0--h9ee0642_0 Optional
get_fasta_genome_size memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
minimap2 cpu Int Number of CPUs to allocate to the task 2 Optional
minimap2 disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
minimap2 docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22 Optional
minimap2 memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
minimap2 query2 File Internal component, do not modify Optional
nanoplot_clean cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_clean disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_clean docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_clean max_length Int The maximum length of clean reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_clean memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
nanoplot_raw cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_raw disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_raw docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_raw max_length Int The maximum length of clean reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_raw memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
primer_trim cpu Int Number of CPUs to allocate to the task 2 Optional
primer_trim disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
primer_trim docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan Optional
primer_trim keep_noprimer_reads Boolean Include reads with no primers TRUE Optional
primer_trim memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont artic_guppyplex_cpu Int Number of CPUs to allocate to the task 8 Optional
read_QC_trim_ont artic_guppyplex_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont artic_guppyplex_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 Optional
read_QC_trim_ont artic_guppyplex_memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
read_QC_trim_ont call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_ont downsampling_coverage Float Internal component, do not modify 150 Optional
read_QC_trim_ont genome_length Int Internal component, do not modify Optional
read_QC_trim_ont genome_length Int Length of the genome 5000000 Optional
read_QC_trim_ont kraken2_recalculate_abundances_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_ont kraken2_recalculate_abundances_docker String Internal component, do not modify us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_memory Int Internal component, do not modify 8 Optional
read_QC_trim_ont kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_ont kraken_db File Internal component, do not modify Optional
read_QC_trim_ont kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_ont kraken_docker_image String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db Optional
read_QC_trim_ont kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_ont max_length Int Internal component, do not modify Optional
read_QC_trim_ont min_length Int Internal component, do not modify Optional
read_QC_trim_ont nanoq_cpu Int Number of CPUs to allocate to the task 2 Optional
read_QC_trim_ont nanoq_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont nanoq_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1 Optional
read_QC_trim_ont nanoq_max_read_length Int Maximum read length to use for filtering Optional
read_QC_trim_ont nanoq_max_read_qual Int Maximum read quality to use for filtering Optional
read_QC_trim_ont nanoq_memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
read_QC_trim_ont nanoq_min_read_length Int Minimum read length to use for filtering Optional
read_QC_trim_ont nanoq_min_read_qual Int Minimum read quality to use for filtering Optional
read_QC_trim_ont ncbi_scrub_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont ncbi_scrub_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont ncbi_scrub_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 Optional
read_QC_trim_ont ncbi_scrub_memory Int Amount of memory/RAM (in GB) to allocate to the task 4 Optional
read_QC_trim_ont rasusa_bases String Explicitly set the number of bases required e.g., 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, --coverage and --genome-size are ignored Optional
read_QC_trim_ont rasusa_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont rasusa_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont rasusa_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/rasusa:2.1.0 Optional
read_QC_trim_ont rasusa_fraction_of_reads Float Subsample to a fraction of the reads - e.g., 0.5 samples half the reads Optional
read_QC_trim_ont rasusa_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont rasusa_number_of_reads Int Subsample to a specific number of reads Optional
read_QC_trim_ont rasusa_seed Int Random seed to use Optional
read_QC_trim_ont run_prefix String Internal component, do not modify Optional
read_QC_trim_ont target_organism String Internal component, do not modify Optional
read_QC_trim_pe adapters File A FASTA file containing adapter sequences Optional
read_QC_trim_pe bbduk_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_pe call_kraken Boolean True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. FALSE Optional
read_QC_trim_pe call_midas Boolean True/False variable that determines if the MIDAS task should be called. FALSE Optional
read_QC_trim_pe extract_unclassified Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe fastp_args String Additional arguments to use with fastp --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_pe host String Internal component, do not modify Optional
read_QC_trim_pe host_complete_only Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_decontaminate_mem Int Internal component, do not modify 32 Optional
read_QC_trim_pe host_is_accession Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_refseq Boolean Internal component, do not modify TRUE Optional
read_QC_trim_pe kraken_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_pe kraken_db File A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Optional
read_QC_trim_pe kraken_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_pe kraken_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_pe midas_db File Database to use with MIDAS. Not required as one will be auto-selected when running the MIDAS task. Optional
read_QC_trim_pe phix File The file containing the phix sequence to be used during bbduk task Optional
read_QC_trim_pe read_processing String Options: "trimmomatic" or "fastp" to indicate which read trimming module to use trimmomatic Optional
read_QC_trim_pe read_qc String Allows the user to decide between fastq_scan (default) and fastqc for the evaluation of read quality. fastq_scan Optional
read_QC_trim_pe target_organism String The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. Optional
read_QC_trim_pe taxon_id Int Internal component, do not modify 0 Optional
read_QC_trim_pe trim_quality_min_score Int The minimum quality score to keep during trimming 30 Optional
read_QC_trim_pe trim_quality_trim_score Int The minimum quality score to keep during trimming 30 Optional
read_QC_trim_pe trim_window_size Int The window size to use during trimming 4 Optional
read_QC_trim_pe trimmomatic_args String Additional command-line arguments to use with trimmomatic Optional
read_QC_trim_se adapters File Internal component, do not modify Optional
read_QC_trim_se bbduk_memory Int Internal component, do not modify 8 Optional
read_QC_trim_se call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_se call_midas Boolean Internal component, do not modify FALSE Optional
read_QC_trim_se fastp_args String Internal component, do not modify --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_se kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_se kraken_db File Internal component, do not modify Optional
read_QC_trim_se kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_se kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_se midas_db File Internal component, do not modify Optional
read_QC_trim_se phix File Internal component, do not modify Optional
read_QC_trim_se read_processing String Internal component, do not modify trimmomatic Optional
read_QC_trim_se read_qc String Internal component, do not modify fastq_scan Optional
read_QC_trim_se target_organism String Internal component, do not modify Optional
read_QC_trim_se trim_quality_min_score Int Internal component, do not modify 30 Optional
read_QC_trim_se trim_window_size Int Internal component, do not modify 4 Optional
read_QC_trim_se trimmomatic_args String Internal component, do not modify Optional
sam_to_sorted_bam cpu Int Number of CPUs to allocate to the task 2 Optional
sam_to_sorted_bam disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
sam_to_sorted_bam docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 Optional
sam_to_sorted_bam memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
sam_to_sorted_bam min_qual Int Minimum quality score for reads to be included in the analysis Optional
version_capture docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional
Terra Task Name Variable Type Description Default Value Terra Status
freyja_fastq read1 File FASTQ file containing read1 sequences (Illumina or ONT) Required
freyja_fastq reference_genome File The reference genome to use; should match the reference used for alignment (Wuhan-Hu-1) Required
freyja_fastq samplename String The name of the sample being analyzed Required
freyja_fastq freyja_lineage_metadata File File containing the lineage metadata; the "curated_lineages.json" file found at https://github.com/andersen-lab/Freyja/tree/main/freyja/data can be used for this variable. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional, Required
bwa cpu Int Number of CPUs to allocate to the task 6 Optional
bwa disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
bwa docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan Optional
bwa memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
freyja adapt Float adaptive lasso penalty parameter 0 Optional
freyja bootstrap Boolean Perform bootstrapping FALSE Optional
freyja confirmed_only Boolean Include only confirmed SARS-CoV-2 lineages FALSE Optional
freyja cpu Int Number of CPUs to allocate to the task 2 Optional
freyja disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
freyja docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja eps Float The minimum lineage abundance cut-off value 0.001 Optional
freyja memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
freyja number_bootstraps Int The number of bootstraps to perform (only used if bootstrap = true) 100 Optional
freyja update_db Boolean Updates the Freyja reference files (the usher barcodes and lineage metadata files) but will not save them as output (use Freyja_Update for that purpose). If set to true, the freyja_lineage_metadata and freyja_barcodes files are not required. FALSE Optional
freyja_fastq depth_cutoff Int The minimum coverage depth with which to exclude sites below this value and group identical barcodes -- THIS MAY NOT WORK FOR NON-SARS-COV-2 ORGANISMS! 10 Optional
freyja_fastq freyja_barcodes File Custom barcode file. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional
freyja_fastq freyja_pathogen String Pathogen to be used by Freyja; used if not providing the barcodes and lineage metadata files. Options: SARS-CoV-2, MPXV, H5NX, H1N1pdm, FLU-B-VIC, MEASLESN450, MEASLES, RSVa, RSVb SARS-CoV-2 Optional
freyja_fastq kraken2_target_organism String The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. Severe acute respiratory syndrome coronavirus 2 Optional
freyja_fastq ont Boolean Indicates if the input data is derived from an ONT instrument. FALSE Optional
freyja_fastq primer_bed File The bed file containing the primers used when sequencing was performed Optional
freyja_fastq read2 File Illumina reverse read file in FASTQ file format (compression optional) Optional
freyja_fastq reference_gff File The GFF file for reference; should match the reference used for alignment (Wuhan-Hu-1) Optional
freyja_fastq trimmomatic_min_length Int The minimum length cut-off when performing read cleaning 25 Optional
get_fasta_genome_size cpu Int Number of CPUs to allocate to the task 1 Optional
get_fasta_genome_size disk_size Int Amount of storage (in GB) to allocate to the task 10 Optional
get_fasta_genome_size docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/seqkit:2.4.0--h9ee0642_0 Optional
get_fasta_genome_size memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
minimap2 cpu Int Number of CPUs to allocate to the task 2 Optional
minimap2 disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
minimap2 docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22 Optional
minimap2 memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
minimap2 query2 File Internal component, do not modify Optional
nanoplot_clean cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_clean disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_clean docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_clean max_length Int The maximum length of clean reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_clean memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
nanoplot_raw cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_raw disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_raw docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_raw max_length Int The maximum length of clean reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_raw memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
primer_trim cpu Int Number of CPUs to allocate to the task 2 Optional
primer_trim disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
primer_trim docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan Optional
primer_trim keep_noprimer_reads Boolean Include reads with no primers TRUE Optional
primer_trim memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont artic_guppyplex_cpu Int Number of CPUs to allocate to the task 8 Optional
read_QC_trim_ont artic_guppyplex_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont artic_guppyplex_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 Optional
read_QC_trim_ont artic_guppyplex_memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
read_QC_trim_ont call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_ont downsampling_coverage Float Internal component, do not modify 150 Optional
read_QC_trim_ont genome_length Int Internal component, do not modify Optional
read_QC_trim_ont genome_length Int Length of the genome 5000000 Optional
read_QC_trim_ont kraken2_recalculate_abundances_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_ont kraken2_recalculate_abundances_docker String Internal component, do not modify us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_memory Int Internal component, do not modify 8 Optional
read_QC_trim_ont kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_ont kraken_db File Internal component, do not modify Optional
read_QC_trim_ont kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_ont kraken_docker_image String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db Optional
read_QC_trim_ont kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_ont max_length Int Internal component, do not modify Optional
read_QC_trim_ont min_length Int Internal component, do not modify Optional
read_QC_trim_ont nanoq_cpu Int Number of CPUs to allocate to the task 2 Optional
read_QC_trim_ont nanoq_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont nanoq_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1 Optional
read_QC_trim_ont nanoq_max_read_length Int Maximum read length to use for filtering Optional
read_QC_trim_ont nanoq_max_read_qual Int Maximum read quality to use for filtering Optional
read_QC_trim_ont nanoq_memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
read_QC_trim_ont nanoq_min_read_length Int Minimum read length to use for filtering Optional
read_QC_trim_ont nanoq_min_read_qual Int Minimum read quality to use for filtering Optional
read_QC_trim_ont ncbi_scrub_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont ncbi_scrub_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont ncbi_scrub_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 Optional
read_QC_trim_ont ncbi_scrub_memory Int Amount of memory/RAM (in GB) to allocate to the task 4 Optional
read_QC_trim_ont rasusa_bases String Explicitly set the number of bases required e.g., 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, --coverage and --genome-size are ignored Optional
read_QC_trim_ont rasusa_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont rasusa_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont rasusa_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/rasusa:2.1.0 Optional
read_QC_trim_ont rasusa_fraction_of_reads Float Subsample to a fraction of the reads - e.g., 0.5 samples half the reads Optional
read_QC_trim_ont rasusa_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont rasusa_number_of_reads Int Subsample to a specific number of reads Optional
read_QC_trim_ont rasusa_seed Int Random seed to use Optional
read_QC_trim_ont run_prefix String Internal component, do not modify Optional
read_QC_trim_ont target_organism String Internal component, do not modify Optional
read_QC_trim_pe adapters File Internal component, do not modify Optional
read_QC_trim_pe bbduk_memory Int Internal component, do not modify 8 Optional
read_QC_trim_pe call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe call_midas Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe extract_unclassified Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe fastp_args String Internal component, do not modify --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_pe host String Internal component, do not modify Optional
read_QC_trim_pe host_complete_only Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_decontaminate_mem Int Internal component, do not modify 32 Optional
read_QC_trim_pe host_is_accession Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_refseq Boolean Internal component, do not modify TRUE Optional
read_QC_trim_pe kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_pe kraken_db File Internal component, do not modify Optional
read_QC_trim_pe kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_pe kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_pe midas_db File Internal component, do not modify Optional
read_QC_trim_pe phix File Internal component, do not modify Optional
read_QC_trim_pe read_processing String Internal component, do not modify trimmomatic Optional
read_QC_trim_pe read_qc String Internal component, do not modify fastq_scan Optional
read_QC_trim_pe target_organism String Internal component, do not modify Optional
read_QC_trim_pe taxon_id Int Internal component, do not modify 0 Optional
read_QC_trim_pe trim_quality_min_score Int The minimum quality score to keep during trimming 30 Optional
read_QC_trim_pe trim_quality_trim_score Int Internal component, do not modify 30 Optional
read_QC_trim_pe trim_window_size Int Internal component, do not modify 4 Optional
read_QC_trim_pe trimmomatic_args String Internal component, do not modify Optional
read_QC_trim_se adapters File A FASTA file containing adapter sequences Optional
read_QC_trim_se bbduk_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_se call_kraken Boolean True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. FALSE Optional
read_QC_trim_se call_midas Boolean True/False variable that determines if the MIDAS task should be called. FALSE Optional
read_QC_trim_se fastp_args String Additional arguments to use with fastp --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_se kraken_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_se kraken_db File A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Optional
read_QC_trim_se kraken_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_se kraken_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_se midas_db File Database to use with MIDAS. Not required as one will be auto-selected when running the MIDAS task. Optional
read_QC_trim_se phix File The file containing the PhiX sequence to be used during the bbduk task Optional
read_QC_trim_se read_processing String Options: "trimmomatic" or "fastp" to indicate which read trimming module to use trimmomatic Optional
read_QC_trim_se read_qc String Allows the user to decide between fastq_scan (default) and fastqc for the evaluation of read quality. fastq_scan Optional
read_QC_trim_se target_organism String The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. Optional
read_QC_trim_se trim_quality_min_score Int The minimum quality score to keep during trimming 30 Optional
read_QC_trim_se trim_window_size Int The window size to use during trimming 4 Optional
read_QC_trim_se trimmomatic_args String Additional command-line arguments to use with trimmomatic Optional
sam_to_sorted_bam cpu Int Number of CPUs to allocate to the task 2 Optional
sam_to_sorted_bam disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
sam_to_sorted_bam docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 Optional
sam_to_sorted_bam memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
sam_to_sorted_bam min_qual Int Minimum quality score for reads to be included in the analysis Optional
version_capture docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional
Terra Task Name Variable Type Description Default Value Terra Status
freyja_fastq read1 File FASTQ file containing read1 sequences (Illumina or ONT) Required
freyja_fastq reference_genome File The reference genome to use; should match the reference used for alignment (Wuhan-Hu-1) Required
freyja_fastq samplename String The name of the sample being analyzed Required
freyja_fastq freyja_lineage_metadata File File containing the lineage metadata; the "curated_lineages.json" file found at https://github.com/andersen-lab/Freyja/tree/main/freyja/data can be used for this variable. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional, Required
bwa cpu Int Number of CPUs to allocate to the task 6 Optional
bwa disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
bwa docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan Optional
bwa memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
freyja adapt Float Adaptive lasso penalty parameter 0 Optional
freyja bootstrap Boolean Perform bootstrapping FALSE Optional
freyja confirmed_only Boolean Include only confirmed SARS-CoV-2 lineages FALSE Optional
freyja cpu Int Number of CPUs to allocate to the task 2 Optional
freyja disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
freyja docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja eps Float The minimum lineage abundance cut-off value 0.001 Optional
freyja memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
freyja number_bootstraps Int The number of bootstraps to perform (only used if bootstrap = true) 100 Optional
freyja update_db Boolean Updates the Freyja reference files (the usher barcodes and lineage metadata files) but will not save them as output (use Freyja_Update for that purpose). If set to true, the freyja_lineage_metadata and freyja_barcodes files are not required. FALSE Optional
freyja_fastq depth_cutoff Int The minimum coverage depth; sites below this value are excluded and identical barcodes are grouped -- THIS MAY NOT WORK FOR NON-SARS-COV-2 ORGANISMS! 10 Optional
freyja_fastq freyja_barcodes File Custom barcode file. Does not need to be provided if update_db is true or if freyja_pathogen is provided. Optional
freyja_fastq freyja_pathogen String Pathogen of interest, used if not providing the barcodes and lineage metadata files. Options: SARS-CoV-2, MPXV, H5NX, H1N1pdm, FLU-B-VIC, MEASLESN450, MEASLES, RSVa, RSVb SARS-CoV-2 Optional
freyja_fastq kraken2_target_organism String The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. Severe acute respiratory syndrome coronavirus 2 Optional
freyja_fastq ont Boolean Indicates if the input data is derived from an ONT instrument. FALSE Optional
freyja_fastq primer_bed File The bed file containing the primers used when sequencing was performed Optional
freyja_fastq read2 File Illumina reverse read file in FASTQ file format (compression optional) Optional
freyja_fastq reference_gff File The GFF file for reference; should match the reference used for alignment (Wuhan-Hu-1) Optional
freyja_fastq trimmomatic_min_length Int The minimum length cut-off when performing read cleaning 25 Optional
get_fasta_genome_size cpu Int Number of CPUs to allocate to the task 1 Optional
get_fasta_genome_size disk_size Int Amount of storage (in GB) to allocate to the task 10 Optional
get_fasta_genome_size docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/seqkit:2.4.0--h9ee0642_0 Optional
get_fasta_genome_size memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
minimap2 cpu Int Number of CPUs to allocate to the task 2 Optional
minimap2 disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
minimap2 docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22 Optional
minimap2 memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
minimap2 query2 File Internal component, do not modify Optional
nanoplot_clean cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_clean disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_clean docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_clean max_length Int The maximum length of clean reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_clean memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
nanoplot_raw cpu Int Number of CPUs to allocate to the task 4 Optional
nanoplot_raw disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
nanoplot_raw docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 Optional
nanoplot_raw max_length Int The maximum length of raw reads, for which reads longer than the length specified will be hidden. 100000 Optional
nanoplot_raw memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
primer_trim cpu Int Internal component, do not modify Optional
primer_trim disk_size Int Internal component, do not modify Optional
primer_trim docker String Internal component, do not modify Optional
primer_trim keep_noprimer_reads Boolean Internal component, do not modify Optional
primer_trim memory Int Internal component, do not modify Optional
read_QC_trim_ont artic_guppyplex_cpu Int Number of CPUs to allocate to the task 8 Optional
read_QC_trim_ont artic_guppyplex_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont artic_guppyplex_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 Optional
read_QC_trim_ont artic_guppyplex_memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
read_QC_trim_ont call_kraken Boolean True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. FALSE Optional
read_QC_trim_ont downsampling_coverage Float Internal component, do not modify 150 Optional
read_QC_trim_ont genome_length Int Internal component, do not modify Optional
read_QC_trim_ont genome_length Int Length of the genome 5000000 Optional
read_QC_trim_ont kraken2_recalculate_abundances_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont kraken2_recalculate_abundances_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4 Optional
read_QC_trim_ont kraken2_recalculate_abundances_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont kraken_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont kraken_db File A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Optional
read_QC_trim_ont kraken_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont kraken_docker_image String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db Optional
read_QC_trim_ont kraken_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont max_length Int Internal component, do not modify Optional
read_QC_trim_ont min_length Int Internal component, do not modify Optional
read_QC_trim_ont nanoq_cpu Int Number of CPUs to allocate to the task 2 Optional
read_QC_trim_ont nanoq_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont nanoq_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1 Optional
read_QC_trim_ont nanoq_max_read_length Int Maximum read length to use for filtering Optional
read_QC_trim_ont nanoq_max_read_qual Int Maximum read quality to use for filtering Optional
read_QC_trim_ont nanoq_memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
read_QC_trim_ont nanoq_min_read_length Int Minimum read length to use for filtering Optional
read_QC_trim_ont nanoq_min_read_qual Int Minimum read quality to use for filtering Optional
read_QC_trim_ont ncbi_scrub_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont ncbi_scrub_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont ncbi_scrub_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 Optional
read_QC_trim_ont ncbi_scrub_memory Int Amount of memory/RAM (in GB) to allocate to the task 4 Optional
read_QC_trim_ont rasusa_bases String Explicitly set the number of bases required e.g., 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, --coverage and --genome-size are ignored Optional
read_QC_trim_ont rasusa_cpu Int Number of CPUs to allocate to the task 4 Optional
read_QC_trim_ont rasusa_disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
read_QC_trim_ont rasusa_docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/rasusa:2.1.0 Optional
read_QC_trim_ont rasusa_fraction_of_reads Float Subsample to a fraction of the reads - e.g., 0.5 samples half the reads Optional
read_QC_trim_ont rasusa_memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
read_QC_trim_ont rasusa_number_of_reads Int Subsample to a specific number of reads Optional
read_QC_trim_ont rasusa_seed Int Random seed to use Optional
read_QC_trim_ont run_prefix String Internal component, do not modify Optional
read_QC_trim_ont target_organism String This string is searched for in the kraken2 outputs to extract the read percentage Optional
read_QC_trim_pe adapters File Internal component, do not modify Optional
read_QC_trim_pe bbduk_memory Int Internal component, do not modify 8 Optional
read_QC_trim_pe call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe call_midas Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe extract_unclassified Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe fastp_args String Internal component, do not modify --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_pe host String Internal component, do not modify Optional
read_QC_trim_pe host_complete_only Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_decontaminate_mem Int Internal component, do not modify 32 Optional
read_QC_trim_pe host_is_accession Boolean Internal component, do not modify FALSE Optional
read_QC_trim_pe host_refseq Boolean Internal component, do not modify TRUE Optional
read_QC_trim_pe kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_pe kraken_db File Internal component, do not modify Optional
read_QC_trim_pe kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_pe kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_pe midas_db File Internal component, do not modify Optional
read_QC_trim_pe phix File Internal component, do not modify Optional
read_QC_trim_pe read_processing String Internal component, do not modify trimmomatic Optional
read_QC_trim_pe read_qc String Internal component, do not modify fastq_scan Optional
read_QC_trim_pe target_organism String Internal component, do not modify Optional
read_QC_trim_pe taxon_id Int Internal component, do not modify 0 Optional
read_QC_trim_pe trim_quality_min_score Int The minimum quality score to keep during trimming 30 Optional
read_QC_trim_pe trim_quality_trim_score Int Internal component, do not modify 30 Optional
read_QC_trim_pe trim_window_size Int Internal component, do not modify 4 Optional
read_QC_trim_pe trimmomatic_args String Internal component, do not modify Optional
read_QC_trim_se adapters File Internal component, do not modify Optional
read_QC_trim_se bbduk_memory Int Internal component, do not modify 8 Optional
read_QC_trim_se call_kraken Boolean Internal component, do not modify FALSE Optional
read_QC_trim_se call_midas Boolean Internal component, do not modify FALSE Optional
read_QC_trim_se fastp_args String Internal component, do not modify --detect_adapter_for_pe -g -5 20 -3 20 Optional
read_QC_trim_se kraken_cpu Int Internal component, do not modify 4 Optional
read_QC_trim_se kraken_db File Internal component, do not modify Optional
read_QC_trim_se kraken_disk_size Int Internal component, do not modify 100 Optional
read_QC_trim_se kraken_memory Int Internal component, do not modify 8 Optional
read_QC_trim_se midas_db File Internal component, do not modify Optional
read_QC_trim_se phix File Internal component, do not modify Optional
read_QC_trim_se read_processing String Internal component, do not modify trimmomatic Optional
read_QC_trim_se read_qc String Internal component, do not modify fastq_scan Optional
read_QC_trim_se target_organism String Internal component, do not modify Optional
read_QC_trim_se trim_quality_min_score Int Internal component, do not modify 30 Optional
read_QC_trim_se trim_window_size Int Internal component, do not modify 4 Optional
read_QC_trim_se trimmomatic_args String Internal component, do not modify Optional
sam_to_sorted_bam cpu Int Number of CPUs to allocate to the task 2 Optional
sam_to_sorted_bam disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
sam_to_sorted_bam docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 Optional
sam_to_sorted_bam memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
sam_to_sorted_bam min_qual Int Minimum quality score for reads to be included in the analysis Optional
version_capture docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional
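
As a rough illustration of how the workflow-level inputs listed above might be supplied on the command line, the sketch below assembles an inputs JSON in Python. It assumes the common WDL convention of keying each input as `<workflow name>.<variable>` (for example `freyja_fastq.read1`); the helper name `build_inputs`, the sample name, and all file paths are hypothetical placeholders, and the exact keys should be checked against the workflow's own WDL before use.

```python
import json


def build_inputs(samplename, read1, reference_genome,
                 read2=None, ont=False, pathogen="SARS-CoV-2"):
    """Assemble a workflow-level inputs dictionary (hypothetical helper).

    Key names assume the "<workflow>.<variable>" convention and the
    variable names shown in the inputs table.
    """
    inputs = {
        "freyja_fastq.samplename": samplename,
        "freyja_fastq.read1": read1,
        "freyja_fastq.reference_genome": reference_genome,
        "freyja_fastq.freyja_pathogen": pathogen,
        "freyja_fastq.ont": ont,
    }
    # read2 is optional and only meaningful for paired-end Illumina data.
    if read2 is not None:
        inputs["freyja_fastq.read2"] = read2
    return inputs


if __name__ == "__main__":
    # Placeholder paths; replace with real files.
    example = build_inputs(
        "sample01",
        "sample01_R1.fastq.gz",
        "reference.fasta",
        read2="sample01_R2.fastq.gz",
    )
    print(json.dumps(example, indent=2))
```

Inputs that are not set fall back to the defaults in the table, so only the required variables and any deliberate overrides need to appear in the file.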

Analysis Tasks

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

read_QC_trim is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below. The differences between the PE and SE versions of the read_QC_trim sub-workflow lie in the default parameters, the use of two or one input read file(s), and the different output files.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool. It uses a k-mer database built from all human RefSeq records (Eukaryota), from which any k-mers also found in non-Eukaryota RefSeq records have been subtracted.
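
The database construction described above can be pictured as a set difference. The following is a toy sketch only: the function names, the k-mer length, and the sequences are illustrative assumptions, and the real tool uses a far larger database and its own matching rules.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_human_specific_db(human_seqs, non_eukaryote_seqs, k):
    """Human k-mers minus any k-mer also seen in non-eukaryote sequences."""
    human = set().union(*(kmers(s, k) for s in human_seqs))
    other = set().union(*(kmers(s, k) for s in non_eukaryote_seqs))
    return human - other


def looks_human(read, db, k):
    """Flag a read if any of its k-mers is in the human-specific database."""
    return any(kmer in db for kmer in kmers(read, k))


if __name__ == "__main__":
    # Toy sequences and a tiny k, chosen only to keep the example readable.
    db = build_human_specific_db(["ACGTACGTTT"], ["ACGTAC"], k=4)
    print(sorted(db))
    print(looks_human("GGCGTTGG", db, k=4))
```

Subtracting the non-eukaryote k-mers is what keeps reads from the pathogen of interest from being flagged merely because they share short sequences with the human records.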

NCBI-Scrub Technical Details

Links
Task task_ncbi_scrub.wdl
Software Source Code HRRT on GitHub
Software Documentation HRRT on NCBI
Read quality trimming

read_processing with "trimmomatic" (default) or "fastp"

Either trimmomatic or fastp can be used for read-quality trimming. Trimmomatic is used by default.

To activate fastp, set the read_processing input parameter to "fastp".

These tasks are mutually exclusive.

Trimmomatic: Read Trimming

Trimmomatic trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size), cutting once the average quality within the window falls below the trim_quality_trim_score (default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen (default of 75 for paired-end, 25 for single-end).
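
The sliding-window rule can be sketched in a few lines of Python. This is a simplified illustration rather than Trimmomatic's actual implementation: the function name is hypothetical, the read is cut at the start of the first failing window, and the default values simply mirror the single-end figures quoted above.

```python
def sliding_window_trim(seq, quals, window_size=4,
                        min_mean_quality=30, min_length=25):
    """Trim a read at the first window whose mean quality is too low.

    seq is the base string and quals the per-base Phred scores.
    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read
    is shorter than min_length and is therefore discarded.
    """
    cut = len(seq)  # keep the whole read unless a window fails
    for start in range(len(seq) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_mean_quality:
            cut = start
            break
    if cut < min_length:
        return None
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    scores = [40] * 6 + [2] * 4
    # Small thresholds so the toy read survives trimming.
    print(sliding_window_trim(read, scores, min_mean_quality=20, min_length=4))
```

A larger window smooths over isolated low-quality bases, while a higher quality threshold cuts reads earlier and leaves fewer of them above the minimum length.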

Trimmomatic Technical Details

Links
Task task_trimmomatic.wdl
Software Source Code Trimmomatic on GitHub
Software Documentation Trimmomatic Website
Original Publication(s) Trimmomatic: a flexible trimmer for Illumina sequence data
fastp: Read Trimming

fastp trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size), cutting once the average quality within the window falls below the trim_quality_trim_score (default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen (default of 75 for paired-end, 25 for single-end).

fastp also has additional default parameters and features that are not a part of trimmomatic's default configuration.

fastp default read-trimming parameters
Parameter Explanation
-g enables polyG tail trimming
-5 20 enables read end-trimming from the 5' end (front of the read)
-3 20 enables read end-trimming from the 3' end (tail of the read)
--detect_adapter_for_pe enables adapter-trimming only for paired-end reads

Additional arguments can be passed using the fastp_args optional parameter.

fastp Technical Details

Links
Task task_fastp.wdl
Software Source Code fastp on GitHub
Software Documentation fastp on GitHub
Original Publication(s) fastp: an ultra-fast all-in-one FASTQ preprocessor
BBDuk: Adapter Trimming and PhiX Removal

Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.

The bbduk task removes adapters from sequence reads. To do this:

  • Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files (it re-pairs).
  • BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.
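
The two steps above can be sketched as follows. This is a conceptual illustration under stated assumptions, not the BBTools implementation: the function names are hypothetical, reads are plain `(name, sequence)` tuples, reverse-complement matches are ignored, and a pair is dropped when either mate matches the reference.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repair(forward, reverse):
    """Re-pair reads by name, following the order of the forward file.

    forward and reverse are lists of (name, sequence) tuples. Reads
    without a mate are left out of the paired output.
    """
    reverse_by_name = dict(reverse)
    pairs = []
    for name, fwd_seq in forward:
        if name in reverse_by_name:
            pairs.append((name, fwd_seq, reverse_by_name[name]))
    return pairs


def remove_reference_hits(pairs, reference, k=31):
    """Drop pairs in which either mate shares a k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    kept = []
    for name, fwd_seq, rev_seq in pairs:
        if kmers(fwd_seq, k) & ref_kmers or kmers(rev_seq, k) & ref_kmers:
            continue  # at least one exact k-mer match: treat as contaminant
        kept.append((name, fwd_seq, rev_seq))
    return kept


if __name__ == "__main__":
    fwd = [("r1", "AAAAAA"), ("r2", "AACGTT"), ("r4", "ACACAC")]
    rev = [("r2", "TTTTTT"), ("r1", "TTTTTT")]
    paired = repair(fwd, rev)
    # k=4 and a short stand-in "reference" keep the toy example small.
    print(remove_reference_hits(paired, "ACGTTGCA", k=4))
```

Re-pairing comes first because the filtering step treats the two files as synchronized: the n-th record in one file is assumed to be the mate of the n-th record in the other.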

BBDuk Technical Details

Links
Task task_bbduk.wdl
Software Source Code BBMap on SourceForge
Software Documentation BBDuk Guide (archived)
Read Quantification

read_qc with "fastq_scan" (default) or "fastqc"

Either fastq-scan or fastqc can be used for read quantification. fastq-scan is used by default.

To activate fastqc, set the read_qc input parameter to "fastqc".

These tasks are mutually exclusive.

fastq-scan: Read Quantification

fastq-scan quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
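
The raw-versus-clean comparison can be reproduced by hand with a minimal read counter. The sketch below is not fastq-scan itself: the function names are hypothetical, and it assumes well-formed, uncompressed FASTQ with exactly four lines per record.

```python
import io


def count_reads(handle):
    """Count FASTQ records in a text stream (four lines per record)."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of four")
    return n_lines // 4


def summarize(raw_handle, clean_handle):
    """Compare read counts before and after QC."""
    raw = count_reads(raw_handle)
    clean = count_reads(clean_handle)
    return {"raw": raw, "clean": clean, "removed": raw - clean}


if __name__ == "__main__":
    raw_fastq = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n"
    clean_fastq = "@r1\nACGT\n+\nIIII\n"
    print(summarize(io.StringIO(raw_fastq), io.StringIO(clean_fastq)))
```

A negative `removed` value would mean the clean file holds more reads than the raw file, which usually points to mismatched inputs rather than a QC effect.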

fastq-scan Technical Details

Links
Task task_fastq_scan.wdl
Software Source Code fastq-scan on GitHub
Software Documentation fastq-scan on GitHub
FastQC: Read Quantification

FastQC quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

This tool also provides a graphical visualization of the read quality.

FastQC Technical Details

Links
Task task_fastqc.wdl
Software Source Code FastQC on GitHub
Software Documentation FastQC Website

read_QC_trim Technical Details

Links
Subworkflow wf_read_QC_trim_pe.wdl
wf_read_QC_trim_se.wdl
bwa Details

This task aligns the cleaned short reads (Illumina) to the reference genome provided by the user.

BWA Technical Details

Links
Task task_bwa.wdl
Software Source Code BWA on GitHub
Software Documentation BWA Documentation
Original Publication(s) Fast and accurate short read alignment with Burrows-Wheeler transform
primer_trim Details

This task trims the primer sequences from the aligned bam file with iVar. The optional input, keep_noprimer_reads, does not have to be modified.

Primer Trim Technical Details

Links
Task task_ivar_primer_trim.wdl
Software Source Code https://github.com/andersen-lab/ivar
Software Documentation https://andersen-lab.github.io/ivar/html/manualpage.html
Original Publication(s) An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

read_QC_trim is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below. The differences between the PE and SE versions of the read_QC_trim sub-workflow lie in the default parameters, the use of two or one input read file(s), and the different output files.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool and employs a k-mer database constructed of k-mers from Eukaryota derived from all human RefSeq records with any k-mers found in non-Eukaryota RefSeq records subtracted from the database.

NCBI-Scrub Technical Details

Links
Task task_ncbi_scrub.wdl
Software Source Code HRRT on GitHub
Software Documentation HRRT on NCBI
Read quality trimming

read_processing with "trimmomatic" (default) or "fastp"

Either trimmomatic or fastp can be used for read-quality trimming. Trimmomatic is used by default.

To activate fastp, set the read_processing input parameter to "fastp".

These tasks are mutually exclusive.

Trimmomatic: Read Trimming

Trimmomatic trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size), cutting once the average quality within the window falls below the trim_quality_trim_score (default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen (default of 75 for paired-end, 25 for single-end).

Trimmomatic Technical Details

Links
Task task_trimmomatic.wdl
Software Source Code Trimmomatic on GitHub
Software Documentation Trimmomatic Website
Original Publication(s) Trimmomatic: a flexible trimmer for Illumina sequence data
fastp: Read Trimming

fastp trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size), cutting once the average quality within the window falls below the trim_quality_trim_score (default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen (default of 75 for paired-end, 25 for single-end).

fastp also has additional default parameters and features that are not a part of trimmomatic's default configuration.

fastp default read-trimming parameters
Parameter Explanation
-g enables polyG tail trimming
-5 20 enables read end-trimming
-3 20 enables read end-trimming
--detect_adapter_for_pe enables adapter-trimming only for paired-end reads

Additional arguments can be passed using the fastp_args optional parameter.

Trimmomatic and fastp Technical Details

Links
Task task_fastp.wdl
Software Source Code fastp on GitHub
Software Documentation fastp on GitHub
Original Publication(s) fastp: an ultra-fast all-in-one FASTQ preprocessor
BBDuk: Adapter Trimming and PhiX Removal

Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.

The bbduk task removes adapters from sequence reads. To do this:

  • Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files (it re-pairs).
  • BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.

BBDuk Technical Details

Links
Task task_bbduk.wdl
Software Source Code BBMap on SourceForge
Software Documentation BBDuk Guide (archived)
Read Quantification

read_qc with "fastq-scan" (default) or "fastqc"

Either fastq-scan or fastqc can be used for read quantification. fastq-scan is used by default.

To activate fastqc, set the read_qc input parameter to "fastqc".

These tasks are mutually exclusive.

fastq-scan: Read Quantification

fastq-scan quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provide the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

fastq-scan Technical Details

Links
Task task_fastq_scan.wdl
Software Source Code fastq-scan on GitHub
Software Documentation fastq-scan on GitHub
FastQC: Read Quantification

FastQC quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provide the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

This tool also provides a graphical visualization of the read quality.

FastQC Technical Details

Links
Task task_fastqc.wdl
Software Source Code FastQC on GitHub
Software Documentation FastQC Website

read_QC_trim Technical Details

Links
Subworkflow wf_read_QC_trim_pe.wdl
wf_read_QC_trim_se.wdl
bwa Details

This task aligns the cleaned short reads (Illumina) to the reference genome provided by the user.

BWA Technical Details

Links
Task task_bwa.wdl
Software Source Code BWA on GitHub
Software Documentation BWA Documentation
Original Publication(s) Fast and accurate short read alignment with Burrows-Wheeler transform
primer_trim Details

This task trims the primer sequences from the aligned bam file with iVar. The optional input, keep_noprimer_reads, does not have to be modified.

Primer Trim Technical Details

Links
Task task_ivar_primer_trim.wdl
Software Source Code https://github.com/andersen-lab/ivar
Software Documentation https://andersen-lab.github.io/ivar/html/manualpage.html
Original Publication(s) An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
read_QC_trim_ont: Read Quality Trimming, Quantification, and Identification

read_QC_trim_ont is a sub-workflow that filters low-quality reads and trims low-quality regions of reads. It uses several tasks, described below.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool and employs a k-mer database constructed of k-mers from Eukaryota derived from all human RefSeq records with any k-mers found in non-Eukaryota RefSeq records subtracted from the database.

NCBI-Scrub Technical Details

Links
Task task_ncbi_scrub.wdl
Software Source Code HRRT on GitHub
Software Documentation HRRT on NCBI
artic_guppyplex: Read Filtering

Reads are filtered by length with artic_guppyplex, which is a part of the ARTIC protocol. Since TheiaCoV was developed primarily for amplicon-based viral sequencing, this task is included to remove chimeric reads that are either too short or too long.
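The length filter itself is simple to picture. The sketch below is a minimal Python illustration of the idea, not the artic_guppyplex code; the function name and the thresholds in the example are hypothetical and would depend on the amplicon scheme in use.

```python
def filter_by_length(reads, min_length, max_length):
    """Keep (name, sequence) pairs whose sequence length is within bounds.

    Reads shorter than min_length or longer than max_length are dropped,
    since for amplicon data they are unlikely to be single, complete amplicons.
    """
    return [
        (name, seq)
        for name, seq in reads
        if min_length <= len(seq) <= max_length
    ]


# Illustrative thresholds only; choose values that match your amplicon sizes.
example_reads = [
    ("too_short", "A" * 150),
    ("expected", "A" * 450),
    ("too_long", "A" * 900),
]
kept = filter_by_length(example_reads, min_length=400, max_length=700)
```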

artic_guppyplex Technical Details

Links
Task task_artic_guppyplex.wdl
Software Source Code ARTIC on GitHub
Software Documentation ARTIC Documentation
Kraken2: Read Identification

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.

Kraken2 is run on both the raw and clean reads.

Database-dependent

This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz.

Kraken2 Technical Details

Links
Task task_kraken2.wdl
Software Source Code Kraken2 on GitHub
Software Documentation Kraken2 Documentation
Original Publication(s) Improved metagenomic analysis with Kraken 2
NanoPlot: Read Quantification

NanoPlot is used for the determination of mean quality scores, read lengths, and number of reads. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

While this task currently is run outside of the read_QC_trim_ont workflow, it is being included here as it calculates statistics on the read data. This is done so that the actual assembly genome lengths can be used (if an estimated genome length is not provided by the user) to ensure the estimated coverage statistics are accurate.
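For readers unfamiliar with the read-length statistics reported here, the following Python sketch shows how the mean, median, and N50 can be computed from a list of read lengths. It is an illustration under simple assumptions, not NanoPlot's code, and the function names are hypothetical.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    total_bases = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return length


def summarize_lengths(lengths):
    """Collect a few of the read-length statistics into a dictionary."""
    return {
        "num_reads": len(lengths),
        "mean_readlength": mean(lengths),
        "median_readlength": median(lengths),
        "n50": n50(lengths),
    }


stats = summarize_lengths([2, 3, 4, 5, 6])
```

In the toy input, the total is 20 bases; the two longest reads (6 and 5) already cover 11 bases, which is more than half, so the N50 is 5.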

NanoPlot Technical Details

Links
Task task_nanoplot.wdl
Software Source Code NanoPlot on GitHub
Software Documentation NanoPlot Documentation
Original Publication(s) NanoPack2: population-scale evaluation of long-read sequencing data

read_QC_trim_ont Technical Details

Links
Subworkflow wf_read_QC_trim_ont.wdl
minimap2: Read Alignment Details

minimap2 is a popular aligner that is used to align reads (or assemblies) to an assembly file. In minimap2, "modes" are a group of preset options.

The mode used in this task is map-ont, which is the default mode for long reads and indicates that long reads with error rates of ~10% should be aligned to the reference genome. The output file is in SAM format.

For more information regarding modes and the available options for minimap2, please see the minimap2 manpage.
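As a hedged illustration of what this mode looks like on the command line, the Python sketch below assembles a minimap2 invocation with the map-ont preset and SAM output. The helper name and the file names are hypothetical, and the actual task may pass additional options; see task_minimap2.wdl for the authoritative command.

```python
import shlex


def minimap2_command(reference, reads, threads=2):
    """Build a minimap2 command line using the map-ont preset.

    -a        write alignments in SAM format
    -x PRESET apply a preset group of options (here: map-ont)
    -t INT    number of threads
    """
    return [
        "minimap2",
        "-a",
        "-x", "map-ont",
        "-t", str(threads),
        reference,
        reads,
    ]


# Hypothetical file names, used only to show the resulting command string.
command = minimap2_command("reference.fasta", "reads_clean.fastq.gz")
command_line = shlex.join(command)
```

The list form can be passed to `subprocess.run(command, stdout=sam_handle, check=True)` if minimap2 is installed, with the SAM output captured from standard output.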

minimap2 Technical Details

Links
Task task_minimap2.wdl
Software Source Code minimap2 on GitHub
Software Documentation minimap2
Original Publication(s) Minimap2: pairwise alignment for nucleotide sequences
freyja Details

The Freyja task will call variants and capture sequencing depth information to identify the relative abundance of lineages present. Optionally, if bootstrap is set to true, bootstrapping will be performed. After the optional bootstrapping step, the variants are demixed.

Freyja Technical Details

Links
Task task_freyja_one_sample.wdl
Software Source Code https://github.com/andersen-lab/Freyja
Software Documentation https://andersen-lab.github.io/Freyja/index.html#

Outputs

The main output file used in subsequent Freyja workflows is found under the freyja_demixed column. This TSV file takes on the following format:

sample name
summarized [('Delta', 0.65), ('Other', 0.25), ('Alpha', 0.1)]
lineages ['B.1.617.2' 'B.1.2' 'AY.6' 'Q.3']
abundances "[0.5 0.25 0.15 0.1]"
resid 3.14159
coverage 95.8
  • The summarized array gives the sum of all lineage abundances within each WHO designation (e.g., the B.1.617.2 and AY.6 abundances are summed under "Delta" in the above example); lineages without a WHO designation are grouped into "Other".
  • The lineages array lists the identified lineages in descending order of abundance.
  • The abundances array contains the corresponding abundance estimates.
  • The value of resid corresponds to the residual of the weighted least absolute deviation problem used to estimate lineage abundances.
  • The coverage value provides the 10x coverage estimate (percent of sites covered by 10 or more reads).
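The format described above can be read programmatically. The following Python sketch parses a freyja_demixed file into a dictionary; it assumes tab-separated key/value rows laid out as in the example, tolerates optional brackets and quotes around the arrays, and uses hypothetical function names. It is not part of the workflow.

```python
import ast


def _tokens(value):
    """Split a whitespace-separated array, tolerating optional surrounding
    double quotes, square brackets, and per-item single quotes."""
    return value.strip('"').strip("[]").replace("'", "").split()


def parse_demixed(text):
    """Parse the key/value rows of a freyja_demixed TSV into a dictionary."""
    rows = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        key, value = line.split("\t", 1)
        # The first row has an empty key and holds the sample name.
        rows[key.strip() or "sample"] = value.strip()
    lineages = _tokens(rows["lineages"])
    abundances = [float(x) for x in _tokens(rows["abundances"])]
    return {
        "sample": rows.get("sample"),
        "summarized": dict(ast.literal_eval(rows["summarized"])),
        "by_lineage": dict(zip(lineages, abundances)),
        "resid": float(rows["resid"]),
        "coverage": float(rows["coverage"]),
    }


# The example from this section, written with explicit tab separators.
example = "\n".join([
    "\tsample name",
    "summarized\t[('Delta', 0.65), ('Other', 0.25), ('Alpha', 0.1)]",
    "lineages\t['B.1.617.2' 'B.1.2' 'AY.6' 'Q.3']",
    "abundances\t\"[0.5 0.25 0.15 0.1]\"",
    "resid\t3.14159",
    "coverage\t95.8",
])
result = parse_demixed(example)

# The Delta total is the sum of its member lineages in this example.
delta_total = result["by_lineage"]["B.1.617.2"] + result["by_lineage"]["AY.6"]
```

Pairing each lineage with its abundance makes the relationship to the summarized row explicit: here B.1.617.2 (0.5) and AY.6 (0.15) together account for the 0.65 reported for Delta.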

Click "Ignore empty outputs"

When running the Freyja_FASTQ_PHB workflow, it is recommended to select the "Ignore empty outputs" option in the Terra UI. This will hide the output columns that will not be generated for your input data type.

Variable Type Description
aligned_bai String Index companion file to the bam file generated during the consensus assembly process
aligned_bam String Sorted BAM file containing the alignments of reads to the reference genome
alignment_method String The method used to generate the alignment
bbduk_docker String The Docker image for bbduk, which was used to remove the adapters from the sequences
bwa_version String Version of BWA software used
fastp_html_report String The HTML report made with fastp
fastp_version String The version of fastp used
fastq_scan_clean1_json String The JSON file output from fastq-scan containing summary stats about clean forward read quality and length
fastq_scan_clean2_json File The JSON file output from fastq-scan containing summary stats about clean reverse read quality and length
fastq_scan_num_reads_clean1 String The number of forward reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_clean2 Int The number of reverse reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_clean_pairs String The number of read pairs after cleaning as calculated by fastq_scan
fastq_scan_num_reads_raw1 String The number of input forward reads as calculated by fastq_scan
fastq_scan_num_reads_raw2 Int The number of input reverse reads as calculated by fastq_scan
fastq_scan_num_reads_raw_pairs String The number of input read pairs as calculated by fastq_scan
fastq_scan_raw1_json String The JSON file output from fastq-scan containing summary stats about raw forward read quality and length
fastq_scan_raw2_json File The JSON file output from fastq-scan containing summary stats about raw reverse read quality and length
fastq_scan_version String The version of fastq_scan
fastqc_clean1_html String An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser
fastqc_clean2_html File An HTML file that provides a graphical visualization of clean reverse read quality from fastqc to open in an internet browser
fastqc_docker String The Docker container used for fastqc
fastqc_num_reads_clean1 String The number of forward reads after cleaning by fastqc
fastqc_num_reads_clean2 Int The number of reverse reads after cleaning by fastqc
fastqc_num_reads_clean_pairs String The number of read pairs after cleaning by fastqc
fastqc_num_reads_raw1 String The number of input forward reads by fastqc before cleaning
fastqc_num_reads_raw2 Int The number of input reverse reads by fastqc before cleaning
fastqc_num_reads_raw_pairs String The number of input read pairs by fastqc before cleaning
fastqc_raw1_html String An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser
fastqc_raw2_html File An HTML file that provides a graphical visualization of raw reverse read quality from fastqc to open in an internet browser
fastqc_version String Version of fastqc software used
freyja_abundances String Abundance estimates identified by Freyja and parsed from freyja_demixed file
freyja_barcode_file String Barcode file used with Freyja
freyja_barcode_version String Name of barcode file used, or the date if update_db is true
freyja_bootstrap_lineages String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each lineage
freyja_bootstrap_lineages_pdf String A boxplot of the bootstrap lineages CSV file
freyja_bootstrap_summary String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each WHO designated VOI/VOC
freyja_bootstrap_summary_pdf String A boxplot of the bootstrap summary CSV file
freyja_coverage Float Coverage identified by Freyja and parsed from freyja_demixed file
freyja_demixed File The main output TSV; see the section directly above this table for an explanation
freyja_demixed_parsed File Parsed freyja_demixed file, containing the same information, for easy result concatenation
freyja_depths File A TSV listing the depth of every position
freyja_fastq_wf_analysis_date String Date of analysis
freyja_fastq_wf_version String The version of the Public Health Bioinformatics (PHB) repository used
freyja_lineage_metadata_file String Metadata file for lineages identified by Freyja
freyja_lineages String Lineages in descending order identified by Freyja and parsed from freyja_demixed file
freyja_metadata_version String Name of lineage metadata file used, or the date if update_db is true
freyja_resid String Residual of the weighted least absolute deviation problem used to estimate lineage abundances identified by Freyja and parsed from freyja_demixed file
freyja_summarized String Sum of all lineage abundances in a particular WHO designation identified by Freyja and parsed from freyja_demixed file
freyja_variants File The TSV file containing the variants identified by Freyja
freyja_version String Version of Freyja used
ivar_version_primtrim String Version of iVar for running the iVar trim command
kraken_human Float Percent of human read data detected using the Kraken2 software
kraken_human_dehosted Float Percent of human read data detected using the Kraken2 software after host removal
kraken_report String Full Kraken report
kraken_report_dehosted File Full Kraken report after host removal
kraken_sc2 String Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted String Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_version String Version of Kraken software used
primer_bed_name String Name of the primer bed files used for primer trimming
primer_trimmed_read_percent Float Percentage of read data with primers trimmed as determined by iVar trim
read1_clean File Forward read file after quality trimming and adapter removal
read1_dehosted File The dehosted forward reads file; suggested read file for SRA submission
read2_clean File Reverse read file after quality trimming and adapter removal
read2_dehosted File The dehosted reverse reads file; suggested read file for SRA submission
samtools_version String The version of SAMtools used to sort and index the alignment file
samtools_version_primtrim String The version of SAMtools used to create the pileup before running iVar trim
trimmomatic_docker String The docker image used for the trimmomatic module in this workflow
trimmomatic_version String The version of Trimmomatic used
Variable Type Description
aligned_bai String Index companion file to the bam file generated during the consensus assembly process
aligned_bam String Sorted BAM file containing the alignments of reads to the reference genome
alignment_method String The method used to generate the alignment
bbduk_docker String The Docker image for bbduk, which was used to remove the adapters from the sequences
bwa_version String Version of BWA software used
fastp_html_report String The HTML report made with fastp
fastp_version String The version of fastp used
fastq_scan_clean1_json String The JSON file output from fastq-scan containing summary stats about clean forward read quality and length
fastq_scan_num_reads_clean1 String The number of forward reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_raw1 String The number of input forward reads as calculated by fastq_scan
fastq_scan_raw1_json String The JSON file output from fastq-scan containing summary stats about raw forward read quality and length
fastq_scan_version String The version of fastq_scan
fastqc_clean1_html String An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser
fastqc_docker String The Docker container used for fastqc
fastqc_num_reads_clean1 String The number of forward reads after cleaning by fastqc
fastqc_num_reads_raw1 String The number of input forward reads by fastqc before cleaning
fastqc_raw1_html String An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser
fastqc_version String Version of fastqc software used
freyja_abundances String Abundance estimates identified by Freyja and parsed from freyja_demixed file
freyja_barcode_file String Barcode file used with Freyja
freyja_barcode_version String Name of barcode file used, or the date if update_db is true
freyja_bootstrap_lineages String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each lineage
freyja_bootstrap_lineages_pdf String A boxplot of the bootstrap lineages CSV file
freyja_bootstrap_summary String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each WHO designated VOI/VOC
freyja_bootstrap_summary_pdf String A boxplot of the bootstrap summary CSV file
freyja_coverage Float Coverage identified by Freyja and parsed from freyja_demixed file
freyja_demixed File The main output TSV; see the section directly above this table for an explanation
freyja_demixed_parsed File Parsed freyja_demixed file, containing the same information, for easy result concatenation
freyja_depths File A TSV listing the depth of every position
freyja_fastq_wf_analysis_date String Date of analysis
freyja_fastq_wf_version String The version of the Public Health Bioinformatics (PHB) repository used
freyja_lineage_metadata_file String Metadata file for lineages identified by Freyja
freyja_lineages String Lineages in descending order identified by Freyja and parsed from freyja_demixed file
freyja_metadata_version String Name of lineage metadata file used, or the date if update_db is true
freyja_resid String Residual of the weighted least absolute deviation problem used to estimate lineage abundances identified by Freyja and parsed from freyja_demixed file
freyja_summarized String Sum of all lineage abundances in a particular WHO designation identified by Freyja and parsed from freyja_demixed file
freyja_variants File The TSV file containing the variants identified by Freyja
freyja_version String Version of Freyja used
ivar_version_primtrim String Version of iVar for running the iVar trim command
kraken_human Float Percent of human read data detected using the Kraken2 software
kraken_human_dehosted Float Percent of human read data detected using the Kraken2 software after host removal
kraken_report String Full Kraken report
kraken_report_dehosted File Full Kraken report after host removal
kraken_sc2 String Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted String Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_version String Version of Kraken software used
primer_bed_name String Name of the primer bed files used for primer trimming
primer_trimmed_read_percent Float Percentage of read data with primers trimmed as determined by iVar trim
samtools_version String The version of SAMtools used to sort and index the alignment file
samtools_version_primtrim String The version of SAMtools used to create the pileup before running iVar trim
trimmomatic_docker String The docker image used for the trimmomatic module in this workflow
trimmomatic_version String The version of Trimmomatic used
Variable Type Description
aligned_bai String Index companion file to the bam file generated during the consensus assembly process
aligned_bam String Sorted BAM file containing the alignments of reads to the reference genome
alignment_method String The method used to generate the alignment
freyja_abundances String Abundance estimates identified by Freyja and parsed from freyja_demixed file
freyja_barcode_file String Barcode file used with Freyja
freyja_barcode_version String Name of barcode file used, or the date if update_db is true
freyja_bootstrap_lineages String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each lineage
freyja_bootstrap_lineages_pdf String A boxplot of the bootstrap lineages CSV file
freyja_bootstrap_summary String A CSV that contains the 0.025, 0.05, 0.25, 0.5 (median), 0.75, 0.95, and 0.975 percentiles for each WHO designated VOI/VOC
freyja_bootstrap_summary_pdf String A boxplot of the bootstrap summary CSV file
freyja_coverage Float Coverage identified by Freyja and parsed from freyja_demixed file
freyja_demixed File The main output TSV; see the section directly above this table for an explanation
freyja_demixed_parsed File Parsed freyja_demixed file, containing the same information, for easy result concatenation
freyja_depths File A TSV listing the depth of every position
freyja_fastq_wf_analysis_date String Date of analysis
freyja_fastq_wf_version String The version of the Public Health Bioinformatics (PHB) repository used
freyja_lineage_metadata_file String Metadata file for lineages identified by Freyja
freyja_lineages String Lineages in descending order identified by Freyja and parsed from freyja_demixed file
freyja_metadata_version String Name of lineage metadata file used, or the date if update_db is true
freyja_resid String Residual of the weighted least absolute deviation problem used to estimate lineage abundances identified by Freyja and parsed from freyja_demixed file
freyja_summarized String Sum of all lineage abundances in a particular WHO designation identified by Freyja and parsed from freyja_demixed file
freyja_variants File The TSV file containing the variants identified by Freyja
freyja_version String Version of Freyja used
ivar_version_primtrim String Version of iVar for running the iVar trim command
kraken_human Float Percent of human read data detected using the Kraken2 software
kraken_human_dehosted Float Percent of human read data detected using the Kraken2 software after host removal
kraken_report String Full Kraken report
kraken_report_dehosted File Full Kraken report after host removal
kraken_sc2 String Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted String Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_version String Version of Kraken software used
minimap2_docker String The Docker image of minimap2
minimap2_version String The version of minimap2
nanoplot_html_clean File An HTML report describing the clean reads
nanoplot_html_raw File An HTML report describing the raw reads
nanoplot_num_reads_clean1 Int Number of clean reads
nanoplot_num_reads_raw1 Int Number of raw reads
nanoplot_r1_est_coverage_clean Float Estimated coverage on the clean reads by nanoplot
nanoplot_r1_est_coverage_raw Float Estimated coverage on the raw reads by nanoplot
nanoplot_r1_mean_q_clean Float Mean quality score of clean forward reads
nanoplot_r1_mean_q_raw Float Mean quality score of raw forward reads
nanoplot_r1_mean_readlength_clean Float Mean read length of clean forward reads
nanoplot_r1_mean_readlength_raw Float Mean read length of raw forward reads
nanoplot_r1_median_q_clean Float Median quality score of clean forward reads
nanoplot_r1_median_q_raw Float Median quality score of raw forward reads
nanoplot_r1_median_readlength_clean Float Median read length of clean forward reads
nanoplot_r1_median_readlength_raw Float Median read length of raw forward reads
nanoplot_r1_n50_clean Float N50 of clean forward reads
nanoplot_r1_n50_raw Float N50 of raw forward reads
nanoplot_r1_stdev_readlength_clean Float Standard deviation read length of clean forward reads
nanoplot_r1_stdev_readlength_raw Float Standard deviation read length of raw forward reads
nanoplot_tsv_clean File A TSV report describing the clean reads
nanoplot_tsv_raw File A TSV report describing the raw reads
nanoq_version String Version of nanoq used in analysis
primer_bed_name String Name of the primer bed files used for primer trimming
primer_trimmed_read_percent Float Percentage of read data with primers trimmed as determined by iVar trim
samtools_version String The version of SAMtools used to sort and index the alignment file
samtools_version_primtrim String The version of SAMtools used to create the pileup before running iVar trim

Freyja_Plot_PHB

This workflow visualizes aggregated freyja_demixed output files produced by Freyja_FASTQ_PHB in a single plot (PDF format) that provides fractional abundance estimates for all aggregated samples.

Options exist to provide lineage-specific breakdowns and/or sample collection time information.

Inputs

This workflow runs on the set level.

Terra Task Name Variable Type Description Default Value Terra Status
freyja_plot freyja_demixed Array[File] An array containing the output files (freyja_demixed) made by Freyja_FASTQ Required
freyja_plot freyja_plot_name String The name of the plot to be produced. Example: "my-freyja-plot" Required
freyja_plot samplename Array[String] The names of the samples being analyzed Required
freyja_plot collection_date Array[String] An array containing the collection dates for the sample (YYYY-MM-DD format) Optional
freyja_plot_task cpu Int Number of CPUs to allocate to the task 1 Optional
freyja_plot_task disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
freyja_plot_task docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja_plot_task memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
freyja_plot_task mincov Int The minimum genome coverage used as a cut-off of data to include in the plot 60 Optional
freyja_plot_task plot_day_window Int The width of the rolling average window; only used if plot_time_interval is "D" 14 Optional
freyja_plot_task plot_lineages Boolean If true, will plot a lineage-specific breakdown FALSE Optional
freyja_plot_task plot_time Boolean If true, will plot sample collection time information (requires the collection_date input variable) FALSE Optional
freyja_plot_task plot_time_interval String Options: "MS" for month, "D" for day MS Optional
version_capture docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional

Analysis Tasks

freyja_plot_task Details

This task aggregates multiple samples together and then creates a plot. Several optional inputs dictate the plot appearance (see each variable's description for more information).
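One of those optional inputs, mincov, acts as a coverage cut-off on the aggregated samples. The Python sketch below illustrates the effect on a list of per-sample records. It is a conceptual example only: the function and field names are hypothetical, and treating the cut-off as a strict "greater than" comparison is an assumption rather than documented behaviour.

```python
def filter_by_coverage(samples, mincov=60):
    """Keep samples whose genome coverage (percent) exceeds mincov.

    Assumption: the boundary is exclusive; a sample exactly at mincov
    is dropped in this sketch.
    """
    return [sample for sample in samples if sample["coverage"] > mincov]


# Hypothetical aggregated records, one per sample.
aggregated = [
    {"samplename": "site_A", "coverage": 95.8},
    {"samplename": "site_B", "coverage": 42.0},
    {"samplename": "site_C", "coverage": 60.0},
]
plotted = filter_by_coverage(aggregated, mincov=60)
```

With the default of 60, only site_A would be carried through to the plot in this toy example.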

Freyja Plot Technical Details

Links
Task wf_freyja_plot.wdl
Software Source Code https://github.com/andersen-lab/Freyja
Software Documentation https://github.com/andersen-lab/Freyja

Outputs

Variable Type Description
freyja_demixed_aggregate File A TSV file that summarizes the freyja_demixed outputs for all samples
freyja_plot File A PDF of the plot produced by the workflow
freyja_plot_metadata File The metadata used to create the plot
freyja_plot_version String The version of Freyja used
freyja_plot_wf_analysis_date String The date of analysis
freyja_plot_wf_version String The version of the Public Health Bioinformatics (PHB) repository used

Freyja_Dashboard_PHB

This workflow creates a "dashboard": a group of interactive visualizations based on the aggregated freyja_demixed output files produced by Freyja_FASTQ_PHB. Creating this dashboard requires knowing the viral load of your samples (viral copies per liter).

Warning

This dashboard is not "live"; that is, you must rerun the workflow every time you want new data to be included in the visualizations.

Inputs

This workflow runs on the set level.

Terra Task Name Variable Type Description Default Value Terra Status
freyja_dashboard collection_date Array[String] An array containing the collection dates for the sample (YYYY-MM-DD format) Required
freyja_dashboard freyja_dashboard_title String The name of the dashboard to be produced. Example: "my-freyja-dashboard" Required
freyja_dashboard freyja_demixed Array[File] An array containing the output files (freyja_demixed) made by Freyja_FASTQ workflow Required
freyja_dashboard samplename Array[String] The names of the samples being analyzed Required
freyja_dashboard viral_load Array[String] An array containing the number of viral copies per liter Required
freyja_dashboard_task config File (found in the optional section, but is required) A yaml file that applies various configurations to the dashboard, such as grouping lineages together, applying colorings, etc. See also https://github.com/andersen-lab/Freyja/blob/main/freyja/data/plot_config.yml. Optional, Required
freyja_dashboard dashboard_intro_text File A file containing the text to be contained at the top of the dashboard. SARS-CoV-2 lineage de-convolution performed by the Freyja workflow (https://github.com/andersen-lab/Freyja). Optional
freyja_dashboard_task cpu Int Number of CPUs to allocate to the task 1 Optional
freyja_dashboard_task disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
freyja_dashboard_task docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/freyja:1.5.3 Optional
freyja_dashboard_task headerColor String A hex color code to change the color of the header Optional
freyja_dashboard_task memory Int Amount of memory/RAM (in GB) to allocate to the task 2 Optional
freyja_dashboard_task mincov Float The minimum genome coverage used as a cut-off of data to include in the dashboard. Default is set to 60 by the freyja command-line tool (not a WDL task default, per se) Optional
freyja_dashboard_task scale_by_viral_load Boolean If set to true, averages samples taken the same day while taking viral load into account FALSE Optional
freyja_dashboard_task thresh Float The minimum lineage abundance cut-off value Optional
version_capture docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional

Analysis Tasks

freyja_dashboard_task Details

This task will aggregate multiple samples together, and then create an interactive HTML visualization. Several optional inputs dictate the dashboard appearance (see each variable's description for more information).
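Before aggregation, the parallel samplename, collection_date, and viral_load arrays have to be combined into one metadata table. The Python sketch below shows one way this could look. The function name is hypothetical, and the column headers are an assumption modelled on Freyja's example metadata file; check the freyja_dashboard_metadata output for the layout the workflow actually produces.

```python
import csv
import io


def build_metadata_csv(samplenames, collection_dates, viral_loads):
    """Combine three parallel arrays into a metadata CSV string.

    Column names are assumed, not taken from the workflow source.
    """
    if not (len(samplenames) == len(collection_dates) == len(viral_loads)):
        raise ValueError("all input arrays must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sample", "sample_collection_datetime", "viral_load"])
    for row in zip(samplenames, collection_dates, viral_loads):
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical inputs in the formats described in the inputs table.
metadata = build_metadata_csv(
    ["site_A", "site_B"],
    ["2024-01-05", "2024-01-12"],
    ["1500.0", "320.5"],
)
```

Checking that all three arrays have the same length up front catches the most common set-level input mistake, a missing collection date or viral load for one sample.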

Freyja Dashboard Technical Details

Links
Task wf_freyja_dashboard.wdl
Software Source Code https://github.com/andersen-lab/Freyja
Software Documentation https://github.com/andersen-lab/Freyja

Outputs

Variable Type Description
freyja_dashboard File The HTML file of the dashboard created
freyja_dashboard_metadata File The metadata used to create the dashboard
freyja_dashboard_version String The version of Freyja used
freyja_dashboard_wf_analysis_date String The date of analysis
freyja_dashboard_wf_version String The version of the Public Health Bioinformatics (PHB) repository used
freyja_demixed_aggregate File A TSV file that summarizes the freyja_demixed outputs for all samples

Running Freyja on other pathogens

Experimental Feature

Please be aware this is an experimental feature and we cannot guarantee complete functionality at this time.

The main requirement to run Freyja on other pathogens is the existence of a barcode file for your pathogen of interest. Currently, barcodes exist for the following organisms:

  • SARS-CoV-2 (default)
  • FLU-B-VIC
  • H1N1
  • H3N2
  • H5Nx-cattle
  • H5NX
  • MEASLESN450
  • MEASLESgenome
  • MPX
  • RSVa
  • RSVb

Freyja barcodes for other pathogens

Data for various pathogens can be found in the following repository: Freyja Barcodes

Folders are organized by pathogen, with each subfolder named after the date the barcode was generated, using the format YYYY-MM-DD, as well as a "latest" folder. Barcode files are named barcode.csv, and reference genome files are named reference.fasta.
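The folder layout described above can be expressed as a small path-building helper. This Python sketch is illustrative only: the function name is hypothetical, and the pathogen folder names used in the example are assumptions that should be checked against the repository.

```python
from datetime import date


def barcode_paths(pathogen, version="latest"):
    """Return the relative paths of the barcode and reference files.

    version is either "latest" or a date in YYYY-MM-DD format.
    """
    if version != "latest":
        # Raises ValueError if the string is not a valid ISO date.
        date.fromisoformat(version)
    folder = f"{pathogen}/{version}"
    return {
        "barcode": f"{folder}/barcode.csv",
        "reference": f"{folder}/reference.fasta",
    }


# Folder names here are hypothetical examples.
latest_paths = barcode_paths("MPX")
dated_paths = barcode_paths("RSVa", "2024-08-28")
```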

There are two ways to run Freyja_FASTQ_PHB for non-SARS-CoV-2 organisms:

  • Using the freyja_pathogen optional input (limited set of allowable organisms)
  • Providing the appropriate barcode file through the freyja_barcodes optional input (any organism for which barcodes are supplied)

Using the freyja_pathogen flag

When using the freyja_pathogen flag, the user must set the optional update_db flag to true, so that the latest version of the barcode file is automatically downloaded by Freyja.

**Figure 2: Optional input for Freyja_FASTQ_PHB to provide the pathogen to be used by Freyja.**

Allowed options:

  • SARS-CoV-2 (default)
  • MPXV
  • H1N1pdm
  • H5NX
  • FLU-B-VIC
  • MEASLESN450
  • MEASLES
  • RSVa
  • RSVb

Warning

The freyja_pathogen flag is ignored if a barcode file is provided through freyja_barcodes.
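The interaction between freyja_pathogen, freyja_barcodes, and update_db described in this section can be summarized as a small decision function. The Python sketch below restates the documented rules and is not the workflow's WDL logic; the function name and return values are hypothetical.

```python
def resolve_barcode_source(freyja_pathogen="SARS-CoV-2",
                           freyja_barcodes=None,
                           update_db=False):
    """Decide which barcode source applies, following the rules above.

    1. A barcode file passed through freyja_barcodes always wins and
       freyja_pathogen is ignored.
    2. Otherwise, a non-default freyja_pathogen requires update_db to be
       true so the latest barcodes are downloaded.
    """
    if freyja_barcodes is not None:
        return ("barcode_file", freyja_barcodes)
    if freyja_pathogen != "SARS-CoV-2" and not update_db:
        raise ValueError("set update_db to true when using freyja_pathogen")
    return ("pathogen", freyja_pathogen)


# A supplied barcode file takes precedence over the pathogen flag.
choice = resolve_barcode_source(
    freyja_pathogen="MPXV",
    freyja_barcodes="barcode.csv",
)
```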

Providing the appropriate barcode file

The barcode file and reference sequence for your organism of interest need to be downloaded and then uploaded to your Terra.bio workspace. When running Freyja_FASTQ_PHB, both the reference and the barcodes file need to be passed as inputs. The reference genome is a required input and will show up at the top of the workflow's inputs page on Terra.bio (Figure 3).

**Figure 3: Required input for Freyja_FASTQ_PHB to provide the reference genome to be used by Freyja.**

The barcodes file can be passed directly to Freyja by the freyja_barcodes optional input (Figure 4).

**Figure 4: Optional input for Freyja_FASTQ_PHB to provide the barcodes file to be used by Freyja.**

References

If you use any of the Freyja workflows, please cite:

Karthikeyan, S., Levy, J.I., De Hoff, P. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022). https://doi.org/10.1038/s41586-022-05049-6

Freyja source code can be found at https://github.com/andersen-lab/Freyja

Freyja barcodes (non-SARS-CoV-2): https://github.com/gp201/Freyja-barcodes