TheiaCoV Workflow Series
Quick Facts
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
Genomic Characterization | HIV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | vX.X.X | Some optional features incompatible, Yes | Sample-level, Set-level |
TheiaCoV Workflows
The TheiaCoV workflows are for the assembly, quality assessment, and characterization of viral genomes. There are currently five TheiaCoV workflows designed to accommodate different kinds of input data:
- Illumina paired-end sequencing (TheiaCoV_Illumina_PE)
- Illumina single-end sequencing (TheiaCoV_Illumina_SE)
- ONT sequencing (TheiaCoV_ONT)
- Genome assemblies (TheiaCoV_FASTA)
- ClearLabs sequencing (TheiaCoV_ClearLabs)
Additionally, the TheiaCoV_FASTA_Batch workflow is available to process several hundred SARS-CoV-2 assemblies at the same time.
Key Resources
Reference Materials for SARS-CoV-2
Reference Materials for non-default viruses
HIV Input JSONs
WNV Input JSONs
Flu Input JSONs
Supported Organisms
These workflows currently support the following organisms. The first option in the list (bolded) is what our workflows use as the standardized organism name:
- SARS-CoV-2 (**"sars-cov-2"**, "SARS-CoV-2") - default organism input
- Monkeypox virus (**"MPXV"**, "mpox", "monkeypox", "Monkeypox virus", "Mpox")
- Human Immunodeficiency Virus (**"HIV"**)
- West Nile Virus (**"WNV"**, "wnv", "West Nile virus")
- Influenza (**"flu"**, "influenza", "Flu", "Influenza")
- RSV-A (**"rsv_a"**, "rsv-a", "RSV-A", "RSV_A")
- RSV-B (**"rsv_b"**, "rsv-b", "RSV-B", "RSV_B")
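The alias list above can be read as a lookup from any accepted spelling to the standardized organism name. The following is a minimal sketch of that mapping, not the workflows' actual implementation: `standardize_organism` is a hypothetical helper, and it assumes exact, case-sensitive matching against the spellings listed above.

```python
# Aliases transcribed from the list above. The first entry of each tuple is
# the standardized organism name.
ORGANISM_ALIASES = {
    "sars-cov-2": ("sars-cov-2", "SARS-CoV-2"),
    "MPXV": ("MPXV", "mpox", "monkeypox", "Monkeypox virus", "Mpox"),
    "HIV": ("HIV",),
    "WNV": ("WNV", "wnv", "West Nile virus"),
    "flu": ("flu", "influenza", "Flu", "Influenza"),
    "rsv_a": ("rsv_a", "rsv-a", "RSV-A", "RSV_A"),
    "rsv_b": ("rsv_b", "rsv-b", "RSV-B", "RSV_B"),
}

# Reverse lookup: accepted alias -> standardized name.
_ALIAS_TO_STANDARD = {
    alias: standard
    for standard, aliases in ORGANISM_ALIASES.items()
    for alias in aliases
}


def standardize_organism(name: str = "sars-cov-2") -> str:
    """Return the standardized name for an accepted alias (hypothetical helper).

    Assumption: matching is exact and case-sensitive. Unknown names raise
    ValueError instead of being guessed.
    """
    try:
        return _ALIAS_TO_STANDARD[name]
    except KeyError:
        raise ValueError(f"Unsupported organism: {name!r}") from None


print(standardize_organism("Monkeypox virus"))
```

Calling the helper with no argument returns the default organism, "sars-cov-2".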
The compatibility of each workflow with each pathogen is shown below:
Workflow | SARS-CoV-2 | Mpox | HIV | WNV | Influenza | RSV-A | RSV-B |
---|---|---|---|---|---|---|---|
Illumina_PE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Illumina_SE | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
ClearLabs | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
ONT | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
FASTA | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
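For scripted checks before launching a run, the compatibility table above can be encoded as a small lookup. This is only a sketch transcribed from the table: the names `COMPATIBILITY`, `is_supported`, and `workflows_for` are hypothetical, and the organism keys use the standardized organism names.

```python
# Column order matches the table above, using standardized organism names.
ORGANISMS = ("sars-cov-2", "MPXV", "HIV", "WNV", "flu", "rsv_a", "rsv_b")

# One row per workflow, transcribed from the table (True = supported).
_ROWS = {
    "Illumina_PE": (True, True, True, True, True, True, True),
    "Illumina_SE": (True, True, False, True, False, True, True),
    "ClearLabs": (True, False, False, False, False, False, False),
    "ONT": (True, True, True, False, True, True, True),
    "FASTA": (True, True, False, True, True, True, True),
}

COMPATIBILITY = {wf: dict(zip(ORGANISMS, row)) for wf, row in _ROWS.items()}


def is_supported(workflow: str, organism: str) -> bool:
    """Return True if the table marks the workflow as compatible."""
    return COMPATIBILITY[workflow][organism]


def workflows_for(organism: str) -> list:
    """List the workflows the table marks as compatible with an organism."""
    return [wf for wf, row in COMPATIBILITY.items() if row[organism]]


print(workflows_for("HIV"))
```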
To help you set up the workflow for each organism, we provide example input JSONs (see Key Resources above).
Inputs
Input Data
The TheiaCoV_Illumina_PE workflow takes in Illumina paired-end read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends zipping files with gzip before uploading to Terra to minimize data upload time.
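The naming rule and the gzip recommendation above can be checked and applied locally before upload. The sketch below is illustrative only: `is_valid_read_name` and `gzip_read_file` are hypothetical helpers, and the check assumes the extensions are matched case-sensitively at the end of the file name.

```python
import gzip
import re
import shutil
from pathlib import Path

# Accepts names ending in .fastq or .fq, optionally followed by .gz.
_READ_NAME = re.compile(r"\.(fastq|fq)(\.gz)?$")


def is_valid_read_name(filename: str) -> bool:
    """Return True if the file name follows the read-file naming rule."""
    return _READ_NAME.search(filename) is not None


def gzip_read_file(path: Path) -> Path:
    """Write a gzip-compressed copy next to the original and return its path.

    Files whose names already end in .gz are returned unchanged.
    """
    if path.name.endswith(".gz"):
        return path
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


print(is_valid_read_name("sample01_R1.fastq.gz"))
```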
By default, the workflow anticipates 2 x 150 bp reads (i.e. the input reads were generated using a 300-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate shorter read data, such as the 2 x 75 bp reads generated using a 150-cycle sequencing kit.
The TheiaCoV_Illumina_SE workflow takes in Illumina single-end reads. Read file names should end with .fastq or .fq, with the optional addition of .gz. Theiagen highly recommends zipping files with gzip before uploading to Terra to minimize data upload time and save on storage costs.
By default, the workflow anticipates 1 x 35 bp reads (i.e. the input reads were generated using a 70-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate longer read data.
The TheiaCoV_ONT workflow takes in base-called ONT read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends zipping files with gzip before uploading to Terra to minimize data upload time.
The ONT sequencing kit and base-calling approach can produce substantial variability in the amount and quality of read data. Genome assemblies produced by the TheiaCoV_ONT workflow must be quality assessed before reporting results.
The TheiaCoV_FASTA workflow takes in assembly files in FASTA format.
The TheiaCoV_ClearLabs workflow takes in read data produced by the Clear Dx platform from ClearLabs. However, many users instead run the TheiaCoV_FASTA workflow on ClearLabs-generated FASTA files, because a few known issues that arise when generating assemblies with this pipeline are not present in those files.
The TheiaCoV_FASTA_Batch workflow takes in a set of assembly files in FASTA format.
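When running outside Terra, inputs are typically supplied as a JSON file. The sketch below builds one for TheiaCoV_Illumina_PE from the three required inputs (samplename, read1, read2) described in the inputs table. It assumes the usual WDL convention of prefixing each key with the workflow name; `build_inputs` is a hypothetical helper, the bucket paths are placeholders, and only workflow-level optional inputs are shown.

```python
import json
from typing import Optional


def build_inputs(
    samplename: str,
    read1: str,
    read2: str,
    workflow: str = "theiacov_illumina_pe",
    optional: Optional[dict] = None,
) -> dict:
    """Build an inputs dictionary keyed as '<workflow>.<input>'.

    Assumption: keys follow the common WDL inputs-JSON naming convention.
    Only workflow-level optional inputs are handled here.
    """
    inputs = {
        f"{workflow}.samplename": samplename,
        f"{workflow}.read1": read1,
        f"{workflow}.read2": read2,
    }
    for name, value in (optional or {}).items():
        inputs[f"{workflow}.{name}"] = value
    return inputs


# Placeholder paths; replace with the locations of your own read files.
example = build_inputs(
    "sample01",
    "gs://example-bucket/sample01_R1.fastq.gz",
    "gs://example-bucket/sample01_R2.fastq.gz",
    optional={"min_depth": 100, "consensus_min_freq": 0.6},
)
print(json.dumps(example, indent=2))
```

The optional values shown are the defaults listed in the inputs table, so they only need to be set when overriding them.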
Terra Task Name | Variable | Type | Description | Default Value | Terra Status | Organism |
---|---|---|---|---|---|---|
theiacov_illumina_pe | read1 | File | Illumina forward read file in FASTQ file format (compression optional) | | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
theiacov_illumina_pe | read2 | File | Illumina reverse read file in FASTQ file format (compression optional) | | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | samplename | String | The name of the sample being analyzed | | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
flu_track | abricate_flu_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | abricate_flu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | abricate_flu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727 | Optional | flu |
flu_track | abricate_flu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | flu |
flu_track | abricate_flu_min_percent_coverage | Int | Minimum DNA percent coverage | 60 | Optional | flu |
flu_track | abricate_flu_min_percent_identity | Int | Minimum DNA percent identity | 70 | Optional | flu |
flu_track | antiviral_aa_subs | String | Additional list of antiviral resistance associated amino acid substitutions of interest to be searched against those called on the sample segments. They take the format of protein:substitution, e.g. NA:A26V | | Optional | flu |
flu_track | assembly_metrics_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | assembly_metrics_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | assembly_metrics_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | flu |
flu_track | assembly_metrics_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | flu |
flu_track | flu_h1_ha_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h1n1_m2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h3_ha_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h3n2_m2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_n1_na_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_n2_na_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pa_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pb1_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pb2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_subtype | String | The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate. | | Optional | flu |
flu_track | genoflu_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | flu |
flu_track | genoflu_cross_reference | File | An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py | | Optional | flu |
flu_track | genoflu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 25 | Optional | flu |
flu_track | genoflu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06 | Optional | flu |
flu_track | genoflu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | flu |
flu_track | genoflu_min_percent_identity | Float | Percent identity threshold used for calling matches for each genome segment that make up the final GenoFlu genotype | 98 | Optional | flu |
flu_track | irma_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | flu |
flu_track | irma_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | irma_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/irma:1.2.0 | Optional | flu |
flu_track | irma_keep_ref_deletions | Boolean | True/False variable that determines whether sites missed during read gathering (i.e. 0 reads for a site in the reference genome) are ambiguated by inserting N's or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and true sets it to "NNN" | TRUE | Optional | flu |
flu_track | irma_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | flu |
flu_track | irma_min_ambiguous_threshold | Float | Minimum called Single Nucleotide Variant (SNV) frequency for mixed based calls in the output consensus assembly (AKA amended consensus). | 0.2 | Optional | flu |
flu_track | irma_min_avg_consensus_allele_quality | Int | Minimum allele coverage depth to call plurality consensus, otherwise calls "N". Setting this value too high can negatively impact final amended consensus. | 10 | Optional | flu |
flu_track | irma_min_read_length | Int | Minimum read length to include reads in read gathering step in IRMA. This value should not be greater than the typical read length. | 75 | Optional | flu |
flu_track | nextclade_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | nextclade_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | flu |
flu_track | nextclade_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | flu |
flu_track | nextclade_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | flu |
flu_track | nextclade_output_parser_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
gene_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | MPXV, sars-cov-2 |
gene_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, sars-cov-2 |
gene_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | MPXV, sars-cov-2 |
gene_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | MPXV, sars-cov-2 |
gene_coverage | min_depth | Int | The minimum depth to determine if a position was covered. | 10 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_start | Int | Start/First nucleotide position of the SARS-CoV-2 Spike gene | 21563 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_stop | Int | End/Last nucleotide position of the SARS-CoV-2 Spike gene | 25384 | Optional | MPXV, sars-cov-2 |
ivar_consensus | ivar_bwa_cpu | Int | Number of CPUs to allocate to the task | 6 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | skip_N | Boolean | True/False variable that determines if regions with depth less than minimum depth should not be added to the consensus sequence | FALSE | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | verbosity | String | Logging verbosity level. Other options are: "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | auspice_config | File | Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional | HIV |
organism_parameters | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher because pangolearn is no longer supported as of Pangolin v4.3 and higher versions. | | Optional | sars-cov-2 |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional | sars-cov-2 |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional | sars-cov-2 |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional | sars-cov-2 |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | | Optional | sars-cov-2 |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be used | FALSE | Optional | sars-cov-2 |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional | sars-cov-2 |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | assembly_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | busco_results | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | gambit_predicted_taxon | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2 | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2_dehosted | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | n50_value | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | number_contigs | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | quast_gc_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
quasitools_illumina_pe | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV |
quasitools_illumina_pe | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV |
quasitools_illumina_pe | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/quasitools:0.7.0--pyh864c0ab_1 | Optional | HIV |
quasitools_illumina_pe | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV |
raw_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | bbduk_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | call_kraken | Boolean | True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | call_midas | Boolean | True/False variable that determines if the MIDAS task should be called. | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | fastp_args | String | Additional arguments to use with fastp | --detect_adapter_for_pe -g -5 20 -3 20 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_db | File | A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Must contain human and viral sequences | gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | midas_db | File | Internal component, do not modify | | Optional | |
read_QC_trim | read_processing | String | The name of the tool to perform basic read processing; options: "trimmomatic" or "fastp" | trimmomatic | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | read_qc | String | The tool used for quality control (QC) of reads. Options are "fastq_scan" (default) and "fastqc" | fastq_scan | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | target_organism | String | This string is searched for in the kraken2 outputs to extract the read percentage | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | trimmomatic_args | String | Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data. | -phred33 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional | |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional | |
workflow name | adapters | File | A FASTA file containing adapter sequences | /bbmap/resources/adapters.fa | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | consensus_min_freq | Float | The minimum frequency for a variant to be called a SNP in consensus genome | 0.6 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | max_genome_length | Int | Maximum genome length able to pass read screening | 2673870 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_basepairs | Int | Minimum number of base pairs able to pass read screening | 34000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_coverage | Int | Minimum genome coverage able to pass read screening | 10 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_depth | Int | Minimum depth of reads required to call variants and generate a consensus genome. This value is passed to the iVar software. | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_genome_length | Int | Minimum genome length to pass read screening | 1700 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_proportion | Int | Minimum proportion of total reads in each read file to pass read screening | 40 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_reads | Int | Minimum number of reads to pass read screening | 113 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_name | String | Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name. See organism defaults for more information | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | organism | String | The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE | sars-cov-2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | pangolin_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | phix | File | Reference file containing the PhiX sequence to use | /bbmap/resources/phix174_ill.ref.fa.gz | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | primer_bed | File | The bed file containing the primers used when sequencing was performed | | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
workflow name | qc_check_table | File | TSV file with taxa for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | | Optional | |
workflow name | reference_gene_locations_bed | File | Use to provide locations of interest where average coverage will be calculated | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | reference_genome | File | An optional reference genome used for consensus assembly and QC | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | reference_gff | File | The general feature format (gff) of the reference genome. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | seq_method | String | The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | skip_screen | Boolean | Set to True to skip the read screening prior to analysis | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | target_organism | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_min_length | Int | Specifies minimum length of each read after trimming to be kept | 75 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_primers | Boolean | A True/False option that determines if primers should be trimmed. | TRUE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_quality_min_score | Int | Specifies the minimum average quality of bases in a sliding window to be kept | 30 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_window_size | Int | Specifies window size for trimming (the number of bases to average the quality across) | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_max_length | Int | Maximum length of contig allowed to run VADR | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_options | String | Additional options to provide to VADR | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | variant_min_freq | Float | Minimum frequency for a variant to be reported in ivar outputs | 0.6 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
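
The inputs listed in these tables can be supplied through an input JSON. As a minimal sketch, assuming the Cromwell-style key convention `<workflow>.<variable>` for workflow-level inputs and `<workflow>.<task>.<variable>` for task-level inputs, the Python snippet below assembles such a file. The helper `build_inputs`, the sample name, the read path, and the overridden values are hypothetical and chosen only for illustration; `theiacov_illumina_se` is used as the workflow name, and rows labelled "workflow name" are treated as workflow-level inputs.

```python
import json


def build_inputs(workflow_name, workflow_inputs, task_inputs):
    """Flatten inputs into '<workflow>.<task>.<variable>' keys (assumed convention)."""
    inputs = {}
    # Rows labelled "workflow name" in the table map to workflow-level keys.
    for variable, value in workflow_inputs.items():
        inputs[f"{workflow_name}.{variable}"] = value
    # Rows with a task name (e.g. read_QC_trim) are nested one level deeper.
    for task, variables in task_inputs.items():
        for variable, value in variables.items():
            inputs[f"{workflow_name}.{task}.{variable}"] = value
    return inputs


inputs = build_inputs(
    "theiacov_illumina_se",
    {
        "read1": "gs://example-bucket/sample01_R1.fastq.gz",  # hypothetical path
        "samplename": "sample01",                             # hypothetical sample
        "organism": "MPXV",
        "min_depth": 20,  # illustrative override of the default (100)
    },
    {
        "read_QC_trim": {"read_processing": "fastp", "call_kraken": True},
    },
)

# Serialize to the JSON text that would be saved as the inputs file.
print(json.dumps(inputs, indent=2, sort_keys=True))
```

Only inputs that differ from the defaults need to appear in the file; anything omitted keeps the default value shown in the table.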
Terra Task Name | Variable | Type | Description | Default Value | Terra Status | Organism |
---|---|---|---|---|---|---|
theiacov_illumina_se | read1 | File | Illumina forward read file in FASTQ file format (compression optional) | | Required | MPXV, WNV, sars-cov-2 |
workflow name | samplename | String | The name of the sample being analyzed | | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | genome_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
flu_track | flu_subtype | String | The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate. | | Optional | flu |
flu_track | nextclade_output_parser_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
gene_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | MPXV, sars-cov-2 |
gene_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, sars-cov-2 |
gene_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | MPXV, sars-cov-2 |
gene_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | MPXV, sars-cov-2 |
gene_coverage | min_depth | Int | The minimum depth to determine if a position was covered. | 10 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_start | Int | Start/first nucleotide position of the SARS-CoV-2 Spike gene | 21563 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_stop | Int | End/Last nucleotide position of the SARS-CoV-2 Spike gene | 25384 | Optional | MPXV, sars-cov-2 |
ivar_consensus | ivar_bwa_cpu | Int | Number of CPUs to allocate to the task | 6 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_bwa_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_consensus_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_trim_primers_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | ivar_variant_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | read2 | File | Internal component, do not modify | | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | skip_N | Boolean | True/False variable that determines if regions with depth less than minimum depth should not be added to the consensus sequence | FALSE | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
ivar_consensus | stats_n_coverage_primtrim_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | verbosity | String | Verbosity level of the Nextclade log output. Options other than the default are: "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | auspice_config | File | Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional | HIV |
organism_parameters | kraken_target_organism_input | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1) | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher, because pangolearn is no longer supported as of Pangolin v4.3. | | Optional | sars-cov-2 |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional | sars-cov-2 |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional | sars-cov-2 |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional | sars-cov-2 |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | | Optional | sars-cov-2 |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be used | FALSE | Optional | sars-cov-2 |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional | sars-cov-2 |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | assembly_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | busco_results | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | gambit_predicted_taxon | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_human | String | Internal component, do not modify | | Optional | |
qc_check_task | kraken_human_dehosted | String | Internal component, do not modify | | Optional | |
qc_check_task | kraken_sc2 | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2_dehosted | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | n50_value | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | num_reads_clean2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | num_reads_raw2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | number_contigs | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | quast_gc_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | bbduk_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | call_kraken | Boolean | True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | call_midas | Boolean | True/False variable that determines if the MIDAS task should be called. | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | fastp_args | String | Additional arguments to use with fastp | -g -5 20 -3 20 | Optional | |
read_QC_trim | kraken_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_db | File | A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Must contain human and viral sequences | gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | midas_db | File | Internal component, do not modify | | Optional | |
read_QC_trim | read_processing | String | The name of the tool to perform basic read processing; options: "trimmomatic" or "fastp" | trimmomatic | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | read_qc | String | The tool used for quality control (QC) of reads. Options are "fastq_scan" (default) and "fastqc" | fastq_scan | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | target_organism | String | This string is searched for in the kraken2 outputs to extract the read percentage | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | trimmomatic_args | String | Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data. | -phred33 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional | |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional | |
workflow name | adapters | File | A FASTA file containing adapter sequences | /bbmap/resources/adapters.fa | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | consensus_min_freq | Float | The minimum frequency for a variant to be called a SNP in consensus genome | 0.6 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | max_genome_length | Int | Maximum genome length able to pass read screening | 2673870 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_basepairs | Int | Minimum number of base pairs able to pass read screening | 34000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_coverage | Int | Minimum genome coverage able to pass read screening | 10 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_depth | Int | Minimum depth of reads required to call variants and generate a consensus genome. This value is passed to the iVar software. | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_genome_length | Int | Minimum genome length to pass read screening | 1700 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_reads | Int | Minimum number of reads to pass read screening | 113 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_name | String | Nextclade organism dataset names. However, if organism input is set correctly, this input will be automatically assigned the corresponding dataset name. See organism defaults for more information | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | organism | String | The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE | sars-cov-2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | pangolin_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | phix | File | File containing the PhiX reference sequence | /bbmap/resources/phix174_ill.ref.fa.gz | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | primer_bed | File | The bed file containing the primers used when sequencing was performed | | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
workflow name | qc_check_table | File | TSV file with taxa for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | | Optional | |
workflow name | reference_gene_locations_bed | File | Use to provide locations of interest where average coverage will be calculated | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | reference_genome | File | An optional reference genome used for consensus assembly and QC | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | reference_gff | File | The general feature format (gff) of the reference genome. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | seq_method | String | The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | skip_mash | Boolean | If true, skips estimation of genome size and coverage using mash in read screening steps. As a result, providing true also prevents screening using these parameters. | FALSE | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
workflow name | skip_screen | Boolean | Set to True to skip the read screening prior to analysis | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_min_length | Int | Specifies minimum length of each read after trimming to be kept | 25 | Optional | |
workflow name | trim_primers | Boolean | A True/False option that determines if primers should be trimmed. | TRUE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_quality_min_score | Int | Specifies the minimum average quality of bases in a sliding window to be kept | 30 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | trim_window_size | Int | Specifies window size for trimming (the number of bases to average the quality across) | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_max_length | Int | Maximum length of contig allowed to run VADR | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_options | String | Additional options to provide to VADR | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | variant_min_freq | Float | Minimum frequency for a variant to be reported in ivar outputs | 0.6 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
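
The workflow-level rows above map onto an inputs JSON whose keys take the form `<workflow name>.<variable>`. The following is a minimal sketch in Python, not an official Theiagen utility: the workflow name `theiacov_illumina_pe`, the helper `build_inputs`, and the sample name are assumptions made for illustration, and the required read-file inputs are left out. The defaults are copied from the table above.

```python
import json

# Defaults copied from the "workflow name" rows of the table above.
TABLE_DEFAULTS = {
    "consensus_min_freq": 0.6,
    "max_genome_length": 2673870,
    "min_basepairs": 34000,
    "min_coverage": 10,
    "min_depth": 100,
    "min_genome_length": 1700,
    "min_reads": 113,
    "organism": "sars-cov-2",
    "skip_screen": False,
    "trim_primers": True,
    "variant_min_freq": 0.6,
}


def build_inputs(workflow_name, samplename, overrides=None):
    """Build a WDL inputs dict keyed as '<workflow_name>.<variable>'.

    Hypothetical helper: only overrides that differ from the table default
    are written, because optional inputs fall back to their defaults.
    """
    inputs = {f"{workflow_name}.samplename": samplename}
    for name, value in (overrides or {}).items():
        if name not in TABLE_DEFAULTS:
            raise KeyError(f"unknown optional input: {name}")
        default = TABLE_DEFAULTS[name]
        expected = type(default)
        # bool is a subclass of int in Python, so compare exact types;
        # an int is still accepted where the table lists a Float.
        type_ok = type(value) is expected or (
            expected is float and type(value) is int
        )
        if not type_ok:
            raise TypeError(f"{name} expects {expected.__name__}")
        if value != default:
            inputs[f"{workflow_name}.{name}"] = value
    return inputs


if __name__ == "__main__":
    # "theiacov_illumina_pe" is a placeholder; use the name of your workflow.
    example = build_inputs(
        "theiacov_illumina_pe",
        "sample01",
        {"organism": "MPXV", "min_depth": 100, "skip_screen": True},
    )
    print(json.dumps(example, indent=2))
```

In this sketch, `min_depth` is dropped from the result because it matches the table default, so the JSON stays limited to the values that actually change workflow behavior.
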
Terra Task Name | Variable | Type | Description | Default Value | Terra Status | Organism |
---|---|---|---|---|---|---|
theiacov_ont | read1 | File | ONT read file in FASTQ file format (compression optional) | | Required | HIV, MPXV, WNV, flu, sars-cov-2 |
workflow name | samplename | String | The name of the sample being analyzed | | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
clean_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
consensus | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional | sars-cov-2 |
consensus | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
consensus | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019-epi2me | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
consensus | medaka_model | String | In order to obtain the best results, the appropriate model must be set to match the sequencer's basecaller model; this string takes the format of {pore}_{device}_{caller variant}_{caller_version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models. | r941_min_high_g360 | Optional | sars-cov-2 |
consensus | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | sars-cov-2 |
consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
flu_track | abricate_flu_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | abricate_flu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | abricate_flu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727 | Optional | flu |
flu_track | abricate_flu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | flu |
flu_track | abricate_flu_min_percent_coverage | Int | Minimum DNA percent coverage | 60 | Optional | flu |
flu_track | abricate_flu_min_percent_identity | Int | Minimum DNA percent identity | 70 | Optional | flu |
flu_track | antiviral_aa_subs | String | Additional list of antiviral resistance associated amino acid substitutions of interest to be searched against those called on the sample segments. They take the format of <protein>:<mutation>, e.g. NA:A26V | | Optional | flu |
flu_track | flu_h1_ha_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h1n1_m2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h3_ha_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_h3n2_m2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_n1_na_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_n2_na_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pa_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pb1_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_pb2_ref | File | Internal component, do not modify | | Optional | flu |
flu_track | flu_subtype | String | The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate. | | Optional | flu |
flu_track | genoflu_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | flu |
flu_track | genoflu_cross_reference | File | An Excel file to cross-reference BLAST findings; useful when novel genotypes are not in the default file used by genoflu.py | | Optional | flu |
flu_track | genoflu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 25 | Optional | flu |
flu_track | genoflu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06 | Optional | flu |
flu_track | genoflu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | flu |
flu_track | genoflu_min_percent_identity | Float | Percent identity threshold used for calling matches for each genome segment that make up the final GenoFlu genotype | 98 | Optional | flu |
flu_track | irma_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | flu |
flu_track | irma_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | irma_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/irma:1.2.0 | Optional | flu |
flu_track | irma_keep_ref_deletions | Boolean | True/False variable that determines how sites missed during read gathering (i.e. 0 reads for a site in the reference genome) are handled: ambiguated by inserting N's, or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and true sets it to "NNN" | TRUE | Optional | flu |
flu_track | irma_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | flu |
flu_track | irma_min_ambiguous_threshold | Float | Minimum called Single Nucleotide Variant (SNV) frequency for mixed base calls in the output consensus assembly (AKA amended consensus). | 0.2 | Optional | flu |
flu_track | irma_min_avg_consensus_allele_quality | Int | Minimum allele coverage depth to call plurality consensus, otherwise calls "N". Setting this value too high can negatively impact final amended consensus. | 10 | Optional | flu |
flu_track | irma_min_read_length | Int | Minimum read length to include reads in read gathering step in IRMA. This value should not be greater than the typical read length. | 75 | Optional | flu |
flu_track | nextclade_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | nextclade_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | flu |
flu_track | nextclade_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | flu |
flu_track | nextclade_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | flu |
flu_track | nextclade_output_parser_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | read2 | File | Internal component, do not modify | | Optional | flu |
gene_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | MPXV, sars-cov-2 |
gene_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, sars-cov-2 |
gene_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | MPXV, sars-cov-2 |
gene_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | MPXV, sars-cov-2 |
gene_coverage | min_depth | Int | The minimum depth to determine if a position was covered. | 10 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_start | Int | Start nucleotide position of the SARS-CoV-2 Spike gene | 21563 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_stop | Int | End/Last nucleotide position of the SARS-CoV-2 Spike gene | 25384 | Optional | MPXV, sars-cov-2 |
nanoplot_clean | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_clean | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_clean | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_clean | max_length | Int | The maximum length of clean reads, for which reads longer than the length specified will be hidden. | 100000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_clean | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_raw | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_raw | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_raw | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_raw | max_length | Int | The maximum length of clean reads, for which reads longer than the length specified will be hidden. | 100000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nanoplot_raw | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_output_parser | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | verbosity | String | Verbosity level of the Nextclade log output; other options are: "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | auspice_config | File | Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional | HIV |
organism_parameters | kraken_target_organism_input | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1) | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | reference_gff_file | File | Reference GFF file for the organism being analyzed | Default provided for mpox ("gs://theiagen-public-files/terra/mpxv-files/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-files/terra/hivgc-files/AY228557.1.gff3"]) | Optional | |
organism_parameters | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher because pangolearn is no longer supported as of Pangolin v4.3 and higher versions. | | Optional | sars-cov-2 |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional | sars-cov-2 |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional | sars-cov-2 |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional | sars-cov-2 |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | | Optional | sars-cov-2 |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be skipped; the cache is used by default | FALSE | Optional | sars-cov-2 |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional | sars-cov-2 |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | assembly_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | busco_results | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | gambit_predicted_taxon | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_human | String | Internal component, do not modify | | Optional | |
qc_check_task | kraken_human_dehosted | String | Internal component, do not modify | | Optional | |
qc_check_task | kraken_sc2 | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2_dehosted | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | n50_value | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | num_reads_clean2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | num_reads_raw2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | number_contigs | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | quast_gc_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
quasitools_ont | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV |
quasitools_ont | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV |
quasitools_ont | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/quasitools:0.7.0--pyh864c0ab_1 | Optional | HIV |
quasitools_ont | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV |
quasitools_ont | read2 | File | Internal component, do not modify | | Optional | HIV |
raw_check_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
raw_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | artic_guppyplex_cpu | Int | Number of CPUs to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | artic_guppyplex_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | artic_guppyplex_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | artic_guppyplex_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | call_kraken | Boolean | True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | downsampling_coverage | Float | The desired coverage to sub-sample the reads to with RASUSA | 150 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_db | File | A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Must contain human and viral sequences | gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken2_recalculate_abundances_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken2_recalculate_abundances_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken2_recalculate_abundances_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | kraken2_recalculate_abundances_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_max_read_length | Int | The maximum read length to keep after trimming | 100000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_max_read_qual | Int | The maximum read quality to keep after trimming | 40 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_min_read_length | Int | The minimum read length to keep after trimming | 500 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | nanoq_min_read_qual | Int | The minimum read quality to keep after trimming | 10 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | ncbi_scrub_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | ncbi_scrub_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | ncbi_scrub_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | ncbi_scrub_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
read_QC_trim | rasusa_bases | String | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_cpu | Int | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_disk_size | Int | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_docker | String | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_fraction_of_reads | Float | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_memory | Int | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_number_of_reads | Int | Internal component, do not modify | Optional | ||
read_QC_trim | rasusa_seed | Int | Internal component, do not modify | Optional | ||
set_flu_ha_nextclade_values | reference_gff_file | File | Reference GFF file for flu HA | Optional | flu | |
set_flu_na_nextclade_values | reference_gff_file | File | Reference GFF file for flu NA | Optional | flu | |
set_flu_na_nextclade_values | vadr_mem | Int | Memory, in GB, allocated to this task | 8 | Optional | flu |
stats_n_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | |
stats_n_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | |
stats_n_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | |
stats_n_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | |
stats_n_coverage_primtrim | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | |
stats_n_coverage_primtrim | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | |
stats_n_coverage_primtrim | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | |
stats_n_coverage_primtrim | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | |
vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional | |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional | ||
workflow name | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | max_genome_length | Int | Maximum genome length able to pass read screening | 2673870 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | max_length | Int | Maximum length for a read based on the SARS-CoV-2 primer scheme | 700 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_basepairs | Int | Minimum number of base pairs able to pass read screening | 34000 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_coverage | Int | Minimum genome coverage able to pass read screening | 10 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_depth | Int | Minimum depth of reads required to call variants and generate a consensus genome. This value is passed to the iVar software. | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_genome_length | Int | Minimum genome length to pass read screening | 1700 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | min_length | Int | Minimum length of a read based on the SARS-CoV-2 primer scheme | 400 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_name | String | Nextclade organism dataset names. However, if organism input is set correctly, this input will be automatically assigned the corresponding dataset name. See organism defaults for more information | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | normalise | Int | Used to normalize the number of reads to the indicated level before variant calling | 20000 for CL, 200 for ONT | Optional | |
workflow name | organism | String | The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE | sars-cov-2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | pangolin_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | primer_bed | File | The bed file containing the primers used when sequencing was performed | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 | |
workflow name | qc_check_table | File | TSV file with taxa for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | Optional | ||
workflow name | reference_gene_locations_bed | File | Use to provide locations of interest where average coverage will be calculated | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | reference_genome | File | An optional reference genome used for consensus assembly and QC | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | seq_method | String | The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | skip_mash | Boolean | If true, skips estimation of genome size and coverage using mash in read screening steps. As a result, providing true also prevents screening using these parameters. | FALSE | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
workflow name | skip_screen | Boolean | Set to True to skip the read screening prior to analysis | FALSE | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | target_organism | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_max_length | Int | Maximum length of contig allowed to run VADR | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_options | String | Additional options to provide to VADR | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
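
As a rough illustration of how the optional workflow-level inputs above could be supplied, the sketch below builds a flat inputs JSON in Python. It assumes the usual WDL convention of `<workflow name>.<variable>` keys. The workflow name `theiacov_ont`, the helper `build_inputs`, and the choice of four inputs are illustrative assumptions and not part of the workflows themselves; the default values are copied from the table.

```python
import json

# Defaults copied from the table above for a handful of workflow-level inputs.
# This subset is an illustrative assumption; extend it as needed.
TABLE_DEFAULTS = {
    "organism": "sars-cov-2",
    "min_depth": 100,
    "min_coverage": 10,
    "skip_screen": False,
}


def build_inputs(workflow_name, overrides):
    """Return a flat dict keyed as '<workflow_name>.<variable>'.

    Only values that differ from the documented default are emitted, so the
    resulting JSON stays minimal. `workflow_name` is whatever name replaces
    the 'workflow name' placeholder in the table (hypothetical here).
    """
    unknown = set(overrides) - set(TABLE_DEFAULTS)
    if unknown:
        raise KeyError(f"input not covered by this sketch: {sorted(unknown)}")
    return {
        f"{workflow_name}.{name}": value
        for name, value in overrides.items()
        if value != TABLE_DEFAULTS[name]
    }


# Hypothetical example: analyze mpox reads with a lower minimum depth.
example = build_inputs("theiacov_ont", {"organism": "MPXV", "min_depth": 20})
print(json.dumps(example, indent=2, sort_keys=True))
```

Task-level inputs such as those listed under read_QC_trim are deliberately left out of this sketch, because their key paths depend on how the workflow nests its tasks.
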
Terra Task Name | Variable | Type | Description | Default Value | Terra Status | Organism |
---|---|---|---|---|---|---|
theiacov_fasta | assembly_fasta | File | The assembly file for your sample in FASTA format | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
theiacov_fasta | input_assembly_method | String | Method used to generate the assembly file | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | samplename | String | The name of the sample being analyzed | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | seq_method | String | The sequencing methodology used to generate the input read data | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". | HA | Optional, Required | |
consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
flu_track | abricate_flu_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | flu |
flu_track | abricate_flu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | flu |
flu_track | abricate_flu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727 | Optional | flu |
flu_track | abricate_flu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | flu |
flu_track | abricate_flu_min_percent_coverage | Int | Minimum DNA percent coverage | 60 | Optional | flu |
flu_track | abricate_flu_min_percent_identity | Int | Minimum DNA percent identity | 70 | Optional | flu |
flu_track | genoflu_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | flu |
flu_track | genoflu_cross_reference | File | An Excel file to cross-reference BLAST findings; useful when novel genotypes are not in the default file used by genoflu.py | Optional | flu | |
flu_track | genoflu_disk_size | Int | Amount of storage (in GB) to allocate to the task | 25 | Optional | flu |
flu_track | genoflu_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06 | Optional | flu |
flu_track | genoflu_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | flu |
flu_track | genoflu_min_percent_identity | Float | Percent identity threshold used for calling matches for each genome segment that make up the final GenoFlu genotype | 98 | Optional | flu |
flu_track | nextclade_output_parser_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). Specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | verbosity | String | Logging verbosity level. Other options are: "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | auspice_config | File | Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file. | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
organism_parameters | gene_locations_bed_file | File | Use to provide locations of interest where average coverage will be calculated | Default provided for SARS-CoV-2 ("gs://theiagen-public-files-rp/terra/sars-cov-2-files/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-files/terra/mpxv-files/mpox_gene_locations.bed") | Optional | |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional | HIV |
organism_parameters | kraken_target_organism_input | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1) | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | pangolin_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional | |
organism_parameters | primer_bed_file | File | The bed file containing the primers used when sequencing was performed | REQUIRED FOR SARS-CoV-2, MPOX, WNV, RSV-A & RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v2.0.primer.hyphen400.1.bed") | Optional | |
organism_parameters | reference_gff_file | File | Reference GFF file for the organism being analyzed | Default provided for mpox ("gs://theiagen-public-files/terra/mpxv-files/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-files/terra/hivgc-files/AY228557.1.gff3"]) | Optional | |
organism_parameters | vadr_max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | Default provided for SARS-CoV-2 (30000), mpox (210000), WNV (11000), flu (0), RSV-A (15500) and RSV-B (15500). | Optional | |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher because pangolearn is no longer supported as of Pangolin v4.3 and higher versions. | Optional | sars-cov-2 | |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional | sars-cov-2 |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional | sars-cov-2 |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional | sars-cov-2 |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | Optional | sars-cov-2 | |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be used | FALSE | Optional | sars-cov-2 |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional | sars-cov-2 |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | assembly_length | Int | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | assembly_mean_coverage | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | busco_results | String | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | combined_mean_q_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | combined_mean_q_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | combined_mean_readlength_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | combined_mean_readlength_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | est_coverage_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | gambit_predicted_taxon | String | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | kraken_human | String | Internal component, do not modify | Optional | ||
qc_check_task | kraken_human_dehosted | String | Internal component, do not modify | Optional | ||
qc_check_task | kraken_sc2 | String | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | kraken_sc2_dehosted | String | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | minbaseq_trim | Int | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | n50_value | Int | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | num_reads_clean2 | Int | Internal component, do not modify | Optional | ||
qc_check_task | num_reads_raw2 | Int | Internal component, do not modify | Optional | ||
qc_check_task | number_contigs | Int | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | quast_gc_percent | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r1_mean_q_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r1_mean_q_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r1_mean_readlength_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r1_mean_readlength_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r2_mean_q_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r2_mean_q_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r2_mean_readlength_clean | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | r2_mean_readlength_raw | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional | |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional | ||
workflow name | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. | Optional | ||
workflow name | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | nextclade_dataset_name | String | Nextclade organism dataset names. However, if organism input is set correctly, this input will be automatically assigned the corresponding dataset name. See organism defaults for more information | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | organism | String | The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE | sars-cov-2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | qc_check_table | File | TSV file with taxa for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | Optional | ||
workflow name | reference_genome | File | An optional reference genome used for consensus assembly and QC | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_max_length | Int | Maximum length of contig allowed to run VADR | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | vadr_opts | String | Additional options to provide to VADR | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
workflow name | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
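
The required and constrained inputs in the TheiaCoV_FASTA table above can be checked before submission with a small pre-flight validator. The sketch below is only an illustration: the function `check_fasta_inputs` and the plain-dict input format are assumptions, and it checks only the standardized organism names listed for the `organism` input, not the aliases the workflows also accept.

```python
# Names and allowed values copied from the TheiaCoV_FASTA input table.
REQUIRED = ("assembly_fasta", "input_assembly_method", "samplename", "seq_method")
ORGANISMS = {"sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b"}
FLU_SEGMENTS = {"HA", "NA"}


def check_fasta_inputs(inputs):
    """Return a list of human-readable problems; an empty list means no issue found.

    `inputs` is a plain dict of variable name -> value (hypothetical format,
    without the '<workflow name>.' prefix used in a real inputs JSON).
    """
    problems = [
        f"missing required input: {name}"
        for name in REQUIRED
        if not inputs.get(name)
    ]
    # The table lists sars-cov-2 as the default organism.
    organism = inputs.get("organism", "sars-cov-2")
    if organism not in ORGANISMS:
        problems.append(f"unknown organism: {organism}")
    if organism == "flu":
        # The table lists HA as the default flu segment.
        segment = inputs.get("flu_segment", "HA")
        if segment not in FLU_SEGMENTS:
            problems.append(f"flu_segment must be HA or NA, got: {segment}")
    return problems


# Hypothetical sample values, for illustration only.
sample = {
    "assembly_fasta": "sample01.fasta",
    "input_assembly_method": "ivar",
    "samplename": "sample01",
    "seq_method": "ILLUMINA",
    "organism": "flu",
    "flu_segment": "NA",
}
print(check_fasta_inputs(sample))
```

A check like this catches typos in `organism` or `flu_segment` locally, before a workflow run is launched.
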
Terra Task Name | Variable | Type | Description | Default Value | Terra Status | Organism |
---|---|---|---|---|---|---|
theiacov_clearlabs | primer_bed | File | The bed file containing the primers used when sequencing was performed | Required | sars-cov-2 | |
theiacov_clearlabs | read1 | File | Clear Dx-produced read file in FASTQ file format (compression optional) | Required | sars-cov-2 | |
workflow name | samplename | String | The name of the sample being analyzed | Required | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 | |
consensus | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional | sars-cov-2 |
consensus | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
consensus | medaka_model | String | In order to obtain the best results, the appropriate model must be set to match the sequencer's basecaller model; this string takes the format of {pore}_{device}_{caller variant}_{caller_version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models. | r941_min_high_g360 | Optional | sars-cov-2 |
consensus | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional | sars-cov-2 |
consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | genome_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | HIV, MPXV, WNV, rsv_a, rsv_b, sars-cov-2 |
fastq_scan_clean_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | sars-cov-2 |
fastq_scan_clean_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
fastq_scan_clean_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | sars-cov-2 |
fastq_scan_clean_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | sars-cov-2 |
fastq_scan_clean_reads | read1_name | Int | Internal component, do not modify | | Optional | sars-cov-2 |
fastq_scan_raw_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional | sars-cov-2 |
fastq_scan_raw_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
fastq_scan_raw_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional | sars-cov-2 |
fastq_scan_raw_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional | sars-cov-2 |
fastq_scan_raw_reads | read1_name | Int | Internal component, do not modify | | Optional | sars-cov-2 |
flu_track | flu_subtype | String | The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate. | | Optional | flu |
flu_track | nextclade_output_parser_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
flu_track | nextclade_output_parser_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
gene_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | MPXV, sars-cov-2 |
gene_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, sars-cov-2 |
gene_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | MPXV, sars-cov-2 |
gene_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_start | Int | Start (first) nucleotide position of the SARS-CoV-2 Spike gene | 21563 | Optional | MPXV, sars-cov-2 |
gene_coverage | sc2_s_gene_stop | Int | End (last) nucleotide position of the SARS-CoV-2 Spike gene | 25384 | Optional | MPXV, sars-cov-2 |
kraken2_dehosted | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
kraken2_dehosted | disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional | sars-cov-2 |
kraken2_dehosted | docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db | Optional | sars-cov-2 |
kraken2_dehosted | kraken2_db | File | The database used to run Kraken2. Must contain viral and human sequences. | gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz | Optional | sars-cov-2 |
kraken2_dehosted | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
kraken2_dehosted | read2 | File | Internal component, do not modify | | Optional | sars-cov-2 |
kraken2_raw | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
kraken2_raw | disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional | sars-cov-2 |
kraken2_raw | docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db | Optional | sars-cov-2 |
kraken2_raw | kraken2_db | File | The database used to run Kraken2. Must contain viral and human sequences. | gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz | Optional | sars-cov-2 |
kraken2_raw | memory | String | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
kraken2_raw | read2 | File | Internal component, do not modify | | Optional | sars-cov-2 |
ncbi_scrub_se | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
ncbi_scrub_se | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
ncbi_scrub_se | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 | Optional | sars-cov-2 |
ncbi_scrub_se | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). Specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
nextclade_v3 | verbosity | String | Verbosity level of the Nextclade output. Other options are "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | auspice_config | File | Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms and flu segments. A minimal auspice config file is used when the organism is not specified and the user does not provide an optional input config file. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional | flu |
organism_parameters | gene_locations_bed_file | File | Use to provide locations of interest where average coverage will be calculated | Default provided for SARS-CoV-2 ("gs://theiagen-public-files-rp/terra/sars-cov-2-files/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-files/terra/mpxv-files/mpox_gene_locations.bed") | Optional | |
organism_parameters | genome_length_input | Int | Use to specify the expected genome length; provided by default for all supported organisms | Default provided for SARS-CoV-2 (29903), mpox (197200), WNV (11000), flu (13000), RSV-A (16000), RSV-B (16000), HIV (primer versions 1 [9181] and 2 [9840]) | Optional | |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional | HIV |
organism_parameters | pangolin_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional | |
organism_parameters | primer_bed_file | File | The bed file containing the primers used when sequencing was performed | Required for SARS-CoV-2, mpox, WNV, RSV-A, and RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v2.0.primer.hyphen400.1.bed") | Optional | |
organism_parameters | reference_gff_file | File | Reference GFF file for the organism being analyzed | Default provided for mpox ("gs://theiagen-public-files/terra/mpxv-files/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-files/terra/hivgc-files/AY228557.1.gff3"]) | Optional | |
organism_parameters | vadr_max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | Default provided for SARS-CoV-2 (30000), mpox (210000), WNV (11000), flu (0), RSV-A (15500) and RSV-B (15500). | Optional | |
organism_parameters | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
organism_parameters | vadr_options | String | Options for the v-annotate.pl VADR script | Default provided for SARS-CoV-2 ("--noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta"), mpox ("--glsearch -s -r --nomisc --mkey mpxv --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --out_allfasta --minimap2 --s_overhang 150"), WNV ("--mkey flavi --mdir /opt/vadr/vadr-models-flavi/ --nomisc --noprotid --out_allfasta"), flu (""), RSV-A ("-r --mkey rsv --xnocomp"), and RSV-B ("-r --mkey rsv --xnocomp") | Optional | |
organism_parameters | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher because pangolearn is no longer supported as of Pangolin v4.3 and higher versions. | | Optional | sars-cov-2 |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | sars-cov-2 |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | sars-cov-2 |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional | sars-cov-2 |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional | sars-cov-2 |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | sars-cov-2 |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional | sars-cov-2 |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | | Optional | sars-cov-2 |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be skipped | FALSE | Optional | sars-cov-2 |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional | sars-cov-2 |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | assembly_length | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | busco_results | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | combined_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | est_coverage_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | gambit_predicted_taxon | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2 | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_sc2_dehosted | String | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | n50_value | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | num_reads_clean2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | num_reads_raw2 | Int | Internal component, do not modify | | Optional | |
qc_check_task | number_contigs | Int | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | quast_gc_percent | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r1_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_q_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_clean | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | r2_mean_readlength_raw | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
stats_n_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | |
stats_n_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | |
stats_n_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | |
stats_n_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | |
stats_n_coverage_primtrim | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional | |
stats_n_coverage_primtrim | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | |
stats_n_coverage_primtrim | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15 | Optional | |
stats_n_coverage_primtrim | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional | |
vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.5.1 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | 30000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, and WNV) and 16 (all other TheiaCoV organisms) | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional | MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
vadr | vadr_opts | String | Options for the v-annotate.pl VADR script | "--noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta" | Optional | |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional | |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional | |
workflow name | medaka_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 | Optional | |
workflow name | nextclade_dataset_name | String | Nextclade organism dataset names. However, if organism input is set correctly, this input will be automatically assigned the corresponding dataset name. See organism defaults for more information | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | normalise | Int | Used to normalize the number of reads to the indicated level before variant calling | 20000 for ClearLabs, 200 for ONT | Optional | |
workflow name | organism | String | The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE | sars-cov-2 | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | qc_check_table | File | TSV file with taxa for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | | Optional | |
workflow name | reference_genome | File | An optional reference genome used for consensus assembly and QC | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | seq_method | String | The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
workflow name | target_organism | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | | Optional | HIV, MPXV, WNV, flu, rsv_a, rsv_b, sars-cov-2 |
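
The required TheiaCoV_ClearLabs inputs in the table above (`primer_bed`, `read1`, and `samplename`) can be collected into a WDL-style inputs JSON. The following is a minimal sketch, assuming the usual `workflow_name.input_name` key convention and that `samplename` is keyed under `theiacov_clearlabs`. The helper `build_clearlabs_inputs`, the file paths, and the sample name are hypothetical placeholders and are not part of the workflow.

```python
import json

# Required inputs taken from the table above; all other inputs have defaults.
REQUIRED = ("primer_bed", "read1", "samplename")


def build_clearlabs_inputs(primer_bed, read1, samplename, **optional):
    """Return inputs keyed as `theiacov_clearlabs.<variable>` (assumed convention)."""
    values = {"primer_bed": primer_bed, "read1": read1, "samplename": samplename}
    missing = [name for name in REQUIRED if not values[name]]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    # Workflow-level optional inputs (e.g. organism) are merged in unchanged.
    values.update(optional)
    return {f"theiacov_clearlabs.{key}": value for key, value in values.items()}


if __name__ == "__main__":
    inputs = build_clearlabs_inputs(
        primer_bed="gs://my-bucket/primers.bed",   # placeholder path
        read1="gs://my-bucket/sample01.fastq.gz",  # placeholder path
        samplename="sample01",                     # placeholder name
        organism="sars-cov-2",                     # documented default, shown for clarity
    )
    print(json.dumps(inputs, indent=2))
```

This sketch only covers workflow-level inputs. Task-level inputs such as `medaka_model` under `consensus` may need a different key prefix, so check the input names Terra displays before relying on it.
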
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
theiacov_fasta_batch | assembly_fastas | Array[File] | The assembly files for your samples in FASTA format | | Required |
theiacov_fasta_batch | bucket_name | String | The GCP bucket for the workspace where the TheiaCoV_FASTA_Batch output files are saved. We recommend using a unique GSURI for the bucket associated with your Terra workspace. The root GSURI is accessible in the Dashboard page of your workspace in the "Cloud Information" section. Do not include the prefix gs:// in the string. Example: "fc-c526190d-4332-409b-8086-be7e1af9a0b6/theiacov_fasta_batch-2024-04-15-seq-run-1/" | | Required |
theiacov_fasta_batch | project_name | String | The name of the Terra project where the data can be found. Example: "my-terra-project" | | Required |
theiacov_fasta_batch | samplenames | Array[String] | The names of the samples being analyzed | | Required |
theiacov_fasta_batch | table_name | String | The name of the Terra table where the data can be found. Example: "sars-cov-2-sample" | | Required |
theiacov_fasta_batch | workspace_name | String | The name of the Terra workspace where the data can be found. Example: "my-terra-workspace" | | Required |
cat_files_fasta | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
cat_files_fasta | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
cat_files_fasta | docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional |
cat_files_fasta | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
nextclade_v3 | auspice_reference_tree_json | File | An Auspice JSON phylogenetic reference tree which serves as a target for phylogenetic placement. | Inherited from nextclade dataset | Optional |
nextclade_v3 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
nextclade_v3 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
nextclade_v3 | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.10.2 | Optional |
nextclade_v3 | gene_annotations_gff | File | A genome annotation to specify how to translate the nucleotide sequence to proteins (genome_annotation.gff3). Specifying this enables codon-informed alignment and protein alignments. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/03-genome-annotation.html | Inherited from nextclade dataset | Optional |
nextclade_v3 | input_ref | File | A nucleotide sequence which serves as a reference for the pairwise alignment of all input sequences. This is also the sequence which defines the coordinate system of the genome annotation. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/02-reference-sequence.html | Inherited from nextclade dataset | Optional |
nextclade_v3 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
nextclade_v3 | nextclade_pathogen_json | File | General dataset configuration file. See here for more info: https://docs.nextstrain.org/projects/nextclade/en/latest/user/input-files/05-pathogen-config.html | Inherited from nextclade dataset | Optional |
nextclade_v3 | verbosity | String | Verbosity level of the Nextclade output. Other options are "off", "error", "info", "debug", and "trace" (highest level of verbosity) | warn | Optional |
organism_parameters | flu_segment | String | Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional |
organism_parameters | flu_subtype | String | The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | N/A | Optional |
organism_parameters | gene_locations_bed_file | File | Use to provide locations of interest where average coverage will be calculated | Default provided for SARS-CoV-2 ("gs://theiagen-public-files-rp/terra/sars-cov-2-files/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-files/terra/mpxv-files/mpox_gene_locations.bed") | Optional |
organism_parameters | genome_length_input | Int | Use to specify the expected genome length; provided by default for all supported organisms | Default provided for SARS-CoV-2 (29903), mpox (197200), WNV (11000), flu (13000), RSV-A (16000), RSV-B (16000), HIV (primer versions 1 [9181] and 2 [9840]) | Optional |
organism_parameters | hiv_primer_version | String | The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs | v1 | Optional |
organism_parameters | kraken_target_organism_input | String | The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database. | Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1) | Optional |
organism_parameters | primer_bed_file | File | The bed file containing the primers used when sequencing was performed | Required for SARS-CoV-2, mpox, WNV, RSV-A, and RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-files/terra/hivgc-files/HIV-1_v2.0.primer.hyphen400.1.bed") | Optional |
organism_parameters | reference_genome | File | An optional reference genome used for consensus assembly and QC | | Optional |
organism_parameters | reference_gff_file | File | Reference GFF file for the organism being analyzed | Default provided for mpox ("gs://theiagen-public-files/terra/mpxv-files/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-files/terra/hivgc-files/AY228557.1.gff3"]) | Optional |
organism_parameters | vadr_max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | Default provided for SARS-CoV-2 (30000), mpox (210000), WNV (11000), flu (0), RSV-A (15500) and RSV-B (15500). | Optional |
organism_parameters | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms) | Optional |
organism_parameters | vadr_options | String | Options for the v-annotate.pl VADR script | Default provided for SARS-CoV-2 ("--noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta"), mpox ("--glsearch -s -r --nomisc --mkey mpxv --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --out_allfasta --minimap2 --s_overhang 150"), WNV ("--mkey flavi --mdir /opt/vadr/vadr-models-flavi/ --nomisc --noprotid --out_allfasta"), flu (""), RSV-A ("-r --mkey rsv --xnocomp"), and RSV-B ("-r --mkey rsv --xnocomp") | Optional |
pangolin4 | analysis_mode | String | Used to switch between usher and pangolearn analysis modes. Only use usher because pangolearn is no longer supported as of Pangolin v4.3 and higher versions. | | Optional |
pangolin4 | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
pangolin4 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
pangolin4 | expanded_lineage | Boolean | True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1) | TRUE | Optional |
pangolin4 | max_ambig | Float | The maximum proportion of Ns allowed for pangolin to attempt an assignment | 0.5 | Optional |
pangolin4 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
pangolin4 | min_length | Int | Minimum query length allowed for pangolin to attempt an assignment | 10000 | Optional |
pangolin4 | pangolin_arguments | String | Optional arguments for pangolin, e.g. "--skip-scorpio" | | Optional |
pangolin4 | skip_designation_cache | Boolean | A True/False option that determines if the designation cache should be used | FALSE | Optional |
pangolin4 | skip_scorpio | Boolean | A True/False option that determines if scorpio should be skipped. | FALSE | Optional |
sm_theiacov_fasta_wrangling | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
sm_theiacov_fasta_wrangling | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
sm_theiacov_fasta_wrangling | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4 | Optional |
sm_theiacov_fasta_wrangling | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
theiacov_fasta_batch | nextclade_dataset_name | String | Nextclade organism dataset name. Options: "nextstrain/sars-cov-2/wuhan-hu-1/orfs". However, if the organism input is set correctly, this input will be automatically assigned the corresponding dataset name. | sars-cov-2 | Optional |
theiacov_fasta_batch | nextclade_dataset_tag | String | Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool. | 2024-06-13--23-42-47Z | Optional |
theiacov_fasta_batch | organism | String | The organism that is being analyzed. Options: "sars-cov-2" | sars-cov-2 | Optional |
theiacov_fasta_batch | pangolin_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33 | Optional |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
Organism-specific parameters and logic¶
The organism_parameters sub-workflow is the first step in all TheiaCoV workflows. It automatically sets the parameters needed by each downstream tool to the appropriate values for the user-designated organism (the default organism is "sars-cov-2").
The following tables include the relevant organism-specific parameters; all of these default values can be overwritten by providing a value for the "Overwrite Variable Name" field.
SARS-CoV-2 Defaults
Overwrite Variable Name | Organism | Default Value |
---|---|---|
gene_locations_bed_file | sars-cov-2 | "gs://theiagen-public-files-rp/terra/sars-cov-2-files/sc2_gene_locations.bed" |
genome_length_input | sars-cov-2 | 29903 |
kraken_target_organism_input | sars-cov-2 | "Severe acute respiratory syndrome coronavirus 2" |
nextclade_dataset_name_input | sars-cov-2 | "nextstrain/sars-cov-2/wuhan-hu-1/orfs" |
pangolin_docker_image | sars-cov-2 | "us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.33" |
nextclade_dataset_tag_input | sars-cov-2 | "2025-03-26--11-47-13Z" |
reference_genome | sars-cov-2 | "gs://theiagen-public-files-rp/terra/augur-sars-cov-2-references/MN908947.fasta" |
vadr_max_length | sars-cov-2 | 30000 |
vadr_mem | sars-cov-2 | 8 |
vadr_options | sars-cov-2 | "--noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta" |
Mpox Defaults
Overwrite Variable Name | Organism | Default Value |
---|---|---|
gene_locations_bed_file | MPXV | "gs://theiagen-public-files/terra/mpxv-files/mpox_gene_locations.bed" |
genome_length_input | MPXV | 197200 |
kraken_target_organism_input | MPXV | "Monkeypox virus" |
nextclade_dataset_name_input | MPXV | "nextstrain/mpox/lineage-b.1" |
nextclade_dataset_tag_input | MPXV | "2024-11-19--14-18-53Z" |
primer_bed_file | MPXV | "gs://theiagen-public-files/terra/mpxv-files/MPXV.primer.bed" |
reference_genome | MPXV | "gs://theiagen-public-files/terra/mpxv-files/MPXV.MT903345.reference.fasta" |
reference_gff_file | MPXV | "gs://theiagen-public-files/terra/mpxv-files/Mpox-MT903345.1.reference.gff3" |
vadr_max_length | MPXV | 210000 |
vadr_mem | MPXV | 8 |
vadr_options | MPXV | "--glsearch -s -r --nomisc --mkey mpxv --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --out_allfasta --minimap2 --s_overhang 150" |
WNV Defaults
Overwrite Variable Name | Organism | Default Value | Notes |
---|---|---|---|
genome_length_input | WNV | 11000 | |
kraken_target_organism_input | WNV | "West Nile virus" | |
nextclade_dataset_name_input | WNV | "NA" | TheiaCoV's Nextclade currently does not support WNV |
nextclade_dataset_tag_input | WNV | "NA" | TheiaCoV's Nextclade currently does not support WNV |
primer_bed_file | WNV | "gs://theiagen-public-files/terra/theiacov-files/WNV/WNV-L1_primer.bed" | |
reference_genome | WNV | "gs://theiagen-public-files/terra/theiacov-files/WNV/NC_009942.1_wnv_L1.fasta" | |
vadr_max_length | WNV | 11000 | |
vadr_mem | WNV | 8 | |
vadr_options | WNV | "--mkey flavi --mdir /opt/vadr/vadr-models-flavi/ --nomisc --noprotid --out_allfasta" | |
Flu Defaults
Overwrite Variable Name | Organism | Flu Segment | Flu Subtype | Default Value | Notes |
---|---|---|---|---|---|
flu_segment | flu | all | all | N/A | TheiaCoV will attempt to automatically assign a flu segment |
flu_subtype | flu | all | all | N/A | TheiaCoV will attempt to automatically assign a flu subtype |
genome_length_input | flu | all | all | 13500 | |
vadr_max_length | flu | all | all | 13500 | |
vadr_mem | flu | all | all | 8 | |
vadr_options | flu | all | all | "--atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3 --mkey flu" | |
nextclade_dataset_name_input | flu | ha | h1n1 | "nextstrain/flu/h1n1pdm/ha/MW626062" | |
nextclade_dataset_tag_input | flu | ha | h1n1 | "2025-01-22--09-54-14Z" | |
reference_genome | flu | ha | h1n1 | "gs://theiagen-public-files-rp/terra/flu-references/reference_h1n1pdm_ha.fasta" | |
nextclade_dataset_name_input | flu | ha | h3n2 | "nextstrain/flu/h3n2/ha/EPI1857216" | |
nextclade_dataset_tag_input | flu | ha | h3n2 | "2025-01-22--09-54-14Z" | |
reference_genome | flu | ha | h3n2 | "gs://theiagen-public-files-rp/terra/flu-references/reference_h3n2_ha.fasta" | |
nextclade_dataset_name_input | flu | ha | victoria | "nextstrain/flu/vic/ha/KX058884" | |
nextclade_dataset_tag_input | flu | ha | victoria | "2025-01-22--09-54-14Z" | |
reference_genome | flu | ha | victoria | "gs://theiagen-public-files-rp/terra/flu-references/reference_vic_ha.fasta" | |
nextclade_dataset_name_input | flu | ha | yamagata | "nextstrain/flu/yam/ha/JN993010" | |
nextclade_dataset_tag_input | flu | ha | yamagata | "2024-01-30--16-34-55Z" | |
reference_genome | flu | ha | yamagata | "gs://theiagen-public-files-rp/terra/flu-references/reference_yam_ha.fasta" | |
nextclade_dataset_name_input | flu | ha | h5n1 | "community/moncla-lab/iav-h5/ha/all-clades" | |
nextclade_dataset_tag_input | flu | ha | h5n1 | "2025-01-30--18-05-53Z" | |
reference_genome | flu | ha | h5n1 | "gs://theiagen-public-files-rp/terra/flu-references/reference_h5n1_ha.fasta" | |
nextclade_dataset_name_input | flu | na | h1n1 | "nextstrain/flu/h1n1pdm/na/MW626056" | |
nextclade_dataset_tag_input | flu | na | h1n1 | "2025-03-26--11-47-13" | |
reference_genome | flu | na | h1n1 | "gs://theiagen-public-files-rp/terra/flu-references/reference_h1n1pdm_na.fasta" | |
nextclade_dataset_name_input | flu | na | h3n2 | "nextstrain/flu/h3n2/na/EPI1857215" | |
nextclade_dataset_tag_input | flu | na | h3n2 | "2025-01-22--09-54-14Z" | |
reference_genome | flu | na | h3n2 | "gs://theiagen-public-files-rp/terra/flu-references/reference_h3n2_na.fasta" | |
nextclade_dataset_name_input | flu | na | victoria | "nextstrain/flu/vic/na/CY073894" | |
nextclade_dataset_tag_input | flu | na | victoria | "2025-03-26--11-47-13Z" | |
reference_genome | flu | na | victoria | "gs://theiagen-public-files-rp/terra/flu-references/reference_vic_na.fasta" | |
nextclade_dataset_name_input | flu | na | yamagata | "NA" | |
nextclade_dataset_tag_input | flu | na | yamagata | "NA" | |
reference_genome | flu | na | yamagata | "gs://theiagen-public-files-rp/terra/flu-references/reference_yam_na.fasta" | |
RSV-A Defaults
Overwrite Variable Name | Organism | Default Value |
---|---|---|
genome_length_input | rsv_a | 16000 |
kraken_target_organism | rsv_a | "Human respiratory syncytial virus A" |
nextclade_dataset_name_input | rsv_a | nextstrain/rsv/a/EPI_ISL_412866 |
nextclade_dataset_tag_input | rsv_a | "2024-11-27--02-51-00Z" |
reference_genome | rsv_a | gs://theiagen-public-files-rp/terra/rsv_references/reference_rsv_a.fasta |
vadr_max_length | rsv_a | 15500 |
vadr_mem | rsv_a | 32 |
vadr_options | rsv_a | -r --mkey rsv --xnocomp |
RSV-B Defaults
Overwrite Variable Name | Organism | Default Value |
---|---|---|
genome_length_input | rsv_b | 16000 |
kraken_target_organism | rsv_b | "human respiratory syncytial virus" |
nextclade_dataset_name_input | rsv_b | nextstrain/rsv/b/EPI_ISL_1653999 |
nextclade_dataset_tag_input | rsv_b | "2025-03-04--17-31-25Z" |
reference_genome | rsv_b | gs://theiagen-public-files-rp/terra/rsv_references/reference_rsv_b.fasta |
vadr_max_length | rsv_b | 15500 |
vadr_mem | rsv_b | 32 |
vadr_options | rsv_b | -r --mkey rsv --xnocomp |
HIV Defaults
Overwrite Variable Name | Organism | Default Value | Notes |
---|---|---|---|
kraken_target_organism_input | HIV | Human immunodeficiency virus 1 | |
genome_length_input | HIV-v1 | 9181 | This version of HIV originates from Oregon |
primer_bed_file | HIV-v1 | gs://theiagen-public-files/terra/hivgc-files/HIV-1_v1.0.primer.hyphen.bed | This version of HIV originates from Oregon |
reference_genome | HIV-v1 | gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.fasta | This version of HIV originates from Oregon |
reference_gff_file | HIV-v1 | gs://theiagen-public-files/terra/hivgc-files/NC_001802.1.gff3 | This version of HIV originates from Oregon |
genome_length_input | HIV-v2 | 9840 | This version of HIV originates from Southern Africa |
primer_bed_file | HIV-v2 | gs://theiagen-public-files/terra/hivgc-files/HIV-1_v2.0.primer.hyphen400.1.bed | This version of HIV originates from Southern Africa |
reference_genome | HIV-v2 | gs://theiagen-public-files/terra/hivgc-files/AY228557.1.headerchanged.fasta | This version of HIV originates from Southern Africa |
reference_gff_file | HIV-v2 | gs://theiagen-public-files/terra/hivgc-files/AY228557.1.gff3 | This version of HIV originates from Southern Africa |
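The default-and-override behaviour described in this section can be sketched in a few lines of Python. This is only an illustration, not the sub-workflow's actual WDL: the `ORGANISM_DEFAULTS` dictionary holds a small excerpt of the values from the tables above, and the `resolve_parameters` function name is hypothetical.

```python
from typing import Any, Optional

# A small excerpt of the defaults listed in the tables above.
ORGANISM_DEFAULTS: dict[str, dict[str, Any]] = {
    "sars-cov-2": {"genome_length_input": 29903, "vadr_max_length": 30000},
    "MPXV": {"genome_length_input": 197200, "vadr_max_length": 210000},
    "rsv_a": {"genome_length_input": 16000, "vadr_max_length": 15500},
}


def resolve_parameters(organism: str = "sars-cov-2",
                       overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return the organism defaults, with user-supplied values taking precedence."""
    if organism not in ORGANISM_DEFAULTS:
        raise ValueError(f"no defaults in this sketch for organism {organism!r}")
    resolved = dict(ORGANISM_DEFAULTS[organism])
    # Only values the user actually provided replace a default.
    for name, value in (overrides or {}).items():
        if value is not None:
            resolved[name] = value
    return resolved


# Override a single value for MPXV; every other value keeps its default.
params = resolve_parameters("MPXV", {"genome_length_input": 197000})
print(params)
```

Values that are not overridden fall through to the organism default, which mirrors how the "Overwrite Variable Name" inputs are described above.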
Workflow Tasks¶
All input reads are processed through "core tasks" in the TheiaCoV Illumina, ONT, and ClearLabs workflows. These undertake read trimming and assembly appropriate to the input data type. TheiaCoV workflows subsequently launch default genome characterization modules for quality assessment, and additional taxa-specific characterization steps. When setting up the workflow, users may choose to use "optional tasks" as additions or alternatives to tasks run in the workflow by default.
Core tasks¶
These tasks run regardless of organism and perform read trimming and various quality control steps.
versioning
: Version Capture
The versioning task captures the workflow version from the GitHub code repository.
Version Capture Technical details
Links | |
---|---|
Task | task_versioning.wdl |
read_QC_trim
: Read Quality Trimming, Adapter Removal, Quantification, and Identification
read_QC_trim
is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below. The differences between the PE and SE versions of the read_QC_trim
sub-workflow lie in the default parameters, the use of two or one input read file(s), and the different output files.
HRRT
: Human Host Sequence Removal
All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).
HRRT is based on the SRA Taxonomy Analysis Tool and employs a k-mer database constructed of k-mers from Eukaryota derived from all human RefSeq records with any k-mers found in non-Eukaryota RefSeq records subtracted from the database.
NCBI-Scrub Technical Details
Links | |
---|---|
Task | task_ncbi_scrub.wdl |
Software Source Code | HRRT on GitHub |
Software Documentation | HRRT on NCBI |
Read quality trimming
Either trimmomatic or fastp can be used for read-quality trimming. Trimmomatic is used by default. Both tools trim low-quality regions of reads with a sliding window (with a window size of trim_window_size), cutting once the average quality within the window falls below trim_quality_trim_score. Both discard the read if it is trimmed below a length of trim_minlen.
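To make the sliding-window idea concrete, here is a minimal Python sketch. It is a simplification for illustration and does not reproduce the exact behaviour of either trimmomatic or fastp; the function name is hypothetical, and its three numeric arguments play the roles of trim_window_size, trim_quality_trim_score, and trim_minlen.

```python
from typing import Optional


def sliding_window_trim(seq: str, quals: list[int], window_size: int,
                        min_quality: float,
                        min_length: int) -> Optional[tuple[str, list[int]]]:
    """Cut the read at the first window whose mean quality is below min_quality.

    Returns the trimmed (sequence, qualities), or None when the trimmed read
    is shorter than min_length and would be discarded.
    """
    cut = len(seq)
    # Slide the window from the 5' end; reads shorter than the window are kept whole.
    for start in range(0, max(len(seq) - window_size + 1, 0)):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_quality:
            cut = start
            break
    if cut < min_length:
        return None
    return seq[:cut], quals[:cut]


read = "ACGTACGTAC"
phred = [38, 38, 37, 36, 35, 30, 12, 10, 8, 5]
print(sliding_window_trim(read, phred, window_size=4, min_quality=30, min_length=3))
print(sliding_window_trim(read, phred, window_size=4, min_quality=30, min_length=5))
```

In this toy read, the first window whose mean quality drops below 30 starts at position 3, so only the first three bases survive; raising the minimum length to 5 discards the read entirely.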
If fastp is selected for analysis, fastp also implements the additional read-trimming steps indicated below:
Parameter | Explanation |
---|---|
-g | enables polyG tail trimming |
-5 20 | enables read end-trimming |
-3 20 | enables read end-trimming |
--detect_adapter_for_pe | enables adapter-trimming only for paired-end reads |
Adapter removal
The BBDuk
task removes adapters from sequence reads. To do this:
- Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files.
- BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.
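The k-mer matching used for PhiX filtering can be illustrated with a short Python sketch. This is not BBDuk's implementation; the reference string below is a made-up toy sequence rather than PhiX, and k is set to 5 instead of 31 so the example stays readable. All function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return every k-mer in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference: str, k: int) -> set[str]:
    """Index the k-mers of both strands of the reference."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)


def matches_reference(read: str, index: set[str], k: int) -> bool:
    """A read matches if any one of its k-mers is in the index."""
    return any(read[i:i + k] in index for i in range(len(read) - k + 1))


K = 5  # toy value; the text above describes 31-mer matching
toy_reference = "ACGTTGCATGGATCCA"  # made-up sequence, not PhiX
index = build_index(toy_reference, K)

reads = ["TTGCATGGAT", "GGGGGGGGGG", "AACCATGCAA"]
kept = [read for read in reads if not matches_reference(read, index, K)]
print(kept)
```

The first read shares k-mers with the forward strand and the third with the reverse strand, so only the second read is kept.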
What are adapters and why do they need to be removed?
Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.
Read Quantification
There are two methods for read quantification to choose from: fastq-scan
(default) or fastqc
. Both quantify the forward and reverse reads in FASTQ files. For paired-end data, they also provide the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads. fastqc
also provides a graphical visualization of the read quality.
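As a rough illustration of what read quantification involves, the sketch below counts records in FASTQ text. It assumes the common four-lines-per-record layout and is not how fastq-scan or fastqc are implemented; the `count_reads` name and the toy reads are hypothetical.

```python
import io
from typing import TextIO


def count_reads(handle: TextIO) -> int:
    """Count records in a four-line-per-record FASTQ stream."""
    line_count = sum(1 for line in handle if line.strip())
    if line_count % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of four")
    return line_count // 4


raw_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII#I\n@r3\nTTTT\n+\n####\n"
clean_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII#I\n"

raw_count = count_reads(io.StringIO(raw_fastq))
clean_count = count_reads(io.StringIO(clean_fastq))
print(raw_count, clean_count)
```

Comparing the two counts is the sanity check described above: the clean count should not exceed the raw count.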
Read Identification with Kraken2
Kraken2
is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.
Kraken2 is run on both the raw and clean reads.
Database-dependent
This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz
.
Kraken2 Technical Details
Links | |
---|---|
Task | task_kraken2.wdl |
Software Source Code | Kraken2 on GitHub |
Software Documentation | https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown |
Original Publication(s) | Improved metagenomic analysis with Kraken 2 |
read_QC_trim Technical Details
read_QC_trim_ont
: Read Quality Trimming, Quantification, and Identification
read_QC_trim_ont
is a sub-workflow that filters low-quality reads and trims low-quality regions of reads. It uses several tasks, described below.
HRRT
: Human Host Sequence Removal
All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).
HRRT is based on the SRA Taxonomy Analysis Tool and employs a k-mer database constructed of k-mers from Eukaryota derived from all human RefSeq records with any k-mers found in non-Eukaryota RefSeq records subtracted from the database.
NCBI-Scrub Technical Details
Links | |
---|---|
Task | task_ncbi_scrub.wdl |
Software Source Code | HRRT on GitHub |
Software Documentation | HRRT on NCBI |
Read quality filtering
Read filtering is performed using artic guppyplex, which filters reads by length to remove chimeric reads.
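Length-based filtering can be pictured with the following Python sketch. It only illustrates the idea and is not the artic guppyplex code; the function name, the read names, and the 400 to 700 bp bounds are hypothetical values chosen for the example.

```python
def filter_by_length(reads: dict[str, str], min_length: int,
                     max_length: int) -> dict[str, str]:
    """Keep reads whose length lies within [min_length, max_length]."""
    return {name: seq for name, seq in reads.items()
            if min_length <= len(seq) <= max_length}


reads = {
    "amplicon": "A" * 500,   # expected amplicon size in this toy example
    "fragment": "A" * 120,   # too short
    "chimera": "A" * 1000,   # too long, e.g. two amplicons joined together
}
kept = filter_by_length(reads, min_length=400, max_length=700)
print(sorted(kept))
```

Reads much longer than the expected amplicon are the ones most likely to be chimeric, which is why an upper bound is applied as well as a lower one.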
Read Identification with Kraken2
Kraken2
is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.
Kraken2 is run on both the raw and clean reads.
Database-dependent
This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-large-public-files-rp/terra/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz
.
Kraken2 Technical Details
Links | |
---|---|
Task | task_kraken2.wdl |
Software Source Code | Kraken2 on GitHub |
Software Documentation | https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown |
Original Publication(s) | Improved metagenomic analysis with Kraken 2 |
nanoplot
: Plotting and quantifying long-read sequencing data
Nanoplot is used for the determination of mean quality scores, read lengths, and number of reads. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
read_QC_trim_ont Technical Details
qc_check
: Check QC Metrics Against User-Defined Thresholds (optional)
The qc_check task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a qc_check_table TSV file. If all QC metrics meet their thresholds, the qc_check output variable will read QC_PASS. Otherwise, the output will read QC_NA if the task could not proceed, or QC_ALERT followed by a string indicating which metric failed.
The qc_check task applies quality thresholds according to the specified organism, which should match the standardized organism input in the TheiaCoV workflows.
Formatting the qc_check_table.tsv
- The first column of the qc_check_table lists the organism that the task will assess, and the header of this column must be "taxon".
- Each subsequent column indicates a QC metric and lists a threshold for each organism that will be checked. The column names must exactly match expected values, so we highly recommend copying and pasting the header from the template file below as a starting point.
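The table layout and the pass/alert logic can be sketched as follows. This is a simplified illustration, not the task's actual code. Several details are assumptions made for the example: the two metric columns chosen, the threshold values, and the treatment of every threshold as a minimum. Use the template file for the real column names.

```python
import csv
import io


def qc_check(table_tsv: str, organism: str, metrics: dict[str, float]) -> str:
    """Compare metrics against the thresholds in the row for one organism."""
    rows = csv.DictReader(io.StringIO(table_tsv), delimiter="\t")
    thresholds = next((row for row in rows if row["taxon"] == organism), None)
    if thresholds is None:
        return "QC_NA"  # no row for this organism, so the check cannot proceed
    failed = []
    for name, minimum in thresholds.items():
        if name == "taxon":
            continue
        if name not in metrics:
            return "QC_NA"  # assumption: a missing metric also blocks the check
        if metrics[name] < float(minimum):  # assumption: thresholds are minimums
            failed.append(name)
    return "QC_PASS" if not failed else "QC_ALERT: " + ", ".join(failed)


# Hypothetical table; the values are for illustration only.
table = (
    "taxon\tassembly_length_unambiguous\tassembly_mean_coverage\n"
    "sars-cov-2\t27000\t100\n"
    "MPXV\t180000\t50\n"
)
sample = {"assembly_length_unambiguous": 29500, "assembly_mean_coverage": 85.0}
print(qc_check(table, "sars-cov-2", sample))
print(qc_check(table, "HIV", sample))
```

Because the row is looked up by the "taxon" value, the organism passed to the check has to match the table entry exactly, which is why the standardized organism name matters.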
Template qc_check_table.tsv files
- TheiaCoV_Illumina_PE: TheiaCoV_Illumina_PE_qc_check_template.tsv
Example Purposes Only
The QC threshold values shown in the file above are for example purposes only and should not be presumed to be sufficient for every dataset.
qc_check Technical Details
Links | |
---|---|
Task | task_qc_check_phb.wdl |
Assembly tasks¶
One of these tasks is run, depending on the organism and workflow type.
ivar_consensus
: Alignment, Consensus, Variant Detection, and Assembly Statistics for non-flu organisms in Illumina workflows
ivar_consensus is a sub-workflow within TheiaCoV that performs reference-based consensus assembly using the iVar tool by Nathan Grubaugh from the Andersen lab.
The following steps are performed as part of this sub-workflow:
- Cleaned reads are aligned to the appropriate reference genome (see also the organism-specific parameters and logic section above) using BWA to generate a Binary Alignment Map (BAM) file.
- If trim_primers is set to true, primers will be removed using ivar trim.
    - General statistics about the remaining reads are calculated.
- The ivar consensus command is run to generate a consensus assembly.
- General statistics about the assembly are calculated.
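The consensus step in the list above can be pictured with a toy Python sketch of per-position consensus calling. It is deliberately simplified and is not iVar's algorithm: each string in `pileup` stands for the bases aligned at one reference position, and the depth and frequency thresholds are hypothetical values.

```python
from collections import Counter


def call_consensus(pileup: list[str], min_depth: int, min_frequency: float) -> str:
    """Call the most common base per position, or N when support is too weak."""
    consensus = []
    for column in pileup:
        depth = len(column)
        if depth < min_depth:
            consensus.append("N")  # not enough reads cover this position
            continue
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / depth >= min_frequency else "N")
    return "".join(consensus)


# Four reference positions with the bases observed at each one.
pileup = ["AAAA", "CCCT", "GG", "TAGA"]
print(call_consensus(pileup, min_depth=3, min_frequency=0.6))
```

Positions with too few reads, or with no sufficiently dominant base, are masked with N, which is why low-coverage samples produce assemblies with fewer unambiguous bases.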
iVar Consensus Technical Details
Workflow | TheiaCoV_Illumina_PE & TheiaCoV_Illumina_SE |
---|---|
Sub-workflow | wf_ivar_consensus.wdl |
Tasks | task_bwa.wdl task_ivar_primer_trim.wdl task_assembly_metrics.wdl task_ivar_variant_call.wdl task_ivar_consensus.wdl |
Software Source Code | BWA on GitHub, iVar on GitHub |
Software Documentation | BWA on SourceForge, iVar on GitHub |
Original Publication(s) | Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar |
artic_consensus
: Alignment, Primer Trimming, Variant Detection, and Consensus for non-flu organisms in ONT & ClearLabs workflows
Briefly, input reads are aligned to the appropriate reference with minimap2 to generate a Binary Alignment Map (BAM) file. Primer sequences are then removed from the BAM file and a consensus assembly file is generated using the Medaka mode of the artic minion command.
Read trimming is already performed on raw read data generated on the ClearLabs instrument and is thus not a required step in the TheiaCoV_ClearLabs workflow.
General statistics about the assembly are generated with the consensus_qc task (task_assembly_metrics.wdl).
Artic Consensus Technical Details
Links | |
---|---|
Task | task_artic_consensus.wdl |
Software Source Code | Artic on GitHub |
Software Documentation | Artic pipeline |
irma
: Assembly and Characterization for flu in TheiaCoV_Illumina_PE & TheiaCoV_ONT
Cleaned reads are assembled using irma, which stands for Iterative Refinement Meta-Assembler. IRMA first sorts reads to flu genome segments using LABEL, then iteratively maps reads to a collection of reference sequences (in this case for influenza virus) and iteratively edits the references to account for the high population diversity and mutation rates that are characteristic of influenza genomes. Assemblies produced by irma will be ordered from largest to smallest assembled flu segment. irma also performs typing and subtyping as part of the assembly process. Note: IRMA does not differentiate between Flu B Victoria and Yamagata lineages. To determine the lineage, please review the abricate task outputs.
Due to the segmented nature of the Influenza genome and the various downstream bioinformatics tools that require the genome assembly, the IRMA task & TheiaCoV workflows output various genome assembly files. Briefly they are:
- assembly_fasta - The full genome assembly in FASTA format, with 1 FASTA entry per genome segment. There should be 8 segments in total, but depending on the quality and depth of sequence data, some segments may not be assembled and therefore not present in this output file.
- irma_assembly_fasta_concatenated - The full genome assembly in FASTA format, but with all segments concatenated into a single FASTA entry. This is not your typical FASTA file and is purposely created to be used with a custom Nextclade dataset for the H5N1 B3.13 genotype that is based on a concatenated reference genome.
- irma_<segment-abbreviation>_segment_fasta - Individual FASTA files that only contain the sequence for 1 segment, for example the HA segment. There are 8 of these in total.
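The relationship between the multi-entry and concatenated files can be sketched in Python. This is an illustration only: the record names, the output header, and the choice to join segments in file order are assumptions, not a description of how the IRMA task builds its outputs.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}, preserving file order."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_segments(records: dict[str, str], header: str) -> str:
    """Join all segment sequences into one FASTA entry."""
    return f">{header}\n{''.join(records.values())}\n"


# Hypothetical two-segment assembly with very short toy sequences.
assembly = ">sample_PB2\nATGGAG\n>sample_HA\nATGAAG\nGCA\n"
segments = parse_fasta(assembly)
print(concatenate_segments(segments, "sample_concatenated"))
```

The per-segment view (`segments`) corresponds to the one-entry-per-segment files, while the concatenated string has a single header followed by all sequences joined end to end.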
General statistics about the assembly are generated with the consensus_qc
task (task_assembly_metrics.wdl).
IRMA Technical Details
Links | |
---|---|
Task | task_irma.wdl |
Software Documentation | IRMA website |
Original Publication(s) | Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler |
Organism-specific characterization tasks¶
The following tasks run only for the appropriate organism designation. The table below shows which characterization tools are run for each organism.
Tool | SARS-CoV-2 | MPXV | HIV | WNV | Influenza | RSV-A | RSV-B |
---|---|---|---|---|---|---|---|
Pangolin | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Nextclade | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
VADR | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
Quasitools HyDRA | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
IRMA | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
Abricate | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
% Gene Coverage | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Antiviral Detection | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
GenoFLU | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
pangolin
Pangolin designates SARS-CoV-2 lineage assignments.
Pangolin Technical Details
Links | |
---|---|
Task | task_pangolin.wdl |
Software Source Code | Pangolin on GitHub |
Software Documentation | Pangolin website |
Original Publication(s) | A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology |
nextclade
Nextclade Technical Details
Links | |
---|---|
Task | task_nextclade.wdl |
Software Source Code | https://github.com/nextstrain/nextclade |
Software Documentation | Nextclade |
Original Publication(s) | Nextclade: clade assignment, mutation calling and quality control for viral genomes. |
vadr
VADR annotates and validates completed assembly files.
VADR Technical Details
Links | |
---|---|
Task | task_vadr.wdl |
Software Source Code | https://github.com/ncbi/vadr |
Software Documentation | https://github.com/ncbi/vadr/wiki |
Original Publication(s) | For SARS-CoV-2: Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR; for non-SARS-CoV-2: VADR: validation and annotation of virus sequence submissions to GenBank |
quasitools
quasitools
performs genome characterization for HIV.
Quasitools Technical Details
Links | |
---|---|
Task | task_quasitools.wdl |
Software Source Code | https://github.com/phac-nml/quasitools/ |
Software Documentation | Quasitools HyDRA |
irma
IRMA assigns types and subtypes/lineages in addition to performing assembly of flu genomes. Please see the section above under "Assembly tasks" for more information about this tool.
IRMA Technical Details
Links | |
---|---|
Task | task_irma.wdl |
Software Documentation | IRMA website |
Original Publication(s) | Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler |
abricate
Abricate assigns types and subtypes/lineages for flu samples.
Abricate Technical Details
Links | |
---|---|
Task | task_abricate.wdl (abricate_flu subtask) |
Software Source Code | ABRicate on GitHub |
Software Documentation | ABRicate on GitHub |
gene_coverage
This task calculates the percentage of each gene that is covered above a minimum depth. By default, it runs for SARS-CoV-2 and MPXV, but if a BED file with regions of interest is provided, it will also run for other organisms.
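The calculation can be sketched in Python as follows. It is a minimal illustration rather than the task's implementation: it assumes a per-base depth list for a single reference, standard 0-based, end-exclusive BED coordinates with the gene name in the fourth column, and counts a position as covered when its depth is at least the minimum. The gene names and depth values are hypothetical.

```python
def percent_gene_coverage(depths: list[int], bed_lines: list[str],
                          min_depth: int) -> dict[str, float]:
    """Return the percentage of positions per gene with depth >= min_depth."""
    result = {}
    for line in bed_lines:
        _chrom, start, end, gene = line.split("\t")[:4]
        region = depths[int(start):int(end)]  # BED is 0-based, end-exclusive
        covered = sum(1 for depth in region if depth >= min_depth)
        result[gene] = round(100 * covered / len(region), 2)
    return result


# Toy 20-base reference: the first 5 positions have no coverage.
depths = [0] * 5 + [50] * 15
bed = ["ref\t0\t10\tgeneA", "ref\t10\t20\tgeneB"]
print(percent_gene_coverage(depths, bed, min_depth=10))
```

Here geneA spans the uncovered start of the reference and is only half covered, while geneB is fully covered.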
Gene Coverage Technical Details
Links | |
---|---|
Task | task_gene_coverage.wdl |
flu_antiviral_substitutions
This sub-workflow determines which, if any, antiviral mutations are present in the sample.
The assembled HA, NA, PA, PB1 and PB2 segments are compared against a list of known amino-acid substitutions associated with resistance to the antivirals A_315675, Amantadine, compound_367, Favipiravir, Fludase, L_742_001, Laninamivir, Oseltamivir (Tamiflu), Peramivir, Pimodivir, Rimantadine, Xofluza, and Zanamivir. This list can be expanded via the optional user input antiviral_aa_subs in the format "NA:V95A,HA:I97V", i.e. Protein:AAPositionAA.
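To make the Protein:AAPositionAA format concrete, here is a small sketch that splits an antiviral_aa_subs string into its parts. The function name `parse_aa_subs` and the validation rules (single uppercase letters for the amino acids, an alphanumeric protein name) are assumptions for illustration; the workflow's own parsing may differ.

```python
import re

# Protein name, reference amino acid, position, substituted amino acid.
_SUB_PATTERN = re.compile(r"^([A-Za-z0-9]+):([A-Z])(\d+)([A-Z])$")


def parse_aa_subs(spec):
    """Split a comma-separated substitution list into tuples.

    Each tuple is (protein, reference_aa, position, substituted_aa).
    Raises ValueError for any item not in Protein:AAPositionAA form.
    """
    subs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        match = _SUB_PATTERN.match(item)
        if match is None:
            raise ValueError(f"not in Protein:AAPositionAA form: {item!r}")
        protein, ref_aa, position, sub_aa = match.groups()
        subs.append((protein, ref_aa, int(position), sub_aa))
    return subs


print(parse_aa_subs("NA:V95A,HA:I97V"))
# [('NA', 'V', 95, 'A'), ('HA', 'I', 97, 'V')]
```

Checking a string this way before launching a run is a cheap way to catch a misplaced colon or a missing position.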
The list of amino-acid substitutions associated with antiviral resistance includes both substitutions reported to confer antiviral resistance in the scientific literature and those inferred to potentially cause antiviral resistance based on an analogous mutation reported to confer antiviral resistance in another flu subtype. A table with the explanation for each amino-acid substitution in the antiviral resistance task is available here.
Antiviral Substitutions Technical Details
genoflu
This sub-workflow determines the whole-genome genotype of an H5N1 flu sample.
GenoFLU Technical Details
Links | |
---|---|
Task | task_genoflu.wdl |
Software Source Code | GenoFLU on GitHub |
Outputs¶
Variable | Type | Description |
---|---|---|
abricate_flu_database | String | ABRicate database used for analysis |
abricate_flu_results | File | File containing all results from ABRicate |
abricate_flu_subtype | String | Flu subtype as determined by ABRicate |
abricate_flu_type | String | Flu type as determined by ABRicate |
abricate_flu_version | String | Version of ABRicate |
aligned_bai | File | Index companion file to the bam file generated during the consensus assembly process |
aligned_bam | File | Sorted BAM file containing the alignments of reads to the reference genome |
assembly_fasta | File | Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used. |
assembly_length_unambiguous | Int | Number of unambiguous basecalls within the consensus assembly |
assembly_mean_coverage | Float | Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command |
assembly_method | String | Method employed to generate consensus assembly |
auspice_json | File | Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
auspice_json_flu_h5n1 | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza H5N1 whole genome that includes the samples from the "avian-flu/h5n1-cattle-outbreak" Nextstrain build, which is focused on the B3.13 genotype, and the single sample placed on this tree |
auspice_json_flu_ha | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza HA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
auspice_json_flu_na | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza NA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
bbduk_docker | String | The Docker image for bbduk, which was used to remove the adapters from the sequences |
bwa_version | String | Version of BWA software used |
consensus_flagstat | File | Output from the SAMtools flagstat command to assess quality of the alignment file (BAM) |
consensus_n_variant_min_depth | Int | Minimum read depth to call variants for iVar consensus and iVar variants. Also represents the minimum consensus support threshold used by IRMA with Illumina Influenza data. |
consensus_stats | File | Output from the SAMtools stats command to assess quality of the alignment file (BAM) |
est_percent_gene_coverage_tsv | File | Percent coverage for each gene in the organism being analyzed (depending on the organism input) |
fastp_html_report | File | The HTML report made with fastp |
fastp_version | String | The version of fastp used |
fastq_scan_clean1_json | File | The JSON file output from fastq-scan containing summary stats about clean forward read quality and length |
fastq_scan_clean2_json | File | The JSON file output from fastq-scan containing summary stats about clean reverse read quality and length |
fastq_scan_num_reads_clean_pairs | String | The number of read pairs after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean1 | Int | The number of forward reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean2 | Int | The number of reverse reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_raw_pairs | String | The number of input read pairs as calculated by fastq_scan |
fastq_scan_num_reads_raw1 | Int | The number of input forward reads as calculated by fastq_scan |
fastq_scan_num_reads_raw2 | Int | The number of input reverse reads as calculated by fastq_scan |
fastq_scan_r1_mean_q_clean | Float | Forward read mean quality value after quality trimming and adapter removal |
fastq_scan_r1_mean_q_raw | Float | Forward read mean quality value before quality trimming and adapter removal |
fastq_scan_r1_mean_readlength_clean | Float | Forward read mean read length value after quality trimming and adapter removal |
fastq_scan_r1_mean_readlength_raw | Float | Forward read mean read length value before quality trimming and adapter removal |
fastq_scan_raw1_json | File | The JSON file output from fastq-scan containing summary stats about raw forward read quality and length |
fastq_scan_raw2_json | File | The JSON file output from fastq-scan containing summary stats about raw reverse read quality and length |
fastq_scan_version | String | The version of fastq_scan |
fastqc_clean1_html | File | An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser |
fastqc_clean2_html | File | An HTML file that provides a graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
fastqc_docker | String | The Docker container used for fastqc |
fastqc_num_reads_clean_pairs | String | The number of read pairs after cleaning by fastqc |
fastqc_num_reads_clean1 | Int | The number of forward reads after cleaning by fastqc |
fastqc_num_reads_clean2 | Int | The number of reverse reads after cleaning by fastqc |
fastqc_num_reads_raw_pairs | String | The number of input read pairs by fastqc before cleaning |
fastqc_num_reads_raw1 | Int | The number of input forward reads by fastqc before cleaning |
fastqc_num_reads_raw2 | Int | The number of input reverse reads by fastqc before cleaning |
fastqc_raw1_html | File | An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser |
fastqc_raw2_html | File | An HTML file that provides a graphical visualization of raw reverse read quality from fastqc to open in an internet browser |
fastqc_version | String | Version of fastqc software used |
flu_A_315675_resistance | String | resistance mutations to A_315675 |
flu_amantadine_resistance | String | resistance mutations to amantadine |
flu_compound_367_resistance | String | resistance mutations to compound_367 |
flu_favipiravir_resistance | String | resistance mutations to favipiravir |
flu_fludase_resistance | String | resistance mutations to fludase |
flu_L_742_001_resistance | String | resistance mutations to L_742_001 |
flu_laninamivir_resistance | String | resistance mutations to laninamivir |
flu_oseltamivir_resistance | String | resistance mutations to oseltamivir (Tamiflu®) |
flu_peramivir_resistance | String | resistance mutations to peramivir (Rapivab®) |
flu_pimodivir_resistance | String | resistance mutations to pimodivir |
flu_rimantadine_resistance | String | resistance mutations to rimantadine |
flu_xofluza_resistance | String | resistance mutations to xofluza (Baloxavir marboxil) |
flu_zanamivir_resistance | String | resistance mutations to zanamivir (Relenza®) |
genoflu_all_segments | String | The genotypes for each individual flu segment |
genoflu_genotype | String | The genotype of the whole genome, based off of the individual segments types |
genoflu_output_tsv | File | The output file from GenoFLU |
genoflu_version | String | The version of GenoFLU used |
irma_assembly_fasta_concatenated | File | Assembly FASTA file of all Influenza genome segments concatenated into one sequence/FASTA entry |
irma_docker | String | Docker image used to run IRMA |
irma_ha_segment_fasta | File | HA (Haemagglutinin) assembly fasta file |
irma_mp_segment_fasta | File | MP (Matrix Protein) assembly fasta file |
irma_na_segment_fasta | File | NA (Neuraminidase) assembly fasta file |
irma_np_segment_fasta | File | NP (Nucleoprotein) assembly fasta file |
irma_ns_segment_fasta | File | NS (Nonstructural) assembly fasta file |
irma_pa_segment_fasta | File | PA (Polymerase acidic) assembly fasta file |
irma_pb1_segment_fasta | File | PB1 (Polymerase basic 1) assembly fasta file |
irma_pb2_segment_fasta | File | PB2 (Polymerase basic 2) assembly fasta file |
irma_subtype | String | Flu subtype as determined by IRMA |
irma_subtype_notes | String | Helpful note to user about Flu B subtypes. Output will be blank for Flu A samples. For Flu B samples it will state: "IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column" |
irma_type | String | Flu type as determined by IRMA |
irma_version | String | Version of IRMA used |
ivar_tsv | File | Variant descriptor file generated by iVar variants |
ivar_variant_proportion_intermediate | String | The proportion of variants of intermediate frequency |
ivar_variant_version | String | Version of iVar for running the iVar variants command |
ivar_vcf | File | iVar tsv output converted to VCF format |
ivar_version_consensus | String | Version of iVar for running the iVar consensus command |
ivar_version_primtrim | String | Version of iVar for running the iVar trim command |
kraken_human | Float | Percent of human read data detected using the Kraken2 software |
kraken_human_dehosted | Float | Percent of human read data detected using the Kraken2 software after host removal |
kraken_report | File | Full Kraken report |
kraken_report_dehosted | File | Full Kraken report after host removal |
kraken_sc2 | String | Percent of SARS-CoV-2 read data detected using the Kraken2 software |
kraken_sc2_dehosted | Float | Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal |
kraken_target_organism | String | Percent of target organism read data detected using the Kraken2 software |
kraken_target_organism_dehosted | String | Percent of target organism read data detected using the Kraken2 software after host removal |
kraken_target_organism_name | String | The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus" |
kraken_version | String | Version of Kraken software used |
meanbaseq_trim | Float | Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming |
meanmapq_trim | Float | Mean quality of the mapped reads to the reference genome after primer trimming |
nextclade_aa_dels | String | Amino-acid deletions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_dels_flu_h5n1 | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for H5N1 whole genome |
nextclade_aa_dels_flu_ha | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for HA segment |
nextclade_aa_dels_flu_na | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for NA segment |
nextclade_aa_subs | String | Amino-acid substitutions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_subs_flu_h5n1 | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for H5N1 whole genome |
nextclade_aa_subs_flu_ha | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for HA segment |
nextclade_aa_subs_flu_na | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for NA segment |
nextclade_clade | String | Nextclade clade designation, will be blank for Flu. |
nextclade_clade_flu_h5n1 | String | Nextclade clade designation, specific to Flu H5N1 whole genome. NOTE: Output will be blank or NA since this Nextclade dataset does not assign clades |
nextclade_clade_flu_ha | String | Nextclade clade designation, specific to Flu HA segment |
nextclade_clade_flu_na | String | Nextclade clade designation, specific to Flu NA segment |
nextclade_docker | String | Docker image used to run Nextclade |
nextclade_ds_tag | String | Dataset tag used to run Nextclade. Will be blank for Flu |
nextclade_ds_tag_flu_ha | String | Dataset tag used to run Nextclade, specific to Flu HA segment |
nextclade_ds_tag_flu_na | String | Dataset tag used to run Nextclade, specific to Flu NA segment |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_json_flu_h5n1 | File | Nextclade output in JSON file format, specific to Flu H5N1 whole genome |
nextclade_json_flu_ha | File | Nextclade output in JSON file format, specific to Flu HA segment |
nextclade_json_flu_na | File | Nextclade output in JSON file format, specific to Flu NA segment |
nextclade_lineage | String | Nextclade lineage designation |
nextclade_qc | String | QC metric as determined by Nextclade. Will be blank for Flu |
nextclade_qc_flu_h5n1 | String | QC metric as determined by Nextclade, specific to Flu H5N1 whole genome |
nextclade_qc_flu_ha | String | QC metric as determined by Nextclade, specific to Flu HA segment |
nextclade_qc_flu_na | String | QC metric as determined by Nextclade, specific to Flu NA segment |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
nextclade_tsv_flu_h5n1 | File | Nextclade output in TSV file format, specific to Flu H5N1 whole genome |
nextclade_tsv_flu_ha | File | Nextclade output in TSV file format, specific to Flu HA segment |
nextclade_tsv_flu_na | File | Nextclade output in TSV file format, specific to Flu NA segment |
nextclade_version | String | The version of Nextclade software used |
number_Degenerate | Int | Number of degenerate basecalls within the consensus assembly |
number_N | Int | Number of fully ambiguous basecalls within the consensus assembly |
number_Total | Int | Total number of nucleotides within the consensus assembly |
pango_lineage | String | Pango lineage as determined by Pangolin |
pango_lineage_expanded | String | Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1" |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
pangolin_assignment_version | String | The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment |
pangolin_conflicts | String | Number of lineage conflicts as determined by Pangolin |
pangolin_docker | String | Docker image used to run Pangolin |
pangolin_notes | String | Lineage notes as determined by Pangolin |
pangolin_versions | String | All Pangolin software and database versions |
percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 |
percentage_mapped_reads | String | Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100. |
primer_bed_name | String | Name of the primer bed files used for primer trimming |
primer_trimmed_read_percent | Float | Percentage of read data with primers trimmed as determined by iVar trim |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
quasitools_coverage_file | File | The coverage report created by Quasitools HyDRA |
quasitools_date | String | Date of Quasitools analysis |
quasitools_dr_report | File | Drug resistance report created by Quasitools HyDRA |
quasitools_hydra_vcf | File | The VCF created by Quasitools HyDRA |
quasitools_mutations_report | File | The mutation report created by Quasitools HyDRA |
quasitools_version | String | Version of Quasitools used |
read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure |
read_screen_clean_tsv | File | Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure |
read_screen_raw_tsv | File | Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read1_aligned | File | Forward read file of only aligned reads |
read1_clean | File | Forward read file after quality trimming and adapter removal |
read1_dehosted | File | The dehosted forward reads file; suggested read file for SRA submission |
read1_unaligned | File | Forward read file of unaligned reads |
read2_aligned | File | Reverse read file of only aligned reads |
read2_clean | File | Reverse read file after quality trimming and adapter removal |
read2_dehosted | File | The dehosted reverse reads file; suggested read file for SRA submission |
read2_unaligned | File | Reverse read file of unaligned reads |
samtools_version | String | The version of SAMtools used to sort and index the alignment file |
samtools_version_consensus | String | The version of SAMtools used to create the pileup before running iVar consensus |
samtools_version_primtrim | String | The version of SAMtools used to create the pileup before running iVar trim |
samtools_version_stats | String | The version of SAMtools used to assess the quality of read mapping |
sc2_s_gene_mean_coverage | Float | Mean read depth for the S gene in SARS-CoV-2 |
sc2_s_gene_percent_coverage | Float | Percent coverage of the S gene in SARS-CoV-2 |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
sorted_bam_unaligned | File | A BAM file that only contains reads that did not align to the reference |
sorted_bam_unaligned_bai | File | Index companion file to a BAM file that only contains reads that did not align to the reference |
theiacov_illumina_pe_analysis_date | String | Date of analysis |
theiacov_illumina_pe_version | String | Version of PHB used for running the workflow |
trimmomatic_docker | String | The docker image used for the trimmomatic module in this workflow |
trimmomatic_version | String | The version of Trimmomatic used |
vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
vadr_docker | String | Docker image used to run VADR |
vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
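The two percentage outputs whose formulas are given in the table above (percent_reference_coverage and percentage_mapped_reads) can be reproduced by hand. The sketch below simply restates those formulas; the function names mirror the output names for readability, and the zero-read fallback is an assumption rather than documented workflow behavior.

```python
def percent_reference_coverage(assembly_length_unambiguous, reference_length=29903):
    """assembly_length_unambiguous / reference length x 100.

    The default reference length is the SARS-CoV-2 value quoted in the table.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * assembly_length_unambiguous / reference_length


def percentage_mapped_reads(mapped_reads, total_reads):
    """Number of mapped reads / total number of reads x 100.

    Returns 0.0 when there are no reads (an assumption made for this sketch).
    """
    if total_reads == 0:
        return 0.0
    return 100.0 * mapped_reads / total_reads


# Hypothetical sample: 29,000 unambiguous bases, 950 of 1,000 reads mapped.
print(round(percent_reference_coverage(29000), 2))
print(percentage_mapped_reads(950, 1000))
```

For organisms other than SARS-CoV-2, pass the length of the reference genome actually used as `reference_length`.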
Variable | Type | Description |
---|---|---|
aligned_bai | File | Index companion file to the bam file generated during the consensus assembly process |
aligned_bam | File | Sorted BAM file containing the alignments of reads to the reference genome |
assembly_fasta | File | Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used. |
assembly_length_unambiguous | Int | Number of unambiguous basecalls within the consensus assembly |
assembly_mean_coverage | Float | Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command |
assembly_method | String | Method employed to generate consensus assembly |
auspice_json | File | Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
bbduk_docker | String | The Docker image for bbduk, which was used to remove the adapters from the sequences |
bwa_version | String | Version of BWA software used |
consensus_flagstat | File | Output from the SAMtools flagstat command to assess quality of the alignment file (BAM) |
consensus_n_variant_min_depth | Int | Minimum read depth to call variants for iVar consensus and iVar variants. Also represents the minimum consensus support threshold used by IRMA with Illumina Influenza data. |
consensus_stats | File | Output from the SAMtools stats command to assess quality of the alignment file (BAM) |
est_percent_gene_coverage_tsv | File | Percent coverage for each gene in the organism being analyzed (depending on the organism input) |
fastp_html_report | File | The HTML report made with fastp |
fastp_version | String | The version of fastp used |
fastq_scan_clean1_json | File | The JSON file output from fastq-scan containing summary stats about clean forward read quality and length |
fastq_scan_num_reads_clean1 | Int | The number of forward reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_raw1 | Int | The number of input forward reads as calculated by fastq_scan |
fastq_scan_r1_mean_q_clean | Float | Forward read mean quality value after quality trimming and adapter removal |
fastq_scan_r1_mean_q_raw | Float | Forward read mean quality value before quality trimming and adapter removal |
fastq_scan_r1_mean_readlength_clean | Float | Forward read mean read length value after quality trimming and adapter removal |
fastq_scan_r1_mean_readlength_raw | Float | Forward read mean read length value before quality trimming and adapter removal |
fastq_scan_raw1_json | File | The JSON file output from fastq-scan containing summary stats about raw forward read quality and length |
fastq_scan_version | String | The version of fastq_scan |
fastqc_clean1_html | File | An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser |
fastqc_docker | String | The Docker container used for fastqc |
fastqc_num_reads_clean1 | Int | The number of forward reads after cleaning by fastqc |
fastqc_num_reads_raw1 | Int | The number of input forward reads by fastqc before cleaning |
fastqc_raw1_html | File | An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser |
fastqc_version | String | Version of fastqc software used |
ivar_tsv | File | Variant descriptor file generated by iVar variants |
ivar_variant_proportion_intermediate | String | The proportion of variants of intermediate frequency |
ivar_variant_version | String | Version of iVar for running the iVar variants command |
ivar_vcf | File | iVar tsv output converted to VCF format |
ivar_version_consensus | String | Version of iVar for running the iVar consensus command |
ivar_version_primtrim | String | Version of iVar for running the iVar trim command |
kraken_target_organism | String | Percent of target organism read data detected using the Kraken2 software |
kraken_target_organism_name | String | The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus" |
kraken_version | String | Version of Kraken software used |
meanbaseq_trim | Float | Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming |
meanmapq_trim | Float | Mean quality of the mapped reads to the reference genome after primer trimming |
nextclade_aa_dels | String | Amino-acid deletions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_subs | String | Amino-acid substitutions as detected by Nextclade. Will be blank for Flu |
nextclade_clade | String | Nextclade clade designation, will be blank for Flu. |
nextclade_docker | String | Docker image used to run Nextclade |
nextclade_ds_tag | String | Dataset tag used to run Nextclade. Will be blank for Flu |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_lineage | String | Nextclade lineage designation |
nextclade_qc | String | QC metric as determined by Nextclade. Will be blank for Flu |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
nextclade_version | String | The version of Nextclade software used |
number_Degenerate | Int | Number of degenerate basecalls within the consensus assembly |
number_N | Int | Number of fully ambiguous basecalls within the consensus assembly |
number_Total | Int | Total number of nucleotides within the consensus assembly |
pango_lineage | String | Pango lineage as determined by Pangolin |
pango_lineage_expanded | String | Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1" |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
pangolin_assignment_version | String | The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment |
pangolin_conflicts | String | Number of lineage conflicts as determined by Pangolin |
pangolin_docker | String | Docker image used to run Pangolin |
pangolin_notes | String | Lineage notes as determined by Pangolin |
pangolin_versions | String | All Pangolin software and database versions |
percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 |
percentage_mapped_reads | String | Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100. |
primer_bed_name | String | Name of the primer bed files used for primer trimming |
primer_trimmed_read_percent | Float | Percentage of read data with primers trimmed as determined by iVar trim |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure |
read_screen_clean_tsv | File | Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure |
read_screen_raw_tsv | File | Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read1_aligned | File | Forward read file of only aligned reads |
read1_unaligned | File | Forward read file of unaligned reads |
samtools_version | String | The version of SAMtools used to sort and index the alignment file |
samtools_version_consensus | String | The version of SAMtools used to create the pileup before running iVar consensus |
samtools_version_primtrim | String | The version of SAMtools used to create the pileup before running iVar trim |
samtools_version_stats | String | The version of SAMtools used to assess the quality of read mapping |
sc2_s_gene_mean_coverage | Float | Mean read depth for the S gene in SARS-CoV-2 |
sc2_s_gene_percent_coverage | Float | Percent coverage of the S gene in SARS-CoV-2 |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
sorted_bam_unaligned | File | A BAM file that only contains reads that did not align to the reference |
sorted_bam_unaligned_bai | File | Index companion file to a BAM file that only contains reads that did not align to the reference |
theiacov_illumina_se_analysis_date | String | Date of analysis |
theiacov_illumina_se_version | String | Version of PHB used for running the workflow |
trimmomatic_docker | String | The docker image used for the trimmomatic module in this workflow |
trimmomatic_version | String | The version of Trimmomatic used |
vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
vadr_docker | String | Docker image used to run VADR |
vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
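As an illustration of how the basecall counts in the table above relate to one another (number_N, number_Degenerate, number_Total, assembly_length_unambiguous), the following sketch tallies them from consensus FASTA text. How the workflow itself computes these values is not shown here; the function name and the choice to ignore non-IUPAC characters such as gaps are assumptions.

```python
UNAMBIGUOUS = set("ACGT")
DEGENERATE = set("RYSWKMBDHV")  # IUPAC codes other than A, C, G, T and N


def count_basecalls(fasta_text):
    """Tally basecall categories across all sequences in FASTA text."""
    counts = {
        "assembly_length_unambiguous": 0,
        "number_N": 0,
        "number_Degenerate": 0,
        "number_Total": 0,
    }
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        for base in line.strip().upper():
            if base in UNAMBIGUOUS:
                counts["assembly_length_unambiguous"] += 1
            elif base == "N":
                counts["number_N"] += 1
            elif base in DEGENERATE:
                counts["number_Degenerate"] += 1
            else:
                continue  # gaps and other characters are ignored here
            counts["number_Total"] += 1
    return counts


print(count_basecalls(">sample\nACGTNNRY\nacgt-\n"))
```

With this definition, number_Total is always the sum of the other three counts.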
Variable | Type | Description |
---|---|---|
abricate_flu_database | String | ABRicate database used for analysis |
abricate_flu_results | File | File containing all results from ABRicate |
abricate_flu_subtype | String | Flu subtype as determined by ABRicate |
abricate_flu_type | String | Flu type as determined by ABRicate |
abricate_flu_version | String | Version of ABRicate |
aligned_bai | File | Index companion file to the bam file generated during the consensus assembly process |
aligned_bam | File | Sorted BAM file containing the alignments of reads to the reference genome |
artic_docker | String | Docker image utilized for read trimming and consensus genome assembly |
artic_version | String | Version of the Artic software utilized for read trimming and consensus genome assembly |
assembly_fasta | File | Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used. |
assembly_length_unambiguous | Int | Number of unambiguous basecalls within the consensus assembly |
assembly_mean_coverage | Float | Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command |
assembly_method | String | Method employed to generate consensus assembly |
auspice_json | File | Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
auspice_json_flu_h5n1 | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza H5N1 whole genome that includes the samples from the "avian-flu/h5n1-cattle-outbreak" Nextstrain build, which focuses on the B3.13 genotype, and the single sample placed on this tree |
auspice_json_flu_ha | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza HA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
auspice_json_flu_na | File | Auspice-compatible JSON output generated from Nextclade analysis on Influenza NA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
consensus_flagstat | File | Output from the SAMtools flagstat command to assess quality of the alignment file (BAM) |
consensus_stats | File | Output from the SAMtools stats command to assess quality of the alignment file (BAM) |
est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
est_percent_gene_coverage_tsv | File | Percent coverage for each gene in the organism being analyzed (depending on the organism input) |
flu_A_315675_resistance | String | Resistance mutations to A_315675 |
flu_amantadine_resistance | String | Resistance mutations to amantadine |
flu_compound_367_resistance | String | Resistance mutations to compound_367 |
flu_favipiravir_resistance | String | Resistance mutations to favipiravir |
flu_fludase_resistance | String | Resistance mutations to fludase |
flu_L_742_001_resistance | String | Resistance mutations to L_742_001 |
flu_laninamivir_resistance | String | Resistance mutations to laninamivir |
flu_oseltamivir_resistance | String | Resistance mutations to oseltamivir (Tamiflu®) |
flu_peramivir_resistance | String | Resistance mutations to peramivir (Rapivab®) |
flu_pimodivir_resistance | String | Resistance mutations to pimodivir |
flu_rimantadine_resistance | String | Resistance mutations to rimantadine |
flu_xofluza_resistance | String | Resistance mutations to xofluza (Baloxavir marboxil) |
flu_zanamivir_resistance | String | Resistance mutations to zanamivir (Relenza®) |
genoflu_all_segments | String | The genotypes for each individual flu segment |
genoflu_genotype | String | The genotype of the whole genome, based on the individual segment types |
genoflu_output_tsv | File | The output file from GenoFLU |
genoflu_version | String | The version of GenoFLU used |
irma_assembly_fasta_concatenated | File | Assembly FASTA file of all Influenza genome segments concatenated into one sequence/FASTA entry |
irma_docker | String | Docker image used to run IRMA |
irma_ha_segment_fasta | File | HA (Haemagglutinin) assembly fasta file |
irma_min_consensus_support_threshold | Int | Minimum consensus support threshold used by IRMA with ONT data. For Illumina data, see the consensus_n_variant_min_depth output for this value |
irma_mp_segment_fasta | File | MP (Matrix Protein) assembly fasta file |
irma_na_segment_fasta | File | NA (Neuraminidase) assembly fasta file |
irma_np_segment_fasta | File | NP (Nucleoprotein) assembly fasta file |
irma_ns_segment_fasta | File | NS (Nonstructural) assembly fasta file |
irma_pa_segment_fasta | File | PA (Polymerase acidic) assembly fasta file |
irma_pb1_segment_fasta | File | PB1 (Polymerase basic 1) assembly fasta file |
irma_pb2_segment_fasta | File | PB2 (Polymerase basic 2) assembly fasta file |
irma_subtype | String | Flu subtype as determined by IRMA |
irma_subtype_notes | String | Helpful note to user about Flu B subtypes. Output will be blank for Flu A samples. For Flu B samples it will state: "IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column" |
irma_type | String | Flu type as determined by IRMA |
irma_version | String | Version of IRMA used |
kraken_human | Float | Percent of human read data detected using the Kraken2 software |
kraken_human_dehosted | Float | Percent of human read data detected using the Kraken2 software after host removal |
kraken_report | File | Full Kraken report |
kraken_report_dehosted | File | Full Kraken report after host removal |
kraken_sc2 | String | Percent of SARS-CoV-2 read data detected using the Kraken2 software |
kraken_sc2_dehosted | Float | Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal |
kraken_target_organism | String | Percent of target organism read data detected using the Kraken2 software |
kraken_target_organism_dehosted | String | Percent of target organism read data detected using the Kraken2 software after host removal |
kraken_target_organism_name | String | The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus" |
kraken_version | String | Version of Kraken software used |
meanbaseq_trim | Float | Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming |
meanmapq_trim | Float | Mean quality of the mapped reads to the reference genome after primer trimming |
medaka_reference | String | Reference sequence used in medaka task |
medaka_vcf | File | A VCF file containing the identified variants |
nanoplot_docker | String | Docker image for nanoplot |
nanoplot_html_clean | File | An HTML report describing the clean reads |
nanoplot_html_raw | File | An HTML report describing the raw reads |
nanoplot_num_reads_clean1 | Int | Number of clean reads |
nanoplot_num_reads_raw1 | Int | Number of raw reads |
nanoplot_r1_est_coverage_clean | Float | Estimated coverage on the clean reads by nanoplot |
nanoplot_r1_est_coverage_raw | Float | Estimated coverage on the raw reads by nanoplot |
nanoplot_r1_mean_q_clean | Float | Mean quality score of clean forward reads |
nanoplot_r1_mean_q_raw | Float | Mean quality score of raw forward reads |
nanoplot_r1_mean_readlength_clean | Float | Mean read length of clean forward reads |
nanoplot_r1_mean_readlength_raw | Float | Mean read length of raw forward reads |
nanoplot_r1_median_q_clean | Float | Median quality score of clean forward reads |
nanoplot_r1_median_q_raw | Float | Median quality score of raw forward reads |
nanoplot_r1_median_readlength_clean | Float | Median read length of clean forward reads |
nanoplot_r1_median_readlength_raw | Float | Median read length of raw forward reads |
nanoplot_r1_n50_clean | Float | N50 of clean forward reads |
nanoplot_r1_n50_raw | Float | N50 of raw forward reads |
nanoplot_r1_stdev_readlength_clean | Float | Standard deviation read length of clean forward reads |
nanoplot_r1_stdev_readlength_raw | Float | Standard deviation read length of raw forward reads |
nanoplot_tsv_clean | File | A TSV report describing the clean reads |
nanoplot_tsv_raw | File | A TSV report describing the raw reads |
nanoplot_version | String | Version of nanoplot used for analysis |
nextclade_aa_dels | String | Amino-acid deletions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_dels_flu_h5n1 | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for H5N1 whole genome |
nextclade_aa_dels_flu_ha | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for HA segment |
nextclade_aa_dels_flu_na | String | Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for NA segment |
nextclade_aa_subs | String | Amino-acid substitutions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_subs_flu_h5n1 | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for H5N1 whole genome |
nextclade_aa_subs_flu_ha | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for HA segment |
nextclade_aa_subs_flu_na | String | Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for NA segment |
nextclade_clade | String | Nextclade clade designation, will be blank for Flu. |
nextclade_clade_flu_h5n1 | String | Nextclade clade designation, specific to Flu H5N1 whole genome. NOTE: Output will be blank or NA since this Nextclade dataset does not assign clades |
nextclade_clade_flu_ha | String | Nextclade clade designation, specific to Flu HA segment |
nextclade_clade_flu_na | String | Nextclade clade designation, specific to Flu NA segment |
nextclade_docker | String | Docker image used to run Nextclade |
nextclade_ds_tag | String | Dataset tag used to run Nextclade. Will be blank for Flu |
nextclade_ds_tag_flu_ha | String | Dataset tag used to run Nextclade, specific to Flu HA segment |
nextclade_ds_tag_flu_na | String | Dataset tag used to run Nextclade, specific to Flu NA segment |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_json_flu_h5n1 | File | Nextclade output in JSON file format, specific to Flu H5N1 whole genome |
nextclade_json_flu_ha | File | Nextclade output in JSON file format, specific to Flu HA segment |
nextclade_json_flu_na | File | Nextclade output in JSON file format, specific to Flu NA segment |
nextclade_lineage | String | Nextclade lineage designation |
nextclade_qc | String | QC metric as determined by Nextclade. Will be blank for Flu |
nextclade_qc_flu_h5n1 | String | QC metric as determined by Nextclade, specific to Flu H5N1 whole genome |
nextclade_qc_flu_ha | String | QC metric as determined by Nextclade, specific to Flu HA segment |
nextclade_qc_flu_na | String | QC metric as determined by Nextclade, specific to Flu NA segment |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
nextclade_tsv_flu_h5n1 | File | Nextclade output in TSV file format, specific to Flu H5N1 whole genome |
nextclade_tsv_flu_ha | File | Nextclade output in TSV file format, specific to Flu HA segment |
nextclade_tsv_flu_na | File | Nextclade output in TSV file format, specific to Flu NA segment |
nextclade_version | String | The version of Nextclade software used |
number_Degenerate | Int | Number of degenerate basecalls within the consensus assembly |
number_N | Int | Number of fully ambiguous basecalls within the consensus assembly |
number_Total | Int | Total number of nucleotides within the consensus assembly |
pango_lineage | String | Pango lineage as determined by Pangolin |
pango_lineage_expanded | String | Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1" |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
pangolin_assignment_version | String | The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment |
pangolin_conflicts | String | Number of lineage conflicts as determined by Pangolin |
pangolin_docker | String | Docker image used to run Pangolin |
pangolin_notes | String | Lineage notes as determined by Pangolin |
pangolin_versions | String | All Pangolin software and database versions |
percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 |
percentage_mapped_reads | String | Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100. |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
quasitools_coverage_file | File | The coverage report created by Quasitools HyDRA |
quasitools_date | String | Date of Quasitools analysis |
quasitools_dr_report | File | Drug resistance report created by Quasitools HyDRA |
quasitools_hydra_vcf | File | The VCF created by Quasitools HyDRA |
quasitools_mutations_report | File | The mutation report created by Quasitools HyDRA |
quasitools_version | String | Version of Quasitools used |
read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure |
read_screen_clean_tsv | File | Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure |
read_screen_raw_tsv | File | Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read1_aligned | File | Forward read file of only aligned reads |
read1_trimmed | File | Forward read file after quality trimming and adapter removal |
samtools_version | String | The version of SAMtools used to sort and index the alignment file |
sc2_s_gene_mean_coverage | Float | Mean read depth for the S gene in SARS-CoV-2 |
sc2_s_gene_percent_coverage | Float | Percent coverage of the S gene in SARS-CoV-2 |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
theiacov_ont_analysis_date | String | Date of analysis |
theiacov_ont_version | String | Version of PHB used for running the workflow |
vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
vadr_docker | String | Docker image used to run VADR |
vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
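
The two percentage outputs in the table above are defined by simple ratios. The following is a minimal sketch of that arithmetic, not the workflow's own code. The function names are hypothetical, the default reference length is the SARS-CoV-2 value (29903) quoted in the table, and returning 0.0 when there are no reads is an assumption made for this example.

```python
# Illustrative only: hypothetical helpers mirroring the formulas in the table.
SC2_REFERENCE_LENGTH = 29903  # SARS-CoV-2 reference length quoted in the table


def percent_reference_coverage(assembly_length_unambiguous: int,
                               reference_length: int = SC2_REFERENCE_LENGTH) -> float:
    """assembly_length_unambiguous / length of the reference genome x 100."""
    return assembly_length_unambiguous / reference_length * 100


def percentage_mapped_reads(mapped_reads: int, total_reads: int) -> float:
    """number of mapped reads / total number of reads x 100."""
    if total_reads == 0:
        # Assumption for this sketch: report 0.0 rather than dividing by zero.
        return 0.0
    return mapped_reads / total_reads * 100


if __name__ == "__main__":
    print(round(percent_reference_coverage(29000), 2))
    print(percentage_mapped_reads(50, 200))
```

For other organisms, pass the length of the reference genome actually used as `reference_length`.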
Variable | Type | Description |
---|---|---|
abricate_flu_database | String | ABRicate database used for analysis |
abricate_flu_results | File | File containing all results from ABRicate |
abricate_flu_subtype | String | Flu subtype as determined by ABRicate |
abricate_flu_type | String | Flu type as determined by ABRicate |
abricate_flu_version | String | Version of ABRicate |
assembly_length_unambiguous | Int | Number of unambiguous basecalls within the consensus assembly |
assembly_method | String | Method employed to generate consensus assembly |
auspice_json | File | Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
genoflu_all_segments | String | The genotypes for each individual flu segment |
genoflu_genotype | String | The genotype of the whole genome, based on the individual segment types |
genoflu_output_tsv | File | The output file from GenoFLU |
genoflu_version | String | The version of GenoFLU used |
nextclade_aa_dels | String | Amino-acid deletions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_subs | String | Amino-acid substitutions as detected by Nextclade. Will be blank for Flu |
nextclade_clade | String | Nextclade clade designation, will be blank for Flu. |
nextclade_docker | String | Docker image used to run Nextclade |
nextclade_ds_tag | String | Dataset tag used to run Nextclade. Will be blank for Flu |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_lineage | String | Nextclade lineage designation |
nextclade_qc | String | QC metric as determined by Nextclade. Will be blank for Flu |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
nextclade_version | String | The version of Nextclade software used |
number_Degenerate | Int | Number of degenerate basecalls within the consensus assembly |
number_N | Int | Number of fully ambiguous basecalls within the consensus assembly |
number_Total | Int | Total number of nucleotides within the consensus assembly |
pango_lineage | String | Pango lineage as determined by Pangolin |
pango_lineage_expanded | String | Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1" |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
pangolin_assignment_version | String | The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment |
pangolin_conflicts | String | Number of lineage conflicts as determined by Pangolin |
pangolin_docker | String | Docker image used to run Pangolin |
pangolin_notes | String | Lineage notes as determined by Pangolin |
pangolin_versions | String | All Pangolin software and database versions |
percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
theiacov_fasta_analysis_date | String | Date of analysis |
theiacov_fasta_version | String | Version of PHB used for running the workflow |
vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
vadr_docker | String | Docker image used to run VADR |
vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
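
The `pango_lineage_expanded` output replaces a lineage alias with its full name, as in the "BA.1" → "B.1.1.529.1" example above. The sketch below illustrates the idea with a tiny hand-written alias map. It is not how Pangolin performs the expansion, the function name is hypothetical, and the two aliases shown are only a small illustrative subset of the Pango alias key.

```python
# Illustrative only: a hand-written subset of Pango aliases, not the full alias key.
ALIASES = {
    "BA": "B.1.1.529",
    "AY": "B.1.617.2",
}


def expand_pango_lineage(lineage: str, aliases: dict = ALIASES) -> str:
    """Replace the leading alias of a lineage with its full name, if known."""
    prefix, _, suffix = lineage.partition(".")
    base = aliases.get(prefix)
    if base is None:
        # Unknown prefix (or already unaliased): return the lineage unchanged.
        return lineage
    return f"{base}.{suffix}" if suffix else base


if __name__ == "__main__":
    print(expand_pango_lineage("BA.1"))     # B.1.1.529.1
    print(expand_pango_lineage("B.1.1.7"))  # B.1.1.7
```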
Variable | Type | Description |
---|---|---|
aligned_bai | File | Index companion file to the bam file generated during the consensus assembly process |
aligned_bam | File | Sorted BAM file containing the alignments of reads to the reference genome |
artic_docker | String | Docker image utilized for read trimming and consensus genome assembly |
artic_version | String | Version of the Artic software utilized for read trimming and consensus genome assembly |
assembly_fasta | File | Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used. |
assembly_length_unambiguous | Int | Number of unambiguous basecalls within the consensus assembly |
assembly_mean_coverage | Float | Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command |
assembly_method | String | Method employed to generate consensus assembly |
auspice_json | File | Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree |
consensus_flagstat | File | Output from the SAMtools flagstat command to assess quality of the alignment file (BAM) |
consensus_stats | File | Output from the SAMtools stats command to assess quality of the alignment file (BAM) |
est_percent_gene_coverage_tsv | File | Percent coverage for each gene in the organism being analyzed (depending on the organism input) |
kraken_human | Float | Percent of human read data detected using the Kraken2 software |
kraken_human_dehosted | Float | Percent of human read data detected using the Kraken2 software after host removal |
kraken_report | File | Full Kraken report |
kraken_report_dehosted | File | Full Kraken report after host removal |
kraken_sc2 | String | Percent of SARS-CoV-2 read data detected using the Kraken2 software |
kraken_sc2_dehosted | Float | Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal |
kraken_target_organism | String | Percent of target organism read data detected using the Kraken2 software |
kraken_target_organism_dehosted | String | Percent of target organism read data detected using the Kraken2 software after host removal |
kraken_target_organism_name | String | The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus" |
kraken_version | String | Version of Kraken software used |
meanbaseq_trim | Float | Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming |
meanmapq_trim | Float | Mean quality of the mapped reads to the reference genome after primer trimming |
medaka_reference | String | Reference sequence used in medaka task |
nextclade_aa_dels | String | Amino-acid deletions as detected by Nextclade. Will be blank for Flu |
nextclade_aa_subs | String | Amino-acid substitutions as detected by Nextclade. Will be blank for Flu |
nextclade_clade | String | Nextclade clade designation, will be blank for Flu. |
nextclade_docker | String | Docker image used to run Nextclade |
nextclade_ds_tag | String | Dataset tag used to run Nextclade. Will be blank for Flu |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_lineage | String | Nextclade lineage designation |
nextclade_qc | String | QC metric as determined by Nextclade. Will be blank for Flu |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
nextclade_version | String | The version of Nextclade software used |
number_Degenerate | Int | Number of degenerate basecalls within the consensus assembly |
number_N | Int | Number of fully ambiguous basecalls within the consensus assembly |
number_Total | Int | Total number of nucleotides within the consensus assembly |
pango_lineage | String | Pango lineage as determined by Pangolin |
pango_lineage_expanded | String | Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1" |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
pangolin_assignment_version | String | The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment |
pangolin_conflicts | String | Number of lineage conflicts as determined by Pangolin |
pangolin_docker | String | Docker image used to run Pangolin |
pangolin_notes | String | Lineage notes as determined by Pangolin |
pangolin_versions | String | All Pangolin software and database versions |
percent_reference_coverage | Float | Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100 |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
read1_aligned | File | Forward read file of only aligned reads |
samtools_version_stats | String | The version of SAMtools used to assess the quality of read mapping |
sc2_s_gene_mean_coverage | Float | Mean read depth for the S gene in SARS-CoV-2 |
sc2_s_gene_percent_coverage | Float | Percent coverage of the S gene in SARS-CoV-2 |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
theiacov_clearlabs_analysis_date | String | Date of analysis |
theiacov_clearlabs_version | String | Version of PHB used for running the workflow |
vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
vadr_docker | String | Docker image used to run VADR |
vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
variants_from_ref_vcf | File | VCF file containing the variants identified relative to the reference genome |
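
The four basecall counts in the table above (`assembly_length_unambiguous`, `number_N`, `number_Degenerate`, `number_Total`) can be pictured with a short sketch. This is an assumption about how the categories relate, not the workflow's implementation: here A/C/G/T count as unambiguous, N as fully ambiguous, the remaining IUPAC codes as degenerate, and the total is taken as the sum of the three. The function name is hypothetical.

```python
# Illustrative only: one plausible way to bin consensus basecalls.
UNAMBIGUOUS = "ACGT"
DEGENERATE = "RYSWKMBDHV"  # IUPAC ambiguity codes other than N


def count_bases(sequence: str) -> dict:
    """Count basecall categories in a single consensus sequence string."""
    seq = sequence.upper()
    unambiguous = sum(seq.count(base) for base in UNAMBIGUOUS)
    fully_ambiguous = seq.count("N")
    degenerate = sum(seq.count(base) for base in DEGENERATE)
    return {
        "assembly_length_unambiguous": unambiguous,
        "number_N": fully_ambiguous,
        "number_Degenerate": degenerate,
        # Assumption: total is the sum of the three categories (gaps excluded).
        "number_Total": unambiguous + fully_ambiguous + degenerate,
    }


if __name__ == "__main__":
    print(count_bases("ACGTNNRY"))
```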
Overwrite Warning
The TheiaCoV_FASTA_Batch_PHB workflow outputs results to the set-level data table and also overwrites the Pangolin & Nextclade output columns in the sample-level data table. Users can open the set-level workflow output TSV file called "Datatable" to see exactly which columns were overwritten in the sample-level data table.
Variable | Type | Description |
---|---|---|
datatable | File | Sample-level data table TSV file that was used to update the original sample-level data table in the last step of the TheiaCoV_FASTA_Batch workflow. |
nextclade_json | File | Nextclade output in JSON file format. Will be blank for Flu |
nextclade_tsv | File | Nextclade output in TSV file format. Will be blank for Flu |
pango_lineage_report | File | Full Pango lineage report generated by Pangolin |
theiacov_fasta_batch_analysis_date | String | Date that the workflow was run. |
theiacov_fasta_batch_version | String | Version of the workflow that was used. |
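
To check which columns the batch workflow wrote back, it can help to list the header of the `datatable` TSV. The following is a minimal sketch using only the Python standard library. The function name and the column names in the usage example are hypothetical, so substitute the contents of your own downloaded file.

```python
import csv
import io


def datatable_columns(tsv_text: str) -> list[str]:
    """Return the column names from the header row of a TSV document."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # An empty file has no header row, so fall back to an empty list.
    return next(reader, [])


if __name__ == "__main__":
    # Hypothetical two-line TSV standing in for the real datatable output.
    example = "sample_id\tpango_lineage\tnextclade_clade\ns1\tBA.1\t21K\n"
    print(datatable_columns(example))
```

For a file on disk, read it first (for example with `pathlib.Path(path).read_text()`) and pass the text to the function.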