TheiaCoV Workflow Series

Quick Facts

Workflow Type	Applicable Kingdom	Last Known Changes	Command-line Compatibility	Workflow Level	Dockstore
Genomic Characterization	HIV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV	vX.X.X	Some optional features incompatible, Yes	Sample-level, Set-level	TheiaCoV_Illumina_PE_PHB, TheiaCoV_Illumina_SE_PHB, TheiaCoV_ONT_PHB, TheiaCoV_ClearLabs_PHB, TheiaCoV_FASTA_PHB, TheiaCoV_FASTA_Batch_PHB

TheiaCoV Workflows

The TheiaCoV workflows are for the assembly, quality assessment, and characterization of viral genomes. There are currently five TheiaCoV workflows designed to accommodate different kinds of input data:

Illumina paired-end sequencing (TheiaCoV_Illumina_PE)
Illumina single-end sequencing (TheiaCoV_Illumina_SE)
ONT sequencing (TheiaCoV_ONT)
Genome assemblies (TheiaCoV_FASTA)
ClearLabs sequencing (TheiaCoV_ClearLabs)

Additionally, the TheiaCoV_FASTA_Batch workflow is available to process several hundred SARS-CoV-2 assemblies at the same time.

Key Resources

Reference Materials for SARS-CoV-2

Reference Materials for Mpox

Reference Materials for non-default viruses (TheiaViral)

HIV Input JSONs

WNV Input JSONs

Flu Input JSONs

RSV-A Input JSONs

RSV-B Input JSONs

TheiaCoV Illumina PE and SE Workflow Diagram

TheiaCoV ONT Workflow Diagram

Supported Organisms

These workflows currently support the following organisms. The first option in each list is the standardized organism name used by the workflows:

SARS-CoV-2 ("sars-cov-2", "SARS-CoV-2") - default organism input
Monkeypox virus ("MPXV", "mpox", "monkeypox", "Monkeypox virus", "Mpox")
Human Immunodeficiency Virus ("HIV")
West Nile Virus ("WNV", "wnv", "West Nile virus")
Influenza ("flu", "influenza", "Flu", "Influenza")
RSV-A ("rsv_a", "rsv-a", "RSV-A", "RSV_A")
RSV-B ("rsv_b", "rsv-b", "RSV-B", "RSV_B")
Measles ("measles", "Measles", "mev", "MeV", "Morbillivirus", "morbillivirus")
Mumps ("mumps", "Mumps", "MuV", "muv", "Mumps virus", "mumps virus")
Rubella ("rubella", "Rubella", "RuV", "ruv", "Rubella virus", "rubella virus")
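As a minimal sketch, the alias lists above can be expressed as a lookup that maps every accepted spelling to its standardized name. The `ORGANISM_ALIASES` table and `normalize_organism` helper are hypothetical illustrations of this list. They are not the workflows' own parsing code, which may treat unlisted spellings or letter case differently.

```python
from typing import Optional

# Hypothetical alias table transcribed from the list above:
# standardized name -> accepted spellings.
ORGANISM_ALIASES = {
    "sars-cov-2": ["sars-cov-2", "SARS-CoV-2"],
    "MPXV": ["MPXV", "mpox", "monkeypox", "Monkeypox virus", "Mpox"],
    "HIV": ["HIV"],
    "WNV": ["WNV", "wnv", "West Nile virus"],
    "flu": ["flu", "influenza", "Flu", "Influenza"],
    "rsv_a": ["rsv_a", "rsv-a", "RSV-A", "RSV_A"],
    "rsv_b": ["rsv_b", "rsv-b", "RSV-B", "RSV_B"],
    "measles": ["measles", "Measles", "mev", "MeV", "Morbillivirus", "morbillivirus"],
    "mumps": ["mumps", "Mumps", "MuV", "muv", "Mumps virus", "mumps virus"],
    "rubella": ["rubella", "Rubella", "RuV", "ruv", "Rubella virus", "rubella virus"],
}

# Invert the table so each accepted spelling points at its standardized name.
_ALIAS_TO_STANDARD = {
    alias: standard
    for standard, aliases in ORGANISM_ALIASES.items()
    for alias in aliases
}


def normalize_organism(name: str) -> Optional[str]:
    """Return the standardized organism name, or None if the spelling is not listed."""
    return _ALIAS_TO_STANDARD.get(name.strip())


print(normalize_organism("Monkeypox virus"))
print(normalize_organism("RSV-A"))
```

Matching is exact, apart from surrounding whitespace, because the lists above spell out each accepted capitalization explicitly.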

The compatibility of each workflow with each pathogen is shown below:

	SARS-CoV-2	Mpox	HIV	WNV	Influenza	RSV-A	RSV-B	Measles	Mumps	Rubella
Illumina_PE	✅	✅	✅	✅	✅	✅	✅	✅	✅	✅
Illumina_SE	✅	✅	❌	✅	❌	✅	✅	✅	✅	✅
ONT	✅	✅	✅	❌	✅	✅	✅	✅	✅	✅
FASTA	✅	✅	❌	✅	✅	✅	✅	✅	✅	✅
ClearLabs	✅	❌	❌	❌	❌	❌	❌	❌	❌	❌
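The table can also be transcribed into a small lookup, for example to check a planned run before submitting it. This is a hypothetical sketch, and `UNSUPPORTED`, `is_supported`, and `supported_workflows` are illustrative names. The organism labels are the table's column headers, not the values accepted by the `organism` input.

```python
from typing import List

# Column order of the compatibility table above (display names).
ORGANISMS = ["SARS-CoV-2", "Mpox", "HIV", "WNV", "Influenza",
             "RSV-A", "RSV-B", "Measles", "Mumps", "Rubella"]

# The ❌ cells of the table, per workflow; every other cell is ✅.
UNSUPPORTED = {
    "Illumina_PE": set(),
    "Illumina_SE": {"HIV", "Influenza"},
    "ONT": {"WNV"},
    "FASTA": {"HIV"},
    "ClearLabs": set(ORGANISMS) - {"SARS-CoV-2"},
}


def is_supported(workflow: str, organism: str) -> bool:
    """True if the table marks this workflow/organism pair with ✅."""
    if workflow not in UNSUPPORTED:
        raise KeyError(f"unknown workflow: {workflow}")
    if organism not in ORGANISMS:
        raise KeyError(f"unknown organism: {organism}")
    return organism not in UNSUPPORTED[workflow]


def supported_workflows(organism: str) -> List[str]:
    """Workflows, in table order, that can process the given organism."""
    return [wf for wf in UNSUPPORTED if is_supported(wf, organism)]


print(supported_workflows("HIV"))  # ['Illumina_PE', 'ONT']
```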

To help you set up the workflow for each organism, we provide input JSONs (see Key Resources above).

Inputs

Input Data

The TheiaCoV_Illumina_PE workflow takes in Illumina paired-end read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends compressing files with gzip before uploading them to Terra to minimize data upload time.
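A quick pre-upload check of the naming rule above can be sketched with a regular expression. The `check_read_filenames` helper is hypothetical. It only inspects file names and does not verify that a `.gz` file is actually gzip-compressed. It also assumes lowercase extensions.

```python
import re
from typing import Iterable, List

# Names must end in .fastq or .fq, optionally followed by .gz (case-sensitive).
FASTQ_NAME = re.compile(r"\.(fastq|fq)(\.gz)?$")


def check_read_filenames(filenames: Iterable[str]) -> List[str]:
    """Return notes about each file name; an empty list means every name looks acceptable."""
    notes = []
    for name in filenames:
        if not FASTQ_NAME.search(name):
            notes.append(f"{name}: expected a .fastq or .fq extension (optionally followed by .gz)")
        elif not name.endswith(".gz"):
            notes.append(f"{name}: not gzip-compressed; compressing before upload is recommended")
    return notes


# Hypothetical file names for one paired-end sample.
print(check_read_filenames(["sample01_R1.fastq.gz", "sample01_R2.fq"]))
```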

By default, the workflow anticipates 2 x 150 bp reads (i.e. the input reads were generated using a 300-cycle sequencing kit). The optional trim_min_length parameter (default: 75) may need to be lowered to accommodate shorter read data, such as the 2 x 75 bp reads generated using a 150-cycle sequencing kit.
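To decide whether trim_min_length needs adjusting, it can help to look at the actual read lengths first. The sketch below is a hypothetical heuristic, not logic taken from the workflow. It reads the sequence line of each 4-line FASTQ record, assuming sequences are not wrapped across lines. It then warns when the mean raw read length is not above trim_min_length, since reads only get shorter during trimming. The default of 75 mirrors the trim_min_length default in the inputs table.

```python
import gzip
from statistics import mean
from typing import Iterable, List, Optional, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_lengths(fastq_lines: Iterable[str]) -> List[int]:
    """Length of each sequence line (the 2nd line of every 4-line FASTQ record)."""
    return [len(line.rstrip("\r\n")) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def trim_min_length_warning(lengths: List[int], trim_min_length: int = 75) -> Optional[str]:
    """Hypothetical heuristic: warn when the mean raw read length is not above trim_min_length."""
    if not lengths:
        return "no reads found"
    typical = mean(lengths)
    if typical <= trim_min_length:
        return (
            f"mean read length {typical:.1f} bp is not above trim_min_length={trim_min_length}; "
            "consider lowering trim_min_length"
        )
    return None


# Example usage (hypothetical file name); islice limits the check to the first 10,000 reads:
# from itertools import islice
# with open_fastq("sample01_R1.fastq.gz") as fh:
#     print(trim_min_length_warning(read_lengths(islice(fh, 40_000))))
```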

TheiaCoV_Illumina_SE takes in Illumina single-end reads. Read file names should end with .fastq or .fq, with the optional addition of .gz. Theiagen highly recommends compressing files with gzip before uploading to Terra to minimize data upload time and save on storage costs.

By default, the workflow anticipates 1 x 35 bp reads (i.e. the input reads were generated using a 70-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate longer read data.

The TheiaCoV_ONT workflow takes in base-called ONT read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends zipping files with gzip before uploading to Terra to minimize data upload time.

The ONT sequencing kit and base-calling approach can produce substantial variability in the amount and quality of read data. Genome assemblies produced by the TheiaCoV_ONT workflow must be quality assessed before reporting results.

The TheiaCoV_FASTA workflow takes in assembly files in FASTA format.

Note for TheiaCoV_FASTA users analyzing Influenza:

TheiaCoV_FASTA uses the output of VADR to classify and partition Influenza segments from the input assembly. See the vadr_flu_segments task for more details.

The TheiaCoV_ClearLabs workflow takes in read data produced by the Clear Dx platform from ClearLabs. However, many users run the TheiaCoV_FASTA workflow on ClearLabs-generated FASTA files instead. Assemblies generated by this pipeline have a few known issues that are not present in the ClearLabs-generated FASTA files.

The TheiaCoV_FASTA_Batch workflow takes in a set of assembly files in FASTA format.

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_illumina_pe	read1	File	Illumina forward read file in FASTQ file format (compression optional)		Required
theiacov_illumina_pe	read2	File	Illumina reverse read file in FASTQ file format (compression optional)		Required
theiacov_illumina_pe	samplename	String	The name of the sample being analyzed		Required
clean_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
clean_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
clean_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
clean_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
consensus_qc	cpu	Int	Number of CPUs to allocate to the task	1	Optional
consensus_qc	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus_qc	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
consensus_qc	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	abricate_flu_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	abricate_flu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727	Optional
flu_track	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage	60	Optional
flu_track	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity	70	Optional
flu_track	antiviral_aa_subs	String	Additional list of antiviral resistance-associated amino acid substitutions of interest to search against those called on the sample segments. They take the format <protein>:<amino acid substitution>, e.g. NA:A26V		Optional
flu_track	assembly_fasta	File	Internal component, do not modify		Optional
flu_track	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	assembly_metrics_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
flu_track	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
flu_track	flu_h1_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h1n1_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3n2_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_n1_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_n2_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_pa_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb1_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb2_ref	File	Internal component, do not modify		Optional
flu_track	flu_subtype	String	The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate.		Optional
flu_track	genoflu_cpu	Int	Number of CPUs to allocate to the task	1	Optional
flu_track	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
flu_track	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task	25	Optional
flu_track	genoflu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06	Optional
flu_track	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	genoflu_min_percent_identity	Float	Percent identity threshold used for calling matches for each genome segment that make up the final GenoFlu genotype	98	Optional
flu_track	irma_cpu	Int	Number of CPUs to allocate to the task	4	Optional
flu_track	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	irma_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/irma:1.2.0	Optional
flu_track	irma_keep_ref_deletions	Boolean	True/False variable that determines whether sites missed during read gathering (i.e. 0 reads for a site in the reference genome) are masked by inserting N's or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"	True	Optional
flu_track	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
flu_track	irma_min_ambiguous_threshold	Float	Minimum called Single Nucleotide Variant (SNV) frequency for mixed base calls in the output consensus assembly (AKA amended consensus).	0.2	Optional
flu_track	irma_min_avg_consensus_allele_quality	Int	Minimum allele coverage depth to call plurality consensus, otherwise calls "N". Setting this value too high can negatively impact final amended consensus.	10	Optional
flu_track	irma_min_read_length	Int	Minimum read length to include reads in read gathering step in IRMA. This value should not be greater than the typical read length.	75	Optional
flu_track	nextclade_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_custom_input_dataset	File	For H5N1 flu samples only. A custom Nextclade dataset in JSON format. If provided, this dataset will be used to process any H5N1 flu samples. If not provided, a custom dataset will be selected depending on the GenoFLU Genotype.	Defaults are GenoFLU Genotype specific. Please find these default values here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
flu_track	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.14.5	Optional
flu_track	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_output_parser_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim	Optional
flu_track	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	vadr_outputs_tgz	File	Internal component, do not modify		Optional
ivar_consensus	ivar_bwa_cpu	Int	Number of CPUs to allocate to the task	6	Optional
ivar_consensus	ivar_bwa_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_bwa_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_bwa_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
ivar_consensus	ivar_consensus_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_consensus_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_consensus_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_consensus_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	ivar_trim_primers_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_trim_primers_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_trim_primers_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_trim_primers_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	ivar_variant_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_variant_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_variant_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_variant_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	skip_N	Boolean	True/False variable that determines if regions with depth less than minimum depth should not be added to the consensus sequence	False	Optional
ivar_consensus	stats_n_coverage_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	stats_n_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	stats_n_coverage_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
ivar_consensus	stats_n_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	stats_n_coverage_primtrim_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	stats_n_coverage_primtrim_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	stats_n_coverage_primtrim_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
ivar_consensus	stats_n_coverage_primtrim_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines whether sites missed during read gathering (i.e. 0 reads for a site in the reference genome) are masked by inserting N's or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g. "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines if the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. Options: "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent	Float	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent_bases_aligned	Float	Internal component, do not modify		Optional
qc_check_task	assembly_length	Int	Internal component, do not modify		Optional
qc_check_task	busco_results	String	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
qc_check_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Optional
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Optional
qc_check_task	gambit_predicted_taxon	String	Internal component, do not modify		Optional
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
qc_check_task	midas_secondary_genus_abundance	Float	Internal component, do not modify		Optional
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Optional
qc_check_task	n50_value	Int	Internal component, do not modify		Optional
qc_check_task	number_contigs	Int	Internal component, do not modify		Optional
qc_check_task	quast_gc_percent	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Optional
raw_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
raw_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
raw_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
raw_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
read_QC_trim	bbduk_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
read_QC_trim	call_kraken	Boolean	True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the `kraken_db` variable must be provided.	False	Optional
read_QC_trim	call_midas	Boolean	True/False variable that determines if the MIDAS task should be called.	False	Optional
read_QC_trim	extract_unclassified	Boolean	Internal component, do not modify	False	Optional
read_QC_trim	fastp_args	String	Additional arguments to use with fastp	--detect_adapter_for_pe -g -5 20 -3 20	Optional
read_QC_trim	host	String	Internal component, do not modify		Optional
read_QC_trim	host_complete_only	Boolean	Internal component, do not modify	False	Optional
read_QC_trim	host_decontaminate_mem	Int	Internal component, do not modify	32	Optional
read_QC_trim	host_is_accession	Boolean	Internal component, do not modify	False	Optional
read_QC_trim	host_is_genome	Boolean	Inputted "host" is a genome URI	False	Optional
read_QC_trim	host_refseq	Boolean	Internal component, do not modify	True	Optional
read_QC_trim	kraken_cpu	Int	Number of CPUs to allocate to the task	4	Optional
read_QC_trim	kraken_db	File	A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Must contain human and viral sequences	gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz	Optional
read_QC_trim	kraken_disk_size	Int	Amount of storage (in GB) to allocate to the task. Increase this when using large kraken2 databases (>30 GB), such as the "k2_standard" database	100	Optional
read_QC_trim	kraken_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32	Optional
read_QC_trim	midas_db	File	Internal component, do not modify	gs://theiagen-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz	Optional
read_QC_trim	read_processing	String	The name of the tool to perform basic read processing; options: "trimmomatic" or "fastp"	trimmomatic	Optional
read_QC_trim	read_qc	String	The tool used for quality control (QC) of reads. Options are "fastq_scan" (default) and "fastqc"	fastq_scan	Optional
read_QC_trim	taxon_id	Int	Internal component, do not modify	0	Optional
read_QC_trim	trimmomatic_args	String	Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data.	-phred33	Optional
theiacov_illumina_pe	adapters	File	A FASTA file containing adapter sequences	/bbmap/resources/adapters.fa	Optional
theiacov_illumina_pe	consensus_min_freq	Float	The minimum frequency for a variant to be called a SNP in the consensus genome	0.6	Optional
theiacov_illumina_pe	genome_length	Int	User-specified expected genome length to be used in genome statistics calculations		Optional
theiacov_illumina_pe	max_genome_length	Int	Maximum genome length able to pass read screening	2673870	Optional
theiacov_illumina_pe	min_basepairs	Int	Minimum number of base pairs able to pass read screening	17000	Optional
theiacov_illumina_pe	min_coverage	Int	Minimum genome coverage able to pass read screening	10	Optional
theiacov_illumina_pe	min_depth	Int	Minimum depth of reads required to call variants and generate a consensus genome. This value is passed to the iVar software.	100	Optional
theiacov_illumina_pe	min_genome_length	Int	Minimum genome length to pass read screening	1700	Optional
theiacov_illumina_pe	min_proportion	Int	Minimum proportion of total reads in each read file to pass read screening	40	Optional
theiacov_illumina_pe	min_reads	Int	Minimum number of reads to pass read screening	57	Optional
theiacov_illumina_pe	nextclade_dataset_name	String	Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name. See organism defaults for more information	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_illumina_pe	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_illumina_pe	organism	String	The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". Note that "flu" and "HIV" are not available for TheiaCoV_Illumina_SE	sars-cov-2	Optional
theiacov_illumina_pe	pangolin_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
theiacov_illumina_pe	phix	File	File that contains the phix used	/bbmap/resources/phix174_ill.ref.fa.gz	Optional
theiacov_illumina_pe	primer_bed	File	The bed file containing the primers used when sequencing was performed		Optional
theiacov_illumina_pe	qc_check_table	File	TSV file with taxa as rows and QC metrics as columns; internal cells represent user-determined QC thresholds. If provided, turns on the QC Check task. See below for an example QC Check table.		Optional
theiacov_illumina_pe	reference_gene_locations_bed	File	Use to provide locations of interest where average coverage will be calculated		Optional
theiacov_illumina_pe	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
theiacov_illumina_pe	reference_gff	File	The general feature format (gff) of the reference genome.		Optional
theiacov_illumina_pe	seq_method	String	The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task	ILLUMINA	Optional
theiacov_illumina_pe	skip_screen	Boolean	Set to True to skip the read screening prior to analysis	False	Optional
theiacov_illumina_pe	target_organism	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.		Optional
theiacov_illumina_pe	trim_min_length	Int	Specifies minimum length of each read after trimming to be kept	75	Optional
theiacov_illumina_pe	trim_primers	Boolean	A True/False option that determines if primers should be trimmed.	True	Optional
theiacov_illumina_pe	trim_quality_min_score	Int	Specifies the minimum average quality of bases in a sliding window to be kept	30	Optional
theiacov_illumina_pe	trim_window_size	Int	Specifies window size for trimming (the number of bases to average the quality across)	4	Optional
theiacov_illumina_pe	vadr_max_length	Int	Maximum length of contig allowed to run VADR		Optional
theiacov_illumina_pe	vadr_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms)	Optional
theiacov_illumina_pe	vadr_model_file	File	Path to a tar-gzipped (.tar.gz) VADR model file	Defaults are organism-specific. Please find default values for all organisms here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl.	Optional
theiacov_illumina_pe	vadr_options	String	Additional options to provide to VADR		Optional
theiacov_illumina_pe	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
theiacov_illumina_pe	variant_min_freq	Float	Minimum frequency for a variant to be reported in ivar outputs	0.6	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
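The trim_window_size, trim_quality_min_score, and trim_min_length inputs above describe sliding-window quality trimming. The following is a minimal Python sketch of that idea, assuming Phred+33 quality strings and using the TheiaCoV_Illumina_PE defaults (window of 4, minimum average quality of 30, minimum length of 75). It is a simplified illustration that cuts the read at the start of the first failing window; it is not the exact implementation used by Trimmomatic or fastp, and the function names are hypothetical.

```python
from typing import List, Optional


def phred33_scores(quality: str) -> List[int]:
    """Convert a Phred+33 encoded quality string to integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window_size: int = 4,
                        min_avg_quality: int = 30,
                        min_length: int = 75) -> Optional[str]:
    """Hypothetical helper: scan windows from the 5' end and cut the read at
    the start of the first window whose mean quality is below min_avg_quality.
    Returns the trimmed sequence, or None if it is shorter than min_length."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must be the same length")
    scores = phred33_scores(quality)
    keep = len(seq)  # keep the whole read unless a window fails
    for start in range(len(scores) - window_size + 1):
        window = scores[start:start + window_size]
        if sum(window) / window_size < min_avg_quality:
            keep = start
            break
    trimmed = seq[:keep]
    # Reads that end up shorter than the minimum length are discarded
    return trimmed if len(trimmed) >= min_length else None
```

For example, a 100 bp read whose last 20 bases have quality 2 ("#") after 80 bases of quality 40 ("I") is cut to 78 bp, because the window starting at position 78 is the first whose average drops below 30; with a minimum length of 80 the same read would be discarded.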

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_illumina_se	read1	File	Illumina forward read file in FASTQ file format (compression optional)		Required
theiacov_illumina_se	samplename	String	The name of the sample being analyzed		Required
clean_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
clean_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
clean_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
clean_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
consensus_qc	cpu	Int	Number of CPUs to allocate to the task	1	Optional
consensus_qc	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus_qc	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
consensus_qc	genome_length	Int	Internal component, do not modify		Optional
consensus_qc	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
ivar_consensus	ivar_bwa_cpu	Int	Number of CPUs to allocate to the task	6	Optional
ivar_consensus	ivar_bwa_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_bwa_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_bwa_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
ivar_consensus	ivar_consensus_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_consensus_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_consensus_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_consensus_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	ivar_trim_primers_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_trim_primers_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_trim_primers_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_trim_primers_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	ivar_variant_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	ivar_variant_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	ivar_variant_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan	Optional
ivar_consensus	ivar_variant_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	read2	File	Internal component, do not modify		Optional
ivar_consensus	skip_N	Boolean	True/False variable that determines whether regions with depth below the minimum depth are excluded from the consensus sequence	False	Optional
ivar_consensus	stats_n_coverage_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	stats_n_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	stats_n_coverage_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
ivar_consensus	stats_n_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
ivar_consensus	stats_n_coverage_primtrim_cpu	Int	Number of CPUs to allocate to the task	2	Optional
ivar_consensus	stats_n_coverage_primtrim_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ivar_consensus	stats_n_coverage_primtrim_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
ivar_consensus	stats_n_coverage_primtrim_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; useful if novel genotypes are not present in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: ambiguated by inserting Ns, or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g. "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines whether the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	read2	File	Internal component, do not modify		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. The available options are defined in wf_organism_parameters.wdl: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156 and https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	kraken_target_organism_input	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.	Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1)	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent	Float	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent_bases_aligned	Float	Internal component, do not modify		Optional
qc_check_task	assembly_length	Int	Internal component, do not modify		Optional
qc_check_task	busco_results	String	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
qc_check_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Optional
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Optional
qc_check_task	gambit_predicted_taxon	String	Internal component, do not modify		Optional
qc_check_task	kraken_human_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
qc_check_task	midas_secondary_genus_abundance	Float	Internal component, do not modify		Optional
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Optional
qc_check_task	n50_value	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_clean2	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_raw2	Int	Internal component, do not modify		Optional
qc_check_task	number_contigs	Int	Internal component, do not modify		Optional
qc_check_task	quast_gc_percent	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Optional
raw_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
raw_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
raw_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
raw_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
read_QC_trim	bbduk_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
read_QC_trim	call_kraken	Boolean	True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the `kraken_db` variable must be provided.	False	Optional
read_QC_trim	call_midas	Boolean	True/False variable that determines if the MIDAS task should be called.	False	Optional
read_QC_trim	fastp_args	String	Additional arguments to use with fastp	-g -5 20 -3 20	Optional
read_QC_trim	kraken_cpu	Int	Number of CPUs to allocate to the task	4	Optional
read_QC_trim	kraken_db	File	A Kraken2 database to use with the optional Kraken2 task. The file must be a .tar.gz Kraken2 database and must contain human and viral sequences	gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz	Optional
read_QC_trim	kraken_disk_size	Int	Amount of storage (in GB) to allocate to the task. Increase this when using large (>30 GB) Kraken2 databases, such as the "k2_standard" database	100	Optional
read_QC_trim	kraken_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32	Optional
read_QC_trim	midas_db	File	Internal component, do not modify	gs://theiagen-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz	Optional
read_QC_trim	read_processing	String	The name of the tool to perform basic read processing; options: "trimmomatic" or "fastp"	trimmomatic	Optional
read_QC_trim	read_qc	String	The tool used for quality control (QC) of reads. Options are "fastq_scan" (default) and "fastqc"	fastq_scan	Optional
read_QC_trim	trimmomatic_args	String	Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data.	-phred33	Optional
theiacov_illumina_se	adapters	File	A FASTA file containing adapter sequences	/bbmap/resources/adapters.fa	Optional
theiacov_illumina_se	consensus_min_freq	Float	The minimum frequency for a variant to be called a SNP in consensus genome	0.6	Optional
theiacov_illumina_se	genome_length	Int	User-specified expected genome length to be used in genome statistics calculations		Optional
theiacov_illumina_se	max_genome_length	Int	Maximum genome length able to pass read screening	2673870	Optional
theiacov_illumina_se	min_basepairs	Int	Minimum number of base pairs able to pass read screening	17000	Optional
theiacov_illumina_se	min_coverage	Int	Minimum genome coverage able to pass read screening	10	Optional
theiacov_illumina_se	min_depth	Int	Minimum depth of reads required to call variants and generate a consensus genome. This value is passed to the iVar software.	100	Optional
theiacov_illumina_se	min_genome_length	Int	Minimum genome length to pass read screening	1700	Optional
theiacov_illumina_se	min_reads	Int	Minimum number of reads to pass read screening	57	Optional
theiacov_illumina_se	nextclade_dataset_name	String	Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name; see organism defaults for more information	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_illumina_se	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_illumina_se	organism	String	The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE	sars-cov-2	Optional
theiacov_illumina_se	pangolin_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
theiacov_illumina_se	phix	File	File containing the PhiX reference sequence used to remove PhiX reads during read cleaning	/bbmap/resources/phix174_ill.ref.fa.gz	Optional
theiacov_illumina_se	primer_bed	File	The bed file containing the primers used when sequencing was performed		Optional
theiacov_illumina_se	qc_check_table	File	TSV file with taxa as rows and QC metrics as columns; internal cells contain user-determined QC thresholds. If provided, the QC Check task is turned on. See below for an example QC Check table.		Optional
theiacov_illumina_se	reference_gene_locations_bed	File	BED file of locations of interest for which average coverage will be calculated		Optional
theiacov_illumina_se	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
theiacov_illumina_se	reference_gff	File	The reference genome annotation in General Feature Format (GFF)		Optional
theiacov_illumina_se	seq_method	String	The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task	ILLUMINA	Optional
theiacov_illumina_se	skip_mash	Boolean	If true, skips estimation of genome size and coverage using mash in read screening steps. As a result, providing true also prevents screening using these parameters.	False	Optional
theiacov_illumina_se	skip_screen	Boolean	Set to True to skip the read screening prior to analysis	False	Optional
theiacov_illumina_se	trim_min_length	Int	Specifies minimum length of each read after trimming to be kept	25	Optional
theiacov_illumina_se	trim_primers	Boolean	A True/False option that determines if primers should be trimmed.	True	Optional
theiacov_illumina_se	trim_quality_min_score	Int	Specifies the minimum average quality of bases in a sliding window to be kept	30	Optional
theiacov_illumina_se	trim_window_size	Int	Specifies window size for trimming (the number of bases to average the quality across)	4	Optional
theiacov_illumina_se	vadr_max_length	Int	Maximum length of contig allowed to run VADR		Optional
theiacov_illumina_se	vadr_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms)	Optional
theiacov_illumina_se	vadr_model_file	File	Path to a tar-gzipped (.tar.gz) VADR model file	Defaults are organism-specific. Please find default values for all organisms here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl.	Optional
theiacov_illumina_se	vadr_options	String	Additional options to provide to VADR		Optional
theiacov_illumina_se	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
theiacov_illumina_se	variant_min_freq	Float	Minimum frequency for a variant to be reported in ivar outputs	0.6	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
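Workflow-level inputs (those whose Terra Task Name is theiacov_illumina_se) can also be supplied outside Terra as a WDL inputs JSON file. The sketch below assumes the standard WDL convention of qualifying each key with the workflow name (for example, theiacov_illumina_se.read1); the helper name, the sample paths, and the choice of optional input are hypothetical. It only covers workflow-level inputs, not task-level inputs such as those of read_QC_trim.

```python
import json


def build_theiacov_se_inputs(read1: str, samplename: str,
                             organism: str = "sars-cov-2",
                             **optional_inputs) -> dict:
    """Hypothetical helper: build an inputs mapping for TheiaCoV_Illumina_SE.

    read1 and samplename are the two required inputs; any extra keyword
    arguments are treated as optional workflow-level inputs (e.g.
    trim_min_length) and are qualified with the workflow name."""
    if organism.lower() in {"flu", "influenza"}:
        # The organism input description states "flu" is not available for SE
        raise ValueError("influenza is not supported by TheiaCoV_Illumina_SE")
    inputs = {
        "theiacov_illumina_se.read1": read1,
        "theiacov_illumina_se.samplename": samplename,
        "theiacov_illumina_se.organism": organism,
    }
    for name, value in optional_inputs.items():
        inputs[f"theiacov_illumina_se.{name}"] = value
    return inputs


if __name__ == "__main__":
    # Hypothetical sample; replace the path and name with your own data
    example = build_theiacov_se_inputs(
        read1="data/sample01_R1.fastq.gz",
        samplename="sample01",
        trim_min_length=25,
    )
    print(json.dumps(example, indent=2))
```

Redirecting the printed JSON to a file produces an inputs file that can be passed to a WDL runner's inputs option (for example, `miniwdl run ... -i inputs.json`).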

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_ont	read1	File	ONT read file in FASTQ file format (compression optional)		Required
theiacov_ont	samplename	String	The name of the sample being analyzed		Required
clean_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
clean_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
clean_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
clean_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
consensus	cpu	Int	Number of CPUs to allocate to the task	8	Optional
consensus	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019-epi2me	Optional
consensus	medaka_model	String	To obtain the best results, this model must match the sequencer's basecaller model; the string takes the format {pore}_{device}_{caller variant}_{caller version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models.	r941_min_high_g360	Optional
consensus	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
consensus_qc	cpu	Int	Number of CPUs to allocate to the task	1	Optional
consensus_qc	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus_qc	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
consensus_qc	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	abricate_flu_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	abricate_flu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727	Optional
flu_track	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage	60	Optional
flu_track	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity	70	Optional
flu_track	antiviral_aa_subs	String	Additional list of antiviral resistance-associated amino acid substitutions of interest to be searched against those called on the sample segments. They take the format <protein>:<AA change>, e.g. NA:A26V		Optional
flu_track	assembly_fasta	File	Internal component, do not modify		Optional
flu_track	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	assembly_metrics_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
flu_track	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
flu_track	flu_h1_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h1n1_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3n2_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_n1_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_n2_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_pa_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb1_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb2_ref	File	Internal component, do not modify		Optional
flu_track	flu_subtype	String	The influenza subtype being analyzed. Used for picking nextclade datasets. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Only use to override the subtype call from IRMA and ABRicate.		Optional
flu_track	genoflu_cpu	Int	Number of CPUs to allocate to the task	1	Optional
flu_track	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; useful if novel genotypes are not present in the default file used by genoflu.py		Optional
flu_track	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task	25	Optional
flu_track	genoflu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06	Optional
flu_track	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	genoflu_min_percent_identity	Float	Percent identity threshold used for calling matches for each of the genome segments that make up the final GenoFLU genotype	98	Optional
flu_track	irma_cpu	Int	Number of CPUs to allocate to the task	4	Optional
flu_track	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	irma_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/irma:1.2.0	Optional
flu_track	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: ambiguated by inserting Ns, or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"	True	Optional
flu_track	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
flu_track	irma_min_ambiguous_threshold	Float	Minimum called single nucleotide variant (SNV) frequency for mixed-base calls in the output consensus assembly (AKA amended consensus).	0.2	Optional
flu_track	irma_min_avg_consensus_allele_quality	Int	Minimum allele coverage depth to call plurality consensus, otherwise calls "N". Setting this value too high can negatively impact final amended consensus.	10	Optional
flu_track	irma_min_read_length	Int	Minimum read length to include reads in read gathering step in IRMA. This value should not be greater than the typical read length.	75	Optional
flu_track	nextclade_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_custom_input_dataset	File	For H5N1 flu samples only. A custom Nextclade dataset in JSON format. If provided, this dataset will be used to process any H5N1 flu samples. If not provided, a custom dataset will be selected depending on the GenoFLU Genotype.	Defaults are GenoFLU Genotype specific. Please find these default values here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
flu_track	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.14.5	Optional
flu_track	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_output_parser_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim	Optional
flu_track	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	read2	File	Internal component, do not modify		Optional
flu_track	vadr_outputs_tgz	File	Internal component, do not modify		Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: True masks them with N's (sets this IRMA parameter to "NNN"), while False deletes them from the sequence entirely (sets it to "DEL")		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g. "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines if the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	read2	File	Internal component, do not modify		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
nanoplot_clean	cpu	Int	Number of CPUs to allocate to the task	4	Optional
nanoplot_clean	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
nanoplot_clean	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0	Optional
nanoplot_clean	max_length	Int	The maximum length of clean reads, for which reads longer than the length specified will be hidden.	100000	Optional
nanoplot_clean	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
nanoplot_raw	cpu	Int	Number of CPUs to allocate to the task	4	Optional
nanoplot_raw	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
nanoplot_raw	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0	Optional
nanoplot_raw	max_length	Int	The maximum length of raw reads, for which reads longer than the length specified will be hidden.	100000	Optional
nanoplot_raw	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. Options: "v1" (see https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (see https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
organism_parameters	reference_gff_file	File	Reference GFF file for the organism being analyzed	Default provided for mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.gff3"])	Optional
qc_check_task	ani_highest_percent	Float	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent_bases_aligned	Float	Internal component, do not modify		Optional
qc_check_task	assembly_length	Int	Internal component, do not modify		Optional
qc_check_task	busco_results	String	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
qc_check_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Optional
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Optional
qc_check_task	gambit_predicted_taxon	String	Internal component, do not modify		Optional
qc_check_task	kraken_human_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
qc_check_task	midas_secondary_genus_abundance	Float	Internal component, do not modify		Optional
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Optional
qc_check_task	n50_value	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_clean2	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_raw2	Int	Internal component, do not modify		Optional
qc_check_task	number_contigs	Int	Internal component, do not modify		Optional
qc_check_task	quast_gc_percent	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Optional
raw_check_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
raw_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
raw_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional
raw_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
read_QC_trim	artic_guppyplex_cpu	Int	Number of CPUs to allocate to the task	8	Optional
read_QC_trim	artic_guppyplex_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
read_QC_trim	artic_guppyplex_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3	Optional
read_QC_trim	artic_guppyplex_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
read_QC_trim	call_kraken	Boolean	True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the `kraken_db` variable must be provided.	False	Optional
read_QC_trim	downsampling_coverage	Float	The desired coverage to sub-sample the reads to with RASUSA	150	Optional
read_QC_trim	kraken2_recalculate_abundances_cpu	Int	Number of CPUs to allocate to the task	4	Optional
read_QC_trim	kraken2_recalculate_abundances_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
read_QC_trim	kraken2_recalculate_abundances_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4	Optional
read_QC_trim	kraken2_recalculate_abundances_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
read_QC_trim	kraken_cpu	Int	Number of CPUs to allocate to the task	4	Optional
read_QC_trim	kraken_db	File	A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. Must contain human and viral sequences	gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz	Optional
read_QC_trim	kraken_disk_size	Int	Amount of storage (in GB) to allocate to the task. Increase this when using large (>30 GB) kraken2 databases, such as the "k2_standard" database	100	Optional
read_QC_trim	kraken_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db	Optional
read_QC_trim	kraken_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32	Optional
read_QC_trim	nanoq_cpu	Int	Number of CPUs to allocate to the task	2	Optional
read_QC_trim	nanoq_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
read_QC_trim	nanoq_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1	Optional
read_QC_trim	nanoq_max_read_length	Int	The maximum read length to keep after trimming	100000	Optional
read_QC_trim	nanoq_max_read_qual	Int	The maximum read quality to keep after trimming	40	Optional
read_QC_trim	nanoq_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
read_QC_trim	nanoq_min_read_length	Int	The minimum read length to keep after trimming	500	Optional
read_QC_trim	nanoq_min_read_qual	Int	The minimum read quality to keep after trimming	10	Optional
read_QC_trim	ncbi_scrub_cpu	Int	Number of CPUs to allocate to the task	4	Optional
read_QC_trim	ncbi_scrub_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
read_QC_trim	ncbi_scrub_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1	Optional
read_QC_trim	ncbi_scrub_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
read_QC_trim	rasusa_bases	String	Internal component, do not modify		Optional
read_QC_trim	rasusa_cpu	Int	Internal component, do not modify		Optional
read_QC_trim	rasusa_disk_size	Int	Internal component, do not modify		Optional
read_QC_trim	rasusa_docker	String	Internal component, do not modify		Optional
read_QC_trim	rasusa_fraction_of_reads	Float	Internal component, do not modify		Optional
read_QC_trim	rasusa_memory	Int	Internal component, do not modify		Optional
read_QC_trim	rasusa_number_of_reads	Int	Internal component, do not modify		Optional
read_QC_trim	rasusa_seed	Int	Internal component, do not modify		Optional
stats_n_coverage	cpu	Int	Number of CPUs to allocate to the task	2	Optional
stats_n_coverage	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
stats_n_coverage	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
stats_n_coverage	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
stats_n_coverage_primtrim	cpu	Int	Number of CPUs to allocate to the task	2	Optional
stats_n_coverage_primtrim	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
stats_n_coverage_primtrim	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
stats_n_coverage_primtrim	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
theiacov_ont	genome_length	Int	User-specified expected genome length to be used in genome statistics calculations		Optional
theiacov_ont	irma_min_consensus_support	Int	Minimum consensus support threshold used by IRMA with ONT data.	50	Optional
theiacov_ont	max_genome_length	Int	Maximum genome length able to pass read screening	2673870	Optional
theiacov_ont	max_length	Int	Maximum length for a read based on the SARS-CoV-2 primer scheme	700	Optional
theiacov_ont	min_basepairs	Int	Minimum number of base pairs able to pass read screening	17000	Optional
theiacov_ont	min_coverage	Int	Minimum genome coverage able to pass read screening	10	Optional
theiacov_ont	min_genome_length	Int	Minimum genome length to pass read screening	1700	Optional
theiacov_ont	min_length	Int	Minimum length of a read based on the SARS-CoV-2 primer scheme	400	Optional
theiacov_ont	min_reads	Int	Minimum number of reads to pass read screening	57	Optional
theiacov_ont	nextclade_dataset_name	String	Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name. See organism defaults for more information	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_ont	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_ont	normalise	Int	Used to normalize the amount of reads to the indicated level before variant calling	200	Optional
theiacov_ont	organism	String	The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". Note that "flu" is not available for TheiaCoV_Illumina_SE	sars-cov-2	Optional
theiacov_ont	pangolin_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
theiacov_ont	primer_bed	File	The bed file containing the primers used when sequencing was performed		Optional
theiacov_ont	qc_check_table	File	TSV file with taxa as rows and QC metrics as columns; internal cells represent user-determined QC thresholds. If provided, turns on the QC Check task. See below for an example QC Check table.		Optional
theiacov_ont	reference_gene_locations_bed	File	Use to provide locations of interest where average coverage will be calculated		Optional
theiacov_ont	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
theiacov_ont	seq_method	String	The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task	OXFORD_NANOPORE	Optional
theiacov_ont	skip_mash	Boolean	If true, skips estimation of genome size and coverage using mash in read screening steps. As a result, providing true also prevents screening using these parameters.	False	Optional
theiacov_ont	skip_screen	Boolean	Set to True to skip the read screening prior to analysis	False	Optional
theiacov_ont	target_organism	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.		Optional
theiacov_ont	vadr_max_length	Int	Maximum length of contig allowed to run VADR		Optional
theiacov_ont	vadr_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms)	Optional
theiacov_ont	vadr_model_file	File	Path to a tar-gzipped VADR model file	Defaults are organism-specific. Please find default values for all organisms here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl.	Optional
theiacov_ont	vadr_options	String	Additional options to provide to VADR		Optional
theiacov_ont	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
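The read screening inputs in the theiacov_ont rows above (min_reads, min_basepairs, min_genome_length, max_genome_length, min_coverage, skip_mash) combine into a pass/fail decision. The Python sketch below illustrates one plausible reading of those descriptions; it is not the workflow's actual screening task. It assumes inclusive comparisons and assumes that the genome length and coverage estimates (which come from mash) are only checked when skip_mash is False. The names ScreenThresholds and screen_reads are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreenThresholds:
    # Defaults copied from the theiacov_ont table rows; the class itself is hypothetical.
    min_reads: int = 57
    min_basepairs: int = 17000
    min_genome_length: int = 1700
    max_genome_length: int = 2673870
    min_coverage: int = 10
    skip_mash: bool = False


def screen_reads(num_reads: int,
                 num_basepairs: int,
                 est_genome_length: Optional[int],
                 est_coverage: Optional[float],
                 t: Optional[ScreenThresholds] = None) -> List[str]:
    """Return failure reasons; an empty list means the sample passes screening.

    Assumption: comparisons are inclusive, and the mash-derived estimates
    (genome length, coverage) are ignored entirely when skip_mash is True.
    """
    t = t or ScreenThresholds()
    failures: List[str] = []
    if num_reads < t.min_reads:
        failures.append(f"reads {num_reads} < min_reads {t.min_reads}")
    if num_basepairs < t.min_basepairs:
        failures.append(f"basepairs {num_basepairs} < min_basepairs {t.min_basepairs}")
    if not t.skip_mash:
        # Genome size must fall inside [min_genome_length, max_genome_length].
        if est_genome_length is None or not (
                t.min_genome_length <= est_genome_length <= t.max_genome_length):
            failures.append(f"estimated genome length {est_genome_length} out of range")
        if est_coverage is None or est_coverage < t.min_coverage:
            failures.append(f"estimated coverage {est_coverage} < min_coverage {t.min_coverage}")
    return failures


if __name__ == "__main__":
    print(screen_reads(1000, 5_000_000, 30_000, 150.0))  # passes: []
    print(screen_reads(10, 5_000_000, 30_000, 150.0))    # fails on min_reads
```

Returning a list of reasons, rather than a single Boolean, makes it clearer which threshold a sample missed. This is also why setting skip_screen to True may be preferable to loosening every threshold when screening is not wanted.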

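For command-line runs, the "Terra Task Name" and "Variable" columns of these tables can be combined into input JSON keys. The Python sketch below assumes the Cromwell-style convention of "workflow.input" for workflow-level inputs and "workflow.call.input" for inputs of a nested call. It also assumes the workflow is named theiacov_ont. Whether nested call inputs can be overridden at all depends on the WDL version and execution engine settings. The helper build_inputs and the override values are hypothetical illustrations, and the required inputs listed elsewhere in the table still need to be added.

```python
import json
from typing import Any, Dict, Tuple


def build_inputs(workflow: str,
                 overrides: Dict[Tuple[str, str], Any]) -> Dict[str, Any]:
    """Flatten (task name, variable) pairs into dotted input JSON keys.

    Assumption: rows whose task name equals the workflow name are top-level
    inputs ("workflow.variable"); all others are nested call inputs
    ("workflow.call.variable").
    """
    inputs: Dict[str, Any] = {}
    for (task, variable), value in overrides.items():
        if task == workflow:
            key = f"{workflow}.{variable}"
        else:
            key = f"{workflow}.{task}.{variable}"
        inputs[key] = value
    return inputs


# Hypothetical overrides; task and variable names are taken from the table rows.
example = build_inputs("theiacov_ont", {
    ("theiacov_ont", "organism"): "flu",
    ("theiacov_ont", "skip_screen"): False,
    ("read_QC_trim", "call_kraken"): True,
    ("nanoplot_raw", "memory"): 32,
})

if __name__ == "__main__":
    print(json.dumps(example, indent=2))
```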
Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_fasta	assembly_fasta	File	Input assembly FASTA file. For influenza, the file must contain either all 8 genome segments in multi-FASTA format or a single segment.		Required
theiacov_fasta	input_assembly_method	String	Method used to generate the assembly file		Required
theiacov_fasta	samplename	String	The name of the sample being analyzed		Required
theiacov_fasta	seq_method	String	The sequencing methodology used to generate the input read data		Required
theiacov_fasta	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Only required if input assembly is a singular flu segment.		Optional, Required
consensus_qc	cpu	Int	Number of CPUs to allocate to the task	1	Optional
consensus_qc	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus_qc	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
consensus_qc	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	abricate_flu_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
flu_track	abricate_flu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-insaflu-220727	Optional
flu_track	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage	60	Optional
flu_track	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity	70	Optional
flu_track	antiviral_aa_subs	String	Additional list of antiviral resistance-associated amino acid substitutions of interest to be searched against those called on the sample segments. They take the format <protein>:<substitution>, e.g. NA:A26V		Optional
flu_track	assembly_metrics_cpu	Int	Internal component, do not modify		Optional
flu_track	assembly_metrics_disk_size	Int	Internal component, do not modify		Optional
flu_track	assembly_metrics_docker	String	Internal component, do not modify		Optional
flu_track	assembly_metrics_memory	Int	Internal component, do not modify		Optional
flu_track	flu_h1_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h1n1_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3_ha_ref	File	Internal component, do not modify		Optional
flu_track	flu_h3n2_m2_ref	File	Internal component, do not modify		Optional
flu_track	flu_n1_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_n2_na_ref	File	Internal component, do not modify		Optional
flu_track	flu_pa_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb1_ref	File	Internal component, do not modify		Optional
flu_track	flu_pb2_ref	File	Internal component, do not modify		Optional
flu_track	genoflu_cpu	Int	Number of CPUs to allocate to the task	1	Optional
flu_track	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
flu_track	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task	25	Optional
flu_track	genoflu_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/genoflu:1.06	Optional
flu_track	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
flu_track	genoflu_min_percent_identity	Float	Percent identity threshold used for calling matches for each genome segment that make up the final GenoFlu genotype	98	Optional
flu_track	irma_cpu	Int	Internal component, do not modify		Optional
flu_track	irma_disk_size	Int	Internal component, do not modify		Optional
flu_track	irma_docker_image	String	Internal component, do not modify		Optional
flu_track	irma_keep_ref_deletions	Boolean	Internal component, do not modify		Optional
flu_track	irma_memory	Int	Internal component, do not modify		Optional
flu_track	irma_min_ambiguous_threshold	Float	Internal component, do not modify		Optional
flu_track	irma_min_avg_consensus_allele_quality	Int	Internal component, do not modify		Optional
flu_track	irma_min_consensus_support	Int	Internal component, do not modify		Optional
flu_track	irma_min_read_length	Int	Internal component, do not modify		Optional
flu_track	nextclade_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_custom_input_dataset	File	For H5N1 flu samples only. A custom Nextclade dataset in JSON format. If provided, this dataset will be used to process any H5N1 flu samples. If not provided, a custom dataset will be selected depending on the GenoFLU Genotype.	Defaults are GenoFLU Genotype specific. Please find these default values here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
flu_track	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.14.5	Optional
flu_track	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task	2	Optional
flu_track	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
flu_track	nextclade_output_parser_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/python/python:3.8.18-slim	Optional
flu_track	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
flu_track	read1	File	Internal component, do not modify		Optional
flu_track	read2	File	Internal component, do not modify		Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_bam	File	BAM file used for calculating gene coverage		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: True masks them with N's (sets this IRMA parameter to "NNN"), while False deletes them from the sequence entirely (sets it to "DEL")		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g. "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines whether the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	read1	File	Internal component, do not modify		Optional
morgana_magic	read2	File	Internal component, do not modify		Optional
morgana_magic	reference_gene_locations_bed	File	Use to provide locations of interest where average coverage will be calculated		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	gene_locations_bed_file	File	Use to provide locations of interest where average coverage will be calculated	Default provided for SARS-CoV-2 ("gs://theiagen-public-resources-rp/reference_data/viral/sars-cov-2/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/mpox_gene_locations.bed")	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	kraken_target_organism_input	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.	Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1)	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pangolin_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	primer_bed_file	File	The bed file containing the primers used when sequencing was performed	REQUIRED FOR SARS-CoV-2, mpox, WNV, RSV-A & RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v2.0.primer.hyphen400.1.bed")	Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
organism_parameters	reference_gff_file	File	Reference GFF file for the organism being analyzed	Default provided for mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.gff3"])	Optional
qc_check_task	ani_highest_percent	Float	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent_bases_aligned	Float	Internal component, do not modify		Optional
qc_check_task	assembly_length	Int	Internal component, do not modify		Optional
qc_check_task	assembly_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	busco_results	String	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
qc_check_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Optional
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Optional
qc_check_task	gambit_predicted_taxon	String	Internal component, do not modify		Optional
qc_check_task	kraken_human	Float	Internal component, do not modify		Optional
qc_check_task	kraken_human_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	meanbaseq_trim	String	Internal component, do not modify		Optional
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
qc_check_task	midas_secondary_genus_abundance	Float	Internal component, do not modify		Optional
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Optional
qc_check_task	n50_value	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_clean1	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_clean2	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_raw1	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_raw2	Int	Internal component, do not modify		Optional
qc_check_task	number_contigs	Int	Internal component, do not modify		Optional
qc_check_task	quast_gc_percent	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Optional
theiacov_fasta	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined.		Optional
theiacov_fasta	genome_length	Int	User-specified expected genome length to be used in genome statistics calculations		Optional
theiacov_fasta	nextclade_dataset_name	String	Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name; see organism defaults for more information	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_fasta	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_fasta	organism	String	The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". Note that "flu" is not available for TheiaCoV_Illumina_SE	sars-cov-2	Optional
theiacov_fasta	qc_check_table	File	TSV file with taxa as rows and QC metrics as columns; each internal cell holds a user-determined QC threshold. If provided, this turns on the QC Check task. See below for an example QC Check table.		Optional
theiacov_fasta	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
theiacov_fasta	vadr_max_length	Int	Maximum length of contig allowed to run VADR		Optional
theiacov_fasta	vadr_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A and RSV-B) and 8 (all other TheiaCoV organisms)	Optional
theiacov_fasta	vadr_model_file	File	Path to the tar + gzipped VADR model file	Defaults are organism-specific. Please find default values for all organisms here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl.	Optional
theiacov_fasta	vadr_opts	String	Additional options to provide to VADR		Optional
theiacov_fasta	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
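The `pangolin_expanded_lineage` option reports a lineage with its alias prefix unrolled (e.g., BA.1 → B.1.1.529.1). The Python sketch below illustrates that expansion under stated assumptions: `ALIASES` is a hypothetical one-entry alias table built only from the example above (pangolin itself relies on the much larger alias key maintained by the pango-designation project), and `expand_lineage` is an illustrative helper, not part of pangolin's API. Recombinant (X*) lineages are not handled.

```python
# Hypothetical alias table (alias prefix -> full, unaliased ancestry).
# Only the BA example from the table above is included; this is NOT pangolin's alias key.
ALIASES: dict[str, str] = {
    "BA": "B.1.1.529",
}


def expand_lineage(lineage: str, aliases: dict[str, str] = ALIASES) -> str:
    """Replace a lineage's alias prefix with its full ancestry.

    The prefix is the text before the first "."; if it is not a known alias
    (e.g. "B" in "B.1.1.7"), the lineage is returned unchanged.
    Recombinant lineages are out of scope for this sketch.
    """
    prefix, sep, rest = lineage.partition(".")
    if prefix not in aliases:
        return lineage
    # Re-attach the sub-lineage suffix, if any (e.g. ".1" in "BA.1").
    return aliases[prefix] + (sep + rest if rest else "")
```

With this table, `expand_lineage("BA.2.75")` yields "B.1.1.529.2.75", while a lineage with no alias prefix, such as "B.1.1.7", passes through unchanged.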

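Several inputs in the table above gate downstream steps on how much of the genome was actually called: `min_num_unambig` (prefilter), `vadr_skip_length` (whether VADR runs; default 10000) and `pangolin_max_ambig` (maximum proportion of Ns for pangolin). A minimal Python sketch of those checks follows. It assumes that "unambiguous" means A/C/G/T only and that every threshold is compared inclusively; the helper names are hypothetical, and the workflow's own tasks may count bases or compare thresholds differently.

```python
# Assumption: only A/C/G/T count as unambiguous; N, other IUPAC codes and gaps do not.
UNAMBIGUOUS_BASES = frozenset("ACGT")


def count_unambiguous(seq: str) -> int:
    """Count A/C/G/T calls, case-insensitively."""
    return sum(1 for base in seq.upper() if base in UNAMBIGUOUS_BASES)


def proportion_n(seq: str) -> float:
    """Fraction of positions that are N (0.0 for an empty sequence)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def check_assembly(
    seq: str,
    min_num_unambig: int,
    pangolin_max_ambig: float,
    vadr_skip_length: int = 10000,
) -> dict[str, bool]:
    """Report which (hypothetical, inclusive) gates a consensus sequence passes."""
    called = count_unambiguous(seq)
    return {
        "passes_prefilter": called >= min_num_unambig,
        "runs_vadr": called >= vadr_skip_length,
        "pangolin_ambiguity_ok": proportion_n(seq) <= pangolin_max_ambig,
    }
```

For a 12,000-base consensus made of 10,000 called bases and 2,000 Ns, `check_assembly(seq, min_num_unambig=9000, pangolin_max_ambig=0.3)` reports all three gates as passed.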
Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_clearlabs	primer_bed	File	The bed file containing the primers used when sequencing was performed		Required
theiacov_clearlabs	read1	File	Clear Dx-produced read file in FASTQ file format (compression optional)		Required
theiacov_clearlabs	samplename	String	The name of the sample being analyzed		Required
consensus	cpu	Int	Number of CPUs to allocate to the task	8	Optional
consensus	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus	medaka_model	String	For best results, the model must match the sequencer's basecaller model; this string takes the format {pore}_{device}_{caller variant}_{caller version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models.	r941_min_high_g360	Optional
consensus	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional
consensus_qc	cpu	Int	Number of CPUs to allocate to the task	1	Optional
consensus_qc	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
consensus_qc	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
consensus_qc	genome_length	Int	Internal component, do not modify		Optional
consensus_qc	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
fastq_scan_clean_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
fastq_scan_clean_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
fastq_scan_clean_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3	Optional
fastq_scan_clean_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
fastq_scan_clean_reads	read1_name	String	Internal component, do not modify	basename of read1 (without .gz, .fastq, .fq)	Optional
fastq_scan_raw_reads	cpu	Int	Number of CPUs to allocate to the task	1	Optional
fastq_scan_raw_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional
fastq_scan_raw_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/fastq-scan:1.0.1--h4ac6f70_3	Optional
fastq_scan_raw_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
fastq_scan_raw_reads	read1_name	String	Internal component, do not modify	basename of read1 (without .gz, .fastq, .fq)	Optional
kraken2_dehosted	cpu	Int	Number of CPUs to allocate to the task	4	Optional
kraken2_dehosted	disk_size	Int	Amount of storage (in GB) to allocate to the task. Increase this when using large (>30 GB) Kraken2 databases such as the "k2_standard" database	100	Optional
kraken2_dehosted	docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db	Optional
kraken2_dehosted	kraken2_db	File	The database used to run Kraken2. Must contain viral and human sequences.	gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz	Optional
kraken2_dehosted	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
kraken2_dehosted	read2	File	Internal component, do not modify		Optional
kraken2_raw	cpu	Int	Number of CPUs to allocate to the task	4	Optional
kraken2_raw	disk_size	Int	Amount of storage (in GB) to allocate to the task. Increase this when using large (>30 GB) Kraken2 databases such as the "k2_standard" database	100	Optional
kraken2_raw	docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db	Optional
kraken2_raw	kraken2_db	File	The database used to run Kraken2. Must contain viral and human sequences.	gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz	Optional
kraken2_raw	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
kraken2_raw	read2	File	Internal component, do not modify		Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: either marked as ambiguous by inserting N's or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g. "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines whether the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	read2	File	Internal component, do not modify		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
ncbi_scrub_se	cpu	Int	Number of CPUs to allocate to the task	4	Optional
ncbi_scrub_se	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
ncbi_scrub_se	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1	Optional
ncbi_scrub_se	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	gene_locations_bed_file	File	Use to provide locations of interest where average coverage will be calculated	Default provided for SARS-CoV-2 ("gs://theiagen-public-resources-rp/reference_data/viral/sars-cov-2/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/mpox_gene_locations.bed")	Optional
organism_parameters	genome_length_input	Int	Use to specify the expected genome length; provided by default for all supported organisms	Default provided for SARS-CoV-2 (29903), mpox (197200), WNV (11000), flu (13000), RSV-A (16000), RSV-B (16000), HIV (primer versions 1 [9181] and 2 [9840])	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. Options are "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pangolin_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	primer_bed_file	File	The bed file containing the primers used when sequencing was performed	REQUIRED FOR SARS-CoV-2, mpox, WNV, RSV-A & RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v2.0.primer.hyphen400.1.bed")	Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
organism_parameters	reference_gff_file	File	Reference GFF file for the organism being analyzed	Default provided for mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.gff3"])	Optional
organism_parameters	vadr_max_length	Int	Maximum length for the fasta-trim-terminal-ambigs.pl VADR script	Default provided for SARS-CoV-2 (30000), mpox (210000), WNV (11000), flu (0), RSV-A (15500) and RSV-B (15500).	Optional
organism_parameters	vadr_mem	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms)	Optional
organism_parameters	vadr_model	File	Path to the tar + gzipped VADR model file	gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.3-2.tar.gz	Optional
organism_parameters	vadr_options	String	Options for the v-annotate.pl VADR script	--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta	Optional
organism_parameters	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
qc_check_task	ani_highest_percent	Float	Internal component, do not modify		Optional
qc_check_task	ani_highest_percent_bases_aligned	Float	Internal component, do not modify		Optional
qc_check_task	assembly_length	Int	Internal component, do not modify		Optional
qc_check_task	busco_results	String	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
qc_check_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Optional
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Optional
qc_check_task	gambit_predicted_taxon	String	Internal component, do not modify		Optional
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Optional
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Optional
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Optional
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
qc_check_task	midas_secondary_genus_abundance	Float	Internal component, do not modify		Optional
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Optional
qc_check_task	n50_value	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_clean2	Int	Internal component, do not modify		Optional
qc_check_task	num_reads_raw2	Int	Internal component, do not modify		Optional
qc_check_task	number_contigs	Int	Internal component, do not modify		Optional
qc_check_task	quast_gc_percent	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Optional
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Optional
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Optional
stats_n_coverage	cpu	Int	Number of CPUs to allocate to the task	2	Optional
stats_n_coverage	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
stats_n_coverage	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
stats_n_coverage	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
stats_n_coverage_primtrim	cpu	Int	Number of CPUs to allocate to the task	2	Optional
stats_n_coverage_primtrim	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
stats_n_coverage_primtrim	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15	Optional
stats_n_coverage_primtrim	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
theiacov_clearlabs	medaka_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/artic-ncov2019:1.3.0-medaka-1.4.3	Optional
theiacov_clearlabs	nextclade_dataset_name	String	Nextclade organism dataset name. If the organism input is set correctly, this input is automatically assigned the corresponding dataset name. See organism defaults for more information	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_clearlabs	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl	Optional
theiacov_clearlabs	normalise	Int	Used to normalize the number of reads to the indicated level before variant calling	20000	Optional
theiacov_clearlabs	organism	String	The organism that is being analyzed. Options: "sars-cov-2", "MPXV", "WNV", "HIV", "flu", "rsv_a", "rsv_b". However, "flu" is not available for TheiaCoV_Illumina_SE	sars-cov-2	Optional
theiacov_clearlabs	qc_check_table	File	TSV file with taxa as rows and QC metrics as columns; internal cells represent user-determined QC thresholds. If provided, turns on the QC Check task. See below for an example QC Check table.		Optional
theiacov_clearlabs	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
theiacov_clearlabs	seq_method	String	The sequencing methodology used to generate the input read data; for TheiaProk workflows, this input will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task	OXFORD_NANOPORE	Optional
theiacov_clearlabs	target_organism	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.		Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
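The qc_check_table input is described only in prose. As a rough illustration of how a threshold table of that shape can be applied, the sketch below reads a TSV with one row per taxon and one column per metric, then compares a sample's metrics against the matching row. The `taxon` column name, the `qc_check` function, and the rule that metrics in `MAX_METRICS` are upper bounds while all others are lower bounds are hypothetical assumptions, not the QC Check task's actual logic.

```python
import csv
import io

# Hypothetical: metrics whose threshold is treated as a maximum; every other
# metric is treated as a minimum. The real QC Check task may differ.
MAX_METRICS = {"number_contigs"}


def qc_check(table_tsv: str, taxon: str, metrics: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail for each metric that has a threshold in the row for `taxon`."""
    rows = csv.DictReader(io.StringIO(table_tsv), delimiter="\t")
    row = next((r for r in rows if r["taxon"] == taxon), None)
    if row is None:
        return {}  # no thresholds defined for this taxon
    results: dict[str, bool] = {}
    for metric, threshold in row.items():
        # Skip the key column, blank cells, and metrics the sample did not report.
        if metric == "taxon" or not threshold or metric not in metrics:
            continue
        limit = float(threshold)
        value = metrics[metric]
        results[metric] = value <= limit if metric in MAX_METRICS else value >= limit
    return results


example_table = "taxon\tr1_mean_q_raw\tnumber_contigs\nsars-cov-2\t30\t1\n"
print(qc_check(example_table, "sars-cov-2", {"r1_mean_q_raw": 34.2, "number_contigs": 3}))
# {'r1_mean_q_raw': True, 'number_contigs': False}
```

Blank cells are skipped, so a table only needs thresholds for the metrics a user cares about.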

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
theiacov_fasta_batch	assembly_fastas	Array[File]	The assembly files for your samples in FASTA format		Required
theiacov_fasta_batch	bucket_name	String	The GCP bucket for the workspace where the TheiaCoV_FASTA_Batch output files are saved. We recommend using a unique GSURI for the bucket associated with your Terra workspace. The root GSURI is accessible in the Dashboard page of your workspace in the "Cloud Information" section. Do not include the gs:// prefix in the string. Example: "fc-c526190d-4332-409b-8086-be7e1af9a0b6/theiacov_fasta_batch-2024-04-15-seq-run-1/"		Required
theiacov_fasta_batch	project_name	String	The name of the Terra project where the data can be found. Example: "my-terra-project"		Required
theiacov_fasta_batch	samplenames	Array[String]	The names of the samples being analyzed		Required
theiacov_fasta_batch	table_name	String	The name of the Terra table where the data can be found. Example: "sars-cov-2-sample"		Required
theiacov_fasta_batch	workspace_name	String	The name of the Terra workspace where the data can be found. Example: "my-terra-workspace"		Required
cat_files_fasta	cpu	Int	Number of CPUs to allocate to the task	2	Optional
cat_files_fasta	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
cat_files_fasta	docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1	Optional
cat_files_fasta	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
morgana_magic	abricate_flu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	abricate_flu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_docker	String	The Docker container to use for the task		Optional
morgana_magic	abricate_flu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	abricate_flu_min_percent_coverage	Int	Minimum DNA percent coverage		Optional
morgana_magic	abricate_flu_min_percent_identity	Int	Minimum DNA percent identity		Optional
morgana_magic	assembly_metrics_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	assembly_metrics_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	assembly_metrics_docker	String	The Docker container to use for the task		Optional
morgana_magic	assembly_metrics_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_bam	File	Bam file used for calculating gene coverage		Optional
morgana_magic	gene_coverage_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	gene_coverage_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_docker	String	The Docker container to use for the task		Optional
morgana_magic	gene_coverage_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	gene_coverage_min_depth	Int	The minimum depth to determine if a position was covered.		Optional
morgana_magic	genoflu_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	genoflu_cross_reference	File	An Excel file to cross-reference BLAST findings; probably useful if novel genotypes are not in the default file used by genoflu.py		Optional
morgana_magic	genoflu_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	genoflu_docker	String	The Docker container to use for the task		Optional
morgana_magic	genoflu_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	irma_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	irma_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	irma_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	irma_keep_ref_deletions	Boolean	True/False variable that determines how sites missed during read gathering (i.e., sites in the reference genome with 0 reads) are handled: either ambiguated by inserting N's or deleted from the sequence entirely. False sets this IRMA parameter to "DEL" and True sets it to "NNN"		Optional
morgana_magic	irma_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_docker_image	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	nextclade_output_parser_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	nextclade_output_parser_docker	String	The Docker container to use for the task		Optional
morgana_magic	nextclade_output_parser_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	number_ATCG	Int	Internal component, do not modify		Optional
morgana_magic	pangolin_analysis_mode	String	Specify which inference engine to use. Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.		Optional
morgana_magic	pangolin_arguments	String	Optional arguments for pangolin, e.g., "--skip-scorpio"		Optional
morgana_magic	pangolin_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	pangolin_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	pangolin_expanded_lineage	Boolean	True/False that determines if a lineage should be expanded without aliases (e.g., BA.1 → B.1.1.529.1)		Optional
morgana_magic	pangolin_max_ambig	Float	Maximum proportion of Ns allowed for pangolin to attempt assignment.		Optional
morgana_magic	pangolin_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	pangolin_min_length	Int	Minimum query length allowed for pangolin to attempt an assignment		Optional
morgana_magic	pangolin_skip_designation_cache	Boolean	A True/False option that determines whether the designation cache should be skipped		Optional
morgana_magic	pangolin_skip_scorpio	Boolean	A True/False option that determines if scorpio should be skipped.		Optional
morgana_magic	quasitools_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	quasitools_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	quasitools_docker	String	The Docker container to use for the task		Optional
morgana_magic	quasitools_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	read1	File	Internal component, do not modify		Optional
morgana_magic	read2	File	Internal component, do not modify		Optional
morgana_magic	reference_gene_locations_bed	File	Use to provide locations of interest where average coverage will be calculated		Optional
morgana_magic	sc2_s_gene_start	Int	Start position of S gene		Optional
morgana_magic	sc2_s_gene_stop	Int	End position of S gene		Optional
morgana_magic	vadr_cpu	Int	Number of CPUs to allocate to the task		Optional
morgana_magic	vadr_disk_size	Int	Amount of storage (in GB) to allocate to the task		Optional
morgana_magic	vadr_max_length	Int	Maximum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
morgana_magic	vadr_memory	Int	Amount of memory/RAM (in GB) to allocate to the task		Optional
morgana_magic	vadr_min_length	Int	Minimum length for the fasta-trim-terminal-ambigs.pl VADR script		Optional
morgana_magic	vadr_model_file	File	Path to a tar + gzipped VADR model file		Optional
morgana_magic	vadr_options	String	Options to pass to the VADR script		Optional
morgana_magic	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR		Optional
organism_parameters	auspice_config	File	Auspice config file for customizing visualizations in the Augur_PHB workflow; takes priority over the other customization values available for augur_export. Defaults are set for various organisms & flu segments. A minimal auspice config file is set in cases where organism is not specified and user does not provide an optional input config file.		Optional
organism_parameters	clades_tsv	File	Internal component, do not modify		Optional
organism_parameters	flu_genoflu_genotype	String	Internal component, do not modify	N/A	Optional
organism_parameters	flu_segment	String	Influenza genome segment being analyzed. Options: "HA" or "NA". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	flu_subtype	String	The influenza subtype being analyzed. Options: "Yamagata", "Victoria", "H1N1", "H3N2", "H5N1". Automatically determined. This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	N/A	Optional
organism_parameters	gene_locations_bed_file	File	Use to provide locations of interest where average coverage will be calculated	Default provided for SARS-CoV-2 ("gs://theiagen-public-resources-rp/reference_data/viral/sars-cov-2/sc2_gene_locations.bed") and mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/mpox_gene_locations.bed")	Optional
organism_parameters	genome_length_input	Int	Use to specify the expected genome length; provided by default for all supported organisms	Default provided for SARS-CoV-2 (29903), mpox (197200), WNV (11000), flu (13000), RSV-A (16000), RSV-B (16000), HIV (primer versions 1 [9181] and 2 [9840])	Optional
organism_parameters	hiv_primer_version	String	The version of HIV primers used. Options: "v1" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L156) and "v2" (https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl#L164). This input is ignored if provided for TheiaCoV_Illumina_SE and TheiaCoV_ClearLabs	v1	Optional
organism_parameters	kraken_target_organism_input	String	The organism whose abundance the user wants to check in their reads. This should be a proper taxonomic name recognized by the Kraken database.	Default provided for mpox (Monkeypox virus), WNV (West Nile virus), and HIV (Human immunodeficiency virus 1)	Optional
organism_parameters	lat_longs_tsv	File	Internal component, do not modify		Optional
organism_parameters	min_date	Float	Internal component, do not modify		Optional
organism_parameters	min_num_unambig	Int	Minimum number of called bases in genome to pass prefilter	Defaults are organism-specific. Please find default values for all organisms (and for Flu - their respective genome segments and subtypes) here: https://github.com/theiagen/public_health_bioinformatics/blob/main/workflows/utilities/wf_organism_parameters.wdl. For an organism without set defaults, the default value is 0	Optional
organism_parameters	narrow_bandwidth	Float	Internal component, do not modify		Optional
organism_parameters	pivot_interval	Int	Internal component, do not modify		Optional
organism_parameters	primer_bed_file	File	The bed file containing the primers used when sequencing was performed	REQUIRED FOR SARS-CoV-2, MPOX, WNV, RSV-A & RSV-B. Provided by default only for HIV primer versions 1 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v1.0.primer.hyphen.bed") and 2 ("gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v2.0.primer.hyphen400.1.bed")	Optional
organism_parameters	proportion_wide	Float	Internal component, do not modify		Optional
organism_parameters	reference_genbank	File	Internal component, do not modify		Optional
organism_parameters	reference_genome	File	An optional reference genome used for consensus assembly and QC		Optional
organism_parameters	reference_gff_file	File	Reference GFF file for the organism being analyzed	Default provided for mpox ("gs://theiagen-public-resources-rp/reference_data/viral/mpox/Mpox-MT903345.1.reference.gff3") and HIV (primer versions 1 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.gff3"] and 2 ["gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.gff3"])	Optional
organism_parameters	vadr_max_length	Int	Maximum length for the fasta-trim-terminal-ambigs.pl VADR script	Default provided for SARS-CoV-2 (30000), mpox (210000), WNV (11000), flu (0), RSV-A (15500) and RSV-B (15500).	Optional
organism_parameters	vadr_mem	Int	Amount of memory/RAM (in GB) to allocate to the task	32 (RSV-A, RSV-B, WNV) and 16 (all other TheiaCoV organisms)	Optional
organism_parameters	vadr_model	File	Path to a tar + gzipped VADR model file	gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.3-2.tar.gz	Optional
organism_parameters	vadr_options	String	Options for the v-annotate.pl VADR script	--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta	Optional
organism_parameters	vadr_skip_length	Int	Minimum assembly length (unambiguous) to run VADR	10000	Optional
sm_theiacov_fasta_wrangling	cpu	Int	Number of CPUs to allocate to the task	8	Optional
sm_theiacov_fasta_wrangling	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
sm_theiacov_fasta_wrangling	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-28-v4	Optional
sm_theiacov_fasta_wrangling	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	4	Optional
theiacov_fasta_batch	nextclade_dataset_name	String	Nextclade organism dataset name. Options: "nextstrain/sars-cov-2/wuhan-hu-1/orfs". If the organism input is set correctly, this input will be automatically assigned the corresponding dataset name.	sars-cov-2	Optional
theiacov_fasta_batch	nextclade_dataset_tag	String	Nextclade dataset tag. Used for pulling up-to-date reference genomes and associated information specific to nextclade datasets (QC thresholds, organism-specific information like SARS-CoV-2 clade & lineage information, etc.) that is required for running the Nextclade tool.	2024-06-13--23-42-47Z	Optional
theiacov_fasta_batch	organism	String	The organism that is being analyzed. Options: "sars-cov-2"	sars-cov-2	Optional
theiacov_fasta_batch	pangolin_docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.1-pdata-1.34	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional
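The pangolin_expanded_lineage input unaliases lineage names (e.g., BA.1 → B.1.1.529.1). A minimal sketch of that expansion follows, assuming an alias table that maps a leading alias to its full parent lineage. Only the "BA" entry is implied by the example in the table; the `ALIASES` table and `expand_lineage` helper are hypothetical and are not pangolin's own code.

```python
# Hypothetical alias table; "BA" -> "B.1.1.529" follows from the example
# BA.1 -> B.1.1.529.1. A real alias table contains many more entries.
ALIASES: dict[str, str] = {"BA": "B.1.1.529"}


def expand_lineage(lineage: str, aliases: dict[str, str] = ALIASES) -> str:
    """Repeatedly replace a leading alias with its full name until none remains."""
    while True:
        prefix, sep, rest = lineage.partition(".")
        full = aliases.get(prefix)
        if full is None:
            return lineage
        # sep and rest are both empty when the lineage is a bare alias such as "BA".
        lineage = full + sep + rest


print(expand_lineage("BA.1"))  # B.1.1.529.1
```

The loop keeps expanding, so an alias whose full name itself begins with another alias is also resolved.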

Organism-Specific Parameters

The organism_parameters sub-workflow is the first step in all TheiaCoV workflows. It automatically sets the parameters needed by each downstream tool to the appropriate values for the user-designated organism (the default organism is "sars-cov-2").

The following tables list the relevant organism-specific parameters. Any of these default values can be overridden by providing a value for the input named in the "Overwrite Variable Name" column.
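The precedence described above (organism default first, user-supplied value wins) can be sketched as a dictionary merge. The numbers below are copied from the SARS-CoV-2, Mpox, and WNV tables; the `ORGANISM_DEFAULTS` layout and `resolve_parameters` function are hypothetical illustrations, not the organism_parameters WDL itself.

```python
from typing import Any

# A few defaults copied from the tables below; the dictionary layout is illustrative.
ORGANISM_DEFAULTS: dict[str, dict[str, Any]] = {
    "sars-cov-2": {"genome_length_input": 29903, "vadr_max_length": 30000, "vadr_skip_length": 10000},
    "MPXV": {"genome_length_input": 197200, "vadr_max_length": 210000, "vadr_skip_length": 65480},
    "WNV": {"genome_length_input": 11000, "vadr_max_length": 11000, "vadr_skip_length": 3000},
}


def resolve_parameters(organism: str = "sars-cov-2", **overrides: Any) -> dict[str, Any]:
    """Start from the organism's defaults, then let any user-supplied value win."""
    params = dict(ORGANISM_DEFAULTS.get(organism, {}))
    # An input left unset (None) keeps the default, like WDL's select_first([user_value, default]).
    params.update({name: value for name, value in overrides.items() if value is not None})
    return params


print(resolve_parameters("WNV", vadr_skip_length=5000, vadr_max_length=None))
# {'genome_length_input': 11000, 'vadr_max_length': 11000, 'vadr_skip_length': 5000}
```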

SARS-CoV-2 | Mpox | West Nile Virus | Influenza | RSV-A | RSV-B | HIV | Measles | Mumps | Rubella

Overwrite Variable Name	Organism	Default Value
gene_locations_bed_file	sars-cov-2	`"gs://theiagen-public-resources-rp/reference_data/viral/sars-cov-2/sc2_gene_locations.bed"`
genome_length_input	sars-cov-2	`29903`
kraken_target_organism_input	sars-cov-2	`"Severe acute respiratory syndrome coronavirus 2"`
nextclade_dataset_name_input	sars-cov-2	`"nextstrain/sars-cov-2/wuhan-hu-1/orfs"`
pangolin_docker_image	sars-cov-2	`"us-docker.pkg.dev/general-theiagen/staphb/pangolin:4.3.3-pdata-1.36"`
nextclade_dataset_tag_input	sars-cov-2	`"2025-09-19--14-53-06Z"`
reference_genome	sars-cov-2	`"gs://theiagen-public-resources-rp/reference_data/viral/sars-cov-2/MN908947.fasta"`
vadr_max_length	sars-cov-2	`30000`
vadr_skip_length	sars-cov-2	`10000`
vadr_mem	sars-cov-2	`8`
vadr_options	sars-cov-2	`"--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta"`
vadr_model_file	sars-cov-2	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.3-2.tar.gz"`

Overwrite Variable Name	Organism	Default Value
gene_locations_bed_file	MPXV	`"gs://theiagen-public-resources-rp/reference_data/viral/mpox/mpox_gene_locations.bed"`
genome_length_input	MPXV	`197200`
kraken_target_organism_input	MPXV	`"Monkeypox virus"`
nextclade_dataset_name_input	MPXV	`"nextstrain/mpox/lineage-b.1"`
nextclade_dataset_tag_input	MPXV	`"2025-09-09--12-13-13Z"`
primer_bed_file	MPXV	`"gs://theiagen-public-resources-rp/reference_data/viral/mpox/MPXV.primer.bed"`
reference_genome	MPXV	`"gs://theiagen-public-resources-rp/reference_data/viral/mpox/MPXV.MT903345.reference.fasta"`
reference_gff_file	MPXV	`"gs://theiagen-public-resources-rp/reference_data/viral/mpox/Mpox-MT903345.1.reference.gff3"`
vadr_max_length	MPXV	`210000`
vadr_skip_length	MPXV	`65480`
vadr_mem	MPXV	`8`
vadr_options	MPXV	`"--mkey mpxv --glsearch --minimap2 -s -r --nomisc --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --s_overhang 150 --out_allfasta"`
vadr_model_file	MPXV	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mpxv-1.4.2-1.tar.gz"`

Overwrite Variable Name	Organism	Default Value	Notes
genome_length_input	WNV	`11000`
kraken_target_organism_input	WNV	`"West Nile virus"`
nextclade_dataset_name_input	WNV	`"NA"`	TheiaCoV's Nextclade currently does not support WNV
nextclade_dataset_tag_input	WNV	`"NA"`	TheiaCoV's Nextclade currently does not support WNV
primer_bed_file	WNV	`"gs://theiagen-public-resources-rp/reference_data/viral/wnv/al/wnv/WNV-L1_primer.bed"`
reference_genome	WNV	`"gs://theiagen-public-resources-rp/reference_data/viral/wnv/NC_009942.1_wnv_L1.fasta"`
vadr_max_length	WNV	`11000`
vadr_skip_length	WNV	`3000`
vadr_mem	WNV	`8`
vadr_options	WNV	`"--mkey flavi --nomisc --noprotid --out_allfasta"`
vadr_model_file	WNV	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flavi-1.2-1.tar.gz"`

Overwrite Variable Name	Organism	Flu Segment	Flu Subtype	Default Value	Notes
flu_segment	flu	all	all	N/A	TheiaCoV will attempt to automatically assign a flu segment
flu_subtype	flu	all	all	N/A	TheiaCoV will attempt to automatically assign a flu subtype
genome_length_input	flu	all	all	`13500`
vadr_max_length	flu	all	all	`13500`
vadr_skip_length	flu	all	all	`500`
vadr_mem	flu	all	all	`8`
vadr_options	flu	all	all	`"--mkey flu --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3"`
vadr_model_file	flu	all	all	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flu-1.6.3-2.tar.gz"`
nextclade_dataset_name_input	flu	ha	h1n1	`"nextstrain/flu/h1n1pdm/ha/MW626062"`
nextclade_dataset_tag_input	flu	ha	h1n1	`"2025-10-22--18-11-36Z"`
reference_genome	flu	ha	h1n1	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_h1n1pdm_ha.fasta"`
nextclade_dataset_name_input	flu	ha	h3n2	`"nextstrain/flu/h3n2/ha/EPI1857216"`
nextclade_dataset_tag_input	flu	ha	h3n2	`"2025-11-04--15-46-13Z"`
reference_genome	flu	ha	h3n2	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_h3n2_ha.fasta"`
nextclade_dataset_name_input	flu	ha	victoria	`"nextstrain/flu/vic/ha/KX058884"`
nextclade_dataset_tag_input	flu	ha	victoria	`"2025-10-22--18-11-36Z"`
reference_genome	flu	ha	victoria	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_vic_ha.fasta"`
nextclade_dataset_name_input	flu	ha	yamagata	`"nextstrain/flu/yam/ha/JN993010"`
nextclade_dataset_tag_input	flu	ha	yamagata	`"2024-01-30--16-34-55Z"`
reference_genome	flu	ha	yamagata	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_yam_ha.fasta"`
nextclade_dataset_name_input	flu	ha	h5n1	`"community/moncla-lab/iav-h5/ha/all-clades"`
nextclade_dataset_tag_input	flu	ha	h5n1	`"2025-09-09--12-13-13Z"`
reference_genome	flu	ha	h5n1	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_h5n1_ha.fasta"`
nextclade_dataset_name_input	flu	na	h1n1	`"nextstrain/flu/h1n1pdm/na/MW626056"`
nextclade_dataset_tag_input	flu	na	h1n1	`"2025-09-09--12-13-13Z"`
reference_genome	flu	na	h1n1	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_h1n1pdm_na.fasta"`
nextclade_dataset_name_input	flu	na	h3n2	`"nextstrain/flu/h3n2/na/EPI1857215"`
nextclade_dataset_tag_input	flu	na	h3n2	`"2025-09-09--12-13-13Z"`
reference_genome	flu	na	h3n2	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_h3n2_na.fasta"`
nextclade_dataset_name_input	flu	na	victoria	`"nextstrain/flu/vic/na/CY073894"`
nextclade_dataset_tag_input	flu	na	victoria	`"2025-09-09--12-13-13Z"`
reference_genome	flu	na	victoria	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_vic_na.fasta"`
nextclade_dataset_name_input	flu	na	yamagata	`"NA"`
nextclade_dataset_tag_input	flu	na	yamagata	`"NA"`
reference_genome	flu	na	yamagata	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/reference_yam_na.fasta"`
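For influenza, the Nextclade dataset default depends on both the segment and the subtype. The lookup can be sketched as a table keyed on the (segment, subtype) pair, with values copied from the rows above. The `flu_nextclade_dataset` helper is hypothetical, and returning "NA" for combinations the table does not list is an assumption.

```python
# (segment, subtype) -> default nextclade_dataset_name_input, copied from the table above.
FLU_NEXTCLADE_DATASETS = {
    ("ha", "h1n1"): "nextstrain/flu/h1n1pdm/ha/MW626062",
    ("ha", "h3n2"): "nextstrain/flu/h3n2/ha/EPI1857216",
    ("ha", "victoria"): "nextstrain/flu/vic/ha/KX058884",
    ("ha", "yamagata"): "nextstrain/flu/yam/ha/JN993010",
    ("ha", "h5n1"): "community/moncla-lab/iav-h5/ha/all-clades",
    ("na", "h1n1"): "nextstrain/flu/h1n1pdm/na/MW626056",
    ("na", "h3n2"): "nextstrain/flu/h3n2/na/EPI1857215",
    ("na", "victoria"): "nextstrain/flu/vic/na/CY073894",
    ("na", "yamagata"): "NA",
}


def flu_nextclade_dataset(segment: str, subtype: str) -> str:
    """Return the default dataset name, or "NA" when no dataset is listed (assumed fallback)."""
    return FLU_NEXTCLADE_DATASETS.get((segment.lower(), subtype.lower()), "NA")


print(flu_nextclade_dataset("HA", "H3N2"))  # nextstrain/flu/h3n2/ha/EPI1857216
```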

H5N1 Additional Defaults

If the sample is designated as H5N1 by either ABRicate or IRMA, an H5N1-specific Nextclade task will run with the following datasets depending on the GenoFLU genotype.

Alternatively, if a nextclade_custom_input_dataset variable is provided (available under the flu_track task name), the workflow will run that custom dataset on all H5N1 samples, regardless of the GenoFLU genotype.

Overwrite Variable Name	GenoFLU Genotype	Default Value	Notes
nextclade_custom_input_dataset	B3.13	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/nextclade_avian-flu_h5n1-cattle-outbreak_h5n1-b3.13_2025-06-24.json"`	Extracted from nextclade/avian-flu/h5n1-cattle-outbreak on 2025-06-24
nextclade_custom_input_dataset	D1.1	`"gs://theiagen-public-resources-rp/reference_data/viral/flu/nextclade_avian-flu_h5n1-d1.1_2025-06-24.json"`	Extracted from nextclade/avian-flu/h5n1-d1.1 on 2025-06-24
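The H5N1 dataset choice described above can be sketched as follows: a custom dataset, when provided, wins for every H5N1 sample; otherwise the GenoFLU genotype selects one of the defaults in the table. The `select_h5n1_dataset` function is hypothetical, and returning None for genotypes without a listed dataset is an assumption about behavior the text does not spell out.

```python
from typing import Optional

# GenoFLU genotype -> default Nextclade dataset JSON, copied from the table above.
H5N1_GENOTYPE_DATASETS = {
    "B3.13": "gs://theiagen-public-resources-rp/reference_data/viral/flu/nextclade_avian-flu_h5n1-cattle-outbreak_h5n1-b3.13_2025-06-24.json",
    "D1.1": "gs://theiagen-public-resources-rp/reference_data/viral/flu/nextclade_avian-flu_h5n1-d1.1_2025-06-24.json",
}


def select_h5n1_dataset(
    subtype: str, genoflu_genotype: Optional[str], custom_dataset: Optional[str] = None
) -> Optional[str]:
    """Pick the dataset for the extra H5N1 Nextclade run, or None if it should not run."""
    if subtype.upper() != "H5N1":
        return None  # the additional task only applies to H5N1 samples
    if custom_dataset:
        return custom_dataset  # user-supplied dataset overrides the genotype-based choice
    # Assumption: genotypes without a listed dataset get no genotype-specific run.
    return H5N1_GENOTYPE_DATASETS.get(genoflu_genotype or "")
```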

Overwrite Variable Name	Organism	Default Value
genome_length_input	rsv_a	`16000`
kraken_target_organism_input	rsv_a	`"Human respiratory syncytial virus A"`
nextclade_dataset_name_input	rsv_a	`"nextstrain/rsv/a/EPI_ISL_412866"`
nextclade_dataset_tag_input	rsv_a	`"2025-09-09--12-13-13Z"`
reference_genome	rsv_a	`"gs://theiagen-public-resources-rp/reference_data/viral/rsv/reference_rsv_a.EPI_ISL_412866.fasta"`
vadr_max_length	rsv_a	`15500`
vadr_skip_length	rsv_a	`5000`
vadr_mem	rsv_a	`32`
vadr_options	rsv_a	`"--mkey rsv --xnocomp -r"`
vadr_model_file	rsv_a	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz"`

Overwrite Variable Name	Organism	Default Value
genome_length_input	rsv_b	`16000`
kraken_target_organism_input	rsv_b	`"human respiratory syncytial virus"`
nextclade_dataset_name_input	rsv_b	`"nextstrain/rsv/b/EPI_ISL_1653999"`
nextclade_dataset_tag_input	rsv_b	`"2025-09-09--12-13-13Z"`
reference_genome	rsv_b	`"gs://theiagen-public-resources-rp/reference_data/viral/rsv/reference_rsv_b.EPI_ISL_1653999.fasta"`
vadr_max_length	rsv_b	`15500`
vadr_mem	rsv_b	`32`
vadr_options	rsv_b	`"--mkey rsv --xnocomp -r"`
vadr_model_file	rsv_b	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz"`

Overwrite Variable Name	Organism	Default Value	Notes
kraken_target_organism_input	HIV	`"Human immunodeficiency virus 1"`
genome_length_input	HIV-v1	`9181`	This version of HIV originates from Oregon
primer_bed_file	HIV-v1	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v1.0.primer.hyphen.bed"`	This version of HIV originates from Oregon
reference_genome	HIV-v1	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.fasta"`	This version of HIV originates from Oregon
reference_gff_file	HIV-v1	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/NC_001802.1.gff3"`	This version of HIV originates from Oregon
genome_length_input	HIV-v2	`9840`	This version of HIV originates from Southern Africa
primer_bed_file	HIV-v2	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/HIV-1_v2.0.primer.hyphen400.1.bed"`	This version of HIV originates from Southern Africa
reference_genome	HIV-v2	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.headerchanged.fasta"`	This version of HIV originates from Southern Africa
reference_gff_file	HIV-v2	`"gs://theiagen-public-resources-rp/reference_data/viral/hiv/AY228557.1.gff3"`	This version of HIV originates from Southern Africa

Overwrite Variable Name	Organism	Default Value
kraken_target_organism_input	measles	`"Measles morbillivirus"`
genome_length_input	measles	`16000`
nextclade_dataset_name_input	measles	`"nextstrain/measles/genome/WHO-2012"`
nextclade_dataset_tag_input	measles	`"2025-09-09--12-13-13Z"`
reference_genome	measles	`"gs://theiagen-public-resources-rp/reference_data/viral/measles/NC_001498.1_measles_reference.fasta"`
vadr_max_length	measles	`18000`
vadr_skip_length	measles	`0`
vadr_mem	measles	`24`
vadr_options	measles	`"--mkey mev -r --indefclass 0.01"`
vadr_model_file	measles	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mev-1.02.tar.gz"`

Overwrite Variable Name	Organism	Default Value
reference_genome	mumps	`"gs://theiagen-public-resources-rp/reference_data/viral/mumps/NC_002200.1_mumps_reference.fasta"`
genome_length_input	mumps	`15300`
vadr_options	mumps	`"--mkey muv -r --indefclass 0.025"`
vadr_max_length	mumps	`18000`
vadr_skip_length	mumps	`0`
vadr_mem	mumps	`16`
vadr_model_file	mumps	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-muv-1.01.tar.gz"`

Overwrite Variable Name	Organism	Default Value
reference_genome	rubella	`"gs://theiagen-public-resources-rp/reference_data/viral/rubella/NC_001545.2_rubella_reference.fasta"`
genome_length_input	rubella	`9800`
vadr_options	rubella	`"--mkey ruv -r"`
vadr_max_length	rubella	`10000`
vadr_skip_length	rubella	`0`
vadr_mem	rubella	`16`
vadr_model_file	rubella	`"gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-ruv-1.01.tar.gz"`

Core Tasks

These tasks are performed for all organisms. Some run regardless of the input data type, while others are specific to it; together they perform read trimming and assembly appropriate to the input data type.

versioning: Version Capture

The versioning task captures the workflow version from the GitHub code repository.

Version Capture Technical Details

	Links
Task	task_versioning.wdl

Assembly Tasks

TheiaCoV_Illumina_PE | TheiaCoV_Illumina_SE | TheiaCoV_ONT | TheiaCoV_ClearLabs | TheiaCoV_FASTA

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

read_QC_trim is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below. The differences between the PE and SE versions of the read_QC_trim sub-workflow lie in the default parameters, the use of two or one input read file(s), and the different output files.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool and uses a k-mer database built from Eukaryota k-mers derived from all human RefSeq records; any k-mers also found in non-Eukaryota RefSeq records are subtracted from the database.
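The database construction above is a set subtraction, and removing reads "including their mates" means dropping the whole pair. A toy sketch of both ideas follows. It is not HRRT's implementation: the tiny k, the forward-strand-only k-mers, the function names, and the rule that a single shared k-mer flags a read are all assumptions for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-length substrings of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_human_db(human_seqs: list[str], non_eukaryote_seqs: list[str], k: int) -> set[str]:
    """Human k-mers with any k-mer also seen in non-eukaryote sequences subtracted."""
    human = set().union(*(kmers(s, k) for s in human_seqs))
    other = set().union(*(kmers(s, k) for s in non_eukaryote_seqs))
    return human - other


def remove_human_pairs(pairs: list[tuple[str, str]], db: set[str], k: int) -> list[tuple[str, str]]:
    """Drop a read pair if either mate shares a k-mer with the database (assumed rule)."""
    return [(r1, r2) for r1, r2 in pairs if not (kmers(r1, k) & db or kmers(r2, k) & db)]


db = build_human_db(["ACGTACGTGG"], ["ACGTAAAA"], k=4)
print(remove_human_pairs([("TTGTGGTT", "CCCCCCCC"), ("ACGTAAAA", "CCCCCCCC")], db, k=4))
# [('ACGTAAAA', 'CCCCCCCC')]
```

Subtracting non-eukaryote k-mers keeps shared sequence (here ACGT and CGTA) from causing microbial or viral reads to be removed as human.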

NCBI-Scrub Technical Details

	Links
Task	task_ncbi_scrub.wdl
Software Source Code	HRRT on GitHub
Software Documentation	HRRT on NCBI

By default, read_processing is set to "trimmomatic". To use fastp instead, set read_processing to "fastp". These tasks are mutually exclusive.

Trimmomatic: Read Trimming (default)

Read processing is available via Trimmomatic by default.

Trimmomatic trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size), cutting once the average quality within the window falls below the trim_quality_trim_score (default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen (default of 75 for paired-end, 25 for single-end).
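The sliding-window rule and minimum-length filter described above can be sketched on a list of Phred scores, using the paired-end defaults from the text (window 4, threshold 20, minimum length 75). This is a simplified sketch that cuts at the start of the first failing window; Trimmomatic's own code refines the exact cut point, and the `sliding_window_trim` name is hypothetical.

```python
from typing import Optional


def sliding_window_trim(
    quals: list[int], window: int = 4, threshold: float = 20, minlen: int = 75
) -> Optional[int]:
    """Return how many 5' bases to keep, or None if the read should be discarded.

    Simplified: the read is cut at the start of the first window whose mean
    quality falls below `threshold`.
    """
    keep = len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            keep = start
            break
    # Reads trimmed below the minimum length are dropped entirely.
    return keep if keep >= minlen else None


read_quals = [30] * 80 + [10] * 10
print(sliding_window_trim(read_quals))  # 79
```

For single-end data, the same function would be called with `threshold=30, minlen=25` to match the single-end defaults.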

Trimmomatic Technical Details

	Links
Task	task_trimmomatic.wdl
Software Source Code	Trimmomatic on GitHub
Software Documentation	Trimmomatic Website
Original Publication(s)	Trimmomatic: a flexible trimmer for Illumina sequence data

fastp: Read Trimming (alternative)

To activate this task, set read_processing to "fastp".

fastp trims low-quality regions of Illumina paired-end or single-end reads using a sliding window, cutting once the average quality within the window falls below trim_quality_trim_score:

- The window size is set with trim_window_size (default of 4).
- trim_quality_trim_score defaults to 20 for paired-end reads and 30 for single-end reads.
- A read is discarded if it is shorter than trim_minlen after trimming (default of 75 for paired-end reads and 25 for single-end reads).

fastp also runs with additional default parameters and features that are not part of Trimmomatic's default configuration.

fastp default read-trimming parameters

Parameter	Explanation
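-g	enables polyG tail trimming
-5 20	enables sliding-window quality trimming from the 5' (front) end of each read
-3 20	enables sliding-window quality trimming from the 3' (tail) end of each read
--detect_adapter_for_pe	enables adapter auto-detection for paired-end reads (by default, fastp auto-detects adapters only for single-end reads)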

Additional arguments can be passed using the fastp_args optional parameter.

fastp Technical Details

	Links
Task	task_fastp.wdl
Software Source Code	fastp on GitHub
Software Documentation	fastp on GitHub
Original Publication(s)	fastp: an ultra-fast all-in-one FASTQ preprocessor

BBDuk: Adapter Trimming and PhiX Removal

Adapters are manufactured oligonucleotide sequences attached to DNA fragments during library preparation. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. Adapter sequences do not originate from your sample, so they should be removed before genome analysis. Otherwise, they can affect downstream analyses.

The bbduk task removes adapters and PhiX reads in two steps:

Repair, from the BBTools package, re-pairs the reads. It reorders the reads in the paired FASTQ files so that the forward and reverse reads of each pair are in the same position in both files.
BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.
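The following sketch shows what re-pairing means. Mates are matched by read ID and emitted in the same order, and reads whose mate is missing are set aside. This is a hypothetical in-memory illustration, not BBTools' implementation. It assumes Illumina-style IDs, in which mates share a name apart from an optional /1 or /2 suffix and any comment after the first space.

```python
Read = tuple[str, str]  # (header, sequence)


def base_name(header: str) -> str:
    """Read ID without the comment after the first space or a trailing /1 or /2."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def repair(r1: list[Read], r2: list[Read]) -> tuple[list[tuple[Read, Read]], list[Read]]:
    """Match mates by read ID; return (pairs in R1 order, unpaired reads)."""
    r2_by_id = {base_name(h): (h, s) for h, s in r2}
    pairs: list[tuple[Read, Read]] = []
    singletons: list[Read] = []
    for header, seq in r1:
        mate = r2_by_id.pop(base_name(header), None)
        if mate is None:
            singletons.append((header, seq))
        else:
            pairs.append(((header, seq), mate))
    singletons.extend(r2_by_id.values())  # R2 reads that never found an R1 mate
    return pairs, singletons
```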

BBDuk Technical Details

	Links
Task	task_bbduk.wdl
Software Source Code	BBMap on SourceForge
Software Documentation	BBDuk Guide (archived)

By default, read_qc is set to "fastq_scan". To use fastqc instead, set read_qc to "fastqc". These tasks are mutually exclusive.

fastq-scan: Read Quantification (default)

Read quantification is available via fastq-scan by default.

fastq-scan quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
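For well-formed FASTQ, where every record spans exactly four lines, counting reads is simple. The sketch below is a hypothetical stand-in for fastq-scan's read counts only; fastq-scan reports other statistics that are not reproduced here. It assumes four-line records and treats mismatched R1/R2 counts as an error, because a mismatch would indicate broken pairing.

```python
import gzip
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming four lines per record.

    Blank lines (for example a trailing newline) are ignored.
    """
    n = sum(1 for line in lines if line.strip())
    if n % 4:
        raise ValueError(f"{n} non-empty lines is not a multiple of 4")
    return n // 4


def count_file(path: str) -> int:
    """Count records in a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_records(fh)


def count_pairs(r1_path: str, r2_path: str) -> int:
    """Number of read pairs; R1 and R2 must contain the same number of reads."""
    n1, n2 = count_file(r1_path), count_file(r2_path)
    if n1 != n2:
        raise ValueError(f"R1 has {n1} reads but R2 has {n2}")
    return n1
```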

fastq-scan Technical Details

	Links
Task	task_fastq_scan.wdl
Software Source Code	fastq-scan on GitHub
Software Documentation	fastq-scan on GitHub

FastQC: Read Quantification (alternative)

To activate this task, set read_qc to "fastqc".

FastQC quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

This tool also provides a graphical visualization of the read quality.
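FastQC's per-base sequence quality plot summarizes the quality distribution at each read position. The sketch below shows the underlying calculation, limited to the mean and assuming Phred+33 encoding. FastQC also reports medians and quartiles and groups positions into bins for longer reads; this sketch does neither. The function names are hypothetical.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def mean_quality_by_position(qual_strings: list[str]) -> list[float]:
    """Mean Phred score at each read position, across reads of varying length."""
    totals: list[int] = []
    counts: list[int] = []
    for qual in qual_strings:
        for i, score in enumerate(phred33(qual)):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```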

FastQC Technical Details

	Links
Task	task_fastqc.wdl
Software Source Code	FastQC on GitHub
Software Documentation	FastQC Website

Kraken2: Read Identification

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.

Kraken2 is run on both the raw and clean reads.

Database-dependent

This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz.
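Kraken2 results are most often read from its tab-delimited report. The following parsing sketch assumes the standard report layout: the clade percentage is in the first column and the indented scientific name is in the last. It extracts the percentage of reads assigned to a taxon, such as Homo sapiens, so that raw and clean reads can be compared. The function name is hypothetical and not part of the workflow.

```python
def kraken2_percent(report_text: str, taxon_name: str) -> float:
    """Return the clade-level percentage for taxon_name from a Kraken2 report.

    Uses the last column for the name, so reports with extra minimizer columns
    also work. Returns 0.0 if the taxon is absent.
    """
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 6 and fields[-1].strip() == taxon_name:
            return float(fields[0])
    return 0.0
```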

Kraken2 Technical Details

	Links
Task	task_kraken2.wdl
Software Source Code	Kraken2 on GitHub
Software Documentation	Kraken2 Documentation
Original Publication(s)	Improved metagenomic analysis with Kraken 2

read_QC_trim Technical Details

	Links
Subworkflow	wf_read_QC_trim_pe.wdl wf_read_QC_trim_se.wdl

If non-influenza

ivar_consensus: Alignment, Consensus, Variant Detection, and Assembly Statistics

iVar Consensus is a sub-workflow within TheiaCoV that performs reference-based consensus assembly using the iVar tool by Nathan Grubaugh from the Andersen lab.

bwa: Read Alignment to the Reference

BWA (Burrows-Wheeler Aligner) is used to align the cleaned reads to a reference genome. The reference is either provided by the user or determined by the organism-specific parameters (see above). The resulting BAM file is used for primer trimming, variant calling, and consensus generation in downstream tasks.

BWA Technical Details

	Links
Task	task_bwa.wdl
Software Source Code	BWA on GitHub
Software Documentation	BWA Documentation
Original Publication(s)	Fast and accurate short read alignment with Burrows-Wheeler transform

ivar_trim: Primer Trimming (optional)

To deactivate this task, set trim_primers to false.

iVar uses the primer_bed file, which is usually provided by the user and more rarely determined by the organism-specific parameters, to process an aligned and sorted BAM file in three steps:

1. It soft-clips primer sequences.
2. It trims the reads with a sliding window, using a quality threshold of 20.
3. It writes each trimmed read that is longer than 30 bp to a new BAM file. This file contains only trimmed reads and reads in which no primer was identified.
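The coordinate arithmetic behind primer soft-clipping can be sketched as follows. This is a deliberately simplified, hypothetical model, not iVar's algorithm:

- It handles only forward-strand reads whose alignment starts inside a primer.
- It skips the quality-trimming step.
- It uses BED's 0-based, half-open coordinates.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    start: int  # BED start: 0-based, inclusive
    end: int    # BED end: 0-based, exclusive


def parse_primer_bed(lines: list[str]) -> list[Primer]:
    """Read primer coordinates from BED lines, skipping blank and header lines."""
    primers = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        primers.append(Primer(int(fields[1]), int(fields[2])))
    return primers


def five_prime_clip(read_start: int, primers: list[Primer]) -> int:
    """Bases to soft-clip from a forward read whose alignment starts inside a primer."""
    return max((p.end - read_start for p in primers if p.start <= read_start < p.end),
               default=0)


def keep_read(aligned_length: int, clipped: int, min_length: int = 30) -> bool:
    """Keep the read only if more than min_length bases remain after clipping."""
    return aligned_length - clipped > min_length
```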

iVar Trim Technical Details

	Links
Task	task_ivar_primer_trim.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar on GitHub
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

assembly_metrics: Mapping Statistics

The assembly_metrics task generates mapping statistics from a BAM file. It uses samtools to generate a summary of the mapping statistics, which includes coverage, depth, average base quality, average mapping quality, and other relevant metrics.

This task is run twice: once on the untrimmed reads and, if primer trimming is enabled, once on the primer-trimmed reads. This allows for a comparison of mapping statistics before and after primer trimming, which can be useful for assessing the impact of primer trimming on the quality of the alignment and subsequent analyses.
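Core coverage and depth numbers can be derived from per-position depths, such as the three-column output of samtools depth. The sketch below illustrates those definitions; it is not the assembly_metrics task's code. It assumes a single-contig reference and treats positions missing from the input as depth 0. Running samtools depth -a reports every position explicitly. The function name is hypothetical.

```python
def coverage_summary(depth_lines: list[str], ref_length: int,
                     min_depth: int = 1) -> dict[str, float]:
    """Summarize per-position depth for a single-contig reference.

    Expects samtools-depth-style lines: contig <TAB> 1-based position <TAB> depth.
    """
    depths = [0] * ref_length
    for line in depth_lines:
        _contig, pos, depth = line.rstrip("\n").split("\t")
        depths[int(pos) - 1] = int(depth)
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": sum(depths) / ref_length,
        "percent_covered": 100 * covered / ref_length,
    }
```

Computing the summary once for the untrimmed BAM and once for the primer-trimmed BAM gives the before/after comparison described above.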

assembly_metrics Technical Details

	Links
Task	task_assembly_metrics.wdl
Software Source Code	samtools on GitHub
Software Documentation	samtools
Original Publication(s)	The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools

ivar_consensus: Consensus Assembly

iVar's consensus tool generates a reference-based consensus assembly. Several parameters can be set that determine the stringency of the consensus assembly, including minimum quality, minimum allele frequency, and minimum depth.

For TheiaCoV, the following default parameters are used:

minimum quality: 20
minimum depth: 100
minimum allele frequency: 0.6
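The sketch below shows how these three thresholds combine when calling a single consensus position from per-base counts. It makes several assumptions:

- The counts already exclude bases below the minimum quality of 20.
- The most frequent bases are combined until their total frequency reaches the threshold, and the result is reported as an IUPAC ambiguity code. This is a rough approximation of how iVar applies its frequency threshold, not its code.
- Indels are ignored.

The function and table names are hypothetical.

```python
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(counts: dict[str, int], min_depth: int = 100,
                   min_freq: float = 0.6) -> str:
    """Call one consensus position from quality-filtered A/C/G/T counts."""
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"  # too little coverage to call a base
    chosen: set[str] = set()
    cumulative = 0
    # Add bases from most to least frequent until the threshold is reached.
    for base, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        chosen.add(base)
        cumulative += n
        if cumulative / depth >= min_freq:
            break
    return IUPAC[frozenset(chosen)]
```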

iVar Technical Details

	Links
Task	task_ivar_consensus.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar Documentation
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

ivar_variants: Variant Calling

iVar uses the output of samtools mpileup to call single nucleotide variants (SNVs) and insertions/deletions (indels). Several key parameters determine the stringency of variant calling, including minimum quality, minimum allele frequency, and minimum depth.

This task returns a VCF file containing all called variants, the number of detected variants, and the proportion of those variants with allele frequencies between 0.6 and 0.9 (also known as intermediate variants).

For TheiaCoV, the following default parameters are used:

minimum quality: 20
minimum depth: 100
minimum allele frequency: 0.06
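The two summary values reported alongside the VCF can be sketched from a list of allele frequencies. The sketch assumes the frequencies have already been extracted from the called variants. It also treats both bounds of the 0.6 to 0.9 intermediate range as inclusive, which may not match the task exactly. The function name is hypothetical.

```python
def intermediate_variant_summary(allele_freqs: list[float], low: float = 0.6,
                                 high: float = 0.9) -> tuple[int, float]:
    """Return (number of variants, proportion with low <= allele frequency <= high)."""
    n = len(allele_freqs)
    if n == 0:
        return 0, 0.0
    intermediate = sum(1 for af in allele_freqs if low <= af <= high)
    return n, intermediate / n
```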

iVar Technical Details

	Links
Task	task_ivar_variant_call.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar Documentation
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

iVar Consensus Technical Details

	Links
Subworkflow	wf_ivar_consensus.wdl

If influenza

irma: Assembly and Characterization

Cleaned reads are assembled using IRMA (Iterative Refinement Meta-Assembler). IRMA first sorts reads into influenza genome segments using LABEL. It then iteratively maps the reads to a collection of influenza reference sequences and edits those references to account for the high population diversity and mutation rates that are characteristic of influenza genomes. Assemblies produced by IRMA are ordered from the largest to the smallest assembled segment. IRMA also performs typing and subtyping as part of the assembly process. Note: IRMA does not differentiate between the influenza B Victoria and Yamagata lineages. To determine the lineage, review the abricate task outputs.

The influenza genome is segmented, and downstream bioinformatics tools require the genome assembly in different forms. For this reason, the IRMA task and TheiaCoV workflows output several genome assembly files:

assembly_fasta - The full genome assembly in FASTA format, with one FASTA entry per genome segment. There should be 8 segments in total, but depending on the quality and depth of the sequence data, some segments may not be assembled and therefore may not be present in this file.
irma_assembly_fasta_concatenated - The full genome assembly in FASTA format, with all segments concatenated into a single FASTA entry. This is not a typical FASTA file. It is created specifically for use with a custom Nextclade dataset for the H5N1 B3.13 genotype, which is based on a concatenated reference genome.
irma_<segment-abbreviation>_segment_fasta - Individual FASTA files that each contain the sequence of a single segment (for example, the HA segment). There are 8 of these in total.
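The difference between assembly_fasta and irma_assembly_fasta_concatenated can be illustrated with a small sketch. It parses a multi-FASTA file and joins the segment sequences into one record. The segment names, segment order, record name, and handling of missing segments are all assumptions. A concatenated Nextclade reference expects a fixed segment order, and skipping a missing segment, as this sketch does, would shift every downstream coordinate.

```python
from typing import Optional


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse multi-FASTA text into (name, sequence) pairs, joining wrapped lines."""
    records: list[tuple[str, str]] = []
    name: Optional[str] = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate_segments(records: list[tuple[str, str]], new_name: str,
                         order: Optional[list[str]] = None) -> str:
    """Join segment sequences into one FASTA entry, in the given (hypothetical) order."""
    by_name = dict(records)
    names = order if order is not None else [name for name, _ in records]
    # Missing segments are skipped here, which shifts all later coordinates.
    seq = "".join(by_name[n] for n in names if n in by_name)
    return f">{new_name}\n{seq}\n"
```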

General statistics about the assembly are generated with the consensus_qc task (task_consensus_qc.wdl).

IRMA Technical Details

	Links
Task	task_irma.wdl
Software Documentation	IRMA website
Original Publication(s)	Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

read_QC_trim is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses several tasks, described below. The PE and SE versions of the read_QC_trim sub-workflow differ in three ways: their default parameters, whether they take two input read files or one, and their output files.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool (STAT). It uses a database of eukaryotic k-mers derived from all human RefSeq records. Any k-mers that also appear in non-eukaryotic RefSeq records have been subtracted from this database.

NCBI-Scrub Technical Details

	Links
Task	task_ncbi_scrub.wdl
Software Source Code	HRRT on GitHub
Software Documentation	HRRT on NCBI

By default, read_processing is set to "trimmomatic". To use fastp instead, set read_processing to "fastp". These tasks are mutually exclusive.

Trimmomatic: Read Trimming (default)

Read processing is performed with Trimmomatic by default.

Trimmomatic trims low-quality regions of Illumina paired-end or single-end reads using a sliding window, cutting once the average quality within the window falls below trim_quality_trim_score:

- The window size is set with trim_window_size (default of 4).
- trim_quality_trim_score defaults to 20 for paired-end reads and 30 for single-end reads.
- A read is discarded if it is shorter than trim_minlen after trimming (default of 75 for paired-end reads and 25 for single-end reads).

Trimmomatic Technical Details

	Links
Task	task_trimmomatic.wdl
Software Source Code	Trimmomatic on GitHub
Software Documentation	Trimmomatic Website
Original Publication(s)	Trimmomatic: a flexible trimmer for Illumina sequence data

fastp: Read Trimming (alternative)

To activate this task, set read_processing to "fastp".

fastp trims low-quality regions of Illumina paired-end or single-end reads using a sliding window, cutting once the average quality within the window falls below trim_quality_trim_score:

- The window size is set with trim_window_size (default of 4).
- trim_quality_trim_score defaults to 20 for paired-end reads and 30 for single-end reads.
- A read is discarded if it is shorter than trim_minlen after trimming (default of 75 for paired-end reads and 25 for single-end reads).

fastp also runs with additional default parameters and features that are not part of Trimmomatic's default configuration.

fastp default read-trimming parameters

Parameter	Explanation
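-g	enables polyG tail trimming
-5 20	enables sliding-window quality trimming from the 5' (front) end of each read
-3 20	enables sliding-window quality trimming from the 3' (tail) end of each read
--detect_adapter_for_pe	enables adapter auto-detection for paired-end reads (by default, fastp auto-detects adapters only for single-end reads)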

Additional arguments can be passed using the fastp_args optional parameter.

fastp Technical Details

	Links
Task	task_fastp.wdl
Software Source Code	fastp on GitHub
Software Documentation	fastp on GitHub
Original Publication(s)	fastp: an ultra-fast all-in-one FASTQ preprocessor

BBDuk: Adapter Trimming and PhiX Removal

Adapters are manufactured oligonucleotide sequences attached to DNA fragments during library preparation. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. Adapter sequences do not originate from your sample, so they should be removed before genome analysis. Otherwise, they can affect downstream analyses.

The bbduk task removes adapters and PhiX reads in two steps:

Repair, from the BBTools package, re-pairs the reads. It reorders the reads in the paired FASTQ files so that the forward and reverse reads of each pair are in the same position in both files.
BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.

BBDuk Technical Details

	Links
Task	task_bbduk.wdl
Software Source Code	BBMap on SourceForge
Software Documentation	BBDuk Guide (archived)

By default, read_qc is set to "fastq_scan". To use fastqc instead, set read_qc to "fastqc". These tasks are mutually exclusive.

fastq-scan: Read Quantification (default)

Read quantification is available via fastq-scan by default.

fastq-scan quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

fastq-scan Technical Details

	Links
Task	task_fastq_scan.wdl
Software Source Code	fastq-scan on GitHub
Software Documentation	fastq-scan on GitHub

FastQC: Read Quantification (alternative)

To activate this task, set read_qc to "fastqc".

FastQC quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

This tool also provides a graphical visualization of the read quality.

FastQC Technical Details

	Links
Task	task_fastqc.wdl
Software Source Code	FastQC on GitHub
Software Documentation	FastQC Website

Kraken2: Read Identification

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.

Kraken2 is run on both the raw and clean reads.

Database-dependent

This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz.

Kraken2 Technical Details

	Links
Task	task_kraken2.wdl
Software Source Code	Kraken2 on GitHub
Software Documentation	Kraken2 Documentation
Original Publication(s)	Improved metagenomic analysis with Kraken 2

read_QC_trim Technical Details

	Links
Subworkflow	wf_read_QC_trim_pe.wdl wf_read_QC_trim_se.wdl

ivar_consensus: Alignment, Consensus, Variant Detection, and Assembly Statistics

iVar Consensus is a sub-workflow within TheiaCoV that performs reference-based consensus assembly using the iVar tool by Nathan Grubaugh from the Andersen lab.

bwa: Read Alignment to the Reference

BWA (Burrows-Wheeler Aligner) is used to align the cleaned reads to a reference genome. The reference is either provided by the user or determined by the organism-specific parameters (see above). The resulting BAM file is used for primer trimming, variant calling, and consensus generation in downstream tasks.

BWA Technical Details

	Links
Task	task_bwa.wdl
Software Source Code	BWA on GitHub
Software Documentation	BWA Documentation
Original Publication(s)	Fast and accurate short read alignment with Burrows-Wheeler transform

ivar_trim: Primer Trimming (optional)

To deactivate this task, set trim_primers to false.

iVar uses the primer_bed file, which is usually provided by the user and more rarely determined by the organism-specific parameters, to process an aligned and sorted BAM file in three steps:

1. It soft-clips primer sequences.
2. It trims the reads with a sliding window, using a quality threshold of 20.
3. It writes each trimmed read that is longer than 30 bp to a new BAM file. This file contains only trimmed reads and reads in which no primer was identified.

iVar Trim Technical Details

	Links
Task	task_ivar_primer_trim.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar on GitHub
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

assembly_metrics: Mapping Statistics

The assembly_metrics task generates mapping statistics from a BAM file. It uses samtools to generate a summary of the mapping statistics, which includes coverage, depth, average base quality, average mapping quality, and other relevant metrics.

This task is run twice: once on the untrimmed reads and, if primer trimming is enabled, once on the primer-trimmed reads. This allows for a comparison of mapping statistics before and after primer trimming, which can be useful for assessing the impact of primer trimming on the quality of the alignment and subsequent analyses.

assembly_metrics Technical Details

	Links
Task	task_assembly_metrics.wdl
Software Source Code	samtools on GitHub
Software Documentation	samtools
Original Publication(s)	The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools

ivar_consensus: Consensus Assembly

iVar's consensus tool generates a reference-based consensus assembly. Several parameters can be set that determine the stringency of the consensus assembly, including minimum quality, minimum allele frequency, and minimum depth.

For TheiaCoV, the following default parameters are used:

minimum quality: 20
minimum depth: 100
minimum allele frequency: 0.6

iVar Technical Details

	Links
Task	task_ivar_consensus.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar Documentation
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

ivar_variants: Variant Calling

iVar uses the output of samtools mpileup to call single nucleotide variants (SNVs) and insertions/deletions (indels). Several key parameters determine the stringency of variant calling, including minimum quality, minimum allele frequency, and minimum depth.

This task returns a VCF file containing all called variants, the number of detected variants, and the proportion of those variants with allele frequencies between 0.6 and 0.9 (also known as intermediate variants).

For TheiaCoV, the following default parameters are used:

minimum quality: 20
minimum depth: 100
minimum allele frequency: 0.06

iVar Technical Details

	Links
Task	task_ivar_variant_call.wdl
Software Source Code	iVar on GitHub
Software Documentation	iVar Documentation
Original Publication(s)	An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

iVar Consensus Technical Details

	Links
Subworkflow	wf_ivar_consensus.wdl

read_QC_trim_ont: Read Quality Trimming, Quantification, and Identification

read_QC_trim_ont is a sub-workflow that filters low-quality reads and trims low-quality regions of reads. It uses several tasks, described below.

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool (STAT). It uses a database of eukaryotic k-mers derived from all human RefSeq records. Any k-mers that also appear in non-eukaryotic RefSeq records have been subtracted from this database.

NCBI-Scrub Technical Details

	Links
Task	task_ncbi_scrub.wdl
Software Source Code	HRRT on GitHub
Software Documentation	HRRT on NCBI

artic_guppyplex: Read Filtering

Reads are filtered by length with artic_guppyplex, which is part of the ARTIC protocol. TheiaCoV was developed primarily for amplicon-based viral sequencing, so this task removes reads that are too short or too long to be genuine amplicons, such as chimeric reads.
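A length filter keeps only reads within a window around the expected amplicon size. In the sketch below, the bounds are hypothetical parameters and both are treated as inclusive. The values in the usage line are illustrative only and are not the defaults used by TheiaCoV or artic guppyplex.

```python
def filter_by_length(reads: list[tuple[str, str]], min_length: int,
                     max_length: int) -> tuple[list[tuple[str, str]], int]:
    """Keep reads with min_length <= len(seq) <= max_length; also return how many were dropped."""
    kept = [(name, seq) for name, seq in reads if min_length <= len(seq) <= max_length]
    return kept, len(reads) - len(kept)


if __name__ == "__main__":
    reads = [("short", "A" * 300), ("amplicon", "A" * 500), ("chimera", "A" * 900)]
    kept, dropped = filter_by_length(reads, min_length=400, max_length=700)
    print([name for name, _ in kept], dropped)
```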

artic_guppyplex Technical Details

	Links
Task	task_artic_guppyplex.wdl
Software Source Code	ARTIC on GitHub
Software Documentation	ARTIC Documentation

Kraken2: Read Identification

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.

Kraken2 is run on both the raw and clean reads.

Database-dependent

This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz.

Kraken2 Technical Details

	Links
Task	task_kraken2.wdl
Software Source Code	Kraken2 on GitHub
Software Documentation	Kraken2 Documentation
Original Publication(s)	Improved metagenomic analysis with Kraken 2

NanoPlot: Read Quantification

NanoPlot determines mean quality scores, read lengths, and the number of reads. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

This task currently runs outside the read_QC_trim_ont sub-workflow, but it is described here because it calculates statistics on the read data. It runs separately so that, if the user does not provide an estimated genome length, the actual assembly length can be used to calculate accurate estimated coverage statistics.
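Estimated coverage is commonly computed as the total number of sequenced bases divided by the genome length, which is why the genome length matters. The sketch below assumes that definition; the exact formula used by the workflow is not given here. As described above, it prefers a user-supplied genome length estimate and falls back to the assembly length. The function names are hypothetical.

```python
from typing import Optional


def read_stats(read_lengths: list[int]) -> dict[str, float]:
    """Number of reads, total bases, and mean read length."""
    n = len(read_lengths)
    total = sum(read_lengths)
    return {"num_reads": n, "total_bases": total, "mean_length": total / n if n else 0.0}


def estimated_coverage(total_bases: int, user_genome_length: Optional[int],
                       assembly_length: int) -> float:
    """Total bases divided by genome length, preferring the user's estimate if given."""
    genome_length = user_genome_length or assembly_length
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_bases / genome_length
```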

NanoPlot Technical Details

	Links
Task	task_nanoplot.wdl
Software Source Code	NanoPlot on GitHub
Software Documentation	NanoPlot Documentation
Original Publication(s)	NanoPack2: population-scale evaluation of long-read sequencing data

read_QC_trim_ont Technical Details

	Links
Subworkflow	wf_read_QC_trim_ont.wdl

If non-influenza

artic_consensus: Alignment, Primer Trimming, Variant Detection, and Consensus

This task runs the artic minion command, a pipeline with several stages that are described in detail in the ARTIC documentation. Briefly, these stages are as follows:

Input reads are aligned to the appropriate reference and only mapped reads are retained. Alignment post-processing occurs, where primers are removed and various trimming steps are undertaken. Variants are detected, and a consensus assembly file is generated.

Note that the Medaka model is set to "r941_min_high_g360" by default, which may not be suitable for your sequencing data. Change this parameter if needed.

Read trimming is already performed on raw read data generated by the ClearLabs instrument, so it is not a required step in the TheiaCoV_ClearLabs workflow.

Artic Consensus Technical Details

	Links
Task	task_artic_consensus.wdl
Software Source Code	ARTIC on GitHub
Software Documentation	ARTIC Documentation

assembly_metrics: Mapping Statistics

The assembly_metrics task generates mapping statistics from a BAM file. It uses samtools to generate a summary of the mapping statistics, which includes coverage, depth, average base quality, average mapping quality, and other relevant metrics.

This task is run twice: once on the untrimmed reads and, if primer trimming is enabled, once on the primer-trimmed reads. This allows for a comparison of mapping statistics before and after primer trimming, which can be useful for assessing the impact of primer trimming on the quality of the alignment and subsequent analyses.

assembly_metrics Technical Details

	Links
Task	task_assembly_metrics.wdl
Software Source Code	samtools on GitHub
Software Documentation	samtools
Original Publication(s)	The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools

If influenza

irma: Assembly and Characterization

Cleaned reads are assembled using IRMA (Iterative Refinement Meta-Assembler). IRMA first sorts reads into influenza genome segments using LABEL. It then iteratively maps the reads to a collection of influenza reference sequences and edits those references to account for the high population diversity and mutation rates that are characteristic of influenza genomes. Assemblies produced by IRMA are ordered from the largest to the smallest assembled segment. IRMA also performs typing and subtyping as part of the assembly process. Note: IRMA does not differentiate between the influenza B Victoria and Yamagata lineages. To determine the lineage, review the abricate task outputs.

The influenza genome is segmented, and downstream bioinformatics tools require the genome assembly in different forms. For this reason, the IRMA task and TheiaCoV workflows output several genome assembly files:

assembly_fasta - The full genome assembly in FASTA format, with one FASTA entry per genome segment. There should be 8 segments in total, but depending on the quality and depth of the sequence data, some segments may not be assembled and therefore may not be present in this file.
irma_assembly_fasta_concatenated - The full genome assembly in FASTA format, with all segments concatenated into a single FASTA entry. This is not a typical FASTA file. It is created specifically for use with a custom Nextclade dataset for the H5N1 B3.13 genotype, which is based on a concatenated reference genome.
irma_<segment-abbreviation>_segment_fasta - Individual FASTA files that each contain the sequence of a single segment (for example, the HA segment). There are 8 of these in total.

General statistics about the assembly are generated with the consensus_qc task (task_consensus_qc.wdl).

IRMA Technical Details

	Links
Task	task_irma.wdl
Software Documentation	IRMA website
Original Publication(s)	Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler

assembly_metrics: Mapping Statistics

The assembly_metrics task generates mapping statistics from a BAM file. It uses samtools to generate a summary of the mapping statistics, which includes coverage, depth, average base quality, average mapping quality, and other relevant metrics.

This task is run twice: once on the untrimmed reads and, if primer trimming is enabled, once on the primer-trimmed reads. This allows for a comparison of mapping statistics before and after primer trimming, which can be useful for assessing the impact of primer trimming on the quality of the alignment and subsequent analyses.

assembly_metrics Technical Details

	Links
Task	task_assembly_metrics.wdl
Software Source Code	samtools on GitHub
Software Documentation	samtools
Original Publication(s)	The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools

HRRT: Human Host Sequence Removal

All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).

HRRT is based on the SRA Taxonomy Analysis Tool (STAT). It uses a database of eukaryotic k-mers derived from all human RefSeq records. Any k-mers that also appear in non-eukaryotic RefSeq records have been subtracted from this database.

NCBI-Scrub Technical Details

	Links
Task	task_ncbi_scrub.wdl
Software Source Code	HRRT on GitHub
Software Documentation	HRRT on NCBI

fastq-scan: Read Quantification

fastq-scan quantifies the reads in the FASTQ files. This task is run once with raw reads as input and once with dehosted reads as input. If QC has been performed correctly, you should expect fewer dehosted (or "clean") reads than raw reads.

fastq-scan Technical Details

	Links
Task	task_fastq_scan.wdl
Software Source Code	fastq-scan on GitHub
Software Documentation	fastq-scan on GitHub

Kraken2: Read Identification

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.

Kraken2 is run on both the raw and clean reads.

Database-dependent

This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz.

Kraken2 Technical Details

	Links
Task	task_kraken2.wdl
Software Source Code	Kraken2 on GitHub
Software Documentation	Kraken2 Documentation
Original Publication(s)	Improved metagenomic analysis with Kraken 2

artic_consensus: Alignment, Primer Trimming, Variant Detection, and Consensus

This task runs the artic minion command, a pipeline with several stages that are described in detail in the ARTIC documentation. Briefly, these stages are as follows:

Input reads are aligned to the appropriate reference and only mapped reads are retained. Alignment post-processing occurs, where primers are removed and various trimming steps are undertaken. Variants are detected, and a consensus assembly file is generated.

Note that the Medaka model is set to "r941_min_high_g360" by default, which may not be suitable for your sequencing data. Change this parameter if needed.

Read trimming is already performed on raw read data generated by the ClearLabs instrument, so it is not a required step in the TheiaCoV_ClearLabs workflow.

Artic Consensus Technical Details

	Links
Task	task_artic_consensus.wdl
Software Source Code	ARTIC on GitHub
Software Documentation	ARTIC Documentation

assembly_metrics: Mapping Statistics

The assembly_metrics task generates mapping statistics from a BAM file. It uses samtools to generate a summary of the mapping statistics, which includes coverage, depth, average base quality, average mapping quality, and other relevant metrics.

This task is run twice: once on the untrimmed reads and, if primer trimming is enabled, once on the primer-trimmed reads. This allows for a comparison of mapping statistics before and after primer trimming, which can be useful for assessing the impact of primer trimming on the quality of the alignment and subsequent analyses.

assembly_metrics Technical Details

	Links
Task	task_assembly_metrics.wdl
Software Source Code	samtools on GitHub
Software Documentation	samtools
Original Publication(s)	The Sequence Alignment/Map format and SAMtools; Twelve years of SAMtools and BCFtools

For TheiaCoV_FASTA, which takes assembled genomes (FASTA files) as input, no read trimming or assembly is performed. The workflow proceeds directly to the post-assembly tasks described below.

Post-Assembly Tasks¶

These tasks are performed for all organisms after assembly (or directly after input for TheiaCoV_FASTA).

consensus_qc: Assembly Statistics

The consensus_qc task generates a summary of genomic statistics from a consensus genome. This includes the total number of bases, "N" bases, degenerate bases, and an estimate of the percent coverage to the reference genome.
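A minimal Python sketch of these counts follows. It makes two assumptions: that "degenerate" means any IUPAC ambiguity code other than N, and that percent coverage is estimated as the number of unambiguous A/C/G/T bases divided by the reference length. The exact formula used by task_consensus_qc.wdl may differ.

```python
# IUPAC ambiguity codes other than N (assumed definition of "degenerate")
DEGENERATE_BASES = set("RYSWKMBDHV")


def consensus_stats(consensus, reference_length):
    """Count base categories in a consensus sequence and estimate percent reference coverage.

    Assumes percent coverage = unambiguous A/C/G/T bases / reference length * 100.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    seq = consensus.upper()
    num_actg = sum(1 for base in seq if base in "ACGT")
    return {
        "num_total": len(seq),
        "num_n": seq.count("N"),
        "num_degenerate": sum(1 for base in seq if base in DEGENERATE_BASES),
        "num_actg": num_actg,
        "percent_reference_coverage": 100.0 * num_actg / reference_length,
    }
```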

consensus_qc Technical Details

	Links
Task	task_consensus_qc.wdl

qc_check: Check QC Metrics Against User-Defined Thresholds (optional)

To activate this task, provide a qc_check_table as input.

The qc_check task compares generated QC metrics against user-defined thresholds for each metric. If all QC metrics meet their thresholds, the qc_check output variable reads QC_PASS. Otherwise, it reads QC_ALERT followed by a string indicating which metric(s) failed, or QC_NA if the task could not proceed.

The qc_check task applies quality thresholds according to the specified organism, which should match the standardized organism input in the TheiaCoV workflows.

Formatting the qc_check_table.tsv

The first column of the qc_check_table lists the organism that the task will assess; the header of this column must be "taxon".
Each subsequent column names a QC metric and lists the threshold to check for each organism. Column names must match the expected values exactly, so we strongly recommend copying the header from the template file below as a starting point.
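The comparison logic can be sketched in Python as follows. This is an illustration under stated assumptions, not task_qc_check_phb.wdl itself. The metric names in the example table are hypothetical. Every threshold is treated as a lower bound unless the metric is listed in `max_metrics`, and the exact wording of the QC_ALERT string is also an assumption.

```python
import csv
import io


def qc_check(table_text, organism, metrics, max_metrics=frozenset()):
    """Compare one sample's QC metrics against the thresholds row for its organism.

    table_text: contents of a qc_check_table TSV whose first column is "taxon".
    metrics: mapping of metric name -> observed value for the sample.
    max_metrics: names of metrics whose threshold is an upper bound (hypothetical;
        every other threshold is treated as a lower bound).
    """
    rows = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    thresholds = next((row for row in rows if row["taxon"] == organism), None)
    if thresholds is None:
        return "QC_NA"  # no thresholds defined for this organism
    failures = []
    for name, raw_threshold in thresholds.items():
        # Skip the taxon column, empty cells, and metrics this sample did not produce
        if name == "taxon" or not raw_threshold or name not in metrics:
            continue
        value = float(metrics[name])
        threshold = float(raw_threshold)
        passed = value <= threshold if name in max_metrics else value >= threshold
        if not passed:
            failures.append(name)
    if failures:
        return "QC_ALERT: " + ", ".join(failures)
    return "QC_PASS"


# Hypothetical table: coverage is a minimum, number of Ns is a maximum
EXAMPLE_TABLE = "taxon\tpercent_reference_coverage\tnum_n\nsars-cov-2\t90\t5000\n"
```

For example, `qc_check(EXAMPLE_TABLE, "sars-cov-2", {"percent_reference_coverage": 95.2, "num_n": 1200}, {"num_n"})` passes both checks.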

Template qc_check_table.tsv files

TheiaCoV_Illumina_PE: TheiaCoV_Illumina_PE_qc_check_template.tsv

Example Purposes Only

The QC threshold values shown in the file above are for example purposes only and should not be presumed to be sufficient for every dataset.

qc_check Technical Details

	Links
Task	task_qc_check_phb.wdl

Organism-specific Characterization Tasks¶

The following tasks are organism-specific. The table below summarizes which characterization tools run for each organism.

	SARS-CoV-2	Mpox	West Nile Virus	Influenza	RSV-A	RSV-B	HIV	Measles	Mumps	Rubella
Pangolin	✅	❌	❌	❌	❌	❌	❌	❌	❌	❌
Nextclade	✅	✅	❌	✅	✅	✅	❌	✅	❌	❌
VADR	✅	✅	✅	✅	✅	✅	❌	✅	✅	✅
VADR Flu Segments	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌
Quasitools HyDRA	❌	❌	❌	❌	❌	❌	✅	❌	❌	❌
IRMA	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌
Abricate	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌
% Gene Coverage	✅	✅	➕	➕	➕	➕	➕	➕	➕	➕
Antiviral Detection	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌
GenoFLU	❌	❌	❌	✅	❌	❌	❌	❌	❌	❌

✅ This task runs automatically for these organisms
➕ This task can run for these organisms if optional parameter(s) are provided; see task description for details.
❌ This task will not run for these organisms


pangolin

Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) was developed to implement a dynamic nomenclature for designating SARS-CoV-2 lineage assignments and is used by researchers and public health agencies worldwide to track the spread and transmission of SARS-CoV-2.

Pangolin aligns input sequences against an early SARS-CoV-2 reference and generates a unique hash for the alignment. The hash is checked against a designation cache to see whether it matches any previously identified lineage. It is also checked via scorpio (Serious Constellations of Reoccurring Phylogenetically-Independent Origin) to determine whether it matches any variant of concern (VOC) constellations, which are groups of functionally meaningful mutations that can evolve independently. Following a QC check, an inference pipeline is run: either pangoLEARN or UShER (the default inference model). The final lineage report is then generated.
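The cache-then-inference flow described above can be sketched as below. This toy Python version assumes a plain dictionary as the designation cache and uses SHA-256 as the hash. Pangolin's actual hash function, cache format, QC step, and scorpio check are not reproduced, and the lineage labels in the example are hypothetical.

```python
import hashlib


def sequence_hash(aligned_sequence):
    """Hash an aligned sequence (case-insensitive); SHA-256 is an illustrative choice."""
    return hashlib.sha256(aligned_sequence.upper().encode("utf-8")).hexdigest()


def assign_lineage(aligned_sequence, designation_cache, infer):
    """Return (lineage, source): a cache hit if the hash is known, otherwise run inference.

    designation_cache: mapping hash -> lineage (hypothetical format).
    infer: callable standing in for the inference step (e.g. UShER placement).
    """
    key = sequence_hash(aligned_sequence)
    if key in designation_cache:
        return designation_cache[key], "designation_cache"
    return infer(aligned_sequence), "inference"
```

The design point is that identical sequences already designated by the lineage curators skip the more expensive inference step entirely.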

Pangolin Technical Details

	Links
Task	task_pangolin.wdl
Software Source Code	Pangolin on GitHub
Software Documentation	Pangolin on cov-lineages.org
Original Publication(s)	A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology

nextclade

Nextclade is an open-source project used to analyze viral genomes, particularly for clade assignment and mutation calling. In brief, Nextclade aligns viral genomes to a reference genome, calls variants between the two sequences, and then assigns clades based on the identified mutations.

Clade assignment is performed via phylogenetic placement. Phylogenetic placement compares the mutations of the provided sequence to the mutations of each node found in a reference tree, where the root of that tree is the reference genome. The node that is most similar to the sample is used to both assign a clade designation and calculate where the sample should be placed in the phylogenetic tree.
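A simplified Python sketch of phylogenetic placement follows. It assumes each tree node is described by its clade and the set of mutations it carries relative to the root. Nodes are scored by the size of the symmetric difference between mutation sets, whereas Nextclade's real scoring is more involved. Node names, clades, and mutations in the example are hypothetical.

```python
def place_on_tree(sample_mutations, nodes):
    """Pick the reference-tree node whose mutation set is closest to the sample's.

    nodes: mapping node name -> (clade, set of mutations relative to the root/reference).
    Distance is the size of the symmetric difference between mutation sets,
    a simplification of Nextclade's scoring. Ties go to the first node listed.
    """
    sample = set(sample_mutations)
    best = min(nodes, key=lambda name: len(sample ^ set(nodes[name][1])))
    clade, node_mutations = nodes[best]
    # Mutations in the sample that the chosen node does not explain
    private = sorted(sample - set(node_mutations))
    return best, clade, private


# Hypothetical three-node tree rooted at the reference genome
EXAMPLE_NODES = {
    "root": ("clade_0", set()),
    "node_1": ("clade_X", {"A100G"}),
    "node_2": ("clade_Y", {"A100G", "C200T"}),
}
```

The node that is closest to the sample both supplies the clade call and marks where the sample attaches to the tree. Any leftover mutations are private to the sample.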

Nextclade Technical Details

	Links
Task	task_nextclade.wdl
Software Source Code	https://github.com/nextstrain/nextclade
Software Documentation	Nextclade
Original Publication(s)	Nextclade: clade assignment, mutation calling and quality control for viral genomes.

vadr

VADR (Viral Annotation DefineR) annotates and validates completed assembly files. For details on VADR default models/parameters, see the organism-specific parameters and logic section. It was primarily developed to test viral sequences to confirm they would be accepted to NCBI's GenBank data repository, but has found wide usage in general sequence validation and annotation.

As part of the analysis of the assemblies, more than 70 types of unexpected characteristics, also known as alerts, can be reported. Any identified alerts can be found in the vadr_alerts_list output. Fatal alerts indicate that the sample is unlikely to be accepted to GenBank. Sequences with only non-fatal alerts are designated as passing but may still require further investigation. A full description of the potential alerts can be found in the VADR README, including details on how to allow sequences to pass despite having fatal alerts.

VADR Technical Details

	Links
Task	task_vadr.wdl
Software Source Code	https://github.com/ncbi/vadr
Software Documentation	https://github.com/ncbi/vadr/wiki
Original Publication(s)	For SARS-CoV-2: Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR; for other viruses: VADR: validation and annotation of virus sequence submissions to GenBank

gene_coverage

This task calculates the percent of a region (typically genes) covered above a minimum depth using samtools and basic arithmetic. By default, this task runs for SARS-CoV-2 and Mpox, but if a BED file is provided with regions of interest, this task can run for other organisms as well.
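The arithmetic can be sketched in Python. The sketch assumes per-position depths keyed by 1-based position (as `samtools depth` reports them), regions given in BED's 0-based, half-open coordinates, and a caller-supplied minimum depth. The task's actual default depth threshold is not assumed here, and the region names are hypothetical.

```python
def percent_region_coverage(depths, regions, min_depth):
    """Percent of each region's bases covered at or above min_depth.

    depths: mapping 1-based position -> read depth; absent positions count as depth 0.
    regions: iterable of (name, start, end) in BED coordinates (0-based start, exclusive end).
    """
    results = {}
    for name, start, end in regions:
        length = end - start
        if length <= 0:
            results[name] = 0.0
            continue
        # BED interval [start, end) covers 1-based positions start+1 .. end
        covered = sum(
            1 for pos in range(start + 1, end + 1) if depths.get(pos, 0) >= min_depth
        )
        results[name] = 100.0 * covered / length
    return results
```

For example, with depths `{1: 25, 2: 30, 3: 5, 4: 40}` and a minimum depth of 20, a region spanning BED coordinates 0 to 4 has three of its four bases covered.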

Gene Coverage Technical Details

	Links
Task	task_gene_coverage.wdl
Software Source Code	SAMtools on GitHub
Software Documentation	SAMTools Manual
Original Publication(s)	Twelve years of SAMtools and BCFtools

vadr_flu_segments

This task processes a full or partial influenza genome assembly in multi-FASTA format, along with the output .tar.gz file from a VADR run. It extracts each segment into its own FASTA file and also generates a concatenated FASTA containing all segments combined into a single sequence. Segment names are assigned based on the specified flu type (A or B) and the segment classification found in the VADR .sqc file.

Note: Results may be unreliable if segment lengths deviate from those expected for Influenza A or B. For best results, the input assembly should contain all 8 segments as separate contigs. If the assembly is partial, the task still extracts the available segments but may not produce a complete concatenated sequence; empty FASTA files are created for missing segments.
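A minimal Python sketch of the splitting and concatenation logic is shown below. It assumes the contig-to-segment classification has already been read from the VADR .sqc file (parsing is not shown). It also assumes segments are concatenated in Influenza A segment order (1 to 8); the order actually used by task_vadr_flu_segments.wdl is an assumption here. Contig IDs and sequences are hypothetical.

```python
# Influenza A segment order (segments 1-8); assumed concatenation order
SEGMENT_ORDER = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]


def split_flu_segments(contigs, classification, segment_order=SEGMENT_ORDER):
    """Assign contigs to segments and build a concatenated sequence.

    contigs: mapping contig ID -> sequence.
    classification: mapping contig ID -> segment name (assumed already parsed from VADR's .sqc file).
    Returns (per-segment sequences, concatenated sequence); missing segments map to "".
    """
    per_segment = {name: "" for name in segment_order}
    for contig_id, sequence in contigs.items():
        segment = classification.get(contig_id)
        # Ignore unclassified contigs and keep only the first contig seen per segment
        if segment in per_segment and not per_segment[segment]:
            per_segment[segment] = sequence
    concatenated = "".join(per_segment[name] for name in segment_order)
    return per_segment, concatenated


def to_fasta(header, sequence, width=60):
    """Format one FASTA record; a missing segment yields empty file content."""
    if not sequence:
        return ""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Writing `to_fasta(name, "")` for a missing segment mirrors the documented behavior of producing an empty FASTA file rather than omitting it.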

VADR Flu Segments Technical Details

	Links
Task	task_vadr_flu_segments.wdl

irma

Cleaned reads are assembled using IRMA (Iterative Refinement Meta-Assembler). IRMA first sorts reads to influenza genome segments using LABEL. It then iteratively maps reads to a collection of reference sequences (here, influenza references) and edits those references to account for the high population diversity and mutation rates characteristic of influenza genomes. Assemblies produced by IRMA are ordered from the largest to the smallest assembled segment. IRMA also performs typing and subtyping as part of the assembly process. Note: IRMA does not differentiate between the Flu B Victoria and Yamagata lineages; to determine the lineage, review the abricate task outputs.

IRMA Technical Details

	Links
Task	task_irma.wdl
Software Documentation	IRMA website
Original Publication(s)	Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler

abricate

ABRicate assigns types and subtype/lineages for flu samples using a version of the INSaFLU ("INSide the FLU") database described here.

ABRicate typically works by screening contigs for the presence of acquired resistance genes, but when using the INSaFLU database, the algorithm works by assigning contigs to the most closely corresponding viral segment in the INSaFLU database, which is used to call the flu type and subtype.

ABRicate Technical Details

	Links
Task	task_abricate.wdl (abricate_flu subtask)
Software Source Code	ABRicate on GitHub
Software Documentation	ABRicate on GitHub
Original Publication(s)	INSaFLU database: INSaFLU: an automated open web-based bioinformatics suite "from-reads" for influenza whole-genome-sequencing-based surveillance

flu_antiviral_substitutions

This subworkflow determines whether any antiviral resistance mutations are present in the HA, NA, and MP segments of an H1N1 or H3N2 flu sample, or in the non-subtype-specific PA, PB1, and PB2 segments.

These mutations are identified by generating a multiple sequence alignment (MSA) between each individual flu segment and the respective reference genome using MAFFT. Amino acid mutations are then called from the MSA. The resulting mutations are compared against a list of known amino-acid substitutions associated with antiviral resistance and any matches are reported.

This list of amino-acid substitutions includes both substitutions reported in the scientific literature and those inferred to potentially cause antiviral resistance based on analogous antiviral mutations in other flu subtypes. A table with the explanation for each amino-acid substitution in the antiviral resistance task is available here.

The list of known amino-acid substitutions associated with resistance can be expanded via the optional user input antiviral_aa_subs, in the format "NA:V95A,HA:I97V" (i.e., Protein:AAPositionAA, giving the reference amino acid, its position, and the substituted amino acid).
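The parsing and matching steps can be sketched in Python. The sketch assumes a pairwise protein alignment of one segment against its reference is already available. The subworkflow itself builds a nucleotide MSA with MAFFT, and translation is not shown here. Positions are numbered along the ungapped reference, and gaps and X residues in the sample are ignored. Function names are hypothetical.

```python
import re

# Protein:RefAA Position AltAA, e.g. "NA:V95A"
SUB_PATTERN = re.compile(r"^([A-Za-z0-9]+):([A-Z*])(\d+)([A-Z*])$")


def parse_aa_subs(text):
    """Parse an antiviral_aa_subs-style string into (protein, ref, position, alt) tuples."""
    subs = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        match = SUB_PATTERN.match(item)
        if not match:
            raise ValueError(f"unrecognized substitution: {item!r}")
        protein, ref, position, alt = match.groups()
        subs.add((protein.upper(), ref, int(position), alt))
    return subs


def call_aa_substitutions(protein, ref_aln, query_aln):
    """Call substitutions from equal-length aligned protein strings ('-' marks gaps)."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = set()
    ref_pos = 0
    for ref_aa, query_aa in zip(ref_aln, query_aln):
        if ref_aa == "-":
            continue  # insertion relative to the reference has no reference coordinate
        ref_pos += 1
        if query_aa not in ("-", "X") and query_aa != ref_aa:
            calls.add((protein.upper(), ref_aa, ref_pos, query_aa))
    return calls


def find_resistance(calls, known):
    """Return called substitutions that appear in the known-resistance set."""
    return sorted(calls & known)
```

The user-supplied string is parsed into the same tuple shape as the called substitutions. Matching then reduces to a set intersection.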

Currently, the default mutations considered confer resistance to the following antivirals:

A_315675
Amantadine
Compound_367
Favipiravir
Fludase
L_742_001
Laninamivir
Oseltamivir (tamiflu)
Peramivir
Pimodivir
Rimantadine
Xofluza
Zanamivir

Antiviral Substitutions Technical Details

	Links
Sub-workflow	wf_influenza_antiviral_substitutions.wdl
Tasks	task_mafft.wdl, task_flu_antiviral_subs.wdl
Original Publication(s)	MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability; Next-Generation Sequencing: An Eye-Opener for the Surveillance of Antiviral Resistance in Influenza

genoflu

This task determines the whole-genome genotype of an H5N1 flu sample (currently only for clade 2.3.4.4b) by comparing each segment of the sample against a curated database of H5N1 references. Each segment is assigned a type, and the whole-genome genotype is assigned based on the combination of segment types, according to the GenoFLU reference table.
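The final lookup step can be sketched in Python. The segment type labels, genotype name, and table entry below are hypothetical placeholders rather than real GenoFLU reference data. The "Not assigned" messages are also assumptions, not GenoFLU's actual output.

```python
SEGMENT_ORDER = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]

# Hypothetical reference table: tuple of per-segment types -> genotype name
EXAMPLE_GENOTYPES = {
    ("type1",) * 8: "genotype_X",
}


def assign_genotype(segment_types, genotype_table, segment_order=SEGMENT_ORDER):
    """Look up the whole-genome genotype from the combination of per-segment types."""
    missing = [seg for seg in segment_order if segment_types.get(seg) is None]
    if missing:
        return "Not assigned: missing " + ", ".join(missing)
    key = tuple(segment_types[seg] for seg in segment_order)
    return genotype_table.get(key, "Not assigned: no matching combination")
```

Because the genotype is keyed on the full combination, a change in the type of any single segment (for example, after reassortment) yields a different genotype or no match.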

GenoFLU Technical Details

	Links
Task	task_genoflu.wdl
Software Source Code	GenoFLU on GitHub
Software Documentation	GenoFLU on GitHub
Original Publication(s)	H5N1 highly pathogenic avian influenza clade 2.3.4.4b in wild and domestic birds: Introductions into the United States and reassortments, December 2021-April 2022

quasitools

quasitools performs genomic characterization for HIV by using the HyDRA module for identifying drug resistance mutations in HIV-1 samples based on the Stanford HIV Drug Resistance Database and the 2009 WHO list for Surveillance of Transmitted HIVDR; see also the papers linked below.

The HyDRA module in quasitools maps the sample sequence against an annotated HIV-1 reference and performs variant calling. Those variants are compared to the databases described above, and any matches are reported, along with the complete list of variants.

quasitools Technical Details

	Links
Task	task_quasitools.wdl
Software Source Code	quasitools on GitHub
Software Documentation	quasitools HyDRA README
Original Publication(s)	quasitools preprint: quasitools: A Collection of Tools for Viral Quasispecies Analysis; WHO 2009 Database: Drug resistance mutations for surveillance of transmitted HIV-1 drug-resistance: 2009 update; Stanford Database: Human immunodeficiency virus reverse transcriptase and protease sequence database

Outputs¶

TheiaCoV_Illumina_PE

Variable	Type	Description
abricate_flu_database	String	ABRicate database used for analysis
abricate_flu_results	File	File containing all results from ABRicate
abricate_flu_subtype	String	Flu subtype as determined by ABRicate
abricate_flu_type	String	Flu type as determined by ABRicate
abricate_flu_version	String	Version of ABRicate
aligned_bai	String	Index companion file to the bam file generated during the consensus assembly process
aligned_bam	String	Sorted BAM file containing the alignments of reads to the reference genome
assembly_fasta	String	Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used.
assembly_length_unambiguous	Int	Number of unambiguous basecalls within the consensus assembly
assembly_mean_coverage	String	Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command
assembly_method	String	Method employed to generate consensus assembly
auspice_json	File	Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_h5n1	File	Auspice-compatible JSON output generated from Nextclade analysis on the Influenza H5N1 whole genome that includes the samples in the "avian-flu/h5n1-cattle-outbreak" Nextstrain build (focused on the B3.13 genotype) and the single sample placed on this tree
auspice_json_flu_ha	File	Auspice-compatible JSON output generated from Nextclade analysis on the Influenza HA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_na	File	Auspice-compatible JSON output generated from Nextclade analysis on the Influenza NA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
bbduk_docker	String	The Docker image for bbduk, which was used to remove the adapters from the sequences
bwa_version	String	Version of BWA software used
consensus_flagstat	File	Output from the SAMtools flagstat command to assess quality of the alignment file (BAM)
consensus_n_variant_min_depth	Int	Minimum read depth to call variants for iVar consensus and iVar variants. Also represents the minimum consensus support threshold used by IRMA with Illumina Influenza data.
consensus_stats	File	Output from the SAMtools stats command to assess quality of the alignment file (BAM)
est_percent_gene_coverage_tsv	File	Percent coverage for each gene in the organism being analyzed (depending on the organism input)
fastp_html_report	File	The HTML report made with fastp
fastp_version	String	The version of fastp used
fastq_scan_clean1_json	File	The JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length
fastq_scan_clean2_json	File	The JSON file output from `fastq-scan` containing summary stats about clean reverse read quality and length
fastq_scan_docker	String	The Docker image of fastq_scan
fastq_scan_num_reads_clean1	Int	The number of forward reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_clean2	Int	The number of reverse reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_clean_pairs	String	The number of read pairs after cleaning as calculated by fastq_scan
fastq_scan_num_reads_raw1	Int	The number of input forward reads as calculated by fastq_scan
fastq_scan_num_reads_raw2	Int	The number of input reverse reads as calculated by fastq_scan
fastq_scan_num_reads_raw_pairs	String	The number of input read pairs as calculated by fastq_scan
fastq_scan_r1_mean_q_clean	Float	The average quality of forward reads after cleaning as calculated by fastq_scan
fastq_scan_r1_mean_q_raw	Float	The average quality of forward reads as calculated by fastq_scan
fastq_scan_r1_mean_readlength_clean	Float	The average read length of forward reads after cleaning as calculated by fastq_scan
fastq_scan_r1_mean_readlength_raw	Float	The average read length of forward reads as calculated by fastq_scan
fastq_scan_r2_mean_q_clean	Float	The average quality of reverse reads after cleaning as calculated by fastq_scan
fastq_scan_r2_mean_q_raw	Float	The average quality of reverse reads as calculated by fastq_scan
fastq_scan_r2_mean_readlength_clean	Float	The average read length of reverse reads after cleaning as calculated by fastq_scan
fastq_scan_r2_mean_readlength_raw	Float	The average read length of reverse reads as calculated by fastq_scan
fastq_scan_raw1_json	File	The JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length
fastq_scan_raw2_json	File	The JSON file output from `fastq-scan` containing summary stats about raw reverse read quality and length
fastq_scan_version	String	The version of fastq_scan
fastqc_clean1_html	File	An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser
fastqc_clean2_html	File	An HTML file that provides a graphical visualization of clean reverse read quality from fastqc to open in an internet browser
fastqc_docker	String	The Docker container used for fastqc
fastqc_num_reads_clean1	Int	The number of forward reads after cleaning, as calculated by fastqc
fastqc_num_reads_clean2	Int	The number of reverse reads after cleaning, as calculated by fastqc
fastqc_num_reads_clean_pairs	String	The number of read pairs after cleaning, as calculated by fastqc
fastqc_num_reads_raw1	Int	The number of input forward reads before cleaning, as calculated by fastqc
fastqc_num_reads_raw2	Int	The number of input reverse reads before cleaning, as calculated by fastqc
fastqc_num_reads_raw_pairs	String	The number of input read pairs before cleaning, as calculated by fastqc
fastqc_raw1_html	File	An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser
fastqc_raw2_html	File	An HTML file that provides a graphical visualization of raw reverse read quality from fastqc to open in an internet browser
fastqc_version	String	Version of fastqc software used
flu_A_315675_resistance	String	Resistance mutations to A_315675
flu_L_742_001_resistance	String	Resistance mutations to L_742_001
flu_amantadine_resistance	String	Resistance mutations to amantadine
flu_compound_367_resistance	String	Resistance mutations to compound_367
flu_favipiravir_resistance	String	Resistance mutations to favipiravir
flu_fludase_resistance	String	Resistance mutations to fludase
flu_laninamivir_resistance	String	Resistance mutations to laninamivir
flu_oseltamivir_resistance	String	Resistance mutations to oseltamivir (Tamiflu®)
flu_peramivir_resistance	String	Resistance mutations to peramivir (Rapivab®)
flu_pimodivir_resistance	String	Resistance mutations to pimodivir
flu_rimantadine_resistance	String	Resistance mutations to rimantadine
flu_xofluza_resistance	String	Resistance mutations to Xofluza® (baloxavir marboxil)
flu_zanamivir_resistance	String	Resistance mutations to zanamivir (Relenza®)
genoflu_all_segments	String	The genotypes for each individual flu segment
genoflu_genotype	String	The genotype of the whole genome, based on the individual segment types
genoflu_output_tsv	File	The output file from GenoFLU
genoflu_version	String	The version of GenoFLU used
irma_all_deletions_tsv	File	Concatenated TSV file of all deletions identified by IRMA
irma_all_insertions_tsv	File	Concatenated TSV file of all insertions identified by IRMA
irma_all_snvs_tsv	File	Concatenated TSV file of all SNVs identified by IRMA
irma_assembly_fasta_concatenated	File	Assembly FASTA file of all Influenza genome segments concatenated into one sequence/FASTA entry
irma_bams	Array[File]	Aligned reads from IRMA
irma_docker	String	Docker image used to run IRMA
irma_ha_segment_fasta	File	HA (Haemagglutinin) assembly fasta file
irma_mp_segment_fasta	File	MP (Matrix Protein) assembly fasta file
irma_na_segment_fasta	File	NA (Neuraminidase) assembly fasta file
irma_np_segment_fasta	File	NP (Nucleoprotein) assembly fasta file
irma_ns_segment_fasta	File	NS (Nonstructural) assembly fasta file
irma_pa_segment_fasta	File	PA (Polymerase acidic) assembly fasta file
irma_pb1_segment_fasta	File	PB1 (Polymerase basic 1) assembly fasta file
irma_pb2_segment_fasta	File	PB2 (Polymerase basic 2) assembly fasta file
irma_qc_summary_tsv	File	TSV file summarizing IRMA quality control metrics
irma_subtype	String	Flu subtype as determined by IRMA
irma_subtype_notes	String	Helpful note to user about Flu B subtypes. Output will be blank for Flu A samples. For Flu B samples it will state: "IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column"
irma_type	String	Flu type as determined by IRMA
irma_version	String	Version of IRMA used
ivar_tsv	File	Variant descriptor file generated by iVar variants
ivar_variant_proportion_intermediate	String	The proportion of variants of intermediate frequency
ivar_variant_version	String	Version of iVar for running the iVar variants command
ivar_vcf	File	iVar tsv output converted to VCF format
ivar_version_consensus	String	Version of iVar for running the iVar consensus command
ivar_version_primtrim	String	Version of iVar for running the iVar trim command
kraken_human	Float	Percent of human read data detected using the Kraken2 software
kraken_human_dehosted	Float	Percent of human read data detected using the Kraken2 software after host removal
kraken_report	File	Full Kraken report
kraken_report_dehosted	File	Full Kraken report after host removal
kraken_sc2	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_target_organism	String	Percent of target organism read data detected using the Kraken2 software
kraken_target_organism_dehosted	String	Percent of target organism read data detected using the Kraken2 software after host removal
kraken_target_organism_name	String	The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus"
kraken_version	String	Version of Kraken software used
meanbaseq_trim	String	Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming
meanmapq_trim	String	Mean quality of the mapped reads to the reference genome after primer trimming
nextclade_aa_dels	String	Amino-acid deletions as detected by Nextclade. Will be blank for Flu
nextclade_aa_dels_flu_h5n1	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for H5N1 whole genome
nextclade_aa_dels_flu_ha	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for HA segment
nextclade_aa_dels_flu_na	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for NA segment
nextclade_aa_subs	String	Amino-acid substitutions as detected by Nextclade. Will be blank for Flu
nextclade_aa_subs_flu_h5n1	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for H5N1 whole genome
nextclade_aa_subs_flu_ha	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for HA segment
nextclade_aa_subs_flu_na	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for NA segment
nextclade_clade	String	Nextclade clade designation, will be blank for Flu.
nextclade_clade_flu_h5n1	String	Nextclade clade designation, specific to Flu H5N1 whole genome. NOTE: Output will be blank or `NA` since this Nextclade dataset does not assign clades
nextclade_clade_flu_ha	String	Nextclade clade designation, specific to Flu HA segment
nextclade_clade_flu_na	String	Nextclade clade designation, specific to Flu NA segment
nextclade_docker	String	Docker image used to run Nextclade
nextclade_ds_tag	String	Dataset tag used to run Nextclade. Will be blank for Flu
nextclade_ds_tag_flu_ha	String	Dataset tag used to run Nextclade, specific to Flu HA segment
nextclade_ds_tag_flu_na	String	Dataset tag used to run Nextclade, specific to Flu NA segment
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_json_flu_h5n1	File	Nextclade output in JSON file format, specific to Flu H5N1 whole genome
nextclade_json_flu_ha	File	Nextclade output in JSON file format, specific to Flu HA segment
nextclade_json_flu_na	File	Nextclade output in JSON file format, specific to Flu NA segment
nextclade_lineage	String	Nextclade lineage designation
nextclade_qc	String	QC metric as determined by Nextclade. Will be blank for Flu
nextclade_qc_flu_h5n1	String	QC metric as determined by Nextclade, specific to Flu H5N1 whole genome
nextclade_qc_flu_ha	String	QC metric as determined by Nextclade, specific to Flu HA segment
nextclade_qc_flu_na	String	QC metric as determined by Nextclade, specific to Flu NA segment
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
nextclade_tsv_flu_h5n1	File	Nextclade output in TSV file format, specific to Flu H5N1 whole genome
nextclade_tsv_flu_ha	File	Nextclade output in TSV file format, specific to Flu HA segment
nextclade_tsv_flu_na	File	Nextclade output in TSV file format, specific to Flu NA segment
nextclade_version	String	The version of Nextclade software used
number_Degenerate	Int	Number of degenerate basecalls within the consensus assembly
number_N	Int	Number of fully ambiguous basecalls within the consensus assembly
number_Total	Int	Total number of nucleotides within the consensus assembly
pango_lineage	String	Pango lineage as determined by Pangolin
pango_lineage_expanded	String	Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1"
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
pangolin_assignment_version	String	The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment
pangolin_conflicts	String	Number of lineage conflicts as determined by Pangolin
pangolin_docker	String	Docker image used to run Pangolin
pangolin_notes	String	Lineage notes as determined by Pangolin
pangolin_versions	String	All Pangolin software and database versions
percent_reference_coverage	Float	Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100
percentage_mapped_reads	String	Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100.
primer_bed_name	String	Name of the primer bed files used for primer trimming
primer_trimmed_read_percent	Float	Percentage of read data with primers trimmed as determined by iVar trim
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds
qc_standard	File	The file used in the QC Check task containing the QC thresholds.
quasitools_coverage_file	File	The coverage report created by Quasitools HyDRA
quasitools_date	String	Date of Quasitools analysis
quasitools_dr_report	File	Drug resistance report created by Quasitools HyDRA
quasitools_hydra_vcf	File	The VCF created by Quasitools HyDRA
quasitools_mutations_report	File	The mutation report created by Quasitools HyDRA
quasitools_version	String	Version of Quasitools used
read1_aligned	File	Forward read file of only aligned reads
read1_clean	File	Forward read file after quality trimming and adapter removal
read1_dehosted	File	The dehosted forward reads file; suggested read file for SRA submission
read1_unaligned	File	Forward read file of unaligned reads
read2_aligned	File	Reverse read file of only aligned reads
read2_clean	File	Reverse read file after quality trimming and adapter removal
read2_dehosted	File	The dehosted reverse reads file; suggested read file for SRA submission
read2_unaligned	File	Reverse read file of unaligned reads
read_screen_clean	String	PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure
read_screen_clean_tsv	File	Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length
read_screen_raw	String	PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure
read_screen_raw_tsv	File	Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length
samtools_version	String	The version of SAMtools used to sort and index the alignment file
samtools_version_consensus	String	The version of SAMtools used to create the pileup before running iVar consensus
samtools_version_primtrim	String	The version of SAMtools used to create the pileup before running iVar trim
samtools_version_stats	String	The version of SAMtools used to assess the quality of read mapping
sc2_s_gene_mean_coverage	Float	Mean read depth for the S gene in SARS-CoV-2
sc2_s_gene_percent_coverage	Float	Percent coverage of the S gene in SARS-CoV-2
seq_platform	String	Description of the sequencing methodology used to generate the input read data
sorted_bam_unaligned	File	A BAM file that only contains reads that did not align to the reference
sorted_bam_unaligned_bai	File	Index companion file to a BAM file that only contains reads that did not align to the reference
theiacov_illumina_pe_analysis_date	String	Date of analysis
theiacov_illumina_pe_version	String	Version of PHB used for running the workflow
trimmomatic_docker	String	The docker image used for the trimmomatic module in this workflow
trimmomatic_version	String	The version of Trimmomatic used
vadr_alerts_list	File	A file containing all of the fatal alerts as determined by VADR
vadr_all_outputs_tar_gz	File	A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations.
vadr_classification_summary_file	File	Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description.
vadr_docker	String	Docker image used to run VADR
vadr_fastas_zip_archive	File	Zip archive containing all fasta files created during VADR analysis
vadr_feature_tbl_fail	File	5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description.
vadr_feature_tbl_pass	File	5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description.
vadr_num_alerts	String	Number of fatal alerts as determined by VADR
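
The percent_reference_coverage and percentage_mapped_reads descriptions above both reduce to simple ratios. The Python sketch below restates them as functions so the arithmetic can be checked by hand. The function names are hypothetical, the 29903 default is the SARS-CoV-2 reference length quoted in the table, and the sample numbers are made up for illustration; this is not the workflow's own code.

```python
def percent_reference_coverage(assembly_length_unambiguous: int,
                               reference_length: int = 29903) -> float:
    """Percent of the reference genome covered by unambiguous consensus bases.

    Hypothetical helper restating the table formula:
    assembly_length_unambiguous / reference length x 100.
    The default length is the SARS-CoV-2 value quoted in the table.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return assembly_length_unambiguous / reference_length * 100


def percentage_mapped_reads(mapped_reads: int, total_reads: int) -> float:
    """Percent of all reads that aligned to the reference genome.

    Hypothetical helper restating: mapped reads / total reads x 100.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads * 100


if __name__ == "__main__":
    # Made-up SARS-CoV-2 sample values, for illustration only
    print(round(percent_reference_coverage(29000), 2))            # 96.98
    print(round(percentage_mapped_reads(950_000, 1_000_000), 2))  # 95.0
```

For a non-SARS-CoV-2 organism, pass the length of the reference genome actually used instead of relying on the default.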

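The kraken_report output is a Kraken2 report, so the percentages behind outputs such as kraken_human and kraken_target_organism can also be looked up by hand. The Python sketch below assumes the usual tab-separated Kraken2 report layout (clade percentage in the first column, scientific name indented with spaces in the last column). The function name and the toy report are hypothetical, and this is not necessarily how the workflow itself extracts these values.

```python
from typing import Optional

# Toy Kraken2-style report with made-up counts, for illustration only.
# Columns: clade %, clade reads, direct reads, rank code, NCBI taxid, indented name.
TOY_REPORT = (
    " 80.00\t800\t800\tU\t0\tunclassified\n"
    " 20.00\t200\t0\tR\t1\troot\n"
    " 15.00\t150\t150\tS\t9606\t              Homo sapiens\n"
    "  5.00\t50\t50\tS\t2697049\t                Severe acute respiratory syndrome coronavirus 2\n"
)


def clade_percent(report_text: str, taxon_name: str) -> Optional[float]:
    """Return the clade percentage for an exact scientific name, or None if absent."""
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # The name is the last column; leading spaces encode taxonomic depth
        if fields[-1].strip() == taxon_name:
            return float(fields[0])  # float() tolerates the padding spaces
    return None


if __name__ == "__main__":
    print(clade_percent(TOY_REPORT, "Homo sapiens"))  # 15.0
```

Reading the name from the last column keeps the lookup working if the report carries extra columns before the name.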
Variable	Type	Description
aligned_bai	File	Index companion file to the bam file generated during the consensus assembly process
aligned_bam	File	Sorted BAM file containing the alignments of reads to the reference genome
assembly_fasta	File	Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used.
assembly_length_unambiguous	Int	Number of unambiguous basecalls within the consensus assembly
assembly_mean_coverage	Float	Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command
assembly_method	String	Method employed to generate consensus assembly
auspice_json	File	Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
bbduk_docker	String	The Docker image for bbduk, which was used to remove the adapters from the sequences
bwa_version	String	Version of BWA software used
consensus_flagstat	File	Output from the SAMtools flagstat command to assess quality of the alignment file (BAM)
consensus_n_variant_min_depth	Int	Minimum read depth to call variants for iVar consensus and iVar variants. Also represents the minimum consensus support threshold used by IRMA with Illumina Influenza data.
consensus_stats	File	Output from the SAMtools stats command to assess quality of the alignment file (BAM)
est_percent_gene_coverage_tsv	File	Percent coverage for each gene in the organism being analyzed (depending on the organism input)
fastp_html_report	File	The HTML report made with fastp
fastp_version	String	The version of fastp used
fastq_scan_clean1_json	File	The JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length
fastq_scan_docker	String	The Docker image of fastq_scan
fastq_scan_num_reads_clean1	Int	The number of forward reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_raw1	Int	The number of input forward reads as calculated by fastq_scan
fastq_scan_r1_mean_q_clean	Float	The average quality of forward reads after cleaning as calculated by fastq_scan
fastq_scan_r1_mean_q_raw	Float	The average quality of forward reads as calculated by fastq_scan
fastq_scan_r1_mean_readlength_clean	Float	The average read length of forward reads after cleaning as calculated by fastq_scan
fastq_scan_r1_mean_readlength_raw	Float	The average read length of forward reads as calculated by fastq_scan
fastq_scan_raw1_json	File	The JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length
fastq_scan_version	String	The version of fastq_scan
fastqc_clean1_html	File	An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser
fastqc_docker	String	The Docker container used for fastqc
fastqc_num_reads_clean1	Int	The number of forward reads after cleaning, as calculated by fastqc
fastqc_num_reads_raw1	Int	The number of input forward reads before cleaning, as calculated by fastqc
fastqc_raw1_html	File	An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser
fastqc_version	String	Version of fastqc software used
ivar_tsv	File	Variant descriptor file generated by iVar variants
ivar_variant_proportion_intermediate	String	The proportion of variants of intermediate frequency
ivar_variant_version	String	Version of iVar for running the iVar variants command
ivar_vcf	File	iVar tsv output converted to VCF format
ivar_version_consensus	String	Version of iVar for running the iVar consensus command
ivar_version_primtrim	String	Version of iVar for running the iVar trim command
kraken_human	Float	Percent of human read data detected using the Kraken2 software
kraken_human_dehosted	Float	Percent of human read data detected using the Kraken2 software after host removal
kraken_report	File	Full Kraken report
kraken_report_dehosted	File	Full Kraken report after host removal
kraken_sc2	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_target_organism	String	Percent of target organism read data detected using the Kraken2 software
kraken_target_organism_dehosted	String	Percent of target organism read data detected using the Kraken2 software after host removal
kraken_target_organism_name	String	The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus"
kraken_version	String	Version of Kraken software used
meanbaseq_trim	Float	Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming
meanmapq_trim	Float	Mean quality of the mapped reads to the reference genome after primer trimming
nextclade_aa_dels	String	Amino-acid deletions as detected by Nextclade. Will be blank for Flu
nextclade_aa_subs	String	Amino-acid substitutions as detected by Nextclade. Will be blank for Flu
nextclade_clade	String	Nextclade clade designation, will be blank for Flu.
nextclade_docker	String	Docker image used to run Nextclade
nextclade_ds_tag	String	Dataset tag used to run Nextclade. Will be blank for Flu
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_lineage	String	Nextclade lineage designation
nextclade_qc	String	QC metric as determined by Nextclade. Will be blank for Flu
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
nextclade_version	String	The version of Nextclade software used
number_Degenerate	Int	Number of degenerate basecalls within the consensus assembly
number_N	Int	Number of fully ambiguous basecalls within the consensus assembly
number_Total	Int	Total number of nucleotides within the consensus assembly
pango_lineage	String	Pango lineage as determined by Pangolin
pango_lineage_expanded	String	Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1"
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
pangolin_assignment_version	String	The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment
pangolin_conflicts	String	Number of lineage conflicts as determined by Pangolin
pangolin_docker	String	Docker image used to run Pangolin
pangolin_notes	String	Lineage notes as determined by Pangolin
pangolin_versions	String	All Pangolin software and database versions
percent_reference_coverage	Float	Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100
percentage_mapped_reads	String	Percentage of reads that successfully aligned to the reference genome. This value is calculated by number of mapped reads / total number of reads x 100.
primer_bed_name	String	Name of the primer bed files used for primer trimming
primer_trimmed_read_percent	Float	Percentage of read data with primers trimmed as determined by iVar trim
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds
qc_standard	File	The file used in the QC Check task containing the QC thresholds.
read1_aligned	File	Forward read file of only aligned reads
read1_clean	File	Forward read file after quality trimming and adapter removal
read1_unaligned	File	Forward read file of unaligned reads
read_screen_clean	String	PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure
read_screen_clean_tsv	File	Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length
read_screen_raw	String	PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure
read_screen_raw_tsv	File	Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length
samtools_version	String	The version of SAMtools used to sort and index the alignment file
samtools_version_consensus	String	The version of SAMtools used to create the pileup before running iVar consensus
samtools_version_primtrim	String	The version of SAMtools used to create the pileup before running iVar trim
samtools_version_stats	String	The version of SAMtools used to assess the quality of read mapping
sc2_s_gene_mean_coverage	Float	Mean read depth for the S gene in SARS-CoV-2
sc2_s_gene_percent_coverage	Float	Percent coverage of the S gene in SARS-CoV-2
seq_platform	String	Description of the sequencing methodology used to generate the input read data
sorted_bam_unaligned	File	A BAM file that only contains reads that did not align to the reference
sorted_bam_unaligned_bai	File	Index companion file to a BAM file that only contains reads that did not align to the reference
theiacov_illumina_se_analysis_date	String	Date of analysis
theiacov_illumina_se_version	String	Version of PHB used for running the workflow
trimmomatic_docker	String	The docker image used for the trimmomatic module in this workflow
trimmomatic_version	String	The version of Trimmomatic used
vadr_alerts_list	File	A file containing all of the fatal alerts as determined by VADR
vadr_all_outputs_tar_gz	File	A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations.
vadr_classification_summary_file	File	Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description.
vadr_docker	String	Docker image used to run VADR
vadr_fastas_zip_archive	File	Zip archive containing all fasta files created during VADR analysis
vadr_feature_tbl_fail	File	5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description.
vadr_feature_tbl_pass	File	5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description.
vadr_num_alerts	String	Number of fatal alerts as determined by VADR
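
The base-count outputs (number_N, number_Degenerate, number_Total, and assembly_length_unambiguous) can be approximated directly from a consensus FASTA. The Python sketch below assumes that unambiguous bases are A, C, G, and T, that N is counted separately, that any other letter counts as degenerate, and that non-letter characters such as gaps are ignored. These definitions are inferred from the table descriptions rather than taken from the workflow's source, and the function name and toy record are hypothetical.

```python
def count_consensus_bases(fasta_text: str) -> dict:
    """Count base categories in a single- or multi-record FASTA string."""
    # Join sequence lines, skipping '>' header lines (blank lines add nothing)
    seq = "".join(
        line.strip().upper()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )
    letters = [ch for ch in seq if "A" <= ch <= "Z"]  # ignore '-', '*', etc.
    unambiguous = sum(1 for ch in letters if ch in "ACGT")
    n_count = letters.count("N")
    return {
        "assembly_length_unambiguous": unambiguous,
        "number_N": n_count,
        # Assumption: every letter other than A/C/G/T/N is a degenerate call
        "number_Degenerate": len(letters) - unambiguous - n_count,
        "number_Total": len(letters),
    }


if __name__ == "__main__":
    toy_fasta = ">toy_sample\nACGTNNRY\nACG-T\n"  # hypothetical record
    print(count_consensus_bases(toy_fasta))
```

Dividing the resulting assembly_length_unambiguous by the reference length and multiplying by 100 gives percent_reference_coverage, per the formula in the table.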

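The vadr_all_outputs_tar_gz archive must be uncompressed and extracted before its files can be inspected. One way to do that with Python's standard-library `tarfile` module is sketched below; the function name, archive filename, and destination directory are hypothetical. The sketch refuses members whose paths would land outside the destination and, on Python versions that provide extraction filters, also applies the `"data"` filter.

```python
import tarfile
from pathlib import Path


def extract_vadr_outputs(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a gzip-compressed tar archive and return the sorted file paths it contained."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Guard against path traversal (e.g. "../x" or absolute member names)
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Versions with extraction filters: also reject unsafe links and modes
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(m.name for m in members if m.isfile())
```

Typical use would be `extract_vadr_outputs("sample.vadr.tar.gz", "vadr_outputs")` with your own downloaded file; on the command line, `tar -xzf` on the archive does the same extraction.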
Variable	Type	Description
abricate_flu_database	String	ABRicate database used for analysis
abricate_flu_results	File	File containing all results from ABRicate
abricate_flu_subtype	String	Flu subtype as determined by ABRicate
abricate_flu_type	String	Flu type as determined by ABRicate
abricate_flu_version	String	Version of ABRicate
aligned_bai	File	Index companion file to the bam file generated during the consensus assembly process
aligned_bam	File	Sorted BAM file containing the alignments of reads to the reference genome
artic_docker	String	Docker image utilized for read trimming and consensus genome assembly
artic_version	String	Version of the ARTIC software used for read trimming and consensus genome assembly
assembly_fasta	File	Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used.
assembly_length_unambiguous	Int	Number of unambiguous basecalls within the consensus assembly
assembly_mean_coverage	String	Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command
assembly_method	String	Method employed to generate consensus assembly
auspice_json	File	Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_h5n1	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza H5N1 whole genome; includes the samples from the "avian-flu/h5n1-cattle-outbreak" Nextstrain build (which focuses on the B3.13 genotype) and the single sample placed on this tree
auspice_json_flu_ha	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza HA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_na	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza NA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
consensus_flagstat	File	Output from the SAMtools flagstat command to assess quality of the alignment file (BAM)
consensus_stats	File	Output from the SAMtools stats command to assess quality of the alignment file (BAM)
est_coverage_clean	Float	Estimated coverage calculated from clean reads and genome length
est_coverage_raw	Float	Estimated coverage calculated from raw reads and genome length
est_percent_gene_coverage_tsv	File	Percent coverage for each gene in the organism being analyzed (depending on the organism input)
flu_A_315675_resistance	String	Resistance mutations to A_315675
flu_L_742_001_resistance	String	Resistance mutations to L_742_001
flu_amantadine_resistance	String	Resistance mutations to amantadine
flu_compound_367_resistance	String	Resistance mutations to compound_367
flu_favipiravir_resistance	String	Resistance mutations to favipiravir
flu_fludase_resistance	String	Resistance mutations to fludase
flu_laninamivir_resistance	String	Resistance mutations to laninamivir
flu_oseltamivir_resistance	String	Resistance mutations to oseltamivir (Tamiflu®)
flu_peramivir_resistance	String	Resistance mutations to peramivir (Rapivab®)
flu_pimodivir_resistance	String	Resistance mutations to pimodivir
flu_rimantadine_resistance	String	Resistance mutations to rimantadine
flu_xofluza_resistance	String	Resistance mutations to Xofluza® (baloxavir marboxil)
flu_zanamivir_resistance	String	Resistance mutations to zanamivir (Relenza®)
genoflu_all_segments	String	The genotypes for each individual flu segment
genoflu_genotype	String	The genotype of the whole genome, based on the individual segment types
genoflu_output_tsv	File	The output file from GenoFLU
genoflu_version	String	The version of GenoFLU used
irma_all_deletions_tsv	File	Concatenated TSV file of all deletions identified by IRMA
irma_all_insertions_tsv	File	Concatenated TSV file of all insertions identified by IRMA
irma_all_snvs_tsv	File	Concatenated TSV file of all SNVs identified by IRMA
irma_assembly_fasta_concatenated	File	Assembly FASTA file of all Influenza genome segments concatenated into one sequence/FASTA entry
irma_bams	Array[File]	Aligned reads from IRMA
irma_docker	String	Docker image used to run IRMA
irma_ha_segment_fasta	File	HA (Haemagglutinin) assembly fasta file
irma_min_consensus_support_threshold	Int	Minimum consensus support threshold used by IRMA with ONT data. For Illumina data, see the output called `consensus_n_variant_min_depth` for this value
irma_mp_segment_fasta	File	MP (Matrix Protein) assembly fasta file
irma_na_segment_fasta	File	NA (Neuraminidase) assembly fasta file
irma_np_segment_fasta	File	NP (Nucleoprotein) assembly fasta file
irma_ns_segment_fasta	File	NS (Nonstructural) assembly fasta file
irma_pa_segment_fasta	File	PA (Polymerase acidic) assembly fasta file
irma_pb1_segment_fasta	File	PB1 (Polymerase basic 1) assembly fasta file
irma_pb2_segment_fasta	File	PB2 (Polymerase basic 2) assembly fasta file
irma_qc_summary_tsv	File	TSV file summarizing IRMA quality control metrics
irma_subtype	String	Flu subtype as determined by IRMA
irma_subtype_notes	String	Helpful note to user about Flu B subtypes. Output will be blank for Flu A samples. For Flu B samples it will state: "IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column"
irma_type	String	Flu type as determined by IRMA
irma_version	String	Version of IRMA used
kraken_human	Float	Percent of human read data detected using the Kraken2 software
kraken_human_dehosted	Float	Percent of human read data detected using the Kraken2 software after host removal
kraken_report	File	Full Kraken report
kraken_report_dehosted	File	Full Kraken report after host removal
kraken_sc2	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_target_organism	String	Percent of target organism read data detected using the Kraken2 software
kraken_target_organism_dehosted	String	Percent of target organism read data detected using the Kraken2 software after host removal
kraken_target_organism_name	String	The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus"
kraken_version	String	Version of Kraken software used
meanbaseq_trim	Float	Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming
meanmapq_trim	Float	Mean quality of the mapped reads to the reference genome after primer trimming
medaka_reference	String	Reference sequence used in medaka task
medaka_vcf	File	A VCF file containing the identified variants
nanoplot_docker	String	Docker image for nanoplot
nanoplot_html_clean	File	An HTML report describing the clean reads
nanoplot_html_raw	File	An HTML report describing the raw reads
nanoplot_num_reads_clean1	Int	Number of clean reads
nanoplot_num_reads_raw1	Int	Number of raw reads
nanoplot_r1_est_coverage_clean	Float	Estimated coverage on the clean reads by nanoplot
nanoplot_r1_est_coverage_raw	Float	Estimated coverage on the raw reads by nanoplot
nanoplot_r1_mean_q_clean	Float	Mean quality score of clean forward reads
nanoplot_r1_mean_q_raw	Float	Mean quality score of raw forward reads
nanoplot_r1_mean_readlength_clean	Float	Mean read length of clean forward reads
nanoplot_r1_mean_readlength_raw	Float	Mean read length of raw forward reads
nanoplot_r1_median_q_clean	Float	Median quality score of clean forward reads
nanoplot_r1_median_q_raw	Float	Median quality score of raw forward reads
nanoplot_r1_median_readlength_clean	Float	Median read length of clean forward reads
nanoplot_r1_median_readlength_raw	Float	Median read length of raw forward reads
nanoplot_r1_n50_clean	Float	N50 of clean forward reads
nanoplot_r1_n50_raw	Float	N50 of raw forward reads
nanoplot_r1_stdev_readlength_clean	Float	Standard deviation of the read length of clean forward reads
nanoplot_r1_stdev_readlength_raw	Float	Standard deviation of the read length of raw forward reads
nanoplot_tsv_clean	File	A TSV report describing the clean reads
nanoplot_tsv_raw	File	A TSV report describing the raw reads
nanoplot_version	String	Version of nanoplot used for analysis
nextclade_aa_dels	String	Amino-acid deletions as detected by Nextclade. Will be blank for Flu
nextclade_aa_dels_flu_h5n1	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for H5N1 whole genome
nextclade_aa_dels_flu_ha	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for HA segment
nextclade_aa_dels_flu_na	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for NA segment
nextclade_aa_subs	String	Amino-acid substitutions as detected by Nextclade. Will be blank for Flu
nextclade_aa_subs_flu_h5n1	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for H5N1 whole genome
nextclade_aa_subs_flu_ha	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for HA segment
nextclade_aa_subs_flu_na	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for NA segment
nextclade_clade	String	Nextclade clade designation, will be blank for Flu.
nextclade_clade_flu_h5n1	String	Nextclade clade designation, specific to Flu H5N1 whole genome. NOTE: Output will be blank or `NA` since this Nextclade dataset does not assign clades
nextclade_clade_flu_ha	String	Nextclade clade designation, specific to Flu HA segment
nextclade_clade_flu_na	String	Nextclade clade designation, specific to Flu NA segment
nextclade_docker	String	Docker image used to run Nextclade
nextclade_ds_tag	String	Dataset tag used to run Nextclade. Will be blank for Flu
nextclade_ds_tag_flu_ha	String	Dataset tag used to run Nextclade, specific to Flu HA segment
nextclade_ds_tag_flu_na	String	Dataset tag used to run Nextclade, specific to Flu NA segment
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_json_flu_h5n1	File	Nextclade output in JSON file format, specific to Flu H5N1 whole genome
nextclade_json_flu_ha	File	Nextclade output in JSON file format, specific to Flu HA segment
nextclade_json_flu_na	File	Nextclade output in JSON file format, specific to Flu NA segment
nextclade_lineage	String	Nextclade lineage designation
nextclade_qc	String	QC metric as determined by Nextclade. Will be blank for Flu
nextclade_qc_flu_h5n1	String	QC metric as determined by Nextclade, specific to Flu H5N1 whole genome
nextclade_qc_flu_ha	String	QC metric as determined by Nextclade, specific to Flu HA segment
nextclade_qc_flu_na	String	QC metric as determined by Nextclade, specific to Flu NA segment
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
nextclade_tsv_flu_h5n1	File	Nextclade output in TSV file format, specific to Flu H5N1 whole genome
nextclade_tsv_flu_ha	File	Nextclade output in TSV file format, specific to Flu HA segment
nextclade_tsv_flu_na	File	Nextclade output in TSV file format, specific to Flu NA segment
nextclade_version	String	The version of Nextclade software used
number_Degenerate	Int	Number of degenerate basecalls within the consensus assembly
number_N	Int	Number of fully ambiguous basecalls within the consensus assembly
number_Total	Int	Total number of nucleotides within the consensus assembly
pango_lineage	String	Pango lineage as determined by Pangolin
pango_lineage_expanded	String	Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1"
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
pangolin_assignment_version	String	The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment
pangolin_conflicts	String	Number of lineage conflicts as determined by Pangolin
pangolin_docker	String	Docker image used to run Pangolin
pangolin_notes	String	Lineage notes as determined by Pangolin
pangolin_versions	String	All Pangolin software and database versions
percent_reference_coverage	Float	Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100
percentage_mapped_reads	String	Percentage of reads that successfully aligned to the reference genome. This value is calculated as the number of mapped reads / total number of reads x 100.
primer_bed_name	String	Name of the primer bed files used for primer trimming
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds
qc_standard	File	The file used in the QC Check task containing the QC thresholds.
quasitools_coverage_file	File	The coverage report created by Quasitools HyDRA
quasitools_date	String	Date of Quasitools analysis
quasitools_dr_report	File	Drug resistance report created by Quasitools HyDRA
quasitools_hydra_vcf	File	The VCF created by Quasitools HyDRA
quasitools_mutations_report	File	The mutation report created by Quasitools HyDRA
quasitools_version	String	Version of Quasitools used
read1_aligned	File	Forward read file of only aligned reads
read1_dehosted	File	The dehosted forward reads file; suggested read file for SRA submission
read1_trimmed	File	Forward read file after quality trimming and adapter removal
read_screen_clean	String	PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure
read_screen_clean_tsv	File	Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length
read_screen_raw	String	PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure
read_screen_raw_tsv	File	Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length
samtools_version	String	The version of SAMtools used to sort and index the alignment file
sc2_s_gene_mean_coverage	Float	Mean read depth for the S gene in SARS-CoV-2
sc2_s_gene_percent_coverage	Float	Percent coverage of the S gene in SARS-CoV-2
seq_platform	String	Description of the sequencing methodology used to generate the input read data
theiacov_ont_analysis_date	String	Date of analysis
theiacov_ont_version	String	Version of PHB used for running the workflow
vadr_alerts_list	File	A file containing all of the fatal alerts as determined by VADR
vadr_all_outputs_tar_gz	File	A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be decompressed and extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description of all files present within the archive. Useful when investigating a sample's genome and annotations in depth.
vadr_classification_summary_file	File	Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for a more complete description.
vadr_docker	String	Docker image used to run VADR
vadr_fastas_zip_archive	File	Zip archive containing all fasta files created during VADR analysis
vadr_feature_tbl_fail	File	5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_feature_tbl_pass	File	5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_num_alerts	String	Number of fatal alerts as determined by VADR
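
The `nanoplot_r1_n50_*` outputs report the read-length N50. As a minimal sketch of the standard definition (not NanoPlot's own code, which may round or report the value differently), N50 is the largest length L such that reads of length L or longer together contain at least half of all sequenced bases:

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the largest length L such that reads of length >= L
    account for at least half of the total number of bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("read_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        # Walk from the longest read down, accumulating bases.
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-negative lengths


if __name__ == "__main__":
    # Hypothetical read lengths: total = 100 bases, half = 50.
    # Longest first: 40 (cumulative 40), then 30 (cumulative 70 >= 50).
    print(n50([10, 30, 40, 5, 15]))  # prints 30
```

Sorting longest-first and stopping at the halfway point is the textbook formulation; it makes N50 weight long reads more heavily than the mean or median read length reported alongside it.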

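`vadr_all_outputs_tar_gz` must be decompressed and extracted before the individual VADR files can be inspected. A minimal sketch using Python's standard `tarfile` module; the function name `extract_vadr_outputs` and the paths passed to it are hypothetical, and the archive is assumed to have been downloaded locally first:

```python
import tarfile
from pathlib import Path


def extract_vadr_outputs(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return its member names, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = sorted(archive.getnames())
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: "data" refuses members
            # that would land outside dest_dir (absolute paths, "..", unsafe links).
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names
```

For example, `extract_vadr_outputs("sample_vadr_outputs.tar.gz", "vadr_out")` (hypothetical file names) unpacks the archive into `vadr_out/` and returns the list of files it contained, which can then be matched against the file descriptions in the VADR format documentation linked above.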
Variable	Type	Description
abricate_flu_database	String	ABRicate database used for analysis
abricate_flu_results	File	File containing all results from ABRicate
abricate_flu_subtype	String	Flu subtype as determined by ABRicate
abricate_flu_type	String	Flu type as determined by ABRicate
abricate_flu_version	String	Version of ABRicate
assembly_length_unambiguous	Int	Number of unambiguous basecalls within the consensus assembly
assembly_method	String	Method employed to generate consensus assembly
auspice_json	File	Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_h5n1	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza H5N1 whole genome; it includes the samples from the "avian-flu/h5n1-cattle-outbreak" Nextstrain build, which focuses on the B3.13 genotype, and the single sample placed on this tree
auspice_json_flu_ha	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza HA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
auspice_json_flu_na	File	Auspice-compatible JSON output generated from Nextclade analysis of the Influenza NA segment that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
flu_A_315675_resistance	String	resistance mutations to A_315675
flu_L_742_001_resistance	String	resistance mutations to L_742_001
flu_amantadine_resistance	String	resistance mutations to amantadine
flu_compound_367_resistance	String	resistance mutations to compound_367
flu_favipiravir_resistance	String	resistance mutations to favipiravir
flu_fludase_resistance	String	resistance mutations to fludase
flu_laninamivir_resistance	String	resistance mutations to laninamivir
flu_oseltamivir_resistance	String	resistance mutations to oseltamivir (Tamiflu®)
flu_peramivir_resistance	String	resistance mutations to peramivir (Rapivab®)
flu_pimodivir_resistance	String	resistance mutations to pimodivir
flu_rimantadine_resistance	String	resistance mutations to rimantadine
flu_xofluza_resistance	String	resistance mutations to xofluza (Baloxavir marboxil)
flu_zanamivir_resistance	String	resistance mutations to zanamivir (Relenza®)
genoflu_all_segments	String	The genotypes for each individual flu segment
genoflu_genotype	String	The genotype of the whole genome, based on the genotypes of the individual segments
genoflu_output_tsv	File	The output file from GenoFLU
genoflu_version	String	The version of GenoFLU used
nextclade_aa_dels	String	Amino-acid deletions as detected by Nextclade. Will be blank for Flu
nextclade_aa_dels_flu_h5n1	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for H5N1 whole genome
nextclade_aa_dels_flu_ha	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for HA segment
nextclade_aa_dels_flu_na	String	Amino-acid deletions as detected by Nextclade. Specific to Flu; it includes deletions for NA segment
nextclade_aa_subs	String	Amino-acid substitutions as detected by Nextclade. Will be blank for Flu
nextclade_aa_subs_flu_h5n1	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for H5N1 whole genome
nextclade_aa_subs_flu_ha	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for HA segment
nextclade_aa_subs_flu_na	String	Amino-acid substitutions as detected by Nextclade. Specific to Flu; it includes substitutions for NA segment
nextclade_clade	String	Nextclade clade designation, will be blank for Flu.
nextclade_clade_flu_h5n1	String	Nextclade clade designation, specific to Flu H5N1 whole genome. NOTE: Output will be blank or `NA` since this Nextclade dataset does not assign clades
nextclade_clade_flu_ha	String	Nextclade clade designation, specific to Flu HA segment
nextclade_clade_flu_na	String	Nextclade clade designation, specific to Flu NA segment
nextclade_docker	String	Docker image used to run Nextclade
nextclade_ds_tag	String	Dataset tag used to run Nextclade. Will be blank for Flu
nextclade_ds_tag_flu_ha	String	Dataset tag used to run Nextclade, specific to Flu HA segment
nextclade_ds_tag_flu_na	String	Dataset tag used to run Nextclade, specific to Flu NA segment
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_json_flu_h5n1	File	Nextclade output in JSON file format, specific to Flu H5N1 whole genome
nextclade_json_flu_ha	File	Nextclade output in JSON file format, specific to Flu HA segment
nextclade_json_flu_na	File	Nextclade output in JSON file format, specific to Flu NA segment
nextclade_lineage	String	Nextclade lineage designation
nextclade_qc	String	QC metric as determined by Nextclade. Will be blank for Flu
nextclade_qc_flu_h5n1	String	QC metric as determined by Nextclade, specific to Flu H5N1 whole genome
nextclade_qc_flu_ha	String	QC metric as determined by Nextclade, specific to Flu HA segment
nextclade_qc_flu_na	String	QC metric as determined by Nextclade, specific to Flu NA segment
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
nextclade_tsv_flu_h5n1	File	Nextclade output in TSV file format, specific to Flu H5N1 whole genome
nextclade_tsv_flu_ha	File	Nextclade output in TSV file format, specific to Flu HA segment
nextclade_tsv_flu_na	File	Nextclade output in TSV file format, specific to Flu NA segment
nextclade_version	String	The version of Nextclade software used
number_Degenerate	Int	Number of degenerate basecalls within the consensus assembly
number_N	Int	Number of fully ambiguous basecalls within the consensus assembly
number_Total	Int	Total number of nucleotides within the consensus assembly
pango_lineage	String	Pango lineage as determined by Pangolin
pango_lineage_expanded	String	Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1"
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
pangolin_assignment_version	String	The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment
pangolin_conflicts	String	Number of lineage conflicts as determined by Pangolin
pangolin_docker	String	Docker image used to run Pangolin
pangolin_notes	String	Lineage notes as determined by Pangolin
pangolin_versions	String	All Pangolin software and database versions
percent_reference_coverage	Float	Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds
qc_standard	File	The file used in the QC Check task containing the QC thresholds.
seq_platform	String	Description of the sequencing methodology used to generate the input read data
theiacov_fasta_analysis_date	String	Date of analysis
theiacov_fasta_version	String	Version of PHB used for running the workflow
vadr_alerts_list	File	A file containing all of the fatal alerts as determined by VADR
vadr_all_outputs_tar_gz	File	A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be decompressed and extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description of all files present within the archive. Useful when investigating a sample's genome and annotations in depth.
vadr_classification_summary_file	File	Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for a more complete description.
vadr_docker	String	Docker image used to run VADR
vadr_fastas_zip_archive	File	Zip archive containing all fasta files created during VADR analysis
vadr_feature_tbl_fail	File	5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_feature_tbl_pass	File	5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_flu_ha_segment_fasta	File	HA (Haemagglutinin) assembly fasta file
vadr_flu_mp_segment_fasta	File	MP (Matrix Protein) assembly fasta file
vadr_flu_na_segment_fasta	File	NA (Neuraminidase) assembly fasta file
vadr_flu_np_segment_fasta	File	NP (Nucleoprotein) assembly fasta file
vadr_flu_ns_segment_fasta	File	NS (Nonstructural) assembly fasta file
vadr_flu_pa_segment_fasta	File	PA (Polymerase acidic) assembly fasta file
vadr_flu_pb1_segment_fasta	File	PB1 (Polymerase basic 1) assembly fasta file
vadr_flu_pb2_segment_fasta	File	PB2 (Polymerase basic 2) assembly fasta file
vadr_flu_segment_concatenated_fasta	File	Assembly FASTA file of all Influenza genome segments concatenated into one sequence/FASTA entry
vadr_num_alerts	String	Number of fatal alerts as determined by VADR
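
The assembly-composition outputs (`assembly_length_unambiguous`, `number_N`, `number_Degenerate`, `number_Total`) and `percent_reference_coverage` can be approximated directly from a consensus FASTA. The sketch below is not the workflow's QC task; it assumes that unambiguous basecalls are A, C, G, and T, that "degenerate" means any IUPAC ambiguity code other than N, and that gaps and other characters are ignored. The 29903 bp default is the SARS-CoV-2 reference length given in the `percent_reference_coverage` description; the helper names are hypothetical.

```python
# Character classes (assumption: the workflow may classify characters differently).
UNAMBIGUOUS = set("ACGT")
DEGENERATE = set("RYSWKMBDHV")  # IUPAC ambiguity codes other than N


def consensus_metrics(sequence, reference_length=29903):
    """Summarise the base composition of a consensus sequence.

    reference_length defaults to the SARS-CoV-2 reference (29903 bp).
    """
    seq = sequence.upper()
    unambiguous = sum(1 for base in seq if base in UNAMBIGUOUS)
    n_count = seq.count("N")
    degenerate = sum(1 for base in seq if base in DEGENERATE)
    return {
        "assembly_length_unambiguous": unambiguous,
        "number_N": n_count,
        "number_Degenerate": degenerate,
        "number_Total": unambiguous + n_count + degenerate,
        # assembly_length_unambiguous / reference length x 100
        "percent_reference_coverage": round(unambiguous / reference_length * 100, 2),
    }


def read_fasta_sequence(path):
    """Concatenate all non-header lines of a FASTA file into one sequence."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))
```

For example, `consensus_metrics(read_fasta_sequence("sample.consensus.fasta"))` (hypothetical file name) summarises a SARS-CoV-2 consensus. Because `read_fasta_sequence` concatenates every record, a multi-segment assembly such as Flu needs a matching `reference_length` for the coverage figure to be meaningful.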

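`pango_lineage_expanded` replaces the alias prefix of a Pango lineage with the full lineage it stands for, as in the "BA.1" → "B.1.1.529.1" example above. A minimal sketch of that expansion; the one-entry alias table is taken only from that example, and a real table would come from the pango-designation project's alias list (for example its `alias_key.json` file), which Pangolin handles internally. The function name `expand_lineage` is hypothetical.

```python
# Alias table: only "BA" is taken from the documented example; a complete
# table would be loaded from the pango-designation alias list.
ALIASES = {"BA": "B.1.1.529"}


def expand_lineage(lineage, aliases=ALIASES):
    """Expand the alias prefix of a Pango lineage name.

    Lineages whose prefix is not an alias (for example "B.1.1.7"), or whose
    alias maps to an empty string, are returned unchanged.
    """
    prefix, _, rest = lineage.partition(".")
    full_prefix = aliases.get(prefix)
    if not full_prefix:
        return lineage
    return f"{full_prefix}.{rest}" if rest else full_prefix
```

With this table, `expand_lineage("BA.1")` gives `"B.1.1.529.1"` and `expand_lineage("BA.2.86")` gives `"B.1.1.529.2.86"`. Recombinant lineages (names starting with X) descend from more than one parent and cannot be expanded into a single path this way.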
Variable	Type	Description
aligned_bai	File	Index companion file to the bam file generated during the consensus assembly process
aligned_bam	File	Sorted BAM file containing the alignments of reads to the reference genome
artic_docker	String	Docker image utilized for read trimming and consensus genome assembly
artic_version	String	Version of the Artic software utilized for read trimming and consensus genome assembly
assembly_fasta	File	Consensus genome assembly; for lower quality flu samples, the output may state "Assembly could not be generated" when there is too little and/or too low quality data for IRMA to produce an assembly. Contigs will be ordered from largest to smallest when IRMA is used.
assembly_length_unambiguous	Int	Number of unambiguous basecalls within the consensus assembly
assembly_mean_coverage	Float	Mean sequencing depth throughout the consensus assembly. Generated after performing primer trimming and calculated using the SAMtools coverage command
assembly_method	String	Method employed to generate consensus assembly
auspice_json	File	Auspice-compatible JSON output generated from Nextclade analysis that includes the Nextclade default samples for clade-typing and the single sample placed on this tree
consensus_flagstat	File	Output from the SAMtools flagstat command to assess quality of the alignment file (BAM)
consensus_stats	File	Output from the SAMtools stats command to assess quality of the alignment file (BAM)
est_percent_gene_coverage_tsv	File	Percent coverage for each gene in the organism being analyzed (depending on the organism input)
fastq_scan_clean1_json	File	The JSON file output from `fastq-scan` containing summary stats about clean forward read quality and length
fastq_scan_num_reads_clean1	Int	The number of forward reads after cleaning as calculated by fastq_scan
fastq_scan_num_reads_raw1	Int	The number of input forward reads as calculated by fastq_scan
fastq_scan_raw1_json	File	The JSON file output from `fastq-scan` containing summary stats about raw forward read quality and length
fastq_scan_version	String	The version of fastq_scan
kraken_human	Float	Percent of human read data detected using the Kraken2 software
kraken_human_dehosted	Float	Percent of human read data detected using the Kraken2 software after host removal
kraken_report	File	Full Kraken report
kraken_report_dehosted	File	Full Kraken report after host removal
kraken_sc2	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software
kraken_sc2_dehosted	String	Percent of SARS-CoV-2 read data detected using the Kraken2 software after host removal
kraken_target_organism	String	Percent of target organism read data detected using the Kraken2 software
kraken_target_organism_dehosted	String	Percent of target organism read data detected using the Kraken2 software after host removal
kraken_target_organism_name	String	The name of the target organism; e.g., "Monkeypox" or "Human immunodeficiency virus"
kraken_version	String	Version of Kraken software used
meanbaseq_trim	Float	Mean quality of the nucleotide basecalls aligned to the reference genome after primer trimming
meanmapq_trim	Float	Mean quality of the mapped reads to the reference genome after primer trimming
medaka_reference	String	Reference sequence used in medaka task
nextclade_aa_dels	String	Amino-acid deletions as detected by Nextclade. Will be blank for Flu
nextclade_aa_subs	String	Amino-acid substitutions as detected by Nextclade. Will be blank for Flu
nextclade_clade	String	Nextclade clade designation, will be blank for Flu.
nextclade_docker	String	Docker image used to run Nextclade
nextclade_ds_tag	String	Dataset tag used to run Nextclade. Will be blank for Flu
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_lineage	String	Nextclade lineage designation
nextclade_qc	String	QC metric as determined by Nextclade. Will be blank for Flu
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
nextclade_version	String	The version of Nextclade software used
number_Degenerate	Int	Number of degenerate basecalls within the consensus assembly
number_N	Int	Number of fully ambiguous basecalls within the consensus assembly
number_Total	Int	Total number of nucleotides within the consensus assembly
pango_lineage	String	Pango lineage as determined by Pangolin
pango_lineage_expanded	String	Pango lineage without use of aliases; e.g., "BA.1" → "B.1.1.529.1"
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
pangolin_assignment_version	String	The version of the pangolin software (e.g. PANGO or PUSHER) used for lineage assignment
pangolin_conflicts	String	Number of lineage conflicts as determined by Pangolin
pangolin_docker	String	Docker image used to run Pangolin
pangolin_notes	String	Lineage notes as determined by Pangolin
pangolin_versions	String	All Pangolin software and database versions
percent_reference_coverage	Float	Percent coverage of the reference genome after performing primer trimming; calculated as assembly_length_unambiguous / length of the reference genome (SC2: 29903) x 100
percentage_mapped_reads	Float	Percentage of reads that successfully aligned to the reference genome. This value is calculated as the number of mapped reads / total number of reads x 100.
primer_bed_name	String	Name of the primer bed files used for primer trimming
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds
qc_standard	File	The file used in the QC Check task containing the QC thresholds.
read1_aligned	File	Forward read file of only aligned reads
read1_dehosted	File	The dehosted forward reads file; suggested read file for SRA submission
samtools_version_stats	String	The version of SAMtools used to assess the quality of read mapping
sc2_s_gene_mean_coverage	Float	Mean read depth for the S gene in SARS-CoV-2
sc2_s_gene_percent_coverage	Float	Percent coverage of the S gene in SARS-CoV-2
seq_platform	String	Description of the sequencing methodology used to generate the input read data
theiacov_clearlabs_analysis_date	String	Date of analysis
theiacov_clearlabs_version	String	Version of PHB used for running the workflow
vadr_alerts_list	File	A file containing all of the fatal alerts as determined by VADR
vadr_all_outputs_tar_gz	File	A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be decompressed and extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description of all files present within the archive. Useful when investigating a sample's genome and annotations in depth.
vadr_classification_summary_file	File	Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for a more complete description.
vadr_docker	String	Docker image used to run VADR
vadr_fastas_zip_archive	File	Zip archive containing all fasta files created during VADR analysis
vadr_feature_tbl_fail	File	5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_feature_tbl_pass	File	5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description.
vadr_num_alerts	String	Number of fatal alerts as determined by VADR
variants_from_ref_vcf	File	VCF file containing the variants identified relative to the reference genome

Overwrite Warning

The TheiaCoV_FASTA_Batch_PHB workflow writes results to the set-level data table and also overwrites the Pangolin and Nextclade output columns in the sample-level data table. To see exactly which columns were overwritten in the sample-level data table, open the set-level workflow output TSV file called "Datatable".
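
One way to check which sample-level columns a batch run touched is to compare the header of the "Datatable" TSV with an exported copy of the original sample-level table. A minimal sketch; the file paths and function names are hypothetical, and it assumes both files are tab-separated with a header row whose first column is the entity ID (for example `entity:sample_id`), which is an assumption about the export format.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a TSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def compare_columns(datatable_path, original_path):
    """Split the Datatable's columns into overwritten and newly added ones.

    The first column of each file is assumed to be the entity ID and is skipped.
    """
    written = read_header(datatable_path)[1:]
    existing = set(read_header(original_path)[1:])
    overwritten = [col for col in written if col in existing]
    added = [col for col in written if col not in existing]
    return overwritten, added
```

Taking the header from a table exported before the run matters: after the run, every Datatable column already exists in the sample-level table, so nothing would be reported as newly added.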

Variable	Type	Description
datatable	File	Sample-level data table TSV file that was used to update the original sample-level data table in the last step of the TheiaCoV_FASTA_Batch workflow.
nextclade_json	File	Nextclade output in JSON file format. Will be blank for Flu
nextclade_tsv	File	Nextclade output in TSV file format. Will be blank for Flu
pango_lineage_report	File	Full Pango lineage report generated by Pangolin
theiacov_fasta_batch_analysis_date	String	Date that the workflow was run.
theiacov_fasta_batch_version	String	Version of the workflow that was used.