TheiaProk Workflow Series¶

Quick Facts¶

Workflow Type	Applicable Kingdom	Last Known Changes	Command-line Compatibility	Workflow Level
Genomic Characterization	Bacteria	PHB v2.2.0	Yes, some optional features incompatible	Sample-level

TheiaProk Workflows¶

The TheiaProk workflows are for the assembly, quality assessment, and characterization of bacterial genomes. There are currently four TheiaProk workflows designed to accommodate different kinds of input data:

Illumina paired-end sequencing (TheiaProk_Illumina_PE)
Illumina single-end sequencing (TheiaProk_Illumina_SE)
ONT sequencing (TheiaProk_ONT)
Genome assemblies (TheiaProk_FASTA)

TheiaProk Workflow Diagram

All input reads are processed through "core tasks" in the TheiaProk Illumina and ONT workflows. These undertake read trimming and assembly appropriate to the input data type. TheiaProk workflows subsequently launch default genome characterization modules for quality assessment, species identification, antimicrobial resistance gene detection, sequence typing, and more. For some taxa identified, "taxa-specific sub-workflows" will be automatically activated, undertaking additional taxa-specific characterization steps. When setting up each workflow, users may choose to use "optional tasks" as additions or alternatives to tasks run in the workflow by default.

Inputs¶

TheiaProk_Illumina_PE Input Read Data

The TheiaProk_Illumina_PE workflow takes in Illumina paired-end read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends zipping files with gzip before Terra uploads to minimize data upload time.

By default, the workflow anticipates 2 x 150bp reads (i.e. the input reads were generated using a 300-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate shorter read data, such as the 2 x 75bp reads generated using a 150-cycle sequencing kit.

TheiaProk_Illumina_SE Input Read Data

TheiaProk_Illumina_SE takes in Illumina single-end reads. Read file names should end with .fastq or .fq, with the optional addition of .gz. Theiagen highly recommends zipping files with gzip before uploading to Terra to minimize data upload time & save on storage costs.

By default, the workflow anticipates 1 x 35 bp reads (i.e. the input reads were generated using a 70-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate longer read data.

TheiaProk_ONT Input Read Data

The TheiaProk_ONT workflow takes in base-called ONT read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends zipping files with gzip before uploading to Terra to minimize data upload time.

The ONT sequencing kit and base-calling approach can produce substantial variability in the amount and quality of read data. Genome assemblies produced by the TheiaProk_ONT workflow must be quality assessed before reporting results.

TheiaProk_FASTA Input Assembly Data

The TheiaProk_FASTA workflow takes in assembly files in FASTA format.

Terra Task name	Variable	Type	Description	Default value	Terra Status	Workflow
*workflow name	samplename	String	Name of sample to be analyzed		Required	FASTA, ONT, PE, SE
theiaprok_fasta	assembly_fasta	File	Assembly file in fasta format		Required	FASTA
theiaprok_illumina_pe	read1	File	Illumina forward read file in FASTQ file format (compression optional)		Required	PE
theiaprok_illumina_pe	read2	File	Illumina reverse read file in FASTQ file format (compression optional)		Required	PE
theiaprok_illumina_se	read1	File	Illumina forward read file in FASTQ file format (compression optional)		Required	SE
theiaprok_ont	read1	File	Base-called ONT read file in FASTQ file format (compression optional)		Required	ONT
*workflow name	abricate_db	String	Database to use with the Abricate tool. Options: NCBI, CARD, ARG-ANNOT, Resfinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB	vfdb	Optional	FASTA, ONT, PE, SE
*workflow name	call_abricate	Boolean	Set to true to enable the Abricate task	FALSE	Optional	FASTA, ONT, PE, SE
*workflow name	call_ani	Boolean	Set to true to enable the ANI task	FALSE	Optional	FASTA, ONT, PE, SE
*workflow name	call_kmerfinder	Boolean	Set to true to enable the kmerfinder task	FALSE	Optional	FASTA, ONT, PE, SE
*workflow name	call_plasmidfinder	Boolean	Set to true to enable the plasmidfinder task	TRUE	Optional	FASTA, ONT, PE, SE
*workflow name	call_resfinder	Boolean	Set to true to enable the ResFinder task	FALSE	Optional	FASTA, ONT, PE, SE
*workflow name	city	String	Will be used in the "city" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	collection_date	String	Will be used in the "collection_date" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	county	String	Will be used in the "county" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	expected_taxon	String	If provided, this input will override the taxonomic assignment made by GAMBIT and launch the relevant taxon-specific submodules. It will also modify the organism flag used by AMRFinderPlus. Example format: "Salmonella enterica"		Optional	FASTA, ONT, PE, SE
*workflow name	genome_annotation	String	If set to "bakta", TheiaProk will use Bakta rather than Prokka to annotate the genome	prokka	Optional	FASTA, ONT, PE, SE
*workflow name	genome_length	Int	User-specified expected genome length to be used in genome statistics calculations		Optional	ONT, PE, SE
*workflow name	max_genome_length	Int	Maximum genome length able to pass read screening. For TheiaProk_ONT, screening using max_genome_length is skipped by default.	18040666	Optional	ONT, PE, SE
*workflow name	min_basepairs	Int	Minimum number of base pairs able to pass read screening	2241820	Optional	ONT, PE, SE
*workflow name	min_coverage	Int	Minimum genome coverage able to pass read screening. Screening using min_coverage is skipped by default.	5	Optional	ONT
*workflow name	min_coverage	Int	Minimum genome coverage able to pass read screening	10	Optional	PE, SE
*workflow name	min_genome_length	Int	Minimum genome length able to pass read screening. For TheiaProk_ONT, screening using min_genome_length is skipped by default.	100000	Optional	ONT, PE, SE
*workflow name	min_proportion	Int	Minimum proportion of total reads in each read file to pass read screening	40	Optional	PE
*workflow name	min_reads	Int	Minimum number of reads to pass read screening	5000	Optional	ONT
*workflow name	min_reads	Int	Minimum number of reads to pass read screening	7472	Optional	PE, SE
*workflow name	originating_lab	String	Will be used in the "originating_lab" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	perform_characterization	Boolean	Set to "false" if you want to only generate an assembly and relevant QC metrics and skip all characterization tasks	TRUE	Optional	FASTA, ONT, PE, SE
*workflow name	qc_check_table	File	TSV value with taxons for rows and QC values for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. Click on the variable name for an example QC Check table		Optional	FASTA, ONT, PE, SE
*workflow name	run_id	String	Will be used in the "run_id" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	seq_method	String	Will be used in the "seq_id" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
*workflow name	skip_mash	Boolean	If true, skips estimation of genome size and coverage in read screening steps. As a result, providing true also prevents screening using these parameters.	TRUE	Optional	ONT, SE
*workflow name	skip_screen	Boolean	Option to skip the read screening prior to analysis	FALSE	Optional	ONT, PE, SE
*workflow name	taxon_tables	File	File indicating data table names to copy samples of a particular taxon to		Optional	FASTA, ONT, PE, SE
*workflow name	terra_project	String	The name of the Terra Project where you want the taxon tables written to		Optional	FASTA, ONT, PE, SE
*workflow name	terra_workspace	String	The name of the Terra Workspace where you want the taxon tables written to		Optional	FASTA, ONT, PE, SE
*workflow name	trim_min_length	Int	Specifies minimum length of each read after trimming to be kept	25	Optional	SE
*workflow name	trim_min_length	Int	Specifies minimum length of each read after trimming to be kept	75	Optional	PE
*workflow name	trim_quality_min_score	Int	Specifies the minimum average quality of bases in a sliding window to be kept	20	Optional	PE
*workflow name	trim_quality_trim_score	Int	Specifies the average quality of bases in a sliding window to be kept	30	Optional	SE
*workflow name	trim_window_size	Int	Specifies window size for trimming (the number of bases to average the quality across)	4	Optional	SE
*workflow name	trim_window_size	Int	Specifies window size for trimming (the number of bases to average the quality across)	4	Optional	PE
*workflow name	zip	String	Will be used in the "zip" column in any taxon-specific tables created in the Export Taxon Tables task		Optional	FASTA, ONT, PE, SE
abricate	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
abricate	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
abricate	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-abaum-plasmid	Optional	FASTA, ONT, PE, SE
abricate	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
abricate	mincov	Int	Minimum DNA %coverage for the Abricate task	80	Optional	FASTA, ONT, PE, SE
abricate	minid	Int	Minimum DNA %identity for the Abricate task	80	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	detailed_drug_class	Boolean	If set to true, amrfinderplus_amr_classes and amrfinderplus_amr_subclasses outputs will be created	FALSE	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	disk_size	Boolean	Amount of storage (in GB) to allocate to the AMRFinderPlus task	50	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ncbi-amrfinderplus:3.12.8-2024-07-22.1	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	hide_point_mutations	Boolean	If set to true, point mutations are not reported	FALSE	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	mincov	Float	Minimum proportion of reference gene covered for a BLAST-based hit (Methods BLAST or PARTIAL)." Attribute should be a float ranging from 0-1, such as 0.6 (equal to 60% coverage)	0.5	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	minid	Float	"Minimum identity for a blast-based hit hit (Methods BLAST or PARTIAL). -1 means use a curated threshold if it exists and 0.9 otherwise. Setting this value to something other than -1 will override any curated similarity cutoffs." Attribute should be a float ranging from 0-1, such as 0.95 (equal to 95% identity)	0.9	Optional	FASTA, ONT, PE, SE
amrfinderplus_task	separate_betalactam_genes	Boolean	Report beta-Lactam AMR genes separated out by all beta-lactam and the respective beta-lactam subclasses	FALSE	Optional	FASTA, ONT, PE, SE
ani	ani_threshold	Float	ANI value threshold must be surpassed in order to output the ani_top_species_match. If a genome does not surpass this threshold (and the percent_bases_aligned_threshold) then the ani_top_species_match output String will show a warning instead of a genus & species.	80	Optional	FASTA, ONT, PE, SE
ani	cpu	Int	Number of CPUs to allocate to the task	4	Optional	FASTA, ONT, PE, SE
ani	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
ani	docker	String	The Docker container to use for the task	"us-docker.pkg.dev/general-theiagen/staphb/mummer:4.0.0-rgdv2	Optional	FASTA, ONT, PE, SE
ani	mash_filter	Float	Mash distance threshold over which ANI is not calculated	0.9	Optional	FASTA, ONT, PE, SE
ani	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
ani	percent_bases_aligned_threshold	Float	Threshold regarding the proportion of bases aligned between the query genome and reference genome. If a genome does not surpass this threshold (and the ani_threshold) then the ani_top_species_match output String will show a warning instead of a genus & species.	70	Optional	FASTA, ONT, PE, SE
ani	ref_genome	File	If not set, uses all 43 genomes in RGDv2		Optional	FASTA, ONT, PE, SE
bakta	bakta_db	File	Database of reference annotations (seehttps://github.com/oschwengers/bakta#database)	gs://theiagen-public-files-rp/terra/theiaprok-files/bakta_db_2022-08-29.tar.gz	Optional	FASTA, ONT, PE, SE
bakta	bakta_opts	String	Parameters to pass to bakta from https://github.com/oschwengers/bakta#usage		Optional	FASTA, ONT, PE, SE
bakta	compliant	Boolean	If true, forces Genbank/ENA/DDJB compliance	FALSE	Optional	FASTA, ONT, PE, SE
bakta	cpu	Int	Number of CPUs to allocate to the task	8	Optional	FASTA, ONT, PE, SE
bakta	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
bakta	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/bakta:1.5.1--pyhdfd78af_0	Optional	FASTA, ONT, PE, SE
bakta	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	FASTA, ONT, PE, SE
bakta	prodigal_tf	File	Prodigal training file to use for CDS prediction by bakta		Optional	FASTA, ONT, PE, SE
bakta	proteins	Boolean		FALSE	Optional	FASTA, ONT, PE, SE
busco	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
busco	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
busco	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.7.1_cv1	Optional	FASTA, ONT, PE, SE
busco	eukaryote	Boolean	Assesses eukaryotic organisms, rather than prokaryotic organisms	FALSE	Optional	FASTA, ONT, PE, SE
busco	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
cg_pipeline_clean	cg_pipe_opts	String	Options to pass to CG-Pipeline for clean read assessment	--fast	Optional	PE, SE
cg_pipeline_clean	cpu	Int	Number of CPUs to allocate to the task	4	Optional	PE, SE
cg_pipeline_clean	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	PE, SE
cg_pipeline_clean	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/lyveset:1.1.4f	Optional	PE, SE
cg_pipeline_clean	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	PE, SE
cg_pipeline_clean	read2	File	Internal component, do not modify		Do not modify, Optional	SE
cg_pipeline_raw	cg_pipe_opts	String	Options to pass to CG-Pipeline for raw read assessment	--fast	Optional	PE, SE
cg_pipeline_raw	cpu	Int	Number of CPUs to allocate to the task	4	Optional	PE, SE
cg_pipeline_raw	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	PE, SE
cg_pipeline_raw	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/lyveset:1.1.4f	Optional	PE, SE
cg_pipeline_raw	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	PE, SE
cg_pipeline_raw	read2	File	Internal component, do not modify		Do not modify, Optional	SE
clean_check_reads	cpu	Int	Number of CPUs to allocate to the task	2	Optional	ONT, PE, SE
clean_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	ONT, PE, SE
clean_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional	ONT, PE, SE
clean_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional	ONT, PE, SE
clean_check_reads	organism	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
clean_check_reads	workflow_series	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
dragonflye	assembler	String	The assembler to use in dragonflye. Three options: raven, miniasm, flye	flye	Optional	ONT
dragonflye	assembler_options	String	Enables extra assembler options in quote		Optional	ONT
dragonflye	cpu	Int	Number of CPUs to allocate to the task	4	Optional	ONT
dragonflye	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	ONT
dragonflye	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/dragonflye:1.0.14--hdfd78af_0	Optional	ONT
dragonflye	illumina_polishing_rounds	Int	Number of polishing rounds to conduct with Illumina data	1	Optional	ONT
dragonflye	illumina_read1	File	If Illumina reads are provided, Dragonflye will perform Illumina polishing		Optional	ONT
dragonflye	illumina_read2	File	If Illumina reads are provided, Dragonflye will perform Illumina polishing		Optional	ONT
dragonflye	medaka_model	String	The model of medaka to use for assembly	r941_min_hac_g507	Optional	ONT
dragonflye	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32	Optional	ONT
dragonflye	polishing_rounds	Int	The number of polishing rounds to conduct (without Illumina)	1	Optional	ONT
dragonflye	use_pilon_illumina_polisher	Boolean	Set to true to use Pilon to polish Illumina reads	FALSE	Optional	ONT
dragonflye	use_racon	Boolean	Set to true to use Racon to polish instead of Medaka	FALSE	Optional	ONT
export_taxon_tables	asembly_fasta	File	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	bbduk_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT
export_taxon_tables	cg_pipeline_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT
export_taxon_tables	cg_pipeline_report_clean	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	cg_pipeline_report_raw	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	combined_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	combined_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	combined_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	combined_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	contigs_gfa	File	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	cpu	Int	Number of CPUs to allocate to the task	1	Optional	FASTA, ONT, PE, SE
export_taxon_tables	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
export_taxon_tables	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16	Optional	FASTA, ONT, PE, SE
export_taxon_tables	dragonflye_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	emmtypingtool_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	emmtypingtool_emm_type	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	emmtypingtool_results_xml	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	emmtypingtool_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	est_coverage_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	est_coverage_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	fastp_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	fastq_scan_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	hicap_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT
export_taxon_tables	hicap_genes	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	hicap_results_tsv	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	hicap_serotype	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	hicap_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	kmc_est_genome_length	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	kmc_kmer_stats	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	kmc_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	kraken2_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	kraken2_report	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	kraken2_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
export_taxon_tables	midas_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, ONT
export_taxon_tables	midas_primary_genus	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	midas_report	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	midas_secondary_genus	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	midas_secondary_genus_abundance	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	midas_secondary_genus_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	nanoplot_docker	String	The Docker container to use for the task		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_html_clean	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_html_raw	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_num_reads_clean1	Int	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_num_reads_raw1	Int	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_est_coverage_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_est_coverage_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_median_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_median_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_median_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_median_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_n50_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_n50_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_stdev_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_r1_stdev_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_tsv_clean	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_tsv_raw	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoplot_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	nanoq_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	num_reads_clean_pairs	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	num_reads_clean1	Int	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	num_reads_clean2	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	num_reads_raw_pairs	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	num_reads_raw1	Int	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	num_reads_raw2	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	r1_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	r1_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	r1_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	r1_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	r2_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	r2_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	rasusa_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	read1	File	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	read1_clean	File	Internal component, do not modify		Do not modify, Optional	FASTA
export_taxon_tables	read2	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	read2_clean	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	seroba_ariba_identity	String	Internal component, do not modify		Do not modify, Optional	ONT, SE
export_taxon_tables	seroba_ariba_serotype	String	Internal component, do not modify		Do not modify, Optional	ONT, SE
export_taxon_tables	seroba_details	File	Internal component, do not modify		Do not modify, Optional	ONT, SE
export_taxon_tables	seroba_docker	String	The Docker container to use for the task		Do not modify, Optional	ONT, SE
export_taxon_tables	seroba_serotype	String	Internal component, do not modify		Do not modify, Optional	ONT, SE
export_taxon_tables	seroba_version	String	Internal component, do not modify		Do not modify, Optional	ONT, SE
export_taxon_tables	shigeifinder_cluster_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_docker_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_H_antigen_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_ipaH_presence_absence_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_notes_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_num_virulence_plasmid_genes	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_O_antigen_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_report_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_serotype_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shigeifinder_version_reads	String	Internal component, do not modify		Do not modify, Optional	ONT
export_taxon_tables	shovill_pe_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	shovill_se_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	srst2_vibrio_biotype	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_ctxA	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_detailed_tsv	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_ompW	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_serogroup	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_toxR	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	srst2_vibrio_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
export_taxon_tables	theiaprok_fasta_analysis_date	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
export_taxon_tables	theiaprok_fasta_version	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
export_taxon_tables	theiaprok_illumina_pe_analysis_date	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	theiaprok_illumina_pe_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
export_taxon_tables	theiaprok_illumina_se_analysis_date	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	theiaprok_illumina_se_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE
export_taxon_tables	theiaprok_ont_analysis_date	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	theiaprok_ont_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	tiptoft_plasmid_replicon_fastq	File	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	tiptoft_plasmid_replicon_genes	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	tiptoft_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
export_taxon_tables	trimmomatic_version	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
gambit	cpu	Int	Number of CPUs to allocate to the task	8	Optional	FASTA, ONT, PE, SE
gambit	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
gambit	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0	Optional	FASTA, ONT, PE, SE
gambit	gambit_db_genomes	File	User-provided database of assembled query genomes; requires complementary signatures file. If not provided, uses default database, "/gambit-db"	gs://gambit-databases-rp/2.0.0/gambit-metadata-2.0.0-20240628.gdb	Optional	FASTA, ONT, PE, SE
gambit	gambit_db_signatures	File	User-provided signatures file; requires complementary genomes file. If not specified, the file from the docker container will be used.	gs://gambit-databases-rp/2.0.0/gambit-signatures-2.0.0-20240628.gs	Optional	FASTA, ONT, PE, SE
gambit	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	FASTA, ONT, PE, SE
kmerfinder	cpu	Int	Number of CPUs to allocate to the task	4	Optional	FASTA, ONT, PE, SE
kmerfinder	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
kmerfinder	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/kmerfinder:3.0.2--hdfd78af_0	Optional	FASTA, ONT, PE, SE
kmerfinder	kmerfinder_args	String	Kmerfinder additional arguments		Optional	FASTA, ONT, PE, SE
kmerfinder	kmerfinder_db	String	Bacterial database for KmerFinder	gs://theiagen-public-files-rp/terra/theiaprok-files/kmerfinder_bacteria_20230911.tar.gz	Optional	FASTA, ONT, PE, SE
kmerfinder	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	32	Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_abaum_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-abaum-plasmid	Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_abaum_mincov	Int	Minimum DNA percent coverage		Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_abaum_minid	Int	Minimum DNA percent identity; set to 95 because there is a strict threshold of 95% identity for typing purposes	95	Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_vibrio_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/abricate:1.0.1-abaum-plasmid	Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_vibrio_mincov	Int	Minimum DNA percent coverage	80	Optional	FASTA, ONT, PE, SE
merlin_magic	abricate_vibrio_minid	Int	Minimum DNA percent identity	80	Optional	FASTA, ONT, PE, SE
merlin_magic	agrvate_agr_typing_only	Boolean	Set to true to skip agr operon extraction and frameshift detection	False	Optional	FASTA, ONT, PE, SE
merlin_magic	agrvate_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/agrvate:1.0.2--hdfd78af_0	Optional	FASTA, ONT, PE, SE
merlin_magic	assembly_only	Boolean	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
merlin_magic	call_poppunk	Boolean	If "true", runs PopPUNK for GPSC cluster designation for S. pneumoniae	TRUE	Optional	FASTA, ONT, PE, SE
merlin_magic	call_shigeifinder_reads_input	Boolean	If set to "true", the ShigEiFinder task will run again but using read files as input instead of the assembly file. Input is shown but not used for TheiaProk_FASTA.	FALSE	Optional	FASTA, ONT, PE, SE
merlin_magic	cauris_cladetyper_docker_image	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_kmer_size	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade1	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade1_annotated	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade2	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade2_annotated	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade3	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade3_annotated	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade4	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade4_annotated	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade5	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	cladetyper_ref_clade5_annotated	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	clockwork_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/cdcgov/varpipe_wgs_with_refs:2bc7234074bd53d9e92a1048b0485763cd9bbf6f4d12d5a1cc82bfec8ca7d75e	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/ectyper:1.0.0--pyhdfd78af_1	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_hpcov	Int	Minumum percent coverage required for an H antigen allele match	50	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_hpid	Int	Percent identity required for an H antigen allele match	95	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_opcov	Int	Minumum percent coverage required for an O antigen allele match	90	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_opid	Int	Percent identity required for an O antigen allele match	90	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_print_alleles	Boolean	Set to true to print the allele sequences as the final column	False	Optional	FASTA, ONT, PE, SE
merlin_magic	ectyper_verify	Boolean	Set to true to enable E. coli species verification	False	Optional	FASTA, ONT, PE, SE
merlin_magic	emmtypingtool_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/emmtypingtool:0.0.1	Optional	FASTA, ONT, PE, SE
merlin_magic	genotyphi_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/mykrobe:0.11.0	Optional	FASTA, ONT, PE, SE
merlin_magic	hicap_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/hicap:1.0.3--py_0	Optional	FASTA, ONT, PE, SE
merlin_magic	kaptive_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/kaptive:2.0.3	Optional	FASTA, ONT, PE, SE
merlin_magic	kaptive_low_gene_id	Float	Percent identity threshold for what counts as a low identity match in the gene BLAST search	95	Optional	FASTA, ONT, PE, SE
merlin_magic	kaptive_min_coverage	Float	Minimum required percent identity for the gene BLAST search via tBLASTn	80	Optional	FASTA, ONT, PE, SE
merlin_magic	kaptive_min_identity	Float	Minimum required percent coverage for the gene BLAST search via tBLASTn	90	Optional	FASTA, ONT, PE, SE
merlin_magic	kaptive_start_end_margin	Int	Determines flexibility in identifying the start and end of a locus - if this value is 10, a locus match that is missing the first 8 base pairs will still count as capturing the start of the locus	10	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/kleborate:2.2.0	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_min_coverage	Float	Minimum alignment percent coverage for main results	80	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_min_identity	Float	Minimum alignment percent identity for main results	90	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_min_kaptive_confidence	String	{None,Low,Good,High,Very_high,Perfect} Minimum Kaptive confidence to call K/O loci - confidence levels below this will be reported as unknown	Good	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_min_spurious_coverage	Float	Minimum alignment percent coverage for spurious results	40	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_min_spurious_identity	Float	Minimum alignment percent identity for spurious results	80	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_skip_kaptive	Boolean	Equivalent to --kaptive_k --kaptive_	False	Optional	FASTA, ONT, PE, SE
merlin_magic	kleborate_skip_resistance	Boolean	Set to true to turn on resistance genes screening (default: no resistance gene screening)	False	Optional	FASTA, ONT, PE, SE
merlin_magic	legsta_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/legsta:0.5.1--hdfd78af_2	Optional	FASTA, ONT, PE, SE
merlin_magic	lissero_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/lissero:0.4.9--py_0	Optional	FASTA, ONT, PE, SE
merlin_magic	lissero_min_cov	Float	Minimum coverage of the gene to accept a match	95	Optional	FASTA, ONT, PE, SE
merlin_magic	lissero_min_id	Float	Minimum percent identity to accept a match	95	Optional	FASTA, ONT, PE, SE
merlin_magic	meningotype_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/meningotype:0.8.5--pyhdfd78af_0	Optional	FASTA, ONT, PE, SE
merlin_magic	ngmaster_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/ngmaster:1.0.0	Optional	FASTA, ONT, PE, SE
merlin_magic	ont_data	Boolean	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
merlin_magic	paired_end	Boolean	Internal component, do not modify		Do not modify, Optional	ONT, PE
merlin_magic	pasty_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3	Optional	FASTA, ONT, PE, SE
merlin_magic	pasty_min_coverage	Int	Minimum coverage of a O-antigen to be considered for serogrouping by pasty	95	Optional	FASTA, ONT, PE, SE
merlin_magic	pasty_min_pident	Int	Minimum percent identity for a blast hit to be considered for serogrouping	95	Optional	FASTA, ONT, PE, SE
merlin_magic	pbptyper_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/pbptyper:1.0.4	Optional	FASTA, ONT, PE, SE
merlin_magic	pbptyper_min_coverage	Int	Minimum percent coverage to count a hit	90	Optional	FASTA, ONT, PE, SE
merlin_magic	pbptyper_min_pident	Int	Minimum percent identity to count a hit	90	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/poppunk:2.4.0	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_clusters_csv	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_clusters.csv	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_dists_npy	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.dists.npy	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_dists_pkl	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.dists.pkl	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_external_clusters_csv	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_external_clusters.csv	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_fit_npz	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_fit.npz	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_fit_pkl	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_fit.pkl	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_graph_gt	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_graph.gt	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_h5	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.h5	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_qcreport_txt	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_qcreport.txt	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_refs	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.refs	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_refs_dists_npy	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.refs.dists.npy	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_refs_dists_pkl	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.refs.dists.pkl	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_refs_graph_gt	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6refs_graph.gt	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_refs_h5	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6.refs.h5	Optional	FASTA, ONT, PE, SE
merlin_magic	poppunk_gps_unword_clusters_csv	File	Poppunk database file *Provide an empty or local file if running TheiaProk on the command-line	gs://theiagen-public-files-rp/terra/theiaprok-files/GPS_v6/GPS_v6_unword_clusters.csv	Optional	FASTA, ONT, PE, SE
merlin_magic	read1	File	Internal component, do not modify		Do not modify, Optional	FASTA
merlin_magic	read2	File	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
merlin_magic	seqsero2_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/seqsero2:1.2.1	Optional	FASTA, ONT, PE, SE
merlin_magic	seroba_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/seroba:1.0.2	Optional	FASTA, ONT, PE, SE
merlin_magic	serotypefinder_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/serotypefinder:2.0.1	Optional	FASTA, ONT, PE, SE
merlin_magic	shigatyper_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/shigatyper:2.0.5	Optional	FASTA, ONT, PE, SE
merlin_magic	shigeifinder_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/shigeifinder:1.3.5	Optional	FASTA, ONT, PE, SE
merlin_magic	sistr_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/sistr_cmd:1.1.1--pyh864c0ab_2	Optional	FASTA, ONT, PE, SE
merlin_magic	sistr_use_full_cgmlst_db	Boolean	Set to true to use the full set of cgMLST alleles which can include highly similar alleles. By default the smaller "centroid" alleles or representative alleles are used for each marker	False	Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_base_quality	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_gene_query_docker_image	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_map_qual	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_maxsoft	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_min_coverage	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_min_frac	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_min_quality	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_query_gene	String	Internal component, do not modify		Do not modify, Optional	FASTA, PE, SE
merlin_magic	snippy_reference_afumigatus	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_reference_calbicans	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_reference_cryptoneo	File	*Provide an empty file if running TheiaProk on the command-line		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	snippy_variants_docker_image	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	sonneityping_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/mykrobe:0.12.1	Optional	FASTA, ONT, PE, SE
merlin_magic	sonneityping_mykrobe_opts	String	Additional options for mykrobe in sonneityping		Optional	FASTA, ONT, PE, SE
merlin_magic	spatyper_do_enrich	Boolean	Set to true to enable PCR product enrichment	False	Optional	FASTA, ONT, PE, SE
merlin_magic	spatyper_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/spatyper:0.3.3--pyhdfd78af_3	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/srst2:0.2.0-vcholerae	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_gene_max_mismatch	Int	Maximum number of mismatches for SRST2 to call a gene as present	2000	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_max_divergence	Int	Maximum divergence, in percentage, for SRST2 to call a gene as present	20	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_min_cov	Int	Minimum breadth of coverage for SRST2 to call a gene as present	80	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_min_depth	Int	Minimum depth of coverage for SRST2 to call a gene as present	5	Optional	FASTA, ONT, PE, SE
merlin_magic	srst2_min_edge_depth	Int	Minimum edge depth for SRST2 to call a gene as present	2	Optional	FASTA, ONT, PE, SE
merlin_magic	staphopia_sccmec_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/biocontainers/staphopia-sccmec:1.0.0--hdfd78af_0	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_coverage_regions_bed	File	A bed file that lists the regions to be considered for QC		Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_coverage_threshold	Int	The minimum coverage for a region to pass QC in tbp_parser	100	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_debug	Boolean	Activate the debug mode on tbp_parser; increases logging outputs	FALSE	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.6.0	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:1.4.0	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_min_depth	Int	Minimum depth for a variant to pass QC in tbp_parser	10	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_min_frequency	Int	The minimum frequency for a mutation to pass QC	0.1	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_min_read_support	Int	The minimum read support for a mutation to pass QC	10	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_operator	String	Fills the "operator" field in the tbp_parser output files	Operator not provided	Optional	FASTA, ONT, PE, SE
merlin_magic	tbp_parser_output_seq_method_type	String	Fills out the "seq_method" field in the tbp_parser output files	Sequencing method not provided	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_additional_outputs	Boolean	If set to "true", activates the tbp_parser module and results in more outputs, including tbp_parser_looker_report_csv, tbp_parser_laboratorian_report_csv, tbp_parser_lims_report_csv, tbp_parser_coverage_report, and tbp_parser_genome_percent_coverage	FALSE	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_cov_frac_threshold	Int	A cutoff used to calculate the fraction of the region covered by ≤ this value	1	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_custom_db	File	TBProfiler uses by default the TBDB database; if you have a custom database you wish to use, you must provide a custom database in this field and set tbprofiler_run_custom_db to true		Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/tbprofiler:4.4.2	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_mapper	String	The mapping tool used in TBProfiler to align the reads to the reference genome; see TBProfiler’s original documentation for available options.	bwa	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_min_af	Float	The minimum allele frequency to call a variant	0.1	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_min_af_pred	Float	The minimum allele frequency to use a variant for resistance prediction	0.1	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_min_depth	Int	The minimum depth for a variant to be called.	10	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_run_custom_db	Boolean	TBProfiler uses by default the TBDB database; if you have a custom database you wish to use, you must set this value to true and provide a custom database in the tbprofiler_custom_db field	FALSE	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_variant_caller	String	Select a different variant caller for TBProfiler to use by writing it in this block; see TBProfiler’s original documentation for available options.	freebayes	Optional	FASTA, ONT, PE, SE
merlin_magic	tbprofiler_variant_calling_params	String	Enter additional variant calling parameters in this free text input to customize how the variant caller works in TBProfiler	None	Optional	FASTA, ONT, PE, SE
merlin_magic	theiaeuk	Boolean	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
merlin_magic	virulencefinder_coverage_threshold	Float	The threshold for minimum coverage		Optional	FASTA, ONT, PE, SE
merlin_magic	virulencefinder_database	String	The specific database to use	virulence_ecoli	Optional	FASTA, ONT, PE, SE
merlin_magic	virulencefinder_docker_image	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/virulencefinder:2.0.4	Optional	FASTA, ONT, PE, SE
merlin_magic	virulencefinder_identity_threshold	Float	The threshold for minimum blast identity		Optional	FASTA, ONT, PE, SE
nanoplot_clean	cpu	Int	Number of CPUs to allocate to the task	4	Optional	ONT
nanoplot_clean	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	ONT
nanoplot_clean	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0	Optional	ONT
nanoplot_clean	max_length	Int	Maximum read length for nanoplot	100000	Optional	ONT
nanoplot_clean	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	ONT
nanoplot_raw	cpu	Int	Number of CPUs to allocate to the task	4	Optional	ONT
nanoplot_raw	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	ONT
nanoplot_raw	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0	Optional	ONT
nanoplot_raw	max_length	Int	Maximum read length for nanoplot	100000	Optional	ONT
nanoplot_raw	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	ONT
plasmidfinder	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
plasmidfinder	database	String	User-specified database		Optional	FASTA, ONT, PE, SE
plasmidfinder	database_path	String	Path to user-specified database		Optional	FASTA, ONT, PE, SE
plasmidfinder	disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional	FASTA, ONT, PE, SE
plasmidfinder	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/plasmidfinder:2.1.6	Optional	FASTA, ONT, PE, SE
plasmidfinder	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
plasmidfinder	method_path	String	Path to files for a user-specified method to use (blast or kma)		Optional	FASTA, ONT, PE, SE
plasmidfinder	min_cov	Float	Threshold for minimum coverage, default threshold from PlasmidFinder CLI tool is used (0.60)	0.6	Optional	FASTA, ONT, PE, SE
plasmidfinder	threshold	Float	Threshold for mininum blast identity, default threshold from PlasmidFinder CLI tool is used (0.90). This default differs from the default of the PlasmidFinder webtool (0.95)	0.9	Optional	FASTA, ONT, PE, SE
prokka	compliant	Boolean	Forces Genbank/ENA/DDJB compliant headers in Prokka output files	TRUE	Optional	FASTA, ONT, PE, SE
prokka	cpu	Int	Number of CPUs to allocate to the task	8	Optional	FASTA, ONT, PE, SE
prokka	disk_size	String	Amount of storage (in GB) to allocate to the PlasmidFinder task	100	Optional	FASTA, ONT, PE, SE
prokka	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/prokka:1.14.5	Optional	FASTA, ONT, PE, SE
prokka	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	FASTA, ONT, PE, SE
prokka	prodigal_tf	File	https://github.com/tseemann/prokka#option---prodigaltf		Optional	FASTA, ONT, PE, SE
prokka	prokka_arguments	String	Any additional https://github.com/tseemann/prokka#command-line-options		Optional	FASTA, ONT, PE, SE
prokka	proteins	Boolean	FASTA file of trusted proteins for Prokka to first use for annotations	FALSE	Optional	FASTA, ONT, PE, SE
qc_check_task	assembly_length_unambiguous	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	assembly_mean_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	combined_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	combined_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	combined_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	combined_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	cpu	Int	Number of CPUs to allocate to the task	4	Optional	FASTA, ONT, PE, SE
qc_check_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
qc_check_task	docker	String	The Docker container to use for the task	"us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16"	Optional	FASTA, ONT, PE, SE
qc_check_task	est_coverage_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	est_coverage_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	kraken_human	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	kraken_human_dehosted	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	kraken_sc2	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	kraken_sc2_dehosted	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	kraken_target_organism	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	kraken_target_organism_dehosted	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	meanbaseq_trim	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
qc_check_task	midas_secondary_genus_abundance	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
qc_check_task	midas_secondary_genus_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT
qc_check_task	num_reads_clean1	Int	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	num_reads_clean2	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	num_reads_raw1	Int	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	num_reads_raw2	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	number_Degenerate	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	number_N	Int	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	percent_reference_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	r1_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	r1_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	r1_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	r1_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA
qc_check_task	r2_mean_q_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	r2_mean_q_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	r2_mean_readlength_clean	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	r2_mean_readlength_raw	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, SE
qc_check_task	sc2_s_gene_mean_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	sc2_s_gene_percent_coverage	Float	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
qc_check_task	vadr_num_alerts	String	Internal component, do not modify		Do not modify, Optional	FASTA, ONT, PE, SE
quast	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
quast	disk_size	String	Amount of storage (in GB) to allocate to the Quast task	100	Optional	FASTA, ONT, PE, SE
quast	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/quast:5.0.2	Optional	FASTA, ONT, PE, SE
quast	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional	FASTA, ONT, PE, SE
quast	min_contig_length	Int	Lower threshold for a contig length in bp. Shorter contigs won’t be taken into account	500	Optional	FASTA, ONT, PE, SE
raw_check_reads	cpu	Int	Number of CPUs to allocate to the task	2	Optional	ONT, PE, SE
raw_check_reads	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	ONT, PE, SE
raw_check_reads	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2	Optional	ONT, PE, SE
raw_check_reads	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional	ONT, PE, SE
raw_check_reads	organism	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
raw_check_reads	workflow_series	String	Internal component, do not modify		Do not modify, Optional	ONT, PE, SE
read_QC_trim	adapters	File	A file containing the sequence of the adapters used during library preparation, used in the BBDuk task		Optional	PE, SE
read_QC_trim	bbduk_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	PE, SE
read_QC_trim	call_kraken	Boolean	Set to true to launch Kraken2; if true, you must provide a kraken_db	FALSE	Optional	ONT, PE, SE
read_QC_trim	call_midas	Boolean	Set to true to launch Midas	TRUE	Optional	PE, SE
read_QC_trim	downsampling_coverage	Float	The depth to downsample to with Rasusa	150	Optional	ONT
read_QC_trim	fastp_args	String	Additional arguments to pass to fastp	-g -5 20 -3 20	Optional	SE
read_QC_trim	fastp_args	String	Additional arguments to pass to fastp	"--detect_adapter_for_pe -g -5 20 -3 20	Optional	PE
read_QC_trim	kraken_cpu	Int	Number of CPUs to allocate to the task	4	Optional	ONT, PE, SE
read_QC_trim	kraken_db	File	Kraken2 database file; must be provided in call_kraken is true		Optional	ONT, PE, SE
read_QC_trim	kraken_disk_size	Int	GB of storage to request for VM used to run the kraken2 task. Increase this when using large (>30GB kraken2 databases such as the "k2_standard" database)	100	Optional	ONT, PE, SE
read_QC_trim	kraken_memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	ONT, PE, SE
read_QC_trim	max_length	Int	Internal component, do not modify		Do not modify, Optional	ONT
read_QC_trim	midas_db	File	Midas database file	gs://theiagen-large-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz	Optional	PE, SE
read_QC_trim	min_length	Int	Internal component, do not modify		Do not modify, Optional	ONT
read_QC_trim	phix	File	A file containing the phix used during Illumina sequencing; used in the BBDuk task		Optional	PE, SE
read_QC_trim	read_processing	String	Read trimming software to use, either "trimmomatic" or "fastp"	trimmomatic	Optional	PE, SE
read_QC_trim	read_qc	String	Allows the user to decide between fastq_scan (default) and fastqc for the evaluation of read quality.	fastq_scan	Optional	PE, SE
read_QC_trim	run_prefix	String	Internal component, do not modify		Do not modify, Optional	ONT
read_QC_trim	target_organism	String	This string is searched for in the kraken2 outputs to extract the read percentage		Optional	ONT, PE, SE
read_QC_trim	trimmomatic_args	String	Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data.	-phred33	Optional	PE, SE
resfinder_task	acquired	Boolean	Set to true to tell ResFinder to identify acquired resistance genes	TRUE	Optional	FASTA, ONT, PE, SE
resfinder_task	call_pointfinder	Boolean	Set to true to enable detection of point mutations.	FALSE	Optional	FASTA, ONT, PE, SE
resfinder_task	cpu	Int	Number of CPUs to allocate to the task	2	Optional	FASTA, ONT, PE, SE
resfinder_task	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	FASTA, ONT, PE, SE
resfinder_task	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/resfinder:4.1.11	Optional	FASTA, ONT, PE, SE
resfinder_task	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional	FASTA, ONT, PE, SE
resfinder_task	min_cov	Float	Minimum coverage breadth of a gene for it to be identified	0.5	Optional	FASTA, ONT, PE, SE
resfinder_task	min_id	Float	Minimum identity for ResFinder to identify a gene	0.9	Optional	FASTA, ONT, PE, SE
shovill_pe	assembler	String	Assembler to use (spades, skesa, velvet or megahit), see https://github.com/tseemann/shovill#--assembler	skesa	Optional	PE
shovill_pe	assembler_options	String	Assembler-specific options that you might choose, see https://github.com/tseemann/shovill#--opts		Optional	PE
shovill_pe	cpu	Int	Number of CPUs to allocate to the task	4	Optional	PE
shovill_pe	depth	Int	User specified depth of coverage for downsampling (see https://github.com/tseemann/shovill#--depth and https://github.com/tseemann/shovill#main-steps)	150	Optional	PE
shovill_pe	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	PE
shovill_pe	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/shovill:1.1.0	Optional	PE
shovill_pe	kmers	String	User-specified Kmer length to override choice made by Shovill, see https://github.com/tseemann/shovill#--kmers	Auto	Optional	PE
shovill_pe	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	PE
shovill_pe	min_contig_length	Int	Minimum contig length to keep in final assembly	200	Optional	PE
shovill_pe	min_coverage	Float	Minimum contig coverage to keep in final assembly	2	Optional	PE
shovill_pe	nocorr	Boolean	Disable correction of minor assembly errors by Shovill (see https://github.com/tseemann/shovill#main-steps)	FALSE	Optional	PE
shovill_pe	noreadcorr	Boolean	Disable correction of sequencing errors in reads by Shovill (seehttps://github.com/tseemann/shovill#main-steps)	FALSE	Optional	PE
shovill_pe	nostitch	Boolean	Disable read stitching by Shovill (see https://github.com/tseemann/shovill#main-steps)	FALSE	Optional	PE
shovill_pe	trim	Boolean	Enable adaptor trimming (see https://github.com/tseemann/shovill#main-steps)	FALSE	Optional	PE
shovill_se	assembler	String	Assembler to use (spades, skesa, velvet or megahit), see https://github.com/tseemann/shovill#--assembler	skesa	Optional	SE
shovill_se	assembler_options	String	Assembler-specific options that you might choose, see https://github.com/tseemann/shovill#--opts		Optional	SE
shovill_se	cpu	Int	Number of CPUs to allocate to the task	4	Optional	SE
shovill_se	depth	Int	User specified depth of coverage for downsampling (see https://github.com/tseemann/shovill#--depth and https://github.com/tseemann/shovill#main-steps)	150	Optional	SE
shovill_se	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional	SE
shovill_se	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/shovill:1.1.0	Optional	SE
shovill_se	kmers	String	User-specified Kmer length to override choice made by Shovill, see https://github.com/tseemann/shovill#--kmers	auto	Optional	SE
shovill_se	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	16	Optional	SE
shovill_se	min_contig_length	Int	Minimum contig length to keep in final assembly	200	Optional	SE
shovill_se	min_coverage	Float	Minimum contig coverage to keep in final assembly	2	Optional	SE
shovill_se	nocorr	Boolean	Disable correction of minor assembly errors by Shovill (see https://github.com/tseemann/shovill#main-steps)	FALSE	Optional	SE
shovill_se	noreadcorr	Boolean	Disable correction of sequencing errors in reads by Shovill (seehttps://github.com/tseemann/shovill#main-steps)	FALSE	Optional	SE
shovill_se	trim	Boolean	Enable adaptor trimming (see https://github.com/tseemann/shovill#main-steps)	FALSE	Optional	SE
ts_mlst	cpu	Int	Number of CPUs to allocate to the task	1	Optional	FASTA, ONT, PE, SE
ts_mlst	disk_size	Int	Amount of storage (in GB) to allocate to the task	50	Optional	FASTA, ONT, PE, SE
ts_mlst	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/staphb/mlst:2.23.0-2024-08-01	Optional	FASTA, ONT, PE, SE
ts_mlst	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional	FASTA, ONT, PE, SE
ts_mlst	mincov	Float	Minimum % breadth of coverage to report an MLST allele	10	Optional	FASTA, ONT, PE, SE
ts_mlst	minid	Float	Minimum % identity to known MLST gene to report an MLST allele	95	Optional	FASTA, ONT, PE, SE
ts_mlst	minscore	Float	Minimum https://github.com/tseemann/mlst#scoring-system to assign an MLST profile	50	Optional	FASTA, ONT, PE, SE
ts_mlst	nopath	Boolean	true = use mlst --nopath. If set to false, filename paths are not stripped from FILE column in output TSV	TRUE	Optional	FASTA, ONT, PE, SE
ts_mlst	scheme	String	Don’t autodetect the MLST scheme; force this scheme on all inputs (see https://www.notion.so/TheiaProk-Workflow-Series-68c34aca2a0240ef94fef0acd33651b9?pvs=21 for accepted strings)	None	Optional	FASTA, ONT, PE, SE
version_capture	docker	String	The Docker container to use for the task	"us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0"	Optional	FASTA, ONT, PE, SE
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)	FASTA, ONT, PE, SE

Skip Characterization

Ever wanted to skip characterization? Now you can! Set the optional input perform_characterization to false to only generate an assembly and run assembly QC.

Core Tasks (performed for all taxa)¶

versioning: Version Capture for TheiaProk

The versioning task captures the workflow version from the GitHub (code repository) version.

Version Capture Technical details

	Links
Task	task_versioning.wdl

screen: Total Raw Read Quantification and Genome Size Estimation

The screen task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses bash commands for quantification of reads and base pairs, and mash sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples that do not meet these criteria will not be processed further by the workflow:

Total number of reads: A sample will fail the read screening task if its total number of reads is less than or equal to min_reads.
The proportion of basepairs reads in the forward and reverse read files: A sample will fail the read screening if fewer than min_proportion basepairs are in either the reads1 or read2 files.
Number of basepairs: A sample will fail the read screening if there are fewer than min_basepairs basepairs
Estimated genome size: A sample will fail the read screening if the estimated genome size is smaller than min_genome_size or bigger than max_genome_size.
Estimated genome coverage: A sample will fail the read screening if the estimated genome coverage is less than the min_coverage.

Read screening is undertaken on both the raw and cleaned reads. The task may be skipped by setting the skip_screen variable to true.

Default values vary between the PE and SE workflow. The rationale for these default values can be found below. If two default values are shown, the first is for Illumina workflows and the second is for ONT.

Variable	Default Value	Rationale
`skip_screen`	false	Set to false to avoid waste of compute resources processing insufficient data
`min_reads`	7472 or 5000	Calculated from the minimum number of base pairs required for 20x coverage of Nasuia deltocephalinicola genome, the smallest known bacterial genome as of 2019-08-07 (112,091 bp), divided by 300 (the longest Illumina read length) or 5000 (estimate of ONT read length)
`min_basepairs`	2241820	Should be greater than 20x coverage of Nasuia deltocephalinicola, the smallest known bacterial genome (112,091 bp)
`min_genome_length`	100000	Based on the Nasuia deltocephalinicola genome - the smallest known bacterial genome (112,091 bp)
`max_genome_length`	18040666	Based on the Minicystis rosea genome, the biggest known bacterial genome (16,040,666 bp), plus an additional 2 Mbp to cater for potential extra genomic material
`min_coverage`	10 or 5	A bare-minimum average per base coverage across the genome required for genome characterization. Note, a higher per base coverage coverage would be required for high-quality phylogenetics.
`min_proportion`	40	Neither read1 nor read2 files should have less than 40% of the total number of reads. For paired-end data only

Screen Technical Details

There is a single WDL task for read screening that contains two separate sub-tasks, one used for PE data and the other for SE data. The screen task is run twice, once for raw reads and once for clean reads.

	TheiaProk_Illumina_PE	TheiaProk_Illumina_SE and TheiaProk_ONT
Task	task_screen.wdl (PE sub-task)	task_screen.wdl (SE sub-task)

Illumina Data Core Tasks¶

read_QC_trim: Read Quality Trimming, Adapter Removal, Quantification, and Identification

read_QC_trim is a sub-workflow within TheiaMeta that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below.

Read quality trimming

Either trimmomatic or fastp can be used for read-quality trimming. Trimmomatic is used by default. Both tools trim low-quality regions of reads with a sliding window (with a window size of trim_window_size), cutting once the average quality within the window falls below trim_quality_trim_score. They will both discard the read if it is trimmed below trim_minlen.

If fastp is selected for analysis, fastp also implements the additional read-trimming steps indicated below:

Parameter	Explanation
-g	enables polyG tail trimming
-5 20	enables read end-trimming
-3 20	enables read end-trimming
--detect_adapter_for_pe	enables adapter-trimming only for paired-end reads

Adapter removal

The BBDuk task removes adapters from sequence reads. To do this:

Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files.
BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.

What are adapters and why do they need to be removed?

Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.

Read Quantification

There are two methods for read quantification to choose from: fastq-scan (default) or fastqc. Both quantify the forward and reverse reads in FASTQ files. In TheiaProk_Illumina_PE, they also provide the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads. fastqc also provides a graphical visualization of the read quality.

Read Identification (optional)

The MIDAS task is for the identification of reads to detect contamination with non-target taxa. This task is optional and turned off by default. It can be used by setting the call_midas input variable to true.

The MIDAS tool was originally designed for metagenomic sequencing data but has been co-opted for use with bacterial isolate WGS methods. It can be used to detect contamination present in raw sequencing data by estimating bacterial species abundance in bacterial isolate WGS data. If a secondary genus is detected above a relative frequency of 0.01 (1%), then the sample should fail QC and be investigated further for potential contamination.

This task is similar to those used in commercial software, BioNumerics, for estimating secondary species abundance.

How are the MIDAS output columns determined?

Example MIDAS report in the midas_report column:

species_id	count_reads	coverage	relative_abundance
Salmonella_enterica_58156	3309	89.88006645	0.855888033
Salmonella_enterica_58266	501	11.60606061	0.110519371
Salmonella_enterica_53987	99	2.232896237	0.021262881
Citrobacter_youngae_61659	46	0.995216227	0.009477003
Escherichia_coli_58110	5	0.123668877	0.001177644

MIDAS report column descriptions:

species_id: species identifier
count_reads: number of reads mapped to marker genes
coverage: estimated genome-coverage (i.e. read-depth) of species in metagenome
relative_abundance: estimated relative abundance of species in metagenome

The value in the midas_primary_genus column is derived by ordering the rows in order of "relative_abundance" and identifying the genus of top species in the "species_id" column (Salmonella). The value in the midas_secondary_genus column is derived from the genus of the second-most prevalent genus in the "species_id" column (Citrobacter). The midas_secondary_genus_abundance column is the "relative_abundance" of the second-most prevalent genus (0.009477003). The midas_secondary_genus_coverage is the "coverage" of the second-most prevalent genus (0.995216227).

Alternatively to MIDAS, the Kraken2 task can also be turned on through setting the call_kraken input variable as true for the identification of reads to detect contamination with non-target taxa.

Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate) whole genome sequence data. A database must be provided if this optional module is activated, through the kraken_db optional input. A list of suggested databases can be found on Kraken2 standalone documentation.

read_QC_trim Technical Details

	Links
Sub-workflow	wf_read_QC_trim.wdl
Tasks	task_fastp.wdl task_trimmomatic.wdl task_bbduk.wdl task_fastq_scan.wdl task_midas.wdl task_kraken2.wdl
Software Source Code	fastp; Trimmomatic; fastq-scan; MIDAS; Kraken2
Software Documentation	fastp; Trimmomatic; BBDuk; fastq-scan; MIDAS; Kraken2
Original Publication(s)	Trimmomatic: a flexible trimmer for Illumina sequence data fastp: an ultra-fast all-in-one FASTQ preprocessor An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography Improved metagenomic analysis with Kraken 2

CG-Pipeline: Assessment of Read Quality, and Estimation of Genome Coverage

Thecg_pipeline task generates metrics about read quality and estimates the coverage of the genome using the "run_assembly_readMetrics.pl" script from CG-Pipeline. The genome coverage estimates are calculated using both using raw and cleaned reads, using either a user-provided genome_size or the estimated genome length generated by QUAST.

CG-Pipeline Technical Details

The cg_pipeline task is run twice in TheiaProk, once with raw reads, and once with clean reads.

	Links
Task	task_cg_pipeline.wdl
Software Source Code	CG-Pipeline on GitHub
Software Documentation	CG-Pipeline on GitHub
Original Publication(s)	A computational genomics pipeline for prokaryotic sequencing projects

shovill: De novo Assembly

De Novo assembly will be undertaken only for samples that have sufficient read quantity and quality, as determined by the screen task assessment of clean reads.

In TheiaEuk, assembly is performed using the Shovill pipeline. This undertakes the assembly with one of four assemblers (SKESA (default), SPAdes, Velvet, Megahit), but also performs a number of pre- and post-processing steps to improve the resulting genome assembly. Shovill uses an estimated genome size (see here). If this is not provided by the user as an optional input, Shovill will estimate the genome size using mash. Adaptor trimming can be undertaken with Shovill by setting the trim option to "true", but this is set to "false" by default as alternative adapter trimming is undertaken in the TheiaEuk workflow.

What is de novo assembly?

De novo assembly is the process or product of attempting to reconstruct a genome from scratch (without prior knowledge of the genome) using sequence reads. Assembly of fungal genomes from short-reads will produce multiple contigs per chromosome rather than a single contiguous sequence for each chromosome.

Shovill Technical Details

	Links
TheiaProk WDL Task	task_shovill.wdl
Software code repository and documentation	Shovill on GitHub

ONT Data Core Tasks¶

read_QC_trim_ont: Read Quality Trimming, Quantification, and Identification

read_QC_trim_ont is a sub-workflow within TheiaProk_ONT that filters low-quality reads and trims low-quality regions of reads. It uses several tasks, described below.

Estimated genome length:

By default, an estimated genome length is set to 5 Mb, which is around 0.7 Mb higher than the average bacterial genome length, according to the information collated here. This estimate can be overwritten by the user, and is used by RASUSA and dragonflye.

Plotting and quantifying long-read sequencing data: nanoplot

Nanoplot is used for the determination of mean quality scores, read lengths, and number of reads. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.

Read subsampling: Samples are automatically randomly subsampled to 150X coverage using RASUSA.

Plasmid prediction: tiptoft is used to predict plasmid sequences directly from uncorrected long-read data. Plasmids are identified using replicon sequences used for typing from PlasmidFinder.

Read filtering: Reads are filtered by length and quality using nanoq. By default, sequences with less than 500 basepairs and quality score lower than 10 are filtered out to improve assembly accuracy.

read_QC_trim_ont Technical Details

TheiaProk_ONT calls a sub-workflow listed below, which then calls the individual tasks:

Workflow	TheiaProk_ONT
Sub-workflow	wf_read_QC_trim_ont.wdl
Tasks	task_nanoplot.wdl task_fastq_scan.wdl task_rasusa.wdl task_nanoq.wdl task_tiptoft.wdl
Software Source Code	fastq-scan, NanoPlot, RASUSA, tiptoft, nanoq
Original Publication(s)	NanoPlot paper RASUSA paper Nanoq Paper Tiptoft paper

dragonflye: De novo Assembly

dragonflye Technical Details

	Links
Task	task_dragonflye.wdl
Software Source Code	dragonflye on GitHub
Software Documentation	dragonflye on GitHub

Post-Assembly Tasks (performed for all taxa)¶

quast: Assembly Quality Assessment

QUAST stands for QUality ASsessment Tool. It evaluates genome/metagenome assemblies by computing various metrics without a reference being necessary. It includes useful metrics such as number of contigs, length of the largest contig and N50.

QUAST Technical Details

	Links
Task	task_quast.wdl
Software Source Code	QUAST on GitHub
Software Documentation	https://cab.spbu.ru/software/quast/
Original Publication(s)	QUAST: quality assessment tool for genome assemblies

BUSCO: Assembly Quality Assessment

BUSCO (Benchmarking Universal Single-Copy Orthologue) attempts to quantify the completeness and contamination of an assembly to generate quality assessment metrics. It uses taxa-specific databases containing genes that are all expected to occur in the given taxa, each in a single copy. BUSCO examines the presence or absence of these genes, whether they are fragmented, and whether they are duplicated (suggestive that additional copies came from contaminants).

BUSCO notation

Here is an example of BUSCO notation: C:99.1%[S:98.9%,D:0.2%],F:0.0%,M:0.9%,n:440. There are several abbreviations used in this output:

Complete (C) - genes are considered "complete" when their lengths are within two standard deviations of the BUSCO group mean length.
Single-copy (S) - genes that are complete and have only one copy.
Duplicated (D) - genes that are complete and have more than one copy.
Fragmented (F) - genes that are only partially recovered.
Missing (M) - genes that were not recovered at all.
Number of genes examined (n) - the number of genes examined.

A high equity assembly will use the appropriate database for the taxa, have high complete (C) and single-copy (S) percentages, and low duplicated (D), fragmented (F) and missing (M) percentages.

BUSCO Technical Details

	Links
Task	task_busco.wdl
Software Source Code	BUSCO on GitLab
Software Documentation	https://busco.ezlab.org/
Orginal publication	BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

MUMmer_ANI: Average Nucleotide Identity (optional)

Average Nucleotide Identity (ANI) is a useful approach for taxonomic identification. The higher the percentage ANI of a query sequence to a given reference genome, the more likely the sequence is the same taxa as the reference.

ANI is calculated in TheiaProk using a perl script written by Lee Katz (ani-m.pl). This uses MUMmer to rapidly align entire query assemblies to one or more reference genomes. By default, TheiaProk uses a set of 43 reference genomes in RGDv2, a database containing genomes of enteric pathogens commonly sequenced by CDC EDLB & PulseNet participating laboratories. The user may also provide their own reference genome. After genome alignment with MUMmer, ani-m.pl calculates the average nucleotide identity and percent bases aligned between 2 genomes (query and reference genomes)

The default database of reference genomes used is called "Reference Genome Database version 2" AKA "RGDv2". This database is composed of 43 enteric bacteria representing 32 species and is intended for identification of enteric pathogens and common contaminants. It contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp.

2 Thresholds are utilized to prevent false positive hits. The ani_top_species_match will only report a genus & species match if both thresholds are surpassed. Both of these thresholds are set to match those used in BioNumerics for PulseNet organisms.

ani_threshold default value of 80.0
percent_bases_aligned_threshold default value of 70.0

For more information on RGDv2 database of reference genomes, please see the publication here.

MUMmer_ANI Technical Details

	Links
Task	task_mummer_ani.wdl
Software Source Code	ani-m, MUMmer
Software Documentation	ani-m, MUMmer
Original Publication(s)	MUMmer4: A fast and versatile genome alignment system
Publication about RGDv2 database	https://www.frontiersin.org/articles/10.3389/fmicb.2023.1225207/full

GAMBIT: Taxon Assignment

GAMBIT determines the taxon of the genome assembly using a k-mer based approach to match the assembly sequence to the closest complete genome in a database, thereby predicting its identity. Sometimes, GAMBIT can confidently designate the organism to the species level. Other times, it is more conservative and assigns it to a higher taxonomic rank.

For additional details regarding the GAMBIT tool and a list of available GAMBIT databases for analysis, please consult the GAMBIT tool documentation.

GAMBIT Technical Details

	Links
Task	task_gambit.wdl
Software Source Code	GAMBIT on GitHub
Software Documentation	GAMBIT ReadTheDocs
Original Publication(s)	GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification

KmerFinder: Taxon Assignment (optional)

The KmerFinder method predicts prokaryotic species based on the number of overlapping (co-occurring) k-mers, i.e., 16-mers, between the query genome and genomes in a reference database.

KmerFinder Technical Details

	Links
Task	task_kmerfinder.wdl
Software Source Code	https://bitbucket.org/genomicepidemiology/kmerfinder
Software Documentation	https://cge.food.dtu.dk/services/KmerFinder/instructions.php
Original Publication(s)	Benchmarking of Methods for Genomic Taxonomy

AMRFinderPlus: AMR Genotyping (default)

NCBI's AMRFinderPlus is the default antimicrobial resistance (AMR) detection tool used in TheiaProk. ResFinder may be used alternatively and if so, AMRFinderPlus is not run.

AMRFinderPlus identifies acquired antimicrobial resistance (AMR) genes, virulence genes, and stress genes. Such AMR genes confer resistance to antibiotics, metals, biocides, heat, or acid. For some taxa (see here), AMRFinderPlus will provide taxa-specific results including filtering out genes that are almost ubiquitous in the taxa (intrinsic genes) and identifying resistance-associated point mutations. In TheiaProk, the taxon used by AMRFinderPlus is specified based on the gambit_predicted_taxon or a user-provided expected_taxon.

You can check if a gene or point mutation is in the AMRFinderPlus database here, find the sequences of reference genes here, and search the query Hidden Markov Models (HMMs) used by AMRFinderPlus to identify AMR genes and some stress and virulence proteins (here). The AMRFinderPlus database is updated frequently. You can ensure you are using the most up-to-date version by specifying the docker image as a workflow input. You might like to save this docker image as a workspace data element to make this easier.

AMRFinderPlus Technical Details

	Links
Task	task_amrfinderplus.wdl
Software Source Code	amr on GitHub
Software Documentation	https://github.com/ncbi/amr/wiki
Original Publication(s)	AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence

ResFinder: AMR Genotyping & Shigella XDR phenotype prediction (alternative)

The ResFinder task is an alternative to using AMRFinderPlus for detection and identification of AMR genes and resistance-associated mutations.

This task runs the Centre for Genomic Epidemiology (CGE) ResFinder tool to identify acquired antimicrobial resistance. It can also run the CGE PointFinder tool if the call_pointfinder variable is set with to true. The databases underlying the task are different to those used by AMRFinderPlus.

The default thresholds for calling AMR genes are 90% identity and 50% coverage of the reference genes (expressed as a fraction in workflow inputs: 0.9 & 0.5). These are the same thresholds utilized in BioNumerics for calling AMR genes.

Organisms currently support by PointFinder for mutational-based predicted resistance:

Campylobacter coli & C. jejuni
Enterococcus faecalis
Enterococcus faecium
Escherichia coli & Shigella spp.
Helicobacter pylori
Neisseria gonorrhoeae
Klebsiella
Mycobacterium tuberculosis
Salmonella spp.
Staphylococcus aureus

XDR Shigella prediction

The ResFinder Task also has the ability to predict whether or not a sample meets the CDC's definition for extensively drug-resistant (XDR) Shigella.

CDC defines XDR Shigella bacteria as strains that are resistant to all commonly recommended empiric and alternative antibiotics — azithromycin, ciprofloxacin, ceftriaxone, trimethoprim-sulfamethoxazole (TMP-SMX), and ampicillin. Link to CDC HAN where this definition is found.

A sample is required to meet all 7 criteria in order to be predicted as XDR Shigella

The GAMBIT task in the workflow must identify the sample as Shigella OR the user must input the word Shigella somewhere within the input String variable called expected_taxon. This requirement serves as the identification of a sample to be of the Shigella genus.
Resfinder or PointFinder predicted resistance to Ampicillin
Resfinder or PointFinder predicted resistance to Azithromycin
Resfinder or PointFinder predicted resistance to Ciprofloxacin
Resfinder or PointFinder predicted resistance to Ceftriazone
Resfinder or PointFinder predicted resistance to Trimethoprim
Resfinder or PointFinder predicted resistance to Sulfamethoxazole

There are 3 potential outputs for the resfinder_predicted_xdr_shigella output string:

Not Shigella based on gambit_predicted_taxon or user input
Not XDR Shigella for samples identified as Shigella by GAMBIT or user input BUT does ResFinder did not predict resistance to all 6 drugs in XDR definition
XDR Shigella meaning the sample was identified as Shigella and ResFinder/PointFinder did predict resistance to ceftriazone, azithromycin, ciprofloxacin, trimethoprim, sulfamethoxazole, and ampicillin.

ResFinder Technical Details

	Links
Task	task_resfinder.wdl
Software Source Code	https://bitbucket.org/genomicepidemiology/resfinder/src/master/
Software Documentation	https://bitbucket.org/genomicepidemiology/resfinder/src/master/
ResFinder database	https://bitbucket.org/genomicepidemiology/resfinder_db/src/master/
PointFinder database	https://bitbucket.org/genomicepidemiology/pointfinder_db/src/master/
Web-server	https://cge.food.dtu.dk/services/ResFinder/
Original Publication(s)	ResFinder 4.0 for predictions of phenotypes from genotypes

TS_MLST: MLST Profiling

Multilocus sequence typing (MLST) is a typing method reflecting population structure. It was developed as a portable, unambiguous method for global epidemiology using PCR, but can be applied to whole-genome sequences in silico. MLST is commonly used for pathogen surveillance, ruling out transmission, and grouping related genomes for comparative analysis.

MLST schemes are taxa-specific. Each scheme uses fragments of typically 7 housekeeping genes ("loci") and has a database associating an arbitrary number with each distinct allele of each locus. Each unique combination of alleles ("allelic profile") is assigned a numbered sequence type (ST). Significant diversification of genomes is captured by changes to the MLST loci via mutational events creating new alleles and STs, or recombinational events replacing the allele and changing the ST. Relationships between STs are based on the number of alleles they share. Clonal complexes share a scheme-specific number of alleles (usually for five of the seven loci).

MLST Limitations

Some taxa have multiple MLST schemes, and some MLST schemes are insufficiently robust.

TheiaProk uses the MLST tool developed by Torsten Seeman to assess MLST using traditional PubMLST typing schemes.

Interpretation of MLST results

Each MLST results file returns the ST and allele results for one sample. If the alleles and ST are correctly assigned, only a single integer value will be present for each. If an ST cannot be assigned, multiple integers or additional characters will be shown, representing the issues with assignment as described here.

Identifying novel alleles and STs

The MLST schemes used in TheiaProk are curated on the PubMLST website.If you identify novel alleles or allelic profiles in your data using TheiaProk's MLST task, you can get these assigned via PubMLST:

Check that the novel allele or ST has not already been assigned a type on PubMLST.
1. Download the assembly file from Terra for your sample with the novel allele or ST
2. Go to the PubMLST webpage for the organism of interest
3. Navigate to the organism "Typing" page
4. Under "Query a sequence" choose "Single sequence" (e.g. this is the page for H. influenzae), select the MLST scheme under "Please select locus/scheme", upload the assembly fasta file, and click submit.
5. Results will be returned lower on the page.
If the allele or ST has not been typed previously on the PubMLST website (step 1), new allele or ST numbers can be assigned using instructions here.

Taxa with multiple MLST schemes

As default, the MLST tool automatically detects the genome's taxa to select the MLST scheme.

Some taxa have multiple MLST schemes, e.g. the Escherichia and Leptospira genera, Acinetobacter baumannii, Clostridium difficile and Streptococcus thermophilus. Only one scheme will be used by default.

Users may specify the scheme as an optional workflow input using the scheme variable of the "ts_mlst" task. Available schemes are listed here and the scheme name should be provided in quotation marks ("….").

If results from multiple MLST schemes are required for the same sample, TheiaProk can be run multiple times specifying non-default schemes. After the first run, output attributes for the workflow (i.e. output column names) must be amended to prevent results from being overwritten. Despite re-running the whole workflow, unmodified tasks will return cached outputs, preventing redundant computation.

TS_MLST Technical Details

	Links
Task	task_ts_mlst.wdl
Software Source Code	mlst
Software Documentation	mlst

Prokka: Assembly Annotation (default)

Assembly annotation is available via Prokka as default, or alternatively via Bakta. When Prokka annotation is used, Bakta is not.

Prokka is a prokaryotic genome annotation tool used to identify and describe features of interest within the genome sequence. Prokka annotates there genome by querying databases described here.

Prokka Technical Details

	Links
Task	task_prokka.wdl
Software Source Code	prokka
Software Documentation	prokka
Original Publication(s)	Prokka: rapid prokaryotic genome annotation

Bakta: Assembly Annotation (alternative)

Assembly annotation is available via Bakta as an alternative to Prokka. When Bakta annotation is used, Prokka is not.

Bakta is intended for annotation of Bacteria and plasmids only, and is best described here!

Bakta Technical Details

	Links
Task	task_bakta.wdl
Software Source Code	bakta
Software Documentation	https://github.com/oschwengers/bakta
Original Publication(s)	Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification

PlasmidFinder: Plasmid Identification

PlasmidFinder detects plasmids in totally- or partially-sequenced genomes, and identifies the closest plasmid type in the database for typing purposes.

What are plasmids?

Plasmids are double-stranded circular or linear DNA molecules that are capable of replication independently of the chromosome and may be transferred between different species and clones. Many plasmids contain resistance or virulence genes, though some do not clearly confer an advantage to their host bacterium.

PlasmidFinder Technical Details

	Links
Task	task_plasmidfinder.wdl
Software Source Code	https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/
Software Documentation	https://bitbucket.org/genomicepidemiology/plasmidfinder/src/master/
Original Publication(s)	In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing

QC_check: Check QC Metrics Against User-Defined Thresholds (optional)

The qc_check task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a qc_check_table .tsv file. If all QC metrics meet the threshold, the qc_check output variable will read QC_PASS. Otherwise, the output will read QC_NA if the task could not proceed or QC_ALERT followed by a string indicating what metric failed.

The qc_check task applies quality thresholds according to the sample taxa. The sample taxa is taken from the gambit_predicted_taxon value inferred by the GAMBIT module OR can be manually provided by the user using the expected_taxon workflow input.

Formatting the qc_check_table.tsv

The first column of the qc_check_table lists the taxa that the task will assess and the header of this column must be "taxon".
Any genus or species can be included as a row of the qc_check_table. However, these taxa must uniquely match the sample taxa, meaning that the file can include multiple species from the same genus (Vibrio_cholerae and Vibrio_vulnificus), but not both a genus row and species within that genus (Vibrio and Vibrio cholerae). The taxa should be formatted with the first letter capitalized and underscores in lieu of spaces.
Each subsequent column indicates a QC metric and lists a threshold for each taxa that will be checked. The column names must exactly match expected values, so we highly recommend copy and pasting from the template files below.

Template qc_check_table.tsv files

TheiaProk_Illumina_PE: theiaprok_illumina_pe_qc_check_template.tsv
TheiaProk_FASTA: theiaprok_fasta_qc_check_template.tsv

Example Purposes Only

QC threshold values shown are for example purposes only and should not be presumed to be sufficient for every dataset.

QC_Check Technical Details

	Links
Task	task_qc_check.wdl

Taxon Tables: Copy outputs to new data tables based on taxonomic assignment (optional)

The taxon_tables module, if enabled, will copy sample data to a different data table based on the taxonomic assignment. For example, if an E. coli sample is analyzed, the module will copy the sample data to a new table for E. coli samples or add the sample data to an existing table.

To implement the taxon_tables module, provide a file indicating data table names to copy samples of each taxa to in the taxon_tables input variable. No other input variables are needed.

Formatting the taxon_tables file

The taxon_tables file must be uploaded a Google storage bucket that is accessible by Terra and should be in the format below. Briefly, the bacterial genera or species should be listed in the leftmost column with the name of the data table to copy samples of that taxon to in the rightmost column.

taxon	taxon_table
Listeria_monocytogenes	lmonocytogenes_specimen
Salmonella	salmonella_specimen
Escherichia	ecoli_specimen
Shigella	shigella_specimen
Streptococcus	strep_pneumo_specimen
Legionella	legionella_specimen
Klebsiella	klebsiella_specimen
Mycobacterium	mycobacterium_specimen
Acinetobacter	acinetobacter_specimen
Pseudomonas	pseudomonas_specimen
Staphylococcus	staphyloccus_specimen
Neisseria	neisseria_specimen

There are no output columns for the taxon table task. The only output of the task is that additional data tables will appear for in the Terra workspace for samples matching a taxa in the taxon_tables file.

Abricate: Mass screening of contigs for antimicrobial and virulence genes (optional)

The abricate module, if enabled, will run abricate with the database defined in abricate_db to perform mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: NCBI, CARD, ARG-ANNOT, Resfinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB. It only detects acquired resistance genes, NOT point mutations

Taxa-Specific Tasks¶

The TheiaProk workflows automatically activate taxa-specific sub-workflows after the identification of relevant taxa using GAMBIT. Alternatively, the user can provide the expected taxa in the expected_taxon workflow input to override the taxonomic assignment made by GAMBIT. Modules are launched for all TheiaProk workflows unless otherwise indicated.

Acinetobacter baumannii

Acinetobacter baumannii¶

A number of approaches are available in TheiaProk for A. baumannii characterization.

Kaptive: Capsule and lipooligosaccharide outer core typing

The cell-surface capsular polysaccharide (CPS) of Acinetobacter baumannii can be used as an epidemiological marker. CPS varies in its composition and structure and is a key determinant in virulence and a target for non-antibiotic therapeutics. Specificity for non-antibiotic therapeutics (e.g. phage therapy) bear particular significance given the extent of antibiotic resistance found in this ESKAPE pathogen.

Biosynthesis and export of CPS is encoded by genes clustering at the K locus (KL). Additional genes associated with CPS biosynthesis and export are sometimes found in other chromosomal locations. The full combination of these genes is summarized as a "K type", described as a "predicted serotype associated with the best match locus". You can read more about this here.

Previously, serotyping of A. baumannii focused on a major immunogenic polysaccharide which was considered the O antigen for the species. This serotyping approach appears to no longer be used and the serotyping scheme has not been updated in over 20 years. Nonetheless, the O-antigen polysaccharide is attached to lipooligosaccharide, and the outer core (OC) of this lipooligosaccharide varies. Biosynthesis of the outer core lipooligosaccharide is encoded by a cluster of genes at the outer core (OC) locus.

Variation in the KL and OCL can be characterized with the Kaptive tool and its associated databases of numbered A. baumannii K and OC locus variants. Kaptive takes in a genome assembly file (fasta), and assigns the K and OC locus to their numbered variants, provides K type and a description of genes in the K or OC loci and elsewhere in the chromosome, alongside metrics for quality of locus match. A description of how Kaptive works, explanations of the full output reports which are provided in the Terra data table by TheiaProk and resources for interpreting outputs are available on the Kaptive Wiki page.

Kaptive Technical Details

	Links
Task	task_kaptive.wdl
Software Source Code	Kaptive on GitHub
Software Documentation	https://github.com/katholt/Kaptive/wiki
Orginal publications	Identification of Acinetobacter baumannii loci for capsular polysaccharide (KL) and lipooligosaccharide outer core (OCL) synthesis in genome assemblies using curated reference databases compatible with Kaptive An update to the database for Acinetobacter baumannii capsular polysaccharide locus typing extends the extensive and diverse repertoire of genes found at and outside the K locus

AcinetobacterPlasmidTyping: Acinetobacter plasmid detection

Acinetobacter plasmids are not included in the PlasmidFinder database. Instead, the AcinetobacterPlasmidTyping database contains variants of the plasmid rep gene for A. baumannii plasmid identification. When matched with >/= 95 % identity, this represents a typing scheme for Acinetobacter baumannii plasmids. In TheiaProk, we use the tool abricate to query our assemblies against this database.

The bioinformatics software for querying sample assemblies against the AcinetobacterPlasmidTyping database is Abricate. The WDL task simply runs abricate, and the Acinetobacter Plasmid database and default setting of 95% minimum identity are set in the merlin magic sub-workflow.

AcinetobacterPlasmidTyping Technical Details

	Links
Task	task_abricate.wdl
Database and documentation	https://github.com/MehradHamidian/AcinetobacterPlasmidTyping
Software Source Code and documentation	abricate on GitHub
Original Publication(s)	Detection and Typing of Plasmids in Acinetobacter baumannii Using rep Genes Encoding Replication Initiation Proteins

Acinetobacter MLST

Two MLST schemes are available for Acinetobacter. The Pasteur scheme is run by default, given significant problems with the Oxford scheme have been described. Should users with to alternatively or additionally use the Oxford MLST scheme, see the section above on MLST. The Oxford scheme is activated in TheiaProk with the MLST scheme input as "abaumannii".

blaOXA-51-like gene detection

The blaOXA-51-like genes, also known as oxaAB, are considered intrinsic to Acinetobacter baumannii but are not found in other Acinetobacter species. Identification of a blaOXA-51-like gene is therefore considered to confirm the species' identity as A. baumannii.

NCBI's AMRFinderPlus, which is implemented as a core module in TheiaProk, detects the blaOXA-51-like genes. This may be used to confirm the species, in addition to the GAMBIT taxon identification. The blaOXA-51-like genes act as carbapenemases when an ISAba1 is found 7 bp upstream of the gene. Detection of this IS is not currently undertaken in TheiaProk.

Escherichia or Shigella spp

Escherichia or Shigella spp¶

The Escherichia and Shigella genera are difficult to differentiate as they do not comply with genomic definitions of genera and species. Consequently, when either Escherichia or Shigella are identified by GAMBIT, all tools intended for these taxa are used.

SerotypeFinder and ECTyper are intended for analysis of E. coli. Both tools are used as there are occasional discrepancies between the serotypes predicted. This primarily arises due to differences in the databases used by each tool.

SerotypeFinder: Serotyping

SerotypeFinder, from the Centre for Genomic Epidemiology (CGE), identifies the serotype of total or partially-sequenced isolates of E. coli.

SerotypeFinder Technical Details

	Links
Task	task_serotypefinder.wdl
Software Source Code	https://bitbucket.org/genomicepidemiology/serotypefinder/src/master/
Software Documentation	https://bitbucket.org/genomicepidemiology/serotypefinder/src/master/
Original Publication(s)	Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data

ECTyper: Serotyping

ECTyper is a serotyping module for E. coli. In TheiaProk, we are using assembly files as input.

ECTyper Technical Details

	Links
Task	task_ectyper.wdl
Software Source Code	ECTyper on GitHub
Software Documentation	ECTyper on GitHub
Orginal publication	ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data

VirulenceFinder identifies virulence genes in total or partial sequenced isolates of bacteria. Currently, only E. coli is supported in TheiaProk workflows.

VirulenceFinder: Virulence gene identification

VirulenceFinder in TheiaProk is only run on assembly files due to issues regarding discordant results when using read files on the web application versus the command-line.

VirulenceFinder Technical Details

	Links
Task	task_virulencefinder.wdl
Software Source Code	VirulenceFinder
Software Documentation	VirulenceFinder
Original Publication(s)	Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia co

ShigaTyper and ShigEiFinder are intended for differentiation and serotype prediction for any Shigella species and Enteroinvasive Escherichia coli (EIEC). You can read about differences between these here and here. ShigEiFinder can be run using either the assembly (default) or reads. These tasks will report if the samples are neither Shigella nor EIEC.

ShigaTyper: Shigella/EIEC differentiation and serotyping for Illumina and ONT only

ShigaTyper predicts Shigella spp serotypes from Illumina or ONT read data. If the genome is not Shigella or EIEC, the results from this tool will state this. In the notes it provides, it also reports on the presence of ipaB which is suggestive of the presence of the "virulent invasion plasmid".

ShigaTyper Technical Details

	Links
Task	task_shigatyper.wdl
Software Source Code	ShigaTyper on GitHub
Software Documentation	https://github.com/CFSAN-Biostatistics/shigatyper
Origin publication	In Silico Serotyping Based on Whole-Genome Sequencing Improves the Accuracy of Shigella Identification

ShigEiFinder: Shigella/EIEC differentiation and serotyping using the assembly file as input

ShigEiFinder differentiates Shigella and enteroinvasive E. coli (EIEC) using cluster-specific genes, identifies some serotypes based on the presence of O-antigen and H-antigen genes, and predicts the number of virulence plasmids. The shigeifinder task operates on assembly files.

ShigEiFinder Technical Details

	Links
Task	task_shigeifinder.wdl
Software Source Code	ShigEiFinder on GitHub
Software Documentation	ShigEiFinder on GitHub
Origin publication	Cluster-specific gene markers enhance Shigella and enteroinvasive Escherichia coli in silico serotyping

ShigEiFinder_reads: Shigella/EIEC differentiation and serotyping using Illumina read files as input (optional) for Illumina data only

ShigEiFinder differentiates Shigella and enteroinvasive E. coli (EIEC) using cluster-specific genes, identifies some serotypes based on the presence of O-antigen and H-antigen genes, and predicts the number of virulence plasmids. The shigeifinder_reads task performs on read files.

ShigEiFinder_reads Technical Details

	Links
Task	task_shigeifinder.wdl
Software Source Code	ShigEiFinder on GitHub
Software Documentation	ShigEiFinder on GitHub
Origin publication	Cluster-specific gene markers enhance Shigella and enteroinvasive Escherichia coli in silico serotyping

SonneiTyper is run only when GAMBIT predicts the S. sonnei species. This is the most common Shigella species in the United States.

SonneiTyper: Shigella sonnei identification, genotyping, and resistance mutation identification for Illumina and ONT data only

SonneiTyper identifies Shigella sonnei, and uses single-nucleotide variants for genotyping and prediction of quinolone resistance in gyrA (S83L, D87G, D87Y) and parC (S80I). Outputs are provided in this format.

SonneiTyper is a wrapper script around another tool, Mykrobe, that analyses the S. sonnei genomes.

SonneiTyper Technical Details

	Links
Task	task_sonneityping.wdl
Software Source Code	Mykrobe, sonneityping
Software Documentation	https://github.com/Mykrobe-tools/mykrobe/wiki, sonneityping
Original Publication(s)	Global population structure and genotyping framework for genomic surveillance of the major dysentery pathogen, Shigella sonnei

Shigella XDR prediction. Please see the documentation section above for ResFinder for details regarding this taxa-specific analysis.

Haemophilus influenzae

Haemophilus influenzae¶

hicap: Sequence typing

Identification of cap locus serotype in Haemophilus influenzae assemblies with hicap.

The cap locus of H. influenzae is categorised into 6 different groups based on serology (a-f). There are three functionally distinct regions of the cap locus, designated region I, region II, and region III. Genes within region I (bexABCD) and region III (hcsAB) are associated with transport and post-translation modification. The region II genes encode serotype-specific proteins, with each serotype (a-f) having a distinct set of genes. cap loci are often subject to structural changes (e.g. duplication, deletion) making the process of in silico typing and characterisation of loci difficult.

hicap automates the identification of the cap locus, describes the structural layout, and performs in silico serotyping.

hicap Technical Details

	Links
Task	task_hicap.wdl
Software Source Code	hicap on GitHub
Software Documentation	hicap on GitHub
Original Publication(s)	hicap: In Silico Serotyping of the Haemophilus influenzae Capsule Locus

Klebsiella spp

Klebsiella spp¶

Kleborate: Species identification, MLST, serotyping, AMR and virulence characterization

Kleborate is a tool to identify the Klebsiella species, MLST sequence type, serotype, virulence factors (ICEKp and plasmid associated), and AMR genes and mutations. Serotyping is based on the capsular (K antigen) and lipopolysaccharide (LPS) (O antigen) genes. The resistance genes identified by Kleborate are described here.

Kleborate Technical Details

	Links
Task	task_kleborate.wdl
Software Source Code	kleborate on GitHub
Software Documentation	https://github.com/katholt/Kleborate/wiki
Orginal publication	A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex Identification of Klebsiella capsule synthesis loci from whole genome data

Legionella pneumophila

Legionella pneumophila¶

Legsta: Sequence-based typing

Legsta performs a sequence-based typing of Legionella pneumophila, with the intention of being used for outbreak investigations.

Legsta Technical Details

	Links
Task	task_legsta.wdl
Software Source Code	Legsta
Software Documentation	Legsta

Listeria monocytogenes

Listeria monocytogenes¶

LisSero: Serogroup prediction

LisSero performs serogroup prediction (1/2a, 1/2b, 1/2c, or 4b) for Listeria monocytogenes based on the presence or absence of five genes, lmo1118, lmo0737, ORF2110, ORF2819, and prs. These do not predict somatic (O) or flagellar (H) biosynthesis.

LisSero Technical Details

	Links
Task	task_lissero.wd
Software Source Code	LisSero
Software Documentation	LisSero

Mycobacterium tuberculosis

Mycobacterium tuberculosis¶

TBProfiler: Lineage and drug susceptibility prediction for Illumina and ONT only

TBProfiler identifies Mycobacterium tuberculosis complex species, lineages, sub-lineages and drug resistance-associated mutations.

TBProfiler Technical Details

	Links
Task	task_tbprofiler.wdl
Software Source Code	TBProfiler on GitHub
Software Documentation	https://jodyphelan.gitbook.io/tb-profiler/
Original Publication(s)	Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs

tbp-parser: Interpretation and Parsing of TBProfiler JSON outputs; requires TBProfiler and tbprofiler_additonal_outputs = true

tbp-parser adds useful drug resistance interpretation by applying expert rules and organizing the outputs from TBProfiler. Please note that this tool has not been tested on ONT data and although it is available, result accuracy should be considered carefully. To understand this module and its functions, please examine the README found with the source code here.

tbp-parser Technical Details

	Links
Task	task_tbp_parser.wdl
Software Source Code	tbp-parser
Software Documentation	tbp-parser

Clockwork: Decontamination of input read files for Illumina PE only

Clockwork decontaminates paired-end data by removing all reads that do not match the H37Rv genome or are unmapped.

Clockwork Technical Details

	Links
Task	task_clockwork.wdl
Software Source Code	clockwork
Software Documentation	https://github.com/iqbal-lab-org/clockwork/wiki

Neisseria spp

Neisseria spp¶

ngmaster: Neisseria gonorrhoeae sequence typing

NG-MAST is currently the most widely used method for epidemiological surveillance of Neisseria gonorrhoea. This tool is targeted at clinical and research microbiology laboratories that have performed WGS of N. gonorrhoeae isolates and wish to understand the molecular context of their data in comparison to previously published epidemiological studies. As WGS becomes more routinely performed, NGMASTER has been developed to completely replace PCR-based NG-MAST, reducing time and labour costs.

The NG-STAR offers a standardized method of classifying seven well-characterized genes associated antimicrobial resistance in N. gonorrhoeae (penA, mtrR, porB, ponA, gyrA, parC and 23S rRNA) to three classes of antibiotics (cephalosporins, macrolides and fluoroquinolones).

ngmaster combines two tools: NG-MAST (in silico multi-antigen sequencing typing) and NG-STAR (sequencing typing for antimicrobial resistance).

ngmaster Technical Details

	Links
Task	task_ngmaster.wdl
Software Source Code	ngmaster
Software Documentation	ngmaster
Original Publication(s)	NGMASTER: in silico multi-antigen sequence typing for Neisseria gonorrhoeae

meningotype: Neisseria meningitidis serotyping

This tool performs serotyping, MLST, finetyping (of porA, fetA, and porB), and Bexsero Antigen Sequencing Typing (BAST).

meningotype Technical Details

	Links
Task	task_meningotype.wdl
Software Source Code	meningotype
Software Documentation	meningotype

Pseudomonas aeruginosa

Pseudomonas aeruginosa¶

pasty: Serotyping

pasty is a tool for in silico serogrouping of Pseudomonas aeruginosa isolates. pasty was developed by Robert Petit, based on the PAst tool from the Centre for Genomic Epidemiology.

pasty Technical Details

	Links
Task	task_pasty.wdl
Software Source Code	pasty
Software Documentation	pasty
Original Publication(s)	Application of Whole-Genome Sequencing Data for O-Specific Antigen Analysis and In Silico Serotyping of Pseudomonas aeruginosa Isolates.

Salmonella spp

Salmonella spp¶

Both SISTR and SeqSero2 are used for serotyping all Salmonella spp. Occasionally, the predicted serotypes may differ between SISTR and SeqSero2. When this occurs, differences are typically small and analogous, and are likely as a result of differing source databases. More information about Salmonella serovar nomenclature can be found here. For Salmonella Typhi, genotyphi is additionally run for further typing.

SISTR: Salmonella serovar prediction

SISTR performs Salmonella spp serotype prediction using antigen gene and cgMLST gene alleles. In TheiaProk. SISTR is run on genome assemblies, and uses the default database setting (smaller "centroid" alleles or representative alleles instead of the full set of cgMLST alleles). It also runs a QC mode to determine the level of confidence in the serovar prediction (see here).

SISTR Technical Details

	Links
Task	task_sistr.wdl
Software Source Code	SISTR
Software Documentation	SISTR
Original Publication(s)	The Salmonella In Silico Typing Resource (SISTR): an open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies.

SeqSero2: Serotyping

SeqSero2 is a tool for Salmonella serotype prediction. In the TheiaProk Illumina and ONT workflows, SeqSero2 takes in raw sequencing reads and performs targeted assembly of serotype determinant alleles, which can be used to predict serotypes including contamination between serotypes. Optionally, SeqSero2 can take the genome assembly as input.

SeqSero2 Technical Details

	Links
Task	task_seqsero2.wdl
Software Source Code	SeqSero2
Software Documentation	SeqSero2
Original Publication(s)	Salmonella serotype determination utilizing high-throughput genome sequencing data. SeqSero2: rapid and improved Salmonella serotype determination using whole genome sequencing data.

genotyphi: Salmonella Typhi lineage, clade, subclade and plasmid typing, AMR prediction for Illumina and ONT only

genotyphi is activated upon identification of the "Typhi" serotype by SISTR or SeqSero2. genotyphi divides the Salmonella enterica serovar Typhi population into detailed lineages, clades, and subclades. It also detects mutations in the quinolone-resistance determining regions, acquired antimicrobial resistance genes, plasmid replicons, and subtypes of the IncHI1 plasmid which is associated with multidrug resistance.

TheiaProk uses the Mykrobe implementation of genotyphi that takes raw sequencing reads as input.

genotyphi Technical Details

	Links
Task	task_genotyphi.wdl
Software Source Code	genotyphi
Software Documentation	https://github.com/katholt/genotyphi/blob/main/README.md#mykrobe-implementation
Orginal publication	An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid Five Years of GenoTyphi: Updates to the Global Salmonella Typhi Genotyping Framework

Staphyloccocus aureus

Staphyloccocus aureus¶

spatyper: Sequence typing

Given a fasta file or multiple fasta files, this script identifies the repeats and the order and generates a spa type. The repeat sequences and repeat orders found on http://spaserver2.ridom.de/ are used to identify the spa type of each enriched sequence. Ridom spa type and the genomics repeat sequence are then reported back to the user.

spatyper Technical Details

	Links
Task	task_spatyper.wdl
Software Source Code	spatyper
Software Documentation	spatyper

staphopia-sccmec: Sequence typing

This tool assigns a SCCmec type by BLAST the SCCmec primers against an assembly. staphopia-sccmecreports True for exact primer matches and False for at least 1 base pair difference. The Hamming Distance is also reported.

staphopia-sccmec Technical Details

	Links
Task	task_staphopiasccmec.wdl
Software Source Code	staphopia-sccmec
Software Documentation	staphopia-sccmec
Original Publication(s)	Staphylococcus aureus viewed from the perspective of 40,000+ genomes

agrvate: Sequence typing

This tool identifies the agr locus type and reports possible variants in the agr operon. AgrVATE accepts a S. aureus genome assembly as input and performs a kmer search using an Agr-group specific kmer database to assign the Agr-group. The agr operon is then extracted using in-silico PCR and variants are called using an Agr-group specific reference operon.

agrvate Technical Details

	Links
Task	task_agrvate.wdl
Software Source Code	agrVATE
Software Documentation	agrVATE
Original Publication(s)	Species-Wide Phylogenomics of the Staphylococcus aureus Agr Operon Revealed Convergent Evolution of Frameshift Mutations

Streptococcus pneumoniae

Streptococcus pneumoniae¶

PopPUNK: Global Pneumococcal Sequence Cluster typing

Global Pneumococcal Sequence Clusters (GPSC) define and name pneumococcal strains. GPSC designation is undertaken using the PopPUNK software and GPSC database as described in the file below, obtained from here.

:file: GPSC_README_PopPUNK2.txt

Interpreting GPSC results

In the *_external_clusters.csv novel clusters are assigned NA. For isolates that are assigned a novel cluster and pass QC, you can email globalpneumoseq@gmail.com to have these novel clusters added to the database.
Unsampled diversity in the pneumococcal population may represent missing variation that links two GPS clusters. When this is discovered, GPSCs are merged and the merge history is indicated. For example, if GPSC23 and GPSC362 merged, the GPSC would be reported as GPSC23, with a merge history of GPSC23;362.

PopPUNK Technical Details

	Links
Task	task_poppunk_streppneumo.wdl
GPSC database	https://www.pneumogen.net/gps/training_command_line.html
Software Source Code	PopPunk
Software Documentation	https://poppunk.readthedocs.io/en/latest/
Original Publication(s)	Fast and flexible bacterial genomic epidemiology with PopPUNK

SeroBA: Serotyping for Illumina_PE only

Streptococcus pneumoniae serotyping is performed with SeroBA.

SeroBA Technical Details

	Links
Task	task_seroba.wdl
Software Source Code	SeroBA
Software Documentation	https://sanger-pathogens.github.io/seroba/
Original Publication(s)	SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

pbptyper: Penicillin-binding protein genotyping

The Penicillin-binding proteins (PBP) are responsible for the minimum inhibitory concentration phenotype for beta-lactam antibiotic. In Streptococcus pneumoniae, these PBP genes can be identified and typed with PBPTyper.

pbptyper Technical Details

	Links
Task	task_pbptyper.wdl
Software Source Code	pbptyper
Software Documentation	pbptyper
Original Publication(s)	Penicillin-binding protein transpeptidase signatures for tracking and predicting β-lactam resistance levels in Streptococcus pneumoniae

Streptococcus pyogenes

Streptococcus pyogenes¶

emm-typing-tool: Sequence typing for Illumina_PE only

emm-typing of Streptococcus pyogenes raw reads. Assign emm type and subtype by querying the CDC M-type specific database.

emm-typing-tool Technical Details

	Links
Task	task_emmtypingtool.wdl
Software Source Code	emm-typing-tool
Software Documentation	emm-typing-tool

Vibrio spp

Vibrio spp¶

SRST2: Vibrio characterization for Illumina only

The SRST2 Vibrio characterization task detects sequences for Vibrio spp characterization using Illumina sequence reads and a database of target sequence that are traditionally used in PCR methods. The sequences included in the database are as follows:

Sequence name	Sequence role	Purpose in database
toxR	Transcriptional activator	Species marker where presence identifies V. cholerae
ompW	Outer Membrane Protein	Species marker where presence identifies V. cholerae
ctxA	Cholera toxin	Indicates cholera toxin production
tcpA_classical	Toxin co-pilus A allele associated with the Classical biotype	Used to infer identity as Classical biotype
tcpA_ElTor	Toxin co-pilus A allele associated with the El Tor biotype	Used to infer identity as El Tor biotype
wbeN	O antigen encoding region	Used to infer identity as O1 serogroup
wbfR	O antigen encoding region	Used to infer identity as O139 serogroup

SRST2 Technical Details

	Links
Task	task_srst2_vibrio.wdl
Software Source Code	srst2
Software Documentation	srst2
Database Description	Docker container

Abricate: Vibrio characterization

The Abricate Vibrio characterization task detects sequences for Vibrio spp characterization using genome assemblies and the abricate "vibrio" database. The sequences included in the database are as follows:

Sequence name	Sequence role	Purpose in database
toxR	Transcriptional activator	Species marker where presence identifies V. cholerae
ompW	Outer Membrane Protein	Species marker where presence identifies V. cholerae
ctxA	Cholera toxin	Indicates cholera toxin production
tcpA_classical	Toxin co-pilus A allele associated with the Classical biotype	Used to infer identity as Classical biotype
tcpA_ElTor	Toxin co-pilus A allele associated with the El Tor biotype	Used to infer identity as El Tor biotype
wbeN	O antigen encoding region	Used to infer identity as O1 serogroup
wbfR	O antigen encoding region	Used to infer identity as O139 serogroup

Abricate Technical Details

	Links
Task	task_abricate_vibrio.wdl
Software Source Code	abricate
Software Documentation	abricate
Database Description	Docker container

Outputs¶

Variable	Type	Description	Workflow
abricate_abaum_database	String	Database of reference A. baumannii plasmid typing genes used for plasmid typing	FASTA, ONT, PE, SE
abricate_abaum_docker	String	Docker file used for running abricate	FASTA, ONT, PE, SE
abricate_abaum_plasmid_tsv	File	https://github.com/tseemann/abricate#output containing a row for each A. baumannii plasmid type gene found in the sample	FASTA, ONT, PE, SE
abricate_abaum_plasmid_type_genes	String	A. baumannii Plasmid typing genes found in the sample; from GENE column in https://github.com/tseemann/abricate#output	FASTA, ONT, PE, SE
abricate_abaum_version	String	Version of abricate used for A. baumannii plasmid typing	FASTA, ONT, PE, SE
abricate_database	String	Database of reference used with Abricate	FASTA, ONT, PE, SE
abricate_docker	String	Docker file used for running abricate	FASTA, ONT, PE, SE
abricate_genes	String	Genes found in the sample; from GENE column in https://github.com/tseemann/abricate#output	FASTA, ONT, PE, SE
abricate_results_tsv	File	https://github.com/tseemann/abricate#output containing a row for each gene found in the sample	FASTA, ONT, PE, SE
abricate_version	String	Version of abricate used for A. baumannii plasmid typing	FASTA, ONT, PE, SE
abricate_vibrio_biotype	String	Biotype classification according to tcpA gene sequence (Classical or ElTor)	FASTA, ONT, PE, SE
abricate_vibrio_ctxA	String	Presence or absence of the ctxA gene	FASTA, ONT, PE, SE
abricate_vibrio_detailed_tsv	File	Detailed ABRicate output file	FASTA, ONT, PE, SE
abricate_vibrio_ompW	String	Presence or absence of the ompW gene	FASTA, ONT, PE, SE
abricate_vibrio_serogroup	String	Serotype classification as O1 (wbeN gene), O139 (wbfR gene) or not detected.	FASTA, ONT, PE, SE
abricate_vibrio_toxR	String	Presence or absence of the toxR gene	FASTA, ONT, PE, SE
abricate_vibrio_version	String	The abricate version run	FASTA, ONT, PE, SE
agrvate_agr_canonical	String	Canonical or non-canonical agrD	FASTA, ONT, PE, SE
agrvate_agr_group	String	Agr group	FASTA, ONT, PE, SE
agrvate_agr_match_score	String	Match score for agr group	FASTA, ONT, PE, SE
agrvate_agr_multiple	String	If multiple agr groups were found	FASTA, ONT, PE, SE
agrvate_agr_num_frameshifts	String	Number of frameshifts found in CDS of extracted agr operon	FASTA, ONT, PE, SE
agrvate_docker	String	The docker used for AgrVATE	FASTA, ONT, PE, SE
agrvate_results	File	A gzipped tarball of all results	FASTA, ONT, PE, SE
agrvate_summary	File	The summary file produced	FASTA, ONT, PE, SE
agrvate_version	String	The version of AgrVATE used	FASTA, ONT, PE, SE
amrfinderplus_all_report	File	Output TSV file from AMRFinderPlus (described https://github.com/ncbi/amr/wiki/Running-AMRFinderPlus#fields)	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_betalactam_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus that are known to confer resistance to beta-lactams	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_carbapenem_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus that are known to confer resistance to carbapenem	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_cephalosporin_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus that are known to confer resistance to cephalosporin	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_cephalothin_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus that are known to confer resistance to cephalothin	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus	FASTA, ONT, PE, SE
amrfinderplus_amr_betalactam_methicillin_genes	String	Beta-lactam AMR genes identified by AMRFinderPlus that are known to confer resistance to methicilin	FASTA, ONT, PE, SE
amrfinderplus_amr_classes	String	AMRFinderPlus predictions for classes of drugs that genes found in the reads are known to confer resistance to	FASTA, ONT, PE, SE
amrfinderplus_amr_core_genes	String	AMR genes identified by AMRFinderPlus where the scope is "core"	FASTA, ONT, PE, SE
amrfinderplus_amr_plus_genes	String	AMR genes identified by AMRFinderPlus where the scope is "plus"	FASTA, ONT, PE, SE
amrfinderplus_amr_report	File	TSV file detailing AMR genes only, from the amrfinderplus_all_report	FASTA, ONT, PE, SE
amrfinderplus_amr_subclasses	String	More specificity about the drugs that genes identified in the reads confer resistance to	FASTA, ONT, PE, SE
amrfinderplus_db_version	String	AMRFinderPlus database version used	FASTA, ONT, PE, SE
amrfinderplus_stress_genes	String	Stress genes identified by AMRFinderPlus	FASTA, ONT, PE, SE
amrfinderplus_stress_report	File	TSV file detailing stress genes only, from the amrfinderplus_all_report	FASTA, ONT, PE, SE
amrfinderplus_version	String	AMRFinderPlus version used	FASTA, ONT, PE, SE
amrfinderplus_virulence_genes	String	Virulence genes identified by AMRFinderPlus	FASTA, ONT, PE, SE
amrfinderplus_virulence_report	File	TSV file detailing virulence genes only, from the amrfinderplus_all_report	FASTA, ONT, PE, SE
ani_highest_percent	Float	Highest ANI between query and any given reference genome (top species match)	FASTA, ONT, PE, SE
ani_highest_percent_bases_aligned	Float	Percentage of bases aligned between query genome and top species match	FASTA, ONT, PE, SE
ani_mummer_docker	String	Docker image used to run the ANI_mummer task	FASTA, ONT, PE, SE
ani_mummer_version	String	Version of MUMmer used	FASTA, ONT, PE, SE
ani_output_tsv	File	Full output TSV from ani-m	FASTA, ONT, PE, SE
ani_top_species_match	String	Species of genome with highest ANI to query FASTA	FASTA, ONT, PE, SE
assembly_fasta	File	https://github.com/tseemann/shovill#contigsfa	ONT, PE, SE
assembly_length	Int	Length of assembly (total contig length) as determined by QUAST	FASTA, ONT, PE, SE
bakta_gbff	File	Genomic GenBank format annotation file	FASTA, ONT, PE, SE
bakta_gff3	File	Generic Feature Format Version 3 file	FASTA, ONT, PE, SE
bakta_summary	File	Bakta summary output TXT file	FASTA, ONT, PE, SE
bakta_tsv	File	Annotations as simple human readable TSV	FASTA, ONT, PE, SE
bakta_version	String	Bakta version used	FASTA, ONT, PE, SE
bbduk_docker	String	BBDuk docker image used	PE, SE
busco_database	String	BUSCO database used	FASTA, ONT, PE, SE
busco_docker	String	BUSCO docker image used	FASTA, ONT, PE, SE
busco_report	File	A plain text summary of the results in BUSCO notation	FASTA, ONT, PE, SE
busco_results	String	BUSCO results (see https://www.notion.so/TheiaProk-Workflow-Series-68c34aca2a0240ef94fef0acd33651b9?pvs=21)	FASTA, ONT, PE, SE
busco_version	String	BUSCO software version used	FASTA, ONT, PE, SE
cg_pipeline_docker	String	Docker file used for running CG-Pipeline on cleaned reads	PE, SE
cg_pipeline_report_clean	File	TSV file of read metrics from clean reads, including average read length, number of reads, and estimated genome coverage	PE, SE
cg_pipeline_report_raw	File	TSV file of read metrics from raw reads, including average read length, number of reads, and estimated genome coverage	PE, SE
clockwork_decontaminated_read1	File	Decontaminated forward reads by Clockwork	PE
clockwork_decontaminated_read2	File	Decontaminated reverse reads by Clockwork	PE
combined_mean_q_clean	Float	Mean quality score for the combined clean reads	PE
combined_mean_q_raw	Float	Mean quality score for the combined raw reads	PE
combined_mean_readlength_clean	Float	Mean read length for the combined clean reads	PE
combined_mean_readlength_raw	Float	Mean read length for the combined raw reads	PE
contigs_fastg	File	Assembly graph if megahit used for genome assembly	PE
contigs_gfa	File	Assembly graph if spades used for genome assembly	ONT, PE, SE
contigs_lastgraph	File	Assembly graph if velvet used for genome assembly	PE
dragonflye_version	String	Version of dragonflye used for de novo assembly	ONT
ectyper_predicted_serotype	String	Serotype predicted by ECTyper	FASTA, ONT, PE, SE
ectyper_results	File	TSV file of evidence for ECTyper predicted serotype (see https://github.com/phac-nml/ecoli_serotyping#report-format)	FASTA, ONT, PE, SE
ectyper_version	String	Version of ECTyper used	FASTA, ONT, PE, SE
emmtypingtool_docker	String	Docker image for emm-typing-tool	PE
emmtypingtool_emm_type	String	emm-type predicted	PE
emmtypingtool_results_xml	File	XML file with emm-typing-tool resuls	PE
emmtypingtool_version	String	Version of emm-typing-tool used	PE
est_coverage_clean	Float	Estimated coverage calculated from clean reads and genome length	ONT, PE, SE
est_coverage_raw	Float	Estimated coverage calculated from raw reads and genome length	ONT, PE, SE
fastp_html_report	File	The HTML report made with fastp	PE, SE
fastp_version	String	Version of fastp software used	PE, SE
fastq_scan_num_reads_clean_pairs	String	Number of read pairs after cleaning as calculated by fastq_scan	PE
fastq_scan_num_reads_clean1	Int	Number of forward reads after cleaning as calculated by fastq_scan	PE, SE
fastq_scan_num_reads_clean2	Int	Number of reverse reads after cleaning as calculated by fastq_scan	PE
fastq_scan_num_reads_raw_pairs	String	Number of input read pairs calculated by fastq_scan	PE
fastq_scan_num_reads_raw1	Int	Number of input forward reads calculated by fastq_scan	PE, SE
fastq_scan_num_reads_raw2	Int	Number of input reverse reads calculated by fastq_scan	PE
fastq_scan_version	String	Version of fastq-scan software used	PE, SE
fastqc_clean1_html	File	Graphical visualization of clean forward read quality from fastqc to open in an internet browser	PE, SE
fastqc_clean2_html	File	Graphical visualization of clean reverse read quality from fastqc to open in an internet browser	PE
fastqc_docker	String	Docker container used with fastqc	PE, SE
fastqc_num_reads_clean_pairs	String	Number of read pairs after cleaning by fastqc	PE
fastqc_num_reads_clean1	Int	Number of forward reads after cleaning by fastqc	PE, SE
fastqc_num_reads_clean2	Int	Number of reverse reads after cleaning by fastqc	PE
fastqc_num_reads_raw_pairs	String	Number of input read pairs by fastqc	PE
fastqc_num_reads_raw1	Int	Number of input reverse reads by fastqc	PE, SE
fastqc_num_reads_raw2	Int	Number of input reverse reads by fastqc	PE
fastqc_raw1_html	File	Graphical visualization of raw forward read quality from fastqc to open in an internet browser	PE, SE
fastqc_raw2_html	File	Graphical visualization of raw reverse read qualityfrom fastqc to open in an internet browser	PE
fastqc_version	String	Version of fastqc software used	PE, SE
gambit_closest_genomes	File	CSV file listing genomes in the GAMBIT database that are most similar to the query assembly	FASTA, ONT, PE, SE
gambit_db_version	String	Version of GAMBIT used	FASTA, ONT, PE, SE
gambit_docker	String	GAMBIT docker file used	FASTA, ONT, PE, SE
gambit_predicted_taxon	String	Taxon predicted by GAMBIT	FASTA, ONT, PE, SE
gambit_predicted_taxon_rank	String	Taxon rank of GAMBIT taxon prediction	FASTA, ONT, PE, SE
gambit_report	File	GAMBIT report in a machine-readable format	FASTA, ONT, PE, SE
gambit_version	String	Version of GAMBIT software used	FASTA, ONT, PE, SE
genotyphi_final_genotype	String	Final genotype call from GenoTyphi	ONT, PE, SE
genotyphi_genotype_confidence	String	Confidence in the final genotype call made by GenoTyphi	ONT, PE, SE
genotyphi_mykrobe_json	File	JSON file of GenoTyphi output, described https://github.com/katholt/genotyphi#explanation-of-columns-in-the-output	ONT, PE, SE
genotyphi_report_tsv	File	TSV file of GenoTyphi output, described https://github.com/katholt/genotyphi#explanation-of-columns-in-the-output	ONT, PE, SE
genotyphi_species	String	Species call from Mykrobe, used to run GenoTyphi	ONT, PE, SE
genotyphi_st_probes_percent_coverage	Float	Percentage coverage to the Typhi MLST probes	ONT, PE, SE
genotyphi_version	String	Version of GenoTyphi used	ONT, PE, SE
hicap_docker	String	Docker image used for hicap	ONT, PE, SE
hicap_genes	String	cap genes identified. genes on different contigs delimited by;. truncation shown by trailing *	ONT, PE, SE
hicap_results_tsv	File	TSV file of hicap output	ONT, PE, SE
hicap_serotype	String	hicap serotype	ONT, PE, SE
hicap_version	String	hicap version used	ONT, PE, SE
kaptive_k_locus	String	Best matching K locus identified by Kaptive	FASTA, ONT, PE, SE
kaptive_k_type	String	Best matching K type identified by Kaptive	FASTA, ONT, PE, SE
kaptive_kl_confidence	String	Kaptive’s confidence in the KL match (see https://github.com/katholt/Kaptive/wiki/Interpreting-the-results)	FASTA, ONT, PE, SE
kaptive_oc_locus	String	Best matching K locus identified by Kaptive	FASTA, ONT, PE, SE
kaptive_ocl_confidence	String	Kaptive’s confidence in the OCL match (see https://github.com/katholt/Kaptive/wiki/Interpreting-the-results)	FASTA, ONT, PE, SE
kaptive_output_file_k	File	TSV https://github.com/katholt/Kaptive/wiki/How-to-run#output-filesfrom the K locus from Kaptive	FASTA, ONT, PE, SE
kaptive_output_file_oc	File	TSV https://github.com/katholt/Kaptive/wiki/How-to-run#output-filesfrom the OC locus from Kaptive	FASTA, ONT, PE, SE
kaptive_version	String	Version of Kaptive used	FASTA, ONT, PE, SE
kleborate_docker	String	Kleborate docker image used	FASTA, ONT, PE, SE
kleborate_genomic_resistance_mutations	String	Genomic resistance mutations identifies by Kleborate	FASTA, ONT, PE, SE
kleborate_key_resistance_genes	String	Key resistance genes identified by Kleborate	FASTA, ONT, PE, SE
kleborate_klocus	String	Best matching K locus identified by Kleborate via Kaptive	FASTA, ONT, PE, SE
kleborate_klocus_confidence	String	Kaptive’s confidence in the KL match (see https://github.com/katholt/Kaptive/wiki/Interpreting-the-results)	FASTA, ONT, PE, SE
kleborate_ktype	String	Best matching K type identified by Kleborate via Kaptive	FASTA, ONT, PE, SE
kleborate_mlst_sequence_type	String	https://github.com/katholt/Kleborate/wiki/MLST#multi-locus-sequence-typing-mlst call by Kleborate	FASTA, ONT, PE, SE
kleborate_olocus	String	Best matching OC locus identified by Kleborate via Kaptive	FASTA, ONT, PE, SE
kleborate_olocus_confidence	String	Kaptive’s confidence in the KL match (see https://github.com/katholt/Kaptive/wiki/Interpreting-the-results)	FASTA, ONT, PE, SE
kleborate_otype	String	Best matching OC type identified by Kleborate via Kaptive	FASTA, ONT, PE, SE
kleborate_output_file	File	https://github.com/katholt/Kleborate/wiki/Scores-and-counts	FASTA, ONT, PE, SE
kleborate_resistance_score	String	Resistance score as given by kleborate	FASTA, ONT, PE, SE
kleborate_version	String	Version of Kleborate used	FASTA, ONT, PE, SE
kleborate_virulence_score	String	Virulence score as given by kleborate	FASTA, ONT, PE, SE
kmerfinder_database	String	Database used to run KmerFinder	FASTA, ONT, PE, SE
kmerfinder_docker	String	Docker image used to run KmerFinder	FASTA, ONT, PE, SE
kmerfinder_query_coverage	String	KmerFinder’s query coverage of the top hit result	FASTA, ONT, PE, SE
kmerfinder_results_tsv	File	Output TSV file created by KmerFinder	FASTA, ONT, PE, SE
kmerfinder_template_coverage	String		FASTA, ONT, PE, SE
kmerfinder_top_hit	String	Top hit species of KmerFinder	FASTA, ONT, PE, SE
kraken2_database	String	Kraken2 database used for the taxonomic assignment	ONT, PE, SE
kraken2_docker	String	Docker container for Kraken2	ONT, PE, SE
kraken2_report	File	Report, in text format, of Kraken2 results	ONT, PE, SE
kraken2_version	String	Kraken2 version	ONT, PE, SE
legsta_predicted_sbt	String	Sequence based type predicted by Legsta	FASTA, ONT, PE, SE
legsta_results	File	TSV file of legsta results (see https://github.com/tseemann/legsta#output)	FASTA, ONT, PE, SE
legsta_version	String	Version of legsta used	FASTA, ONT, PE, SE
lissero_results	File	TSV results file from LisSero (see https://github.com/MDU-PHL/LisSero#example-output)	FASTA, ONT, PE, SE
lissero_serotype	String	Serotype predicted by LisSero	FASTA, ONT, PE, SE
lissero_version	String	Version of LisSero used	FASTA, ONT, PE, SE
meningotype_BAST	String	BAST type	FASTA, ONT, PE, SE
meningotype_FetA	String	FetA type	FASTA, ONT, PE, SE
meningotype_fHbp	String	fHbp type	FASTA, ONT, PE, SE
meningotype_NadA	String	NBA type	FASTA, ONT, PE, SE
meningotype_NHBA	String	NHBA type	FASTA, ONT, PE, SE
meningotype_PorA	String	PorA type	FASTA, ONT, PE, SE
meningotype_PorB	String	PorB type	FASTA, ONT, PE, SE
meningotype_serogroup	String	Serogroup	FASTA, ONT, PE, SE
meningotype_tsv	File	Full result file	FASTA, ONT, PE, SE
meningotype_version	String	Version of meningotype used	FASTA, ONT, PE, SE
midas_docker	String	MIDAS docker image used	PE, SE
midas_primary_genus	String	Genus of most abundant species in reads	PE, SE
midas_report	File	TSV report of full MIDAS results	PE, SE
midas_secondary_genus	String	Genus of the next most abundant species after removing all species of the most abundant genus	PE, SE
midas_secondary_genus_abundance	String	Relative abundance of secondary genus	PE, SE
midas_secondary_genus_coverage	String	Absolute coverage of secondary genus	PE, SE
n50_value	Int	N50 of assembly calculated by QUAST	FASTA, ONT, PE, SE
nanoplot_docker	String	Docker image for nanoplot	ONT
nanoplot_html_clean	File	Clean read file	ONT
nanoplot_html_raw	File	Raw read file	ONT
nanoplot_num_reads_clean1	Int	Number of clean reads	ONT
nanoplot_num_reads_raw1	Int	Number of raw reads	ONT
nanoplot_r1_est_coverage_clean	Float	Estimated coverage on the clean reads by nanoplot	ONT
nanoplot_r1_est_coverage_raw	Float	Estimated coverage on the raw reads by nanoplot	ONT
nanoplot_r1_mean_q_clean	Float	Mean quality score of clean forward reads	ONT
nanoplot_r1_mean_q_raw	Float	Mean quality score of raw forward reads	ONT
nanoplot_r1_mean_readlength_clean	Float	Mean read length of clean forward reads	ONT
nanoplot_r1_mean_readlength_raw	Float	Mean read length of raw forward reads	ONT
nanoplot_r1_median_q_clean	Float	Median quality score of clean forward reads	ONT
nanoplot_r1_median_q_raw	Float	Median quality score of raw forward reads	ONT
nanoplot_r1_median_readlength_clean	Float	Median read length of clean forward reads	ONT
nanoplot_r1_median_readlength_raw	Float	Median read length of raw forward reads	ONT
nanoplot_r1_n50_clean	Float	N50 of clean forward reads	ONT
nanoplot_r1_n50_raw	Float	N50 of raw forward reads	ONT
nanoplot_r1_stdev_readlength_clean	Float	Standard deviation read length of clean forward reads	ONT
nanoplot_r1_stdev_readlength_raw	Float	Standard deviation read length of raw forward reads	ONT
nanoplot_tsv_clean	File	Output TSV file created by nanoplot	ONT
nanoplot_tsv_raw	File	Output TSV file created by nanoplot	ONT
nanoplot_version	String	Version of nanoplot used for analysis	ONT
nanoq_version	String	Version of nanoq used in analysis	ONT
ngmaster_ngmast_porB_allele	String	porB allele number	FASTA, ONT, PE, SE
ngmaster_ngmast_sequence_type	String	NG-MAST sequence type	FASTA, ONT, PE, SE
ngmaster_ngmast_tbpB_allele	String	tbpB allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_23S_allele	String	23S rRNA allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_gyrA_allele	String	gyrA allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_mtrR_allele	String	mtrR allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_parC_allele	String	parC allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_penA_allele	String	penA allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_ponA_allele	String	ponA allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_porB_allele	String	porB allele number	FASTA, ONT, PE, SE
ngmaster_ngstar_sequence_type	String	NG-STAR sequence type	FASTA, ONT, PE, SE
ngmaster_tsv	File	TSV file with NG-MAST/NG-STAR typing	FASTA, ONT, PE, SE
ngmaster_version	String	ngmaster version	FASTA, ONT, PE, SE
number_contigs	Int	Total number of contigs in assembly	FASTA, ONT, PE, SE
pasty_all_serogroups	File	TSV file with details of each serogroup from pasty (see https://github.com/rpetit3/pasty#example-prefixdetailstsv)	FASTA, ONT, PE, SE
pasty_blast_hits	File	TSV file of BLAST hits from pasty (see https://github.com/rpetit3/pasty#example-prefixblastntsv)	FASTA, ONT, PE, SE
pasty_comment	String		FASTA, ONT, PE, SE
pasty_docker	String	pasty docker image used	FASTA, ONT, PE, SE
pasty_serogroup	String	Serogroup predicted by pasty	FASTA, ONT, PE, SE
pasty_serogroup_coverage	Float	The breadth of coverage of the O-antigen by pasty	FASTA, ONT, PE, SE
pasty_serogroup_fragments	Int	Number of BLAST hits included in the prediction (fewer is better)	FASTA, ONT, PE, SE
pasty_summary_tsv	File	TSV summary file of pasty outputs (see https://github.com/rpetit3/pasty#example-prefixtsv)	FASTA, ONT, PE, SE
pasty_version	String	Version of pasty used	FASTA, ONT, PE, SE
pbptyper_docker	String	pbptyper docker image used	FASTA, ONT, PE, SE
pbptyper_pbptype_predicted_tsv	File	TSV file of pbptyper results (see https://github.com/rpetit3/pbptyper#example-prefixtsv)	FASTA, ONT, PE, SE
pbptyper_predicted_1A_2B_2X	String	PBP type predicted by pbptyper	FASTA, ONT, PE, SE
pbptyper_version	String	Version of pbptyper used	FASTA, ONT, PE, SE
plasmidfinder_db_version	String	Version of PlasmidFnder used	FASTA, ONT, PE, SE
plasmidfinder_docker	String	PlasmidFinder docker image used	FASTA, ONT, PE, SE
plasmidfinder_plasmids	String	Names of plasmids identified by PlasmidFinder	FASTA, ONT, PE, SE
plasmidfinder_results	File	Output file from PlasmidFinder in TSV format	FASTA, ONT, PE, SE
plasmidfinder_seqs	File	Hit_in_genome_seq.fsa file produced by PlasmidFinder	FASTA, ONT, PE, SE
poppunk_docker	String	PopPUNK docker image with GPSC database used	FASTA, ONT, PE, SE
poppunk_gps_cluster	String	GPS cluster predicted by PopPUNK	FASTA, ONT, PE, SE
poppunk_GPS_db_version	String	Version of GPSC database used	FASTA, ONT, PE, SE
poppunk_gps_external_cluster_csv	File	GPSC v6 scheme designations	FASTA, ONT, PE, SE
poppunk_version	String	Version of PopPUNK used	FASTA, ONT, PE, SE
prokka_gbk	File	GenBank file produced from Prokka annotation of input FASTA	FASTA, ONT, PE, SE
prokka_gff	File	Prokka output GFF3 file containing sequence and annotation (you can view this in IGV)	FASTA, ONT, PE, SE
prokka_sqn	File	A Sequin file for GenBank submission	FASTA, ONT, PE, SE
qc_check	String	A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds	FASTA, ONT, PE, SE
qc_standard	File	The user-provided file that contains the QC thresholds used for the QC check	FASTA, ONT, PE, SE
quast_gc_percent	Float	The GC percent of your sample	FASTA, ONT, PE, SE
quast_report	File	TSV report from QUAST	FASTA, ONT, PE, SE
quast_version	String	Software version of QUAST used	FASTA, ONT, PE, SE
r1_mean_q_clean	Float	Mean quality score of clean forward reads	PE, SE
r1_mean_q_raw	Float	Mean quality score of raw forward reads	PE, SE
r1_mean_readlength_clean	Float	Mean read length of clean forward reads	PE, SE
r1_mean_readlength_raw	Float	Mean read length of raw forward reads	PE, SE
r2_mean_q_clean	Float	Mean quality score of clean reverse reads	PE
r2_mean_q_raw	Float	Mean quality score of raw reverse reads	PE
r2_mean_readlength_clean	Float	Mean read length of clean reverse reads	PE
r2_mean_readlength_raw	Float	Mean read length of raw reverse reads	PE
rasusa_version	String	Version of RASUSA used for analysis	ONT
read_screen_clean	String	PASS or FAIL result from clean read screening; FAIL accompanied by the reason for failure	ONT, PE, SE
read_screen_raw	String	PASS or FAIL result from raw read screening; FAIL accompanied by thereason for failure	ONT, PE, SE
read1_clean	File	Clean forward reads file	ONT, PE, SE
read2_clean	File	Clean reverse reads file	PE
resfinder_db_version	String	Version of ResFinder database	FASTA, ONT, PE, SE
resfinder_docker	String	ResFinder docker image used	FASTA, ONT, PE, SE
resfinder_pheno_table	File	Table containing al AMR phenotypes	FASTA, ONT, PE, SE
resfinder_pheno_table_species	File	Table with species-specific AMR phenotypes	FASTA, ONT, PE, SE
resfinder_pointfinder_pheno_table	File	TSV showing presence(1)/absence(0) of predicted resistance against an antibiotic class	FASTA, ONT, PE, SE
resfinder_pointfinder_results	File	Predicted point mutations, grouped by the gene they occur in	FASTA, ONT, PE, SE
resfinder_predicted_pheno_resistance	String	Semicolon delimited list of antimicrobial drugs and associated genes and/or point mutations. : , , ; : , ;	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Amp	String	States either Resistance or No Resistance predicted to Ampicillin based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Axo	String	States either Resistance or No Resistance predicted to Ceftriaxone based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Azm	String	States either Resistance or No Resistance predicted to Azithromycin based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Cip	String	States either Resistance or No Resistance predicted to Ciprofloxacin based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Smx	String	States either Resistance or No Resistance predicted to Sulfamethoxazole based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_resistance_Tmp	String	States either Resistance or No Resistance predicted to Trimothoprim based on resfinder phenotypic predictions	FASTA, ONT, PE, SE
resfinder_predicted_xdr_shigella	String	Final prediction of XDR Shigella status based on CDC definition. Explanation can be found in the description above this table.	FASTA, ONT, PE, SE
resfinder_results	File	Predicted resistance genes grouped by antibiotic class	FASTA, ONT, PE, SE
resfinder_seqs	File	FASTA of resistance gene sequences from user’s input sequence	FASTA, ONT, PE, SE
seq_platform	String	Sequencing platform input by the user	FASTA, ONT, PE, SE
seqsero2_predicted_antigenic_profile	String	Antigenic profile predicted for Salmonella spp by SeqSero2	ONT, PE, SE
seqsero2_predicted_contamination	String	Indicates whether contamination between Salmonella with different serotypes was predicted by SeqSero2	ONT, PE, SE
seqsero2_predicted_serotype	String	Serotype predicted by SeqSero2	ONT, PE, SE
seqsero2_report	File	TSV report produced by SeqSero2	ONT, PE, SE
seqsero2_version	String	Version of SeqSero2 used	ONT, PE, SE
seroba_ariba_identity	String	Percentage identity between the query sequence and ARIBA-predicted serotype	PE
seroba_ariba_serotype	String	Serotype predicted by ARIBA, via SeroBA	PE
seroba_details	File	Detailed TSV file from SeroBA	PE
seroba_docker	String	SeroBA docker image used	PE
seroba_serotype	String	Serotype predicted by SeroBA	PE
seroba_version	String	SeroBA version used	PE
serotypefinder_docker	String	SerotypeFinder docker image used	FASTA, ONT, PE, SE
serotypefinder_report	File	TSV report produced by SerotypeFinder	FASTA, ONT, PE, SE
serotypefinder_serotype	String	Serotype predicted by SerotypeFinder	FASTA, ONT, PE, SE
shigatyper_docker	String	ShigaTyper docker image used	ONT, PE, SE
shigatyper_hits_tsv	File	Detailed TSV report from ShigaTyper (seehttps://github.com/CFSAN-Biostatistics/shigatyper#example-prefix-hitstsv)	ONT, PE, SE
shigatyper_ipaB_presence_absence	String	Presence (+) or absence (-) of ipaB identified by ShigaTyper	ONT, PE, SE
shigatyper_notes	String	Any notes output from ShigaTyper	ONT, PE, SE
shigatyper_predicted_serotype	String	Serotype predicted by ShigaTyper	ONT, PE, SE
shigatyper_summary_tsv	File	TSV summary report from ShigaTyper (see https://github.com/CFSAN-Biostatistics/shigatyper#example-prefixtsv)	ONT, PE, SE
shigatyper_version	String	Version of ShigaTyper used	ONT, PE, SE
shigeifinder_cluster	String	Shigella/EIEC cluster identified by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_cluster_reads	String	Shigella/EIEC cluster identified by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_docker	String	ShigEiFinder docker image used	FASTA, ONT, PE, SE
shigeifinder_docker_reads	String	ShigEiFinder docker image used using read files as inputs	PE, SE
shigeifinder_H_antigen	String	H-antigen gene identified by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_H_antigen_reads	String	H-antigen gene identified by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_ipaH_presence_absence	String	Presence (+) or absence (-) of ipaH identified by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_ipaH_presence_absence_reads	String	Presence (+) or absence (-) of ipaH identified by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_notes	String	Any notes output from ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_notes_reads	String	Any notes output from ShigEiFinder using read files as inputs	PE, SE
shigeifinder_num_virulence_plasmid_genes	String	Number of virulence plasmid genes identified by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_num_virulence_plasmid_genes_reads	String	Number of virulence plasmid genes identified by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_O_antigen	String	O-antigen gene identified by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_O_antigen_reads	String	O-antigen gene identified by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_report	File	TSV report from ShigEiFinder (see https://github.com/LanLab/ShigEiFinder#shigeifinder)	FASTA, ONT, PE, SE
shigeifinder_report_reads	File	TSV report from ShigEiFinder (see https://github.com/LanLab/ShigEiFinder#shigeifinder) using read files as inputs	PE, SE
shigeifinder_serotype	String	Serotype predicted by ShigEiFinder	FASTA, ONT, PE, SE
shigeifinder_serotype_reads	String	Serotype predicted by ShigEiFinder using read files as inputs	PE, SE
shigeifinder_version	String	ShigEiFinder version used	FASTA, ONT, PE, SE
shigeifinder_version_reads	String	ShigEiFinder version used using read files as inputs	PE, SE
shovill_pe_version	String	Shovill version used	PE
shovill_se_version	String	Shovill version used	SE
sistr_allele_fasta	File	FASTA file of novel cgMLST alleles from SISTR	FASTA, ONT, PE, SE
sistr_allele_json	File	JSON file of cgMLST allele sequences and information (see https://github.com/phac-nml/sistr_cmd#cgmlst-allele-search-results)	FASTA, ONT, PE, SE
sistr_cgmlst	File	CSV file of the cgMLST allelic profile from SISTR (see https://github.com/phac-nml/sistr_cmd#cgmlst-allelic-profiles-output---cgmlst-profiles-cgmlst-profilescsv)	FASTA, ONT, PE, SE
sistr_predicted_serotype	String	Serotype predicted by SISTR	FASTA, ONT, PE, SE
sistr_results	File	TSV results file produced by SISTR (see https://github.com/phac-nml/sistr_cmd#primary-results-output--o-sistr-results)	FASTA, ONT, PE, SE
sistr_version	String	Version of SISTR used	FASTA, ONT, PE, SE
sonneityping_final_genotype	String	Final genotype call from Mykrobe, via sonneityper	ONT, PE, SE
sonneityping_final_report_tsv	File	Detailed TSV report from mykrobe, via sonneityper (see https://github.com/katholt/sonneityping#example-output)	ONT, PE, SE
sonneityping_genotype_confidence	String	Confidence in the final genotype call from sonneityper	ONT, PE, SE
sonneityping_genotype_name	String	Human readable alias for genotype, where available provided by sonneityper	ONT, PE, SE
sonneityping_mykrobe_docker	String	sonneityping docker image used	ONT, PE, SE
sonneityping_mykrobe_report_csv	File	CSV report from mykrobe via sonneityper (see https://github.com/Mykrobe-tools/mykrobe/wiki/AMR-prediction-output#csv-file)	ONT, PE, SE
sonneityping_mykrobe_report_json	File	JSON report from mykrobe via sonneityper (see https://github.com/Mykrobe-tools/mykrobe/wiki/AMR-prediction-output#json-file)	ONT, PE, SE
sonneityping_mykrobe_version	String	Version of sonneityping used	ONT, PE, SE
sonneityping_species	String	Species call from Mykrobe via sonneityping	ONT, PE, SE
spatyper_docker	String	spatyper docker image used	FASTA, ONT, PE, SE
spatyper_repeats	String	order of identified repeats	FASTA, ONT, PE, SE
spatyper_tsv	File	TSV report with spatyper results	FASTA, ONT, PE, SE
spatyper_type	String	spa type	FASTA, ONT, PE, SE
spatyper_version	String	spatyper version used	FASTA, ONT, PE, SE
srst2_vibrio_biotype	String	Biotype classification according to tcpA gene sequence (Classical or ElTor)	PE, SE
srst2_vibrio_ctxA	String	Presence or absence of the ctxA gene	PE, SE
srst2_vibrio_detailed_tsv	File	Detailed https://github.com/katholt/srst2 output file	PE, SE
srst2_vibrio_ompW	String	Presence or absence of the ompW gene	PE, SE
srst2_vibrio_serogroup	String	Serotype classification as O1 (wbeN gene), O139 (wbfR gene) or not detected.	PE, SE
srst2_vibrio_toxR	String	Presence or absence of the toxR gene	PE, SE
srst2_vibrio_version	String	The SRST2 version run	PE, SE
staphopiasccmec_docker	String	staphopia-sccmec docker image used	FASTA, ONT, PE, SE
staphopiasccmec_hamming_distance_tsv	File	staphopia-sccmec version	FASTA, ONT, PE, SE
staphopiasccmec_results_tsv	File	sccmec types and mecA presence	FASTA, ONT, PE, SE
staphopiasccmec_types_and_mecA_presence	String	staphopia-sccmec Hamming distance file	FASTA, ONT, PE, SE
staphopiasccmec_version	String	staphopia-sccmec presence and absence TSV file	FASTA, ONT, PE, SE
taxon_table_status	String	Status of the taxon table upload	FASTA, ONT, PE, SE
tbp_parser_average_genome_depth	Float	Optional output. Average genome depth across the reference genome	ONT, PE, SE
tbp_parser_coverage_report	File	Optional output. TSV file with breadth of coverage of each gene associated with antimicrobial resistance in mycobacterium tuberculosis.	ONT, PE, SE
tbp_parser_docker	String	Optional output. The docker image for tbp-parser	ONT, PE
tbp_parser_genome_percent_coverage	Float	Optional output. The percent of the genome covered at a depth greater than the specified minimum (default 10)	ONT, PE, SE
tbp_parser_laboratorian_report_csv	File	Optional output. Human-readable laboratorian report file containing the list of mutations found to be conferring resistance, both by WHO classification and expert rule implementation. The file contains the following columns: sample_id, tbprofiler_gene_name, tbprofiler_variant_locus_tag, tbprofiler_variant_substitution_type, tbprofiler_variant_substitution_nt, tbprofiler_variant_substitution_aa, confidence according to WHO, antimicrobial, depth, frequency, read_support, rationale ( WHO or expert rule), and warning if the coverage is below specified minimum (default 10)	ONT, PE, SE
tbp_parser_lims_report_csv	File	Optional output. LIMS digestable CSV report containing information on resistance for a set of antimicrobials ( No resistance to X detected, The detected genetic determinant(s) have uncertain significance, resistance to X cannot be ruled out and Genetic determinant(s) associated with resistance to X detected). For each antimicrobial, the mutations found are reported in the mutation_nucleotide; (mutation_protein) format, otherwise No mutations detected is reported.	ONT, PE, SE
tbp_parser_looker_report_csv	File	Optional output. Looker digestible CSV report containing information on resistance for a set of antimicrobials (R for resistant, S for susceptible)	ONT, PE, SE
tbp_parser_version	String	Optional output. The version of tbp-parser	ONT, PE
tbprofiler_dr_type	String	Drug resistance type predicted by TB-Profiler (sensitive, Pre-MDR, MDR, Pre-XDR, XDR)	ONT, PE, SE
tbprofiler_main_lineage	String	Lineage(s) predicted by TBProfiler	ONT, PE, SE
tbprofiler_median_coverage	Int	The median coverage of the H37Rv TB reference genome	ONT, PE
tbprofiler_output_bai	File	Index BAM file generated by mapping sequencing reads to reference genome by TBProfiler	ONT, PE, SE
tbprofiler_output_bam	File	BAM alignment file produced by TBProfiler	ONT, PE, SE
tbprofiler_output_file	File	CSV report from TBProfiler	ONT, PE, SE
tbprofiler_output_vcf	File	VCF file output from TBProfiler; the concatenation of all of the different VCF files produced during TBProfiler analysis	ONT, PE, SE
tbprofiler_pct_reads_mapped	Float	The percentage of reads mapped to the H37Rv TB reference genome	ONT, PE
tbprofiler_resistance_genes	String	List of resistance mutations detected by TBProfiler	ONT, PE, SE
tbprofiler_sub_lineage	String	Sub-lineage(s) predicted by TBProfiler	ONT, PE, SE
tbprofiler_version	String	Version of TBProfiler used	ONT, PE, SE
theiaprok_fasta_analysis_date	String	Date of TheiaProk FASTA workflow execution	FASTA
theiaprok_fasta_version	String	Version of TheiaProk FASTA workflow execution	FASTA
theiaprok_illumina_pe_analysis_date	String	Date of TheiaProk PE workflow execution	PE
theiaprok_illumina_pe_version	String	Version of TheiaProk PE workflow execution	PE
theiaprok_illumina_se_analysis_date	String	Date of TheiaProk SE workflow execution	SE
theiaprok_illumina_se_version	String	Version of TheiaProk SE workflow execution	SE
theiaprok_ont_analysis_date	String	Date of TheiaProk ONT workflow execution	ONT
theiaprok_ont_version	String	Version of TheiaProk ONT workflow execution	ONT
tiptoft_plasmid_replicon_fastq	File	File produced by tiptoft that contains reads containing plasmid rep/inc genes	ONT
tiptoft_plasmid_replicon_genes	String	Rep/inc genes found in sample	ONT
tiptoft_version	String	Version of tiptoft used for analysis	ONT
trimmomatic_docker	String	Docker image used for trimmomatic	PE, SE
trimmomatic_version	String	Version of trimmomatic used	PE, SE
ts_mlst_allelic_profile	String	Profile of MLST loci and allele numbers predicted by MLST	FASTA, ONT, PE, SE
ts_mlst_docker	String	Docker image used for MLST	FASTA, ONT, PE, SE
ts_mlst_novel_alleles	File	FASTA file containing nucleotide sequence of any alleles that are not in the MLST database used by TheiaProk	FASTA, ONT, PE, SE
ts_mlst_predicted_st	String	ST predicted by MLST	FASTA, ONT, PE, SE
ts_mlst_pubmlst_scheme	String	PubMLST scheme used byMLST	FASTA, ONT, PE, SE
ts_mlst_results	File	TSV report with detailed MLST profile, including https://github.com/tseemann/mlst#missing-data	FASTA, ONT, PE, SE
ts_mlst_version	String	Version of Torsten Seeman’s MLST tool used	FASTA, ONT, PE, SE
virulencefinder_docker	String	VirulenceFinder docker image used	FASTA, ONT, PE, SE
virulencefinder_hits	String	Virulence genes detected by VirulenceFinder	FASTA, ONT, PE, SE
virulencefinder_report_tsv	File	Output TSV file created by VirulenceFinder	FASTA, ONT, PE, SE