TheiaEuk Workflow Series
Quick Facts
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level | Dockstore |
---|---|---|---|---|---|
Genomic Characterization | Mycotics | vX.X.X | Some optional features incompatible, Yes | Sample-level | TheiaEuk_Illumina_PE_PHB, TheiaEuk_ONT_PHB |
TheiaEuk Workflows
The TheiaEuk workflows are for the assembly, quality assessment, and characterization of fungal genomes. They are designed to accept Illumina paired-end sequencing data or base-called ONT reads as the primary input. They are currently intended only for haploid fungal genomes like Candidozyma auris. Analyzing diploid genomes using TheiaEuk should be attempted only with expert attention to the resulting genome quality.
All input reads are processed through "core tasks" in each workflow. The core tasks include raw read quality assessment, read cleaning (quality trimming and adapter removal), de novo assembly, assembly quality assessment, and species taxon identification. For some identified taxa, taxon-specific sub-workflows are activated automatically and perform additional characterization steps, including clade-typing and/or antifungal resistance detection.
Before running TheiaEuk
TheiaEuk_Illumina_PE relies on Snippy to perform variant calling on the cleaned read dataset and then queries the resulting file for specific mutations that are known to confer antifungal resistance (see the Organism-specific characterization section). This behaviour has been replicated in TheiaEuk_ONT, but there the variant calling is performed directly on the resulting assemblies. Therefore, the read support reported by TheiaEuk_ONT is currently not reliable. Future releases are expected to improve this module.
Inputs
Input Read Data
The TheiaEuk_Illumina_PE workflow takes in Illumina paired-end read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends compressing files with gzip before uploading to Terra to minimize data upload time.
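The naming rule above can be checked before upload. The following is a minimal sketch, not part of the workflow itself: the helper name is_valid_read_name and the example file names are hypothetical, and the check is case-sensitive on the assumption that extensions are written in lowercase as shown above.

```python
import re

# Matches names ending in .fastq or .fq, optionally followed by .gz.
# ASSUMPTION: lowercase extensions only, as written in the text above.
FASTQ_NAME = re.compile(r"\.(fastq|fq)(\.gz)?$")


def is_valid_read_name(filename: str) -> bool:
    """Return True if the file name ends with an accepted FASTQ extension."""
    return FASTQ_NAME.search(filename) is not None


# Hypothetical file names, used only for illustration.
for name in ["sample01_R1.fastq.gz", "sample01_R2.fq", "sample01.fasta"]:
    status = "ok" if is_valid_read_name(name) else "rename before upload"
    print(f"{name}: {status}")
```

The same check applies to ONT read files, which follow the same naming rule.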
By default, the workflow anticipates 2 x 150bp reads (i.e. the input reads were generated using a 300-cycle sequencing kit). Modifications to the optional parameter for trim_minlen may be required to accommodate shorter read data, such as the 2 x 75bp reads generated using a 150-cycle sequencing kit.
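For command-line runs, an override such as trim_minlen is usually supplied through a WDL inputs JSON file. The sketch below builds such a file in Python under several assumptions: the fully qualified key theiaeuk_illumina_pe.trim_minlen is a guess at where the parameter is exposed, the value 50 is purely illustrative and not a recommendation, and the bucket paths are hypothetical. Check the inputs table for the actual task name and default before using it.

```python
import json

WORKFLOW = "theiaeuk_illumina_pe"


def build_inputs(samplename, read1, read2, trim_minlen=None):
    """Build a WDL inputs mapping for one paired-end sample."""
    inputs = {
        f"{WORKFLOW}.samplename": samplename,
        f"{WORKFLOW}.read1": read1,
        f"{WORKFLOW}.read2": read2,
    }
    # Only override the optional parameter when a value is supplied;
    # otherwise the workflow default applies.
    if trim_minlen is not None:
        # ASSUMPTION: the key below is a guess at the parameter's location.
        inputs[f"{WORKFLOW}.trim_minlen"] = trim_minlen
    return inputs


# Hypothetical sample with 2 x 75bp reads; 50 is an illustrative value only.
short_read_inputs = build_inputs(
    samplename="sample01",
    read1="gs://example-bucket/sample01_R1.fastq.gz",
    read2="gs://example-bucket/sample01_R2.fastq.gz",
    trim_minlen=50,
)
print(json.dumps(short_read_inputs, indent=2))
```

Leaving trim_minlen unset keeps the key out of the JSON entirely, so the workflow falls back to its own default for 2 x 150bp data.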
The TheiaEuk_ONT workflow takes in base-called ONT read data. Read file names should end with .fastq or .fq, with the optional addition of .gz. When possible, Theiagen recommends compressing files with gzip before uploading to Terra to minimize data upload time.
The ONT sequencing kit and base-calling approach can produce substantial variability in the amount and quality of read data. Genome assemblies produced by the TheiaEuk_ONT workflow must be quality assessed before reporting results.
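As a quick first look at an ONT assembly, basic contiguity statistics can be computed directly from the FASTA. This is only an illustrative sketch and does not replace the assembly quality assessment performed within the workflow; the function names and the toy FASTA text are hypothetical.

```python
def contig_lengths(fasta_text):
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Compute contig count, total length, and N50 from contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: length of the contig at which half the assembly is covered.
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(ordered), "total_length": total, "n50": n50}


# Toy assembly, for illustration only.
toy_fasta = ">contig1\nACGTACGTAC\n>contig2\nACGT\n>contig3\nAC\n"
print(assembly_stats(contig_lengths(toy_fasta)))
```

Unexpectedly many contigs, a low N50, or a total length far from the expected genome size are reasons to inspect the run more closely before reporting results.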
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
theiaeuk_illumina_pe | read1 | File | FASTQ file containing read1 sequences | | Required |
theiaeuk_illumina_pe | read2 | File | FASTQ file containing read2 sequences | | Required |
theiaeuk_illumina_pe | samplename | String | Sample name for the analysis | | Required |
busco | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
busco | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
cg_pipeline_clean | cg_pipe_opts | String | Options to pass to CG-Pipeline for clean read assessment | --fast | Optional |
cg_pipeline_clean | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
cg_pipeline_clean | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/lyveset:1.1.4f | Optional |
cg_pipeline_raw | cg_pipe_opts | String | Options to pass to CG-Pipeline for raw read assessment | --fast | Optional |
cg_pipeline_raw | disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
cg_pipeline_raw | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/lyveset:1.1.4f | Optional |
clean_check_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
clean_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
clean_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional |
clean_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
digger_denovo | assembler | String | Assembler to use (spades, skesa, megahit) | skesa | Optional |
digger_denovo | assembler_options | String | Assembler-specific options that you might choose for the selected assembler | | Optional |
digger_denovo | bwa_cpu | Int | Number of CPUs to allocate to the task | 6 | Optional |
digger_denovo | bwa_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
digger_denovo | bwa_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional |
digger_denovo | bwa_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
digger_denovo | call_pilon | Boolean | Whether to run Pilon polishing after assembly | FALSE | Optional |
digger_denovo | filter_contigs_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
digger_denovo | filter_contigs_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
digger_denovo | filter_contigs_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/shovilter:0.2 | Optional |
digger_denovo | filter_contigs_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
digger_denovo | filter_contigs_min_coverage | Float | Minimum coverage threshold for contig filtering | 2.0 | Optional |
digger_denovo | filter_contigs_skip_coverage_filter | Boolean | Skip filtering contigs based on coverage | FALSE | Optional |
digger_denovo | filter_contigs_skip_homopolymer_filter | Boolean | Skip filtering contigs containing homopolymers | FALSE | Optional |
digger_denovo | filter_contigs_skip_length_filter | Boolean | Skip filtering contigs based on length | FALSE | Optional |
digger_denovo | kmers | String | K-mer sizes for assembly (comma-separated) | | Optional |
digger_denovo | megahit_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
digger_denovo | megahit_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
digger_denovo | megahit_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/megahit:1.2.9 | Optional |
digger_denovo | megahit_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
digger_denovo | pilon_cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
digger_denovo | pilon_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
digger_denovo | pilon_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/pilon:1.24--hdfd78af_0 | Optional |
digger_denovo | pilon_fix | String | Potential issues with assembly to try and automatically fix (snps, indels, gaps, local, all, bases, none) | bases | Optional |
digger_denovo | pilon_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 | Optional |
digger_denovo | pilon_min_base_quality | Int | Minimum base quality to keep | 3 | Optional |
digger_denovo | pilon_min_depth | Float | Minimum coverage threshold for variant calling: when set to a value ≥1, it requires that absolute depth of coverage; when set to a fraction <1, it requires coverage at least that fraction of the mean coverage for the region | 0.25 | Optional |
digger_denovo | pilon_min_mapping_quality | Int | Minimum mapping quality for a read to count in pileups | 60 | Optional |
digger_denovo | run_filter_contigs | Boolean | Whether to run contig filtering step | TRUE | Optional |
digger_denovo | skesa_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
digger_denovo | skesa_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
digger_denovo | skesa_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/skesa:2.4.0 | Optional |
digger_denovo | skesa_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
digger_denovo | spades_cpu | Int | Number of CPUs to allocate to the task | 16 | Optional |
digger_denovo | spades_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
digger_denovo | spades_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/spades:4.1.0 | Optional |
digger_denovo | spades_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 | Optional |
digger_denovo | spades_type | String | SPAdes assembly mode (isolate, meta, rna, etc.); see the SPAdes documentation for other modes | isolate | Optional |
gambit | disk_size | Int | Amount of storage (in GB) to allocate to the task | 20 | Optional |
gambit | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0 | Optional |
merlin_magic | abricate_abaum_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | abricate_abaum_min_percent_coverage | Int | Internal component, do not modify | | Optional |
merlin_magic | abricate_abaum_min_percent_identity | Int | Internal component, do not modify | 95 | Optional |
merlin_magic | abricate_vibrio_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | abricate_vibrio_min_percent_coverage | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | abricate_vibrio_min_percent_identity | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | agrvate_agr_typing_only | Boolean | Internal component, do not modify | | Optional |
merlin_magic | agrvate_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | amr_search_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
merlin_magic | amr_search_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
merlin_magic | amr_search_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.1 | Optional |
merlin_magic | amr_search_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
merlin_magic | assembly_only | Boolean | Set to true if only analyzing input assembly | FALSE | Optional |
merlin_magic | call_poppunk | Boolean | Internal component, do not modify | TRUE | Optional |
merlin_magic | call_shigeifinder_reads_input | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | call_stxtyper | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | call_tbp_parser | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | cauris_cladetyper_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0 | Optional |
merlin_magic | cladetyper_kmer_size | Int | K-mer size for cladetyper | | Optional |
merlin_magic | cladetyper_max_distance | Float | The maximum GAMBIT distance to report a C. auris clade hit | 0.1 | Optional |
merlin_magic | cladetyper_ref_clade1 | File | Reference genome FASTA for Candidozyma auris clade1 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade1_GCA_002759435.2_Cand_auris_B8441_V2_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade1_annotated | File | Reference GBFF annotation for C. auris clade1 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade1_GCA_002759435_Cauris_B8441_V2_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade2 | File | Reference genome FASTA for C. auris clade2 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade2_annotated | File | Reference GBFF annotation for C. auris clade2 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade3 | File | Reference genome FASTA for C. auris clade3 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade3_GCF_002775015.1_Cand_auris_B11221_V1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade3_annotated | File | Reference GBFF annotation for C. auris clade3 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade3_GCF_002775015.1_Cand_auris_B11221_V1_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade4 | File | Reference genome FASTA for C. auris clade4 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade4_GCA_003014415.1_Cand_auris_B11243_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade4_annotated | File | Reference GBFF annotation for C. auris clade4 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade4_GCA_003014415.1_Cand_auris_B11243_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade5 | File | Reference genome FASTA for C. auris clade5 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade5_annotated | File | Reference GBFF annotation for C. auris clade5 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade6 | File | Reference genome FASTA for C. auris clade6 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade6_GCA_032714025.1_ASM3271402v1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade6_annotated | File | Reference GBFF annotation for C. auris clade6 | | Optional |
merlin_magic | clockwork_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | ectyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | ectyper_h_min_percent_coverage | Int | Internal component, do not modify | | Optional |
merlin_magic | ectyper_h_min_percent_identity | Int | Internal component, do not modify | | Optional |
merlin_magic | ectyper_o_min_percent_coverage | Int | Internal component, do not modify | | Optional |
merlin_magic | ectyper_o_min_percent_identity | Int | Internal component, do not modify | | Optional |
merlin_magic | ectyper_print_alleles | Boolean | Internal component, do not modify | | Optional |
merlin_magic | ectyper_verify | Boolean | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_align_diff | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_cluster_distance | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_culling_limit | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_gap | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_max_size | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_min_good | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_min_percent_identity | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_min_perfect | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_mismatch | Int | Internal component, do not modify | | Optional |
merlin_magic | emmtyper_wf | String | Internal component, do not modify | | Optional |
merlin_magic | emmtypingtool_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | genotyphi_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | hicap_broken_gene_length | Int | Internal component, do not modify | | Optional |
merlin_magic | hicap_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | hicap_min_broken_gene_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | hicap_min_gene_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | hicap_min_gene_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | kaptive_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | kaptive_low_gene_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | kaptive_min_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | kaptive_min_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | kaptive_start_end_margin | Int | Internal component, do not modify | | Optional |
merlin_magic | kleborate_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | kleborate_min_kaptive_confidence | String | Internal component, do not modify | | Optional |
merlin_magic | kleborate_min_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | kleborate_min_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | kleborate_min_spurious_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | kleborate_min_spurious_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | kleborate_skip_kaptive | Boolean | Internal component, do not modify | | Optional |
merlin_magic | kleborate_skip_resistance | Boolean | Internal component, do not modify | | Optional |
merlin_magic | legsta_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/legsta:0.5.1--hdfd78af_2 | Optional |
merlin_magic | lissero_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | lissero_min_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | lissero_min_percent_identity | Float | Internal component, do not modify | | Optional |
merlin_magic | meningotype_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | ngmaster_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | ont_data | Boolean | Set to true if your data is ONT FASTQ files | FALSE | Optional |
merlin_magic | paired_end | Boolean | Set to true if your data is paired-end FASTQ files | TRUE | Optional |
merlin_magic | pasty_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | pasty_min_percent_coverage | Int | Internal component, do not modify | | Optional |
merlin_magic | pasty_min_percent_identity | Int | Internal component, do not modify | | Optional |
merlin_magic | pbptyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | pbptyper_min_percent_coverage | Int | Internal component, do not modify | | Optional |
merlin_magic | pbptyper_min_percent_identity | Int | Internal component, do not modify | | Optional |
merlin_magic | poppunk_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_clusters_csv | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_dists_npy | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_dists_pkl | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_external_clusters_csv | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_fit_npz | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_fit_pkl | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_graph_gt | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_h5 | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_qcreport_txt | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_refs | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_refs_dists_npy | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_refs_dists_pkl | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_refs_graph_gt | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_refs_h5 | File | Internal component, do not modify | | Optional |
merlin_magic | poppunk_gps_unword_clusters_csv | File | Internal component, do not modify | | Optional |
merlin_magic | run_amr_search | Boolean | If set to true, the AMR_Search workflow is run when the species belongs to a supported taxon; see the AMR_Search documentation | FALSE | Optional |
merlin_magic | seqsero2_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | seroba_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | serotypefinder_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | shigatyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | shigeifinder_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | sistr_cpu | Int | Internal component, do not modify | | Optional |
merlin_magic | sistr_disk_size | Int | Internal component, do not modify | | Optional |
merlin_magic | sistr_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | sistr_memory | Int | Internal component, do not modify | | Optional |
merlin_magic | sistr_use_full_cgmlst_db | Boolean | Internal component, do not modify | | Optional |
merlin_magic | snippy_base_quality | Int | Minimum quality for a nucleotide to be used in variant calling | 13 | Optional |
merlin_magic | snippy_gene_query_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21 | Optional |
merlin_magic | snippy_map_qual | Int | Minimum mapping quality to accept in variant calling | 60 | Optional |
merlin_magic | snippy_maxsoft | Int | Number of bases of alignment to soft-clip before discarding the alignment | 10 | Optional |
merlin_magic | snippy_min_coverage | Int | Minimum read coverage of a position to identify a mutation | 10 | Optional |
merlin_magic | snippy_min_frac | Float | Minimum fraction of bases at a given position to identify a mutation | 0 | Optional |
merlin_magic | snippy_min_quality | Int | Minimum VCF variant call "quality" | 100 | Optional |
merlin_magic | snippy_query_gene | String | Provide a gene to search for using Snippy | Default depends on the detected organism | Optional |
merlin_magic | snippy_reference_afumigatus | File | Snippy reference for Aspergillus fumigatus | gs://theiagen-public-resources-rp/reference_data/eukaryotic/aspergillus/Aspergillus_fumigatus_GCF_000002655.1_ASM265v1_genomic.gbff | Optional |
merlin_magic | snippy_reference_cryptoneo | File | Snippy reference for Cryptococcus neoformans | gs://theiagen-public-resources-rp/reference_data/eukaryotic/cryptococcus/Cryptococcus_neoformans_GCF_000091045.1_ASM9104v1_genomic.gbff | Optional |
merlin_magic | snippy_variants_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/snippy:4.6.0 | Optional |
merlin_magic | sonneityping_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | sonneityping_mykrobe_opts | String | Internal component, do not modify | | Optional |
merlin_magic | spatyper_do_enrich | Boolean | Internal component, do not modify | | Optional |
merlin_magic | spatyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | srst2_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | srst2_gene_max_mismatch | Int | Internal component, do not modify | 2000 | Optional |
merlin_magic | srst2_max_divergence | Int | Internal component, do not modify | 20 | Optional |
merlin_magic | srst2_min_depth | Int | Internal component, do not modify | 5 | Optional |
merlin_magic | srst2_min_edge_depth | Int | Internal component, do not modify | 2 | Optional |
merlin_magic | srst2_min_percent_coverage | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | staphopia_sccmec_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | stxtyper_cpu | Int | Internal component, do not modify | | Optional |
merlin_magic | stxtyper_disk_size | Int | Internal component, do not modify | | Optional |
merlin_magic | stxtyper_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | stxtyper_enable_debug | Boolean | Internal component, do not modify | | Optional |
merlin_magic | stxtyper_memory | Int | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_add_cs_lims | Boolean | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_config | File | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_coverage_regions_bed | File | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_debug | Boolean | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_etha237_frequency | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_expert_rule_regions_bed | File | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_min_depth | Int | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_min_frequency | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_min_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_min_read_support | Int | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_operator | String | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_output_seq_method_type | String | Internal component, do not modify | WGS | Optional |
merlin_magic | tbp_parser_rpob449_frequency | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_rrl_frequency | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_rrl_read_support | Int | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_rrs_frequency | Float | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_rrs_read_support | Int | Internal component, do not modify | | Optional |
merlin_magic | tbp_parser_tngs_data | Boolean | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_additional_parameters | String | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_custom_db | File | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_mapper | String | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_min_af | Float | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_min_depth | Int | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_run_cdph_db | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | tbprofiler_run_custom_db | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | tbprofiler_variant_caller | String | Internal component, do not modify | | Optional |
merlin_magic | tbprofiler_variant_calling_params | String | Internal component, do not modify | | Optional |
merlin_magic | vibecheck_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | vibecheck_lineage_barcodes | File | Internal component, do not modify | | Optional |
merlin_magic | vibecheck_skip_subsampling | Boolean | Internal component, do not modify | | Optional |
merlin_magic | vibecheck_subsampling_fraction | Float | Internal component, do not modify | | Optional |
merlin_magic | virulencefinder_database | String | Internal component, do not modify | | Optional |
merlin_magic | virulencefinder_docker_image | String | Internal component, do not modify | | Optional |
merlin_magic | virulencefinder_min_percent_coverage | Float | Internal component, do not modify | | Optional |
merlin_magic | virulencefinder_min_percent_identity | Float | Internal component, do not modify | | Optional |
qc_check_task | ani_highest_percent | Float | Internal component, do not modify | | Optional |
qc_check_task | ani_highest_percent_bases_aligned | Float | Internal component, do not modify | | Optional |
qc_check_task | assembly_length_unambiguous | Int | Internal component, do not modify | | Optional |
qc_check_task | assembly_mean_coverage | Float | Internal component, do not modify | | Optional |
qc_check_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
qc_check_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
qc_check_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16 | Optional |
qc_check_task | kraken_human | Float | Internal component, do not modify | | Optional |
qc_check_task | kraken_human_dehosted | Float | Internal component, do not modify | | Optional |
qc_check_task | kraken_sc2 | Float | Internal component, do not modify | | Optional |
qc_check_task | kraken_sc2_dehosted | Float | Internal component, do not modify | | Optional |
qc_check_task | kraken_target_organism | Float | Internal component, do not modify | | Optional |
qc_check_task | kraken_target_organism_dehosted | Float | Internal component, do not modify | | Optional |
qc_check_task | meanbaseq_trim | String | Internal component, do not modify | | Optional |
qc_check_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
qc_check_task | midas_secondary_genus_abundance | Float | Internal component, do not modify | | Optional |
qc_check_task | midas_secondary_genus_coverage | Float | Internal component, do not modify | | Optional |
qc_check_task | number_Degenerate | Int | Internal component, do not modify | | Optional |
qc_check_task | number_N | Int | Internal component, do not modify | | Optional |
qc_check_task | percent_reference_coverage | Float | Internal component, do not modify | | Optional |
qc_check_task | sc2_s_gene_mean_coverage | Float | Internal component, do not modify | | Optional |
qc_check_task | sc2_s_gene_percent_coverage | Float | Internal component, do not modify | | Optional |
qc_check_task | vadr_num_alerts | String | Internal component, do not modify | | Optional |
quast | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
quast | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/quast:5.0.2 | Optional |
quast | min_contig_length | Int | Minimum length of contig for QUAST | 500 | Optional |
rasusa_task | bases | String | Explicitly set the number of bases required e.g., 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, --coverage and --genome-size are ignored | Optional | |
rasusa_task | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
rasusa_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
rasusa_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/rasusa:2.1.0 | Optional |
rasusa_task | frac | Float | Explicitly define the fraction of reads to keep in the subsample; when used, genome size and coverage are ignored; acceptable inputs include whole numbers and decimals, e.g. 50.0 will leave 50% of the reads in the subsample | Optional | |
rasusa_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
rasusa_task | num | Int | Optional: explicitly define the number of reads in the subsample; when used, genome size and coverage are ignored; acceptable metric suffixes include: b, k, m, g, and t for base, kilo, mega, giga, and tera, respectively | Optional | |
rasusa_task | seed | Int | Random seed used by the subsampler; providing the same seed allows the exact same subsample to be produced from the same input file(s) in subsequent runs; leave empty for random downsampling | Optional | |
raw_check_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
raw_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
raw_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional |
raw_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
read_QC_trim | adapters | File | File with adapter sequences to be removed | Optional | |
read_QC_trim | bbduk_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
read_QC_trim | call_kraken | Boolean | True/False variable that determines if the Kraken2 task should be called; for non-TheiaCoV workflows, the kraken_db variable must be provided. | FALSE | Optional |
read_QC_trim | call_midas | Boolean | Internal component, do not modify | FALSE | Optional |
read_QC_trim | extract_unclassified | Boolean | Internal component, do not modify | FALSE | Optional |
read_QC_trim | fastp_args | String | Additional arguments to use with fastp | --detect_adapter_for_pe -g -5 20 -3 20 | Optional |
read_QC_trim | host | String | Internal component, do not modify | Optional | |
read_QC_trim | host_complete_only | Boolean | Internal component, do not modify | FALSE | Optional |
read_QC_trim | host_decontaminate_mem | Int | Internal component, do not modify | 32 | Optional |
read_QC_trim | host_is_accession | Boolean | Internal component, do not modify | FALSE | Optional |
read_QC_trim | host_refseq | Boolean | Internal component, do not modify | TRUE | Optional |
read_QC_trim | kraken_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
read_QC_trim | kraken_db | File | A kraken2 database to use with the kraken2 optional task. The file must be a .tar.gz kraken2 database. | Optional | |
read_QC_trim | kraken_disk_size | Int | Amount of storage (in GB) to allocate to the task. Increase this when using large (>30GB) kraken2 databases such as the "k2_standard" database | 100 | Optional |
read_QC_trim | kraken_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 | Optional |
read_QC_trim | midas_db | File | Internal component, do not modify | gs://theiagen-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz | Optional |
read_QC_trim | phix | File | A file containing the phix used during Illumina sequencing; used in the BBDuk task | Optional | |
read_QC_trim | read_processing | String | The name of the tool to perform basic read processing; options: "trimmomatic" or "fastp" | trimmomatic | Optional |
read_QC_trim | read_qc | String | The tool used for quality control (QC) of reads. Options are "fastq_scan" (default) and "fastqc" | fastq_scan | Optional |
read_QC_trim | target_organism | String | This string is searched for in the kraken2 outputs to extract the read percentage | Optional | |
read_QC_trim | taxon_id | Int | Internal component, do not modify | 0 | Optional |
read_QC_trim | trimmomatic_args | String | Additional arguments to pass to trimmomatic. "-phred33" specifies the Phred Q score encoding which is almost always phred33 with modern sequence data. | -phred33 | Optional |
read_QC_trim | workflow_series | String | Internal component, do not modify | Optional | |
theiaeuk_illumina_pe | busco_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.3.2_cv1 | Optional |
theiaeuk_illumina_pe | busco_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 24 | Optional |
theiaeuk_illumina_pe | call_rasusa | Boolean | If true, RASUSA will subsample raw reads to a specified read depth (150X by default) | TRUE | Optional |
theiaeuk_illumina_pe | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
theiaeuk_illumina_pe | expected_taxon | String | If provided, this input will override the taxonomic assignment made by GAMBIT and launch the relevant taxon-specific submodules. It will also modify the organism flag used by AMRFinderPlus. Example format: "Salmonella enterica" | Optional | |
theiaeuk_illumina_pe | gambit_db_genomes | File | User-provided database of assembled query genomes; requires complementary signatures file. If not provided, uses default database, "/gambit-db" | gs://gambit-databases-rp/fungal-version/1.0.0/gambit-fungal-metadata-1.0.0-20241213.gdb | Optional |
theiaeuk_illumina_pe | gambit_db_signatures | File | User-provided signatures file; requires complementary genomes file. If not specified, the file from the docker container will be used. | gs://gambit-databases-rp/fungal-version/1.0.0/gambit-fungal-signatures-1.0.0-20241213.gs | Optional |
theiaeuk_illumina_pe | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | Optional | |
theiaeuk_illumina_pe | max_genome_length | Int | Maximum genome size able to pass read screening | 178000000 | Optional |
theiaeuk_illumina_pe | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
theiaeuk_illumina_pe | min_basepairs | Int | Minimum number of base pairs able to pass read screening | 45000000 | Optional |
theiaeuk_illumina_pe | min_contig_length | Int | Minimum contig length for assembler | 1000 | Optional |
theiaeuk_illumina_pe | min_coverage | Int | Minimum genome coverage able to pass read screening | 10 | Optional |
theiaeuk_illumina_pe | min_genome_length | Int | Minimum genome size able to pass read screening | 9000000 | Optional |
theiaeuk_illumina_pe | min_proportion | Int | Minimum proportion of total reads in each read file to pass read screening | 40 | Optional |
theiaeuk_illumina_pe | min_reads | Int | Minimum number of reads to pass read screening | 30000 | Optional |
theiaeuk_illumina_pe | qc_check_table | File | TSV file with taxa for rows and QC metrics for columns; internal cells represent user-determined QC thresholds; if provided, turns on the QC Check task. See below for an example QC Check table. | Optional | |
theiaeuk_illumina_pe | seq_method | String | Sequencing method used for the samples | ILLUMINA | Optional |
theiaeuk_illumina_pe | skip_screen | Boolean | Option to skip the read screening prior to analysis; if setting to true, please provide a value for the theiaeuk_pe genome_length optional input, OR set call_rasusa to false. Otherwise RASUSA will attempt to downsample to an expected genome size of 0 bp, and the workflow will fail. | FALSE | Optional |
theiaeuk_illumina_pe | subsample_coverage | Float | Read depth for RASUSA task to subsample reads to | 150 | Optional |
theiaeuk_illumina_pe | trim_min_length | Int | Specifies minimum length of each read after trimming to be kept | 75 | Optional |
theiaeuk_illumina_pe | trim_quality_min_score | Int | Specifies the minimum average quality of bases in a sliding window to be kept | 20 | Optional |
theiaeuk_illumina_pe | trim_window_size | Int | Size of the trimming window to use | 10 | Optional |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional |
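
The `skip_screen`, `call_rasusa`, `genome_length`, and `subsample_coverage` inputs above interact, so it can help to check an inputs file before launching. The following is a minimal Python sketch, not part of the workflow itself. It assumes the inputs are supplied as a JSON object keyed `theiaeuk_illumina_pe.<variable>`, and that the RASUSA target is roughly `subsample_coverage` multiplied by the genome length. The helper names `check_skip_screen` and `target_bases` and the 12.4 Mb genome length are hypothetical and used only for illustration.

```python
import json

# Assumed key prefix for workflow-level inputs (hypothetical; adjust to your setup).
PREFIX = "theiaeuk_illumina_pe"


def check_skip_screen(inputs, prefix=PREFIX):
    """Return a list of problems with the skip_screen / RASUSA combination.

    Defaults mirror the table above: skip_screen FALSE, call_rasusa TRUE.
    """
    skip_screen = inputs.get(f"{prefix}.skip_screen", False)
    call_rasusa = inputs.get(f"{prefix}.call_rasusa", True)
    genome_length = inputs.get(f"{prefix}.genome_length")
    problems = []
    if skip_screen and call_rasusa and not genome_length:
        problems.append(
            "skip_screen is true: provide genome_length or set call_rasusa to false"
        )
    return problems


def target_bases(genome_length, subsample_coverage=150.0):
    """Approximate number of bases kept when subsampling to a given depth."""
    return int(genome_length * subsample_coverage)


if __name__ == "__main__":
    # Illustrative inputs only; the genome length is an assumed example value.
    inputs = {
        f"{PREFIX}.skip_screen": True,
        f"{PREFIX}.genome_length": 12_400_000,
        f"{PREFIX}.subsample_coverage": 150.0,
    }
    print(json.dumps(inputs, indent=2))
    print(check_skip_screen(inputs))
    print(target_bases(inputs[f"{PREFIX}.genome_length"]))
```

With `skip_screen` set to true and no `genome_length`, the check reports a problem, which mirrors the failure described for the `skip_screen` input; setting `call_rasusa` to false clears it.
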

Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
theiaeuk_ont | read1 | File | ONT read file in FASTQ file format (compression optional) | Required | |
theiaeuk_ont | samplename | String | The name of the sample being analyzed | Required | |
busco | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
busco | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
clean_check_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
clean_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
clean_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional |
clean_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
clean_check_reads | workflow_series | String | Internal component, do not modify | theiaviral | Optional |
flye_denovo | auto_medaka_model | Boolean | If true, medaka will automatically select the best Medaka model for assembly | TRUE | Optional |
flye_denovo | bandage_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
flye_denovo | bandage_disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
flye_denovo | bandage_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
flye_denovo | dnaapler_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
flye_denovo | dnaapler_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | dnaapler_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
flye_denovo | dnaapler_mode | String | Dnaapler-specific inputs | all | Optional |
flye_denovo | filter_contigs_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
flye_denovo | filter_contigs_disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
flye_denovo | filter_contigs_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
flye_denovo | filter_contigs_min_length | Int | Minimum contig length to keep | 1000 | Optional |
flye_denovo | flye_additional_parameters | String | Any extra Flye-specific parameters | Optional | |
flye_denovo | flye_asm_coverage | Int | Reduced coverage for initial disjointig assembly | Optional | |
flye_denovo | flye_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
flye_denovo | flye_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | flye_genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | Optional | |
flye_denovo | flye_keep_haplotypes | Boolean | If true, keep haplotypes | FALSE | Optional |
flye_denovo | flye_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 32 | Optional |
flye_denovo | flye_minimum_overlap | Int | Minimum overlap between reads | Optional | |
flye_denovo | flye_no_alt_contigs | Boolean | If true, do not generate alternative contigs | FALSE | Optional |
flye_denovo | flye_polishing_iterations | Int | Default polishing iterations | 1 | Optional |
flye_denovo | flye_read_error_rate | Float | Maximum expected read error rate | Optional | |
flye_denovo | flye_read_type | String | Specifies the type of sequencing reads. Options: --nano-hq (default), --nano-corr, --nano-raw, --pacbio-raw, --pacbio-corr, --pacbio-hifi. Refer to Flye documentation for details on each type. | --nano-hq | Optional |
flye_denovo | flye_scaffold | Boolean | If true, scaffolding is enabled using graph | FALSE | Optional |
flye_denovo | flye_uneven_coverage_mode | Boolean | Sets the --meta option in the case of uneven coverage (or metagenomics) | FALSE | Optional |
flye_denovo | illumina_read1 | File | If Illumina reads are provided, flye_denovo subworkflow will perform Illumina polishing | Optional | |
flye_denovo | illumina_read2 | File | If Illumina reads are provided, flye_denovo subworkflow will perform Illumina polishing | Optional | |
flye_denovo | medaka_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
flye_denovo | medaka_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | medaka_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
flye_denovo | medaka_model | String | To obtain the best results, the model must match the sequencer's basecaller model; this string takes the format {pore}_{device}_{caller variant}_{caller version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models. If this is being run on legacy data, it is likely to be r941_min_hac_g507. | r1041_e82_400bps_sup_v5.0.0 | Optional |
flye_denovo | polish_rounds | Int | The number of polishing rounds to conduct for medaka or racon (without Illumina) | 1 | Optional |
flye_denovo | polisher | String | The polishing tool to use for assembly | medaka | Optional |
flye_denovo | polypolish_careful | Boolean | Polypolish-specific inputs | FALSE | Optional |
flye_denovo | polypolish_cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
flye_denovo | polypolish_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | polypolish_fraction_invalid | Float | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_fraction_valid | Float | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_high_percentile_threshold | Float | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_low_percentile_threshold | Float | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_maximum_errors | Int | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
flye_denovo | polypolish_minimum_depth | Int | Polypolish-specific inputs | Optional | |
flye_denovo | polypolish_pair_orientation | String | Polypolish-specific inputs | Optional | |
flye_denovo | porechop_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
flye_denovo | porechop_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | porechop_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
flye_denovo | porechop_trimopts | String | Options to pass to Porechop for trimming | Optional | |
flye_denovo | racon_cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
flye_denovo | racon_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
flye_denovo | racon_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
flye_denovo | run_porechop | Boolean | If true, trims reads before assembly using Porechop | FALSE | Optional |
flye_denovo | skip_polishing | Boolean | If true, skips polishing | FALSE | Optional |
gambit | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
gambit | disk_size | Int | Amount of storage (in GB) to allocate to the task | 20 | Optional |
gambit | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0 | Optional |
gambit | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
merlin_magic | abricate_abaum_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | abricate_abaum_min_percent_coverage | Int | Internal component, do not modify | Optional | |
merlin_magic | abricate_abaum_min_percent_identity | Int | Internal component, do not modify | 95 | Optional |
merlin_magic | abricate_vibrio_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | abricate_vibrio_min_percent_coverage | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | abricate_vibrio_min_percent_identity | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | agrvate_agr_typing_only | Boolean | Internal component, do not modify | Optional | |
merlin_magic | agrvate_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | agrvate_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/agrvate:1.0.2--hdfd78af_0 | Optional |
merlin_magic | amr_search_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
merlin_magic | amr_search_disk_size | Int | Amount of storage (in GB) to allocate to the task | 50 | Optional |
merlin_magic | amr_search_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.1 | Optional |
merlin_magic | amr_search_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
merlin_magic | call_poppunk | Boolean | Internal component, do not modify | TRUE | Optional |
merlin_magic | call_shigeifinder_reads_input | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | call_stxtyper | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | call_tbp_parser | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | cauris_cladetyper_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/gambit:1.0.0 | Optional |
merlin_magic | cladetyper_kmer_size | Int | Kmer size for cladetyper | Optional | |
merlin_magic | cladetyper_max_distance | Float | The maximum GAMBIT distance to report a C. auris clade hit | 0.1 | Optional |
merlin_magic | cladetyper_ref_clade1 | File | Reference genome FASTA for Candidozyma auris clade1 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade1_GCA_002759435.2_Cand_auris_B8441_V2_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade1_annotated | File | Reference GBFF annotation for C. auris clade1 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade1_GCA_002759435_Cauris_B8441_V2_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade2 | File | Reference genome FASTA for C. auris clade2 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade2_annotated | File | Reference GBFF annotation for C. auris clade2 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade2_GCA_003013715.2_ASM301371v2_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade3 | File | Reference genome FASTA for C. auris clade3 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade3_GCF_002775015.1_Cand_auris_B11221_V1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade3_annotated | File | Reference GBFF annotation for C. auris clade3 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade3_GCF_002775015.1_Cand_auris_B11221_V1_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade4 | File | Reference genome FASTA for C. auris clade4 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade4_GCA_003014415.1_Cand_auris_B11243_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade4_annotated | File | Reference GBFF annotation for C. auris clade4 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade4_GCA_003014415.1_Cand_auris_B11243_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade5 | File | Reference genome FASTA for C. auris clade5 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade5_annotated | File | Reference GBFF annotation for C. auris clade5 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade5_GCA_016809505.1_ASM1680950v1_genomic.gbff | Optional |
merlin_magic | cladetyper_ref_clade6 | File | Reference genome FASTA for C. auris clade6 | gs://theiagen-public-resources-rp/reference_data/eukaryotic/candidozyma/Cauris_Clade6_GCA_032714025.1_ASM3271402v1_genomic.fasta | Optional |
merlin_magic | cladetyper_ref_clade6_annotated | File | Reference GBFF annotation for C. auris clade6 | Optional | |
merlin_magic | clockwork_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | ectyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | ectyper_h_min_percent_coverage | Int | Internal component, do not modify | Optional | |
merlin_magic | ectyper_h_min_percent_identity | Int | Internal component, do not modify | Optional | |
merlin_magic | ectyper_o_min_percent_coverage | Int | Internal component, do not modify | Optional | |
merlin_magic | ectyper_o_min_percent_identity | Int | Internal component, do not modify | Optional | |
merlin_magic | ectyper_print_alleles | Boolean | Internal component, do not modify | Optional | |
merlin_magic | ectyper_verify | Boolean | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_align_diff | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_cluster_distance | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_culling_limit | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_gap | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_max_size | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_min_good | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_min_percent_identity | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_min_perfect | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_mismatch | Int | Internal component, do not modify | Optional | |
merlin_magic | emmtyper_wf | String | Internal component, do not modify | Optional | |
merlin_magic | emmtypingtool_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | genotyphi_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | hicap_broken_gene_length | Int | Internal component, do not modify | Optional | |
merlin_magic | hicap_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | hicap_min_broken_gene_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | hicap_min_gene_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | hicap_min_gene_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | kaptive_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | kaptive_low_gene_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | kaptive_min_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | kaptive_min_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | kaptive_start_end_margin | Int | Internal component, do not modify | Optional | |
merlin_magic | kleborate_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | kleborate_min_kaptive_confidence | String | Internal component, do not modify | Optional | |
merlin_magic | kleborate_min_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | kleborate_min_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | kleborate_min_spurious_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | kleborate_min_spurious_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | kleborate_skip_kaptive | Boolean | Internal component, do not modify | Optional | |
merlin_magic | kleborate_skip_resistance | Boolean | Internal component, do not modify | Optional | |
merlin_magic | legsta_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/legsta:0.5.1--hdfd78af_2 | Optional |
merlin_magic | lissero_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | lissero_min_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | lissero_min_percent_identity | Float | Internal component, do not modify | Optional | |
merlin_magic | meningotype_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | ngmaster_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | paired_end | Boolean | Set to true if your data is paired-end FASTQ files | TRUE | Optional |
merlin_magic | pasty_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | pasty_min_percent_coverage | Int | Internal component, do not modify | Optional | |
merlin_magic | pasty_min_percent_identity | Int | Internal component, do not modify | Optional | |
merlin_magic | pbptyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | pbptyper_min_percent_coverage | Int | Internal component, do not modify | Optional | |
merlin_magic | pbptyper_min_percent_identity | Int | Internal component, do not modify | Optional | |
merlin_magic | poppunk_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_clusters_csv | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_dists_npy | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_dists_pkl | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_external_clusters_csv | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_fit_npz | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_fit_pkl | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_graph_gt | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_h5 | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_qcreport_txt | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_refs | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_refs_dists_npy | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_refs_dists_pkl | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_refs_graph_gt | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_refs_h5 | File | Internal component, do not modify | Optional | |
merlin_magic | poppunk_gps_unword_clusters_csv | File | Internal component, do not modify | Optional | |
merlin_magic | read1 | File | Internal component, do not modify | Optional | |
merlin_magic | read2 | File | Internal component, do not modify | Optional | |
merlin_magic | run_amr_search | Boolean | If set to true, the AMR_Search workflow will be run when the species is one of the supported taxa; see the AMR_Search docs. | FALSE | Optional |
merlin_magic | seqsero2_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | seroba_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | serotypefinder_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | shigatyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | shigeifinder_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | sistr_cpu | Int | Internal component, do not modify | Optional | |
merlin_magic | sistr_disk_size | Int | Internal component, do not modify | Optional | |
merlin_magic | sistr_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | sistr_memory | Int | Internal component, do not modify | Optional | |
merlin_magic | sistr_use_full_cgmlst_db | Boolean | Internal component, do not modify | Optional | |
merlin_magic | snippy_base_quality | Int | Minimum quality for a nucleotide to be used in variant calling | 13 | Optional |
merlin_magic | snippy_gene_query_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21 | Optional |
merlin_magic | snippy_map_qual | Int | Minimum mapping quality to accept in variant calling | 60 | Optional |
merlin_magic | snippy_maxsoft | Int | Number of bases of alignment to soft-clip before discarding the alignment | 10 | Optional |
merlin_magic | snippy_min_coverage | Int | Minimum read coverage of a position to identify a mutation | 10 | Optional |
merlin_magic | snippy_min_frac | Float | Minimum fraction of bases at a given position to identify a mutation | 0 | Optional |
merlin_magic | snippy_min_quality | Int | Minimum VCF variant call "quality" | 100 | Optional |
merlin_magic | snippy_query_gene | String | Provide a gene to search for using Snippy | Default depends on detected organism | Optional |
merlin_magic | snippy_reference_afumigatus | File | Snippy reference for Aspergillus fumigatus | gs://theiagen-public-resources-rp/reference_data/eukaryotic/aspergillus/Aspergillus_fumigatus_GCF_000002655.1_ASM265v1_genomic.gbff | Optional |
merlin_magic | snippy_reference_cryptoneo | File | Snippy reference for Cryptococcus neoformans | gs://theiagen-public-resources-rp/reference_data/eukaryotic/cryptococcus/Cryptococcus_neoformans_GCF_000091045.1_ASM9104v1_genomic.gbff | Optional |
merlin_magic | snippy_variants_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/snippy:4.6.0 | Optional |
merlin_magic | sonneityping_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | sonneityping_mykrobe_opts | String | Internal component, do not modify | Optional | |
merlin_magic | spatyper_do_enrich | Boolean | Internal component, do not modify | Optional | |
merlin_magic | spatyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | srst2_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | srst2_gene_max_mismatch | Int | Internal component, do not modify | 2000 | Optional |
merlin_magic | srst2_max_divergence | Int | Internal component, do not modify | 20 | Optional |
merlin_magic | srst2_min_depth | Int | Internal component, do not modify | 5 | Optional |
merlin_magic | srst2_min_edge_depth | Int | Internal component, do not modify | 2 | Optional |
merlin_magic | srst2_min_percent_coverage | Int | Internal component, do not modify | 80 | Optional |
merlin_magic | staphopia_sccmec_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | stxtyper_cpu | Int | Internal component, do not modify | Optional | |
merlin_magic | stxtyper_disk_size | Int | Internal component, do not modify | Optional | |
merlin_magic | stxtyper_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | stxtyper_enable_debug | Boolean | Internal component, do not modify | Optional | |
merlin_magic | stxtyper_memory | Int | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_add_cs_lims | Boolean | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_config | File | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_coverage_regions_bed | File | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_debug | Boolean | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_etha237_frequency | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_expert_rule_regions_bed | File | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_min_depth | Int | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_min_frequency | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_min_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_min_read_support | Int | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_operator | String | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_output_seq_method_type | String | Internal component, do not modify | WGS | Optional |
merlin_magic | tbp_parser_rpob449_frequency | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_rrl_frequency | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_rrl_read_support | Int | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_rrs_frequency | Float | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_rrs_read_support | Int | Internal component, do not modify | Optional | |
merlin_magic | tbp_parser_tngs_data | Boolean | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_additional_parameters | String | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_custom_db | File | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_mapper | String | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_min_af | Float | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_min_depth | Int | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_run_cdph_db | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | tbprofiler_run_custom_db | Boolean | Internal component, do not modify | FALSE | Optional |
merlin_magic | tbprofiler_variant_caller | String | Internal component, do not modify | Optional | |
merlin_magic | tbprofiler_variant_calling_params | String | Internal component, do not modify | Optional | |
merlin_magic | vibecheck_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | vibecheck_lineage_barcodes | File | Internal component, do not modify | Optional | |
merlin_magic | vibecheck_skip_subsampling | Boolean | Internal component, do not modify | Optional | |
merlin_magic | vibecheck_subsampling_fraction | Float | Internal component, do not modify | Optional | |
merlin_magic | virulencefinder_database | String | Internal component, do not modify | Optional | |
merlin_magic | virulencefinder_docker_image | String | Internal component, do not modify | Optional | |
merlin_magic | virulencefinder_min_percent_coverage | Float | Internal component, do not modify | Optional | |
merlin_magic | virulencefinder_min_percent_identity | Float | Internal component, do not modify | Optional | |
nanoplot_clean | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
nanoplot_clean | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
nanoplot_clean | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 | Optional |
nanoplot_clean | max_length | Int | The maximum length of clean reads, for which reads longer than the length specified will be hidden. | 100000 | Optional |
nanoplot_clean | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
nanoplot_raw | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
nanoplot_raw | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
nanoplot_raw | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/nanoplot:1.40.0 | Optional |
nanoplot_raw | max_length | Int | The maximum length of clean reads, for which reads longer than the length specified will be hidden. | 100000 | Optional |
nanoplot_raw | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
quast | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
quast | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
quast | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/quast:5.0.2 | Optional |
quast | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
quast | min_contig_length | Int | Minimum length of contig for QUAST | 500 | Optional |
raw_check_reads | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
raw_check_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
raw_check_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/bactopia/gather_samples:2.0.2 | Optional |
raw_check_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
raw_check_reads | workflow_series | String | Internal component, do not modify | theiaviral | Optional |
read_QC_trim | artic_guppyplex_cpu | Int | Internal component, do not modify | Optional | |
read_QC_trim | artic_guppyplex_disk_size | Int | Internal component, do not modify | Optional | |
read_QC_trim | artic_guppyplex_docker | String | Internal component, do not modify | Optional | |
read_QC_trim | artic_guppyplex_memory | Int | Internal component, do not modify | Optional | |
read_QC_trim | call_kraken | Boolean | Internal component, do not modify | FALSE | Optional |
read_QC_trim | downsampling_coverage | Float | The desired coverage to sub-sample the reads to with RASUSA | 150 | Optional |
read_QC_trim | kraken2_recalculate_abundances_cpu | Int | Internal component, do not modify | Optional | |
read_QC_trim | kraken2_recalculate_abundances_disk_size | Int | Internal component, do not modify | Optional | |
read_QC_trim | kraken2_recalculate_abundances_docker | String | Internal component, do not modify | Optional | |
read_QC_trim | kraken2_recalculate_abundances_memory | Int | Internal component, do not modify | Optional | |
read_QC_trim | kraken_cpu | Int | Internal component, do not modify | Optional | |
read_QC_trim | kraken_db | File | Internal component, do not modify | Optional | |
read_QC_trim | kraken_disk_size | Int | Internal component, do not modify | Optional | |
read_QC_trim | kraken_docker_image | String | Internal component, do not modify | Optional | |
read_QC_trim | kraken_memory | Int | Internal component, do not modify | Optional | |
read_QC_trim | max_length | Int | Internal component, do not modify | Optional | |
read_QC_trim | min_length | Int | Internal component, do not modify | Optional | |
read_QC_trim | nanoq_cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
read_QC_trim | nanoq_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
read_QC_trim | nanoq_docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/nanoq:0.9.0--hec16e2b_1 | Optional |
read_QC_trim | nanoq_max_read_length | Int | The maximum read length to keep after trimming | 100000 | Optional |
read_QC_trim | nanoq_max_read_qual | Int | The maximum read quality to keep after trimming | 40 | Optional |
read_QC_trim | nanoq_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
read_QC_trim | nanoq_min_read_length | Int | The minimum read length to keep after trimming | 500 | Optional |
read_QC_trim | nanoq_min_read_qual | Int | The minimum read quality to keep after trimming | 10 | Optional |
read_QC_trim | ncbi_scrub_cpu | Int | Internal component, do not modify | 4 | Optional |
read_QC_trim | ncbi_scrub_disk_size | Int | Internal component, do not modify | 100 | Optional |
read_QC_trim | ncbi_scrub_docker | String | Internal component, do not modify | us-docker.pkg.dev/general-theiagen/ncbi/sra-human-scrubber:2.2.1 | Optional |
read_QC_trim | ncbi_scrub_memory | Int | Internal component, do not modify | 8 | Optional |
read_QC_trim | rasusa_bases | String | Explicitly set the number of bases required e.g., 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, --coverage and --genome-size are ignored | Optional | |
read_QC_trim | rasusa_cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
read_QC_trim | rasusa_disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
read_QC_trim | rasusa_docker | String | Internal component, do not modify | Optional | |
read_QC_trim | rasusa_fraction_of_reads | Float | Subsample to a fraction of the reads - e.g., 0.5 samples half the reads | Optional | |
read_QC_trim | rasusa_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
read_QC_trim | rasusa_number_of_reads | Int | Subsample to a specific number of reads | Optional | |
read_QC_trim | rasusa_seed | Int | Random seed to use | Optional | |
read_QC_trim | run_prefix | String | Internal component, do not modify | Optional | |
read_QC_trim | target_organism | String | Internal component, do not modify | Optional | |
theiaeuk_ont | busco_docker_image | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/ezlabgva/busco:v5.3.2_cv1 | Optional |
theiaeuk_ont | busco_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 24 | Optional |
theiaeuk_ont | gambit_db_genomes | File | User-provided database of assembled query genomes; requires complementary signatures file. If not provided, uses default database, "/gambit-db" | gs://gambit-databases-rp/fungal-version/1.0.0/gambit-fungal-metadata-1.0.0-20241213.gdb | Optional |
theiaeuk_ont | gambit_db_signatures | File | User-provided signatures file; requires complementary genomes file. If not specified, the file from the docker container will be used. | gs://gambit-databases-rp/fungal-version/1.0.0/gambit-fungal-signatures-1.0.0-20241213.gs | Optional |
theiaeuk_ont | genome_length | Int | User-specified expected genome length to be used in genome statistics calculations | 50000000 | Optional |
theiaeuk_ont | max_genome_length | Int | Maximum genome size able to pass read screening | 178000000 | Optional |
theiaeuk_ont | min_basepairs | Int | Minimum number of base pairs able to pass read screening | 45000000 | Optional |
theiaeuk_ont | min_coverage | Int | Minimum genome coverage able to pass read screening | 5 | Optional |
theiaeuk_ont | min_genome_length | Int | Minimum genome size able to pass read screening | 9000000 | Optional |
theiaeuk_ont | min_reads | Int | Minimum number of reads to pass read screening | 5000 | Optional |
theiaeuk_ont | skip_mash | Boolean | If true, skips estimation of genome size and coverage using mash in read screening steps. As a result, providing true also prevents screening using these parameters. | TRUE | Optional |
theiaeuk_ont | skip_screen | Boolean | Option to skip the read screening prior to analysis; if setting to true, please provide a value for the theiaeuk_pe genome_length optional input, OR set call_rasusa to false. Otherwise RASUSA will attempt to downsample to an expected genome size of 0 bp, and the workflow will fail. | FALSE | Optional |
theiaeuk_ont | workflow_series | String | Internal component, do not modify | theiaeuk | Optional |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional |
Workflow Tasks¶
All input reads are processed through "core tasks" in the TheiaEuk workflows. These undertake read trimming and assembly appropriate to the input data type (Illumina paired-end or ONT reads). The TheiaEuk workflows subsequently launch default genome characterization modules for quality assessment, and additional taxa-specific characterization steps. When setting up the workflow, users may choose to use "optional tasks" or alternatives to tasks run in the workflow by default.
Core tasks¶
These tasks are performed regardless of organism. Some run for every input data type, while others are specific to the input data type, such as the read trimming and assembly steps.
versioning
: Version Capture
The versioning
task captures the workflow version from the GitHub (code repository) version.
Version Capture Technical details
Links | |
---|---|
Task | task_versioning.wdl |
screen
: Total Raw Read Quantification and Genome Size Estimation
The screen
task ensures the quantity of sequence data is sufficient to undertake genomic analysis. It uses fastq-scan
and bash commands for quantification of reads and base pairs, and mash sketching to estimate the genome size and its coverage. At each step, the results are assessed relative to pass/fail criteria and thresholds that may be defined by optional user inputs. Samples are run through all threshold checks, regardless of failures, and the workflow will terminate after the screen
task if any thresholds are not met:
- Total number of reads: A sample will fail the read screening task if its total number of reads is less than or equal to min_reads.
- Proportion of basepairs in the forward and reverse read files: A sample will fail the read screening if fewer than min_proportion percent of the basepairs are in either the read1 or read2 files.
- Number of basepairs: A sample will fail the read screening if there are fewer than min_basepairs basepairs.
- Estimated genome size: A sample will fail the read screening if the estimated genome size is smaller than min_genome_length or bigger than max_genome_length.
- Estimated genome coverage: A sample will fail the read screening if the estimated genome coverage is less than min_coverage.
Read screening is undertaken on both the raw and cleaned reads. The task may be skipped by setting the skip_screen
variable to true.
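The threshold checks described above can be sketched in Python. This is a minimal illustration, not the code in task_screen.wdl: the function name `screen_reads`, the `ScreenThresholds` class, and the failure labels are hypothetical, and coverage is approximated here as total basepairs divided by the estimated genome length, which is an assumption rather than the mash-based estimate the task uses.

```python
from dataclasses import dataclass


@dataclass
class ScreenThresholds:
    # Defaults copied from the rationale table in this section.
    min_reads: int = 3000
    min_basepairs: int = 45_000_000
    min_genome_length: int = 9_000_000
    max_genome_length: int = 178_000_000
    min_coverage: float = 10.0
    min_proportion: float = 40.0  # percent; paired-end data only


def screen_reads(reads1, reads2, bp1, bp2, est_genome_length, thresholds=None):
    """Return the list of failed checks; an empty list means the sample passes."""
    t = thresholds or ScreenThresholds()
    failures = []
    total_reads = reads1 + reads2
    total_bp = bp1 + bp2

    # Every check runs, even if an earlier one has already failed.
    if total_reads <= t.min_reads:
        failures.append("reads")
    for label, bp in (("read1", bp1), ("read2", bp2)):
        if total_bp > 0 and 100.0 * bp / total_bp < t.min_proportion:
            failures.append(f"proportion_{label}")
    if total_bp < t.min_basepairs:
        failures.append("basepairs")
    if not (t.min_genome_length <= est_genome_length <= t.max_genome_length):
        failures.append("genome_length")
    # Assumption: coverage approximated as total basepairs / estimated genome length.
    coverage = total_bp / est_genome_length if est_genome_length > 0 else 0.0
    if coverage < t.min_coverage:
        failures.append("coverage")
    return failures


if __name__ == "__main__":
    # A well-covered 12 Mbp genome passes; a tiny dataset fails several checks.
    print(screen_reads(1_000_000, 1_000_000, 150_000_000, 150_000_000, 12_000_000))
    print(screen_reads(1000, 1000, 150_000, 150_000, 12_000_000))
```

Collecting all failures instead of stopping at the first one mirrors the statement that samples are run through every threshold check before the workflow terminates.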
Default values vary between the PE, SE, and ONT workflows. The rationale for these default values can be found below. If two default values are shown, the first is for Illumina workflows and the second is for ONT.
| Variable | Default Value | Rationale |
| --- | --- | --- |
| skip_screen | false | Set to true to skip the read screen. If you set this value to true, please provide a value for the theiaeuk_illumina_pe genome_length optional input, OR set the theiaeuk_illumina_pe call_rasusa optional input to false. Otherwise RASUSA will attempt to downsample to an expected genome size of 0 bp, and the workflow will fail. |
| min_reads | 3000 | Calculated from the minimum number of base pairs required for 20x coverage of the Hansenula polymorpha genome, the smallest fungal genome as of 2015-04-02 (8.97 Mbp), divided by 300 (the longest Illumina read length) |
| min_basepairs | 45000000 | Should be greater than 10x coverage of Hansenula polymorpha, the smallest fungal genome as of 2015-04-02 (8.97 Mbp) |
| min_genome_length | 9000000 | Based on the Hansenula polymorpha genome, the smallest fungal genome as of 2015-04-02 (8.97 Mbp) |
| max_genome_length | 178000000 | Based on the Cenococcum geophilum genome, the largest pathogenic fungal genome (177.57 Mbp), plus an additional 2 Mbp to cater for potential extra genomic material |
| min_coverage | 10 | A bare-minimum average per-base coverage across the genome required for genome characterization. Higher coverage would be required for high-quality phylogenetics. |
| min_proportion | 40 | Neither read1 nor read2 files should have less than 40% of the total number of reads. For paired-end data only. |
Screen Technical Details
There is a single WDL task for read screening. The screen
task is run twice, once for raw reads and once for clean reads.
Links | |
---|---|
Task | task_screen.wdl (PE sub-task) task_screen.wdl (SE sub-task) |
Rasusa
: Read subsampling (optional, on by default)
To deactivate this task, set call_rasusa
to false
.
Rasusa
is a tool to randomly subsample sequencing reads to a specified coverage without assuming that all reads are of equal length, making it especially suitable for long-read data while still being applicable to short-read data.
The Rasusa
task performs subsampling on the input raw reads. By default, this task will subsample TheiaProk_ONT reads to a depth of 150X using an estimated genome length of 5 million basepairs (0.7 Mb higher than the average bacterial genome length), and TheiaEuk_ONT reads using an estimated genome length of 50 million basepairs. The estimated genome length can be changed by the user by providing a different value for the genome_length
input parameter. The target subsampling depth can also be adjusted by modifying the subsample_coverage
variable.
For TheiaEuk_Illumina_PE, the estimated genome length is determined by the read_screen
task. Please note that the user can prevent the task from being launched by setting the call_rasusa
variable to false.
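As a rough worked example of the numbers above, the target amount of sequence is the requested depth multiplied by the estimated genome length. The helper names below are hypothetical and the snippet is not part of task_rasusa.wdl; it only illustrates the arithmetic, including why an expected genome length of 0 bp cannot produce a usable target.

```python
def target_bases(genome_length, coverage=150.0):
    """Bases needed to reach `coverage` for a genome of `genome_length` bp."""
    if genome_length <= 0:
        # Mirrors the failure described when the genome length is 0 bp.
        raise ValueError("genome_length must be positive to compute a target")
    return int(genome_length * coverage)


def expected_fraction_kept(total_bases, genome_length, coverage=150.0):
    """Approximate share of the input bases retained after subsampling."""
    if total_bases <= 0:
        return 0.0
    return min(1.0, target_bases(genome_length, coverage) / total_bases)


if __name__ == "__main__":
    print(target_bases(50_000_000))  # 50 Mbp genome at 150X
    print(target_bases(5_000_000))   # 5 Mbp genome at 150X
    print(expected_fraction_kept(15_000_000_000, 50_000_000))
```

When the input already contains fewer bases than the target, the expected fraction is capped at 1.0, meaning no reads would need to be discarded.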
Non-deterministic output(s)
This task may yield non-deterministic outputs since it performs random subsampling. To ensure reproducibility, set a value for the rasusa_seed
optional input variable.
Rasusa Technical Details
Links | |
---|---|
Task | task_rasusa.wdl |
Software Source Code | Rasusa on GitHub |
Software Documentation | Rasusa on GitHub |
Original Publication(s) | Rasusa: Randomly subsample sequencing reads to a specified coverage |
read_QC_trim
: Read Quality Trimming, Adapter Removal, Quantification, and Identification
read_QC_trim
is a sub-workflow that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below. The differences between the PE and SE versions of the read_QC_trim
sub-workflow lie in the default parameters, the use of two or one input read file(s), and the different output files.
Read quality trimming
read_processing
with "trimmomatic"
(default) or "fastp"
Either trimmomatic
or fastp
can be used for read-quality trimming. Trimmomatic is used by default.
To activate fastp
, set the read_processing
input parameter to "fastp"
.
These tasks are mutually exclusive.
Trimmomatic
: Read Trimming
Trimmomatic trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size
), cutting once the average quality within the window falls below the trim_quality_trim_score
(default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen
(default of 75 for paired-end, 25 for single-end).
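The sliding-window idea can be sketched as follows. This is a simplified illustration and not Trimmomatic's actual implementation, which handles window edges in more detail; the function names are hypothetical, and the defaults shown are the paired-end values quoted above.

```python
def sliding_window_cut(quals, window_size=4, min_avg_quality=20):
    """Return how many leading bases to keep.

    Scans windows from the 5' end and cuts at the start of the first window
    whose average quality falls below `min_avg_quality`.
    """
    n = len(quals)
    for start in range(0, n - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_avg_quality:
            return start
    # No failing window found (or read shorter than one window): keep it all.
    return n


def trim_read(sequence, quals, window_size=4, min_avg_quality=20, min_len=75):
    """Return the trimmed (sequence, quals), or None if the read is discarded."""
    keep = sliding_window_cut(quals, window_size, min_avg_quality)
    if keep < min_len:
        return None
    return sequence[:keep], quals[:keep]


if __name__ == "__main__":
    good = [30] * 80 + [10] * 20   # quality drops near the 3' end
    poor = [30] * 50 + [10] * 50   # quality drops halfway through
    print(sliding_window_cut(good), trim_read("A" * 100, good) is not None)
    print(sliding_window_cut(poor), trim_read("A" * 100, poor) is not None)
```

The first read keeps enough bases to pass the minimum length, while the second is trimmed below it and would be discarded.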
Trimmomatic
Technical Details
Links | |
---|---|
Task | task_trimmomatic.wdl |
Software Source Code | Trimmomatic on GitHub |
Software Documentation | Trimmomatic Website |
Original Publication(s) | Trimmomatic: a flexible trimmer for Illumina sequence data |
fastp
: Read Trimming
fastp
trims low-quality regions of Illumina paired-end or single-end reads with a sliding window (with a default window size of 4, specified with trim_window_size
), cutting once the average quality within the window falls below the trim_quality_trim_score
(default of 20 for paired-end, 30 for single-end). The read is discarded if it is trimmed below trim_minlen
(default of 75 for paired-end, 25 for single-end).
fastp
also has additional default parameters and features that are not a part of trimmomatic
's default configuration.
fastp
default read-trimming parameters
Parameter | Explanation |
---|---|
-g | enables polyG tail trimming |
-5 20 | enables read end-trimming |
-3 20 | enables read end-trimming |
--detect_adapter_for_pe | enables adapter-trimming only for paired-end reads |
Additional arguments can be passed using the fastp_args
optional parameter.
fastp Technical Details
Links | |
---|---|
Task | task_fastp.wdl |
Software Source Code | fastp on GitHub |
Software Documentation | fastp on GitHub |
Original Publication(s) | fastp: an ultra-fast all-in-one FASTQ preprocessor |
BBDuk
: Adapter Trimming and PhiX Removal
Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.
The bbduk
task removes adapters from sequence reads. To do this:
- Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files (it re-pairs).
- BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.
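The k-mer matching step can be illustrated with a toy filter that discards any read sharing at least one 31-mer with a reference sequence. This is only a sketch of the concept, assuming exact matches on either strand; it is not how BBDuk is implemented, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def build_kmer_set(reference, k=31):
    """Collect every k-mer of the reference on both strands."""
    kmers = set()
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        kmers.add(kmer)
        kmers.add(reverse_complement(kmer))
    return kmers


def has_kmer_match(read, kmers, k=31):
    """True if any k-mer of the read is present in the reference k-mer set."""
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))


def remove_matching_reads(reads, reference, k=31):
    """Keep only reads with no k-mer match to the reference (e.g. PhiX)."""
    kmers = build_kmer_set(reference, k)
    return [read for read in reads if not has_kmer_match(read, kmers, k)]


if __name__ == "__main__":
    # Toy stand-in for a control genome; a real run would load the PhiX sequence.
    reference = "ACGTTGCAGGCTAAGCTTACGGATCCGATTACAGGCATCGATCGGATTCAGCTAGCTAGGCTA"
    reads = [reference[5:45], "A" * 40]
    print(remove_matching_reads(reads, reference))
```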
BBDuk Technical Details
Links | |
---|---|
Task | task_bbduk.wdl |
Software Source Code | BBMap on SourceForge |
Software Documentation | BBDuk Guide (archived) |
Read Quantification
read_qc
with "fastq-scan"
(default) or "fastqc"
Either fastq-scan
or fastqc
can be used for read quantification. fastq-scan
is used by default.
To activate fastqc
, set the read_qc
input parameter to "fastqc"
.
These tasks are mutually exclusive.
fastq-scan
: Read Quantification
fastq-scan
quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
fastq-scan
Technical Details
Links | |
---|---|
Task | task_fastq_scan.wdl |
Software Source Code | fastq-scan on GitHub |
Software Documentation | fastq-scan on GitHub |
FastQC
: Read Quantification
FastQC
quantifies the forward and reverse reads in FASTQ files. For paired-end data, it also provides the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
This tool also provides a graphical visualization of the read quality.
FastQC
Technical Details
Links | |
---|---|
Task | task_fastqc.wdl |
Software Source Code | FastQC on GitHub |
Software Documentation | FastQC Website |
Kraken2
: Read Identification (optional)
To activate this task, set call_kraken
to true
and provide a value for kraken_db
.
Kraken2
is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.
Database-dependent
This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz
.
As an alternative to MIDAS
(see above), the Kraken2
task can also be turned on through setting the call_kraken
input variable as true
for the identification of reads to detect contamination with non-target taxa.
If this optional module is activated, a database must be provided through the kraken_db optional input. A list of suggested databases can be found in the Kraken2 standalone documentation.
Kraken2 Technical Details
Links | |
---|---|
Task | task_kraken2.wdl |
Software Source Code | Kraken2 on GitHub |
Software Documentation | Kraken2 Documentation |
Original Publication(s) | Improved metagenomic analysis with Kraken 2 |
read_QC_trim Technical Details
Links | |
---|---|
Subworkflow | wf_read_QC_trim_pe.wdl wf_read_QC_trim_se.wdl |
qc_check
: Check QC Metrics Against User-Defined Thresholds (optional)
To activate this task, provide a qc_check_table
as input.
The qc_check
task compares generated QC metrics against user-defined thresholds for each metric. This task will run if the user provides a qc_check_table
TSV file. If all QC metrics meet the threshold, the qc_check
output variable will read QC_PASS
. Otherwise, the output will read QC_NA
if the task could not proceed or QC_ALERT
followed by a string indicating what metric failed.
The qc_check
task applies quality thresholds according to the sample taxon. The sample taxon is taken from the gambit_predicted_taxon
value inferred by the GAMBIT module OR can be manually provided by the user using the expected_taxon
workflow input.
Formatting the qc_check_table.tsv
- The first column of the qc_check_table lists the organism that the task will assess, and the header of this column must be "taxon".
- Any genus or species can be included as a row of the qc_check_table. However, these taxa must uniquely match the sample taxa, meaning that the file can include multiple species from the same genus (Vibrio_cholerae and Vibrio_vulnificus), but not both a genus row and a species within that genus (Vibrio and Vibrio_cholerae). The taxa should be formatted with the first letter capitalized and underscores in lieu of spaces.
- Each subsequent column indicates a QC metric and lists a threshold for each organism that will be checked. The column names must exactly match expected values, so we highly recommend copying and pasting the header from the template file below as a starting place.
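The formatting rules above can be checked before uploading a table. The sketch below is a hypothetical helper, not part of the qc_check task; it only validates the "taxon" header and the taxon naming rules, and the metric column name `example_metric` is a placeholder rather than a real threshold column (use the template header for real tables).

```python
import csv
import io


def read_taxa(tsv_text):
    """Return the taxon column of a qc_check_table given as TSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    if not header or header[0] != "taxon":
        raise ValueError('the first column header must be "taxon"')
    return [row[0] for row in reader if row]


def find_taxon_problems(taxa):
    """List formatting problems: bad capitalization/spaces or genus/species clashes."""
    problems = []
    genera = {taxon for taxon in taxa if "_" not in taxon}
    for taxon in taxa:
        if " " in taxon or not taxon[:1].isupper():
            problems.append(f"{taxon}: capitalize the first letter and use underscores")
        genus = taxon.split("_")[0]
        if "_" in taxon and genus in genera:
            problems.append(f"{taxon}: genus {genus} is also listed as its own row")
    return problems


if __name__ == "__main__":
    # `example_metric` is a placeholder column name for illustration only.
    table = "taxon\texample_metric\nVibrio_cholerae\t1\nVibrio_vulnificus\t2\n"
    print(find_taxon_problems(read_taxa(table)))
    print(find_taxon_problems(["Vibrio", "Vibrio_cholerae"]))
```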
Template qc_check_table.tsv files
- TheiaEuk_Illumina_PE_PHB: theiaeuk_qc_check_template.tsv
Example Purposes Only
The QC threshold values shown in the file above are for example purposes only and should not be presumed to be sufficient for every dataset.
qc_check Technical Details
Links | |
---|---|
Task | task_qc_check_phb.wdl |
These tasks assemble the reads into a de novo assembly and assess the quality of the assembly.
digger_denovo
: De novo Assembly
De novo assembly is the process or product of attempting to reconstruct a genome from scratch (without prior knowledge of the genome) using sequence reads. Assembly of fungal genomes from short-reads will produce multiple contigs per chromosome rather than a single contiguous sequence for each chromosome.
In TheiaProk and TheiaEuk Illumina workflows, de novo assembly is performed for samples that have sufficient read quantity and quality using digger_denovo, a subworkflow based on the Shovill pipeline. The name "digger" is a nod to Shovill and SPAdes.
De novo Assembly
assembler
with skesa
(default), spades
, or megahit
To activate a particular assembler, set the assembler
input parameter to either skesa
(default), spades
, or megahit
.
These tasks are mutually exclusive.
SKESA
: De novo Assembly (default)
This task is activated by default.
SKESA
(Strategic K-mer Extension for Scrupulous Assemblies) is a de novo assembler that is fairly conservative and introduces breaks in the genome at repeat regions. This leads to higher sequence quality but more fragmented assemblies, which, depending on the final analysis goal, can be either highly preferred or detrimental. Designed for Illumina reads and haploid genomes, SKESA is the default assembler in the digger_denovo
subworkflow.
SKESA Technical Details
Links | |
---|---|
Task | task_skesa.wdl |
Software Source Code | SKESA on GitHub |
Software Documentation | SKESA on GitHub |
Original Publication(s) | SKESA: strategic k-mer extension for scrupulous assemblies |
SPAdes
: De novo Assembly (alternative)
To activate this task, set assembler
to spades
.
SPAdes
(St. Petersburg genome assembler) is a de novo assembly tool that uses de Bruijn graphs to assemble genomes from Illumina short reads.
In TheiaProk, SPAdes is run in --isolate
mode, which is the recommended flag for high-coverage isolate and multi-cell Illumina data, which is typical of most bacterial sequencing projects. This method is optimized for improving assembly quality and decreasing runtime.
Non-deterministic output(s)
This task may yield non-deterministic outputs.
SPAdes Technical Details
Links | |
---|---|
Task | task_spades.wdl |
Software Source Code | SPAdes on GitHub |
Software Documentation | SPAdes Manual |
Original Publication(s) | TheiaProk: SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing TheiaViral: MetaviralSPAdes: assembly of viruses from metagenomic data |
MEGAHIT
: De novo Assembly (alternative)
To activate this task, set assembler
to megahit
.
The MEGAHIT assembler is a fast and memory-efficient de novo assembler that can handle large datasets. While optimized for metagenomics, MEGAHIT also performs well on single-genome assemblies, making it a versatile choice for various assembly tasks.
MEGAHIT uses a multiple k-mer strategy that can be beneficial for assembling genomes with varying coverage levels, which is common in metagenomic samples. It constructs succinct de Bruijn graphs to efficiently represent the assembly process, allowing it to handle large and complex datasets with reduced memory usage.
Non-deterministic output(s)
This task may yield non-deterministic outputs.
MEGAHIT Technical Details
Links | |
---|---|
Task | task_megahit.wdl |
Software Source Code | MEGAHIT on GitHub |
Software Documentation | MEGAHIT on GitHub |
Original Publication(s) | MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph |
Assembly Polishing (optional)
To activate assembly polishing, set call_pilon
to true
.
bwa
: Read Alignment to the Assembly
BWA (Burrows-Wheeler Aligner) is used to align the cleaned read files to the generated assembly file. The resulting BAM file is passed directly to the Pilon task to polish the assembly for errors.
BWA Technical Details
Links | |
---|---|
Task | task_bwa.wdl |
Software Source Code | BWA on GitHub |
Software Documentation | BWA Documentation |
Original Publication(s) | Fast and accurate short read alignment with Burrows-Wheeler transform |
Pilon
: Assembly Polishing
Pilon
is a tool that uses read alignments to correct errors in an assembly.
The bwa
-generated alignment of the read data to the assembly is used to identify inconsistencies between the reads and the assembly in order to correct them. Pilon
will attempt to fix individual base errors and small indels using the read data. This can improve the overall quality of the assembly, especially when the assembler has made mistakes due to sequencing errors or low coverage regions.
The default parameters were set to mimic the parameters used by Shovill: --fix bases --minmq 60 --minqual 3 --mindepth 0.25
. These can be modified by the user.
Pilon Technical Details
Links | |
---|---|
Task | task_pilon.wdl |
Software Source Code | Pilon on GitHub |
Software Documentation | Pilon Wiki |
Original Publication(s) | Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement |
Contig Filtering (optional)
Filter Contigs
: Contig Quality Control
To deactivate contig filtering, set run_filter_contigs
to false
.
This task filters the created contigs based on a default minimum length threshold of 200 bp and a minimum coverage of 2.0. It also eliminates homopolymer contigs (contigs of any length that consist of a single nucleotide).
Options are available to skip any of these filters by setting the respective parameters to false
: filter_contigs_skip_length_filter
, filter_contigs_skip_coverage_filter
, and filter_contigs_skip_homopolymer_filter
. The minimum length and coverage thresholds can be adjusted using the filter_contigs_min_length
and filter_contigs_min_coverage
parameters, respectively.
This ensures high-quality assemblies by retaining only contigs that meet specified criteria. Detailed metrics on contig counts and sequence lengths before and after filtering are provided in the output.
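The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in task_filter_contigs.wdl: the `Contig` class, the `apply_*` switches, and the assumption that per-contig coverage is already known (for example, parsed from the assembler's FASTA headers) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # assumed to be known per contig (hypothetical field)


def is_homopolymer(sequence: str) -> bool:
    """True when the contig consists of a single repeated nucleotide."""
    return len(set(sequence.upper())) == 1


def filter_contigs(contigs, min_length=200, min_coverage=2.0,
                   apply_length_filter=True, apply_coverage_filter=True,
                   apply_homopolymer_filter=True):
    """Return (kept, removed) lists after applying the enabled filters."""
    kept, removed = [], []
    for contig in contigs:
        too_short = apply_length_filter and len(contig.sequence) < min_length
        too_shallow = apply_coverage_filter and contig.coverage < min_coverage
        homopolymer = apply_homopolymer_filter and is_homopolymer(contig.sequence)
        if too_short or too_shallow or homopolymer:
            removed.append(contig)
        else:
            kept.append(contig)
    return kept, removed
```

Returning both lists makes it easy to report contig counts and total sequence length before and after filtering, similar to the metrics the task writes to its output.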
Filter Contigs Technical Details
Links | |
---|---|
WDL Task | task_filter_contigs.wdl |
Digger-Denovo Technical Details
Links | |
---|---|
Subworkflow | wf_digger_denovo.wdl |
quast
: Assembly Quality Assessment
QUAST stands for QUality ASsessment Tool. It evaluates genome/metagenome assemblies by computing various metrics without a reference being necessary. It includes useful metrics such as number of contigs, length of the largest contig and N50.
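As an illustration of the metrics named above, the following sketch computes the number of contigs, the largest contig, and the N50 from a list of contig lengths. It is a simplified stand-in for explanation only; QUAST reports many more metrics and may apply its own minimum contig length before computing them.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def summarize(contig_lengths):
    """Collect a few basic assembly metrics into a dictionary."""
    return {
        "number_of_contigs": len(contig_lengths),
        "largest_contig": max(contig_lengths),
        "total_length": sum(contig_lengths),
        "n50": n50(contig_lengths),
    }
```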
QUAST Technical Details
Links | |
---|---|
Task | task_quast.wdl |
Software Source Code | QUAST on GitHub |
Software Documentation | QUAST Manual on SourceForge |
Original Publication(s) | QUAST: quality assessment tool for genome assemblies |
CG-Pipeline
: Assessment of Read Quality, and Estimation of Genome Coverage
The cg_pipeline
task generates metrics about read quality and estimates the coverage of the genome using the run_assembly_readMetrics.pl
script from CG-Pipeline. Genome coverage is estimated for both raw and cleaned reads, using either a user-provided genome_size
or the estimated genome length generated by QUAST.
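The coverage estimate itself is simple arithmetic: total sequenced bases divided by genome length. The sketch below shows the calculation; the function names are hypothetical, and the precedence of a user-provided size over the QUAST estimate is an assumption made for illustration.

```python
def pick_genome_length(user_genome_size=None, quast_genome_length=None):
    """Prefer a user-provided size, otherwise fall back to the QUAST estimate.
    (The order of precedence is an assumption of this sketch.)"""
    if user_genome_size is not None:
        return user_genome_size
    if quast_genome_length is not None:
        return quast_genome_length
    raise ValueError("no genome length available")


def estimate_coverage(read_lengths, genome_length):
    """Estimated depth = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length
```

For example, 1,000 reads of 150 bp against a 15,000 bp genome give an estimated coverage of 10X.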
CG-Pipeline Technical Details
The cg_pipeline
task is run twice in this workflow, once with raw reads, and once with clean reads.
Links | |
---|---|
Task | task_cg_pipeline.wdl |
Software Source Code | CG-Pipeline on GitHub |
Software Documentation | CG-Pipeline on GitHub |
Original Publication(s) | A computational genomics pipeline for prokaryotic sequencing projects |
read_QC_trim_ont
: Read Quality Trimming, Quantification, and Identification
read_QC_trim_ont
is a sub-workflow that filters low-quality reads and trims low-quality regions of reads. It uses several tasks, described below.
A note on estimated genome length
By default, the estimated genome length is set to 5 Mb, which is around 0.7 Mb higher than the average bacterial genome length, according to the information of thousands of NCBI bacterial assemblies collated here. This estimate can be overwritten by the user and is used by Rasusa
.
Rasusa
: Read Subsampling
Rasusa
is a tool to randomly subsample sequencing reads to a specified coverage without assuming that all reads are of equal length, making it especially suitable for long-read data while still being applicable to short-read data.
The Rasusa
task performs subsampling on the input raw reads. By default, this task will subsample TheiaProk_ONT reads to a depth of 150X using an estimated genome length of 5 million basepairs (0.7 Mb higher than the average bacterial genome length), and TheiaEuk_ONT reads using an estimated genome length of 50 million basepairs. The estimated genome length can be changed by the user by providing a different value for the genome_length
input parameter. The target subsampling depth can also be adjusted by modifying the subsample_coverage
variable.
Non-deterministic output(s)
This task may yield non-deterministic outputs since it performs random subsampling. To ensure reproducibility, set a value for the rasusa_seed
optional input variable.
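To show why a seed makes subsampling reproducible, here is a minimal sketch of coverage-based subsampling. It is an illustration under simplifying assumptions (reads are represented only by their lengths, and the function name is hypothetical), not Rasusa's implementation.

```python
import random


def subsample_reads(read_lengths, genome_length, coverage, seed=None):
    """Return the sorted indices of the reads that are kept.

    Reads are visited in a shuffled order and kept until the total number
    of bases reaches coverage * genome_length. Reads may differ in length.
    """
    target_bases = coverage * genome_length
    indices = list(range(len(read_lengths)))
    random.Random(seed).shuffle(indices)  # same seed -> same order
    kept, total = [], 0
    for i in indices:
        if total >= target_bases:
            break
        kept.append(i)
        total += read_lengths[i]
    return sorted(kept)
```

Calling the function twice with the same seed returns the same reads; leaving the seed unset can return a different subset on each run.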
Rasusa Technical Details
Links | |
---|---|
Task | task_rasusa.wdl |
Software Source Code | Rasusa on GitHub |
Software Documentation | Rasusa on GitHub |
Original Publication(s) | Rasusa: Randomly subsample sequencing reads to a specified coverage |
Nanoq
: Read Filtering
Reads are filtered by length and quality using nanoq
. By default, reads shorter than 500 bp or with a quality score below 10 are filtered out to improve assembly accuracy. These defaults can be modified by the user.
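The sketch below illustrates a length and quality filter of this kind. It assumes the read quality is the mean Phred score computed in error-probability space, which is a common convention for nanopore reads; whether nanoq calculates it in exactly this way is not stated here, and the function names are hypothetical.

```python
import math


def mean_read_quality(phred_scores):
    """Average Phred quality, computed by averaging error probabilities."""
    if not phred_scores:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


def passes_filter(sequence, phred_scores, min_length=500, min_quality=10):
    """Keep a read only if it meets both the length and quality thresholds."""
    long_enough = len(sequence) >= min_length
    good_enough = mean_read_quality(phred_scores) >= min_quality
    return long_enough and good_enough
```

Averaging in error-probability space means a few very poor bases lower the read's score more than a plain arithmetic mean of Phred values would.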
Nanoq Technical Details
Links | |
---|---|
Task | task_nanoq.wdl |
Software Source Code | Nanoq on GitHub |
Software Documentation | Nanoq Documentation |
Original Publication(s) | Nanoq: ultra-fast quality control for nanopore reads |
Kraken2
: Read Identification (optional)
To activate this task, set call_kraken
to true
and provide a value for kraken_db
.
Kraken2
is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.
Kraken2 is run on the raw read data.
Database-dependent
This workflow automatically uses a viral-specific Kraken2 database. This database was generated in-house from RefSeq's viral sequence collection and human genome GRCh38. It's available at gs://theiagen-public-resources-rp/reference_data/databases/kraken2/kraken2_humanGRCh38_viralRefSeq_20240828.tar.gz
.
A database must be provided through the kraken_db optional input if this optional module is activated. A list of suggested databases can be found in the Kraken2 standalone documentation.
Kraken2 Technical Details
Links | |
---|---|
Task | task_kraken2.wdl |
Software Source Code | Kraken2 on GitHub |
Software Documentation | Kraken2 Documentation |
Original Publication(s) | Improved metagenomic analysis with Kraken 2 |
NanoPlot
: Read Quantification
NanoPlot is used for the determination of mean quality scores, read lengths, and number of reads. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads.
While this task currently is run outside of the read_QC_trim_ont
workflow, it is being included here as it calculates statistics on the read data. This is done so that the actual assembly genome lengths can be used (if an estimated genome length is not provided by the user) to ensure the estimated coverage statistics are accurate.
NanoPlot Technical Details
Links | |
---|---|
Task | task_nanoplot.wdl |
Software Source Code | NanoPlot on GitHub |
Software Documentation | NanoPlot Documentation |
Original Publication(s) | NanoPack2: population-scale evaluation of long-read sequencing data |
read_QC_trim_ont Technical Details
Links | |
---|---|
Subworkflow | wf_read_QC_trim_ont.wdl |
These tasks assemble the reads into a de novo assembly and assess the quality of the assembly.
Flye
: De novo Assembly
flye_denovo
is a sub-workflow that performs de novo assembly using Flye for ONT data and supports additional polishing and visualization steps.
Ensure correct medaka model is selected if performing medaka polishing
In order to obtain the best results, the appropriate model must be set to match the sequencer's basecaller model; this string takes the format of {pore}_{device}_{caller variant}_{caller_version}. See also https://github.com/nanoporetech/medaka?tab=readme-ov-file#models. If flye
is being run on legacy data the medaka model will likely be r941_min_hac_g507
. Recently generated data will likely be suited by the default model of r1041_e82_400bps_sup_v5.0.0
.
The detailed steps and tasks are as follows:
Porechop
: Read Trimming (optional; off by default)
Read trimming is optional and can be enabled by setting the run_porchop
input variable to true.
Porechop is a tool for finding and removing adapters from ONT data. Adapters on the ends of reads are trimmed, and when a read has an adapter in the middle, the read is split into two.
Porechop Technical Details
Links | |
---|---|
WDL Task | task_porechop.wdl |
Software Source Code | Porechop on GitHub |
Software Documentation | https://github.com/rrwick/Porechop#porechop |
Flye
: De novo Assembly
Flye is a de novo assembler for long-read data that uses repeat graphs. Compared to de Bruijn graphs, which require exact k-mer matches, repeat graphs can use approximate matches, which better tolerate the error rate of ONT data.
flye_read_type
input parameter
This input parameter specifies the type of sequencing reads being used for assembly. This parameter significantly impacts the assembly process and should match the characteristics of your input data. Below are the available options:
Parameter | Explanation |
---|---|
--nano-hq (default) |
Optimized for ONT high-quality reads, such as Guppy5+ SUP or Q20 (<5% error). Recommended for ONT reads processed with Guppy5 or newer |
--nano-raw |
For ONT regular reads, pre-Guppy5 (<20% error) |
--nano-corr |
ONT reads corrected with other methods (<3% error) |
--pacbio-raw |
PacBio regular CLR reads (<20% error) |
--pacbio-corr |
PacBio reads corrected with other methods (<3% error) |
--pacbio-hifi |
PacBio HiFi reads (<1% error) |
Refer to the Flye documentation for detailed guidance on selecting the appropriate flye_read_type
based on your sequencing data and additional optional parameters.
Non-deterministic output(s)
This task may yield non-deterministic outputs.
Flye Technical Details
Links | |
---|---|
WDL Task | task_flye.wdl |
Software Source Code | Flye on GitHub |
Software Documentation | Flye Documentation |
Original Publication(s) | Assembly of long, error-prone reads using repeat graphs |
Bandage
: Graph Visualization
Bandage creates de novo assembly graphs containing the assembled contigs and the connections between those contigs. These graphs are useful for visualizing the assembly structure, identifying potential misassemblies, and understanding the relationships between contigs.
Bandage Technical Details
Links | |
---|---|
WDL Task | task_bandage_plot.wdl |
Software Source Code | Bandage on GitHub |
Software Documentation | Bandage Documentation |
Original Publication(s) | Bandage: interactive visualization of de novo genome assemblies |
Polypolish
: Hybrid Assembly Polishing for ONT and Illumina data
If short reads are provided with the optional illumina_read1
and illumina_read2
inputs, Polypolish will use those short reads to correct errors in the long-read assembly. Uniquely, Polypolish uses short-read alignments in which each read is aligned to all possible locations, so even repeat regions receive error correction.
Polypolish Technical Details
Links | |
---|---|
Task | task_polypolish.wdl |
Software Source Code | Polypolish on GitHub |
Software Documentation | Polypolish Documentation |
Original Publication(s) | Polypolish: short-read polishing of long-read bacterial genome assemblies How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies |
Medaka
: Polishing of Flye assembly (default; optional)
Polishing is optional and can be skipped by setting the skip_polishing
variable to true. If polishing is skipped, then neither Medaka nor Racon will run.
Medaka is the default assembly polisher used in TheiaProk. Racon may be used alternatively, and if so, Medaka will not run. Medaka uses the raw reads to polish the assembly and generate a consensus sequence.
Importantly, Medaka requires knowing the model that was used to generate the read data. There are several ways to provide this information:
- Automatic Model Selection: Automatically determines the most appropriate Medaka model based on the input data, ensuring optimal polishing results without manual intervention.
- User-Specified Model Override: Allows users to specify a particular Medaka model if automatic selection does not yield the desired outcome or for specialized use cases.
- Default Model: If automatic model selection fails and no user-specified model is provided, Medaka defaults to the predefined fallback model r1041_e82_400bps_sup_v5.0.0.
Medaka Model Resolution Process
Medaka's automatic model selection uses the medaka tools resolve_model
command to identify the appropriate model for polishing. This process relies on metadata embedded in the input file, which is typically generated by the basecaller. If the automatic selection fails to identify a suitable model, Medaka gracefully falls back to the default model to maintain workflow continuity. Users should verify the chosen model and consider specifying a model override if necessary.
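The fallback behaviour can be summarized with a short sketch. The function and argument names are hypothetical, and the assumption that a user-specified model takes precedence over the automatically detected one is ours; the task's actual logic may differ.

```python
DEFAULT_MEDAKA_MODEL = "r1041_e82_400bps_sup_v5.0.0"


def choose_medaka_model(user_model=None, auto_detected_model=None,
                        default_model=DEFAULT_MEDAKA_MODEL):
    """Return (model, source) following a user > auto > default order.

    `auto_detected_model` stands in for whatever automatic resolution
    returned; pass None (or an empty string) when resolution failed.
    """
    if user_model:
        return user_model, "user"
    if auto_detected_model:
        return auto_detected_model, "auto"
    return default_model, "default"
```

Returning the source alongside the model name makes it straightforward to confirm afterwards whether the fallback model was used.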
Medaka Technical Details
Links | |
---|---|
WDL Task | task_medaka.wdl |
Software Source Code | Medaka on GitHub |
Software Documentation | Medaka Documentation |
Racon
: Polishing of Flye assembly (alternative; optional)
Polishing is optional and can be skipped by setting the skip_polishing
variable to true. If polishing is skipped, then neither Medaka nor Racon will run.
Racon
is an alternative to using medaka
for assembly polishing, and can be run by setting the polisher
input to "racon". Racon is a consensus algorithm designed for refining raw de novo DNA assemblies generated from long, uncorrected sequencing reads.
Racon Technical Details
Links | |
---|---|
WDL Task | task_racon.wdl |
Software Source Code | Racon on GitHub |
Software Documentation | Racon Documentation |
Original Publication(s) | Fast and accurate de novo genome assembly from long uncorrected reads |
Filter Contigs
: Filter contigs below a threshold length and remove homopolymer contigs
This task filters the created contigs based on a user-defined minimum length threshold (default of 1000) and eliminates homopolymer contigs (contigs of any length that consist of a single nucleotide).
This ensures high-quality assemblies by retaining only contigs that meet specified criteria. Detailed metrics on contig counts and sequence lengths before and after filtering are provided in the output.
Filter Contigs Technical Details
Links | |
---|---|
WDL Task | task_filter_contigs.wdl |
Dnaapler
: Final Assembly Orientation
Dnaapler reorients contigs to start at specific reference points. Dnaapler supports the following modes, which can be indicated by filling the dnaapler_mode
input variable with the desired mode. The default is all
, which reorients contigs to start with dnaA
, terL
, repA
, or COG1474
.
- all: Reorients contigs to start with dnaA, terL, repA, or COG1474 (Default)
- chromosome: Reorients to begin with the dnaA chromosomal replication initiator gene, commonly used for bacterial chromosome assemblies
- plasmid: Reorients to start with the repA plasmid replication initiation gene, ideal for plasmid assemblies
- phage: Reorients to start with the terL large terminase subunit gene, used for bacteriophage assemblies
- archaea: Reorients to start with the COG1474 archaeal Orc1/cdc6 gene, relevant for archaeal assemblies
- custom: Reorients based on a user-specified gene in amino acid FASTA format for experimental or unique workflows
- mystery: Reorients to start with a random CDS for exploratory purposes
- largest: Reorients to start with the largest CDS in the assembly, often useful for poorly annotated genomes
- nearest: Reorients to start with the first CDS nearest to the sequence start, resolving CDS breakpoints
- bulk: Processes multiple contigs to start with the desired start gene (dnaA, terL, repA, or custom)
Dnaapler Technical Details
Links | |
---|---|
WDL Task | task_dnaapler.wdl |
Software Source Code | Dnaapler on GitHub |
Software Documentation | Dnaapler Documentation |
Original Publication(s) | Dnaapler: a tool to reorient circular microbial genomes |
Flye-Denovo Technical Details
Links | |
---|---|
Subworkflow | wf_flye_denovo.wdl |
Organism-agnostic characterization¶
These tasks are performed regardless of the organism and provide quality control and taxonomic assignment.
GAMBIT
: Taxon Assignment
GAMBIT
determines the taxon of the genome assembly using a k-mer based approach to match the assembly sequence to the closest complete genome in a database, thereby predicting its identity. Sometimes, GAMBIT can confidently designate the organism to the species level. Other times, it is more conservative and assigns it to a higher taxonomic rank.
For additional details regarding the GAMBIT tool and a list of available GAMBIT databases for analysis, please consult the GAMBIT tool documentation.
GAMBIT Technical Details
Links | |
---|---|
Task | task_gambit.wdl |
Software Source Code | GAMBIT on GitHub |
Software Documentation | GAMBIT ReadTheDocs |
Original Publication(s) | GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification |
BUSCO
: Assembly Quality Assessment
BUSCO (Benchmarking Universal Single-Copy Orthologs) attempts to quantify the completeness and contamination of an assembly to generate quality assessment metrics. It uses taxa-specific databases containing genes that are all expected to occur in the given taxa, each in a single copy. BUSCO examines the presence or absence of these genes, whether they are fragmented, and whether they are duplicated (which suggests that additional copies came from contaminants).
BUSCO notation
Here is an example of BUSCO notation: C:99.1%[S:98.9%,D:0.2%],F:0.0%,M:0.9%,n:440
. There are several abbreviations used in this output:
- Complete (C) - genes are considered "complete" when their lengths are within two standard deviations of the BUSCO group mean length.
- Single-copy (S) - genes that are complete and have only one copy.
- Duplicated (D) - genes that are complete and have more than one copy.
- Fragmented (F) - genes that are only partially recovered.
- Missing (M) - genes that were not recovered at all.
- Number of genes examined (n) - the number of genes examined.
A high-quality assembly will use the appropriate database for the taxon and have high complete (C) and single-copy (S) percentages, with low duplicated (D), fragmented (F), and missing (M) percentages.
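When working with many samples, it can help to split the BUSCO notation into numeric fields. The following sketch parses a string in the format shown above; the function name and dictionary keys are hypothetical, and other BUSCO versions may emit additional fields that this pattern does not capture.

```python
import re

BUSCO_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single_copy>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Convert a BUSCO summary string into a dictionary of numbers."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not in the expected BUSCO notation: {summary!r}")
    fields = match.groupdict()
    result = {key: float(value) for key, value in fields.items() if key != "n"}
    result["n"] = int(fields["n"])  # number of genes examined
    return result
```

For the example string above, the parsed result has `complete` equal to 99.1, `duplicated` equal to 0.2, and `n` equal to 440.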
BUSCO Technical Details
Links | |
---|---|
Task | task_busco.wdl |
Software Source Code | BUSCO on GitLab |
Software Documentation | https://busco.ezlab.org/ |
Original Publication(s) | BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs |
Organism-specific characterization¶
The TheiaEuk workflow automatically activates taxa-specific tasks after identification of the relevant taxa using GAMBIT
. Many of these taxa-specific tasks do not require any additional inputs from the user.
Candidozyma auris (also known as Candida auris)
Three tools can be deployed when Candidozyma auris/Candida auris is identified.
Cladetyping: clade determination
A custom GAMBIT database is created using six clade-specific Candidozyma auris reference genomes. Sequences undergo genomic signature comparison against this database, which then enables assignment to one of the six Candidozyma auris clades (Clade I to Clade VI) based on sequence similarity and phylogenetic relationships. This integrated approach ensures precise clade assignments, crucial for understanding the genetic diversity and epidemiology of Candidozyma auris.
See more information on the reference information for the six clades below:
Clade | Genome Accession | Assembly Name | Strain | BioSample Accession |
---|---|---|---|---|
Clade I | GCA_002759435.3 | Cand_auris_B8441_V3 | B8441 | SAMN05379624 |
Clade II | GCA_003013715.2 | ASM301371v2 | B11220 | SAMN05379608 |
Clade III | GCA_002775015.1 | Cand_auris_B11221_V1 | B11221 | SAMN05379609 |
Clade IV | GCA_003014415.1 | Cand_auris_B11243 | B11243 | SAMN05379619 |
Clade V | GCA_016809505.1 | ASM1680950v1 | IFRC2087 | SAMN11570381 |
Clade VI | GCA_032714025.1 | ASM3271402v1 | F1580 | SAMN36753179 |
Clade VI annotation
Clade VI does not have an available reference genome annotation at the time of adding the reference genome into Cladetyping. While Clade VI assignment is functional, downstream variant calling is not currently possible without an annotation. Users may provide a close relative annotation, such as Clade IV, though it is unknown if Clade VI variants can reliably be called with respect to such a reference.
Cauris_Cladetyper Technical Details
amr_search
: Antimicrobial Resistance Profiling (optional)
To activate this task, set run_amr_search
to true
.
This task performs in silico antimicrobial resistance (AMR) profiling for supported species using AMRsearch, the primary tool used by Pathogenwatch to genotype and infer antimicrobial resistance (AMR) phenotypes from assembled microbial genomes.
AMRsearch screens against Pathogenwatch's library of curated genotypes and inferred phenotypes, developed in collaboration with community experts. Resistance phenotypes are determined based on both resistance genes and mutations, and the system accounts for interactions between multiple SNPs, genes, and suppressors. Predictions follow S/I/R classification (Sensitive, Intermediate, Resistant).
Currently, only a subset of species are supported by this task.
Supported Species
The following table shows the species name and the associated NCBI code. If you are running AMR Search as part of TheiaProk or TheiaEuk, these codes will be automatically determined based on the GAMBIT predicted taxon, or the user-provided expected_taxon
input.
Species | NCBI Code |
---|---|
Neisseria gonorrhoeae | 485 |
Staphylococcus aureus | 1280 |
Salmonella Typhi | 90370 |
Streptococcus pneumoniae | 1313 |
Klebsiella | 570 |
Escherichia | 561 |
Mycobacterium tuberculosis | 1773 |
Candida auris | 498019 |
Vibrio cholerae | 666 |
Campylobacter | 194 |
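A lookup from a predicted taxon to the codes in the table could look like the sketch below. The matching rule (an exact name or a leading genus/species prefix, longest match wins) is an assumption made for illustration; the workflow's own mapping logic is not described here and may differ.

```python
# Species/genus names and NCBI codes copied from the table above.
AMR_SEARCH_CODES = {
    "Neisseria gonorrhoeae": 485,
    "Staphylococcus aureus": 1280,
    "Salmonella Typhi": 90370,
    "Streptococcus pneumoniae": 1313,
    "Klebsiella": 570,
    "Escherichia": 561,
    "Mycobacterium tuberculosis": 1773,
    "Candida auris": 498019,
    "Vibrio cholerae": 666,
    "Campylobacter": 194,
}


def amr_search_code(taxon):
    """Return the NCBI code for a taxon, or None if it is not supported."""
    normalized = " ".join(taxon.split()).lower()
    best = None
    for name, code in AMR_SEARCH_CODES.items():
        key = name.lower()
        if normalized == key or normalized.startswith(key + " "):
            if best is None or len(key) > len(best[0]):
                best = (key, code)
    return None if best is None else best[1]
```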
Outputs:
- JSON Output: Contains the complete AMR profile, including detailed resistance state, detected resistance genes/mutations, and supporting BLAST results.
- CSV & PDF Tables: An incorporated Python script,
parse_amr_json.py
, extracts and formats results into a CSV file and PDF summary table for easier visualization.
amr_search Technical Details
Links | |
---|---|
Task | task_amr_search.wdl |
Software Source Code | AMRsearch on GitHub |
Software Documentation | AMRsearch on GitHub |
Original Publication(s) | PAARSNP: rapid genotypic resistance prediction for Neisseria gonorrhoeae |
Snippy Variants: antifungal resistance detection
To detect mutations that may confer antifungal resistance, Snippy
is used to find all variants relative to the clade-specific reference, then these variants are queried for product names associated with resistance. It's important to note that unlike amr_search
, this task reports all variants found in the searched targets.
The genes in which there are known resistance-conferring mutations for this pathogen are:
- FKS1
- ERG11 (lanosterol 14-alpha demethylase)
- FUR1 (uracil phosphoribosyltransferase)
We query Snippy
results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overwritten by the user). Mutations are reported next to the search term in the theiaeuk_snippy_variants_hits
column; the gene name corresponding to each search term is listed below:
TheiaEuk Search Term | Corresponding Gene Name |
---|---|
B9J08_005340 | ERG6 |
B9J08_000401 | FLO8 |
B9J08_005343 | Hypothetical protein (PSK74852) |
B9J08_003102 | MEC3 |
B9J08_003737 | ERG3 |
lanosterol.14-alpha.demethylase | ERG11 |
uracil.phosphoribosyltransferase | FUR1 |
FKS1 | FKS1 |
For example, one sample may have the following output for the theiaeuk_snippy_variants_hits
column:
lanosterol.14-alpha.demethylase: lanosterol 14-alpha demethylase (missense_variant c.428A>G p.Lys143Arg; C:266 T:0),B9J08_000401: hypothetical protein (stop_gained c.424C>T p.Gln142*; A:70 G:0)
Based on this, we can tell that ERG11 has a missense variant at position 143 (Lysine to Arginine) and B9J08_000401 (which is FLO8) has a stop-gained variant at position 142 (Glutamine to Stop).
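The same interpretation can be automated. The sketch below parses a hits string laid out exactly like the example above and attaches the gene name from the search-term table; the regular expression, function name, and field names are hypothetical, and entries formatted differently from the example are not handled.

```python
import re

# Search terms and gene names copied from the table above.
SEARCH_TERM_TO_GENE = {
    "B9J08_005340": "ERG6",
    "B9J08_000401": "FLO8",
    "B9J08_005343": "Hypothetical protein (PSK74852)",
    "B9J08_003102": "MEC3",
    "B9J08_003737": "ERG3",
    "lanosterol.14-alpha.demethylase": "ERG11",
    "uracil.phosphoribosyltransferase": "FUR1",
    "FKS1": "FKS1",
}

# term: product (effect c.change p.change; allele evidence)
HIT_PATTERN = re.compile(
    r"(?P<search_term>[^:,]+): (?P<product>.+?) "
    r"\((?P<effect>\S+) (?P<nucleotide_change>c\.\S+) "
    r"(?P<protein_change>p\.[^;\s]+); (?P<evidence>[^)]*)\)"
)

EXAMPLE_HITS = (
    "lanosterol.14-alpha.demethylase: lanosterol 14-alpha demethylase "
    "(missense_variant c.428A>G p.Lys143Arg; C:266 T:0),"
    "B9J08_000401: hypothetical protein "
    "(stop_gained c.424C>T p.Gln142*; A:70 G:0)"
)


def parse_variant_hits(hits):
    """Return one dictionary per variant found in the hits string."""
    parsed = []
    for match in HIT_PATTERN.finditer(hits):
        record = match.groupdict()
        term = record["search_term"].strip()
        record["search_term"] = term
        record["gene"] = SEARCH_TERM_TO_GENE.get(term, term)
        parsed.append(record)
    return parsed
```

Applied to `EXAMPLE_HITS`, this yields two records: an ERG11 missense variant (p.Lys143Arg) and a FLO8 stop-gained variant (p.Gln142*).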
Known resistance-conferring mutations for Candidozyma auris
Mutations in these genes that are known to confer resistance are shown below
Snippy Variants Technical Details
Links | |
---|---|
Task | task_snippy_variants.wdl task_snippy_gene_query.wdl |
Software Source Code | Snippy on GitHub |
Software Documentation | Snippy on GitHub |
Candida albicans
When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed.
Snippy Variants: antifungal resistance detection
To detect mutations that may confer antifungal resistance, Snippy
is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance.
The genes in which there are known resistance-conferring mutations for this pathogen are:
- ERG11
- GCS1 (FKS1)
- FUR1
- RTA2
We query Snippy
results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overwritten by the user). Mutations are reported next to the search term in the theiaeuk_snippy_variants_hits
column; the gene name corresponding to each search term is listed below:
TheiaEuk Search Term | Corresponding Gene Name |
---|---|
ERG11 | ERG11 |
GCS1 | FKS1 |
FUR1 | FUR1 |
RTA2 | RTA2 |
Snippy Variants Technical Details
Links | |
---|---|
Task | task_snippy_variants.wdl task_snippy_gene_query.wdl |
Software Source Code | Snippy on GitHub |
Software Documentation | Snippy on GitHub |
Aspergillus fumigatus
When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed.
Snippy Variants: antifungal resistance detection
To detect mutations that may confer antifungal resistance, Snippy
is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance.
The genes in which there are known resistance-conferring mutations for this pathogen are:
- Cyp51A
- HapE
- COX10 (AFUA_4G08340)
We query Snippy
results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overwritten by the user). Mutations are reported next to the search term in the theiaeuk_snippy_variants_hits
column; the gene name corresponding to each search term is listed below:
TheiaEuk Search Term | Corresponding Gene Name |
---|---|
Cyp51A | Cyp51A |
HapE | HapE |
AFUA_4G08340 | COX10 |
Snippy Variants Technical Details
Links | |
---|---|
Task | task_snippy_variants.wdl task_snippy_gene_query.wdl |
Software Source Code | Snippy on GitHub |
Software Documentation | Snippy on GitHub |
Cryptococcus neoformans
When this species is detected by the taxon ID tool, an antifungal resistance detection task is deployed.
Snippy Variants: antifungal resistance detection
To detect mutations that may confer antifungal resistance, Snippy
is used to find all variants relative to the clade-specific reference, and these variants are queried for product names associated with resistance.
The genes in which there are known resistance-conferring mutations for this pathogen are:
- ERG11 (CNA00300)
We query Snippy
results to see if any mutations were identified in those genes. By default, we automatically check for the following loci (which can be overwritten by the user). Mutations are reported next to the search term in the theiaeuk_snippy_variants_hits
column; the gene name corresponding to each search term is listed below:
TheiaEuk Search Term | Corresponding Gene Name |
---|---|
CNA00300 | ERG11 |
Snippy Variants Technical Details
Links | |
---|---|
Task | task_snippy_variants.wdl task_snippy_gene_query.wdl |
Software Source Code | Snippy on GitHub |
Software Documentation | Snippy on GitHub |
Outputs¶
Variable | Type | Description |
---|---|---|
amr_search_csv | File | CSV formatted AMR profile |
amr_search_docker | String | Docker image used to run AMR_Search |
amr_search_results | File | JSON formatted AMR profile including BLAST results |
amr_search_results_pdf | File | PDF formatted AMR profile |
amr_search_version | String | Version of AMR_Search libraries used |
assembler | String | Assembler used in digger_denovo subworkflow |
assembler_version | String | Version of the assembler used in digger_denovo |
assembly_fasta | File | De novo genome assembly in FASTA format |
assembly_length | Int | Length of assembly (total contig length) as determined by QUAST |
bbduk_docker | String | The Docker image for bbduk, which was used to remove the adapters from the sequences |
busco_database | String | BUSCO database used |
busco_docker | String | BUSCO docker image used |
busco_report | File | A plain text summary of the results in BUSCO notation |
busco_results | String | BUSCO results (see relevant toggle in this block) |
busco_version | String | BUSCO software version used |
cg_pipeline_docker | String | Docker image used for running CG-Pipeline on cleaned reads |
cg_pipeline_report_clean | File | TSV file of read metrics from clean reads, including average read length, number of reads, and estimated genome coverage |
cg_pipeline_report_raw | File | TSV file of read metrics from raw reads, including average read length, number of reads, and estimated genome coverage |
cladetyper_annotated_reference | String | The annotated reference file for the identified clade, "None" if no clade was identified/no annotation is inputted |
cladetyper_clade | String | The clade assigned to the input assembly |
cladetyper_docker_image | String | The Docker container used for the task |
cladetyper_gambit_version | String | The version of GAMBIT used for the analysis |
combined_mean_q_clean | Float | Mean quality score for the combined clean reads |
combined_mean_q_raw | Float | Mean quality score for the combined raw reads |
combined_mean_readlength_clean | Float | Mean read length for the combined clean reads |
combined_mean_readlength_raw | Float | Mean read length for the combined raw reads |
contigs_gfa | File | Assembly graph output generated by SPAdes (Illumina: PE, SE) or Flye (ONT), used to visualize and evaluate genome assembly results. |
est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
fastp_html_report | File | The HTML report made with fastp |
fastp_version | String | The version of fastp used |
fastq_scan_clean1_json | File | The JSON file output from fastq-scan containing summary stats about clean forward read quality and length |
fastq_scan_clean2_json | File | The JSON file output from fastq-scan containing summary stats about clean reverse read quality and length |
fastq_scan_num_reads_clean1 | Int | The number of forward reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean2 | Int | The number of reverse reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean_pairs | String | The number of read pairs after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_raw1 | Int | The number of input forward reads as calculated by fastq_scan |
fastq_scan_num_reads_raw2 | Int | The number of input reverse reads as calculated by fastq_scan |
fastq_scan_num_reads_raw_pairs | String | The number of input read pairs as calculated by fastq_scan |
fastq_scan_raw1_json | File | The JSON file output from fastq-scan containing summary stats about raw forward read quality and length |
fastq_scan_raw2_json | File | The JSON file output from fastq-scan containing summary stats about raw reverse read quality and length |
fastq_scan_version | String | The version of fastq_scan |
fastqc_clean1_html | File | An HTML file that provides a graphical visualization of clean forward read quality from fastqc to open in an internet browser |
fastqc_clean2_html | File | An HTML file that provides a graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
fastqc_docker | String | The Docker container used for fastqc |
fastqc_num_reads_clean1 | Int | The number of forward reads after cleaning as calculated by fastqc |
fastqc_num_reads_clean2 | Int | The number of reverse reads after cleaning as calculated by fastqc |
fastqc_num_reads_clean_pairs | String | The number of read pairs after cleaning as calculated by fastqc |
fastqc_num_reads_raw1 | Int | The number of input forward reads as calculated by fastqc |
fastqc_num_reads_raw2 | Int | The number of input reverse reads as calculated by fastqc |
fastqc_num_reads_raw_pairs | String | The number of input read pairs as calculated by fastqc |
fastqc_raw1_html | File | An HTML file that provides a graphical visualization of raw forward read quality from fastqc to open in an internet browser |
fastqc_raw2_html | File | An HTML file that provides a graphical visualization of raw reverse read quality from fastqc to open in an internet browser |
fastqc_version | String | Version of fastqc software used |
filtered_contigs_metrics | File | File containing metrics of contigs filtered |
gambit_closest_genomes | File | CSV file listing genomes in the GAMBIT database that are most similar to the query assembly |
gambit_db_version | String | Version of the GAMBIT database used |
gambit_docker | String | GAMBIT Docker image used |
gambit_predicted_taxon | String | Taxon predicted by GAMBIT |
gambit_predicted_taxon_rank | String | Taxon rank of GAMBIT taxon prediction |
gambit_report | File | GAMBIT report in a machine-readable format |
gambit_version | String | Version of GAMBIT software used |
n50_value | Int | N50 of assembly calculated by QUAST |
number_contigs | Int | Total number of contigs in assembly |
qc_check | String | A string that indicates whether or not the sample passes a set of pre-determined and user-provided QC thresholds |
qc_standard | File | The file used in the QC Check task containing the QC thresholds. |
quast_gc_percent | Float | The GC percent of your sample |
quast_report | File | TSV report from QUAST |
quast_version | String | The version of QUAST |
r1_mean_q_raw | Float | Mean quality score of raw forward reads |
r1_mean_readlength_raw | Float | Mean read length of raw forward reads |
r2_mean_q_raw | Float | Mean quality score of raw reverse reads |
r2_mean_readlength_raw | Float | Mean read length of raw reverse reads |
rasusa_version | String | Version of RASUSA used for the analysis |
read1_clean | File | Forward read file after quality trimming and adapter removal |
read1_subsampled | File | Read1 FASTQ files downsampled to desired coverage |
read2_clean | File | Reverse read file after quality trimming and adapter removal |
read2_subsampled | File | Read2 FASTQ files downsampled to desired coverage |
read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure |
read_screen_clean_tsv | File | Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure |
read_screen_raw_tsv | File | Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
seq_platform | String | Description of the sequencing methodology used to generate the input read data |
theiaeuk_illumina_pe_analysis_date | String | Date of TheiaEuk PE workflow execution |
theiaeuk_illumina_pe_version | String | TheiaEuk PE workflow version used |
theiaeuk_snippy_variants_bai | String | BAI file produced by the snippy module |
theiaeuk_snippy_variants_bam | String | BAM file produced by the snippy module |
theiaeuk_snippy_variants_coverage_tsv | String | TSV file containing coverage information for each base in the reference genome |
theiaeuk_snippy_variants_gene_query_results | String | File containing all lines from variants file matching gene query terms |
theiaeuk_snippy_variants_hits | String | String of all variant file entries matching gene query term |
theiaeuk_snippy_variants_num_reads_aligned | String | Number of reads aligned by snippy |
theiaeuk_snippy_variants_num_variants | String | Number of variants detected by snippy |
theiaeuk_snippy_variants_outdir_tarball | String | Tar compressed file containing full snippy output directory |
theiaeuk_snippy_variants_percent_ref_coverage | String | Percent of reference genome covered by snippy |
theiaeuk_snippy_variants_query | String | The gene query term(s) used to search the variants file |
theiaeuk_snippy_variants_query_check | String | Whether the gene query terms were present in the annotated reference genome file |
theiaeuk_snippy_variants_reference_genome | String | The reference genome used in the alignment and variant calling |
theiaeuk_snippy_variants_results | String | The variants file produced by snippy |
theiaeuk_snippy_variants_summary | String | A file summarizing the variants detected by snippy |
theiaeuk_snippy_variants_version | String | The version of the snippy_variants module being used |
trimmomatic_docker | String | The docker image used for the trimmomatic module in this workflow |
trimmomatic_version | String | The version of Trimmomatic used |

Variable | Type | Description |
---|---|---|
amr_search_csv | File | CSV formatted AMR profile |
amr_search_docker | String | Docker image used to run AMR_Search |
amr_search_results | File | JSON formatted AMR profile including BLAST results |
amr_search_results_pdf | File | PDF formatted AMR profile |
amr_search_version | String | Version of AMR_Search libraries used |
assembly_fasta | File | De novo genome assembly in FASTA format |
assembly_length | Int | Length of assembly (total contig length) as determined by QUAST |
bandage_plot | File | Image file (PNG) visualizing the Flye assembly graph generated by Bandage |
bandage_version | String | Version of Bandage used |
busco_database | String | BUSCO database used |
busco_docker | String | BUSCO docker image used |
busco_report | File | A plain text summary of the results in BUSCO notation |
busco_results | String | BUSCO results (see relevant toggle in this block) |
busco_version | String | BUSCO software version used |
bwa_version | String | Version of BWA software used |
cladetype_annotated_ref | String | The annotated reference file for the identified clade, "None" if no clade was identified/no annotation is inputted |
cladetyper_clade | String | The clade assigned to the input assembly |
cladetyper_docker_image | String | The Docker container used for the task |
cladetyper_version | String | The version of Cladetyper used for the analysis |
contigs_gfa | File | Assembly graph output generated by SPAdes (Illumina: PE, SE) or Flye (ONT), used to visualize and evaluate genome assembly results. |
dnaapler_version | String | Version of dnaapler used |
est_coverage_clean | Float | Estimated coverage calculated from clean reads and genome length |
est_coverage_raw | Float | Estimated coverage calculated from raw reads and genome length |
est_genome_length | Int | Estimated genome length |
filtered_contigs_metrics | File | File containing metrics of contigs filtered |
flye_assembly_info | String | Information file from Flye assembly |
flye_version | String | Version of Flye software used |
gambit_closest_genomes_file | File | CSV file listing genomes in the GAMBIT database that are most similar to the query assembly |
gambit_db_version | String | Version of the GAMBIT database used |
gambit_docker | String | GAMBIT Docker image used |
gambit_next_taxon | String | Next taxon predicted by GAMBIT |
gambit_next_taxon_rank | String | Next taxon rank predicted by GAMBIT |
gambit_predicted_taxon | String | Taxon predicted by GAMBIT |
gambit_predicted_taxon_rank | String | Taxon rank of GAMBIT taxon prediction |
gambit_report_file | File | GAMBIT report in a machine-readable format |
gambit_version | String | Version of GAMBIT software used |
medaka_model | String | Model used by Medaka |
medaka_version | String | Version of Medaka used |
merlin_tag | String | Merlin tag for the assembly |
n50_value | Int | N50 of assembly calculated by QUAST |
nanoplot_docker | String | Docker image for nanoplot |
nanoplot_html_clean | File | An HTML report describing the clean reads |
nanoplot_html_raw | File | An HTML report describing the raw reads |
nanoplot_num_reads_clean1 | Int | Number of clean reads |
nanoplot_num_reads_raw1 | Int | Number of raw reads |
nanoplot_r1_est_coverage_clean | Float | Estimated coverage on the clean reads by nanoplot |
nanoplot_r1_est_coverage_raw | Float | Estimated coverage on the raw reads by nanoplot |
nanoplot_r1_mean_q_clean | Float | Mean quality score of clean forward reads |
nanoplot_r1_mean_q_raw | Float | Mean quality score of raw forward reads |
nanoplot_r1_mean_readlength_clean | Float | Mean read length of clean forward reads |
nanoplot_r1_mean_readlength_raw | Float | Mean read length of raw forward reads |
nanoplot_r1_median_q_clean | Float | Median quality score of clean forward reads |
nanoplot_r1_median_q_raw | Float | Median quality score of raw forward reads |
nanoplot_r1_median_readlength_clean | Float | Median read length of clean forward reads |
nanoplot_r1_median_readlength_raw | Float | Median read length of raw forward reads |
nanoplot_r1_n50_clean | Float | N50 of clean forward reads |
nanoplot_r1_n50_raw | Float | N50 of raw forward reads |
nanoplot_r1_stdev_readlength_clean | Float | Standard deviation read length of clean forward reads |
nanoplot_r1_stdev_readlength_raw | Float | Standard deviation read length of raw forward reads |
nanoplot_tsv_clean | File | A TSV report describing the clean reads |
nanoplot_tsv_raw | File | A TSV report describing the raw reads |
nanoplot_version | String | Version of nanoplot used for analysis |
nanoq_version | String | Version of nanoq used in analysis |
number_contigs | Int | Total number of contigs in assembly |
polypolish_version | String | Version of Polypolish used |
porechop_version | String | Version of Porechop used |
quast_gc_percent | Float | The GC percent of your sample |
quast_report | File | TSV report from QUAST |
quast_version | String | The version of QUAST |
racon_version | String | Version of Racon used |
read1_clean | File | Forward read file after quality trimming and adapter removal |
read_screen_clean | String | PASS or FAIL result from clean read screening; FAIL accompanied by the reason(s) for failure |
read_screen_clean_tsv | File | Clean read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
read_screen_raw | String | PASS or FAIL result from raw read screening; FAIL accompanied by the reason(s) for failure |
read_screen_raw_tsv | File | Raw read screening report TSV depicting read counts, total read base pairs, and estimated genome length |
theiaeuk_ont_analysis_date | String | Date of TheiaEuk_ONT workflow execution |
theiaeuk_ont_version | String | TheiaEuk_ONT workflow version used |
theiaeuk_snippy_variants_bai | String | BAI file produced by the snippy module |
theiaeuk_snippy_variants_bam | String | BAM file produced by the snippy module |
theiaeuk_snippy_variants_coverage_tsv | String | TSV file containing coverage information for each base in the reference genome |
theiaeuk_snippy_variants_gene_query_results | String | File containing all lines from variants file matching gene query terms |
theiaeuk_snippy_variants_hits | String | String of all variant file entries matching gene query term |
theiaeuk_snippy_variants_num_reads_aligned | String | Number of reads aligned by snippy |
theiaeuk_snippy_variants_num_variants | String | Number of variants detected by snippy |
theiaeuk_snippy_variants_outdir_tarball | String | Tar compressed file containing full snippy output directory |
theiaeuk_snippy_variants_percent_ref_coverage | String | Percent of reference genome covered by snippy |
theiaeuk_snippy_variants_query | String | The gene query term(s) used to search the variants file |
theiaeuk_snippy_variants_query_check | String | Whether the gene query terms were present in the annotated reference genome file |
theiaeuk_snippy_variants_results | String | The variants file produced by snippy |
theiaeuk_snippy_variants_summary | String | A file summarizing the variants detected by snippy |
theiaeuk_snippy_variants_version | String | The version of the snippy_variants module being used |
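
The tables above describe `est_coverage_raw` and `est_coverage_clean` as calculated from reads and genome length, and `read_screen_raw` and `read_screen_clean` as PASS or FAIL strings. The sketch below illustrates one way to reason about those values after a run. It assumes that coverage is approximated as total sequenced bases divided by genome length, and that a passing screen string begins with `PASS`. The function names and the example FAIL reason are hypothetical and are not part of the workflow.

```python
def estimate_coverage(total_bases: int, genome_length: int) -> float:
    """Approximate sequencing depth as total bases divided by genome length.

    Assumption: this mirrors the general idea behind est_coverage_raw and
    est_coverage_clean; the workflow's exact calculation may differ.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be a positive number of bases")
    return round(total_bases / genome_length, 2)


def screen_passed(read_screen_value: str) -> bool:
    """Return True when a read_screen_* string reports PASS.

    Assumption: a FAIL value starts with "FAIL" and is followed by the
    reason(s) for failure, so only the leading token is checked.
    """
    return read_screen_value.strip().upper().startswith("PASS")


if __name__ == "__main__":
    # Hypothetical numbers: 1.24 Gbp of reads over a 12.4 Mbp genome.
    print(estimate_coverage(1_240_000_000, 12_400_000))
    print(screen_passed("PASS"))
```

A helper like this can be useful for sanity-checking a results table before comparing samples, for example to confirm that low coverage coincides with a failed read screen.
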