TheiaMeta
Quick Facts
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
Genomic Characterization | Any Taxa | PHB v2.3.0 | Yes | Sample-level |
TheiaMeta Workflows
Genomic characterization of pathogens is an increasing priority for public health laboratories globally. The workflows in the TheiaMeta Genomic Characterization Series make the analysis of pathogens from metagenomic samples easy by taking raw next-generation sequencing (NGS) data and generating metagenome-assembled genomes (MAGs), with or without a reference genome.
TheiaMeta can use one of two distinct methods for generating and processing the final assembly:
- If a reference genome is not provided, the de novo assembly is the final assembly. The assembly additionally goes through a binning process, in which the contigs are separated into distinct files ("bins") according to composition and coverage so that each bin ideally contains a single taxon.
- If a reference genome is provided by the user, the de novo metagenomic assembly is filtered by mapping the contigs to the reference; the contigs that align constitute the final assembly. No binning is necessary, because the mapping retains only contigs that likely belong to the same taxon as the reference.
Inputs
The TheiaMeta_Illumina_PE workflow processes Illumina paired-end (PE) reads generated for metagenomic characterization (typically by shotgun sequencing). By default, this workflow assumes that input reads were generated using a 300-cycle sequencing kit (i.e. 2 x 150 bp reads). The optional `trim_minlen` parameter may need to be modified to accommodate shorter read data, such as 2 x 75 bp reads generated using a 150-cycle sequencing kit.
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
theiameta_illumina_pe | read1 | File | Forward Illumina read in FASTQ file format | | Required |
theiameta_illumina_pe | read2 | File | Reverse Illumina read in FASTQ file format | | Required |
theiameta_illumina_pe | samplename | String | Name of the sample being analyzed | | Required |
assembled_reads_percent | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
assembled_reads_percent | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
assembled_reads_percent | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 | Optional |
assembled_reads_percent | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
bwa | cpu | Int | Number of CPUs to allocate to the task | 6 | Optional |
bwa | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
bwa | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/ivar:1.3.1-titan | Optional |
bwa | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
calculate_coverage | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
calculate_coverage | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
calculate_coverage | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/bedtools:2.31.0 | Optional |
calculate_coverage | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
calculate_coverage_paf | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
calculate_coverage_paf | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
calculate_coverage_paf | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/quay/ubuntu:latest | Optional |
calculate_coverage_paf | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
compare_assemblies | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
kraken2_clean | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
kraken2_clean | disk_size | Int | GB of storage to request for the VM used to run the kraken2 task. Increase this when using large (>30 GB) kraken2 databases such as the "k2_standard" database | 100 | Optional |
kraken2_clean | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db | Optional |
kraken2_clean | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
kraken2_raw | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
kraken2_raw | disk_size | Int | GB of storage to request for the VM used to run the kraken2 task. Increase this when using large (>30 GB) kraken2 databases such as the "k2_standard" database | 100 | Optional |
kraken2_raw | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/kraken2:2.1.2-no-db | Optional |
kraken2_raw | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
krona_clean | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
krona_clean | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
krona_clean | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/krona:2.7.1--pl526_5 | Optional |
krona_clean | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
krona_raw | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
krona_raw | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
krona_raw | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/krona:2.7.1--pl526_5 | Optional |
krona_raw | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
metaspades | kmers | String | Kmer list to use with metaspades. If not provided, metaspades automatically sets this value | | Optional |
metaspades | metaspades_opts | String | Additional arguments to pass to the metaspades task | | Optional |
minimap2_assembly | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
minimap2_assembly | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
minimap2_assembly | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22 | Optional |
minimap2_assembly | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
minimap2_assembly | query2 | File | Internal component. Do not modify. | | Optional |
minimap2_reads | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
minimap2_reads | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
minimap2_reads | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22 | Optional |
minimap2_reads | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
quast | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
quast | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
quast | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/quast:5.0.2 | Optional |
quast | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
read_QC_trim | adapters | File | Adapter file to be trimmed by trimmomatic | | Optional |
read_QC_trim | bbduck_mem | Int | Memory to use with BBDuk | 8 | Optional |
read_QC_trim | call_midas | Boolean | Whether to run MIDAS on the input data | FALSE | Optional |
read_QC_trim | fastp_args | String | Fastp-specific options that you might choose, see https://github.com/OpenGene/fastp | | Optional |
read_QC_trim | midas_db | File | A Midas database in .tar.gz format | gs://theiagen-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz | Optional |
read_QC_trim | phix | File | | | Optional |
read_QC_trim | read_processing | String | | | Optional |
read_QC_trim | read_qc | String | Allows the user to decide between fastq_scan (default) and fastqc for the evaluation of read quality. | fastq_scan | Optional |
read_QC_trim | target_organism | String | Internal component. Do not modify. | | Optional |
read_QC_trim | trim_min_length | Int | | | Optional |
read_QC_trim | trim_window_size | Int | | | Optional |
read_QC_trim | trimmomatic_args | String | | | Optional |
retrieve_aligned_contig_paf | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
retrieve_aligned_contig_paf | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
retrieve_aligned_contig_paf | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/seqkit:2.3.1 | Optional |
retrieve_aligned_contig_paf | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
retrieve_aligned_pe_reads_sam | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
retrieve_aligned_pe_reads_sam | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
retrieve_aligned_pe_reads_sam | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 | Optional |
retrieve_aligned_pe_reads_sam | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
retrieve_unaligned_pe_reads_sam | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
retrieve_unaligned_pe_reads_sam | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
retrieve_unaligned_pe_reads_sam | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 | Optional |
retrieve_unaligned_pe_reads_sam | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
sam_to_sorted_bam | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
sam_to_sorted_bam | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
sam_to_sorted_bam | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 | Optional |
sam_to_sorted_bam | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
semibin | cpu | Int | Number of CPUs to allocate to the task | 6 | Optional |
semibin | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
semibin | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/biocontainers/semibin:2.0.2--pyhdfd78af_0 | Optional |
semibin | environment | String | Environment model to use. Options: human_gut, dog_gut, ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, wastewater, chicken_caecum, global | global | Optional |
semibin | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
semibin | min_length | Int | Minimum contig length for binning | 1000 | Optional |
semibin | ratio | Float | If the ratio of the number of base pairs in contigs between 1000 and 2500 bp is smaller than this value, the minimal length will be set to 1000 bp, otherwise 2500 bp. | 0.05 | Optional |
sort_bam_assembly_correction | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
sort_bam_assembly_correction | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
sort_bam_assembly_correction | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17 | Optional |
sort_bam_assembly_correction | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
theiameta_illumina_pe | kraken2_db | File | A Kraken2 database in .tar.gz format | gs://theiagen-public-files-rp/terra/theiaprok-files/k2_standard_08gb_20230605.tar.gz | Optional |
theiameta_illumina_pe | output_additional_files | Boolean | Output additional files such as aligned and unaligned reads to reference | FALSE | Optional |
theiameta_illumina_pe | reference | File | Reference file for consensus calling, in FASTA format | | Optional |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
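
As a rough illustration of how the inputs above might be supplied when running from the command line, the sketch below builds a WDL inputs dictionary in Python. It assumes the conventional `<workflow>.<input>` key naming used by WDL engines; the helper name `build_inputs` and all file paths are hypothetical placeholders, not part of the workflow.

```python
import json


def build_inputs(samplename, read1, read2, reference=None):
    """Build an inputs dictionary for theiameta_illumina_pe.

    Keys use the `<workflow>.<input>` convention (an assumption about how
    the workflow is invoked); the paths passed in are placeholders.
    """
    inputs = {
        "theiameta_illumina_pe.samplename": samplename,
        "theiameta_illumina_pe.read1": read1,
        "theiameta_illumina_pe.read2": read2,
    }
    # The reference is optional: omitting it selects the binning path,
    # providing it selects the contig-filtering path.
    if reference is not None:
        inputs["theiameta_illumina_pe.reference"] = reference
    return inputs


# Hypothetical sample and file names, shown only for illustration.
example = build_inputs("sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz")
print(json.dumps(example, indent=2))
```

Writing the resulting dictionary to a JSON file gives an inputs file that can be edited by hand to add any of the optional parameters listed in the table.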
Workflow Tasks
`versioning`: Version Capture for TheiaMeta
The `versioning` task captures the workflow version from the GitHub (code repository) version.
Version Capture Technical details
Links | |
---|---|
Task | task_versioning.wdl |
Read Cleaning and QC
`HRRT`: Human Host Sequence Removal
All reads of human origin are removed, including their mates, by using NCBI's human read removal tool (HRRT).
HRRT is based on the SRA Taxonomy Analysis Tool and employs a k-mer database constructed of k-mers from Eukaryota derived from all human RefSeq records with any k-mers found in non-Eukaryota RefSeq records subtracted from the database.
NCBI-Scrub Technical Details
Links | |
---|---|
Task | task_ncbi_scrub.wdl |
Software Source Code | NCBI Scrub on GitHub |
Software Documentation | https://github.com/ncbi/sra-human-scrubber/blob/master/README.md |
`read_QC_trim`: Read Quality Trimming, Adapter Removal, Quantification, and Identification
`read_QC_trim` is a sub-workflow within TheiaMeta that removes low-quality reads, low-quality regions of reads, and sequencing adapters to improve data quality. It uses a number of tasks, described below.
Read quality trimming
Either `trimmomatic` or `fastp` can be used for read-quality trimming. Trimmomatic is used by default. Both tools trim low-quality regions of reads with a sliding window (with a window size of `trim_window_size`), cutting once the average quality within the window falls below `trim_quality_trim_score`. Both discard the read if it is trimmed below `trim_minlen`.
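The sliding-window idea can be sketched in a few lines of Python. This is a simplified illustration, not the Trimmomatic or fastp implementation: the function names are hypothetical, the default values are arbitrary examples rather than the workflow's defaults, and real tools differ in details such as how the cut position inside the failing window is chosen.

```python
def sliding_window_cut(quals, window_size=4, min_avg_quality=30):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality falls below `min_avg_quality`.
    """
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_avg_quality:
            return start
    return len(quals)


def trim_read(seq, quals, window_size=4, min_avg_quality=30, min_length=75):
    """Trim one read; return None if it ends up shorter than `min_length`."""
    keep = sliding_window_cut(quals, window_size, min_avg_quality)
    if keep < min_length:
        return None  # read is discarded
    return seq[:keep], quals[:keep]
```

For example, a 20-base read whose last ten bases have quality 10 is cut once the window mean drops below the threshold, and the read is dropped entirely if the remaining length is below the minimum.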
If fastp is selected for analysis, fastp also implements the additional read-trimming steps indicated below:
Parameter | Explanation |
---|---|
-g | enables polyG tail trimming |
-5 20 | enables read end-trimming from the 5' (front) end |
-3 20 | enables read end-trimming from the 3' (tail) end |
--detect_adapter_for_pe | enables adapter-trimming only for paired-end reads |
Adapter removal
The `BBDuk` task removes adapters from sequence reads. To do this:
- Repair from the BBTools package reorders reads in paired fastq files to ensure the forward and reverse reads of a pair are in the same position in the two fastq files.
- BBDuk ("Bestus Bioinformaticus" Decontamination Using Kmers) is then used to trim the adapters and filter out all reads that have a 31-mer match to PhiX, which is commonly added to Illumina sequencing runs to monitor and/or improve overall run quality.
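
To make the k-mer matching step concrete, here is a minimal Python sketch of filtering read pairs that share a k-mer with a contaminant reference such as PhiX. It is not BBDuk's implementation: the function names are hypothetical, matching is exact only, and dropping the whole pair when either mate matches is an assumption of this sketch.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k=31):
    """Set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_pairs(pairs, reference, k=31):
    """Keep only read pairs with no exact k-mer match to `reference`.

    Both strands of the reference are indexed. A pair is dropped if either
    mate contains a matching k-mer (assumption of this sketch).
    """
    ref_kmers = kmers(reference, k) | kmers(revcomp(reference), k)
    kept = []
    for read1, read2 in pairs:
        if kmers(read1, k) & ref_kmers or kmers(read2, k) & ref_kmers:
            continue
        kept.append((read1, read2))
    return kept
```

Keeping mates together when filtering is what preserves the pairing order that the repair step establishes.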
What are adapters and why do they need to be removed?
Adapters are manufactured oligonucleotide sequences attached to DNA fragments during the library preparation process. In Illumina sequencing, these adapter sequences are required for attaching reads to flow cells. You can read more about Illumina adapters here. For genome analysis, it's important to remove these sequences since they're not actually from your sample. If you don't remove them, the downstream analysis may be affected.
Read Quantification
There are two methods for read quantification to choose from: `fastq-scan` (default) or `fastqc`. Both quantify the forward and reverse reads in FASTQ files. In TheiaMeta_Illumina_PE, they also provide the total number of read pairs. This task is run once with raw reads as input and once with clean reads as input. If QC has been performed correctly, you should expect fewer clean reads than raw reads. `fastqc` also provides a graphical visualization of the read quality.
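As a small sketch of what read quantification amounts to, the Python below counts FASTQ records and read pairs. It assumes well-formed four-line FASTQ records; the function names are hypothetical, and this is not how fastq-scan or fastqc are implemented.

```python
def count_fastq_reads(lines):
    """Count records in an iterable of FASTQ lines (4 lines per record)."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


def count_read_pairs(read1_lines, read2_lines):
    """Number of read pairs; forward and reverse counts must agree."""
    n_forward = count_fastq_reads(read1_lines)
    n_reverse = count_fastq_reads(read2_lines)
    if n_forward != n_reverse:
        raise ValueError("forward and reverse read counts differ")
    return n_forward
```

Running the same count on the raw and the clean files gives the before-and-after comparison described above.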
Read Identification (optional)
The `MIDAS` task is for the identification of reads to detect contamination with non-target taxa. This task is optional and turned off by default. It can be used by setting the `call_midas` input variable to `true`.
The MIDAS reference database, located at `gs://theiagen-large-public-files-rp/terra/theiaprok-files/midas/midas_db_v1.2.tar.gz`, is provided as the default. It is possible to provide a custom database. More information is available here.
How are the MIDAS output columns determined?
Example MIDAS report in the `midas_report` column:
species_id | count_reads | coverage | relative_abundance |
---|---|---|---|
Salmonella_enterica_58156 | 3309 | 89.88006645 | 0.855888033 |
Salmonella_enterica_58266 | 501 | 11.60606061 | 0.110519371 |
Salmonella_enterica_53987 | 99 | 2.232896237 | 0.021262881 |
Citrobacter_youngae_61659 | 46 | 0.995216227 | 0.009477003 |
Escherichia_coli_58110 | 5 | 0.123668877 | 0.001177644 |
MIDAS report column descriptions:
- species_id: species identifier
- count_reads: number of reads mapped to marker genes
- coverage: estimated genome-coverage (i.e. read-depth) of species in metagenome
- relative_abundance: estimated relative abundance of species in metagenome
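
A report with these columns can be summarised with a few lines of Python. The sketch below assumes a tab-separated file with the header shown above and picks the genus of the species with the highest relative abundance; whether the workflow derives `midas_primary_genus` in exactly this way is an assumption, and the function name is hypothetical.

```python
import csv
import io


def primary_genus(report_tsv):
    """Genus of the most abundant species in a MIDAS-style TSV report.

    Assumes `species_id` looks like Genus_species_12345 and that the
    report has a `relative_abundance` column.
    """
    rows = list(csv.DictReader(io.StringIO(report_tsv), delimiter="\t"))
    if not rows:
        return None
    top = max(rows, key=lambda row: float(row["relative_abundance"]))
    return top["species_id"].split("_")[0]
```

Applied to the example report above, this returns the genus of the first row, since it has the largest relative abundance.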
read_QC_trim Technical Details
`kraken`: Taxonomic Classification
Kraken2 is a bioinformatics tool originally designed for metagenomic applications. It has additionally proven valuable for validating taxonomic assignments and checking contamination of single-species (e.g. bacterial isolate, eukaryotic isolate, viral isolate, etc.) whole genome sequence data.
Kraken2 is run on the set of raw reads provided as input, as well as on the set of clean reads that result from the `read_QC_trim` sub-workflow.
Database-dependent
The Kraken2 software is database-dependent and taxonomic assignments are highly sensitive to the database used. An appropriate database should contain the expected organism(s) (e.g. Escherichia coli) and other taxa that may be present in the reads (e.g. Citrobacter freundii, a common contaminant).
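As an illustration of how a value such as the percentage of human reads can be read from a Kraken2 report, the Python sketch below looks up one taxon in the standard six-column, tab-delimited report (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, name). The function name is hypothetical, and whether the workflow computes `kraken2_percent_human_raw` and `kraken2_percent_human_clean` exactly this way is an assumption.

```python
def percent_for_taxid(report_text, taxid="9606"):
    """Percentage of reads assigned to the clade rooted at `taxid`.

    The default, 9606, is the NCBI taxonomy ID for Homo sapiens.
    Returns 0.0 when the taxon does not appear in the report.
    """
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 6 and fields[4].strip() == taxid:
            return float(fields[0])
    return 0.0
```

Comparing the value from the raw-read report with the value from the clean-read report shows how much human signal remains after dehosting.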
Kraken2 Technical Details
Links | |
---|---|
Task | task_kraken2.wdl |
Software Source Code | Kraken2 on GitHub |
Software Documentation | https://github.com/DerrickWood/kraken2/wiki |
Original Publication(s) | Improved metagenomic analysis with Kraken 2 |
Assembly
`metaspades`: De Novo Metagenomic Assembly
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging. A dedicated metagenomic assembly algorithm is necessary to circumvent the challenge of interpreting variation. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes.
`metaspades` is a de novo assembler that first constructs a de Bruijn graph of all the reads using the SPAdes algorithm. Through various graph simplification procedures, paths in the assembly graph are reconstructed that correspond to long genomic fragments within the metagenome. For more details, please see the original publication.
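For readers unfamiliar with de Bruijn graphs, the toy Python sketch below builds one from a set of reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is only the textbook construction with a hypothetical function name; it omits error correction, graph simplification, and everything else that makes metaSPAdes practical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes it links to."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)
```

Contigs correspond to unambiguous paths through this graph; nodes with more than one outgoing edge mark the branches that an assembler has to resolve.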
MetaSPAdes Technical Details
Links | |
---|---|
Task | task_metaspades.wdl |
Software Source Code | SPAdes on GitHub |
Software Documentation | SPAdes Manual |
Original Publication(s) | metaSPAdes: a new versatile metagenomic assembler |
`minimap2`: Assembly Alignment and Contig Filtering
If a reference genome is provided through the `reference` optional input, the assembly produced with `metaspades` will be mapped to the reference genome with `minimap2`. The contigs that align to the reference are retrieved and returned in the `assembly_fasta` output.
`minimap2` is a popular aligner that is used for correcting the assembly produced by metaSPAdes. This is done by aligning the reads back to the generated assembly or a reference genome.
In minimap2, "modes" are a group of preset options. Two different modes are used in this task depending on whether a reference genome is provided.
If a reference genome is not provided, the only mode used in this task is `sr`, which is intended for "short single-end reads without splicing". The `sr` mode indicates the following parameters should be used: `-k21 -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -b0 -r100 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g100 -2K50m --heap-sort=yes --secondary=no`. The output file is in SAM format.
If a reference genome is provided, then after the draft assembly polishing with `pilon`, this task runs again with the mode set to `asm20`, which is intended for "long assembly to reference mapping". The `asm20` mode indicates the following parameters should be used: `-k19 -w10 -U50,500 --rmq -r100k -g10k -A1 -B4 -O6,26 -E2,1 -s200 -z200 -N50`. The output file is in PAF format.
For more information, please see the minimap2 manpage.
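The contig-filtering step can be pictured as follows: read the PAF file, collect the names of contigs that have at least one alignment, and keep only those records from the assembly FASTA. The Python sketch below assumes the contigs are the query sequences (the first PAF column); the function names are hypothetical and this is not the workflow's actual task code.

```python
def aligned_contig_names(paf_text):
    """Names of query sequences with at least one alignment in a PAF file."""
    names = set()
    for line in paf_text.splitlines():
        if line.strip():
            names.add(line.split("\t")[0])  # column 1 is the query name
    return names


def filter_fasta(fasta_text, keep):
    """Return only the FASTA records whose identifier is in `keep`."""
    kept_lines = []
    writing = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The identifier is the header text up to the first whitespace.
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")
```

A stricter filter could also use the alignment length or mapping quality columns of the PAF file, at the cost of discarding contigs that align only partially.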
minimap2 Technical Details
Links | |
---|---|
Task | task_minimap2.wdl |
Software Source Code | minimap2 on GitHub |
Software Documentation | minimap2 |
Original Publication(s) | Minimap2: pairwise alignment for nucleotide sequences |
`samtools`: SAM File Conversion
This task converts the SAM file output by minimap2 to a BAM file. It then sorts the BAM by read name and generates an index file.
samtools Technical Details
Links | |
---|---|
Task | task_samtools.wdl |
Software Source Code | samtools on GitHub |
Software Documentation | samtools |
Original Publication(s) | The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools |
`pilon`: Assembly Polishing
`pilon` is a tool that uses read alignment to correct errors in an assembly. It is used to polish the assembly produced by metaSPAdes. The inputs to Pilon are the sorted BAM file produced by `samtools` and the original draft assembly produced by `metaspades`.
pilon Technical Details
Links | |
---|---|
Task | task_pilon.wdl |
Software Source Code | Pilon on GitHub |
Software Documentation | Pilon Wiki |
Original Publication(s) | Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement |
Assembly QC
`quast`: Assembly Quality Assessment
QUAST stands for QUality ASsessment Tool. It evaluates genome and metagenome assemblies by computing various metrics, without requiring a reference. These metrics include the number of contigs, the length of the largest contig, and N50.
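N50 is the contig length such that contigs of that length or longer contain at least half of the assembled bases. The Python sketch below computes it together with the other metrics named above; the function name is hypothetical and this is not QUAST's code, which also applies its own contig-length filtering.

```python
def assembly_stats(contig_lengths):
    """Number of contigs, largest contig, total length, and N50."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "largest_contig": lengths[0] if lengths else 0,
        "total_length": total,
        "n50": n50,
    }
```

For contigs of 400, 300, 200, and 100 bp, the two largest together cover 700 of 1000 bases, so N50 is 300.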
QUAST Technical Details
Links | |
---|---|
Task | task_quast.wdl |
Software Source Code | QUAST on GitHub |
Software Documentation | https://quast.sourceforge.net/ |
Original Publication(s) | QUAST: quality assessment tool for genome assemblies |
Binning
`semibin2`: Metagenomic binning (if a reference is NOT provided)
If no reference genome is provided through the `reference` optional input, the assembly produced with `metaspades` will be binned with `semibin2`, a command-line tool for metagenomic binning with deep learning.
Outputs
Variable | Type | Description |
---|---|---|
assembly_fasta | File | Final assembly (MAG) |
assembly_length | Int | Length of final assembly in basepairs |
assembly_mean_coverage | Float | Mean depth of coverage of the final assembly |
average_read_length | Float | Average read length of the clean reads |
bbduk_docker | String | Docker image for bbduk |
bedtools_docker | String | Docker image for bedtools |
bedtools_version | String | Version of bedtools |
contig_number | Int | Number of contigs in final assembly |
fastp_html_report | File | Report file for fastp in HTML format |
fastp_version | String | Version of fastp used |
fastq_scan_docker | String | Docker image of fastq_scan |
fastq_scan_clean1_json | File | JSON file output from fastq-scan containing summary stats about clean forward read quality and length |
fastq_scan_clean2_json | File | JSON file output from fastq-scan containing summary stats about clean reverse read quality and length |
fastq_scan_num_reads_clean_pairs | String | Number of read pairs after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean1 | Int | Number of forward reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_clean2 | Int | Number of reverse reads after cleaning as calculated by fastq_scan |
fastq_scan_num_reads_raw_pairs | String | Number of input read pairs as calculated by fastq_scan |
fastq_scan_num_reads_raw1 | Int | Number of input forward reads as calculated by fastq_scan |
fastq_scan_num_reads_raw2 | Int | Number of input reverse reads as calculated by fastq_scan |
fastq_scan_raw1_json | File | JSON file output from fastq-scan containing summary stats about raw forward read quality and length |
fastq_scan_raw2_json | File | JSON file output from fastq-scan containing summary stats about raw reverse read quality and length |
fastq_scan_version | String | fastq_scan version |
fastqc_clean1_html | File | Graphical visualization of clean forward read quality from fastqc to open in an internet browser |
fastqc_clean2_html | File | Graphical visualization of clean reverse read quality from fastqc to open in an internet browser |
fastqc_docker | String | Docker container used for fastqc |
fastqc_num_reads_clean_pairs | String | Number of read pairs after cleaning by fastqc |
fastqc_num_reads_clean1 | Int | Number of forward reads after cleaning by fastqc |
fastqc_num_reads_clean2 | Int | Number of reverse reads after cleaning by fastqc |
fastqc_num_reads_raw_pairs | String | Number of input read pairs by fastqc |
fastqc_num_reads_raw1 | Int | Number of input forward reads by fastqc |
fastqc_num_reads_raw2 | Int | Number of input reverse reads by fastqc |
fastqc_raw1_html | File | Graphical visualization of raw forward read quality from fastqc to open in an internet browser |
fastqc_raw2_html | File | Graphical visualization of raw reverse read quality from fastqc to open in an internet browser |
fastqc_version | String | Version of fastqc software used |
kraken2_docker | String | Docker image of kraken2 |
kraken2_percent_human_clean | Float | Percentage of human-classified reads in the sample's clean reads |
kraken2_percent_human_raw | Float | Percentage of human-classified reads in the sample's raw reads |
kraken2_report_clean | File | Full Kraken report for the sample's clean reads |
kraken2_report_raw | File | Full Kraken report for the sample's raw reads |
kraken2_version | String | Version of kraken |
krona_docker | String | Docker image of Krona |
krona_html_clean | File | The KronaPlot after reads are cleaned |
krona_html_raw | File | The KronaPlot before reads are cleaned |
krona_version | String | Version of Krona |
largest_contig | Int | Largest contig size |
metaspades_docker | String | Docker image of metaspades |
metaspades_version | String | Version of metaspades |
midas_primary_genus | String | Primary genus detected by MIDAS |
midas_report | File | MIDAS report file in TSV format |
minimap2_docker | String | Docker image of minimap2 |
minimap2_version | String | Version of minimap2 |
ncbi_scrub_docker | String | Docker image for NCBI's HRRT |
percent_coverage | Float | Percentage coverage of the reference genome provided |
percentage_mapped_reads | Float | Percentage of mapped reads to the assembly |
pilon_docker | String | Docker image for pilon |
pilon_version | String | Version of pilon |
quast_docker | String | Docker image of QUAST |
quast_version | String | Version of QUAST |
read1_clean | File | Clean forward reads file |
read1_dehosted | File | Dehosted forward reads file |
read1_mapped | File | Mapped forward reads to the assembly |
read1_unmapped | File | Unmapped forward reads to the assembly |
read2_clean | File | Clean reverse reads file |
read2_dehosted | File | Dehosted reverse reads file |
read2_mapped | File | Mapped reverse reads to the assembly |
read2_unmapped | File | Unmapped reverse reads to the assembly |
samtools_docker | String | Docker image of samtools |
samtools_version | String | Version of samtools |
semibin_bins | Array[File] | Array of binned metagenomic assembled genome files |
semibin_docker | String | Docker image of semibin |
semibin_version | String | Semibin version used |
theiameta_illumina_pe_analysis_date | String | Date of analysis |
theiameta_illumina_pe_version | String | Version of workflow |
trimmomatic_docker | String | Docker image of trimmomatic |
trimmomatic_version | String | Version of trimmomatic used |
References
Human read removal tool (HRRT): https://github.com/ncbi/sra-human-scrubber
Trimmomatic: Anthony M. Bolger and others, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics, Volume 30, Issue 15, August 2014, Pages 2114–2120, https://doi.org/10.1093/bioinformatics/btu170
Fastq-Scan: https://github.com/rpetit3/fastq-scan
metaSPAdes: Sergey Nurk and others, metaSPAdes: a new versatile metagenomic assembler, Genome Research, Volume 27, Issue 5, May 2017, Pages 824–834, https://doi.org/10.1101/gr.213959.116
Pilon: Bruce J. Walker and others. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. Plos One. November 19, 2014. https://doi.org/10.1371/journal.pone.0112963
Minimap2: Heng Li, Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics, Volume 34, Issue 18, September 2018, Pages 3094–3100, https://doi.org/10.1093/bioinformatics/bty191
QUAST: Alexey Gurevich and others, QUAST: quality assessment tool for genome assemblies, Bioinformatics, Volume 29, Issue 8, April 2013, Pages 1072–1075, https://doi.org/10.1093/bioinformatics/btt086
Samtools: Li, Heng, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079.
Bcftools: Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. Twelve years of SAMtools and BCFtools. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008
Semibin2: Shaojun Pan, Xing-Ming Zhao, Luis Pedro Coelho, SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i21–i29, https://doi.org/10.1093/bioinformatics/btad209