Skip to content

Snippy_Variants

Quick Facts

Workflow Type Applicable Kingdom Last Known Changes Command-line Compatibility Workflow Level
Phylogenetic Construction Bacteria, Mycotics, Viral PHB v2.2.0 Yes Sample-level

Snippy_Variants_PHB

The Snippy_Variants workflow aligns single-end or paired-end reads (in FASTQ format), or assembled sequences (in FASTA format), against a reference genome, then identifies single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), and insertions/deletions (INDELs) across the alignment. If a GenBank file is used as the reference, mutations associated with user-specified query strings (e.g. genes of interest) can additionally be reported to the Terra data table.

Snippy_Variants Workflow Diagram

Snippy_Variants Workflow Diagram

Example Use Cases

  • Finding mutations (SNPs, MNPs, and INDELs) in your own sample's reads relative to a reference, e.g. mutations in genes of phenotypic interest.
  • Quality control: When undertaking quality control of sequenced isolates, it is difficult to identify contamination between multiple closely related genomes using the conventional approaches in TheiaProk (e.g. isolates from an outbreak or transmission cluster). Such contamination may be identified as allele heterogeneity at a significant number of genome positions. Snippy_Variants may be used to identify these heterogeneous positions by aligning reads to the assembly of the same reads, or to a closely related reference genome and lowering the thresholds to call SNPs.
  • Assessing support for a mutation: Snippy_Variants produces a BAM file of the reads aligned to the reference genome. This BAM file can be visualized in IGV (see Theiagen Office Hours recordings) to assess the position of a mutation in supporting reads, or if the assembly of the reads was used as a reference, the position in the contig.
    • Mutations that are only found at the ends of supporting reads may be an error of sequencing.
    • Mutations found at the end of contigs may be assembly errors.

Inputs

  • Single or paired-end reads resulting from Illumina or IonTorrent sequencing can be used. For single-end data, simply omit a value for read2
  • Assembled genomes can be used. Use the assembly_fasta input and omit read1 and read2
  • The reference file should be in fasta (e.g. .fa, .fasta) or full GenBank (.gbk) format. The mutations identified by Snippy_Variants are highly dependent on the choice of reference genome. Mutations cannot be identified in genomic regions that are present in your query sequence and not the reference.

Query String

The query string can be a gene or any other annotation that matches the GenBank file/output VCF EXACTLY

Terra Task Name Variable Type Description Default Value Terra Status
snippy_variants_wf reference_genome_file File Reference genome (GenBank file or fasta) Required
snippy_variants_wf samplename String Names of samples Required
snippy_gene_query cpu Int Number of CPUs to allocate to the task 8 Optional
snippy_gene_query disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
snippy_gene_query docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21 Optional
snippy_gene_query memory Int Amount of memory/RAM (in GB) to allocate to the task 32 Optional
snippy_variants disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
snippy_variants_wf assembly_fasta File Assembly file Optional
snippy_variants_wf base_quality Int Minimum quality for a nucleotide to be used in variant calling 13 Optional
snippy_variants_wf cpus Int Number of CPUs to use 4 Optional
snippy_variants_wf docker String The Docker container to use for the task us-docker.pkg.dev/general-theiagen/staphb/snippy:4.6.0 Optional
snippy_variants_wf map_qual Int Minimum mapping quality to accept in variant calling, default from snippy tool is 60 Optional
snippy_variants_wf maxsoft Int Number of bases of alignment to soft-clip before discarding the alignment, default from snippy tool is 10 Optional
snippy_variants_wf memory Int Amount of memory/RAM (in GB) to allocate to the task 16 Optional
snippy_variants_wf min_coverage Int Minimum read coverage of a position to identify a mutation 10 Optional
snippy_variants_wf min_frac Float Minimum fraction of bases at a given position to identify a mutation, default from snippy tool is 0 0.9 Optional
snippy_variants_wf min_quality Int Minimum VCF variant call "quality" 100 Optional
snippy_variants_wf query_gene String Comma-separated strings (e.g. gene names) in which to search for mutations to output to data table Optional
snippy_variants_wf read1 File Forward read file Optional
snippy_variants_wf read2 File Reverse read file Optional
version_capture docker String The Docker container to use for the task "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional

Workflow Tasks

Snippy_Variants uses the snippy tool to align reads to the reference and call SNPs, MNPs and INDELs according to optional input parameters. The output includes a file of variants that is then queried using the grep bash command to identify any mutations in specified genes or annotations of interest. The query string MUST match the gene name or annotation as specified in the GenBank file and provided in the output variant file in the snippy_results column.

Outputs

Visualize your outputs in IGV

Output bam/bai files may be visualized using IGV to manually assess read placement and SNP support.

Variable Type Description
snippy_variants_bai File Indexed bam file of the reads aligned to the reference
snippy_variants_bam File Bam file of reads aligned to the reference
snippy_variants_coverage_tsv File Coverage stats tsv file output by the samtools coverage command
snippy_variants_docker String Docker image for snippy variants task
snippy_variants_gene_query_results File CSV file detailing results for mutations associated with the query strings specified by the user
snippy_variants_hits String A summary of mutations associated with the query strings specified by the user
snippy_variants_num_reads_aligned Int Number of reads that aligned to the reference genome as calculated by samtools view -c command
snippy_variants_num_variants Int Number of variants detected between sample and reference genome
snippy_variants_outdir_tarball File A compressed file containing the whole directory of snippy output files. This is used when running Snippy_Tree
snippy_variants_percent_ref_coverage Float Proportion of reference genome with depth greater than or equal to min_coverage
snippy_variants_query String Query strings specified by the user when running the workflow
snippy_variants_query_check String Verification that query strings are found in the reference genome
snippy_variants_results File CSV file detailing results for all mutations identified in the query sequence relative to the reference
snippy_variants_summary File A summary TXT fie showing the number of mutations identified for each mutation type
snippy_variants_version String Version of Snippy used
snippy_variants_wf_version String Version of Snippy_Variants used