Skip to content

Find_Shared_Variants

Quick Facts

Workflow Type Applicable Kingdom Last Known Changes Command-line Compatibility Workflow Level
Phylogenetic Construction Bacteria, Mycotics PHB v2.0.0 Yes Set-level

Find_Shared_Variants_PHB

Find_Shared_Variants_PHB is a workflow for concatenating the variant results produced by the Snippy_Variants_PHB workflow across multiple samples and reshaping the data to illustrate variants that are shared among multiple samples.

Find_Shared_Variants Workflow Diagram

Find_Shared_Variants Workflow Diagram

Inputs

The primary intended input of the workflow is the snippy_variants_results output from Snippy_Variants_PHB or the theiaeuk_snippy_variants_results output of the TheiaEuk workflow. Variant results files from other tools may not be compatible at this time.

All variant data included in the sample set should be generated from aligning sequencing reads to the same reference genome. If variant data was generated using different reference genomes, shared variants cannot be identified and results will be less useful.

Terra Task Name Variable Type Description Default Value Terra Status
shared_variants_wf concatenated_file_name String String of your choice to prefix output files Required
shared_variants_wf samplenames Array[String] The samples to be included in the analysis Required
shared_variants_wf variants_to_cat Array[File] The result file from the Snippy_Variants workflow Required
cat_variants docker_image String The Docker container to use for the task "us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1" Optional
shared_variants cpu Int Number of CPUs to allocate to the task 1 Optional
shared_variants disk_size Int Amount of storage (in GB) to allocate to the task 100 Optional
shared_variants docker String The Docker container to use for the task "us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-03-16" Optional
shared_variants memory Int Amount of memory/RAM (in GB) to allocate to the task 8 Optional
version_capture docker String The Docker container to use for the task "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" Optional
version_capture timezone String Set the time zone to get an accurate date of analysis (uses UTC by default) Optional

Tasks

Concatenate Variants
Concatenate Variants Task

The cat_variants task concatenates variant data from multiple samples into a single file concatenated_variants. It is very similar to the cat_files task, but also adds a column to the output file that indicates the sample associated with each row of data.

The concatenated_variants file will be in the following format:

samplename CHROM POS TYPE REF ALT EVIDENCE FTYPE STRAND NT_POS AA_POS EFFECT LOCUS_TAG GENE PRODUCT
sample1 PEKT02000007 5224 snp C G G:21 C:0
sample2 PEKT02000007 34112 snp C G G:32 C:0 CDS + 153/1620 51/539 missense_variant c.153C>G p.His51Gln B9J08_002604 hypothetical protein
sample3 PEKT02000007 34487 snp T A A:41 T:0 CDS + 528/1620 176/539 missense_variant c.528T>A p.Asn176Lys B9J08_002604 hypothetical protein

Technical Details

Links
Task /tasks/utilities/file_handling/task_cat_files.wdl
Software Source Code task_cat_files.wdl
Shared Variants Task
Shared Variants Task

The shared_variants task takes in the concatenated_variants output from the cat_variants task and reshapes the data so that variants are rows and samples are columns. For each variant, samples where the variant was detected are populated with a "1" and samples were either the variant was not detected or there was insufficient coverage to call variants are populated with a "0". The resulting table is available as the shared_variants_table output.

The shared_variants_table file will be in the following format:

CHROM POS TYPE REF ALT FTYPE STRAND NT_POS AA_POS EFFECT LOCUS_TAG GENE PRODUCT sample1 sample2 sample3
PEKT02000007 2693938 snp T C CDS - 1008/3000 336/999 synonymous_variant c.1008A>G p.Lys336Lys B9J08_003879 NA chitin synthase 1 1 1 0
PEKT02000007 2529234 snp G C CDS + 282/336 94/111 missense_variant c.282G>C p.Lys94Asn B9J08_003804 NA cytochrome c 1 1 1
PEKT02000002 1043926 snp A G CDS - 542/1464 181/487 missense_variant c.542T>C p.Ile181Thr B9J08_000976 NA dihydrolipoyl dehydrogenase 1 1 0

Technical Details

Links
Task task_shared_variants.wdl
Software Source Code task_shared_variants.wdl

Outputs

The outputs of this workflow are the concatenated_variants file and the shared_variants_table file.

Variable Type Description
concatenated_variants File The concatenated variants without presence/absence
shared_variants_analysis_date String The date the workflow was run
shared_variants_table File The shared variants table listing presence/absence for each mutation identified in the samples
shared_variants_version String The version of PHB the workflow is in