Samples_to_Ref_Tree¶
Quick Facts¶
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
Phylogenetic Placement | Viral | PHB v2.1.0 | Yes | Sample-level, Set-level |
Samples_to_Ref_Tree_PHB¶
Nextclade rapidly places new samples onto an existing reference phylogenetic tree. Phylogenetic placement is done by comparing the mutations of the query sequence (relative to the reference) with the mutations of every node and tip in the reference tree, and finding the node which has the most similar set of mutations. This operation is repeated for each query sequence, until all of them are placed onto the tree. This workflow uses the Nextstrain-maintained nextclade datasets for SARS-CoV-2, mpox, influenza A and B, and RSV-A and RSV-B. The organism must be specified as input in the field organism
, and these align with the nextclade dataset names, i.e. " sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha", "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha", "hMPXV", "hMPXV_B1", "MPXV", "rsv_a" and "rsv_b".
However, nextclade can be used on any organism as long as an an existing, high-quality input reference tree with representative samples on it is provided, in addition to other optional inputs. Contact us if you need help generating your own mutation-annotated tree, or follow the instructions available on the Augur wiki here.
Placement not construction
This workflow is not for building a tree from scratch, but rather for the placement of new sequences onto an existing high-quality input reference tree with representative samples on it. In effect, query samples are only compared to reference samples and never to the other query samples.
Inputs¶
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
nextclade_addToRefTree | assembly_fasta | File | A fasta file with query sequence(s) to be placed onto the global tree | Required | |
nextclade_addToRefTree | nextclade_dataset_name | String | What nextclade dataset name to run nextclade on; the options are: "sars-cov-2", "flu_h1n1pdm_ha", "flu_h1n1pdm_na", "flu_h3n2_ha", "flu_h3n2_na", "flu_vic_ha", "flu_vic_na", "flu_yam_ha", "hMPXV", "hMPXV_B1", "MPXV", "rsv_a" and "rsv_b" | Required | |
nextclade_add_ref | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
nextclade_add_ref | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
nextclade_add_ref | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/nextstrain/nextclade:3.3.1 | Optional |
nextclade_add_ref | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
nextclade_add_ref | verbosity | String | Set the nextclade output verbosity level. Options: off, error, warn, info, debug, trace | "warn" | Optional |
nextclade_addToRefTree | dataset_tag | String | nextclade dataset tag | Uses the dataset tag associated with the nextclade docker image version | Optional |
nextclade_addToRefTree | gene_annotations_gff | File | A genome annotations file for codon-aware alignment, gene translation and calling of aminoacid mutations | Uses the genome annotation associated with the nextclade dataset name | Optional |
nextclade_addToRefTree | input_ref | File | An optional FASTA file containing reference sequence. This file should contain exactly 1 sequence. | Uses the reference fasta associated with the specified nextclade dataset name | Optional |
nextclade_addToRefTree | nextclade_pathogen_json | File | An optional pathogen JSON file containing configuration and data specific to a pathogen. | Uses the reference pathogen JSON file associated with the specified nextclade dataset name | Optional |
nextclade_addToRefTree | reference_tree_json | File | An optional phylogenetic reference tree file which serves as a target for phylogenetic placement | Uses the reference tree associated with the specified nextclade dataset name | Optional |
version_capture | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional |
Outputs¶
Variable | Type | Description |
---|---|---|
treeUpdate_auspice_json | File | Phylogenetic tree with user placed samples |
treeUpdate_nextclade_docker | String | Nextclade docker image used |
treeUpdate_nextclade_json | File | JSON file with the results of the Nextclade analysis |
treeUpdate_nextclade_tsv | File | Tab-delimited file with Nextclade results |
treeUpdate_nextclade_version | String | Nextclade version used |
samples_to_ref_tree_analysis_date | String | Date of analysis |
samples_to_ref_tree_version | String | Version of the Public Health Bioinformatics (PHB) repository used |