PhyloCompare
Quick Facts
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
Standalone | Any taxa | vX.X.X | Yes | Sample-level |
PhyloCompare_PHB
PhyloCompare calculates the distance between two Newick-formatted phylogenies as a measure of the difference in their topologies (tip and branch arrangement). A distance of 0 indicates that the phylogenies have the same topology. PhyloCompare then validates whether the distance exceeds an inputted maximum distance, which is 0 by default. Both phylogenies must contain the same tips.
A non-0 distance is difficult to conceptualize, so please see the References for its interpretation. For unrooted phylogenies, PhyloCompare calculates the Lin-Rajan-Moret distance; for rooted phylogenies, it calculates the matching cluster distance. The Robinson-Foulds distance is also calculated, though it is disregarded in validation (see the References for criticism).
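To make "distance between topologies" more concrete, the sketch below counts the clusters (sets of tips under an internal node) that appear in one rooted tree but not the other, which is the rooted form of the Robinson-Foulds distance. It is only an illustration and not the theiaphylo implementation: trees are written as nested tuples rather than parsed from Newick, and the helper names `clusters` and `rooted_rf` are hypothetical.

```python
def clusters(tree):
    """Return (non-trivial clusters, all tips) for a nested-tuple tree.

    Illustrative representation only: a tip is a string and an internal
    node is a tuple of child nodes.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)                # the root cluster is shared by every tree
    return found, all_tips


def rooted_rf(tree_a, tree_b):
    """Count clusters present in exactly one of the two trees."""
    clusters_a, tips_a = clusters(tree_a)
    clusters_b, tips_b = clusters(tree_b)
    if tips_a != tips_b:
        raise ValueError("phylogenies must have the same tips")
    return len(clusters_a ^ clusters_b)


t1 = ((("A", "B"), "C"), "D")
t2 = (("C", ("B", "A")), "D")   # same topology, tips written in a different order
t3 = ((("A", "C"), "B"), "D")   # A now groups with C instead of B

print(rooted_rf(t1, t2))
print(rooted_rf(t1, t3))
```

Because clusters are compared as sets, the order in which tips are written does not affect the result for fully resolved trees like these.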
PhyloCompare can automatically root the phylogenies on outgroup tips or at the midpoint. If more than one outgroup tip is supplied, the phylogenies will be rooted on the branch leading to the outgroups' most recent common ancestor. Input multiple outgroup tips as a comma-delimited list, e.g. "tip1,tip2".
Tree rooting
If no rooting options are supplied, PhyloCompare will determine if the trees are rooted or unrooted. The outgroups and midpoint options are incompatible, and the outgroups input will take precedence.
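The precedence between the rooting options can be sketched as a small decision function. This is an assumption-laden illustration rather than the workflow's own code: `resolve_rooting` and the returned mode names are hypothetical.

```python
from typing import Optional


def resolve_rooting(outgroups: Optional[str] = None, midpoint: bool = False) -> str:
    """Pick a rooting mode; outgroups win when both options are set.

    Hypothetical helper: the mode names are illustrative only.
    """
    if outgroups:            # any non-empty string counts as "supplied"
        return "outgroup"
    if midpoint:
        return "midpoint"
    return "detect"          # leave the trees as-is and detect rooted vs. unrooted


print(resolve_rooting(outgroups="tip1,tip2", midpoint=True))
```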
phylocompare_flag errors
The phylocompare_flag output flags information that may confound distance calculation. For example, "polytomy" can confound tree comparison if there are non-0 length branches descending from a polytomy, which may lead to erroneous distances if tips are reported in a different order. In other words, phylogenies with the same topology may be reported with a non-0 distance if the tips within a polytomy are rearranged within the tree file.
If a flag is accompanied by a phylocompare_distance of ">0", no distance was calculated. For example, the "edge_count_mismatch" flag is raised when the number of edges differs between the trees and a distance could not be calculated.
Inputs
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
phylocompare | tree1 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
phylocompare | tree2 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
phylovalidate_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
root_tree1_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
root_tree2_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
phylocompare | max_distance | Float | Maximum tolerable distance in validation | 0.0 | Optional |
phylocompare | midpoint | Boolean | Root phylogenies at their midpoint | FALSE | Optional |
phylocompare | outgroups | String | Comma-delimited list of outgroup tip(s) to root upon. Multiple outgroup tips will root on the branch descended from their most recent common ancestor | | Optional |
phylovalidate_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
phylovalidate_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
phylovalidate_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
root_tree1_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
root_tree1_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
root_tree1_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
root_tree2_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
root_tree2_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
root_tree2_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
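For command-line runs, the inputs above are typically supplied as a JSON file. The sketch below builds one in Python. It assumes the usual WDL convention of prefixing each key with the workflow name, taken here to be phylocompare from the Terra Task Name column; the bucket paths and the choice of max_distance are placeholders, not values from this documentation.

```python
import json

# Hypothetical inputs: key prefixes assume the workflow is named "phylocompare",
# and the gs:// paths are placeholders for your own files.
inputs = {
    "phylocompare.tree1": "gs://my-bucket/trees/tree1.nwk",
    "phylocompare.tree2": "gs://my-bucket/trees/tree2.nwk",
    "phylocompare.max_distance": 0.0,        # default: topologies must be identical
    "phylocompare.outgroups": "tip1,tip2",   # comma-delimited outgroup tips
}

print(json.dumps(inputs, indent=2))
```

Only tree1 and tree2 are required; leaving out outgroups and midpoint lets PhyloCompare determine whether the trees are rooted.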
Outputs
Variable | Type | Description |
---|---|---|
phylocompare_distance | String | Distance between the phylogenies, or "None"/">0" if the distance could not be calculated |
phylocompare_flag | String | Flags raised that may confound distance calculation |
phylocompare_phb_version | String | The version of the Public Health Bioinformatics (PHB) repository used |
phylocompare_report | File | Text file of the calculated distances |
phylocompare_tree1_clean | File | Cleaned newick file for the first tree |
phylocompare_tree2_clean | File | Cleaned newick file for the second tree |
phylocompare_validation | String | "PASS" if distance <= max_distance; "FAIL" if distance > max_distance or the distance could not be calculated |
phylocompare_version | String | Version of the PhyloCompare Python script |
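Since phylocompare_distance is a string that may hold a number, "None", or ">0", downstream code has to handle all three forms. A minimal sketch of the validation rule, assuming that a distance equal to max_distance passes (which the default of 0 implies for identical topologies); `validate` is a hypothetical helper, not part of PhyloCompare:

```python
def validate(distance: str, max_distance: float = 0.0) -> str:
    """Return "PASS" or "FAIL" for a phylocompare_distance string.

    Assumption: non-numeric values such as "None" or ">0" mean that no
    distance was calculated and are treated as a failure.
    """
    try:
        value = float(distance)
    except ValueError:
        return "FAIL"
    return "PASS" if value <= max_distance else "FAIL"


for reported in ("0.0", "3.5", ">0", "None"):
    print(reported, validate(reported))
```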
All Tasks
root_phylo
Root_Phylo returns a rooted phylogeny from inputted outgroup(s) or by finding the midpoint root. Outgroups must be tip names (case-sensitive) that exist within the tree, and multiple outgroups must be comma-delimited. Up to two outgroup tips can be supplied, and the most recent common ancestor (MRCA) of these outgroups will be used as the rooting branch. Note that rooting on the MRCA of two outgroups is relative to the topology of the tree prior to rooting: if one of the samples is at the base of the phylogeny prior to rooting, then a random tip will be selected to allow for rooting upon the MRCA of the two inputted outgroups.
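The outgroup requirements above (comma-delimited, case-sensitive, present in the tree, at most two) can be checked before submitting a run. The following is a sketch under stated assumptions, not the task's own logic: `parse_outgroups` is a hypothetical helper, and trimming whitespace around names is a convenience the task may not perform.

```python
from typing import Iterable, List


def parse_outgroups(outgroups: str, tree_tips: Iterable[str]) -> List[str]:
    """Split a comma-delimited outgroup string and check it against the tree tips.

    Hypothetical pre-flight check; matching is case-sensitive.
    """
    tips = set(tree_tips)
    # Assumption: surrounding whitespace is not part of a tip name.
    names = [name.strip() for name in outgroups.split(",") if name.strip()]
    if not 1 <= len(names) <= 2:
        raise ValueError("supply one or two outgroup tips")
    missing = [name for name in names if name not in tips]
    if missing:
        raise ValueError(f"outgroup tip(s) not found in tree: {', '.join(missing)}")
    return names


print(parse_outgroups("tip1,tip2", ["tip1", "tip2", "tip3"]))
```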
Root_Phylo Technical Details
Links | |
---|---|
Task | task_root_phylo.wdl |
Software Source Code | https://github.com/theiagen/theiaphylo |
Software Documentation | TheiaPhylo |
phylocompare
PhyloCompare will clean two phylogenies and validate whether the distance between their topologies is within an inputted max_distance float (0 by default). Phylogenies are cleaned by converting 0 branch length nodes into polytomies, and any detected polytomies are reported as a flag. Polytomies may arbitrarily yield a non-0 distance, though a 0 distance reported alongside a polytomy indicates that the polytomy did not confound distance calculation.
For unrooted phylogenies, PhyloCompare calculates the Lin-Rajan-Moret distance, and for rooted phylogenies, PhyloCompare calculates the matching cluster distance. The Robinson-Foulds distance is also calculated, though it is disregarded in validation (see citations for criticism).
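The cleaning step can be pictured as collapsing zero-length internal branches so that their children attach directly to the parent node. The sketch below is an illustration only and does not reproduce the PhyloCompare script: nodes are written as hypothetical (children_or_name, branch_length) tuples, only internal branches are collapsed, and a polytomy is taken to mean a node with more than two children, which suits a rooted tree.

```python
def collapse_zero_length(node):
    """Return a copy of `node` with zero-length internal branches collapsed.

    Illustrative representation: a tip is (name, length) and an internal
    node is (list_of_children, length).
    """
    content, length = node
    if isinstance(content, str):              # tip: nothing to collapse
        return node
    new_children = []
    for child in content:
        child = collapse_zero_length(child)
        sub, child_length = child
        if not isinstance(sub, str) and child_length == 0:
            new_children.extend(sub)          # splice grandchildren into this node
        else:
            new_children.append(child)
    return (new_children, length)


def has_polytomy(node):
    """True if any node has more than two children (rooted-tree assumption)."""
    content, _ = node
    if isinstance(content, str):
        return False
    return len(content) > 2 or any(has_polytomy(child) for child in content)


# The (A, B) clade sits on a zero-length branch, so it is unsupported.
tree = ([([("A", 0.1), ("B", 0.2)], 0.0), ("C", 0.3)], 0.0)
cleaned = collapse_zero_length(tree)
print(has_polytomy(tree), has_polytomy(cleaned))
```

After collapsing, A, B and C all descend from the same node, which is the situation reported by the "polytomy" flag.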
PhyloCompare Technical Details
Links | |
---|---|
Task | task_phylocompare.wdl |
Software Source Code | https://github.com/theiagen/theiaphylo |
Software Documentation | TheiaPhylo |
References
Lin, Y., Rajan, V., Moret, B. M. E. (2012). A metric for phylogenetic trees based on matching. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1014-22, https://doi.org/10.1109/tcbb.2011.157
Moon, J. & Eulenstein, O. (2018). Cluster Matching Distance for Rooted Phylogenetic Trees. Lecture Notes in Computer Science, 10847, https://doi.org/10.1007/978-3-319-94968-0_31
Cogent3 Python Library https://github.com/cogent3/cogent3