PhyloCompare
Quick Facts
| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level | Dockstore |
|---|---|---|---|---|---|
| Standalone | Any taxa | v4.0.0 | Yes | Sample-level | PhyloCompare_PHB |
PhyloCompare_PHB
PhyloCompare generates a cophylogeny plot that visualizes the differences in two trees' tip arrangements. PhyloCompare can also quantitatively compare two phylogenies by calculating the distance between them as a measure of the difference in their topologies (tip and branch arrangement). Validation is triggered by setting the validate boolean to "true".
Rooting the phylogenies before comparison is recommended. PhyloCompare can root a tree on an outgroup tip or at its midpoint.
Tree rooting
If no rooting options are supplied, PhyloCompare will determine whether the trees are rooted or unrooted.
The outgroup and midpoint inputs are incompatible options. If both are supplied, the outgroup input takes precedence.
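The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the workflow's own code: the function name `choose_rooting`, the returned labels, and the tip name `sample_A` are all hypothetical.

```python
from typing import Optional


def choose_rooting(outgroup: Optional[str], midpoint: bool) -> str:
    """Return the rooting mode implied by the two optional inputs."""
    if outgroup:
        # An outgroup overrides midpoint rooting when both are supplied.
        return "outgroup"
    if midpoint:
        return "midpoint"
    # No rooting requested: the trees are used as provided.
    return "as_provided"


print(choose_rooting("sample_A", True))   # outgroup
print(choose_rooting(None, True))         # midpoint
print(choose_rooting(None, False))        # as_provided
```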
phylovalidate_flag errors
The phylovalidate_flag reports information that may confound distance calculation. For example, the "polytomy" flag indicates that a polytomy was detected. A polytomy can confound tree comparison if non-0 length branches descend from it, because the tips may be reported in a different order in each tree file. In other words, phylogenies with the same topology may be reported with a non-0 distance if the tips within a polytomy are rearranged within the tree file.
If flags are accompanied by a ">0" phylovalidate_distance, then this indicates no distance was calculated; e.g. the "edge_count_mismatch" flag is raised when the number of edges differs between trees and a distance could not be calculated.
Inputs
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| phylocompare | tree1 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
| phylocompare | tree2 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
| phylovalidate_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| root_tree1_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| root_tree2_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| cophylo_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| cophylo_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| cophylo_task | docker | String | Docker image to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.2.0 | Optional |
| cophylo_task | memory | Int | Amount of memory (in GB) to allocate to the task | 4 | Optional |
| phylocompare | midpoint | Boolean | Root phylogenies at their midpoint | False | Optional |
| phylocompare | outgroup | String | Root phylogenies with an outgroup tip | | Optional |
| phylocompare | validate | Boolean | Run phylogenetic validation by calculating the distance between two phylogenies' tips and branching order | False | Optional |
| phylovalidate_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| phylovalidate_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| phylovalidate_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.2.0 | Optional |
| phylovalidate_task | max_distance | Float | Maximum tolerable distance during validation | | Optional |
| phylovalidate_task | resolve_tip_discrepancies | Boolean | Remove tips that are discrepant between trees instead of failing | True | Optional |
| root_tree1_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| root_tree1_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| root_tree1_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.8 | Optional |
| root_tree2_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| root_tree2_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| root_tree2_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.8 | Optional |
| version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
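For command-line runs, the inputs in the table above are typically supplied as a JSON file. The sketch below builds one in Python. It assumes the usual WDL key layout (`<workflow>.<input>` and `<workflow>.<call>.<input>`) and that the workflow and call names match the Terra Task Name column; the bucket paths and the outgroup tip name are hypothetical placeholders.

```python
import json

# Hypothetical bucket paths and tip name; replace with your own values.
inputs = {
    "phylocompare.tree1": "gs://my-bucket/trees/tree1.nwk",
    "phylocompare.tree2": "gs://my-bucket/trees/tree2.nwk",
    "phylocompare.outgroup": "sample_A",
    "phylocompare.validate": True,
    # Task-level input, addressed through the call name (assumed layout).
    "phylocompare.phylovalidate_task.resolve_tip_discrepancies": True,
}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Only the two tree inputs are required; every other key can be omitted to fall back to the defaults listed in the table.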
Workflow Tasks
root_phylo
root_phylo returns a rooted phylogeny, rooted either on a supplied outgroup or at the midpoint. Outgroups must be tip names (case-sensitive) that exist within the tree.
Root_Phylo Technical Details
| Links | |
|---|---|
| Task | task_root_phylo.wdl |
| Software Source Code | https://github.com/theiagen/theiaphylo |
| Software Documentation | TheiaPhylo |
cophylogeny
The Cophylogeny task generates cophylogeny plots of two inputted phylogenies. A cophylogeny plot draws lines between two trees' tips as a method for visualizing their topological (tip/branch arrangement) differences.
Two plots are generated: one with branch lengths (cophylogeny_plot_with_branch_lengths) and one without branch lengths (cophylogeny_plot). The plot without branch lengths is better for depicting branching order differences, though it is important to note that the branch lengths within this plot are arbitrary and do not convey evolutionary distance. Users will most likely need to visualize the phylogenies independently to interpret evolutionary distance because it is difficult to automatically graph two phylogenies with scaled and viewable branch lengths.
Cophylogeny Technical Details
| Links | |
|---|---|
| Task | task_cophylogeny.wdl |
| Software Source Code | https://github.com/theiagen/theiaphylo |
| Software Documentation | TheiaPhylo |
phylovalidate
phylovalidate cleans two phylogenies and validates whether the distance between their topologies is less than an inputted max_distance float (0 by default). Phylogenies are cleaned by converting 0 branch length nodes into polytomies, and any detected polytomies are reported as a flag. Polytomies may arbitrarily yield a non-0 distance, though if a 0 distance is reported with a polytomy, then the polytomy did not confound distance calculation. Trees can only be compared if both trees have the same number of nodes. Additionally, the tips must be the same between trees, though the resolve_tip_discrepancies boolean is set to "true" by default to remove discrepant tips.
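The cleaning step can be pictured with a small Python sketch. It is not the TheiaPhylo implementation: the `Node` class and the helper names are hypothetical, and a polytomy is assumed here to mean any node with more than two children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 0.0          # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_zero_length(node: Node) -> Node:
    """Merge internal children joined by a 0-length branch into their parent."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse_zero_length(child)
        if child.children and child.length == 0:
            # The child's descendants attach directly to this node.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.length, new_children)


def has_polytomy(node: Node) -> bool:
    """True if any node in the tree has more than two children."""
    if len(node.children) > 2:
        return True
    return any(has_polytomy(child) for child in node.children)


# ((A:1,B:1):0,C:2); the (A,B) clade hangs from a 0-length branch.
tree = Node(children=[
    Node(length=0.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 2.0),
])
cleaned = collapse_zero_length(tree)
print([child.name for child in cleaned.children])  # ['A', 'B', 'C']
print(has_polytomy(tree), has_polytomy(cleaned))   # False True
```

In this example the cleaned tree would be the one that triggers a polytomy flag, because A, B, and C now descend from the same node.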
It is difficult to conceptualize what a non-0 distance indicates, so please see the following citations for their interpretation. For unrooted phylogenies, phylovalidate calculates the Lin-Rajan-Moret distance, and for rooted phylogenies, phylovalidate calculates the matching cluster distance. The Robinson-Foulds distance is also calculated, though it is disregarded in validation (see citations for criticism).
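As a rough intuition for what a topological distance measures, the sketch below computes the Robinson-Foulds distance for small rooted trees by comparing their clusters (the set of tips below each internal node). This is the simpler metric that phylovalidate reports but does not use for validation, and the code is an illustrative assumption rather than the TheiaPhylo implementation. Trees are written as nested tuples of hypothetical tip names.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a tip name, or a tuple of subtrees


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the tip set (cluster) below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def rooted_rf(tree1: Tree, tree2: Tree) -> int:
    """Count the clusters that appear in exactly one of the two trees."""
    return len(clusters(tree1) ^ clusters(tree2))


tree_a = (("A", "B"), ("C", "D"))
tree_b = (("A", "C"), ("B", "D"))   # different branching order
tree_c = (("B", "A"), ("D", "C"))   # same topology as tree_a, tips reordered

print(rooted_rf(tree_a, tree_b))  # 4
print(rooted_rf(tree_a, tree_c))  # 0
```

Reordering tips within a fully resolved clade leaves the distance at 0, whereas changing which tips group together does not. The matching-based distances cited below refine this idea by weighting how different the mismatched clusters are.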
PhyloValidate Technical Details
| Links | |
|---|---|
| Task | task_phylovalidate.wdl |
| Software Source Code | https://github.com/theiagen/theiaphylo |
| Software Documentation | TheiaPhylo |
Outputs
| Variable | Type | Description |
|---|---|---|
| cophylogeny_plot | File | A cophylogeny plot depicting branching order differences between two phylogenies without branch lengths |
| cophylogeny_plot_with_branch_lengths | File | A cophylogeny plot depicting branching order differences between two phylogenies with branch lengths |
| cophylogeny_version | String | Version of the TheiaPhylo repository used for analysis |
| phylocompare_phb_version | String | The version of the Public Health Bioinformatics (PHB) repository used |
| phylovalidate_distance | String | The quantitative distance between two phylogenies' tip/branch arrangements |
| phylovalidate_flag | String | Flag reporting potential confounding factors encountered during validation |
| phylovalidate_report | File | Report file summarizing the validation results |
| phylovalidate_tree1_clean | File | Cleaned version of the first phylogenetic tree |
| phylovalidate_tree2_clean | File | Cleaned version of the second phylogenetic tree |
| phylovalidate_version | String | Version of phylovalidate used |
References
Lin, Y., Rajan, V., & Moret, B. M. E. (2012). A metric for phylogenetic trees based on matching. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1014-1022. https://doi.org/10.1109/tcbb.2011.157
Moon, J., & Eulenstein, O. (2018). Cluster Matching Distance for Rooted Phylogenetic Trees. Lecture Notes in Computer Science, 10847. https://doi.org/10.1007/978-3-319-94968-0_31