PhyloCompare

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
| --- | --- | --- | --- | --- |
| Standalone | Any taxa | vX.X.X | Yes | Sample-level |

PhyloCompare_PHB

PhyloCompare calculates the distance between two newick-formatted phylogenies as a measure of the difference in their topologies (tip and branch arrangement). A distance of 0 indicates that the phylogenies have the same topology. PhyloCompare then validates that the distance does not exceed a user-supplied maximum distance, which is 0 by default. Both phylogenies must contain the same tips.

A non-0 distance is difficult to interpret intuitively, so please see the citations for guidance. For unrooted phylogenies, PhyloCompare calculates the Lin-Rajan-Moret distance, and for rooted phylogenies, PhyloCompare calculates the matching cluster distance. The Robinson-Foulds distance is also calculated, though it is disregarded in validation (see citations for criticism).
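As a rough illustration of how a topology distance of this kind is computed, the sketch below implements the Robinson-Foulds distance for rooted trees as the number of clusters (tip sets under an internal node) found in only one of the two trees. It is not PhyloCompare's implementation, and it is neither the Lin-Rajan-Moret nor the matching cluster distance used in validation. The function names `clusters` and `rf_distance` are hypothetical, and the minimal newick reader assumes unquoted tip names and at least one pair of parentheses.

```python
import re


def clusters(newick: str) -> set[frozenset[str]]:
    """Collect the set of tip names under every internal node of a rooted tree."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    found = set()
    stack = []   # one set of tip names per currently open parenthesis
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            found.add(frozenset(done))
            if stack:
                stack[-1] |= done  # tips also belong to the enclosing node
        elif tok not in ",;" and prev != ")":
            # A token directly after ")" is an internal label or branch
            # length, so only other tokens are treated as tip names.
            name = tok.split(":")[0].strip()
            if name:
                stack[-1].add(name)
        prev = tok
    return found


def rf_distance(tree1: str, tree2: str) -> int:
    """Count the clusters present in exactly one of the two rooted trees."""
    c1, c2 = clusters(tree1), clusters(tree2)
    # The largest cluster is the root, i.e. the full tip set of each tree.
    if max(c1, key=len) != max(c2, key=len):
        raise ValueError("trees must share the same tips")
    return len(c1 ^ c2)


print(rf_distance("((A,B),(C,D));", "((B,A),(D,C));"))  # 0: same topology
print(rf_distance("((A,B),(C,D));", "((A,C),(B,D));"))  # 4: no shared clusters below the root
```

Because clusters are stored as sets, the order in which tips are written in the newick string does not affect the result.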

PhyloCompare can automatically root the phylogenies on outgroup tips or at the midpoint. If more than one outgroup tip is supplied, the phylogenies will be rooted on the branch leading to the outgroups' most recent common ancestor (MRCA). Input multiple outgroup tips as a comma-delimited list, e.g. "tip1,tip2".

Tree rooting

If no rooting options are supplied, PhyloCompare will determine whether the trees are rooted or unrooted.

The outgroups and midpoint inputs are incompatible options; if both are supplied, the outgroups input takes precedence.
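The precedence between the rooting inputs can be sketched as follows. This is an illustrative helper only, not code from the workflow: the function name `choose_rooting` and the mode labels it returns are hypothetical, while `outgroups` and `midpoint` mirror the workflow inputs.

```python
def choose_rooting(outgroups: str = "", midpoint: bool = False) -> tuple[str, list[str]]:
    """Pick a rooting mode from the two inputs; outgroups win over midpoint."""
    # Split the comma-delimited string and drop empty entries.
    tips = [tip.strip() for tip in outgroups.split(",") if tip.strip()]
    if tips:
        return "outgroup", tips
    if midpoint:
        return "midpoint", []
    # Neither option supplied: the trees are used as provided, and their
    # rooted or unrooted state is detected from the trees themselves.
    return "as_provided", []


print(choose_rooting("tip1,tip2", midpoint=True))  # ('outgroup', ['tip1', 'tip2'])
```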

phylocompare_flag errors

The phylocompare_flag output reports information that may confound distance calculation. For example, the "polytomy" flag warns that non-0 length branches descending from a polytomy may lead to erroneous distances if the tips are reported in a different order. In other words, phylogenies with the same topology may be reported with a non-0 distance if the tips within a polytomy are rearranged within the tree file.

If flags are accompanied by a ">0" phylocompare_distance, then this indicates no distance was calculated; e.g. the "edge_count_mismatch" flag is raised when the number of edges differs between trees and a distance could not be calculated.

Inputs

| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
| --- | --- | --- | --- | --- | --- |
| phylocompare | tree1 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
| phylocompare | tree2 | File | Path to a newick-formatted phylogenetic tree in an accessible bucket | | Required |
| phylovalidate_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| root_tree1_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| root_tree2_task | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 4 | Optional |
| phylocompare | max_distance | Float | Maximum tolerable distance in validation | 0.0 | Optional |
| phylocompare | midpoint | Boolean | Root phylogenies at their midpoint | FALSE | Optional |
| phylocompare | outgroups | String | Comma-delimited list of outgroup tip(s) to root upon. Multiple outgroup tips will root on the branch descended from their most recent common ancestor | | Optional |
| phylovalidate_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| phylovalidate_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| phylovalidate_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
| root_tree1_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| root_tree1_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| root_tree1_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
| root_tree2_task | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| root_tree2_task | disk_size | Int | Amount of storage (in GB) to allocate to the task | 10 | Optional |
| root_tree2_task | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/theiaphylo:0.1.7 | Optional |
| version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

Outputs

| Variable | Type | Description |
| --- | --- | --- |
| phylocompare_distance | String | Distance between the phylogenies, or "None"/">0" if the distance could not be calculated |
| phylocompare_flag | String | Flags raised that may confound distance calculation |
| phylocompare_phb_version | String | The version of the Public Health Bioinformatics (PHB) repository used |
| phylocompare_report | File | Text file of the calculated distances |
| phylocompare_tree1_clean | File | Cleaned newick file for the first tree |
| phylocompare_tree2_clean | File | Cleaned newick file for the second tree |
| phylocompare_validation | String | "PASS" if the distance does not exceed max_distance, and "FAIL" if the distance exceeds max_distance or could not be calculated |
| phylocompare_version | String | Version of PhyloCompare python script |
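The relationship between phylocompare_distance, max_distance, and phylocompare_validation can be sketched as below. This is a hypothetical helper, not the workflow's code. It assumes that a distance equal to max_distance passes, since the default max_distance is 0 and identical topologies have a distance of 0, and that any non-numeric distance string such as "None" or ">0" fails.

```python
def validate(distance: str, max_distance: float = 0.0) -> str:
    """Return "PASS" or "FAIL" for a reported distance string."""
    try:
        value = float(distance)
    except (TypeError, ValueError):
        # "None" and ">0" are not numbers: no distance was calculated.
        return "FAIL"
    # Assumption: the threshold is inclusive.
    return "PASS" if value <= max_distance else "FAIL"


for reported in ("0.0", "0.5", "None", ">0"):
    print(reported, validate(reported))
```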

All Tasks

root_phylo

Root_Phylo returns a rooted phylogeny from inputted outgroup(s) or by finding the midpoint root. Outgroups must be tip names (case-sensitive) that exist within the tree, and multiple outgroups must be comma-delimited. Up to two outgroup tips can be supplied, and the most recent common ancestor (MRCA) of these outgroups will be used as the rooting branch. It is important to note that rooting on the MRCA of two outgroups is relative to the topology of the tree prior to rooting: if one of the samples is at the base of the phylogeny prior to rooting, then a random tip will be selected to allow for rooting upon the MRCA of the two inputted outgroups.

Root_Phylo Technical Details

| | Links |
| --- | --- |
| Task | task_root_phylo.wdl |
| Software Source Code | https://github.com/theiagen/theiaphylo |
| Software Documentation | TheiaPhylo |

phylocompare

PhyloCompare cleans two phylogenies and validates that the distance between their topologies does not exceed the max_distance input (a float, 0 by default). Phylogenies are cleaned by converting 0 branch length nodes into polytomies, and any detected polytomies are reported as a flag. Polytomies may arbitrarily yield a non-0 distance, though a 0 distance reported alongside a polytomy flag indicates that the polytomy did not confound distance calculation.
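The cleaning step can be pictured with the following sketch, which lifts the children of any internal node with a 0-length branch onto its parent and reports whether a polytomy results. It is an illustration under stated assumptions rather than PhyloCompare's code: the `Node` class and `collapse_zero_length` function are hypothetical, and a polytomy is taken to be any node with more than two children, which assumes a rooted tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0                            # branch length above this node
    children: list = field(default_factory=list)   # child Node objects


def collapse_zero_length(node: Node) -> bool:
    """Collapse 0-length internal branches in place; return True if a polytomy exists."""
    polytomy = False
    kept = []
    for child in node.children:
        # Clean the subtree first so nested 0-length branches collapse too.
        polytomy |= collapse_zero_length(child)
        if child.children and child.length == 0:
            kept.extend(child.children)  # grandchildren attach directly to node
        else:
            kept.append(child)
    node.children = kept
    return polytomy or len(kept) > 2


# ((A:1,B:1):0,C:1) becomes (A:1,B:1,C:1), a three-way polytomy.
root = Node(children=[
    Node(length=0.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 1.0),
])
print(collapse_zero_length(root))         # True
print([c.name for c in root.children])    # ['A', 'B', 'C']
```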

For unrooted phylogenies, PhyloCompare calculates the Lin-Rajan-Moret distance, and for rooted phylogenies, PhyloCompare calculates the matching cluster distance. The Robinson-Foulds distance is also calculated, though it is disregarded in validation (see citations for criticism).

PhyloCompare Technical Details

| | Links |
| --- | --- |
| Task | task_phylocompare.wdl |
| Software Source Code | https://github.com/theiagen/theiaphylo |
| Software Documentation | TheiaPhylo |

References