First paper on the curvature of tree space

02 Apr 2015, by Erick

Imagine a graph with vertices representing the trees of a given number of taxa, and edges connecting trees such that can be transformed to each other (see below for the “rSPR” example). All popular likelihood-based tree inference algorithms perform some traversal of this graph: Bayesian algorithms, for example, perform Markov chain Monte Carlo (MCMC) on it. In our recent Sys Bio paper, Chris Whidden and I demonstrated graph effects on phylogenetic MCMC: that the graph structure combined with the likelihood function led to bottlenecks in tree space where it was difficult to move from one peak of good trees to another.

These results motivated us to learn more about these graph structures. Consider the best-studied of these graphs, the rSPR graph, in which vertices represent rooted trees, and edges are rooted subtree-prune-regraft operations, in which a rooted tree is cut off of a tree and then reattached somewhere else in the tree with the same rooting. Chris has done fundamental work on inferring shortest paths in this graph, developing practical fixed parameter-tractable algorithms and has continued to make them faster.

Given that such graphs form the underlying space explored by phylogenetic tree inference algorithms, one would hope that we had a good understanding about this best-studied version of the graph. However, all that is known about the rSPR graph is the diameter (the maximum pairwise distance between vertices) and a recursive formula to calculate the degree of a vertex (how many trees are one rSPR move away from a given tree vertex). This is a great start, but is not enough detail to understand bottlenecks.

So Chris and I set out to learn more about the rSPR graph. Our explicit intent was to learn characteristics that would be useful for understanding the difficulty of moving about tree space. For this, we decided to apply recent advances in differential geometry to understand the curvature of rSPR tree-space. Pairs of vertices in a graph are positively curved with respect to a random walk if such walking brings walker positions closer together on average, and negatively curved if walking takes walker positions farther apart. The image above in this post is that of a tiling of the Poincaré disk, which is a negatively curved graph, which you can see because the uniformly-selected-neighbor random walk on average increases distance between walkers. We were able to compute curvatures on the rSPR graph up to 7 taxon trees for simple random walks, and prove some theorems describing curvature for larger numbers of taxa. These results are in a preprint now up on arXiv, which is a fun start, although lots remains to be done.

all posts