Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.
We have written several papers about pplacer. The BMC Bioinformatics paper describes pplacer v1.0. There is also the following related work:
| v1.1 binaries | 64-bit Linux (2.6.32) | OS X Snow Leopard (10.7.0) |
| documentation | manual | tutorial |
| code | github repository | compilation |
| reference package | software | |
This will be the last release before we release the improved taxonomic classification code, which is still being tuned. There is some new guppy functionality that is useful for doing edge PCA and the like on big trees: trim, which finds the subset of the tree that actually has reads placed on it, and the --rep-edges flag for the splitify subcommands, which keeps only representatives of edge collections that are similar to one another. rppr voronoi has also seen some new work, including rppr vorotree, a simple way of passing just a tree to voronoi (rather than a collection of placed reads). There is also an implementation of a variant of a classical heuristic, Partitioning Around Medoids (PAM), to solve the ADCL problem, which will be explained in a manuscript to be submitted this month. Finally, there are the fairly self-explanatory guppy unifrac and guppy rarefy, and guppy to_csv, a subcommand that turns jplace files into CSV files. As usual, see the CHANGELOG for more details.
There isn’t anything super monumental in this release, but in the 25 closed issues there are lots of little fixes and the seeds for some cool new functionality.
guppy fpd is a generalized alpha diversity calculator, including phylogenetic entropy and phylogenetic quadratic entropy.
guppy ograph gives the overlap graph for a collection of reads.
As usual, see the CHANGELOG for more details.
This is mostly a bugfix release. The most significant problem fixed was that placing on a FastTree tree could fail when using a combined reference + query alignment.
Also, guppy pca can now be done in an unweighted fashion.
There are a bunch of other minor fixes and changes; see the CHANGELOG for more details.
This rolls out our FastTree CAT model support and a new collection of heuristics for the initial evaluation phase of placement.
To run pplacer using a FastTree tree, build your tree using the -gtr flag and save the log file using the -log option. The log file is used in the same way as the statistics file when building a reference package. If you haven’t built one already, just have a look at the taxtastic quickstart. From there, the reference package is used just like any other. Note that pplacer won’t have to re-infer site categories (which is faster) if you are using the alignment in the reference package. Placing on FastTree trees takes about 1/4 the memory of the equivalent tree inferred using GTRGAMMA in RAxML.
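As a rough illustration of the workflow just described, the following Python sketch only assembles the command lines and does not run anything. The -gtr and -log flags come from the text above; the executable name `FastTree`, the -nt flag (nucleotide input), pplacer's -c flag for the reference package, and all file names are assumptions made for the example and may need adjusting for your installation and data.

```python
import shlex
from typing import List


def fasttree_command(alignment: str, log_path: str) -> List[str]:
    """Build a FastTree command line for a GTR tree with a saved log file.

    Assumes the executable is called "FastTree" and the alignment is
    nucleotide data (-nt). FastTree writes the tree to standard output,
    so the caller has to redirect it to a file.
    """
    return ["FastTree", "-nt", "-gtr", "-log", log_path, alignment]


def pplacer_command(refpkg: str, query_alignment: str) -> List[str]:
    """Build a pplacer command line that places queries on a reference package.

    Assumes -c is the flag that names the reference package directory.
    """
    return ["pplacer", "-c", refpkg, query_alignment]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection rather than executed.
    print(shlex.join(fasttree_command("ref.fasta", "ref.log")), "> ref.tre")
    print(shlex.join(pplacer_command("my.refpkg", "query.fasta")))
```

To actually run a command, pass the returned list to `subprocess.run`, giving an open file as `stdout` for the FastTree call. The step in between, building the reference package from the tree, the log file, and the alignment, is covered by the taxtastic quickstart.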
This release also contains command line flags that control the new “fig” heuristics. These heuristics greatly accelerate placement when the reference tree is big (e.g. > 10k leaves). In short, the tree gets divided up into subtrees that we call “figs”. These are connected units of the tree such that the distance between any two leaves is less than the value specified with the --fig-cutoff flag on the command line. The initial evaluation of edges for a placement then happens in three phases. First, evaluate each of the figs using representative edges. Then, merge figs that are close to one another in score and sort them. Finally, treat each (potentially merged) fig as a unit in the baseball heuristics; once we have tried all of the edges of one fig, we drop down to the next highest scoring fig and evaluate its edges. We have not seen a noticeable drop in accuracy using --fig-cutoff 0.2, and it’s much faster for a 35K taxon tree. Your mileage may vary.
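To make the fig definition concrete, here is a minimal Python sketch of one way to split a rooted tree into maximal subtrees whose leaf-to-leaf distances all stay below a cutoff. It is not pplacer's implementation (pplacer is written in OCaml and may choose figs differently); the tree encoding, the function names, and the toy tree are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: node name -> list of (child name, branch length).
# Nodes that do not appear as keys are leaves.
Tree = Dict[str, List[Tuple[str, float]]]

EXAMPLE_TREE: Tree = {
    "root": [("A", 0.3), ("B", 0.3)],
    "A": [("a1", 0.05), ("a2", 0.05)],
    "B": [("b1", 0.05), ("b2", 0.2)],
}


def leaves_under(tree: Tree, node: str) -> List[str]:
    """Collect the names of all leaves below (or equal to) a node."""
    kids = tree.get(node, [])
    if not kids:
        return [node]
    out: List[str] = []
    for child, _ in kids:
        out.extend(leaves_under(tree, child))
    return out


def height_and_diameter(tree: Tree, node: str, cache: dict) -> Tuple[float, float]:
    """Return (largest node-to-leaf distance, largest leaf-to-leaf distance)
    within the subtree rooted at node."""
    if node in cache:
        return cache[node]
    kids = tree.get(node, [])
    if not kids:
        cache[node] = (0.0, 0.0)
        return cache[node]
    depths = []
    diameter = 0.0
    for child, length in kids:
        child_height, child_diameter = height_and_diameter(tree, child, cache)
        depths.append(child_height + length)
        diameter = max(diameter, child_diameter)
    depths.sort(reverse=True)
    if len(depths) >= 2:
        # The longest path through this node joins its two deepest branches.
        diameter = max(diameter, depths[0] + depths[1])
    cache[node] = (depths[0], diameter)
    return cache[node]


def split_into_figs(tree: Tree, root: str, cutoff: float) -> List[List[str]]:
    """Walk down from the root and keep each maximal subtree whose
    leaf-to-leaf distances are all below the cutoff."""
    cache: dict = {}
    figs = []
    stack = [root]
    while stack:
        node = stack.pop()
        _, diameter = height_and_diameter(tree, node, cache)
        if diameter < cutoff:
            figs.append(sorted(leaves_under(tree, node)))
        else:
            stack.extend(child for child, _ in tree.get(node, []))
    return sorted(figs)


if __name__ == "__main__":
    print(split_into_figs(EXAMPLE_TREE, "root", 0.2))
```

On the toy tree with a cutoff of 0.2, the leaves a1 and a2 (0.1 apart) end up in one fig, while b1 and b2 (0.25 apart) are left as singleton figs. A larger cutoff yields fewer, bigger figs, which is the speed versus resolution trade-off that --fig-cutoff controls.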
Both of these new features are experimental.
I’m afraid that there is a new version of the installation script for those of you who are compiling. Our recent work placing on big trees broke our previous XML library and we’ve had to replace it. We’re hoping this will be the last change for a while.
This release was a long time coming, and it’s a doozy: 63 closed issues and 480-ish commits. The changelog entry is 200 lines, so I’m not going to go over all of it here. Here are some highlights:
rppr being the new kid. rppr stands for Reference Package PReparer, and it consequently has functionality that is relevant for working with reference packages.
guppy mft is a new command for transforming these masses.
rppr convexify describes phylogeny/taxonomy discordance and rppr reroot does taxonomic rooting, as described in the corresponding paper.
For those of you who are compiling from source, there is a new version of the installation script. We’ve upgraded to OCaml 3.12 and started using the Batteries library, and this script will bring things up to date.