pplacer
Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis.
We have written several papers about pplacer. The BMC Bioinformatics paper describes v1.0 of pplacer. There is also the following related work:
- a paper describing the phylogenetic Kantorovich-Rubinstein metric as implemented in guppy
- a paper describing ordination and clustering methods that take advantage of the special structure of phylogenetic placement data (edge PCA and squash clustering)
- a paper describing the phylogeny/taxonomy discordance and taxonomic rooting algorithms implemented in rppr.
- a paper describing the ADCL minimization algorithms implemented in rppr.
- a paper describing the calculation of the mean and variance of PD under rarefaction implemented in guppy.
- a paper describing the BWPD diversity measure implemented in guppy.
v1.1 binaries | latest release
documentation | manual | tutorial
code | github repository | compilation
reference package | software
News
- v1.1.alpha17: better error messages (2015-09-23)
- v1.1.alpha16: fix NBC and hybrid classification tallies in `classif_table.py` (2014-05-05)
- v1.1.alpha15: gaps, rates, and better support for more sequences (2014-02-07)
- v1.1.alpha14: BIOM format support, improvements, and fixes (2013-05-20)
  - For mothur/QIIME users, `guppy` and `rppr` can now use a BIOM file and tree in place of a reference package and `.jplace` file.
  - Edge PCA can now perform support overlap minimization (SOM), which minimizes overlap between the first two or three principal components. Note that the `guppy pca` subcommand has been renamed to `guppy epca`, and the output file extensions have changed.
  - The mean and variance of PD can be calculated using `guppy rarefact`.
  - The alpha diversity measures calculated by `guppy fpd` now include the BWPD metric we recently described, as well as support for the `qD(T)` measure of Chao et al. (2010).
- v1.1.alpha13: New guppy commands and fixes (2012-05-21)
This is a minor release, mostly to have a place to put the binaries on the latest releases page. We are moving the hosting of binaries from the Matsen group webpage to GitHub.
There was a bug in the Python classification scripts when deduplication was used with NBC or hybrid classification methods. This bug resulted in inaccurate tally outputs, and is fixed in this release.
We are releasing a new version today, and there is one change that should be especially noted because it can change placement results. I have changed how non-informative columns (such as columns that are gap in a query sequence) are masked out of the final alignments that are used for placement. The difference is fully explained on the pplacer documentation page, but in short there are some subtle effects that non-informative sites can have on placement, and so it matters if they are masked or not. I also fixed a bug that was misaligning rate assignments when placing on a tree that was built with FastTree.
I do not expect the changes to substantially impact your results. For instance, this changed classification results for 4-6% of sequences in a trial run, and those changes were almost uniformly between classifications at the species level and the corresponding genus and vice versa, although I can’t promise what will happen on your data.
However, on the bright side, pplacer is now guaranteed to give identical results irrespective of any other sequences that are in your query fasta file.
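As a rough illustration of why per-query masking makes results independent of the other queries, here is a minimal Python sketch. It is not pplacer's code: the names `informative_columns` and `mask`, the set of gap characters, and the rule itself (keep a column only if the query and at least one reference sequence are non-gap there) are assumptions for illustration. The pplacer documentation page describes the actual behavior.

```python
GAP_CHARS = set("-.")  # assumption: characters treated as gaps


def informative_columns(reference, query):
    """Indices of columns where the query and at least one reference are non-gap."""
    n = len(query)
    if any(len(seq) != n for seq in reference.values()):
        raise ValueError("all sequences must have the same aligned length")
    cols = []
    for i in range(n):
        if query[i] in GAP_CHARS:
            continue  # the query carries no information in this column
        if any(seq[i] not in GAP_CHARS for seq in reference.values()):
            cols.append(i)
    return cols


def mask(seq, cols):
    """Keep only the selected columns of an aligned sequence."""
    return "".join(seq[i] for i in cols)


# Hypothetical toy alignment: two reference sequences and one query.
reference = {"ref_a": "ACG-T-", "ref_b": "AC-GT-"}
query = "A-GGTA"

cols = informative_columns(reference, query)
masked_reference = {name: mask(seq, cols) for name, seq in reference.items()}
masked_query = mask(query, cols)
```

Because the mask is computed from the reference alignment and a single query, adding or removing other queries in the same file cannot change which columns are kept for this one.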
I should also note that taxtastic was making reference packages for FastTree amino acid trees that were incorrectly being called as using an empirical rate matrix.
This bug has been fixed and I encourage you to update to the master branch there (you can update via `pip`).
There are a number of new features in this release that will be of interest to people pushing lots of data through pplacer, including that `pplacer`, `guppy`, and `rppr` support gzip-compressed input sequence and `.jplace` files, and `deduplicate_sequences.py` supports gzip-compressed FASTA files.
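For scripts that pre- or post-process such files, transparent handling of compressed input can be sketched in Python as below. This is an illustration using only the standard library, not the code used by pplacer or `deduplicate_sequences.py`; the helper names `open_maybe_gzip` and `read_fasta` are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing when it starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    name, chunks = None, []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

Checking the magic bytes rather than the file extension means a compressed file is still read correctly if it has been renamed.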
As usual, there were also lots of bugfixes and improvements: see the CHANGELOG for details.
There are a number of new features, including
Plus lots of bugfixes and improvements: see the CHANGELOG for details.
Although we very much wanted a final version of classification code in this release, validation is still ongoing.
This will be the last release before we release the improved taxonomic classification code, which is still getting tuned.
There is some new guppy functionality that is useful for doing edge PCA and the like for big trees: `trim`, which finds the subset of the tree that actually has reads placed on it, and the `--rep-edges` flag for the splitify subcommands that just takes representatives of edge collections that are similar to one another.
`rppr voronoi` has also seen some new work, including a simple way of passing just a tree to voronoi (rather than a collection of placed reads) called `rppr vorotree`.
There is also an implementation of a variant of a classical heuristic, called Partitioning Around Medoids (PAM), to solve the ADCL problem, which will be explained in a manuscript to be submitted this month.
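To give a feel for the idea, here is a minimal Python sketch of a generic PAM-style heuristic (greedy build followed by swaps). It is not the variant implemented in rppr. It assumes the problem has been reduced to a hypothetical matrix `dist`, where `dist[i][j]` is the distance from placed mass `i` to candidate leaf `j`, and it minimizes the average distance to the closest chosen leaf. The function names `adcl` and `pam_select` are illustrative.

```python
from itertools import product


def adcl(dist, chosen):
    """Average over rows of the distance to the closest chosen column."""
    return sum(min(row[j] for j in chosen) for row in dist) / len(dist)


def pam_select(dist, k):
    """Pick k columns with a greedy build phase followed by PAM-style swaps."""
    n_candidates = len(dist[0])
    if not 1 <= k <= n_candidates:
        raise ValueError("k must be between 1 and the number of candidates")
    chosen = []
    # Build phase: repeatedly add the candidate that lowers the objective most.
    while len(chosen) < k:
        rest = [j for j in range(n_candidates) if j not in chosen]
        chosen.append(min(rest, key=lambda j: adcl(dist, chosen + [j])))
    # Swap phase: exchange a chosen column for an unchosen one while that helps.
    best = adcl(dist, chosen)
    improved = True
    while improved:
        improved = False
        for pos, j in product(range(k), range(n_candidates)):
            if j in chosen:
                continue
            trial = chosen[:pos] + [j] + chosen[pos + 1:]
            score = adcl(dist, trial)
            if score < best - 1e-12:
                chosen, best, improved = trial, score, True
    return sorted(chosen), best
```

Calling `pam_select(dist, k)` returns the chosen column indices and the objective value. Like any swap heuristic, it can stop at a local optimum rather than the global minimum.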
There are also the fairly self-explanatory `guppy unifrac` and `guppy rarefy`, and a subcommand called `guppy to_csv` that turns jplace files into CSV files.
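For readers unfamiliar with the jplace layout, the following Python sketch shows the general idea of flattening one into CSV. It is not `guppy to_csv` and does not reproduce its column layout. It assumes a jplace document with a `fields` list and `placements` entries whose `p` rows line up with `fields` and whose names are stored under `nm` (name and multiplicity pairs) or `n` (names only). The function name `jplace_to_csv` is hypothetical.

```python
import csv
import io
import json


def jplace_to_csv(jplace_text):
    """Flatten a jplace document into CSV text, one row per (name, placement) pair."""
    doc = json.loads(jplace_text)
    fields = doc["fields"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "multiplicity"] + fields)
    for pquery in doc["placements"]:
        if "nm" in pquery:
            # "nm" holds [name, multiplicity] pairs.
            names = [(name, mult) for name, mult in pquery["nm"]]
        else:
            # "n" holds bare names; treat each as multiplicity 1.
            names = [(name, 1) for name in pquery["n"]]
        for name, mult in names:
            for row in pquery["p"]:
                writer.writerow([name, mult] + list(row))
    return out.getvalue()
```

Passing the text of a `.jplace` file to `jplace_to_csv` returns a string with a header row followed by one line per placement of each named sequence.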
As usual, see the CHANGELOG for more details.