Comparisons of microbial communities are often made using measures of alpha and beta diversity.
On the beta diversity side, Weighted UniFrac has seen wide adoption as an abundance-weighted phylogenetic measure of community similarity. However, no abundance-weighted phylogenetic measure of alpha diversity has seen the same success, although several have been described.
We developed a new family of measures interpolating between traditional phylogenetic diversity (PD) and an abundance-weighted generalization, and compared them to extant alpha diversity measures. Spurred on by the recent publication of the generalized UniFrac framework (Chen et al. 2012), we decided to write up the results.
In applying these new measures to three microbiome datasets, we find something interesting: phylogenetic and taxonomic measures perform consistently better than OTU-based measures in both predicting dysbiosis and differentiating between development stages. This is despite the fact that OTU-based measures of alpha diversity are the dominant way of assessing diversity in human microbiome studies. Our new “balance-weighted phylogenetic diversity” (BWPD) measures perform quite well, especially with an intermediate exponent, similar to the result of Chen et al.
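To make the interpolation concrete, here is a minimal sketch of a balance-weighted PD under an assumed form: each edge length is multiplied by (2 min(p, 1 − p))^θ, where p is the fraction of reads on one side of that edge, and any normalization constant the paper may include is left out. Edges with all reads on one side contribute nothing, so θ = 0 gives ordinary unrooted PD, while θ = 1 gives the fully abundance-weighted version. The edge encoding, the function name `bwpd`, and the toy quartet tree are illustrative assumptions, not guppy’s implementation. Use `guppy fpd` for real analyses.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: an unrooted tree as a list of edges, each given as
# (edge length, set of leaves on one side of that edge).
Edge = Tuple[float, FrozenSet[str]]


def bwpd(edges: List[Edge], counts: Dict[str, int], theta: float) -> float:
    """Balance-weighted PD in an assumed form, without normalization.

    Each edge length is multiplied by (2 * min(p, 1 - p)) ** theta, where p is
    the fraction of reads on one side of the edge. Edges with every read on one
    side contribute nothing, so theta = 0 gives ordinary unrooted PD.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one read")
    diversity = 0.0
    for length, side in edges:
        p = sum(counts.get(leaf, 0) for leaf in side) / total
        balance = min(p, 1.0 - p)
        if balance > 0:
            diversity += length * (2.0 * balance) ** theta
    return diversity


# Quartet ((A,B),(C,D)): four unit-length pendant edges, internal edge of length 2.
QUARTET: List[Edge] = [
    (1.0, frozenset({"A"})),
    (1.0, frozenset({"B"})),
    (1.0, frozenset({"C"})),
    (1.0, frozenset({"D"})),
    (2.0, frozenset({"A", "B"})),
]

if __name__ == "__main__":
    even = {"A": 1, "B": 1, "C": 1, "D": 1}
    for theta in (0.0, 0.5, 1.0):
        print(theta, bwpd(QUARTET, even, theta))
```

With evenly spread reads the sketch gives 6.0 at θ = 0 (plain PD) and 4.0 at θ = 1. A sample dominated by one leaf scores much lower for θ > 0, which is the sense in which the exponent controls how much abundance matters.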
Overall, we suggest using abundance-weighted phylogenetic diversity measures to calculate alpha diversity. Why not give ours a spin? BWPD and other phylogenetic measures are implemented in guppy’s fpd subcommand, and the paper is up on arXiv.
Well, I’m a couple of months late on this, but now I can say with even more confidence that I am thrilled to have Brian Claywell in our group as a programmer and analyst. Brian comes to us from a lab in Missouri, where he worked on methods to process satellite images. He has already proven his ability to take on whatever we throw at him, from developing and implementing a new algorithm in OCaml to knitting together code for a Galaxy instance.
He’ll be doing microbiome-related work for the foreseeable future.
Several years ago, when testing the first version of pplacer, I built a Rube-Goldbergian system of shell scripts to test the effect of the various parameters on accuracy, and another system of scripts to collect the results. After getting it all set up, which was a pain, I still had a delicate and inflexible system depending on filename conventions. Once that paper was done, I vowed never to have to go through that annoyance again.
What I came up with was the first version of nestly. To keep things organized, it made a nested set of directories, with each level in the hierarchy corresponding to a parameter choice, and with a JSON file at each tip of the directory tree recording all of the relevant parameters. That first version was inelegant and completely opaque.
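The directory idea itself is simple enough to sketch with the standard library. This is not nestly’s API. The function name `build_nest`, the per-leaf file name `control.json`, and the example parameters are hypothetical choices made here to illustrate the layout: one directory level per parameter, one leaf per combination, and the full parameter set written as JSON at each leaf.

```python
import itertools
import json
import os
import tempfile
from typing import Dict, List, Sequence


def build_nest(root: str, params: Dict[str, Sequence]) -> List[str]:
    """Make one nested directory per parameter combination.

    Each level of the hierarchy corresponds to one parameter (in dict order), and
    each leaf directory gets a control.json file recording every parameter value.
    """
    names = list(params)
    leaves = []
    for combo in itertools.product(*(params[name] for name in names)):
        leaf = os.path.join(root, *(str(value) for value in combo))
        os.makedirs(leaf, exist_ok=True)
        with open(os.path.join(leaf, "control.json"), "w") as handle:
            json.dump(dict(zip(names, combo)), handle, indent=2)
        leaves.append(leaf)
    return leaves


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical simulation parameters: 2 x 2 = 4 leaf directories.
        for leaf in build_nest(tmp, {"model": ["JC", "GTR"], "reads": [100, 1000]}):
            print(os.path.relpath(leaf, tmp))
```

Because every leaf carries its own parameter file, downstream scripts never need to parse directory names, which is exactly the filename-convention fragility described above.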
Connor and Aaron then got a hold of it and made it a thing of beauty, with both a simple interface and very powerful SCons integration that does incremental builds. Along with a script for actually running your code in the various settings, there is also a script and an API for collating results from the different runs. It has abundant documentation and examples, as well as continuous integration on Travis.
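Collation boils down to walking the tree and joining each leaf’s parameters with its outputs. The sketch below is a stand-in written with the standard library, not nestly’s collation script or API. It assumes, hypothetically, that each finished run leaves a `control.json` of parameters and a `result.json` of scalar results in its leaf directory.

```python
import csv
import io
import json
import os
import tempfile
from typing import Dict, List


def collate(root: str, result_name: str = "result.json") -> List[Dict]:
    """Merge each leaf's control.json parameters with its results file."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        if "control.json" in filenames and result_name in filenames:
            with open(os.path.join(dirpath, "control.json")) as handle:
                row = json.load(handle)
            with open(os.path.join(dirpath, result_name)) as handle:
                row.update(json.load(handle))
            rows.append(row)
    return rows


def to_csv(rows: List[Dict]) -> str:
    """Render collated rows as CSV, one column per key seen in any row."""
    fieldnames = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake two finished runs (hypothetical parameter and result names).
        for model, lnl in [("JC", -10.5), ("GTR", -9.0)]:
            leaf = os.path.join(tmp, model)
            os.makedirs(leaf)
            with open(os.path.join(leaf, "control.json"), "w") as handle:
                json.dump({"model": model}, handle)
            with open(os.path.join(leaf, "result.json"), "w") as handle:
                json.dump({"lnl": lnl}, handle)
        print(to_csv(collate(tmp)), end="")
```

The resulting table has one row per run and one column per parameter or result, ready for plotting or a data frame.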
It is no exaggeration to say that this small package has become completely essential to our workflow, for simulations and analysis alike. The short paper describing it just came out (a PDF is here).
When comparing communities of organisms, it can be useful to characterize and compare their diversity. A way of doing this that avoids that pesky discrete species concept is to use phylogenetic diversity (PD) to describe the organisms in a sample. However, like every diversity concept I can think of, if you work harder to sample a given environment (e.g. deeper sequencing) you will end up with more apparent diversity. That doesn’t make for a fair comparison when you are looking at differences of diversity between samples. A common way to level the playing field is to rarefy samples, which means taking an equal-sized random subset of each sample. When I started playing around with this a year ago, it didn’t seem like it would be hard to come up with a closed-form expression for the expectation of PD under rarefaction in the unrooted case. Indeed, I doodled out a solution and then found that David Nipperess had done the same for the rooted case. I then worked out the variance of PD under rarefaction in the rooted and unrooted cases, and David and I wrote it up. The paper is now up on arXiv. We also have some neat examples, including the example to the right, which shows how the shape of the rarefaction curve differs for different values of the Nugent diagnostic score (dark blue: low Nugent score; light blue: high Nugent score).
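For the expectation, a short counting argument is enough. This is a sketch derived here, not code from the paper. It assumes rarefaction means drawing n of the N reads uniformly without replacement. In the rooted case an edge contributes its length whenever at least one drawn read descends from it, so E[PD] = Σ ℓ_i (1 − C(N − N_i, n) / C(N, n)), where N_i is the number of reads below edge i. In the unrooted case an edge contributes only when the draw has reads on both of its sides, which subtracts one more term, C(N_i, n) / C(N, n). The edge encoding and function names are hypothetical, and the variance is not covered.

```python
from math import comb
from typing import List, Tuple

# Hypothetical encoding: each edge is (length, reads_on_one_side). For the rooted
# case the count is the number of reads descending from the edge.
Edge = Tuple[float, int]


def _check(total: int, n: int) -> None:
    if not 1 <= n <= total:
        raise ValueError("subsample size must satisfy 1 <= n <= total")


def expected_pd_rooted(edges: List[Edge], total: int, n: int) -> float:
    """E[PD] when n of total reads are drawn without replacement (rooted PD).

    An edge is counted iff at least one drawn read descends from it.
    """
    _check(total, n)
    denom = comb(total, n)
    return sum(length * (1 - comb(total - below, n) / denom) for length, below in edges)


def expected_pd_unrooted(edges: List[Edge], total: int, n: int) -> float:
    """E[PD] under the same rarefaction, for unrooted PD.

    An edge is counted iff the draw contains reads on both of its sides, so we
    subtract the chance that every drawn read falls on one side or the other.
    """
    _check(total, n)
    denom = comb(total, n)
    return sum(
        length * (1 - comb(total - side, n) / denom - comb(side, n) / denom)
        for length, side in edges
    )


if __name__ == "__main__":
    # Unrooted quartet ((A,B),(C,D)), one read per leaf, unit pendant edges,
    # internal edge of length 2.
    quartet = [(1.0, 1)] * 4 + [(2.0, 2)]
    for n in range(1, 5):
        print(n, expected_pd_unrooted(quartet, total=4, n=n))
```

As a sanity check, drawing two reads from the quartet gives 10/3, which matches averaging the path lengths over all six leaf pairs by hand: (2 + 2 + 4 × 4) / 6.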
I just learned (from Melissa, our admin) that a grant I wrote with Aaron Darling was selected for funding under the Algorithms for Threat Detection program, which is a joint venture between the NSF, the Defense Threat Reduction Agency, and the National Geospatial-Intelligence Agency. Aaron’s one of my heroes and I’m looking forward to working with him. Yahoo!
We recently got news that FHCRC postdoc Lucie Etienne got an interdisciplinary training grant to work jointly between our group and the lab of Michael Emerman. Lucie comes from the lab of Martine Peeters in Montpellier, France, where she did some really neat work characterizing the natural history of gorilla SIV in the wild and a novel strain of chimpanzee SIV. With us she’ll be looking more at chimpanzee SIV.
We were happy this week to see our new paper come out on bacterial communities in women with bacterial vaginosis, as surveyed through 16S 454 sequencing. In contrast to some previous studies, the explicit aim of this study was to characterize communities in women with BV.
A lot of the story of the paper can be told in the figure here to the right. The two strips on the left show BV status by two different methods of evaluation, with red being positive and green being negative. The big bar plot is an abundance plot of the different species (represented by the different colors). As reported by a number of other groups, we see that BV is characterized by a highly diverse collection of taxa. This includes the BVABs (BV-associated bacteria 1, 2, and 3), three species in the order Clostridiales that were discovered by the Fredricks lab.
This paper featured the use of the pplacer suite of tools in a variety of contexts. Of particular interest to our collaborators were species-level taxonomic classifications using phylogenetic placement of reads (paper in preparation). The paper also used our edge PCA and squash clustering methods to uncover interesting structure in the data. Time for us to get these methods published in their final versions!
I don’t know the first thing about fly genetics. Me explaining something about fly genetics is like someone who has never seen the Star Wars movies all the way through explaining the trilogy. But here goes. Eukaryotic DNA is packed into chromatin, which comes in two flavors: euchromatin and heterochromatin. Heterochromatin is typically not transcriptionally active and is chock-full of repetitive sequence. However, it’s quite important for cell division. The HP1 gene family is known to maintain heterochromatin, and knocking out genes from this family can have lethal consequences. The conventional wisdom (e.g. Wikipedia) is that HP1 genes have “chromo” and “chromoshadow” domains, linked by a “hinge” region.
Well, it turns out that when Mia Levine, a postdoc in Harmit Malik’s lab, searched fly genomes to find more members of the HP1 gene family, she found something really strange. There were some HP1s that didn’t have chromo domains, and some that didn’t have chromoshadow domains. These aren’t just degraded genes: they are quite conserved at the codon level and transcriptionally active. When we helped them build trees on these genes, a picture emerged with many paralogous HP1s. Strangely enough, though, there are diagonals in the gene presence/absence table, showing a “revolving door” of HP1 paralogs (see image). It’s safe to say we don’t completely understand what’s going on, but Mia is working hard on characterizing the function and expression localization of these genes. Here’s the paper describing what we found. More fun with Connor and the Malik group!