- 22 Jul 2015 » Tanglegrams!
- 15 Jul 2015 » New paper on the shape of the phylogenetic likelihood function
- 02 Apr 2015 » First paper on the curvature of tree space
- 23 Mar 2015 » New paper on annotation of BCR sequences
- 21 Mar 2015 » Welcome Vu Dinh!
- 15 Jan 2015 » PrEP and drug resistance mutations
- 01 Sep 2014 » Funding to support work on B cells
- 22 Aug 2014 » Welcome Chris Warth!

Say we care about a function on pairs of trees (such as subtree-prune-regraft distance) that doesn’t make reference to the labels as such, but simply uses them as markers to ensure that the leaves end up in the right place. We’d like to calculate this function for all trees of a certain type. However, doing so for *every* pair of labeled trees is a waste, because if we just relabel the two trees in the same way, we will get the same result.
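This relabeling invariance is easy to check concretely for a simple tree distance. Here is a toy sketch in Python, using Robinson-Foulds distance as a stand-in for SPR distance (RF is easy to compute from bipartition sets, but the same invariance argument applies); the tree encodings and leaf names are invented for illustration:

```python
from itertools import permutations

def split(a_side, b_side):
    """Encode a bipartition as the frozenset of its two leaf-label sides."""
    return frozenset({frozenset(a_side), frozenset(b_side)})

def rf_distance(tree1, tree2):
    """Robinson-Foulds distance: bipartitions in one tree but not the other."""
    return len(tree1 ^ tree2)

def relabel(tree, mapping):
    """Apply the same leaf relabeling to every bipartition of a tree."""
    return {frozenset(frozenset(mapping[x] for x in side) for side in s)
            for s in tree}

leaves = ["a", "b", "c", "d"]
# Two unrooted 4-taxon trees, each determined by its one non-trivial split.
t1 = {split({"a", "b"}, {"c", "d"})}  # the tree ab|cd
t2 = {split({"a", "c"}, {"b", "d"})}  # the tree ac|bd

base = rf_distance(t1, t2)
# Relabeling both trees with the same permutation leaves the distance unchanged.
for perm in permutations(leaves):
    m = dict(zip(leaves, perm))
    assert rf_distance(relabel(t1, m), relabel(t2, m)) == base
print(base)  # prints 2
```

So any one labeling of the pair suffices for the computation, which is exactly the redundancy the tanglegram perspective removes.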

So, how many computations do we actually need to do?

It turns out that we only need to do one calculation per *tanglegram*. A tanglegram is a pair of trees along with a bijection between the leaves of those trees. They have been investigated in coevolutionary analyses before, and there’s a considerable literature concerning how to draw them in the plane with the minimal number of crossings. However, the symmetries and number of...
*(full post)*

Imagine we have a tree, sequence data for the leaves of that tree, and some fixed mutation rate matrix. Then we fix all of the branch lengths of that tree except for one. The likelihood function restricted to that branch gives a function from the positive real numbers to the unit interval. Question: what is the shape of that function?
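In the very simplest case this single-branch likelihood can be written down directly. A minimal sketch, assuming a Jukes-Cantor model and a two-leaf tree so that the whole likelihood depends on one branch length t (the function and variable names here are mine, not from the paper):

```python
import math

def jc_likelihood(t, n_match, n_mismatch):
    """Likelihood of an alignment with the given numbers of matching and
    mismatching sites, as a function of the branch length t, under JC."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)  # JC prob. of same state
    p_diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)  # each of 3 other states
    # Each site contributes stationary probability 1/4 times a transition prob.
    return (0.25 * p_same) ** n_match * (0.25 * p_diff) ** n_mismatch

# For JC this curve is unimodal, peaking at the classical JC distance
# estimate t_hat = -(3/4) log(1 - 4p/3), where p is the mismatch fraction.
n_match, n_mismatch = 90, 10
p = n_mismatch / (n_match + n_mismatch)
t_hat = -0.75 * math.log(1 - 4 * p / 3)  # about 0.107 here
# Numerical check that t_hat beats nearby branch lengths.
assert jc_likelihood(t_hat, n_match, n_mismatch) >= max(
    jc_likelihood(t_hat + d, n_match, n_mismatch) for d in (-0.01, 0.01)
)
```

The interesting question is what happens beyond this easy case, where such a closed form is no longer available.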

I asked Vu this question when he arrived. As described in our new paper on arXiv, the answer is rather interesting, and more complex than I would have thought. Vu did a fantastic job with this project, taking (surprisingly to me) an algebraic approach, defining the *characteristic polynomial* of a likelihood function, defining an algebraic structure on *conditional frequency patterns*, then using a result about path-connected subgroups.

To summarize, if the model is quite simple (JC, F81), then the likelihood has a single maximum. However, more complex models such as K2P can take on arbitrarily...
*(full post)*

Imagine a graph with vertices representing the trees of a given number of taxa, and edges connecting trees that can be transformed into one another by a single move (see below for the “rSPR” example). All popular likelihood-based tree inference algorithms perform some traversal of this graph: Bayesian algorithms, for example, perform Markov chain Monte Carlo (MCMC) on it. In our recent Sys Bio paper, Chris Whidden and I demonstrated graph effects on phylogenetic MCMC: the graph structure combined with the likelihood function led to bottlenecks in tree space where it was difficult to move from one peak of good trees to another.
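The bottleneck phenomenon can be mimicked on a toy graph: a Metropolis walk over vertices, each with its own “likelihood”, where one very low-scoring vertex separates two good regions. This is an invented five-vertex example, not an actual tree space:

```python
import random

# Invented path graph: vertex 2 has tiny "likelihood", mimicking a
# bottleneck between two peaks of good trees.
likelihood = {0: 1.0, 1: 1.0, 2: 1e-9, 3: 1.0, 4: 1.0}
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def metropolis_walk(start, n_steps, rng):
    """Random walk with Metropolis acceptance; returns per-vertex visit counts.
    (The Hastings correction for unequal degrees is ignored for simplicity.)"""
    state, visits = start, {v: 0 for v in likelihood}
    for _ in range(n_steps):
        proposal = rng.choice(edges[state])
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < min(1.0, likelihood[proposal] / likelihood[state]):
            state = proposal
        visits[state] += 1
    return visits

visits = metropolis_walk(0, 50_000, random.Random(0))
# The chain essentially never crosses the low-likelihood vertex 2, so it
# stays on the side where it started.
print(visits)
```

Real phylogenetic MCMC is of course proposing moves on trees rather than on a five-vertex path, but the same mechanism traps chains on one side of a bottleneck.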

These results motivated us to learn more about these graph structures. Consider the best-studied of these graphs, the rSPR graph, in which vertices represent *rooted* trees, and edges are rooted subtree-prune-regraft operations, in which a rooted subtree is cut off of a tree and then reattached somewhere else in the tree with the same...
*(full post)*

The antigen binding properties of antibodies are determined by the sequences of their corresponding B cell receptors (BCRs). These BCR sequences are created in “draft” form by VDJ recombination, which randomly selects and trims the ends of V, D, and J genes, then joins them together with additional random nucleotides. If they pass initial screening and bind an antigen, these sequences then undergo an evolutionary process of mutation and selection, “revising” the BCR to improve binding to its cognate antigen.

Our first paper on BCRs concerned natural selection as part of the “revision” process, and when Duncan joined the group we got to work on the “drafting” part. Specifically, the first step was to work on the *annotation problem*: given a BCR sequence, which nucleotides came from which genes or non-templated insertions? We recently posted a paper on arXiv describing our approach. Like previous work, we use a hidden Markov model (HMM) for this...
*(full post)*
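As a generic illustration of the HMM idea (this is a toy Viterbi decoder with invented states, transitions, and emission probabilities, not the model from our paper), one can tag each position of a read as coming from a V gene, a non-templated insertion (N), or a J gene:

```python
import math

# Toy model: hidden states V, N, J; allowed transitions force the order
# V... N... J..., mirroring the layout of a recombined receptor sequence.
STATES = ["V", "N", "J"]
LOG_TRANS = {
    "V": {"V": math.log(0.9), "N": math.log(0.1)},
    "N": {"N": math.log(0.7), "J": math.log(0.3)},
    "J": {"J": 0.0},
}

def log_emit(state, base):
    # Invented emissions: V tends to match reference base 'A', J matches 'G',
    # and non-templated insertions are uniform over the four bases.
    if state == "V":
        return math.log(0.85 if base == "A" else 0.05)
    if state == "J":
        return math.log(0.85 if base == "G" else 0.05)
    return math.log(0.25)

def viterbi(seq):
    """Most probable state path for seq; paths must start in V."""
    scores = {"V": log_emit("V", seq[0]), "N": float("-inf"), "J": float("-inf")}
    backptrs = []
    for base in seq[1:]:
        new_scores, ptrs = {}, {}
        for s in STATES:
            # Best predecessor among states allowed to transition into s.
            prev, best = None, float("-inf")
            for p in STATES:
                if s in LOG_TRANS[p] and scores[p] + LOG_TRANS[p][s] > best:
                    prev, best = p, scores[p] + LOG_TRANS[p][s]
            new_scores[s] = best + log_emit(s, base)
            ptrs[s] = prev
        scores = new_scores
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("AAAACTGGGG"))  # annotates as VVVVNNJJJJ
```

The real annotation problem involves many V, D, and J alleles, trimming, and per-position mutation rates, but the decoding machinery is this same dynamic program.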

Vu Dinh joined as a postdoc in our group at the new year. Vu came to us from Purdue, where he got his PhD working in statistics and computational biology. He is especially interested in machine learning theory.

Vu has already been highly productive during his short time here. He has proven some theoretical results describing the shape of the phylogenetic likelihood function, and used these to prove convergence of our likelihood function approximation scheme. He has also made significant progress on effective sample size guarantees for online phylogenetic SMC. Stay tuned for upcoming arXiv submissions, and for other nice work from Vu!

There has been a lot of interest and excitement about the application of pre-exposure prophylaxis (PrEP) for HIV, which means giving uninfected people HIV drugs in order to *keep* them from being infected with HIV. This works very well when people adhere to the regimen, and in fact was Science’s 2011 breakthrough of the year after the “Partners in PrEP” study. FTC/TDF, sold under the trade name Truvada, is the first drug approved by the FDA for PrEP. However, HIV is notorious for becoming drug resistant, and so large-scale deployment of HIV drugs for uninfected people leads to the obvious question: won’t that lead to increased drug resistance? In HIV treatment, for example, patients traditionally are not administered drugs during the “acute” phase of infection because drug resistance mutations are more likely to arise when replication rates are high. If you give people the drug ahead of time, won’t that be even worse?

A while back Connor...
*(full post)*

As described in a preprint announcement post, we are now hard at work on analyzing the evolutionary process of B cell molecular evolution that is continuously happening within each of us. I’m happy (relieved!) to report that we were recently awarded a grant through the DMS/NIGMS Interface program to pursue this work, joint with brilliant co-investigators Trevor Bedford, Vladimir Minin, and Harlan Robins. Now, to work!

We got exceedingly lucky recently. Just as Connor and Chris Small were getting ready to leave, a programmer named Chris Warth was moving on from Marty McIntosh’s group and came knocking on our door. After some initial work together, it became clear that Chris was going to be a great fit. Chris has had a long history in computing, including as part of the group of five programmers that developed the Java programming language at Sun while it was still called Oak. After moving around the tech industry for a while, he spent a stint as a farmer and then decided to take up biology.

Chris is a real pleasure to have in the group. He is, of course, an excellent programmer, but also both knowledgeable and very curious about biology. So far he has taken over the work that Connor was doing with the Overbaugh lab, and has also been working on likelihood curve fitting and ancestral sequence reconstruction.

...all posts