- 01 Sep 2015 » Postdoctoral position to study molecular evolution and phylogenetics of immune cells
- 27 Aug 2015 » High school students 2015
- 22 Jul 2015 » Tanglegrams!
- 15 Jul 2015 » New paper on the shape of the phylogenetic likelihood function
- 02 Apr 2015 » First paper on the curvature of tree space
- 23 Mar 2015 » New paper on annotation of BCR sequences
- 21 Mar 2015 » Welcome Vu Dinh!
- 15 Jan 2015 » PrEP and drug resistance mutations
The adaptive immune system continually evolves within each individual in order to neutralize and destroy pathogens. The receptor sequences of antibody-making B cells undergo a Darwinian process of mutation and selection which improves their binding to antigen (Cobey et al, 2015). It is now possible to sequence these B cell receptors (BCRs) in high throughput, giving a profound new perspective on how the immune system responds to infection (Georgiou et al, 2014). Although B cell affinity maturation is built from the same elements as molecular evolution in other settings, namely recombination, point mutation, and selection, it differs in a number of important ways. These differences, along with the sheer volume of sequence data available, bring new challenges for phylogenetics and molecular evolution.
The translational medical consequences of improved methods are significant. Improved methods would especially help in understanding the development of broadly neutralizing antibodies against HIV. The current best hope for an... (full post)
This summer we had two high school students, Andrew and Kate. They were quite sharp, so we threw them in the deep end with some real science projects. Andrew taught himself shell programming and Docker, and learned enough B cell analysis to work on making Bioboxes to be used for validation of B cell sequence analysis software. Kate taught herself shell programming and Python, and picked up some undergraduate abstract algebra to investigate the SPR graph by characterizing its distribution of pairwise distances. They were both very independent, and a pleasure to have in the office!
Say we care about a function on pairs of trees (such as subtree-prune-regraft distance) that doesn’t make reference to the labels as such, but simply uses them as markers to ensure that the leaves end up in the right place. We’d like to calculate this function for every pair of trees of a given size. However, doing so for every pair of labeled trees is wasteful, because if we just relabel the two trees in the same way, we will get the same result.
So, how many computations do we actually need to do?
It turns out that we only need to do one calculation per tanglegram. A tanglegram is a pair of trees along with a bijection between the leaves of those trees. They have been investigated in coevolutionary analyses before, and there’s a considerable literature concerning how to draw them in the plane with the minimal number of crossings. However, the symmetries and number of... (full post)
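The relabeling invariance can be seen concretely in a toy computation. The sketch below is illustrative only (it is not the paper's method): it uses a simple label-invariant score, the number of clades shared by two small rooted trees, and checks that relabeling both trees by the same permutation never changes the score, which is why one computation per tanglegram suffices.

```python
from itertools import permutations

# Rooted trees as nested tuples; leaves are integer labels.
def relabel(tree, perm):
    if isinstance(tree, int):
        return perm[tree]
    return tuple(relabel(child, perm) for child in tree)

def leaves(tree):
    if isinstance(tree, int):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    # The set of leaf-sets (clades) appearing in a rooted tree.
    if isinstance(tree, int):
        return {frozenset([tree])}
    out = {leaves(tree)}
    for child in tree:
        out |= clades(child)
    return out

def shared_clades(t1, t2):
    # A label-invariant score on a pair of trees.
    return len(clades(t1) & clades(t2))

t1 = ((0, 1), (2, 3))
t2 = ((0, 2), (1, 3))
base = shared_clades(t1, t2)
# Relabeling both trees by the same permutation preserves the score.
invariant = all(
    shared_clades(relabel(t1, p), relabel(t2, p)) == base
    for p in permutations(range(4)))
```

Any function that treats labels only as markers matching leaves across the pair behaves this way, so the answer depends only on the tanglegram, not on the particular labeling.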
Imagine we have a tree, sequence data for the leaves of that tree, and some fixed mutation rate matrix. Then we fix all of the branch lengths of that tree except for one. The likelihood function restricted to that branch gives a function from the positive real numbers to the unit interval. Question: what is the shape of that function?
I asked Vu this question when he arrived. As described in our new paper on arXiv, the answer is rather interesting, and more complex than I would have thought. Vu did a fantastic job with this project, taking (surprisingly to me) an algebraic approach, defining the characteristic polynomial of a likelihood function, defining an algebraic structure on conditional frequency patterns, then using a result about path-connected subgroups.
To summarize, if the model is quite simple (JC, F81), then the likelihood has a single maximum. However, more complex models such as K2P can take on arbitrarily... (full post)
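The single-maximum case for simple models can be checked numerically. Here is a minimal sketch for JC69 with a toy two-sequence alignment (the site counts are made up): the per-branch likelihood is built from the JC69 match/mismatch probabilities, and a grid search recovers the same single peak as the model's closed-form branch-length estimate.

```python
import math

def jc_likelihood(t, matches, mismatches):
    # JC69 probabilities of observing the same / a different base
    # across a branch of length t (expected substitutions per site).
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    p_diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return (p_same ** matches) * (p_diff ** mismatches)

# Toy two-sequence alignment: 90 identical sites, 10 differing sites.
matches, mismatches = 90, 10
p_hat = mismatches / (matches + mismatches)
# Closed-form JC69 branch-length MLE: -(3/4) * ln(1 - 4p/3).
t_mle = -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)

# A grid search over branch lengths finds the same single peak.
grid = [i / 1000.0 for i in range(1, 3000)]
t_best = max(grid, key=lambda t: jc_likelihood(t, matches, mismatches))
```

For models beyond this simple setting the paper shows the picture is more complicated, which is exactly why the question needed a careful answer.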
Imagine a graph with vertices representing the trees of a given number of taxa, and edges connecting trees that can be transformed into one another by a single rearrangement (see below for the “rSPR” example). All popular likelihood-based tree inference algorithms perform some traversal of this graph: Bayesian algorithms, for example, perform Markov chain Monte Carlo (MCMC) on it. In our recent Sys Bio paper, Chris Whidden and I demonstrated graph effects on phylogenetic MCMC: the graph structure combined with the likelihood function led to bottlenecks in tree space where it was difficult to move from one peak of good trees to another.
These results motivated us to learn more about these graph structures. Consider the best-studied of these graphs, the rSPR graph, in which vertices represent rooted trees, and edges are rooted subtree-prune-regraft operations, in which a rooted subtree is cut off of the tree and then reattached somewhere else in the tree with the same... (full post)
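The bottleneck phenomenon can be sketched generically, without real tree space. The toy below (entirely illustrative; not the paper's analysis) runs random-walk Metropolis on a small graph whose vertices have scores standing in for likelihoods: two high-scoring "peaks" are separated by low-scoring "valley" vertices, and the chain concentrates on the peaks while rarely crossing between them.

```python
import random

def metropolis_walk(neighbors, score, start, steps, seed=1):
    # Random-walk Metropolis on a graph: propose a uniform random
    # neighbor, accept with probability min(1, score(new)/score(cur)).
    # All vertices in the toy graph below have equal degree, so the
    # proposal is symmetric and no Hastings correction is needed.
    rng = random.Random(seed)
    state, visits = start, {v: 0 for v in neighbors}
    for _ in range(steps):
        proposal = rng.choice(neighbors[state])
        if rng.random() < min(1.0, score[proposal] / score[state]):
            state = proposal
        visits[state] += 1
    return visits

# A 4-cycle standing in for tree space: two high-scoring peaks
# (vertices 0 and 2) separated by low-scoring bottlenecks (1 and 3).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
score = {0: 10.0, 1: 0.1, 2: 10.0, 3: 0.1}
visits = metropolis_walk(neighbors, score, start=0, steps=20000)
```

Moving between the peaks requires passing through a valley vertex, so mixing between peaks is slow even though each peak is sampled accurately, which is the graph effect in miniature.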
The antigen binding properties of antibodies are determined by the sequences of their corresponding B cell receptors (BCRs). These BCR sequences are created in “draft” form by VDJ recombination, which randomly selects and trims the ends of V, D, and J genes, then joins them together with additional random nucleotides. If they pass initial screening and bind an antigen, these sequences then undergo an evolutionary process of mutation and selection, “revising” the BCR to improve binding to its cognate antigen.
Our first paper on BCRs concerned natural selection as part of the “revision” process, and when Duncan joined the group we got to work on the “drafting” part. Specifically, the first step was to work on the annotation problem: given a BCR sequence, which nucleotides came from which genes or non-templated insertions? We recently posted a paper on arXiv describing our approach. Like previous work, we use a hidden Markov model (HMM) for this... (full post)
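To give a flavor of how an HMM can assign nucleotides to segments, here is a toy Viterbi decoder with a three-state model (V gene, non-templated insertion N, J gene). The states, transitions, and emission probabilities are all made up for illustration; the actual model in the paper is far richer (per-gene, per-position, with trimming).

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Most probable hidden-state path for an observed sequence,
    # computed by dynamic programming in log space.
    def lp(x):
        return math.log(x) if x > 0 else float("-inf")
    dp = [{s: lp(start_p.get(s, 0)) + lp(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        dp.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((p, dp[i - 1][p] + lp(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1])
            dp[i][s] = best + lp(emit_p[s][obs[i]])
            back[i][s] = prev
    state = max(dp[-1], key=dp[-1].get)
    path = [state]
    for i in range(len(obs) - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]

# Toy annotation HMM: V runs into an insertion region, which runs
# into J. All probabilities are illustrative, not fitted values.
states = ["V", "N", "J"]
start_p = {"V": 0.98, "N": 0.01, "J": 0.01}
trans_p = {"V": {"V": 0.9, "N": 0.1},
           "N": {"N": 0.5, "J": 0.5},
           "J": {"J": 1.0}}
emit_p = {"V": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
          "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
          "J": {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}}
annotation = viterbi("AAAATCGGGG", states, start_p, trans_p, emit_p)
```

In this toy, the A-rich prefix is assigned to V, the ambiguous middle bases to the insertion state, and the G-rich suffix to J, which is the shape of the annotation problem: partition the read into germline-templated and non-templated stretches.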
Vu Dinh joined as a postdoc in our group at the new year. Vu came to us from Purdue, where he got his PhD working in statistics and computational biology. He is especially interested in machine learning theory.
Vu has already been highly productive during his short time here. He has proven some theoretical results describing the shape of the phylogenetic likelihood function, and used these to prove convergence of our likelihood function approximation scheme. He has also made significant progress on effective sample size guarantees for online phylogenetic SMC. Stay tuned for upcoming arXiv submissions, and for other nice work from Vu!
There has been a lot of interest and excitement about the application of pre-exposure prophylaxis (PrEP) for HIV, which means giving uninfected people HIV drugs in order to keep them from becoming infected with HIV. This works very well when people adhere to the regimen, and in fact was Science’s 2011 breakthrough of the year after the “Partners in PrEP” study. FTC/TDF, sold under the trade name Truvada, is the first drug approved by the FDA for PrEP. However, HIV is notorious for becoming drug resistant, and so large-scale deployment of HIV drugs for uninfected people leads to the obvious question: won’t that lead to increased drug resistance? In HIV treatment, for example, traditionally patients are not administered drugs during the “acute” phase of infection because drug resistance mutations are more likely to arise when replication rates are high. If you give people the drug ahead of time, won’t that be even worse?
A while back Connor... (full post)