### 26 Jan 2020 » Open postdoc position to work on adaptive immune repertoires

We are at a very exciting confluence right now: there is lots of data on adaptive immune receptors (antibodies and T cell receptors) and also new methods with which to learn from these data. Our group has a relatively open-ended postdoc position available, in which you can work on our ideas, or bring your own ideas about how we can learn from these data using novel statistical and computational approaches. We have a great team of computational collaborators including Phil Bradley and statistician Noah Simon, as well as access to novel data sets through our network of collaborators including the Overbaugh, Bloom, and Victora labs.

Apply here or just get in touch. Don’t be put off by any jargon you might find there. We’d like to hear from people who can formulate interesting biological questions, and we can help you learn the computational skills to answer them. On the other hand, if you come from the computational side, we can get you...
*(full post)*

### 23 Jan 2020 » A Bayesian phylogenetic hidden Markov model for B cell receptor sequences

#### Summary

- antibodies develop within you via an evolutionary process
- understanding these evolutionary patterns is important for understanding how we respond to infection and vaccination
- we have found, using Bayesian methods, that evolutionary inferences are uncertain in this regime
- our most recent work develops a “Bayesian phylogenetic hidden Markov model,” which takes into account uncertainty in both the V(D)J recombination process and the evolutionary process
- this work reveals substantial amino-acid uncertainty in the inference of the unmutated common ancestor of VRC01, an important and heavily-studied anti-HIV antibody
- our results are described in a preprint which is now being revised for *PLOS Computational Biology*

#### A brief description of antibody affinity maturation

In order to defend against a very large and ever-mutating pool of pathogens, your body randomly generates, and then optimizes, a large collection of antibodies. These antibodies are displayed as so-called *B cell receptors* on the surface of specialized B cells. The random generation is a process called V(D)J...
*(full post)*

### 24 Aug 2019 » Variational Bayesian phylogenetic inference

In late 2017 we were stuck without a clear way forward for our research on Bayesian phylogenetic inference methods.

We knew that we should be using gradient (i.e. multidimensional derivative) information to aid in finding the posterior, but couldn’t think of a way to find the *right* gradient. Indeed, we had recently finished our work on a variant of Hamiltonian Monte Carlo (HMC) that used the branch length gradient to guide exploration, along with a probabilistic means of hopping from one tree structure to another when a branch became zero. Although this project was a lot of fun and was an ICML paper, it wasn’t the big advance that we needed: these continuous branch length gradients weren’t contributing enough to the fundamental challenge of keeping the sampler in the good region of phylogenetic tree structures. But it was hard to even imagine a good solution to the central question: *how can we take gradients in the discrete space of phylogenetic trees?*

Meanwhile,...
*(full post)*

### 18 Jun 2019 » Bayesian phylogenetic inference without sampling trees

Most every description of Bayesian phylogenetics I’ve read proceeds as follows:

- “Bayesian phylogenetic analyses are conducted using a simulation technique known as Markov chain Monte Carlo (MCMC).” (Alfaro & Holder, 2006)
- “Posterior probabilities are obtained by exploring tree space using a sampling technique, called Markov chain Monte Carlo (MCMC).” (Lemey et al, *The Phylogenetic Handbook*)
- “Once the biologist has decided on the data, model and prior, the next step is to obtain a sample from the posterior. This is done by using MCMC…” (Nascimento et al, 2017.)

With statements like these in popular (and otherwise excellent!) reviews, it’s not surprising that people confuse Bayesian phylogenetics and Markov chain Monte Carlo (MCMC). Well, let’s be clear.

*MCMC is one way to approximate a Bayesian phylogenetic posterior distribution. It is not the only way.*

In this post I’ll describe two of our recent papers that together give a systematic, rather than random, means of approximating a phylogenetic posterior distribution.

Without a...
*(full post)*

### 11 Dec 2018 » Open postdoc position to work on variational Bayesian phylogenetic inference

We are obsessed with finding efficient alternatives to random-walk MCMC for Bayesian phylogenetic inference. We have developed online sequential Monte Carlo theory and algorithms, phylogenetic Hamiltonian Monte Carlo, and inference via direct topology search and efficient marginal likelihood computation.

Come work with us on a strategy that is producing very promising results: variational Bayesian phylogenetic inference based on subsplit Bayesian networks. There are lots of opportunities for projects to flesh out this direction. We would like to find someone who can collaborate with us on methods development and implementation, so both knowledge of Bayesian statistics and programming expertise are needed. Experience with an existing code base for phylogenetics would be a big plus.

We’re stoked but are happy to wait for the right person to fill the position. If you aren’t ready until this summer, no problem!

Apply here or just get in touch.

### 05 Dec 2018 » Generalizing tree probability estimation via Bayesian networks

Posterior probability estimation of phylogenetic tree topologies from an MCMC sample is currently a pretty simple affair. You run your sampler, you get out some tree topologies, you count them up, normalize to get a probability, and done. It doesn’t seem like there’s a lot of room for improvement, right?

Wrong.

Let’s step back a little and think like statisticians. The posterior probability of a tree topology is an unknown quantity. By running an MCMC sampler, we get a histogram, the normalized version of which will converge to the true posterior in the limit of a large number of samples. We can use that simple histogram estimate, but nothing is stopping us from taking other estimators of the per-topology posterior distribution that may have nicer properties.
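To make the "simple histogram estimate" concrete, here is a minimal Python sketch of it. The function name `topology_frequencies` and the toy Newick strings are illustrative assumptions, not code from our software, and the sketch assumes each sampled topology has already been written in a canonical form so that identical topologies compare equal as strings.

```python
from collections import Counter


def topology_frequencies(sampled_topologies):
    """Histogram estimator: normalized counts of each sampled topology."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    return {topology: count / total for topology, count in counts.items()}


# Hypothetical MCMC output: ten sampled topologies on four taxa,
# each written as a canonical Newick string (toy data, not real results).
samples = (
    ["((A,B),C,D);"] * 6
    + ["((A,C),B,D);"] * 3
    + ["((A,D),B,C);"] * 1
)

estimate = topology_frequencies(samples)
for topology, probability in sorted(estimate.items(), key=lambda kv: -kv[1]):
    print(topology, probability)
```

Note that any topology the sampler never visited is simply absent from the result, which amounts to an estimated probability of zero. That is one reason to ask whether other estimators might behave better.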

For real-valued samples we might use kernel density estimates to smooth noisy sampled distributions, which may reduce error when sampling is sparse. Because the number of phylogenies is huge, MCMC is computationally expensive, and we are naturally...
*(full post)*

### 15 May 2018 » Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity

High-throughput sequencing of our adaptive immune repertoires holds great promise for understanding immune state. These sequences implicitly contain a wealth of information on past and present exposures to infectious and autoimmune diseases, to environmental stimuli, and even to tumor-derived antigens. In principle, we should be able to use these sequences of rearranged receptors to infer their eliciting antigens, either individually or collectively.

We’re starting to see neat progress in these areas for T cell receptors (TCRs). Some recent studies compare TCR repertoires between individuals who do or do not have some immune state, such as an immunization, an autoimmune disease, or a viral infection, and work to find sequence-level differences between the repertoires. The Walczak-Mora team recently upped the bar by not requiring a control cohort. There has also been interesting progress on predicting epitope specificity from TCR sequence using structurally-informed sequence analysis.

Phil Bradley, just down the hall from us, wanted to take a different approach,...
*(full post)*

### 12 May 2018 » The Bayesian optimist's guide to adaptive immune receptor repertoire analysis

Immune receptor sequencing is stochastic through and through. We have cells with random V(D)J rearrangements that are stimulated through some random process of exposures, which lead to some random amount of expansion, and in the B cell case there is some random process of mutation and selection. So why don’t we use methods incorporating that uncertainty into our analysis?

We’ve tried to do this in our work, and have made some progress, but there is so much left to be done. When Sarah Cobey and Patrick Wilson kindly invited me to contribute to their special issue of *Immunological Reviews*, I knew I wanted to step back and ask:

*If computation were no barrier, how would we design an analysis framework that integrated out uncertainty in unknown quantities and took advantage of the hierarchical structure inherent in immune receptor data?*

I teamed up with Branden Olson, a Statistics PhD student in the lab, and went to work. It was a fun exercise to think...
*(full post)*
