15 May 2018 » Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity

High-throughput sequencing of our adaptive immune repertoires holds great promise for understanding immune state. These sequences implicitly contain a wealth of information on past and present exposures to infectious and autoimmune diseases, to environmental stimuli, and even to tumor-derived antigens. In principle, we should be able to use these sequences of rearranged receptors to infer their eliciting antigens, either individually or collectively.

We’re starting to see neat progress in these areas for T cell receptors (TCRs). Some recent studies compare TCR repertoire between individuals who do or do not have some immune state, such as an immunization, an autoimmune disease or a viral infection and work to find sequence-level differences between the repertoires. The Walczak-Mora team recently upped the bar by not requiring a control cohort. There has also been interesting progress on predicting epitope specificity from TCR sequence using structurally-informed sequence analysis.

Phil Bradley, just down the hall from us, wanted to take a different approach,... (full post)

12 May 2018 » The Bayesian optimist's guide to adaptive immune receptor repertoire analysis

Immune receptor sequencing is stochastic through and through. We have cells with random V(D)J rearrangements that are stimulated through some random process of exposures, which lead to some random amount of expansion, and in the B cell case there is some random process of mutation and selection. So why don’t we use methods incorporating that uncertainty into our analysis?

We’ve tried to do this in our work, and have made some progress, but there is so much left to be done. When Sarah Cobey and Patrick Wilson kindly invited me to contribute to their special issue of Immunological Reviews, I knew I wanted to step back and ask:

If computation was no barrier, how would we design an analysis framework that integrated out uncertainty in unknown quantities and took advantage of the hierarchical structure inherent in immune receptor data?

I teamed up with Branden Olson, a Statistics PhD student in the lab, and went to work. It was a fun exercise to think... (full post)

02 May 2018 » Benchmarking tree and ancestral sequence inference for B cell receptor sequences

Phylogenetic tools, in particular for ancestral sequence reconstruction, get used a lot in the B cell receptor (BCR) sequence analysis world. For example, they get used to reconstruct intermediate antibodies that then get synthesized in the lab and tested for binding (Wu et. al, 2011). But how well do phylogenetic tools work in this parameter regime? Although there have been countless benchmarking studies for phylogenetics, the case of B cell sequence evolution is different than the usual setting for phylogenetics:

19 Apr 2018 » Predicting B cell receptor substitution profiles using public repertoire data

Can we predict how sites of an antibody will tolerate amino acid substitutions? Kristian Davidsen posed this question shortly after he arrived in my group, pointing out that being able to do such prediction would be quite useful. For example, engineered antibodies sometimes aggregate into clumps or have other properties that that make them useless for mass production. If we could figure out ways to change the amino acid sequence of an antibody without changing binding properties, that could help us avoid aggregation and make a more useful antibody.

How to start to address this complex and high-dimensional question? Although people have started to do deep mutational scanning on antibodies this type of data is hard to come by. On the other hand, B cell repertoire (i.e. antibody-coding) sequence data is becoming plentiful. B cells undergo affinity maturation to improve binding in collections of sequences called “clonal families” grouped by naive ancestor sequence (more background here). Although it’s not quite the... (full post)

10 Jan 2018 » Postdoc opening to learn about antibody development during HIV superinfection

Please see https://b-t.cr/t/506 for details.

01 Dec 2017 » Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data

Every B cell receptor sequence in a repertoire came from a V(D)J recombination of germline genes. Each individual has only certain alleles of these genes in their germline, and knowing this set improves the accuracy of all aspects of BCR sequence analysis, from alignment to phylogenetic ancestral sequence reconstruction. This germline allele set can be estimated directly from BCR sequence data, and it’s time to treat such estimation as part of standard BCR sequence analysis pipelines.

This central message is not new, but it’s worth emphasizing because doing germline set inference is not part of most current studies of B cell receptor (BCR) sequences.

Indeed, the most common way to annotate sequences is to align them one by one to the full set of alleles present in the IMGT database, which has hundreds of alleles. Each individual has only a fraction of these alleles in their genome.

Unsurprisingly, aligning sequences one by one to the whole IMGT set can cause problems. Imagine that... (full post)

14 Nov 2017 » Survival analysis of DNA mutation motifs with penalized proportional hazards

We are equipped with purpose-built molecular machinery to mutate our genome so that we can become immune to pathogens. This is truly a thing of wonder.

More specifically, I’m talking about mutations in B cells, the cells that make antibodies. Once a randomly-generated antibody expressed on the outside of the B cell finds something it’s good at binding, the cell boosts the mutation rate of its antibody-coding region by about one million fold. Those that have better binding are rewarded by stimulation to divide further. The result of this Darwinian mutation and selection process is antibodies with improved binding properties.

The mutation process is wonderfully complex and interesting. Being statisticians, we payed our highest tribute that we can to a process we think is beautiful: we developed a statistical model of it. This work was led by the dynamic duo of Jean Feng and David Shaw, while Vladimir Minin, Noah Simon and I kibitzed. Our model is known in... (full post)

05 Sep 2017 » Using genotype abundance to improve phylogenetic inference

When doing computational biology, listen to biologists. I have found them to have remarkable intuition; this can be a gold mine of opportunity for us computational types.

In this particular case, the starting point was the stunningly beautiful work of Gabriel Victora’s lab visualizing germinal center dynamics in living mice. For those not yet initiated into the beauty of B cell repertoire, germinal centers are crucibles of evolution, in which B cells compete in an antigen-binding contest such that the best binder reproduces more. As part of the Victora lab work, they did single-cell extraction and sequencing, which enabled them to quantify the frequency of each B cell genotype without PCR bias or other artifacts. Such single-cell sequencing, and consequent abundance information, is now becoming commonplace. How should we use this abundance information in phylogenetics?

Well, the Victora lab knew, even if their algorithm implementation is not one we would have considered. Indeed, they were building trees by hand, using several criteria... (full post)

Complete list of all posts