Adding paired BCR data to OAS | Oxford Protein Informatics Group

Hello,

This is my final blog post before I go into thesis-writing mode. I would like to use the opportunity to present our recent update to the Observed Antibody Space (OAS) resource, which now includes paired antibody data (http://opig.stats.ox.ac.uk/webapps/oas).

In my DPhil, I focused primarily on the structural interrogation of antibody repertoires. Almost all publicly available antibody repertoires lack information about the original pairing of antibody variable heavy (VH) and light (VL) chains. Since both VH and VL play a pivotal role in shaping the antigen binding site (known as the paratope), sequencing unpaired VH/VL repertoires hinders our ability to predict the three-dimensional configurations of the original paratopes.

With the advent of 10xGenomics, it has become possible to sequence natively paired full-length VH and VL sequences. Briefly, the 10xGenomics pipeline labels the VH and VL cDNA from the same B-cell with an identical, unique cell barcode. The pipeline is optimised to deliver a single 10xGenomics barcode bead to an individual B-cell. Next, the 10xGenomics barcodes are attached to the VH and VL cDNA, followed by Illumina HiSeq sequencing (Figure 1).

Figure 1. Summary of 10xGenomics VH/VL cDNA labelling

Here, I am very happy to share with you our paired version of OAS. So far, only a few studies have made their 10xGenomics antibody repertoires publicly available. We download, assemble, clean and annotate the raw 10xGenomics reads (Figure 2). Briefly, we download the FASTQ files from the NCBI website. These raw reads are assembled into contigs using CellRanger (10xGenomics software), and we keep only the highest-quality contigs (filtered contigs). Next, we use IgBlastn to generate MiAIRR-compliant annotation of the V(D)J gene recombination events. At OPIG, when we structurally interrogate an antibody repertoire, we need to make sure that all filtered sequences are structurally viable. To achieve this, we employ ANARCI to number the amino acid sequences according to the IMGT numbering scheme. Any positions that do not fit the scheme are flagged. Finally, we combine the CellRanger, IgBlastn and ANARCI annotations with the study metadata to form an OAS data unit.

Figure 2. Processing raw 10xGenomics reads

Ideally, each 10xGenomics barcode is shared between exactly one unique VH contig and one unique VL contig. In many cases, however, this does not hold, as the same 10xGenomics barcode can be shared by more than two unique V(D)J gene recombination events. To account for this, we provide “paired” data, where we keep only sequences whose 10xGenomics barcode is shared between exactly one unique VH and one unique VL chain sequence. We also provide all “unpaired” sequences that passed CellRanger and IgBlastn annotation. Both sets of files can be accessed in the download tab.
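
To make the filtering rule concrete, here is a minimal Python sketch of the idea. It is not the actual OAS code: the function name split_paired and the record keys "barcode", "chain" and "sequence" are hypothetical, and the sequences are toy strings. A barcode counts as paired only if it carries exactly one unique VH and exactly one unique VL sequence.

```python
from collections import defaultdict


def split_paired(contigs):
    """Return {barcode: (vh_sequence, vl_sequence)} for cleanly paired barcodes.

    `contigs` is an iterable of dicts with the hypothetical keys
    'barcode', 'chain' ('H' or 'L') and 'sequence'.
    """
    # Collect the unique heavy and light sequences seen for each barcode.
    by_barcode = defaultdict(lambda: {"H": set(), "L": set()})
    for contig in contigs:
        by_barcode[contig["barcode"]][contig["chain"]].add(contig["sequence"])

    # Keep a barcode only if it has exactly one unique VH and one unique VL.
    paired = {}
    for barcode, chains in by_barcode.items():
        if len(chains["H"]) == 1 and len(chains["L"]) == 1:
            paired[barcode] = (next(iter(chains["H"])), next(iter(chains["L"])))
    return paired


# Toy input: only barcode "AAAC" has one VH and one VL.
example = [
    {"barcode": "AAAC", "chain": "H", "sequence": "EVQL"},
    {"barcode": "AAAC", "chain": "L", "sequence": "DIQM"},
    {"barcode": "CCGT", "chain": "H", "sequence": "QVQL"},
    {"barcode": "CCGT", "chain": "H", "sequence": "EVQL"},
    {"barcode": "CCGT", "chain": "L", "sequence": "EIVL"},
    {"barcode": "GGTA", "chain": "H", "sequence": "QVQL"},
]
paired = split_paired(example)
```

In this toy input, barcode "CCGT" is dropped because it carries two different VH sequences, and "GGTA" is dropped because it has no VL.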

I wrote a separate blog post about working with OAS data units. Briefly, these are gzipped comma-separated files in which the first row holds the data unit metadata and all subsequent rows contain sequence annotation data.
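
As a rough illustration of that layout, the sketch below reads a data unit using only the Python standard library. It assumes that the row right after the metadata row holds the column names, which is my assumption for this example and may not match every file; the function name read_data_unit is hypothetical.

```python
import csv
import gzip


def read_data_unit(path):
    """Read a gzipped CSV data unit.

    Returns (metadata_row, records): the first row as a list of fields,
    and the remaining rows as dicts keyed by the column names.
    Assumption: the second row of the file is the column header.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        metadata_row = next(reader)  # first row: data unit metadata
        header = next(reader)        # assumed: column names
        records = [dict(zip(header, row)) for row in reader]
    return metadata_row, records
```

Calling read_data_unit on a downloaded file returns the metadata fields separately from the per-sequence rows, so the annotation table can be processed without special-casing the first line.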

Thank you for your attention.

All the best,

Alex

P.S. I would like to thank Claire for these amazing figures 🙂

Author

Aleksandr Kovaltsuk