Here, we describe protocols for the analysis and visualization of the structure and constitution of whole antibody repertoires. These protocols involve acquiring large numbers of antibody RNA sequences by next-generation sequencing.
The immense adaptability of antigen recognition by antibodies is the basis of the acquired immune system. Despite our understanding of the molecular mechanisms underlying the production of the vast repertoire of antibodies by the acquired immune system, it has not yet been possible to arrive at a global view of a complete antibody repertoire. In particular, B cell repertoires have been regarded as a black box because of their astronomical number of antibody clones. However, next-generation sequencing technologies are enabling breakthroughs that increase our understanding of the B cell repertoire. In this report, we describe a simple and efficient method to visualize and analyze whole individual mouse and human antibody repertoires. Total RNA was prepared from immune organs, typically the spleen in mice and peripheral blood mononuclear cells in humans, then reverse transcribed and amplified using the 5'-RACE method. Using a universal forward primer and antisense primers for the antibody class-specific constant domains, antibody mRNAs were uniformly amplified in proportions reflecting their frequencies in the antibody populations. The amplicons were sequenced by next-generation sequencing (NGS), yielding more than 10^5 antibody sequences per immunological sample. We describe the protocols for antibody sequence analyses including V(D)J-gene-segment annotation, a bird's-eye view of the antibody repertoire, and our computational methods.
The antibody system is one of the fundamentals of the acquired immune system. It is highly potent against invading pathogens due to its vast diversity, fine antigen recognition specificity, and the clonal expansion of antigen-specific B cells. The repertoire of antibody-producing B cells is estimated to be more than 10^15 in a single individual1. This immense diversity is generated by VDJ gene recombination in the immunoglobulin genetic loci2. Describing entire B cell repertoires and their dynamic changes in response to antigen immunization is therefore challenging, but essential for a complete understanding of the antibody response against invading pathogens.
Because of their astronomical diversity, B cell repertoires have been regarded as a black box; however, the advent of NGS technology has enabled breakthroughs in understanding their complexity3,4. Whole antibody repertoires have been successfully analyzed, first in zebrafish5, then in mice6 and humans6,7. Although NGS has now become a powerful tool in the study of the adaptive immune response, basic analyses of the commonalities and differences in antibody repertoires among individual animals are lacking.
In mice, it was reported that the IgM repertoires are almost identical between individuals, whereas those of IgG1 and IgG2c are substantially different between individuals8. In addition to V-gene usage profile, the observed frequency of VDJ-profile in naive peripheral B cells is highly similar between individuals8. The analysis of the amino acid sequences of the VDJ-region also showed the occurrence of the same junctional sequences in different mice much more frequently than previously thought8. These results indicate that the mechanisms for the antibody repertoire formation can be deterministic rather than stochastic5,8,9. The process of antibody repertoire development in mice has also been successfully analyzed using NGS to further highlight the potential of NGS to uncover the antibody immune system in detail10.
In this report, we describe a simple and efficient method to visualize and analyze an antibody repertoire at a global level.
All animal experiments were performed according to institutional guidelines and with the approval of the National Institute of Infectious Diseases Animal Care and Use Committee. Sampling of PBMCs from healthy adult volunteers, used as the representative result in this report, was performed with the approval of the Ethics Committee of the National Institute of Infectious Diseases, Tokyo, Japan, and written informed consent was obtained from each participant using an ethics committee-approved form.
1. Primer Design
Universal forward primer | 5'- AAGCAGTGGTATCAACGCAGAGT-3' |
Reverse primers for the mouse immunoglobulins (Ref.8) | |
IgM_CH1: | 5'- CACCAGATTCTTATCAGACAGGGGGCTCTC -3' |
IgG1_CH1: | 5'- CATCCCAGGGTCACCATGGAGTTAGTTTGG -3' |
IgG2c_CH1: | 5'- GTACCTCCACACACAGGGGCCAGTGGATAG -3' |
IgG3_CH1: | 5'-ATGTGTCACTGCAGCCAGGGACCAAGGGA-3' |
IgA_CH1: | 5'-GAATCAGGCAGCCGATTATCACGGGATCAC-3' |
Igκ_CH1: | 5'- GCTCACTGGATGGTGGGAAGATGGATACAG -3' |
Igλ_CH1: | 5'- CTBGAGCTCYTCAGRGGAAGGTGGAAACA -3' |
Reverse primers for the human immunoglobulins (Ref.14) | |
IgM_CH1: | 5'- GGGAATTCTCACAGGAGACG -3' |
IgG_CH1: | 5'- AAGACCGATGGGCCCTTG -3' |
IgD_CH1: | 5'- GGGTGTCTGCACCCTGATA -3' |
IgA_CH1: | 5'- GAAGACCTTGGGGCTGGT -3' |
IgE1_CH1: | 5'- GAAGACGGATGGGCTCTGT -3' |
IgE2_CH1: | 5'- TTGCAGCAGCGGGTCAAGGG -3' |
Igκ_CH1: | 5'- TGCTCATCAGATGGCGGGAAGAT -3' |
Igλ_CH1: | 5'- AGAGGAGGGCGGGAACAGAGTGA -3' |
Table 1: Primer sequences for PCR-amplification of immunoglobulins
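The mouse Igλ_CH1 reverse primer in Table 1 contains the degenerate IUPAC codes B, Y, and R, so it stands for a small pool of concrete oligonucleotides. The following Python sketch (not part of the original protocol, which used Perl and R) expands the primer and checks it against the mouse Igλ antisense signature sequences as printed in Table 2. The helper names `expand` and `primer_regex` are illustrative assumptions.

```python
import re
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Mouse Iglambda_CH1 reverse primer, copied from Table 1.
IGL_PRIMER = "CTBGAGCTCYTCAGRGGAAGGTGGAAACA"

# Mouse Iglambda antisense signature sequences, copied from Table 2.
LAMBDA_ANTISENSE = {
    "Iglambda1": "CTCGAGCTCTTCAGAGGAAGGTGGAAACA",
    "Iglambda2": "CTTGAGCTCCTCAGAGGAAGGTGGAAACA",
    "Iglambda3": "CTGGAGCTCCTCAGGGGAAGGTGGAAACA",
    "Iglambda4": "CTTGAGCTCTTCAGAGGAAGGTGGGAACA",
}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    parts = [b if len(IUPAC[b]) == 1 else "[" + IUPAC[b] + "]" for b in primer]
    return re.compile("".join(parts))


variants = expand(IGL_PRIMER)
pattern = primer_regex(IGL_PRIMER)

# True where the printed signature is an exact member of the primer pool.
matches = {name: bool(pattern.fullmatch(seq))
           for name, seq in LAMBDA_ANTISENSE.items()}

print(len(variants), matches)
```

With the sequences as printed, the primer expands to 3 x 2 x 2 = 12 variants. The Igλ1, Igλ2, and Igλ3 signatures are exact members of that pool, whereas the printed Igλ4 signature differs from the primer at one position near its 3' end.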
2. Nucleic Acid Isolation from Immune Cells and Tissues
NOTE: The procedure given below is for extracting nucleic acids from the mouse spleen. However, it is applicable to other immune tissues and human cells such as lymph nodes or peripheral blood mononuclear cells (PBMCs) (Figure 1B).
3. cDNA Synthesis and PCR Amplification
NOTE: The method described below is based on the 5'-RACE11,12 and SMART-PCR techniques13. The details and optimization of the reaction are described in the manual of the kit15. The starting material for mouse immunoglobulin is the sample from step 2.10. The starting material for human immunoglobulin is a sample from human tissues (e.g., PBMCs) treated as described in steps 2.3 to 2.10.
4. NGS Sequencing of Libraries
5. Quality Control of NGS Data
6. Extraction and Analysis of Immunoglobulin Sequences from .fna Data
NOTE: The example programs were implemented in a UNIX environment. Please use them as example references only, because performance may depend on the operating system and hardware environment. The authors do not accept any liability for errors or omissions. The programming languages Perl17 and R18, and the required modules, need to be installed according to the instructions on the cited websites. The IgBLAST program needs to be installed according to the instructions on the appropriate website19,20.
Antibody repertoires of mouse
An overall view of a murine antibody repertoire can be obtained from cells or tissues such as the spleen, bone marrow, lymph node, or blood. Figure 3 shows representative results of IgM, IgG1, IgG2c, and immunoglobulin light chain (IgL) repertoires from a naïve mouse spleen. The summary of the read numbers is shown in Table 3. For example, 166,175 of 475,144 reads contained the IgM-specific signature sequence (Table 2), and 133,371 of these 166,175 reads were inferred by IgBLAST19 to be VDJ-productive.
Figure 3 shows a repertoire profile of VDJ-rearrangement by 3D-VDJ-plot, in which the size of each ball represents the relative number of reads; in other words, the number of antibody mRNAs in whole B cells. The 3-D mesh consists of 110 IGHV, 12 IGHD, and 4 IGHJ, which are aligned to reflect their order on the chromosome. In addition, the genes ambiguously assigned by IgBLAST were collected separately in the last position for each IGHV, IGHD and IGHJ line, giving rise to 7,215 nodes in the cuboid.
Also, shown in Figure 3 is a 2D-VJ-plot showing the profile of VJ-rearrangement in the IgL repertoire. The length of each bar on this plot represents the relative number of reads. The x-axis represents 101 IGLVκ and 3 IGLVλ genes, and the y-axis represents 4 IGLJκ and 3 IGLJλ genes. The unannotated V- and J-genes are represented on the right borderline.
The complementarity-determining region 3 (CDR3) sequences of these productive reads, which give rise to the majority of antigen-binding specificity, are given in IgBLAST outputs. The CDR3 sequences can be analyzed statistically, including biological or technical replicates, as described previously8,10.
Human antibody repertoires
A perspective of a human antibody repertoire as a whole can be analyzed from various tissues including peripheral blood mononuclear cells (PBMCs) or pathological tissues. Figure 4 shows representative results of IgM, total IgG (IgG1, IgG2, IgG3, and IgG4), total IgA (IgA1 and IgA2), IgD, IgE and IgL repertoires from normal PBMCs. A summary of the read numbers is shown in Table 3. For example, 90,238/1,582,754 reads contained IgM-specific signature sequence and 67,896/90,238 reads were VDJ-productive.
The repertoire profile of VDJ rearrangement is shown on a 3D-VDJ-plot in which the size of each ball represents the relative number of reads; in other words, the number of antibody mRNAs from whole PBMCs (Figure 4). The 3-D mesh consists of 56 IGHV, 27 IGHD, and 6 IGHJ, aligned in the order they appear on the chromosome. In addition, genes ambiguously assigned by IgBLAST are represented separately in the last position for each IGHV, IGHD and IGHJ line, giving rise to 11,172 nodes in the cuboid.
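The node counts quoted for the 3D-VDJ-plots follow from adding one extra slot per axis for ambiguously assigned genes. The Python sketch below reproduces that arithmetic for the mouse and human gene numbers given in the text; the function names are illustrative assumptions and are not taken from the authors' R programs.

```python
def node_count(n_v, n_d, n_j):
    """Number of nodes in the V x D x J cuboid, with one extra
    'ambiguous' slot appended to each axis."""
    return (n_v + 1) * (n_d + 1) * (n_j + 1)


def axis_slot(gene_index, n_genes):
    """Map a 0-based gene index to its axis slot; None (an ambiguous
    assignment) goes to the extra last slot."""
    return n_genes if gene_index is None else gene_index


mouse_nodes = node_count(110, 12, 4)   # (110+1) * (12+1) * (4+1)
human_nodes = node_count(56, 27, 6)    # (56+1) * (27+1) * (6+1)

print(mouse_nodes, human_nodes)  # 7215 11172
```

Both values agree with the figures stated in the text (7,215 nodes for mouse and 11,172 for human).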
The profile of VJ-rearrangement in the IgL repertoire is depicted in a 2D-VJ-plot in which the length of each bar represents the relative number of reads (Figure 4). The x-axis represents 41 IGLVκ and 32 IGLVλ genes, and the y-axis represents 5 IGLJκ and 5 IGLJλ genes. The un-annotated V- and J-genes are represented on the right borderline.
The human CDR3 sequences are given in IgBLAST outputs and can be analyzed statistically as described previously8,10.
Immunoglobulin class | Sense | Antisense |
Mouse immunoglobulin heavy chains (C57BL/6) | ||
IgM | AGTCAGTCCTTCCCAAATGTC | GACATTTGGGAAGGACTGACT |
IgG1 | AAAACGACACCCCCATCTGTC | GACAGATGGGGGTGTCGTTTT |
(IgG1 variant) | AAAACAACACCCCCATCAGTC | GACTGATGGGGGTGTTGTTTT |
IgG2c | AAAACAACAGCCCCATCGGTC | GACCGATGGGGCTGTTGTTTT |
IgG3 | GTGATCCCGTGATAATCGGCT | AGCCGATTATCACGGGATCAC |
IgA | TCCCTTGGTCCCTGGCTGCAG | CTGCAGCCAGGGACCAAGGGA |
Mouse immunoglobulin light chains (C57BL/6) | ||
Igκ | CTGTATCCATCTTCCCACCATCCAGTGAGC | GCTCACTGGATGGTGGGAAGATGGATACAG |
Igλ1 | TGTTTCCACCTTCCTCTGAAGAGCTCGAG | CTCGAGCTCTTCAGAGGAAGGTGGAAACA |
Igλ2 | TGTTTCCACCTTCCTCTGAGGAGCTCAAG | CTTGAGCTCCTCAGAGGAAGGTGGAAACA |
Igλ3 | TGTTTCCACCTTCCCCTGAGGAGCTCCAG | CTGGAGCTCCTCAGGGGAAGGTGGAAACA |
Igλ4 | TGTTCCCACCTTCCTCTGAAGAGCTCAAG | CTTGAGCTCTTCAGAGGAAGGTGGGAACA |
Human immunoglobulin heavy chains | ||
IgM | GGGAGTGCATCCGCCCCAAC | GTTGGGGCGGATGCACTCCC |
IgG | GCTTCCACCAAGGGCCCATC | GATGGGCCCTTGGTGGAAGC |
IgA | GCATCCCCGACCAGCCCCAA | TTGGGGCTGGTCGGGGATGC |
IgD | GCACCCACCAAGGCTCCGGA | TCCGGAGCCTTGGTGGGTGC |
IgE | GCCTCCACACAGAGCCCATC | GATGGGCTCTGTGTGGAGGC |
Human immunoglobulin light chains | ||
Igκ | ACTGTGGCTGCACCATCTGC | GCAGATGGTGCAGCCACAGT |
Igλ1,2,6 | GTCACTCTGTTCCCGCCCTC | GAGGGCGGGAACAGAGTGAC |
Igλ3,7 | GTCACTCTGTTCCCACCCTC | GAGGGTGGGAACAGAGTGAC |
Table 2: Summary of the immunoglobulin signature sequences
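The signature sequences in Table 2 are used to sort raw reads by immunoglobulin class before V(D)J annotation. The following is a minimal Python sketch of that filtering step, assuming FASTA-formatted (.fna) input and a plain substring search in both orientations; the original programs were written in Perl and R and may differ in detail. The toy reads and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sense signature sequences for three mouse heavy-chain classes (Table 2).
SENSE_SIGNATURES = {
    "IgM": ["AGTCAGTCCTTCCCAAATGTC"],
    "IgG1": ["AAAACGACACCCCCATCTGTC", "AAAACAACACCCCCATCAGTC"],
    "IgG2c": ["AAAACAACAGCCCCATCGGTC"],
}


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def classify(seq):
    """Return the first class whose signature occurs in the read in
    either orientation, or None if no signature is found."""
    for ig_class, signatures in SENSE_SIGNATURES.items():
        for sense in signatures:
            if sense in seq or reverse_complement(sense) in seq:
                return ig_class
    return None


# Hypothetical toy reads: read_1 carries the IgM antisense signature,
# read_2 the IgG1 sense signature, read_3 no signature at all.
toy_fna = [
    ">read_1", "ACGTACGTGACATTTGGGAAGGACTGACT",
    ">read_2", "TTTTAAAACGACACCCCCATCTGTCGG",
    ">read_3", "ACGTACGTACGT",
]
calls = {name: classify(seq) for name, seq in parse_fasta(toy_fna)}
print(calls)
```

Searching both orientations avoids having to know in advance which strand a read represents; exact matching is a simplification, and a tolerance for sequencing errors could be added if needed.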
Mouse IgH | Total reads | IgM | IgG1 | IgG2c |
Input | 475,144 | | | |
IgC-containing | | 166,175 | 229,671 | 36,628 |
VDJ-productive | | 133,371 | 196,583 | 31,446 |
Mouse IgL | Total reads | IgKappa | IgLambda |
Input | 527,668 | | |
IgC-containing | | 178,948 | 21,446 |
VJ-productive | | 160,924 | 16,988 |
Human IgH | Total reads | IgM | IgG | IgA | IgD | IgE |
Input | 1,582,754 | | | | | |
IgC-containing | | 90,238 | 5,298 | 94,061 | 75,549 | 2,932 |
VDJ-productive | | 67,896 | 2,775 | 78,203 | 56,495 | 3 |
Human IgL | Total reads | IgKappa | IgLambda |
Input | 1,582,754 | | |
IgC-containing | | 120,316 | 64,148 |
VJ-productive | | 97,169 | 52,324 |
Table 3: Summary of the read numbers in the experiments
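The read numbers can be converted into retention rates for each filtering stage. The Python sketch below does this for the two IgM examples quoted in the text (mouse spleen and human PBMCs); the dictionary layout and key names are illustrative assumptions.

```python
# (input reads, reads with the IgM signature, VDJ-productive reads),
# taken from the worked examples in the text.
IGM_COUNTS = {
    "mouse IgM": (475_144, 166_175, 133_371),
    "human IgM": (1_582_754, 90_238, 67_896),
}


def retention(total, with_signature, productive):
    """Fraction of reads retained at each filtering stage."""
    return {
        "signature_fraction": with_signature / total,
        "productive_fraction": productive / with_signature,
    }


rates = {name: retention(*counts) for name, counts in IGM_COUNTS.items()}

for name, r in rates.items():
    print(f"{name}: {r['signature_fraction']:.1%} carry the signature, "
          f"{r['productive_fraction']:.1%} of those are productive")
```

Under these numbers, roughly 35% of the mouse reads and 6% of the human reads carry the IgM signature, and about 80% and 75% of those, respectively, are VDJ-productive.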
Figure 1: Schematic representation of sequencing strategy for analyzing antibody repertoires in individual mice. (A) Total RNA from the immune cells or tissues was reverse-transcribed and PCR-amplified using the universal forward primer and immunoglobulin class-specific reverse primers. The amplicons from each immunoglobulin class were pooled and submitted for next-generation sequencing. (B) Biological replicates, such as spleens from C57BL/6 mice, were treated as follows: total RNAs were purified from spleen samples, and cDNAs were amplified by 5'-RACE using the universal primer and antibody class-specific primers. They were then submitted for next-generation sequencing with labeling primers for individual mice. Parts of the figure are adapted from reference 8 with permission.
Figure 2: Schematic of data-processing flowchart for analyzing antibody repertoires in individual mice. Amplicon reads obtained after next-generation sequencing were processed as follows: (1) read sequences were checked for the presence of antibody class-specific signature sequences; (2) sequences were examined for the V, D, and J gene fragments using IMGT/HighV-QUEST and/or IgBLAST; (3) the sequences containing a productive VDJ junction were collected; and (4) these sequences were used for the analysis of overall repertoire features, CDR3, etc.
Figure 3: Global data visualization for mouse antibody repertoires. The overall repertoire profiles of each antibody class were visualized by 3D-VDJ-plot. The x-axis represents 110 IGHV genes ordered as on the chromosome. The y- and z-axes represent 12 IGHD and 4 IGHJ genes, respectively. The volume of the sphere on each node represents the number of reads. Red spheres: un-annotated V, D, and J genes. The IgL read distributions are shown on a 2D-VJ-plot in which the length of each bar represents the relative number of reads. The x-axis represents 101 IGLVκ and 3 IGLVλ genes, and the y-axis represents 4 IGLJκ and 3 IGLJλ genes. The un-annotated V and J genes are represented on the right borderline.
Figure 4: Global data visualization for human antibody repertoires. The overall repertoire profiles of each antibody class were visualized by 3D-VDJ-plot. The x-axis represents 56 IGHV genes ordered as on the chromosome. The y- and z-axes represent 27 IGHD and 6 IGHJ genes, respectively. The volume of the sphere on each node represents the number of reads. Red spheres: un-annotated V, D, and J genes. The IgL reads are arrayed on the 2D-VJ-plot in which the length of each bar represents the relative number of reads. The x-axis represents 41 IGLVκ and 32 IGLVλ genes, and the y-axis represents 5 IGLJκ and 5 IGLJλ genes. The un-annotated V and J genes are represented on the right borderline.
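Because the captions state that the volume of each sphere (not its radius) represents the number of reads, the radius has to scale with the cube root of the read count. The Python sketch below shows one way to compute such radii; the normalization to a maximum radius and the function name are assumptions, and the authors' R plotting code may scale the spheres differently.

```python
import math


def sphere_radius(reads, total_reads, max_radius=1.0):
    """Radius of a sphere whose volume is proportional to the share of
    reads on a node. A node holding all reads gets max_radius."""
    if total_reads <= 0 or reads <= 0:
        return 0.0
    return max_radius * (reads / total_reads) ** (1.0 / 3.0)


# A node with 1/8 of the reads gets half the maximum radius,
# because (1/2)^3 = 1/8.
example = sphere_radius(1, 8)
print(round(example, 6))

assert math.isclose(example, 0.5)
```

Scaling the radius linearly with the read count would instead exaggerate dominant V(D)J combinations, since the perceived volume would then grow with the cube of the count.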
The method described here utilizes NGS for antibody RNA amplified using the 5'-RACE method. In contrast to methods that use degenerate 5'-VH gene primers, mRNAs of each antibody class are amplified evenly using universal forward primers. In addition, the use of antisense primers specific for the constant-region 1 (CH1) of the antibody gene enables repertoire profiling of specific immunoglobulin classes. This is very beneficial for dissecting the class-specific antibody response, as well as for comparing naive and immunized repertoires8,9.
The most likely pitfall of the method is a paucity of amplified immunoglobulin messages. The depth of the antibody repertoire obtained by this protocol depends substantially on the PCR amplification described in steps 3.1 and 3.2. If sufficient repertoire depth is not obtained, changing the ratios of template cDNA and primers in steps 3.2.1 or 3.2.2 is strongly recommended.
Generally, approximately 20% of the antibody reads produced by NGS are ambiguous sequences21. Even with established "correction methods", 5-10% remain ambiguous3. We therefore filtered the raw reads for those containing signature sequences corresponding to immunoglobulin constant regions (CμH1, Cγ1H1, Cγ2cH1, etc.). Even so, the analysis of somatic hypermutations requires careful examination.
One limitation of this method is that the pairing of immunoglobulin heavy and light chains cannot be inferred. Hence, the repertoire view obtained by this method is not holistic. However, it is possible to approximate the top-ranking pairs by statistical analysis of the data10. In addition, a novel method to sequence the immunoglobulin pairs was reported recently3,4.
The immunoglobulin sequences in the output .fna data were extracted based on the presence of immunoglobulin gene signature sequences. The V, D, and J gene segments were then annotated and the productivity of the V(D)J rearrangements was assessed. The complementarity-determining region 3 (CDR3) sequences were also annotated. These systematic examinations of immunoglobulin sequences in .fna data can be performed with the IMGT/HighV-QUEST server22,23,24. However, an automated processing pipeline is advantageous for analyzing large experimental datasets. A pipeline customized for each purpose can be set up using the standalone IgBLAST program19. This approach requires basic programming literacy but is very useful for detailed analyses of the immunoglobulin system. The pipelines described here are examples of such customized protocols (Figure 2).
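As an illustration of what a customized pipeline step might look like, the Python sketch below tallies V-D-J combinations among productive reads. It assumes that standalone IgBLAST was run with its AIRR tab-separated output (`-outfmt 19`), which provides `v_call`, `d_call`, `j_call`, and `productive` columns; the source does not state which output format the authors parsed, and their pipeline used Perl and R. The toy rows, the rule for treating multi-gene calls as unassigned, and the function names are all assumptions.

```python
import csv
import io
from collections import Counter


def gene_name(call):
    """Reduce an AIRR call such as 'IGHV11-2*01,IGHV11-2*02' to a gene
    name. Empty calls, or calls naming more than one gene, are treated
    as 'unassigned' (an assumed rule for ambiguous assignments)."""
    if not call:
        return "unassigned"
    genes = {c.split("*")[0].strip() for c in call.split(",")}
    return genes.pop() if len(genes) == 1 else "unassigned"


def tally_vdj(handle):
    """Count (V, D, J) combinations among productive rearrangements."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if row.get("productive") != "T":
            continue
        key = (gene_name(row.get("v_call")),
               gene_name(row.get("d_call")),
               gene_name(row.get("j_call")))
        counts[key] += 1
    return counts


# Hypothetical toy table standing in for an IgBLAST AIRR output file.
toy_rows = [
    ["sequence_id", "v_call", "d_call", "j_call", "productive"],
    ["r1", "IGHV11-2*01", "IGHD1-1*01", "IGHJ1*01", "T"],
    ["r2", "IGHV11-2*01,IGHV11-2*02", "IGHD1-1*01", "IGHJ1*03", "T"],
    ["r3", "IGHV1-5*01", "", "IGHJ2*01", "T"],
    ["r4", "IGHV1-5*01", "IGHD2-3*01", "IGHJ2*01", "F"],
]
toy_tsv = "\n".join("\t".join(r) for r in toy_rows) + "\n"

counts = tally_vdj(io.StringIO(toy_tsv))
for combo, n in counts.most_common():
    print(combo, n)
```

The resulting counter can be passed to a plotting routine or compared across individuals; for real data, `io.StringIO(toy_tsv)` would be replaced with an opened IgBLAST output file.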
The number of antibody reads is proportional to the amount of antibody RNAs in the sample, reflecting the antibody constituents of the antibody system at given time points5,8,25. The method described here gives a bird's eye view of the V(D)J constitution of an antibody repertoire using R programs8,18,26.
The global view of IgM antibody repertoires of individual naive mice revealed a highly conserved VDJ-profile as compared to those of IgG1 or IgG2c8. It was reported that VDJ combinations of immature zebrafish are highly stereotyped9. In contrast, human VDJ combinations are reported to be highly skewed6. The highly conserved deterministic VDJ-profiles in naive B cells are probably generated either by skewed VDJ-rearrangements or negative selection with auto-antigens presented in the body. For example, IGHV11-2 is expressed preferentially in the fetal IgM repertoire27 and this predominance is attributed to the autoreactivity of IGHV11-2 against senescent erythrocytes27. Interestingly, IGHV11-2 was also the most common major repertoire in our previously published analysis of naive IgM8.
The method described here is useful for deciphering antigen-responsive antibody repertoires by inclusively analyzing the antibody-repertoire space generated in individual bodies, avoiding inadvertent omission of key antibody repertoires8,10. This method also allows the examination of detailed antibody network dynamism, which would facilitate accelerated discovery of protective antibodies against newly emerging pathogens.
The authors have nothing to disclose.
This work was supported by a grant from AMED under Grant Number JP18fk0108011 (KO and SI) and JP18fm0208002 (TS, KO, and YO), and a Grant-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology (15K15159) to KO. We thank Sayuri Yamaguchi and Satoko Sasaki for the valuable technical assistance. We would like to thank Editage (www.editage.jp) for English language editing.
0.2 mL Strip Tubes | Thermo Fisher Scientific | AB0452 | 120 strips |
100 bp DNA Ladder | TOYOBO | DNA-035 | 0.5 mL |
2100 Bioanalyzer Systems | Agilent Technologies | G2939BA /2100 | |
Acetic Acid | Wako | 017-00256 | 500 mL |
Agarose, NuSieve GTG | Lonza | 50084 | |
Ammonium Chloride | Wako | 017-02995 | 500 g |
Chloroform | Wako | 038-02606 | 500 mL |
Dulbecco's PBS (-)“Nissui” | NISSUI | 08192 | |
Ethylenediamine-N,N,N',N'-tetraacetic Acid Disodium Salt Dihydrate (2NA) | Wako | 345-01865 | 500 g |
Falcon 40 µm Cell Strainer | Falcon | 352340 | 50/Case |
Ring lock tube 1.7 mL | BM EQUIPMENT | BM-15 | |
Ring lock tube 2.0 mL | BM EQUIPMENT | BM-20 | |
MiSeq Reagent Kit v2 | illumina | MS-102-2003 | 500 cycles |
MiSeq System | illumina | SY-410-1003 | |
NanoDrop 2000c Spectrophotometer | Thermo Fisher Scientific | ||
Potassium Hydrogen Carbonate | Wako | 166-03275 | 500 g |
PureLink RNA Mini Kit | life technologies | 12183018A | |
Qubit 3.0 Fluorometer | Thermo Fisher Scientific | Q33216 | |
Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Q32854 | 500 assays |
SMARTer RACE 5’/3’ Kit | Clontech | 634858 | |
TaKaRa Ex Taq Hot Start Version | Takara Bio Inc. | RR006A | |
Trizma base | Sigma | T6066 | 1 kg |
TRIzol Reagent | Ambion (Thermo Fisher Scientific) | 15596026 | 100 mL |
Ultra Clear qPCR Caps | Thermo Fisher Scientific | AB0866 | 120 strips |
UltraPure Ethidium Bromide | Thermo Fisher Scientific | 15585011 | |
Wizard SV Gel and PCR Clean-Up System | Promega | A9282 |