A protocol is described that uses Illumina's Infinium assays to perform large-scale genotyping. These assays can reliably genotype millions of SNPs across hundreds of individual DNA samples in three days. Once generated, these genotypes can be used to check for associations with a variety of different diseases or phenotypes.
Genotyping variants in the human genome has proven to be an efficient method to identify genetic associations with phenotypes. The distribution of variants within families or populations can facilitate identification of the genetic factors of disease. Illumina's panel of genotyping BeadChips allows investigators to genotype thousands or millions of single nucleotide polymorphisms (SNPs) or to analyze other genomic variants, such as copy number, across a large number of DNA samples. These SNPs can be spread throughout the genome or targeted in specific regions in order to maximize potential discovery. The Infinium assay has been optimized to yield high-quality, accurate results quickly. With proper setup, a single technician can process from a few hundred to over a thousand DNA samples per week, depending on the type of array. The protocol covers every step, starting with genomic DNA and ending with the scanning of the array. Using proprietary reagents, samples are amplified, fragmented, precipitated, resuspended, hybridized to the chip, extended by a single base, stained, and scanned on either an iScan or HiScan high-resolution optical imaging system. One overnight step is required to amplify the DNA. The DNA is denatured and isothermally amplified by whole-genome amplification; therefore, no PCR is required. Samples are hybridized to the arrays during a second overnight step. By the third day, the samples are ready to be scanned and analyzed. Amplified DNA may be stockpiled in large quantities, allowing bead arrays to be processed every day of the week, thereby maximizing throughput.
Typing single nucleotide polymorphisms (SNPs) is a key method of identifying risk variants associated with disease. Historically, the scope of genotyping experiments has been limited by the technology available. Gel electrophoresis-based genotyping methods are limited in sample- and SNP-throughput[1]. Developing these assays can often be labor-intensive, relying on the makeup and structure of the region surrounding the variant for optimization[1]. TaqMan genotyping assays, developed by Life Technologies, can run a large number of samples quickly and with minimal technician involvement[2], but SNP-multiplexing restrictions continue to limit the total number of genotypes to well under one million per day[3,4]. Sequenom's iPlex platform can also run many samples at once, but, as fewer than one hundred SNPs can be multiplexed together, throughput is comparatively low overall[5]. Beckman Coulter's SNPstream technology could theoretically produce over three million genotypes per day, but this technology limits project range to a maximum of only forty-eight SNPs per reaction[4,6]. While the GoldenGate assay can process nearly two hundred DNA samples each day on hundreds or thousands of SNPs per sample, the price per genotype is not competitive with advanced, ultra-high-throughput techniques when typing over three thousand SNPs at once[4,7]. In order to process several million genotypes per day, the scale required for large genome-wide association studies, array-hybridization assays have become the most cost-effective option on the market.
Affymetrix's line of hybridization arrays and Illumina's line of Infinium-based arrays allow potentially hundreds of samples to be typed on hundreds of thousands or millions of SNPs in parallel[4,8]. These SNPs can be scattered across the entire genome, localized in regions of interest, such as exomes, or customized to the user's preference. These arrays can not only accurately genotype one million SNPs per sample at once, but also measure copy number variation, potentially unveiling chromosomal abnormalities. Infinium-aligned OMNI BeadChip arrays currently have the ability to genotype up to nearly five million markers per sample, including half a million custom loci, on up to nearly one hundred samples each day.
As most diseases have a genetic component, these large-scale experiments can be crucial in finding genes associated with disease. High-throughput genotyping allows for efficient genotype generation in sample sets large enough to convincingly detect genetic association at lower minor allele frequencies. Whole-genome genotyping projects can be used to locate regions with statistically significant case-control allele frequency or copy number differences[9,10,11]. According to the National Human Genome Research Institute, genome-wide association studies led to 1,490 separate publications between November 25, 2008 and January 25, 2013, stemming from the discovery of 8,283 SNPs with a p-value less than 1 × 10⁻⁵ (see http://www.genome.gov/gwastudies/). These studies, which researched conditions ranging from height to testicular cancer, benefited from the broad approach afforded by a genome-wide analysis. In cases such as these, entire regions of interest might have escaped notice had the scope of typing been too restrictive. Thus, for large-scale association analyses, a genome-wide genotyping technique is the technique of choice.
Different versions of the Infinium assay exist, each intended for use with specific types of arrays. The InfiniumUltra assay, discussed in depth below, is appropriate for many 12- or 24-sample array chips. These often genotype over a hundred thousand SNPs per DNA sample and focus on targeted regions, such as on exome or custom panels. Other assay versions might be required for other chip types, such as the whole-genome genotyping arrays. However, as all Infinium assays share a common basis and differ mainly in the reagent names, the reagent volumes, or the exact staining reagent procedure, techniques perfected on one assay version can often be universally applied. Other arrays, such as methylation arrays, might use a nearly identical protocol as well. Care must be taken to use only the version of the assay required for the chip type in use. Some types, such as ones measuring gene expression level, might require use of a non-Infinium protocol.
Samples must be processed in batches. For example, with the InfiniumUltra assay, prehybridization reagent tubes contain enough volume to run 96 samples, and the tubes cannot be refrozen. Therefore, samples must be run in batches of 96. The samples will be amplified on the first day. After approximately 1 hr of benchwork, the samples must be heated in a convection oven for 20-24 hr. The following day, nearly 4 hr will be spent fragmenting, precipitating, and resuspending the samples, at which point the samples can either be frozen for future use or hybridized to the chip. Loading chips takes nearly 2 hr, after which the samples will be hybridized overnight for 16-24 hr. On the third day, the staining and extension step takes ~4 hr. A further hour will be spent washing, coating, and drying the chips. Finally, the arrays are scanned, which may take 15-60 min per chip, depending on the type used.
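The timeline above can be tallied in a few lines of Python. This is only a planning sketch: the step labels are illustrative, the approximate durations quoted in the text ("nearly 4 hr", "~4 hr") are treated as whole-hour estimates, and the example of eight chips assumes one 96-sample batch run on 12-sample chips.

```python
# Durations in hours, taken from the paragraph above. Step labels are
# illustrative; approximate values ("nearly 4 hr") are rounded to estimates.
STEPS = [
    # (day, step label, min hours, max hours, hands-on?)
    (1, "amplification setup", 1.0, 1.0, True),
    (1, "overnight amplification incubation", 20.0, 24.0, False),
    (2, "fragment, precipitate, resuspend", 4.0, 4.0, True),
    (2, "load chips", 2.0, 2.0, True),
    (2, "overnight hybridization", 16.0, 24.0, False),
    (3, "staining and extension", 4.0, 4.0, True),
    (3, "wash, coat, dry", 1.0, 1.0, True),
]


def hands_on_hours_by_day(steps):
    """Sum the estimated bench time for each day of the protocol."""
    totals = {}
    for day, _label, min_hours, _max_hours, hands_on in steps:
        if hands_on:
            totals[day] = totals.get(day, 0.0) + min_hours
    return totals


def scan_hours(n_chips, minutes_per_chip):
    """Total scanner time for a run, given a per-chip scan time in minutes."""
    return n_chips * minutes_per_chip / 60.0


bench = hands_on_hours_by_day(STEPS)
print(bench)  # estimated bench hours per day

# Assumption: 96 samples on 12-sample chips means 8 chips per batch.
print(scan_hours(8, 15), "to", scan_hours(8, 60), "hr of scanning")
```

Under these assumptions, the second day carries the heaviest bench load, and scanning time depends strongly on the chip type.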
Standard laboratory safety and cleanliness precautions apply. Though the amplification is not PCR-based, separate workstations for pre- and postamplification procedures are necessary in order to minimize the likelihood of contamination. The identification number of every kit-supplied reagent in use must be logged on a tracking sheet. Reagents should be thawed immediately before use and inverted several times before dispensing. The DNA to be typed must be high-quality genomic DNA (260/280 absorbance ratio of 1.6-2.0, 260/230 absorbance ratio below 3.0), isolated by standard methods and quantified with a fluorometer. Degradation of DNA is often a contributing factor in low-quality assay results. Typically, 200 ng of DNA is required, though this amount may vary for some chip types. A Tecan liquid-handling robot can automate many steps of the protocol and minimize human error.
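The input DNA criteria above lend themselves to a simple pre-run screen. The following is a minimal sketch, not part of the Illumina workflow: the `DnaSample` record and the `qc_failures` helper are hypothetical names, and the 200 ng default should be adjusted for chip types that require a different amount.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Hypothetical record of the measurements taken for one DNA sample."""
    sample_id: str
    a260_280: float  # 260/280 absorbance ratio
    a260_230: float  # 260/230 absorbance ratio
    mass_ng: float   # DNA available, from fluorometric quantification


def qc_failures(sample, required_ng=200.0):
    """Return the reasons a sample misses the input criteria (empty if none)."""
    problems = []
    if not 1.6 <= sample.a260_280 <= 2.0:
        problems.append("260/280 ratio outside 1.6-2.0")
    if not sample.a260_230 < 3.0:
        problems.append("260/230 ratio not below 3.0")
    if sample.mass_ng < required_ng:
        problems.append(f"less than {required_ng:g} ng of DNA available")
    return problems


good = DnaSample("S001", a260_280=1.85, a260_230=2.1, mass_ng=350.0)
poor = DnaSample("S002", a260_280=1.45, a260_230=2.0, mass_ng=120.0)

for sample in (good, poor):
    print(sample.sample_id, qc_failures(sample) or "passes")
```

A screen like this does not detect degradation, which still needs to be assessed separately.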
Day One
1. Preparation
2. Amplification
WARNING: Samples should not undergo amplification unless 4 hr are available on the following day for the fragmentation, precipitation, and resuspension steps.
Day Two
3. Fragmentation
4. Precipitation
5. Resuspension
6. Hybridization
WARNING: Samples should not undergo hybridization unless 5.5 hr are available on the following day for the staining and wash steps.
Day Three
7. Staining Preparation
8. Staining and Extension
9. Washing and Sealing
10. Scanning
A properly-processed bead chip should display bright and distinct red and green laser-light intensities while scanning. In Figure 6, the iScan scanning software displays a standard genomic DNA sample successfully hybridized to a custom-panel SNP genotyping array. The labeled nucleotides that were attached during the extension and staining steps fluoresce under the lights of the two lasers. As these nucleotides selectively extend the bead's oligonucleotide chain hybridized to the fragmented DNA strand, and as the oligonucleotide chains are designed to terminate at the site of the variant, the color and intensity of the resulting signal can be used to determine the alleles present at the SNP site.
A user-generated sample sheet, which matches sample IDs and clinical information to chipID and position, and an array-specific SNP manifest, obtained directly from the company, are both necessary in order to import the scanner output files into the GenomeStudio analysis software. Example sample sheets can be found on the company's website. Once the GenomeStudio project is built, genotypes can be obtained from the resulting intensity clusters automatically generated by the software. Though multiple display options exist, the default view, Norm-R (a normalized intensity value) vs. Norm-Theta (an allelic intensity ratio), is often the easiest plot to differentiate three distinct clusters. Most SNPs should show one, two, or three clusters, depending on the minor allele frequency.
The three clusters represent samples with AA, AB, or BB genotypes (see Figure 7). These clusters can be assigned genotypes either by importing a standard clustering position template directly from the company, by selecting "Cluster All SNPs" from the Analysis toolbar of the software, or by clicking and dragging the colored circles on the intensity plots to manually edit calls. The color of the sample on the intensity plot (red, purple, or blue) indicates the call (AA, AB, or BB, respectively); black indicates no call. By scrolling through the SNPs listed in the Full Data pane of the software, the clustering for each SNP can be viewed or recalled. Once the clusters are assigned satisfactory calls, the Full Data pane of the software lists the genotypes for each individual sample at each particular SNP. The Column Chooser option above the table can toggle data formats.
Data can be exported either through the Analysis toolbar or directly from the Full Data table for in-depth analysis.
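To make the Norm-Theta and Norm-R axes concrete, the sketch below applies the polar transform commonly described for Illumina two-channel intensity data: R is the sum of the normalized A- and B-allele intensities, and Theta is the angle between them rescaled to the range 0-1. The formula is not given in this protocol, so treat it as an assumption. The `naive_call` function and its fixed thresholds are purely hypothetical and are not how GenomeStudio assigns genotypes; the software uses per-SNP cluster positions.

```python
import math


def polar_coordinates(x, y):
    """Convert normalized A (x) and B (y) allele intensities to (theta, r).

    theta is 0 for a pure A signal and 1 for a pure B signal;
    r is the combined intensity.
    """
    r = x + y
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return theta, r


def naive_call(theta, r, min_r=0.2):
    """Illustrative fixed-threshold call; thresholds are hypothetical."""
    if r < min_r:
        return "NC"  # intensity too low to call
    if theta < 1.0 / 3.0:
        return "AA"
    if theta < 2.0 / 3.0:
        return "AB"
    return "BB"


# Hypothetical normalized intensities for four samples at one SNP.
for x, y in [(1.0, 0.05), (0.6, 0.55), (0.04, 0.9), (0.03, 0.02)]:
    theta, r = polar_coordinates(x, y)
    print(f"x={x:.2f} y={y:.2f} theta={theta:.2f} r={r:.2f} -> {naive_call(theta, r)}")
```

On such a plot, AA samples fall near Theta = 0, AB samples near 0.5, and BB samples near 1, which is why well-behaved SNPs separate into up to three clusters.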
Figure 1. Overview – Infinium Assay Protocol.
Figure 2. Complete Hyb Chamber base and mat, plus lid.[12]
Figure 3. Loading a BeadChip – Dispense the sample on the inlet ports.[12]
Figure 4. Removing the BeadChip Coverseal – Grasp the seal at the corner and gently peel diagonally.[12]
Figure 5. Complete BeadChip Flow-through Assembly – The BeadChip is separated from a glass slide with a spacer and bound with clasps.
Figure 6. Successful BeadChip Scan – A) The BeadChip is scanned with both a red and green laser; the scanning software displays both simultaneously. Sections passing intensity QC will highlight green on the BeadChip display to the left. Sections failing intensity QC will highlight red on the BeadChip display. B) Once scanning is complete, the software will overlay the red and green displays. A zoomed-in image is shown. The color and intensity of each individual bead indicates the allele present.
Figure 7. SNP NORM-R vs. NORM-THETA Cluster Profiles – A) A valid SNP with three distinct clusters representing AA, AB, and BB genotypes, colored red, purple, and blue. B) A SNP requiring editing. The middle cluster, which should be heterozygous AB, is left un-called. The BB cluster is mistakenly called AB. C) A poor-performing SNP. No genotypes can be obtained from this intensity plot, as no distinct clusters exist.
Large-scale genotyping applications have been used to better understand the genetic mechanism underlying many human diseases. The discovery of any significant variant through a genome-wide association analysis can flag a candidate region for further study. In addition, genotype data is a good tool for quality control on sequencing projects.
To maximize sample throughput, multiple sample plates can be amplified and stored in their fragmented, resuspended states. Eight plates may be amplified in a single day, combining the first 24 hr of the protocol for multiple batches and providing enough material for ~2-8 days of chip-processing. If amplified plates are stockpiled beforehand, and if new samples are hybridized to chips immediately after scanning begins on the previous run, processing can run continuously without the need to pause for additional sample preparation. Therefore, though samples will take three days to undergo the complete assay, data can be generated daily. Assuming 24 chips are processed every day, a 5-day workweek allows for over 1,000 DNA samples to be run on a 12-sample bead chip. If any step or reagent has failed, however, multiple batches might be at risk for poor performance before any correction can be applied. Errors might escape notice until the arrays are scanned or analyzed; therefore, if throughput is maximized, hundreds of samples in various stages of the protocol might already have received the same defective treatment upon discovery. As lost reagents and data cannot be recovered, the user must weigh these risks against the need for an accelerated workflow.
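The throughput figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative, and the calculation assumes 96 samples per plate, as in the batch size described earlier.

```python
def weekly_samples(chips_per_day, samples_per_chip, days_per_week=5):
    """Samples processed per week at a steady daily chip count."""
    return chips_per_day * samples_per_chip * days_per_week


def days_of_stock(plates, samples_per_day, samples_per_plate=96):
    """Days of chip processing supported by a stockpile of amplified plates."""
    return plates * samples_per_plate / samples_per_day


# 24 twelve-sample chips per day over a 5-day workweek.
print(weekly_samples(24, 12))  # 1440

# Eight stockpiled plates, processed at one plate (96 samples) per day
# versus 24 twelve-sample chips (288 samples) per day.
print(days_of_stock(8, 96))
print(round(days_of_stock(8, 24 * 12), 1))
```

At 24 chips per day, 12-sample chips yield 1,440 samples per week, consistent with the "over 1,000" figure, and eight stockpiled plates last between roughly 2.7 and 8 days depending on the daily processing rate.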
The GenomeStudio analysis software is the first chance to truly gauge the success of the genotyping process. If the Norm-R vs. Norm-Theta intensity plots are properly clustered, the average call rate (percent of total SNPs successfully typed) of the samples should approach 99%, though this value varies slightly depending on array type. The data from any sample with a call rate lower than 85-90% are not trustworthy and should be discarded. For quality control purposes, results should be compared to any previously known genotypes whenever possible. If no such data exist, intentional sample duplication is a useful tool in verifying plate or array placements. These duplicate pairs should be placed on separate chips, plates, batches, or projects, and their genotypes should be checked upon generation. While specific QC constraints vary according to the investigator's preference, common SNP constraints are based on sample call success, Hardy-Weinberg equilibrium, or missingness between cases and controls, while common sample constraints are based on call rates, Mendelian inconsistencies, or cross-references of X-chromosome heterozygosity to clinical gender data[13].
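Two of the sample-level checks described above, call rate filtering and duplicate concordance, can be sketched on exported genotypes as follows. This is an illustration rather than a GenomeStudio feature: the `"--"` no-call marker, the function names, and the toy genotype lists are all assumptions, and the 0.90 threshold is one end of the 85-90% range given in the text.

```python
NO_CALL = "--"  # assumed marker for an uncalled genotype in the export


def call_rate(genotypes):
    """Fraction of SNPs with a genotype call for one sample."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g != NO_CALL)
    return called / len(genotypes)


def low_call_rate_samples(calls_by_sample, threshold=0.90):
    """Sample IDs whose call rate falls below the chosen threshold."""
    return sorted(
        sample_id
        for sample_id, genotypes in calls_by_sample.items()
        if call_rate(genotypes) < threshold
    )


def concordance(calls_a, calls_b):
    """Fraction of matching genotypes among SNPs called in both samples.

    Returns None when no SNP is called in both samples.
    """
    pairs = [
        (a, b) for a, b in zip(calls_a, calls_b)
        if a != NO_CALL and b != NO_CALL
    ]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy data: ten SNPs per sample, with one intentional duplicate pair.
calls = {
    "S1": ["AA", "AB", "BB", "AA", "AB", "BB", "AA", "AB", "BB", "AA"],
    "S1_dup": ["AA", "AB", "BB", "AA", "--", "BB", "AA", "AB", "BB", "AA"],
    "S2": ["--", "AB", "--", "AA", "AB", "--", "AA", "AB", "BB", "AA"],
}

print({sample_id: call_rate(g) for sample_id, g in calls.items()})
print(low_call_rate_samples(calls))
print(concordance(calls["S1"], calls["S1_dup"]))
```

A duplicate pair with low concordance suggests a plate or array placement error rather than a chemistry failure, so it is worth checking before any samples are discarded.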
If any problem arises, the Controls Dashboard, found in the analysis suite, can be submitted to the company in order to determine cause. These controls can often narrow down the issue to the likeliest step or reagent failure. If any SNPs of interest are found through an Infinium genotyping experiment, their intensity plots should be double-checked in GenomeStudio for clustering errors before further research is conducted.
A failed Infinium genotyping experiment is likely due to human processing error or poor-quality input DNA. Sample quantification must be accurate and precise. For best results, any reagents added to any sample or chip must be dispensed at the volume set by the protocol. Pipettes must be properly calibrated. Reagents should not be used after expiration and should not be refrozen once thawed, save for the RA1 reagent. In order to minimize possible staining and extension errors, the formamide/EDTA mixture should be prepared fresh every month. All -20 °C reagents should be stored in manual-defrost freezers only. All labware used in the staining, extension, and wash portions of the protocol should be rinsed thoroughly with water and mild detergent immediately after use. The humidifying reservoirs in the hyb chamber should be scrubbed with a test tube brush and mild detergent. The glass slides should be washed with 10% bleach, as instructed by their user manuals, once a week.
The authors have nothing to disclose.
Funding for this work has been provided by NIH P20 GM103456, NIH RC2 AR058959, and NIH R56 AI063274.
Consumable or Equipment | Manufacturer | Part Number | Minimum Required for 96 Samples |
0.8 ml Deep Well Plate | Thermo Scientific | AB-0765 | 1 |
Plate Mats | Thermo Scientific | AB-0674 | 2 |
Reagent Basin | Fisher Scientific | 13-681-502 | 9 |
Heat-seal Sheets | Thermo Scientific | AB-0559 | 1 |
Flow-through Spacer | Fisher Scientific | NC9563984 | 6 |
Pipette tips – 200 μl | Rainin | GP-L200F | 192 |
Pipette tips – 10 μl | Rainin | GP-L10F | 16 |
Pipette tips – 1,000 μl | Rainin | GP-L1000F | 16 |
DNA Suspension Buffer | Teknova | T0220 | 0.5 ml |
0.1 N NaOH | Fisher Scientific | AC12419-0010 | 0.5 ml |
Isopropanol (HPLC grade) | Fisher Scientific | A451 | 15 ml |
Ethanol (200-proof) | Sigma-Aldrich | 459836 | 330 ml |
Formamide (100%) | Thomas Scientific | C001K38 | 15 ml |
EDTA (0.5 M) | Amresco | E177 | 0.2 ml |
10 μl 8-channel Pipette | Rainin | L8-10XLS | 1 |
200 μl 8-channel Pipette | Rainin | L8-200XLS | 2 |
1,000 μl Single-channel Pipette | Rainin | L-1000XLS | 1 |
Microplate Shaker | VWR | 13500-890 | 1 |
Refrigerated Microplate Centrifuge | VWR | BK369434 | 1 |
Hybridization Oven | Illumina | SE-901-1001 | 1 |
Hybex Microsample Incubator | SciGene | 1057-30-0 | 1 |
Hybex MIDI Heat Block Insert | Illumina | BD-60-601 | 1 |
Heat Sealer | Thermo Scientific | AB-0384 | 1 |
Hyb Chamber w/ Insert and Mat | Illumina | BD-60-402 | 2 |
Surgical Scissors | Fisher Scientific | 13-804-20 | 1 |
Flow Through Assembly Parts | Illumina | WG-10-202 | 8 |
Wash Rack and Dish | Illumina | BD-60-450 | 1 |
Genepaint Chamber Rack | Tecan | 760-800 | 1 |
Temperature Probe | Illumina | A1-99-109 | 1 |
Staining Rack and Dish | Illumina | WG-10-207 | 1 |
Self-Closing Tweezers | Ted Pella, Inc | 5374-NM | 1 |
Vacuum Manifold | Ted Pella, Inc | 2240 | 1 |
iScan or HiScan | Illumina | – | 1 |