Reciprocal hemizygosity via sequencing (RH-seq) is a powerful new method to map the genetic basis of a trait difference between species. Pools of hemizygotes are generated by transposon mutagenesis, and their fitness is tracked through competitive growth using high-throughput sequencing. Analysis of the resulting data pinpoints genes underlying the trait.
A central goal of modern genetics is to understand how and why organisms in the wild differ in phenotype. To date, the field has advanced largely on the strength of linkage and association mapping methods, which trace the relationship between DNA sequence variants and phenotype across recombinant progeny from matings between individuals of a species. These approaches, although powerful, are not well suited to trait differences between reproductively isolated species. Here we describe a new method for genome-wide dissection of natural trait variation that can be readily applied to incompatible species. Our strategy, RH-seq, is a genome-wide implementation of the reciprocal hemizygote test. We harnessed it to identify the genes responsible for the striking high temperature growth of the yeast Saccharomyces cerevisiae relative to its sister species S. paradoxus. RH-seq utilizes transposon mutagenesis to create a pool of reciprocal hemizygotes, which are then tracked through a high-temperature competition via high-throughput sequencing. Our RH-seq workflow as laid out here provides a rigorous, unbiased way to dissect ancient, complex traits in the budding yeast clade, with the caveat that resource-intensive deep sequencing is needed to ensure genomic coverage for genetic mapping. As sequencing costs drop, this approach holds great promise for future use across eukaryotes.
Since the dawn of the field, it has been a prime goal in genetics to understand the mechanistic basis of variation across wild individuals. As we map loci underlying a trait of interest, the emergent genes can be of immediate use as targets for diagnostics and drugs, and can shed light on the principles of evolution. The industry standard toward this end is to test for a relationship between genotype and phenotype across a population via linkage or association1. Powerful as these approaches are, they have one key limitation—they rely on large panels of recombinant progeny from crosses between interfertile individuals. They are of no use in the study of species that cannot mate to form progeny in the first place. As such, the field has had little capacity for unbiased dissection of trait differences between reproductively isolated species2.
In this work we report the technical underpinnings of a new method, RH-seq3, for genome-scale surveys of the genetic basis of trait variation between species. This approach is a massively parallel version of the reciprocal hemizygote test4,5, which was first conceived as a way to evaluate the phenotypic effects of allelic differences between two genetically distinct backgrounds at a particular locus (Figure 1A). In this scheme, the two divergent individuals are first mated to form a hybrid, half of whose genome comes from each of the respective parents. In this background, multiple strains are generated, each containing an interrupted or deleted copy of each parent’s allele of the locus. These strains are hemizygous since they remain diploid everywhere in the genome except at the locus of interest, where they are considered haploid, and are referred to as reciprocal since each lacks only one parent’s allele, with its remaining allele derived from the other parent. By comparing the phenotypes of these reciprocal hemizygote strains, one can conclude whether DNA sequence variants at the manipulated locus contribute to the trait of interest, since variants at the locus are the only genetic difference between the reciprocal hemizygote strains. In this way, it is possible to link genetic differences between species to a phenotypic difference between them in a well-controlled experimental setup. To date the applications of this test have been in a candidate-gene framework—that is, cases in which the hypothesis is already in hand that natural variation at a candidate locus might impact a trait.
In what follows, we lay out the protocol for a genome-scale reciprocal hemizygosity screen, using yeast as a model system. Our method creates a genomic complement of hemizygote mutants, by generating viable, sterile F1 hybrids between species and subjecting them to transposon mutagenesis. We pool the hemizygotes, measure their phenotypes in sequencing-based assays, and test for differences in frequency between clones of the pool bearing the two parents’ alleles of a given gene. The result is a catalog of loci at which variants between species influence the trait of interest. We implement the RH-seq workflow to elucidate the genetic basis of thermotolerance differences between two budding yeast species, Saccharomyces cerevisiae and S. paradoxus, which diverged ~5 million years ago6.
1. Preparation of the piggyBac-containing plasmid for transformation
2. Creating a pool of untargeted genome-wide reciprocal hemizygotes
3. Selection of reciprocal hemizygotes in a pooled format
4. Tn-seq library construction and Illumina sequencing to determine abundance of transposon mutant hemizygotes
5. Mapping the locations of transposon insertions and RH-seq analysis
NOTE: The following data analysis was accomplished with custom Python scripts (found online at https://github.com/weiss19/rh-seq), but could be redone using other scripting languages. Below, the major steps in the process are outlined. Perform the following steps on each individual replicate read file unless it is noted to combine them.
We mated S. cerevisiae and S. paradoxus to form a sterile hybrid, which we subjected to transposon mutagenesis. Each mutagenized clone was a hemizygote, a diploid hybrid in which one allele of one gene is disrupted (Figure 1A, Figure 2). We competed the hemizygotes against one another by growth at 39 °C and, in a separate experiment as a control, at 28 °C (Figure 1B), and we isolated DNA from each culture. To report the fitness of each hemizygote we quantified abundance via bulk sequencing, using a protocol in which DNA was fragmented and ligated to adapters, followed by amplification of transposon insertion positions (Figure 1C). If the primers for this amplification are distinct from, and less efficient than, those provided in the protocol, background reads will predominate in the sequencing data, leading to fewer usable reads and eroding the accuracy of fitness estimates. Similar quality issues may result from low DNA input into the sequencing library preparation.
With results in hand from our sequencing, for a given gene we compared hemizygote abundances at the two temperatures between two classes of hemizygotes: clones where only the S. cerevisiae allele was wild-type and functional, and clones relying only on the S. paradoxus allele (Figure 1D). In analysis at this stage, if the computational post-processing strategy of the protocol is not followed and genes with relatively few transposon mutants in the pool are included in the analysis, statistical power will drop and no significant gene calls will result. In our implementation, we detected strong signal at eight housekeeping genes (Figure 3). In each case, transposon insertions in the S. cerevisiae allele in the hybrid compromised growth at high temperature (Figure 3). These loci represented candidate determinants of the thermotolerance trait that distinguishes S. cerevisiae from S. paradoxus. In separate experiments reported elsewhere, we validated the impact of allelic variation at each site using standard transgenesis methods beyond the scope of the current protocol3.
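The per-gene comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published analysis code: the read counts are invented, the pseudocount of 1 is an arbitrary choice, and the rank-based Mann-Whitney test is used here only as one reasonable two-sample test, since the text above does not name the statistic applied. The function names `log2_ratio` and `mann_whitney_u` are hypothetical.

```python
import math
from statistics import median


def log2_ratio(count_39, count_28, pseudocount=1.0):
    """log2 of a clone's abundance at 39 C relative to 28 C.

    The pseudocount (an assumption) avoids division by zero for dropouts.
    """
    return math.log2((count_39 + pseudocount) / (count_28 + pseudocount))


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test using the normal approximation.

    No tie or continuity correction is applied, and both samples must be
    non-empty; this is a sketch, not a replacement for a statistics library.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values, then give them the average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


# Hypothetical (reads at 39 C, reads at 28 C) for insertion clones in one gene.
scer_disrupted = [(12, 100), (8, 90), (20, 150), (5, 70), (15, 110)]
spar_disrupted = [(95, 100), (80, 90), (160, 150), (66, 70), (120, 110)]

scer_disrupted_ratios = [log2_ratio(a, b) for a, b in scer_disrupted]
spar_disrupted_ratios = [log2_ratio(a, b) for a, b in spar_disrupted]

u_stat, p_value = mann_whitney_u(scer_disrupted_ratios, spar_disrupted_ratios)
print(median(scer_disrupted_ratios), median(spar_disrupted_ratios), u_stat, p_value)
```

In this invented example, clones with the S. cerevisiae allele disrupted drop in abundance at 39 °C while clones with the S. paradoxus allele disrupted do not, which is the pattern expected for a gene hit. In a genome-wide screen, the resulting p-values would also need correction for multiple testing.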
Figure 1. Schematic of the RH-seq workflow.
A. S. cerevisiae and S. paradoxus (blue and yellow, respectively) are mated to form a hybrid (green) that contains a single copy of each of the parents’ genomes. At a given locus in the hybrid, a transposon insertion (black box) in each species’ allele in turn creates a hemizygote, which is diploid across the genome except at the locus of interest. Comparing phenotypes across hemizygotes reveals the phenotypic effects of allelic variation at the manipulated locus. B. Across many clones hemizygous at a given gene (YFG), some reach higher abundance than others in competitive culture, as quantified by sequencing. C. DNA from a hemizygote pool is sheared and ligated to adapters (red). For a given clone, the junction between the transposon (tn, black) and the genome (blue) is amplified with a transposon-specific primer (black arrowhead) and an adapter-specific primer (red arrowhead). Sequencing read counts from the amplicon report the fitness of the clone in the population. D. For an RH-seq gene hit, tabulating the proportion of hemizygote clones (y-axis) exhibiting a given fitness after competition at high temperature (x-axis) reveals a striking difference between two genotypic classes: those with a transposon insertion in the S. cerevisiae allele (with the S. paradoxus allele remaining; yellow) and those with the S. paradoxus allele disrupted (and S. cerevisiae allele remaining; blue).
Figure 2. Selection scheme for generating a pool of genome-wide reciprocal hemizygotes with the PiggyBac plasmid-borne transposon.
The PiggyBac plasmid (pJR487) is transformed into a URA3-/- clone of the diploid hybrid S. cerevisiae DBVPG1373 x S. paradoxus Z1 (JR507). Cells carrying the plasmid or the transposon are selected via growth in G418, which requires the KanMX cassette for survival; survivors are cells that have taken up the PiggyBac plasmid and/or harbor an integrated transposon. Cells that still carry the plasmid are then selected against via growth in 5-FOA, which is toxic in the presence of the URA3 cassette. Since the untransformed hybrid is URA3-/-, the only cells that will die in this step are those still containing the PiggyBac plasmid, which contains a URA3 cassette. What remains is a pool of hybrid mutant cells containing the transposon integrated into the genome.
Figure 3. Top hits mapped by RH-seq.
Each panel reports RH-seq data for the indicated gene. The x-axis reports the log2 of the abundance of a transposon mutant clone after selection at 39 °C, relative to the analogous quantity at 28 °C. The y-axis reports the proportion of all clones bearing insertions in the indicated allele that exhibited the abundance ratio on the x-axis, as a kernel density estimate. Underlying read and count data for insertions are reported elsewhere3.
The advantages of RH-seq over previous statistical-genetic methods are several-fold. In contrast to linkage and association analysis, RH-seq affords single-gene mapping resolution; as such, it will likely be of significant utility even in studies of trait variation across individuals of a given species, as well as interspecific differences. Also, previous attempts at genome-wide reciprocal hemizygosity analysis used collections of gene deletion mutants, some of which harbor secondary mutations that can lead to false positive results9,10. The RH-seq strategy sidesteps this issue by generating and phenotyping many hemizygote mutants in each gene in turn, such that the background of any individual mutant clone contributes only marginally to the final result. In principle, RH-seq also affords the study of noncoding loci, although in the current work we focused exclusively on genes.
There are a few quirks to RH-seq, some biological and some technical, that a successful practitioner will deal with up front to maximize the utility of the approach and accelerate the path to best results. Biologically, RH-seq only makes sense as a technique if the two target species can be mated to form a stable, viable hybrid that can be genetically manipulated. Thus we cannot envision applying RH-seq to species so divergent that they fail to fuse into a karyotypically stable diploid. On the other hand, if the two parents of the diploid hybrid are too similar at the DNA level, most reads from the transposon insertion sequencing cannot be mapped allele-specifically to just one of the two parent genomes and will be unusable; thus, a given RH-seq experiment will be most successful when the parents have high-quality reference genomes available and hit a “sweet spot” of sequence divergence. As a separate point of consideration, given an RH-seq project formulated to dissect the genetic basis of a trait difference between the parent species, results are likely to be much more interpretable when, for the trait of interest, the biology of the hybrid serves as a reasonable representative of that of the parents. Extreme phenotypes unique to the hybrid (heterosis) could influence or obscure the effects of genes of interest underlying the phenotype as it differs between the parents. Any genes mapped through reciprocal hemizygosity analysis must be validated by independent allele-swap experiments in the genetic backgrounds of the purebred parent species.
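The allele-specific mapping requirement can be illustrated with a short Python sketch. It assumes that each read has already been aligned separately to the two parental reference genomes and that the mismatch count of its best alignment to each is available; the function name `assign_allele`, the read names, and the mismatch values are all hypothetical and do not describe the published pipeline.

```python
def assign_allele(mismatches_scer, mismatches_spar):
    """Assign a read to one parental genome, or return None if ambiguous.

    Each argument is the mismatch count of the read's best alignment to that
    parent's reference genome, or None if the read did not align at all.
    """
    if mismatches_scer is None and mismatches_spar is None:
        return None  # unmapped in both genomes
    if mismatches_spar is None:
        return "scer"
    if mismatches_scer is None:
        return "spar"
    if mismatches_scer == mismatches_spar:
        return None  # equally good in both parents: not allele-specific
    return "scer" if mismatches_scer < mismatches_spar else "spar"


# Hypothetical reads: name -> (mismatches vs. S. cerevisiae, vs. S. paradoxus)
reads = {
    "read1": (0, 4),
    "read2": (1, 1),       # identical in both parents, so unusable
    "read3": (None, 0),
    "read4": (None, None),
}

usable = {}
for name, (mm_scer, mm_spar) in reads.items():
    allele = assign_allele(mm_scer, mm_spar)
    if allele is not None:
        usable[name] = allele

print(usable)
```

The fraction of reads that survive such a filter gives a quick indication of whether a given pair of parents is divergent enough for the experiment to be informative.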
As for technical issues in an RH-seq experiment, our experience has highlighted several potential sources of noise and provided workable solutions. Noise manifests as disagreement among the sequencing-based estimates of fitness of the hemizygotes harboring transposon insertions in a given allele of a given gene. This can derive from differing secondary mutations in the backgrounds of transposon mutants (see below); variability in the efficiency of the PCR amplifying different insertion sites; low representation of a given mutant in the bulk pool, leading to low sequencing coverage which weakens precision; and differences in position of the transposon insertion within the gene (e.g., transposons inserting at a 3’ gene end may have minimal phenotypic effect). For all these reasons, we consider it critical to generate very large transposon mutant pools and, in the final analysis, to exclude from testing any gene without a reasonable number of mutants in each of the two alleles. We note that, although we have not implemented it here, a barcoded transposon system7 could further help resolve issues of PCR bias and cut down on the cost and labor of an RH-seq experiment.
In conclusion, we have established a straightforward workflow for RH-seq and have specified the caveats of the approach. We find that these caveats do not significantly compromise the utility of RH-seq; we consider that it holds great promise for high-resolution, genome-scale dissection of the phenotypic consequences of genetic variation, including differences between species that have been reproductively isolated for millions of years.
The authors have nothing to disclose.
We thank J. Roop, R. Hackley, I. Grigoriev, A. Arkin and J. Skerker for their contributions to the original study, F. AlZaben, A. Flury, G. Geiselman, J. Hong, J. Kim, M. Maurer, and L. Oltrogge for technical assistance, D. Savage for his generosity with microscopy resources, and B. Blackman, S. Coradetti, A. Flamholz, V. Guacci, D. Koshland, C. Nelson, and A. Sasikumar for discussions; we also thank J. Dueber (Department of Bioengineering, UC Berkeley) for the PiggyBac plasmid. This work was supported by R01 GM120430-A1 and by Community Sequencing Project 1460 to RBB at the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility. The work conducted by the latter was supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Material/Equipment | Company | Catalog number | Comments |
1-2 plasmid Gigaprep kits | Zymo Research | D4204 | The number of kits required depends on how efficient your preps are in each kit. This kit comes with 5 individual plasmid prep columns. Run 1 L of saturated E. coli culture through each prep column, as using more than 1 L per column can cause clogging of the prep filter, leading to low yield and poor quality DNA. |
10X Tris-EDTA (TE) buffer (100 mM Tris-HCl and 10 mM EDTA) | Any | N/A | Filter sterilize through a 0.22 μm filter before use. |
1M LiOAc | Any | N/A | Filter sterilize through a 0.22 μm filter before use. |
300 mg/mL Geneticin (G418) | Gibco | 11811023 | |
52% polyethylene glycol (PEG) 3350 | Sigma | 1546547 | Dissolve in water and filter sterilize through a 0.22 μm filter before use. 1X transformation (trafo) mix: 228 μL 52% PEG, 36 μL 1M LiOAc, 36 μL 10X TE buffer |
Autoclaved LB liquid broth | BD Difco | 244620 | Make LB liquid broth using your powder from any brand, and milliQ water. Autoclave it before use. |
Carbenicillin stock in water (100 mg/mL) | Any | N/A | Filter sterilize through a 0.22 μm filter before use. |
Complete synthetic agar plates (24.1 cm x 24.1 cm) with 5-fluoroorotic acid (5-FOA) [0.2% drop-out amino acid mix without uracil or yeast nitrogen base (YNB), 0.005% uracil, 2% D-glucose, 0.67% YNB without amino acids, 0.075% 5-FOA] | 5-FOA: Zymo Research, Drop-out mix: US Biological, Uracil: Sigma, D-glucose: Sigma, YNB: Difco | 5-FOA: F9001-5, Drop-out mix: D9535, Uracil: U0750, D-glucose: G8270, YNB: DF0919 | |
DMSO | Any | N/A | |
E. coli strain carrying pJR487 (CEN-/ARS+ piggyBac-containing plasmid) | N/A | N/A | Request from Brem lab. |
Hybrid yeast strain JR507 (S. cerevisiae DBVPG1373 x S. paradoxus Z1, URA-/URA-) | N/A | N/A | Request from Brem lab. |
Illumina HiSeq 2500 | Illumina | N/A | Used for SE-150 reads. |
Large shaking incubators with variable temperature settings | Any | N/A | |
LB + carbenicillin agar plates (100 μg/mL) | Agar: BD Difco | Agar: 214010 | Make LB agar plates as normal and add carbenicillin to 100 μg/mL before drying. |
Nanodrop spectrophotometer | Thermo Scientific | ND-2000 | |
Qubit Fluorimeter | Thermo Scientific | Q33240 | |
Salmon sperm DNA | Invitrogen | 15632011 | |
Water bath at 39°C | Any | N/A | |
Yeast fungal gDNA prep kit | Zymo Research | D6005 | |
Yeast peptone dextrose (YPD) liquid media | BD Difco | Peptone: 211677, Yeast Extract: 212750 | Add filter-sterilized D-glucose to 2% after autoclaving. |
YPD + G418 agar plates (300 μg/mL) | Agar: BD Difco | Agar: 214010 | Make YPD agar plates as normal and add G418 to 300 μg/mL before drying. |
YPD agar plates | Agar: BD Difco | Agar: 214010 |