Human endogenous retroviruses (HERVs), which occupy 8% of the human genome, retain little coding capacity but contribute about a hundred thousand long terminal repeats (LTRs). A custom Affymetrix microarray was designed to measure the expression of individual HERV loci and was applied to prostate cancer tissues as a proof of concept for future clinical studies.
The prostate-specific antigen (PSA) is the main diagnostic biomarker for prostate cancer in clinical use, but it lacks specificity and sensitivity, particularly at low concentrations1. ‘How to use PSA' remains an open question, both for diagnosis, where a gray zone of serum concentrations between 2.5 and 10 ng/ml does not allow a clear distinction between cancer and noncancer2, and for patient follow-up, where post-operative PSA kinetic parameters can be difficult to apply in practice3,4. Alternatively, noncoding RNAs (ncRNAs) are emerging as key molecules in human cancer, with the potential to serve as novel disease markers, e.g. PCA3 in prostate cancer5,6, and to reveal uncharacterized aspects of tumor biology. Moreover, data from the ENCODE project published in 2012 showed that different RNA types cover about 62% of the genome. It also appears that the amount of transcriptional regulatory motifs is at least 4.5 times that of protein-coding exons. In this context, the long terminal repeats (LTRs) of human endogenous retroviruses (HERVs) constitute a wide range of candidate transcriptional regulatory sequences, since transcriptional regulation is their primary function in infectious retroviruses. HERVs, which are spread throughout the human genome, originate from ancestral and independent infections of the germ line, followed by copy-paste propagation processes that led to multicopy families occupying 8% of the human genome (by comparison, exons span 2% of our genome). Some HERV loci still express proteins that have been associated with several pathologies including cancer7-10. We have designed a high-density microarray, in Affymetrix format, to characterize the expression of individual HERV loci, in order to better understand whether they are active and whether they drive ncRNA transcription or modulate coding gene expression. This tool has been applied in the prostate cancer field (Figure 1).
Human endogenous retroviruses (HERVs) are spread throughout our genome. They originate from ancestral and independent infections of the germ line, followed by copy-paste propagation processes that led to multicopy families. Today, they are no longer infectious, but they occupy 8% of the human genome; as a point of comparison, exons span 2% of the human genome. Data from the ENCODE project published in 2012 showed that different RNA types cover about 62% of the genome, including one third in intergenic regions. Moreover, it appears that the amount of transcriptional regulatory motifs is at least 4.5 times that of protein-coding exons. HERV long terminal repeats (LTRs) represent a broad range of potential transcriptional regulatory elements, which is their usual function in infectious retroviruses. Historically, apart from a few loci expressed in the placenta or testis, HERVs were commonly believed to be silenced by epigenetic regulation. To test this assumption, we designed a high-density microarray, in Affymetrix format, to characterize the expression of individual HERV loci, in order to better understand whether they are active and whether they drive lncRNA transcription or modulate coding gene expression. This tool, dubbed the HERV-V2 GeneChip, integrates 23,583 HERV probesets and can discriminate 5,573 distinct HERV elements composed of solo LTRs as well as complete and partial proviruses (Figure 2).
Diagnosis, Assessment, and Plan:
Diagnosis of prostate cancer is based on measurement of the prostate-specific antigen (PSA) biomarker in the clinical laboratory, a digital rectal examination to evaluate morphological alterations of the prostate and, finally, prostate biopsies examined by the pathologist. The lack of sufficient specificity and sensitivity among conventional cancer biomarkers, such as PSA for prostate cancer, has been widely recognized after several decades of clinical use1. PSA was initially proposed for the diagnosis and treatment of adenocarcinoma of the prostate11. It was later proposed for cancer screening and for monitoring disease progression12. However, one question is still regularly asked: ‘how to use PSA'. (i) A gray zone corresponding to serum concentrations of 2.5-10 ng/ml does not allow a clear distinction to be made between cancer and noncancer2; (ii) two large cohort studies enrolling hundreds of thousands of people in Europe and the USA failed to reach a clear conclusion about the usefulness of screening in terms of disease-specific mortality13,14; (iii) post-operative PSA kinetic parameters such as PSA clearance, PSA velocity and doubling time, although simple in theory, can pose considerable challenges in practical application3,4. In the coming years, biomarker applications may be expected to support a clinical choice between watchful waiting and more or less aggressive treatments depending on tumor phenotype. Concerning the diagnosis rendered by the pathologist, a first limiting factor is the 20% false-negative rate of prostate biopsies (many cancers are missed by sampling). A second concern is the need for an additional biopsy procedure following a negative one, which may cause adverse effects.
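The gray zone in point (i) and two of the kinetic parameters in point (iii) can be made concrete with a short sketch. It assumes two serum PSA measurements and exponential (log-linear) PSA growth between them, a common convention that this protocol does not itself specify; treating both gray-zone bounds as inclusive is also an assumption, and all function names are hypothetical.

```python
import math

# Gray-zone bounds (ng/ml) as stated in the text; inclusive bounds are an assumption.
GRAY_ZONE_LOW = 2.5
GRAY_ZONE_HIGH = 10.0


def psa_zone(psa_ng_ml):
    """Place a serum PSA value relative to the 2.5-10 ng/ml gray zone."""
    if psa_ng_ml < GRAY_ZONE_LOW:
        return "below gray zone"
    if psa_ng_ml <= GRAY_ZONE_HIGH:
        return "gray zone"
    return "above gray zone"


def psa_velocity(psa_start, psa_end, years):
    """Linear rate of change in ng/ml per year between two measurements."""
    return (psa_end - psa_start) / years


def psa_doubling_time(psa_start, psa_end, years):
    """Doubling time in years, assuming exponential PSA growth between measurements."""
    if psa_start <= 0 or psa_end <= psa_start:
        raise ValueError("doubling time requires a positive, rising PSA")
    return years * math.log(2) / math.log(psa_end / psa_start)
```

For instance, a post-operative rise from 0.2 to 0.8 ng/ml over one year corresponds to two doublings, i.e. a doubling time of 0.5 year. The sketch deliberately leaves out PSA clearance and any clinical decision thresholds.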
Radical prostatectomy is currently one of the standard treatments for prostate cancer. It is proposed to healthy patients aged 45-65 years, especially in the case of aggressive patterns (Gleason score 7 to 10), multifocal tumors or palpable tumors. In our department, it is now performed by robot-assisted surgery. Because of growing evidence that molecular markers will be of paramount importance in the coming years, we decided to offer all our patients the opportunity to participate in a prostate tissue banking program. More precisely, expanding molecular research programs on prostate cancer have increased the need for access to high-quality fresh tumor tissues from prostatectomy specimens. This research, in particular genomic approaches, requires large samples of high-quality DNA/RNA, and both tumoral and adjacent ‘non-tumoral' tissues from the same patient are needed. Recommendations for handling and processing radical prostatectomies are designed to preserve the pathological features that determine stage and margin status, and thereby further treatment and prognosis. To be acceptable, any fresh tissue sampling method must therefore not compromise subsequent pathological assessment. Macroscopic dissection of the prostate is difficult, and great attention must be paid to margin tissues and capsular invasion: any dissection for prostate banking should always be conducted by a trained uropathologist according to an agreed protocol. The ethics committee of the medical faculty and the state medical board approved these investigations, and informed consent was obtained from all patients included in the prostate tissue bank.
1. Surgery
Once the prostate has been removed by the surgeon, keep it on ice until it is handled by a pathologist.
2. Handling of Prostate Tissues
3. RNA Extraction, Purification, and Quality Control
4. WT-Ovation RNA Amplification
Recommendations for performing the amplification steps with the WT-Ovation amplification kit under optimal conditions:

Reagent | Volume |
---|---|
First Strand Buffer Mix (A2) | 5 µl |
Poly-A RNA Control (1:25,000) | 0.5 µl |
First Strand Enzyme Mix (A3) | 0.5 µl |

Reagent | Volume |
---|---|
Second Strand Buffer Mix (B1) | 9.75 µl |
Second Strand Enzyme Mix (B2) | 0.25 µl |

Reagent | Volume |
---|---|
Second Strand Buffer Mix (B1) | 1.9 µl |
Reaction Enhancement Enzyme Mix (B3) | 0.1 µl |

Reagent | Volume |
---|---|
SPIA-Buffer Mix (C2) | 5 µl |
SPIA-Primer Mix (C1) | 5 µl |
SPIA-Enzyme Mix (C3) | 10 µl |
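When several samples are processed together, the volumes listed above are usually pooled into a master mix. A minimal sketch follows. It assumes that the table volumes are per reaction (the tables do not state this explicitly) and uses a hypothetical 10% overage to compensate for pipetting losses. The kit manual remains the reference for the actual overage.

```python
def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes (µl) to a master mix for n_reactions.

    overage is a hypothetical safety margin (0.10 = 10% extra volume).
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    # Round to 0.01 µl, well below pipetting precision.
    return {reagent: round(volume * factor, 2) for reagent, volume in per_reaction_ul.items()}


# Per-reaction volumes taken from the SPIA amplification table above.
SPIA_MIX = {
    "SPIA-Buffer Mix (C2)": 5.0,
    "SPIA-Primer Mix (C1)": 5.0,
    "SPIA-Enzyme Mix (C3)": 10.0,
}

mix = scale_master_mix(SPIA_MIX, n_reactions=8)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} µl")
```

For 8 reactions with a 10% overage, this gives 44.0 µl each of C2 and C1 and 88.0 µl of C3. The same function applies unchanged to the first- and second-strand tables.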
5. sscDNA Purification and Quality Control
6. sscDNA Fragmentation
Reagent | Volume |
---|---|
10X One-Phor-All Buffer PLUS | 3.6 µl |
DNase I (0.2 U/µl) | 3 µl |
7. Labeling of Fragmented sscDNA
Reagent | Volume |
---|---|
5x TdT Reaction Buffer | 14 µl |
CoCl2 (25 mM) | 14 µl |
DLR-1a (5 mM) | 1 µl |
Terminal transferase (400 U/µl) | 4.4 µl |
8. Hybridization to the HERV Chip Microarray
Reagent | Volume |
---|---|
Control Oligo B2 (3 nM) | 3.3 µl |
20x Eukaryotic Hybridization Control | 10 µl |
2x Hybridization Mix | 100 µl |
99.9% DMSO | 17.7 µl |
9. Washing and Staining
10. Scanning
11. Data Analysis
The value of transcriptomic studies lies primarily in the quality of the starting biological material. When RNA extraction is performed under optimal conditions, the RNA Integrity Number (RIN) is typically 7 or greater (Figure 4A). Because 2 µg of cDNA must be hybridized on the Affymetrix HERV-V2 chip, an amplification step is required. A successful amplification leads to a bell-shaped size distribution (Figure 4B). DNase I fragmentation is then performed to homogenize the cDNA size distribution around 100 nucleotides before hybridization (Figure 4C). After hybridization and scanning (Figure 4D), visual inspection of the image enables one to check that the grid is well aligned to the spots (Figure 4E) and that the hybridization controls are consistent (Figure 4F). This step is also useful to exclude microarrays in which air bubbles or other errors occurred during the experiment.
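These acceptance criteria can be gathered into a per-sample checklist. The sketch below is hypothetical. It takes the RIN threshold (7), the cDNA amount (2 µg) and the fragment target (about 100 nt) from the text, and the spike-in orderings from the Figure 5 caption (polyA controls decreasing from Dap to Lys, hybridization controls increasing from BioB to Cre). The 50 nt fragment tolerance and the data structure are assumptions, and spike-in signals are assumed to be normalized log2 intensities.

```python
from dataclasses import dataclass, field

MIN_RIN = 7.0                 # from the text
REQUIRED_CDNA_UG = 2.0        # amount hybridized per HERV-V2 chip
TARGET_FRAGMENT_NT = 100      # expected fragment size after DNase I
FRAGMENT_TOLERANCE_NT = 50    # hypothetical tolerance, not from the protocol

POLYA_SPIKES = ["Dap", "Thr", "Phe", "Lys"]   # expected decreasing signal
HYB_SPIKES = ["BioB", "BioC", "BioD", "Cre"]  # expected increasing signal


@dataclass
class ChipSample:
    name: str
    rin: float
    cdna_ug: float
    fragment_peak_nt: float
    spike_signals: dict = field(default_factory=dict)  # control name -> log2 signal


def _strictly_ordered(values, increasing):
    return all((b > a) if increasing else (b < a) for a, b in zip(values, values[1:]))


def qc_failures(sample):
    """Return human-readable QC failures; an empty list means the sample passes."""
    failures = []
    if sample.rin < MIN_RIN:
        failures.append(f"RIN {sample.rin} < {MIN_RIN}")
    if sample.cdna_ug < REQUIRED_CDNA_UG:
        failures.append(f"only {sample.cdna_ug} µg of amplified cDNA")
    if abs(sample.fragment_peak_nt - TARGET_FRAGMENT_NT) > FRAGMENT_TOLERANCE_NT:
        failures.append(f"fragment peak at {sample.fragment_peak_nt} nt")
    s = sample.spike_signals  # a missing control name raises KeyError
    if not _strictly_ordered([s[k] for k in POLYA_SPIKES], increasing=False):
        failures.append("polyA spike-ins not decreasing Dap > Thr > Phe > Lys")
    if not _strictly_ordered([s[k] for k in HYB_SPIKES], increasing=True):
        failures.append("hybridization spike-ins not increasing BioB < BioC < BioD < Cre")
    return failures
```

The visual checks (grid alignment, air bubbles) are not covered here and still require inspecting the scanned image.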
Once the chips have passed QC (Figure 5) and after normalization, statistical analysis of 5 matched pairs of tumor and normal prostate RNA samples from the Lyon-Sud Hospital identified 207 HERV probesets with differential expression (p < 0.05) (Figure 6A). To strengthen these results and to obtain prostate-specific information, 35 additional matched-pair samples (colon, ovary, testis, breast, lung and prostate) were added to the analysis, and the SAM-FDR procedure (FDR = 20%) eventually identified 44 prostate-specific HERV probesets. The 10 most relevant HERV structures among them are described (Figure 6B). Further clinical studies will be required to assess the sensitivity and specificity of these candidate biomarkers.
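The FDR estimation step can be illustrated with a simplified, SAM-like sketch for paired tumor/normal data. This is not the SAM software used for the analysis above. It computes a moderated paired statistic, d = mean difference / (standard error + s0), and estimates the number of false calls by enumerating every sign flip of the pair differences, which is equivalent to swapping the tumor and normal labels within pairs. The null proportion is taken as 1 and s0 is fixed at 0.1; both are hypothetical choices, and all names are illustrative.

```python
import itertools
import math
import statistics

S0 = 0.1  # hypothetical fudge factor added to the standard error, as in SAM


def paired_d(diffs, s0=S0):
    """Moderated paired statistic: mean difference / (standard error + s0)."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / (se + s0)


def sam_like_fdr(tumor, normal, cutoff, s0=S0):
    """Call probesets with |d| >= cutoff and estimate their FDR.

    tumor, normal: dict probeset -> list of normalized log2 values,
    with matched pairs in the same order in both dicts.
    Returns (sorted list of called probesets, estimated FDR).
    """
    names = sorted(tumor)
    diffs = {p: [t - n for t, n in zip(tumor[p], normal[p])] for p in names}
    n_pairs = len(diffs[names[0]])
    called = [p for p in names if abs(paired_d(diffs[p], s0)) >= cutoff]

    # Null distribution: every pattern of swapping tumor/normal within pairs,
    # i.e. flipping the sign of each pair difference (2**n_pairs patterns).
    # The same pattern is applied to all probesets to preserve their correlation.
    null_counts = []
    for signs in itertools.product((1, -1), repeat=n_pairs):
        n_null = sum(
            1
            for p in names
            if abs(paired_d([s * x for s, x in zip(signs, diffs[p])], s0)) >= cutoff
        )
        null_counts.append(n_null)

    # Median number of null calls divided by observed calls (null proportion = 1).
    expected_false = statistics.median(null_counts)
    fdr = min(expected_false / len(called), 1.0) if called else 0.0
    return called, fdr
```

With 5 pairs there are only 32 sign patterns, so exhaustive enumeration is cheap. With the 40 matched pairs of the extended analysis, a random sample of sign patterns would be needed instead.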
Figure 1. Scheme of the overall procedure from the clinic (1: prostatectomy by the clinician and the tissue preparation by the pathologist) to the bench (2-6: sample preparation, target preparation, microarray processing) leading to the identification of candidate biomarkers (7: biocomputing analysis of the HERV microarrays). Nucleic acids derived from normal tissue are depicted in orange; nucleic acids derived from tumoral area consist of a mix of normal (orange) and tumor specific (black) nucleic acids.
Figure 2. Design and content of the HERV-V2 chip: HERV sequences retrieved from the human genome are stored in a database called HERV-gDB3; the 25-mer candidate probes then pass through a dedicated hybridization modeling procedure (EDA+) before being synthesized on the array (the resulting targeted sub-regions are depicted for each family).
Figure 3. Prostate handling by the pathologist. (A) The fresh radical prostatectomy specimen is transferred to the laboratory. (B-C) The prostate is stained (green on the right side, black on the left side). (D) Large transverse section of the gland on the posterior side. (E) Leaving the margins intact, pieces of tissue are dissected from different areas of the prostate gland. (F) Tissue cores are placed in an Eppendorf tube. (G) Suture thread is used to close the prostate, to prevent gland distortion and to minimize disruption of the surgical margin. The radical prostatectomy specimen is then ready for formalin fixation according to the usual procedure for histological analysis.
Figure 4. Quality controls of nucleic acid preparation and hybridization efficiency. (A) RNA integrity, (B) cDNA amplified targets and (C) fragmented targets used in the hybridization stage. These three quality controls were obtained with the Bioanalyzer using RNA Nano chips and the Eukaryote Nano Series II assay. (D) Overall image of the HERV-V2 microarray hybridization area after scanning, (E) enlargement of the upper left corner showing grid alignment controls and (F) enlargement of the center area showing spotting hybridization controls.
Figure 5. Processing of signals. (A) Affymetrix polyA spike-in amplification controls. The polyA controls Dap, Thr, Phe and Lys, transcripts from B. subtilis genes, are spiked into the RNA sample and serve to assess the overall success of the target preparation steps. Signal intensities should decrease from Dap to Lys, confirming that the WT-Ovation amplification introduced no bias between highly and lowly expressed genes. (B) Affymetrix spike-in hybridization controls. These targets, derived from E. coli and bacteriophage P1, are spiked in before the labeling procedure. Increasing values from BioB, BioC and BioD to Cre indicate the overall success of the hybridization. (C) Intensity distribution of the chip signals after RMA normalization. Most probesets exhibit signals lower than 2⁶ (background), indicating that expression is mainly restricted to some specific HERV loci.
Figure 6. Data analysis. (A) Hierarchical clustering analysis of normal and tumoral samples. Partitioning clustering was applied to the normalized expression values using a Euclidean distance function, grouping probesets into up-regulated (red) and down-regulated (blue) sets among normal and tumoral samples. (B) Selection of the top 10 HERV structures identified as candidate biomarkers of prostate cancer. For each HERV element, the related HERV family, the genomic coordinates (NCBI 36/hg18) and a brief description of the HERV structure are given.
Figure 7. The HERV repertoire. (A) Sequencing of the human genome revealed 25,000 protein-coding genes (exons, 2%) and a huge number of transposable elements, including 200,000 long terminal repeat (LTR) retrotransposons (HERV, 8%). (B) Extrapolation from the HERV-V2 chip content and associated expression data (79 samples originating from 8 tissue types, normal versus tumoral) suggests that one third of the HERV repertoire is transcriptionally active.
Figure 8. Functional interpretation of signals from the chip. (A) Promoter identification and epigenetic control: a negative U3 signal (red probe, 5'LTR) together with a positive R-U5 signal (blue probe, 5'LTR) suggests U3-driven transcription, supported by the differential CpG methylation (solid black circles) of U3 in peritumoral normal versus tumoral tissues. (B) Splicing strategy: the putative 3.1 kb envelope-encoding mRNA expressed exclusively in the tumor is identified using a probe overlapping the SD1/SA2 splice junction. *Deduced by comparison with other non-placental tissues.
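The sample clustering shown in Figure 6A can be sketched as agglomerative hierarchical clustering on Euclidean distances between normalized expression profiles. The caption states neither the linkage rule nor the software used, so average linkage is an assumption here. The helper names are hypothetical.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distance(c1, c2, profiles):
    """Average linkage: mean distance over all cross-cluster sample pairs."""
    dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def agglomerate(profiles):
    """Repeatedly merge the two closest clusters until one remains.

    profiles: dict sample -> list of normalized expression values
    (same probeset order for every sample).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in sorted(profiles)]
    history = []
    while len(clusters) > 1:
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        dist = average_distance(clusters[i], clusters[j], profiles)
        history.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history
```

The last entry of the history is the top-level split of the dendrogram. For matched tumor/normal samples with a strong tumor signature, this split is expected to separate the two groups. The naive pairwise search is adequate for a few dozen samples but not for clustering thousands of probesets.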
Over the last 10 years, most attempts to measure HERV expression have used RT-PCR techniques, either focusing on a specific locus20-24 or relying on the relative conservation of pol genes to evaluate general trends within HERV genera25,26. Additionally, PCR amplification with highly degenerate primers coupled with low-density microarrays has been used to detect and quantify the expression of HERV families27,28. To trace the expression of individual loci within a family, approaches based on PCR amplification of conserved regions followed by cloning and sequencing enabled transcriptionally active elements of the HML-229,30 and HERV-E4.131 families to be identified. The genome repeat expression monitoring technique, which aims to identify promoters among repeats and also ends with cloning and sequencing steps, identified active HML-2-specific human solitary LTRs32,33. We successively developed two generations of high-density microarrays dedicated to the analysis of the HERV transcriptome, introducing probe design methodologies suited to repeated elements in order to minimize cross-reactions between paralogous elements within a family34,35. The HERV-V2 chip, which targets 2,690 distinct proviruses and 2,883 solo LTRs of the HERV-W, HERV-H, HERV-E 4.1, HERV-FRD, HERV-K HML-2 and HERV-K HML-5 families, unveiled the expression of 1,718 HERV loci (Figures 7A and B) in a wide range of tissues35, illustrated in this paper by the identification of putative prostate cancer biomarkers. In addition, the use of multiple probesets on a given locus is informative about its transcriptional regulation. First, a negative U3 signal in conjunction with a positive U5 signal classifies the LTR as a promoter, whereas positive U3 and negative U5 signals may reflect a polyadenylation role.
We thus identified 326 promoter LTRs in a broad range of tissues35 and, based on the dichotomous U3/U5 information provided by the array, we proposed, and experimentally confirmed for selected cases, that such autonomous transcription was controlled by a methylation-dependent epigenetic process34 (Figure 8). Second, the detection of signals from independent probesets targeting, e.g., LTR, gag and env regions, or from probes targeting specific splice junctions, is informative about the proviral splicing strategy, as illustrated by the ERVWE1/Syncytin-1 expression profile in placenta or in tumoral testis34. This indicates that the HERV-specific probe selection process is robust enough to support the identification of tissue-associated splicing strategies, as efficiently as for conventional genes36 (Figure 8).
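The U3/U5 rule described above can be written as a small decision function. The detection threshold is a hypothetical background cutoff on normalized log2 signals. The labels for the two concordant cases (both detected, neither detected) are illustrative additions, since the text only defines the two discordant cases.

```python
def classify_ltr(u3_signal, u5_signal, threshold):
    """Infer a putative LTR role from its U3 and U5 probeset signals.

    threshold: hypothetical detection cutoff on normalized log2 signal.
    """
    u3_detected = u3_signal >= threshold
    u5_detected = u5_signal >= threshold
    if u5_detected and not u3_detected:
        # Transcription initiated at the U3/R boundary covers R-U5 but not U3.
        return "promoter"
    if u3_detected and not u5_detected:
        # Transcripts entering from upstream and ending at the R/U5 boundary
        # cover U3 but not U5, consistent with a polyadenylation role.
        return "polyadenylation"
    if u3_detected and u5_detected:
        return "undetermined (read-through transcription possible)"
    return "not detected"
```

In practice, such calls would be made on replicate-averaged signals and, as described above, confirmed experimentally, for example by methylation analysis of U3.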
This method is the first attempt to measure the expression of individual HERV loci using a custom high-density microarray based on Affymetrix technology. The main advantages of the microarray format for deciphering the HERV transcriptome are (i) the coordinated exploration of several HERV families and (ii) the simultaneous and independent analysis of the different regions of each locus, e.g. U3 and U5 domains for solo and proviral LTRs, gag or env regions and possible splice junctions associated with proviral structures, without any a priori assumption about the functionality of the HERV element. Future prospects depend on improved annotation in the microarray-associated biocomputing tools. This should allow chip signals to be converted into biological hypotheses, such as whether active HERVs drive lncRNA transcription or modulate the expression of more or less proximal coding genes. Indeed, this hypothesis is supported by recent studies that identified prostate cancer-associated ncRNA transcripts containing components of viral ORFs from the HERV-K endogenous retrovirus family or portions of a viral LTR promoter region37, as well as two gene fusion events, namely HERV-K22q11-ETV1 and HERV-K17-ETV138,39. Taken together, this whole-transcriptome approach, combined with the identification of LTR function and splicing strategies, may help to distinguish the marker from the trigger components of HERV expression in chronic40,41 and infectious diseases42,43.
The authors have nothing to disclose.
We thank Cecile Montgiraud, Juliette Gimenez, Magali Jaillard and Bertrand Bonnaud for their contribution to the initial development and optimization of the HERV-V2 protocol. We also wish to thank Hader Haidous for his guidance on ethical considerations.
Material Name | Company | Catalogue Number | Comments |
---|---|---|---|
TRIzol | Invitrogen | 15596-026 | |
RNA poly-A control stock | Affymetrix | 900433 | |
DNase I | Promega | M6101 | 1,000 U (1 U/µl) |
Terminal transferase | Roche | 3333574001 | 400 U. Includes enzyme and cofactor (CoCl2). |
DLR-1a | Affymetrix | 900542 | |
Hybridization internal controls B2 and 20x Eukaryotic Hybridization Control | Affymetrix | 900454 | |
GeneChip Hybridization, Wash, and Stain Kit | Affymetrix | 900720 | Includes PreHybridization Mix and 2x Hybridization Mix for 30 reactions |
10x One-Phor-All Buffer PLUS | | | Composition in DEPC-treated water: 100 mM Tris-acetate pH 7.5; 100 mM magnesium acetate; 500 mM potassium acetate. |
RNeasy Mini kit | Qiagen | 74104 | RNA cleanup protocol |
WT-Ovation RNA amplification system | NuGEN | 2210-24 | |
QIAquick PCR purification kit | Qiagen | 28104 | |

EQUIPMENT

Material Name | Company | Catalogue Number | Comments |
---|---|---|---|
Nanodrop 1000 | Thermo Scientific | ||
GeneChip Scanner 3000 7G | Affymetrix | GS30007G | Optional: autoloader |
GeneChip Fluidics Station 450 | Affymetrix | FS450 | |
GeneChip Hybridization 640 Oven | Affymetrix | 640 | Includes 4 GeneChip Probe array carriers |
Workstation loaded with GeneChip Operating Software (GCOS) including the GeneChip Scanner 3000 High-Resolution Scanning Patch | |||
HERV-V2 chip | Affymetrix | Custom array. For microarray availability (for research use only), please contact: François Mallet Laboratoire Commun de Recherche Hospices Civils de Lyon-bioMérieux Medical Diagnostic Discovery Department Centre Hospitalier Lyon Sud, Bâtiment 3F 69495, Pierre Bénite cedex France Phone: 33 (0)4 72 67 87 85 Email: francois.mallet@biomerieux.com |
HERV-V2 design
Dedicated database and annotations
The dedicated database, grouping genomic HERV sequences belonging to 6 HERV families, was built by the following procedure: (i) the most complete and representative sequence of each HERV family was selected from the literature and defined as a prototype sequence (Figure 2); (ii) the 6 prototypes were functionally annotated with reference to their LTR (U3/R/U5) and internal parts (gag/pol/env); (iii) RepeatMasker44 was then applied using these functional sequences as input libraries, and a genome-wide search for all related sequences was performed over the human genome (NCBI 36/hg18) with a minimum of 80% homology; (iv) finally, the functional sequences retrieved by this process were assembled into distinct loci on the basis of their genomic location and implemented in a dedicated HERV database. This database, called HERV-gDB3, contains 10,035 individual HERV loci35.
Locus-specific probe design
Starting from HERV-gDB3, overlapping tracks of 25-mer candidate probes were first generated. Each candidate probe was then aligned against the human genome using KASH45 to assess its cross-hybridization potential. This estimation was performed by a model developed specifically for this purpose, referred to as EDA+. Briefly, EDA+ takes into account the instability introduced by mismatches and gaps in a 25-mer target/probe hybridization complex. Candidate probes with a low cross-hybridization risk (i.e., a low number of non-specific genomic targets) were selected and finally assembled into probesets.
Custom HERV GeneChip microarray
The custom HERV GeneChip integrates 23,583 HERV probesets and can discriminate 5,573 distinct HERV elements, composed of solo LTRs and complete and partial proviruses (Figure 2). The standard Affymetrix control probes for unbiased amplification and hybridization were also included on the microarray.
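The first step of locus-specific probe design can be sketched as below: tiling overlapping 25-mer candidates and screening them for off-target genomic matches. Neither EDA+ nor KASH is reproduced here. As a crude stand-in, the sketch counts genomic windows within a small Hamming distance of each probe. It scans one strand only and ignores gaps and duplex stability, which EDA+ does model. All function names, the mismatch limit and the hit limit are hypothetical.

```python
def tile_probes(sequence, k=25, step=1):
    """Overlapping k-mer candidate probes as (offset, probe) pairs."""
    return [(i, sequence[i:i + k]) for i in range(0, len(sequence) - k + 1, step)]


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def off_target_hits(probe, genome, on_target_start, max_mismatches=2):
    """Count genome windows (excluding the on-target one) within max_mismatches."""
    k = len(probe)
    return sum(
        1
        for i in range(len(genome) - k + 1)
        if i != on_target_start and mismatches(probe, genome[i:i + k]) <= max_mismatches
    )


def select_probes(genome, locus_start, locus_end, k=25, step=1,
                  max_mismatches=2, max_hits=0):
    """Keep candidate probes of one locus whose off-target hit count is <= max_hits."""
    locus = genome[locus_start:locus_end]
    selected = []
    for offset, probe in tile_probes(locus, k, step):
        hits = off_target_hits(probe, genome, locus_start + offset, max_mismatches)
        if hits <= max_hits:
            selected.append((locus_start + offset, probe))
    return selected
```

A brute-force scan like this is only practical on toy sequences. Genome-wide screening of HERV-gDB3 probes requires an indexed aligner such as the KASH step described above.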