Call For Papers: 2nd International Symposium on Animal Functional Genomics
1 Genus Plc., Hendersonville, Tennessee
2 Department of Animal Science and Center for Animal Functional Genomics, Michigan State University, East Lansing, Michigan
ABSTRACT
A growing body of evidence implicates the oocyte as a key regulator of ovarian folliculogenesis and early embryonic development. We have screened bovine cDNA microarrays (containing expressed sequence tags representing >15,000 unique genes) with Cy3- and Cy5-labeled cDNA derived from bovine oocyte samples collected at two different stages of meiotic maturation (germinal vesicle vs. metaphase II; n = 3 samples per group). Here, we present a novel data analysis approach that uses all available information from the above experiments to obtain and index the transcriptome of bovine oocytes and changes in transcriptome composition in response to meiotic maturation. Signal intensities (Fg) for all housekeeping genes were omitted prior to analysis. A local threshold for gene expression was computed as background intensity (Bg) plus 2 times the standard deviation of background and foreground signals. Within each array, data were normalized by the LOWESS procedure. Subsequently, a two-stage mixed model was fitted to remove systematic variations. In the first stage, the response was the LOWESS normalized Fg with treatment as a fixed effect. In stage 2, the residuals from stage 1 were analyzed in a gene-specific model that included treatment group and spots nested within patch and array. A test for the difference between least squares means for the treatment effect was performed. A false discovery rate (FDR) adjustment on the p values for the difference was carried out. This novel algorithm was compared with approaches that ignore the FDR and the threshold described herein, and stark differences were obtained.
bovine; microarray; local threshold; mixed model; false discovery rate
ABUNDANT EVIDENCE INDICATES the oocyte is a key regulator of fertility. A developmental program intrinsic to the oocyte controls the overall rate of ovarian follicular development (8), and oocyte-derived transcripts are critical for early embryonic development before initiation of transcription from the embryonic genome (18). However, in general, the composition of the oocyte transcriptome (catalog of genes expressed in female germ cells) and temporal regulation of the majority of oocyte-expressed genes are not well understood. Furthermore, improvements in procedures for in vitro meiotic maturation of oocytes are critical to enhance application of relevant biotechnologies such as in vitro embryo production and nuclear transfer cloning, and such improvements depend on enhanced knowledge of the regulation of meiotic maturation at the transcriptome level.
The merits of microarray-based transcriptome analysis have been established (1, 20). In microarray experiments, optical fluorescence intensity is the main cause for background intensity (19). Background signal can also be attributable to contamination from the hybridization or washing procedures (19). Therefore, it is customary to adjust the observed foreground intensity for the background intensity (21). However, subtracting the signal background from the foreground intensity introduces negative estimates of gene expression for features with a low dynamic range. Consequently, the high local background hinders quantification of mRNA abundance. Model-based approaches for background adjustment have been proposed (3, 7, 11, 13, 22). Most of these methods depend on the proper choice of positive and (or) negative controls. For this reason, a local threshold criterion for detecting differentially expressed genes that does not depend on representative positive and negative controls is appealing.
In transcriptome profiling experiments of this nature, traditional approaches for controlling the error rates in the presence of a large number of comparisons include conservative and liberal control of family-wise error rates (FWER), using procedures such as Bonferroni correction. When multiplicity exists, these procedures can generate false positives and false negatives. Potential sources of multiplicity include comparison of several treatment or dose groups or genes, multiple endpoints, multiple time points, interim analysis, multiple tests of the same hypothesis (e.g., parametric and nonparametric), variable and model selection, and subgroup analysis. The false discovery rate (FDR) provides an alternative quantification of error under a multiplicity of comparisons. Possible outcomes from multiple comparisons are given in Table 1.
Let π0 be the proportion of genes for which the null hypothesis is true. An estimate of π0 may be taken as the value that solves the equation (16):
![]() |
where λ < 1 is a tuning parameter, assumed here to be 0.09. In multiple testing, the multiplicity-adjusted p value for a particular null hypothesis being tested is the smallest FWER at which the test may be declared significant. Analogously, the q value (16) is the smallest FDR at which the test may be declared significant:
![]() |
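As a rough illustration of the two quantities just defined, the following Python sketch estimates π0 from the fraction of p values above the tuning parameter λ and then converts p values to q values. It assumes the commonly used estimator #{p > λ} / (m(1 − λ)), which may differ in detail from the equation cited as (16). The function names `estimate_pi0` and `q_values` are hypothetical and are not part of the original analysis, which was carried out in SAS.

```python
def estimate_pi0(pvalues, lam=0.09):
    """Estimate the proportion of true nulls (assumed Storey-type form).

    Counts the p values above the tuning parameter lam and rescales by
    1 / (m * (1 - lam)); the result is capped at 1.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def q_values(pvalues, pi0=1.0):
    """Return one q value per test, in the input order.

    The q value of a test is taken as the smallest estimated FDR,
    pi0 * m * p / rank, over its own rank and all higher ranks.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the minimum is cumulative.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q
```

With `pi0=1.0` the q values reduce to Benjamini and Hochberg adjusted p values; a smaller π0 estimate scales them down proportionally.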
We used a bovine cDNA array containing expressed sequence tags (ESTs) representing approximately 15,200 unique genes (17) to partially characterize the bovine oocyte transcriptome and changes in transcriptome composition in bovine oocytes collected at two stages of meiotic maturation [germinal vesicle (GV) and metaphase II (MII)]. In this report, an estimate of π0 from a two-component mixture model (2) was used to obtain an FDR cut-off for the Benjamini and Hochberg (4) step-up procedure. We then compared the number of significant genes with and without FDR adjustments and with and without consideration of the "signal above background" threshold. We anticipated that using FDR adjustments would lead to the inclusion of genes that would otherwise have been ignored by inferences derived from raw p values.
MATERIALS AND METHODS
GV and MII oocyte collection.
Ovaries from adult animals were collected at a local abattoir and transported to the laboratory in sterile 0.25 M NaCl. Upon return to the laboratory, ovaries were washed in sterile 0.25 M NaCl, cumulus-oocyte complexes (COCs) were aspirated and selected (those with more than four compact layers of cumulus cells and homogeneous cytoplasm), and cumulus cells were denuded as described previously (5). The denuded GV oocytes (3 pools of 20 oocytes) were snap-frozen in 100 µl of lysis solution (RNAqueous Micro Kit; Ambion, Austin, TX) and stored at −80°C until RNA isolation.
For collection of MII oocytes, GV stage COCs (from adult ovaries; collected as described above) were matured in vitro as described previously (5). Oocytes with expanded cumulus were denuded, selected based on the presence of a single polar body, and processed in groups of 20 (n = 3) as described above.
RNA extraction.
Total RNA was extracted from each pool of GV and MII oocytes using the RNAqueous micro kit (Ambion) according to the manufacturer's instructions. RNA was eluted twice from the silica-based microfilter cartridge using a 10-µl volume of prewarmed (75°C) elution solution according to the manufacturer's instructions.
Total RNA amplification and cDNA microarray analysis.
Total RNA (10 µl) from the pools of GV and MII oocytes (n = 3 each) was amplified using the RiboAmp kit (Arcturus, Mountain View, CA) as described previously (14). The quality and quantity of the amplified RNA generated were estimated with a UV spectrophotometer (Beckman Instruments, Fullerton, CA) and the Bioanalyzer 2100 RNA 6000 nanochip (Agilent Technologies, Waldbronn, Germany).
Microarray experiments were conducted using procedures described previously (14) and a bovine cDNA array containing expressed sequence tags (ESTs) representing approximately 15,200 unique genes (17). A total of 15 µg of amplified RNA from GV and MII oocytes was used for cDNA synthesis and labeling.
Statistical analysis.
A novel algorithm for characterization of the oocyte transcriptome was developed as follows. Firstly, within channel (Cy3 or Cy5), the local "signal above background" threshold for significance was (1):
![]() |
![]() |
where εijkl is the random residual error. In the second step, the residuals from the first step for each gene i were used to fit the model:
![]() |
Real-time PCR validation of transcriptome analysis.
Real-time PCR procedures were used to confirm oocyte expression of a subset of genes identified as components of the bovine oocyte transcriptome by analysis of the described microarray data with the above algorithm. Approximately 15 genes located either just above the detection threshold or just below the median point of genes above the detection threshold were selected for analysis. Gene names, GenBank accession numbers, and primer sequences for the selected genes are detailed in Table 2. Real-time PCR analysis was performed as described previously (5), with cDNA derived from RNA isolated from GV and MII oocytes used as template. Oocyte expression of an individual gene was considered confirmed when the amplification profile reached threshold and an amplification plateau within 38 cycles and when melting curve analysis yielded a single peak at the predicted Tm.
Determining the FDR cut-off.
A standard analysis for gene expression data uses an FDR cut-off of 5% that is motivated by the traditional p < 0.05 cut-off for estimating a critical region (10). Arbitrary FDR cut-offs of 10% (9) or higher, e.g., 20% (6), have been reported. Figure 1 shows a probability histogram of p values from the two-component mixture model. This plot differs from the uniform distribution because some of the null hypotheses that were tested did not hold. Only a few of the p values correspond to genes that were differentially expressed at the 5% level of significance. While the primary objective was to characterize the oocyte transcriptome, it is worth noting that a majority of the p values corresponded to genes where the null hypothesis of no difference between GV and MII holds. Therefore, an estimate of π0 could be obtained from the mixture model. A plot of the quantiles of the q values and p values from the two-component mixture model is shown in Fig. 2. A quantile is the value below which a given fraction (or percent) of the data points fall. For example, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value. The data points in Fig. 2 fall approximately along a 45-degree line. This feature demonstrates no lack-of-fit for the mixture model. The estimated value for π0 was 0.52; thus an FDR cut-off of 52% was used.
We used a gene-centric local threshold to detect transcripts with signal above background. As already mentioned, alternative thresholds that use spike-in controls have been proposed (3, 7, 13, 22). Ease of implementation and performance are factors that currently influence which method is routinely used to correct for the background intensity in a given experiment. Recall that the mean signal intensity, mean background variance, and mean foreground variance were used to compute the threshold. The choice of which measure to use in generating the threshold depends entirely on the software used for image analysis.
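To make the thresholding idea concrete, here is a minimal Python sketch of a per-spot "signal above background" check. The exact combination of the variance terms is not recoverable from the text, so the sketch assumes the threshold is the mean background plus 2 times the square root of the summed background and foreground variances. That form, the multiplier argument `k`, and the function names are illustrative assumptions, not the authors' definition.

```python
import math


def local_threshold(bg_mean, bg_var, fg_var, k=2.0):
    """Assumed local threshold for one spot.

    bg_mean : mean background intensity (Bg)
    bg_var  : background variance reported by the image analysis software
    fg_var  : foreground variance reported by the image analysis software
    k       : number of standard deviations above background (2 in the text)
    """
    return bg_mean + k * math.sqrt(bg_var + fg_var)


def is_expressed(fg_mean, bg_mean, bg_var, fg_var, k=2.0):
    """Flag a spot as detected when its foreground exceeds the local threshold."""
    return fg_mean > local_threshold(bg_mean, bg_var, fg_var, k)
```

Because the rule uses only quantities measured at the spot itself, it needs no positive or negative control features, which is the property the discussion emphasizes.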
To conclude, a local threshold that is based on the variation in the foreground and background signal provides a potentially meaningful basis for whole transcriptome analysis. The analytical approach described facilitates identification of factors that can explain the variability in the data and design of microarray experiments that allow for statistical treatment of the variability and estimation of these factors. Specifically, the mixed model methodology adjusts for known sources of variability by standardizing the data to allow for estimation of adjusted means for transcripts. Multiplicity adjustments allow direct control over the percentage of false positives and improve on existing methods with respect to the percentage of false negatives. For this study, an FDR cut-off of 0.52 was considered adequate, and real-time PCR confirmed oocyte expression for 93% of a subset of genes identified as components of the oocyte transcriptome by the described computational procedures using this cut-off. This cut-off is specific to this study and was derived from the p value distribution. It is postulated that the dynamic nature of the novel algorithm presented herein should augment existing transcriptome analysis pipelines.
APPENDIX
Benjamini and Hochberg Step-up Procedure
The step-up procedure is as follows (4):
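Order the raw p values: p(1) ≤ p(2) ≤ ... ≤ p(m)
Find k̂ = max{k : p(k) ≤ kα/m}, where α is the desired FDR level
If k̂ exists, reject the tests attributable to p(1), p(2), ..., p(k̂)
Thus, the adjusted p values are given by:

$$\tilde{p}_{(m)} = p_{(m)}, \qquad \tilde{p}_{(j)} = \min\left\{ \tilde{p}_{(j+1)},\ \frac{m}{j}\, p_{(j)} \right\}, \quad j = m-1, \ldots, 1$$

In SAS, these adjusted p values can be obtained from a data set of raw p values with the MULTTEST procedure: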
PROC MULTTEST PDATA=pvalues FDR;
RUN;
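For readers without SAS, the step-up rule listed above can be sketched in a few lines of Python. This is an illustrative stand-in rather than the code used in the study; the function name `bh_step_up` and its return format are hypothetical.

```python
def bh_step_up(pvalues, alpha):
    """Benjamini and Hochberg step-up rule.

    Returns (k_hat, rejected), where k_hat is the largest rank k with
    p(k) <= k * alpha / m (0 if none) and rejected lists the input
    positions of the k_hat smallest p values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_hat = 0
    for k in range(1, m + 1):
        # Keep scanning to the end: a later rank can pass even if an
        # earlier one failed, which is what makes the rule "step-up".
        if pvalues[order[k - 1]] <= k * alpha / m:
            k_hat = k
    return k_hat, order[:k_hat]
```

Note that every test ranked at or below k̂ is rejected, including any whose own p value exceeds its rank-specific bound.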
Mixtures of Betas
Under the null hypothesis, the distribution of p values, for any valid test, is uniform on the unit interval, U[0,1]. Any such distribution can be modeled as a mixture of ν + 1 component distributions in which the jth component is a beta distribution, β(aj, bj), with probability density function

$$\beta(p; a_j, b_j) = \frac{\Gamma(a_j + b_j)}{\Gamma(a_j)\,\Gamma(b_j)}\, p^{a_j - 1} (1 - p)^{b_j - 1}, \qquad 0 < p < 1,$$

so that the mixture density of the p values is

$$f(p) = \pi_0 + \sum_{j=1}^{\nu} \pi_j\, \beta(p; a_j, b_j), \qquad \pi_0 + \sum_{j=1}^{\nu} \pi_j = 1,$$

where π0 is the probability that a randomly chosen test from the collection of tests is for a gene for which there is no population difference in gene expression, and πj is the probability that a randomly chosen test is for a gene from the jth component distribution for which there is a true population difference in gene expression. The maximum likelihood estimates for the parameters πj, aj, and bj were obtained iteratively using the NLMIXED procedure of SAS. The sample SAS code below assumes that the variable raw_p contains the p values:
PROC NLMIXED DATA = pvalues;
PARAMETERS pi0=.05 a=2 b=2;
pi1=1-pi0;
loglikelihood=LOG(pi0 + pi1*PDF('BETA',raw_p, a, b));
MODEL raw_p ~ GENERAL(loglikelihood);
RUN;
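The same two-component log-likelihood can be written out in Python to show what the SAS code above maximizes. The sketch below uses only the standard library and replaces the iterative optimizer with a coarse grid search, so it is a simplification for illustration. The function names and grid values are hypothetical, and p values are assumed to lie strictly between 0 and 1.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def mixture_loglik(pvalues, pi0, a, b):
    """Log-likelihood of a uniform + beta mixture with null proportion pi0."""
    pi1 = 1.0 - pi0
    return sum(math.log(pi0 + pi1 * beta_pdf(p, a, b)) for p in pvalues)


def grid_search(pvalues, pi0_grid, a_grid, b_grid):
    """Return (loglik, pi0, a, b) for the best combination on the grid."""
    best = None
    for pi0 in pi0_grid:
        for a in a_grid:
            for b in b_grid:
                ll = mixture_loglik(pvalues, pi0, a, b)
                if best is None or ll > best[0]:
                    best = (ll, pi0, a, b)
    return best
```

Setting `pi0` to 1 recovers the uniform null model with a log-likelihood of zero, so any positive value returned by `grid_search` indicates that the beta component improves the fit.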
GRANTS
This work was supported by the Rackham Foundation, the Michigan State University Office of the Vice President for Research and Graduate Studies, and the Michigan Agricultural Experiment Station.
FOOTNOTES
Address for reprint requests and other correspondence: G. W. Smith, 1230 Anthony Hall, E. Lansing, MI 48824-1225 (e-mail: smithge7{at}msu.edu).
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
1 The 2nd International Symposium on Animal Functional Genomics was held May 16–19, 2006 at Michigan State University in East Lansing, MI, and was organized by Jeanne Burton of Michigan State University and Guilherme J. M. Rosa of University of Wisconsin-Madison (see meeting report by Drs. Burton and Rosa, Physiol Genomics 28: 1-4, 2006).