1 Department of Animal and Food Sciences, University of Delaware, Newark, Delaware
4 Delaware Biotechnology Institute, University of Delaware, Newark, Delaware
2 Department of Animal and Avian Sciences, University of Maryland, College Park, Maryland
3 Station de Recherches Avicoles, Institut National de la Recherche Agronomique, Nouzilly, France
5 Department of Poultry Science, University of Georgia, Athens, Georgia
| ABSTRACT |
|---|
chicken cDNA libraries; high-throughput DNA sequencing; expressed sequence tags; expressed sequence tag sequence assembly; nonredundant gene sets; Gallus gallus
| INTRODUCTION |
|---|
A critical step for assembly and annotation of the chicken genome sequence was the acquisition of an extensive catalog of expressed sequence tags (ESTs) (8). This feat was accomplished by completion of several international chicken EST sequencing projects in a relatively short (<5 yr) period (23). Despite its global agricultural importance, the lack of ESTs and a completed genome sequence once hindered genomics research in the chicken. At the inception of our functional genomics project in 2000, only several thousand chicken ESTs, derived mainly from thymic (46) and bursal (1) lymphocytes, had been determined for the chicken. The first chicken EST database and cDNA clone repository (http://www.chickest.udel.edu/) was established (by J. Burnside and R. Morgan) at the University of Delaware (UD) in 2000 with the deposition and annotation of 5,251 chicken ESTs derived from an activated chicken T cell cDNA library (46). The first objective of a second functional genomics project (L. A. Cogburn, T. E. Porter, S. E. Aggrey, and J. Simon) was limited EST sequencing of about 30,000 clones from normalized cDNA libraries for development of tissue-specific chicken microarrays (11). Our chicken cDNA libraries were constructed from metabolic, somatic, neuroendocrine, reproductive, and mixed lymphoid tissues derived mainly from broiler (meat type) chickens. In the midst of our EST sequencing effort, a consortium funded by the British Biotechnology and Biological Sciences Research Council (BBSRC) released a larger and more comprehensive collection of chicken ESTs (3). A total of 332,920 ESTs were derived from 21 normalized libraries representing a wide range of embryonic stages and brain tissues from Leghorn (egg type) chickens and other adult tissues from a mixture of broiler and layer breeds (http://www.chick.umist.ac.uk/). The UD chicken cDNA libraries represent several tissues that are either absent or not well represented in other public chicken EST collections. 
These unique chicken cDNA libraries were derived from the spleen/bursa/thymus/bone marrow/peripheral blood lymphocytes, pituitary/hypothalamus/pineal, abdominal fat, and oviduct mainly from broiler (meat type) chickens.
In this paper, we describe the construction and normalization of single and multiple tissue cDNA libraries, sequencing of 37,577 chicken cDNA clones from these libraries, and the CAP3 assembly of a chicken gene index from all publicly available chicken ESTs. In addition, several nonredundant EST clone sets were clustered from the UD collection for the production of custom chicken cDNA microarrays (10, 11, 28, 31).
| MATERIALS AND METHODS |
|---|
Pilot libraries.
The UD chicken EST sequencing project was initiated with the construction of several "pilot" cDNA libraries (ptr1c, pti1c, pnf-b, pnl-b, and pco1c). These pilot libraries were made by OriGene Technologies (Rockville, MD) with the exception of the pilot oviduct library (pco1c), which was made by Life Technologies (Rockville, MD). Two of the pilot libraries (pnl1s and pco1s) were subtracted with highly redundant clones sequenced from each primary pilot library (pnl-b or pco1c, respectively) using protocols described by Bonaldo et al. (4). The details of construction and subtraction of pilot chicken cDNA libraries are presented as Supplemental Materials (available at the Physiological Genomics web site).1
Lymphoid tissue RNA pools.
RNA was prepared from bursa, thymus, spleen, and bone marrow of four individual broiler chickens at day 18 (embryo) and 1, 3, 5, and 7 wk of age and combined with RNA prepared from a pool of peripheral blood lymphocytes of 4-wk-old Leghorn birds. Each tissue comprised 20% of the final RNA pool for the mixed lymphoid tissue library (pgn1c). [Note that the details of the composition of RNA pools used to construct each primary library are presented in the Supplemental Text and Supplemental Fig. S1.]
Liver RNA pools.
The liver cDNA library (pgl1c) was constructed using liver RNA isolated from broiler (meat type) chickens pooled across different developmental stages and different genetic backgrounds. Equal amounts of RNA from four birds/genetic lines [strains 80 (or 90) and 21 from the Centre for Food and Animal Research (CFAR), Agriculture and Agri-Food Canada, Ottawa, ON, Canada] (29) at four ages (1, 3, 7, and 11 wk) were combined with RNA isolated from the liver of day 17 embryos from a commercial broiler strain (Ross x Arbor Acres). The final composition of the chicken liver cDNA library is represented by day 17 embryos (20%) and six ages of CFAR strains 80 (40%) and 21 (40%).
Abdominal fat RNA pools.
A chicken abdominal fat cDNA library (pft1c) was constructed from a pool of RNA isolated from the same birds used in preparation of the primary liver library (pgl1c). Equal amounts of total RNA from four birds/genetic lines were pooled across four ages. Sequencing from the first normalized fat library (pgf1n) revealed a slight contamination (1.94%) with Escherichia coli phage protein. A second abdominal fat library (pgf2c) was constructed from a mixture of RNA isolated from adult single-comb White Leghorn (SCWL; egg type) chickens, CFAR broiler strains 80 and 21 (7 and 9 wk), and commercial broiler (Ross x Arbor Acres) chickens (day 19 embryos and 1-day- and 3-wk-old chicks). The final composition of the second fat library (pgf2c) represents abdominal fat RNA from three developmental stages (late embryo, juvenile, and adult) of broiler (79%) and egg-type (21%) chickens.
Skeletal muscle and epiphyseal growth plate RNA pools.
The skeletal muscle RNA pool was made from equal amounts of breast (white fiber) and leg (red fiber) muscle RNA from CFAR strains 90 and 21 at six ages (1, 3, 5, 7, 9, and 11 wk) and commercial broiler chickens (day 17 and day 18 embryos and 1-day-old chicks). The skeletal muscle RNA pool was combined with RNA isolated from the epiphyseal growth plate (36) of commercial (Cobb) broiler chickens (1-, 7-, and 14-day-old chicks) (kindly provided by E. Monsonego-Ornan, Volcani Center, Agricultural Research Organization, Bet-Dagan, Israel). Thus the skeletal muscle/epiphyseal growth plate cDNA library (pgm1c) was constructed from an RNA pool made from one-third portions of embryonic and posthatching breast and leg muscle RNA and epiphyseal growth plate RNA from juvenile broiler chickens.
Neuroendocrine tissue RNA pools.
The hypothalamus and pituitary gland were collected from commercial broiler chickens (Avian x Avian strain) during late embryonic (days 12, 14, and 19) and early juvenile development (1, 3, 5, 7, and 9 wk). For embryos, the pituitary glands and hypothalami were pooled together at each age because of the small size of these tissues. The pituitary glands and hypothalami from posthatching chickens were collected and processed separately. Pineal glands were collected from posthatching chickens and processed as a single pool for each age. The final total RNA pool for the neuroendocrine system cDNA library (pgp1c) was composed of 40% pituitary, 40% hypothalamic, and 20% pineal total RNA.
Reproductive tract RNA pools.
The chicken reproductive tract cDNA library (pgr1c) was constructed from RNA isolated from oviduct, ovary, and testes of both broiler (Ross x Arbor Acres) and Leghorn chickens at various stages of sexual development. Testes RNA from 5-, 7-, 13-, 21-, and 35-wk-old broiler males and a year-old Leghorn rooster were pooled together. RNA isolated from immature ovaries of 5-, 7-, and 8-wk-old broiler females and ovaries of 1-yr-old Leghorn laying hens were pooled (the yellow and large white follicles were removed). RNA was also isolated from the magnum, white isthmus, and uterus of commercial laying hens (ISA-Brown) at 3 and 16 h after oviposition to obtain oviduct tissue during different stages of transit of the developing egg. Thus the chicken reproductive tract cDNA library was constructed from an RNA pool composed of 50% oviduct, 25% ovary, and 25% testes RNA.
Construction and normalization of cDNA libraries.
The construction and normalization of our chicken cDNA libraries were performed as a custom service (catalog no. 11315-017) by a commercial company [Life Technologies (LTI), Rockville, MD; now Invitrogen, Carlsbad, CA]. The primary libraries were constructed and directionally cloned using SuperScript II H RNase RT, ElectroMax DH10B cells, and pCMV Sport 6.0 vector, with the exception of pft1c, which was cloned into pSPORT1. The primary libraries contained at least 3 x 10^6 primary clones. The average insert size (Table 2) was initially estimated by PCR amplification of 23 randomly picked clones/library by the vendor, LTI (Invitrogen).
The libraries were amplified by a semisolid agar procedure to minimize clone size bias and then normalized by LTI's proprietary Subtraction Technology, which is largely based on the procedures described by Soares et al. (40) and Bonaldo et al. (4). The protocols used by LTI (and subsequently Invitrogen) in the construction and normalization of single and multiple tissue cDNA libraries from livestock species have been described in detail elsewhere (15, 39, 41). Our multitissue chicken cDNA libraries were designed to yield the maximum number of nonredundant ESTs for development of custom microarrays.
High-throughput DNA sequencing.
DNA sequencing was performed at Dupont's high-throughput sequencing facility (Agricultural Products Division, EI du Pont de Nemours, Delaware Technology Park, Newark, DE). Big Dye terminator cycle sequencing reactions (20 µl) were performed using vector primers and a one-fourth dilution of Big Dye (v3.0) on ABI 3700 sequencers (Applied Biosystems). Sequence was obtained from the 5'-end to improve the likelihood of obtaining coding sequence and, therefore, the identity of the cDNA. A quality score of 20 (q20), generated by the Phred basecaller (14), was used as the cutoff parameter for sequence data. After trimming of vector sequence and ambiguous bases at the beginnings and ends of reads, the sequences were stored in a Sybase database at Dupont and continuously clustered and compared within and among all chicken cDNA libraries. For clustering of ESTs within a library, distinct cDNA sequences were identified by basic local alignment search tool (Blast)N analysis with a minimum score of 750 (where the matrix was +5/-4, gap open and extended by 10), a minimum sequence overlap of 75 bp, and a minimum sequence identity of 80%. The number of distinct cDNA sequences represents the sum of unique singlets (only 1 EST) plus the number of unique clusters containing multiple overlapping ESTs (Table 2).
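The counting rule described above (distinct sequences = singlets + clusters) can be illustrated with a minimal Python sketch. It is not the pipeline used at the sequencing facility: the `Hit` record, the `count_distinct` function, the single-linkage grouping, and the choice to treat each minimum as inclusive are all assumptions made for illustration. Only the three thresholds come from the text.

```python
from collections import namedtuple

# Hypothetical record for one pairwise BlastN hit between two ESTs.
Hit = namedtuple("Hit", "query subject score overlap_bp identity_pct")

MIN_SCORE = 750        # minimum BlastN score (from the text)
MIN_OVERLAP_BP = 75    # minimum sequence overlap in bp (from the text)
MIN_IDENTITY = 80.0    # minimum percent identity (from the text)


def count_distinct(est_ids, hits):
    """Return (singlets, clusters, distinct) using single-linkage grouping.

    Assumption: two ESTs belong to the same cluster if they are connected
    by a chain of hits that each pass all three thresholds.
    """
    parent = {e: e for e in est_ids}

    def find(x):
        # Walk to the root of x's group, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h in hits:
        passes = (h.score >= MIN_SCORE
                  and h.overlap_bp >= MIN_OVERLAP_BP
                  and h.identity_pct >= MIN_IDENTITY)
        if passes and h.query != h.subject:
            parent[find(h.query)] = find(h.subject)

    # Count how many ESTs ended up in each group.
    sizes = {}
    for e in est_ids:
        root = find(e)
        sizes[root] = sizes.get(root, 0) + 1

    singlets = sum(1 for n in sizes.values() if n == 1)
    clusters = sum(1 for n in sizes.values() if n > 1)
    return singlets, clusters, singlets + clusters
```

With five ESTs where three are linked by passing hits and the remaining pair is joined only by a hit below the score cutoff, the function reports two singlets, one cluster, and three distinct sequences.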
One to three 384-well plates of randomly picked clones were sequenced from each primary cDNA library to evaluate library normalization. EST sequences were annotated with the highest BlastX and BlastN scores and electronically transferred to the UD investigators for batch submission to GenBank and integration into the chicken EST database. The sources of contaminating sequence, expressed as a percentage of all ESTs sequenced, were as follows: bacterial phage head protein (0.18%) introduced during normalization of one library (pgf1n), bacteria (E. coli) (0.08%), cloning vector (pCMV Sport 6.0) (0.11%), mitochondrial RNA (2.75%), and ribosomal RNA (0.11%).
Bioinformatics.
The UD chicken EST database (http://www.chickest.udel.edu) was developed by INCOGEN (Williamsburg, VA) under a United States Department of Agriculture (USDA)-National Research Initiative (NRI) grant (to J. Burnside and R. Morgan). The top five Blast results for each EST are stored in the database, which is searchable by key word or clone identification (ID) with a web-based browser that also provides BlastN queries of sequences.
All chicken ESTs found in public databases on July 1, 2004, were assembled into contigs to improve clone annotation and to identify nonredundant clone sets for development of custom chicken microarrays. A total of 517,727 chicken sequences (492,786 ESTs and 24,941 mRNAs) were trimmed of the poly(A) tail, vector, phage, and/or bacterial contaminants using Phil Green's Cross_match program (Washington University, St. Louis, MO, and http://www.phrap.com). First, a BlastN analysis was used to group ESTs with overlapping sequence into seven cluster bins. These seven cluster bins were then used to build contigs with the CAP3 sequence assembly program (22), using parameter settings of 90% sequence identity and 40 bp minimum overlap with a maximum overhang length of 50 bp (recommended by X. Huang, Michigan Technological University, Houghton, MI). Furthermore, these contig and singlet sequences were then used in a local BlastN search against the first draft of the chicken genome sequence (ftp://ftp.ncbi.nih.gov/genomes/Gallus_gallus/). The parameters used for the BlastN search against the chicken genome were as follows: E-value <10^-20, >95% identity, and >75% coverage of the contig sequence.
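The genome-match criteria can be expressed as a simple filter over parsed BlastN hits. The sketch below is illustrative only: it reads the E-value cutoff as 10^-20, and the `GenomeHit` record, its field names, and both function names are hypothetical. Coverage is assumed to mean aligned length divided by contig length, which the text does not spell out.

```python
from collections import namedtuple

# Hypothetical record for one parsed BlastN hit of a contig against the genome.
GenomeHit = namedtuple(
    "GenomeHit", "contig_id contig_len evalue identity_pct aligned_bp"
)

MAX_EVALUE = 1e-20     # E-value cutoff, read as 10^-20 (assumption)
MIN_IDENTITY = 95.0    # percent identity must exceed this value
MIN_COVERAGE = 75.0    # percent of the contig that must be aligned


def matches_genome(hit):
    """True if a single hit passes all three thresholds."""
    coverage = 100.0 * hit.aligned_bp / hit.contig_len
    return (hit.evalue < MAX_EVALUE
            and hit.identity_pct > MIN_IDENTITY
            and coverage > MIN_COVERAGE)


def fraction_matching(hits, n_queries):
    """Fraction of query sequences with at least one passing hit."""
    matched = {h.contig_id for h in hits if matches_genome(h)}
    return len(matched) / n_queries
```

A query counts as matching if any one of its hits passes, so multiple hits for the same contig are not double counted.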
| RESULTS |
|---|
1-actin was reduced 319-fold in the normalized library pgm2n. Similarly, the abundance of proopiomelanocortin (POMC) was 97-fold lower in the normalized neuroendocrine library (pgp2n) compared with the primary library (pgp1c). The redundancy of the most abundant clones found in each of the primary libraries was dramatically reduced in each respective normalized library (Supplemental Table S1). A total of 14,346 ESTs were sequenced from the primary (unnormalized) chicken cDNA libraries. The average insert size of clones in the primary libraries (Table 2) ranged from 1.6 kb (pgm1n) to 2.2 kb (pgl1c and pgr1c). The average length of EST reads from the primary libraries was 584 bp, and the percent distinct clones ranged from 53.1% (pgm1c) to 85.4% (pgn1c). A total of 20,091 ESTs were sequenced from six normalized libraries. The average sequencing read from the normalized libraries was 586 bp, and the number of distinct clones sequenced from each library ranged from 74.2% (pgl1n) to 91.2% (pgr1n). The average insert size of clones from the normalized libraries, based on insert size from PCR amplification of four 96-well plates of nonredundant clones/library, ranged between 1.59 kb (pgm2n) and 1.89 kb (pgl1n).
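As a rough illustration of how fold reductions and percent-distinct values of this kind can be computed, the following Python sketch compares the frequency of a clone in a primary library with its frequency in the normalized library. The function names and the exact definition (ratio of per-library frequencies) are assumptions; the text does not state the formula that was used.

```python
def clone_frequency(clone_count, total_ests):
    """Fraction of sequenced ESTs in a library that belong to one transcript."""
    return clone_count / total_ests


def fold_reduction(primary_count, primary_total,
                   normalized_count, normalized_total):
    """Ratio of primary-library frequency to normalized-library frequency.

    Assumption: 'fold reduction' means primary frequency divided by
    normalized frequency; it is undefined if the clone is absent after
    normalization.
    """
    primary_freq = clone_frequency(primary_count, primary_total)
    normalized_freq = clone_frequency(normalized_count, normalized_total)
    if normalized_freq == 0:
        raise ValueError("clone absent from the normalized library")
    return primary_freq / normalized_freq


def percent_distinct(distinct_sequences, total_ests):
    """Percentage of sequenced ESTs that represent distinct cDNA sequences."""
    return 100.0 * distinct_sequences / total_ests
```

For example, a transcript seen 200 times in 1,000 primary reads and twice in 1,000 normalized reads would show a 100-fold reduction under this definition; these counts are made up and are not taken from the libraries described here.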
Sequence alignment and cluster analyses.
To improve annotation of our EST clones, we assembled a gene index from all chicken ESTs and mRNAs found in public databases on July 1, 2004 (Table 3). The CAP3 sequence cluster program (22) was used at the recommended stringency (40 bp overlap with 90% sequence identity). Considering only the 43,928 ESTs found in the UD collection, 38,186 ESTs were clustered into 13,495 contigs (in silico cDNAs), while an additional 5,742 ESTs were classified as singlets, which represent the nonoverlapping sequences. Thus the UD collection represents 19,237 nonredundant EST sequences (contigs + singlets). There are 6,223 unique sequences that are only found in the UD collection (i.e., UD specific), where 76% of these UD-specific sequences match the draft chicken genome sequence. The UD-specific sequences represent 481 contigs and 5,742 singlets. (Within the 5,742 UD-specific singlets, 85% of the high-scoring ESTs matched the genome sequence, while 67% of the low Blast score ESTs and 75% of unknown ESTs matched the chicken genome sequence.) The CAP3 assembly of 492,786 chicken ESTs and 24,941 mRNAs found in public databases (as of July 1, 2004) shows that 438,535 sequences (414,980 ESTs + 23,555 mRNAs) form 40,850 contigs, while 79,192 sequences (77,806 ESTs + 1,386 mRNAs) represent nonoverlapping singlets. The present CAP3 assembly of a chicken gene index (Table 3) closely corresponds to The Institute for Genomic Research (TIGR) Gallus gallus Gene Index (GgGI; release 8.0) (http://www.tigr.org/tdb/tgi/gggi/), which shows 493,547 chicken ESTs and 23,057 expressed transcripts (ETs or mRNAs) assembled into 116,777 nonredundant sequences from 42,988 contigs (tentative consensus sequences; TCs), 72,941 singlets, and 848 mature transcripts (ETs).
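The counts quoted in this paragraph are internally consistent, which can be confirmed with a short Python check. All numbers below are copied from the text; the variable names and the `check_assembly_counts` function are introduced here only for illustration.

```python
def check_assembly_counts():
    """Verify that the totals quoted in the text add up as stated."""
    # UD collection
    ud_total_ests = 43_928
    ud_ests_in_contigs = 38_186
    ud_contigs = 13_495
    ud_singlets = 5_742
    ud_nonredundant = 19_237
    ud_specific_contigs = 481
    ud_specific_total = 6_223

    # Full public assembly (July 1, 2004)
    all_ests, all_mrnas = 492_786, 24_941
    contig_ests, contig_mrnas = 414_980, 23_555
    singlet_ests, singlet_mrnas = 77_806, 1_386
    in_contigs, as_singlets = 438_535, 79_192

    # TIGR GgGI release 8.0
    tigr_tcs, tigr_singlets, tigr_ets = 42_988, 72_941, 848
    tigr_nonredundant = 116_777

    return {
        "ud_ests": ud_ests_in_contigs + ud_singlets == ud_total_ests,
        "ud_nonredundant": ud_contigs + ud_singlets == ud_nonredundant,
        "ud_specific": ud_specific_contigs + ud_singlets == ud_specific_total,
        "in_contigs": contig_ests + contig_mrnas == in_contigs,
        "as_singlets": singlet_ests + singlet_mrnas == as_singlets,
        "all_sequences": in_contigs + as_singlets == all_ests + all_mrnas,
        "ests_split": contig_ests + singlet_ests == all_ests,
        "mrnas_split": contig_mrnas + singlet_mrnas == all_mrnas,
        "tigr": tigr_tcs + tigr_singlets + tigr_ets == tigr_nonredundant,
    }
```

Every entry in the returned dictionary is `True`; for instance, 13,495 contigs plus 5,742 singlets gives the 19,237 nonredundant UD sequences.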
The UD CAP3 database of contigs and unassembled singlets is searchable by Blast or key word queries under the Gene Index button (http://cogburn.dbi.udel.edu/). For example, a key word query for CCAAT/enhancer-binding protein-β (C/EBPβ) or a BlastN search with its cDNA sequence against our CAP3 database generates a web page (Fig. 1) that displays the ESTs used to assemble UD CAP3 Contig_23098.4. A BlastN search of the UD CAP3 Contig_23098.4 sequence against the TIGR GgGI shows 99% nucleotide identity to GgGI TC158932 (C/EBPβ).
| DISCUSSION |
|---|
30,000 clones from tissues with the greatest agricultural importance for development of custom high-density cDNA microarrays. Our cDNA libraries were constructed from developmentally and genetically complex pools of RNA to increase novel gene discovery and reduce overall redundancy. This approach of pooling RNA samples from different animals, developmental stages, and tissues before normalization has yielded several high-quality cDNA libraries that were deeply sequenced for porcine (15) and bovine (39, 41) gene discovery. Presently, we have sequenced and functionally annotated an additional 35,407 chicken ESTs from several single and multiple tissue cDNA libraries. These sequences were entered into GenBank and the UD chick EST database (http://www.chickest.udel.edu) as accrued. The total UD collection has made a significant contribution to the present number of chicken ESTs in GenBank and to the assembly of the TIGR GgGI (see Attribution at homepage: http://www.tigr.org/tdb/tgi/gggi/). Furthermore, the UD EST collection was important for the recent functional annotation of the chicken genome sequence (25). Several international EST projects, including ours, have contributed to the total of 517,727 chicken EST sequences in GenBank (as of July 1, 2004). Our CAP3 assembly of these sequences into a chicken gene index has revealed a similar number of contigs (40,850) and singlets (79,192) as those found in the TIGR Chicken Gene Index (release 8.0) (http://www.tigr.org/tdb/tgi/gggi/). The number of contigs represented in these two chicken EST assemblies exceeds the original estimate of 35,000 genes expressed in the chicken genome (3). This large number of chicken contigs could be due in part to the presence of nonoverlapping fragments of identical transcripts. A more recent analysis of the chicken transcriptome, based on an analysis of 19,626 finished cDNAs and 485,337 public ESTs, suggests that there are at least 19,000 chicken genes (23). 
In addition, analysis of the first draft of the chicken genome sequence provides an estimate of 20,000-23,000 chicken genes (25). Furthermore, alternative splicing of exons can generate an even greater number of putative transcripts (25). Extensive alternative promoter usage, splicing, and polyadenylation contribute to a diverse transcriptome in the mouse of 181,047 transcripts (45). The total number of genes predicted for the chicken is similar to that of the human genome, which harbors from 20,000 to 25,000 protein-coding genes (24).
The UD EST clone collection represents the second largest catalog and repository of chicken EST clones, which were derived from tissues of major agricultural and biomedical importance. The UD EST collection has minimal overlap with, and is complementary to, chicken sequences found in the larger BBSRC database (http://www.chick.umist.ac.uk/) (3) and the bursal (B) lymphocyte transcript database (http://pheasant.gsf.de/DEPARTMENT/DT40/dt40Transcript.html) (1, 6). Our chicken cDNA libraries were constructed from mixed lymphoid tissue and metabolic (liver and abdominal fat), somatic (breast and leg muscle/bone growth plate), neuroendocrine (pituitary/hypothalamus/pineal), and reproductive tissues (oviduct/ovary/testes). Several UD libraries represent novel tissues (i.e., lymphoid, abdominal fat, pituitary, hypothalamus, pineal, and oviduct) that are either not found in or underrepresented in other public chicken EST databases (3). Furthermore, 6,223 unique sequences (481 contigs and 5,742 singlets) are found only in the UD collection. Many of the unique singlets are from the immune system cluster, which is composed of a nearly equal number of contigs and singlets. The UD collection contains 5,742 unique singlet sequences (not found in other public EST collections), of which 74% match the chicken genome sequence. One explanation of the higher rate of singlets not matching the chicken genome assembly could be the presence of high GC content sequences, which would reduce the frequency of G+C-rich ESTs sequenced from our libraries. However, the initial draft sequence is incomplete, with as many as 10% of the protein-coding genes still missing from the Ensembl gene set and with very poor coverage of microchromosomes and two chromosomes in particular, GGA16 and GGAW (25). Therefore, it seems reasonable that an even higher number of our unique ESTs would match the completely finished chicken genome sequence when it becomes available.
Furthermore, a large number of contig and singlet sequences match a chicken genome sequence that is not yet assigned to a specific chromosome (i.e., chrUn). The finished chicken genome sequence could reveal an even greater density of genes on the microchromosomes, which have a higher recombination rate and a higher G+C content than the macrochromosomes (25).
The UD chicken EST collection contains a large number of lymphoid ESTs (12,261 clones) sequenced mainly from two unnormalized cDNA libraries: an activated T cell library (46) and a mixed lymphoid tissue library. Although unnormalized, the mixed lymphoid tissue (pgn1c) had a very low redundancy rate (18%) even after sequencing of 5,642 randomly picked clones. Numerous clusters of differentiation (CD) antigens, cytokines, cytokine receptors, and coagulation/complement factors were identified from ESTs sequenced from the UD lymphoid tissue cDNA libraries. The large number of unknown singlets found in the lymphoid tissue libraries (Supplemental Fig. S3B) may reflect unusual/rare clones that are less likely to have matches in the database.
The UD collection contains ESTs sequenced from other novel chicken cDNA libraries. For example, 6,739 chicken ESTs were sequenced from our adipose tissue cDNA libraries compared with only 2,672 ESTs sequenced from the BBSRC adipose tissue library, which completely failed normalization (3). Osteonectin, or SPARC (secreted protein acidic and rich in cysteine), was very abundant in the primary abdominal fat cDNA library (pft1c). This adipose-specific autocrine/paracrine factor (an "adipokine") is implicated in development of obesity in mice (44). Other adipokines identified in the UD chicken EST collection are adiponectin and visfatin. Visfatin is a newly discovered adipokine secreted from visceral fat that is thought to be a missing link between obesity and diabetes; two contigs in our CAP3 database (UD_Contig_2318.1 and UD_Contig_2318.2) represent chicken homologs of visfatin, previously identified as pre-B cell colony-enhancing factor (PBEF1) (37). Surprisingly, three very important genes involved in lipid metabolism in mammals, hormone-sensitive lipase (HSL), resistin (RETN), and leptin (LEP), have not yet been identified in our collection or among the 578,445 ESTs now sequenced from the chicken. However, the existence of chicken LEP (2, 43) remains very controversial (16, 34). The single EST clone in the BBSRC collection (clone ID no. ChEST698d23), originally identified as chicken LEP, appears to be a contaminating sequence that corresponds to bovine LEP [i.e., 98% identical to bovine LEP (TIGR BtGI TC292189)]. Furthermore, this BBSRC EST sequence for "chicken" LEP fails to show a BlastN hit to the chicken genome sequence. Searches over the genomic region of the chicken comparable with human LEP synteny also revealed no evidence of a chicken LEP gene. 
In an exhaustive PCR analysis of chicken LEP with multiple primer sets designed from two published cDNA sequences (2, 43), we have consistently failed to produce an amplified PCR product using liver, fat, and muscle total RNA as template (W. Carre, X. Wang, and L. A. Cogburn, unpublished observations). However, two EST clones corresponding to the chicken LEP receptor (LEPR) cDNA sequence (21, 32) were found in our neuroendocrine library (pgp1n).
Our liver cDNA library is also populated by a large number of genes involved in lipogenesis (adipophilin, fatty acid-binding protein, Spot 14, fatty acid desaturase, malic enzyme, Δ-9 desaturase, etc.). The large number of lipogenic genes found in the chicken's liver reflects a specific feature of avian metabolism, where the liver is the major site of lipogenesis (18, 19). The chicken homolog of apolipoprotein AV (ApoAV) is represented by UD_Contig_12151.1, which was assembled from 13 UD ESTs (11 ESTs from the liver) and 3 public ESTs. This newest member of the apolipoprotein gene cluster (ApoAV) was recently revealed by a comparative analysis of the human and mouse genome sequences (33). Three single nucleotide polymorphisms (SNPs) were found across the ApoAV locus in humans that are associated with plasma triglyceride levels. One SNP in the promoter region of human ApoAV has garnered a great deal of attention as an important determinant of plasma triglyceride levels and a potential molecular marker for diagnosis of cardiovascular disease (38). Detailed sequence alignment and BlastN analysis of chicken ApoAV (UD Contig_12151.1) against the chicken genome sequence revealed its chromosomal location (GGA24_random at 115,638-116,824) within an apolipoprotein gene cluster and several potentially important polymorphisms: nine SNPs in the coding region and a 7-bp insertion/deletion polymorphism located in the proximal promoter region near the TATA box (L. A. Cogburn, X. Wang, and W. Carre, unpublished observations). Thus the genetic complexity of our cDNA libraries makes the UD EST collection a valuable resource for discovery of important chicken genes and identification of polymorphisms (11).
A large number of ESTs (8,734) were derived from our neuroendocrine cDNA libraries, which were constructed from the pituitary, hypothalamus, and pineal gland. Numerous pituitary-specific hormones [POMC, growth hormone (GH), and prolactin (PRL)], hormone receptors [leptin receptor, growth hormone-releasing hormone receptor (GHRH-R)], and transcription factors [Pit-1, sterol response element-binding protein 2 (SREBP2)] were sequenced from the neuroendocrine libraries. A number of these gene sequences are unique to our neuroendocrine cDNA libraries and to the UD collection. These include the preprohormone POMC, which yields multiple peptide products from proteolytic cleavage to generate β-endorphin, β-lipotropin (β-LPH), α-melanocyte-stimulating hormone (α-MSH), and adrenocorticotropic hormone (ACTH). In birds, POMC products play critical roles in the regulation of growth, metabolism, and the adaptive stress responses. However, several genes were noticeably absent from our ESTs sequenced from the neuroendocrine libraries [i.e., pre-pro-thyrotropin-releasing hormone (TRH), corticotropin-releasing hormone (CRH), somatotropin release-inhibiting factor (SRIF), gonadotropin-releasing hormone (GnRH), and luteinizing hormone beta subunit (LH-β)]. Interestingly, three elements of the somatotropic axis that regulate animal growth and development were unique to these libraries and the UD EST collection: GH, GHRH, and GHRH-R genes. Another interesting example of a gene that is unique to our collection is Contig_13370.2 (assembled from 11 UD ESTs from the neuroendocrine libraries), which represents the chicken homolog of β3-tubulin. Furthermore, the sequencing of a large number of redundant ESTs from the "unnormalized" neuroendocrine library (pgp1n) has contributed to the discovery of SNPs in a number of important pituitary hormones (i.e., POMC, GH, PRL, etc.) (27).
An initial chicken SNP discovery effort, initiated by another UD group (27), identified 1,210 SNPs from a subset of 23,427 UD ESTs. However, a more comprehensive polymorphism map was recently developed for the chicken that contains 2.8 million SNPs or about five SNPs per kilobase of genome sequence (26). The chicken genetic variation map (http://chicken.genomics.org.cn/index.jsp) is based on comparison of the 0.25 x coverage of genome sequence from three distinct domestic breeds (broiler, layer, and Chinese silkie) against the 6.6 x coverage of the red jungle fowl genome sequence (25). Thus the integration of the genetic variability from 549,157 EST sequences (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html) with the chicken genome sequence (25) and the polymorphism map (26) now provides abundant genomic resources required for fine mapping of quantitative trait loci (QTL) and the eventual identification of polymorphic genes that control many important phenotypic traits.
This recent cache of chicken genomic resources has provided us with the first global view of the homology between the chicken and mammalian genomes and proteomes. BlastX analysis of a set of high-fidelity UD CAP3 contigs against the human protein database showed a median amino acid identity of 73% for the putative chicken homologs, which is very similar to the recent estimate of 76% homology derived from the draft chicken genome sequence (25). About 17% of the contigs and 39% of the singlets in the UD EST collection have no BlastX hit against the GenBank nonredundant (nr) database. Some of these sequences could reflect highly divergent or rarely expressed transcripts. One contig (UD Contig_13866.1), which was assembled from 269 ESTs derived from many tissues and is therefore authentic, is a good example of a highly divergent transcript. This unknown transcript is abundantly expressed in the unnormalized mixed lymphoid tissue library (pgn1c) (Supplemental Fig. S1A). Even after an extensive Blast search, the identity of this putative gene remains unknown. Detailed analysis of unknown sequences could lead to the identification of additional orthologs and paralogs. For example, we have discovered two contigs that represent chicken paralogs (THRSPα and THRSPβ) of the human thyroid hormone-responsive Spot 14 protein gene (THRSP) (20). These high-fidelity contigs encode amino acid sequences that are only 29% identical to the THRSP human protein. Our CAP3 assembly of two unique contigs for Spot 14 (THRSP) allowed us to identify this unique gene duplication, which was not revealed by the BBSRC or TIGR chicken EST assemblies. THRSP is an important transcription factor that controls expression of several metabolic genes in the lipogenic pathway (47). We have identified insertion/deletion polymorphisms near the DNA-binding domain of chicken THRSPα that are associated with a QTL for abdominal fatness located on GGA1 (49). THRSPβ has a very high G+C content, which we discovered in a single shotgun sequence that was not included in the draft chicken genome sequence. Thus our chicken genomic resources (ESTs, CAP3 database, and tissue-specific microarrays) have been very useful for gene discovery, expression profiling, and identification of major genes that control economically important production traits in the broiler chicken (10, 11, 49).
In summary, we have sequenced 35,407 chicken ESTs from developmentally and genetically complex cDNA libraries that are either absent from or not well represented in other public EST databases. The UD ESTs have been integrated into a comprehensive catalog of expressed chicken genes that will aid the discovery of sequence polymorphisms. The CAP3 assembly of our ESTs with publicly available sequences was used for annotation and selection of nonredundant sets of cDNA clones. The UD EST collection contains 19,237 nonredundant cDNA sequences derived from major physiological (immune, metabolic/somatic, and neuroendocrine/reproductive) systems. Unique system-specific gene sets (Fig. 3) have been used for production of custom chicken cDNA microarrays for transcription profiling. These initial chicken cDNA microarrays have given us the first glimpse of the chicken's transcriptome (7, 12, 30). The availability of high-density microarrays, the immediate access to the CAP3 chicken EST assemblies, and the large number of physical cDNAs in the UD collection (43,928 EST clones) greatly enhance the value of our genomic resources for the chicken. Several chicken EST sequencing projects, including the present one, have now placed the chicken in 10th place for accrued ESTs among all organisms represented in GenBank (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html). Furthermore, the large international chicken EST collection was essential for the recent assembly and annotation of the first draft of the chicken genome sequence (25). These important new developments, acquisition of large public collections of ESTs, a completed genome sequence, and a dense polymorphism map, emphasize an important new role for the chicken (G. gallus) in developmental biology and genomics research and a continuing role in the advancement of biomedical sciences.
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: L. A. Cogburn, 531 South College Ave., Dept. of Animal and Food Sciences, Univ. of Delaware, Newark, DE 19717 (e-mail: cogburn@udel.edu).
1 The Supplemental Material for this article (Supplemental Text, Supplemental Table S1, and Supplemental Figs. S1-S3) is available online at http://physiolgenomics.physiology.org/cgi/content/full/00207.2005/DC1.