The Structure and Function of the Expressed Portion of the Wheat Genomes
DBI-9975989

PI & Project Coordinator: Calvin Qualset
Project Manager: Patrick McGuire

Objectives and coordination structure

Objectives 1 and 2. EST Production. Coordinator: Olin Anderson
Objective 3. Mapping. Coordinator: Bikram Gill
Objective 4. Functional Genomics. Coordinator: Mark Sorrells
Objective 5. Bioinformatics. Coordinator: Olin Anderson
Objective 6. Genome Structure & Evolution. Coordinator: Jan Dvořák

[Figure: coordination structure diagram. Box labels: Objs. 1 & 2, EST Production; Obj. 3, Mapping; Obj. 4, Functional Genomics; Obj. 6, Genome Structure & Evolution. Other labels: cDNA libraries, screening/normalizations, sequencing, data analysis, DNA storage/distribution, deletion mapping, comparative mapping, EST arrays, SAGE, sequence matching, Contract Agreement DBI-9975989.]

Introduction

The Project's goal is to generate and map a large number of unique DNA sequences from the bread wheat genomes. The assumption is that these unique DNA sequences will correspond to individual genes of wheat and their identification is a first step in determining gene function. The ultimate use of this information is the improvement of wheat quality, yield, and adaptability to new and marginal environments, thus increasing production.

The results from this Project will be immediately applicable to other crops, because of the close relationship of wheat to other species in the Triticeae tribe and other grass species, especially corn and rice. The diversity of experimental techniques and traits pursued in the individual laboratories collaborating on this Project is an ideal training ground for graduate students and postdoctoral scientists. The large pool of well-characterized and mapped unique DNA sequences, available in the public domain, will be an exceedingly important resource for future Triticeae research and basic functional genomics research.

Because of the large size of the wheat genomes, it is unlikely that the actual base-pair sequences of the DNA molecules will be learned completely in the near future. This Project takes an alternative strategy to realize the benefits of new techniques for discovering genes and learning their function. Following the identification of 10,000 unique wheat DNA sequences (termed ESTs, Expressed Sequence Tags), they will be mapped to their physical location on wheat chromosomes using a set of deletion stocks. The information gathered on the sequence and position of these genes in the wheat chromosomes is publicly available, distributed by means of the website created for this Project.

Distribution of research investigators by objective

Objectives: Obj. 1, cDNA libraries; Obj. 2, sequencing; Obj. 3, mapping; Obj. 4, functional genomics; Obj. 5, bioinformatics; Obj. 6, genome structure & evolution.

Investigator    Institution         Coordinator for
OD Anderson     UCDavis/ARS         Obj. 1, Obj. 2, Obj. 5
TJ Close        UCRiverside
HT Nguyen       U Missouri
BS Gill         Kansas State U      Obj. 3
ME Sorrells     Cornell U           Obj. 4
J Dvořák        UCDavis             Obj. 6
J Dubcovsky     UCDavis
KS Gill         Wash State U
JP Gustafson    U Mo/ARS
SF Kianian      N Dak State U
JA Anderson     U Minn
NLV Lapitan     Colo State U
CM Steber       Wash State U/ARS

Table 1. Sequencing status by library

Contigs and Unassembled (unassembled ESTs) are counted within a library; Unique is the number of ESTs unique to the library among all libraries.

Name*     Tissue                                No. ESTs  Contigs  Unassembled    Unique
TA001E1X  endosperm (Cheyenne)                     2,728      417        1,125       305
TA001E1S  endosperm subtracted (Cheyenne)            269       23          218        55
TA005E1X  dehydrated seedling                        795       82          622       168
TA006E1X  unstressed shoot                         2,261      375        1,224       433
TA006E2N  unstressed shoot normalized              1,686      336          268        52
TA006E3N  unstressed shoot normalized              1,672      139        1,338       836
TA007E1X  cold-stressed seedling                     938      107          696       181
TA007E3S  cold-stressed seedling subtracted        1,555      203          956       816
TA008E1X  etiolated root                           4,017      643        2,143       747
TA008E3N  etiolated root normalized                4,308      963        1,739       702
TA009XXX  spike (Sumai 3)                         10,287    1,854        4,881     3,003
TA012XXX  ABA-treated embryo (Brevor)              2,207      264        1,491       625
TA015E1X  heat-stressed seedling                     821      100          567       200
TA016E1X  vernalized crown                         2,286      283        1,555       496
TA017E1X  20 to 45 DAP spike                       1,076      127          422       119
TA018E1X  5 to 15 DAP spike                        2,860      415        1,581       499
TA019E1X  pre-anthesis spike                      11,194    1,754        5,201     2,766
TA027E1X  drought-stressed leaf (TAM W 101)          905       94          635       231
TA031E1X  heat-stressed flag leaf                    973       86          710       243
TA032E1X  heat-stressed spike                      1,012       97          716       259
TA036E1X  drought-stressed leaf                      641       55          485       165
TA037E1X  salt-stressed sheath                       964      123          559       166
TA038E1X  salt-stressed crown                        943       75          743       207
TA047E1X  root tip                                   959      125          682       178
TA048E1X  Al-stressed root tip (BH 1146)             991      143          646       214
TA049E1X  dormant embryo (Brevor)                  2,927      438        1,519       714
TA055E1X  drought-stressed root                    1,023      116          769       345
TA056E1X  Al-stressed root tip                     1,032      174          657       219
TA058E1X  unstressed root at tiller stage          1,025      127          770       286
TA059E1X  whole grain (Butte)                      3,649      624        1,451       509
TA065E1X  salt-stressed root                       2,055      288        1,385       585
TA066E1X  mixed tissue                             1,404      211          864       303
TM011XXX  vegetative apex (acc. DV 92)             3,031      432        1,906       937
TM043E1X  early reproductive apex (acc. DV 92)     2,647      382        1,516       673
TT039E1X  whole plant (Langdon-16)                 1,194      123          765       241
SC010XXX  Al-stressed root tip (Blanco)            1,198      105          905       457
SC013XXX  control root tip (Blanco)                  778       57          649       319
SC024E1X  anther (Blanco)                          4,631      639        1,994       987
AS040E1X  anther                                   2,466      330        1,408       591
AS067E1X  anther                                   1,044      134          695       231
Total (9/23/02)                                   91,715                           22,001

*In the Name field, TA indicates Triticum aestivum, TM is T. monococcum, TT is T. turgidum, SC is Secale cereale, and AS is Aegilops speltoides. All of the TA libraries are from the Chinese Spring genotype except where indicated otherwise in parentheses in the Tissue field.

Training

In year 2, the B. Gill lab (KSU) held a workshop (Feb. 11–16, 2001) for the postdocs from the 10 mapping labs to ensure standard mapping and data entry protocols. Also in year 2, Project PIs were successful with an NSF REU proposal to support participation of 13 undergraduates in Project labs. In year 3, a microarray production and analysis workshop was held for 8 Project postdocs and graduate students in the D. Laudencia-Chingcuanco lab (USDA-ARS, Albany) (Aug. 12–16, 2002).

Objectives, approaches, and status after 36 months (9/1/99–8/31/02)

1. To produce cDNA libraries from as many tissue and condition combinations as possible.
Approach: Produce multiple cDNA libraries from mRNAs isolated in several labs with a target of 30 total libraries.
Status: This work is essentially completed. 50 cDNA libraries are now available to the Project. 28 were made at T. Close's lab at the University of California, Riverside, eight are from H. Nguyen's lab at Texas Tech University, and 14 were contributed from other sources. Tissue sources included spikes sampled at various developmental stages, anther, embryo, endosperm, young seedling, root, crown, and flag leaf and sheath.
Tissues were sampled under various treatments, such as drought stress, cold stress, salt stress, aluminum stress, ABA treatment, and vernalization. Of these libraries, 41 have been used to date for ESTs (Table 1).

2. To determine the base-pair sequence of these cDNAs, yielding ESTs.
Approach: In-house, single-site 5' sequencing of approx. 3000 clones in at least 30 libraries, with 3' sequencing of putative singletons.
Status: Sequencing has been carried out at O. Anderson's lab, Albany CA. To date, over 90,000 5'-sequenced ESTs have been generated from 41 of the libraries (Table 1). Library quality was evaluated based on (1) the number of empty clones or clones containing vector sequence or short adapter sequence only, (2) the number of clones containing ribosomal RNA sequence contamination, and (3) the number of clones with reversed orientation (most of the libraries were made with cDNAs cloned in a fixed direction). Library complexity was evaluated based on the level of clone redundancy, determined by comparing all the 5' ESTs within each library. ESTs are considered redundant if they show a degree of similarity and overlap with other ESTs. These ESTs can be grouped and assembled together into a contig. Representatives from each contig and those ESTs not forming contigs are singleton candidates. Libraries exhibiting the highest proportions of singleton candidates are considered to be of higher complexity and thus worth extensive sampling. EST assembly analysis was also carried out among libraries. This analysis has indicated that among the 90,000 ESTs generated so far, about 22,000 are singleton candidates (Table 1). More analysis is underway to characterize and identify unique gene sequences.

3. To map into wheat deletion stocks a set of 10,000 unique ESTs.
Approach: Map EST singletons into bins defined by wheat deletion stocks; target is 10,000 mapped singletons.
Status:
1. Singleton selection strategy
a. Processed 5' ESTs were searched against NCBI's nonredundant nucleotide (blastn) and protein (blastx) databases.
b. The second round of global phrap assembly was done on April 1, 2002 on 77,022 Project ESTs. Of these, 70,074 were 5' ESTs and 6,948 were 3' ESTs. They were assembled into 11,758 contigs.
c. ESTs selected from each contig, together with the unassembled ESTs, are the resource for singleton selection. Altogether, about 32,000 ESTs are in this resource pool for further screening.
d. ESTs containing sequences found to match retroelements, E. coli, phage, mitochondrial, and chloroplast gene sequences are removed. Sequence comparison is done using the cross_match program.
e. Validation process: Redundant ESTs are further screened and removed by comparing 3' sequence data with 3' sequences of previously identified singletons. The resulting singletons were rearrayed for probe distribution.
2. Mapping results
As of Aug. 27, 8,789 probes have been sent out to the 10 labs and mapping data have been returned for 46% of the distributed probes. At Albany, mapping data are processed to display the mapped probes by chromosome bin position, defined by the deletion line break points. The Project has assigned a coordinator from among the investigators for each homoeologous chromosome group, who reviews and validates the assignments of the probe locations. Each probe may identify more than one locus and, at this point, the validated probe locations account for 7,985 individual loci mapped to chromosome bins.

4. To determine functional activity of the mapped ESTs relevant to reproductive biology of wheat.
Approach: Initially, the plan was to produce microarrays of the mapped EST singletons and analyze them with respect to function in 10 labs focusing on five aspects of wheat reproduction.
As a result of an NSF mid-term site review, the plan now is to develop a test array, organize and hold a training workshop in microarray construction and analysis for Project personnel, and evaluate microarray production strategies for wheat.
Status: Technology development for using cDNAs in microarray analysis has been initiated at Albany. All equipment (arrayer and scanner) is in place and operational in the Albany labs of O. D. Anderson and D. Laudencia-Chingcuanco, and printing of a limited number of test arrays is underway. RNA has been prepared to test this initial array and begin evaluation of analysis software options. The training workshop was held in August (see Training). An evaluation of the suitability of arrays of long oligonucleotides for transcriptional analysis in wheat is being carried out. Information sought includes an estimate of the optimal size of oligos to represent the wheat ESTs and a comparison of an oligo microarray with a cDNA array (PIs Steber, Sorrells, and K. S. Gill).

5. To process, analyze, and display data accumulated in this project (bioinformatics).
Approach: Develop and enhance means to analyze, interpret, and visualize Project data (data processing, database modifications, and web page maintenance).
Status: Protocols were established for data entry and the linking of the EST data to the records for the mapped loci. All the mapping laboratories participate by submitting hybridization results through a web-based interface to the central bioinformatics site in Albany. The information is parsed by Perl scripts that prepare the submitted information for database entry. All hybridizations were scanned by each submitting lab, formatted to an image template, and then submitted to this central database. To date, 3408 images are online. Data are viewable at the public website (http://wheat.pw.usda.gov/NSF/). An interface was developed for the mapping coordinators to survey results and verify scoring of the results.
Both validated ("Confirmed") and nonvalidated ("Unconfirmed") data are presented along with a disclaimer making clear the preliminary nature of the unconfirmed locations. Using a relational database built with MySQL, several display options are available to users through queries with such criteria as location, status of verification, or mapping lab origin. The data are held in two databases: one uses the ACEDB biologically oriented database program and the other uses the MySQL relational database program. Information from ACEDB is available through the webace/AceBrowser interface, which also includes links to EST and contig assembly information. The ACEDB display is familiar to many of the Triticeae laboratories that already use GrainGenes. The MySQL relational database also has these links and, in addition, contains specialized constructions for data-mining the relationships of loci to ESTs and contig assembly information. The MySQL database is a version built for efficient mining of the archived information. A user-friendly link to EST data from map information was also created. The databases allow linking of the mapped information to other information associated with the EST project. In some cases, external links are made to resource sites and related projects.
Co-PI Close has contributed to the annotations for the Project cDNA libraries that are available from the Project website. In addition, he and a programmer (Steve Wanamaker) have developed a stand-alone tool for creating contig assemblies of EST data (HarvEST, http://harvest.ucr.edu). This database integrates all Triticeae EST data, including wheat and rye ESTs generated by this Project and the CUGI Barley EST Project (http://www.genome.clemson.edu/projects/barley/), and allows analyses of the relationships of ESTs assembled into contigs and their cDNA library of origin.
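The queries by location, verification status, or mapping lab described above for the relational database can be sketched as follows. This is only a minimal illustration under stated assumptions: the table name, column names, and sample rows are hypothetical and are not taken from the Project database, and Python's built-in sqlite3 module stands in for MySQL so that the sketch is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical mapped-locus table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE mapped_locus (
               probe TEXT,
               chromosome_bin TEXT,
               status TEXT,        -- 'Confirmed' or 'Unconfirmed'
               mapping_lab TEXT
           )"""
    )
    # Invented placeholder rows; one probe may detect more than one locus.
    rows = [
        ("probe_A", "bin_1", "Confirmed", "lab_1"),
        ("probe_A", "bin_2", "Unconfirmed", "lab_1"),
        ("probe_B", "bin_1", "Confirmed", "lab_2"),
        ("probe_C", "bin_3", "Unconfirmed", "lab_2"),
    ]
    conn.executemany("INSERT INTO mapped_locus VALUES (?, ?, ?, ?)", rows)
    return conn


def find_loci(conn, chromosome_bin=None, status=None, mapping_lab=None):
    """Return loci matching every criterion that is not None."""
    clauses, params = [], []
    criteria = (
        ("chromosome_bin", chromosome_bin),
        ("status", status),
        ("mapping_lab", mapping_lab),
    )
    for column, value in criteria:
        if value is not None:
            # Column names come from the fixed tuple above; values are bound
            # as parameters rather than pasted into the SQL string.
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT probe, chromosome_bin, status, mapping_lab FROM mapped_locus"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY probe, chromosome_bin"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in find_loci(demo, status="Confirmed"):
        print(row)
```

Keeping the verification status as an ordinary column lets one query serve both the confirmed and the unconfirmed views, which mirrors the way the two kinds of data are presented side by side.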
Bioinformatics personnel: Data Curator Shiaoman Chao, based at Albany, and Bioinformatics Programmer Hugh Edwards, based at Cornell Univ., are supported by collaboration with USDA-ARS bioinformatics specialists in Albany (G. Lazo) and at Cornell (D. Matthews).

6. To analyze gene density and distribution of mapped ESTs and thus genes in the wheat genomes (genome structure and evolution).
Approach: Analyze densities and distributions of ESTs in deletion maps.
Status: The database of mapped ESTs became large enough in this past year to allow (1) study of wheat transcriptome structure and evolution and (2) comparisons of wheat ESTs with sequence information from other taxa. For (1), one manuscript, directed by co-PI Dvořák and postdoc E. Akhunov, has been submitted and a second is in preparation. For (2), a manuscript, directed by co-PI Sorrells, is in preparation.
1. Analysis of 3977 ESTs mapped into chromosome deletion bins showed that single-gene loci that were not subjected to gene duplication and loci ancestral to duplicated loci are most frequently found in proximal chromosome regions, while multi-gene loci and loci derived by duplication are most frequently found in distal chromosome regions. This distribution correlated with increasing recombination rates from centromere to telomere along chromosome arms. It is suggested that recombination has played a central role in evolution of wheat transcriptome structure and that microsynteny of the wheat transcriptome is diverging faster where recombination is higher.
2. Analysis of 2835 ESTs mapped into chromosome deletion bins and segregating populations, compared with the public rice genome sequence data from ordered BAC/PAC clones, revealed strong similarities between the resulting DNA sequence-based comparative map and previously published comparative maps based on RFLPs.
While there appears to be extensive conservation of both gene content and order at the resolution conferred by the physical chromosome deletions in the wheat genome, there has also been an abundance of rearrangements, insertions, deletions, and duplications that may complicate the use of rice as a model for cross-species transfer of information in nonconserved regions.
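As a worked illustration of the library-complexity measure described under Objective 2, the sketch below computes the proportion of singleton candidates for three libraries using the within-library figures from Table 1. It assumes that the number of singleton candidates equals the number of contigs plus the number of unassembled ESTs (one representative per contig plus every EST outside a contig); that reading follows the text but is not a formula stated by the Project, and the function and variable names are hypothetical.

```python
# Within-library figures copied from Table 1:
# (No. ESTs, No. contigs, No. unassembled ESTs)
LIBRARIES = {
    "TA001E1X": (2728, 417, 1125),
    "TA006E3N": (1672, 139, 1338),
    "TA009XXX": (10287, 1854, 4881),
}


def singleton_candidates(contigs, unassembled):
    """Assumed count: one representative per contig plus each unassembled EST."""
    return contigs + unassembled


def candidate_proportion(ests, contigs, unassembled):
    """Fraction of a library's ESTs that are singleton candidates."""
    if ests <= 0:
        raise ValueError("a library must contain at least one EST")
    return singleton_candidates(contigs, unassembled) / ests


def rank_by_complexity(libraries):
    """Library names ordered from highest to lowest candidate proportion."""
    return sorted(
        libraries,
        key=lambda name: candidate_proportion(*libraries[name]),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_by_complexity(LIBRARIES):
        print(f"{name}: {candidate_proportion(*LIBRARIES[name]):.3f}")
```

Under this assumption, the normalized shoot library TA006E3N ranks highest of the three, followed by TA009XXX and then TA001E1X.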