Introduction to Genomics and Proteomics: Historical Perspective and the Future
Eleftherios P. Diamandis, M.D., Ph.D., FRCPC(C)
University of Toronto
(Course 1505 S/Jan. 9, 2001 #1)

Organization of the Lecture
Historical Background
The Human Genome Project
Critical Technologies:
  • Massive, automated sequencing
  • DNA and RNA analysis
  • Mass spectrometry
  • DNA and protein microarrays
  • Bioinformatics
  • Single nucleotide polymorphisms
Applications:
  • Diagnostics
  • Therapeutics
  • Pharmacogenetics
Ethics
Patents
(Course 1505 S/Jan. 9, 2001 #2)

Historical Milestones
Year       Milestone
1866       Mendel’s discovery of genes
1871       Discovery of nucleic acids
           First protein sequence (insulin)
1953       Double helix structure of DNA
1960s      Elucidation of the genetic code
1977       Advent of DNA sequencing
1975-79    First cloning of human genes
1986       Fully automated DNA sequencing
1995       First whole genome (Haemophilus influenzae)
1999       First human chromosome (Chr #22)
2000       Drosophila / Arabidopsis genomes
2001       Human and mouse genomes
(Course 1505 S/Jan. 9, 2001 #3)

Terminology
DNA            Genomics
mRNA           Transcriptomics
Protein        Proteomics
Metabolites    Metabolomics
Functional genomics, proteomics, etc.
(Course 1505 S/Jan. 9, 2001 #4)

History
On June 26, 2000, at The White House, it was announced that the Human Genome Project was essentially completed by:
  • Celera Genomics (private company)
  • The National Human Genome Research Institute and its international partners (publicly funded)
The work has yet to be published, but Celera scientists submitted a paper to “Science” on December 6, 2000.
(Course 1505 S/Jan. 9, 2001 #5)

Diagnostics / Prognostics
• Does my DNA predispose me to a specific disease?
• Do I want to know? (Ethics)
• Genetic mutations → disease
    → cancer
    → diabetes
    → Alzheimer’s
    → heart disease
• Whole genome scans for identification of mutations/polymorphisms?
AACC 2000 -#2 - 1

Pharmacogenetics and Pharmacogenomics
Goal is to associate human sequence polymorphisms with:
• Drug metabolism
• Adverse effects
• Therapeutic efficacy
⇓
* Decrease drug development cost
* Optimize selection of clinical trial participants
* Increase patient benefit
AACC 2000 -#2 - 3

Critical Protein Technologies
Protein
• Make pure form (recombinant)
• Activity
• Reagents (antibodies)
• Identification (sequencing)
• Identify post-translational modification (glycation, phosphorylation, etc.)
• Protein-protein interactions (physiological function)
• Gene → protein knockout / transgene
AACC 2000 -#2 - 13

Models of Human Disease
• Identify natural human knockouts
• Develop mice with every gene (or gene combination) being knocked out (this project is now underway!)
AACC 2000 -#2 -14

Expressed Sequence Tags (ESTs)
• Cloned cDNAs from various tissues (cDNA libraries)
• Can search through them by BLAST analysis
• Can purchase them, then fully sequence and characterize them
• Great help for new gene identification
AACC 2000 -#2 -16

Gene Patents
• Gene fragments
• Whole genes without function
• Whole genes with function and utility (enablement)
AACC 2000 -#2 - 18

Where Do We Stand Today? (July 2000)
Public Consortium: 85% of genome is done
  * 24% finished form
  * 22% near finished
  * 38% draft
  * rest is being done
Celera: Claims to have more than 99% of genome now!
Incyte: They may have all the genes!
AACC 2000 -#2 -25

Where Does the Individual Researcher Stand?
• At the end of the day, each gene must be looked at in great detail:
    - structure
    - function
    - physiology/pathways
    - pathophysiology
    - connection to disease
    - tools
• Individual researchers can make the big discoveries on a very specific gene or a very specific gene family
• Great time for individual researchers
AACC 2000 -#2 - 20

The Future of Genome Projects
Human
↓
Mouse (just started)
↓
Rat
↓
Zebrafish
↓
Dog
↓
Other primates
* The Era of Comparative Genomics (you can learn a lot about humans by studying yeast, Drosophila, mouse, etc.)
AACC 2000 -#2 - 21

The Impact of the Human Genome Project in Medicine
• You can’t make a car if you are missing parts
• Once all genes are known, we will start understanding their function → PATHWAYS
• We will then be able to correlate disease states to certain genes (Pathobiology)
    DISEASE → GENE(S) → DISEASE
• We will then find ways for rational treatments (designer drugs), prevention, diagnosis…
AACC 2000 -#2 - 22

Gene Manipulation (Ethics??)
• Gene modulation (↓ regulation)
• Gene repair
• Gene excision
• Gene replacement/transplantation
• Gene improvement
AACC -#2 -23

Celera’s Whole Genome Shotgun Strategy
• Does not use BAC clones; cuts whole DNA into millions of pieces, which are sequenced
• Computer assembles the pieces together
• Achieves high accuracy with 6x coverage
• Lots of relatively short gaps
AACC -#2 - 26
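
The link between coverage and remaining gaps can be sketched with the Lander-Waterman idealization. This model is an assumption added here and is not part of the slide. If reads land uniformly at random, the expected fraction of bases hit by no read at coverage c is e^(-c). The 3 x 10^9 bp genome size is the figure quoted elsewhere in the lecture.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # human genome size quoted in the lecture


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read (Poisson model, idealized)."""
    return math.exp(-coverage)


# Compare a few coverage depths; 6x is the depth named on the slide.
for c in (1, 4, 6, 8):
    frac = uncovered_fraction(c)
    print(f"{c}x coverage: {frac:.4%} uncovered, "
          f"about {frac * GENOME_SIZE_BP:,.0f} bases")
```

Real genomes violate the uniformity assumption (repeats, cloning bias), so the numbers are only a rough guide to why even deep shotgun coverage leaves many short gaps.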

Strategy to Sequence Human Genome
• Construct a human genomic library in an appropriate vector (BAC)
• Assemble overlapping BAC clones in order to obtain full coverage of the distance (restriction map)
[Diagram: BAC clones, DNA]
• Start sequencing each BAC until you finish the job
AACC -#2 - 27

How are these BACs Sequenced?
Shotgun Sequencing
• BAC clone is broken down to small pieces which have overlapping ends
• Small pieces are sequenced and a computer assembles the pieces based on the overlapping sequence information
• Construct contigs (contiguous areas of sequence)
• Larger contigs
AACC -#2 -28
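
The assembly step described above can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats), not the algorithm used by either sequencing project. The function names and example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Returns the contigs left once no overlap of min_len remains.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three overlapping fragments of one short, made-up sequence.
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Fragments that share no sufficient overlap stay as separate contigs, which mirrors the gaps between contigs mentioned in the lecture.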

Other Important Genomic Technologies
• Recombinant DNA (cloning)
• PCR
• Pulsed Field Gel Electrophoresis (PFGE)
• Chromosome microdissection
• Somatic hybrid cell lines (mapping) [rodent x human]
• Radiation hybrid cell lines [rodent x human]
• DNA sequencing
AACC 2000 - #2 - 32

Annotation
What is annotation?
Make sense out of a linear sequence → identify genes, intron/exon boundaries, regulatory sequences, predict protein structure, identify motifs, predict function, etc.
Annotation will likely go on for a few years.
Major annotation tool ⇒ BIOINFORMATICS (hardware & software)

Celera Genomics
• The publicly funded project started around 1990 with a goal to produce a highly accurate sequence by 2005
• Celera started in 1998 and within 2 years sequenced more DNA than the publicly funded consortium!
Why?
• No bureaucracy
• Facility (300 sequencers x 24 h/day)
• Powerful supercomputer
• Lots of money
• More efficient sequencing approach (no BACs necessary)
• Use of data from the publicly funded project

Cloning Vectors
• Replicable units of DNA which can carry exogenously inserted DNA; size of insert varies with vector type:
Vector                                              Insert size
plasmid                                             5-10 kb
λ phage                                             20 kb
cosmid                                              45 kb
PAC/BAC (P1- or bacterial artificial chromosome)    100-200 kb
YAC (yeast artificial chromosome)                   1,000 kb
AACC 2000 - #2 - 31
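
To show why insert size matters for a genome-scale library, the arithmetic below estimates the minimum number of clones needed to tile a 3 x 10^9 bp genome once with no overlap. The insert sizes come from the slide. Taking the upper bound of the plasmid range and the midpoint of the PAC/BAC range are assumptions made for this sketch. A real library needs several-fold more clones to ensure overlapping coverage.

```python
GENOME_SIZE_BP = 3_000_000_000  # human genome size quoted in the lecture

# Insert sizes in base pairs. Plasmid uses the upper end of the slide's
# 5-10 kb range; PAC/BAC uses 150 kb, the midpoint of 100-200 kb (assumption).
INSERT_SIZE_BP = {
    "plasmid": 10_000,
    "lambda phage": 20_000,
    "cosmid": 45_000,
    "PAC/BAC": 150_000,
    "YAC": 1_000_000,
}


def min_clones(insert_bp: int, genome_bp: int = GENOME_SIZE_BP) -> int:
    """Smallest number of non-overlapping clones that could tile the genome once."""
    return -(-genome_bp // insert_bp)  # ceiling division on integers


for vector, size in INSERT_SIZE_BP.items():
    print(f"{vector:13s} {min_clones(size):>9,d} clones")
```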

Human Genome
• 3 x 10^9 base pairs
• Approximately 100,000 genes
• < 10% of DNA encodes for genes; the rest represents introns/repetitive elements
• Importance of non-coding sequences currently not understood
AACC 2000 -#2 -33

Quality of Sequencing
• Clones are sequenced many times over to verify the sequence:
    x 4 → rough draft → 1 error per 100 bases
    x 8-11 → finished draft → 1 error per 10,000 bases
AACC 2000 -#2 -34
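
Why repeated sequencing lowers the error rate can be illustrated with a majority-vote consensus over redundant reads, followed by the arithmetic for how many wrong bases each quoted error rate implies across 3 x 10^9 bases. This is a toy sketch: the reads are assumed to be already aligned and of equal length, and real pipelines weigh per-base quality scores rather than simple votes. The example reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority base at each column of equal-length, pre-aligned reads."""
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*aligned_reads))


def expected_errors(genome_bp: int, bases_per_error: int) -> int:
    """Expected number of wrong bases at one error per bases_per_error."""
    return genome_bp // bases_per_error


reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC"]  # third read has one miscall
print(consensus(reads))                          # ACGTAC
print(expected_errors(3_000_000_000, 100))       # 30000000 (rough draft)
print(expected_errors(3_000_000_000, 10_000))    # 300000 (finished)
```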

The Next Race
• It will not be who has the sequence
• It will be how you can use the sequence to arrive at products
    * DIAGNOSTICS
    * THERAPEUTICS
AACC 2000 -#2 - 35

Genomics and Drug Discovery
Genomic technologies are involved in all aspects of the drug discovery process, from target validation through to the marketed drug, which include:
• Molecular target identification
• Drug target characterization and validation
• Lead discovery
• Lead optimization
• Clinical candidate to marketed drug
AACC 2000 - #2 - 37

Key Corporate Players in Proteomics
Company                         Location               Approach
Celera                          Rockville, MD          Databases
Incyte Pharmaceuticals          Palo Alto, CA          Databases
GeneBio                         Geneva, Switzerland    Databases
Proteome Inc.                   Beverly, MA            Databases
PE Biosystems                   Framingham, MA         Instrumentation
Ciphergen Biosystems            Palo Alto, CA          Protein arrays
Oxford GlycoSciences            Oxford, UK             2D gel/MS*
Protana                         Odense, Denmark        2D gel/MS
Genomic Solutions               Ann Arbor, MI          2D gel/MS
Large Scale Proteomics Corp.    Rockville, MD          2D gel/MS
* 2D gel electrophoresis and mass spectrometry
AACC 2000 - #2 - 38

Pharmacogenetics and Pharmacogenomics in Drug Discovery
Aspect of Drug Development    Approach
Drug-drug interactions        Examine polymorphism in metabolic enzymes
Efficacy                      Differentiate responders from nonresponders
Side effects                  Examine variation in gene or genes involved in mediating the effects (may be mechanism related or unrelated)
Toxicity                      Gene expression profiling in cells treated with compound. Look for toxicity signatures.
AACC 2000 - #2 - 39

The Biography of the Year 2000 (Francis Collins and J. Craig Venter)

Creating an Array of Contiguous BAC Clones

The …omics

Predicting the Future
• What is going to happen now that the human and other genomes are completed?
• How quickly will the next steps happen?
• What are the potential difficulties?
• Are we expecting too much?
(Course 1505 S - Jan. 15/01 - #6)

Grand Plan
• Find all the genes
• Translate genes to proteins
• “Compute” function by similarity search and comparison to known proteins
• “Compute” structure
(Course 1505 S - Jan. 15/01 - #7)
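
The "translate genes to proteins" step can be sketched as a codon-table lookup. The example below assumes the standard genetic code and an intron-free coding sequence given on the coding strand. The function name and the example sequence are hypothetical, and real genes first need their exons identified and spliced.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon.

    Codons containing unknown letters are reported as 'X'.
    """
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))  # MKFG
```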

Difficulties
• Gene prediction programs are unreliable
• Function inference by just similarity search may be fallacious
• Computation of structure is still unreliable
Our databases may get contaminated with “wrong” information.
(Course 1505 S - Jan. 15/01 - #8)

Gene Prediction
• Programs were designed based on knowledge of already cloned genes (ORFs; splice sites; start/stop codons, etc.)
• These programs provide excellent clues for gene presence but they never or rarely predict the complete gene structure
• The computer prediction must be taken as a “starting point” to experimentally clone a gene
How many genes in the genome? Estimate: 27,462 to 312,278!
(Course 1505 S - Jan. 15/01 - #9)
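
One of the signals such programs use, the open reading frame (ORF), can be illustrated with a naive scanner. This sketch is a deliberate simplification: it looks only at the forward strand, treats ATG as the sole start codon, and ignores introns and splice sites, so it cannot recover a complete eukaryotic gene structure. The function name, threshold, and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive);
    min_codons counts codons before the stop. Coordinates are 0-based,
    end-exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
# [(2, 17, 'ATGAAATTTGGGTAA')]
```

A hit from a scan like this is only a candidate, in line with the slide's point that computer predictions are a starting point for experimental cloning.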

What is a Gene?
• Heritable unit corresponding to a phenotype?
• DNA that encodes for a protein?
• DNA that encodes RNA?
• What if RNA is not translated?
• What if a “gene” is not expressed?
(Course 1505 S - Jan. 15/01 - #10)

Prediction of Function
What is function? This is not a simple term.
Function may be:
• a biological process (e.g. serine protease activity)
• a molecular event (e.g. proteolysis of a specific substrate)
• a cellular structure (e.g. membrane; chromatin; mitochondrion; etc.)
• relevance to a whole process (e.g. cell cycle)
• relevance to the whole organism (e.g. ovulation)
* Some scientists have now initiated projects to “compute” function of whole organisms.
(Course 1505 S - Jan. 15/01 - #11)

Pattern Recognition
• Looks for motifs that may have functional relevance (family signatures):
    * Membrane anchoring
    * Catalytic site
    * Nucleotide binding
    * Nuclear localization signal
    * Hormone response element
    * Calcium binding, etc.
• Protein family resources (being created now)
(Course 1505 S - Jan. 15/01 - # 12)
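
A family-signature search of this kind can be sketched with a regular expression. The example uses the ATP/GTP-binding P-loop (Walker A) consensus, [AG]-x(4)-G-K-[ST], as one instance of a nucleotide-binding motif. The choice of motif and the test sequence are illustrative assumptions, not taken from the lecture. A match only suggests a possible function.

```python
import re

# P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST] written as a regex.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein: str, pattern: re.Pattern = P_LOOP) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for non-overlapping motif hits."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Illustrative sequence chosen to contain one P-loop-like stretch.
seq = "MTEYKLVVVGAGGVGKSALTIQ"
print(find_motif(seq))  # [(9, 'GAGGVGKS')]
```

Short patterns like this also match by chance in unrelated proteins, which is one reason similarity-based function inference can mislead.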

Homology
• What is “homology”?
Definition: Two proteins are homologous if they are related by divergence from a common ancestor.
[Diagram: ancestor A gives rise to B, C and D by divergent evolution; B, C and D are homologous]
(Course 1505 S - Jan. 15/01 - # 13)

Analogy
• What is “analogy”?
Definition: Two proteins are “analogous” if they acquired common structural and functional features via convergent evolution from unrelated ancestors.
[Diagram: unrelated ancestors A and B give rise to C and D by convergent evolution; C and D are analogous (similar structure and/or function)]
(Course 1505 S - Jan. 15/01 - # 14)

Serine Proteases (Convergent Evolution)
[Diagram: Trypsin-like (many homologous members) and Subtilisin-like (many homologous members) families, linked as analogous proteins]
Trypsin and subtilisin share groups of catalytic residues with almost identical spatial geometries, but they have no other sequence or structural similarities.
(Course 1505 S - Jan. 15/01 - # 15)

Human Kallikrein Gene Family (Divergent Evolution)
• 15 homologous genes on human chromosome 19q13.4
• Divergence in tissue expression and substrate specificity
(Course 1505 S - Jan. 15/01 - # 16)

Orthologs
Proteins that usually perform the same function in different species (e.g. DNA polymerase; glucose 6-phosphate dehydrogenase; retinoblastoma gene; p53, etc.).
Paralogs
Proteins that perform different but related functions within one organism [usually formed by gene duplication and divergent evolution] (e.g. the 15 kallikrein genes mentioned above).
(Course 1505 S - Jan. 15/01 - # 17)

Functional Annotation - Difficulties
• Who knows if the best matches in a database query are really orthologs or paralogs?
• Modules: building blocks of proteins. Finding a “module” in a protein does not mean that a “function” can be assigned, since these modules do not always perform the same function
Aphorism: The properties of a system can be explained by, but not deduced from, those of its components
(Course 1505 S - Jan. 15/01 - # 18)

Structure Prediction
• How proteins fold in 3D space
• We still cannot reliably “compute” structures of > 100 amino acid proteins (ab initio methods)
• Experiment and computation:
    Crystallography
    NMR
(Course 1505 S - Jan. 15/01 - # 19)

Future
• Lots of rigorous work needs to be done
• Holistic view
    -- regulation of gene expression
    -- metabolic pathways
    -- signaling cascades
Remember: Proteins do not work in isolation but within integrated networks.
(Course 1505 S - Jan. 15/01 - # 20)

The Importance of Accurate Functional Annotation
• Function in whole organisms is complex and interrelated
• Need for close collaboration between:
    - software developers
    - annotators
    - experimentalists
• Holistic approaches needed for optimal knowledge-based inference and innovation (drugs, diagnostics, etc.)
(Course 1505 S - Jan. 15/01 - # 21)

How Protein Structure is Elucidated

Protein Annotation

Protein Annotation

PLANT GENOMES
Species                                           Genome size (base pairs)
Brassicas
  Thale cress            Arabidopsis thaliana       1.0 x 10^8
  Oilseed rape/canola    Brassica napus             1.2 x 10^9
Cereals
  Rice                   Oryza sativa               4.2 x 10^8
  Barley                 Hordeum vulgare            4.8 x 10^9
  Wheat                  Triticum aestivum          1.6 x 10^10
  Maize/corn             Zea mays                   2.5 x 10^9
Legumes
  Garden pea             Pisum sativum              4.1 x 10^9
  Soya bean              Glycine max                1.1 x 10^9
Solanaceae
  Potato                 Solanum tuberosum          1.8 x 10^9
  Tomato                 Lycopersicon esculentum    1.0 x 10^9
Human                    Homo sapiens               3.2 x 10^9