  • Number of slides: 45

Introduction to Bioinformatics 236523/234525
Lecturer: Prof. Yael Mandel-Gutfreund
Teaching Assistants: Shai Ben-Elazar, Idit Kosti
Course web site: http://webcourse.cs.technion.ac.il/236523

What is Bioinformatics?

Course Objectives
• To introduce the bioinformatics discipline
• To familiarize students with the major biological questions that can be addressed with bioinformatics tools
• To introduce the major tools used for sequence and structure analysis and to explain, in general terms, how they work (including their limitations)

Course Structure and Requirements
1. Class structure
   1. 2 hours lecture
   2. 1 hour tutorial
2. Homework
   • Homework assignments will be given every second week
   • The homework will be done in pairs
   • All five homework assignments (5/5) must be submitted
3. A final project will be conducted in pairs
   * The project will be presented as a poster (poster day: 14.3)

Grading
• 20% Homework assignments
• 80% Final project

Literature list
• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced reading
• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

What is Bioinformatics?

What is Bioinformatics?
“The field of science in which biology, computer science, and information technology merge to form a single discipline”
Ultimate goal: to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
21st century: Genome → Transcriptome → Proteome

From DNA to Genome
[Timeline, 1955 to 1985] Watson and Crick DNA model

[Timeline, 1990 to 2000] First genome (Haemophilus influenzae), yeast genome, first human genome draft

Complete Genomes
             2010   2005
Total        1379    294
Eukaryotes    133     39
Bacteria     1152    235
Archaea        94     23

1,000 Genomes Project: Expanding the Map of Human Genetics
Researchers hope the effort will speed up the discovery of the genetic roots of many diseases

25000 genomes… What’s Next?
The “post-genomics” era
• Annotation
• Comparative genomics
• Functional genomics
• Systems Biology
Main goal: to understand the living cell

From… 25000 genomes
To… understanding living cells

Annotation
CCTGACAAATTCGACGTGCGGCATTGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT CTG AAT ... TGAAAAACGTA

Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function

How do we identify a gene in a genome?
A gene is characterized by several features (promoter, ORF…); some are easier to detect and some are harder.

[Figure: annotated DNA sequence. Labels: TF binding site, promoter, Transcription Start Site, Ribosome Binding Site, ORF = Open Reading Frame, CDS = Coding Sequence]
CCTGACAAATTCGACGTGCGGCATTGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT CTG AAT ... TGAAAAACGTA
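To make the ORF idea concrete, here is a minimal Python sketch of a naive ORF scan. It is an illustration only, not the method taught in the course: it looks at the forward strand only, treats every ATG as a possible start, and ignores promoters and ribosome binding sites. The function name `find_orfs` and the `min_codons` threshold are assumptions introduced for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each forward-strand ORF.

    An ORF here is an ATG followed, in the same reading frame, by a stop
    codon. `start` is the index of the A in ATG, `end` is one past the stop
    codon, and `min_codons` counts the codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk this reading frame codon by codon.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (hypothetical, not taken from the slides).
toy = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A scan like this tends to work tolerably on compact bacterial genomes; it says nothing about introns, which is one reason gene finding is much harder in higher organisms.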

Using bioinformatics approaches for gene hunting
• Relatively easy in simple organisms (e.g. bacteria)
• VERY HARD for higher organisms (e.g. humans)

Comparative genomics

How human are chimps?
Perhaps not surprising!!! Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%

So where are we different?
Unaligned:
Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGGGGATGCGGGCCCTATACCC
Mouse ATAGCGGGATGCGGCGCTATACCA
Aligned:
Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACC-A
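The step from unaligned to aligned sequences can be sketched with a textbook global alignment (Needleman-Wunsch). The scoring values below (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the slides do not say which method or scores produced the alignment shown, and a different scheme may place the gaps differently.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGGGGATGCGGGCCCTATACCC"
aligned_human, aligned_chimp, total = needleman_wunsch(human, chimp)
print(aligned_human)
print(aligned_chimp)
print("score:", total)
```

Several alignments can share the same optimal score, so the gap positions printed here need not match the slide exactly.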

And where are we similar?
[Figure labels: VERY DIFFERENT, VERY SIMILAR] Conserved between many organisms

Functional genomics

TO BE IS NOT ENOUGH
At any time point, a gene can be functional or not

From the gene expression pattern we can learn:
• What does the gene do?
• When is it needed?
• What other genes or proteins interact with it?
• …
• What's wrong?

Systems Biology

Biological networks
Jeong et al. Nature 411, 41-42 (2001)

What can we learn from a network?

What can we learn from biological networks?
What can we learn about this protein?
• Is the protein essential for the organism?
• Is it a good drug target?
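One simple quantity a network gives us is a protein's degree, its number of interaction partners. Jeong et al. (2001) reported that highly connected proteins in the yeast interaction network are more likely to be essential, so degree is a rough first hint rather than an answer. The sketch below assumes the network is given as a list of interacting pairs; the protein names and edges are made up for illustration.

```python
from collections import defaultdict


def degree_by_node(edges):
    """Count distinct interaction partners for every protein in an edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions in this simple count
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {node: len(partners) for node, partners in neighbors.items()}


# Hypothetical interaction list (not real data).
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
degrees = degree_by_node(interactions)
hub = max(degrees, key=degrees.get)
print(degrees)
print("most connected protein:", hub)
```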

What of all this will we learn in the course?
The course will concentrate on the bioinformatics tools and databases which are used to:
• Annotate genes
• Compare genes and genomes
• Infer the function of genes and proteins
• Analyze the interactions between genes and proteins
• Etc.

Biological Databases
The different types of data are collected in databases:
– Sequence databases
– Structural databases
– Databases of experimental results
All databases are connected

Sequence databases
• Gene database
• Genome database
• Disease-related mutation database
• …

Genome Browsers
Easy “walk” through the genome
UCSC Genome Browser: http://genome.ucsc.edu/

Disease-related database

Sickle Cell Anemia
• Due to a single nucleotide substitution (an A replaced by a T), which causes valine to be inserted instead of glutamic acid in hemoglobin
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGTGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
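The difference between the healthy and the diseased sequence can be located with a few lines of Python. This is a minimal sketch: the two fragments are the first ten codons of the HBB coding sequence as they appear in the mRNA records shown in the slides (starting at the ATG), the codon table is the standard genetic code, and the helper names `translate` and `point_differences` are introduced here for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence codon by codon (trailing bases are ignored)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def point_differences(a, b):
    """List (position, base_in_a, base_in_b) wherever two equal-length sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First ten codons of the coding region, copied from the two records.
HEALTHY = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
SICKLE = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"

for pos, ref, alt in point_differences(HEALTHY, SICKLE):
    codon_start = pos - pos % 3
    print("position", pos, ":", ref, "->", alt)
    print("codon", HEALTHY[codon_start:codon_start + 3], "->",
          SICKLE[codon_start:codon_start + 3])

print(translate(HEALTHY))
print(translate(SICKLE))
```

The single A to T change turns the codon GAG (glutamic acid, E) into GTG (valine, V), which matches the E to V difference between the two protein records.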

Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available thanks to techniques such as NMR and X-ray crystallography

Databases of Experimental Results
• Gene expression data, such as experimental microarray images
• Proteomic data (protein expression data)
• Metabolic pathways, protein-protein interaction data, regulatory networks
• Etc.

Literature Databases
PubMed: http://www.ncbi.nlm.nih.gov/pubmed/
Service of the National Library of Medicine

Putting it all Together
• Each database contains specific information
• Like biological systems themselves, these databases are interrelated

[Figure: map of interrelated databases. Labels, in extraction order:] PROTEIN, PIR, SWISS-PROT, DISEASE, ASSEMBLED GENOMES, LocusLink, OMIM, GoldenPath, OMIA, WormBase, MOTIFS, TIGR, BLOCKS, Pfam, GENOMIC DATA, Prosite, GenBank, ESTs, DDBJ, EMBL, dbEST, GENES, RefSeq, UniGene, AllGenes, SNPs, GENE EXPRESSION, dbSNP, PATHWAY, STRUCTURE, PDB, MMDB, SCOP, Stanford MGDB, KEGG, NetAffx, COG, GDB, ArrayExpress, LITERATURE, PubMed