
e67f64fb07e857a5c0357ee683b17854.ppt
- Number of slides: 58
Bioinformatics: Structural and functional prediction. Master in Molecular Biotechnology 2009-10
Outline
- Introduction
- Biological Databases
- Sequence Comparison
- 3D Structure Visualization
- Functional Prediction
- Structural Prediction
http://mmb.pcb.ub.es/MBIOTEC/
Material and Evaluation
- Exercises and slides
  - Campus Virtual
  - http://mmb.pcb.ub.es/MBIOTEC
- Evaluation
  - Practical test on Campus Virtual
Bioinformatics: an intuitive definition. Computational tools that can suggest solutions to biological problems
"You really understand a system when you are able to represent it using a mathematical equation" (Lord Kelvin)
"Living organisms are the most perverse of chemical systems" (Coulson)
FEBRUARY 2001: Public Consortium, Celera Genomics. NOVEMBER 2001: Ohio State University
Sequencing; Parallel / combinatorial synthesis; HT Screening; Separation; Purification; Crystallization...
DATA INFORMATION
Bioinformatics
- Genome projects
- Functional genomics
- Structural genomics
- Proteomics, systems biology
- Molecular recognition
- …
Genome projects: Genome Determinations; Massive sequencing; Genome Annotation; Genomics and disease
+ 2124 Virus http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606&build=previous
Functional genomics: Statistical analysis; DNA-chips; …; Expression profiles; Image processing; Data mining
Statistical analysis
- Clustering
- Machine learning methods
- Ontology
http://www.ncbi.nlm.nih.gov/geo/
Structural genomics X-ray NMR Structure selection 3D Structure Homology 3D Structure-function analysis Molecular modeling Structure-function New biomolecules
Rosalind Franklin: B-DNA diffraction pattern
COX-2 FKBP ADA XO
ATP (Mg) - ACV
Dynamic properties
- Molecular recognition requires structural adjustment
Proteomics: Proteome, Metabolome, Systems biology
HUMAN PLASMA
http://www.imb-jena.de/jcb/ppi/
Barabasi et al. (and others), since 1999; Pazos et al., EMBO Reports 2003
Bioinformatics & prediction
- The most widely used bioinformatics tools try to predict the function or structure of macromolecules
- Sequence information is the primary entry point
- Evolutionary pressure ensures conservation
  - DNA seq < Protein 3D structure
Prediction. Possible scenarios
1. Homology can be recognized using sequence comparison tools or protein family databases (blast, clustal, pfam, ...). Structural and functional predictions are feasible.
2. Homology exists but cannot be recognized easily (psiblast, threading). Low-resolution fold predictions are possible. No functional information.
3. No homology. 1D predictions. Sequence motifs. Limited functional prediction. Ab initio prediction.
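As a rough illustration of the first scenario, sequence comparison tools report, among other things, how similar two aligned sequences are. The sketch below does not reproduce what blast or clustal do internally; it only computes percent identity for two sequences that are already aligned, counting columns where neither sequence has a gap. The function name `percent_identity`, the gap symbol and the toy sequences are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are skipped in this convention
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    print(percent_identity("ACGT-A", "ACCTGA"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different tools are not always directly comparable.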
Reminder
- Bioinformatics “suggests” answers; experimental proof is still necessary
- Bioinformatics can “save work”. Hypotheses can be tested “in silico”
- Bioinformatics can do impossible experiments
- However, never trust bioinformatics
Biological databases
DNA sequence; Molecular Recognition; Protein sequence; 3D Structure
In real life however…
>gi|261252063|ref|NZ_ACZV01000005.1| Vibrio orientalis CIP 102891 VIA.Contig80, whole genome shotgun sequence
ACGCGTTAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAATGAATTGACGGGGGCCCGC
ACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGA
AGCCGGAAGAGATTCTGGTGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTG
TTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTGTTTGCCAGCGAGTAATGTCGG
GAACTCCAGGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTA
CGAGTAGGGCTACACACGTGCTACAATGGCGCATACAGAGGGCAGCCAACTTGCGAAAGTGAGCGAATCC
CAAAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCG
TGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGG
CTGCAAAAGAAGTAGTTTAACCTTCGGGAGAACGCTTACCACTTTGTGGTTCATGACTGGGGTGAA
GTCGTAACAAGGTAGCCCTAGGGGAACCTGGGGCTGGATCACCTCCTTATACGATGATTACTCACGATGA
GTGTCCACACAGATTGATATGTCTTTATTAGAGCTTTGAGGGGCTATAGCTCAGCTGGGAGAGCGCTTCG
Record Serial Atom Res  No.       x       y       z
ATOM       95 CE2  TRP  115  28.381   8.071  33.915
ATOM       96 CE3  TRP  115  27.500   9.825  32.526
ATOM       97 CZ2  TRP  115  27.750   7.155  33.103
ATOM       98 CZ3  TRP  115  26.888   8.895  31.705
ATOM       99 CH2  TRP  115  27.053   7.584  32.002
ATOM      100 N    ASP  116  26.290  11.255  36.778
ATOM      101 CA   ASP  116  25.763  10.825  38.096
ATOM      102 C    ASP  116  24.689  11.802  38.607
ATOM      103 O    ASP  116  24.564  12.103  39.797
ATOM      104 CB   ASP  116  26.872  10.617  39.142
ATOM      105 CG   ASP  116  26.368  10.397  40.557
ATOM      106 OD1  ASP  116  25.812   9.294  40.721
ATOM      107 OD2  ASP  116  26.590  11.276  41.416
ATOM      108 N    PHE  117  23.915  12.348  37.709
ATOM      109 CA   PHE  117  22.766  13.148  38.156
Trailing columns (fragment): 1.00 10.00 10.00 50.00 10.00
Diagram labels: DNA sequence; Molecular Recognition; Protein sequence; 3D Structure
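Raw records like the ones on this slide are plain text and can be read with a few lines of code. The following is a minimal sketch, not a replacement for an established parser: `read_fasta`, `gc_content` and `parse_atom_record` are names chosen here for illustration. The ATOM parser assumes the standard fixed-column PDB layout; in the sample line, the chain identifier "A", the occupancy and the B-factor are assumptions, since those columns are not legible in the slide.

```python
from typing import Dict, Iterator, Tuple, Union


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def parse_atom_record(line: str) -> Dict[str, Union[str, int, float]]:
    """Slice one fixed-column PDB ATOM record into named fields."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_num": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Atom 100 from the slide; chain "A", occupancy and B-factor are assumed.
SAMPLE_ATOM = "ATOM    100  N   ASP A 116      26.290  11.255  36.778  1.00 10.00"

# Shortened header and first sequence line from the slide.
SAMPLE_FASTA = """>gi|261252063|ref|NZ_ACZV01000005.1| Vibrio orientalis
ACGCGTTAAGTAGACCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAATGAATTGACGGGGGCCCGC
"""

if __name__ == "__main__":
    for header, seq in read_fasta(SAMPLE_FASTA):
        accession = header.split("|")[3]
        print(accession, len(seq), round(gc_content(seq), 3))
    print(parse_atom_record(SAMPLE_ATOM))
```

Slicing by column position rather than splitting on whitespace matters for PDB files, because adjacent fields can touch when values are wide.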
The amount of data is huge
http://www3.ebi.ac.uk/Services/DBStats/
Biological databases
- Primary
  - Information comes from experiment
  - Database only organizes and provides the data
  - Ex. GenBank, EMBL
- Derived
  - Annotated a posteriori
  - Data is revised and corrected. Information from literature is added
  - Ex. SWISS-PROT
- Reusable experimental data
  - GEO, SRA
- Computationally derived
  - Ex. PFAM
  - Specific
Molecular Database Collection 2009 update
Search strategies
- Direct access to database
  - Usually more elaborated information
- Global retrieval
  - Sequence Retrieval System (SRS), NCBI Entrez
  - Automated, uniform. Allows checking several (all) databases simultaneously
- Program access (bioXXX, Web services, Taverna)
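Program access can be as simple as composing a web-service request. The sketch below builds a request URL for the NCBI E-utilities `efetch` service, which is the programmatic counterpart of Entrez; the slides do not name this endpoint, so treat the base URL, the helper names `efetch_url` and `fetch_text`, and the choice of the `nuccore` database as assumptions of this example. The accession is the one shown on the "In real life" slide. Nothing is downloaded unless `fetch_text` is called explicitly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities (assumed here, not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, accession: str,
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch URL that requests one record in the given format."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return the body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    url = efetch_url("nuccore", "NZ_ACZV01000005.1")
    print(url)
    # Uncomment to actually retrieve the record:
    # print(fetch_text(url)[:200])
```

Libraries of the bioXXX family wrap the same kind of request behind a higher-level interface, which is usually preferable for anything beyond a one-off query.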
Origin of information
- Individual research
  - Good quality but very limited amount
- Massive sequencing projects: EST, HTS, genome projects
  - Large amount of data. Quality not assured. Frequent updates
Main sequence repositories
- DNA
  - EMBL, GenBank, DDBJ
- Protein
  - Swiss-Prot/TrEMBL, PIR
Trusted annotation; Translation from DNA; http://www.expasy.org
Cross links
- Most database files contain links to other databases
  - DNA sequence to Protein sequence
  - Sequence to 3D structure
  - Sequence to bibliographic data ...
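To make the idea of cross links concrete: in Swiss-Prot style flat files, cross-references appear on lines that start with the code `DR`, followed by the target database and an identifier separated by semicolons. The sketch below collects them into a dictionary. The sample entry and every identifier in it are made-up placeholders, and `cross_references` is a name chosen for this example.

```python
from collections import defaultdict
from typing import Dict, List


def cross_references(entry_text: str) -> Dict[str, List[str]]:
    """Map each linked database name to the primary identifiers on DR lines."""
    refs = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR "):
            continue
        # Drop the line code and the final period, then split the fields.
        body = line[2:].strip().rstrip(".")
        fields = [field.strip() for field in body.split(";")]
        if len(fields) >= 2:
            refs[fields[0]].append(fields[1])
    return dict(refs)


# Hypothetical entry fragment; all identifiers are placeholders.
SAMPLE_ENTRY = """ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
AC   P00000;
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
DR   Pfam; PF00001; 7tm_1; 1.
"""

if __name__ == "__main__":
    for database, identifiers in cross_references(SAMPLE_ENTRY).items():
        print(database, identifiers)
```

Each identifier collected this way can then be used as the key for a lookup in the linked database, which is how one moves from a protein sequence to its DNA record or 3D structure.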
Warnings
- Prediction methods can fail, and sometimes accuracy estimates are not available
- Predictions are always based on what is already known
- Databases can contain incorrect data
- Avoid overvaluing results