- Number of slides: 54
Center for Biologisk Sekvensanalyse. “Resources of Biomolecular Data: Sequences, Structures and Functionality”, Ph.D. course #27803. Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, nikob@cbs.dtu.dk
Outline: Magnitudes and Scales. Resources: Data Sources & Tools • Primary DNA sources • Sequence Repositories • Structure Repositories. Functional Categorization. Integration of Databases. The Human Genome • Genome Browsers • Prediction Tools • Evaluation of Prediction Servers. Starting points • Link collections
Resources: Sources & Tools. There are A LOT OF biomolecular databases/sources, A LOT OF overlap of information/redundancy, and A LOT OF tools. Personal picks/preferences: • User-friendliness • Update intervals • Curation efforts / error correction • Linkage to other DBs
Faster than Moore’s law...
Human Genome Published. HUGO: Nature, 15 Feb. 2001. Celera: Science, 16 Feb. 2001
Magnitudes and Scales
How we got the sequence: Sanger chain termination method
Primary DNA sources: trace file repositories. Single read: 500-1000 bp, variable quality (~golf-ball-size jigsaw puzzle) • WashU-Merck Human EST Project / trace files • “Base-calling” is non-trivial
Assembly is non-trivial!
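To give a feel for why assembling 500-1000 bp reads is non-trivial, here is a minimal sketch of greedy overlap merging in Python. It is a toy under strong assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` and the example reads are made up for illustration and do not come from any real assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three hypothetical error-free reads that tile one short sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

Real reads contain base-calling errors and genomes contain repeats, so exact-match greedy merging like this breaks down quickly; that is the point of the slide.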
Sequence repositories: GenBank et al.
Non-redundant and curated databases. Non-redundant, with manual or automatic curation. DNA: • RefSeq (NCBI; semi-automated) • Ensembl gene index (automated). Protein: • RefSeq (NCBI; semi-automated) • TrEMBL (EMBL; automated)
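As a rough illustration of what "non-redundant" means at its simplest, the sketch below collapses records with identical sequences into one entry while remembering every identifier. The function name `make_nonredundant` and the record identifiers are hypothetical; curated resources such as RefSeq do far more than exact-match merging, so this is only a sketch of the idea.

```python
def make_nonredundant(records):
    """Group (identifier, sequence) pairs by sequence, ignoring letter case.

    Returns a dict mapping each distinct sequence to the list of
    identifiers that carried it, in input order.
    """
    merged = {}
    for identifier, sequence in records:
        key = sequence.upper()
        merged.setdefault(key, []).append(identifier)
    return merged


if __name__ == "__main__":
    # Hypothetical records: two submissions of the same sequence plus one other.
    records = [("entry_a", "ATGC"), ("entry_b", "atgc"), ("entry_c", "GGGG")]
    for sequence, identifiers in make_nonredundant(records).items():
        print(sequence, identifiers)
```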
Curated database: UniProt/Swiss-Prot. SIB (Swiss Institute of Bioinformatics). Protein Knowledgebase / Sequence Database • Highly curated • Experimental evidence evaluated (e.g. modifications) • All 80,000 entries checked by Amos Bairoch himself ;-) ExPASy (Expert Protein Analysis System) • Proteomics tools: links + local servers
Structure databases / Protein Data Bank (PDB): X-ray and NMR biomolecular structures. Protein Data Bank (PDB): >22,000 structures (April 2003), http://www.rcsb.org/pdb/
Functional Categorization: Gene Ontology (GO) • Hierarchical • Controlled vocabulary
Functional Categorization: Gene Ontology (GO), http://www.geneontology.org/ • Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase • Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
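The "hierarchical, controlled vocabulary" idea can be sketched as a small child-to-parents map in Python. The terms below reuse examples from the slide, but the links between them (for instance the intermediate term "helicase") are illustrative assumptions, not actual GO records, and `ancestors` is a hypothetical helper name.

```python
# Toy child -> parents map. Term names and links are illustrative only,
# not real Gene Ontology entries or identifiers.
PARENTS = {
    "molecular function": [],
    "transcription factor": ["molecular function"],
    "helicase": ["molecular function"],
    "DNA helicase": ["helicase"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


if __name__ == "__main__":
    # A gene product annotated as "DNA helicase" is implicitly also a
    # "helicase" and has a "molecular function".
    print(sorted(ancestors("DNA helicase")))
```

Because only terms in the map are accepted, a misspelled or free-text term raises a `KeyError`; in this toy version that is what "controlled" means.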
Integration of databases - Webs of websites. Links, links... SRS = Sequence Retrieval System, http://srs.ebi.ac.uk/ • Powerful, complex query language. BioDAS – Distributed Annotation System
For ‘my gene’, how do I: Get an overview of the known sequence information? (GeneCards) Examine the ‘Genome Neighbourhood’? (Genome Browsers) Predict protein post-translational modifications (PTMs)? (Prediction servers) • (Evaluate the value of predicted features)
GeneCards: http://nciarray.nci.nih.gov/cards/
GeneCards-II
GeneCards-III
GeneCards-IV
GeneCards-V
Genetic/Medical Information: OMIM, Online Mendelian Inheritance in Man (NCBI) • The OMIM database is a catalog of human genes and genetic disorders • >13,000 entries (April 2002) • Examples: cystic fibrosis, prions, amyloid precursor protein • Condensed, highly curated descriptions of genetics/disease/animal models/references
OMIM-I (http://www3.ncbi.nlm.nih.gov/Omim/)
OMIM-II
OMIM-III
For ‘my gene’, how do I: Get an overview of the known sequence information? (GeneCards) Examine the ‘Genome Neighbourhood’? (Genome Browsers) Predict protein post-translational modifications (PTMs)? (Prediction servers) • (Evaluate the value of predicted features)
Genome Browsing. Three public: • Open access • Use same genome build/assembly • NCBI (U.S.) • UCSC (Santa Cruz, U.S.) • Ensembl (EBI, EU). One private: • Restricted, commercial • Academic, free usage: 1 Mbase/week • Proprietary assembly • Celera Genomics (U.S.)
Celera Human/Mouse Genomes
Genome Browsers - Portals to the Genomic World. NCBI – National Center for Biotechnology Information (U.S.) • http://www.ncbi.nlm.nih.gov/Genomes/index.html UCSC – Univ. California – Santa Cruz (U.S.) • http://genome.ucsc.edu/ Ensembl – European Molecular Biology Laboratory (E.U.) • http://www.ensembl.org/
NCBI
UCSC – Genome Browser
UCSC – Genome Browser II
Ensembl – Genome Browser
For ‘my gene’, how do I: Get an overview of the known sequence information? (GeneCards) Examine the ‘Genome Neighbourhood’? (Genome Browsers) Predict protein post-translational modifications (PTMs) or Gene Structure? (Prediction servers) • ...and evaluate the reliability of prediction methods
CBS Services/Toolbox: http://www.cbs.dtu.dk/services/
NetPhos – a prediction server: http://www.cbs.dtu.dk/services/NetPhos/
NetPhos – a prediction server
Evaluating Prediction Servers: Performance on independent/cross-validated data presented? Published in peer-reviewed journal? Cited by others? • Science Citation Index. Linked to from credible web sites? • Google PageRank • “link:URL” search
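When a server does report performance on independent data, the numbers are usually derived from a confusion matrix. The sketch below shows one common way to compute sensitivity, specificity and the Matthews correlation coefficient in Python. The labels are made-up example data, the function names are hypothetical, and nothing here reflects how any particular server was actually evaluated.

```python
import math


def confusion_counts(truth, predicted):
    """Count true/false positives and negatives for two 0/1 label lists."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of real sites that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-sites that were correctly left unpredicted."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical held-out residues: 1 = modified site, 0 = not modified.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]
    tp, tn, fp, fn = confusion_counts(truth, predicted)
    print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, tn, fp, fn))
```

The key question from the slide still applies: these figures only mean something if the evaluation data were not used to train the method.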
Evaluating Prediction Servers
2can Bioinformatics Education at EBI – European Bioinformatics Institute: http://www.ebi.ac.uk/2can/index.html Tutorials, resource links, etc.
Starting Points. General Bioinformatics: • NCBI, National Center for Biotechnology Information, U.S. • EBI, European Bioinformatics Institute. Prediction Tools: • CBS, DK • ExPASy (Protein analysis), Switzerland
Dynamic Resources. Pros: • Includes most recent developments • Updated regularly • User interface improves (usually). Cons: • Difficult to keep pace • Tutorials and lectures hard to recycle ;-( • Difficult to use at irregular intervals
Genome Browsers - Portals to the Genomic World. Three main entry points: • NCBI, UCSC, Ensembl • Essentially contain the same information • High degree of linking to secondary databases • Advisable to become familiar with only one genome browser • Learn to navigate and make queries. GeneCards and OMIM: • well suited for getting a quick overview of a gene of interest
Prediction Servers. Evaluate scientific ‘soundness’ • Look for indications of quality (citations, etc.). Remember that prediction servers provide... well, predictions!
The End