eb54863246aae3174aad9d1a8f6e798c.ppt
- Number of slides: 49
Short Introduction to EMBL-EBI • Vicky Schneider, EMBL-EBI Training Programme Project Leader • vicky@ebi.ac.uk
What is EMBL-EBI? • Based on the Wellcome Trust Genome Campus near Cambridge, UK • Part of the European Molecular Biology Laboratory • Non-profit organisation
The five branches of EMBL • Heidelberg: basic research in molecular biology, administration, EMBO • Grenoble: structural biology • Hamburg: structural biology • Hinxton: bioinformatics • Monterotondo: mouse biology • 1500 staff, >60 nationalities
EMBL member states: Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom • Associate member state: Australia
How is EMBL-EBI funded? • In 2010 it cost €41 million to run EMBL-EBI. • EMBL member states (€22.4 M) • EU (€7.4 M) • Charity (€4.1 M) • US Govt (€2.9 M) • UK Research Councils (€2.5 M)
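As a rough check on the funding figures above, the following sketch computes each source's share of the 2010 running cost. It uses only the amounts on the slide. The variable and function names are illustrative. The sources listed do not add up to the full €41 million, and the slide does not say where the remainder comes from.

```python
# Funding figures for 2010 as listed on the slide, in millions of euros.
funding_meur = {
    "EMBL member states": 22.4,
    "EU": 7.4,
    "Charity": 4.1,
    "US Govt": 2.9,
    "UK Research Councils": 2.5,
}
TOTAL_COST_MEUR = 41.0  # total running cost quoted on the slide


def funding_shares(sources, total):
    """Return each source's share of the total running cost, in percent."""
    return {name: round(100 * amount / total, 1) for name, amount in sources.items()}


shares = funding_shares(funding_meur, TOTAL_COST_MEUR)
listed_total = round(sum(funding_meur.values()), 1)
# Part of the total cost that is not attributed to any listed source.
unlisted = round(TOTAL_COST_MEUR - listed_total, 1)

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Listed sources: {listed_total} M, not itemised: {unlisted} M")
```

On these numbers, the EMBL member states cover a little over half of the running cost.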
What Is Bioinformatics?
What is bioinformatics? • Storing, retrieving, analysing • Interdisciplinary • Heart of modern biology
Biology is changing • Data explosion • New types of data • High-throughput biology • Emphasis on systems, not reductionism • Growth of applied biology: molecular medicine, agriculture, food, environmental sciences… • [Figure: Growth of raw storage at EMBL-EBI (in terabytes)]
The molecules of life
Nature’s ingredients: small molecules provide building blocks, messengers and helpers
• Amino acids: the building blocks of proteins
• Nucleotides and sugars: the building blocks of DNA and RNA
• Co-enzymes: pigments such as chlorophyll and haem help important processes such as photosynthesis and respiration
• Hormones: small molecules such as adrenalin and testosterone send important messages from cell to cell
The ‘book of life’: DNA contains the information needed to build an organism
The interpreter: RNA translates the DNA code into protein
Molecular machines: proteins carry out the functions of life
• Catalysts: enzymes enable reactions to occur at body temperature
• Structural support: keratin and collagen give structure to our tissues
• Transport: carrier proteins move molecules into and out of cells
• Defence: antibodies protect us from disease-causing organisms
• Movement: myosin in muscles enables them to contract
Bioinformatics underpins life-science research • 1 Genomes contain genes • 2 Genes are transcribed • 3 Transcripts translate to protein sequences • 4 Proteins form three-dimensional structures • 5 Proteins interact with each other and with small molecules to form pathways • 6 Pathways combine to build systems
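Steps 2 and 3 above (genes are transcribed, transcripts translate to protein sequences) can be sketched in a few lines of Python. This is a minimal illustration, not an EMBL-EBI tool. The codon table is a deliberately small subset of the standard genetic code, the input is assumed to be the coding strand, and the function names are hypothetical.

```python
# Small subset of the standard genetic code (RNA codons -> one-letter amino acids).
# "*" marks a stop codon. Codons missing from this subset translate to "X".
CODON_TABLE = {
    "AUG": "M", "GCC": "A", "UGG": "W", "AAA": "K", "GGC": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
protein = translate(mrna)
print(mrna, protein)
```

A real analysis would use the full 64-codon table and handle both strands and all reading frames.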
From molecules to medicine • Molecular components: genomes, nucleotides, transcripts, proteins, domains, structures, small molecules • Integration: complexes, pathways, cells, tissues and organs, human individuals, human populations • Translation: biobanks, therapies, disease prevention, early diagnosis
Examples of the importance of biological information to all of us
Genome-wide analysis of crop plants • Population growth and climate change are major challenges to food security. • Traditional routes to crop improvement are too slow to keep up with this increase in demand. • Understanding plant genomes helps us identify which species will be most tolerant to drought, salt and pests while still providing optimum nutrition.
Matching the treatment to the cancer • One in ten women in the EU-27 will develop breast cancer before the age of 80. • If we can identify patterns of genes that are active in different tumours, we can diagnose and treat cancers earlier.
Tracking the source of infectious disease • Methicillin-resistant Staphylococcus aureus (MRSA) infection is a global problem. • Transmission of individual clones can be tracked using small variations in DNA sequence. • This technology can be used to identify the source of new outbreaks across continents and within wards.
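The idea of tracking clones through small variations in DNA sequence can be illustrated by counting the positions at which aligned sequences differ. The sketch below is a toy example under strong assumptions. The isolate names and sequences are invented, the sequences are assumed to be already aligned and of equal length, and real outbreak studies use whole genomes and phylogenetic methods rather than a simple pairwise count.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical isolates with invented sequences, for illustration only.
isolates = {
    "ward_A_1": "ACGTACGTAC",
    "ward_A_2": "ACGTACGTAT",
    "abroad_1": "ACCTACGAAT",
}

# Pairwise distances between all isolates.
distances = {
    (x, y): snp_distance(isolates[x], isolates[y])
    for x, y in combinations(isolates, 2)
}
# The pair with the fewest differences is the most likely recent transmission link.
closest_pair = min(distances, key=distances.get)
print(distances, closest_pair)
```

Isolates separated by fewer differences are more likely to share a recent common source, which is the intuition behind tracing transmission within a ward or between continents.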
Barcoding life • DNA barcodes are short sections of DNA that we use to identify an organism. • The Barcode of Life Initiative is developing DNA barcoding as a global standard for identifying species. • Applications include: • Protection of endangered species • Sustaining natural resources through pest control • Food labelling
Repurposing drugs for neglected diseases • Schistosomiasis is a parasitic infection that affects 210 million people in 76 countries. • Resistance is developing to the one available drug. • We look at the Schistosome genome to identify the targets of existing drugs. • Candidates can be tested for anti-schistosomal activity or used as leads for further optimisation.
Lots of data and new types of data • Literature • Genomes • Nucleotide sequence • Protein sequence • Proteomes • Protein structure • Gene expression • Protein families, domains and motifs • Chemical entities • Protein-protein interactions • Pathways • Systems
EMBL-EBI’s mission statement • To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress • To contribute to the advancement of biology through basic investigator-driven research in bioinformatics • To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators • To help disseminate cutting-edge technologies to industry • To coordinate biological data provision across Europe
Services www.ebi.ac.uk/services
Principles of service provision • Accessibility • Compatibility • Portability • Comprehensive • Quality • (Image: Patrick Hoesly)
Databases: molecules to systems
• Genomes: Ensembl Genomes, EGA
• Nucleotide sequence: ENA
• Functional genomics: ArrayExpress, Expression Atlas
• Literature and ontologies: CiteXplore, GO
• Protein families, motifs and domains: InterPro
• Macromolecular structures: PDBe
• Protein activity: IntAct, PRIDE
• Pathways: Reactome
• Protein sequences: UniProt
• Chemical entities: ChEBI
• Chemogenomics: ChEMBL
• Systems: BioModels, BioSamples
Database collaborations
Standards development – international collaborations
• Genomics Standards Consortium (GSC): http://gensc.org
• Genome annotation: www.geneontology.org
• Protein sequence: www.uniprot.org
• Nucleotide sequence: www.insdc.org
• Functional Genomics Data Society: www.fged.org
• Cheminformatics: www.ebi.ac.uk/chebi
• HUPO Proteomics Standards Initiative (PSI): www.psidev.info/
• Pathways: www.reactome.org, www.biopax.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Protein structure: www.wwpdb.org
• Systems modelling standards: www.sbml.org
CATH, BLAST, Ensembl, PDBsum, MACiE, VAST, ENA, PubChem, UCSC Genome Browser, CiteXplore, IntEnz, STRING, IntAct, SCOP, GO, PRINTS, InterProScan, Atlas, Genomes, GEO, FlyBase, DDBJ, UniProt, ChEBI, RefSeq, Gene3D, PRIDE, PDB, Reactome, GenBank, ProFunc, Pfam, PubMed, Macromolecular Structures, Small Molecules, Gene Expression, Molecular Interactions, Reactions & Pathways, Protein Families (Diagnostic), Literature, Ontologies, Proteomics, Sequence Similarity & Analysis, BioModels, Gramene, Protein Sequences, Enzymes, ArrayExpress, FASTA, Nucleotide Sequences, Pattern & Motif Search (Diagnostic), GOA, Structure Analysis
UCSC Genome Browser, FlyBase, Gramene, DDBJ, RefSeq, Ensembl, RefSeq, GenBank, Gramene, SCOP, UniProt, PDB, CATH, PDBsum, ChEBI, Protein Sequences, Macromolecular Structures, PubChem, Small Molecules, Atlas, GEO, ArrayExpress, Gene Expression, STRING, Molecular Interactions, BioModels, Reactions & Pathways, IntAct, Reactome, InterPro, Nucleotide Sequences, ENA, RefSeq, Genomes, PRINTS, Pfam, IntEnz, MACiE, PubMed, CiteXplore, Protein Families (Diagnostic), Enzymes, Literature, GO, FASTA, Gene3D, SCOP, PRINTS, ChEBI, Ontologies, PRIDE, GOA, Proteomics, BLAST, InterProScan, CATH, ProFunc, VAST, Sequence Similarity & Analysis, Pattern & Motif Search (Diagnostic), Structure Analysis
New search service • Access from the EBI’s homepage • Species selector allows for easy comparison • Data organised according to: gene, expression, protein, structure, literature • Explore data, return easily to your results
Goals of the new EBI Search • Relevant to ‘wet-lab’ biologists • Organises information based around a single gene (or a small number of genes) • User-expectation centric (not database centric) • Smooth transition to the detailed information in many of EBI’s core databases • NOT for bioinformaticians: does not provide programmatic access
Quick databases tour
Genomes 1: Ensembl • Pick a genome • Chromosomes • Genomic alignments • Synteny • Variations • Variation Effect Predictor • Gene trees • Gene families • User upload
Genomes 2: Ensembl Genomes • Genome portals for the five kingdoms of life • Interface uses Ensembl technology • Variation data for plant, metazoan and fungal species • Multi-way comparison of whole bacterial chromosomes • Pan-taxonomic comparative analysis
Nucleotides: European Nucleotide Archive (ENA) • The ENA has a three-tiered data architecture. It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms). • Figure adapted from: Cochrane, G. et al. Public Data Resources as the Foundation for a Worldwide Metagenomics Data Infrastructure. In: Metagenomics: Theory, Methods and Applications (Chapter 5), Caister Academic Press, Universidad Nacional de Cordoba, Argentina. Ed. D. Marco (2010).
Transcriptomes: ArrayExpress • ArrayExpress Archive: browse experiments • Search by keyword • Expand results • Spreadsheets describing the sample properties
Transcriptomes: Gene Expression Atlas • Browse changes in gene expression • Search by gene or biological condition • Gene page • Experiment page
Some data sources for annotation: input sources for UniProtKB
• PRIDE: protein identification data
• InterPro: protein families and domains
• HAMAP: microbial protein families
• IntEnz: enzymes
• IntAct: molecular interactions
• GO: functional info
• RESID: post-translational modifications
• Manual curation: literature-based annotation, sequence analysis
• Automated annotation: InterPro classification, signal prediction, transmembrane prediction, protein classification, other predictions
Protein families, motifs and domains: InterPro • Powerful tool for protein classification, integrating several methods into one resource • Compare methods of protein signature prediction • Visualise the taxonomic range for a protein signature • View architectures of proteins containing a signature
Proteomics services • PRIDE: protein identifications from proteomics experiments • IntAct: molecular interactions • ChEBI: small molecules • IntEnz: enzyme classification
Structures: PDBe
Chemogenomics: ChEMBL database • Browse targets • Target search • Compound search • Search results • Kinase SARfari • GPCR SARfari • Neglected Tropical Disease (NTD) archive
Pathways: Reactome • Compare events in different species • View expression values overlaid on a pathway • Link to source databases • Interaction overlay on a pathway diagram • Export pathway to your favourite modelling software
Data management • Over 4 M web requests per day, or over 4.6 M if Ensembl is included • Over 280,000 unique hosts served per month, excluding Ensembl • Total disk space: 10 petabytes in 2010 • Leased two new data centres (with €11.4 M from UK Research Councils) • Over 800 million cross-references in the databases we serve
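To put the daily request figures above in perspective, the following sketch converts them to an average rate per second. The slide gives lower bounds ("over 4 M"), so the results are lower bounds too. They are averages over a full day and say nothing about peak load. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(requests_per_day):
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


# Lower bounds quoted on the slide.
without_ensembl = round(average_requests_per_second(4_000_000), 1)
with_ensembl = round(average_requests_per_second(4_600_000), 1)
print(without_ensembl, with_ensembl)
```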
User support • E-mail support: www.ebi.ac.uk/support • Online help pages: www.ebi.ac.uk/help • 2Can bioinformatics user support: www.ebi.ac.uk/2Can • eLearning Portal: coming soon (elearning@ebi.ac.uk)
Research www.ebi.ac.uk/groups
Key facts about research • The EBI provides a unique environment for bioinformatics research • Eight dedicated research groups aim to understand biology through new approaches to interpreting biological data • Services teams also carry out R&D to enhance existing services and develop new ones • Research programme complements services and the two are mutually supportive
Curiosity-driven research
• Genomes: Ewan Birney, Paul Flicek, Nick Goldman
• Transcriptomes: Alvis Brazma, Anton Enright, John Marioni
• Proteins: Janet Thornton, Rolf Apweiler, Gerard Kleywegt
• Pathways and systems: Nicolas Le Novère, Nick Luscombe, Paul Bertone
• Text mining: Dietrich Rebholz-Schuhmann
• Chemistry: Christoph Steinbeck
• Also shown: John Overington, Julio Saez-Rodriguez
• Legend (group leader backgrounds): biology/medicine, chemistry/chem engineering, maths, physics
Training www.ebi.ac.uk/training
Hands-on training for all levels of experience • Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge • Learn from the EBI’s experts through a combination of talks and practical exercises • Take a tour of all our core data resources, or focus in on specific data types • Full programme at www.ebi.ac.uk/training/handson
Predoc and postdoc training • Open Days for bioinformatics early-stage researchers: www.ebi.ac.uk/training/openday • PhD studentships through EMBL International PhD Programme: www.ebi.ac.uk/training/Studentships • EIPOD interdisciplinary postdoc fellowship programme: www.embl.de/training/postdocs/eipod • EBI–Sanger postdoc programme: www.ebi.ac.uk/training/postdoc/ESPOD