775b4d395438b71ab42e2bf911648ad1.ppt
- Number of slides: 37
myGrid. Professor Carole Goble, University of Manchester, UK. Open Middleware Infrastructure Institute-UK
Middleware for Bioinformaticians: From Cottage Industry to Industrialisation. Enable life scientists to rapidly assemble services into automated, reusable, data-intensive in silico experiments. Record and share steps, methods, results, provenance. Best practice. Improve practice. Accelerate discovery. Avoid reinvention.
Middleware for Bioinformaticians. Come as you are: Community services. Community applications. 1100+ datasets (NAR 2009). As they are: Heterogeneous. In flux. Fit in with: Open services. Open apps. Local environments.
Middleware Stakeholders: Bioinformaticians, Tool Developers, Service Providers. Integrate and reuse in-house and third-party applications and datasets. Deploy into specialist, customised apps for Biologists. (Diagram labels: Bioinformaticians, Tool Developers, Service Providers)
myGrid Phases 1 Pilot. Practice + Best Registries Innovate Prototype Taverna OGSA-DQP Soaplab Production Adoption Sustain Support 2 OMII-UK Platform + Projects 3 Basket of Projects UTOPIA e-Laboratories myExperiment BioCatalogue Taverna OGSA-DQP Soaplab UTOPIA SysMO e-Stat MethodBox … myExperiment BioCatalogue Taverna OGSA-DQP Soaplab
Pilot: Scientific Case Studies. Trypanosomiasis in Cattle (Manchester, Liverpool, Nairobi). Graves’ Disease (Newcastle). Williams-Beuren Syndrome (Manchester + St Mary’s Hospital)
Enable change in experimental practice • Systematic and comprehensive automation with accurate provenance. • Elimination of user bias and premature filtering of datasets in Trypanosomiasis analysis. • Elapsed time to perform the WBS pipeline: from 2 weeks to 2 hours. • Workflow exchange • Dry scientists hypothesise; bench scientists validate. • Science factories. Kell and Oliver, BioEssays 26. 1 8
Pilot: Science • Identification of a mutation in the I kappa B-epsilon gene associated with the autoimmune disorder Graves’ Disease • Identification and classification of proteins secreted by the anthrax bacterium Bacillus anthracis • First complete and accurate map of the region of chromosome 7 involved in Williams-Beuren Syndrome • Automatic reconstruction of genome-scale yeast metabolic pathways, by creating and manipulating SBML models • Automatic target selection for protein structure and function studies in the e-Family project • Identification of a pathway whose correlated gene (Daxx) is believed to play a role in trypanosomiasis resistance.
Pilot: Supporting the Life Cycle
Pilot: Technologies and Products Scientific Workflows Distributed Query Processing Monitor Provenance Assemble Catalogues Registries Repositories Services Ontologies Notebook Sharing Semantic Web Compare Virtual Research Environments Social Computing
Pilot: Technologies and Products Scientific Workflows Distributed Query Processing DQP Info. Model Provenance Model; Life Science IDs; Semantic Web Catalogues Registries Repositories Services Feta Ontologies myGrid Ontology Virtual Research Environments Social Computing
Post-Pilot: Technologies and Products DQP Scientific Workflows Distributed Query Processing Open Provenance Model; Semantic Web Catalogues Registries Repositories Services Ontologies myGrid Ontology Virtual Research Environments Social Computing
EMBOSS Service Wrapper Taverna Workflow Workbench Workflow Execution Engine Workflow Engine Scufl Workflow Server BioCatalogue Services Resource myExperiment Workflow Resource
Outcomes • A workflow system adopted by > 350 organisations for fielded use in bioinformatics, ported over multiple e-Infrastructure platforms. – Adopted by local scientists and international research centres. – Adopted beyond bioinformatics. – Active community of users, developers and collaborators. • Premium community resources – BioCatalogue for web services. – myExperiment for scientific workflows. • Over 195 publications in computing and bioinformatics. – Workflow, semantic metadata for scientific provenance and services, publishing and social computing. – Spawned projects in provenance, data management, services, workflow analytics and e-Laboratories. • Continued follow-on funding. Commercial support plan.
A multi-institutional, multidisciplinary, multi-project group with numerous collaborators In 2009: XX developers; XX researchers; XX projects.
Pilot Outcomes Notification Framework Microbase Text Mining Termino CLEF Information Model GOLD OGSA-DQP DAIT OMII-UK OGSA-DAI BIRN and others Provenance PASOA, Provenance Open Provenance Model Southampton Spin-Off Co Workflow Repository myExperiment Nema, SpaceBook, NeuroHub, SysMO-DB, MethodBox, e-Stat … Service Registry GRIMOIRES BioCatalogue Taverna Workflow OMII-UK, Platform (blue funded by e-Science Programme) Community Resource NGS, CARMEN Community resource Widespread adoption CARMEN, ISPIDER, myTea, myIB, MIASGrid …
Platform Independent. Open Source. Scientific Workflow Management System • Workbench, Server, Platform, Plugins, Portal Plugin • 69,500+ downloads (total) – Taverna 1.7: 11,100+ – Taverna 2 beta: 3,500+ • 350+ organisations – 23 commercial – 140 active during July 09 audit • UK, China, Europe, USA, Canada, SE Asia and South America (35 countries) • Developer community – 10 third-party plug-ins – 3000+ publicly accessible Web Service operations
UK Institutes UK and European Systems Biology International Institutes International Networks Universities Projects Lots of Universities
What’s Taverna used for?
Beyond Life Science Social Science Obesity e-Lab, FLOSS (USA) UK National e-Infrastructure for Social Science Meteorology Astronomy AstroGrid, HELIO (EU), Engineering European Southern Observatory Finland RENCI (USA) The Jet Propulsion Laboratory, NASA (USA) Chemistry ChemoBioGrid (USA), CDK-Taverna (Germany), Chemical Informatics and Cyberinfrastructure Collaboratory (USA) Medical image processing MIASGrid, cancerGrid, EGEE III CNRS (France) e-Learning Document digitisation Music British Library, National Library of the Netherlands Computer Science GoalNet (Singapore); Phylogrid (Spain) Shanghai Jiaotong University (China) University of Bath
e-Infrastructure Adoption – ARC, gLite, Globus, caGrid, Sun Grid Engine, GRIA, Amazon Cloud – Dutch Grid, Nordic Grid • Community services and platforms Lymphoma Prediction • Web Services • Grid middleware – R, BioMart, BioMoby • Local processes – Java API, Beanshell scripts Wei Tan Univ. Chicago
www.myexperiment.org Socially share, discover and reuse workflows and other methods. 2781 members, 208 groups, 865 workflows, 278 files and 91 packs. The Scientific Workflow community’s Public Repository. Other disciplines: astronomy, statistics, numerical methods, chemistry plans. For social networking: Concept Web Alliance.
Publish. Share. Monitor. Protect. 2nd Generation myGrid registry • Result of… Crowd Curated Community resource 1154 services, 200 providers and 151 members. The Life Science community’s Public Resource
Follow-on Projects £XXXXXXX (e-Science Programme, current) Scientific Driven Infrastructure – MIASGrid, ISPIDER, Obesity e-Lab, Polymnia, PrestoSpace, SysMO-DB, BioCatalogue, ONDEX, Shared Genomics, EMBRACE, e-Lico, UTOPIA, NeiSS Infrastructure Development – OMII-UK Node, OGSA-DAIT, GRIMOIRES, caBIG-Taverna, Lilly Science Grid, ENGAGE Sustainability – ESouth East, ESNW Networking, Collaboration and Training – Linkup sisters network, SOCA, e-Health+ Cluster Applied Research – myIB, myTea, myGrid Platform, myExperiment Repository, Platform renewal, LifeGuide, OntoGrid, RSSGrid Fundamental CS Research – PASOA, DynamO, REIO, Qurator, Dataspaces
?? Using MyGrid ?? Facilitating Science with e-Science • Systems biology – Dynamics and Function of NF-kappaB Signalling System – Target Practice: informatic and metabolomic assessment of biological network changes and of drug-cell interactions – Manchester Centre for Integrative Systems Biology – e-Fungi – Microbase • Micro-array analysis – Functional genomics of host tolerance and host-pathogen interactions in vector-borne parasitic diseases of cattle – Genome Microarray support – Microarray studies of T cells in chronic parasitic infection. – Provision of support for microarray bioinformatics to the Environmental Genomics Initiative • Pharma – Virtual Drug Production Environment • Medical – Autoimmune thyroid disease on chromosome ? ?
Research Highlights. 195 publications to date. Journals: 54, Conf: 125, Chap: 19. Taverna book. Computing: Workflows, Semantic Grid, Social Computing, Distributed Computing, Provenance, Logic. Conc and Comp: Prac and Exp, 18(10), ISI: 51, GS: 204. Bioinformatics, 20(17), ISI: 220, GS: 420. Nucleic Acids Research, Briefings in Bioinformatics, JBI. 55 Keynotes
Community Impact ?? & Technology Transfer ?? • GGF / OGF – WG: OGSA-DAIS – RG: SEM-GRID, Workflows • I3C/OMG – Life Science Identifiers early adopter • W3C – Semantic Web, Provenance Incubator • IPAW: Provenance Challenge • SAGE Bionetworks, Concept Web Alliance, Elixir
Knowledge Transfer • Between e-Science projects – Best Practice awards – New pilots • Visits and PALs – Link-up sisters, OMII-UK, Platform and other grants • Networks and joint projects • Commercial support – Eagle Genomics • Relentless promotion – Meetings, demos, exhibitions – Taverna Handbook – ??? Massive take-up ???
Community Collaborations. Joint work. Visits. Exchanges. Plug-ins. Workflow and Service content. Applications. Research. Community Infrastructure Initiatives – SAGE Bionetworks, Concept Web Alliance, BioSharing, W3C HCLS … International Bioinformatics Groups – Dutch National Bioinformatics Centre … Projects – KnowARC, IMPACT, caBIG … University Groups and Institutes – XX USA – XX Europe – XX Canada – XX SE Asia and Australia. Commercial – Microsoft Corp, Eli Lilly, Eagle Genomics, Syngenta, IBM Almaden, Emergent Technologies, SeekDa. PALS focus groups.
Community Capacity Building • >40 research and software engineering staff. • 17 postgrad students. • 10 long-stay visitors. • 56 tutorials to >820 people. • >20 universities, national and international Life Science institutes, and networks. • Tutorials – Major Bio conferences – Summer schools in Biology and Middleware. • Developer and User Days • Annotation Jamborees
The Future. Platforms for e-Science • Production Open Source Enterprise Platform • e-Laboratories • Packages – ChemTaverna • New Applications. Sustain • Commercial Support – Eagle Genomics Ltd • Not for Profit Alliance – Emergence Technologies. Research in CS • Semantics, social computing, workflows. Outreach • Training, networks, initiatives • New awards like SAGE Bionetworks, CWA
Summary • We built a platform that has been adopted – The most widely adopted open workflow system in the community – Adopted workflow and service resources • e-Science delivery to Science in the field; new e-Science research and CS research • Future funding and future plans. – For commercial support. – For wider adoption within and beyond the life sciences. • Strategy: – Research Council funding for pilots, application projects and best practice transfer; – Core programme OMII-UK for production quality. • Production step crucial for real and trusted adoption beyond the project.
Acknowledgements • David De Roure, Norman Paton, Robert Stevens, Andy Brass, Anil Wipat • Matt Lee, Don Cruickshank, Jiten Bhagat, David Newman, Mark Borkum, Danius Michaelides, Ed Zaluska, Jeremy Frey, Simon Coles, Rob Procter, Alex Voss, Sergejs Aleksejevs • Marco Roos, Duncan Hull, Paul Fisher, Hannah Tipney, Jo Pennock, Peter Li • Simon Pearce, Claire Jennings, May Tassabehji, Steve Kemp • Katy Wolstencroft, Franck Tanoh • David Withers, Alan Williams, Stuart Owen, Stian Soiland-Reyes, Alex Nenadic, Bharathi Kattamuri • Kaixuan Wang, Antoon Goderis, Jun Zhao, Martin Szomszor, Tracy Craddock, Alastair Hampshire, Qiuwei Yu • Khalid Belhajjame, Paolo Missier, Oscar Corcho • Martin Senger, Rodrigo Lopez, Tom Oinn, Thomas Laurent, Hamish McWilliam, Eric Nzuobontane (EBI) • Alvaro Fernandes, Robert Gaizauskas, Chris Greenhalgh, Peter Rice, Paul Watson • June Finch, Pinar Alper, Phil Lord, Chris Wroe, Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Justin Ferris, Kevin Glover, Mark Greenwood, Yikun Guo, Ananth Krishna, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Juri Papay, Matthew Pocock, Milena Radenkovic, Stefan Rennick Egglestone, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, and Chris Wroe • Mark Wilkinson (BioMOBY) • Savas Parastatidis (Microsoft) • Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM) • Robin McEntire (GSK) • Keith Decker, Ravi Madduri • And the rest of myGrid!
More Information • myGrid – http://www.mygrid.org.uk • Taverna – http://www.taverna.org.uk • myExperiment – http://www.myexperiment.org • BioCatalogue – http://www.biocatalogue.org • caGrid-Taverna pilot – http://www.cagrid.org/wiki/CaGrid:HowTo:Create_CaGrid_Workflow_Using_Taverna – http://dev.globus.org/wiki/Using_gRAVI_Services_in_Taverna • UTOPIA – http://utopia.cs.man.ac.uk/utopia/