- Number of slides: 148
Middleware for in silico Biology
Professor Carole Goble, University of Manchester
http://www.mygrid.org.uk
GGF Summer School, 24th July 2004, Italy
Vision: The Grid
"Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we [define] the "Grid problem"... as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations"
From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
Knowledge workers, fluid communities
• Capturing, generating, gathering, integrating, sharing, processing, analysing, weeding, cleaning, correlating, archiving, retiring knowledge
• Much of it not theirs & not of their creation
• Much of it destined for others
Know-how as important as know-what
Know-why, when, where, who as important
Roadmap
• Part 1 – Application context
• Part 2 – Architecture
– Information and Workflows
– Semantics and provenance
• Part 3 – Wrap up
myGrid is an EPSRC-funded UK e-Science Program Pilot Project.
Particular thanks to the other members of the Taverna project, http://taverna.sf.net
Application Testbeds
Graves' Disease
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
• Autoimmune disease of the thyroid
• Discover all you can about a gene: Affymetrix microarray analysis, gene annotation
• Services from Japan, Hong Kong, various sites in UK
Williams-Beuren Syndrome
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
• Microdeletion of 1.55 Mbases on Chromosome 7
• Characterise an unknown gene: gene alerting service, gene and protein annotation
• Services from USA, Japan, various sites in UK
Trypanosomiasis in cattle
• Steve Kemp, University of Liverpool, UK
• Annotation pipelines and gene expression analysis
• Services from USA, Japan, various sites in UK
Point, click, cut, paste
Slide courtesy of GSK
ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
Life Sciences: knowledge generation
• Informational Science
• Large Scale
• Distributed
• No one organisation owns it all
• Integrating across scales, models, types, communities
• Small groups drawing on pooled resources
Data deluge, processing bottleneck
[Figure: computational load plotted against time (1990, 2000, 2010) and compared with Moore's Law. Data sources labelled: ESTs, Genome Data, Human Genome, Combinatorial Chemistry, Pharmacogenomics, Metabolic Pathways.]
Union of lots of small experiments
[Figure: the range of life science data, with volume labels of hundred thousands, millions and billions. Items shown:
• DNA sequences and alignments, e.g. ...atcgaattccaggcgtcacattctcaattcca...
• Proteins: sequence, 2º structure, 3º structure, 4º structure, e.g. MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT...
• Protein-protein interactions, metabolism, pathways, receptor-ligand
• ESTs, expression patterns, large-scale screens
• Genetics and maps: linkage, cytogenetic, clone-based
• Polymorphism and variants: genetic variants, individual patients, epidemiology
• Physiology, cellular biology, biochemistry, neurobiology, endocrinology, etc.]
What data do I get?
• Descriptive as well as numeric
• Literature
• Analogy/knowledge-based
The bottleneck is not computation. It's integration.
WBS Workflows
Legend:
• Pink: outputs/inputs of a service
• Purple: tailor-made services
• Green: EMBOSS Soaplab services
• Yellow: Manchester Soaplab services
• Grey: unknowns
[Figure: workflow diagram starting from a query nucleotide sequence.
Services shown: RepeatMasker, ncbiBlastWrapper, GenScan, sixpack, transeq, Seqret, prettyseq, epestfind, pscan, pepstats, pepcoil, SignalP, TargetP, PSORTII, restrict, cpgreport, InterPro, PFAM, Prosite, Smart. Pepwindow? and Octanol? are marked as open questions.
Annotated inputs and outputs: GenBank accession number; URL including GB identifier; GenBank entry; translation/sequence file (good for records and publications); amino acid translation; 6 ORFs; nucleotide sequence (FASTA); coding sequence; PEST sequences; FingerPRINTS; MW, length, charge, pI, etc.; coiled-coil regions; cellular location; functional and structural domains/motifs; hydrophobic regions; restriction enzyme map; CpG island locations and %; repetitive elements.
Searches: tblastn vs nr, est_mouse, est_human databases; blastp vs nr; blastn vs nr, est databases.
Other labels: Interoperability; sort for appropriate sequences only.]
The problem
Two major steps:
• Extend into the gap: similarity searches; RepeatMasker, BLAST
• Characterise the new sequence: NIX, InterPro, etc.
• Numerous web-based services (e.g. BLAST, RepeatMasker)
• Cutting and pasting between screens
• Large number of steps
• Frequently repeated – info now rapidly added to public databases
• Don't always get results
• Time consuming
• Huge amount of interrelated data is produced – handled in lab book and files saved to local hard drive
• Mundane
• Much knowledge remains undocumented
• Bioinformatician does the analysis
Classical Approach to the Bioinformatics Data Analysis - Microarray Study
[Figure: steps shown in the diagram:
• Import microarray data to Affymetrix Data Mining Tool, run analyses and select gene
• Annotations for many different genes
• Visually examine SNPs lying within
• Find restriction sites and design primers by eye for genotyping experiments
• Experiment design to test hypotheses]
The Graves' Disease Scenario
[Figure: stages shown: Microarray data storage + analysis; Annotation pipeline; Gene & SNP characterisation; Experimental design]
Experiment life cycle
• Forming experiments
• Discovering and reusing experiments and resources
• Personalisation
• Executing and monitoring experiments
• Sharing services & experiments
• Managing lifecycle, provenance and results of experiments
The Grid is a technology; the scientist wants a solution.
Service Registration
Portal
Status reporting
Results displayed using Cinema
WBS Life Cycle
• Wrap services as web services
• Register them
• Build a workflow using the services
• Evolve the workflow
• Run it over and over again in case data has changed
• Record results & provenance
• Inspect and compare results & provenance
• Set up event notification to fire the workflow
• Set up a portal to run the workflow
• Publish the workflow template in a registry to share with the world
Delivering results: Williams-Beuren Syndrome
• Cuts down the time taken to perform one pipeline from 2 weeks to 2 hours
• Much more systematic collection and analysis. More regularly undertaken. Less boring. Less prone to mistakes.
• Once notification is installed, won't even have to initiate it.
• Possible lead already found – but I can't tell you.
• Benchmark: first run through of two iterations of workflows
– Reduced gap by 267 693 bp at its centromeric end
– Correctly located all seven known genes in this region
– Identified 33 of the 36 known exons residing in this location
Delivering results
• Easy to get started with Taverna
• Sharing happens – IPR issues and suspicions still abound
• Network effect necessary and happens
• Managed the transition from generic middleware development to practical, day-to-day useful services.
• Architecture is solid.
• SOA – good idea
Virtual organisations
[Figure: roles and shared resources.
Roles: Service & Platform Administrators, Bioinformaticians, Service Providers, Annotation providers, Biologists, Tool & middleware developers.
Shared items: Service, Workflow, Information, Reuse, Registries, mIRs, Resources.]
Collaborative e-Science
• High level services for e-Science experimental management
– Scientific discovery is personal & global.
– Federated third party registries for workflows and services
– Workflow and service discovery for reuse and repurposing
• Sharing knowledge and sharing components
– Provenance
– Event notification
– Personalisation
[Diagram labels: Annotate, Register, Find, Registry]
Roadmap
• Part 1 – Application context
• Part 2 – Architecture
– Information and Workflows
– Semantics and provenance
• Part 3 – Wrap up
Key Characteristics
• Data intensive, upstream analysis
• Pipelines - experiments as workflows (chiefly)
• Ad hoc exploratory investigative workflows for individuals from no particular a priori community
• Openness – the services are not ours.
• Low activation energy, incremental take-on
• Foundations for sharing knowledge and sharing experimental objects
• Multiple stakeholders
• Collection of components for assembly
In a nutshell
• Bioinformatics toolkit
• Open (Web) Services
– myGrid components and external domain services
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
– No control or influence over domain service providers
• Metadata driven
– LSIDs, common information model, ontologies, Semantic Web technologies
• Open extensible architecture
– Assemble your own components
– Designed to work together
– Loosely coupled
[Component labels: Semantic Discovery, Feta, View, UDDI registry, Pedro, Haystack, Provenance Browser, Taverna, WfDE, Gateway & CHEF Portal, Freefluo, Event Notification, WfEE, Info Model, Soaplab, Gowlab, LSID, mIR]
Platform
• Standards based
• (Web) Service Oriented Architecture
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
– Web services communication fabric
– XML document types
– LSIDs for identifying resources
• Implemented in Java using Axis and Tomcat
– WS-I -> OGSA / WSRF
• Metadata driven
– RDF-coded metadata
– OWL-coded ontologies
– Common information model
Stakeholders
Middleware for myGrid users.
[Figure: stakeholder groups labelled: tool developers, IS specialists, biologists, bioinformaticians, systems administrators, tool builders, service providers, infrequent problem-specific bioinformaticians, bioinformatics tool builders, annotators.]
Biologists are indirectly supported by the portals and apps these develop.
Collections of Tasks
[Figure: task groups and roles.
Task groups: Domain Tasks; Building (Workflow, Enactment); Data Management (Storage, Description, Provenance); Finding (Service Discovery, Querying).
Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers.]
Investigation = set of experiments + metadata
• Experimental design components
• Experimental instances that are records of enacted experiments
• Experimental glue that groups and links design and instance components
• Life Science IDs, URIs, RDF
Experimental entities
myGrid Service Stack
[Figure: layered stack. Layers: Applications; Core services; Web Service (Grid Service) communication fabric; External services.
Roles: Bioinformaticians, Tool Providers, Service Providers.
Components shown: Taverna Workbench, Web Portal, Haystack, LSID Launch pad, e-Science Mediator, UDDI Registries, Feta Service & WF Discovery, Ontologies, Ontology Mgt, Views, Metadata Store, FreeFluo Workflow Enactment Engine, Provenance Mgt, LSID Authority, Event Notification Service, Information Repository, OGSA-DQP Distributed Query Processor, SoapLab, GowLab, legacy apps, native Web Services, AMBIT Text Extraction Service.]
Service stack
[Figure: layers from top to bottom.
• Apps: LSID Launch Pad, Taverna workbench, Haystack, Web Portal
• e-Science process patterns: e-Science Mediator
• Core services: service & workflow discovery, metadata management, data management, workflow enactment, e-Science event bus
• Web Service (Grid Service) communication fabric
• External services: SoapLab, GowLab, legacy apps, websites, native Web Services, AMBIT Text Extraction Service]
20,000 feet
[Figure: components shown: Provenance and Data browser (Haystack or Portal); Taverna Workbench; Semantic Discovery & Registration; View Service; LSID Authority; UDDI; mIR data store; mIR metadata store; Store Service; Event Notification Service; Freefluo Workflow Engine; Web services, local tools; user interaction etc.]
e-Science Mediator
1. Application oriented: directly supports the e-Scientist by:
• providing pre-configured e-Science process templates (i.e. system-level workflows)
• helping to capture and maintain context information (via the information model) that is relevant to the interpretation and sharing of the results of e-Science experiments
• facilitating personalisation and collaboration
2. Middleware oriented: contributes to the synergy between myGrid services by:
• acting as a sink for e-Science events initiated by myGrid components
• interpreting the intercepted events and triggering interactions with other related components entailed by the semantics of those events
• compensating for possible impedance mismatches with other services, both in terms of data types and interaction protocols
Supporting the e-scientist
• Recurring use-cases can be captured
• Then corresponding process templates can be authored
• e-Science mediator makes processes available to the user
Find Workflow use-case:
• Find an interesting workflow for experiment
• Examine and modify if necessary
• Store to personal repository for later re-use
Find Workflow process:
• Create experiment context for this user
• Launch semantic search facility
• Launch workflow editor for selected WF
• Enable MIR browser for storage with context
Mediating between services
Example: mediation during a workflow execution
Components: WF Enactor, E-Science Mediator, Notification Service, MIR
1: Execution started
2: Establish experiment/user context
[*]3: Intermediate process completed
[*]4: Link process trace to context
[*]5: Store intermediate process trace
6: Workflow completed
7: Get WF results
8: Store WF results
9: Notify WF completion to subscribers
Simplified Architecture
Context preserved via myGrid Information Model
[Figure: two sides.
• Client side: GUI (e-science workbench), client-side e-science process logic, E-Science Mediator client stubs
• The Grid: E-Science Mediator Service, server-side e-science process logic, Service Registry, Notification Service, MIR, WF Enactor]
Event Notification Service
• Publish/subscribe model
– Topic based (cf. JMS topics, CORBA channels)
– Hierarchic topics
– Persistent event storage
– Subscription leases
– Federation for scalability & reliability
– Event filtering
http://cvs.mygrid.org.uk/notification-stable/downloads
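The topic-based, hierarchic publish/subscribe idea can be sketched in a few lines. This is an illustrative toy only, not the myGrid notification service: the class name `TopicBus`, the `/`-separated topic syntax and the rule that a subscription to a parent topic also receives events on its child topics are all assumptions made for the example. Persistence, leases, federation and filtering are left out.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[str, Any], None]


class TopicBus:
    """Toy in-memory publish/subscribe bus with hierarchic topics (hypothetical)."""

    def __init__(self) -> None:
        # Maps a topic such as "workflow/completed" to its subscribed handlers.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver event to subscribers of topic and of every ancestor topic.

        Returns the number of handlers that were called.
        """
        delivered = 0
        parts = topic.split("/")
        # Walk from the full topic up to its root: "a/b/c", "a/b", "a".
        for depth in range(len(parts), 0, -1):
            prefix = "/".join(parts[:depth])
            for handler in self._subscribers.get(prefix, []):
                handler(topic, event)
                delivered += 1
        return delivered
```

A subscriber interested in every workflow event would subscribe to `workflow`; one interested only in completions would subscribe to `workflow/completed`.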
Portal toolkit for bioinformaticians
• Target application
– Williams-Beuren Syndrome
– Fixed set of workflows
• Extra myGrid portlets
– Configurable
– Workflow enactment
– Workflow scheduling
– Completion notification
– Results browsing
• Based on CHEF & Jetspeed-1
– Portlets for team collaboration
[Diagram labels: Portlet Container, Interface, Portlet]
Text Services
[Figure: components and data flows.
• User Client: sends XScufl workflow definition + parameters; receives clustered PubMed Ids + titles
• Workflow Server: Initial Workflow (Swissprot/Blast record, Extract PubMed Id), Get Related Abstracts, Cluster Abstracts, Workflow Enactment
• Medline Server (Sheffield): Get Medline Abstracts; takes PubMed Ids and returns term-annotated Medline abstracts
• Medline: pre-processed offline to extract biomedical terms + indexed]
Roadmap
• Part 1 – Application context
• Part 2 – Architecture
– Information and Workflows
– Semantics and provenance
• Part 3 – Wrap up
Information Model v2
• myGrid components form a loosely coupled system
• An Information Model for e-Science experiments
• Based on CCLRC scientific metadata model
• XML messages between services conform to the IMv2
[Diagram labels: Domain specific, Domain neutral]
http://cvs.mygrid.org.uk/cgi-bin/viewcvs.cgi/mygrid/MIR/model/
Nick Sharman, Nedim Alpdemir, Justin Ferris, Mark Greenwood, Peter Li, Chris Wroe, The myGrid Information Model, Proc UK e-Science 2nd All Hands Meeting, Nottingham, UK, 1-3 Sept 2004.
Information Model v2
• myGrid components form a loosely coupled system
• An Information Model for e-Science experiments
• Based on CCLRC scientific metadata model
• XML messages between services conform to the IMv2
[Diagram: model areas arranged between "Domain specific" and "Domain neutral": Scientific data and the Life Science Identifier; Types, Values and Documents; Molecular Biology; Bioinformatics; Resources and Ids; Provenance information; Annotation and Argumentation; e-Science process, experimental methods; People, teams and organizations]
Layered Semantics
• Domain semantics layered on top of a domain-neutral but scientific data model
• Reducing the activation energy, lowering barriers of entry.
[Diagram labels: Domain Semantics (Ontologies); Experiment Semantics (IMv2); Syntax (Format, XSD types, MIME types); Data, Metadata, Workflow metadata, Service Metadata, Provenance metadata; Workflow, OGSA-DQP]
Experimental entities
View over the MIR
Life Science IDs
• Each database on the web has:
– Different policies for assigning and maintaining identifiers, dealing with versioning etc.
– Different mechanisms for retrieving an item given an ID.
• Life Science IDs are designed to harmonise the retrieval of data.
• Emerging standard for bioinformatics
– I3C, OMG Life Sciences Group, W3C
• Defines:
– URN for life science resources
– SOAP (and other) interfaces for LSID assignment, LSID resolution & resolution discovery services
T. Clark, S. Martin & T. Liefeld: Globally distributed object identification for biological knowledge bases, Briefings in Bioinformatics Vol 5 No 1 pp 59-70, March 2004
What is an LSID?
urn:lsid:AuthorityID:NamespaceID:ObjectID[:RevisionID]
urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2
urn:lsid:ebi.ac.uk:SWISSPROT.accession:P34355:3
urn:lsid:rcsb.org:PDB:1D4X:22
• LSID Designator: a mandatory preface that notes that the item being identified is a life science-specific resource
• Authority Identifier: an Internet domain owned by the organization that assigns an LSID to a
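The colon-separated layout shown above can be split mechanically. The sketch below is a minimal illustration, assuming the five mandatory fields and one optional revision field exactly as on the slide; the names `LSID` and `parse_lsid` are invented here and are not part of any LSID toolkit, and no escaping or character-set rules are checked.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID as laid out on the slide (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:Authority:Namespace:Object[:Revision]' into its fields."""
    parts = text.strip().split(":")
    # Expect the 'urn' and 'lsid' designators, three mandatory fields,
    # and at most one trailing revision field.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty field in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)
```

For example, `parse_lsid("urn:lsid:rcsb.org:PDB:1D4X:22")` yields the authority `rcsb.org`, the namespace `PDB`, the object `1D4X` and the revision `22`.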
LSID Properties
• Unique authority for each identifier
• Multiple resolution services, supporting:
– Data retrieval: data is immutable; data returned for a given LSID must always be the same (so caches are possible)
– Metadata retrieval: mutable and resolver-specific (annotation services; more on this in Part 4)
• Resolution discovery service
– Implemented over DNS/DDNS (optional)
• Authority commitment: must always maintain an authority at e.g. pdb.org that
How is data retrieved?
[Figure: Application, LSID client, PDB Authority @ pdb.org, PDB Data resolver, PDB Metadata resolver, PDB database]
1. Application to LSID client: Get me info for urn:lsid:pdb.org:1AFT
2. LSID client to PDB Authority: Where can I get data and metadata for urn:lsid:pdb.org:1AFT?
3. LSID client to PDB Data and Metadata resolvers: Get me the data and metadata for urn:lsid:pdb.org:1AFT
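Because data behind an LSID is immutable while metadata is mutable, a client can cache the former indefinitely but should refetch the latter. The sketch below illustrates that consequence only; `CachingLSIDClient` and its two callables are hypothetical stand-ins for whatever resolver calls a real LSID client would make, and the authority lookup step is omitted.

```python
from typing import Callable, Dict


class CachingLSIDClient:
    """Caches immutable LSID data; always refetches mutable metadata (sketch)."""

    def __init__(
        self,
        fetch_data: Callable[[str], bytes],
        fetch_metadata: Callable[[str], str],
    ) -> None:
        # Both callables are assumed to contact the relevant resolver.
        self._fetch_data = fetch_data
        self._fetch_metadata = fetch_metadata
        self._data_cache: Dict[str, bytes] = {}

    def get_data(self, lsid: str) -> bytes:
        # Safe to cache forever: the same LSID must always return the same data.
        if lsid not in self._data_cache:
            self._data_cache[lsid] = self._fetch_data(lsid)
        return self._data_cache[lsid]

    def get_metadata(self, lsid: str) -> str:
        # Metadata may change between calls, so it is never cached here.
        return self._fetch_metadata(lsid)
```

A changed data item would need a new LSID (for instance a new revision), which is what keeps the cache valid.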
LSID Components
• IBM built client and server implementations in Perl, Java, C++
• Straightforward to wrap an existing database as a source of data or metadata
• Client simple to use
• LSID Launchpad adds LSID resolution to
http://www-124.ibm.com/developerworks/oss/lsid/
Use within myGrid
• Needed an identifier for our own experimental resources – workflows, experiments, new data results etc.
• Everything is identified with LSIDs
• LSID saves us having to invent our own conventions and code.
• Can pass references to data around and be reassured the other party will know how to resolve that reference
• Resolution services:
– Data: myGrid Information Repository (MIR)
– Metadata: myGrid Metadata Store (RDF-based)
LSID Assignment
[Figure: components shown: Client application, LSID Assigning Service, LSID Authority, LSID Metadata Resolver, LSID Data Resolver, Enactor (with Store plug-in and Metadata plug-in), Services, mIR, Metadata Store, Workflow design, User context]
1. Data sent/received from services
2. New LSIDs assigned to data
3. Data / metadata stored
4. Data and metadata retrieved
Information Storage
• The MIR data store
• Stores experimental components
– Workflow specs as XML Scufl docs
– Data, XML notes
– Types: XML docs, relational
• Every entry has Dublin Core provenance attributes
• Every entry can have (multiple) ontology expressions
Metamodel for Types
• Necessary to identify the type and format of each datum of interest so that it can (only) be input to type-compatible viewers, services and workflows.
• Can't fix this – working in an open world. There are many established, de facto and locally preferred types & formats. Defining common bio-types is a fool's errand.
Intermediate Results
Results Management
• Taverna/Freefluo WfEE is agnostic about the data flowing through it.
• As objects progress through, they are tagged with terms from ontologies, free-text descriptions and MIME types, and may contain arbitrary collection structures.
• Using the metadata hints we can locate and launch
Results Amplification
One input, many outputs
• Automated annotation workflows produce lots of heterogeneous data
• The workflows changed how the scientist works.
• Before: analyse results as you go along
• After: all results, all the analysis, in one go
• Intermediate results management and associated provenance management essential
• Domain specific visualisation
Domain Services
• Native WSDL Web services
– DDBJ, NCBI BLAST, PathPort
• BioMOBY Web services
– Single function stereotype
• Wrapped legacy services
– Stateful interaction stereotype
– One button wrapping
– SoapLab for command-line tools
– GowLab for screen scraped
For each application: CreateJob, Run, WaitFor, GetResults, Destroy
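The stateful interaction stereotype (CreateJob, Run, WaitFor, GetResults, Destroy) can be hidden behind a single call, which is roughly what wrapping it as one workflow step amounts to. The sketch below is hypothetical: the method names are Python-style renderings of the slide's operation names, and `InMemoryJobService` is a synchronous stand-in written for the example, not the Soaplab API.

```python
from typing import Any, Callable, Dict


class InMemoryJobService:
    """Hypothetical stand-in for a stateful, job-based analysis service."""

    def __init__(self, analysis: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._analysis = analysis
        self._jobs: Dict[int, Dict[str, Any]] = {}
        self._next_id = 0

    def create_job(self, inputs: Dict[str, Any]) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"inputs": inputs, "results": None}
        return job_id

    def run(self, job_id: int) -> None:
        job = self._jobs[job_id]
        job["results"] = self._analysis(job["inputs"])

    def wait_for(self, job_id: int) -> None:
        # This stand-in runs synchronously, so there is nothing to wait for.
        return None

    def get_results(self, job_id: int) -> Dict[str, Any]:
        return self._jobs[job_id]["results"]

    def destroy(self, job_id: int) -> None:
        del self._jobs[job_id]

    def live_jobs(self) -> int:
        return len(self._jobs)


def run_to_completion(service: InMemoryJobService, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Drive the whole job lifecycle as one call, always destroying the job."""
    job_id = service.create_job(inputs)
    try:
        service.run(job_id)
        service.wait_for(job_id)
        return service.get_results(job_id)
    finally:
        # Clean up server-side state even if run or wait_for raised.
        service.destroy(job_id)
```

The `try`/`finally` is the main design point: the caller sees one operation, and job state on the service is released whether or not the analysis succeeds.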
Domain Services
• Lots of them: ~300
• Open world: we don't own them
• Many produce text not numbers
• Many are unique, single site
• Need lots of genuine redundant replica services
• Unreliable and unstable
– Research level software
– Reliant on other people's servers
• Services in the wild rare; significant time to wrap applications as web services (licensing, installation,
Domain Services in WBS
• RepeatMasker
• NCBI_BLAST
• Modified BLAST
• GenScan
• PSORTII
• iPSORT
• TargetP
• Various EMBOSS services
• InterProScan
• BLAST2
• NIX
• TESS
• TWINSCAN
Can you guess what it is yet?
SHIM Services
[Figure: a SHIM sits between main bioinformatics application services]
• Explicitly capturing the process
• Unrecorded 'steps' which aren't realised until attempting to build something
• Services that enable domain services to fit
Workflow development and enactment
• Freefluo workflow enactment engine
– Processor & event observer plug-in support
• Taverna development and execution environment
– Workbench, workflow editor, tool plug-in support
• http://taverna.sourceforge.net
• Simple conceptual unified flow language (XScufl) wraps up units of
[Screenshot labels:
• Tree structure explorer
• Graphical diagram
• Results in enactor invocation window
• Service palette shows a range of operations which can be used in the composition of a workflow]
Workflow environment
• Taverna API acts as an intermediate layer between user level applications and workflow enactors such as FreeFluo.
• Includes object models using a standard MVC design for both workflow definitions and data objects within a workflow
• Implicit iteration and data flow
• Data sets and nested flows
• Configurable failure handling
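Implicit iteration can be illustrated with a small helper. This is a sketch under one stated assumption: when an operation written for a single item is handed a (possibly nested) collection, it is applied to each item and the result keeps the collection's shape. The function name is invented for the example and is not Taverna API.

```python
from typing import Any, Callable


def invoke_with_implicit_iteration(operation: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item operation to a value or to every item of a collection."""
    if isinstance(value, (list, tuple)):
        # Recurse so that nested collections keep their structure in the result.
        return [invoke_with_implicit_iteration(operation, item) for item in value]
    return operation(value)
```

With this helper, `invoke_with_implicit_iteration(len, ["ATG", "ATGC"])` returns one length per sequence without the workflow author writing an explicit loop.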
Scufl-Taverna-FreeFluo
• SCUFL - Simple Conceptual Unified Flow Language
• Started with WSFL... SCUFL provides a much higher level view on workflows, and is therefore simpler and more user-focused.
• Simple – relies upon an inherently connected environment to reduce the quantity of information
Scufl
• Conceptual – one Processor in a SCUFL workflow maps, as far as is possible, to one conceptual operation as viewed by a non-expert user
– Wrap up stateful service interactions into custom Processor implementations
[Diagram: Taverna Workbench; Scufl language parser; Freefluo Workflow Enactor Core; Enactor; Processors: Web Service, Soaplab, BioMOBY, Local App]
Scufl
• Unified Flow Language – SCUFL does not dictate how the workflow is to be enacted; it is inherently declarative in intent.
• Can potentially be translated to other workflow languages.
• Can be arbitrarily
[Diagram: Taverna Workbench; Scufl language parser; Freefluo Workflow Enactor Core; Enactor; Processors: Web Service, Soaplab, BioMOBY, Local App]
• One input, three outputs and eight processors.
• All the processors are labeled top to bottom with input ports, processor name and output ports.
• All the processors here are standard WSDL-described standard web
[Figure: enactor data flows.
• Into the Enactor: workflow script (workflow ins and outs, failure policy); services and alternates list from Service Discovery; metadata template
• Invocation + data to Services
• LSID + data to MIR Data Store; data by LSID from External Data Store
• LSIDs + metadata to MIR Metadata Store
• Events to Event Notification Service]
Fault tolerance
• Failure of workflow engine
– P2P architecture
– XML serialisation
– Checkpointing
• Failure of services or network
– User defined retry policy
– Alternate replicas
– Alternate list
• Automatic choices for Alternate Processor
[Screenshot caption: Retry, delay and backoff configuration]
Fault tolerance
[Figure: processor state diagram.
States: scheduled and waiting for data; constructing iterator; invoking; invoking with implicit iteration; adding item to result data set; waiting to retry; creating alternate processor; complete; aborted.
Transition labels: data ready; types match; can iterate; data mismatch; instantiation error; error; timeout; service failure; retries left; alternate available; done iterating; success; allow partials.]
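A user-defined retry policy with delay, backoff and an alternates list can be sketched as follows. This is an illustration, not the Freefluo implementation: the function name, the parameter names and the choice to exhaust all retries on one processor before moving to the next alternate are assumptions. The `sleep` parameter is injectable only so the example can be exercised without real waiting.

```python
import time
from typing import Any, Callable, Optional, Sequence


def invoke_with_fallback(
    processors: Sequence[Callable[[Any], Any]],
    data: Any,
    max_retries: int = 2,
    initial_delay: float = 0.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Try each processor in turn, retrying each with a growing delay."""
    last_error: Optional[Exception] = None
    for processor in processors:  # primary first, then the alternates
        delay = initial_delay
        for attempt in range(max_retries + 1):
            try:
                return processor(data)
            except Exception as exc:
                last_error = exc
                if attempt < max_retries:
                    # Wait, then lengthen the delay for the next retry.
                    sleep(delay)
                    delay *= backoff
    raise RuntimeError("all processors failed") from last_error
```

With `initial_delay=1.0` and `backoff=2.0`, a processor that fails twice and then succeeds is retried after 1 second and then after 2 seconds.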
Status reporting
Whither BPEL?
• Focus: scripting simple request/response services vs. choreographing business processes
• Complexity: Scufl is simple enough for bioinformaticians to develop workflows
• Generality: extensible processor support vs. Web Services only
• Provenance generation
What needs to be done
• Free-standing web service
• Long-running workflows
– Computationally-intensive services
– Access to a reliable high performance BLAST service that reflects NCBI BLAST
– NCBioGrid?
• Scalability
– Large documents
– Data staging
• Debugging environment – services / workflows are brittle.
• Interactivity
– Version 1 had user proxy as an actor
– The original process split into 3 steps:
• Identification of candidate overlapping nucleotide sequences
• Characterisation of nucleotide sequence
OGSA-DQP
http://www.ogsa-dai.org.uk/dqp
• Used in Graves' Disease
• Uses OGSA-DAI data access services to access individual data resources.
• A single query to access and join data from more than one OGSA-DAI wrapped data resource.
• Supports orchestration of
Roadmap
• Part 1 – Application context
• Part 2 – Architecture
– Information and Workflows
– Semantics and provenance
• Part 3 – Wrap up
Finding and selecting services
Activation energy gradient
Unregistered services
• Scavenging
• URLs and Soaplab endpoints
– Introspection
Registered services
• Word-based searching
• Semantic annotation for later discovery and (re)use by friends and strangers in your VO (Part 3)
Drag and drop services onto Taverna workbench
Registry View Service
• Third party registries
• Third party services
• Third party annotation (RDF)
• Views over federated registries
• UDDI interfaces extended with RDF
• Federated views
– Updated via Notification Service
– Personalized based on annotation
• Authorisation and IPR
[Diagram label: Registry]
Semantic discovery
• User chooses services
• A common ontology is used to annotate and query any myGrid object including services.
• Discover workflows and services described in the registry via Taverna.
• Look for all workflows that accept an input of semantic type nucleotide sequence
• Aim to have semantic discovery over public view on the Web.
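The query "all workflows that accept an input of semantic type nucleotide sequence" can be sketched with a toy subclass hierarchy. Everything here is made up for illustration: the concept names, the `SUBCLASS_OF` table, the registry entries and the matching rule (a workflow matches if one of its declared input types is the queried type or a more general one) are assumptions, not the myGrid ontology or the Feta matching algorithm.

```python
from typing import Dict, List, Optional

# Toy ontology: each concept maps to its direct parent (hypothetical names).
SUBCLASS_OF: Dict[str, str] = {
    "DNA_sequence": "nucleotide_sequence",
    "RNA_sequence": "nucleotide_sequence",
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
}

# Toy registry: workflows annotated with the semantic types of their inputs.
EXAMPLE_REGISTRY: List[Dict[str, object]] = [
    {"name": "blast_wf", "inputs": ["nucleotide_sequence"]},
    {"name": "generic_wf", "inputs": ["biological_sequence"]},
    {"name": "protein_wf", "inputs": ["protein_sequence"]},
]


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def find_accepting(registry: List[Dict[str, object]], semantic_type: str) -> List[str]:
    """Names of entries with an input that can take data of semantic_type."""
    return [
        str(entry["name"])
        for entry in registry
        if any(is_a(semantic_type, declared) for declared in entry["inputs"])
    ]
```

Matching against the hierarchy rather than by exact string is what separates this from word-based searching: a workflow annotated as taking any biological sequence is also found for a nucleotide sequence query.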
Workflow and service annotation
• Adding structured metadata to a workflow registration to enable others to discover and reuse it more effectively, e.g. what semantic type of input it accepts.
Can you guess what it is yet?
Service Registration
http://pedro.man.ac.uk
Semantic Discovery
• Drag a workflow entry into the explorer pane and the workflow loads.
• Drag a service/workflow to the scavenger window for inclusion into the workflow
Annotation
[Figure: components shown.
• Ontologists maintain the Ontology Store, which supplies vocabulary
• Service Providers and others supply WSDL and Soaplab interface descriptions (description extraction)
• Annotation providers use the Pedro annotation tool to produce annotation/description
• Registry and Registry (Personalised View), reached from the Taverna Workbench through the Registry plug-in]
Annotation (architecture diagram, continued). Labels: Ontologists, Ontology Store, Vocabulary, Annotation providers, Pedro Annotation tool, Annotation/description, Scientists, Taverna Workbench, Store plug-in, Haystack Provenance Browser, mIR
Semantic discovery and workflow execution (architecture diagram). Labels: Service Providers, Others, WSDL, Soaplab, Ontologists, Ontology Store, Vocabulary, Feta Semantic Discovery, Bioinformaticians, Taverna Workbench, Feta plug-in, Registry, Registry (Personalised View), Workflow Execution, FreeFluo WfEE, invoking, Store data & metadata, mIR
Layered Semantics • Domain Semantics layered on top of domain neutral but scientific data model • Reducing the activation energy, lowering barriers of entry. Diagram labels: Ontologies, IMv2, Format, XSD types, MIME types, Domain Semantics, Data, Metadata, Workflow metadata, Experiment Semantics, Service Metadata, Provenance metadata, Syntax, Workflow, OGSA-DQP
Model of services (diagram). Labels: Operation: name, description, task, method, resource, application; subclass; Service; hasInput, hasOutput; Parameter: name, description, semantic type, format, transport type, collection format; subclass; WSDL based operation; workflow; name, description, author, organisation; bioMoby service; WSDL based Web service; Soaplab service; Local Java code
Service Ontology Suite • Upper level ontology • Task ontology • Informatics ontology • Web service ontology (inspired by DAML-S): parameters: input, output, precondition, effect; performs_task; uses-resource; is_function_of • Molecular biology ontology • Publishing ontology • Organisation ontology • Bioinformatics ontology • Current work: joint development on an Open Biological Ontologies BioService Ontology. http://obo.sourceforge.net/
Workflow metadata • Three stages in lifecycle: 1. Workflow creation – Service discovery 2. Workflow resolution – Service selection 3. Workflow harmonization – Reconciling parameters – Format transformations • Invocation and harmonization, by stage of invocation: – Soaplab BLAST service: creating a job: createEmptyJob(); configuring the service: set_database(database, job); setting input data: set_query_sequence(query, job); running the job: run(job) – DBJ BLAST service: creating a job: n/a; remaining stages in a single call: simpleSearch(program, database, query)
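The harmonization step can be illustrated with a small adapter. The sketch below is an assumption-laden illustration rather than Taverna or Soaplab code: both classes are in-memory stubs, their method names are Python renderings of the operation names on the slide, and the returned strings are invented placeholders for a BLAST report.

```python
class DbjStyleBlast:
    """Hypothetical stub of a one-call service (slide: simpleSearch)."""

    def simple_search(self, program, database, query):
        return f"{program} search of {database} for {query}"


class SoaplabStyleBlast:
    """Hypothetical stub of a job-based service (slide: createEmptyJob, set_*, run)."""

    def __init__(self, program):
        self.program = program  # assumption: the program is fixed per deployed service
        self.jobs = {}

    def create_empty_job(self):
        job = len(self.jobs) + 1
        self.jobs[job] = {}
        return job

    def set_database(self, database, job):
        self.jobs[job]["database"] = database

    def set_query_sequence(self, query, job):
        self.jobs[job]["query"] = query

    def run(self, job):
        settings = self.jobs[job]
        return f"{self.program} search of {settings['database']} for {settings['query']}"


def run_blast(service, program, database, query):
    """Offer one call shape to the workflow, whatever the service's invocation model."""
    if isinstance(service, DbjStyleBlast):
        return service.simple_search(program, database, query)
    # Job-based style: the stub ignores `program` because it is set at construction.
    job = service.create_empty_job()
    service.set_database(database, job)
    service.set_query_sequence(query, job)
    return service.run(job)


report = run_blast(SoaplabStyleBlast("blastn"), "blastn", "embl", "ACGT")
```

With an adapter of this kind the workflow author sees a single BLAST step, and the difference between the two invocation models stays in the metadata and the glue code.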
Tiered specifications (diagram). Classes of services: domain “semantic”, “unexecutable”, “potentials”. Instances of services: business “operational”, “executable”, “actuals”. Tiers: Task, Service class, Specific services. Labels: Sequence similarity search; BLAST; BLAST service; IBM Life Sciences service: blastQuery(); SOAPLAB service: setProgram(), createJob(), setDatabase(), run() or setE_value(), getResults()
Stratified metadata • Service Type and Class (OWL) • Service Instance (RDF)
Seven types of service metadata • Conceptual • Configuration • Provenance • Operational • Invocation model • Interface • Data format
Service and Workflow registration allows peer review and publication of e-Science methods. Workflow registry entry (diagram labels: Scufl, URI, WSDL, RDF, OWL/RDF, stored, encoded, RDF Store): Workflow Executive Summary; Descriptions: inputs, outputs, tasks, component resources; Operational descriptions: cost, QoS, access rights…; Invokable interface descriptions, e.g. XML data types; Syntactic descriptions, e.g. MIME types; Conceptual descriptions; Provenance descriptions: authors, creation date, institution… • Description scheme • RDFS & DAML+OIL / OWL ontologies of services & biology • Based on DAML-S • Reasoning over OWL descriptions • Query over RDF • Aim to have semantic discovery over public view on the web.
Reflections • Multiple descriptions, multiple interfaces – User needs – Machine needs • The dimensions of Service Class substitution – Biologists choose experimentally meaningful services and do not want “semantically similar” substitutions; only substituting one instance for another – Experimentally neutral “glue” services that can be
Reuse and Repurposing • Describing for reuse is challenging – Reuse depends on semantic descriptions and these are costly to produce – Describing for someone else’s benefit – Reuse by multiple stakeholders • Licensing workflows for reuse. • Authorisation models • But reuse does happen! • Metadata pays off but it needs a network effect and there is a cost.
So far, Using Concepts • Controlled vocabulary for advertisements for workflows and services • Indexes into registries and mIR – Semantic discovery of services and workflows – Semantic discovery of repository entries • Type management for composition – Semantic workflow construction: guidance and validation • Navigation paths between data and knowledge holdings – Semantic “glue” between repository entries – Semantic annotation and linking of workflow provenance logs
Provenance • Experiments being performed repeatedly, at different sites, at different times, by different users or groups • A large repository of records about experiments! • Scientists, in silico experiments: – verification of data – “recipes” for experiment designs – explanation for the impact of changes – ownership – performance of services – data quality
Provenance Web (diagram). Labels: Genomic Project; data1, data2, data3, data4, dataAnother; WSDL; serviceInvocation1, serviceInvocation2; Process provenance, Data provenance, Organisation provenance, Knowledge provenance
Representing links • Example resources: urn:lsid:taverna.sf.net:datathing:45fg6 and urn:lsid:taverna.sf.net:datathing:23ty3 • Identify each resource – Life science identifier (LSID): URI with associated data and metadata retrieval protocols. – Understanding that underlying data will not change
Representing links II • Example link: http://www.mygrid.org.uk/ontology#derived_from between urn:lsid:taverna.sf.net:datathing:45fg6 and urn:lsid:taverna.sf.net:datathing:23ty3 • Identify link type – Again use URI – Allows us to use RDF infrastructure: repositories, ontologies
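A link of this kind is just a (subject, predicate, object) triple, so a provenance chain can be walked with very little code. The sketch below uses plain Python tuples rather than an RDF library. The two taverna.sf.net LSIDs and the derived_from URI come from the slide; the direction of the link between them and the third LSID are assumptions made purely for illustration.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

REPORT = "urn:lsid:taverna.sf.net:datathing:45fg6"
SEQUENCE = "urn:lsid:taverna.sf.net:datathing:23ty3"
RAW_INPUT = "urn:lsid:example.org:datathing:hypothetical1"  # invented for the example

# Assumed direction: the subject was derived from the object.
TRIPLES = [
    (REPORT, DERIVED_FROM, SEQUENCE),
    (SEQUENCE, DERIVED_FROM, RAW_INPUT),
]


def derivation_sources(triples, resource, predicate=DERIVED_FROM):
    """All resources reachable from `resource` by following the link type."""
    found = []
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for subject, link, obj in triples:
            if subject == current and link == predicate and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


sources = derivation_sources(TRIPLES, REPORT)
```

Because both the resources and the link type are URIs, the same list of triples could be loaded into any RDF repository without changing the identifiers.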
Provenance Pyramid (diagram). Levels: Knowledge Level, Data Level, Organisation Level, Process Level
Provenance levels (diagram). Organisation level provenance: Project, Experiment design, Person, Organisation (links: runBy, partOf, run for). Process level provenance: Service, e.g. BLAST @ NCBI; Process, e.g. web service invocation of BLAST @ NCBI; Event, e.g. completion of a web service invocation at 12.04 pm; Workflow design; Workflow run (links: instanceOf, componentProcess, componentEvent). Data/knowledge level provenance: Data item; data derivation, e.g. output data derived from input data; knowledge statements, e.g. similar protein sequence to. User can add templates to each workflow process to determine links between data items.
Provenance tracking • Automated generation of this web of links • Workflow enactor generates – LSIDs – Data derivation links – Knowledge links – Process links – Organisation links. Figure: web of provenance links around a nucleotide sequence (gi|19747251|gb|AC005089.3, Homo sapiens BAC clone CTA-315H11 from 7, complete sequence) and a BLAST report listing similar sequences; data items shown include urn:lsid:taverna:datathing:13 and urn:lsid:taverna:datathing:15. Link and type labels: rdf:type, nucleotide_sequence, BLAST_Report, masked_sequence_of, similar_sequences_to, filtered_version_of, part_of, invocation_of, described_by, created_by, run_during, run_for, author, works_for. Node labels: project, group, person, organisation, experiment definition, workflow definition, workflow invocation, service description, service invocation. Callout A: relationships the BLAST report has with other items in the repository. Callout B: other classes of information related to the BLAST report.
Haystack (IBM/MIT) (screenshot). Callouts: GenBank record; portion of the Web of provenance; managing collection of sequences for review
Reflections • Visualisation of results usually domain specific • Provenance browsing and querying needs to fit with that visualisation • Generic graphical presentation limited to small, low complexity result sets • Layered provenance for different purposes and different stakeholders – Detailed process for debugging and usage statistics for QoS – Data and Knowledge for the Scientist • Migration with data objects
Map of Context (diagram). Linked items: literature relevant to the provenance study or data in this workflow; provenance record of a workflow run; ontologies mapping between objects; web page of people who have interests related to those of the workflow owner; experiment notes; interlinking graph of the workflow that generates the provenance logs. Formats and identifiers shown: RDF, LSID, PDF, XML, HTML, OWL, URI
Provenance metadata • Outside objects – RDF store • Within objects – LSID metadata. Diagram labels: URI, LSID, LSID metadata
Linked Provenance Resources (screenshot). Callouts: the subsuming concepts; the subsumed concepts; link to the log annotated with more general concept; link to the log annotated with more specific concept
Generating Links (screenshot). Callouts: the concept; the name of the data; the generated link to related provenance document
Semantics • RDF-based service and data registries • RDF-based metadata for ALL experimental components • RDF-based provenance graphs • OWL based controlled vocabularies for database content • OWL based integration of Ontology-aided workflow construction RDF-based semantic mark up of results, logs, notes, data entries
RDF in a nutshell • Resource Description Framework • W3C candidate recommendation (http://www.w3.org/RDF) • Graphical formalism (+ XML syntax + semantics) – for representing metadata – for describing the semantics of information in a machine-accessible way • RDFS extends RDF with “schema vocabulary”, e.g.: – Class, Property – type, subClassOf, subPropertyOf – range, domain • Statements are <subject, predicate, object> triples, e.g. <Ian, hasColleague, Uli>
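To make the schema vocabulary concrete, the sketch below applies three of the standard RDFS entailment rules (domain, range and subClassOf) to a handful of triples held as Python tuples. It is a toy fixpoint loop, not an RDF library, and every name under the ex: prefix is hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Repeatedly apply the domain, range and subClassOf rules until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))   # subject of p has the domain class
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))   # object of p has the range class
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))   # membership propagates to superclasses
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical schema and data.
FACTS = {
    ("ex:derived_from", RANGE, "ex:NucleotideSequence"),
    ("ex:NucleotideSequence", SUBCLASS, "ex:Sequence"),
    ("ex:report1", "ex:derived_from", "ex:seq1"),
}

closure = rdfs_closure(FACTS)
```

Nothing in the data says what ex:seq1 is, yet the closure contains both (ex:seq1, rdf:type, ex:NucleotideSequence) and (ex:seq1, rdf:type, ex:Sequence), which is the kind of machine-accessible semantics the slide refers to.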
W3C Web Ontology Language OWL • The ontology language du jour • Continuum of expressivity – Concepts, roles, individuals, axioms – From simple frames to description logics – Sound and complete formal semantics • Supports reasoning to infer classification – Based on the SHIQ description logic • Eas(ier) to extend and evolve • http://www.w3.org/TR/2004/REC-owl-features-20040210/
A pioneer of the… Semantic Grid: Semantics in and on the Grid • The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation
Roadmap • Part 1 – Application context • Part 2 – Architecture – Information and Workflows – Semantics and provenance • Part 3 – Wrap up
Key Characteristics • Data intensive, upstream analysis • Pipelines: experiments as workflows (chiefly) • Ad hoc exploratory investigative workflows for individuals from no particular a priori community • Openness: the services are not ours. • Low activation energy, incremental take-on • Foundations for sharing knowledge and sharing experimental objects • Multiple stakeholders • Collection of components for assembly
Experiment lifecycle (diagram) • Forming experiments • Discovering and reusing experiments and resources • Sharing services & experiments • Personalisation • Executing and monitoring experiments • Managing lifecycle, provenance and results of experiments. Diagram label: Soaplab
Putting the user first • User-driven end to end scenarios essential • Whole solution that fits with them • Users vs Machines (vs Interesting computer science) – Mismatch for information needs – Scufl instead of BPEL/WSFL – Layers of Provenance – Service/workflow descriptions for PEOPLE not just machines • Bury complexity, increasingly simplify – Bioinformaticians HARDLY EVER want to have their services automatically selected – Except SHIMs, Replicas, User specified equivalences • Service providers and developers are users too!
Security • Single sign-on to myGrid services • Credentials mapping to external services (though most are open and free) • Policy-driven authorization • Solutions? – PERMIS, Shibboleth, WS-Security, XACML, SAML – FAME/PERMIS, SAM
Reuse • Describing for reuse is challenging – Reuse depends on semantic descriptions and these are costly to produce – Describing for someone else’s benefit – Reuse by multiple stakeholders • Licensing workflows for reuse. • Authorisation models • But reuse does happen! – Other genomic disorders (e.g. sick cows) • Metadata pays off but it needs a network effect and there is a cost.
Personalisation • Dynamic creation of personal data sets. • Personal views over repositories. • Personalisation of workflows. • Personal notification • Annotation of datasets and workflows. • Personalisation of service descriptions: what I think the service does.
Standards • By tapping into (de facto) standards (LSID, RDF, WS-I) and communities we can leverage others’ results and tools – Haystack, Pedro, Jena, CHEF/Sakai. • The Grid standards are confusing and volatile – The choice of vanilla Web Services was good. – We didn’t jump to OGSI. We won’t jump to WSRF until it’s necessary. • And workflow standards have been untimely.
Where is the WSRF? There isn’t any – vanilla Web Services
Computational processes • Most services are quick pipes • Long running services – Gene expression clustering service in Hong Kong: parking the data at a URL & notification through polling or email (GridFTP, event notification, data staging!) – Integrative Biology e-Science pilot follow-on to include simulation services – High throughput BLAST with NCBI update profile • Stateful interactions
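The "park the data at a URL and poll" pattern for long running services can be sketched as below. This is a generic illustration with hypothetical names, not the clustering service's actual interface: `check` stands for whatever call reports on the job, and is assumed to return None while the job is running and the result URL once it has finished.

```python
import time


def poll_for_result(check, interval=5.0, max_polls=100, sleep=time.sleep):
    """Call `check` until it returns a result URL, waiting `interval` seconds between calls.

    Returns (url, number_of_polls). Raises TimeoutError if the job is still
    running after `max_polls` checks.
    """
    for attempt in range(1, max_polls + 1):
        url = check()
        if url is not None:
            return url, attempt
        sleep(interval)
    raise TimeoutError(f"job still running after {max_polls} polls")


def make_fake_job(polls_needed, url):
    """Hypothetical stand-in for a remote job that finishes after a few checks."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        return url if state["calls"] >= polls_needed else None

    return check


result = poll_for_result(
    make_fake_job(3, "http://example.org/results/42"),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

Passing the sleep function in as a parameter keeps the loop testable. Email or event notification, the alternatives the slide mentions, would replace the loop with a callback.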
Observations • Showstoppers for practical adoption are not technical showstoppers – Can I incorporate my favourite service? – Can I manage the results? • Service providers are a bottleneck • For every user dedicate a technologist. • Caution against technology push. • Rapid prototyping, deployment, feedback crucial.
Grid Computing trajectory (chart, axes: cost against time). Stages shown: CPU scavenging; CPU intensive workload; Grid as a utility, data Grids, robust infrastructure; sharing standard scientific process and data, sharing of common infrastructure; sharing of apps and know-how; virtual organisations with dynamic access to unlimited resources. Scope labels: intra-company, intra community, e.g. Life Science Grid; between trusted partners; with controlled set of unknown clients; for all
Acknowledgements • An EPSRC funded UK e-Science Program Pilot Project • Particular thanks to the other members of the Taverna project, http://taverna.sf.net
myGrid People • Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe. • Users: Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK; Steve Kemp, Liverpool, UK • Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire • Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK) • Collaborators: Keith Decker
http://www.mygrid.org.uk • Tutorial: http://twiki.mygrid.org.uk/twiki/bin/view/Mygrid/NeSCmyGridTutorial