
322b4a7d9962a0201f2f71aa12380624.ppt
- Number of slides: 26
Seven Bottlenecks to Workflow Reuse and Repurposing
Antoon Goderis, Ulrike Sattler, Phillip Lord, Carole Goble
University of Manchester
ISWC 2005, Galway
Take home message
• New problem: workflow reuse and repurposing is happening; how do we make it scale?
• Data: survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  – Creating a pool of process knowledge
  – Accessing this pool
e-Science
• Support sharing and collaboratories in science
• The world of distributed web services
  – A boom in services: e.g. 1800+ bio services in the myGrid project
• Pulled together as in silico experiments
  – Scientist-friendly workflow languages
  – Hard to build (>1 year!)
  – A boom in workflows? 100 workflows in myGrid, with up to 50 services per workflow
Evolving e-Science to a Web of Science?
• In silico experiments as commodities and know-how
• Share, reuse, repurpose
  – authoring time, quality and provenance collection
(Figure labels: Manchester, CS; Newcastle, CS; Manchester, Biology)
Workflow by example
• Discover existing work
• Edit workflow (repurposing actions)
• Maintain reuse/repurpose history
• Try out workflow
• Register and annotate workflow and new services for reuse
• Deploy workflow
Actors in the diagram: scientists & developers; scientists; 3rd-party annotation providers
Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005.
Analyze This
#scientists × #workflows × #versions × #runs
Workflow
• Describes a process
• Different workflow languages: BPEL, Scufl, etc.
• Orchestration/choreography of Web and web services
• Executable with a workflow enactor
• Can be published as a web or Web service
Web service
• Describes a process
• SOAP/WSDL interface
• Participant in a workflow
• Executable
Workflow reuse
• Reuse of editable processes
• Repurpose / build on other people's work
• Hackable: change data/control flow
• Discovery based on data/control flow
• Measures of aggregated task similarity and flow similarity
Web service reuse
• Reuse of encapsulated processes
• Incorporate other people's work
• Parametrisable operations
• Discovery based on WSDL operations
• Measures of task similarity
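The two lists differ in what a similarity measure has to compare: a Web service is matched on its task alone, a workflow on its tasks and on how they are wired together. Below is a minimal Python sketch of an aggregated measure. It assumes each workflow is reduced to a set of task names plus a set of (producer, consumer) links, uses Jaccard overlap for both parts, and weights them equally purely for illustration. The `Workflow` class, the weighting and the task names are hypothetical, not the authors' measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical minimal workflow model: task names plus data-flow links."""
    tasks: frozenset
    links: frozenset  # (producer task, consumer task) pairs


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def workflow_similarity(w1: Workflow, w2: Workflow, task_weight: float = 0.5) -> float:
    """Aggregate task similarity and flow similarity into one score in [0, 1].

    A Web service comparison would stop at the task term; a workflow
    comparison also looks at how the tasks are connected.
    """
    task_sim = jaccard(w1.tasks, w2.tasks)
    flow_sim = jaccard(w1.links, w2.links)
    return task_weight * task_sim + (1 - task_weight) * flow_sim


# Illustrative workflows; the task names are hypothetical.
w1 = Workflow(frozenset({"blast", "clustalw", "render"}),
              frozenset({("blast", "clustalw"), ("clustalw", "render")}))
w2 = Workflow(frozenset({"blast", "clustalw", "plot"}),
              frozenset({("blast", "clustalw"), ("clustalw", "plot")}))

print(jaccard(w1.tasks, w2.tasks))  # task similarity: 2 shared of 4 tasks -> 0.5
print(workflow_similarity(w1, w2))  # 0.5 * 0.5 + 0.5 * (1/3)
```

Setting `task_weight=1.0` collapses the score to plain task similarity, which is roughly all that Web service discovery offers.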
Repurposing, discovery and composition
• Discovery
  – The process of finding, ranking and selecting existing resources
• Composition
  – The process of combining resources into a new working assembly
  – (Auto-)discovery + (auto-)integration
• Repurposing
  – Auto-discovery + manual integration
  – Need techniques for composition-oriented discovery
    • Discovery supporting integration through rankings
A field report of six projects
• www.myGrid.org.uk
  – reuse by collaborators
  – personal reuse (versioning)
• www.kepler-project.org
  – 10 complex workflows
  – reuse of distributed execution models
• www.inforsense.com
  – intranet exchanges within large pharmas
• www.geodise.org
  – 150 Matlab functions, 10 scripts
  – reuse of function combinations
No support for comparing workflows! No third-party reuse!
7 bottlenecks to reuse & repurposing
1. Service availability
2. IP rights
3. Workflow rigidity
4. Workflow interoperability
5. Discovery model
6. Process KA (knowledge acquisition)
7. Ranking
• Step 1: Collect as many workflows as possible
• Step 2: Make this collection usable
• Wanted: technology providers (Semantic Web community? e-Science community?)
The bottlenecks, in more detail
1. Service availability
  – Web services: Kepler actors, myGrid processors, InforSense services
  – Local services: Web-enable, encode, repository
2. Intellectual property rights
  – Anonymization; journal policies
3. Workflow rigidity
  – Evolution and adaptation: parametrisation
4. The nice thing about workflow standards…
(Figure: Benesh notation, Laban notation)
• Workflow languages abound
• Out of 6 projects, 5 do not use BPEL
• Behavioural semantics left implicit, as a feature
• Repurposing in case of multiple workflow systems
  – outside and across system boundaries
4. The nice thing about workflow standards…
• Bring out the behavioural semantics
  – Comparing 3 projects through workflow patterns
    • E.g. simple merge
  – Scientific workflows use functional programming patterns
  – How do these combine into different distributed execution models?
  – WSMO/SWSI/OWL-S?
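As a concrete reading of the simple merge pattern and of functional-style iteration, the sketch below routes each input down exactly one of two alternative branches (an exclusive choice) and lets whichever branch ran pass its result straight on to the next task, with no synchronisation. It is a plain Python illustration, not the semantics of any of the surveyed systems; the service functions and the routing rule are hypothetical placeholders.

```python
from typing import Callable


# Hypothetical stand-ins for two alternative analysis services.
def dna_analysis(seq: str) -> str:
    return f"dna-analysis({seq})"


def protein_analysis(seq: str) -> str:
    return f"protein-analysis({seq})"


BRANCHES: dict[str, Callable[[str], str]] = {
    "dna": dna_analysis,
    "protein": protein_analysis,
}


def exclusive_choice(seq: str) -> str:
    """Pick exactly one branch (a crude, hypothetical routing rule)."""
    return "dna" if set(seq) <= set("ACGT") else "protein"


def report(result: str) -> str:
    """Task downstream of the merge."""
    return f"report({result})"


def run(seq: str) -> str:
    # Simple merge: only one branch is enabled, and its result flows
    # straight into the next task without waiting for the other branch.
    branch_result = BRANCHES[exclusive_choice(seq)](seq)
    return report(branch_result)


# Functional-style iteration: apply the whole workflow to each input.
results = list(map(run, ["ACAAGATGCCATTGT", "MKVLAT"]))
print(results)
# ['report(dna-analysis(ACAAGATGCCATTGT))', 'report(protein-analysis(MKVLAT))']
```

Contrast this with a synchronising merge, where the downstream task would wait for every active incoming branch before firing.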
5. What belongs in the discovery model?
• How to retrieve existing scientific workflows?
  – Scientists & developers facing distributed programs
• For scientists? Data flow discovery, in jargon, largely abstracting from control
  (Figure: example data, the DNA sequence ACAAGATGCCATTGT)
• For developers? Control flow discovery, largely abstracting from data
  – Workflow patterns, Kepler distributed execution models
    • Process networks, process algebra, Petri nets…
5. What belongs in the discovery model?
• For scientists
  – WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries
  – OWL DL: A-Box based workflow queries [Goderis+DL'05]
• For developers
  – Workflow patterns, Kepler distributed execution models
    • Pattern example based retrieval
    • An early table of combined execution models
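The sketch below contrasts the two discovery styles. It assumes each workflow carries a hypothetical annotation record listing its input and output data types (in domain jargon) and the control-flow patterns it uses. The slides point to OWL DL A-Box queries for the data-flow case; plain Python set containment stands in for that reasoning here, so the sketch shows only the shape of the two queries, not the authors' implementation. All workflow names and types are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDescription:
    """Hypothetical annotation record for one workflow in a repository."""
    name: str
    inputs: frozenset   # domain data types, in the scientist's jargon
    outputs: frozenset
    control: frozenset  # control-flow patterns the workflow uses


def data_flow_query(repo, have, want):
    """Scientist-style discovery: match on data types, ignore control flow.

    A workflow qualifies if the user can supply all its inputs and it
    produces everything the user wants.
    """
    return [w.name for w in repo if w.inputs <= have and want <= w.outputs]


def control_flow_query(repo, patterns):
    """Developer-style discovery: match on control-flow patterns, ignore data."""
    return [w.name for w in repo if patterns <= w.control]


REPO = [
    WorkflowDescription("gene_annotation", frozenset({"DNA sequence"}),
                        frozenset({"gene annotation"}), frozenset({"iteration"})),
    WorkflowDescription("alignment_report", frozenset({"DNA sequence"}),
                        frozenset({"alignment", "report"}),
                        frozenset({"exclusive choice", "simple merge"})),
    WorkflowDescription("image_pipeline", frozenset({"microscope image"}),
                        frozenset({"report"}), frozenset({"simple merge"})),
]

print(data_flow_query(REPO, frozenset({"DNA sequence"}), frozenset({"report"})))
# ['alignment_report']
print(control_flow_query(REPO, frozenset({"simple merge"})))
# ['alignment_report', 'image_pipeline']
```

The same repository answers both kinds of question; what changes is which part of the annotation each query ignores.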
6. New challenges in Knowledge Acquisition
• Who does the annotation?
• What should be in the annotation?
  – Workflow fragments
    • Task aggregation/prediction
    • "Service decomposition"
  – The things that went wrong!
6. New challenges in Knowledge Acquisition
• Who does the annotation?
  – Updated service ontology learning and automated service annotation techniques
• What should be in the annotation?
  – Workflow fragments
    • "Service decomposition"
      – Cutting up service webs
        » Social network analysis (services as users!)
  – The things that went wrong
    • Web site usability mining
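One possible reading of "services as users" is to build a co-usage network in which two services are linked whenever some workflow calls both, then treat frequently co-used groups as candidate workflow fragments. The minimal sketch below counts co-used service pairs only. The support threshold and the workflow contents are assumptions for illustration; real fragment mining would consider larger groups and the links between services.

```python
from collections import Counter
from itertools import combinations


def co_usage(workflows):
    """Count how often each unordered pair of services appears in one workflow.

    Services play the role of users in a social network; two services are
    linked whenever some workflow uses both.
    """
    counts = Counter()
    for services in workflows:
        for pair in combinations(sorted(services), 2):
            counts[pair] += 1
    return counts


def candidate_fragments(workflows, min_support=2):
    """Service pairs used together in at least min_support workflows."""
    return sorted(pair for pair, n in co_usage(workflows).items() if n >= min_support)


# Hypothetical workflow contents (sets of service names).
WORKFLOWS = [
    {"blast", "clustalw", "render"},
    {"blast", "clustalw", "plot"},
    {"fetch_sequence", "blast"},
]

print(candidate_fragments(WORKFLOWS))  # [('blast', 'clustalw')]
```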
7. Ranking workflow relevance
• Repurposing: measuring integration effort
• Ranking data flow (in jargon)
  – Structural edit distance
    • E.g. the services to remove/add/replace to make 2 workflows equal
  – For an OWL workflow ontology, need abduction or off-line processing
• Ranking control flow
  – Relationship between control flow constructs
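Below is a minimal sketch of ranking by integration effort. It assumes each workflow is reduced to the multiset of services it calls, and that adding, removing or replacing one service each costs 1. It ignores the links between services and any OWL reasoning, so it captures only the service-level part of a structural edit distance. The repository contents and function names are hypothetical.

```python
from collections import Counter


def service_edit_distance(w1, w2):
    """Fewest service additions, removals and replacements turning w1 into w2.

    Workflows are treated as multisets of service names; links between
    services are ignored.
    """
    surplus_1 = sum((Counter(w1) - Counter(w2)).values())  # services only in w1
    surplus_2 = sum((Counter(w2) - Counter(w1)).values())  # services only in w2
    # One replacement removes a surplus on each side at once; what is left
    # on the larger side needs plain additions or removals.
    return max(surplus_1, surplus_2)


def rank_by_integration_effort(query, repo):
    """Order repository workflows by how few service edits they need."""
    return sorted(repo, key=lambda name: service_edit_distance(repo[name], query))


# Hypothetical repository: workflow name -> list of services it calls.
REPO = {
    "wf_a": ["fetch", "blast", "clustalw", "render"],
    "wf_b": ["fetch", "blast", "muscle"],
    "wf_c": ["fetch", "interproscan"],
}
query = ["fetch", "blast", "muscle", "render"]

print(rank_by_integration_effort(query, REPO))  # ['wf_a', 'wf_b', 'wf_c']
```

Ties (here `wf_a` and `wf_b`, one edit each) keep repository order because Python's sort is stable; a fuller measure could break them using data-flow or control-flow similarity.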
Take home message
• Problem: workflow reuse and repurposing is happening; how do we make it scale?
• Data: survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  – Creating a pool of process knowledge
    • Workflow interoperability
  – Accessing this pool of knowledge
    • Workflow discovery, KA and ranking
Acknowledgements
• This work is supported by the UK e-Science programme EPSRC GR/R67743.
• The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams' syndrome workflow and is supported by The Wellcome Foundation (G/R: 1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna).
• Sean Bechhofer provided useful comments on the draft.