
0bd8040da65456e81ded51c85993c0bd.ppt
- Number of slides: 76
Planning for the Web II
Execution & Service Integration
Dan Weld, University of Washington
June 2003
© Daniel S. Weld, PLANET 2003 Tutorial on Data Integration
Acknowledgements
• Oren Etzioni
• Yolanda Gil
• Keith Golden
• Alon Halevy
• Zack Ives
• Tal Shaked
Caveat
Outline
• Execution for Data Integration
  - Coping with incomplete statistics, latency
  - Interleaved planning & execution
  - Convergent query processing
• Service Integration
  - Web service composition
    - Background
    - Representational issues
    - Planning algorithms
  - Automated data analysis
Optimization and Execution
• Problem:
  - Few and unreliable statistics about the data.
  - Unexpected (possibly bursty) network transfer rates.
  - Generally, unpredictable environment.
• General solution (research area):
  - Adaptive query processing.
  - Interleave optimization and execution.
  - As you get to know more about your data, you can improve your plan.
Adaptivity & Incremental Processing Query Performance
Evaluated within the Tukwila system [Ives Ph.D.]
Query Optimization: Model Query Plans’ Execution & Choose the Best
[Figure: two candidate join plans over Restock (R, 100 tuples), Orders (O, 50 tuples), and Shipping (S, 90 tuples). One plan joins R and O first (RO ~30 tuples), the other joins O and S first (OS ~15 tuples). Both produce ROS ~270 tuples; the estimated times shown are 50 sec and 30 sec.]
From source sizes and statistics, estimate result sizes and costs.
Estimates and assumptions introduce error:
• Exponential increase in estimation error with each join [Ioannidis & Christodoulakis 91] [Antoshenkov 93, 96]
• Worse if no detailed statistics
Why Does Data Integration Make Optimization Harder?
Query optimization estimates costs using knowledge about environment and data:
• Data source sizes (“cardinalities”)
  - Often unavailable or not meaningful in data integration
• Histograms
  - Too expensive to maintain in data integration
• I/O costs
  - Network I/O costs fluctuate
Need a way to gain this sort of knowledge!
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling [Franklin et al.]
5. Eddies [Hellerstein et al.]
Tukwila Data Integration System
Novel components:
• Event handler
• Optimization-execution loop
• Adaptive operators
Double Pipelined Join
Hybrid Hash Join
✘ No output until build relation read
✘ Asymmetric (build vs. probe): optimization requires source behavior knowledge
Double Pipelined Hash Join
✔ Outputs data immediately
✔ Symmetric: requires less source knowledge to optimize
✔ Threads overlap I/O, computation
Performance on Networked Data
Join of 3 tables sent via JDBC over 10 Mb Ethernet: TPC-H Lineitem ⋈ Supplier ⋈ Order
[Plot axes: Time (sec); Tuples Output (1000s)]
Double Pipelined Join in Summary
Benefits:
✔ Easier to optimize (symmetric)
✔ Sub-operations scheduled flexibly
✔ Allows overlap of I/O and computation
Incurs some overhead: threading, queues
Required extensions to intelligently handle overflow:
• Same hash function, number of buckets for each side
• Approaches: flush buckets on left side or flush symmetrically
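
The symmetric behavior described for the double pipelined hash join can be sketched in a few lines. This is a minimal single-threaded illustration, not the Tukwila operator: the function name, the key-extractor parameters, and the round-robin interleaving (standing in for tuples arriving from two network sources) are all assumptions, and overflow handling is omitted.

```python
from collections import defaultdict
from itertools import zip_longest


def double_pipelined_join(left, right, left_key, right_key):
    """Symmetric hash join: emit each match as soon as both tuples have arrived."""
    left_table = defaultdict(list)   # hash table built over the left input
    right_table = defaultdict(list)  # hash table built over the right input
    missing = object()               # marks an exhausted input

    # Round-robin interleaving is an assumption that imitates tuples
    # trickling in from two independent sources.
    for l, r in zip_longest(left, right, fillvalue=missing):
        if l is not missing:
            key = left_key(l)
            left_table[key].append(l)        # insert into own table
            for match in right_table[key]:   # probe the other table
                yield (l, match)
        if r is not missing:
            key = right_key(r)
            right_table[key].append(r)
            for match in left_table[key]:
                yield (match, r)
```

Because each arriving tuple is inserted into its own table and then probes the other one, every matching pair is produced exactly once, when the later of its two tuples arrives, and neither input has to be read completely before output starts.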
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
   1. Interleaved planning and execution
3. Convergent query processing
4. Query scrambling
5. Eddies
Mid-query reoptimization
Materialization point: write AB to disk
[Figure: join plan over A, B, C, D with the intermediate result AB materialized]
If actual statistics differ from predicted statistics, replan [Kabra & DeWitt]
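
A hedged sketch of the check made at a materialization point follows. The function name, the ratio test, and the default tolerance of 2.0 are illustrative assumptions; the slide only says that a mismatch between actual and predicted statistics triggers replanning.

```python
def should_replan(predicted_rows, actual_rows, tolerance=2.0):
    """Return True when the observed cardinality is off by more than `tolerance`x.

    The ratio test and the default tolerance are assumptions made for
    illustration; a real optimizer would weigh the cost of replanning too.
    """
    larger = max(predicted_rows, actual_rows)
    smaller = max(1, min(predicted_rows, actual_rows))  # avoid division by zero
    return larger / smaller > tolerance
```

For example, `should_replan(100, 120)` is false under the default tolerance, while `should_replan(100, 1000)` is true, so the remainder of the plan would be re-optimized using the observed size of AB.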
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling
5. Eddies
Convergent Query Processing
• Instead of adapting remainder of plan after executing all data on plan prefix
• Adapt whole plan after executing whole plan on part of data
• Can better gather information this way…
Convergent Query Processing in Action: Changing Join Plans in Mid-Stream
Join Restock, Orders, Shipping (R ⋈ O ⋈ S)
[Figure: successive join plans over the per-phase subsets R0, O0, S0; R1, O1, S1; R2, O2, S2, followed by a “cleanup” query plan]
Breaking a Join into Phases: One Subset per Table, Each Phase
[Figure: Restock (R) is split into R0, R1 and Orders (O) into O0, O1. Phase 0 joins R0 with O0, Phase 1 joins R1 with O1, and Cleanup covers the remaining combinations.]
The Cleanup Plan Reuses Previous Work Where Possible
[Figure: cleanup plan over Restock, Orders, Shipping built from the subsets R0, O0, S0; R1, O1, S1; R2, O2, S2 and the earlier intermediate results R2O2, R0S0, O1S1]
• Exclude R0S0O0, R1S1O1, R2S2O2, O1S1
• Exclude R2O2
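
The phase-plus-cleanup idea can be illustrated for a two-table join. This is a sketch under stated assumptions, not the CQP implementation: the function names are hypothetical, tuples join on their first field, both tables are assumed to have the same number of phases, and the reuse of intermediate results shown on the slide is not modeled.

```python
from itertools import product


def join(r_part, o_part):
    """Nested-loop equi-join on the first field of each tuple."""
    return [(r, o) for r in r_part for o in o_part if r[0] == o[0]]


def phased_join(r_phases, o_phases):
    """r_phases[i] and o_phases[i] hold the tuples of R and O read in phase i."""
    results = []
    # Phase i joins only the subsets that arrived during that phase.
    for r_i, o_i in zip(r_phases, o_phases):
        results.extend(join(r_i, o_i))
    # Cleanup: every cross-phase combination, excluding the same-phase
    # pairs that were already computed above.
    for i, j in product(range(len(r_phases)), range(len(o_phases))):
        if i != j:
            results.extend(join(r_phases[i], o_phases[j]))
    return results
```

The union of the per-phase results and the cleanup results equals the full join R ⋈ O, which is why each phase is free to use a different plan.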
CQP on a 100 Mbps LAN: Nearly “Optimal” Performance
866 MHz P-III, 256 MB buffer pool, re-optimization every 10 sec
[Chart annotation: cost to parse XML]
Slow WAN, Faster CPU: CQP Reduces Work
1 GHz P-III, 256 MB, re-optimization every 10 sec.
1 Mbps network, RTT ~50 msec
Outline
• Execution for Data Integration
  - Coping with incomplete statistics, latency
  - Interleaved planning & execution
  - Convergent query processing
• Service Integration
  - Web service composition
    - Background
    - Representational issues
    - Planning algorithms
  - Automated data analysis
What is a Web Service?
• A web service is a network-accessible interface to application functionality, built using standard Internet protocols (TCP/IP, XML, SOAP, WSDL, …)
  - Clients of a web service do NOT need to know how it is implemented.
• Why interesting?
  - Increased automation
[Figure: an application client connects over the network to a web service that wraps the application code]
Case Study: Amazon
• Services Exported
  - Product details (short, long, images, samples)
  - Purchase functionality
  - Ratings, reviews, collaborative filtering data, lists, …
• Examples
  - Store builder tools
  - Amazon Browser: visualization tool
  - Windows desktop interfaces: drag-n-drop…
  - MP3 Piranha
  - Games
  - Automatic review writer??
Case Study: Google
• Services Exported
  - Search interface
  - Limits on items returned, queries / day
• Examples
  - Metacrawler functionality
  - Geosearch ‘nearby thai restaurants’
    - TIGER, FIPS -> lat, long of pages
  - Robust hyperlinks
    - Creates a signature for destination pages & tracks with query
Case Study: Fed Express
• Shipment tracking
• Proof of delivery
• Invoice reviewed, adjusted, settled
• Schedule pickup time, location
  - Outgoing or returns
• Order supplies (airbills, envelopes, boxes)
• Review shipping history
• Rate requests
  - Location, package size
• International trade
  - Required documents, duties, taxes
Case Study: Hailstorm / MyServices
• Web Services
  - MyDocuments
  - MyAddressbook
  - MyWallet
  - MyNotifications
  - …
• Scenario
  - Wallet keeps receipts, arranges product return
  - Expedia uses notifications to warn of canceled flight
• Reality
  - Ebay, AmEx, Groove, …
Case Study: OAA
• Common schema for travel industry
• Reservations
  - Flights, trains, rental cars, hotels
• Time & distances
• Payment, deposits, vouchers
• Vacation Packages
Web Service Technology Stack
[Figure: a web service client asks “shopping web service?”. Layers shown: Discovery (UDDI, returning WSDL URIs), Description (WSDL), Packaging (a proxy exchanging SOAP request and response packages), Transport, and Network.]
SOAP (Simple Object Access Protocol)
• SOAP Messages
  - XML Payload
• Using SOAP as RPC (Remote Procedure Call) Messages
[Figure: the SOAP client sends a request message to the SOAP server, which returns a response message]
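
To make the "XML payload" of an RPC-style request concrete, here is a small sketch that builds and then reads back a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the function names, and the example operation are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/simple"                # hypothetical service namespace


def build_request(operation, **params):
    """Wrap an RPC-style call in a SOAP envelope and return it as XML text."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)  # one child element per argument
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Server side: recover the operation name and its arguments."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    operation = call.tag.split("}")[1]  # strip the "{namespace}" prefix
    return operation, {child.tag: child.text for child in call}
```

Calling `parse_request(build_request("foo", arg=42))` recovers the operation name and its argument as text, which is the round trip a SOAP client and server perform around the transport layer.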
If a WS were a Phone Call…
• XML represents the conversation.
• SOAP describes the rules for how to call someone.
• UDDI is the phone book.
• WSDL describes what the phone call is about and how you can participate.
WSDL for int foo(int arg);
<types>
  <schema targetNamespace="http://tempuri.org/xsd"
          xmlns="http://www.w3.org/2001/XMLSchema"
          xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
          xmlns:wsdl="http://schemas...l/"
          elementFormDefault="qualified">
  </schema>
</types>
<message name="Simple.foo">
  <part name="arg" type="xsd:int"/>
</message>
<message name="Simple.fooResponse">
  <part name="result" type="xsd:int"/>
</message>
<portType name="SimplePortType">
  <operation name="foo" parameterOrder="arg">
    <input message="wsdlns:Simple.foo"/>
    <output message="wsdlns:Simple.fooResponse"/>
  </operation>
</portType>
DISCO
• If you know the URL for a service
• DISCO lets you query it
• And get back a WSDL description
• But what if you don’t know the right URL?
UDDI
• Hosted Registries
  - Microsoft, IBM, HP, SAP, NTT, BEA
• Entries defined with
  - Business information
    - Name, contacts, descriptions, identifier, yellow pages category
  - Service information
    - Entities, each of which describes a family of related services which together implement a business process
  - Binding information
    - How to invoke: URI, required parameters, options, & tModel
  - Service specifications (tModel)
    - As a symbol: fingerprint to recognize a known service
    - Decomposable to find WSDL description
Acronyms (W3C, MSFT, IBM)
• UDDI
  - Discover, describe, register services
  - SOAP-based service for locating WSDL-formatted service descriptions
• DISCO
  - Discover / retrieve SCL+SDL descrips
• SDL / NASSL
  - SOAP description lang: get params / types
• SCL
  - SOAP contract lang: extends SDL, orchestration of msgs
• WSDL
  - Describe abstract interface and protocol bindings of arbitrary network services (extends SCL)
• XLANG / WSFL / BPEL4WS
  - Lang for biz processes used in BizTalk
  - Biz process execution language for web services
  - MSFT, IBM, BEA proposal
[Figure: diagram relating SDL, SCL, NASSL, WSDL, WSFL, XLANG, and BPEL4WS]
The Layer Cake [TBL, XML 2000]
RDF (Resource Description Framework)
• Way to describe resources via metadata
  - Makes no assumptions about a particular application domain
• Based on XML
  - Another one?
• Standard for semantic web
• Restricts resource descriptions to triples (subject, predicate, object)
• Provides a lightweight ontology system
  - Subproperty, Subclass, Domain & Range
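
The triple model and the subclass part of the lightweight ontology system can be sketched with plain tuples. The `rdf:type` and `rdfs:subClassOf` names are RDF/RDFS vocabulary terms; the function name and the `ex:` resources are hypothetical, and subproperty, domain, and range inference are left out.

```python
def infer_types(triples):
    """Return the triples plus every rdf:type fact implied by rdfs:subClassOf."""
    facts = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for (s, p, o) in list(facts):
            if p != "rdf:type":
                continue
            for (c, q, d) in list(facts):
                implied = (s, "rdf:type", d)
                if q == "rdfs:subClassOf" and c == o and implied not in facts:
                    facts.add(implied)
                    changed = True
    return facts


# Hypothetical example data: (subject, predicate, object)
EXAMPLE = {
    ("ex:softbot", "rdf:type", "ex:Planner"),
    ("ex:Planner", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
}
```

Running `infer_types(EXAMPLE)` adds the facts that `ex:softbot` is also an `ex:Agent` and an `ex:Thing`, which is the kind of inference the Subclass primitive licenses.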
DAML+OIL (www.daml.org)
• DAML extends RDF and RDFS with richer modeling primitives.
  - disjointWith, intersectionOf, oneOf, cardinality
• Able to provide properties of properties
  - uniqueness, transitivity, etc.
DAML-S
• DAML+OIL ontology describing Web Services
• Complements low-level descriptions like WSDL
• Describes what a service does and why, not just how to communicate with it.
• Goals: Discovery, Invocation, Composition, Verification, Execution Monitoring (mapping to WSDL)
Outline
• Execution for Data Integration
  - Coping with incomplete statistics, latency
  - Interleaved planning & execution
  - Convergent query processing
• Service Integration
  - Web service composition
    - Background
    - Representational issues
    - Planning algorithms
  - Automated data analysis
Partial Survey of Planners
• UW Internet Softbot
  - Planners: SENSp / XII / PUCCINI
  - Repr. languages: UWL / SADL; LCW
• PKS
  - Planning at the knowledge level
• McDermott
  - Forward-chaining search w/ GRG guidance
• McIlraith et al.
  - ConGolog (procs, loops, conditionals, w/ nondet)
• Papazoglou, Traverso et al.
  - Stratified service arch; XSRL language; MBP
• Finin; Srivastava; Knoblock; Ambite; Nau…
Planning for image processing tasks
• Many fielded systems
  - Lansky’s COLLAGE, Chien et al. MVP/ASIP, Golden ADLIM, Blythe GRID…
• Spatial representations important
[Figure: data-flow pipeline with stages Inputs, Filters, Models, Statistics, and False Color Visualization. Labels include MODIS LAI and FPAR, GOES radiation, RUC2, GRIB/WGRIB, topography, land surface models, soil moisture, snow cover, streamflow, NPP, phenology, min/max temperature, mean precipitation, and mean wind.]
Motivating Scenarios
• Planning a trip
  - Yahoo maps -> driving time -> travel prefs
  - Automatic expense form filing
• Purchasing a group of items
  - Aggregation from multiple vendors
  - Select for: payment types, stock level, deliv
  - Local & 3rd party reputation services (BBB)
• Monitoring marketplace
  - Auction sites
  - Events (check calendar / notification service)
UW Internet Softbot
• Software robot
  - Effectors: mv, ftp, chmod, cd, lpr, rm, ...
  - Sensors: ls, finger, INSPEC, netfind, wc, ...
• Say what we want, not how to do it
  - Find phone numbers, fetch/print online papers, …
• Integrate multiple resources
Motivation/Contributions
• Represent actions like ls, finger
• Represent goals such as
  - “Rename paper.tex to kr.tex”
  - “Print all files in directory papers.” (even with incomplete information)
• No previous system could express
The Middle Ground
1. Action Representation
2. Knowledge Representation
Softbot Architecture
[Figure: the Task Manager drives the PUCCINI Planner, which draws on SADL Actions and LCW Knowledge; Sensors and Effectors connect it to the UNIX shell & WWW]
SADL Family Tree
STRIPS [Fikes & Nilsson, 71]
  -> UWL [Etzioni et al., 92]: incomplete info, noise-free sensors
  -> ADL [Pednault, 89]: ∀, conditional effects
UWL + ADL -> SADL [Golden & Weld, 96]: represents ls, “Rename”, finger, ...
SADL/UWL Annotations
Goal annotations:
• satisfy = achieve by any means
• hands-off = don’t change (maintenance)
Effect annotations:
• cause = change world
• observe = change agent’s knowledge
“Delete the file named junk”
  satisfy(name(f, junk)) ∧ satisfy(deleted(f))
Information Goals are Temporal
• Two time points
  - When proposition sampled
  - When reply given
• “Tell me now who was President in 1883”
• “Tell me tomorrow who is President now”
• “Identify (ASAP) the file now named ‘junk’”
Information Goals are Temporal
“Rename paper.tex to kr.tex”
• Designator (name) changes
• UWL can’t express
SADL solution
• initially = time goal was posed
  initially(name(f, paper.tex)) ∧ satisfy(name(f, kr.tex))
  initially(name(f, core)) ∧ satisfy(deleted(f))
Compare to more general temporal representation
Tidiness Goals
“Print paper, but don’t leave it uncompressed.”
  initially(compressed(paper), tv) ∧ satisfy(printed(paper)) ∧ satisfy(compressed(paper), tv)
State of paper.ps may change temporarily but must be restored
Compare to more general goal lang, e.g. LTL
Unbounded Information Gain
action ls(d)
  precondition: satisfy(current.shell(csh)) ∧ satisfy(readable(d))
  effect: ∀f when in.dir(f, d)
            ∃l, n, d
              observe(length(f, l)) ∧ observe(name(f, n)) ∧ observe(in.dir(f, d))
Compare PKS Representation
Initial State:
  Kf = {(= (pwd) root), (indir papers root), (indir planner root), (dir papers), (dir planner), (file paper_tex)}
  Kx = {((indir paper_tex planner) | (indir paper_tex papers))}
Goal: K(indir paper_tex (pwd))
The Internet Softbot
[Figure: the Task Manager drives the PUCCINI Planner, which draws on SADL Actions and LCW Knowledge; Sensors and Effectors connect it to the UNIX shell & WWW]
Knowledge Representation
• Closed World Assumption (CWA)
  - Made by classical planners
  - Anything not recorded as true is false
• Open World Assumption (OWA)
  - Anything not recorded true or false is unknown
  - Sensor abuse
  - Can’t handle ∀ goals
Sensor Abuse
• OWA: Don’t know when to stop sensing
  - Many ways to find same information
  - Many plans containing same action
• After executing find / -name foo, should know
  - ls bin won’t reveal more files named foo
  - ls tex won’t reveal more files named foo
  - Google may reveal more files named foo
How Classical Planners Handle ∀
• ∀ block(x) OnTable(x) is replaced with: OnTable(A) ∧ OnTable(B) ∧ OnTable(C)
• Relies on CWA
  - Must know all blocks
  - OWA → can never be sure
[Figure: blocks A, B, C]
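
The expansion of a universally quantified goal into a ground conjunction can be sketched as follows. The function name and the fact encoding as (predicate, object) pairs are assumptions made for illustration; the point is that the result is only correct if the fact set really lists every block, which is exactly the closed world assumption.

```python
def expand_universal(known_facts, type_pred, goal_pred):
    """Replace 'for all x of type_pred: goal_pred(x)' with one ground goal per known x.

    Sound only under the CWA: any object missing from known_facts is
    silently left out of the conjunction.
    """
    objects = sorted(arg for (pred, arg) in known_facts if pred == type_pred)
    return [(goal_pred, obj) for obj in objects]


# Hypothetical blocks-world facts
FACTS = {("block", "A"), ("block", "B"), ("block", "C"), ("clear", "A")}
```

Here `expand_universal(FACTS, "block", "OnTable")` yields the three ground goals for A, B, and C. Under the open world assumption there might be a fourth block the agent has not seen, so the expansion could never be trusted.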
Local Closed World Knowledge
• Complete info over restricted domain
  - All blocks on table, all products at Amazon
• Local Closed World Knowledge (LCW)
  - Restricted form of circumscription
  - Provides fast closed world inference
  - Allows fast updates
  - Suited to planner action representations
LCW Semantics
“I know all files in directory bin”
LCW(in.dir(f, bin)) ≡ ∀f [⊨ in.dir(f, bin) ∨ ⊨ ¬in.dir(f, bin)]
LCW Representation
• M: Ground literals in agent’s model
  - in.dir(icaps03, papers)
  - in.dir(junk, papers)
  - ¬executable(core)
• L: LCW formulas in agent’s model
  - LCW(in.dir(f, papers))
• If P ∉ M, and L ⊢ LCW(P), then ¬P
  - Conclude: ¬in.dir(foo, papers)
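
The inference rule above can be sketched as a three-valued lookup. This is a simplified illustration, not the softbot's implementation: the function names, the tuple encoding of literals, and the `"?"` wildcard are assumptions, and only single-literal LCW formulas are handled.

```python
VAR = "?"  # wildcard standing for a variable in an LCW pattern (an assumption)


def matches(pattern, literal):
    """True if the ground literal is an instance of the pattern."""
    return len(pattern) == len(literal) and all(
        p == VAR or p == a for p, a in zip(pattern, literal)
    )


def truth_value(literal, model, lcw):
    """Return True or False when known, or None when the literal is unknown."""
    if literal in model:                         # recorded explicitly in M
        return model[literal]
    if any(matches(p, literal) for p in lcw):    # L entails LCW(P), P not in M
        return False                             # so conclude not-P
    return None                                  # open world: unknown


# M and L from the slide, in the assumed tuple encoding
M = {
    ("in.dir", "icaps03", "papers"): True,
    ("in.dir", "junk", "papers"): True,
    ("executable", "core"): False,
}
L = [("in.dir", VAR, "papers")]
```

With this model, `truth_value(("in.dir", "foo", "papers"), M, L)` is `False`, matching the slide's conclusion, while the same query about directory `bin` returns `None` because no LCW formula covers it.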
LCW Reasoning
• Inference
  - If I know all files in tex, and I know the size of every file, then do I know the size of every file in tex?
• Updates
  - If I know the size of every file in tex, and I remove a file from tex, do I still know the size of every file in tex?
  - What if I add a file to tex?
LCW Reasoning is Hard
Theorem: If LCW formulas can contain ∨ and ¬ then answering an LCW query is NP-hard.
But we need fast inference!
• Solution: restrict representation
  - Positive first-order conjunctions
  - Fast polynomial time inference/updates
[Etzioni et al. AIJ] [Levy VLDB 96] [Friedman & Weld IJCAI 97]
LCW Updates
• L must be updated when M changes.
• All changes to M fall into one of four categories:
  - Information loss: Δ(φ, {T, F} → U)
  - Information gain: Δ(φ, U → {T, F})
  - Domain growth: Δ(φ, F → T)
  - Domain contraction: Δ(φ, T → F)
Domain Growth
Adding core to bin invalidates LCW(in.dir(f, bin) ∧ size(f, c)) unless the size of core is known!
Theorem: If Δ(φ, F → T) then L’ ← L − MREL(φ)
MREL(φ) ≡ {Φ ∈ REL(φ) | ⊬ LCW(Φ−X)θ}
REL(φ) ≡ {Φ ∈ L | ∃(X ∈ Φ, θ, α) Xθ = φα ∧ ⊬ ¬(Φ−X)θ}
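
A rough sketch of the domain-growth update follows. It is a simplification of the theorem rather than a faithful implementation: all names are hypothetical, variables are strings beginning with `?`, the model is a set of true ground literals, the test "LCW holds for the rest of the formula" is approximated by "each remaining conjunct matches some true fact", and the second condition in REL(φ) is ignored.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, literal):
    """Return a substitution mapping pattern onto the ground literal, or None."""
    if len(pattern) != len(literal) or pattern[0] != literal[0]:
        return None
    theta = {}
    for p, a in zip(pattern[1:], literal[1:]):
        if is_var(p):
            if theta.setdefault(p, a) != a:  # same variable bound twice
                return None
        elif p != a:
            return None
    return theta


def substitute(pattern, theta):
    return tuple(theta.get(t, t) for t in pattern)


def domain_growth(lcw, model, new_literal):
    """Drop LCW formulas invalidated when new_literal changes from F to T."""
    kept = []
    for formula in lcw:  # each formula is a tuple of conjunct patterns
        invalidated = False
        for i, conjunct in enumerate(formula):
            theta = unify(conjunct, new_literal)
            if theta is None:
                continue
            rest = [substitute(c, theta) for j, c in enumerate(formula) if j != i]
            # Approximation: the rest is "known" if every conjunct matches a fact.
            known = all(any(unify(c, f) is not None for f in model) for c in rest)
            if not known:
                invalidated = True
        if not invalidated:
            kept.append(formula)
    return kept
```

With `LCW(in.dir(f, bin) ∧ size(f, c))` in L, adding `in.dir(core, bin)` removes the formula when the model has no size for `core` and keeps it when a fact such as `size(core, 42)` is present. A single-conjunct formula like `LCW(in.dir(f, bin))` always survives, since the agent itself recorded the new file.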
LCW Updates
Pruning Redundant Sensing
[Plot axes: Time (CPU seconds); Experience (problems attempted)]
The Internet Softbot
[Figure: the Task Manager drives the PUCCINI Planner, which draws on SADL Actions and LCW Knowledge; Sensors and Effectors connect it to the UNIX shell & WWW]
XII / Puccini Planner
• Based on UCPOP
  - Generative, Partial-Order, Causal-Link
  - I.e. much like Gerevini’s LPG
• Efficient sensing (LCW control)
• Lifted support of ∀ goals
[Golden et al. 94, Golden PhD]
Satisfying ∀ Goals
• Link directly to ∀ effect
  - rm * → ∀f Satisfy(Deleted(f))
• Subgoal on LCW; then expand to ground form
  - ls → LCW
  - lpr foo, lpr bar → ∀f Satisfy(Printed(f))
• Partition
Threats to LCW
• ls -l /tex supports the goal with LCW(in.dir(f, /tex) & size(f, l))
• compress /tex/paper has effect cause(length(paper), U)
  - Threat: “Information Loss”
• mv junk /tex/ has effect cause(in.dir(junk, /tex), T)
  - Threat: “Domain Growth”
[Figure labels for threat resolution: Promote, Demote, Confront, Shrink, Enlarge]
Softbot Status
• Fully Implemented (1997)
• Hundreds of Unix, Internet Actions
• Daunting Combinatorics
  - Declarative Search Control
  - Laborious, Brittle
• Hence...
  ? Improved Declarative Control
  ? Reactive Control
  ? Less Expressive Language
[Figure labels: Rodney, SIMS, Simon, MetaCrawler, Info Manifold, BargainFinder, ILA, Ahoy, ShopBot, Occam]
PG-based Heuristics / Sensing [Shaked 03] ??
Using the Graph
• LPG-like search (local search on POP)
• Propagating sensing action links
• Executing to reach ‘better’ states
• Sophisticated heuristics!
Conclusion
• Planning for the web is ripe for progress
• Data integration
  - Modeling sources: GAV, LAV, …
  - Answering queries using views
  - Interleaved planning and execution, eddies, CQP
• Service integration
  - Web service composition
  - Representing unbounded information gain
  - Latest heuristic search techniques => fast!
PKS
• Contingent, forward-chaining planner
  - Constructs a complete, correct plan
  - Separates plan-time and execution-time effects
• Less Expressive
  - No universal quantification
• Still needs search control heuristics
[Petrick & Bacchus KR 00, AIPS 02]