- Number of slides: 32
Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab, Department of Computer Science, Northwestern University
http://plab.cs.northwestern.edu

Overview
• RGIS: a Grid Information Service (GIS) based on the relational data model and queried with SQL
• Complex compositional queries can be posed
  – “Find me 16 hosts on the same LAN that together have 32 GB of RAM”
• Such queries can be very expensive to answer
  – Joins: worst case O(n^m) for m tables of size n
• Introduce nondeterminism
  – User gets a random sample of the result set
  – Automated query transformation

Outline
• Overview
• Model
• Implementation
• Nondeterministic queries
• Performance evaluation
• Related work
• Conclusions

D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

RGIS Model of a Grid
[Figure: layered diagram; layers: Software, Network, Data link, Physical; objects: module, endpoint, iplink, router, host, maclink, macswitch, connectorlink]
• Annotated network topology graph
• Annotation examples
  – Hosts: memory, disk, OS, NICs, etc.
  – Router/Switch: backplane bandwidth, ports
  – Link: latency and bandwidth
• Highly dynamic data in streams, not DB
• Virtualization, Futures, Leases
  – Virtual machines

[Figure: diagram with groups labeled Software, Metadata, Network, Types, Data Link, Physical, Security]

RGIS Design (Per Site)

RGIS Design (Intersite)
[Figure: three sites A, B, and C, each with an RGIS server; arrows labeled "Update Push To Friend Site"]
• Site RGIS server pushes local updates to friend sites
• Site RGIS server consolidates updates from its own site and friend sites
• Site RGIS server answers all queries originating from its site

Insert/Update/Delete
Dual Xeon 1 GHz, 2 GB, 8 x 36 GB RAID 5, Oracle 9i

2,700 lines of authored SQL
4,000 lines of generated PL/SQL
22,000 lines of authored Perl
Main dependencies
• DBI to Oracle 9i
• SOAP::Lite
• CGI
Not finished yet!

RGIS Design (Per Site)
[Diagram callout: This talk]

Motivation
• Queries for compositions of resources are easily expressed in SQL: “Find 2 hosts with Linux that together have 3 GB of RAM”
    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
• But such queries can be very expensive to execute
• However, we typically don’t need the entire result set, just some rows, and not always the same ones
• And we need them in a bounded amount of time

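The two-host query above can be tried on a toy table. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the four sample rows are invented for illustration, and only the column names (insertid, os, mem_mb) come from the slide.

```python
import sqlite3

# Hypothetical miniature `hosts` table; column names follow the slide's query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "LINUX", 1024), (2, "LINUX", 2048), (3, "WINDOWS", 4096), (4, "LINUX", 512)],
)

# Self-join from the slide: pairs of Linux hosts with >= 3 GB combined.
PAIR_QUERY = """
    SELECT h1.insertid, h2.insertid
    FROM hosts h1, hosts h2
    WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
      AND h1.mem_mb + h2.mem_mb >= 3072
"""
pairs = sorted(conn.execute(PAIR_QUERY).fetchall())
print(pairs)
```

As written, the query returns ordered pairs and does not exclude pairing a host with itself, so the join considers every combination of rows. That is where the cost comes from as the table grows.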
Why Not Just Limit?
• Oracle rownum, MySQL limit clause
• “Return first k rows of result set”
• Problem: Always get the SAME answer
• Problem: May STILL take a long time
  – Results not discovered until near the end
• Problem: Query time depends on the DATA as well as on k

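The first problem can be seen in a small experiment. This sketch uses SQLite's LIMIT clause as a stand-in for Oracle rownum or MySQL limit; the table contents are invented, and the behavior shown is what SQLite does in practice on unchanged data, not something the SQL standard promises.

```python
import sqlite3

# Hypothetical table of 200 Linux hosts with varying memory sizes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, 'LINUX', ?)",
    [(i, 512 * (1 + i % 8)) for i in range(1, 201)],
)

# Same pair query as before, truncated to the first 5 rows found.
LIMITED = """
    SELECT h1.insertid, h2.insertid
    FROM hosts h1, hosts h2
    WHERE h1.mem_mb + h2.mem_mb >= 3072
    LIMIT 5
"""
first_run = conn.execute(LIMITED).fetchall()
second_run = conn.execute(LIMITED).fetchall()

# Both runs scan the data in the same order, so the "first k" rows repeat.
print(first_run == second_run)
```

Every client issuing this query would be steered to the same few hosts, which is the behavior a random sample of the result set is meant to avoid.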
Query Approaches
• All results
• Scoped results (available in Grid 2003 paper)
• Approximate results (available in Grid 2003 paper)
• Nondeterministic results (this paper)
  – Return random sample of result set

Nondeterministic Version of Query
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds

Implementing non-deterministic queries: Using Oracle-Specific Extensions
User query:
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds
Output of Query Manager and Rewriter:
    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
• Random sample of input tables, with selection probability P determined by the time constraint and server load

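A rough way to see why sampling the input tables bounds the work: if each of m tables of size n is sampled with probability P, the expected sampled size is P*n, so the worst-case number of join combinations drops from n^m to (P*n)^m. The sketch below only does this arithmetic; the function name and the example values of n, m, and P are illustrative, not taken from the talk.

```python
def worst_case_combinations(n, m, p=1.0):
    """Worst-case join combinations for m tables of size n,
    each sampled with probability p (using expected sample sizes)."""
    return (p * n) ** m

# Illustrative values: a two-way self-join over 50,000 hosts.
full = worst_case_combinations(50_000, 2)
sampled = worst_case_combinations(50_000, 2, p=0.01)

print(f"full join:    {full:,.0f} combinations")
print(f"sampled join: {sampled:,.0f} combinations")
print(f"reduction:    {full / sampled:,.0f}x")
```

Because the reduction factor is P^m, a small change in P has a large effect on query time, which is what lets the rewriter trade result-set size against the time constraint.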
Implementing non-deterministic queries: Using Our Schema (Not Oracle-Specific)
User query:
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds
Output of Query Manager and Rewriter:
    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
      AND (H1.INSERTID=TEMP_H1.INSERTID AND
           TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= 1025613125.93505)
      AND (H2.INSERTID=TEMP_H2.INSERTID AND
           TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
• Random sample of input tables, with selection probability P determined by the time constraint and server load
[Callout: Rest of Talk]

Implementing non-deterministic queries
[Figure: Host rows with insertid and random_number columns; random-number axis from 0 to N; a random starting point x and a window from x to x+y, where y = P*N]
• Reshuffling requirement

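The window scheme can be sketched with sqlite3. Each window in the rewritten query on the previous slide is 42,949,672.96 wide, which equals 0.01 * 2^32, so the example assumes N = 2^32 and treats P as a parameter. Both are inferences, not stated in the talk. The table contents, helper names, and the choice to keep the window inside [0, N] without wrap-around are assumptions for illustration, and reshuffling of the random numbers is not implemented.

```python
import random
import sqlite3

RAND_RANGE = 2 ** 32  # assumed range N of the stored random numbers

def build_db(num_hosts, seed=0):
    """Create hypothetical hosts and insertids tables filled with random data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
    )
    conn.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand REAL)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, 'LINUX', ?)",
        [(i, rng.choice([512, 1024, 2048, 4096])) for i in range(num_hosts)],
    )
    conn.executemany(
        "INSERT INTO insertids VALUES (?, ?)",
        [(i, rng.uniform(0, RAND_RANGE)) for i in range(num_hosts)],
    )
    conn.execute("CREATE INDEX insertids_rand ON insertids (rand)")
    return conn

def random_window(p, rng):
    """Pick a random start x and return the window (x, x + y], y = p * N."""
    width = p * RAND_RANGE
    start = rng.uniform(0, RAND_RANGE - width)
    return start, start + width

# Same shape as the rewritten query on the slide, with bound parameters.
SAMPLED_QUERY = """
    SELECT h1.insertid, h2.insertid
    FROM hosts h1, hosts h2, insertids temp_h1, insertids temp_h2
    WHERE (h1.os = 'LINUX' AND h2.os = 'LINUX'
           AND h1.mem_mb + h2.mem_mb >= 3072)
      AND (h1.insertid = temp_h1.insertid
           AND temp_h1.rand > ? AND temp_h1.rand <= ?)
      AND (h2.insertid = temp_h2.insertid
           AND temp_h2.rand > ? AND temp_h2.rand <= ?)
"""

def sample_pairs(conn, p, rng):
    """Run the pair query over an independent random window per table alias."""
    lo1, hi1 = random_window(p, rng)
    lo2, hi2 = random_window(p, rng)
    return conn.execute(SAMPLED_QUERY, (lo1, hi1, lo2, hi2)).fetchall()

conn = build_db(num_hosts=2000)
sampled = sample_pairs(conn, p=0.05, rng=random.Random(1))
print(len(sampled), "pairs from a 5% sample of each input")
```

Because the window start is redrawn on every call, repeated queries see different slices of the hosts table, while the index on rand lets the database skip the rows outside the window.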
Deadlines
• Hard-limiting
  – Fork a time-limited thread or process
• Climbing
  – Start with a low probability p and issue the query; if there are no results, double the probability and try again; continue until there are results or no time remains
• Estimation
  – Like climbing, but use polynomial estimation over previous runs to predict whether the next run would exceed the deadline

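The climbing strategy can be sketched as a short loop. This is an illustration under stated assumptions, not the RGIS implementation: run_query is a hypothetical callable that executes the sampled query at probability p and returns its rows, and the starting probability of 0.001 is arbitrary. The loop checks the deadline only between runs, so it does not interrupt a query already in progress (that is what hard-limiting is for).

```python
import time

def climb(run_query, deadline_s, p_start=0.001, p_max=1.0, clock=time.monotonic):
    """Re-issue run_query(p), doubling p, until rows return or time runs out.

    Returns (rows, p), where p is the last probability tried.
    """
    start = clock()
    p = p_start
    while clock() - start < deadline_s:
        rows = run_query(p)
        if rows:
            return rows, p
        if p >= p_max:
            break  # already sampling everything; there is nothing more to find
        p = min(2 * p, p_max)
    return [], p

def fake_query(p):
    # Hypothetical stand-in for a sampled query: matches appear once p >= 0.01.
    return [("host-a", "host-b")] if p >= 0.01 else []

rows, final_p = climb(fake_query, deadline_s=2.0)
print(rows, final_p)
```

The estimation variant would add one step before each retry: predict the next run's duration from the earlier ones and stop if the prediction overshoots the remaining time.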
GridG: Synthesizing Realistic Computational Grids
• Generates a Grid as an annotated layer 3 topology
  – Hosts, routers, links
• Graph conforms to power laws of Internet topology
• Annotations include:
  – memory, clock speed, CPU type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
  – Memory distribution according to the Smith study of MDS contents
http://www.cs.northwestern.edu/~urgis/GridG

Test Grids
Grid Size (Hosts)    Query
50,000               “Find n hosts with 3 GB of memory”
500,000              “Find n hosts with 3 GB of memory”
5,000                “Find n hosts with 3 GB of memory”
10,000               “Find 2 close hosts”
50,000               “Find 2 close hosts”
100,000              “Find 2 close hosts”

Nondeterministic query performance
Select two hosts that together have >3 GB of RAM
• A meaningful tradeoff between query processing time and result set size is possible

Nondeterministic query performance
Select n hosts that together have >3 GB of RAM, holding query time constant
• Can use the tradeoff to control query time independent of query complexity

Deadlines
Find 2 hosts with collective 600 GB RAM (VERY RARE) in 50K host grid
[Figure labels: Max, Min]

Extending RGIS to Support Grid Computing On Virtual Machines
• Virtuals
  – Each RGIS object has a unique id
  – Virtualization table associates the unique id of a virtual resource with the unique ids of its constituent physical resources
  – Virtual nature of a resource is hidden unless the query explicitly requests it
• Futures
  – An RGIS object that does not exist yet
  – Futures table of unique ids
  – Future nature of a resource is hidden unless the query explicitly requests it

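One possible reading of the virtualization table is sketched below in sqlite3. The table layout, column names, and sample rows are hypothetical; the talk states only that a table maps the unique id of a virtual resource to the ids of its physical constituents and that the virtual nature stays hidden unless a query asks for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, mem_mb INTEGER);
    -- Maps a virtual resource's id to the ids of its physical constituents.
    CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
    INSERT INTO hosts VALUES (1, 'phys-a', 4096), (2, 'phys-b', 4096),
                             (3, 'vm-1', 1024);
    INSERT INTO virtualization VALUES (3, 1);
""")

# Ordinary query: virtual and physical hosts look alike.
all_hosts = conn.execute("SELECT name FROM hosts ORDER BY id").fetchall()

# Explicit query: ask which physical hosts back each virtual host.
backing = conn.execute("""
    SELECT v.name, p.name
    FROM virtualization m
    JOIN hosts v ON v.id = m.virtual_id
    JOIN hosts p ON p.id = m.physical_id
""").fetchall()

print(all_hosts)
print(backing)
```

A futures table could follow the same pattern: a list of unique ids that ordinary queries never join against, consulted only when a query asks whether a resource exists yet.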
Related Work
• SLP, X.500, LDAP
• Condor ClassAds
• MDS
• R-GMA
• Redline
• Random sampling from databases
  – Olken, others

Conclusions
• GIS based on the relational data model
• Powerful queries, but expensive to execute
• Nondeterminism to control query time
  – Can be implemented without RDBMS support
  – Automated query translation in RGIS
• Several techniques to implement deadlines for queries

People and Acknowledgements
• Students
  – Jason Skicewicz, Andrew Weinrich (Web + SOAP), Jack Lange (CDN)
• Collaborator
  – Relational Grid Resources Project at Indiana
    • Beth Plale
    • http://www.cs.indiana.edu/~plale/projects/RGR
• Funder
  – NSF

For More Information
• URGIS Site
  – http://www.cs.northwestern.edu/~urgis
• Prescience Lab
  – http://plab.cs.northwestern.edu

Special Advertising Section
Join The User Comfort Study!
http://comfort.cs.northwestern.edu


