
  • Number of slides: 32

Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab, Department of Computer Science, Northwestern University
http://plab.cs.northwestern.edu

Overview
• RGIS: a Grid information service (GIS) based on the relational data model, queried with SQL
• Complex compositional queries can be posed
  – “Find me 16 hosts on the same LAN that together have 32 GB of RAM”
• Such queries can be very expensive to answer
  – Joins: worst case O(n^m) for m tables of size n
• We introduce nondeterminism
  – The user gets a random sample of the result set
  – Automated query transformation
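
To give the worst-case bound a sense of scale, take the 50,000-host test grid from the performance evaluation. The two-host query joins the hosts table with itself (m = 2). A 16-host composition written as a 16-way self-join would have m = 16. The worst-case number of candidate row combinations is then:

```latex
n^{m} = \left(5\times10^{4}\right)^{2} = 2.5\times10^{9}
\qquad
\left(5\times10^{4}\right)^{16} = 5^{16}\times10^{64} \approx 1.5\times10^{75}
```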

Outline
• Overview
• Model
• Implementation
• Nondeterministic queries
• Performance evaluation
• Related work
• Conclusions
D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

RGIS Model of a Grid
[Figure: layers Software, Network, Data link, Physical; entities module, endpoint, host, router, iplink, maclink, macswitch, connectorlink]
• Annotated network topology graph
• Annotation examples
  – Hosts: memory, disk, OS, NICs, etc.
  – Routers/switches: backplane bandwidth, ports
  – Links: latency and bandwidth
• Highly dynamic data is kept in streams, not in the DB
• Virtualization, futures, leases
  – Virtual machines
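
A minimal in-memory sketch of an "annotated network topology graph" may make the idea concrete. RGIS itself stores this graph in relational tables. The class names, annotation keys (`mem_mb`, `backplane_gbps`, ...) and units below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    node_id: int
    kind: str                                   # e.g. "host", "router", "macswitch"
    annotations: Dict[str, object] = field(default_factory=dict)

@dataclass
class Link:
    a: int
    b: int
    latency_ms: float                           # link annotations (hypothetical units)
    bandwidth_mbps: float

@dataclass
class Topology:
    nodes: Dict[int, Node] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, a: int, b: int, latency_ms: float, bandwidth_mbps: float) -> None:
        self.links.append(Link(a, b, latency_ms, bandwidth_mbps))

    def neighbors(self, node_id: int) -> List[int]:
        """Ids of nodes sharing a link with node_id (links treated as undirected)."""
        out = [link.b for link in self.links if link.a == node_id]
        out += [link.a for link in self.links if link.b == node_id]
        return sorted(out)

    def hosts_where(self, key: str, pred: Callable[[object], bool]) -> List[int]:
        """Ids of hosts that have annotation `key` and whose value satisfies pred."""
        return sorted(n.node_id for n in self.nodes.values()
                      if n.kind == "host" and key in n.annotations
                      and pred(n.annotations[key]))

# Usage: one router with two attached hosts.
topo = Topology()
topo.add(Node(1, "router", {"backplane_gbps": 32, "ports": 24}))
topo.add(Node(2, "host", {"mem_mb": 4096, "os": "LINUX"}))
topo.add(Node(3, "host", {"mem_mb": 1024, "os": "LINUX"}))
topo.connect(1, 2, latency_ms=0.2, bandwidth_mbps=1000)
topo.connect(1, 3, latency_ms=0.3, bandwidth_mbps=100)
```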


[Figure: RGIS model components labeled Software, Metadata, Network, Types, Data Link, Physical, Security]


[Figure: RGIS Design (Per Site)]

RGIS Design (Intersite)
[Figure: RGIS servers at sites A, B, and C, each pushing updates to its friend sites]
• Each site's RGIS server pushes local updates to its friend sites
• Each site's RGIS server consolidates updates from its own site and from its friend sites
• Each site's RGIS server answers all queries originating from its site
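
The intersite scheme can be sketched as follows. This is a minimal simulation, not the RGIS protocol. It assumes that a server pushes only to its direct friends, that received pushes are merged but not forwarded again, and that each update is keyed by (origin site, resource). `SiteServer` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

class SiteServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.friends: List["SiteServer"] = []
        # Consolidated view: (origin site, resource key) -> annotation value
        self.view: Dict[Tuple[str, str], object] = {}

    def local_update(self, key: str, value: object) -> None:
        self.view[(self.name, key)] = value
        for friend in self.friends:             # push to direct friends only
            friend.receive_push(self.name, key, value)

    def receive_push(self, origin: str, key: str, value: object) -> None:
        self.view[(origin, key)] = value        # consolidate; no re-forwarding

    def query(self, pred: Callable[[object], bool]) -> List[Tuple[str, str]]:
        """Answer a query from this site against the consolidated view."""
        return sorted(k for k, v in self.view.items() if pred(v))

# Usage: A and B are mutual friends; C is a friend of B only.
a, b, c = SiteServer("A"), SiteServer("B"), SiteServer("C")
a.friends = [b]
b.friends = [a, c]
a.local_update("host1", 4096)   # reaches B, but not C (B does not forward)
b.local_update("host2", 2048)   # reaches A and C
```

Under these assumptions, a site sees only its own data and data pushed by sites that list it as a friend. That is why C cannot see `host1`.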

Insert/Update/Delete
[Figure: insert/update/delete performance on a dual Xeon 1 GHz, 2 GB, 8 x 36 GB RAID 5, Oracle 9i]

• 2,700 lines of authored SQL
• 4,000 lines of generated PL/SQL
• 22,000 lines of authored Perl
Main dependencies
• DBI to Oracle 9i
• SOAP::Lite
• CGI
Not finished yet!

[Figure: RGIS Design (Per Site), with the part covered by this talk labeled “This talk”]


Motivation
• Queries for compositions of resources are easily expressed in SQL: “Find 2 hosts with Linux that together have 3 GB of RAM”
    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072
• But such queries can be very expensive to execute
• However, we typically don’t need the entire result set, just some rows, and not always the same ones
• And we need them within a bounded amount of time
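
The query can be run as written against a toy `hosts` table in SQLite. This is a small sketch whose host rows are made up. It also exposes a detail of the self-join: the query as written returns mirrored pairs (1, 2) and (2, 1), and it can pair a host with itself. The variant that adds `h1.insertid < h2.insertid` is our addition for illustration and does not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "LINUX", 1024), (2, "LINUX", 2048), (3, "LINUX", 1536), (4, "WINDOWS", 4096)],
)

QUERY_AS_WRITTEN = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
  AND h1.mem_mb + h2.mem_mb >= 3072
"""
# Hypothetical variant: count each unordered pair of distinct hosts once.
QUERY_DISTINCT_PAIRS = QUERY_AS_WRITTEN + " AND h1.insertid < h2.insertid"

as_written = sorted(conn.execute(QUERY_AS_WRITTEN).fetchall())
distinct_pairs = sorted(conn.execute(QUERY_DISTINCT_PAIRS).fetchall())
```

Here `as_written` holds six rows, including the self-pairs (2, 2) and (3, 3). `distinct_pairs` holds only (1, 2) and (2, 3).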

Why Not Just Limit?
• Oracle rownum, MySQL limit clause
• “Return the first k rows of the result set”
• Problem: always get the SAME answer
• Problem: may STILL take a long time
  – Results may not be discovered until near the end of query processing
• Problem: query time depends on the DATA as well as on k
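
The first problem is easy to reproduce in SQLite with made-up data. Without an `ORDER BY`, SQL does not guarantee which k rows `LIMIT` returns. In practice, however, a fixed table and query plan return the same rows every time. Adding `ORDER BY RANDOM()` varies the answer, but the engine must still evaluate every qualifying row before it can return any of them, so the second problem remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, 'LINUX', ?)",
    [(i, 1024 * (i % 4 + 1)) for i in range(1, 21)],   # 1-4 GB, made-up data
)

FIRST_K = "SELECT insertid FROM hosts WHERE mem_mb >= 3072 LIMIT 3"
# Varies between runs, but every qualifying row is evaluated before LIMIT applies.
RANDOM_K = "SELECT insertid FROM hosts WHERE mem_mb >= 3072 ORDER BY RANDOM() LIMIT 3"

first_runs = [conn.execute(FIRST_K).fetchall() for _ in range(5)]
random_run = conn.execute(RANDOM_K).fetchall()
eligible = {row[0] for row in conn.execute("SELECT insertid FROM hosts WHERE mem_mb >= 3072")}
```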

Query Approaches
• All results
• Scoped results and approximate results (available in the Grid 2003 paper)
• Nondeterministic results (this paper): return a random sample of the result set

Nondeterministic Version of Query
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds

Implementing non-deterministic queries Using Oracle-Specific Extensions
The Query Manager and Rewriter transforms
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds
into
    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS SAMPLE(P) H1, HOSTS SAMPLE(P) H2
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
• Random sample of the input tables, with selection probability P determined by the time constraint and server load
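
The rewrite step can be sketched as a toy string transformation. It handles only the flat `select ... from ... where ... within N seconds` shape shown on the slide, with no subqueries. `rewrite_for_oracle` is a hypothetical name, and it takes the sample size as an input rather than deriving it from the deadline and server load. Oracle's `SAMPLE` clause takes a percentage (at least 0.000001 and below 100), so a selection probability P corresponds to `SAMPLE(100*P)`. In Oracle's grammar, the sample clause comes before the table alias.

```python
import re
from typing import Tuple

ND_QUERY = re.compile(
    r"^\s*select\s+nondeterministically\s+(?P<cols>.+?)\s+from\s+(?P<tables>.+?)"
    r"\s+where\s+(?P<where>.+?)\s+within\s+(?P<secs>\d+(?:\.\d+)?)\s+seconds?\s*$",
    re.IGNORECASE | re.DOTALL,
)

def rewrite_for_oracle(query: str, sample_percent: float) -> Tuple[str, float]:
    """Return (Oracle SQL with a SAMPLE clause per table, deadline in seconds)."""
    if not 0.000001 <= sample_percent < 100:
        raise ValueError("Oracle SAMPLE percent must be in [0.000001, 100)")
    m = ND_QUERY.match(query)
    if m is None:
        raise ValueError("expected 'select nondeterministically ... within N seconds'")
    sampled = []
    for ref in m.group("tables").split(","):
        parts = ref.split()
        table, alias = parts[0], (parts[1] if len(parts) > 1 else "")
        # Oracle order: table SAMPLE(percent) [alias]
        sampled.append(f"{table} SAMPLE({sample_percent:g}) {alias}".rstrip())
    sql = f"SELECT {m.group('cols')} FROM {', '.join(sampled)} WHERE ({m.group('where')})"
    return sql, float(m.group("secs"))

q = ("select nondeterministically h1.insertid, h2.insertid "
     "from hosts h1, hosts h2 "
     "where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072 "
     "within 2 seconds")
sql, deadline = rewrite_for_oracle(q, 1)   # P = 0.01, i.e. 1 percent
```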

Implementing non-deterministic queries Using Our Schema (Not Oracle-Specific)
The Query Manager and Rewriter transforms
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds
into
    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
      AND (H1.INSERTID=TEMP_H1.INSERTID AND
           TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= 1025613125.93505)
      AND (H2.INSERTID=TEMP_H2.INSERTID AND
           TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
• Random sample of the input tables, with selection probability P determined by the time constraint and server load
• This implementation is used in the rest of the talk
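
The schema-based rewrite can be exercised end to end in SQLite. This is a sketch under stated assumptions: each insertid gets one random number drawn uniformly from [0, 2^32), each table alias gets its own independent window (lo, lo + P*N], and the window is kept inside [0, N] rather than wrapping around. Table and column names follow the rewritten query above. `build_db`, `window`, and `run_sampled` are hypothetical helpers.

```python
import random
import sqlite3

N = 2 ** 32  # assumed upper bound of the per-insertid random numbers

FULL_QUERY = """
SELECT h1.insertid, h2.insertid FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072
"""

SAMPLED_QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2, insertids temp_h1, insertids temp_h2
WHERE (h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072)
  AND (h1.insertid = temp_h1.insertid AND temp_h1.rand > ? AND temp_h1.rand <= ?)
  AND (h2.insertid = temp_h2.insertid AND temp_h2.rand > ? AND temp_h2.rand <= ?)
"""

def build_db(hosts, seed=0):
    """hosts: list of (insertid, os, mem_mb); adds one random number per insertid."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
    conn.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand REAL)")
    conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)", hosts)
    conn.executemany("INSERT INTO insertids VALUES (?, ?)",
                     [(h[0], rng.random() * N) for h in hosts])
    return conn

def window(p, rng):
    """Random window (lo, lo + p*N], kept inside [0, N] (no wraparound)."""
    width = p * N
    lo = rng.uniform(0, N - width)
    return lo, lo + width

def run_sampled(conn, p, rng):
    lo1, hi1 = window(p, rng)   # each table alias gets its own window
    lo2, hi2 = window(p, rng)
    return conn.execute(SAMPLED_QUERY, (lo1, hi1, lo2, hi2)).fetchall()

hosts = [(i, "LINUX", 512 * (i % 8 + 1)) for i in range(1, 201)]
conn = build_db(hosts)
full = set(conn.execute(FULL_QUERY).fetchall())
sample = set(run_sampled(conn, 0.1, random.Random(1)))
```

Because the filter only adds range conditions on an extra joined table, any sample is a subset of the full result, and P = 1 recovers it entirely. The schema needs nothing beyond standard SQL, which is what makes this approach portable.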

Implementing non-deterministic queries
[Figure: each host row (insertid) carries a random_number in the range 0 to N; the sample is the rows whose random_number falls between a random starting point x and x+y, where y=P*N]
• Reshuffling requirement
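
The window arithmetic can be checked against the constants in the schema-based rewritten query. Both TEMP_H1 and TEMP_H2 windows have width 42,949,672.96 = 0.01 × 2^32. This is consistent with P = 0.01 and random numbers in [0, 2^32). We treat N = 2^32 as an inference from those numbers, not as something the slides state. The second function sketches why the window yields a sample of about P: for uniform random numbers, the fraction that lands in (x, x + P*N] is about P.

```python
import math
import random

N = 2 ** 32  # inferred range of random_number values, [0, N)

def implied_probability(lo, hi, n=N):
    """Selection probability of the window (lo, hi]: y / N with y = hi - lo."""
    return (hi - lo) / n

# (lo, hi) bounds copied from the TEMP_H1 and TEMP_H2 conditions.
slide_windows = [
    (982663452.975047, 1025613125.93505),
    (1877769069.94039, 1920718742.90039),
]
implied = [implied_probability(lo, hi) for lo, hi in slide_windows]

def fraction_selected(p, rows=100_000, seed=0):
    """Fraction of uniform random_numbers that fall in (x, x + p*N]."""
    rng = random.Random(seed)
    x = rng.uniform(0, N - p * N)          # random starting point
    hits = sum(1 for _ in range(rows) if x < rng.random() * N <= x + p * N)
    return hits / rows
```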

Deadlines
• Hard-limiting
  – The query runs in a forked, time-limited thread or process
• Climbing
  – Start with a low probability p and issue the query; if there are no results, double p and try again; keep going until time runs out or there are results
• Estimation
  – Like climbing, but use polynomial estimation over previous runs to predict whether the next run will exceed the deadline
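
Climbing and estimation can be sketched as one loop. `run_query(p)` is a hypothetical callback that runs the rewritten query at selection probability p. For estimation, we substitute a power-law fit t = a·p^k through the last two runs for the slides' polynomial estimation, which is a simplification of our own. A power law is a natural choice because an m-table join sampled at p shrinks roughly like p^m. The clock is injectable so the usage below can simulate query times. Hard-limiting is not shown. Note that plain climbing can overrun the deadline on its last run.

```python
import math
import time
from typing import Callable, List, Tuple

Rows = List[tuple]

def predict_time(history: List[Tuple[float, float]], p: float) -> float:
    """Predict run time at p from (p, seconds) history via t = a * p**k."""
    if not history:
        return 0.0                       # nothing to extrapolate from yet
    p2, t2 = history[-1]
    if len(history) < 2:
        return t2 * (p / p2)             # one point: assume linear scaling
    p1, t1 = history[-2]
    if t1 <= 0 or t2 <= 0 or p1 == p2:
        return t2 * (p / p2)
    k = math.log(t2 / t1) / math.log(p2 / p1)
    return t2 * (p / p2) ** k

def climb(run_query: Callable[[float], Rows], deadline_s: float, p0: float = 0.01,
          estimate: bool = False,
          clock: Callable[[], float] = time.monotonic) -> Tuple[Rows, float]:
    start = clock()
    p = p0
    history: List[Tuple[float, float]] = []
    while True:
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return [], p                 # out of time
        if estimate and predict_time(history, p) > remaining:
            return [], p                 # next run predicted to miss the deadline
        t0 = clock()
        rows = run_query(p)
        history.append((p, clock() - t0))
        if rows:
            return rows, p
        if p >= 1.0:
            return [], p                 # even the unsampled query found nothing
        p = min(2 * p, 1.0)              # climb: double the selection probability

# Usage with a simulated clock: run time grows as 100 * p**2 seconds,
# and rows only appear once p > 0.1 (both made up for illustration).
class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0
    def __call__(self) -> float:
        return self.now

def make_fake(clock: FakeClock, threshold: float) -> Callable[[float], Rows]:
    def run(p: float) -> Rows:
        clock.now += 100 * p ** 2
        return [("row",)] if p > threshold else []
    return run

c1 = FakeClock()
rows1, p1 = climb(make_fake(c1, 0.1), 1.0, clock=c1)                 # overruns
c2 = FakeClock()
rows2, p2 = climb(make_fake(c2, 0.1), 1.0, estimate=True, clock=c2)  # stops early
```

In this simulation, climbing finds rows at p = 0.16 but finishes at 3.41 s against a 1 s deadline. Estimation predicts that the p = 0.16 run would take about 2.56 s, so it stops at 0.85 s with no rows.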


GridG: Synthesizing Realistic Computational Grids
• Generates a Grid as an annotated layer 3 topology
  – Hosts, routers, links
• The graph conforms to the power laws of Internet topology
• Annotations include:
  – memory, clock speed, CPU type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
  – Memory distribution follows the Smith study of MDS contents
http://www.cs.northwestern.edu/~urgis/GridG

Test Grids
Grid size (hosts)   Query
50,000              “Find n hosts with 3 GB of memory”
500,000             “Find n hosts with 3 GB of memory”
5,000               “Find n hosts with 3 GB of memory”
10,000              “Find 2 close hosts”
50,000              “Find 2 close hosts”
100,000             “Find 2 close hosts”

Nondeterministic query performance
[Figure: select two hosts that together have >3 GB of RAM]
• A meaningful tradeoff between query processing time and result set size is possible

Nondeterministic query performance
[Figure: select n hosts that together have >3 GB of RAM, holding query time constant]
• This tradeoff can be used to control query time independently of query complexity

Deadlines
[Figure: deadline techniques, with Max and Min curves, for “Find 2 hosts with a collective 600 GB of RAM” (VERY RARE) in a 50K-host grid]

Extending RGIS to Support Grid Computing on Virtual Machines
• Virtuals
  – Each RGIS object has a unique id
  – A virtualization table associates the unique id of each virtual resource with the unique ids of its constituent physical resources
  – The virtual nature of a resource is hidden unless a query explicitly requests it
• Futures
  – A future is an RGIS object that does not exist yet
  – A futures table holds the unique ids of futures
  – The future nature of a resource is hidden unless a query explicitly requests it
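
A small SQLite sketch shows what "hidden unless a query explicitly requests it" can look like. The `virtualization` and `futures` table layouts and the sample rows are assumptions for illustration, not the RGIS schema. An ordinary query sees virtual host 10 and future host 20 as regular hosts. A query that joins the extra tables can reveal their nature and the physical resources behind them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER);
-- one row per (virtual resource, constituent physical resource)
CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
-- unique ids of objects that do not exist yet
CREATE TABLE futures (insertid INTEGER PRIMARY KEY);
INSERT INTO hosts VALUES (1, 'LINUX', 4096), (2, 'LINUX', 2048),
                         (10, 'LINUX', 1024), (20, 'LINUX', 8192);
INSERT INTO virtualization VALUES (10, 1);   -- VM 10 runs on physical host 1
INSERT INTO futures VALUES (20);             -- host 20 is a future
""")

# Ordinary query: virtual and future hosts look like any other host.
ordinary = [r[0] for r in conn.execute(
    "SELECT insertid FROM hosts WHERE os = 'LINUX' ORDER BY insertid")]

# Explicit query: flag each host as virtual and/or future.
explicit = conn.execute("""
    SELECT h.insertid,
           EXISTS (SELECT 1 FROM virtualization v WHERE v.virtual_id = h.insertid),
           EXISTS (SELECT 1 FROM futures f WHERE f.insertid = h.insertid)
    FROM hosts h ORDER BY h.insertid""").fetchall()

# Physical resources behind virtual host 10.
backing = [r[0] for r in conn.execute(
    "SELECT physical_id FROM virtualization WHERE virtual_id = 10")]
```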

Related Work
• SLP, X.500, LDAP
• Condor ClassAds
• MDS
• R-GMA
• Redline
• Random sampling from databases
  – Olken, others

Conclusions
• A Grid information service based on the relational data model
• Powerful queries, but expensive to execute
• Nondeterminism to control query time
  – Can be implemented without RDBMS support
  – Automated query translation in RGIS
• Several techniques to implement deadlines for queries

People and Acknowledgements
• Students
  – Jason Skicewicz, Andrew Weinrich (Web + SOAP), Jack Lange (CDN)
• Collaborator
  – Relational Grid Resources Project at Indiana
  – Beth Plale
  – http://www.cs.indiana.edu/~plale/projects/RGR
• Funder
  – NSF

For More Information
• URGIS Site
  – http://www.cs.northwestern.edu/~urgis
• Prescience Lab
  – http://plab.cs.northwestern.edu
Special Advertising Section: Join The User Comfort Study!
http://comfort.cs.northwestern.edu