- Number of slides: 22
Transactions, Concluded, and the Future of Data Management Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 4, 2003 Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke
Final Administrivia § Project demos today and tomorrow § Final exam handed out at the end of today’s class § Finals plus project reports due by 1 PM, 12/18/2003 § Project reports should be roughly 10-15 pages § Remember, quality and clarity of presentation matter! § Also, email me a brief message detailing: your contributions to the project; your group members’ contributions and your assessment of “group dynamics” § Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall
Last Time… § We were discussing isolation levels § How to keep transactions from interfering with one another § Or at least, how to minimize this § Recall the strongest version of isolation was serializability
Theory of Serializability § A schedule of a set of transactions is a linear ordering of their actions § e.g., for the simultaneous deposits example: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal) § A serial schedule is one in which all the steps of each transaction occur consecutively § A serializable schedule is one which is equivalent to some serial schedule (i.e., given any initial state, the final state is the same as one produced by some serial schedule) § The example above is neither serial nor serializable
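The definition of a serial schedule can be checked mechanically. The following is a minimal sketch, assuming a schedule is encoded as a list of (transaction id, operation, data item) tuples; that encoding and the name `is_serial` are illustrative choices, not something from the slides.

```python
from typing import List, Tuple

# Assumed encoding: (transaction id, "R" or "W", data item)
Action = Tuple[int, str, str]

def is_serial(schedule: List[Action]) -> bool:
    """Return True if each transaction's actions occur consecutively."""
    finished = set()   # transactions that have already given way to another
    current = None     # transaction whose block of actions we are inside
    for txn, _op, _item in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn resumed after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True

# The simultaneous deposits example: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
deposits = [(1, "R", "X.bal"), (2, "R", "X.bal"),
            (1, "W", "X.bal"), (2, "W", "X.bal")]
# One serial ordering of the same two transactions
serial = [(1, "R", "X.bal"), (1, "W", "X.bal"),
          (2, "R", "X.bal"), (2, "W", "X.bal")]

print(is_serial(deposits), is_serial(serial))
```

This only tests whether a schedule is serial. Deciding whether it is serializable needs the conflict analysis on the following slides.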
Questions of Concern § Given a schedule S, is it serializable? § How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?
Conflicting Actions § Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively § If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter § If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write § Ri(Q) Wj(Q) can produce a different result than Wj(Q) Ri(Q), because Ti reads a different value of Q
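The conflict rule above fits in a single predicate. This is a sketch under the assumption that an action is a (transaction id, operation, data item) tuple; the function name `conflicts` is hypothetical.

```python
def conflicts(a, b):
    """Two actions conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    (ti, op_i, item_i), (tj, op_j, item_j) = a, b
    return ti != tj and item_i == item_j and "W" in (op_i, op_j)

read_then_write = conflicts((1, "R", "Q"), (2, "W", "Q"))   # same item, one write
two_reads = conflicts((1, "R", "Q"), (2, "R", "Q"))         # reads never conflict
different_items = conflicts((1, "W", "Q"), (2, "W", "P"))   # different data items
print(read_then_write, two_reads, different_items)
```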
Testing for Serializability § Given a schedule S, we can construct a directed graph G = (V, E) called a precedence graph § V: all transactions in S § E: Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S § Theorem: A schedule S is conflict serializable if and only if its precedence graph contains no cycles § Note that testing for a cycle in a digraph can be done in time O(|V|^2)
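The test on this slide can be sketched in a few lines: build the precedence graph, then look for a cycle with a depth-first search. The tuple encoding of actions and the names `precedence_graph`, `has_cycle`, and `is_conflict_serializable` are assumptions made for illustration.

```python
def precedence_graph(schedule):
    """Map each transaction to the set of transactions it must precede."""
    edges = {txn: set() for txn, _op, _item in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Edge Ti -> Tj: an action of Ti precedes and conflicts with one of Tj
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search; a back edge to a node on the current path is a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in edges)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# Simultaneous deposits: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
deposits = [(1, "R", "X.bal"), (2, "R", "X.bal"),
            (1, "W", "X.bal"), (2, "W", "X.bal")]
print(precedence_graph(deposits))          # edges in both directions between 1 and 2
print(is_conflict_serializable(deposits))
```

For the deposits schedule the graph has both T1 → T2 and T2 → T1, so the schedule is rejected, which matches the earlier claim that it is not serializable.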
An Example § [Figure: a schedule over transactions T1, T2, T3 with actions R(X, Y, Z), R(X), W(X), R(Y), W(Y), R(Y), R(X), W(Z), and its precedence graph over T1, T2, T3] § Cyclic: not serializable
Another Example § [Figure: a schedule over transactions T1, T2, T3 with actions R(X) W(X), R(X) W(X), R(Y) W(Y), and its precedence graph over T1, T2, T3] § Acyclic: serializable
Producing the Equivalent Serial Schedule § If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph § For the second example, the equivalent serial schedule is: § R1(Y) W1(Y) R2(X) W2(X) R2(Y) W2(Y) R3(X) W3(X)
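As a rough sketch of the topological-sort step, the code below uses Python's standard `graphlib` module (Python 3.9 or later). The interleaved input schedule is a hypothetical one, chosen so that its result matches the serial schedule listed above; it is not the schedule from the slide's figure. The function name `equivalent_serial_schedule` is also an assumption.

```python
from graphlib import TopologicalSorter

def equivalent_serial_schedule(schedule):
    """Reorder an interleaved schedule into a conflict-equivalent serial one.

    Raises graphlib.CycleError if the precedence graph is cyclic.
    """
    # TopologicalSorter expects node -> set of predecessors
    preds = {txn: set() for txn, _op, _item in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                preds[tj].add(ti)  # Ti must come before Tj
    order = list(TopologicalSorter(preds).static_order())
    # Emit each transaction's actions as one block, keeping their internal order
    return [action for txn in order for action in schedule if action[0] == txn]

# Hypothetical interleaving (an assumption, not taken from the slides)
interleaved = [(1, "R", "Y"), (2, "R", "X"), (1, "W", "Y"), (2, "W", "X"),
               (2, "R", "Y"), (3, "R", "X"), (2, "W", "Y"), (3, "W", "X")]
serial_result = equivalent_serial_schedule(interleaved)
print(serial_result)
```

When several topological orders exist, any of them gives a valid serial schedule. Here the graph is the chain T1 → T2 → T3, so the order is unique.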
Locking and Serializability § We said that for a serializable schedule, a transaction must hold all locks until it terminates (a condition called strict locking) § It turns out that this is crucial to guarantee serializability § Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.
Well-Formed, Two-Phased Transactions § A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesn’t release the lock until the action is performed § Locks are also released by the end of the transaction § A transaction is two-phased if it never acquires a lock after unlocking one § i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
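Both properties can be checked by scanning a single transaction's steps. The sketch below assumes a transaction is a list of (kind, item) pairs, where kind is one of "lock_s", "lock_x", "unlock", "R", or "W". That encoding and the function names are illustrative assumptions.

```python
def is_two_phased(steps):
    """True if no lock is acquired after the first unlock."""
    unlocked = False
    for kind, _item in steps:
        if kind == "unlock":
            unlocked = True
        elif kind in ("lock_s", "lock_x") and unlocked:
            return False  # growing again after the shrinking phase began
    return True

def is_well_formed(steps):
    """True if every read/write is covered by a suitable lock
    and all locks are released by the end of the transaction."""
    held = {}  # item -> "S" (shared) or "X" (exclusive)
    for kind, item in steps:
        if kind == "lock_s":
            held.setdefault(item, "S")  # keep an existing exclusive lock
        elif kind == "lock_x":
            held[item] = "X"
        elif kind == "unlock":
            held.pop(item, None)
        elif kind == "R" and item not in held:
            return False  # read without at least a shared lock
        elif kind == "W" and held.get(item) != "X":
            return False  # write without an exclusive lock
    return not held

two_phase_txn = [("lock_s", "A"), ("R", "A"), ("lock_x", "B"), ("W", "B"),
                 ("unlock", "A"), ("unlock", "B")]
early_release_txn = [("lock_x", "A"), ("W", "A"), ("unlock", "A"),
                     ("lock_x", "B"), ("W", "B"), ("unlock", "B")]
print(is_well_formed(two_phase_txn), is_two_phased(two_phase_txn))
print(is_well_formed(early_release_txn), is_two_phased(early_release_txn))
```

The second transaction is well-formed but not two-phased: it releases its lock on A before acquiring the one on B.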
Two-Phased Locking Theorem § If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability § i.e., there is a very simple scheduler! § However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable § i.e., one bad apple spoils the bunch.
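The "very simple scheduler" only has to refuse conflicting lock requests. The following is a minimal sketch of such a lock table, assuming shared ("S") and exclusive ("X") modes. The class `LockTable` and its methods are hypothetical, and a real lock manager would also queue waiters and deal with deadlock.

```python
class LockTable:
    """Grants a lock only if it does not conflict with locks held by others."""

    def __init__(self):
        self.holders = {}  # item -> {transaction id: "S" or "X"}

    def try_lock(self, txn, item, mode):
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        if mode == "X" and others:
            return False  # exclusive conflicts with any other holder
        if mode == "S" and "X" in others.values():
            return False  # shared conflicts with another holder's exclusive lock
        current = self.holders.setdefault(item, {})
        if current.get(txn) != "X":
            current[txn] = mode  # acquire, or upgrade from S to X
        return True

    def release_all(self, txn):
        """Release every lock held by txn, e.g. at commit or abort."""
        for owners in self.holders.values():
            owners.pop(txn, None)

table = LockTable()
shared_1 = table.try_lock(1, "X.bal", "S")    # granted
shared_2 = table.try_lock(2, "X.bal", "S")    # granted: shared locks are compatible
upgrade_1 = table.try_lock(1, "X.bal", "X")   # refused while T2 holds a shared lock
table.release_all(2)
retry_1 = table.try_lock(1, "X.bal", "X")     # granted after T2 releases
print(shared_1, shared_2, upgrade_1, retry_1)
```

In the deposits example, both transactions would hold shared locks and each would block the other's upgrade. That deadlock is a separate problem the sketch does not address.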
Summary of Transactions § Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system § Theoretically, the “correct” execution of transactions is serializable (i.e., equivalent to some serial execution) § Practically, this may adversely affect throughput, which motivates isolation levels § With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate
What to Look for Down the Road § … well, no one really knows the answer to this… § … But here are some hints, ideas, and hot directions § Sensors and streaming data § Peer-to-peer meets databases § “The Semantic Web” § Collaborative data sharing
Sensors and Streaming Data § No databases at all… § … Instead we have networks of simple sensors § Madden, starting at MIT § Gehrke, Cornell § Widom, Stanford § Queries are in SQL § Data is live and “streaming” § We compute aggregates over “windows”
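To make "aggregates over windows" concrete, here is a small sketch of a sliding-window average over a stream of readings. The count-based window and the name `windowed_average` are assumptions for illustration; the systems named above use their own query languages and window semantics.

```python
from collections import deque

def windowed_average(readings, window_size):
    """Yield the average of the most recent `window_size` readings
    after each new reading arrives."""
    window = deque(maxlen=window_size)
    total = 0.0
    for value in readings:
        if len(window) == window_size:
            total -= window[0]  # oldest reading is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)

# Hypothetical temperature readings arriving one at a time
averages = list(windowed_average([10, 20, 30, 40], window_size=2))
print(averages)  # [10.0, 15.0, 25.0, 35.0]
```

Because the function is a generator, it never needs the whole stream in memory, which suits readings that arrive continuously.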
What’s Interesting Here § We’re not talking about data on disk – we’re talking about queries over “current readings” § Sensors are generally “stupid” and may be battery-operated § A lot of challenges are networking-related: how to aggregate data before it gets sent, etc. § The next step (e.g., work initiated here @ Penn): including sensors that capture images – a very different problem! § This has many more compelling applications – security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.
Peer-to-Peer Computing § Fundamentally, our model of DBMSs tends to be centralized § Even for data integration: there’s a single mediator § This has many implications: central administration, central coordination, etc. § What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.? § A better architecture? § Solutions to many problems unsolved by distributed DBMSs? Replication, object location, distributed optimization, resiliency to failure, … § New types of applications, e.g., in integration?
P2P Work § As a new architecture for storage and querying § PIER (Berkeley), P-Grid (EPFL), Medusa (MIT) § A better way of thinking about translating and exchanging data § Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento
The Semantic Web § In some ways, a very “pie-in-the-sky” vision § But some real and concrete problems might be partly solvable § Goal is really very similar to data integration, where somehow we have mappings between the schemas § Currently, most people in the SW community are from the knowledge representation community and use RDF § Focus: very rich ways of describing schemas – “ontologies” – that blend querying with class definitions: “Teachers are people who teach students”; “Tenure-track professors are teachers at universities who can get tenure”; etc. § Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas
Holes in the Semantic Web § What issues and concerns came up in the data integration assignment you had? § Do you think a richer schema language would help for these? § Do you think “better normalization” would help? § Fundamentally, we need: § Languages for describing not only relationships, but also transformations between formats (e.g., XML schemas) § Automatic or partly automated ways of discovering mappings and correspondences § These are all database problems, and the solution likely must come from the DB community § This is part of what P2P systems like Piazza and Hyperion try to address
My Take on the Future § We’ve evolved from a world where data management is about controlling the data § Instead, data management is about translating and transforming data using declarative languages § It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another § It’s the plumbing that connects different applications using different formats § Orchestra project at Penn: focuses on how to build a system for supporting collaborative science § People publish and map data in different schemas § What happens if people start updating it? § How do you propagate, manage, trace, reconcile changes?


