Скачать презентацию Transactions and Wrap-Up Zachary G Ives University of Скачать презентацию Transactions and Wrap-Up Zachary G Ives University of

c24e242a10795b6fb89e71ae9c992444.ppt

  • Количество слайдов: 37

Transactions and Wrap-Up Zachary G. Ives University of Pennsylvania CIS 550 – Database & Transactions and Wrap-Up Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 9, 2004 Some slide content derived from Ramakrishnan & Gehrke

Reminders Please be sure you’re signed up for a project demo § Due at Reminders Please be sure you’re signed up for a project demo § Due at that time: 8 -15 page report describing: What your project goals were What you implemented Basic architecture and design Division of labor § And the code! § Also: please email me an assessment of how well your group worked; group members’ contributions; your contributions § Final examination Dec. 17 th, Meyerson Hall B 3, 8: 30 AM 2

Recall: Good Executions An execution is “good” if it is serial (transactions are executed Recall: Good Executions An execution is “good” if it is serial (transactions are executed atomically and consecutively) or serializable (i. e. equivalent to some serial execution ) Deposit 1 Deposit 3 read(X. bal) read(Y. bal) X. bal : = X. bal + $50 Y. bal: = Y. bal + $10 write(X. bal) write(Y. bal) Equivalent to executing Deposit 1 then 3, or vice versa 3

Atomicity Problems can also occur if a crash occurs in the middle of executing Atomicity Problems can also occur if a crash occurs in the middle of executing a transaction: CRASH Transfer read(X. bal) read(Y. bal) X. bal= X. bal-$100 Y. bal= Y. bal+$100 Need to guarantee that the write to X does not persist (ABORT) § Default assumption if a transaction doesn’t commit 4

Transactions in SQL § A transaction begins when any SQL statement that queries the Transactions in SQL § A transaction begins when any SQL statement that queries the db begins. § To end a transaction, the user issues a COMMIT or ROLLBACK statement. Transfer UPDATE Accounts SET balance = balance - $100 WHERE account#= ‘ 1234’; UPDATE Accounts SET balance = balance + $100 WHERE account#= ‘ 5678’; COMMIT; 5

Read-Only vs. Read-Write Transactions § We can tell the DBMS that we won’t be Read-Only vs. Read-Write Transactions § We can tell the DBMS that we won’t be performing any updates (What does this allow the DBMS to do? SET TRANSACTION READ ONLY; ): SELECT * FROM Accounts WHERE account#=‘ 1234’; § If we are going to modify the WRITE; we need: SET TRANSACTION READ DBMS, UPDATE Accounts SET balance = balance - $100 WHERE account#= ‘ 1234’; . . . 6

Dirty Reads § Dirty data is data written by an uncommitted transaction; a dirty Dirty Reads § Dirty data is data written by an uncommitted transaction; a dirty read is a read of dirty data (WR conflict ) § Sometimes we can tolerate dirty reads; other times we cannot: e. g. , if we wished to ensure balances never went negative in the transfer example, we should test that there is enough money first! 7

“Bad” Dirty Read EXEC SQL select balance into : bal from Accounts where account#=‘ “Bad” Dirty Read EXEC SQL select balance into : bal from Accounts where account#=‘ 1234’; if (bal > 100) { EXEC SQL update Accounts set balance = balance - $100 where account#= ‘ 1234’; EXEC SQL update Accounts set balance = balance + $100 where account#= ‘ 5678’; } EXEC SQL COMMIT; If the initial read (italics) were dirty, the balance could become negative! 8

Acceptable Dirty Read If we are just checking availability of an airline seat, a Acceptable Dirty Read If we are just checking availability of an airline seat, a dirty read might be fine! (Why is that ? ) Reservation transaction: EXEC SQL select occupied into : occ from Flights where Num= ‘ 123’ and date=11 -03 -99 and seat=‘ 23 f’; if (!occ) {EXEC SQL update Flights set occupied=true where Num= ‘ 123’ and date=11 -03 -99 and seat=‘ 23 f’; } else {notify user that seat is unavailable} 9

Other Undesirable Phenomena § Unrepeatable read: a transaction reads the same data item twice Other Undesirable Phenomena § Unrepeatable read: a transaction reads the same data item twice and gets different values (RW conflict) § Phantom problem: a transaction retrieves a collection of tuples twice and sees different results 10

Phantom Problem Example § T 1: “find the students with best grades who Take Phantom Problem Example § T 1: “find the students with best grades who Take either cis 550 -f 03 or cis 570 -f 04” § T 2: “insert new entries for student #1234 in the Takes relation, with grade A for cis 570 -f 04 and cis 550 -f 03” § Suppose that T 1 consults all students in the Takes relation and finds the best grades for cis 550 -f 03 § Then T 2 executes, inserting the new student at the end of the relation, perhaps on a page not seen by T 1 § T 1 then completes, finding the students with best grades for cis 570 -f 04 and now seeing student #1234 11

Isolation § The problems we’ve seen are all related to isolation § General rules Isolation § The problems we’ve seen are all related to isolation § General rules of thumb w. r. t. isolation: § Fully serializable isolation is more expensive than “no isolation” We can’t do as many things concurrently (or we have to undo them frequently) § For performance, we generally want to specify the most relaxed isolation level that’s acceptable Note that we’re “slightly” violating a correctness constraint to get performance! 12

Specifying Acceptable Isolation Levels § The default isolation level is SERIALIZABLE (as for the Specifying Acceptable Isolation Levels § The default isolation level is SERIALIZABLE (as for the transfer example) § To signal to the system that a dirty read is acceptable, SET TRANSACTION READ WRITE ISOLATION LEVEL READ UNCOMMITTED; 13

READ COMMITTED § Forbids the reading of dirty (uncommitted) data, but allows a transaction READ COMMITTED § Forbids the reading of dirty (uncommitted) data, but allows a transaction T to issue the same query several times and get different answers § No value written by T can be modified until T completes § For example, the Reservation example could also be READ COMMITTED; the transaction could repeatably poll to see if the seat was available, hoping for a cancellation 14

REPEATABLE READ § What it is NOT: a guarantee that the same query will REPEATABLE READ § What it is NOT: a guarantee that the same query will get the same answer! § However, if a tuple is retrieved once it will be retrieved again if the query is repeated § For example, suppose Reservation were modified to retrieve all available seats § If a tuple were retrieved once, it would be retrieved again (but additional seats may also become available) 15

Summary of Isolation Levels Level Dirty Read Unrepeatable Read Phantoms READ UNCOMMITTED Maybe READ Summary of Isolation Levels Level Dirty Read Unrepeatable Read Phantoms READ UNCOMMITTED Maybe READ COMMITTED No Maybe REPEATABLE READ No No Maybe SERIALIZABLE No No No 16

Implementing Isolation Levels One approach – use locking at some level: § each data Implementing Isolation Levels One approach – use locking at some level: § each data item is either locked (in some mode, e. g. shared or exclusive) or is available (no lock) § an action on a data item can be executed if the transaction holds an appropriate lock § consider granularity of locks – how big of an item to lock Larger granularity = fewer locking operations but more contention! tuple, page, table, etc. Appropriate locks: § Before a read, a shared lock must be acquired § Before a write, an exclusive lock must be acquired 17

Lock Compatibility Matrix Locks on a data item are granted based on a lock Lock Compatibility Matrix Locks on a data item are granted based on a lock compatibility matrix: Request mode { Shared Exclusive Mode of Data Item None Shared Exclusive Y Y N N When a transaction requests a lock, it must wait (block) until the lock is granted 18

Locks Prevent “Bad” Execution If the system used locking, the first “bad” execution could Locks Prevent “Bad” Execution If the system used locking, the first “bad” execution could have been avoided: Deposit 1 Deposit 2 xlock(X) read(X. bal) {xlock(X) is not granted} X. bal : = X. bal + $50 write(X. bal) release(X) xlock(X) read(X. bal) X. bal: = X. bal + $10 write(X. bal) release(X) 19

Lock Types and Read/Write Modes When we specify “read-only”, the system only uses shared-mode Lock Types and Read/Write Modes When we specify “read-only”, the system only uses shared-mode locks Any transaction that attempts to update will be illegal When we specify “read-write”, the system may also acquire locks in exclusive mode Obviously, we can still query in this mode 20

Isolation Levels and Locking Always update with exclusive lock held to end of transaction Isolation Levels and Locking Always update with exclusive lock held to end of transaction READ UNCOMMITTED: read data without acquiring any lock READ COMMITTED: read data: lock all tuples read, immediately release locks REPEATABLE READ: read data: grab shared lock on all tuples read, hold to end of transaction SERIALIZABLE: read data: grab shared lock on all tuples read and the index , hold to end of transaction Holding locks to the end of a transaction is called “strict” locking 21

Theory of. Serializability § A schedule of a set of transactions is a linear Theory of. Serializability § A schedule of a set of transactions is a linear ordering of their actions § e. g. for the simultaneous deposits example: R 1(X. bal) R 2(X. bal) W 1(X. bal) W 2(X. bal) § A serial schedule is one in which all the steps of each transaction occur consecutively § A serializable schedule is one which is equivalent to some serial schedule (i. e. given any initial state, the final state is the same as one produced by some serial schedule) § The example above is neither serial nor serializable 22

Questions to Address § Given a schedule S, is it serializable? § How can Questions to Address § Given a schedule S, is it serializable? § How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced? 23

When Actions Conflict § Consider a schedule S in which there are two consecutive When Actions Conflict § Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively § If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter § If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if one of the actions is a write § Ri(Q) Wj(Q) produces a different final value for Q than Wj(Q) Ri(Q) 24

Testing for Serializability § Given a schedule S, we can construct a di-graph G=(V, Testing for Serializability § Given a schedule S, we can construct a di-graph G=(V, E) called a precedence graph § V : all transactions in S § E : Ti Tj whenever an action of Ti precedes and conflicts with an action of Tj in S § Theorem: A schedule S is conflict serializable if and only if its precedence graph contains no cycles § Note that testing for a cycle in a digraph can be done in time O(|V|2) 25

An Example T 1 T 2 T 3 R(X, Y, Z) R(X) W(X) T An Example T 1 T 2 T 3 R(X, Y, Z) R(X) W(X) T 1 R(Y) W(Y) T 2 T 3 Cyclic: Not serializable. R(Y) R(X) W(Z) 26

Locking and Serializability § We said that a transaction must hold all locks until Locking and Serializability § We said that a transaction must hold all locks until it terminates (a condition called strict locking) § It turns out that this is crucial to guarantee serializability § Note that the first (bad) example could have been produced if transactions acquired and immediately released locks. 27

Well-Formed, Two-Phased Transactions § A transaction is well-formed if it acquires at least a Well-Formed, Two-Phased Transactions § A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesn’t release the lock until the action is performed § Locks are also released by the end of the transaction § A transaction is two-phased if it never acquires a lock after unlocking one § i. e. , there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released 28

Two-Phased Locking Theorem § If all transactions are well-formed and two-phase, then any schedule Two-Phased Locking Theorem § If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability § i. e. , there is a very simple scheduler! § However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable § i. e. , one bad apple spoils the bunch 29

Summary § Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in Summary § Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system § Theoretically, the “correct” execution of transactions is serializable (i. e. equivalent to some serial execution) § Practically, this may adversely affect throughput isolation levels § With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate 30

What to Look for Down the Road § … well, no one reallyknows the What to Look for Down the Road § … well, no one reallyknows the answer to this… § … But here are some hints, ideas, and hot directions § Sensors and streaming data * § Peer-to-peer meets databases and data integration * § “The Semantic Web” 31

Sensors and Streaming Data § No databases at all… § … Instead we have Sensors and Streaming Data § No databases at all… § … Instead we have networks of simple sensors § queries are in SQL § data is live and “streaming” § we compute aggregates over “windows” 32

What’s Interesting Here § We’re not talking about data on disk – we’re talking What’s Interesting Here § We’re not talking about data on disk – we’re talking about queries over “current readings” § Sensors are generally “stupid” and may be batteryoperated § A lot of challenges are networking-related: how to aggregate data before it gets sent, etc. § Future challenges: what happens when we have lots of different kinds of sensors 33

Peer-to-Peer Computing § Fundamentally, our model of DBMSs tends to be centralized Even for Peer-to-Peer Computing § Fundamentally, our model of DBMSs tends to be centralized Even for data integration: there’s a single mediator § What can be gained from borrowing a page from peer-topeer systems like Napster, Kazaa, etc. ? § A better architecture? § Solutions to many problems unsolved by distributed DBMSs? Replication, object location, distributed optimization, resiliency to failure, … § New types of applications, e. g. , in integration? § Can we exchangedata between databases in some controlled way? e. g. , share parts of your Palm contact list with someone else’s cell phone contact list 34

The Semantic Web § In some ways, a very “pie-in-the-sky” vision § A “World The Semantic Web § In some ways, a very “pie-in-the-sky” vision § A “World Wide Web” that’s machine-understandable § “The ultimate automated librarian” § But some real and concrete problems might be partly solvable § Goal is really very similar to data integration, where somehow we have mappings between the schemas § Need: § Languages for not only describing relationships, but transformations between formats (e. g. , XML schemas) § Automatic or partly automated ways of discovering mappings and correspondences § These are all database problems, and the solution likely must come from the DB community 35

My Take on the Future of Data Management § We’ve evolved from a world My Take on the Future of Data Management § We’ve evolved from a world where data management is about controlling the data § Instead, data management is about translating and transforming data using declarative languages § It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another § It’s the plumbing that connects different applications using different formats 36

A Plug for Next Semester § CIS 650: focus is on techniques for constructing A Plug for Next Semester § CIS 650: focus is on techniques for constructing data management systems § Databases; distributed databases; P 2 P databases; data integration; middleware; etc. § We’ll read many of the definitive papers in the DB field § Meanwhile… Best of luck on your projects and exams – and have a wonderful break § I hope you learned a lot in this course and that it – at least for stretches – was enjoyable! 37