Kεφάλαιο 7 Φυσικός Σχεδιασμός

Kεφάλαιο 7 Φυσικός Σχεδιασμός -- Αρχεία YV - Access Methods and Indexes 274

Physical n n n Storage The DATA MANAGER is the component of the DBMS responsible to interact with the physical database The related concepts are: file system, buffer management, access methods Each DBMS has its own data manager, which may employ a computer-system standard file system enhanced with additional mechanisms/facilities A DBMS involves the following Memory Hierarchy: Tape ® Disk ® Memory ® Cache (sequential) (direct) – Tapes are used for mass storage, disks for the persistent database storage, while the main memory and cache for processing the transactions and DBMS functions YV - Access Methods and Indexes 275

Memory Hierarchy. DATA CURRENCY Registers Cache Main Memory E X P E N S E Electronic Storage Online External Storage Near-line (Archive) Storage Nonvolatile Electronic OR Magnetic/Optical (block-addressed) Disk Jukeboxes OR Tape Robots Off-line MEMORY CAPACITY YV - Access Methods and Indexes 276

Disks and Files n n DBMS stores information on (“hard”) disks. This has major implications for DBMS design! – READ: transfer data from disk to main memory (RAM). – WRITE: transfer data from RAM to disk. – Both are high-cost operations, relative to in-memory operations, so must be planned carefully! YV - Access Methods and Indexes 277

Why Not Store Everything in Main Memory? n n n Costs too much. $1000 will buy you either 128 MB of RAM or 7. 5 GB of disk today. Main memory is volatile. We want data to be saved between runs. (Obviously!) Typical storage hierarchy: – Main memory (RAM) for currently used data. – Disk for the main database (secondary storage). – Tapes for archiving older versions of the data (tertiary storage). YV - Access Methods and Indexes 278

Physical Storage Media n Disk Storage Devices – Data is stored as magnetized areas on magnetic disks – Disk Packs have many disks connected to a rotating spindle – Disks are divided into concentric circular tracks on each surface - track capabilities range from 4 to 50 KBytes – Tracks are divided into blocks (pages), with fixed size for a specific system. The sizes range from 512 to 4096 bytes – Whole blocks are transferred between disks and memory – A physical disk block address consists of: a surface number, track number (within surface) & block number (within track) – Reading or writing a disk block is time consuming (because of the seek time and the rotational delay) YV - Access Methods and Indexes 279

Accessing a Disk Page n Time to access (read/write) a disk block: – seek time (moving arms to position disk head on track) – rotational delay (waiting for block to rotate under head) – transfer time (actually moving data to/from disk surface) n Seek time and rotational delay dominate. – Seek time varies from about 1 to 20 msec – Rotational delay varies from 0 to 10 msec – Transfer rate is about 1 msec per 4 KB page n Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions? YV - Access Methods and Indexes 280

Components of a Disk The platters spin (say, 90 rps). The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!). Only one head reads/writes at any one time. Spindle Disk head Tracks Sector Arm movement Platters Arm assembly Block size is a multiple of sector size (which is fixed). v YV - Access Methods and Indexes 281

Disk Space Management n n Lowest layer of DBMS software manages space on disk. Higher levels call upon this layer to: – allocate/de-allocate a page – read/write a page n Request for a sequence of pages must be satisfied by allocating the pages sequentially on disk! Higher levels don’t need to know how this is done, or how free space is managed. YV - Access Methods and Indexes 282

Basic Definitions n n n A record is a named collection of data values (items) A file is a named sequence of fixed sized records stored in a sequence of fixed sized blocks (pages) on the disk Each file has a file descriptor (file header) with information about the file (item names, data types, etc. ) The blocking factor for a file is the average number of file records stored in a disk block Each block has a name called its address File records can be unspanned (no record can span two blocks) or spanned (the record is stored in more than one block) YV - Access Methods and Indexes 283

File System n FILE SYSTEM: Its primary role is to manage files stored on pages: – – – – n Create a file Insert a page Modify a page Delete a page Retrieve a page Reorganize a file Terminate access to a file, etc. Seen from the outside a file system is responsible for the: – translation from file name to the absolute address of the file – translation from a record key to the page address YV - Access Methods and Indexes 284

Memory Management. DBMS Application . SET-ORIENTED Database Access Methods TUPLEORIENTED Main Online External Nearline External Buffer Manages BLOCK-ORIENTED Manages YV - Access Methods and Indexes Logging and Recovery File Manager Transaction Programs Tuple Management Associative Access Record Management Buffer Management File Management Archive Manager 285

Buffer Management n n n A Buffer is a part of the main memory available for the storage of blocks (pages) transferred to/from disks The BUFFER MANAGER is the subsystem responsible for the allocation of buffer space (transparently to the user) Typical Operation of the Buffer Manager Given a user request for a page: – checks if the page is in the buffer already, – if it is, then it passes its address to the user – if it is not, then it brings it from the disk into the buffer, possibly replacing another page (if no space is available), then passes its address to the user YV - Access Methods and Indexes 286

Buffer Management in a DBMS Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK n n DB choice of frame dictated by replacement policy Data must be in RAM for DBMS to operate on it! Table of <frame#, pageid> pairs is maintained. YV - Access Methods and Indexes 287

When a Page is Requested. . . n If requested page is not in pool: – Choose a frame for replacement – If frame is dirty, write it to disk – Read requested page into chosen frame n Pin the page and return its address. If requests can be predicted (e. g. , sequential scans) pages can be pre-fetched several pages at a time! * YV - Access Methods and Indexes 288

More on Buffer Management n Requestor of page must unpin it, and indicate whether page has been modified: – dirty bit is used for this. n Page in pool may be requested many times, – a pin count is used. A page is a candidate for replacement iff pin count = 0. n CC & recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later. ) YV - Access Methods and Indexes 289

Buffer Replacement Policy n Frame is chosen for replacement by a replacement policy: – Least-recently-used (LRU), Clock, MRU etc. n Policy can have big impact on # of I/O’s; depends on the n access pattern. Sequential flooding: Nasty situation caused by LRU + repeated sequential scans. – # buffer frames < # pages in file means each page request causes an I/O. MRU much better in this situation (but not in all situations, of course). YV - Access Methods and Indexes 290

Buffer Management (2) n n A buffer manager is very similar to virtual memory managers (as found in Operating Systems). But, it is usually much more sophisticated, since it is specially designed for database systems and can thus predict much better the needs and idiosyncrasies of the database system NEW ISSUES: – replacement strategy. The typical strategies in Operating Systems (e. g. , LRU) do not perform well in databases (MRU) – pinned records. It is often the case that the DBMS needs to specify that some blocks remain continuously (are pinned) in the buffer. – forced output of blocks. It is usual for the DBMS (e. g. , for recovery reasons) to force some blocks to disk prematurely YV - Access Methods and Indexes 291

Buffer Management (3) n A Buffer Manager keeps for each page in the buffer: – In which disk page it is stored – Whether it has been modified or not (dirty page) – Information for the replacement strategy that is used n There are several alternative buffer structure designs: – The same buffer pool is used for all relations – Separate buffer pool is used for each relation – As above, but with relations borrowing buffers from other relations YV - Access Methods and Indexes 292

DBMS vs. OS File System OS does disk space & buffer mgmt: why not let OS manage these tasks? n n n Differences in OS support: portability issues Some limitations, e. g. , files can’t span disks. Buffer management in DBMS requires ability to: – pin a page in buffer pool, force a page to disk (important for implementing CC & recovery), – adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations. YV - Access Methods and Indexes 293

Access Methods n The access methods are responsible for the following: – Allocation of file records (tuples) within blocks – Support of record addressing by address and by value. In essence, converting between references to records and physical blocks on storage devices – Support of secondary (auxiliary) file structures in order to make record addressing more efficient. n In the sequel, we examine the physical organization of records and blocks and also the basic file organizations YV - Access Methods and Indexes 294

File Management n PHYSICAL ORGANIZATION OF RECORDS / BLOCKS n Key Issues: – Formatting fields within a record – Formatting records within a block – Assigning records to blocks n Formatting fields within a record – Fixed length stored in specific order F 1 F 2 F 3 F 4 F 5 The address of Fi is: i-1 L 2 L 3 L 4 L 5 B + kÓ 1 Lk = B : Base address YV - Access Methods and Indexes 295

Formatting Fields – Fixed length fields stored as an indexed heap F 5 F 3 F 1 F 4 --- Fields need not be stored in order --- There is exactly one pointer in the header for each field (whether it is present or not) YV - Access Methods and Indexes 296

Formatting Fields (2) – Variable length fields delimited by special symbols F 1 F 2 $ F 3 $ F 4 $ F 5 $ – Variable length fields delimited by length F 1 L 1 YV - Access Methods and Indexes F 2 L 2 F 3 L 3 F 4 L 4 F 5 L 5 297

Formatting Records n Formatting records within a block – Records stored contiguously within the block (fixed packed) B 1 2. . . N L A record is located by a simple address calculation Ri = B + (i-1)*L YV - Access Methods and Indexes 298

Formatting Records (2) – The above structure is highly inflexible, introducing several inefficiencies * records may span blocks (happens very often and is costly) 2 1 3 4 Block Boundary 5 * insertion and deletion become complicated 1 2. . Delete this record. . N YV - Access Methods and Indexes 299

Formatting Records (3) – A block header contains an array of pointers pointing to the records within the block (indexed heap) DESCRIPTOR grows YV - Access Methods and Indexes Next Primary Next Overflow . . . 300

Formatting Records (4) n A record is located by providing its block number and its index in the pointer array in the block header. This combination (block number, index) is called TID n Insertion and deletion are easy: they are accomplished by manipulating the pointer array n The block may be reorganized without affecting external pointers (pointing to records). That is, records retain their TID even if they are moved around within the block. YV - Access Methods and Indexes 301

Assigning Records to Blocks - FILE ORGANIZATION n Assigning records to blocks -- Arbitrary placement -- Keyed placement by sorting n Arbitrary placement: Records are assigned to blocks arbitrarily, usually according to the order of insertion (this is called a HEAP or PILE) – simplest file organization strategy – uses as many blocks as necessary - links blocks together – provides no help for retrieval whatsoever (linear search) YV - Access Methods and Indexes 302

Unordered (Heap) Files n n n Simplest file structure contains records in no particular order. As file grows and shrinks, disk pages are allocated and deallocated. To support record level operations, we must: – keep track of the pages in a file – keep track of free space on pages – keep track of the records on a page n There are many alternatives for keeping track of this. YV - Access Methods and Indexes 303

Heap File Implemented as a List Data Page Full Pages Header Page Data Page n n Data Pages with Free Space The header page id and Heap file name must be stored someplace. Each page contains 2 `pointers’ plus data. YV - Access Methods and Indexes 304

Heap File Using a Page Directory Data Page 1 Header Page Data Page 2 DIRECTORY n n Data Page N The entry for a page can include the number of free bytes on the page. The directory is a collection of pages; linked list implementation is just one alternative. – Much smaller than linked list of all HF pages! YV - Access Methods and Indexes 305

Indexes n A Heap file allows us to retrieve records: – by specifying the rid, or – by scanning all records sequentially n Sometimes, we want to retrieve records by specifying the values in one or more fields, e. g. , – Find all students in the “CS” department – Find all students with a gpa > 3 n Indexes are file structures that enable value-based queries efficiently. YV - Access Methods and Indexes us to answer such 306

System Catalogs n For each index: – structure (e. g. , B+ tree) and search key fields n For each relation: – – n name, file structure (e. g. , Heap file) attribute name and type, for each attribute index name, for each index integrity constraints For each view: – view name and definition n Plus statistics, authorization, buffer pool size, etc. * Catalogs are themselves stored as relations! YV - Access Methods and Indexes 307

File Organization - Basics n Keyed placement: Records are assigned to blocks according to the values of some key fields. They can then be retrieved with associative access – The supporting structure implementing the mapping of records with specific values in the key fields to blocks is called an INDEX – It facilitates the execution of retrievals since, to a large extent, only relevant records are retrieved. – Updates (insertions and deletions) become more expensive, because of the requirement to maintain the index – Three major index structures: (a) ISAM (b) HASHING (c) B-Trees YV - Access Methods and Indexes 308

File Organization - Basics n Keyed placement by sorting: Sort the file on the key field(s) and store it in that order (SEQUENTIAL FILE) – It is a special case of the general keyed placement, with the distinguishing characteristic that there is no index to be supported – Retrievals are performed employing binary search – Advantages: » faster selection than non-keyed » good for range queries (e. g. , salary between 25 and 35 K) » efficient joins (applying merge-scan) – Disadvantages: » Slower equality selection than other keyed index structures » Updates are extremely expensive (and complex) YV - Access Methods and Indexes 309

Hashing n The magic of folding and hashing FOLDING Range of potential key values (shaded areas denote used key values) HASHING . . . Range of Positive Integers YV - Access Methods and Indexes Record Address Space 310

Hashing Essentials n n n Key values usually come from very large domains (e. g. , character strings of a certain length). First, they have to be converted into a numerical representation: FOLDING Then, the numerical value is transformed into a valid address from the address space: HASHING Factors that are important: domain values must be evenly distributed, utilization in the address space must be high, records must be evenly spread across the available space, . . . Hashing is generally good for exact queries, but very inadequate for range queries. YV - Access Methods and Indexes 311

Hashing Mechanics n n n The file blocks are divided into an equal number of buckets Typically, a bucket corresponds to one disk block (or a fixed number of blocks) One (or more) of the file fields is (are) selected to be the hash key(s) A hashing function h is constructed as follows: h : V ® {0, 1, 2, . . . B-1} where: V is the domain of field values B is the number of buckets in the address space (Note: Folding is required as an intermediate operation) n YV - Access Methods and Indexes 312

Hashing Examples n Example – Assume that, V is the domain for Employee. Number (a 9 digit number standing for the SSN), and B = 1000. – We create a hash function h : V ® {0, 1, 2, . . . B-1}, as: if v belongs to V, h(v) = last 3 digits of v = v MOD 1000 n n Hashing Functions can be constructed easily, with the main criterion being: uniform distribution of records in buckets (otherwise, search gets very expensive) Typical hash functions: Congruential (division remainder), Nth power, base transformation, polynomial division, encryption, etc. YV - Access Methods and Indexes 313

Hashing Functions - Overflow n n n Perhaps, the most popular and most heavily used hash function is congruence (MOD). Basically, we divide the field value (after folding) by B and we interpret the remainder as the bucket value. Example Use the function h(v) = v MOD 3 to index Salary 0 Salaries h (Salary) Tom 30 Jill 36 1 Shirley 22 Maria 25 2 YV - Access Methods and Indexes Dan 29 Ron 43 Bart 61 OVERFLOW 314

Hashing --- Overflow n The example above demonstrates the phenomenon of collisions, which occur when a new record hashes to a bucket which is already FULL n An overflow area is kept for storing such records n Overflows can occur because of: – Heavy loading of the file – Poor hashing function (does not distribute uniformly the field values) – Statistical peculiarities (too many values hash to the same bucket) YV - Access Methods and Indexes 315

Hashing --- Overflow (2) n Overflows are usually handled by one the following ways: – Chaining: if a bucket h(v) is full, chain an empty block to the bucket to expand it – Open Addressing: if h(v) is full, store the record in h(v)+1 If it is also full, store it in h(v)+2, etc. – Double-hashing: Employ 2 hashing functions (h and h’ ) If h(v) is full, try h’(v). If h’(v) is also full, try any of the above mentioned schemes (including a third hash function) YV - Access Methods and Indexes 316

Performance of Hashing n n n The performance of a hashing scheme depends on the value of the loading factor L, defined as: L is the number of records in a file divided by (B´S) where: B is the number of buckets S is the number of records per bucket Practical Hint: For loading factors of about 0. 9 and with a well-chosen hashing function, expect about 1. 2 probes on the average to retrieve a record with a given key value. Rule of thumb: When the loading factor becomes too high, a typical tactic is to double B and rehash YV - Access Methods and Indexes 317

Static Hashing Limitations n n The main disadvantage of static hashing is the fixed number of buckets (while the number of records dynamically changes) -- this brings in OVERFLOW Various dynamic extensions of hashing have been devised: – Extendible Hashing – Linear Hashing n n Dynamic hashing techniques avoid having long overflow chains in each bucket They achieve this by dynamically changing the number of buckets and / or the hashing function YV - Access Methods and Indexes 318

Extendible Hashing n n n In extendible hashing, the number of bucket increases and decreases as the file (relation) expands or shrinks The hashing function h is chosen such that its range is a very large set of integers (typically, B is 2 b , where b=32) Not all bits of the hash value are used, in particular: – At any point, the d most significant bits are used, 0 £ d £ b – The d-bit number is used as an index to a directory (array) that contains a pointer to the appropriate bucket. The directory is said to have (global) length d – The directory is stored on disk and expands or shrinks dynamically YV - Access Methods and Indexes 319

Extendible Hashing (2) – Any number (the ones in the power of 2) of neighboring entries in the directory may be pointing to the same bucket. Consecutive entries, taken in pairs (1 -2, 3 -4, 5 -6, etc. ) are called buddies – If 2 k entries point to a bucket, the bucket’s local depth d’ which is stored in the bucket’s header, is equal to d’ = d - k – The hashing values of keys hashed in the same bucket have the same d’ bits. – Extendible hashing does not require an overflow area YV - Access Methods and Indexes 320

Extendible Hashing (3) n Example: Index on Salary (extendible hashing) d’= d= 1 Hart 32 Jill 40 Taft 33 2 00 01 Salaries h (Salary) 2 Rudy 38 10 11 2 Bill 31 Shirl 35 Larry 31 YV - Access Methods and Indexes 321

Extendible Hashing (4) n Overflows are handled with the following algorithm: Assume that a bucket is about to overflow (due to the insertion of a new tuple): – If the overflown bucket has d’ < d, ~ Split the bucket into 2 ~ Make the pointer of the buddy entry in the directory point to the new entry ~ Rehash all the keys of the bucket ~ Increase the local depth (d’) of the bucket by 1. This will also be the value of the local depth for the new bucket YV - Access Methods and Indexes 322

Extendible Hashing (5) – If the overflown bucket has d’ = d, ~ DOUBLE the size of the directory ~ Every entry indexed by a (d+1) - bit number, points to the bucket where the entry indexed by the first d bits pointed before ~ Increase the depth d by 1 ~ Proceed as before by splitting the overflown bucket n When a lot of keys are deleted, buddies may be merged. This may result in cutting the size of the directory in half. YV - Access Methods and Indexes 323

Extendible Hashing: Overflow. Example: Insert tuples <Peter, 37> and <Nat, 43> 3 000 001 010 011 Hart 32 Jill 40 Taft 33 Peter 37 Rudy 38 2 2 2 h (Salary) 100 101 110 111 YV - Access Methods and Indexes Bill 31 Larry 31 Shirl 35 Nat 43 3 3 324

Extendible Hashing - Summary n ADVANTAGES – Performance of retrievals is constant as the relation grows n DISADVANTAGES – Updates are fairly expensive, especially when the directory doubles – There is much space overhead for the directory – If the directory grows very big and does not fit in main memory, retrievals need two I/O operations – If a bucket is overflown with tuples having the same key values, extendible hashing will be splitting this bucket forever!!! YV - Access Methods and Indexes 325

Kεφάλαιο 7 Φυσικός Σχεδιασμός — Αρχεία YV —