- Number of slides: 58
Database Management System Introduction
Warning • This class is a lot of work. • But it is worth it. • Of all courses you take at CS, this may be the one that gets you a job.
Syllabus • The background and history of database management systems. • The fundamentals of using a database management system. • Relational model. • Queries and Updates. • Relational Algebra. • Normalization. • Transactions and Security. • Object-oriented, object-relational, semi-structured and XML database systems.
What Is a Database System? • Database: a very large, integrated collection of data. • Models a real-world enterprise • Entities (e.g., teams, games) • Relationships (e.g., Abo Teraka plays for Al Ahly) • More recently, also includes active components, often called “business logic” (e.g., the BCS ranking system) • A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases.
Why Study Databases? • Shift from computation to information - always true for corporate computing - the Web made this point for personal computing - more and more true for scientific computing • Need for DBMS has exploded in the last years - Corporate: retail swipe/clickstreams, “customer relationship”, “supply chain”, “data warehouses”, etc. - Scientific: digital libraries, Human Genome project, NASA Mission to Planet Earth, physical sensors, grid physics network • DBMS encompasses much of CS in a practical discipline - OS, languages, theory, AI, multimedia, logic - Yet traditional focus on real-world apps
databases you may use
Database Applications • These examples are what we called traditional database applications (the first part of the book focuses on traditional applications) • More Recent Applications: • YouTube • iTunes • Geographic Information Systems (GIS) • Data Warehouses • Many other applications
Database Systems: Then
History of Database Systems • 1950's and early 1960's: - Data processing using magnetic tapes for storage - Tapes provide only sequential access - Punched cards for input • Late 1960's and 1970's: - Hard disks allow direct access to data - Network and hierarchical data models in widespread use - Ted Codd defines the relational data model (would win the ACM Turing Award for this work) - IBM Research begins System R prototype - UC Berkeley begins Ingres prototype - High-performance (for the era) transaction processing
History (cont.) • 1980s: - Research relational prototypes evolve into commercial systems - SQL becomes industry standard - Parallel and distributed database systems - Object-oriented database systems • 1990s: - Large decision support and data-mining applications - Large multi-terabyte data warehouses - Emergence of Web commerce • 2000s: - XML and XQuery standards - Automated database administration - Increasing use of highly parallel database systems - Web-scale distributed data storage systems
Is a File System a DBMS? • Thought Experiment 1: - You and your project partner are editing the same file. - You both save it at the same time. - Whose changes survive? A) Yours B) Partner's C) Both D) Neither E) ??? • Thought Experiment 2: - You're updating a file. - The power goes out. - Which of your changes survive? A) All B) None C) All Since Last Save D) ??? • Q: How do you write programs over a subsystem when it promises you only “???”? A: Very, very carefully!!
Can we do it without a DBMS? Sure we can! Start by storing the data in files: students.txt, courses.txt, professors.txt. Now write C or Java programs to implement specific tasks.
Doing it without a DBMS... • Enroll “Mary Johnson” in “CSE 444”: write a C program to do the following: • Read 'students.txt' • Read 'courses.txt' • Find&update the record “Mary Johnson” • Find&update the record “CSE 444” • Write 'students.txt' • Write 'courses.txt'
Enter a DBMS • “Two tier database system”: Data files, Database server (someone else's C program), Applications
Problems without a DBMS... • System crashes: Read 'students.txt'; Read 'courses.txt'; Find&update the record “Mary Johnson”; Find&update the record “CSE 444”; Write 'students.txt'; Write 'courses.txt'; CRASH! • What is the problem? • Large data sets (say 50 GB) • What is the problem? • Simultaneous access by many users • Need locks: we know them from OS, but now the data is on disk; and is it any fun to re-implement them?
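The crash problem above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give a file format, so the `name|courses` and `course|count` record layouts, the `enroll` helper, and the `SimulatedCrash` exception are all hypothetical, and the crash is placed between the two writes to show the worst case.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a power failure (hypothetical, for illustration only)."""


def read_lines(path):
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def write_lines(path, lines):
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


def enroll(data_dir, student, course, crash_between_writes=False):
    """Enroll a student by rewriting two flat files, as on the slide."""
    students_path = os.path.join(data_dir, "students.txt")
    courses_path = os.path.join(data_dir, "courses.txt")

    # Assumed record formats: "name|course1,course2" and "course|enrolled_count".
    students = dict(line.split("|") for line in read_lines(students_path))
    counts = dict(line.split("|") for line in read_lines(courses_path))

    taken = [c for c in students[student].split(",") if c]
    students[student] = ",".join(taken + [course])
    counts[course] = str(int(counts[course]) + 1)

    write_lines(students_path, [f"{k}|{v}" for k, v in students.items()])
    if crash_between_writes:
        raise SimulatedCrash("power lost after writing students.txt")
    write_lines(courses_path, [f"{k}|{v}" for k, v in counts.items()])


data_dir = tempfile.mkdtemp()
write_lines(os.path.join(data_dir, "students.txt"), ["Mary Johnson|"])
write_lines(os.path.join(data_dir, "courses.txt"), ["CSE 444|0"])

try:
    enroll(data_dir, "Mary Johnson", "CSE 444", crash_between_writes=True)
except SimulatedCrash:
    pass

students_after = read_lines(os.path.join(data_dir, "students.txt"))
courses_after = read_lines(os.path.join(data_dir, "courses.txt"))
print(students_after)  # the student record says she is enrolled
print(courses_after)   # the course record still shows zero enrollments
```

After the simulated crash the two files disagree, and nothing in the file system records that an update was left half done.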
Why Use a DBMS? • Without a DBMS, we'd have: users of the data; access by a collection of ad hoc programs in C++, Java, PHP, etc.; data stored as bits on disks, organized as files • There is no control or coordination of what these programs do with the data.
Why Use a DBMS? • With a DBMS, we have: users of the data; applications; the DBMS; data stored as bits on disks, organized as files • The DBMS provides control and coordination to protect the data.
Database definition • A database is “data”, or facts, supplied by a “base”, or software • Files contain data with the same structure • A database is an integration of different kinds of data
Database Systems • The big commercial database vendors: • Oracle • IBM (with DB2), which bought Informix recently • Microsoft (SQL Server) • Sybase • Some free database systems (Unix): • Postgres • MySQL • Predator
DBMS Functions 1. Define the database 2. Construct the database 3. Manipulate the database 4. Data security and integrity 5. Concurrency 6. Recovery
Disadvantages of database • Expensive • Incompatible with any other DBMS
Concurrency • A DBMS supports access by concurrent users • concurrent = happening at the same time • concurrent access, particularly writes (data changes), can result in inconsistent states (even when the individual operations are correct) • the DBMS can check the actual operations of concurrent users, to prevent activity that will lead to inconsistent states
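A minimal sketch of how two individually correct operations can leave an inconsistent state when they interleave (the classic "lost update"). The account balance, the deposit and withdrawal amounts, and the helper names are made up for illustration; the interleaving is written out by hand so the outcome is deterministic.

```python
# A shared data item, standing in for a value stored in the database.
balance = {"value": 100}


def read_balance():
    return balance["value"]


def write_balance(new_value):
    balance["value"] = new_value


# Interleaved schedule: both users read before either one writes.
a_local = read_balance()        # user A reads 100
b_local = read_balance()        # user B reads 100
write_balance(a_local + 50)     # A deposits 50 and writes 150
write_balance(b_local - 30)     # B withdraws 30 and writes 70, overwriting A
interleaved_result = read_balance()

# Serial schedule: A finishes completely before B starts.
balance["value"] = 100
write_balance(read_balance() + 50)
write_balance(read_balance() - 30)
serial_result = read_balance()

print(interleaved_result)  # 70: A's deposit has been lost
print(serial_result)       # 120: the result any serial order would give
```

A DBMS concurrency-control component is there to rule out schedules like the first one, for example by making B wait until A has finished with the item.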
Access Control • A DBMS can restrict access to authorized users • security policies often require control that is more fine-grained than that provided by a file system • since the DBMS understands the data structure, it can enforce fairly sophisticated and detailed security policies • on subsets of the data • on subsets of the available operations
Redundancy Control • A DBMS can assist in controlling redundancy • redundancy = multiple copies of the same data • with file storage, it's often convenient to store multiple copies of the same data, so that it's "local" to other data and applications • this can cause many problems: • wasted disk space • inconsistencies • need to enter the data multiple times
Backup and Recovery • A DBMS can provide backup and recovery • backup = snapshots of the data at particular times • recovery = restoring the data to a consistent state after a system crash • the higher level semantics (relationships and constraints) can make it difficult to restore a consistent state • transaction analysis can allow a DBMS to reconstruct a consistent state from a number of backups
Views and Interfaces • A DBMS can support multiple user interfaces and user views • since the DBMS provides a well-defined data model and a persistent data dictionary, many different interfaces can be developed to access the same data • data independence ensures that these UIs will not be made invalid by most changes to the data • new user views can be supported as new schemas defined against the conceptual schema
Database Components • Database contains: User's Data, Metadata, Indexes, Application Metadata • DBMS design tools: Table Creation, Form Creation, Query Creation, Report Creation, Procedural language compiler (4GL) • DBMS run time: Form processor, Query processor, Report Writer, Language Run time • Application Programs • User Interface Applications
Actors on DBMS • Database Administrator • Systems analyst • Database designer • Application programmer • End user
Actors on the Scene • Database Administrators • acquiring a DBMS • managing the system • acquiring HW and SW to support the DBMS • authorizing access (security policies) • managing staff, including DB designers
Actors on the Scene • Database Designers • identifying the information of interest in the Universe of Discourse (UoD) • designing the database conceptual schema • designing views for particular users • designing the physical data layout and logical schema • adjusting data parameters for performance
Actors on the Scene • Systems Analysts and Application Programmers (generic database developers) • provide specialized knowledge to optimize database usage • provide generic (canned) application programs
Actors on the Scene • End Users • casual users: ad-hoc queries • naïve or parametric users: canned queries such as menus for a phone company customer service agent • sophisticated users: people who understand the system and the data and use it in many novel ways • standalone users: people who use personal easy-to-use databases for personal data
Three-Schema Architecture • External View: user-specific views • Conceptual Schema: generic view • Internal Schema: physical view
Levels of Abstraction • Views describe how users see the data. • Conceptual schema defines logical structure. • Physical schema describes the files and indexes used. • (sometimes called the ANSI/SPARC model) • Figure: Users at the top, then View 1, View 2, View 3, then Conceptual Schema, then Physical Schema, then DB
Example: University Database • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa: real) • Courses(cid: string, cname: string, credits: integer) • Enrolled(sid: string, cid: string, grade: string) • External Schema (View): • Course_info(cid: string, enrollment: integer) • Physical schema: • Relations stored as unordered files. • Index on first column of Students.
Levels of Abstraction • Physical level: describes how a record (e.g., customer) is stored. • Logical level: describes data stored in database, and the relationships among the data. type customer = record customer_id : string; customer_name : string; customer_street : string; customer_city : string; end; • View level: application programs hide details of data types. Views can also hide information (such as an employee's salary) for security purposes.
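One way to try out the three levels of the university example is with Python's built-in `sqlite3` module. This is a sketch, not the course's reference implementation: the primary keys, the `LEFT JOIN` definition of `Course_info`, and all sample rows are assumptions added for illustration. In SQLite, declaring `sid` as the primary key also creates an index on that first column of `Students`, which loosely mirrors the physical schema on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the three relations from the slide.
# The PRIMARY KEY declarations are an assumption; the slide lists no keys.
conn.executescript("""
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                       age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- External schema: a view exposing only the course id and a head count.
CREATE VIEW Course_info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
    GROUP BY c.cid;
""")

# Made-up sample rows.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Mona", "mona@cs", 19, 3.4),
                  ("s2", "Omar", "omar@cs", 20, 3.1)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("c1", "Databases", 3), ("c2", "Algebra", 4)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "c1", "A"), ("s2", "c1", "B")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)
```

A user of `Course_info` sees enrollment counts but no student names or grades, which is the point of an external view.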
Conceptual Data Models • A data model describes the possible schemas (essentially the meta-schema) • A DBMS is designed around a particular data model • this is what allows all system components (and humans) to understand the schema and data • possible data models • relational, object-oriented, object-relational, entity-relationship, semantic, network, hierarchical, etc.
Physical Data Models • A physical data model describes the way in which data is stored in the computer • typically only of interest to database designers, implementers and maintainers, not end users • must provide a well-defined structure that can be mapped to the conceptual schema • allows optimization strategies to be defined generically
Instances and Schemas • Similar to types and variables in programming languages • Schema: the logical structure of the database • Example: The database consists of information about a set of customers and accounts and the relationship between them • Physical schema: database design at the physical level • Logical schema: database design at the logical level • Instance: the actual content of the database at a particular point in time
Classification • DBMSs are classified by 3 criteria: • Data models (relational, object, ...) • Number of users (single-user, multi-user) • Number of sites (centralized, distributed)
Data model • A technique for organizing data, with concepts to describe the structure of data, relationships, and integrity constraints.
Database models 1. Relational data model: Oracle, Access 2. Hierarchical data model (as a tree): IMS DBMS 3. Network data model (as a graph): IDMS DBMS 4. Object oriented model: VERSANT DBMS 5. Object relational data model: UNISQL DBMS
Data Models • Hierarchical Model (1960’s and 1970’s) • Similar to data structures in programming languages. Books (id, title) Authors (first, last) Publisher Subjects
Data Models • Network Model (1970’s) • Provides for single entries of data and navigational “links” through chains of data. Authors Subjects Books Publishers
Data Models • Object Oriented Data Model (1990’s) • Encapsulates data and operations as “Objects” Books (id, title) Authors (first, last) Publisher Subjects
Relational Model • Example of tabular data in the relational model (figure: a table whose columns are labeled as attributes)
A Sample Relational Database
Relational data model • Based on the relations between data • Each relation or table (entity) is a data structure or a collection of attributes describing data • An attribute or field is a column in the table • A tuple or record is a row in the table
Relational data model • A null value is assigned to an attribute to mean that its value is not yet known • A primary key is a unique identifier for the rows of a table, made up of one attribute or a combination of attributes
Relational data model • A foreign key is an attribute (or combination of attributes) in one relation whose values are required to match those of the primary key of some relation • A candidate key is any attribute (or minimal combination of attributes) that uniquely identifies each tuple; the primary key is chosen from the candidate keys
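The key definitions can be checked against a real engine. The sketch below uses Python's `sqlite3` module; the `department` and `employee` tables, their columns, and the sample rows are hypothetical. Here `dname` is declared `UNIQUE` to play the role of a second candidate key, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# dno is the primary key; dname is another candidate key (unique, not null).
conn.execute("""CREATE TABLE department (
                    dno   INTEGER PRIMARY KEY,
                    dname TEXT NOT NULL UNIQUE)""")
# employee.dno is a foreign key referencing department's primary key.
conn.execute("""CREATE TABLE employee (
                    ssn  TEXT PRIMARY KEY,
                    name TEXT NOT NULL,
                    dno  INTEGER REFERENCES department(dno))""")

conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES ('111', 'Alice', 5)")
# A null foreign key is allowed: Bob's department is not yet known.
conn.execute("INSERT INTO employee VALUES ('222', 'Bob', NULL)")


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


dup_pk = try_insert("INSERT INTO employee VALUES ('111', 'Carol', 5)")
bad_fk = try_insert("INSERT INTO employee VALUES ('333', 'Dave', 99)")
dup_candidate = try_insert("INSERT INTO department VALUES (6, 'Research')")
print(dup_pk, bad_fk, dup_candidate)
```

All three inserts are refused: a repeated primary key, a foreign key with no matching department, and a repeated value in the other candidate key.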
New Trends in Databases • Object-relational databases • Main memory database systems • XML XML XML! • Relational databases with XML support • Middleware between XML and relational databases • Native XML database systems • Lots of research here at UW on XML and databases • Data integration • Peer to peer, stream data management (still research)
SQL • SQL: widely used non-procedural language • Example: Find the name of the customer with customer-id 192-83-7465: select customer_name from customer where customer_id = '192-83-7465' • Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465: select account.balance from depositor, account where depositor.customer_id = '192-83-7465' and depositor.account_number = account.account_number • Application programs generally access databases through one of: • Language extensions to allow embedded SQL • An application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database
The Entity-Relationship Model • Models an enterprise as a collection of entities and relationships • Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects • Described by a set of attributes • Relationship: an association among several entities • Represented diagrammatically by an entity-relationship diagram
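The two example queries can be run through an application program interface. The sketch below uses Python's `sqlite3` module in place of ODBC/JDBC; the table definitions are inferred from the column names in the queries, and every row of sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
""")

# Made-up sample rows for illustration.
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("192-83-7465", "Johnson"), ("019-28-3746", "Smith")])
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-201", 900), ("A-215", 700)])
conn.executemany("INSERT INTO depositor VALUES (?, ?)",
                 [("192-83-7465", "A-101"), ("192-83-7465", "A-201"),
                  ("019-28-3746", "A-215")])

customer_id = "192-83-7465"

# First example: the name of the customer with this id.
names = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = ?",
    (customer_id,)).fetchall()

# Second example: balances of all accounts held by that customer.
# ORDER BY is added here only to make the output order predictable.
balances = conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.account_number""",
    (customer_id,)).fetchall()

print(names)
print(balances)
```

The `?` placeholders pass the customer id as a parameter rather than pasting it into the SQL text, which is the usual way to send values through such an interface.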
Transaction Management • A transaction is a collection of operations that performs a single logical function in a database application • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e. g. , power failures and operating system crashes) and transaction failures. • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
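A small sketch of a transaction as "a single logical function", again with Python's `sqlite3` module. The `account` table, the `CHECK` constraint that forbids negative balances, and the `transfer` helper are assumptions for illustration. The `with conn:` block commits if both updates succeed and rolls back both if either fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
                    account_number TEXT PRIMARY KEY,
                    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-215", 700)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? "
                "WHERE account_number = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? "
                "WHERE account_number = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))


ok_transfer = transfer(conn, "A-101", "A-215", 200)
after_ok = balances(conn)

# This one would overdraw A-101, so the debit violates the CHECK constraint
# and the credit that was already applied to A-215 is rolled back with it.
failed_transfer = transfer(conn, "A-101", "A-215", 1000)
after_failed = balances(conn)

print(ok_transfer, after_ok)
print(failed_transfer, after_failed)
```

The failed transfer leaves both balances exactly as they were, so the total across the two accounts is the same before and after.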
A UNIVERSITY example • A UNIVERSITY database for maintaining information concerning students, courses, and grades in a university environment • We have: • STUDENT file stores data on each student • COURSE file stores data on each course • SECTION file stores data on each section of each course • GRADE_REPORT file stores the grades that students receive • PREREQUISITE file stores the prerequisites
Example of a simple database
COMPANY Database • The company is organized into DEPARTMENTs. Each department has a name, number, and an employee who manages the department. We keep track of the start date of the department manager. A department may have several locations. • Each department controls a number of PROJECTs. Each project has a name, number, and is located at a single location.
COMPANY Database • We store each EMPLOYEE's social security number, address, salary, sex, and birth date. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee. • Each employee may have a number of DEPENDENTs. For each dependent, we keep their name, sex, birth date, and relationship to the employee.