Скачать презентацию M 359 Relational databases theory and practice Block Скачать презентацию M 359 Relational databases theory and practice Block

916998de912a5e9607e84e74e7cdc9e6.ppt

  • Количество слайдов: 56

M 359 Relational databases: theory and practice Block 1 Databases in context 1 M 359 Relational databases: theory and practice Block 1 Databases in context 1

Aims ØExplain relational database concepts. ØExplain database and DBMS. ØExplain the use of the Aims ØExplain relational database concepts. ØExplain database and DBMS. ØExplain the use of the database language SQL. ØExplain about database development and the techniques. 2

1. The evolution of data management Chapter -1 1. 3 1. 5 1. 6 1. The evolution of data management Chapter -1 1. 3 1. 5 1. 6 1. 9 Modern Data Processing Databases and DBMSs Are all database systems the same? Enduring issues in data management 3

Modern Data Processing v. The file-based approach. v. The database Management System approach. 11 Modern Data Processing v. The file-based approach. v. The database Management System approach. 11

File based systems v. Made up of data files and associated programs. v. Each File based systems v. Made up of data files and associated programs. v. Each data file contains data records specific to one application. v. Each application needs to know about the format and access mechanisms for each file (interface). 12

Interface 13 Interface 13

The explicit interface 14 The explicit interface 14

Productive maintenance 15 Productive maintenance 15

Productive Maintenance: Maintenance that leads to a benefit by meeting a new requirement , Productive Maintenance: Maintenance that leads to a benefit by meeting a new requirement , not just a continuation of the previous state 16

Unproductive maintenance 17 Unproductive maintenance 17

Unproductive Maintenance: making changes to the program not because of any new requirement for Unproductive Maintenance: making changes to the program not because of any new requirement for that program but in order to continue using a shared data file that has been changed for some other reason 18

Unproductive maintenance- change program because shared data file has been changed ¬ because of Unproductive maintenance- change program because shared data file has been changed ¬ because of new requirement for that program 19

Data Dependence: File based processing is closely tied to physical structure of the data Data Dependence: File based processing is closely tied to physical structure of the data stored in the files. Disadvantages of file based systems: Ø Unproductive maintenance. Ø Duplication of data in multiple files. Ø Need to generate a new application for each request. 20

The Database Approach It hides physical representation from users or application programs. A s/w The Database Approach It hides physical representation from users or application programs. A s/w system (DBMS) is responsible for all interactions of data files with application programs. The application program doesn’t know how data is stored or how to manipulate records. Application program uses a logical interface. 21

The Database Management System Approach 22 The Database Management System Approach 22

Statements to the logical interface Example for logical interface : Select Postcode From Student; Statements to the logical interface Example for logical interface : Select Postcode From Student; v DBMS turns logical request to physical instruction. v It internally hold a mapping b/w logical name & physical representation. v To change physical structure , change the mapping but not to change logical interface& application process. v If we change the physical structure, then the mapping can be changed. But no change in logical name & application program. v The separation of physical storage of data from application process (data independence). 23

The enduring principles of data storage and management Ø The data records must be The enduring principles of data storage and management Ø The data records must be durable. Ø The data records must be interpretable. Ø The data records must be trustworthy and secure. Ø Agreeing the structure and content of data collections can enable data sharing. Ø Structured data can be processed automatically. Ø Data structure representation must be flexible. Ø Data access must be simple and logical, rather than tied to physical representations. Ø Data collections can be used for many purposes; their processing must be timely to be useful. 24

Data vs Information v. Data is a term given to the many recordable facts Data vs Information v. Data is a term given to the many recordable facts about the world we inhabit. Ø Example: a clock may show both hands pointing towards the 12 v. Information is said to be meaningful data. Ø Example: if the clock is not working or is not set to local time, we cannot say what means! 25

Database terminology v. Database: a collection of structured, persistent, shared data. v. Structured data: Database terminology v. Database: a collection of structured, persistent, shared data. v. Structured data: the data has a regular, defined, structure within the database and can support the manipulation of the data. v. Persistent data: the data will continue to be stored even when the applications creating and using it are no longer running. Data must be maintained without loss. v. Shared data: can be used by many user applications. 26

Database terminology v Database Management System (DBMS): the collection of software that enable storage, Database terminology v Database Management System (DBMS): the collection of software that enable storage, manipulation and protection of databases. v Database engine: A software program for accessing DB and manipulating data. v Redundancy: duplication of data. v Query: a request to the database to access some data. v User process: user program running outside the DBMS. v Data dictionary (system catalog): description of the data held in the database. v Query optimizer: works out how to turn a logical query into an efficient series of physical processing steps to achieve the desired result. v Constraints: Rules that define representation and values for data in the DB. 27

DBMS-embedded functions 28 DBMS-embedded functions 28

Separation of concerns in a DBMS: Physical storage, logical description & individual user requirements Separation of concerns in a DBMS: Physical storage, logical description & individual user requirements 29

Advantages of the database and DBMS approach v. Centralized functionality leads to consistency in Advantages of the database and DBMS approach v. Centralized functionality leads to consistency in the management of the data, its storage, validation, security, processing and control. v. Data can be shared reducing the need for redundant duplication. v. There is a reduction of the unproductive maintenance incurred when the requirements of data and programs are changed. v. The separation of concerns allows a focused way of considering data representations, requirements and processing. 30

Advantages of the database and DBMS approach v. The separation of concerns simplifies the Advantages of the database and DBMS approach v. The separation of concerns simplifies the task for programmers developing user processes. v. Ad hoc querying (an unplanned, interactive style of query generation) becomes easily achievable because the DBMS can translate a logical user query into the more complex physical requirements of the underlying computer system. 31

Disadvantages of the database and DBMS approach v. Complex. v. Expensive. v. Risks of Disadvantages of the database and DBMS approach v. Complex. v. Expensive. v. Risks of failure and security breaches are severe. 32

The eras of database management systems v. Pre-relational. v. Relational. v. Post-relational. 33 The eras of database management systems v. Pre-relational. v. Relational. v. Post-relational. 33

Pre-relational systems v. Hierarchical systems • Data is arranged in a hierarchy. • A Pre-relational systems v. Hierarchical systems • Data is arranged in a hierarchy. • A tree structured hierarchy. • Search is not efficient. v. Network systems • A data item can have multiple parent data items. • It has increased flexibility. • More complex to navigate. 34

Hierarchical systems Student enrolled in courses at different years 35 Hierarchical systems Student enrolled in courses at different years 35

Relational systems The student relation Ø E F Codd proposed this structure. Ø A Relational systems The student relation Ø E F Codd proposed this structure. Ø A logical data structure in which all data appeared in tabular form known as relation. 36

Relational systems Selecting Rows (Postcode starts with MK) 37 Relational systems Selecting Rows (Postcode starts with MK) 37

Selecting certain columns 38 Selecting certain columns 38

The Closure Property Selecting rows and columns Closure property: The property of applying operations The Closure Property Selecting rows and columns Closure property: The property of applying operations to relations & producing relations. 39

Problems with the relational model Representation difficulties: It is difficult to represent objects in Problems with the relational model Representation difficulties: It is difficult to represent objects in the real world into tables. Impedance mismatch: convert between the two approaches whenever data is transferred from the setbased to the value-based environment. Semantics: Fails to capture the meaning of data stored in the database. 40

Post-relational systems v. Object-oriented database systems. v. XML database systems. 41 Post-relational systems v. Object-oriented database systems. v. XML database systems. 41

Object-oriented database systems v. We can represent data using objects. v. Objects contain both Object-oriented database systems v. We can represent data using objects. v. Objects contain both value(data) and behaviors(methods). v. Encapsulation: the internal state is only accessible through the methods. Types of Object-Oriented Systems v. Object-oriented DBMS (OODBMS). v. Object-relational DBMS (ORDBMS). 42

e. Xtensible Markup Language (XML) database systems Ø Uses tagging to indicate the meaning e. Xtensible Markup Language (XML) database systems Ø Uses tagging to indicate the meaning of data. Ø Example: 8 High Street MR 245 TD United Kingdom

8 High Street MR 245 TD United Kingdom
43

2. Data management – why it matters Chapter -2 46 2. Data management – why it matters Chapter -2 46

Corporate information strategy(CIS) v Information strategy for a large organization. v Is developed to Corporate information strategy(CIS) v Information strategy for a large organization. v Is developed to ensure that the technology is meeting the company’s needs and is appropriate for those needs It can be broken into: v Information systems strategy (IS) v Information management strategy (IM) v Information technology strategy (IT) 47

Corporate information strategy 48 Corporate information strategy 48

 IS strategy Address how to implement information technology to improve the business Ex: IS strategy Address how to implement information technology to improve the business Ex: make an online website , to make site attractive give free services, promotional offers etc. IM strategy Clarify the duties of employees related with the new system to implement IS strategy Their tasks will be manage & operate the new system Keep its information up-to-date, accurate and understandable IT strategy It includes an outline of activities & processes needed to implement the IS strategy within the IM strategy defined It involves supply of technology and equipment 49

The cost of data Costs are usually tangible, except for Costs related to poor The cost of data Costs are usually tangible, except for Costs related to poor quality or missing data (time, effort, frustration & dissatisfaction) Can be divided under 4 categories: System costs (building, equipment , network, people , ) Origination costs (acquisition, cleaning & preparation) Communication costs (setting up & delivery) Execution costs (training employee, additional equipment & software) 50

Data quality Ø Data have high quality if it satisfies the requirements of its Data quality Ø Data have high quality if it satisfies the requirements of its intended use. Ø It lacks quality when it fails to satisfy the requirements. Quality Management of data v establishing standards for quality of data. v establishing procedures or methods which ensure that these standards of quality are met. v monitoring the actual quality. v taking control when quality falls below acceptable thresholds. 51

Data quality Data must include the following qualities v Accuracy : data value & Data quality Data must include the following qualities v Accuracy : data value & data representation should be accurate. v Completeness: should have complete set of data values to take decisions. v Timeliness: data is available for user needs within a reasonable time period. v Relevance: the appropriateness of the data to the requirement. v Understandable: meaning and interpretation must be clear to the users. v Trusted: require an adequate guarantee for the security, privacy and ownership of data. 52

The impact of poor data quality v. Operational effect: lowering customer & employee satisfaction, The impact of poor data quality v. Operational effect: lowering customer & employee satisfaction, increasing costs. v. Management effect: poor or invalid decision making, taking longer to make decisions, resulting in internal organizational mistrust. v. Strategic effect: creating difficulties in setting strategy, raising issues of data ownership and control, diverting management attention. 53

ACTIVITY 2. 1 Messy mailing lists 1. 2. 3. 4. 5. Open the Notlaw ACTIVITY 2. 1 Messy mailing lists 1. 2. 3. 4. 5. Open the Notlaw Supermarket Choose the Mailing List Generation application Use the By Postcode Use the partial postcode MK. Compare it to the list in the ALL option Ø List data quality problems that could have been avoided by data cleaning or better constraints? Ø What data quality problems cannot be avoided? Ø How will poor quality affect the Supermarket? 56

ACTIVITY 2. 2 Refund problems 1. Run the Notlaw Till Management app. give a ACTIVITY 2. 2 Refund problems 1. Run the Notlaw Till Management app. give a date in last week. 2. Record the purchase of one four-pint carton of milk 3. Write a note of the till receipt No. & the line No. of the milk and the code No. & price of the milk 4. Run Price Management app. 5. Give the date as yesterday. 6. Increase the price by 3 pence. 7. Run the Refund Management app. 8. Give today’s date. 9. Use the till receipt No. & receipt line No. you noted earlier. Ø How much profit can the customer make in this situation? Ø Can you do the same at the Walton Supermarket? 57

ACTIVITY 2. 3 Moving problems 1. A customer informs the supermarket of a change ACTIVITY 2. 3 Moving problems 1. A customer informs the supermarket of a change of address. 2. Using the Notlaw Customer Maintenance application: 3. Select the data for Geraldine Taylor, by entering her customer number C 12345. 4. Change this customer’s address to any new address 5. Save this change 6. Using the Orders Pending app. look for outstanding orders still to be delivered to Geraldine (there is only one). Ø What problem is evident in the data the applications display? Ø Is this problem present in the Walton Supermarket? 58

ACTIVITY 2. 4 Confusion reigns 1. We are to develop an application to report ACTIVITY 2. 4 Confusion reigns 1. We are to develop an application to report the details of incomplete orders to suppliers 2. Look at the document notlaworderdatadescription. doc in the ActivitiesBlock 1NotlawOther. Docs subfolder within your M 359 installation folder. 3. Using this description identify those data items you would need to use to decide if an order was incomplete. Ø What is your understanding of the meaning of the Notlaw data? Ø Now look at the file waltonorderdatadescription. doc why is it easier to understand? 59

ACTIVITY 2. 5 Using nonstandard data representations 1. Open the Notlaw Book Catalogue application. ACTIVITY 2. 5 Using nonstandard data representations 1. Open the Notlaw Book Catalogue application. 2. Examine the data held about the latest Airy Porter book (currently Volume 216). 3. Make a note of the main details for this book. 4. Use the Books-4 -Booksellers Order app. to order another 6 copies of this book. Use the Customer Reference “ 1234 “ and “notlaw “ for the Customer Name. Ø that is, if you can! Ø follow the above instructions for the Walton Supermarket. 60

ACTIVITY 2. 6 Satisfying a Data Protection Act request 1. A customer who previously ACTIVITY 2. 6 Satisfying a Data Protection Act request 1. A customer who previously requested a copy of their data held by the supermarket has disputed the alleged non-payment of a bill. This customer says that on the occasion in question they did not make an order with Notlaw Supermarket so there should be no outstanding bill related to their account. 2. The supermarket admits it has made a mistake by confusing this customer’s records with that of another customer. The supermarket has agreed to correct the fact that the customer is recorded as having a poor payment history. 3. Using the Notlaw Customer Maintenance application 4. Look at the entry for Gerald Tailor, customer No. C 19342. 5. You should see that the customer details show a Poor Payment flag on screen; 6. remove that flag by changing the yes to no. 7. Update the changed data in the database. 8. Now, retrieve Gerald’s data a second time and confirm that the Poor Payment flag shows no. 61

ACTIVITY 2. 6 Satisfying a Data Protection Act request 9. Now execute the End-Of-Month ACTIVITY 2. 6 Satisfying a Data Protection Act request 9. Now execute the End-Of-Month Reconciliation application. 10. This application would do some automated reconciliation checks that the supermarket might make to maintain the internal consistency of data; it might also result in the payment of bills, sending of overdue reminders and other such activities. We’ve coded this so that it checks if there any unpaid bills and sets the poor payment flag for those customers who have unpaid bills. 11. Incidentally, the original choice of the 10 -digit ISBN has proved to be insufficient to cope with the volume of books being published; prompting the recent introduction of 13 -digit ISBNs – so expect a mini-millenium bug type flurry of activity in the field of book databases. 12. Using the Customer Maintenance application, check the state of the poor payment flag for Gerald Tailor. You should see that the poor payment flag has been reset by the reconciliation application. 62

ACTIVITY 2. 6 Satisfying a Data Protection Act request Now use the Credit-Check application: ACTIVITY 2. 6 Satisfying a Data Protection Act request Now use the Credit-Check application: 13. Enter Gerald’s customer number C 19342 to bring up the details of any outstanding payments. As you can see, changing the poor payment flag has not changed the data which shows that Gerald has an unpaid bill. In fact it should be inconsistent to have unpaid bills and a good payment record, but this is permitted by the Notlaw database. A change to the poor payment flag should require a change to the related data so that the reconciliation application cannot reset the poor payment flag at some later date. 14. Use the Walton Supermarket applications to make the above alterations to Gerald’s data (he has the same customer number). Remember that the Walton data is separate from the Notlaw data, so Gerald’s record hasn’t been corrected yet. When you attempt to update the poor payment flag, the Walton application will behave differently compared to the Notlaw application. Follow the on-screen instructions to reallocate the disputed order to the correct customer Geraldo Tailon C 19324, and run the reconciliation application 15. and check Gerald’s customer records. 63

ACTIVITY 2. 7 Maintenance overheads 1. 2. 3. 4. 5. Run the Notlaw Customer ACTIVITY 2. 7 Maintenance overheads 1. 2. 3. 4. 5. Run the Notlaw Customer Profiling application. use today’s date Look at the customer details listed. Now request the report for the same date two years ago Compare the detail of a few customers shown on the two reports; Ø What is unusual? Ø For comparison, look at the customer list produced by the Walton Customer Profiling application for a similar span of dates. 64

ACTIVITY 2. 8 Accessing archived records 1. Your task in this activity is to ACTIVITY 2. 8 Accessing archived records 1. Your task in this activity is to extract the contents of an archive file and print out the customer data. 2. Locate the data file Notlaw_May_1_1977. nrc in the ActivitiesBlock 1 3. NotlawOther. Docs folder of your M 359 installation folder. 4. This data file is an archival record of the content of the Notlaw customer database on 1 May 1977. Try to extract some meaningful content from this file. 5. Don’t spend too long trying to interpret the content of the file; you can use a simple text editor like Microsoft Notepad to open the file, then skip straight on to the discussion. 65

The data island syndrome Ø Independent database inside the company for the use of The data island syndrome Ø Independent database inside the company for the use of a single department or even a single person. Ø Could be independent of the company’s policies and outside the company’s direct data management. Ø Creates isolated islands of data within the organization. 67

End of First Meeting 69 End of First Meeting 69