99e738c1c44e83951f01ebd9392f23a5.ppt
- Количество слайдов: 77
Fundamentals of Relational Database Design and Database Planning J. Trumbo Fermilab CSS-DSG 1
Outline n n n n Definitions Selecting a dbms Selecting an application layer Relational Design Planning A very few words about Replication Space 2
Definitions What is a database? A database is the implementation of freeware or commercial software that provides a means to organize and retrieve data. The database is the set of physical files in which all the objects and database metadata are stored. These files can usually be seen at the operating system level. This talk will focus on the organize aspect of data storage and retrieval. Commercial vendors include Micro. Soft and Oracle. Freeware products include mysql and postgres. For this discussion, all points/issues apply to both commercial and freeware products. 3
Definitions Instance A database instance, or an ‘instance’ is made up of the background processes needed by the database software. These processes usually include a process monitor, session monitor, lock monitor, etc. They will vary from database vendor to database vendor. 4
Definitions What is a schema? A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT A SCHEMA. A database instance controls 0 or more databases. A database contains 0 or more database application schemas. A database application schema is the set of database objects that apply to a specific application. These objects are relational in nature, and are related to each other, within a database to serve a specific functionality. For example payroll, purchasing, calibration, trigger, etc. A database application schema not a database. Usually several schemas coexist in a database. A database application is the code base to manipulate and retrieve the data stored in the database application schema. 5
Definitions Cont. Primary Definitions n n Table, a set of columns that contain data. In the old days, a table was called a file. Row, a set of columns from a table reflecting a record. Index, an object that allows for fast retrieval of table rows. Every primary key and foreign key should have an index for retrieval speed. Primary key, often designated pk, is 1 or more columns in a table that makes a record unique. 6
Definitions Cont. Primary Definitions n n Foreign key, often designated fk, is a common column common between 2 tables that define the relationship between those 2 tables. Foreign keys are either mandatory or optional. Mandatory forces a child to have a parent by creating a not null column at the child. Optional allows a child to exist without a parent, allowing a nullable column at the child table (not a common circumstance). 7
Definitions Cont. Primary Definitions Entity Relationship Diagram or ER is a pictorial representation of the application schema. 8
Er Example 9
Definitions Cont. Primary Definitions Constraints are rules residing in the database’s data dictionary governing relationships and dictating the ways records are manipulated, what is a legal move vs. what is an illegal move. These are of the utmost importance for a secure and consistent set of data. 10
Definitions Cont. Primary Definitions Data Manipulation Language or DML, sql statements that insert, update or delete database in a database. Data Definition Language or DDL, sql used to create and modify database objects used in an application schema. 11
Definitions Cont. Primary Definitions A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an atomic unit. The effects of all the SQL statements in a transaction can be either all committed (applied to the database) or all rolled back (undone from the database), insuring data consistency. 12
Definitions Cont. Primary Definitions n A view is a selective presentation of the structure of, and data in, one or more tables (or other views). A view is a ‘virtual table’, having predefined columns and joins to one or more tables, reflecting a specific facet of information. 13
Definitions Cont. Primary Definitions Database triggers are PL/SQL, Java, or C procedures that run implicitly whenever a table or view is modified or when some user actions or database system actions occur. Database triggers can be used in a variety of ways for managing your database. For example, they can automate data generation, audit data modifications, enforce complex integrity constraints, and customize complex security authorizations. Trigger methodology differs between databases. 14
Definitions Cont. Primary Definitions Replication is the process of copying and maintaining database objects, such as tables, in multiple databases that make up a distributed database system. Backups are copies of the database data in a format specific to the database. Backups are used to recover one or more files that have been physically damaged as the result of a disk failure. Media recovery requires the restoration of the damaged files from the most recent operating system backup of a database. It is of the utmost importance to perform regularly scheduled 15 backups.
Definitions Cont. Mission Critical Applications An application is defined as mission critical, imho, if 1. there are legal implications or financial loss to the institution if the data is lost or unavailable. 2. there are safety issues if the data is lost or unavailable. 3. no data loss can be tolerated. 4. uptime must be maximized (98%+). 16
Definitions Cont. ‘large’ or ‘very large’ or ‘a lot’ Seems odd, but ‘large’ is a hard definition to determine. Vldb is an acronym for very large databases. Its definition varies depending on the database software one selects. Very large normally indicates data that is reaching the limits of capacity for the database software, or data that needs extraordinary measures need to be taken for operations such as backup, recovery, storage, etc. 17
Definitions Cont. Commercial databases do not a have a practical limit to the size of the load. Issues will be backup strategies for large databases. Freeware does limit the size of the databases, and the number of users. Documentation on these issues vary widely from the freeware sites to the user sites. Mysql supposedly can support 8 T and 100 users. However, you will find arguments on the users lists that these numbers cannot be met. 18
Selecting a DBMS Many options, many decisions, planning, costs, criticality. For lots of good information, please refer to the urls on the last slides. Many examples of people choosing product. 19
Selecting a DBMS How do I Choose? Which database product is appropriate for my application? You must make a requirements assessment. Does you database need 24 x 7 availability? Is your database mission critical, and no data loss can be tolerated? Is your database large? (backup recovery methods) What data types do I need? (binary, large objects? ) Do I need replication? What level of replication is required? Read only? Read/Write? Read/Write is very expensive, so can I justify it? 20
Selecting a DBMS How do I Choose? Cont. If your answer to any of the above is ‘yes’, I would strongly suggest purchasing and using a commercial database with support. Support includes: n 24 x 7 assistance with technical issues n Patches for bugs and security n The ability to report bugs, and get them resolved in a timely manner. n Priority for production issues n Upgrades/new releases n Assistance with and use of proven backup/recovery methods 21
Selecting a DBMS The Freeware Choice Freeware is an alternative for applications. However, be fore warned, support for these databases is done via email to a ad hoc support group. The level of support via these groups may vary over the life of your database. Be prepared. Also expect less functionality than any commercial product. See http: //wwwcss. fnal. gov/dsg/external/freeware/ 22
Selecting a DBMS The Freeware Choice Freeware is free. Freeware is open source. Freeware functionality is improving. Freeware is good for smaller non-mission critical applications. 23
Selecting an Application Layer Again, planning takes center stage. In the end you want stability and dependability. n How many users need access? n What will the security requirements be? n Are there software licensing issues that need consideration? n Is platform portability a requirement? n Two tier or three tier architecture? 24
Selecting an Application Layer n n n Direct access to the database layer? (probably should be avoided) Are you replicating? How? Where? With what? There are no utilities that will port data from 1 database to another (i. e. , postgres to mysql). if database portability is a requirement, an independent code must be written to satisfy this requirement. 25
Selecting an Application Layer Cont. n n n n Application maintenance issues People availability, working with users as a team, talent, and turnover? (historically a huge issue) A ‘known’ or ‘common’ language? Freeware? Bug fixes, patches…are they important and timely? Documentation? Set standards, procedures, code reviews making sure the documentation exists and is clear. Is the application flexible enough to easily accommodate business rule changes that mandate modifications? The availability of an ER diagram at this stage is invaluable. We consider it a must have. There are no utilities to port data from 1 type of db to another. This lack of portability means a method to 26
Selecting an Application Layer Misc. application definitions… This presentation is not an application presentation, but I will mention a few terms you may hear. Sql the query language for relational databases. A must learn. ODBC, open database connectivity. The software that allows a database to talk to an application. JDBC, java database connectivity. 27
Relational Design The design of the application schema will determine the usability and query ability of the application. Done incorrectly, the application and users will suffer until someone else is forced to rewrite it. 28
Relational Design The Setup The database group has a standard 3 tier infrastructure for developing and deploying production databases and applications. This infrastructure provides 3 database instances, development, integration and production. This infrastructure is applicable to any application schema, mission critical or not. It is designed to insure development, testing, feedback, signoff, and an protected production environment. Each of these instances contain 1 or more applications. 29
Relational Design The Setup The 3 instances are used as follows: 1. Development instance. Developers playground. Small in size compared to production. Much of the data is ‘invented’ and input by the developers. Usually there is not enough disk space to ever ‘refresh’ with production data. 30
Relational Design Cont. The Setup 2. The integration instance is used for moving what is thought to be ‘complete’ functionality to a pre production implementation. Power users and developers work in concert in integration to make sure the specs were followed. The users should use integration as their sign off area. Cuts from dev to int are frequent and common to maintain the newest releases in int for user testing. 31
Relational Design Cont. The Setup 3. The production instance, real data. Needs to be kept pure. NO testing allowed. Very few logons. The optimal setup of a production database server machine has ~3 operating system logons, root, the database logon (ie oracle), and a monitoring tool. In a critical 24 x 7 supported database, developers, development tools, web servers, log files, all should be kept off the production 32 database server.
Relational Design Cont. The Setup Let’s talk about mission critical & 24 x 7 a bit. 1. To optimize a mission critical 24/7 database, the database server machine should be dedicated to running the database, nothing else. 2. All software products need maintenance and downtime. Resist putting software products on the db server machine so that their maintenance does not inhibit the running of the database. Further, if the product breaks, it could inhibit access to the database for a long period. Example, a logging application, monitoring users on the db goes wild, fills all available space and halts the database. If this logging app. were not on the dbserver machine, the db would be unaffected by the malfunction. 33
Relational Design Cont. The Setup 3. All database applications and database software require modifications. Most times these modification require down time because the schema or data modifications need to lock entire tables exclusively. If you are sharing your database instance with other many other applications, and 1 of those applications needs the database for an upgrade, all apps may have to take the down time. Avoid this by insuring your 24/7 database application is segregated from all other software that is not absolutely needed. In that way you insure any down times are specific to your cause. 34
Our 1 st relational example A cpu can house 1 or more databases CPU (d 0 ora 2) An database can accommodate 1 or more instances Databases on d 0 ora 2 (d 0 ofprd 1, d 0 ofint 1) An instance may contain 1 or more application schemas schema applications in d 0 ofprd 1 (sam, runs, calib) schema applications in d 0 ofint 1 (sam, runs, calib) 35
What is a schema? It is not Tables (columns/datatypes) having Constraints (not null, unique, foreign & primary keys) Triggers Indexes etc. Accounts Privileges & Roles Server side processes n. The environment (servers, OS) n. The results of queries, I. e objects n. Application Code One implements a schema by running scripts. These scripts can be run against multiple servers and should be archived. 36
Relational Design Getting Started Using your design tool, you will begin by relating objects that will eventually become tables. All the other schema objects will fall out of this design. You will spend LOADS of time in your design tool, honing, redoing, reacting to modifications, etc. The end users and the designers need to be working almost at the same desk for this process. If the end user is the designer, the end user should involve additional users to insure an unbiased and general design. It is highly suggested that the design be kept up to date for future documentation and maintainers. Tables are related, most frequently in a 0 to many relationship. Example, 1 run will result in 0 or more events. Analyzing and defining these relationships results in an application schema. 37
What will a good schema design buy you? I am afraid the 80% planning 20% implementation rule applies. Gather requirements. n Discovery of data that needs to be gathered. n Fast query results n Limited application code maintenance n Data flexibility n Less painful turnover of application to new maintainers. n Fewer long term maintenance issues. 38
Relational Design Let’s get started Write a requirements document. You will not be able to anticipate all requirements, but a document will be a start. A well designed schema naturally allows for additional functionality. Who are the users? What is their mission? Identify objects that need to be stored/tracked. Think about how objects relate to each other. Do not be afraid to argue/debate the relationships with others. 39
Relational Design So how do you get there? Design tools are available, however, they do not think for you. They will give you a clue that you are doing something stupid, but it won’t stop you. It is highly recommended you use a design tool. A picture says 1000 words. Create ER, entity relationship, diagrams. Get a commitment from the developer(s) to see the application through to implementation. We have seen several applications redone multiple times. A string of developers tried, left the project, and left a mess. A new developer started from scratch because there was no documentation or design. 40
Relational Design How do I get there? Adhere to the recommendations of your database vendor for setup and architecture. Don’t be afraid to ask for help or to see other examples. Don’t be afraid to pilfer others design work, if it is good, if it closely fits your requirements, then use it. Ask questions, schedule reviews with experts and users. Work with your hardware system administrators to insure you have the hardware you need for the proposed job to be done. 41
Relational Design Common Mistakes we see ALL the time n n Do not design your schema around your favorite query. A relational design will enable all queries to be speedy, not only your favorite. Don’t design the schema around your narrow view of the application. Get other users involved from the start, ask for input and review. 42
Relational Design Common Mistakes n Create a relational structure, not a hierarchical structure. The ER diagram should not necessarily resemble a tree or a circle. It is the logical building of relationships between data. Relationships flow between subsets of data. The resulting ER diagram’s ‘look’ is not a standard by which one can judge the quality of the design. 43
Relational Design Common Mistakes n n n Do not create 1 huge table to hold 99% of the data. We have seen a table with 1100+ columns…unusable, unqueryable, required an entire application rewrite, took over a year, made 80 tables from the 1 table. Do not create separate schemas for the same application or functions within an application. Use indices and constraints, this is a MUST! 44
Relational Design Examples of Common Mistakes n n Using timestamp as the primary key assumes that within a second, no other record will be inserted. Actually this was not the case, and an insert operation failed. Use database generated sequences as primary keys and NON-UNIQUE index on timestamp. A table with more than 900 columns. Such design will cause chaining since each record is not going to fit in one block. One record spanning many blocks, thus chaining, hence bad performance. 45
Relational Design Examples of Common Mistakes n n Do not let the application control a generated sequence. Have seen locking issues, and duplicate values issues when the application increments the sequence. Have the database increment/lock/constrain the sequence/primary key. That is why the databases have sequence mechanisms, use them. Use indices! An Atlas table with 200, 000 rows, halted during a query. Reason? No indices. Added a primary key index, instantaneous query response. Indices are not wasted space! 46
Relational Design Examples of Common Mistakes USE DATABASE CONSTRAINTS!!!!!! Have examples where constraints were not used, but ‘implemented’ via the api. Bugs in the api allowed data to be deleted that should not have been deleted, and constraints would have prevented the error. Have also seen apis error with ‘cannot delete’ errors. They were trying to force an invalid delete, luckily the database constraints saved the data. 47
Entity Relationship Diagrams 1 to many 48
Entity Relationship Diagrams many to many 49
Entity Relationship Diagrams 1 to 1 50
Relational Design The Good Calibration type might have 3 rows, drift, pedestal, & gain This is a parent table. Each calibration record will be Defined by drift, pedestal or gain. In addition to start and end times. This is a child table. 51
Relational Design The Bad You have now created 3 different children, all reporting the same information, when 1 child would suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2. 52
Relational Design The Ugly Now you have created 3 different applications, using 6 tables. All of which could be managed with 2 tables. Extra code, extra testing, extra maintenance. 53
Relational Design The Good…let’s recap AHHH, back to normal, or normalization as we refer to it. 54
Relational Design What to expect from a design tool n n An entity relationship diagram The ability to create the ddl (data definition language) needed The ability to project disk space usage Ddl in a format to allow you to enter the code into a code library (cvs), and that will allow you to run against your database 55
Relational Design Why bother? Experience from Run. II TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES! Personnel consistency does not exist. Application developers come and go regularly. The documentation that a design product provides will the next developer an immediate understanding of the application in picture format. Application sharing is enhanced when others can look at your design and determine whether the application is reusable in their environment. Sam is a good example of an application that 3 experiments are now using. 56
Relational Design Why bother? Cont. When an application is under construction, the ER diagram goes to every application meeting, and quite possibly the wallet of the application leader. It is the pictorial answer to many issues. Planning for disk space has been an issue, the designer tool should assist with this task. 57
Planning Overall What do I need to plan for? People, hardware, software, obsolescence, maintenance, emergencies. How far out do I need to plan? Initially 2 -4 years. How often do I need to review the plans? Annually. What if my plan fails or looks undoable? Nip it in the bud, be proactive, come up with options. 58
Planning Overall n n Disk space requirements. My experience is all the wags, (wild guesses) fall short of what is needed. It is hard to predict the number of rows in a table. It would be easier if we knew the amount and results of the science ahead of time! Remember, 10 x what you think the data will take. Hardware requirements. Experience tells us that the database machine should serve 1 master (if it is a large database or mission critical), the database, nothing else. Ideally there will be root, a database monitor user and a database user, oracle for example. No apache, no log file areas, no applications, etc. 59
Planning Overall n n Growth and obsolesce. Plan for 3 -4 years before needing to replace hardware. Hardware and software become obsolete. New/upgraded software gives addition functionality that you will want/need. Maintenance. Do you change the oil in your car? Plan on 1 morning per month downtime for caring for the hardware and software. Security patches could mandate additional stoppages. I cannot stress how important this is. Fire walling will not protect you from bugs and obsolescence. If the downtime is not needed, it will not be taken. Planning maintenance time is as important as planning to buy 60 disks.
Planning User Requirements Will user requirements influence your hardware & software decisions? Do you need replication? What architecture is your api going to be? How many users will be loading the database and hardware? 61
Planning Maintenance n Database/Operating system software need upgrades. One always hopes one can get on a stable version of something and not upgrade. That is a fallacy. Major version upgrades provide needed and new functionality. Bug patches and security patches are a never ending fact of life. 62
Planning Backup and Recovery Backup and recovery procedures of vldb (very large databases) are difficult at best. Vldb is normally defined as mulitple Gig or tera byte databases. This is probably the most sensitive area when choosing a freeware database. Hardware plays a part here as well. Insure when planning for hardware there is plan for backup and recovery. Disk and tape 63
Planning Good Practices with a Hammer Make a standards document and enforce its use. When dbas and developers are always on the same page, life is easier for both. Expectations are clear and defined. Anger and disappointment are lessened. System as well as database standards need to be followed and enforced. 64
Planning Failover Yikes, we are down! Everyone always wants 24 x 7 scheduled uptime. Until they see the cost. Make anyone who insists on real 100% uptime to justify it (and pay for it? ). 98 -99% uptime can be realized at a much lower cost. Uptime requirements will influence, possibly dictate, database choices, hardware choices, fte requirements. 65
Planning Failover The cheapest method of addressing a failure is proactive planning. Make sure your database and database software backed up. Unless you are using a commercial database with roll forward recovery, assume you will lose all dml since your last backup if you need to recover. This should dictate your backup schedule. Do not forget tape backups as a catastrophic recovery method. Practice recovery on your integration and development databases. Practice different scenarios, delete a datafile, delete the entire database. 66
Replication is the process of copying and maintaining database objects in multiple databases that make up a distributed database system. Replication can improve the performance and protect the availability of applications because alternate data access options exist. 67
Replication Cont. n n Oracle Supports 3 types of replication READ ONLY Snapshots (Materialized views), Advanced Replication and streams based replication. Streams allows ddl modifications made to the master automatically. Streams can be configured in uni-directional ( Single Source and one or more than targets) or master to master where updates can happen to any participant database. Advanced replication also supports master to master. But streams based replication is recommended. READ ONLY Snapshots replication from a Sun box to a Sun & Linux box(s) is being done in CDF. When a replica is under maintenance there is failover to another replica. The replicas are up and running in read only mode if the master is down for maintenance. 68
Replication cont. Oracle master to master replication allows for updates on both the master and replica sides. Master to master is a complex and a high maintenance replication. It seems to be the 1 st option the unwitting opt for. Both Cern and Fermi dbas have requested firm justification before considering this type of replication request. Every link in the multi master would be required to be a fully staffed, as downtime will be critical. 69
Replication cont. 1. 2. 3. 4. Disk Space for Archives. If receiving site is down for extended period of time, then source db should be tuned enough to hold the archives logs, otherwise, one has to reinstantiate the replication. Reasonable downtime for target depends upon archive area being generated on source. Space, space and more space. Conflict Resolution In Master to Master, conflict resolution may be challenge. Rules should be well defined to resolve the data conflicts. Design of Data Model if Primary Keys are populated by sequences , there is very much chance of overlapping the sequences and will cause integrity constraints. Data Model should be designed very carefully. DB Support In Master to Master Replication, all master sites should be in 24*7 support mode. Otherwise , sync up of data will be challenge or one may lead to reinstantiation of replication. Reinstantiation is not unplug and play type of 70 situation.
Freeware Replication My. SQL has replication in the last stable version (3. 23. 32, v 4. 1 is out). It is masterslave replication using binary log of operations on the server side. It is possible to build star or chain type structures. There is a Postgre. SQL replication tool. We have not tested it yet. 71
Lost in Space is the 1 area consistently under estimated in every application I have seen. Imho, consistently, data volume initial estimates were undersized by a factor of 2 or 3. For example, Run. II events were estimated at 1 billion rows. This estimate was surpassed Feb. 2004. We will probably end up with 4 -5 billion event rows. That is a lot of disk space. Disk hardware becomes unsupported, and obsolete in what seems to be a blink of an eye. 72
Lost In Space cont. 8 x N Gb Unexpected? Rollback Data Index Redo All databases use disk to store data. Data Index mirror Backup Replication Good rule of thumb: You need 10 x the disk to hold a given amount of data in an RDB. Operate in 2 year cycles: • First 2 years storage available on day 1. • Evaluate growth at end of year 1, begin prep of next 2 yr. 73
Lost in Space, cont. You will use as much disk space as you purchase, and then some. Database indices will take MINIMALLY at least as much space as the tables, probably considerably more. Give WIDE lead time to purchase disk storage. New disks are not installed and configured over night. They require planning, downtime and $. 74
Additional References **WARNING some of these may be database specific. n Intro to database design http: //www. cc. gatech. edu/classes/AY 2000/cs 4400_sprin g/cs 4400 a/ n Intro to Oracle tutorial http: //w 2. syronex. com/jmr/edu/db/ n Evolutionary Database Design http: //www. martinfowler. com/articles/evodb. html mentions 1 dba for atlas n Sql course http: //sqlcourse. com/ 75
Additional References n n ***Highly recommended reading, db comparatives http: //wwwcss. fnal. gov/dsg/external/freeware/ db infrastructure standard, support levels, etc. for fermi computing http: //wwwcss. fnal. gov/dsg/external/oracle_admin/ 76
Additional References n n Oracle Designer tutorial http: //wwwcss. fnal. gov/dsg/internal/ora_adm/index. htm#d esigner (choose Oracle Designer tutorial or Oracle Designer Short Cuts and Lessons Learned) Btev specific additional information http: //www css. fnal. gov/dsg/external/BTe. V/index. html 77
99e738c1c44e83951f01ebd9392f23a5.ppt