INTRODUCTION A database consists of an organized collection

INTRODUCTION A database consists of an organized collection of interrelated data for one or more uses, typically in digital form. Digital databases are managed using database management systems, which store database contents, allowing data creation and maintenance, search and other access.

Purpose of Database: =>A database system should be a repository of data needed for an organization’s data processing. =>This data should be accurate , private and protected from damage. it should be organized so that diverse applications with different data requirements can employ the data.

n n n Different application programmers and different end users will have different views of the data which must be derived from a common overall data structure. The ways in which ends users want to utilize existing data will constantly change, and in some cases demands for new uses of the data will arise rapidly and urgently. The extent to which these demands can be satisfied determines the overall value of the database system.

Database concept n. Disadvantage: 1. inconsistency 2. unsharable data 3. unstandardized data 4. unsecure data 5. incorrect data.

Database management system n n n It provides a centralized control of the data IT is a software that responsible for storing, maintaining and utilizing database. A database along with a DBMS is referred to as a database system. Advantages 1. Databases reduces the data redundancy to a large extent: =>Duplication of data is known as Data Redundancy. n The database systems do not maintain separate copies of the same data. Rather, all the data are kept at one place and all the application that required data refer to centrally maintained database

n n If any change is to be made to data, it will be made at just one place and the same changed information will be available to all the application referring to it. Thus redundancy gets controlled and so are problems associated with it.

Database Management System Database End User Centrally Controlled Database System Application Programs

2. Databases can control data inconsistency to a large extent. n n n Inconsistent database provide incorrect or conflict information. When the redundancy is not controlled, there may be occasions on which the two entries about the same data do not agree (when one of them stores the updated information and the other does not). Data base said to be inconsistent. By controlling redundancy, the inconsistency is also be controlled. Even if there is some redundancy retained in the database due to some technical reasons, the database management system ensures that any change made to either of the two entries is automatically made to the other. This process is known as propagating up dates

n 3. Databases of data: n facilitate sharing Individual pieces of data in database may be shared by different users and users may have access to the same piece of data for different purposes. The database management system makes sure that data not only the existing application can may be satisfied without having to create any new stored files

4. Databases enforce standards The database management system can ensures that all the data (that is stored centrally ) follow the applicable standards. There may be certain standards laid by the company or organization using database. n Standards storing data formats is particularly desirable as an aid to data interchange or migration between system. n

5. Databases can ensure data security: The information stored inside a database is sometimes of great value to a corporations. Data Security: It refers to protection of data against accidental or intentional disclosure to unauthorized persons, or unauthorized modification or destruction. Privacy of Data : refers to the rights of individuals and organizations to determine for themselves when “how” and to what extent information about them is to be transmitted to others. 6. Integrity can be maintained through database: By integrated database we mean unification of several otherwise distinct data files, with any redundancy among those files partially or whole elimiminated.

DATABASE ABSTRACTION 1. A collection of interrelated files and a set of programs that allow users to access and modify these files is known as a data management system. 2. Database system is to provide the users only that much information that is required by them. 3. A database abstraction layer is an application programming interface which unifies the communication between a computer application and databases such as My. SQL, Postgre. SQL, Oracle or SQL. Traditionally, all database vendors provide their own interface tailored to their products which leaves it to the application programmer to implement code for all database interfaces he would like to support. Database abstraction layers reduce the amount of work by providing a consistent API to the developer and hide the database specifics behind this interface as much as possible. There exist many abstraction layers with different interfaces in numerous programming languages.

Database management system (DBMS) n n A collection of interrelated files and a set of programs that allow users to access and modify these files is known as a DBMS. End user is a person who is not a computer-trained person. An application system analyst is the one who is concerned about the all of the database of logical level. what are the relationships between the data-entities. Physical Storage system analyst is concerned with the physical implementation details of the database. i. e. How would the database the stored on which storage device. what will be the starting address of the database.

=>Physical storage system analyst n n It concerned with the physical implementation details of the database. i. e. how would the database be stored on which storage device? Levels of Database Implementation INTERNAL LEVEL (Physical Level): The lowest level of abstraction, the internal level, is the one closest to physical storage. This level is also sometimes termed as physical level. It describes how the data are stored on the storage medium

n CONCEPTUAL LEVEL: This level of describes what data are actually stored in the database. It also describes the relationships exists among data. The level are not concerned with how these logical in terms of simple data structure n EXTERNAL LEVEL : This is the level closest to the users and is concerned with the way in which the data are viewed by individual user

n Sales officer View 1 Item-Name Price Purchase Officer View 1 Item-Name Price Conceptual Level Item-number Character(6) Item-Name Characer(20) Internal Level External Level Application Programs are used to fetch the desired information Stored-Item Length=40 Item # Type=Byte(6), offset=0 Index=Ix Name Type=Byte(20), offset=6

Data Independence: The ability to modify a scheme definition in one level without affecting a scheme in the next higher level is called Data independence. n There are two Levels of data independence: 1. Physical Data Independence: The ability to change the physical schema without changing the logical schema is called physical data independence. For example, a change to the internal schema, such as using different file organization or storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual. 2. Logical. Data Independence: The ability to change the logical (conceptual) schema without changing the External schema (User View) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas. =>It more difficult to achieve logical data independence than physical data independence.

There are several models for databases: • Tabular ("flat file") – data in a single table, eg. spreadsheet • Hierarchical – e. g. . company departments • Relational – Each row is a record (patient, child etc. ) – Each column is an attribute (age, sex etc. )

=>Relational Data Model A relational database is a set of related tables, each concerning a specific topic. Its central idea was to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. n The data is organized into tables (i. e. row and columns). n Tables are called Relations. n A Row in a table represents a relationship among a set of values. since a table is a collection of such relationships

For example n Suppliers (Supp#, supp-name, Status, City) n Underlined fields indicate primary key Advantages: n – Reduction of data redundancy n – Eliminates inconsistencies (errors). n – High data integrity and quality. n – Data can be descriptive. n – Allows implementation of security. n

Primary-key is the key-field that can uniquely identify a row in a relation. n Suppliers Supp# Supp-Name Status City S 1 Britannia 10 Delhi S 2 New Bakery 30 Mumbai S 3 Mother Dairy 10 Delhi S 4 Cook 50 Bangalore S 5 Haldiram 40 Jaipur

n n Row of relations are known as tuples and columns are known as Attribute The relational data model is based on a collection of tables (relations)

Conceptual design Logical design Physical design

Logical Data Model Logical Process Model Track 01 Country TR Physical Implementation Process Database creation CREATE DATABASE CREATE TABLE LOAD

Database Planning Current systems evaluation Development of Standards Technological feasability Operational feasability Economical feasability

n n n Network Data Model: The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice. A record is a collection of field (attribute), each of which contains only one data values. Link mean that it is an association between precisely two records

S 1 Britannia 10 I 2 20 Cake I 3 30 Bread I 6 Cream Biscuit

ADVANTAGES: • Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. • Can handle most situations for modeling using record types and relationship types. • Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database. DISADVANTAGES: • Navigational and procedural nature of processing • Database contains a complex array of pointers that thread through a set of records. • Little scope for automated "query optimization”

Hierarchical Data Model A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent. All attributes of a specific record are listed under an entity type. Example: Serial Num Type User Emp. No 3009734 -4 Computer 100 10 -L 3 -23 -283742 Monitor 100 Schultz 20 -B 2 -22 -723423 Monitor 100 Drachmei er 20 -B 232342 Printer 100 Emp No First Name Last Name Dept. Num 100 Sally Baker 10 -L 101 Jack Douglas 102 Sarah 103 David

n Hierarchical Form (Supplier Superior to items) S 1 Britannia I 2 I 3 I 6 10 Cake Delhi 5. 00 Bread 9. 00 10 Cram Biscuit 10. 00 20 20

ADVANTAGES: • Hierarchical Model is simple to construct and operate on • Corresponds to a number of natural hierarchically organized domains - e. g. , assemblies in manufacturing, personnel organization in companies • Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT WITHIN PARENT etc. DISADVANTAGES: • Navigational and procedural nature of processing • Database is visualized as a linear arrangement of records • Little scope for "query optimization"

RELATIONAL MODEL A relational database matches data by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand. For example, a data set containing all the real-estate transactions in a town can be grouped by the year the transaction occurred; or it can be grouped by the sale price of the transaction; or it can be grouped by the buyer's last name; and so on.

Relational Model developed by E. F. Codd (IBM) n n Relation: A relation is a table i. e. data arranged in rows and columns. Domain : A domain is a pool of values from which the actual values appearing in a given column are drawn. A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. A domain is said to be if atomic domain are considered to be indivisible domain.

n n Tuple: The row of tables (relations). Attributes: The columns of tables (relations). The number of attributes in a relation is called Degree and the number of row in a relation is known as Cardinality. View: A view is a table that does not really exist in its own right but is instead derived from one or more underlying base table (s).

n The concept of views ‘name of view Select * from Items ‘ Name of base table where Price>12 ‘ the condition Create VIEW Good. Items AS

=>Good. Items (A virtual table based on Items table) Item# Item-Name Price I 1 Milk 15. 00 I 4 Milk Bread 14. 00 I 7 Ice Cream 16. 00 I 9 Namkeen 15. 00

n n n Primary Key: It is a set of one or more attributes that can uniquely identify tuples within the relation. Supp# is the primary key for suppliers. Primary key consists of more than one attribute , it is called composite-primary key. Candidate keys: All attribute combinations inside a relation that can serve as primary key are candidate keys as they are candidates for the primary key position. For Example: Supp# and supp-Name are two candidate key in the suppliers relation. and also Item# and Item-Name are candidate key.

n n In case of two or more candidate keys, only one of them serves as the primary key. the rest of them are alternates key. Alternate Key: A candidate key that is not the primary key is called an alternate key. In suppliers table, supp-name is the alternate key and in Item table Item. Name is the alternate key Key : A non-key attribute, whose values are derived from

n n n Foreign-key: A non –key attribute, whose values are derived from the primary key of some other table, is known as Foreign-key in its current table. Primary key of one table is a foreign key of other table. For Example:

Sid Sname Email 01 A A@gmail. com 02 B B@gmail. com Sid Total Fee Paid Fee. Bal 01 6000 4000 2000 02 6000 2000 4000

n n n Referential Integrity is a system of rules that a DBMS uses to ensure that relationships between records in related tables are valid, and that users don’t accidentally delete or change related data. Relational Algebra: The relational algebra is a collection of operations on relations. Each operation takes one or more relations as its operand(s) and produces another relation as its result. Select operation: The Select operation selects tuples(horizontal subset) from a relation that satisfy a given condition.

It is denoted by lower letter Greek letter Sigma ( σ ). n For Example : To select those tuples from Items relation where the price is more than 14. 00. We shall write; σ Price >14. 00(Items) Item# I 1 I 7 I 9 Item-Name Milk Ice Cream Namkeen Price 15. 00 16. 00 15. 00

n n i. iii. Project Operation: The project operation yields a “vertical” subset of a given relation. In Result, Duplicating tuples are automatically removed It is denoted by Greek letter Pi(л). For Example: л Supp-Name, city(Suppliers) Supp-Name Britannia New Bakery City Delhi Mother Dairy Delhi Cook Haldiram Bangalore Mumbai Jaipur

Supp# Supp. Name Status City S 1 Britanni 10 a Delhi S 2 New 30 Baker y Mumba i S 3 Mother Dairy 10 Delhi S 4 Cook 50 S 5 Haldira m 40 Bangal ore Jaipur

n n n Cartesian Product operation: The Cartesian product is a binary operation. It is denoted by a cross(X). The Cartesian product of two relations A and B is written as A X B. It produce a new relation which has a degree equal to the sum of the degree of the two relations operated upon. All tuples of first relation are concatenated with all the tuples of second relation to form the tuples of the new relation.

Student Stud# S 001 S 002 S 003 Stud. Name Suraj Preeti Asra Hosteler Y N N Instructor Inst# I 01 I 02 Inst-Name KVS, jabalpur KVS, kurnool Subject IP Math

Stud# Stud. Name Hosteler Inst# Inst. Name Subject S 001 Suraj Y I 01 KVS, IP jabalpur S 001 Suraj Y I 02 KVS Math kurnool S 002 Preeti N I 01 S 002 Preeti N I 02 S 003 Asra N I 01 S 003 Asra N I 02 KVS, jabalpur KVS kurnool IP Math

Union operation: The union operation is a binary operation that requires two relations as its operands. It produces a third relation that contains tuples from both the operand relations n It is denoted by U. n Union of two relations x and y is x U y n Condition for union operation 1. The relations A and B must be of the same degree. that is, they must have the same number of attribute. 2. The domains of the ith attributes of A and ith attribute of B must be the same. n

Examples Drama n Roll No 13 17 n Name Rush Swati Song Roll No 2 10 13 Age 15 14 Name Manya Rishabh Rush Result of Drama U Song will be Roll No 2 10 13 17 Name Manya Rishabh Rush Swati Age 15 15 13 14 A B Age 15 15 13

The Set Difference Operation: n The set difference operation gives tuples that are in one relation but not in another n It is denoted by –(minus). n The expression A-B results in a relation containing those tuples in A but in B. n Result of Drama – Song will be Roll No Name Age 17 Swati 14 A B

n n The set Intersection operation The set intersection operation finds tuples that are common to the two operand relations It is denoted by ∏. Result of Drama ∏ Song will be Roll No 13 Name kush Age 15

END

n n n The Join operation The join operation joins two relations to form a new relation on the basis of common column the two operand relations have. Suppliers Supp# Supp-Name Status City S 1 S 2 Britannia 10 New Bakery 30 Delhi Mumbai S 3 Mother Dairy 10 Delhi S 4 S 5 Cook Haldiram 50 40 Bangalore Jaipur

Clients relation n Clients# C 101 C 102 C 104 n Client-Name ABC Co. XYZ Co. Zigs Co. City Delhi Jaipur If tables suppliers and Clients are joined over City columns ; the results will be

Supp # Supp. Name Stat City u s Clients# Client. Name City S 1 Britannia 10 Delhi C 101 ABC Co. Delhi S 3 Mother Dairy 10 Delhi C 104 Zigs Co. Jaipur S 5 Haldiram 40 Jaipur C 102 XYZ Co. Jaipur S 5 Haldiram 40 C 104 Zigs Co. Jaipur

Type of Joins 1. Inner or equi –join: Inner join is used to give matching records from joined tables. n n Normalization: It is a process by which one goes from to another preferable form through a set of rule and observations. The objective of normalization on tables is to remove the redundancy of data. First normal form: A relation R is in First Normal form(1 NF) if and only if all underlying domains of the relation atomic (indivisible) values.

In every tuples of the relation R, no attribute should have repeating groups. n In every tuple of the relation R, each attribute must have a value and that too an atomic(indivisible) values n Method of Bringing Unnormalized Relation into 1 NF 1. Remove all repeating groups from the relation 2. Decompose non-atomic attributes to atomic attributes. n

Student Stud Name Subject # 101 Shobhit English Maths Biology 102 Upasana Economic s English Hindi Stud # 101 Name Subject Shobhit English Maths 101 Shobhit Biology 102 Upasana Econom ics 102 Upasana English 102 Upasana Hindi

Second Normal Form(2 NF) n n n Functional Dependence: Functional Dependence is a relationship that exists between any two fields. J K(K is functionally dependent on j) J L(L is not functionally dependent on j) J K L X 1 0 X 1 6 Y 4 1 Y 4 9 Z 3 5

n n Second Normal Form(2 NF): A relation R is in Second Normal form(2 NF) if and only if it is 1 NF and every non-key attribute is fully dependent on the primary key. An attribute is a non-key if it does not participate in the primary key. Partial dependence means a non-key attribute is dependent upon a portion of the compositeprimary- key and not the whole primary key. Fully functional dependency: A FD X Y is a fully FD if remove of any attribute A from X means that the dependency does not hold good any more.

n Teacher T-Name Subject Child Name Gurinder Comp. Sc Saran Child Age 5 Anand Child Sex F Electroni Vivek 7 cs Shridhar Physics Sandhya 4 M Maya Comp. Sc Vinayaka 7 M Maya Comp. Sc Sangeeta 5 F Maya Comp. Sc Ananth M 3 F

n Teacher Child T-Name Subject Gurinder Comp. Sc Anand Shridhar Maya Electronics Physics Comp. Sc TName Gurind er Anand Child Name Saran Child Age 5 Child Sex F Vivek 7 M Shridh ar Maya Sandh ya Vinaya ka Sange eta Ananth 4 F 7 M 5 F 3 M Maya

n n n The third Normal Form: A Relation R is said to be in Third Normal Form(3 NF) if and only if it is in 2 NF and every non-key attribute is non-transitively dependent upon the primary key Every non-key attribute depends upon the primary key. Teacher T-Name Gurinder Anand Shridhar Maya Subject Comp. Sc Electronics Physics Comp. Sc Classes 5 4 5 2

n Teacher T-Name Gurinder Anand Shridhar Maya Sbcl Subject Comp. Sc Electronics Physics Comp. Sc Classes 5 4 5 2

n Boyce- Codd Normal Form(BCNF): A relation is in BCNF if it is in 3 NF and all of its determinants (i. e. the attributes upon which other attributes depend) are candidate key (i. e. the attributes that posses the properties to be a primary key but could not have been chosen as keys).

Different Parts Of A Database: v. Fields – different types of data (number or text) v. Records v. Queries v. Reports

Data Mining Searching for novel patterns, rules or relationships in data, e. g. : • correlations • classification • clustering • visualization Versus traditional statistics: hypothesis testing States Data Mart - Cognos