Chapter 17 Client-Server Processing, Parallel Database Processing, and Distributed Databases

Outline § Overview § Client-Server Database Architectures § Parallel Database Architectures § Architectures for Distributed Database Management Systems § Transparency for Distributed Database Processing § Distributed Database Processing 17 -2

Evolution of Distributed Processing and Distributed Data § Need to share resources across a network § Timesharing (1970s) § Remote procedure calls (1980s) § Client-server computing (1990s) 17 -3

Timesharing Network 17 -4

Simple Resource Sharing 17 -5

Client-Server Processing 17 -6

Distributed processing and data 17 -7

Motivation for Client-Server Processing § Flexibility: the ease of maintaining and adapting a system § Scalability: the ability to support scalable growth of hardware and software capacity § Interoperability: open standards that allow two or more systems to exchange and use software and data 17 -8

Motivation for Parallel Database Processing § Scaleup: increased work that can be accomplished § Speedup: decrease in time to complete a task § Availability: increased accessibility of system § Highly available: little downtime § Fault-tolerant: no downtime 17 -9
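
The two performance measures on this slide can be expressed as simple ratios. The sketch below assumes the common textbook definitions (speedup as elapsed time before versus after adding capacity, scaleup as work completed after versus before in the same time interval); the function names and the measurements are hypothetical.

```python
def speedup(time_before: float, time_after: float) -> float:
    """Ratio of elapsed time before adding capacity to elapsed time after."""
    return time_before / time_after


def scaleup(work_before: float, work_after: float) -> float:
    """Ratio of work completed after adding capacity to work completed before,
    measured over the same time interval."""
    return work_after / work_before


# Hypothetical measurements: after tripling the number of processors, a query
# drops from 120 s to 40 s and throughput rises from 500 to 1400 transactions
# per minute.
print(speedup(120.0, 40.0))    # 3.0
print(scaleup(500.0, 1400.0))  # 2.8
```

With three times the capacity, a speedup of 3.0 would be linear, while a scaleup of 2.8 falls slightly short of linear.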

Motivation for Distributed Data § Data control: locate data to match an organization’s structure § Communication costs: locate data close to data usage to lower communication cost and improve performance § Reliability: increase data availability by replicating data at more than one site 17 -10

Summary of Distributed Processing and Data 17 -11

Client-Server Database Architectures § Client-Server Architecture is an arrangement of components (clients and servers) among computers connected by a network. § A client-server architecture supports efficient processing of messages (requests for service) between clients and servers. 17 -12

Design Issues § Division of processing: the allocation of tasks to clients and servers. § Process management: interoperability among clients and servers and efficiently processing messages between clients and servers. • Middleware: software for process management 17 -13

Tasks to Distribute § Presentation: code to maintain the graphical user interface § Validation: code to ensure the consistency of the database and user inputs § Business logic: code to perform business functions § Workflow: code to ensure completion of business processes § Data access: code to extract data to answer queries and modify a database 17 -14

Middleware § A software component that performs process management. § Allows clients and servers to exist on different platforms. § Allows servers to efficiently process messages from a large number of clients. § Often located on a dedicated computer. 17 -15

Client-Server Computing with Middleware 17 -16

Types of Middleware § Transaction-processing monitors: relieve the operating system of managing database processes § Message-oriented middleware: maintain a queue of messages § Object-request brokers: provide a high level of interoperability and message intelligence § Data access middleware: provide a uniform interface to relational and nonrelational data using SQL 17 -17
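
As an illustration of the message-oriented middleware idea, the sketch below uses an in-process queue from the Python standard library to decouple a client from a server. It is a toy model, not a real middleware product: the function name `run_demo` and the message contents are hypothetical, and a production system would add persistence and network transport.

```python
import queue
import threading


def run_demo():
    requests = queue.Queue()  # stands in for the middleware's message queue
    replies = []

    def server():
        # The server drains the queue at its own pace.
        while True:
            msg = requests.get()
            if msg is None:  # sentinel: no more messages
                break
            replies.append(f"processed {msg}")

    worker = threading.Thread(target=server)
    worker.start()
    for msg in ["order-1", "order-2", "order-3"]:
        requests.put(msg)  # the client returns immediately after enqueueing
    requests.put(None)
    worker.join()
    return replies


print(run_demo())
```

Because the client only enqueues messages, it does not need the server to be ready at the moment each request is sent.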

Two-Tier Architecture 17 -18

Two-Tier Architecture § A PC client and a database server interact directly to request and transfer data. § The PC client contains the user interface code. § The server contains the data access logic. § The PC client and the server share the validation and business logic. 17 -19

Three-Tier Architecture (Middleware Server) 17 -20

Three-Tier Architecture (Application Server) 17 -21

Three-Tier Architecture § To improve performance, the three-tier architecture adds another server layer, either a middleware server or an application server. • The additional server software can reside on a separate computer. • Alternatively, the additional server software can be distributed between the database server and PC clients. 17 -22

Multiple-Tier Architecture § A client-server architecture with more than three layers: a PC client, a backend database server, an intervening middleware server, and application servers. § Provides more flexibility on division of processing § The application servers perform business logic and manage specialized kinds of data such as images. 17 -23

Multiple-Tier Architecture 17 -24

Multiple-Tier Architecture with Web Server 17 -25

Web Service Architecture § Generalizes multiple-tier architectures for electronic business commerce § Supports services provided/used by automated agents § Advantages • Deploy services faster • Communicate services in standard formats • Find services more easily 17 -26

Web Service Components 17 -27

Web Service Standards § HTTP, FTP, TCP/IP § Simple Object Access Protocol (SOAP): XML message sending § Web Service Description Language (WSDL) § Universal Description, Discovery, and Integration (UDDI) § Web Services Flow Language (WSFL) 17 -28
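
To make "XML message sending" concrete, the following sketch builds a minimal SOAP 1.1 envelope with the Python standard library. The operation name `GetOrderStatus` and its `OrderNo` parameter are hypothetical; a real service would define its operations in a WSDL document.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(operation, params):
    """Wrap a hypothetical operation call in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope)  # bytes, ready to send as a request body


message = build_request("GetOrderStatus", {"OrderNo": 1234})
print(message.decode())
```

The resulting bytes would typically be sent as the body of an HTTP POST to the service endpoint.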

Parallel DBMS § Uses a collection of resources (processors, disks, and memory) to perform work in parallel § Divides work among resources to achieve desired performance (scaleup and speedup) and availability § Uses a high-speed network, operating system, and storage system § Purchase decision involves more than the parallel DBMS 17 -29

Basic Architectures 17 -30

Clustering Architectures 17 -31

Design Issues § Load balancing: the clustered nothing (CN) architecture is most sensitive § Cache coherence: a problem for the clustered disk (CD) architecture § Interprocessor communication: the CN architecture is most sensitive § Application transparency: applications need no knowledge about parallelism 17 -32

Oracle Real Application Clusters 17 -33

Oracle RAC Features § Cache fusion to synchronize cache access § Query optimizer intelligence § Connection load balancing § Automatic failover § Comprehensive administration interface 17 -34

IBM DB2 SPF 17 -35

IBM SPF Features § Automatic or DBA-determined partitioning § Query optimizer intelligence § High scalability § Partitioned log parallelism 17 -36
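
The slide does not say how rows are assigned to partitions. One common approach is hash partitioning on a key column, sketched below as a generic illustration rather than as IBM's actual algorithm. The column name `CustNo`, the node count, and both function names are assumptions.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Map a partitioning-key value to a node with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def partition(rows, key_column, node_count):
    """Group rows by the node that owns each row's partitioning key."""
    placement = defaultdict(list)
    for row in rows:
        node = node_for(str(row[key_column]), node_count)
        placement[node].append(row)
    return dict(placement)


# Hypothetical customer rows spread over three nodes.
rows = [{"CustNo": n, "Name": f"cust-{n}"} for n in range(1, 9)]
for node, owned in sorted(partition(rows, "CustNo", 3).items()):
    print(node, [r["CustNo"] for r in owned])
```

Because the hash is deterministic, any node can compute where a row lives without consulting a central directory.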

Distributed Database Architectures § DBMSs need fundamental extensions. § Underlying the extensions are a different component architecture and a different schema architecture. § Component Architecture manages distributed database requests. § Schema Architecture provides additional layers of data description. 17 -37

Global Requests 17 -38

Component Architecture 17 -39

Schema Architecture I 17 -40

Schema Architecture II 17 -41

Distributed Database Transparency § Transparency is related to data independence. § With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions. § Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries. 17 -42

Motivating Example 17 -43

Fragments Based on the CustRegion Column 17 -44

Fragments Based on the WareHouseNo Column 17 -45

Fragmentation Transparency § Fragmentation transparency provides the highest level of data independence. § Users formulate queries and transactions without knowledge of fragments, locations, or local formats. § If fragments change, queries and transactions are not affected. 17 -46

Location Transparency § Location transparency provides a lesser level of data independence than fragmentation transparency. § Users need to reference fragments in formulating queries and transactions. § However, knowledge of locations and local formats is not necessary. 17 -47

Local Mapping Transparency § Local mapping transparency provides a lesser level of data independence than location transparency. § Users need to reference fragments at sites in formulating queries and transactions. § However, knowledge of local formats is not necessary. 17 -48
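
The three transparency levels differ in what a query must name. The sketch below imitates them with SQLite, using two attached in-memory databases as stand-ins for sites. The site names (`denver`, `tulsa`), the fragment names, and the sample rows are hypothetical, and SQLite is only a convenient local substitute for a distributed DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for two sites (hypothetical names).
conn.execute("ATTACH DATABASE ':memory:' AS denver")
conn.execute("ATTACH DATABASE ':memory:' AS tulsa")
conn.execute("CREATE TABLE denver.WesternCustomer (CustNo INTEGER, CustRegion TEXT)")
conn.execute("CREATE TABLE tulsa.EasternCustomer (CustNo INTEGER, CustRegion TEXT)")
conn.execute("INSERT INTO denver.WesternCustomer VALUES (1, 'West'), (2, 'West')")
conn.execute("INSERT INTO tulsa.EasternCustomer VALUES (3, 'East')")
# A TEMP view may span attached databases; it plays the role of the global table.
conn.execute(
    "CREATE TEMP VIEW Customer AS "
    "SELECT * FROM denver.WesternCustomer "
    "UNION ALL SELECT * FROM tulsa.EasternCustomer"
)

queries = {
    # Fragmentation transparency: only the global table is named.
    "fragmentation": "SELECT CustNo FROM Customer WHERE CustRegion = 'West'",
    # Location transparency: the fragment is named, but not its site.
    "location": "SELECT CustNo FROM WesternCustomer",
    # Local mapping transparency: the fragment and its site are both named.
    "local mapping": "SELECT CustNo FROM denver.WesternCustomer",
}
for level, sql in queries.items():
    print(level, sorted(row[0] for row in conn.execute(sql)))
```

All three queries return the same customers. What changes is how much of the distribution the query text depends on, and therefore how many queries must be rewritten when fragments or sites change.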

Oracle Distributed Databases § Homogeneous and heterogeneous distributed databases § Emphasis on site autonomy § Provides local mapping transparency § Each site is a separately managed database. 17 -49

Oracle Links § One-way link from local to remote § Support remote access to other users’ objects § Necessary to have knowledge of remote database objects § Use synonyms and views with links to reduce remote database knowledge 17 -50

Distributed Database Processing § Distributed data adds considerable complexity to query processing and transaction processing. § Distributed database processing involves movement of data, remote processing, and site coordination. § Performance implications sometimes cannot be hidden. 17 -51

Distributed Query Processing § Involves both local (intrasite) and global (intersite) optimization. § Multiple optimization objectives § The weighting of communication costs versus local processing costs depends on network characteristics. § There are many more possible access plans for a distributed query. 17 -52
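
The point about weighting communication costs against local processing costs can be illustrated with a toy cost model. Everything in the sketch below is an assumption made for illustration: the linear cost formula, the plan names, and the numbers. Real optimizers use far richer models.

```python
def plan_cost(local_cost, bytes_shipped, cost_per_byte):
    """Total cost = local processing cost + weighted communication cost."""
    return local_cost + bytes_shipped * cost_per_byte


def cheapest(plans, cost_per_byte):
    """Return the name of the plan with the lowest total cost."""
    best = min(
        plans,
        key=lambda p: plan_cost(p["local_cost"], p["bytes_shipped"], cost_per_byte),
    )
    return best["name"]


# Two hypothetical access plans for the same distributed join.
plans = [
    {"name": "ship whole table, join locally", "local_cost": 10.0, "bytes_shipped": 5_000_000},
    {"name": "filter remotely, ship matches", "local_cost": 40.0, "bytes_shipped": 200_000},
]
print(cheapest(plans, cost_per_byte=1e-4))  # slow network: filter remotely, ship matches
print(cheapest(plans, cost_per_byte=1e-7))  # fast network: ship whole table, join locally
```

The same pair of plans ranks differently once the per-byte weight changes, which is why the weighting depends on network characteristics.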

Distributed Transaction Processing § Distributed DBMS provides concurrency and recovery transparency. § Independently operating sites must be coordinated. § New kinds of failures exist because of the communication network. § New protocols are necessary. 17 -53

Distributed Concurrency Control § The simplest scheme involves centralized coordination. § Centralized coordination involves the fewest messages and the simplest deadlock detection. § The number of messages can be twice as many in distributed coordination. § The primary copy protocol is used to reduce the overhead of locking multiple copies. 17 -54

Centralized Coordination 17 -55

Distributed Recovery Management § Distributed DBMSs must contend with failures of communication links and sites. § Detecting failures involves coordination among sites. § The recovery manager must ensure that different parts of a partitioned network act in unison. § The protocol for distributed recovery is the two-phase commit protocol (2PC). 17 -56

Voting and Decision Phases 17 -57
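
The voting and decision phases of two-phase commit can be sketched as follows. This is a simplified, in-memory illustration under strong assumptions: there are no timeouts, log writes, or failure handling, and the class and function names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Voting phase: a real site would force log records before voting.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Voting phase: the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Decision phase: the coordinator sends the same decision to every site.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


sites = [Participant("Denver"), Participant("Tulsa")]
print(two_phase_commit(sites), [p.state for p in sites])
sites = [Participant("Denver"), Participant("Tulsa", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
```

A single negative vote forces every site to abort, which is how the protocol keeps independently operating sites in agreement.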

Summary § Utilizing distributed processing and data can significantly improve DBMS services but at the cost of new design challenges. § Client-server architectures provide alternatives among cost, complexity, and benefit levels. § Parallel database processing provides improved performance (speedup and scaleup) and availability. § Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence. 17 -58