b4e823e2024609fc87bed42d84260f20.ppt
- Number of slides: 35
CPT-S 580-06 Advanced Databases
Yinghui Wu
EME 49
ADB (ln 26)
CPT-S 580-08 Advanced Databases
DBMS and Cloud Computing
✓ Cloud computing: overview
✓ Database design in cloud
Cloud computing: concept
The Hype!
✓ Forrester in 2010: Cloud computing will go from $40.7 billion in 2010 to $241 billion in 2020.
✓ Gartner in 2009: Cloud computing revenue will soar faster than expected and will exceed $150 billion by 2013. It will represent 19% of IT spending by 2015.
✓ IDC in 2009: “Spending on IT cloud services will triple in the next 5 years, reaching $42 billion.”
✓ Companies and even federal/state governments are using cloud computing now: fedbizopps.gov
What is Cloud Computing?
✓ Cloud Computing is a general term for a new class of network-based computing that takes place over the Internet:
  – basically a step on from Utility Computing
  – a collection/group of integrated and networked hardware, software and Internet infrastructure (called a platform)
  – using the Internet for communication and transport, the platform provides hardware, software and networking services to clients
✓ These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).
What is Cloud Computing?
✓ In addition, the platform provides on-demand services that are always on, anytime and anywhere.
✓ Pay per use and as needed; elastic: scale up and down in capacity and functionality
✓ The hardware and software services are available to
  – the general public, enterprises, corporations and business markets
✓ A number of characteristics:
  – Remotely hosted: Services or data are hosted on remote infrastructure.
  – Ubiquitous: Services or data are available from anywhere.
  – Commodified: The result is a utility computing model similar to that of traditional utilities, like gas and electricity: you pay for what you use.
Cloud Architecture
Cloud Computing Characteristics
Common Characteristics: Massive Scale, Resilient Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
Essential Characteristics: On-Demand Self-Service, Broad Network Access, Resource Pooling, Rapid Elasticity, Measured Service
What is a Cloud?
✓ A single-site cloud (aka “Datacenter”) consists of
  – Compute nodes (grouped into racks)
  – Switches, connecting the racks
  – A network topology, e.g., hierarchical
  – Storage (backend) nodes connected to the network
  – Front-end for submitting jobs and receiving client requests
  – (Often called 3-tier architecture)
  – Software Services
✓ A geographically distributed cloud consists of
  – Multiple such sites
  – Each site perhaps with a different structure and services
On-demand access: *aaS Classification
[Figure: service layers with examples]
  – Software as a Service (SaaS): SalesForce CRM, LotusLive
  – Platform as a Service (PaaS): Google App Engine
  – Infrastructure as a Service (IaaS)
On-demand access: *aaS Classification
On-demand: renting a cab vs. (previously) renting a car, or buying one. E.g.:
  – AWS Elastic Compute Cloud (EC2): a few cents to a few $ per CPU hour
  – AWS Simple Storage Service (S3): a few cents to a few $ per GB-month
✓ HaaS: Hardware as a Service
  – You get access to barebones hardware machines and can do whatever you want with them. Ex: your own cluster
  – Not always a good idea because of security risks
✓ IaaS: Infrastructure as a Service
  – You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this (what's another way, e.g., using Linux?). Often said to subsume HaaS.
  – Ex: Amazon Web Services (AWS: EC2 and S3), Eucalyptus, RightScale, Microsoft Azure, Google Compute Engine
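The pay-per-use pricing above can be made concrete with a small calculation. This is only a sketch: the two rates are hypothetical placeholders picked to fall inside the "a few cents to a few $" range on the slide, not actual AWS prices, and `monthly_cost` is an illustrative helper.

```python
# Hypothetical rates, for illustration only (not real AWS prices).
CPU_HOUR_RATE = 0.10   # dollars per CPU hour (assumed)
GB_MONTH_RATE = 0.05   # dollars per GB-month of storage (assumed)


def monthly_cost(cpu_hours: float, stored_gb: float) -> float:
    """Return one month's pay-per-use bill under the assumed rates."""
    return cpu_hours * CPU_HOUR_RATE + stored_gb * GB_MONTH_RATE


# Two instances running around the clock for 30 days, plus 500 GB stored.
bill = monthly_cost(cpu_hours=2 * 24 * 30, stored_gb=500)
print(f"${bill:.2f}")
```

Halving the running hours halves the compute part of the bill, which is the "renting a cab" property: nothing is paid for capacity that is not used.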
On-demand access: *aaS Classification
✓ PaaS: Platform as a Service
  – You get access to flexible computing and storage infrastructure, coupled with a software platform (often tightly coupled)
  – Ex: Google's App Engine (Python, Java, Go)
✓ SaaS: Software as a Service
  – You get access to software services, when you need them. Often said to subsume SOA (Service Oriented Architectures).
  – Ex: Google Docs, MS Office on demand
Cloud computing: pros, cons and thoughts
Opportunities and Challenges
✓ The use of the cloud provides a number of opportunities:
  – It enables services to be used without any understanding of their infrastructure.
  – Cloud computing works using economies of scale:
    • It potentially lowers the outlay expense for start-up companies, as they would no longer need to buy their own software or servers.
    • Cost would be by on-demand pricing.
    • Vendors and service providers recover costs by establishing an ongoing revenue stream.
  – Data and services are stored remotely but accessible from “anywhere”.
Opportunities and Challenges
✓ In parallel there has been a backlash against cloud computing:
  – Use of cloud computing means dependence on others, and that could possibly limit flexibility and innovation:
    • The others are likely to become the bigger Internet companies like Google and IBM, who may monopolise the market.
    • Some argue that this use of supercomputers is a return to the time of mainframe computing that the PC was a reaction against.
  – Security could prove to be a big issue:
    • It is still unclear how safe outsourced data is, and ownership of data is not always clear when using these services.
  – There are also issues relating to policy and access:
    • If your data is stored abroad, whose policy do you adhere to?
    • What happens if the remote server goes down? How will you then access files?
    • There have been cases of users being locked out of accounts and losing access to data.
Design of scalable DBMS over cloud: transactions
Divy Agrawal, Sudipto Das, and Amr El Abbadi (VLDB 2010)
Design Principle (I)
✓ Separate System and Application State
  – System metadata is critical but small
  – Application data has varying needs
  – Separation allows use of different classes of protocols
Design Principle (II)
✓ Limit interactions to a single node
  – Allows systems to scale horizontally
  – Graceful degradation during failures
  – Obviates the need for distributed synchronization
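One common way to keep every request on a single node is to hash-partition the keys, so that all operations on a key are routed to exactly one node. The sketch below assumes that technique; the slides do not say which partitioning scheme is meant, and the node names and the `owner_of`, `put` and `get` helpers are hypothetical.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def owner_of(key: str) -> str:
    """Map a key to exactly one node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class Node:
    """A node that serves reads and writes from purely local state."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}


cluster = {name: Node() for name in NODES}


def put(key: str, value: str) -> None:
    cluster[owner_of(key)].data[key] = value  # touches one node only


def get(key: str):
    return cluster[owner_of(key)].data.get(key)  # touches one node only


put("user:42", "alice")
print(owner_of("user:42"), get("user:42"))
```

Because no operation spans nodes, adding nodes adds capacity without adding coordination, and losing one node only affects the keys it owns. Plain modulo hashing reshuffles most keys when the node count changes, so a real system would need a more careful mapping.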
Design Principle (III)
✓ Decouple Ownership from Data Storage
  – Ownership refers to exclusive read/write access to data
  – Partitioning ownership effectively partitions data
  – Decoupling allows lightweight ownership transfer
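A minimal sketch of why decoupling makes ownership transfer lightweight: if data lives in storage that every node can reach, handing a partition to another node is a metadata update and no rows are copied. The class and function names below are hypothetical and an in-memory dictionary stands in for the shared storage layer.

```python
class SharedStorage:
    """Stands in for a storage layer that every node can reach."""

    def __init__(self):
        self.partitions = {}  # partition id -> {key: value}


class OwnershipTable:
    """Records which node has exclusive read/write access to each partition."""

    def __init__(self):
        self.owner = {}  # partition id -> node name

    def transfer(self, partition, new_owner):
        # Metadata update only: the data itself stays where it is.
        self.owner[partition] = new_owner


def write(storage, table, node, partition, key, value):
    """Apply a write only if `node` currently owns the partition."""
    if table.owner.get(partition) != node:
        raise PermissionError(f"{node} does not own {partition}")
    storage.partitions.setdefault(partition, {})[key] = value


storage, table = SharedStorage(), OwnershipTable()
table.transfer("p1", "node-A")
write(storage, table, "node-A", "p1", "x", 1)
table.transfer("p1", "node-B")            # no rows are copied
write(storage, table, "node-B", "p1", "y", 2)
print(storage.partitions["p1"])
```

A real system would also have to make the ownership change itself safe under failures, which this sketch ignores.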
Design Principle (IV)
✓ Limited distributed synchronization is practical
  – Maintenance of metadata
  – Provide strong guarantees only for data that needs it
Two Approaches to Scalability
✓ Data Fusion
  – Enrich Key-Value stores
  – G-Store: Efficient Transactional Multi-key Access [ACM SoCC 2010]
✓ Data Fission
  – Cloud-enabled relational databases
  – ElasTraS: Elastic TranSactional Database [HotCloud 2009; Tech. Report 2010]
Data fusion
Atomic Multi-key Access [Das et al., ACM SoCC 2010]
✓ Key-Value stores:
  – Atomicity guarantees on single keys
  – Suitable for the majority of current web applications
✓ Many other applications need multi-key accesses:
  – Online multi-player games
  – Collaborative applications
✓ Enrich the functionality of Key-Value stores
Key Group Abstraction
✓ Define a granule of on-demand transactional access
✓ Applications select any set of keys to form a group
✓ Data store provides transactional access to the group
✓ Non-overlapping groups
Horizontal Partitions of the Keys
[Figure: the keys of a Key Group are located on different nodes. Group formation phase: a single node gains ownership of all keys in a KeyGroup.]
Key Grouping Protocol
✓ Conceptually akin to “locking”
✓ Allows collocation of ownership at the leader
✓ Leader is the gateway for group accesses
✓ “Safe” ownership transfer: deal with dynamics of the underlying Key-Value store
  – Data dynamics of the Key-Value store
  – Various failure scenarios
✓ Hides complexity from the applications while exposing a richer functionality
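The grouping idea can be sketched as a toy, single-process model: forming a group moves ownership of every member key to the leader key's node, overlapping groups are refused, and deleting the group hands ownership back. This is an illustration under simplifying assumptions, not the G-Store protocol itself; it has no messaging, logging, or failure handling, and `GroupingLayer` and its methods are hypothetical names.

```python
class GroupingLayer:
    """Toy model of key grouping (no messaging, logging, or failures)."""

    def __init__(self, home_node):
        self.home = dict(home_node)    # key -> node that stores the key
        self.owner = dict(home_node)   # key -> node currently holding ownership
        self.group_of = {}             # key -> id of the group it belongs to

    def create_group(self, group_id, leader_key, follower_keys):
        """Move ownership of all keys to the leader key's node, or refuse."""
        keys = [leader_key, *follower_keys]
        # Groups must not overlap: refuse if a key is unknown or already grouped.
        if any(k not in self.home or k in self.group_of for k in keys):
            return False
        leader_node = self.home[leader_key]
        for k in keys:
            self.group_of[k] = group_id
            self.owner[k] = leader_node
        return True

    def delete_group(self, group_id):
        """Dissolve a group and return ownership to each key's home node."""
        for k in [k for k, g in self.group_of.items() if g == group_id]:
            del self.group_of[k]
            self.owner[k] = self.home[k]


layer = GroupingLayer({"player:1": "n1", "player:2": "n2", "player:3": "n3"})
layer.create_group("game-7", "player:1", ["player:2"])
print(layer.owner["player:2"])
```

Once all keys of a group are owned by one node, a multi-key transaction on the group can run there without distributed synchronization, which is the locking-like effect the slide describes.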
Implementing G-Store
[Figure: Application Clients issue Transactional Multi-Key Access requests to a Grouping Middleware Layer resident on top of a Key-Value Store. Each node runs a Grouping Layer, a Transaction Manager, and the Key-Value Store Logic, all on top of Distributed Storage.]
Data fission
Elastic Transaction Management [Das et al., HotCloud 2009, UCSB TR 2010]
✓ Designed to make RDBMS cloud-friendly
✓ Database viewed as a collection of partitions
✓ Suitable for standard OLTP workloads:
  – Large single-tenant database instance
    • Database partitioned at the schema level
  – Multi-tenant with a large number of small databases
    • Each partition is a self-contained database
Elastic Transaction Management
✓ Elastic to deal with workload changes
✓ Dynamic load balancing of partitions
✓ Automatic recovery from node failures
✓ Transactional access to database partitions
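Load balancing and failure recovery over partitions can be sketched with a simple greedy policy. The slide does not describe the actual ElasTraS policy, so everything here is an assumption: `rebalance` moves one partition at a time from the busiest node to the idlest, `fail_over` hands a failed node's partitions to the survivors, and the node and partition names are hypothetical.

```python
def node_load(assignment, load):
    """Total load per node, given node -> [partitions] and partition -> load."""
    return {node: sum(load[p] for p in parts) for node, parts in assignment.items()}


def rebalance(assignment, load, max_moves=10):
    """Greedily move partitions from the busiest node to the idlest one."""
    for _ in range(max_moves):
        totals = node_load(assignment, load)
        busiest = max(totals, key=totals.get)
        idlest = min(totals, key=totals.get)
        gap = totals[busiest] - totals[idlest]
        # A move helps only if the partition's load is strictly between 0 and the gap.
        candidates = [p for p in assignment[busiest] if 0 < load[p] < gap]
        if not candidates:
            break
        # Pick the partition that leaves the two nodes closest to each other.
        part = min(candidates, key=lambda p: abs(gap - 2 * load[p]))
        assignment[busiest].remove(part)
        assignment[idlest].append(part)
    return assignment


def fail_over(assignment, load, failed):
    """Reassign a failed node's partitions to the least-loaded survivors."""
    orphans = assignment.pop(failed)
    for part in sorted(orphans, key=load.get, reverse=True):
        totals = node_load(assignment, load)
        assignment[min(totals, key=totals.get)].append(part)
    return assignment


load = {"P1": 5, "P2": 3, "P3": 2}
assignment = {"otm-1": ["P1", "P2", "P3"], "otm-2": []}
rebalance(assignment, load)
print(node_load(assignment, load))
```

Reassignment is cheap here for the same reason as in Design Principle (III): partitions are assumed to live in shared fault-tolerant storage, so only ownership moves. `fail_over` assumes at least one node survives.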
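[Figure: ElasTraS architecture. Application Clients (Application Logic, ElasTraS Client) send the DB Read/Write Workload to the OTMs. The TM Master handles Lease Management and Health and Load Management; clients reach it and the Metadata Manager through the Master Proxy and MM Proxy. Each OTM contains a Txn Manager and a Log Manager (Durable Writes) and serves DB Partitions P1, P2, ..., Pn, all on top of Distributed Fault-tolerant Storage.]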
What to optimize?
Feature                    Traditional   Cloud
Cost [$]                   fixed         optimize
Performance [tps, secs]    optimize      fixed
Scale-out [#cores]         optimize      fixed
Predictability [σ($)]      -             fixed
Consistency [%]            ???
Flexibility [#variants]    -             optimize
[Florescu & Kossmann, SIGMOD Record 2009]
Open Questions
✓ How to implement the storage layer?
✓ What is the right consistency model?
✓ What is the right programming model?
✓ Whether and how to make use of caching?
✓ How to balance functionality and scale?
✓ What are the right cloud abstractions?
✓ Cloud interoperability
✓ Moving beyond a single cloud
[Adapted from D. Kossmann's ICDE 2010 Keynote]
References
✓ [Cooper et al., ACM SoCC 2010] Benchmarking Cloud Serving Systems with YCSB, B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, R. Sears, ACM SoCC 2010
✓ [Brantner et al., SIGMOD 2008] Building a Database on S3, M. Brantner, D. Florescu, D. Graf, D. Kossmann, T. Kraska, SIGMOD 2008
✓ [Kraska et al., VLDB 2009] Consistency Rationing in the Cloud: Pay only when it matters, T. Kraska, M. Hentschel, G. Alonso, D. Kossmann, VLDB 2009
✓ [Lomet et al., CIDR 2009] Unbundling Transaction Services in the Cloud, D. Lomet, A. Fekete, G. Weikum, M. Zwilling, CIDR 2009
✓ [Das et al., HotCloud 2009] ElasTraS: An Elastic Transactional Data Store in the Cloud, S. Das, D. Agrawal, A. El Abbadi, USENIX HotCloud 2009
✓ [Das et al., ACM SoCC 2010] G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud, S. Das, D. Agrawal, A. El Abbadi, ACM SoCC 2010
✓ [Das et al., TR 2010] ElasTraS: An Elastic, Scalable, and Self Managing Transactional Database for the Cloud, S. Das, S. Agarwal, D. Agrawal, A. El Abbadi, UCSB Tech Report CS 2010-04
✓ [Yang et al., CIDR 2009] A scalable data platform for a large number of small applications, F. Yang, J. Shanmugasundaram, R. Yerneni, CIDR 2009
✓ [Kossmann et al., SIGMOD 2010] An Evaluation of Alternative Architectures for Transaction Processing in the Cloud, D. Kossmann, T. Kraska, S. Loesing, SIGMOD 2010
✓ [Aulbach et al., SIGMOD 2009] A Comparison of Flexible Schemas for Software as a Service, S. Aulbach, D. Jacobs, A. Kemper, M. Seibold, SIGMOD 2009
✓ [Aulbach et al., SIGMOD 2008] Multi-Tenant Databases for Software as a Service: Schema and Mapping Techniques, SIGMOD 2008
✓ [Weissman et al., SIGMOD 2009] The Design of the Force.com Multitenant Internet Application Development Platform, C. D. Weissman, S. Bobrowski, SIGMOD 2009
✓ [Jacobs et al., BTW 2007] Ruminations on Multi-Tenant Databases, D. Jacobs, S. Aulbach, BTW 2007
✓ [Chang et al., OSDI 2006] Bigtable: A Distributed Storage System for Structured Data, F. Chang et al., OSDI 2006
✓ [Cooper et al., VLDB 2008] PNUTS: Yahoo!'s hosted data serving platform, B. F. Cooper et al., VLDB 2008
✓ [DeCandia et al., SOSP 2007] Dynamo: Amazon's highly available key-value store, G. DeCandia et al., SOSP 2007