02cc71574de583cc7faa0642ddd4db86.ppt
- Number of slides: 51
S3: A Secure Scalability Service for Dynamic Content. Bruce Maggs, Carnegie Mellon University and Akamai Technologies. Joint work with Charlie Garrod, Amit Manjhi, Natassa Ailamaki, Phil Gibbons, Todd Mowry, Chris Olston, and Anthony Tomasic.
The number of requests a website receives is unpredictable • CNN, NY Times, and ABC News were unavailable from 9-10 AM (Eastern Time) [Figure: CNN.com page views/day (in millions)] • Content providers' dilemma: how many resources to provision? • Need on-demand scalability
Content Delivery Network (CDN) Solution [Figure: CNN.com page views/day (in millions)] • Page was 1.2 KB instead of 50 KB on 12 Sep 2001 • Used Akamai on Election Day • Sources: http://www.tcsa.org/lisa2001/cnn.txt and http://www.akamai.com/en/html/about/press479.html
Typical Web-Site Architecture [Figure: users send a request to the home server, where the web server and app server execute code and access the DB, then return the response]
CDN Architecture [Figure: users reach CDN nodes at the edge of the Internet core; content providers sit behind the core] • CDNs excel at delivering static content.
Advantages of CDNs • Large infrastructure handles load spikes • Clients charged on a per-usage basis – no need to guess what resources to provision • Moves data closer to end-users – decreases latency and increases throughput
CDN Application Services • CDNs can also run applications [Figure: users reach CDN-hosted applications over the Internet, all backed by a single DB] • But for data-intensive dynamic applications, the database server becomes the bottleneck!
Methods to scale the database component • In-house database scalability: [DBCache, DBProxy, MTCache, NEC Cache Portal] – Must provision for peak load • Database outsourcing: Database as a service [Hacigumus+ ICDE ’02, SIGMOD ’02] – Have to cede control of data • Database Scalability Service (DBSS): Shared infrastructure that caches applications' data [INRIA/LIP6, CIDR ’05, SIGMOD ’06, ICDE ’07]
S3 Database Scalability Service • CDN-like proxy nodes cache results of database queries – reduces load on central database servers • All database updates sent to central server – clients don't cede ownership of their data • Uses publish/subscribe system to maintain data consistency – avoids additional load at the central server • Content provider may encrypt database requests/responses to protect sensitive data
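A minimal sketch of the proxy-node role described above, assuming an in-memory SQLite database stands in for the home server and that a cache key is simply the (SQL text, parameters) pair. `HomeServer`, `ProxyNode`, and the `inv` table contents are hypothetical names for illustration; the real S3 proxy was a Java cache, not this code. Reads are served from the proxy when possible, while every update goes to the home server, which remains the owner of the data.

```python
import sqlite3


class HomeServer:
    """Hypothetical home server holding the master copy of the data (in-memory SQLite)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE inv (id INTEGER PRIMARY KEY, qty INTEGER)")
        self.db.executemany("INSERT INTO inv VALUES (?, ?)", [(29, 10), (30, 5)])
        self.queries_executed = 0

    def query(self, sql, params):
        self.queries_executed += 1
        return self.db.execute(sql, params).fetchall()

    def update(self, sql, params):
        self.db.execute(sql, params)
        self.db.commit()


class ProxyNode:
    """Hypothetical DBSS proxy: caches query results, sends every update to the home server."""

    def __init__(self, home):
        self.home = home
        self.cache = {}  # (sql, params) -> cached result rows

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.cache:  # miss: one request to the home server
            self.cache[key] = self.home.query(sql, params)
        return self.cache[key]     # hit: no load on the home server

    def update(self, sql, params=()):
        # Updates are never applied locally; the home server keeps ownership of the data.
        self.home.update(sql, params)

    def invalidate(self, sql, params=()):
        # Invoked when a consistency notification arrives (e.g. via publish/subscribe).
        self.cache.pop((sql, tuple(params)), None)


home = HomeServer()
proxy = ProxyNode(home)
Q = "SELECT qty FROM inv WHERE id = ?"
first = proxy.query(Q, (29,))   # miss: fetched from the home server
second = proxy.query(Q, (29,))  # hit: served by the proxy
proxy.update("UPDATE inv SET qty = ? WHERE id = ?", (7, 29))
proxy.invalidate(Q, (29,))      # stands in for a notification from another proxy
third = proxy.query(Q, (29,))   # miss again: fresh value from the home server
```

Only two of the three reads reach the home server; how the `invalidate` call gets triggered is the job of the consistency mechanism.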
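Database Scalability Service [Figure: users (a Content Delivery Network) access the DBSS over the Internet; the DBSS fronts the home server databases]
Database Scalability Service [Figure: users (web and application servers) access the DBSS over the Internet; the DBSS fronts the home server databases]
Database Scalability Service [Figure: client apps access the DBSS over the Internet; the DBSS fronts the home server databases]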
Outline • Need for on-demand scalability • S3 invalidation mechanism • Security-scalability tradeoff • Reducing latency
Addressing consistency • TTL is wasteful: – Often refreshes cached data unnecessarily (workloads are dominated by reads) – Must set TTL=0 for strong consistency! • Solution: update or invalidate cached data only when it is affected by updates – Naïve approach: home organizations notify proxy servers of relevant updates (not scalable) • Our approach: a fully distributed, proxy-to-proxy update notification mechanism
Distributed Consistency Mechanism [Figure: a user's update arrives at a proxy node, which sends an update notification through the multicast environment to the other proxy nodes] • Distributed app-level multicast environment, e.g., Scribe • Forward all updates to backend home servers
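The proxy-to-proxy notification path can be sketched with an in-process publish/subscribe bus standing in for Scribe. This is an illustration under stated assumptions, not Scribe's API: `MulticastBus`, `ProxyNode`, and the single channel name `"inv"` are hypothetical, delivery is assumed synchronous and reliable, and a Python list stands in for the home server's update log.

```python
from collections import defaultdict


class MulticastBus:
    """In-process stand-in for an application-level multicast system such as Scribe.
    Assumption: delivery is synchronous and reliable, unlike a real overlay network."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # channel -> subscribed proxy nodes

    def subscribe(self, channel, node):
        self.subscribers[channel].add(node)

    def unsubscribe(self, channel, node):
        self.subscribers[channel].discard(node)

    def publish(self, channel, message):
        for node in list(self.subscribers[channel]):  # copy: handlers may unsubscribe
            node.on_notification(channel, message)


class ProxyNode:
    def __init__(self, name, bus, home_updates):
        self.name = name
        self.bus = bus
        self.home_updates = home_updates  # list standing in for the backend home server
        self.cache = {}                   # channel -> cached result

    def cache_result(self, channel, result):
        self.cache[channel] = result
        self.bus.subscribe(channel, self)

    def issue_update(self, channel, update):
        self.home_updates.append(update)   # forward the update to the home server
        self.bus.publish(channel, update)  # proxy-to-proxy notification

    def on_notification(self, channel, update):
        self.cache.pop(channel, None)      # invalidate the affected cached data
        self.bus.unsubscribe(channel, self)


bus = MulticastBus()
home_updates = []
a, b, c = (ProxyNode(name, bus, home_updates) for name in "abc")
a.cache_result("inv", [(10,)])
b.cache_result("inv", [(10,)])
c.issue_update("inv", "UPDATE inv SET qty = 7 WHERE id = 29")
```

The home server only receives the update itself; fanning out the notification to proxies `a` and `b` is handled entirely by the multicast layer.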
Configuring Multicast Channels • Key observation: Web applications typically interact with the DB via a small, fixed set of query/update templates (usually 10-100) • Example: SELECT qty FROM inv WHERE id = ? and UPDATE inv SET qty = ? WHERE id = ? • Templates: a natural way to configure channels • Options: channel-by-query or channel-by-update
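Because the templates are fixed, conflicts between a query template and an update template can be decided once, ahead of time. The sketch below assumes each template is annotated by hand with its table and the columns it reads or writes (no SQL parser), and uses a deliberately conservative rule: it may report a conflict that never materializes, but never misses one. `QueryTemplate`, `UpdateTemplate`, `may_conflict`, and the `name` column on `inv` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    sql: str
    table: str
    columns_read: frozenset      # columns in the SELECT list and WHERE clause


@dataclass(frozen=True)
class UpdateTemplate:
    sql: str
    table: str
    kind: str                    # "UPDATE", "INSERT" or "DELETE"
    columns_written: frozenset   # empty for INSERT/DELETE (whole rows change)


def may_conflict(q, u):
    """Conservative static test: can some instance of u change some result of q?"""
    if q.table != u.table:
        return False
    if u.kind in ("INSERT", "DELETE"):
        return True              # adding or removing rows can change any query on the table
    return bool(q.columns_read & u.columns_written)


Q_QTY = QueryTemplate("SELECT qty FROM inv WHERE id = ?", "inv", frozenset({"qty", "id"}))
# Hypothetical second query template reading a column that U_QTY never writes.
Q_NAME = QueryTemplate("SELECT name FROM inv WHERE id = ?", "inv", frozenset({"name", "id"}))
U_QTY = UpdateTemplate("UPDATE inv SET qty = ? WHERE id = ?", "inv", "UPDATE", frozenset({"qty"}))
U_DEL = UpdateTemplate("DELETE FROM inv WHERE id = ?", "inv", "DELETE", frozenset())
```

With only 10-100 templates, evaluating this test for every (query, update) pair is cheap, and the result can drive either channel configuration.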
Channel-by-Query Option • One channel per query template Q: C(Q) • Begin caching result(s) of query template Q → subscribe to C(Q) • Evict the only remaining query result for Q → unsubscribe from C(Q) • Issue update → determine which query templates Q1, …, Qn are affected; send a notification on each C(Qi) • Few subscriptions per cached result • Many invalidation notifications per update • Conflicts determined lazily (upon update)
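The channel-by-query bookkeeping can be simulated centrally as below, assuming the update-to-query conflict relation `AFFECTS` has already been computed from the templates. The class, the proxy names, and the relation itself are hypothetical; a real deployment would publish on a multicast overlay rather than loop over a dictionary. Note how one update produces one notification per affected query template.

```python
from collections import defaultdict

# Hypothetical conflict relation, worked out statically from the templates:
# update template -> query templates whose results it may change.
AFFECTS = {"U1": {"Q1", "Q2"}, "U2": {"Q3"}}


class ChannelByQuery:
    """Channel-by-query: one channel C(Q) per query template; conflicts found at update time."""

    def __init__(self):
        self.subs = defaultdict(set)    # "C(Q)" -> proxies caching some result of Q
        self.cached = defaultdict(set)  # (proxy, Q) -> parameter tuples cached there
        self.notifications_sent = 0

    def cache_result(self, proxy, q, params):
        if not self.cached[(proxy, q)]:           # first cached result of Q at this proxy
            self.subs[f"C({q})"].add(proxy)
        self.cached[(proxy, q)].add(params)

    def evict(self, proxy, q, params):
        self.cached[(proxy, q)].discard(params)
        if not self.cached[(proxy, q)]:           # last cached result of Q is gone
            self.subs[f"C({q})"].discard(proxy)

    def issue_update(self, u):
        notified = set()
        for q in AFFECTS.get(u, ()):              # lazily determine Q1, ..., Qn
            self.notifications_sent += 1          # one notification per affected C(Qi)
            for proxy in self.subs[f"C({q})"]:
                self.cached[(proxy, q)].clear()   # drop every cached result of Qi
                notified.add(proxy)
            self.subs[f"C({q})"].clear()          # nothing left to watch on this channel
        return notified


m = ChannelByQuery()
m.cache_result("p1", "Q1", ("GI Joe",))
m.cache_result("p2", "Q2", (5,))
m.cache_result("p3", "Q3", (42,))
notified = m.issue_update("U1")
```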
Channel-by-Update Option • One channel per update template U: C(U) • Begin caching result(s) of query template Q → determine which update templates U1, …, Un apply; subscribe to each C(Ui) • Evict the only remaining query result for Q → unsubscribe from all C(Ui) above • Issue update using template U → send a notification on C(U) • Many subscriptions per cached result • Few invalidation notifications per update • Conflicts determined eagerly (when caching Q)
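The mirror-image option, under the same assumptions (a precomputed, hypothetical conflict relation, here keyed by query template, and a centralized simulation instead of a multicast overlay). Caching one result of Q1 costs two subscriptions because two update templates may affect it, but issuing an update costs exactly one notification. For brevity this sketch omits eviction and does not unsubscribe after an invalidation; a fuller version would drop a subscription once no cached result at that proxy needs it.

```python
from collections import defaultdict

# Hypothetical conflict relation, worked out statically from the templates:
# query template -> update templates that may change its results.
AFFECTED_BY = {"Q1": {"U1", "U3"}, "Q2": {"U1"}, "Q3": {"U2"}}


class ChannelByUpdate:
    """Channel-by-update: one channel C(U) per update template; conflicts found at caching time."""

    def __init__(self):
        self.subs = defaultdict(set)    # "C(U)" -> subscribed proxies
        self.cached = defaultdict(set)  # (proxy, Q) -> parameter tuples cached there
        self.subscriptions_made = 0
        self.notifications_sent = 0

    def cache_result(self, proxy, q, params):
        if not self.cached[(proxy, q)]:
            for u in AFFECTED_BY.get(q, ()):      # eagerly find U1, ..., Un for Q
                self.subs[f"C({u})"].add(proxy)
                self.subscriptions_made += 1
        self.cached[(proxy, q)].add(params)

    def issue_update(self, u):
        self.notifications_sent += 1              # exactly one notification, on C(U)
        notified = set(self.subs[f"C({u})"])
        for (proxy, q), params in self.cached.items():
            if proxy in notified and u in AFFECTED_BY.get(q, ()):
                params.clear()                    # receiver drops the affected results
        return notified


m = ChannelByUpdate()
m.cache_result("p1", "Q1", ("GI Joe",))
m.cache_result("p2", "Q3", (42,))
notified = m.issue_update("U1")
```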
Parameter-Specific Channels • Optimization: consider parameter bindings supplied at runtime … for example: • Q5: SELECT qty FROM inv WHERE id = ? – When issued with id = 29, create an extra parameter-specific channel C(5, 29) – Subscribe to both C(5) and C(5, 29) • Upon update: – If the update affects a single item with id = X, send a notification on channel C(5, X) • Saves work if X ≠ 29 – Updates affecting multiple items are sent to C(5)
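A small sketch of the optimization for the slide's Q5, assuming channels are keyed by tuples, so `(5,)` plays the role of C(5) and `(5, x)` the role of C(5, x). `ParameterChannels` and its method names are hypothetical. A single-item update with id 31 reaches nobody, an update to id 29 reaches only the proxy that cached id 29, and a multi-item update falls back to the general channel.

```python
from collections import defaultdict


class ParameterChannels:
    """Parameter-specific channels for Q5: SELECT qty FROM inv WHERE id = ?"""

    def __init__(self):
        self.subs = defaultdict(set)    # channel key -> proxies; (5,) is C(5), (5, x) is C(5, x)
        self.cached = defaultdict(set)  # proxy -> ids whose Q5 result is cached

    def cache_q5(self, proxy, item_id):
        self.subs[(5,)].add(proxy)             # general channel C(5)
        self.subs[(5, item_id)].add(proxy)     # extra channel C(5, item_id)
        self.cached[proxy].add(item_id)

    def update_single_item(self, item_id):
        """Update touching one item: notify only subscribers of C(5, item_id)."""
        receivers = set(self.subs[(5, item_id)])
        for proxy in receivers:
            self.cached[proxy].discard(item_id)
        return receivers

    def update_many_items(self):
        """Update touching many items: notify C(5); receivers drop all Q5 results."""
        receivers = set(self.subs[(5,)])
        for proxy in receivers:
            self.cached[proxy].clear()
        return receivers


pc = ParameterChannels()
pc.cache_q5("p1", 29)
pc.cache_q5("p2", 30)
```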
S3 Prototype • Tomcat as proxy web server/servlet container • Proxy database cache written in Java • Queries: access cached data when possible – Cache JDBC query results (i.e., materialized views) – Index results by JDBC query representation • MySQL 4 as back-end database • Updates: sent to back-end database • Invalidation notifications delivered via Scribe • Experiments on Emulab (Utah) – Thanks!
Benchmark Applications • Bookstore (TPC-W, from UW-Madison) – Online bookseller, a standard web benchmark – Changed the popularity of books • Auction (RUBiS, from Rice) – Modeled after eBay • Bulletin board (RUBBoS, from Rice) – Modeled after Slashdot • Benchmarks model popular websites
Selective: cache queries only if subscribed to parameter-dependent groups
Impact of Cooperative Caching
Guaranteeing security in a DBSS setting • Limit the ability to observe an application's data by: – the DBSS administrator – unauthorized applications, through the DBSS • Security-scalability tradeoff in the DBSS setting • Analyzing the code helps in managing this tradeoff
A simple solution for guaranteeing security • Outsource database scalability – Home server: holds master copies of all data and handles updates directly • No query execution on the DBSS – DBSS caches query results (read-only), kept consistent by invalidation • All data passing through the DBSS can be encrypted: queries, updates, query results
A Simple Example • Home server database: table toys(toy_id, toy_name) with rows (11, Barbie) and (15, GI Joe) • Q1: SELECT toy_id FROM toys WHERE toy_name='GI Joe' • U1: DELETE FROM toys WHERE toy_id=5 • Nothing is encrypted: the DBSS sees that Q1's cached result is toy_id=15, so U1 (toy_id=5) causes no invalidation • Results are encrypted: the DBSS cannot read Q1's result, so it must invalidate it • More encryption leads to more invalidations
Challenge: providing scalability while guaranteeing security • When updates occur, the DBSS needs to invalidate affected cached results • The application faces a dilemma over what data to encrypt (secure): – More encryption: conservative invalidation, favoring security – Less encryption: precise invalidation, favoring scalability • Security-scalability tradeoff
Opportunity for managing the tradeoff • Not all data is equally sensitive: – Completely insensitive (e.g., bestsellers list): don't care – Moderately sensitive (e.g., inventory records, customer records): care, but worried about the scalability impact – Extremely sensitive (e.g., credit card information): secure at all costs • But for most data, it is nontrivial to assess: 1. data sensitivity 2. the scalability impact of securing the data
Key Insight: arbitrary queries and updates are not possible • function get_toy_id($toy_name) { $template := "SELECT toy_id FROM toys WHERE toy_name=?"; $query := attach_to_template($template, $toy_name); execute($query); … } • Given templates, we can statically identify data not needed for precise invalidation
Data not useful for invalidation: examples • Example 1: Q1: SELECT toy_id FROM toys WHERE toy_name=? Q2: SELECT toy_name FROM toys WHERE toy_id=? – No data is needed for precise invalidation • Example 2: Q1: SELECT toy_id FROM toys WHERE toy_name=? U1: DELETE FROM toys WHERE toy_id=? – Query parameters are not needed for precise invalidation (the query result is needed, though)
Security without hurting scalability • Data not needed for invalidation can be secured "for free" (without hurting scalability): Security Conscious Scalability Approach [SIGMOD ’06] • As a result, the tradeoff only has to be managed over the remaining data
Sample experiment: methodology • Scalability: max # of concurrent users with acceptable response times • Security: # of templates with encrypted results [Figure: users connect to the CDN and DBSS with 5 ms latency; the CDN and DBSS connect to the home server with 100 ms latency] • California privacy law determined which data is sensitive • Non-transactional invalidation • Start with a cold cache
Security-Scalability Tradeoff • Q1: SELECT toy_id FROM toys WHERE toy_name=? • Q2: SELECT qty FROM toys WHERE toy_id=? • Q3: SELECT cust_name FROM customers WHERE cust_id=? • U1: DELETE FROM toys WHERE toy_id=5 • Invalidations caused by U1, from most secure to most scalable: – Blind (nothing visible): all Q1, Q2, Q3 – Template (template visible): all Q1, Q2 – Statement (template and parameters visible): all Q1; Q2 with toy_id=5 – View (template, parameters, and query result visible): Q1 with toy_id=5; Q2 with toy_id=5
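The four invalidation strategies can be expressed as one decision function over a cached entry. This is a sketch under stated assumptions: the cache contents (a Q1 result of toy_id 15, two Q2 instances, one Q3 instance) and the helper names `CacheEntry`, `must_invalidate`, and `evicted` are hypothetical, and the logic is hard-wired to the slide's U1 rather than being a general invalidation engine. Each step up in visibility lets the DBSS rule out more entries.

```python
from dataclasses import dataclass

UPDATE_TABLE, UPDATE_TOY_ID = "toys", 5                  # U1: DELETE FROM toys WHERE toy_id=5
TABLE = {"Q1": "toys", "Q2": "toys", "Q3": "customers"}  # table read by each template


@dataclass
class CacheEntry:
    template: str   # "Q1", "Q2" or "Q3"
    params: tuple   # bound parameters of this query instance
    result: list    # result rows


def must_invalidate(entry, level):
    """Does U1 force eviction of `entry` when the DBSS sees only what `level` exposes?"""
    if level == "blind":                       # nothing visible: every result is suspect
        return True
    if TABLE[entry.template] != UPDATE_TABLE:  # templates visible: other tables are safe
        return False
    if level == "template":
        return True
    if entry.template == "Q2":                 # parameters visible: Q2 is keyed on toy_id
        return entry.params == (UPDATE_TOY_ID,)
    if level == "statement":                   # Q1's parameter (toy_name) says nothing about toy_id
        return True
    return any(row[0] == UPDATE_TOY_ID for row in entry.result)  # "view": inspect Q1's result


cache = [
    CacheEntry("Q1", ("GI Joe",), [(15,)]),
    CacheEntry("Q2", (5,), [(3,)]),
    CacheEntry("Q2", (15,), [(8,)]),
    CacheEntry("Q3", (1,), [("Ann",)]),
]


def evicted(level):
    """Indexes of cache entries that U1 forces out at the given visibility level."""
    return [i for i, entry in enumerate(cache) if must_invalidate(entry, level)]
```

Here `evicted("blind")` drops all four entries, while `evicted("view")` drops only the Q2 instance bound to toy_id 5.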
Magnitude of Security-Scalability tradeoff [Figure: scalability (number of concurrent users supported) for each benchmark application]
Security Results [Table: number of query templates whose query data and results can be encrypted "for free", for Auction, Bboard, and Bookstore]
Security Results in Detail • Auction: the historical record of user bids was not exposed • Bboard: the ratings users give one another based on the quality of their postings • Bookstore: book purchase association rules discovered by the vendor (customers who purchase book A also purchase book B)
Scalability Conscious Security Approach (SCSA) to managing the tradeoff [Figure: scalability (number of concurrent users supported) vs. security (number of query templates with encrypted results), with points for "Nothing encrypted", "SCSA", and "Everything encrypted"] 1. Easy to get either good scalability or good security 2. SCSA presents a shortcut to manage the tradeoff
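Contributors to User Latency [Figure: in the traditional architecture, the request and response cross a high-latency path to the web server, app server, and database; in the DBSS architecture, the CDN and DBSS are near the user and the high-latency path lies between the DBSS and the database] • A single HTTP request → multiple database requests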
Sample Web Application Code • function find_comments($user_id) { $template := "SELECT from_id, body FROM comments WHERE to_id=?"; $query := attach_to_template($template, $user_id); $result := execute($query); foreach ($row in $result) print(get_body($row), get_name(get_id($row))) } • N+1 queries are issued (one for the comments, plus one name lookup per comment) because: – it is convenient for programmers to abstract database values – it has no noticeable effect in the traditional setting • Found many examples in the benchmark applications
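A runnable rendering of the N+1 pattern, assuming an in-memory SQLite database with hypothetical `users` and `comments` tables and a counter that treats every `execute` call as one round-trip to the database. The function names follow the pseudocode, but the data and schema are illustrative only. With two comments about John, the page costs 3 round-trips.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob');
    INSERT INTO comments VALUES (2, 1, 'Fast shipping'), (3, 1, 'As described');
""")
round_trips = 0


def execute(sql, params=()):
    """Stand-in for one request from the application to the database; counts round-trips."""
    global round_trips
    round_trips += 1
    return db.execute(sql, params).fetchall()


def get_name(user_id):
    return execute("SELECT name FROM users WHERE id = ?", (user_id,))[0][0]


def find_comments(user_id):
    rows = execute("SELECT from_id, body FROM comments WHERE to_id = ?", (user_id,))
    # One extra query per comment: N + 1 round-trips in total.
    return [(body, get_name(from_id)) for from_id, body in rows]


comments = find_comments(1)
```

Next to a local database each extra round-trip is cheap; when every `execute` crosses a high-latency link, the N extra lookups dominate page latency.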
Reducing User Latency in a DBSS Setting • Transformations to reduce the number of round-trips: 1. Group execution of queries: MERGING transformation 2. Overlap execution of queries: NONBLOCKING transformation [Figure: web application code (procedural program with embedded SQL) → transformed code (transformed program and SQL)] • Holistic transformations using source-to-source compilers
The MERGING Transformation [Figure: a www.ebay.com page shows John the names of users who have posted comments about him; the CDN issues 1 query (1. find the user_ids who have made comments) and then N queries (2. for each user_id, find the name of the user) to the Database Scalability Service over a high-latency path]
The MERGING Transformation • Goal: find the names of users who have posted comments about John • Before: 1. find the user_ids who have made comments 2. for each user_id, find the name of the user • After (one merged query): SELECT from_id, u.name FROM comments, users u WHERE from_id = u.id AND to_id = ? • Assuming a constant cache hit rate, the number of round-trips to the database decreases by a factor of N+1
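The merged query from the slide can be checked against the same hypothetical data used for the N+1 pattern (an in-memory SQLite database with `users` and `comments` tables, and a counter treating each `execute` call as one round-trip). This is an illustration of the effect of MERGING, not of the source-to-source compiler that performs it: the same commenter names come back in a single round-trip instead of N+1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob');
    INSERT INTO comments VALUES (2, 1, 'Fast shipping'), (3, 1, 'As described');
""")
round_trips = 0


def execute(sql, params=()):
    """Stand-in for one request from the application to the database; counts round-trips."""
    global round_trips
    round_trips += 1
    return db.execute(sql, params).fetchall()


def find_commenter_names(user_id):
    # MERGING: the per-row name lookups are folded into the slide's single join.
    return execute(
        "SELECT from_id, u.name FROM comments, users u "
        "WHERE from_id = u.id AND to_id = ?",
        (user_id,),
    )


names = find_commenter_names(1)
```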
The NONBLOCKING Transformation [Figure: for John's www.amazon.com home page, the CDN issues two queries (1. greet user, 2. get names of related books) to the Database Scalability Service over a high-latency path] • Issue queries concurrently to reduce latency
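The effect of NONBLOCKING can be illustrated with a thread pool, assuming each query is independent and that a fixed `time.sleep` (hypothetical 100 ms) stands in for the DBSS-to-database round-trip. The `related_books` table in the second query string is a hypothetical schema; nothing is actually queried. Issued one after another, two queries take about two round-trip times; issued together, their waits overlap and take about one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.1  # hypothetical DBSS-to-database round-trip time, in seconds

QUERIES = [
    "SELECT name FROM users WHERE id = ?",                # 1. greet user
    "SELECT title FROM related_books WHERE user_id = ?",  # 2. related books (hypothetical schema)
]


def remote_query(sql):
    """Stand-in for a high-latency query: sleeps instead of contacting a database."""
    time.sleep(LATENCY)
    return f"result of {sql}"


def blocking():
    return [remote_query(q) for q in QUERIES]  # each query waits for the previous one


def nonblocking():
    # The queries are independent, so they are issued together and their waits overlap.
    with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
        return list(pool.map(remote_query, QUERIES))


start = time.perf_counter()
sequential = blocking()
t_blocking = time.perf_counter() - start

start = time.perf_counter()
overlapped = nonblocking()
t_nonblocking = time.perf_counter() - start
```

`pool.map` returns results in input order, so the page can consume them exactly as it did before the transformation.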
Applicability of the Transformations • At least one of the two transformations applies to 25% (Auction), 75% (Bboard), and 50% (Bookstore) of dynamic runtime interactions
Application: Impact on Latency [Figure: average latency in ms for BBOARD, with the transformations applied] • Overall latency decreases by 38%; the DBSS-DB latency decreases by 65%
Impact of Latency on Scalability [Figure: latency vs. simultaneous users supported; the reduced-latency curve stays below the latency threshold for more users than the original curve, improving scalability] • Reducing latency improves scalability
Effect of the Transformations on Scalability [Figure: scalability (number of concurrent users supported)] • Applying both transformations yields the best scalability