Security and Replication … and Course Wrap-up
Zachary G. Ives
University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
PNUTS slide content courtesy of Brian Cooper

Secure Transactions
• Authentication using public/private key pairs is essential today
• Consider every Web transaction – we want to know whom we’re conversing with!
  … versus ending up with a phishing attack!

Secure Sockets Layer (SSL)
Relies on a trusted third party
• A certificate authority (CA) issues certificates to certify a server and its public key
• Verisign is perhaps the best known of these
A server S generates a public-private keypair
• Sends the public key and other info (plus $$$) to Verisign (etc.)
• Gets back a certificate with:
  • CA name
  • S’s name, URL, public key
  • Timestamp and expiration info

Example Certificate
Owner: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Issuer: CN=GTE CyberTrust Root, O=GTE Corporation, C=US
Serial number: 1a3
Valid from: Fri Feb 23 23:01:00 GMT 1996 until: Thu Feb 23 23:59:00 GMT 2006
Certificate fingerprints:
  MD5: C4:D7:F0:B2:A3:C5:7D:61:67:F0:04:CD:43:D3:BA:58
  SHA1: 90:DE:9E:4C:4E:9F:6F:D8:86:17:57:9D:D3:91:BC:65:A6:89:64

The SSL Protocol
Client C connects to server S from enterprise E
• S sends E’s certificate (cleartext)
• C validates the certificate using the public key of the CA (e.g., Verisign)
• C generates a session key and sends it to S, encrypted with E’s public key
Java has built-in support for SSL (Java Secure Socket Extension, integrated in 1.4) and a tool for managing certificates (keytool)
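The slides point to Java's JSSE for this. As a minimal sketch of the client side of the exchange, Python's standard `ssl` module can stand in for client C. The function names `make_client_context` and `fetch_server_certificate`, the `host` argument and the 10-second timeout are illustrative assumptions, not part of the lecture. Validation of the server's certificate against trusted CA keys and the setup of the session key both happen inside the handshake that `wrap_socket` performs.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # The default context loads the platform's trusted CA certificates and
    # requires both a valid certificate chain and a matching host name.
    return ssl.create_default_context()


def fetch_server_certificate(host: str, port: int = 443) -> dict:
    """Connect to host, complete the handshake, and return the validated certificate."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # The handshake runs here; it raises ssl.SSLCertVerificationError
        # if the certificate cannot be validated.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.getpeercert()
```

Calling `fetch_server_certificate` with a real host name needs network access; the returned dictionary carries fields such as the subject, the issuer and the validity period, much like the example certificate in the slides.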

So…
• The client and server know each other given SSL
• How do we go ahead and make a purchase?
• Most commonly: you enter your credit card number
• Sometimes this is stored in the retailer’s system for future purposes!
Best case:
  • The CC info is stored in a special, firewalled server, not part of the web site
  • Web server has other account info about you
  • When a transaction goes through, web site sends order to this special server, which combines it with CC info and sends it onward

Replication… Core of the Cloud
• The vision of the “cloud”: a “computing utility” that is geographically distributed
• At its core: geographical replication as well as partitioning
  • What to replicate (including granularity)
  • Where to replicate
  • How to maintain consistency (and how fresh data needs to be)

What to Replicate
There is a cost to maintaining consistency if data is changing
• Larger objects, slower networks, frequent updates and freshness requirements all make replication more expensive
• May be able to send a “diff” instead of the whole object
Thus, the difference between LAN and WAN replication:
• Local-area / cluster: single-writer, multiple-reader data is often replicated, e.g., CNN
• Wide-area: need to limit replication to seldom-updated data, or relax the freshness or consistency constraints, e.g., Akamai (images, video), Google index
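The "diff" idea can be sketched as follows, assuming the replicated object is a list of lines and that the replica already holds the old version. The helper names `make_delta` and `apply_delta` and the copy/data encoding are hypothetical, chosen only for illustration; the matching itself comes from `difflib.SequenceMatcher` in the Python standard library.

```python
from difflib import SequenceMatcher


def make_delta(old: list, new: list) -> list:
    """Describe new as copies of ranges of old plus literal data for the changes."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # replica already has these lines
        else:
            delta.append(("data", new[j1:j2]))  # only changed lines are shipped
    return delta


def apply_delta(old: list, delta: list) -> list:
    """Rebuild the new version at the replica from its old copy and the delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


def shipped_lines(delta: list) -> int:
    """Count the literal lines that would cross the network."""
    return sum(len(op[1]) for op in delta if op[0] == "data")
```

When only a small part of a large object changes, `shipped_lines` stays small relative to the object, which is the saving the slide alludes to; when most of the object changes, the delta is no cheaper than sending the whole thing.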

Where to Place Replicas in the Internet
Want to place them at points where they can handle many requests and reduce traffic in bottlenecks
• Commonly, at least one replica in Europe, Asia, US West Coast, US East Coast
[Figure: clients C1 to C9 connected to Server 1 and Server 2, with a congested or failure-prone link]

Schemes for Maintaining Consistency
Goal is to trade off performance vs. consistency guarantees
• Lock-based protocols
• Invalidation
• Lease
• Time-to-live

Lock-Based Protocols
• Guarantee strong consistency
• Similar to distributed version of what’s done in a database
• Client request for an item requires a read lock at its handling server
• Update to an item requires a write lock
• Multiple read locks can be held concurrently; write lock must be exclusive
What are the potential pitfalls of this approach? Is it resilient to network partition?
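The read/write lock rules above can be sketched with a small lock table. This is a single-process illustration under stated assumptions: the class name `LockTable`, the non-blocking `try_*` methods and the holder identifiers are all hypothetical, and nothing here handles the distributed concerns (lost messages, crashed holders, partitions) that the slide's closing questions raise.

```python
class LockTable:
    """Tracks, per item, a set of read-lock holders and at most one write-lock holder."""

    def __init__(self):
        self._readers = {}  # item -> set of holder ids with a read lock
        self._writer = {}   # item -> holder id with the write lock

    def try_read_lock(self, item, holder) -> bool:
        # A read lock is refused only while someone else holds the write lock.
        writer = self._writer.get(item)
        if writer is not None and writer != holder:
            return False
        self._readers.setdefault(item, set()).add(holder)
        return True

    def try_write_lock(self, item, holder) -> bool:
        # A write lock is exclusive: no other writer and no other readers.
        writer = self._writer.get(item)
        if writer is not None and writer != holder:
            return False
        other_readers = self._readers.get(item, set()) - {holder}
        if other_readers:
            return False
        self._writer[item] = holder
        return True

    def release(self, item, holder) -> None:
        self._readers.get(item, set()).discard(holder)
        if self._writer.get(item) == holder:
            del self._writer[item]
```

A holder that never calls `release` (for example, because it is cut off by a network partition) blocks every later writer, which is one of the pitfalls the slide asks about.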

Invalidation Protocols
• If a server is to update an item, it can multicast this to all replicas
  • Requires servers to know who all of the other parties are
  • May be somewhat weaker than lock-based models – why?
Common variation: lease-based protocol
• A replicated item is “leased” for a particular period
• If the item is updated during its lease, it is invalidated/refreshed
• After it expires, it is dropped
What are the pros and cons of these protocols?
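A minimal sketch of the lease-based variation, assuming a single origin server, in-process "messages" (plain method calls) and an explicit `now` argument in place of a real clock. The class names `Origin` and `Replica` and their methods are hypothetical and only illustrate the three rules on the slide.

```python
class Origin:
    """Holds the authoritative data and remembers which replicas hold live leases."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.data = {}
        self.leases = {}  # key -> {replica: lease expiry time}

    def read(self, key, replica, now):
        # Granting a read also grants a lease on the returned value.
        expiry = now + self.lease_seconds
        self.leases.setdefault(key, {})[replica] = expiry
        return self.data[key], expiry

    def write(self, key, value, now):
        self.data[key] = value
        # Only replicas whose lease is still live need an invalidation message.
        for replica, expiry in self.leases.pop(key, {}).items():
            if expiry > now:
                replica.invalidate(key)


class Replica:
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # key -> (value, lease expiry time)

    def get(self, key, now):
        entry = self.cache.get(key)
        if entry is None or entry[1] <= now:
            # No copy, or the lease ran out: drop it and fetch a fresh lease.
            entry = self.origin.read(key, self, now)
            self.cache[key] = entry
        return entry[0]

    def invalidate(self, key):
        self.cache.pop(key, None)
```

The origin only has to remember a replica for the length of one lease, so a replica that disappears costs it nothing after the lease runs out; the price is that every replica must come back to the origin at least once per lease period.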

Time-to-Live-Based Replication
• Generally used when freshness constraints aren’t severe
• Replicas are provided with an expectation for how long they are likely to be current
• After the “time-to-live” expires, they need to be revalidated
How does this compare to the previous approaches?
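A minimal sketch of time-to-live replication, assuming the origin is just a dictionary mapping each key to a `(version, value)` pair and that revalidation means comparing version numbers. The class name `TTLCache`, the `transfers` counter and the explicit `now` argument are illustrative assumptions.

```python
class TTLCache:
    """Serves cached copies until their time-to-live expires, then revalidates."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin    # dict: key -> (version, value)
        self.ttl = ttl_seconds
        self.entries = {}       # key -> [version, value, expires_at]
        self.transfers = 0      # how many full values were fetched

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry[2]:
            # Within the TTL: answer locally, even though the copy may be stale.
            return entry[1]
        version, value = self.origin[key]
        if entry is not None and entry[0] == version:
            # Revalidated: nothing changed, so just extend the TTL.
            entry[2] = now + self.ttl
            return entry[1]
        self.transfers += 1
        self.entries[key] = [version, value, now + self.ttl]
        return value
```

Unlike the lock-based and invalidation schemes, the origin keeps no state about its replicas here, and a reader can see a stale value for up to one TTL after an update.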

Replication in “Cloud” Services
• Yahoo’s PNUTS, Google’s BigTable are based on the notion that there is locality of data access
• Consider consistency within each record but ignore cross-record consistency
  • e.g., in a social network, we should coordinate accesses to the same user (but don’t care about consistency with unrelated friends)
… but even here, we might be able to tolerate relaxed consistency among the users

Yahoo’s PNUTS Platform
[Figure: example table with rows (A, 42342, E), (B, 42521, W), (C, 66354, W), (D, 12352, E), (E, 75656, C), (F, 15677, E)]
• Indexes and views
• Geographic replication
• Parallel database
• Structured, flexible schema

Query model
Per-record operations
• Get
• Set
• Delete
Multi-record operations
• Multiget
• Scan
• Getrange
Web service (RESTful) API

System Architecture
[Figure: a local region and remote regions; components shown are clients, REST API, routers, YMB, tablet controller and storage units]

Tablet splitting and balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• Storage unit may become a hotspot
• Shed load by moving tablets to other servers
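The two mechanisms on this slide can be sketched as follows. The slide does not say how PNUTS picks split points or which tablet to move, so the median split and the "move the smallest tablet from the fullest unit to the emptiest unit" rule below are naive stand-ins, and the function names `split_if_overfull` and `shed_load` are hypothetical.

```python
def split_if_overfull(tablet: list, max_records: int) -> list:
    """Split a sorted tablet (a list of records) in two once it exceeds max_records."""
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def shed_load(units: dict) -> dict:
    """Move one tablet from the most loaded storage unit to the least loaded one.

    units maps a storage unit name to its list of tablets; load is approximated
    by the number of records stored, which is an assumption of this sketch.
    """
    def load(name):
        return sum(len(tablet) for tablet in units[name])

    hot = max(units, key=load)
    cold = min(units, key=load)
    if hot != cold and len(units[hot]) > 1:
        tablet = min(units[hot], key=len)  # cheapest tablet to move
        units[hot].remove(tablet)
        units[cold].append(tablet)
    return units
```

Because tablets are much smaller than a storage unit, moving one is a fine-grained way to rebalance; splitting first keeps any single tablet from becoming too large to move.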

Range queries
Sorted keys: Apple, Avocado, Banana, Blueberry, Canteloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon
Router’s interval map:
  MIN to Canteloupe: SU 1
  Canteloupe to Lime: SU 3
  Lime to Strawberry: SU 2
  Strawberry to MAX: SU 1
The query Grapefruit…Pear? is split into:
  Grapefruit…Lime?
  Lime…Pear?
[Figure: router in front of storage units 1, 2 and 3]
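The router's job in this slide can be sketched with a sorted list of split points and binary search. The interval-to-storage-unit assignment below is read off the slide's router table (MIN to Canteloupe on SU 1, Canteloupe to Lime on SU 3, Lime to Strawberry on SU 2, Strawberry to MAX on SU 1) and should be treated as a reconstruction; the names `BOUNDARIES`, `OWNERS`, `route_key` and `route_range`, and the choice of half-open intervals, are assumptions of this sketch.

```python
import bisect

# Split points between MIN and MAX, spelled as on the slide.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
# OWNERS[i] serves the i-th interval: [MIN, Canteloupe), [Canteloupe, Lime), ...
OWNERS = ["SU1", "SU3", "SU2", "SU1"]


def route_key(key: str) -> str:
    """Return the storage unit responsible for a single key."""
    return OWNERS[bisect.bisect_right(BOUNDARIES, key)]


def route_range(low: str, high: str) -> list:
    """Split the query range [low, high) into one sub-range per interval it touches."""
    parts = []
    start = low
    index = bisect.bisect_right(BOUNDARIES, low)
    while index < len(BOUNDARIES) and BOUNDARIES[index] < high:
        parts.append((start, BOUNDARIES[index], OWNERS[index]))
        start = BOUNDARIES[index]
        index += 1
    parts.append((start, high, OWNERS[index]))
    return parts
```

With this table, `route_range("Grapefruit", "Pear")` yields the two sub-ranges shown on the slide, Grapefruit to Lime and Lime to Pear, each sent to the storage unit that owns that interval.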

Updates
[Figure: path of a write to key k in eight numbered steps through the routers, storage units (SU) and message brokers; arrows are labeled “Write key k”, “SUCCESS” and “Sequence # for key k”]

Asynchronous replication and consistency

Asynchronous Replication

Consistency Model
• Goal: make it easier for applications to reason about updates and cope with asynchrony
• Consider a single record for Brian Cooper’s Facebook entry:
[Figure: timeline of the record’s versions v.1 to v.8 (Generation 1), marked with “Record inserted”, “Update” and “Delete” events]

Consistency Model
[Figure, repeated on each of the following slides: timeline of versions v.1 to v.8 (Generation 1), distinguishing stale versions from the current version]
• Read (local)
• Read up-to-date
• Read ≥ v.6
• Write
• Write if = v.7 (ERROR)
Mechanism: per-record mastership
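The operations named on these slides can be sketched for a single record as below. This is a toy, single-process model under stated assumptions: the class `ReplicatedRecord`, its method names and the `propagate` step are hypothetical, the master copy is simply the object itself, and deletes and generations are not modeled. The conditional write appears to be what the slide's ERROR illustrates: asking to write "if = v.7" fails once the current version has moved past v.7.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write names a version that is not current."""


class ReplicatedRecord:
    def __init__(self, value, replica_names):
        self.history = {1: value}  # version number -> value, kept at the master
        self.current = 1           # latest version at the record's master
        # Version each replica has applied so far (replicas lag the master).
        self.applied = {name: 1 for name in replica_names}

    def write(self, value):
        # All writes go through the master, so versions form a single sequence.
        self.current += 1
        self.history[self.current] = value
        return self.current

    def write_if_version(self, expected, value):
        if expected != self.current:
            raise VersionMismatch(f"expected v.{expected}, current is v.{self.current}")
        return self.write(value)

    def propagate(self, name):
        # Asynchronous replication, modeled as an explicit catch-up step.
        self.applied[name] = self.current

    def read_local(self, name):
        version = self.applied[name]  # possibly stale
        return version, self.history[version]

    def read_up_to_date(self):
        return self.current, self.history[self.current]

    def read_at_least(self, name, min_version):
        version = self.applied[name]
        if version >= min_version:
            return version, self.history[version]
        return self.read_up_to_date()  # local copy too old: go to the master
```

Funnelling every write for a record through one master is what lets each replica apply updates in the same order, which is the role the slide assigns to per-record mastership.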

PNUTS Recap
• An interesting compromise between consistency and performance/availability
• Used underneath many of Yahoo’s properties
• … And an exemplar of the new generation of cloud services

Experiments – Show It’s So!
• The general goal: to help demonstrate and show why a real-world artifact provides a benefit
  • Versus some benchmark or naïve strategy
  • We also want to understand why there’s a benefit
• Some common kinds of experiments:
  • Usability: some sort of user tests, versus a benchmark
  • Performance: as we increase the workload, what happens?
  • Scalability: as we increase the data, devices, nodes, what happens?
  • Complexity: especially for things like code, what happens as we make the task harder or bigger?

Experimentation
• In general, experiments should follow the scientific method:
  • Hypothesis (e.g., our method will do better than XYZ on workloads like QWV, which are representative of domain ABC)
  • Experiment (examine this – may need many trials, random workloads, etc.)
  • Conclusion (show, with statistically significant measurements, that the hypothesis is true)
• Often, the hypothesis almost goes unsaid in computer science – it’s implicit in the choice of the problem – but it is there!
• Note that many attributes, e.g., elegance, style, are not very amenable to experiments
• Others, like expressiveness, generally need to be proven rather than run

Experimental Workloads
• There are generally three kinds of systems experiments:
  • Synthetic microbenchmark: experimental runs are done over inputs that are generated to stress a specific factor, but are not particularly realistic
    Examples: a hard disk random access test; a web server’s maximum throughput
    Really shows the factor of interest; can be tweaked, scaled, etc.
  • Synthetic based on real behavior: experimental runs are done over inputs that are modeled after real data, but perhaps generated randomly
    Examples: SPEC benchmarks; TPC-W web transaction benchmark
    Enables us to generate more inputs, testing scalability, etc.
  • Real-world: traces are collected of real system behavior over real data
    Disadvantage: hard to quantify or control the different factors

Experimental Methodology
• Consider the important factors that you wish to examine (and demonstrate)
  • Scalability – can typically be in terms of running time, size of the problem, space consumed, etc.
  • Here: performance is what matters
• Break it down into individual parameters
  • Crawl & index time; time to answer a query; etc.
• Consider a workload that helps measure the parameter
  • Crawl 1000 documents; run 50 queries 10 times apiece; etc.
• Vary one parameter at a time, study effects
  • Number of machines; number of threads per machine; etc.
• Run experiment multiple times; average and show 95% confidence intervals in line (continuous) or bar (discrete) chart
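The last step, averaging repeated runs and reporting a 95% confidence interval, can be sketched as below. The function name `mean_with_ci95` is hypothetical, and the interval uses the normal approximation (a multiplier of about 1.96); with only a handful of runs, a Student's t multiplier would give a somewhat wider and more appropriate interval, so treat this as a lower bound on the uncertainty.

```python
import math
import statistics


def mean_with_ci95(samples: list) -> tuple:
    """Return (mean, half_width) so that the interval is mean ± half_width."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided 95% multiplier under the normal approximation.
    z = statistics.NormalDist().inv_cdf(0.975)
    return mean, z * standard_error
```

For a chart, the mean becomes the plotted point or bar height and the half-width becomes the error bar; if the intervals for two configurations overlap heavily, the experiment has not yet shown a difference between them.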

Course Recap (Until Next Week’s Midterm 2!)
• Distributed, Web-scale systems are here to stay!
• They create many issues that are not totally resolved, and for which there is no one answer:
  • Heterogeneity
  • Timing
  • Partitioning and replication
  • Consistency and integrity
  • Etc.
• This course tried to give you a sense of the issues and state of the art – as well as the skills to go out and work in this domain
• I hope the amount of work we all sank into the material (and the homeworks) will pay off for you!
• And stay tuned – there’s lots more to come! Sensor networks, semantic Web, mobile systems, location-based services, …