Squirrel: A peer-to-peer web cache
Sitaram Iyer (Rice University)
Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University)
PODC 2002 / Sitaram Iyer / Tuesday July 23

Web Caching
Reduces: 1. Latency, 2. External traffic, 3. Load on web servers and routers.
Deployed at: corporate network boundaries, ISPs, web servers, etc.

Centralized Web Cache [diagram]: clients with browser caches on the corporate LAN share a single dedicated web cache at the LAN boundary, which reaches the web server across the Internet.

Cooperative Web Cache [diagram]: clients with browser caches on the corporate LAN are served by several cooperating web caches at the LAN boundary, which reach the web server across the Internet.

Decentralized Web Cache (Squirrel) [diagram]: clients with browser caches on the corporate LAN cooperate directly with one another, with no dedicated cache machines, and reach the web server across the Internet.

Distributed Hash Table
Peer-to-peer location service: Pastry, a peer-to-peer routing and location substrate. Operations: Insert(k, v), Lookup(k). [Diagram: key-value pairs (k1, v1) ... (k6, v6) mapped onto the participating nodes.]
• Completely decentralized and self-organizing
• Fault-tolerant, scalable, efficient
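To make the Insert/Lookup interface concrete, here is a minimal sketch of a DHT of this kind; it is an illustration under stated assumptions, not the Pastry API. The class, the ring size, and the consistent-hashing rule are all assumptions made for brevity, and real Pastry routes requests through the overlay instead of keeping a global node list.

```python
import hashlib

RING = 2 ** 32          # size of the circular identifier space (assumption)

def _id(data):
    """Hash arbitrary data (a key or a node name) onto the ring."""
    return int(hashlib.sha1(data.encode()).hexdigest(), 16) % RING

class Dht:
    """Toy DHT: maps each key to one live node (a stand-in for Pastry)."""

    def __init__(self, node_names):
        self.node_ids = sorted(_id(name) for name in node_names)
        self.store = {n: {} for n in self.node_ids}   # per-node local storage

    def _home_node(self, key):
        # Route to the first node id at or after the key's id, wrapping
        # around the ring (Pastry instead picks the numerically closest id).
        k = _id(key)
        for n in self.node_ids:
            if n >= k:
                return n
        return self.node_ids[0]

    def insert(self, key, value):
        self.store[self._home_node(key)][key] = value

    def lookup(self, key):
        return self.store[self._home_node(key)].get(key)

# Example: the same key always routes to the same home node.
dht = Dht(["node-a", "node-b", "node-c"])
dht.insert("http://example.com/logo.gif", b"...object bytes...")
print(dht.lookup("http://example.com/logo.gif"))
```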

Why peer-to-peer?
1. Cost of dedicated web cache → No additional hardware
2. Administrative effort → Self-organizing network
3. Scaling implies upgrading → Resources grow with clients

Setting
• Corporate LAN
• 100 - 100,000 desktop machines
• Located in a single building or campus
• Each node runs an instance of Squirrel

Mapping Squirrel onto Pastry
Two approaches: • Home-store • Directory

Home-store model [diagram]: the client hashes the object's URL, and the request is routed over the LAN to the resulting home node; the home node reaches the origin web server across the Internet.

Home-store model [diagram, continued]: the home node returns the object to the client ... that's how it works!
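A minimal sketch of this request path, in the same spirit as the DHT sketch above. The helper fetch_from_origin and the dict-based caches are assumptions for illustration; freshness checks and conditional GETs are deferred to the full protocol in the backup slides.

```python
import hashlib

def home_node_for(url, node_ids):
    """Pick the home node for a URL by hashing it onto the node list
    (a stand-in for Pastry's key-to-node routing)."""
    digest = int(hashlib.sha1(url.encode()).hexdigest(), 16)
    ordered = sorted(node_ids)
    return ordered[digest % len(ordered)]

def fetch_from_origin(url):
    """Placeholder for an HTTP GET to the origin server over the WAN."""
    return {"url": url, "body": b"...object bytes..."}

def request(client_cache, url, node_caches):
    """Home-store lookup: browser cache -> home node's cache -> origin server."""
    if url in client_cache:                        # 1. local browser-cache hit
        return client_cache[url]
    home = home_node_for(url, list(node_caches))   # 2. route over the LAN
    if url not in node_caches[home]:               # 3. home-node miss: fetch and store
        node_caches[home][url] = fetch_from_origin(url)
    obj = node_caches[home][url]
    client_cache[url] = obj                        # the client also caches locally
    return obj
```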

Directory model
Client nodes always cache objects locally. Home-store: the home node also stores objects. Directory: the home node only stores pointers to recent clients, and forwards requests.

Directory model [diagram]: the client's request is routed over the LAN to the home node, which keeps a directory for the object; the origin web server is across the Internet.

Directory model [diagram, continued]: the home node randomly chooses an entry from its directory table and forwards the request to that client, which serves its cached copy.
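A matching sketch of the directory model, reusing fetch_from_origin from the home-store sketch above; the function name and the dict-based directory and caches are illustrative assumptions, not Squirrel's actual code.

```python
import random

def directory_request(client_id, url, home_directory, client_caches):
    """Directory-model lookup: the home node stores only pointers to clients
    that recently fetched the object, and forwards requests to one of them."""
    pointers = home_directory.get(url, [])
    live = [c for c in pointers if url in client_caches.get(c, {})]
    if live:
        delegate = random.choice(live)       # random entry from the directory table
        obj = client_caches[delegate][url]   # the delegate serves its cached copy
    else:
        obj = fetch_from_origin(url)         # no usable directory entry: go to origin
    # The requester caches the object locally and is added to the directory.
    client_caches.setdefault(client_id, {})[url] = obj
    directory = home_directory.setdefault(url, [])
    if client_id not in directory:
        directory.append(client_id)
    return obj
```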

Directory: Advantages
Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects seems to improve load balancing. Home-store scheme can incur ...

Directory: Disadvantages
Cache insertion only happens at clients, so:
• active clients store all the popular objects,
• inactive clients waste most of their storage.
Implications:

Directory: Load spike example
• Web page with many embedded images, or
• Periods of heavy browsing.
Many home nodes point to such clients! Evaluate ...

Trace characteristics (Microsoft in Redmond and Cambridge):
• Total duration: Redmond 1 day; Cambridge 31 days
• Number of clients: Redmond 36,782; Cambridge 105
• Number of HTTP requests: Redmond 16.41 million; Cambridge 0.971 million
• Peak request rate: Redmond 606 req/sec; Cambridge 186 req/sec
• Number of objects: Redmond 5.13 million; Cambridge 0.469 million
• Number of cacheable objects: Redmond 2.56 million; Cambridge 0.226 million

[Chart] Total external traffic (GB), Redmond trace [lower is better]: curves for no web cache, directory, home-store, and a centralized cache versus per-node cache size (0.001 to 100 MB); total traffic spans roughly 85 to 105 GB.

[Chart] Total external traffic (GB), Cambridge trace [lower is better]: curves for no web cache, directory, home-store, and a centralized cache versus per-node cache size (0.001 to 100 MB); total traffic spans roughly 5.5 to 6.1 GB.

[Chart] LAN hops, Redmond trace: percentage of cacheable requests versus total hops within the LAN (0 to 6), for centralized, home-store, and directory.

[Chart] LAN hops, Cambridge trace: percentage of cacheable requests versus total hops within the LAN (0 to 5), for centralized, home-store, and directory.

[Chart] Load in requests per second, Redmond trace: number of times observed (log scale, 1 to 100,000) versus maximum objects served per node per second (0 to 50), for home-store and directory.

[Chart] Load in requests per second, Cambridge trace: number of times observed (log scale, 1 to 1e+07) versus maximum objects served per node per second (0 to 50), for home-store and directory.

[Chart] Load in requests per minute, Redmond trace: number of times observed (log scale, 1 to 100) versus maximum objects served per node per minute (0 to 350), for home-store and directory.

[Chart] Load in requests per minute, Cambridge trace: number of times observed (log scale, 1 to 10,000) versus maximum objects served per node per minute (0 to 120), for home-store and directory.

Fault tolerance
Sudden node failures result in partial loss of cached content.
Home-store: loss proportional to the number of failed nodes.
Directory: more vulnerable.

Fault tolerance
If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:
• Redmond: home-store mean 1%, max 1.77%; directory mean 1.71%, max 19.3%
• Cambridge: home-store mean 1%, max 3.52%; directory mean 1.65%, max 9.8%

Conclusions
• Possible to decentralize web caching.
• Performance is comparable to a centralized web cache,
• is better in terms of cost, scalability, and administration effort, and
• under our assumptions, the home-store scheme is superior to the directory scheme.

Other aspects of Squirrel
• Adaptive replication
  – Hotspot avoidance
  – Improved robustness
• Route caching
  – Fewer LAN hops

Thanks.

(backup) Storage utilization, Redmond trace:
• Total: home-store 97,641 MB; directory 61,652 MB
• Mean per-node: 2.6 MB
• Max per-node: 1,664 MB

(backup) Fault tolerance
Equations: home-store mean = H/O, max = Hmax/O; directory mean = (H+S)/O, max = max(Hmax, Smax)/O.
• Redmond: home-store mean 0.0027%, max 0.0048%; directory mean 0.198%, max 1.5%
• Cambridge: home-store mean 0.95%, max 3.34%; directory mean 1.68%, max 12.4%

(backup) Full home-store protocol [diagram]: the client (or another requesting node) sends its request over the LAN to the home node. (a) If the home node holds a fresh copy, it replies with the object or a not-modified response. Otherwise, (b1) the home node issues a request that travels (b2) over the WAN to the origin server, and (b3) the object or not-modified response from the origin is returned.
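A sketch of the home node's side of this exchange, under one plausible reading of the message labels above. The helpers origin_get and origin_cget and the dict-based cache entries are hypothetical stand-ins, not Squirrel's actual code.

```python
import time

def origin_get(url):
    """Placeholder for a GET to the origin server over the WAN."""
    return {"status": 200, "object": b"...", "etag": "v1",
            "expires": time.time() + 60}

def origin_cget(url, etag):
    """Placeholder for a conditional GET (If-None-Match: etag) to the origin."""
    return {"status": 304, "expires": time.time() + 60}

def serve_from_home(home_cache, url):
    """Home node's request handling in the home-store scheme (sketch)."""
    entry = home_cache.get(url)
    if entry and entry["expires"] > time.time():
        return entry["object"]                    # (a) fresh copy: answer from home
    if entry:
        resp = origin_cget(url, entry["etag"])    # (b1, b2) revalidate a stale copy
        if resp["status"] == 304:                 # origin says: not modified
            entry["expires"] = resp["expires"]
            return entry["object"]                # (b3) not-modified back to client
    else:
        resp = origin_get(url)                    # (b1, b2) full fetch from origin
    home_cache[url] = {"object": resp["object"], "etag": resp["etag"],
                       "expires": resp["expires"]}
    return home_cache[url]["object"]              # (b3) object back to client
```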

(backup) Full directory protocol [diagram]: client, home node, delegate, and origin server, with labelled messages: a1 (also d1) - no directory, go to the origin; a2, d2 - req; a3, d3 - dir; b - not-modified from the home; c1, e1 - req forwarded to a delegate; e2 - cGET req to the origin; e3 - origin's reply; c2, e4 - object or not-modified returned to the client.

(backup) Peer-to-peer Computing
Decentralize a distributed protocol:
– Scalable
– Self-organizing
– Fault tolerant
– Load balanced
Not automatic!!

(backup) Decentralized Web Cache [diagram]: browser caches on nodes in the LAN cooperate with one another; the web server is across the Internet.

Challenge
Decentralized web caching algorithm:
Need to achieve those benefits in practice!
Need to keep overhead unnoticeably low.
Node failures should not become significant.

Peer-to-peer routing, e.g., Pastry
Peer-to-peer object location and routing substrate = Distributed Hash Table. Reliably maps an object key to a live node. Routes in log16(N) steps (e.g. 3-4 steps for 100,000 nodes).
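As a quick check on that hop-count estimate, a few values of log16(N), the routing bound Pastry advertises with base 2^b and b = 4; this is just arithmetic illustrating the slide's claim.

```python
import math

# Pastry with b = 4 routes in roughly log_16(N) overlay hops.
for n in (100, 10_000, 100_000):
    hops = math.log(n, 16)
    print(f"{n} nodes: ~{hops:.1f} hops (ceiling {math.ceil(hops)})")

# Gives about 1.7, 3.3 and 4.2 hops respectively; ~4 hops for 100,000 nodes
# is in line with the "3-4 steps" quoted on the slide above.
```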

Home-store is better!
The simpler home-store scheme achieves load balancing by hash function randomization. The directory scheme implicitly relies on access patterns for load distribution.

Directory scheme seems better...
Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects results in load balancing.

Interesting difference
Consider:
– Web page with many images, or
– Heavily browsing node
Directory: many pointers to some node. Home-store: natural load balancing. Evaluate ...

Fault tolerance
When a single Squirrel node crashes, the fraction of lost cached content is:
• Redmond: home-store mean 0.0027%, max 0.0048%; directory mean 0.2%, max 1.5%
• Cambridge: home-store mean 0.95%, max 3.34%; directory mean 1.7%, max 12.4%