
Future Directions in Advanced Storage Services
Danny Dolev, School of Engineering and Computer Science, Hebrew University
Case Study: Replication for Efficiency and Robustness
Storage Area Network (SAN) – current technology utilizes standard Ethernet connectivity and clusters of workstations.
• Message ordering is used to overcome possible inconsistency in case of failures.
• During “stable” periods total ordering is not used because of its high latency.
• What about “stress” periods???
Motivation
• Message delivery order is a fundamental building block in distributed systems.
• The agreed order allows distributed applications to use the state-machine replication model to achieve fault tolerance and data replication.
• Replicated systems are often built atop Group Communication Systems (GCSs), which provide message ordering, reliable delivery and group membership.
• Many GCSs were introduced with a variety of optimization tradeoffs – each has its own bottleneck, preventing it from becoming truly scalable.
Current State
• High-performance implementations use a management layer that resides in the critical path and provides:
– Message ordering
– Membership
– State synchronization
– Consistency
• This layer consumes valuable CPU cycles needed for the actual “body of work”.
• No standard interface (API) and no interoperability.
• Imposes a specific programming methodology (e.g., event driven).
• Network capacity outperforms any progress in CPU capability.
The Challenges
• As network speed reaches several tens of Gb/s, even a multi-core server reaches its CPU limits (roughly 1 Hz of CPU per 1 bps of throughput).
[Figure: GHz/Gbps Rx ratio and GHz/Gbps Tx ratio. The graphs appear in the paper “TCP Performance Re-Visited” (ISPASS ’03) by Foong et al. and are used with the authors’ permission.]
The Challenges
• New techniques are called for to free the CPU to do productive work.
• Extra resources exist: peripheral devices are equipped with programmable processors (GPUs, disk controllers, NICs).
• Some devices have dedicated CPUs with unique properties (SIMD, TCAM memory, encryption/decryption logic).
• Offloading parts of the application to such devices is the new dimension!
Reasons for Offloading
• Memory bottlenecks – reduced memory pressure and cache misses (due to filtering done at the device).
• Better timeliness guarantees – GPOS ↔ embedded OS (RTOS) – avoiding “OS noise” (interrupts, context switches, timers, etc.).
Reasons for Offloading
• Security – another level of isolation; harder to tamper with.
• Reduced power consumption – Pentium 4 2.8 GHz: 68 W; Intel XScale 600 MHz: 0.5 W.
Sample Devices: Graphics
• AGEIA PhysX – 500 MHz multi-core processor with specialized physics units.
• IBM T60 – ATI Mobility™ Radeon® X1300: 6 programmable shader processors, 512 MB.
• NVIDIA® GeForce® 6800/7800 (~$600) – 400 MHz core, 512 MB DDR, memory bandwidth 54.4 GB/s.
Sample Devices: Graphics
• Compared to the CPU, GPU performance has been increasing at a much faster rate (roughly 3× Moore’s law).
• SIMD architecture (Single Instruction, Multiple Data).
[Chart: GPU vs. CPU floating-point performance; ~12 GFLOPS annotation.]
Sample Devices: Networking
• Today’s Network Interface Cards (NICs) are equipped with an onboard CPU:
– Executes proprietary code
– Inaccessible to the OS
• “Killer NIC” (http://www.killernic.com/KillerNic/) – 400 MHz Network Processing Unit, 64 MB DDR, embedded Linux OS; “the world’s first Network Card designed specifically for Online Gaming”.
Replication and Offloading
• Offloading reusable components that implement various distributed algorithms will facilitate the development of cluster replication and reliability.
• Possible candidates:
– Reliable Broadcast
– Total Order: timestamp ordering, token ring, etc.
– Membership Services
– Failure Detectors
– Atomic Commit Protocols (2PC, 3PC, E3PC, etc.)
– Locking Service
Example: Offloaded TO-Application
• We have offloaded Lamport’s timestamp ordering algorithm to the networking device.
• Application architecture: on each node (PC 1 … PC n), the GUI and TO Service run on the host, while the Orderer and Reliable Broadcast components run on the device.
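As a rough illustration of the ordering rule such an orderer relies on (not the offloaded implementation itself), the following self-contained C sketch tags each message with a (Lamport timestamp, sender id) pair, updates the local clock on receipt, and delivers pending messages in increasing (timestamp, sender id) order; all names in it are invented for the example.

/*
 * Illustrative sketch of Lamport timestamp total ordering.
 * A real orderer would also wait until it knows that no message with a
 * smaller (timestamp, sender) pair can still arrive before delivering.
 */
#include <stdio.h>
#include <stdlib.h>

struct to_msg {
    unsigned ts;         /* sender's Lamport clock at broadcast time        */
    unsigned sender;     /* sender id; breaks ties, making the order total  */
    const char *payload;
};

/* Receiver-side clock update: clock = max(clock, msg.ts) + 1 */
static unsigned on_receive(unsigned clock, const struct to_msg *m)
{
    return (clock > m->ts ? clock : m->ts) + 1;
}

/* Deterministic delivery order, identical at every replica. */
static int to_cmp(const void *a, const void *b)
{
    const struct to_msg *x = a, *y = b;
    if (x->ts != y->ts)
        return x->ts < y->ts ? -1 : 1;
    return (int)x->sender - (int)y->sender;
}

int main(void)
{
    struct to_msg pending[] = {
        { 3, 2, "update B" },
        { 3, 1, "update A" },   /* equal timestamps: sender id breaks the tie */
        { 1, 3, "update C" },
    };
    size_t i, n = sizeof(pending) / sizeof(pending[0]);
    unsigned clock = 0;

    for (i = 0; i < n; i++)
        clock = on_receive(clock, &pending[i]);

    qsort(pending, n, sizeof(pending[0]), to_cmp);
    for (i = 0; i < n; i++)
        printf("deliver ts=%u sender=%u: %s\n",
               pending[i].ts, pending[i].sender, pending[i].payload);
    printf("local Lamport clock after receipt: %u\n", clock);
    return 0;
}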
Hydra: An Offloading Framework
• Offloading an application is a tedious task:
– Depends on device capabilities, SDK and toolchain
– Requires kernel knowledge (device drivers, DMA)
– Repeated for each target device
• We have developed a generic offloading framework that enables a developer to design the offloading aspects of the application at design time.
• Joint work with: Yaron Weinsberg (HUJI), Tal Anker (Marvell), Muli Ben-Yehuda (IBM), Pete Wyckoff (OSC).
HYDRA Programming Model
• The Hydra programming model enables one to develop Offload-Aware (OA) applications that are “aware” of the available computing resources.
• The minimal unit of offloading is called an “Offcode” (i.e., “Offloaded Code”):
– Exports a well-defined interface (like COM objects)
– Given as open source or as compiled binaries
– Described by an Offcode Description File (ODF), which exposes the offcode’s functionality (interfaces)
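As a purely hypothetical C illustration of such a COM-like interface (the actual Hydra interface names and signatures are not shown on these slides), an offcode can be modeled as an opaque object carrying a table of function pointers:

/* Hypothetical sketch only; not the real Hydra offcode interface. */
#include <stddef.h>
#include <stdint.h>

struct offcode;                       /* forward declaration                  */

struct offcode_ops {                  /* exported, well-defined interface     */
    int  (*init)(struct offcode *oc, const void *cfg, size_t cfg_len);
    int  (*invoke)(struct offcode *oc, uint32_t method_id,
                   const void *in, size_t in_len,
                   void *out, size_t out_len);
    void (*destroy)(struct offcode *oc);
};

struct offcode {
    const struct offcode_ops *ops;    /* COM-style vtable                     */
    void *priv;                       /* device-resident private state        */
};

In this reading, the ODF advertises which operation tables a given offcode binary exports, so the framework can bind callers to it without access to its source.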
Offcode Libraries
[Diagram: an Offcode Library organized by category – Networking (BSD Socket – socket.odf), Math (CRC32 – crc32.odf), Graphics, Security, and a User Lib (MPEG Decoder – mpegDecoder.odf) – from which an OA-App imports offcodes.]
Offcode Description File
[Figure: example Offcode Description File.]
Channels (I)
• Offcodes are interconnected via channels, which determine various communication properties between offcodes.
• An out-of-band channel (OOB-channel) is attached to every OA-application and offcode:
– Not performance critical (uses memory copies)
– Used for initialization, control and event dissemination
[Diagram: offcodes A, B and C connected by a specialized channel and by OOB-channels.]
Channels (II)
• A specialized channel is created for performance-critical communication.
• Hydra provides several channel types:
– Unicast / Multicast
– Reliable / Unreliable
– Synchronized / Asynchronous
– Buffered / Zero-copy
– Read / Write / Both
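A hypothetical C sketch of how such a channel API might look, with the property matrix above expressed as creation flags; the function and flag names are assumptions made for illustration, not Hydra's actual API.

/* Hypothetical channel API; names and signatures are assumptions. */
#include <stddef.h>

struct offcode;                   /* the peer endpoint (see the offcode sketch) */
struct channel;                   /* opaque handle to a specialized channel     */

#define CH_MULTICAST  (1u << 0)   /* default: unicast          */
#define CH_RELIABLE   (1u << 1)   /* default: unreliable       */
#define CH_SYNC       (1u << 2)   /* default: asynchronous     */
#define CH_ZEROCOPY   (1u << 3)   /* default: buffered         */
#define CH_RDONLY     (1u << 4)
#define CH_WRONLY     (1u << 5)   /* neither flag => read/write */

struct channel *channel_create(struct offcode *peer, unsigned flags);
int  channel_send(struct channel *ch, const void *buf, size_t len);
int  channel_recv(struct channel *ch, void *buf, size_t len);
void channel_destroy(struct channel *ch);

For example, the performance-critical path between the TO Service and the offloaded orderer could ask for a reliable, zero-copy channel (CH_RELIABLE | CH_ZEROCOPY), while control traffic stays on the OOB-channel.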
Design Methodology
• We follow the “layout design” methodology first presented in FarGo (1) and later in FarGo-DA (2).
• Offload-aware applications are designed along two aspects:
1. Basic logic design: design the application logic and define the components to be offloaded.
2. Offloading layout design: define the communication channels between offcodes and their location constraints.
(1) “FarGo System”, ICDCS ’99, Ophir Holder and Israel Ben-Shaul.
(2) “A Programming Model and System Support for Disconnected-Aware Applications on Resource-Constrained Devices”, ICSE ’02, Yaron Weinsberg and Israel Ben-Shaul.
1. Logical Design (the example)
• GUI – provides the viewing area and user controls (define a message pattern and frequency, and send it).
• TO Service – provides the TO API: TO_broadcast(), TO_recv().
• LamportOrderer – implements the specific algorithm instance (timestamp ordering).
• ReliableBroadcast – implements a simple reliable-broadcast (RB) algorithm.
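The slide names the TO API entry points; a hypothetical C declaration of that API, with parameter lists that are assumptions made only for illustration, might look like this:

/* TO_broadcast() and TO_recv() are named on the slide;
 * the parameter lists below are illustrative assumptions. */
#include <stddef.h>

/* Hand a message to the (offloaded) orderer for totally-ordered delivery. */
int TO_broadcast(const void *msg, size_t len);

/* Return the next message in the agreed total order; 'from' is the sender. */
int TO_recv(void *buf, size_t buf_len, unsigned *from);

Any replica that applies messages strictly in the order returned by TO_recv() reaches the same state as its peers, which is what the state-machine replication model mentioned in the motivation relies on.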
2. Offloading Layout Design
[Layout diagram – components legend: 1: GUI, 2: TO Service, 3: Lamport Orderer, 4: Reliable Broadcast; “net” denotes the network device.]
Channel Constraints
• Link constraint (default): A.ODF – A target = Device 1; B.ODF – B target = Device 1 or Device 2. [Diagrams: A stays on Device 1; B may reside on Device 1 or Device 2, connected to A by a Link channel.]
• Pull constraint: A.ODF – A target = Device 1; B.ODF – B target = Device 1 or Device 2. [Diagrams: A on Device 1 connected to B by a Pull channel.]
• Gang constraint: A.ODF – A target = Device 1; B.ODF – B target = Device 2. [Diagram: A on Device 1 connected to B on Device 2 by a Gang channel.]
Finally: Application Deployment
Layout graph (logical devices) → mapping onto physical devices → offcode generation → offloading → execution.
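Assuming the pipeline is driven from the host side, it can be summarized as a simple phase sequence; the phase names below are invented for illustration and are not part of Hydra.

/* Illustrative only: phase names are invented, not Hydra's actual API. */
#include <stdio.h>

enum deploy_phase { MAP_LAYOUT_GRAPH, GENERATE_OFFCODES, OFFLOAD, EXECUTE };

static const char *phase_name[] = {
    "map the logical devices of the layout graph onto physical devices",
    "generate offcode binaries for each target device",
    "offload the offcodes and wire up their channels",
    "execute the offload-aware application",
};

int main(void)
{
    for (int p = MAP_LAYOUT_GRAPH; p <= EXECUTE; p++)
        printf("phase %d: %s\n", p, phase_name[p]);
    return 0;
}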
Evaluation: OA Total-Order Application
Evaluation: OA Total-Order Application
• 5 Intel Pentium 4 2.4 GHz systems, 512 MB of RAM, 32-bit, 33 MHz PCI bus.
• Programmable Netgear 620 NICs with 512 kB RAM.
• Linux version 2.6.11 with the Hydra module.
• Dell PowerConnect 6024 Gigabit Ethernet switch.
Conclusions
• We are at the beginning of a journey toward enabling an application developer to fully utilize the available computing resources:
– Peripherals
– Multi-core systems
• Offloading can improve the performance of distributed applications, advanced storage services, IDS systems, VMMs, etc.