
Providing Secure Storage on the Internet Barbara Liskov & Rodrigo Rodrigues MIT CSAIL April 2005
Internet Services • Store critical state • Are attractive targets for attacks • Must continue to function correctly in spite of attacks and failures
Replication Protocols • Allow continued service in spite of failures – Fail-stop failures – Byzantine failures • Byzantine failures really happen! – Malicious attacks
Internet Services 2 • Very large scale – Amount of state – Number of users – Implies lots of servers • Must be dynamic – System membership changes over time
BFT-LS • Provide support for Internet services – Highly available and reliable – Very large scale – Changing membership • Automatic reconfiguration – Avoid operator errors • Extending replication protocols
Outline • Application structure • MS specification • MS implementation • Application methodology • Performance and analysis
System Model (diagram: clients C and servers S connected by an unreliable network) • Many servers and clients • Service state is partitioned among servers • Each “item” has a replica group • Example applications: file systems, databases
Client accesses current replica group (diagram)
Client accesses new replica group (diagram)
Client contacts wrong replica group (diagram)
The Membership Service (MS) • Reconfigures automatically to reduce operator errors • Provides accurate membership information that nodes can agree on • Ensures clients are up-to-date • Works at large scale
System runs in Epochs • Periods of time, e.g., 6 hours • Membership is static during an epoch • During epoch e, MS computes membership for epoch e+1 • Epoch duration is a system parameter • No more than f failures in any replica group while it is useful
Server IDs • IDs chosen by MS • Consistent hashing • Very large circular id space
Membership Operations • Insert and delete node • Admission control – Trusted authority produces a certificate • Insert certificate includes – ip address, public key, random number, and epoch range – MS assigns the node id (h(ip, k, n))
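The id assignment above can be sketched as a hash over the certificate fields. This is a minimal sketch: the slides say only that the id is h(ip, k, n); the choice of SHA-1 and the field encoding are assumptions.

```python
import hashlib

def node_id(ip: str, public_key: bytes, nonce: int) -> int:
    """Derive a node's id as h(ip, k, n) from the insert certificate.

    SHA-1 and the byte encoding are assumptions; the talk only states
    that the MS hashes the ip address, public key k, and random number n.
    """
    h = hashlib.sha1()
    h.update(ip.encode())
    h.update(public_key)
    h.update(nonce.to_bytes(8, "big"))
    return int.from_bytes(h.digest(), "big")  # a point on the 160-bit ring

nid = node_id("18.26.4.9", b"...public key bytes...", 42)
```

Because the id is derived from certificate contents rather than chosen by the node, a malicious node cannot pick its own position on the ring.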
Monitoring • MS monitors the servers – Sends probes (containing nonces) – Some responses must be signed • Delayed response to failures • Timing of probes and number of missed probes are system parameters • Byzantine-faulty (BF) nodes: code attestation
Ending Epochs • Stop epoch after a fixed time • Compute the next configuration: epoch number, adds, and deletes • Sign it – MS has a well-known public key • Propagated to all nodes – Over a tree plus gossip
Guaranteeing Freshness C → MS: ⟨nonce⟩ MS → C: ⟨nonce, epoch #⟩σMS • Client sends a challenge to the MS • Response gives the client a time period T during which it may execute requests • T is calculated using the client's clock
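The challenge/response exchange can be sketched as follows. This is a sketch under assumptions: an HMAC stands in for the MS's threshold signature σMS, the lease length T and epoch number are made up, and the network round-trip is elided.

```python
import hashlib
import hmac
import os
import time

MS_KEY = b"demo-shared-key"  # stand-in for the MS signing key (assumption)
LEASE_SECONDS = 60           # T, a system parameter in the real design

def ms_respond(nonce: bytes, epoch: int) -> tuple[int, bytes]:
    """MS side: authenticate (nonce, epoch #), standing in for σMS."""
    msg = nonce + epoch.to_bytes(4, "big")
    return epoch, hmac.new(MS_KEY, msg, hashlib.sha256).digest()

def client_fresh_epoch() -> tuple[int, float]:
    """Client side: challenge the MS, then derive the lease expiry
    from the client's own clock, taken before the challenge was sent."""
    nonce = os.urandom(16)
    sent_at = time.monotonic()         # client clock, read before sending
    epoch, sig = ms_respond(nonce, 7)  # round-trip elided; epoch is illustrative
    msg = nonce + epoch.to_bytes(4, "big")
    assert hmac.compare_digest(sig, hmac.new(MS_KEY, msg, hashlib.sha256).digest())
    return epoch, sent_at + LEASE_SECONDS  # T measured on the client's clock
```

Measuring T from the moment the challenge was sent keeps the guarantee valid even if the response is delayed: the lease can only be shorter than intended, never longer.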
Implementing the MS • At a single dedicated node – Single point of failure • At a group of 3f+1 – Running BFT – No more than f failures in system lifetime • At the servers themselves – Reconfiguring the MS
System Architecture • All nodes run the application • 3f+1 run the MS
Implementation Issues • Nodes run BFT – State machine replication (e.g., add, delete) • Decision making • Choosing MS membership • Signing
Decision Making • Each replica probes independently • Removing a node requires agreement – One replica proposes – 2f+1 must agree – Then can run the delete operation • Ending an epoch is similar
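The agreement threshold above can be sketched as simple bookkeeping. This is hypothetical scaffolding: in the real system the proposal and the delete operation run through the BFT state machine, not a shared dictionary.

```python
F = 1                # assumed fault bound for the sketch
QUORUM = 2 * F + 1   # endorsements required before the delete runs

def endorse(endorsements: dict, suspect: str, replica: str) -> None:
    """Record that `replica` independently observed the suspect failing probes."""
    endorsements.setdefault(suspect, set()).add(replica)

def ready_to_delete(endorsements: dict, suspect: str) -> bool:
    """One replica proposes removal; the delete operation may run only
    once 2f+1 distinct MS replicas agree, so f faulty replicas cannot
    evict a correct node on their own."""
    return len(endorsements.get(suspect, set())) >= QUORUM
```

Requiring 2f+1 endorsements means at least f+1 correct replicas observed the failure, which is why delayed, agreed-upon responses are safe against slander by faulty replicas.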
Moving the MS • Needed to handle MS node failures • To reduce attack opportunity – Move must be unpredictable • Secure multi-party coin toss • Next replicas are h(c, 1), …, h(c, 3f+1)
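Deriving the next MS placement from the coin toss can be sketched as below. A sketch under assumptions: SHA-1 as h and the integer encoding of the index are choices made here, not stated in the slides; the coin value c must come from the secure multi-party toss, which is not shown.

```python
import hashlib

F = 1  # assumed fault bound for the sketch

def next_ms_replicas(coin: bytes) -> list[int]:
    """After the multi-party coin toss yields c, the next MS lives at the
    servers responsible for h(c, 1), ..., h(c, 3f+1) on the id ring.
    Unpredictable c means an attacker cannot target the next MS in advance."""
    return [
        int.from_bytes(hashlib.sha1(coin + i.to_bytes(4, "big")).digest(), "big")
        for i in range(1, 3 * F + 2)
    ]
```

Each hash output is then mapped to the server responsible for that ring point, exactly as objects are placed by consistent hashing.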
Signing • Configuration must be signed • There is a well-known public key • Proactive secret sharing • MS replicas have shares of the private key – f+1 shares needed to sign • Keys are re-shared when the MS moves
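The f+1 threshold can be illustrated with plain Shamir secret sharing over a prime field. This is a minimal sketch only: the real system uses proactive secret sharing (APSS) with threshold signatures, so the private key is never reconstructed at any single node, and the field is demo-sized here.

```python
import random

P = 2**127 - 1  # a prime modulus (demo-sized; real keys live in a larger field)
F = 1           # f+1 = 2 shares suffice to sign/reconstruct in this sketch

def share(secret: int, n: int = 3 * F + 1) -> list[tuple[int, int]]:
    """Split the secret with threshold f+1: a random degree-f polynomial
    whose constant term is the secret, evaluated at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(F)]
    return [
        (x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
        for x in range(1, n + 1)
    ]

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 from any f+1 shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
```

With f+1 shares required and at most f faulty replicas, the adversary can never assemble the signing key, while any correct majority can; re-sharing when the MS moves invalidates the old shares.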
Changing Epochs: Summary of Steps 1. Run the endEpoch operation on the state machine 2. Select new MS replicas 3. Share refreshment 4. Sign the new configuration 5. Discard old shares
Example Service • Any replicated service • Dynamic Byzantine Quorums (dBQS) – Read/Write interface to objects • Two kinds of objects – Mutable public-key objects – Immutable content-hash objects
dBQS Object Placement (ring diagram) • Consistent hashing • The 3f+1 successors of the object id are responsible for the object
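Successor selection on the ring can be sketched directly. The helper name and the tiny id values are illustrative; real ids are 160-bit hash outputs.

```python
F = 1  # assumed fault bound for the sketch

def responsible(object_id: int, server_ids: list[int]) -> list[int]:
    """Return the 3f+1 servers whose ids follow the object id on the
    circular id space (wrapping past the largest id back to the smallest)."""
    ring = sorted(server_ids)
    # First server at or after the object id; wrap to index 0 if none.
    idx = next((i for i, s in enumerate(ring) if s >= object_id), 0)
    return [ring[(idx + k) % len(ring)] for k in range(3 * F + 1)]
```

Because membership is fixed within an epoch, every client that holds the current configuration computes the same replica group for an object.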
Byzantine Quorum Operations • Public-key objects contain – State, signature, version number • Quorum is 2f+1 replicas • Write: – Phase 1: client reads to learn the highest v# – Phase 2: client writes with a higher v# • Read: – Phase 1: client gets the value with the highest v# – Phase 2: write-back if some replicas have a smaller v#
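The two-phase write and read can be sketched as below. This is a sketch under strong assumptions: signatures, concurrent writers, and faulty replies are omitted, and "a quorum" is simply the first 2f+1 replicas.

```python
F = 1
N, QUORUM = 3 * F + 1, 2 * F + 1

# Each replica maps object id -> (version, value); signatures omitted.
replicas = [dict() for _ in range(N)]

def write(oid: str, value: str) -> None:
    """Phase 1: read a quorum to learn the highest v#.
       Phase 2: store the value under v# + 1 at a quorum."""
    versions = [replicas[i].get(oid, (0, None))[0] for i in range(QUORUM)]
    new_v = max(versions) + 1
    for i in range(QUORUM):
        if replicas[i].get(oid, (0, None))[0] < new_v:
            replicas[i][oid] = (new_v, value)

def read(oid: str) -> str:
    """Phase 1: take the value with the highest v# from a quorum.
       Phase 2: write it back to replicas holding a smaller v#."""
    got = [replicas[i].get(oid, (0, None)) for i in range(QUORUM)]
    v, val = max(got, key=lambda t: t[0])
    for i in range(QUORUM):
        if replicas[i].get(oid, (0, None))[0] < v:
            replicas[i][oid] = (v, val)
    return val
```

Any two quorums of size 2f+1 out of 3f+1 intersect in at least f+1 replicas, so at least one correct replica in every read quorum holds the latest write; the write-back in phase 2 of a read is what makes reads atomic.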
dBQS Algorithms – Dynamic Case • Tag all messages with epoch numbers • Servers reject requests for the wrong epoch • Clients execute phases entirely within one epoch – Must be holding a valid challenge response • Servers upgrade to the new configuration – If needed, perform state transfer from the old group • A methodology
Evaluation • Implemented the MS and two example services • Ran a set of experiments on PlanetLab, RON, and a local area network
MS Scalability • Probes – use sub-committees • Leases – use aggregation • Configuration distribution – use diffs and distribution trees
Fetch Throughput
Time to Reconfigure • Time to reconfigure is small • Variability stems from PlanetLab nodes • Only used f = 1, a limitation of the APSS protocol
dBQS Performance
Failure-free Computation • Depends on no more than f failures while a group is useful • How likely is this?
Probability of Choosing a Bad Group
Probability that the System Fails
Conclusion • Providing support for Internet services • Scalable membership service – Reconfiguring the MS • Dynamic replication algorithms – dBQS – a methodology • Future research – Proactive secret sharing – Scalable applications