V S Subrahmanian Invited talk at CLIMA-IV

© V. S. Subrahmanian, Invited talk at CLIMA-IV, 1/7/2004 MASS: Multiagent Security and Survivability V. S. Subrahmanian University of Maryland Joint work with Sarit Kraus, Cihan Tas, and Yingqian Zhang

Survivability of Multi-Agent Systems (MASs)
Problem: External events may cause a MAS to crash. Examples of such events are power failures, OS crashes, and malicious attacks.
Approach: Replication of agents.
Questions to ask:
- When to replicate?
- Where to replicate?
- Which agents (who) to replicate?

Talk Outline
- Architectures for multiagent survivability
- Centralized probabilistic survivability
- Agent-oriented probabilistic survivability
- Centralized Probabilistic Survivability Details
- Outline of 3 algorithms for agent-oriented distributed survivability
- Experimental results

Centralized approach
A MAS (set of agents) is deployed over a given network of host nodes. A special "survivability program" is placed on a node selected by the MAS developer.

Agent-oriented approach
A special survivability agent is deployed at one or more nodes in the network. These agents automatically collaborate to increase the survivability of the MAS.


MAS and Network Assumptions
Agents: provide one or more services; are located on a host computer; require resources.
Multiagent application (MAS): a finite set A of agents.
Memory requirements: Each agent requires a certain amount of memory from its host. Each host node has some fixed amount of memory to give to the set of agents.
Network assumptions: fully connected, defined as Ne = (N, Edges, mem).

Definition: Deployment
A deployment $\mu : N \to 2^A$ specifies which agents are located at a given node. $\mu$ must satisfy the following:
- Every agent must be deployed somewhere.
- The agents deployed at a node cannot use more memory than that node makes available.
Example: $\mu(n_1) = \{a, b\}$, $\mu(n_2) = \{a, c\}$
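The two conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming a deployment is represented as a dictionary from node names to sets of agent names; the function name `is_valid_deployment` and all memory figures are hypothetical and not taken from the talk.

```python
def is_valid_deployment(deployment, agent_mem, node_mem):
    """Check the two conditions from the definition of a deployment.

    deployment: dict node -> set of agents placed on that node
    agent_mem:  dict agent -> memory the agent requires
    node_mem:   dict node -> memory the node makes available
    """
    # Condition 1: every agent is deployed on at least one node.
    deployed = set().union(*deployment.values())
    if deployed != set(agent_mem):
        return False
    # Condition 2: the agents on a node never exceed that node's memory.
    return all(
        sum(agent_mem[a] for a in agents) <= node_mem[n]
        for n, agents in deployment.items()
    )


# The slide's example deployment, with made-up memory figures.
example = {"n1": {"a", "b"}, "n2": {"a", "c"}}
agent_mem = {"a": 10, "b": 20, "c": 10}   # hypothetical
node_mem = {"n1": 30, "n2": 25}           # hypothetical
print(is_valid_deployment(example, agent_mem, node_mem))
```

With these figures, node n1 is exactly full (10 + 20 = 30), so shrinking its memory would make the deployment invalid.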

Definition: Disconnect probability
A disconnect probability function for a network (N, Edges, mem) is a mapping $dp : N \to C[0, 1]$, where $C[0, 1]$ is the set of all closed subintervals of $[0, 1]$.
Example sources: statistics, past experience, expert opinion.
- $dp(n_1) = [0.2, 0.3]$ says that there is a 20%-30% probability that node $n_1$ will get disconnected.
- $dp(n_2) = [0.25, 0.25]$ says that the disconnect probability is exactly 25%.
dp can easily be extended to include temporal projections.

Definition: Future Networks
A possible future network of (N, Edges, mem) consists of a subset of the nodes and a subset of the edges involving the selected nodes.

[Figure: Example of Future Networks]

Related Work
- Methods for agent cloning for load balancing (Sycara, Shehory, Decker).
- Methods for agent replication for fault tolerance (Fedoruk, Marin, Sens, Fan).
- Network reliability: studied extensively.
- Fault-tolerant software systems: N-version approach (Lyu, He).
BUT, in the case of MAS:
- No answer to: who to replicate, how many replicas, where these replicas should be located.
- No work on probabilistic methods of survivability.

Outline
- Motivation
- Assumptions and definitions
- Related work
- Problem statement
- What is survivability?
- Finding optimal survivability
- Node-based heuristics
- Agent-based heuristics
- Experiment, comparison and results
- Conclusion

Computing Optimal Deployment
Given a network (Ne) and a disconnect probability function (dp): find a deployment whose probability of survival is maximal.

What is Survivability?
What is the probability with which it is guaranteed that the MAS will survive?
Survivability: at least one copy of each agent will keep functioning.

Constraints on Future Networks: CONS(dp, Ne)
$prob(Ne_j)$ is the probability that future network $Ne_j$ will arise.
Suppose $Ne_1, \ldots, Ne_k$ are all the possible future networks, and $Ne'_1, \ldots, Ne'_l$ are the future networks that include node $n$. Then:
- $prob(Ne_1) + \cdots + prob(Ne_k) = 1$
- $prob(Ne_j) \ge 0$
- $1 - dp(n) = prob(Ne'_1) + \cdots + prob(Ne'_l)$
In this talk, we assume dp returns a point probability. (The paper allows interval probabilities.)
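To make the constraints concrete, here is a small sketch that tests whether a candidate probability distribution over future networks satisfies CONS for point-valued dp. The representation (a dictionary from node sets to probabilities, with omitted networks having probability 0), the function name `satisfies_cons`, and the candidate distribution are illustrative assumptions; the dp values are those of the talk's three-node example.

```python
from math import isclose


def satisfies_cons(prob, dp, tol=1e-9):
    """Return True if prob meets the three CONS conditions.

    prob: dict frozenset(nodes) -> probability of that future network
          (networks that are not listed are taken to have probability 0)
    dp:   dict node -> point disconnect probability
    """
    # Every probability is non-negative.
    if any(p < -tol for p in prob.values()):
        return False
    # The probabilities of all future networks sum to 1.
    if not isclose(sum(prob.values()), 1.0, abs_tol=tol):
        return False
    # For each node n, the networks containing n carry mass 1 - dp(n).
    for n, d in dp.items():
        mass_with_n = sum(p for net, p in prob.items() if n in net)
        if not isclose(mass_with_n, 1.0 - d, abs_tol=tol):
            return False
    return True


dp = {"n1": 0.1, "n2": 0.2, "n3": 0.3}
# Hypothetical candidate: mass on {n1}, {n2} and the full network only.
candidate = {
    frozenset({"n1"}): 0.2,
    frozenset({"n2"}): 0.1,
    frozenset({"n1", "n2", "n3"}): 0.7,
}
print(satisfies_cons(candidate, dp))
```

Many different distributions satisfy CONS for the same dp, which is why survivability is defined through a minimization rather than a single number read off dp.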

Probability of survival of a deployment $\mu$
Suppose $Ne_1, \ldots, Ne_r$ are all the possible future networks such that $\mu$ is a deployment w.r.t. $Ne_j$.
Minimize $prob(Ne_1) + \cdots + prob(Ne_r)$ subject to CONS(dp, Ne).
The actual probability of survival is guaranteed to be greater than or equal to this minimum.

Deployment Example
Deployment and disconnect probabilities:
Node   Agents   dp
n1     a        0.1
n2     b        0.2
n3     a, b     0.3
Possible future networks:
#   Nodes
1   n1
2   n2
3   n3
4   n1, n2
5   n1, n3
6   n2, n3
7   n1, n2, n3
8   (no nodes)
Let $p_i$ be the probability of future network $i$.
Minimize $p_3 + p_4 + p_5 + p_6 + p_7$ subject to:
$p_1 + p_4 + p_5 + p_7 = 0.9$
$p_2 + p_4 + p_6 + p_7 = 0.8$
$p_3 + p_5 + p_6 + p_7 = 0.7$
$p_1 + p_2 + p_3 + p_4 + p_5 + p_6 + p_7 + p_8 = 1$
$p_i \ge 0$
Solution: 0.7
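One way to check the stated solution by hand (a worked verification added here, not part of the slides) is to split the objective using the third constraint and then exhibit a feasible point that attains the bound:

```latex
\begin{aligned}
p_3 + p_4 + p_5 + p_6 + p_7
  &= (p_3 + p_5 + p_6 + p_7) + p_4 \\
  &= 0.7 + p_4 \;\ge\; 0.7 .
\end{aligned}
% The bound is attained by p_1 = 0.2, p_2 = 0.1, p_7 = 0.7 and all other p_i = 0:
%   p_1 + p_4 + p_5 + p_7 = 0.2 + 0.7 = 0.9
%   p_2 + p_4 + p_6 + p_7 = 0.1 + 0.7 = 0.8
%   p_3 + p_5 + p_6 + p_7 = 0.7
%   p_1 + \cdots + p_8    = 0.2 + 0.1 + 0.7 = 1
```

So the minimum is exactly 0.7: the deployment is guaranteed to survive with probability at least 0.7, whatever the dependencies between node failures.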

Computing the Survival of a Deployment
We can solve the linear program using simplex or any other method.
Problem: the size of the linear program is enormous, as the number of possible future networks is huge.
We do NOT assume independence of dp. Otherwise, we could add a new constraint such as $prob(\{n_1, n_2\}) = [1 - dp(n_1)] \cdot [1 - dp(n_2)]$.
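For contrast, the sketch below shows what the computation would look like if node failures were fully independent, which the talk explicitly does not assume. Under that extra assumption each future network has a product-form probability and no linear program is needed. The function names are hypothetical; the deployment and dp values are those of the three-node example.

```python
from itertools import combinations
from math import prod


def future_networks(nodes):
    """Yield every subset of the nodes (the nodes that stay connected)."""
    nodes = sorted(nodes)
    for r in range(len(nodes) + 1):
        for combo in combinations(nodes, r):
            yield frozenset(combo)


def survival_if_independent(deployment, dp):
    """Survival probability ASSUMING independent node failures."""
    agents = set().union(*deployment.values())
    total = 0.0
    for net in future_networks(deployment):
        # Agents that still have a copy on some surviving node.
        hosted = set().union(*(deployment[n] for n in net))
        if hosted >= agents:
            # Probability that exactly the nodes in `net` stay up.
            total += prod(
                (1 - dp[n]) if n in net else dp[n] for n in deployment
            )
    return total


deployment = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
dp = {"n1": 0.1, "n2": 0.2, "n3": 0.3}
independent_value = survival_if_independent(deployment, dp)
print(round(independent_value, 3))
```

For this example the independent-failure value is 0.916, which is consistent with the guaranteed lower bound of 0.7 obtained without the independence assumption.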

Proposition
If an agent is located in a given set of nodes and another agent is located in those nodes and some others, then nothing is gained by putting the second agent in any of the other nodes.
Example (figure, agents $a_1$ and $a_2$): if $a_2$ is not in the purple node, we do not lose. This saves time.
Corollary: when searching for an optimal deployment, there is no need to look at ones where the set of "locations" of one agent is a superset of the set of locations of another agent.


Definitions
An agent a is relevant with respect to $\mu$ and Ne if there is no other agent which is deployed at a strict subset of the nodes at which a is deployed.
Nodes in which no relevant agents are deployed are not necessary.
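These two definitions translate directly into set comparisons. The sketch below is one possible reading, assuming the same dictionary representation of a deployment (node to set of agents); the function names and the two-agent example are illustrative only.

```python
def locations(deployment):
    """Invert a deployment: agent -> set of nodes hosting a copy of it."""
    loc = {}
    for node, agents in deployment.items():
        for agent in agents:
            loc.setdefault(agent, set()).add(node)
    return loc


def relevant_agents(deployment):
    """Agents for which no other agent sits on a strict subset of their nodes."""
    loc = locations(deployment)
    return {
        a for a in loc
        if not any(loc[b] < loc[a] for b in loc if b != a)
    }


def necessary_nodes(deployment):
    """Nodes hosting at least one relevant agent."""
    loc = locations(deployment)
    return set().union(*(loc[a] for a in relevant_agents(deployment)))


# Hypothetical example: a2 is on every node that hosts a1, plus n3.
deployment = {"n1": {"a1", "a2"}, "n2": {"a1", "a2"}, "n3": {"a2"}}
print(relevant_agents(deployment))   # only a1 is relevant
print(necessary_nodes(deployment))   # n3 hosts no relevant agent
```

In this example a2 survives whenever a1 does, so the extra copy of a2 on n3 does not affect whether the whole MAS survives.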

Theorem
Suppose MAS is a multiagent application, Ne = (N, Edges, mem) is a network, dp is a disconnect probability function, and $\mu$ is a deployment for MAS on Ne. Let Ne' = (N', Edges', mem), where N' is the set of necessary nodes of Ne w.r.t. $\mu$, and $Edges' = \{(n_1, n_2) \mid n_1, n_2 \in N'\}$. If $\mu'$ is the restriction of $\mu$ to Ne', then $surv(\mu) = surv(\mu')$.
Proof: we need to show the equivalence of two minimization expressions with different sets of constraints.
BOTTOM LINE: When computing the survivability of a deployment, it is enough to restrict interest to necessary nodes.

CDP: compute survivability given Ne, MAS, $\mu$, dp
1. Remove unnecessary nodes and create Ne', $\mu'$.
2. Compute the hitting sets for Ne', $\mu'$.
3. For every possible future network (pfn), check whether the pfn contains at least one hitting set.
4. Create and return the result of the appropriate minimization problem.
CDP will be used for finding an optimal deployment.

CDP example: deployment and possible future networks
Node   Agents
n1     a
n2     b
n3     a, b
Possible future networks: 1: {n1}; 2: {n2}; 3: {n3}; 4: {n1, n2}; 5: {n1, n3}; 6: {n2, n3}; 7: {n1, n2, n3}; 8: (no nodes).

CDP example: hitting sets
$h_1 = \{n_1, n_2\}$, $h_2 = \{n_3\}$
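The hitting-set step can be sketched by brute force for small networks. This reads a hitting set as a set of nodes that together host at least one copy of every agent, which matches $h_1$ and $h_2$ in the example; the function name `hitting_sets` and the enumeration strategy are assumptions, not the talk's implementation.

```python
from itertools import combinations


def hitting_sets(deployment):
    """Return the minimal node sets that jointly host every agent."""
    agents = set().union(*deployment.values())
    nodes = sorted(deployment)
    found = []
    # Enumerate by increasing size so smaller hitting sets are found first.
    for r in range(1, len(nodes) + 1):
        for combo in combinations(nodes, r):
            h = frozenset(combo)
            # Skip supersets of a hitting set already found (not minimal).
            if any(g < h for g in found):
                continue
            hosted = set().union(*(deployment[n] for n in h))
            if hosted >= agents:
                found.append(h)
    return found


deployment = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
hs = hitting_sets(deployment)

# Future networks numbered as in the example (8 is the empty network,
# inferred from the constraint that sums p_1 .. p_8).
pfns = {
    1: {"n1"}, 2: {"n2"}, 3: {"n3"}, 4: {"n1", "n2"},
    5: {"n1", "n3"}, 6: {"n2", "n3"}, 7: {"n1", "n2", "n3"}, 8: set(),
}
surviving = sorted(i for i, net in pfns.items() if any(h <= net for h in hs))
print(hs, surviving)
```

The surviving networks are 3, 4, 5, 6 and 7, which are exactly the indices appearing in the objective $p_3 + p_4 + p_5 + p_6 + p_7$ of the deployment example.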

CDP1: Adding Efficiency
Proposition 1: Removing an agent from a node cannot add new hitting sets to the system.
Proposition 2: Any hitting set that contains a removed node is not going to be a hitting set anymore.
These two suggest that re-computation of hitting sets is not necessary. Use the old network.
Application: For each element h of the old network's hitting sets, if the node changed is not an element of h, OR h - {node_changed} can support the removed agent, use this hitting set in the new network also.
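The application rule can be written as a filter over the old hitting sets. In this sketch, "h - {node_changed} can support the removed agent" is interpreted as "some other node in h still hosts a copy of that agent"; that interpretation, the function name, and the example are assumptions. The filter is exact when the old list contains all hitting sets, not only the minimal ones.

```python
def update_hitting_sets(old_hitting_sets, new_deployment, node_changed, removed_agent):
    """Keep the old hitting sets that are still hitting sets after the removal."""
    kept = []
    for h in old_hitting_sets:
        if node_changed not in h:
            # The change does not touch this hitting set.
            kept.append(h)
        elif any(removed_agent in new_deployment[n] for n in h - {node_changed}):
            # Another node in h still hosts the removed agent.
            kept.append(h)
    return kept


# All hitting sets (minimal or not) of the three-node example deployment.
old = [
    frozenset({"n3"}), frozenset({"n1", "n2"}), frozenset({"n1", "n3"}),
    frozenset({"n2", "n3"}), frozenset({"n1", "n2", "n3"}),
]
# Remove agent a from node n3.
new_deployment = {"n1": {"a"}, "n2": {"b"}, "n3": {"b"}}
updated = update_hitting_sets(old, new_deployment, "n3", "a")
print(updated)
```

Here {n3} and {n2, n3} drop out because neither contains another host for agent a, while the remaining three sets carry over without any re-enumeration.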

Computing Optimal Deployment
Given a network and a disconnect probability function: find a deployment whose probability of survival is maximal.

Search for Optimal Deployment: Branch and Bound Algorithm
- Initial state: all agents on all nodes (if this is a valid deployment, stop).
- Children of a state are all the states that are obtained by the removal of one agent from one node of the state.
- In each stage: if a valid deployment is found, compute its survivability and use it to bound the search.
- No need to consider deployments whose survivability is lower than the bound (proposition).
Theorem: The problem of finding an optimal deployment is $NP^{NP}$-complete.
NOT PRACTICAL TO FIND OPTIMAL DEPLOYMENTS.
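The search described above can be sketched as a depth-first branch and bound. This is an illustrative sketch only: it takes the survivability evaluator as a parameter, and the example plugs in an independent-failure evaluator as a stand-in for the linear program used in the talk. All names and numbers are hypothetical. The pruning relies on the fact that removing an agent copy can never increase survivability.

```python
from itertools import combinations
from math import prod


def surv_independent(state, dp):
    """Stand-in evaluator that ASSUMES independent node failures."""
    agents = set().union(*state.values())
    nodes = sorted(state)
    total = 0.0
    for r in range(len(nodes) + 1):
        for up in combinations(nodes, r):
            hosted = set().union(*(state[n] for n in up))
            if hosted >= agents:
                total += prod((1 - dp[n]) if n in up else dp[n] for n in nodes)
    return total


def branch_and_bound(agent_mem, node_mem, dp, surv):
    agents = frozenset(agent_mem)
    start = {n: agents for n in node_mem}      # all agents on all nodes
    best, best_val = None, -1.0
    stack, seen = [start], set()
    while stack:
        state = stack.pop()
        key = frozenset(state.items())
        if key in seen:
            continue
        seen.add(key)
        if frozenset().union(*state.values()) != agents:
            continue                           # some agent has no copy left
        val = surv(state, dp)
        if val <= best_val:
            continue                           # bound: children cannot do better
        if all(sum(agent_mem[a] for a in state[n]) <= node_mem[n] for n in state):
            best, best_val = state, val        # valid deployment: new bound
            continue
        for n in state:                        # children: drop one agent copy
            for a in state[n]:
                child = dict(state)
                child[n] = state[n] - {a}
                stack.append(child)
    return best, best_val


agent_mem = {"a": 10, "b": 10}
node_mem = {"n1": 10, "n2": 20}
dp = {"n1": 0.1, "n2": 0.5}
best, best_val = branch_and_bound(agent_mem, node_mem, dp, surv_independent)
print(best, best_val)
```

In this tiny example n1 can hold only one agent, so the MAS survives only if n2 (which holds both agents) stays up; the best achievable value is 0.5. The state space grows exponentially with agents and nodes, in line with the slide's remark that optimal search is impractical.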

Node-Based Heuristic
Put as many agents as possible on nodes with low disconnect probability.
- Sort nodes in ascending order of disconnect probability.
- For each such node, put as many agents as possible on it using a knapsack algorithm. Don't deploy an agent twice. (A variant is to use a greedy knapsack approximation.)
- If all agents are deployed and you still have available nodes, start from the beginning.
[Figure: nodes labeled with disconnect probability and memory (0.2, 25), (0.3, 20), (0.35, 30); agents of size 10, 20, 10.]
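A minimal sketch of the node-based heuristic follows, using the greedy variant (smallest agents first) in place of an exact knapsack solver. The function name, the tie-breaking by agent name, and the reading of the figure's numbers as (disconnect probability, memory) pairs are assumptions.

```python
def node_based_heuristic(agent_mem, node_mem, dp):
    """Fill the most reliable nodes first, greedily packing small agents."""
    deployment = {n: set() for n in node_mem}
    pending = set(agent_mem)                       # agents not yet placed this round
    for n in sorted(node_mem, key=lambda x: dp[x]):
        if not pending:
            pending = set(agent_mem)               # all placed: start a new round
        free = node_mem[n]
        # Greedy approximation: try agents from smallest to largest.
        for a in sorted(pending, key=lambda x: (agent_mem[x], x)):
            if agent_mem[a] <= free:
                deployment[n].add(a)
                free -= agent_mem[a]
        pending -= deployment[n]                   # do not place these again this round
    return deployment


# Numbers read from the slide's figure (interpretation assumed).
node_mem = {"n1": 25, "n2": 20, "n3": 30}
dp = {"n1": 0.2, "n2": 0.3, "n3": 0.35}
agent_mem = {"a": 10, "b": 20, "c": 10}
placement = node_based_heuristic(agent_mem, node_mem, dp)
print(placement)
```

With these numbers, n1 receives a and c, n2 receives b, and the leftover node n3 starts a second round and receives replicas of a and c. If some agent never fits on any remaining node, the result is not a valid deployment and a caller would need to detect that.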

Agent-Based Heuristic
Place agents with high resource requirements on nodes with low disconnect probabilities.
- Sort agents in ascending order according to resource requirements.
- Place the first agent on the node with the "lowest" disconnect probability; then the second, etc.
[Figure: nodes with disconnect probabilities 0.2, 0.3, 0.35 and memory 25, 20, 30; agents of size 20, 10, 10.]

Experiment Settings
Goal: to compare the different algorithms and heuristics.
Evaluation of time and survivability as a function of:
- Problem size: number of agents + number of nodes.
- Number ratio: number of agents / number of nodes.
- Size ratio: average memory requirement of agents / average memory available on nodes.
Varying: number of agents; number of nodes.
Random generation: memory requirement of agents, memory availability of nodes, disconnect probability of nodes.

[Figure: Comparing Heuristics: Survivability. x-axis: number of agents + number of nodes. Solid line: agent-based heuristic; dashed line: node-based heuristic.]

[Figure: Heuristics Comparison: Survivability. Two panels: number of agents > number of nodes, and number of agents < number of nodes. x-axis: number of agents + number of nodes; y-axis: survivability. Solid line: agent-based heuristic; dashed line: node-based heuristic.]

[Figure: Heuristics Comparison: Computation Time in microseconds. x-axis: number of agents + number of nodes; y-axis: computation time (microseconds). Curves: agent-based heuristic and node-based heuristic.]

Results
- As the sum of the number of agents and nodes increases, survivability decreases.
- The node-based heuristic almost always gives slightly better survivability than the agent-based heuristic.
- When there are more agents than nodes, the node-based heuristic requires less time; when there are more nodes than agents, the agent-based heuristic takes less time.


ASA-1: Agent Survivability Algorithm
Key idea:
- Add a special distributed survivability agent dsa to the MAS.
- Add a copy of dsa to each node on the network.
- dsa replicas on different nodes are designed in such a way that they always know what the other replicas are doing.
Assumptions:
- Each agent can kill itself or copy itself to another node.
- Each node has enough space for the dsa.
- The dsa has knowledge of the current deployment and of changes in the disconnect probabilities of nodes. (We developed network-based models to estimate the disconnect probability of nodes.)
- A buffer manager can stream agents. This is pretty standard to implement.

ASA-1 algorithm sketch
Let the current deployment be $\mu$. If a change in disconnect probability occurs, then:
1. Use COD to find a new deployment $\mu_{new}$.
2. For each node n, compute Insert(n) and Delete(n): the agents to be inserted at, or deleted from, n.
3. Determine how to insert/delete within the constraints of space per node.
The key step is (3). During execution of the algorithm:
1. at least one copy of each agent must be on the network at all times, and
2. no host's space may be exceeded at any time.
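Step 2 amounts to two set differences per node. The sketch below assumes Insert(n) holds the agents that the new deployment places on n but the current one does not, and Delete(n) the reverse; the function name and the example deployments are hypothetical.

```python
def insert_delete_sets(current, new):
    """Compute Insert(n) and Delete(n) for every node n.

    current, new: dict node -> set of agents (old and new deployments)
    """
    nodes = set(current) | set(new)
    insert = {
        n: set(new.get(n, set())) - set(current.get(n, set())) for n in nodes
    }
    delete = {
        n: set(current.get(n, set())) - set(new.get(n, set())) for n in nodes
    }
    return insert, delete


current = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
new = {"n1": {"a", "b"}, "n2": {"b"}, "n3": {"a"}}
insert, delete = insert_delete_sets(current, new)
print(insert, delete)
```

The harder part, step 3, is ordering these insertions and deletions so that memory limits hold and every agent keeps at least one live copy throughout; this sketch does not attempt that.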

ASA-2 algorithm
$MAS' = MAS \cup \{dsa_2\}$. Use COD to deploy MAS'.
There are slight differences between the coding of this dsa and the dsa used in ASA-1. $dsa_2$:
- Arbitrarily deletes all but one copy of each agent.
- Moves/copies the remaining agents to their new locations.

ASA-3: Mobility-based survivability
Each agent in the MAS has a mobility capability. Each agent in the MAS is augmented with the following rules:
- When any dsa sends it a message to move to a new location, it does so.
- After executing the move, it informs all dsa replicas that it has done the move.
Assume no other move operations in the agent interfere with the above rules.

ASA-3
When a change in disconnect probability is detected, do the following:
1. For each node n, compute Insert(n) and Delete(n).
2. Send delete messages to all agent replicas that can be safely deleted (safe means at least one other copy of the agent exists).
3. Send a move/exchange message to various agents. (Agents can be streamed.)
4. When an ack is received, send a move/exchange message to the next agent.
5. Continue until there are no more agents to be moved.
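The safety condition in step 2 can be sketched with a running count of live copies. The function name, the processing order (sorted by node, then agent), and the example are assumptions; a real dsa would send messages rather than return a list.

```python
from collections import Counter


def safe_deletes(current, delete):
    """Select the (node, agent) deletions that leave at least one copy alive.

    current: dict node -> set of agents currently deployed there
    delete:  dict node -> set of agents scheduled for deletion from that node
    """
    copies = Counter(a for agents in current.values() for a in agents)
    safe = []
    for n in sorted(delete):
        for a in sorted(delete[n]):
            if copies[a] > 1:          # another copy exists somewhere
                safe.append((n, a))
                copies[a] -= 1         # account for this deletion
    return safe


current = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
delete = {"n2": {"b"}, "n3": {"b"}}
messages = safe_deletes(current, delete)
print(messages)
```

Only the first deletion of b is safe here; the second must wait until b has been copied or moved to its new location.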

Experiments
Agent survivability experiments are based on agent characteristics from existing IMPACT agents (31-agent sample).
Agent memory requirements:
- Between 150 KB and 250 KB with 3/31 probability.
- Between 50 KB and 150 KB with 8/31 probability.
- Between 0 KB and 50 KB with 20/31 probability.
Bandwidth = 100 KB/sec (actually a relatively low bandwidth assumption).

Effect of Problem Size on

Effect of Survivability with ASA-1

Key experimental results
Computation Time: Network Time:
ASA-1, ASA-3 much faster than ASA-2.
ASA-1, ASA-3 hard to compare.
Survivability of Resulting Deployments:
ASA-1, ASA-3 always outperform ASA-2.
ASA-1 usually better than ASA-3, esp. when problem size is large.
ASA-1, ASA-3 very close. When problem size is increased, ASA-3 usually better.
ASA-3 seems to be the best overall.

Related Work
FLP Marin et al. Reliability (safety, liveness) Kumar Answers how to replicate flexibly and reliably How to deploy proxies? Gartner Mainly on synchronization No answer to 3 Questions Marin Only one facility - making multiple is complex No probabilistic approach. (limited constraints) Focuses on control, we focus on avoidance Fan How to clone? (Can be used for the replication in our paradigm.)

Related Work
Lyu, He Unnecessary

Collaborators
C. Tas - Univ.

Conclusions
- There is a growing need for guarantees that multiagent applications will survive various kinds of faults.
- We proposed a set of solutions on how to deploy multiple copies of agents on nodes so that the probability that the MAS survives is maximized.
- We provided a formal model and conducted preliminary experiments.
- Many open questions are left!