End to End Security and Privacy in Distributed

End to End Security and Privacy in Distributed Systems and Cloud Bharat Bhargava CERIAS Security Center CWSA Wireless Center Department of CS and ECE Purdue University bb@cs. purdue. edu Supported by NSF, AFRL, CISCO, Motorola, IBM 1

Visions of AF chief scientist ( Werner Dahm) • U. S Air Force “Technology Horizons” 2010 --‐ 2030 http: //www. aviationweek. com/media/pdf/Check 6/USAF_Technology_Horizons_report. pdf • Next-Generation High-Bandwidth Secure Communications • Trusted, Adaptive, Flexibly-Autonomous Systems • New technologies for increased cyber resilience of Air Force networks and systems • Technologies can provide increased trust in autonomy to enable reduced manpower requirements via flexibly autonomous systems 2

PCA 1: Inherently Intrusion-Resilient Cyber Systems • There will be an enduring need for technologies that can enable a wide range of Air Force capabilities that support missions, including better multi-level security solutions to facilitate sharing of information, systems, and training with international partners. • Command Control Center (C 2 C), exploring solutions to national security problems with emphasis on improved information interoperability, systems integration, and cyber assurance, including net-centric strategies, complex systems engineering, and information technologies. 3

PCA 2: Automated Cyber Vulnerability Assessments and Reactions • • • Ad hoc networks Polymorphic networks Agile networks Complex adaptive distributed networks Frequency-agile RF systems V&V for complex adaptive systems Autonomous systems Collaborative/cooperative control Information fusion and understanding Cyber defense Cyber resilience Social network modeling 4

Objective • A trustworthy, secure, and privacy preserving network platform must be established for trusted collaboration in an End to End system. The fundamental research problems include: – – – Trust management Privacy preserved collaborations Dealing with a variety of attacks in networks Intruder identification in ad hoc networks Trust-based privacy preservation for peer-to-peer data sharing 5

Applications • Security sensitive applications in the next generation systems – – Cloud Computing Subscribe/Publish paradigm in Do. D Data sharing for medical research and treatment Collaboration among government agencies for homeland security – Transportation system (security check during travel, hazardous material disposal) – Collaboration among government officials, law enforcement and security personnel, and health care facilities during bio-terrorism and other emergencies 6

The Oppnet Concept ( Navy STTR) Fighter Satellites X-47 B UCAS Seed Oppnets recruit and coordinate the capabilities of diverse networks, sensors, and computational resources in a way that optimizes resource utilization and also ensures improved Qo. S despite intermittent link connectivity. Radar Processing LCSs Merchant Ships Target Carrier USVs Underwater Acoustic Array Oppnet links 3/19/2018 non-Oppnet UCAS links to Carrier 7

Building Seed Oppnet Arrival an. Emergency Among An Incident Occurs Response Teams Before of. Expanded Oppnet: Emergency Response Teams Forming Incident Expanded Oppnet Discovering Candidate Helpers, Selecting and Integrating Helpers Overturned Vehicles (On. Star & BANs) Cellphone Tower Computer Network Road Infrastructure Sensors 11/10/2006

Seek Collaboration with Faculty Interested in • • • Security/Privacy Networking Embedded Systems Sensors Distributed Systems Assistive Technologies 9

Opportunistic Networks (with Navy) • Opportunistic Resource Utilization Networks (Oppnets) for UAV Ad-Hoc Networking: – Novel MANET consisting of an initial seed network that temporarily recruits resources. – Oppnets: • Allow the construction of highly adaptive, flexible, and maintainable application networks • Utilize and enhance applications, even including inflexible, stovepiped, legacy applications • Adapt and optimize the use of resources on-the-fly • Enable and facilitate distributed applications • Virtualize resources across platforms, allow scalability, and promote dynamic growth – Oppnets are: • Opportunistic resource/capability utilization networks • Opportunistic growth networks • Specialized Ad-Hoc Networks/Systems (SAHNS) 10

AFRL BAA Announcement • • To support end to end authenticity, integrity and confidentiality within the AF, its mission capabilities must be preceded with strong 2 -way authentication and authorization. Currently available web browsers lack adequate support for standard Web Services programming models or protocols. This limitation presents the following challenges with the Air Force’s view of the architecture: – Authentication and authorization does not take place across intended end points. Such as between the requestor and provider. – Termination at intermediate steps of service execution exposes messages to hostile threats, for example, Man-in-the-Middle attacks. https: //www. fbo. gov/index? s=opportunity&mode=form&tab=core&id=9 c 3333 a 8 b 3 ee 11 0 f 6 d 136 e 1 dc 6606 ef 0 df&_cview=0

Aspects of Technical Approach ( AFRL Project) • • • 12 Designed a policy-driven security architecture for SOA based across service domains System provides a secure end-to-end message origin authentication for web service client requests and web service providers to ensure confidentiality and integrity in the presence of man-in-the-middle attacks. Investigated adoption of web service WS-* standards (WS-Security, WS -Reliability, WS-Trust, WS-Interoperability) for enterprise Air Force systems. Used Taint Analysis for monitoring security A prototype implementation of proposed approaches based on open source technologies that integrated into existing government-off-theshelf (GOTS) components in an operational environment. Developed prototype and conducted experiments in Cloud environment

Problem Overview (AFRL) 13

SOA Reference Scenario 1. UDDI Registry request 2. Forwarding the service list to Trust Broker and receive a ranked list 3. Invoking a selected service 4. Second invocation by service in domain A 5. Invoking a service in public service domain 6. End points (Reply to user) 14

Reference Scenario Details • Steps: 1. Global UDDI Registry request • 2. User sends a refined list of services to Trust Broker module • • 3. Trust Broker categorizes the list of services and returns a classified list Certified, Trusted, Untrusted services Service Request • • 15 User receives a list of services related to the requested category User selects a service based on its criteria (Qo. S, Trust category of service, Security preference, etc. ) and invokes that service. User creates a session with Trust Broker and selected service in Trusted Domain A. (Trust sessions are shown with dashed lines)

Reference Scenario Details (Cont. ) 4. Trusted domain A will invoke another service in Trusted domain B. • • 5. Step four is repeated. • • • 6. At this moment, an external service invocation to an public service is detected by Taint Analysis module This will be reported to Trust Broker will maintain the trustworthiness of this SOA service orchestration and if needed can stop it. Service in service domain B invokes a service in an public (Maybe untrusted) domain C (Possibility of deploying Taint Analysis in this domain) Service end points to user • 16 Taint Analysis module will intercept the communications and reports any illegal external invocation Trust session will be extended to this domain (a new trust link between domain A and trust broker) The response of SOA invocation can be sent directly to the user

SOA Security Solution 17

Detecting Service Violation in Internet • Problem statement Detecting service violation in networks is the procedure of identifying the misbehaviors of users or operations that do not adhere to network protocols. Collaboartion with Dr. Albert Legaspi Head, Networks Division SPAWARSYSCEN 55100, NAVY, San Diego 18

Topology Used (Internet) Victim, V A 3 uses reflector H 3 to attack V H 5 A 1 spoofs H 5’s address to attack V 19

Detecting Do. S Attacks in Internet *SPIE: Source Path Isolation Engine 20

• Research Directions – Observe misbehavior flows through service level agreement (SLA) violation detection – Core-based loss – Stripe based probing – Overlay based monitoring 21

Approach • Develop low overhead and scalable monitoring techniques to detect service violations, bandwidth theft, and attacks. The monitor alerts against possible Do. S attacks in early stage • Policy enforcement and controlling the suspected flows are needed to maintain confidence in the security and Qo. S of networks 22

Methods • Network tomography – Stripe based probing is used to infer individual link loss from edge-to-edge measurements – Overlay network is used to identify congested links by measuring loss of edge-to-edge paths • Transport layer flow characteristics are used to protect critical packets of a flow • Edge-to-edge mechanism is used to detect and control unresponsive flows 23

Monitoring Network Domains • Idea: – Excessive traffic changes internal characteristics inside a domain (high delay & loss, low throughput) – Monitor network domain for unusual patterns – If traffic is aggregating towards a domain (same IP prefix), probably an attack is coming • Measure delay, link loss, and throughput achieved by user inside a network domain Monitoring by periodic polling or deploying agents in high speed core routers put non-trivial overhead on them 24

Core-assisted loss measurements • Core reports to the monitor whenever packet drop exceeds a local threshold • Monitor computes the total drop for time interval t • If the total drop exceeds a global threshold a. The monitor sends a query to all edge routers requesting their current rates b. The monitor computes total incoming rate from all edge c. The monitor computes the loss ratio as the ratio of the dropped packets and the total incoming rate d. If the loss ratio exceeds the SLA loss ratio, a possible SLA violation is reported 25

Stripe Unicast Probing [Duffield et al. , INFOCOM ’ 01] Idea from Butler Lampson and Howard Sturgis (Crash Recovery in a Distributed Data Storage System) • Back-to-back packets experience similar congestion in a queue with a high probability • Receiver observes the probes to correlate them for loss inference • Infer internal characteristics using topology • For general tree? Send stripe from root to every order-pair of leaves • Develop stripe-based monitoring by extending loss inference for multiple drop precedence 26

Inferring Loss • Calculate how many packets are received by the two receivers. Transmission probability Ak Ak = ZR 1 ZR 2 ZR 1 U R 2 where Zi binary variable which takes 1 when all packets reached their destination and 0 otherwise • Loss is 1 - Ak • For general tree, send stripe from root to every order-pair of leaves.

Overlay-based Monitoring • Problem statement – • Given topology of a network domain, identify which links are congested Solutions: Simple and Advanced methods 1. Monitor the network for link delay 2. If delayi > Thresholdidelay for path i, then probe the network for loss 3. If lossj > Thresholdjloss for any link j, then probe the network for throughput 4. If BWk > Thresholdk. BW, flow k is violating service agreements by taking excess resources. Upon detection, we control the flows. 28

Probing: Simple Method Congested link (a) Topology (b) Overlay (c) internal links • Each peer probes both of its neighbors • Detect congested link in both directions 29

Experiments: Evaluation methodology • Simulation using ns-2 • Two topologies – C-C links, 20 Mbps – E-C links, 10 Mbps • Parameters – Number of flows order of thousands – Change life time of flows – Simulate attacks by varying traffic intensities and injecting traffic from multiple entry points • Output Parameters – delay, loss ratio, throughput Congested link Topology 1 30

Loss Ratio Identified Congested Links Time (sec) (a) Counter clockwise probing Time (sec) (b) Clockwise probing Probe 46 in graph (a) and Probe 76 in graph (b) observe high losses, which means link C 4 E 6 is congested. 31

False Positive (theoretical analysis) • The simple method does not correctly label all links • The unsolved “good” links are considered bad hence false positive happens 32 • Need to refine the solution Advanced Method

• Example: if 100 links in the network and 20 of them are congested and 80 are “good”. The basic probing method can identify 15 congestion links and 70 good links. The other 15 are labeled as “unknown”. If all unknown links are treated as congested, 10 good link will be falsely labeled as congested. When the false positive is too high, the available paths that can be chosen by the routers are restricted, thus network performance is impacted. 33

Analyzing Simple Method • Lemma 1. If P and P’ are probe paths in the first and the second round of probing respectively, |P P’ | ≤ 1 • Theorem 1. If only one probe path P is shown to be congested in any round of probing, the simple method successfully identifies status of each link in P • Performs better if edge-to-edge paths are congested • The average length of the probe paths in the Simple method is ≤ 4 34

Theorem 2. Let p be the probability of a link being congested in any arbitrary overlay network. The simple method determines the status of any link of the topology with probability at least 2(1 p)4 -(1 -p)7+p(1 -p)12 Detection Probability Performance: Simple Method Frac of actual congested links 35

Advanced Method Advanced. Method() begin Conduct Simple Method. E is the unsolved equation set for Each undecided variable Xij of E do node 1 = Find. Node(Tree T, vi, IN) node 2 = Find. Node(Tree T, vj , OUT) if node 1 ≠ NULL AND node 2 ≠ NULL then Probe(node 1, node 2). Update equation set E end if Stop if no more probe exists endfor end 36

Loss Ratio Identifying Links: Advanced Method Time (sec) Link E 2 C 2, C 1 C 3, C 3 C 4, and C 4 E 6 are congested. Simple method identifies all except E 2 C 2. Advanced method finds probe 37 E 5 E 1 to identify status of E 2 C 2.

Analyzing Advanced Method • Lemma 2. For an arbitrary overlay network with n edge routers, on the average a link lies on b = edge-to-edge paths • Lemma 3. For an arbitrary overlay network with n edge routers, the average length of all edge-toedge paths is d = • Theorem 3. Let p be the probability of a link being congested. The advanced method can detect the status of a link with probability at least (1(1 -(1 -p)d)b) 38

• Graph shows lower and upper bounds • When congestion is ≤ 20%, links are identified with O(n) probes with probability ≥ 0. 98 • Does not help if ≥ 60% links are congested Detection Probability Bounds on Advanced Method Frac of actual congested links Advanced method uses output of simple method and topology to find a probe that can be used to identify status of an unsolved link in simple method 39

% of traffic Experiments: Delay Measurements Delay (ms) Cumulative distribution function (cdf) • Attack changes delay pattern in a network domain • We need to know the delay pattern when there is not attack 40

Loss Ratio Experiments: Loss measurements Time (sec) (a) Core-assisted Time (sec) (b) Stripe-based Core-based measurement is more precise than stripe-based, however, it has high overhead 41

Loss Ratio Delay (ms) Attack Scenarios Time (sec) (a) Changing delay pattern due to attack (b) Changing loss pattern due to attack • Attack 1 violates SLA and causes 15 -30% of packet loss • Attack 2 causes more than 35% of packet loss 42

Detecting Do. S Attacks • If many flows aggregate towards a downstream domain, it might be a Do. S attack on the domain • Analyze flows at exit routers of the congested links to identify misbehaving flows • Activate filters to control the suspected flows • Flow association with ingress routers – Egress routers can backtrack paths, and confirm entry points of suspected flows 43

Processing overhead (CPU cycle) Communication overhead in KB Overhead comparison Percentage of misbehaving flow (a) Processing overhead (b) Communication overhead • Core has relative low processing overhead • Overlay scheme has an edge over other two schemes 44

Observations • Stripe-based Monitoring – Stripe-based probing can monitor Diff. Serv networks only from the edges – It takes 10 sec to converge the inferred loss ratio to actual loss ratio with ≥ 90% accuracy – 10 -15 delay probes and 20 -25 loss probes per second are sufficient for monitoring – Probe is a 3 -packet stripe • 3 shows good correlation, 4 does not add much 45

Observations (Cont’d) • Overlay-based Monitoring – Congestion status of individual links can be inferred from edge-to-edge measurements – When the network is ≤ 20% congested • Status of a link is identified with probability ≥ 0. 98 • Requires O(n) probes, where n is the number of edge routers – Worst case is O(n 2), whereas stripe-based requires O(n 3) probes to achieve same functionality 46

Observations (Cont’d) • Analyze existing techniques to defeat Do. S attacks – Marking has less overhead than Filtering, however, it is only a forensic method – Monitoring might have less processing overhead than marking or filtering, however, monitoring injects packets and others do not – Monitoring can alert against Do. S attacks in early stage 47

Observations (Cont’d) • Traffic Conditioner – Using small state table, we can design scalable traffic conditioner – It can protect critical packets of a flow to improve application Qo. S (delay, throughput, response time, …) – Both Round trip time (RTT) & Retransmission time-out (RTO) are necessary to avoid RTTbias among flows 48

Observations (Cont’d) • Flow Control – Network tomography is used to design edgeto-edge mechanism to detect & control unresponsive flows – Qo. S of adaptive flows improves significantly with flow control mechanism 49

Conclusion on Monitoring • Elegant way to use probability in inferring loss. 3 packets stripe shows good correlation • Monitoring network can detect service violation and bandwidth theft using measurements • Monitoring can detect Do. S attacks in early stage. Filter can be used to stop the attacks • Overlay-based monitoring requires only O(n) probing with a very high probability, where n is the number of edge routers • Overlay-based monitoring has very low communication and processing overhead • Stripe-based inference is useful to annotate a topology tree with loss, delay, and bandwidth. 50

Intruder Identification in Ad Hoc Networks (Cisco, Motorola, AFRL) • Problem Statement Intruder identification in ad hoc networks is the procedure of identifying the user or host that conducts the inappropriate, incorrect, or anomalous activities that threaten the connectivity or reliability of the networks and the authenticity of the data traffic in the networks 51

Research problem Related work Purdue Research Identification of Intruder SAODV protocol Robust Reverse Labeling Tree scheme, Trust to minimize suspicion of all, Experiments Identification of Packet Drops Arizona-React system Deals with multiple attackers coordinating, uses audit, hash functions on path Identification of Collaboration CSCW Intrusion Detection Systems (IDS) capable of correlating CAs, Fuzzy systems, Learning, Experiments

Introduction to AODV • Introduced in 97 by Perkins at NOKIA, Royer at UCSB • 12 versions of IETF draft in 4 years, 4 academic implementations, 2 simulations • Combines on-demand distance vector • Broadcast Route Query, Unicast Route Reply • Quick adaptation to dynamic link condition and scalability to large scale network • Support multicast 53

Route Discovery in AODV (An Example) D S 1 S 3 S 2 S 4 S Route to the source Route to the destination 54

Attacks on AODV • Route request flooding – query non-existing host (RREQ will flood throughout the network) • False distance vector – reply “one hop to destination” to every request and select a large enough sequence number • False destination sequence number – select a large number (even beat the reply from the real destination) • Wormhole attacks – tunnel route request through wormhole and attract the data traffic to the wormhole • Coordinated attacks – The malicious hosts establish trust to frame other hosts, or conduct attacks alternatively to avoid being identified 55

False Destination Sequence Attack Sequence number 5 S 3 RREQ(D, 3) S RREP(D, 4) RREQ(D, 3) S 4 D RREQ(D, 3) S 1 RREQ(D, 3) RREP(D, 20) S 2 M Packets from S to D are sinking at M. 56

During Route Rediscovery, False Destination Sequence Number Attack Is Detected, S needs to find D again. Node movement breaks the path from S to M (trigger route rediscovery). (1). S broadcasts a request that carries the old sequence + 1 = 21 D S 3 RREQ(D, 21) S S 1 S 2 S 4 (2) D receives the RREQ. Local sequence is 5, but the sequence in RREQ is 21. D detects the false destination sequence number attack. M Propagation of RREQ 57

Reverse Labeling Restriction (RLR) Blacklists are updated after an attack is detected. • Basic Ideas • Every host maintains a blacklist to record suspicious hosts who gave wrong route related information. • The destination host will broadcast an INVALID packet with its signature. The packet carries the host’s identification, current sequence, new sequence, and its own blacklist. • Every host receiving this packet will examine its route entry to the destination host. The previous host that provides the false route will be added into this host’s blacklist. 58

BL {} S 3 D BL {} INVALID ( D, 5, 21, BL{}, Signature ) S 4 S S 1 BL {S 2} BL {S 1} S 2 M BL {} BL {M} S 4 BL {} Correct destination sequence number is broadcasted. Blacklist at each host in the path is determined. 59

D 1 S 4 [M ] D 3 [M ] S 1 D 2 M [M ] S 3 D 4 [M ] S 2 M attacks 4 routes (S 1 -D 1, S 2 -D 2, S 3 -D 3, and S 4 -D 4). When the first two false routes are detected, D 3 and D 4 add M into their blacklists. When later D 3 and D 4 become victim destinations, they will broadcast their blacklists, and every host will get two votes that M is malicious host. Malicious site is in blacklists of multiple destination hosts. 60

Intruder Identified • If M is in multiple blacklists, M is classified as a malicious host based on a certain threshold. • Intruder is approximately identified. • Trust values can be used for combining knowledge from other hosts. 61

Experimental Studies of RLR • The experiments are conducted using ns 2. • Various network scenarios are formed by varying the number of independent attackers, number of connections, and host mobility. • The examined parameters include: – Packet delivery ratio – Identification accuracy: false positive and false negative ratio – Communication and computation overhead 62

Simulation Parameter Simulation duration 1000 seconds Simulation area 1000 * 1000 m Number of mobile hosts Transmission range Pause time between the host reaches current target and moves to next target 30 250 m 0 – 60 seconds Maximum speed 5 m/s Number of CBR connection 25/50 Packet rate 2 pkt / sec 63

Experiment 1: Measure the Changes in Packet Delivery Ratio Purpose: investigate the impacts of host mobility, number of attackers, and number of connections on the performance improvement brought by RLR Input parameters: host pause time, number of independent attackers, number of connections Output parameters: packet delivery ratio Observation: When only one attacker exists in the network, RLR brings a 30% increase in the packet delivery ratio. When multiple attacker exist in the system, the delivery ratio will not recover before all attackers are identified. 64

Increase in Packet Delivery Ratio: Single Attacker X-axis is host pause time, which evaluates the mobility of host. Y-axis is delivery ratio. 25 connections and 50 connections are considered. RLR brings a 30% increase in delivery ratio. 100% delivery is difficult to achieve due to network partition, route discovery delay and buffer. 65

Experiment 2: Measure the Accuracy of Intruder Identification Purpose: investigate the impacts of host mobility, number of attackers , and connection scenarios on the detection accuracy of RLR Input parameters: number of independent attackers, number of connections, host pause time Output parameters: false positive alarm ratio, false negative alarm ratio Observation: The increase in connections may improve the detection accuracy of RLR. When multiple attackers exist in the network, RLR has a high false positive ratio. 66

Accuracy of RLR: Single Attacker 30 hosts, 25 connections Host Pause time (sec) # of normal hosts identify the attacker # of normal hosts marked as malicious 30 hosts, 50 connections # of normal hosts identify the attacker # of normal hosts marked as malicious 0 24 0. 22 29 2. 2 10 25 0 29 1. 4 20 24 0 25 1. 1 30 28 0 29 1. 1 40 24 0 29 0. 6 50 24 0. 07 29 1. 1 60 24 0. 07 24 1. 0 The accuracy of RLR when there is only one attacker in the system 67

Experiment 3: Measure the Communication Overhead Purpose: investigate the impacts of host mobility and connection scenarios on the overhead of RLR Input parameters: number of connections, host pause time Output parameters: control packet overhead Observation: When no false destination sequence attacks exist in the network, RLR introduces small packet overhead into the system. 68

Control Packet Overhead X-axis is host pause time, which evaluates the mobility of host. Y-axis is normalized overhead (# of control packet / # of delivered data packet). 25 connections and 50 connections are considered. RLR increases the overhead slightly. 69

Research Opportunities: Improve Robustness of RLR • Protect the good hosts from being framed by malicious hosts • The malicious hosts can frame the good hosts by putting them into blacklist. • By lowering the trust values of both complainer and complainee, we can restrict the impacts of the gossip distributed by the attackers. 70

• Avoid putting every host into blacklist • Combining the host density and movement model, we can estimate the time ratio that two hosts are neighbors • The counter for a suspicious host decreases as time passes • Adjusting the decreasing ratio to control the average percentage of time that a host stays in the blacklist of another host 71

• Defend against coordinated attacks • The behaviors of collusive attackers show Byzantine manners. The malicious hosts may establish trust to frame other hosts, or conduct attacks alternatively to avoid being identified. • Look for the effective methods to defend against such attacks. Possible research directions include: • Apply classification methods to detect the hosts that have similar behavior patterns • Study the behavior histories of the hosts that belong to the same group and detect the pattern of malicious behavior (time-based, order-based) 72

Conclusions on Intruder Identification • False destination sequence attacks can be detected by the anomaly patterns of the sequence numbers • Reverse labeling method can reconstruct the false routing tree • Isolating the attackers brings a sharp increase in network performance • On going research will improve the robustness of the mechanism and the accuracy of identification 73

Defending against Collaborative Packet Drop Attacks on MANETs Work done with Dr. Weichao Wang ( former student now professor at UNCC) and Dr. Mark Linderman at AFRL 74

Organization of Presentation • • • Problem Statement Related Work REAct System and Its Vulnerability Our Approach Analysis Conclusion

Problem Statement Packet drop attacks put severe threats to Ad Hoc network performance and safety • Directly impact the parameters such as packet delivery ratio • Will impact security mechanisms such as distributed node behavior monitoring • Different approaches have been proposed • Vulnerable to collaborative attacks • Have strong assumptions of the nodes

Problem Statement Many research efforts focus on individual attackers • The effectiveness of detection methods will be weakened under collaborative attacks • E. g. , in “watchdog”, multiple malicious nodes can provide fake evidences to support each other’s innocence • In wormhole and Sybil attacks, malicious nodes may share keys to hide their real identities

Problem Statement Focus on collaborative packet drop attacks. Why? • Secure and robust data delivery is a top priority for many applications • The proposed approach can be achieved as a reactive method: reduce overhead during normal operations • Can be applied in parallel to secure routing

Related Work Detecting packet drop attacks • Audit based approaches • Whether or not the next hop forward the packets • Use both first hand second hand evidences • Problems: • Energy consumption of eavesdropping • Can be cheated by directional antenna • Authenticity of the evidence • Incentive based approaches • Nuggets and credits • Multi-hop acknowledgement

Related Work Collaborative attacks and detection • • Classification of the collaborative attacks Collusion attack model on secure routing protocols Collaborative attacks on key management in MANET Detection mechanisms: • Collaborative IDS systems • Ideas from immune systems • Byzantine behavior based detection

REAct system and vulnerability REAct system: • • Proposed by researchers in Univ of Arizona Published in ACM Wi. Sec 2009 Random audit based detector of packet drop A reactive approach: will be activated only when something bad happens • Assumptions: • At least two node disjoint paths b/w any pair of nodes • Know the identity of the intermediate nodes • Pair-wise keys b/w the source and the intermediate nodes

REAct system and vulnerability Working procedure of REAct • Destination detects the drop in packet arriving rate and notifies the source • Source randomly selects an intermediate node and asks it to generate a behavioral proof of the received packets • Intermediate node constructs a bloom filter using these packets • Source compares the bloom filter to its own value • If match: the attacker is after the intermediate node • Otherwise, it is before the intermediate node • Repeat the procedure until the bad link is located

REAct system and vulnerability Example of REAct: the source selects n 4 to be the first audited node. N 4 generates the correct bloom filter, so the attacker is between n 4 and D.

Collaborative attacks on REAct n 1 and n 4 are collusive attackers. n 1 discards the packets but delivers the bloom filter to n 4. Now the source will think that the attacker is between n 4 and D. Why REAct is vulnerable to this attack: the source can verify the bloom filter, but not the generator of the filter.

Proposed approach Assumptions: • Source shares a different secret key and a different random number with every intermediate node • All nodes in the network agree on a hash function h() • There are multiple attackers in the network • They share their secret keys and random numbers • Attackers have their own communication channel • An attacker can impersonate other attackers

Proposed approach Hash based approach: • Every node will add a fingerprint into the packet S 1 sends out the packet to n 1: S n 1: (S, D, data packet, random number t 0) Node n 1 will combine the received packet and its random number r 1 to calculate the new fingerprint: t 1 = h( r 1 || S || D || data packet || t 0 || r 1 ) n 1 n 2: (S, D, data packet, t 1 ) The audited node will generate the bloom filter based on the data packets and the fingerprints The source will generate its own bloom filter and compare it to the value of the audited node

Proposed approach Why our approach is safe • The node behavioral proofs in our proposed approach contain information from both the data packets and the intermediate nodes. • Theorem 1. If node ni correctly generates the value ti, then all innocent nodes in the path before ni (including ni) must have correctly received the data packet selected by S.

Proposed approach Why our approach is safe • The ordered hash calculations guarantee that any update, insertion, and deletion operations to the sequence of forwarding nodes will be detected. • Therefore, we have: • if the behavioral proof passes the test of S, the suspicious set will be reduced to {ni, ni+1, ---, D} • if the behavioral proof fails the test of S, the suspicious set will be reduced to {S, n 1, ---, ni}

Discussion • Indistinguishable audit packets • The malicious node should not tell the difference between the data packets and audited packets • The source will attach a random number to every data packet • Reducing computation overhead • A hash function needs 20 machine cycles to process one byte • We can choose a part of the bytes in the packet to generate the fingerprint. In this way, we can balance the overhead and the detection capability.

Discussion • Security of the proposed approach • The hash function is easy to compute: very hard to conduct Do. S attacks on our approach • It is hard for attackers to generate fake fingerprint: they have to have a non-negligible advantage in breaking the hash function • The attackers will adjust their behavior to avoid detection • The source may choose multiple nodes to be audited at the same time • The source should adopt a random pattern to determine the audited nodes

Conclusion • Previous approach is vulnerable to collaborative attacks • Propose a new mechanism for nodes to generate behavioral proofs • Hash based packet commitment • Contain both contents of the packets and information of the forwarding paths • Introduce limited computation and communication overhead • Extensions: • Investigate other collaborative attacks • Integrate our detection method with secure routing protocols

Another Example of Internal/External Attacks To combat joint attacks Example: Node A deceives node S informing it has shortest path to D A forwards any packets to X X sets up a tunnel to Y Any further packet will go through the tunnel In tunnel, packets can be selectively dropped or tampered with Collaborative wormhole attacks: internal and external attackers 92

Impacts of collaborative wormhole/packet discard attack on underwater sensor networks

Ideas on characterizing/classifying CAs • Identify the key features of combined attacks Use signal processing technique and machine learning technique to characterize/classify attacks • – – Wavelet transform for anomaly detection Fuzzy logic for decision making process 94

Context based Adaptable Defense against Collaborative Attacks in Service Oriented Architecture and Cloud Northrop Grumman Cybersecurity Research Consortium

Motivations/Objectives • Unmanned Aircraft Systems (UAS) can revolutionize military’s ability to monitor and understand the global environment. The communication is under constant attacks (both internal and external). Mobile environments have disconnections and can not be sure of global information • So. A and Cloud Environment are being used in Battle field tactical and emergency response networks. Require end to end security and privacy • Need to deal with collaborative, multiple, concurrent attacks of various types on various targets • Conduct experiments with real attack scenarios • Develop Cyber Genome ideas for Advanced Persistent Threats

Scope of problem Step 1 Step 2 Step 3 Static scan for attack graph generation Distributed monitoring Build tools for inferring, tracing back, and dealing with new attacks Identify linkage among malicious attacks Information integration Collaborative attack detection engine Propagate warnings, preempt to thwart attack Build prototype and evaluation

Identify intruders, Collaboration, origin & type of attack, potential for future attacks Identification Tomography, Router monitoring, SLA agreements Collaboration Agreements, Intent, Targets, Communication Observations Neural nets, Learning, Fuzzy logic Predication Extrapolation, Evaluation, Causal analysis

Acceleration in Intruder Identification D 3 D 2 D 1 M 2 M 3 M 1 S 2 S 3 Coordinated attacks by M 1, M 2, and M 3 Multiple attackers trigger more blacklists to be broadcasted by D 1, D 2, D 3. 99

Ideas on characterizing/classifying CAs • Identify the key features of Collaborative attacks Use signal processing technique and machine learning technique to characterize/classify attacks • – – Wavelet transform for anomaly detection Fuzzy logic for decision making process 100

Detection and Response at Coordinator Node Incoming packets are inspected and the classification parameters are handed to the fuzzy engine Parameters are mapped to the membership functions Black list managment is crucial to the efficiency of the system 101

Packet Delivery Ratio (PDR) and Throughput Impact of reaction time on performance is high, implying that coordinator node must get fast feedback from other nodes and react properly and quickly UDP achieves higher PDR, but drops more packets, resulting in lower throughput 102

Hardest Challenges • Understanding of the impact of multiple attacks when they run concurrently • Identify collaboration activity in a realistic attack scenario in Do. D and Emergency • Formalize the model for collaboration and will relate the computer supported cooperative work (CSCW) paradigm to this problem.

Future Plans • • Advanced Persistent Threats ( with NGC and Mitre) Cyber Genetics ( with Do. D) Prototype and Experiments Privacy in Cloud ( With Dr. Myong Kang, Naval Research Lab) • End to End security in So. A (with Asher Sinclair, AFRL) • Context Modeling (with Dr. Mark Linderman, AFRL) • Cross-domain Information Exchange (with Mike Mayhew and Yat Fu AFRL)

Trust-based Privacy Preservation for Peer-to-peer Data Sharing ( AFRL) Problem statement • Privacy in peer-to-peer systems is different from the anonymity problem • Preserve privacy of requester • A mechanism is needed to remove the association between the identity of the requester and the data needed 105

Proposed solution • A mechanism is proposed that allows the peers to acquire data through trusted proxies to preserve privacy of requester – The data request is handled through the peer’s proxies – The proxy can become a supplier later and mask the original requester 106

Related work • Trust in privacy preservation – Authorization based on evidence and trust, [Bhargava and Zhong, Da. Wa. K’ 02] – Developing pervasive trust [Lilien, CGW’ 03] • Hiding the subject in a crowd – K-anonymity [Sweeney, UFKS’ 02] – Broadcast and multicast [Scarlata et al, INCP’ 01] 107

Related work (2) • Fixed servers and proxies – Publius [Waldman et al, USENIX’ 00] • Building a multi-hop path to hide the real source and destination – Free. Net [Clarke et al, IC’ 02] – Crowds [Reiter and Rubin, ACM TISS’ 98] – Onion routing [Goldschlag et al, ACM Commu. ’ 99] 108

Related work (3) • [Sherwood et al, IEEE SSP’ 02] – provides sender-receiver anonymity by transmitting packets to a broadcast group • Herbivore [Goel et al, Cornell Univ Tech Report’ 03] – Provides provable anonymity in peer-to-peer communication systems by adopting dining cryptographer networks 109

Privacy measurement • A tuple <requester ID, data handle, data content> is defined to describe a data acquirement. • For each element, “ 0” means that the peer knows nothing, while “ 1” means that it knows everything. • A state in which the requester’s privacy is compromised can be represented as a vector <1, 1, y>, (y Є [0, 1]) from which one can link the ID of the requester to the data that it is interested in. 110

Privacy measurement (2) For example, line k represents the states that the requester’s privacy is compromised. 111

Mitigating collusion • An operation “*” is defined as: • This operation describes the revealed information after a collusion of two peers when each peer knows a part of the “secret”. • The number of collusions required to compromise the secret can be used to evaluate the achieved privacy 112

Trust based privacy preservation scheme • The requester asks one proxy to look up the data on its behalf. Once the supplier is located, the proxy will get the data and deliver it to the requester – Advantage: other peers, including the supplier, do not know the real requester – Disadvantage: The privacy solely depends on the trustworthiness and reliability of the proxy 113

Trust based scheme – Improvement 1 • To avoid specifying the data handle in plain text, the requester calculates the hash code and only reveals a part of it to the proxy. • The proxy sends it to possible suppliers. • Receiving the partial hash code, the supplier compares it to the hash codes of the data handles that it holds. Depending on the revealed part, multiple matches may be found. • The suppliers then construct a bloom filter based on the remaining parts of the matched hash codes and send it back. They also send back their public key certificates. 114

Trust based scheme – Improvement 1 • Examining the filters, the requester can eliminate some candidate suppliers and finds some who may have the data. • It then encrypts the full data handle and a data transfer key with the public key. • The supplier sends the data back using through the proxy • Advantages: – It is difficult to infer the data handle through the partial hash code – The proxy alone cannot compromise the privacy – Through adjusting the revealed hash code, the allowable error of the bloom filter can be determined 115

Data transfer procedure after improvement 1 Requester Proxy of Requester Supplier R: requester S: supplier Step 1, 2: R sends out the partial hash code of the data handle Step 3, 4: S sends the bloom filter of the handles and the public key certificates Step 5, 6: R sends the data handle and encrypted by the public key Step 7, 8: S sends the required data encrypted by 116

Trust based scheme – Improvement 2 • The above scheme does not protect the privacy of the supplier • To address this problem, the supplier can respond to a request via its own proxy 117

Trust based scheme – Improvement 2 Requester Proxy of Supplier 118

Trustworthiness of peers • The trust value of a proxy is assessed based on its behaviors and other peers’ recommendations • Using Kalman filtering, the trust model can be built as a multivariate, time-varying state vector 119

Conclusion • A trust based privacy preservation method for peer-to-peer data sharing is proposed • It adopts the proxy scheme during the data acquirement • Extensions – Solid analysis and experiments on large scale networks are required – A security analysis of the proposed mechanism is required 120

Related Ongoing Research A. Detecting wormhole attacks ( Prof Mario Gerla, UCLA) B. Position-based private routing in ad hoc networks (Motorola) C. Private routing in ad hoc networks D. Privacy Preserving Data Dissemination in Cross-Domains ( AFRL) E. Congestion aware distance vector (CADV) protocol for ad hoc networks 121