Prophiler: A fast filter for the large-scale detection of malicious web pages
Reporter: 鄭志欣
Advisor: Hsing-Kuo Pao
Date: 2011/03/31

Conference
• Davide Canali, Marco Cova, Giovanni Vigna and Christopher Kruegel, "Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages", 20th International World Wide Web Conference (WWW 2011)

Outline
• Introduction
• Approach
• Implementation and Setup
• Evaluation
• Conclusion

Introduction
• Malicious web pages
  – Drive-by download: JavaScript
  – Compromising hosts
  – Large-scale botnets
• Static analysis vs. dynamic analysis
  – Dynamic analysis takes a lot of time.
  – Static analysis reduces the resources required for performing large-scale analysis.
  – URL blacklists (Google Safe Browsing)
  – HoneyClient: Wepawet, PhoneyC, JSUnpack
  – Combined?
• Quickly discard benign pages and forward only the remaining pages to the costly analysis tools (Wepawet).

Prophiler
• Prophiler uses static analysis techniques to quickly examine a web page for malicious content.
• HTML, JavaScript, URL information
• Model: built using machine-learning techniques
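The filtering idea on this slide can be sketched as follows. This is a minimal illustration and not Prophiler's implementation: `triage` and `toy_static_check` are hypothetical names, and the toy check (looking for an iframe tag) stands in for the machine-learning models, which the slides do not spell out.

```python
from typing import Callable, Iterable, List, Tuple


def triage(
    pages: Iterable[str],
    looks_suspicious: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split pages into (forwarded, discarded).

    Pages the cheap static check considers suspicious are forwarded to the
    costly dynamic analysis; the rest are discarded as likely benign.
    """
    forwarded: List[str] = []
    discarded: List[str] = []
    for page in pages:
        if looks_suspicious(page):
            forwarded.append(page)
        else:
            discarded.append(page)
    return forwarded, discarded


def toy_static_check(page: str) -> bool:
    """Hypothetical stand-in for a trained model: flag pages with an iframe."""
    return "<iframe" in page.lower()


if __name__ == "__main__":
    sample = ["<p>hello</p>", "<IFRAME src='http://example.invalid'></IFRAME>"]
    forwarded, discarded = triage(sample, toy_static_check)
    print(len(forwarded), "forwarded,", len(discarded), "discarded")
```

Only the forwarded list would reach the dynamic analyzer, which is where the saving in analysis time comes from.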

Approach
• Features
  – Neko HTML Parser
  – HTML, JavaScript, URL information
  – Total features: 77
  – New features: 17
• Models

Features

Reference Paper
• [26] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages with Static Heuristics. In Proceedings of the Australasian Telecommunication Networks and Applications Conference (ATNAC), 2008.
• [16] P. Likarish, E. Jung, and I. Jo. Obfuscated Malicious Javascript Detection using Classification Techniques. In Proceedings of the Conference on Malicious and Unwanted Software (Malware), 2009.
• [6] B. Feinstein and D. Peck. Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript. In Proceedings of the Black Hat Security Conference, 2007.
• [17] J. Ma, L. Saul, S. Savage, and G. Voelker. Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2009.
• [25] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages Through Analysis of Underlying DNS and Web Server Relationships. In Proceedings of the LCN Workshop on Network Security (WNS), 2008.

Effectiveness of new features
• HTML (7)
  – number of elements containing suspicious content
  – number of iframes
  – number of elements with a small area
  – the whitespace percentage of the web page
  – the page length in characters
  – the presence of meta refresh tags
  – the percentage of scripts in the page
• JavaScript (4)
  – shellcode presence probability (J48)
  – the presence of decoding routines
  – the maximum string length
  – the entropy of the scripts
• URL and Host (5)
  – TLD of the URL
  – the absence of a subdomain in the URL
  – the TTL of the host's DNS A record
  – the presence of a suspicious domain name or file name
  – the presence of a port number in the URL
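A few of these features are simple enough to sketch with the Python standard library. The code below is an illustrative assumption, not the authors' extractor (the slides name the Neko HTML Parser for that): the function names are hypothetical, entropy is computed over characters, and the TLD is taken naively as the last dot-separated label of the host.

```python
import math
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def whitespace_percentage(page: str) -> float:
    """Percentage of characters in the page that are whitespace."""
    if not page:
        return 0.0
    return 100.0 * sum(ch.isspace() for ch in page) / len(page)


class _TagCounter(HTMLParser):
    """Counts iframes and records whether a meta refresh tag is present."""

    def __init__(self) -> None:
        super().__init__()
        self.iframes = 0
        self.meta_refresh = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "iframe":
            self.iframes += 1
        elif tag == "meta":
            attr_map = {name: (value or "") for name, value in attrs}
            if attr_map.get("http-equiv", "").lower() == "refresh":
                self.meta_refresh = True


def html_features(page: str) -> dict:
    parser = _TagCounter()
    parser.feed(page)
    parser.close()
    return {
        "iframes": parser.iframes,
        "meta_refresh": parser.meta_refresh,
        "page_length": len(page),
        "whitespace_pct": whitespace_percentage(page),
    }


def url_features(url: str) -> dict:
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "has_port": parts.port is not None,
        # Naive TLD: last label of the host name (assumption, no suffix list).
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
    }
```

Features such as the DNS TTL or the shellcode presence probability need network lookups or a trained classifier and are left out of this sketch.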

Discussion
• Assumptions
  – First, the distribution of feature values for malicious examples is different from that of benign examples.
  – Second, the datasets used for model training share the same feature distribution as the real-world data that is evaluated using the models.
• Trade-offs
  – False negatives vs. false positives
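The false negative vs. false positive trade-off can be illustrated with a score threshold. This is a sketch under assumptions: it supposes the filter outputs a maliciousness score per page and flags pages at or above a threshold, which the slides do not state; `rates_at_threshold` and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float],
    is_malicious: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at a threshold.

    A page is flagged when its score is >= threshold. The false positive
    rate is taken over benign pages, the false negative rate over
    malicious pages.
    """
    pairs = list(zip(scores, is_malicious))
    benign = sum(1 for _, bad in pairs if not bad)
    malicious = sum(1 for _, bad in pairs if bad)
    false_pos = sum(1 for s, bad in pairs if s >= threshold and not bad)
    false_neg = sum(1 for s, bad in pairs if s < threshold and bad)
    fpr = false_pos / benign if benign else 0.0
    fnr = false_neg / malicious if malicious else 0.0
    return fpr, fnr


if __name__ == "__main__":
    # Made-up scores: four benign pages followed by three malicious ones.
    scores = [0.1, 0.2, 0.4, 0.6, 0.3, 0.7, 0.9]
    labels = [False, False, False, False, True, True, True]
    for threshold in (0.5, 0.25):
        fpr, fnr = rates_at_threshold(scores, labels, threshold)
        print(f"threshold={threshold}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Lowering the threshold in this toy data removes the missed malicious page but flags one more benign page. For a front-end filter that direction is usually preferred, since a false positive only costs extra dynamic analysis while a false negative is a missed detection.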

Implementation and Setup (cont.)
• Prophiler is used as a filter for our existing dynamic analysis tool, called Wepawet.
• URL collection: Heritrix (crawling tool), spam email
• Terms from Twitter, Google, and Wikipedia trends
• Collecting URLs: 2,000 URLs/day


Implementation and Setup
• The crawler fetches pages and submits them as input to Prophiler.
• Server:
  – Ubuntu Linux x64 v9.10
  – 8-core Intel Xeon processor and 8 GB of RAM
• The system in this configuration is able to analyze on average 320,000 pages/day.
• Analysis must examine around 2 million URLs each day.

Evaluation
• Total web pages: 20 million

Evaluation (cont.)
• Training set:
  – 787 pages from Wepawet's database
  – 51,171 pages from the top 100 Alexa websites
  – Google Safe Browsing API, anti-virus, experts
  – 10-fold cross-validation


Evaluation (cont.)
• Validation
  – 153,115 pages
  – Submitting them to Wepawet took 15 days
  – Benign: 139,321 pages
  – Malicious: 13,794 pages
  – False positives: 10.4%
  – False negatives: 0.54%
  – Saving valuable resources


Evaluation (cont.)
• Large-scale evaluation
  – 18,939,908 pages
  – Run for 60 days
  – 14.3% flagged as malicious
  – 85.7% reduction of load on the back-end analyzer
  – 1,968 malicious pages/day (by Wepawet)
  – False positive rate: 13.7%
  – False negative rate: 1%
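The load reduction follows directly from the flagged share: if 14.3% of pages are forwarded, the remaining 85.7% never reach the back-end. The short calculation below works this out from the figures on the slide; `load_summary` is a hypothetical helper, and the per-day values are plain averages over the 60-day run.

```python
def load_summary(total_pages: int, flagged_fraction: float, days: int) -> dict:
    """Derive back-end load figures from the share of pages the filter flags."""
    forwarded = total_pages * flagged_fraction   # pages sent to the back-end
    discarded = total_pages - forwarded          # pages filtered out
    return {
        "forwarded": forwarded,
        "discarded": discarded,
        "load_reduction": discarded / total_pages,
        "pages_per_day": total_pages / days,
        "forwarded_per_day": forwarded / days,
    }


if __name__ == "__main__":
    # Figures from the large-scale evaluation slide.
    summary = load_summary(total_pages=18_939_908, flagged_fraction=0.143, days=60)
    print(f"forwarded to back-end: {summary['forwarded']:,.0f} pages")
    print(f"load reduction:        {summary['load_reduction']:.1%}")
    print(f"analyzed per day:      {summary['pages_per_day']:,.0f} pages")
    print(f"forwarded per day:     {summary['forwarded_per_day']:,.0f} pages")
```

With these inputs, roughly 2.7 million of the 18.9 million pages are forwarded, or about 45,000 pages per day on average.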

1,968 pages identified as malicious by Wepawet every day

Evaluation (cont.)
• Comparison
  – 15,000 web pages
  – Malicious: 5,861 pages
  – Benign: 9,139 pages

Conclusion
• We developed Prophiler, a system whose aim is to provide a filter that can reduce the number of web pages that need to be analyzed dynamically to identify malicious web pages.
• We deployed our system as a front-end for Wepawet, with a very small false negative rate.