01f89b26cb1b09d80656f96e6e8bf3c1.ppt
- Number of slides: 26
Automatic Ad Blocking
Justin Crites and Mathias Ricken
November 24, 2004
Comp 527 – Computer Systems Security
Rice University
Web Advertisement – The Facts
- Annoying
  - Animation
  - Sound
- Potentially dangerous
  - May contain malware
  - Ad download reveals user’s IP
- Deteriorate user’s web experience
Web Advertisement – The Future
- More ads
  - Q3 2003: $1.79 billion online ad spending
  - Q3 2004: $2.43 billion (+36%)
  - Source: Interactive Advertising Bureau
- Bigger, richer ads
  - More half-page ads, fewer banners
  - 33% rich media (Flash, popups)
    - Expected to surpass image ads in 2005
  - Source: DoubleClick
Project Goals
- Make ad blocking easy to use
  - Particularly for novice users
- “Two click” blocking
- Automatic online updates
- Blocking of entire HTML page sections
Project Platform
- Mozilla Firefox
  - Open-source browser
  - Extensible through JavaScript
  - Source code for many plugins available as examples
- AdBlock
  - Open-source plugin
  - Filters certain HTML elements based on regular expressions
AdBlock in Detail
- Filters out HTML elements
  - <IMG>
  - <IFRAME>
  - <EMBED>
  - <OBJECT>
- Selection controlled by a blacklist of wildcarded URLs
  - Example: http://*.somedomain.com/pagead/ads?*
  - * denotes zero or more arbitrary characters
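The wildcard blacklist described on this slide can be sketched in a few lines. This is a minimal illustration, not AdBlock's actual code: the function names `wildcard_to_regex` and `is_blocked` are hypothetical, and matching anywhere in the URL (rather than anchoring at both ends) is an assumption the slides do not confirm.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcarded URL into a compiled regular expression.

    Every '*' becomes '.*' (zero or more arbitrary characters); all other
    characters are escaped so that '.', '?' and similar match literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def is_blocked(url: str, blacklist: list[str]) -> bool:
    """Return True if any blacklist pattern matches somewhere in the URL."""
    # Assumption: unanchored matching (search). The slides do not say
    # whether a pattern has to cover the whole URL.
    return any(wildcard_to_regex(p).search(url) for p in blacklist)


blacklist = ["http://*.somedomain.com/pagead/ads?*"]
print(is_blocked("http://www.somedomain.com/pagead/ads?client=1", blacklist))
print(is_blocked("http://www.somedomain.com/index.html", blacklist))
```

Escaping the literal parts matters here: without it, the `?` and `.` in the example pattern would be read as regular-expression operators instead of plain characters.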
AdBlock Problems
- Wildcards entered manually
  - Concepts too difficult for non-technical users
  - No simple way to share filters
- AdBlock only blocks very few HTML elements
  - Non-recursive
Developing for Mozilla Firefox
- Register plugin as “chrome provider” for
  - Skin
  - Platform
  - Content
  - Localization
- AdBlock is a content provider
- Plugins consist of
  - XUL – XML user interface definition
  - JavaScript – event scripting
Problems for Developers
- JavaScript issues
  - No strict typing
  - No variable declarations necessary
- Lack of debugging support
  - Uninformative errors
  - No single stepping/breakpoints
- Code of plugins split between several files
  - Confusing control flow, if badly written
“Two Click” Blocking
“Two Click” Blocking
- User blocks items with minimum interaction
  - Right-click: context menu
  - Left-click: Autoblock
- AdBlock now has three lists
  - User filters (like before)
  - Autoblock URLs
  - Generated filters
Wildcarded URL Generation
- Intelligently build wildcarded URL
- Go from
  http://ad.domainname.com/ads/someimage.gif
  http://xxx.domainname.com/ads/anotherpic.jpg
  …
  to
  http://*.domainname.com/ads/*
- Keep matching parts, replace differing parts with *
- Longest Common Subsequence (LCS)
Longest Common Subsequence
- Dynamic programming, O(n²) in time and space
- Nested for-loops
  - If a[i] = b[j], M[i, j] = M[i-1, j-1] + 1
  - else pick maximum from left or above
[Table: matrix M for a = ABCBDAB (rows, i = 1..7) and b = BDCABA (columns, j = 1..6), partially filled in]
Longest Common Subsequence
- Dynamic programming, O(n²) in time and space
- Nested for-loops
  - If a[i] = b[j], M[i, j] = M[i-1, j-1] + 1
  - else pick maximum from left or above
- Example: BDAB
- Insert * after diagonal stretches
  - BD*AB*

    j    0  1  2  3  4  5  6
i           B  D  C  A  B  A
0        0  0  0  0  0  0  0
1   A    0  0  0  0  1  1  1
2   B    0  1  1  1  1  2  2
3   C    0  1  1  2  2  2  2
4   B    0  1  1  2  2  3  3
5   D    0  1  2  2  2  3  3
6   A    0  1  2  2  3  3  4
7   B    0  1  2  2  3  4  4
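The recurrence on this slide can be written out as a short sketch. The names `lcs_table` and `lcs` are illustrative, not taken from the project. The backtracking step resolves ties in one arbitrary way, so it may return a different LCS of the same length than the BDAB shown on the slide.

```python
def lcs_table(a: str, b: str) -> list[list[int]]:
    """Fill the (len(a)+1) x (len(b)+1) table M of LCS lengths."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Characters match: extend the diagonal neighbour by one.
                m[i][j] = m[i - 1][j - 1] + 1
            else:
                # No match: take the maximum of the left and upper cells.
                m[i][j] = max(m[i][j - 1], m[i - 1][j])
    return m


def lcs(a: str, b: str) -> str:
    """Recover one longest common subsequence by walking the table backwards."""
    m = lcs_table(a, b)
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif m[i - 1][j] >= m[i][j - 1]:
            i -= 1  # tie-break: prefer moving up
        else:
            j -= 1
    return "".join(reversed(out))


table = lcs_table("ABCBDAB", "BDCABA")
print(table[7])                        # last row of the table
print(len(lcs("ABCBDAB", "BDCABA")))   # length of the LCS
```

Both time and space are proportional to len(a) × len(b), which is the O(n²) cost named on the slide.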
LCS to Wildcarded URL
- LCS often includes very short fragments
  http://ad.domainname.com/ads/someimage.gif
  http://xxx.domainname.com/ads/anotherpic.jpg
  generates
  http://*.domainname.com/ads/*o*e*i*.*
- Only accept matching fragments with length > 2
- Cannot merge all URLs together
  - Result would be http://* or similar
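A possible way to turn an LCS alignment into a wildcarded URL while discarding short fragments is sketched below. It is an illustration under assumptions, not the plugin's code: the names `matched_pairs`, `wildcard_url`, `matches` and `MIN_FRAGMENT` are hypothetical, and a "fragment" is taken to mean a diagonal stretch, that is, a run of characters matched consecutively in both URLs.

```python
import re

MIN_FRAGMENT = 3  # the slide accepts only fragments with length > 2


def matched_pairs(a: str, b: str) -> list[tuple[int, int]]:
    """Return index pairs (i, j) of one LCS of a and b, in increasing order."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                m[i][j] = m[i - 1][j - 1] + 1
            else:
                m[i][j] = max(m[i][j - 1], m[i - 1][j])
    pairs, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif m[i - 1][j] >= m[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def wildcard_url(a: str, b: str, min_len: int = MIN_FRAGMENT) -> str:
    """Merge two URLs into one wildcarded URL, dropping short fragments."""
    # Group matched pairs into diagonal stretches [start_a, start_b, length].
    runs = []
    for i, j in matched_pairs(a, b):
        if runs and runs[-1][0] + runs[-1][2] == i and runs[-1][1] + runs[-1][2] == j:
            runs[-1][2] += 1
        else:
            runs.append([i, j, 1])
    runs = [r for r in runs if r[2] >= min_len]

    out, pos_a, pos_b = [], 0, 0
    for i, j, n in runs:
        if i > pos_a or j > pos_b:  # something was skipped in either URL
            out.append("*")
        out.append(a[i:i + n])
        pos_a, pos_b = i + n, j + n
    if pos_a < len(a) or pos_b < len(b):  # unmatched tail in either URL
        out.append("*")
    return "".join(out)


def matches(pattern: str, url: str) -> bool:
    """Check that a wildcarded URL covers the whole of the given URL."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, url) is not None


print(wildcard_url("abcXdef", "abcYYdef"))  # abc*def
```

Because every kept fragment occurs in both inputs in the same order, the resulting pattern matches both source URLs by construction, which `matches` can be used to confirm.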
URL Merging
- Only merge two URLs if similar enough
- Look at wildcarded URL from LCS
  - Remove all non-alpha-numeric characters
  - Remove common fragments
    - http
    - com and other top-level domains
    - gif and other file suffixes
- Merge only if string is still non-empty
  - else try merging with other URL or keep separate
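The similarity test on this slide might look like the following sketch. Two points are assumptions rather than facts from the slides: the contents of `COMMON_FRAGMENTS` beyond http, com and gif, and the choice to treat non-alphanumeric characters as separators so that common fragments are removed as whole tokens. The function name `is_similar_enough` is hypothetical.

```python
import re

# Assumed list: the slide names only "http", "com and other top-level
# domains" and "gif and other file suffixes".
COMMON_FRAGMENTS = {"http", "com", "net", "org", "gif", "jpg", "png"}


def is_similar_enough(wildcarded_url: str) -> bool:
    """Decide whether a merged, wildcarded URL still carries information."""
    # Non-alphanumeric characters (including '*') only separate tokens.
    tokens = re.split(r"[^0-9A-Za-z]+", wildcarded_url)
    # Drop empty tokens and fragments shared by almost every URL.
    remaining = [t for t in tokens if t and t.lower() not in COMMON_FRAGMENTS]
    # Merge only if something specific is left over.
    return bool(remaining)


print(is_similar_enough("http://*.domainname.com/ads/*"))  # True
print(is_similar_enough("http://*.com/*.gif"))             # False
```

A caller would merge two URLs only when this check passes for their wildcarded result, and otherwise try another candidate or keep the URLs separate.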
Improving URL Merging
- Wildcarded URLs are sometimes too general
- Possible improvements
  - Do not merge across domains
  - Take directory structure into account
  - Treat numbers as one entity, not separate characters
Automatic Online Updates
Automatic Online Updates
- New menu item in “Preferences” dialog
  - Import filters from URL
  - Can automatically update after specified interval, e.g. one week
  - Circumvents file system
- Users can import filters from trusted agency
  - Magazine, university, network admin, etc.
Improvements to Online Updates
- Create an “ad blocking community”
  - Users add filters to online database
  - If filters are good, user gains karma
  - Filters from users with more karma get preferred
- Advertisers face thousands of users entering and sharing filters
Blocking HTML Page Sections
Blocking HTML Page Sections
- Allows blocking HTML elements containing other elements (as opposed to just <img> or <object> tags)
- Path-style strings specify HTML elements
  - Document Object Model (DOM) path
  - html:1/body:1/table:2
  - meaning “the second <table> tag in the first <body> in the first <html>”
- Paired with wildcard URLs to determine on which pages to block that HTML path
  - {“domainname.com/sessionid=*”, “html:1/body:1/table:2”}
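To make the path syntax concrete, here is a small sketch that parses a DOM path such as html:1/body:1/table:2 into steps. The `PathStep` type and `parse_dom_path` function are hypothetical names introduced for illustration. The 1-based index follows the slide's reading of table:2 as "the second <table>".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStep:
    tag: str    # element name, e.g. "table"
    index: int  # 1-based position among siblings with the same tag


def parse_dom_path(path: str) -> list[PathStep]:
    """Split a path like 'html:1/body:1/table:2' into its steps."""
    steps = []
    for part in path.split("/"):
        tag, _, index = part.partition(":")
        steps.append(PathStep(tag.lower(), int(index)))
    return steps


# A filter pairs a wildcarded page URL with a DOM path, as on the slide.
page_pattern, dom_path = ("domainname.com/sessionid=*", "html:1/body:1/table:2")
for step in parse_dom_path(dom_path):
    print(step.tag, step.index)
```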
Implementation
- HTML document viewed as tree
- If a webpage URL matches the wildcarded URL
  - Recurse into DOM tree branch as specified by DOM path
  - Remove matching element
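The two steps above can be sketched on a toy tree. This is not the plugin's implementation, which works on the browser's DOM in JavaScript. The `Node` class, `url_matches`, `remove_by_path` and `apply_filter` are hypothetical, and matching the pattern anywhere in the page URL is an assumption suggested by the scheme-less example pattern domainname.com/sessionid=*.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a DOM element (assumption, not the browser API)."""
    tag: str
    children: list = field(default_factory=list)


def url_matches(pattern: str, url: str) -> bool:
    """Unanchored wildcard match: '*' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.search(regex, url) is not None


def remove_by_path(parent: Node, steps: list) -> bool:
    """Follow (tag, index) steps below parent and remove the final element."""
    tag, index = steps[0]
    same_tag = [c for c in parent.children if c.tag == tag]
    if not 1 <= index <= len(same_tag):
        return False  # the path does not exist on this page
    target = same_tag[index - 1]
    if len(steps) == 1:
        parent.children = [c for c in parent.children if c is not target]
        return True
    return remove_by_path(target, steps[1:])


def apply_filter(document: Node, page_url: str, pattern: str, dom_path: str) -> bool:
    """Remove the element named by dom_path if the page URL matches."""
    if not url_matches(pattern, page_url):
        return False
    steps = [(tag, int(idx)) for tag, idx in
             (step.split(":") for step in dom_path.split("/"))]
    return remove_by_path(document, steps)


body = Node("body", [Node("table"), Node("div"), Node("table", [Node("tr")])])
document = Node("#document", [Node("html", [Node("head"), body])])
apply_filter(document, "http://domainname.com/sessionid=42",
             "domainname.com/sessionid=*", "html:1/body:1/table:2")
print([child.tag for child in body.children])  # ['table', 'div']
```

Counting only siblings with the same tag means that table:2 keeps pointing at the second table even when unrelated elements, such as the div here, sit between the tables.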
Possible Improvements
- Command characters in DOM paths
  - # – Block element for all indexes
    html:1/body:1/table:#/tr:1 means “block the first row of all tables in the body”
  - * – Insert arbitrary path
    html:1/*/table:#/tr:1 means “block the first row of all tables in the document”
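One way the proposed command characters could behave is sketched below on a toy tree. The slides describe these only as possible improvements, so the semantics here are an interpretation: `#` selects every sibling with the given tag, and `*` stands for zero or more intermediate levels. `Node`, `select`, `block` and `sample_document` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a DOM element (assumption, not the browser API)."""
    tag: str
    children: list = field(default_factory=list)


def select(parent: Node, steps: list):
    """Yield (parent, node) pairs addressed by the path steps below parent."""
    if not steps:
        return
    head, rest = steps[0], steps[1:]
    if head == "*":
        # Zero levels skipped: try the rest of the path right here.
        yield from select(parent, rest)
        # One or more levels skipped: descend, keeping '*' in the path.
        for child in parent.children:
            yield from select(child, steps)
        return
    tag, index = head.split(":")
    same_tag = [c for c in parent.children if c.tag == tag]
    if index == "#":
        chosen = same_tag  # all indexes
    else:
        k = int(index)
        chosen = same_tag[k - 1:k] if k >= 1 else []
    for node in chosen:
        if rest:
            yield from select(node, rest)
        else:
            yield parent, node


def block(document: Node, path: str) -> int:
    """Remove every element the path selects; return how many were removed."""
    targets = list(select(document, path.split("/")))
    for parent, node in targets:
        parent.children = [c for c in parent.children if c is not node]
    return len(targets)


def sample_document() -> Node:
    """html > body > [table[tr, tr], div[table[tr, tr]]]"""
    outer = Node("table", [Node("tr"), Node("tr")])
    inner = Node("table", [Node("tr"), Node("tr")])
    body = Node("body", [outer, Node("div", [inner])])
    return Node("#document", [Node("html", [body])])


print(block(sample_document(), "html:1/body:1/table:#/tr:1"))  # 1
print(block(sample_document(), "html:1/*/table:#/tr:1"))       # 2
```

The targets are collected before any removal so that deleting one element does not shift the indexes used to find the next.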
Conclusion
- “Two click” blocking simplifies ad blocking for non-technical users
- Online updates make sharing filters easier
- Blocking entire HTML page sections is expected to be powerful for fairly static pages
Thank You!
- We thank the following groups for the support we have received:
  - The Mozilla Organization (www.mozilla.org)
  - The AdBlock Project (adblock.mozdev.org)
  - Dr. Dan Wallach, Scott Crosby and COMP 527