Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Raphael Hoffmann, James Fogarty, Daniel S. Weld
University of Washington, Seattle
UIST 2007
Programmers Use Search
• To identify an API
• To seek information about an API
• To find examples of how to use an API
Example task: “Programmatically output an Acrobat PDF file in Java.”
Example: General Web Search Interface
Example: Code-Specific Web Search Interface …
Problems
• Information is dispersed: tutorials, the API itself, documentation, pages with samples
• Difficult and time-consuming to:
  – locate required pieces,
  – get an overview of alternatives,
  – judge relevance and quality of results,
  – understand dependencies.
• Many page visits required
With Assieme we … • Designed a new Web search interface • Developed needed inference
Outline • Motivation • What Programmers Search For • The Assieme Search Engine – Inferring Implicit References – Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion
Six Learning Barriers faced by Programmers (Ko et al. 04) • Design barriers — What to do? • Selection barriers — What to use? • Coordination barriers — How to combine? • Use barriers — How to use? • Understanding barriers — What is wrong? • Information barriers — How to check?
Examining Programmer Web Queries
Objective
• See what programmers search for
Dataset
• 15 million queries and click-through data
• Random sample of MSN queries in 05/06
Procedure
• Extract query sessions containing ‘java’ (2,529 sessions)
• Manually examine queries and define regex filters
• Build an informal taxonomy of query sessions
Examining Programmer Web Queries
Examining Programmer Web Queries (taxonomy diagram; 64.1% / 35.9% split of query sessions)
• Descriptive: contain a package, type, or member name, e.g. “java JSP current date”, “java SimpleDateFormat” (selection barrier, use barrier)
• 17.9%: contain terms like “example”, “using”, “sample code”, e.g. “using currentdate in jsp” (coordination barrier)
Assieme (interface screenshot, with callouts): relevance indicated by # uses; documentation; example code; summaries show referenced types, required libraries, and links to related info
Challenges: how do we put the right information in the interface?
• Get all programming-related data
• Interpret the data and infer relationships
Outline • Motivation • What Programmers Search For • The Assieme Search Engine – Inferring Implicit References – Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion
Assieme’s Data … is crawled using existing search engines
• Pages with code examples: queried Google on “java +import +class …” (~2,360,000)
• JAR files: downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net (~79,000)
• JavaDoc pages: queried Google on “overview-tree.html …” (~480,000)
The Assieme Search Engine … infers 2 kinds of implicit references (diagram):
• uses of packages, types, and members, linking pages with code examples to JAR files
• matches of packages, types, and members, linking JAR files to JavaDoc pages
Extracting Code Samples (annotated screenshot of problem cases): unclear segmentation; code in a different language (C++); distracting terms; ‘…’ in code; line numbers
Extracting Code Samples
• Remove HTML tags, but preserve line breaks
• Remove some distracters by heuristics
• Launch an (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements); a sketch follows
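A minimal Java sketch of this pipeline, under the assumption that tryParseType, tryParseMethod, and tryParseStatements stand in for the error-tolerant parser (the names and heuristics are illustrative, not Assieme’s actual code):

import java.util.ArrayList;
import java.util.List;

public class CodeExtractor {

    /** Strip HTML tags but keep line structure, so the parser sees plausible Java lines. */
    static String htmlToText(String html) {
        return html
                .replaceAll("(?i)<br\\s*/?>", "\n")    // explicit line breaks
                .replaceAll("(?i)</(p|div|li|tr)>", "\n")
                .replaceAll("<[^>]+>", "")             // drop remaining tags
                .replace("&lt;", "<").replace("&gt;", ">")
                .replace("&amp;", "&").replace("&nbsp;", " ");
    }

    /** Launch the (error-tolerant) parser at every line break and keep what parses. */
    static List<String> extractCandidates(String text) {
        String[] lines = text.split("\n");
        List<String> samples = new ArrayList<>();
        for (int start = 0; start < lines.length; start++) {
            String region = String.join("\n",
                    java.util.Arrays.copyOfRange(lines, start, lines.length));
            // Separately try to parse a type, a method, or a sequence of statements.
            // (A real parser would also determine where the sample ends.)
            if (tryParseType(region) || tryParseMethod(region) || tryParseStatements(region)) {
                samples.add(region);
            }
        }
        return samples;
    }

    // Hypothetical stand-ins for an error-tolerant Java parser.
    static boolean tryParseType(String s)       { return s.trim().startsWith("class "); }
    static boolean tryParseMethod(String s)     { return s.contains("(") && s.contains(")"); }
    static boolean tryParseStatements(String s) { return s.trim().endsWith(";"); }
}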
Resolving External Code References
A naïve approach of finding term matches does not work:

1  import java.util.*;
2  class c {
3    HashMap m = new HashMap();
4    void f() { m.clear(); }
5  }

The reference to java.util.HashMap.clear() on line 4 is only detectable by considering several lines. Use the compiler to identify unresolved names (a sketch follows).
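One way to detect unresolved names with standard JDK tooling is sketched below; this is an illustration, not the authors’ implementation, and the com.lowagie.text.Document import is just an example of a class that is not on the default classpath:

import javax.tools.*;
import java.net.URI;
import java.util.Arrays;

public class UnresolvedNames {

    /** In-memory Java source, so extracted samples never touch the file system. */
    static class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    public static void main(String[] args) {
        // Illustrative sample referencing a class that is not on the default classpath.
        String sample = "import com.lowagie.text.Document;\n"
                      + "class Sample { Document d = new Document(); }";
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics,
                Arrays.asList("-proc:none"), null,
                Arrays.asList(new StringSource("Sample", sample)));
        task.call();  // returns false here: the import cannot be resolved
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            // Messages such as "package com.lowagie.text does not exist" name the
            // unresolved package/type; these names can then be looked up in a JAR index.
            System.out.println(d.getKind() + " line " + d.getLineNumber() + ": " + d.getMessage(null));
        }
    }
}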
Resolving External Code References
• Index packages/types/members in JAR files (e.g. java.util.HashMap, java.util.HashMap.clear(), …)
• Compile the sample and look up unresolved names in the index
• Put candidate JAR files on the classpath and greedily pick the best JARs
• Utility function: # covered references (and JAR popularity); a greedy sketch follows
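The greedy JAR selection can be sketched as a set-cover heuristic; the method names, example data, and exact utility below (newly covered references, popularity as tie-breaker) are an illustrative reading of the slide, not code from the paper:

import java.util.*;

public class GreedyJarPicker {

    /**
     * Pick JAR files that cover the unresolved references of a code sample.
     * Utility: number of newly covered references, with JAR popularity as tie-breaker.
     */
    static List<String> pickJars(Set<String> unresolvedRefs,
                                 Map<String, Set<String>> jarContents,
                                 Map<String, Integer> jarPopularity) {
        List<String> chosen = new ArrayList<>();
        Set<String> uncovered = new HashSet<>(unresolvedRefs);
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestCover = 0;
            int bestPopularity = -1;
            for (Map.Entry<String, Set<String>> jar : jarContents.entrySet()) {
                Set<String> covered = new HashSet<>(jar.getValue());
                covered.retainAll(uncovered);
                int pop = jarPopularity.getOrDefault(jar.getKey(), 0);
                if (covered.size() > bestCover
                        || (covered.size() == bestCover && covered.size() > 0 && pop > bestPopularity)) {
                    best = jar.getKey();
                    bestCover = covered.size();
                    bestPopularity = pop;
                }
            }
            if (best == null) break;            // remaining references cannot be covered
            chosen.add(best);
            uncovered.removeAll(jarContents.get(best));
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> jars = new HashMap<>();
        jars.put("itext.jar", new HashSet<>(Arrays.asList(
                "com.lowagie.text.Document", "com.lowagie.text.pdf.PdfWriter")));
        jars.put("small-pdf.jar", new HashSet<>(Arrays.asList("com.lowagie.text.Document")));
        Map<String, Integer> popularity = new HashMap<>();
        popularity.put("itext.jar", 120);
        popularity.put("small-pdf.jar", 3);
        Set<String> refs = new HashSet<>(Arrays.asList(
                "com.lowagie.text.Document", "com.lowagie.text.pdf.PdfWriter"));
        System.out.println(pickJars(refs, jars, popularity));   // prints [itext.jar]
    }
}

With the example data, itext.jar alone covers both references, so it is chosen and the smaller JAR is skipped.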
Scoring
• Existing techniques …
  – docs modeled as weighted term frequencies
  – hypertext link analysis (PageRank)
• … do not work well for code, because:
  – JAR files (binary code) provide no context
  – source code contains few relevant keywords
  – structure in code is important for relevance
Using Implicit References to Improve Scoring
• Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)
Scoring (diagram): APIs (packages/types/members) and Web pages
Scoring
APIs
• Use text on doc pages and on pages with code samples that reference the API (~ anchor text)
• Weight APIs by # incoming refs (~ PageRank)
Web Pages
• Use fully qualified references (java.util.HashMap) and adjust term weights
• Filter pages by references
• Favor pages with accompanying text
A rough scoring sketch follows.
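As a rough illustration of how incoming references could enter the ranking (the weighting formula below is an assumption; the paper does not specify this exact formula), an API candidate’s text relevance can be boosted by a log-scaled count of referencing pages:

import java.util.Map;

public class ApiScorer {

    /**
     * Illustrative score: text relevance of the pages that reference the API
     * (acting like anchor text) boosted by the number of incoming references
     * (acting like a simple popularity prior, in the spirit of PageRank).
     */
    static double score(double textRelevance, int incomingReferences) {
        return textRelevance * (1.0 + Math.log(1 + incomingReferences));
    }

    public static void main(String[] args) {
        // Hypothetical candidates for a query like "java output pdf".
        Map<String, Integer> incomingRefs = Map.of(
                "com.lowagie.text.pdf.PdfWriter", 950,
                "some.obscure.PdfHelper", 4);
        for (Map.Entry<String, Integer> api : incomingRefs.entrySet()) {
            System.out.printf("%-35s %.2f%n", api.getKey(), score(1.0, api.getValue()));
        }
    }
}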
Outline • Motivation • What Programmers Search For • The Assieme Search Engine – Inferring Implicit References – Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion
Evaluating Code Extraction and Reference Resolution … on 350 hand-labeled pages from Assieme’s data
Code Extraction
• Recall 96.9%, Precision 50.1% (76.7% after filtering pages without references)
• False positives: C, C#, JavaScript, PHP, FishEye/diff pages
Reference Resolution
• Recall 89.6%, Precision 86.5%
• False positives: FishEye and diff pages
• False negatives: incomplete code samples
User Study: Assieme vs. Google Code Search
Design
• 40 search tasks based on queries in the logs, e.g. query “socket java” → task “Write a basic server that communicates using Sockets”
• Find code samples (and required libraries)
• 4 blocks of 10 tasks: 1 training block + 1 measured block per interface
Participants
• 9 (under)graduate students in Computer Science
User Study – Task Time (chart): F(1,258)=5.74, p ≈ .017, significant (*); F(1,258)=1.91, p ≈ .17
User Study – Solution Quality (chart)
Rating scale: 0 = seriously flawed; .5 = generally good but fell short in a critical regard; 1 = fairly complete
F(1,258)=55.5, p < .0001 (*); F(1,258)=6.29, p ≈ .013 (*)
User Study – # Queries Issued (chart): F(1,259)=6.85, p ≈ .001 (*); F(1,259)=9.77, p ≈ .002 (*)
Outline • Motivation • What Programmers Search For • The Assieme Search Engine – Inferring Implicit References – Using Implicit References for Scoring • Evaluation of Inference & User Study • Discussion & Conclusion
Discussion & Conclusion
• Assieme: a novel web search interface
• Programmers obtain better solutions, using fewer queries, in the same amount of time
• Using Google, subjects visited 3.3 pages per task; using Assieme, only 0.27 pages, but 4.3 previews
• The ability to quickly view code samples changed participants’ strategies
Thank You
Raphael Hoffmann, Computer Science & Engineering, University of Washington, raphaelh@cs.washington.edu
James Fogarty, Computer Science & Engineering, University of Washington, jfogarty@cs.washington.edu
Daniel S. Weld, Computer Science & Engineering, University of Washington, weld@cs.washington.edu
This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-10147, SRI International under CALO grant 03-000225, and the Washington Research Foundation / TJ Cable Professorship.