Information Extraction from Event Announcements
Student: Jianwei Lu (40942937)
Supervisor: Robert Dale

Agenda
• Project Introduction
• Email Event Information Extractor
• Conclusion

Background
• What is Information Extraction (IE)?
  ◦ Automated extraction of key information
  ◦ Populate a database
• Why is it significant?
  ◦ Manage and search data efficiently
  ◦ Serve as input for other target applications
For more information: [Cowie J and Wilks Y, n.d.] http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf

The Outcomes
[Figure not preserved; surviving labels: Title, URL]

Sample Data
• Corpus 1: 30 documents
• Corpus 2: 100 documents
• Corpus 3: 1,500 documents

Agenda
• Project Introduction
• Email Event Information Extractor
• Conclusion

My System Architecture
[Figure not preserved]

Text Zoning
[Figure not preserved]

URL Finding Rules
• Use a pattern to capture URLs
• Approaches for finding an event URL
  1. Search the Summary zone
  2. Search the whole document
• Results
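The two-step search above can be sketched roughly as follows. This is an illustration only, written in Python 3 rather than the project's Python 2.6. The regular expression, the function names `find_urls` and `find_event_url`, and the choice to return the first URL found are all assumptions, not the project's actual rules.

```python
import re

# Deliberately simple, hypothetical pattern; a real system would likely need a stricter one.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def find_urls(text):
    """Return URLs in order of appearance, with trailing punctuation stripped."""
    return [m.group(0).rstrip(".,;:") for m in URL_PATTERN.finditer(text)]


def find_event_url(summary_zone, whole_document):
    """Search the Summary zone first, then fall back to the whole document."""
    for zone in (summary_zone, whole_document):
        urls = find_urls(zone)
        if urls:
            # Assumption: the first URL in the zone is taken as the event URL.
            return urls[0]
    return None
```

Searching the Summary zone first keeps unrelated links elsewhere in the email (mailing-list footers, sponsor pages) from being chosen when a better candidate exists.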

Dates Finding Rules
• Use a pattern to capture dates
• Use clues to find the corresponding date
  1. submission-date < start-date <= end-date
  2. No submission-date in a "Call for Participation" announcement
  3. etc.
• Results
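A minimal sketch of how the ordering clue (submission-date < start-date <= end-date) might be used to label captured dates. The single date format, the names `find_dates` and `assign_roles`, and the fallback choices for one or two dates are assumptions made for illustration; the slides do not give the actual patterns.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical pattern for dates such as "12 March 2010"; other formats need more patterns.
DATE_PATTERN = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def find_dates(text):
    """Return the distinct valid dates found in text, sorted chronologically."""
    found = set()
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.add(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            pass  # skip impossible dates such as "31 February 2010"
    return sorted(found)


def assign_roles(dates, is_call_for_participation=False):
    """Label dates using submission-date < start-date <= end-date."""
    dates = sorted(dates)
    roles = {"submission": None, "start": None, "end": None}
    if not dates:
        return roles
    if is_call_for_participation or len(dates) == 1:
        # Clue 2: a "Call for Participation" has no submission date.
        roles["start"], roles["end"] = dates[0], dates[-1]
    else:
        # Assumption: the earliest date is the submission deadline.
        roles["submission"] = dates[0]
        roles["start"], roles["end"] = dates[1], dates[-1]
    return roles
```

For example, `assign_roles(find_dates("Papers due 15 January 2010. Workshop: 12 March 2010 to 14 March 2010."))` labels 15 January as the submission date and 12 to 14 March as the event span.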

Locations Finding Rules
• Tokenise lines into words
• Use a gazetteer to capture locations
• Results
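The tokenise-then-look-up idea can be illustrated with a small sketch. The three-entry `GAZETTEER`, the regular-expression tokeniser (used here instead of NLTK to keep the example self-contained), and the longest-match strategy are assumptions, not the project's implementation.

```python
import re

# Tiny stand-in gazetteer; entries are tuples of lowercase tokens.
GAZETTEER = {("sydney",), ("new", "york"), ("australia",)}
MAX_NAME_TOKENS = max(len(entry) for entry in GAZETTEER)


def tokenise(line):
    """Split a line into alphabetic word tokens."""
    return re.findall(r"[A-Za-z]+", line)


def find_locations(line):
    """Return gazetteer matches in a line, preferring the longest match at each position."""
    tokens = tokenise(line)
    locations = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in GAZETTEER:
                locations.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return locations
```

Trying the longest candidate first lets multi-word names such as "New York" match as a unit instead of token by token.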

Title Finding Rules
[Figure not preserved]

Title Finding Rules (cont'd)
• Apply Machine Learning to classify title lines
• Refine the title after classification
• Results
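As a rough illustration of the classification step, each line of an announcement could be turned into a numeric feature vector and written in the plain-text `<label> <index>:<value>` input format that SVMlight reads. The four features below (relative position, share of capitalised words, presence of an event keyword, word count) and the names `line_features` and `to_svmlight` are hypothetical; the slides do not list the features actually used.

```python
# Hypothetical keyword list; not taken from the project.
EVENT_WORDS = {"conference", "workshop", "symposium", "seminar"}


def line_features(line, index, total_lines):
    """Return a small, illustrative feature vector for one line of an announcement."""
    words = line.split()
    capitalised = sum(1 for w in words if w[:1].isupper())
    has_event_word = any(w.strip(".,:;()").lower() in EVENT_WORDS for w in words)
    return [
        index / max(total_lines - 1, 1),             # relative position in the document
        capitalised / len(words) if words else 0.0,  # share of capitalised words
        1.0 if has_event_word else 0.0,              # contains an event keyword
        float(len(words)),                           # line length in words
    ]


def to_svmlight(is_title, features):
    """Format one example as '<label> <index>:<value> ...' with 1-based feature indices."""
    pairs = " ".join(f"{i}:{v:.3f}" for i, v in enumerate(features, start=1))
    return f"{'+1' if is_title else '-1'} {pairs}"
```

A labelled title line such as "Third Workshop on Text Mining" at the top of a ten-line email would be written as `+1 1:0.000 2:0.800 3:1.000 4:5.000`.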

Current Performance
[Figure not preserved]

Agenda
• Project Introduction
• Email Event Information Extractor
• Conclusion

What I Have Achieved
• Modules for Information Extraction
  ◦ URL
  ◦ Dates
  ◦ Locations
  ◦ Title
• Evaluation Framework
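The slides do not say how the evaluation framework scores each module. One common choice in information extraction is exact-match precision, recall and F1 per field, sketched below. The function name `evaluate`, the exact-match criterion, and the dictionary layout are assumptions.

```python
def evaluate(predicted, gold):
    """Score one field by exact match.

    predicted, gold: dicts mapping document id -> extracted value (None if absent).
    Returns (precision, recall, f1).
    """
    correct = sum(1 for doc, value in predicted.items()
                  if value is not None and gold.get(doc) == value)
    n_predicted = sum(1 for value in predicted.values() if value is not None)
    n_gold = sum(1 for value in gold.values() if value is not None)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Running the same function once per field (URL, dates, locations, title) over a hand-annotated corpus gives comparable per-module scores.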

Limitations and Future Work
• Extension for refining titles
• Comparison for titles
• Comprehensive study on SVM tool and features used for machine learning

Implementation Details
• Python 2.6
• Gazetteer from http://world-gazetteer.com/
• Support Vector Machine: http://svmlight.joachims.org/
• Natural Language Toolkit (NLTK): http://www.nltk.org/Home

Questions?