Information Extraction from Event Announcements
Student: Jianwei Lu (40942937)
Supervisor: Robert Dale
Jianwei Lu 1
Agenda
- Project Introduction
- Email Event Information Extractor
- Conclusion
Background
- What is Information Extraction (IE)?
  - Automated extraction of key information
  - Populating a database with the extracted information
- Why is it significant?
  - Data can be managed and searched efficiently
  - The extracted data can serve other target applications
- For more information: [Cowie J and Wilks Y, n.d.] http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf
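To make "populate a database" concrete, here is a minimal sketch in Python 3 that stores extracted event fields in an in-memory SQLite table. The table name `events`, its columns, the `populate` helper and the sample record are all hypothetical illustrations. They are not the project's actual schema or data.

```python
import sqlite3

def populate(records):
    """Store extracted event records in an in-memory SQLite table.

    Hypothetical schema: one row per announcement, one column per extracted field.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (title TEXT, url TEXT, start_date TEXT, "
        "end_date TEXT, location TEXT)"
    )
    # Named placeholders let each record be a dict keyed by field name.
    conn.executemany(
        "INSERT INTO events VALUES (:title, :url, :start_date, :end_date, :location)",
        records,
    )
    conn.commit()
    return conn

# Hypothetical extracted record, for illustration only.
conn = populate([
    {"title": "Example Workshop on Text Mining", "url": "http://example.org/ws",
     "start_date": "2009-07-01", "end_date": "2009-07-03", "location": "Sydney"},
])

# Once populated, the data can be searched with ordinary queries.
rows = conn.execute(
    "SELECT title, location FROM events WHERE location = ?", ("Sydney",)
).fetchall()
print(rows)  # [('Example Workshop on Text Mining', 'Sydney')]
```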
The Outcomes
[Figure: example of the extracted output, with the Title and URL fields labelled]
Sample Data
- Corpus 1: 30 documents
- Corpus 2: 100 documents
- Corpus 3: 1,500 documents
My System Architecture
Text Zoning
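The zoning diagram on this slide has not survived. The sketch below (Python 3) shows one plausible way to split an email announcement into zones using the standard-library `email` parser. It makes several assumptions. The message is single-part plain text. The "summary zone" is simply the first few non-blank body lines. The helper `zone_email` and its `summary_size` parameter are hypothetical, not the project's actual zoning rules.

```python
import email

def zone_email(raw, summary_size=5):
    """Split a raw single-part plain-text email into header, summary and body zones.

    Assumption: the summary zone is the first `summary_size` non-blank body lines.
    """
    msg = email.message_from_string(raw)
    header_zone = dict(msg.items())          # e.g. {"Subject": ..., "From": ...}
    body = msg.get_payload()                 # a str for non-multipart messages
    body_lines = [line.strip() for line in body.splitlines()]
    non_blank = [line for line in body_lines if line]
    summary_zone = non_blank[:summary_size]
    return {"header": header_zone, "summary": summary_zone, "body": body_lines}

raw = (
    "From: chair@example.org\n"
    "Subject: CFP: Example Workshop on Text Mining\n"
    "\n"
    "Call for Papers\n"
    "Example Workshop on Text Mining\n"
    "Sydney, Australia, 1-3 July 2009\n"
    "http://example.org/ws\n"
    "\n"
    "Submissions are due on 15 April 2009.\n"
)
zones = zone_email(raw, summary_size=4)
print(zones["header"]["Subject"])  # CFP: Example Workshop on Text Mining
print(zones["summary"])
```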
URL Finding Rules
- Use patterns to capture URLs
- Approaches to finding the event URL:
  1. Search the Summary zone
  2. Search the whole document
- Results
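The two-step URL search can be sketched as follows (Python 3). It applies a regular expression to the Summary zone first and falls back to the whole document. The pattern `URL_PATTERN` and the function `find_event_url` are hypothetical. This simplified pattern only recognises URLs that start with `http://` or `https://`.

```python
import re

# Simplified, hypothetical URL pattern: a scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)

def find_event_url(summary_lines, all_lines):
    """Return the first URL in the Summary zone, else the first in the whole document."""
    for zone in (summary_lines, all_lines):   # approach 1, then approach 2
        for line in zone:
            match = URL_PATTERN.search(line)
            if match:
                # Drop sentence punctuation that the pattern may have swallowed.
                return match.group(0).rstrip(".,;:")
    return None

print(find_event_url(["Call for Papers", "See http://example.org/ws."], []))
```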
Dates Finding Rules
- Use patterns to capture dates
- Use clues to decide what each date refers to:
  1. submission-date < start-date <= end-date
  2. No submission-date in a “Call for Participation” announcement
  3. Other similar clues
- Results
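The date rules can be sketched in Python 3 as a pattern for dates such as "15 April 2009" or "1-3 July 2009", combined with clue-based assignment. It also applies the two constraints listed above. The clue words in `SUBMISSION_CLUES` and the helpers `find_dates` and `is_consistent` are hypothetical. The project's real patterns and clue set are not shown on the slide.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches "15 April 2009" and day ranges such as "1-3 July 2009".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})(?:\s*-\s*(\d{1,2}))?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)

SUBMISSION_CLUES = ("submission", "deadline", "due")  # hypothetical clue words

def find_dates(lines, is_call_for_participation=False):
    """Assign captured dates to submission/start/end slots using line-level clues."""
    found = {}
    for line in lines:
        for m in DATE_PATTERN.finditer(line):
            day1, day2, month, year = m.groups()
            month_no = MONTHS[month.lower()]
            first = date(int(year), month_no, int(day1))
            last = date(int(year), month_no, int(day2)) if day2 else first
            if any(clue in line.lower() for clue in SUBMISSION_CLUES):
                # Rule 2: a Call for Participation has no submission date.
                if not is_call_for_participation:
                    found.setdefault("submission", first)
            elif "start" not in found:
                found["start"], found["end"] = first, last
    return found

def is_consistent(dates):
    """Rule 1: submission-date < start-date <= end-date (when the dates are present)."""
    sub, start, end = dates.get("submission"), dates.get("start"), dates.get("end")
    if start and end and not start <= end:
        return False
    if sub and start and not sub < start:
        return False
    return True

lines = ["Sydney, Australia, 1-3 July 2009", "Submissions are due on 15 April 2009."]
dates = find_dates(lines)
print(dates, is_consistent(dates))
```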
Locations Finding Rules
- Tokenise lines into words
- Use a gazetteer to capture locations
- Results
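Here is a minimal Python 3 sketch of gazetteer lookup over tokenised lines. It tries the longest matching entry first, so that multi-word names such as "New York" are captured whole. The tiny `GAZETTEER` set is a hypothetical stand-in for the full gazetteer. The regex tokeniser is a simple substitute for whatever tokeniser the project used.

```python
import re

# Hypothetical miniature gazetteer; entries are lower-cased token tuples.
GAZETTEER = {("sydney",), ("australia",), ("new", "york"), ("hong", "kong")}
MAX_NAME_LEN = max(len(name) for name in GAZETTEER)

def tokenise(line):
    """Split a line into word tokens (letters, allowing internal hyphens/apostrophes)."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", line)

def find_locations(lines):
    """Return gazetteer matches, preferring the longest match at each position."""
    locations = []
    for line in lines:
        tokens = tokenise(line)
        lowered = [t.lower() for t in tokens]
        i = 0
        while i < len(tokens):
            for n in range(min(MAX_NAME_LEN, len(tokens) - i), 0, -1):
                if tuple(lowered[i:i + n]) in GAZETTEER:
                    locations.append(" ".join(tokens[i:i + n]))
                    i += n
                    break
            else:
                i += 1  # no entry starts at this token
    return locations

print(find_locations(["Sydney, Australia, 1-3 July 2009", "Venue: New York Hilton"]))
# ['Sydney', 'Australia', 'New York']
```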
Title Finding Rules
Title Finding Rules (cont’d)
- Apply machine learning to classify each line as a title line or not
- Refine the title after classification
- Results
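The classify-then-refine step can be sketched in Python 3 as a linear decision function over per-line features, followed by a clean-up of the selected lines. The features, `WEIGHTS` and `BIAS` are hypothetical. The weights are hand-set placeholders, whereas in the described system the classifier was a trained SVM. The prefix-stripping rule in `refine_title` is likewise an assumption.

```python
import re

EVENT_WORDS = ("workshop", "conference", "symposium", "meeting", "school")  # hypothetical
PREFIX = re.compile(r"^\s*(call\s+for\s+(papers|participation)|cfp)\s*[:\-]?\s*",
                    re.IGNORECASE)

def line_features(line, position):
    """Hypothetical features for deciding whether a body line is part of the title."""
    words = line.split()
    capitalised = sum(1 for word in words if word[:1].isupper())
    return [
        1.0 if position < 5 else 0.0,                          # near the top of the body
        capitalised / len(words) if words else 0.0,            # title-case ratio
        1.0 if any(w in line.lower() for w in EVENT_WORDS) else 0.0,
        1.0 if re.search(r"\d", line) else 0.0,                # digits suggest dates/venues
    ]

# Placeholder weights standing in for a trained linear model (not learned here).
WEIGHTS = [0.5, 1.0, 2.0, -1.5]
BIAS = -2.0

def is_title_line(line, position):
    score = sum(w * x for w, x in zip(WEIGHTS, line_features(line, position))) + BIAS
    return score > 0

def refine_title(lines):
    """Join consecutive title lines and strip 'Call for Papers'-style prefixes."""
    title = " ".join(line.strip() for line in lines)
    title = PREFIX.sub("", title)
    return title.strip(" .:-")

body = ["Call for Papers", "Example Workshop on Text Mining",
        "Sydney, Australia, 1-3 July 2009", "http://example.org/ws"]
title_lines = [line for i, line in enumerate(body) if is_title_line(line, i)]
print(refine_title(title_lines))  # Example Workshop on Text Mining
```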
Current Performance
What I Have Achieved
- Modules for information extraction:
  - URL
  - Dates
  - Locations
  - Title
- Evaluation framework
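An evaluation framework for field extraction typically compares each extracted value against a hand-annotated reference. Below is a minimal Python 3 sketch using exact-match precision, recall and F1 for one field. It assumes that values match after lower-casing and whitespace normalisation. The helpers `normalise` and `evaluate` and the sample data are hypothetical, and the project's actual matching criteria are not stated.

```python
def normalise(value):
    """Lower-case and collapse whitespace; None stays None (field not found)."""
    return " ".join(value.lower().split()) if value is not None else None

def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 for one field.

    predicted / gold: dicts mapping document id -> value (None if absent).
    """
    tp = sum(1 for doc, value in predicted.items()
             if value is not None and normalise(value) == normalise(gold.get(doc)))
    n_pred = sum(1 for v in predicted.values() if v is not None)
    n_gold = sum(1 for v in gold.values() if v is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"d1": "http://a.org", "d2": "http://b.org", "d3": "http://c.org"}
predicted = {"d1": "http://a.org", "d2": "http://x.org", "d3": None}
p, r, f = evaluate(predicted, gold)
print(tuple(round(x, 3) for x in (p, r, f)))  # (0.5, 0.333, 0.4)
```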
Limitations and Future Work
- Extend the rules for refining titles
- Comparison of titles
- A comprehensive study of the SVM tool and of the features used for machine learning
Implementation Details
- Python 2.6
- Gazetteer from http://world-gazetteer.com/
- Support Vector Machine: http://svmlight.joachims.org/
- Natural Language Toolkit (NLTK): http://www.nltk.org/Home
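SVM-light reads training examples from a plain-text file. Each line holds a target (`+1` or `-1` for classification) followed by `index:value` pairs, with feature indices ascending from 1. Zero-valued features may be omitted, and anything after `#` is treated as a comment. Here is a minimal Python 3 sketch that serialises a title/non-title feature vector into that format. The helper `to_svmlight_line` is hypothetical. The resulting file would then be passed to SVM-light's `svm_learn` program.

```python
def to_svmlight_line(is_title, features, comment=None):
    """Format one training example as an SVM-light input line.

    Features are numbered from 1; zero-valued features are left out (sparse format).
    """
    target = "+1" if is_title else "-1"
    pairs = [f"{i}:{value:g}" for i, value in enumerate(features, start=1) if value != 0]
    line = " ".join([target] + pairs)
    if comment:
        line += f" # {comment}"
    return line

print(to_svmlight_line(True, [1.0, 0.8, 1.0, 0.0]))  # +1 1:1 2:0.8 3:1
```

Leaving out zero-valued features keeps the training file compact when most features are inactive for a given line.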
Questions?