70d1ce4aa543c78a2e9f5ffddd4e9212.ppt
- Number of slides: 27
Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group
Agenda • Benefits, business and technical • A few definitions • Planning • Issues • Measuring value • Futures • Q&A
The Mantra • Knowledge is in the eye of the beholder, but reflecting end-user needs is as critical as representing texts... and it takes work!
Business Benefits • "If only I could find information to help me do my job better..." • Mergers and acquisitions • Research and development • Industries: Consulting, Pharmaceuticals, Financial services, Legal
Technical Benefits • Site creation • Navigation/search • Personalization • Defining areas of expertise
Definitions: Taxonomy • “The science, laws or principles of classification” (From the Greek: rules of arrangement) • Biology (Linnaeus) • Education (Bloom) • A hierarchical collection of categories and documents • Structure and content
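The "hierarchical collection of categories and documents" can be pictured as a simple tree. The sketch below is only an illustration under assumed names: the `Category` class, its methods, and the sample category and document names are hypothetical and do not come from the presentation or from any IBM product.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label, child categories, and assigned documents."""
    name: str
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> "Category":
        """Create a subcategory under this node and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def all_documents(self) -> list[str]:
        """Documents assigned here plus everything in the categories below."""
        docs = list(self.documents)
        for child in self.children:
            docs.extend(child.all_documents())
        return docs


# Hypothetical sample data: structure (categories) and content (documents).
root = Category("Industries")
pharma = root.add_child("Pharmaceuticals")
trials = pharma.add_child("Clinical trials")
pharma.documents.append("market-overview.doc")
trials.documents.append("phase-2-summary.doc")
```

Browsing a category then means collecting the documents of its whole subtree, which is one way to read "structure and content" together.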
Definitions: Directory • More general than taxonomy • Natural structure • Wide vs deep • Category structure less controlled • File system • Yahoo (http://www.yahoo.com) • Yellow Pages • Corporate Web sites (http://www.ibm.com)
Definitions: Thesaurus • Controlled vocabulary • Subject headings, labels • Synonyms (U, UF) • Relation types (TT, BT, NT, SN, HN, RT, SA) • Examples: http://www.loc.gov/flicc/wg/taxonomy.html
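In common thesaurus practice, BT/NT stand for broader/narrower term, RT for related term, and U/UF for "use"/"used for" (pointing from a synonym to the preferred term and back). A minimal sketch of how such relations might be stored follows; the `Thesaurus` class and the sample terms are hypothetical, and only a few of the relation types listed on the slide are modeled.

```python
from collections import defaultdict


class Thesaurus:
    """Toy controlled vocabulary holding BT/NT, RT and U/UF relations."""

    def __init__(self):
        # term -> relation code -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))
        # non-preferred synonym -> preferred term (the "U" direction)
        self.use = {}

    def add_broader(self, term, broader):
        # BT and NT are reciprocal: record both directions together.
        self.relations[term]["BT"].add(broader)
        self.relations[broader]["NT"].add(term)

    def add_related(self, a, b):
        # RT is symmetric.
        self.relations[a]["RT"].add(b)
        self.relations[b]["RT"].add(a)

    def add_synonym(self, preferred, synonym):
        # UF on the preferred term, U lookup from the synonym.
        self.relations[preferred]["UF"].add(synonym)
        self.use[synonym] = preferred

    def preferred(self, term):
        """Map a synonym to its preferred term; other terms map to themselves."""
        return self.use.get(term, term)


# Hypothetical sample vocabulary.
t = Thesaurus()
t.add_broader("Antibiotics", "Drugs")
t.add_synonym("Drugs", "Medicines")
t.add_related("Drugs", "Pharmacies")
```

Keeping reciprocal relations in one call avoids the common maintenance problem of a BT entry with no matching NT.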
Definitions: Meta-data and tagging • Meta-data • Properties, attributes: information describing types of data [Crandall] • The ‘energy’ required to keep things organized [Earley] • Tagging • <META>, <Source> • Document Properties
Definitions: Classification • Analyzing documents and assigning them to predefined categories • Rule-based vs natural • Classification schemes • Dewey • Library of Congress • Industry-specific
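As a rough illustration of the rule-based approach, the sketch below assigns a document to every predefined category whose keyword rule it matches. The rule table, the `classify` function, and the `min_hits` parameter are assumptions made for this example; real rule languages are usually richer (phrases, Boolean operators, weights).

```python
import re

# Hypothetical rules: predefined category -> keywords that trigger it.
RULES = {
    "Mergers and acquisitions": {"merger", "acquisition", "takeover"},
    "Research and development": {"research", "prototype", "laboratory"},
}


def classify(text, rules=RULES, min_hits=1):
    """Return the categories whose rule matches at least min_hits distinct keywords."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = []
    for category, keywords in rules.items():
        if len(words & keywords) >= min_hits:
            matches.append(category)
    return matches


ma_result = classify("The merger closed after a long takeover battle.")
none_result = classify("Quarterly cafeteria menu")
```

A document that matches no rule is left uncategorized, which is typically where an editor steps in.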
Definitions: Clustering • Automatically generating groups of similar documents based on distance or proximity measures • "Bags of words" • Vector analysis determines boundaries • Adaptive, but not abstract
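To make "bags of words" and proximity measures concrete, here is a small sketch that turns each document into a word-count vector, compares vectors by cosine similarity, and groups documents with a greedy single-pass scheme. This particular scheme and the `threshold` value are assumptions chosen for brevity, not the method of any product mentioned in the talk.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector; word order is deliberately discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed is similar enough,
    otherwise start a new cluster. Returns lists of document indices."""
    clusters = []  # each entry: (seed vector, member indices)
    for i, text in enumerate(texts):
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "taxonomy categories taxonomy editor",
    "taxonomy editor training",
    "support incident time",
    "support incident database",
]
groups = cluster(docs)
```

The groups come out of the word statistics alone, with no labels attached, which is the sense of "adaptive, but not abstract": someone still has to name the clusters.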
Develop a Plan • Determine user information needs • Information audit, Content audit • Select appropriate sources • Create initial taxonomy • Edit categories • Categorize new documents • Test the UI • Train the taxonomy
Plan: Information audit • What is the objective of the system? • Who owns the project? • What do users need? • What do content creators need? • What do system managers need?
Plan: Content audit • Is there an existing taxonomy? • How clean is the meta-data? • Is the content suited to automatic classification techniques? • Good example: Notes discussion databases • Not-so-good example: Web site with little text, lots of links • Is a subset of a source better than the whole?
Plan: Select sources • Which sources? • Who owns them? • Which sources do users access most often? • How do users access these sources? • What is the lifecycle of the content? • Who identifies the most current content?
Plan: Maintenance • Resources • Centralized or department-level • Who decides when new content is added? • Term approval process • How do new concepts get into the taxonomy?
Identify issues • Getting user involvement and buy-in • Maintenance resources • Directory versus taxonomy • Meta-data • Globalization and regionalization • Hidden vs published taxonomies
Understand the BIG issues • Organizational “perfection complex” [Chait] • Multiple taxonomies • Automated versus manual categorization
Multiple taxonomies • Many editors • Term approval process, synonyms • Standard tools across the enterprise • Federated taxonomies • Taxonomy links, “cross-connections,” facets, views • Taxonomy mapping
Measuring value • NCR Corporation - Support Organization • Needed to convince organization of the value of captured content • Managers resisted diverting resources to maintaining content • Current measure: Time per incident • How could the value of a knowledge classification system be demonstrated?
Measuring value • NCR developed a new parameter: • Knowledge helpful (the answer was in the support database and was used to solve the problem) • Knowledge not effective (the answer sent them in the wrong direction, did not help to address the issue) • Knowledge not available (nothing available to assist in solving the problem) • Knowledge not required (problem solved
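One way such a parameter could be reported is as the share of incidents falling under each of the four outcomes. The sketch below assumes each closed incident is tagged with exactly one outcome label; the function name, the label strings, and the sample incidents are hypothetical and are not NCR data.

```python
from collections import Counter

# The four outcome values named on the slide, as assumed label strings.
OUTCOMES = ("helpful", "not effective", "not available", "not required")


def outcome_shares(incidents):
    """Return the fraction of incidents per outcome (0.0 for every outcome if none)."""
    counts = Counter(incidents)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unrecognized outcome labels: {sorted(unknown)}")
    total = len(incidents)
    return {o: (counts[o] / total if total else 0.0) for o in OUTCOMES}


# Made-up sample: four incidents, one label each.
sample = ["helpful", "helpful", "not available", "not required"]
shares = outcome_shares(sample)
```

Tracking the "helpful" share alongside time per incident gives managers a figure tied directly to the captured content rather than to overall support speed.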
Futures • Methods: • Feature extraction, statistical analysis, rules-based, label generation • Starter taxonomies, imports • Taxonomy mapping • Interfaces: Visualization, better training tools
Q&A • ?