- Number of slides: 70
Taxonomy Governance
Ron Daniel, Jr. & Joseph A. Busch, Taxonomy Strategies LLC
May 16, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
§ 1:30 Welcome & Introductions
§ 1:45 Exercise: Taxonomy Revisions
§ 2:15 Fundamental Processes
§ 2:30 Governance Team Roles and Structures
§ 3:00 Tools
§ 3:05 Break
§ 3:15 Exercise: Organizational Self-Assessment
§ 3:30 Maturity Model
§ 3:40 Designing and Building Maintainable Taxonomies & Metadata
§ 4:00 Additional Processes
§ 4:20 Q&A
§ 4:30 Adjourn
Taxonomy Strategies LLC The business of organized information
Who we are: Joseph Busch
§ Over 25 years in the business of organized information
  § Founder, Taxonomy Strategies
  § Director, Solutions Architecture, Interwoven
  § VP, Infoware, Metacode Technologies
  § Program Manager, Getty Foundation
  § Manager, Pricewaterhouse
§ Metadata and taxonomies community leadership
  § President, American Society for Information Science & Technology
  § Director, Dublin Core Metadata Initiative
  § Adviser, National Research Council Computer Science and Telecommunications Board
  § Reviewer, National Science Foundation Division of Information and Intelligent Systems
  § Founder, Networked Knowledge Organization Systems/Services
Who we are: Ron Daniel, Jr.
§ Over 15 years in the business of metadata & automatic classification
  § Principal, Taxonomy Strategies
  § Standards Architect, Interwoven
  § Senior Information Scientist, Metacode Technologies
  § Technical Staff Member, Los Alamos National Laboratory
§ Metadata and taxonomies community leadership
  § Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  § Acting chair: XML Linking working group
  § Member: RDF working groups
  § Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects
§ Government
  § Commodity Futures Trading Commission
  § Defense Intelligence Agency
  § ERIC
  § Federal Aviation Administration
  § Federal Reserve Bank of Atlanta
  § Forest Service
  § GSA Office of Citizen Services (www.firstgov.gov)
  § Head Start
  § Infocomm Development Authority of Singapore
  § NASA (nasataxonomy.jpl.nasa.gov)
  § Small Business Administration
  § Social Security Administration
  § USDA Economic Research Service
  § USDA e-Government Program (www.usda.gov)
§ Commercial
  § Allstate Insurance
  § Blue Shield of California
  § Debevoise & Plimpton
  § Halliburton
  § Hewlett Packard
  § Motorola
  § PeopleSoft
  § PricewaterhouseCoopers
  § Siderean Software
  § Sprint
  § Time Inc.
§ Commercial subcontracts
  § Agency.com: Top financial services
  § Critical Mass: Fortune 50 retailer
  § Deloitte Consulting: Big credit card
  § Gistics/OTB: Direct selling giant
§ NGOs
  § CEN
  § IDEAlliance
  § IMF
  § OCLC
Participant Introductions
§ Who are you?
§ What do you do?
§ What brings you here today?
Taxonomy Governance Overview
§ Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”?
§ What kinds of changes can be made, and what are their costs?
§ What kinds of information are needed to determine the changes?
§ What kind of group should maintain the taxonomy?
§ What kinds of rules should the group follow to decide on changes?
§ What should the group do beyond maintaining the taxonomy?
Exercise: Taxonomy Modifications
§ Divide into small groups
§ Review assigned sample taxonomy
§ Discuss changes you would make
§ In 10 minutes, a spokesperson will speak for the group and briefly:
  § Tell us something good about the taxonomy
  § Characterize the short-term changes your group would make
  § Characterize the questions your group would like answered before making other changes
Exercise Notes
§ Team Members:
§ Something good about the taxonomy:
§ Short-term changes:
§ Questions for other changes:
Group 1 Sample Taxonomy
Group 2 Sample Taxonomy
Top Level
§ Business / Accounting / Firms / Directories
§ Business / Biotechnology & Pharmaceuticals / Education & Training
§ Business / Employment / By Industry
§ Business / Healthcare / Employment / Regional
§ Business / Small Business / Finance / Accounting
Random Samples of Detailed Categories
§ Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics
§ Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums
§ Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical
§ Science / Math / Academic Departments / South America / Colombia
§ Science / Social Sciences / Linguistics / Translation / Associations
§ Society / People / Women / Science & Technology / Mathematics
Group 3 Sample Taxonomy
Panels: Top Level; Detail in Auto Products Category
Source: http://householdproducts.nlm.nih.gov/products.htm
Predictions
§ Short-term changes will center on rules of style – ‘&’ vs. ampersand, capitalization, plurals
§ Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both.
§ People will critique the UI presentation
§ Questions for long-term changes will focus, in decreasing order, on:
  § Who are the users and what are they doing?
  § What is the content and how much is in the various categories?
  § …
  § What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified?
§ Anything else people want to cover?
Callouts: Editorial Rules; Metadata Specification, Design for maintainability; How to put it into action?; User Characterization; Content and Metadata Maintenance; ROI
Fundamental Processes
§ What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies?
  § Query log / Click trail examination
  § Tagging Error Correction
§ What are the key outlooks a taxonomist should try to instill in their organization?
Fundamental Process #1 – Query Log Examination
§ How can we characterize users and what they are looking for?
  § Query Log & Click Trail Examination
  § Sophisticated software available, but don’t wait.
  § 80/20 Rule – 80% of value from 20% of possible reports.
§ Greatest value comes from:
  § Identifying a person as responsible for search quality
  § Starting a “Measure & Improve” mindset
§ Greatest challenge:
  § Getting a person assigned (≥ 10%)
  § Getting logs turned back on
  § What to do after the obvious fixes have been made
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Click Trail Packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends
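The first three reports named on this slide (top queries, queries with no results, queries with no click-through) can be produced with very little code, which is one way to act on the "don't wait" advice. The following is a minimal sketch, not the output of any of the packages listed: the log record layout `(query, result count, click count)` and the sample data are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical log records: (query text, number of results, number of clicks).
SAMPLE_LOG = [
    ("travel policy", 12, 3),
    ("Travel Policy", 12, 0),
    ("401k", 0, 0),
    ("expense form", 5, 1),
    ("401k", 0, 0),
    ("org chart", 8, 0),
]


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def query_reports(log, top_n=10):
    """Build three basic reports: top queries, no results, no click-through."""
    totals = Counter()
    no_results = Counter()
    no_clicks = Counter()
    for query, n_results, n_clicks in log:
        q = normalize(query)
        totals[q] += 1
        if n_results == 0:
            no_results[q] += 1
        elif n_clicks == 0:
            # Results were shown, but the user did not click any of them.
            no_clicks[q] += 1
    return {
        "top_queries": totals.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_clicks.most_common(top_n),
    }


if __name__ == "__main__":
    for name, rows in query_reports(SAMPLE_LOG).items():
        print(name, rows)
```

Queries that return nothing are candidates for new synonyms or new content; queries with results but no clicks suggest ranking or labeling problems.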
Fundamental Process #2 – Tagging Error Correction
§ For the Taxonomy to be used, its values must be associated with content.
  § We will refer to this as “Tagging”.
§ Errors will happen, and some will be found. What are you going to do about them?
  § Define an error correction process.
  § Process will accommodate questions like: Is it an error? What is the cost to correct or not correct? Does the correction need to be scheduled? etc.
§ Once an error is corrected, NEVER lose that fact.
  § Manually reviewed pages are vital for training automatic classifiers.
  § Has implications for metadata specification and review procedures.
§ Over time, multiple error detection methods will be defined.
  § e.g., statistical sampling of newly added pages
§ Gradually, additional error correction processes may be defined to deal with particular types of errors.
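Two points from this slide lend themselves to a small sketch: a correction should be recorded rather than overwritten, and new pages can be sampled at random for review. The record layout, field names, and functions below are hypothetical, chosen only to illustrate the idea.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TagRecord:
    """Hypothetical tagging record for one page."""
    url: str
    tags: frozenset
    manually_reviewed: bool = False
    history: list = field(default_factory=list)  # earlier tag sets, oldest first


def correct(record, new_tags, reviewer):
    """Apply a correction while keeping the old tags and who changed them."""
    record.history.append((record.tags, reviewer))
    record.tags = frozenset(new_tags)
    record.manually_reviewed = True
    return record


def sample_for_review(records, k, seed=None):
    """Pick up to k not-yet-reviewed records uniformly at random."""
    rng = random.Random(seed)
    pool = [r for r in records if not r.manually_reviewed]
    return rng.sample(pool, min(k, len(pool)))


def training_set(records):
    """Manually reviewed records, usable as classifier training examples."""
    return [r for r in records if r.manually_reviewed]
```

Keeping the `manually_reviewed` flag and the history is what lets reviewed pages be reused later as training data for an automatic classifier.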
Fundamental Outlooks
§ How are we going to build and maintain metadata structures and controlled vocabularies?
  § The taxonomy problem
§ How are we going to populate metadata elements with complete and consistent values?
  § The tagging problem
§ How are we then going to use metadata in applications and demonstrate benefits?
  § The ROI problem
Callout: Must know this to address other problems!
§ Taxonomy Governance is a standards process.
  § Take tips from other standards efforts
  § Team, with comment-handling responsibilities and an appeals process
  § Issue Logs
  § Announcements
  § Release Schedule
§ Foster a “Measure & Improve” Mindset
Taxonomy Business Processes
§ Taxonomies must change, gradually, over time if they are to remain relevant
§ Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions
§ A team will need to maintain the taxonomy on a part-time basis
§ Taxonomy team reports to some other steering committee
Definitions about the Controlled Vocabulary Governance Environment
1: Syndicated Terminologies change on their own schedule
2: CV Team decides when to update CVs
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of CVs published to consuming applications
Diagram labels: Syndicated Terminologies; ISO 3166-1; Other External; Change Requests & Responses; Published CVs and STs; Vocabulary Management System; CVs; Notifications; Custodians; Other Internal; Other Controlled Items; Controlled Vocabulary Governance Environment
Consuming Applications: Web CMS; Archives; Intranet Search; ERMS; ERP; Intranet Nav.; DAM; …
Other Controlled Items
§ Taxonomy Team will have additional items to manage:
  § Charter, Goals, Performance Measures
  § Editorial rules
  § Team processes
  § Tagger training materials (manual and automatic)
  § Outreach & ROI
  § Communication plan
  § Website
  § Presentations
  § Announcements
  § Roadmap
Taxonomy governance | Generic team charter
§ Taxonomy Team is responsible for maintaining:
  § The Taxonomy, a multi-faceted classification scheme
  § Associated taxonomy materials, such as:
    § Editorial Style Guide
    § Taxonomy Training Materials
    § Metadata Standard
  § Team rules and procedures (subject to CIO review)
§ Team evaluates costs and benefits of suggested change
§ Taxonomy Team will:
  § Manage relationship between providers of source vocabularies and consumers of the Taxonomy
  § Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
  § Promote awareness and use of the Taxonomy
Editorial Rules
§ To ensure consistent style, rules are needed
§ Issues commonly addressed in the rules:
  § Sources of Terms
  § Abbreviations
  § Ampersands
  § Capitalization
  § Continuations (More… or Other…)
  § Duplicate Terms
  § Hierarchy and Polyhierarchy
  § Languages and Character Sets
  § Length Limits
  § “Other” – Allowed or Forbidden?
  § Plural vs. Singular Forms
  § Relation Types and Limits
  § Scope Notes
  § Serial Comma
  § Spaces
  § Synonyms and Acronyms
  § Term Order (Alphabetic or …)
  § Term Label Order (Direct vs. Inverted)
§ Must also address issue of what to do when rules conflict – which are more important?

Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character '&' is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: España
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment” NOT “Education, learning & employment” NOT “EDUCATION, LEARNING & EMPLOYMENT” NOT “education, learning & employment”
… | …
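Three of the sample rules in the table (ampersands, serial comma, capitalization) are mechanical enough to check automatically. The sketch below is one possible checker under simplifying assumptions: the function name and messages are invented for illustration, the word "and" is reported only by the ampersand rule, and acronyms or mixed-case names would be reported as capitalization problems and need an exception list in practice.

```python
import re

ARTICLES = {"a", "an", "the"}


def check_label(label):
    """Return a list of editorial-rule problems found in a term label."""
    problems = []
    # Ampersand rule: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("ampersand: use '&' instead of 'and'")
    # Serial comma rule: the final '&' must not be preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("serial comma: no comma before '&'")
    # Capitalization rule: title case for every word except articles.
    skip = ARTICLES | {"and"}  # 'and' is already covered by the ampersand rule
    bad_words = [
        w for w in re.findall(r"[^\W\d_]+", label)
        if w.lower() not in skip and w != w.capitalize()
    ]
    if bad_words:
        problems.append("capitalization: not title case: " + ", ".join(bad_words))
    return problems


if __name__ == "__main__":
    for label in [
        "Education, Learning & Employment",
        "Education, Learning, & Employment",
        "Manuals and Forms",
        "EDUCATION, LEARNING & EMPLOYMENT",
    ]:
        print(repr(label), check_label(label))
```

A checker like this only covers style; which rule wins when two rules conflict still has to be decided by the team.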
Roles in Two Taxonomy Governance Teams
§ Executive Sponsor
  § Advocate for the taxonomy team
§ Business Lead
  § Keeps team on track with larger business objectives
  § Balances cost/benefit issues to decide appropriate levels of effort
    § Specialists help in estimating costs
  § Obtains needed resources if those in team can’t accomplish a particular task
§ Technical Specialist
  § Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  § Helps obtain data from various systems
§ Content Specialist
  § Team’s liaison to content creators
  § Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
  § Small-scale Metadata QA Responsibility
§ Taxonomy Specialist
  § Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
  § Makes edits to taxonomy, installs into system with aid of IT specialist
§ Content Owner
  § Reality check on process change suggestions
Team structure at a different org.
§ Business Lead
§ Custodians
  § Responsible for content in a specific CV.
§ Training Representative
  § Develops communications plan, training materials
§ Work Practices Representative
  § Develops processes, monitors adherence
§ IT Representative
  § Backups, admin of CV Tool
§ Info. Mgmt. Representative
  § Provides CV expertise, tie-in with larger IM effort in the organization.
Taxonomy governance | Where changes come from
Diagram labels: Firewall; Application UI; Application Logic; Tagging UI; Tagging Logic; Content; Taxonomy; End User; Tagging Staff; Taxonomy Editor; Taxonomy Team
Sources of change:
§ Query log analysis
§ Staff notes ‘missing’ concepts
§ Requests from other parts of the organization
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Processes
§ Different organizations will need to consider their own change processes.
  § Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
  § Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
§ Change process MUST also consider cost of implementing the change
  § Retagging data
  § Reconfiguring auto-classifier
  § Retraining staff
  § Changes in user expectations
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
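The retagging cost of a change case can be estimated roughly by counting the documents it touches. The sketch below covers only three of the ten cases, and its cost assumptions are illustrative rather than taken from the slides: content is assumed to store term IDs (so a rename touches no documents), a split requires reviewing the documents tagged with that term, and deleting a subtree requires reviewing every document tagged anywhere in it. The taxonomy and counts are hypothetical.

```python
# Hypothetical taxonomy: term ID -> parent ID (None for a top-level term).
PARENT = {"t1": None, "t2": "t1", "t3": "t1", "t4": "t3"}
# Hypothetical number of documents tagged directly with each term ID.
DOC_COUNT = {"t1": 5, "t2": 40, "t3": 10, "t4": 25}


def subtree(term, parent=PARENT):
    """Return the set containing the term and all of its descendants."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                changed = True
    return found


def docs_to_review(change, term, parent=PARENT, doc_count=DOC_COUNT):
    """Estimate how many documents need review for a proposed change."""
    if change == "rename":
        # Assumes documents reference the term by ID, not by label.
        return 0
    if change == "split":
        return doc_count.get(term, 0)
    if change == "delete_subtree":
        return sum(doc_count.get(t, 0) for t in subtree(term, parent))
    raise ValueError("no cost rule defined for change case: " + change)


if __name__ == "__main__":
    for case in ("rename", "split", "delete_subtree"):
        print(case, "t3", docs_to_review(case, "t3"))
```

Document counts are only one input; reconfiguring the auto-classifier, retraining staff, and changed user expectations also belong in the estimate.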
Taxonomy governance | Taxonomy maintenance workflow
Flowchart steps: Suggest new name/category; Review name; Problem? (Yes/No); Copy edit new name; Add to enterprise Taxonomy
Swimlanes: Analyst; Editor; Copywriter; Sys Admin; Taxonomy Tool
Taxonomy editing tools vendors
§ Most popular taxonomy editor? MS Excel
§ Immature industry – no vendors in upper-right quadrant!
Quadrant chart: Ability to Execute (low to high) vs. Completeness of Vision; labeled quadrants: Niche Players, Visionaries
Callouts: High functionality, high cost ($100k!); Widely used, cheap, single-user
Sample Taxonomy Editor Functionality
§ Standard and Custom Fields
§ Standard and Custom Relations
§ Data Typing, Restrictions, and Inference
§ Flexible Reporting
§ Flexible Importing
§ Multiple Vocabulary Support
§ Inter-Vocabulary Relations
§ Unique IDs
  § ISO Codes not sufficient
§ Workflow
  § Voting
  § Change Request Management
§ Programmability
Screenshot labels: Term Editing; Hierarchy Browser
Where do I put the metadata?
§ Where can I store metadata?
  § In the content – HTML Headers, File properties, etc.
  § In a centralized repository – Search index, MDDB, etc.
  § In multiple systems – Common case
§ Where should I store metadata?
  § Consultant’s answer – “It depends.”
  § If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.
  § If you are doing search across multiple documents, it has to be at least copied out of the files.
  § If you make copies of files and modify them, consistent in-file metadata will be impossible.
§ Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.
  § Web CMS as an example.
  § Central Metadata Database is a very advanced practice.
What Processes Should I Try to Institute?
§ Processes will vary from one organization to another.
§ Assessing the Organization’s state is the first step.
§ Determining the ROI and potential resources follows.
§ Plan on instituting processes over time, beginning with basic ones.
Search and Metadata Self-Assessment Form
§ Background
1) Rate your organization’s search & metadata maturity from 1 to 10.
2) What was the most recent change to your organization’s search & metadata processes?
3) What is the next step for your organization’s search & metadata processes?
§ Basic
4) Is there a process in place to examine query logs?
5) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
§ Intermediate
6) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.
7) Does the search engine index more than 4 repositories around the organization?
8) Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?
9) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
10) Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.
§ Advanced
11) Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.
12) Can the CEO explain the ROI for search and metadata?
§ Optional
13) Your name:
14) Organization:
15) E-mail:
Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.
Metadata Maturity Model
§ Taxonomy governance processes must fit the organization
§ As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata
§ Honestly assess your organization’s metadata maturity in order to design appropriate governance processes
§ We are starting to define a maturity model, similar to the CMMI model in the software world.
Metadata Maturity Model
Shameless Plug: Tomorrow Morning at 9:45
Call for Data: Leave Self-Assessments with us
Maturity Levels (table columns): Basic; Intermediate; Advanced; Bleeding Edge
Process Areas (table rows) and their practices:
§ Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking
§ Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Reuse ERP; Multiple Repos. Comply; Taxonomy Roadmap
§ Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs
§ Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers
§ Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures
§ Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination
§ Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI
Limiting Processes: Highly Abstract Subject Taxonomies; Unneeded Capabil.; Tools, then Reqs.; Use it or Lose It Budgets
Purpose of Maturity Model
§ Estimating the maturity of an organization’s information management processes tells us:
  § How involved the taxonomy development and maintenance process should be
    § Overly sophisticated processes will fail
  § What to recommend as next steps
§ Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.
§ Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
§ Metadata Maturity may not be core to your business.
Overview of Best Practices in Metadata and Taxonomy
§ Avoid monolithic ‘subject’ taxonomies
  § May have a browsing taxonomy constructed from combined facets.
§ Use (or map to) Dublin Core for basic information.
  § Extend with custom elements for specific facts.
§ Use pre-existing, standard vocabularies as much as possible.
  § Validate author names with LDAP directory
  § ISO country codes for locations
  § Product & service info from ERP system
§ Designate a team to manage the taxonomies and related materials
  § Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI
§ Design a Metadata QC Process
  § Start with an error-correction process, then get more formal on error detection.
  § In the future, large-scale ontologies like CYC may be valuable in automated error detection.
Factor “Subject” into smaller facets
§ Size
  § DMOZ tries to organize all web content, has more than 600k categories!
§ Difficulty in navigating, maintaining
§ Hidden facet structure
§ “Classification Schemes” vs. “Taxonomies”
Sources for 7 common vocabularies
Vocabulary | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, competitors, partners, regulators, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, Your products and services, etc.
 | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Facet Principles
§ Basic facets with identified items – people, places, projects, instruments, missions, organizations, …
§ Note that these are not subjective “subjects”, they are objective “objects”.
§ Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.
  § For example, labels like “Anarchist” or “Prime Minister” can be applied to the same person at different times (e.g. Nelson Mandela).
Iterative Development Vision (More participants and tagged content at each iteration)
Stages (rows): 1 Identify Objectives; 2 Inventory Content; 3 Specify Metadata; 4 Model Content; 5 Specify Vocabularies; 6 Specify Procedures; 7 Train Staff
Phases and participants (columns): Plan & Prototype (Project Team); Alpha Dev & Test (Stakeholders and SMEs); Beta D&T (Friendly Users); Final D&T (Audiences)
Activities shown in the grid: Interview core team and stakeholders; Interview alpha users; Interview beta users; ID sources, spider assets & extract metadata; Gather additional sources, if any; Define fields & purpose; Define content chunks & XML DTDs; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0; Compile controlled vocabularies; Revise, use in alpha CMS; Revise, use in beta CMS; Revise using team procedure; Start with UI sketches, off-the-shelf rules; Alpha workflows in CMS; Modify & extend workflows; Finalize procedure materials; Review tagged samples, default procedures; Manually tag small sample; Tailor the default materials; Use alpha CMS to tag larger sample; Use beta CMS to tag larger sample; Finalize training materials & train staff
Planning for Taxonomy Changes
§ Error Correction – What to do when end-users and tagging staff notice problems?
  § Provide for it in the Error Correction Process
  § Add Query Log Analysis to help detect user problems
§ How to answer questions re. things to add, delete, or rearrange in the taxonomy?
  § Keep a visible issue log
  § Discuss with SMEs, tag samples, use other testing methods
§ Per-facet changes:
  § Corporate reorganizations, Product lineup changes, Country splits & merges, … will happen. Prepare for them when deploying those facets
§ Long-term – what facets to create, when, and why
  § See Taxonomy Roadmap section
Agenda
§ 4:00 Additional Processes
  § Brief remarks on Measurements, ROI, Training, Roadmap
Measuring Metadata and Taxonomy Quality
§ Taxonomy development is an iterative process
  § Develop an organizational idea, then test it by tagging sample content
  § Elicit feedback via walk-throughs and card sorting exercises
§ Use both qualitative and quantitative methods
§ Time, budget, and availability of tagged data will determine what methods are possible.
Taxonomy testing | Qualitative methods
Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Accuracy (SME Checking)
Usability Testing | Card sorting, Contextual analysis | Appropriateness to task; Repeatability of user classification; Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Callout: Include sample pages in walkthroughs, not just the hierarchy.
Tagged Samples
§ The Taxonomy must fit the content.
§ How to verify this? Tag samples!
§ Spreadsheets are a convenient tool for this. URLs, drop-down choosers, text notes allowed.
§ Team can review tagged samples when reviewing taxonomy
  § More sophisticated teams may test inter-cataloger agreement
§ Samples should appear in training materials for tagging staff
  § Show typical and unusual cases.
§ Samples are used to define training sets for automatic classifiers.

Metadata Element | Metadata Value
URL | sixbits.atl.frb.org/invoke.cfm?objectid=A01B30D1-10C2-11D6981100508B104751&method=display
Headline | Innovation Awards
Organization | Federal Reserve Bank of Atlanta
Content Type | Honors & Awards
Subject | Salary & Compensation?

Spreadsheet columns: DOCUMENT URL | FACET A | FACET B | FACET C | FACET D | MISSING IDEAS
Quantitative Method | How evenly does it divide the content?
§ Background:
  § Documents do not distribute uniformly across categories.
  § A Zipf (1/x) distribution is the expected behavior.
  § 80/20 rule in action (actually 70/20 rule).
§ Methodology:
  § Part of an alpha test of 'content type' for a corporate intranet.
  § 115 URLs selected at random from the search index were manually categorized. Inaccessible files and 'junk' were removed.
§ Results:
  § Results were slightly more uniform than the Zipf distribution, which is better than expected.
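The comparison against a Zipf distribution described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-category counts below are invented (chosen to total 115, like the sample size on the slide), and the helper names `zipf_shares`, `observed_shares`, and `top_share` are hypothetical, not part of any tool mentioned in the deck.

```python
# Hypothetical document counts per content type (115 documents in total).
# These numbers are invented for illustration; they are not the slide's data.
observed_counts = [30, 18, 14, 12, 10, 9, 8, 6, 5, 3]


def zipf_shares(n_categories):
    """Expected share of documents per rank under a 1/rank (Zipf) distribution."""
    weights = [1.0 / rank for rank in range(1, n_categories + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_shares(counts):
    """Observed share of documents per category, largest category first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return [c / total for c in ordered]


def top_share(shares, fraction=0.2):
    """Combined share held by the top `fraction` of categories (at least one)."""
    k = max(1, round(len(shares) * fraction))
    return sum(shares[:k])


expected = zipf_shares(len(observed_counts))
actual = observed_shares(observed_counts)

# Print the rank-by-rank comparison between the Zipf expectation and the sample.
for rank, (e, a) in enumerate(zip(expected, actual), start=1):
    print(f"rank {rank:2d}  expected {e:6.1%}  observed {a:6.1%}")

print(f"top 20% of categories: expected {top_share(expected):.1%}, "
      f"observed {top_share(actual):.1%}")
```

If the observed top-category share is lower than the Zipf expectation, the content is spread more evenly across categories than Zipf would predict, which is the outcome the slide reports.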
Quantitative Method | How intuitive (repeatable) are the categorizations?
§ Methodology: Closed Card Sort
  § For the alpha test of a grocery site.
  § 15 testers put each of the 100 best-selling products into one of 10 predefined categories.
  § Products that fewer than 14 of 15 testers put into the same category were flagged.
§ Results:
Testers in agreement | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%
Notes:
§ “Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
§ In the trade, “Corn Tortillas” are a Dairy item!
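A closed card sort of this kind can be scored with a short script. The sketch below is illustrative only: the votes are invented, it uses 5 testers and a threshold of 4 rather than the slide's 15 testers and 14-of-15 threshold, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical card-sort results: product -> category chosen by each tester.
# Votes are invented for illustration (5 testers instead of the slide's 15).
sort_results = {
    "Cocoa Drinks - Powder": ["Beverages", "Grocery", "Beverages", "Grocery", "Beverages"],
    "Whole Milk": ["Dairy", "Dairy", "Dairy", "Dairy", "Dairy"],
    "Corn Tortillas": ["Bakery", "Dairy", "Bakery", "Grocery", "Bakery"],
    "Orange Juice": ["Beverages", "Beverages", "Beverages", "Beverages", "Grocery"],
}


def agreement(votes):
    """Return (most popular category, number of testers who chose it)."""
    category, count = Counter(votes).most_common(1)[0]
    return category, count


def flag_products(results, threshold):
    """Products whose most popular category got fewer than `threshold` votes."""
    return sorted(p for p, votes in results.items() if agreement(votes)[1] < threshold)


def cumulative_agreement(results, n_testers):
    """Map k -> % of products on which at least k testers agreed."""
    top_counts = [agreement(votes)[1] for votes in results.values()]
    n_products = len(top_counts)
    return {
        k: 100.0 * sum(1 for c in top_counts if c >= k) / n_products
        for k in range(n_testers, 0, -1)
    }


flagged = flag_products(sort_results, threshold=4)
table = cumulative_agreement(sort_results, n_testers=5)

print("flagged for review:", flagged)
for k, pct in table.items():
    print(f"{k}/5 testers agree: {pct:.0f}% of products (cumulative)")
```

Flagged products are candidates for review: either the category labels are unclear, or the product genuinely belongs in more than one category.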
Quantitative Method | How does taxonomy “shape” match that of content?
§ Background:
  § Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas.
§ Methodology:
  § 25,380 resources tagged with a taxonomy of 179 terms (average of 2 terms per resource).
  § Counts of terms and documents summed within the taxonomy hierarchy.
§ Results:
  § Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%).
  § Mismatches between term % and document % flagged.
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Education
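One way to flag the mismatches mentioned on this slide is to compare each term group's share of documents with its share of terms. The sketch below uses the percentages from the slide's table, but the flagging rule (a factor-of-2 ratio in either direction) and the function name `flag_mismatches` are assumptions made for illustration; the slide does not say what threshold was used.

```python
# (term group, % of terms, % of documents), copied from the table on the slide.
TERM_GROUPS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, factor=2.0):
    """Flag groups whose document share differs from their term share
    by at least `factor` (an assumed threshold, not from the slide)."""
    flags = {}
    for group, pct_terms, pct_docs in rows:
        ratio = pct_docs / pct_terms
        if ratio >= factor:
            flags[group] = "more content than terms"
        elif ratio <= 1.0 / factor:
            flags[group] = "more terms than content"
    return flags


flags = flag_mismatches(TERM_GROUPS)
for group, verdict in flags.items():
    print(f"{group}: {verdict}")
```

A group with far more content than terms may need finer-grained terms, while a group with many terms and little content may be over-built; both readings are interpretations to check with subject-matter experts rather than conclusions drawn by the slide.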
Taxonomy ROI
§ What level of effort in taxonomy creation and maintenance is justified?
Fundamentals of Taxonomy ROI
§ Building and maintaining a taxonomy, and tagging data with it, are costs, not benefits.
§ There is no benefit without exposing the tagged data to users in some way that cuts costs or improves revenues.
§ Putting a new taxonomy into operation requires UI changes and/or back-end system changes.
§ You need to determine those changes, and their costs, as part of the taxonomy ROI.
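The point that taxonomy work is all cost until the tagged data is exposed to users can be made concrete with the standard ROI formula, (benefit - cost) / cost. The sketch below is an assumption-laden illustration: every dollar figure is invented, ongoing maintenance is folded into a single figure, and `taxonomy_roi` is a hypothetical helper rather than a method described in the deck.

```python
def taxonomy_roi(costs, annual_benefit, years):
    """Simple ROI: (total benefit - total cost) / total cost."""
    total_cost = sum(costs.values())
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost


# Hypothetical costs in dollars; note that the UI and back-end changes
# are counted alongside taxonomy creation and tagging.
costs = {
    "taxonomy_build_and_maintenance": 50_000,
    "tagging": 30_000,
    "ui_changes": 40_000,
    "backend_changes": 30_000,
}

# Hypothetical benefit: savings or added revenue once users see the tagged data.
roi_exposed = taxonomy_roi(costs, annual_benefit=100_000, years=3)

# If the tagged data is never exposed to users, there is no benefit at all.
roi_unexposed = taxonomy_roi(costs, annual_benefit=0, years=3)

print(f"ROI with tagged data exposed to users: {roi_exposed:.0%}")
print(f"ROI with tagged data never exposed:    {roi_unexposed:.0%}")
```

Leaving the UI and back-end changes out of `costs` would overstate the return, which is why the slide asks for them to be included in the estimate.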
Common Taxonomy ROI Scenarios
§ Catalog site - ROI based on increased sales through improved
  § product findability
  § product cross-sells and up-sells
  § customer loyalty
§ Call center - ROI based on cutting costs through
  § fewer customer calls due to improved website self-service
  § faster, more accurate CSR responses through better information access
§ Knowledge worker productivity - ROI based on cutting costs through
  § less time searching for things
  § less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
§ Executive mandate
  § No ROI at the start, just someone with a vision and the budget to make it happen.
Tagging and Training
§ How are we going to populate metadata elements with complete and consistent values?
  § The tagging problem
§ How are we going to get people (and/or software) to assign consistent and accurate metadata to the content?
  § The tagger training problem
Taxonomy governance: Workflow-driven metadata tagging
[Workflow diagram. Steps: Compose in Template; Submit to CMS; Automatically fill in metadata; Review content; Approve/Edit metadata; Problem? (Yes/No); Copy Edit content. Outputs: Hard Copy, Web site. Roles and tools: Analyst, Editor, Copywriter, Sys Admin, Tagging Tool. Callout: Tagging process doesn't stop here!]
Training Taxonomy Editors and Tagging Staff
§ Staff will require training on
  § The structure of the taxonomy
  § The UI they use to tag the content
  § The rules to follow when deciding what codes to apply
  § The end effect of the codes they apply: have a running prototype or QA environment.
§ Tagging examples come from samples tagged during taxonomy development.
§ Hard copies of the taxonomy, and yellow highlighters, are helpful during training.
Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe what the asset is about and why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
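The specificity rule ("specific terms can always be generalized, but generic terms cannot be specialized") can be illustrated with a small sketch. The three-level hierarchy below is hypothetical, as are the function names; the idea is only that a tag's broader terms can be derived from the taxonomy, so taggers need to record just the most specific term.

```python
# Hypothetical child -> parent links in a small subject taxonomy.
PARENT = {
    "Salary & Compensation": "Benefits",
    "Benefits": "Human Resources",
    "Travel": "Human Resources",
}


def ancestors(term, parent=PARENT):
    """Broader terms of `term`, nearest first, found by walking up the hierarchy."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def most_specific(tags, parent=PARENT):
    """Drop any tag that is already implied by a more specific tag in the list."""
    implied = {a for t in tags for a in ancestors(t, parent)}
    return [t for t in tags if t not in implied]


def expand_for_search(tags, parent=PARENT):
    """Generalize specific tags by adding their broader terms (no duplicates)."""
    expanded = []
    for tag in tags:
        for term in [tag] + ancestors(tag, parent):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(most_specific(["Benefits", "Salary & Compensation"]))
print(expand_for_search(["Salary & Compensation"]))
```

The reverse direction has no equivalent function: given only "Human Resources", nothing in the hierarchy says which narrower term was meant, which is why the rule asks taggers to start with the most specific term.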
Tagging tool example: Interwoven MetaTagger
[Screenshot of the tagging tool. Callouts: Auto-categorization; Manual form fill-in with check boxes, pull-down lists, etc.; Auto keyword & summarization; Parse & lookup (recognize names); Rules & pattern matching.]
Taxonomy Roadmap
§ How to plan for long-term taxonomy development projects?
Taxonomy Roadmap
§ Most organizations require a phased implementation of an Enterprise Taxonomy.
§ A Taxonomy Roadmap defines the facets to be developed, their timing, and the reasons why.
§ Factors to consider in prioritizing the facets include:
  § Immediacy of application: How will the taxonomy be put into use? A search engine? Portal navigation? Other? How long will that take?
  § Impact: How many users will a facet help? How big a help will it be?
  § Ease of development: Does the vocabulary exist, can it be bought, or must it be developed? How big and complex will it be? How often will it change?
  § Are there tools to help manage taxonomy changes, or must those be acquired too?
  § What data must be tagged for that? What are the requirements on the metadata's density and accuracy? Can those be met with automatic methods, or will more extensive human involvement be needed?
  § Staff expertise and team experience.
Roadmap: Dependencies
§ A Roadmap requires that an organization plan its projects well in advance, so that upcoming projects can be influenced by the taxonomy.
  § Consequently, this is an advanced practice.
§ The Roadmap prioritizes vocabularies according to benefit, cost, and fit with projects.
§ The Governance Team is responsible for maintaining the Roadmap and the necessary outreach.
Roadmap: Facet Prioritization Matrix
Facet | Description | Impact | Effort to create/maintain CV | Effort to tag
Language* | Languages supported by portal | Medium (high impact for subset) | Done/Low |
Format | File format (PDF, doc, html, etc.) | | Low/Low |
Location* | Geo, region, country, site | Med-High | Done/Low | Medium
Content Type | Also referred to as genre (news, policy, checklist, form, etc.) | | Medium/Low | Medium
Organization | Publishing organization that owns content | | Medium/High | Medium
Subject | Also referred to as topic (benefits, travel, etc.) | | High/High | Medium
Products & Services | Corporate product and service offerings | Medium | High/High |
Role (level of responsibility)* | Manager, employee, non-employee | High (in use on portal, but search has limited access to secure content) | Done/Low | High
Access Control | Organization as audience | Low | Medium/High |
* Facets already in existence in client's Intranet
Roadmap: Timeline
§ Timeline lists the facets to be developed, and when those development efforts start and end.
§ Intermediate and related projects are also shown.
§ Timeline shows what projects will make use of the facet, and how long that should take.
[Timeline chart, FY04 Q2 through FY05 Q4. Facets: Language, Format, Access Control, Content Type, Role, Organization, Location (Region), Location (Country), Subject, Products/Services. Projects: Search, Index, CM, Portal Nav, Org Chart UI, AutoClassification Tool, Taxonomy Tool.]
Taxonomy Strategies LLC Contact Info
Ron Daniel, Jr. 925-368-8371 rdaniel@taxonomystrategies.com
Joseph Busch 415-377-7912 jbusch@taxonomystrategies.com
May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.