- Number of slides: 126
Taxonomy Strategies LLC
Taxonomy & metadata strategies for effective content management
Masterclass: Melbourne, Sydney, Canberra, 6-15 June 2007
Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Today’s agenda
9:00-9:10    10 min  Introduction
9:10-9:15     5 min  Warm-up exercise
9:15-9:45    30 min  Taxonomy fundamentals: Building taxonomies
9:45-10:00   15 min  Taxonomy exercise
10:00-10:30  30 min  Taxonomy fundamentals: Taxonomy business case
10:30-11:00  30 min  Tea Break
11:00-12:00  60 min  Taxonomy governance
12:00-12:30  30 min  Capabilities self-assessment
12:30-13:30  60 min  Lunch
13:30-14:30  60 min  Taxonomy benchmarking
14:30-14:45  15 min  Benchmarking exercise
14:45-15:15  30 min  Tea Break
15:15-16:15  60 min  Content tagging
16:15-16:30  15 min  Tagging exercise
16:30-17:00  30 min  Q&A
Taxonomy Strategies LLC The business of organized information
Who I am: Joseph Busch
- Over 25 years in the business of organized information.
  - Founder, Taxonomy Strategies LLC
  - Director, Solutions Architecture, Interwoven
  - VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
  - Program Manager, Getty Foundation
  - Manager, Pricewaterhouse
- Metadata and taxonomies community leadership.
  - President, American Society for Information Science & Technology
  - Director, Dublin Core Metadata Initiative
  - Adviser, National Research Council Computer Science and Telecommunications Board
  - Reviewer, National Science Foundation Division of Information and Intelligent Systems
  - Founder, Networked Knowledge Organization Systems/Services
What we do
Organize Stuff
For us, taxonomy work includes:
- Metadata specification defines the properties needed to describe content so that it can be found & used.
- Vocabularies are collections of terms that are used to specify some of the metadata properties.
  - Some vocabularies are big and hierarchical, some are small and flat.
- An application profile specifies what metadata & vocabularies are required, and then represents them formally.
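One way to picture how the three pieces fit together is a tiny application profile expressed in code. This is a minimal sketch, not a standard format: the property names, the vocabularies, and the `validate` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """One metadata property in the (hypothetical) application profile."""
    name: str
    required: bool = False
    # An empty vocabulary means the property accepts free text.
    vocabulary: frozenset = frozenset()


# Hypothetical profile: which properties exist, which are required,
# and which controlled vocabulary (if any) constrains each one.
PROFILE = [
    Property("title", required=True),
    Property("content_type", required=True,
             vocabulary=frozenset({"Policy", "Report", "Form"})),
    Property("audience",
             vocabulary=frozenset({"General Public", "Industry", "Students"})),
]


def validate(record: dict) -> list:
    """Return a list of problems found when checking a record against PROFILE."""
    problems = []
    for prop in PROFILE:
        value = record.get(prop.name)
        if value is None:
            if prop.required:
                problems.append(f"missing required property: {prop.name}")
            continue
        if prop.vocabulary and value not in prop.vocabulary:
            problems.append(f"{prop.name}: '{value}' is not in the controlled vocabulary")
    return problems
```

A record such as `{"title": "Annual report", "content_type": "Report"}` passes, while a record without a title, or with a content type outside the vocabulary, is reported.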
Recent & current projects: http://www.taxonomystrategies.com/html/clients.htm
- Government
- Commercial
- Not-for-Profit
Who are you? What sectors do you work in?
Your Role
- Administrator
- Records Manager
- Content Manager
- Communications
- Editor
- Information Architect
- Usability Expert
- Librarian
- Knowledge Engineer
- Ontologist
- Chief Information Officer
Industrial Sector
- Agriculture & Processing
  - Food, Lumber, Pulp & Paper
- Financial Services
  - Banking & Insurance
- Government
  - Public administration
  - Public safety
- High Tech
  - Computers, Software & Telecommunications
- Heavy Manufacturing
  - Steel, Automobiles & Aircraft
- Manufacturing
  - Consumer Products
- Medical & Health Care
- Mining & Refining
  - Petrochemicals, Oil & Gas
- Pharmaceuticals
Why are you here?
- What are the key questions that you want answered in today’s workshop?
- Please rank the questions from the most important (5) to the least important (1).
- Please provide your job title, organization and department; your name is optional.
Form fields: Priority (1-5); Questions; Your title or role; Your org or industry; Your dept; Your name (optional)
The Taxonomy problem: How to pick from > 5,000 faucets?
By:
- Category
- Price
- Brand
- Color/Finish
- # Handles
- Series Name
- Water Filter?
- Faucet Spray
- Handle Shape
- Soap Dispenser?
The main issue: What goes here?
- When do the things in the list change?
- How do we maintain the list?
- What rules do we follow?
Seven phases of taxonomy development
[Gantt chart spanning weeks 1-12]
1 Identify Objectives: Conduct interviews
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Test & Train: Manually tag small sample
Taxonomy design phases need to be iterated
Stages: Plan & Prototype | Alpha Dev & Test | Beta D&T | Final D&T
1 Identify Objectives
  - Plan & Prototype: Interview core team and stakeholders
  - Alpha Dev & Test: Review tagged samples, default procedures
  - Beta D&T: Interview alpha users
  - Final D&T: Interview beta users
2 Inventory Resources
  - Plan & Prototype: Identify, gather & review resources
  - Alpha Dev & Test: Gather additional resources, if any
  - Beta D&T: Gather additional sources, if any
3 Specify Metadata
  - Plan & Prototype: Define fields & purpose
  - Alpha Dev & Test: Revise if needed, bake into alpha CMS
  - Beta D&T: Modify CMS for beta
  - Final D&T: Modify for 1.0
4 Model Content
  - Plan & Prototype: Define content chunks & XML DTDs
  - Alpha Dev & Test: Revise if needed, bake into alpha CMS
  - Beta D&T: Modify CMS for beta
  - Final D&T: Modify for 1.0
5 Specify Vocabularies
  - Plan & Prototype: Compile controlled vocabularies
  - Alpha Dev & Test: Revise, use in alpha CMS
  - Beta D&T: Revise, use in beta CMS
  - Final D&T: Revise using team procedure
6 Specify Procedures
  - Plan & Prototype: Develop workflow rules & procedures
  - Alpha Dev & Test: alpha workflows in CMS
  - Beta D&T: Modify & extend workflows
  - Final D&T: Finalize procedure materials
7 Test & Train
  - Plan & Prototype: Manually tag small sample
  - Alpha Dev & Test: Use alpha CMS to tag larger sample
  - Beta D&T: Use beta CMS to tag larger sample
  - Final D&T: Finalize training materials & train staff
Licensing an existing taxonomy
See Factiva’s taxonomy
www.taxonomywarehouse.com
- There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
- But pre-existing taxonomies rarely fit an organization’s needs and may require extensive customization.
Recommendation
- Adopt a faceted approach.
- Reuse existing (especially internal) vocabularies for as many of the facets as possible.
- Plan on doing full-custom “Content Type” and “Topic” taxonomies.
Free sources for 8 common taxonomies
Taxonomy | Definition | Potential Sources
… | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or is intended to be used by. | GEM, ERIC Thesaurus, IEEE LOM, Your psychographics or personas, etc.
Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.
Typical product catalog: A-Z, then idiosyncratic categories
How to analyze existing product catalog categories: Principles and priorities
Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.
Principles
1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
2. Repeated lists of subcategories signal a possible need for an attribute.
3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used.
   - Avoid selecting strongly correlated attributes, such as “Weight” and “Shipping Weight”.
Priorities
1. Choose Categories that apply to many products over those with few products.
2. Choose Attributes that apply to many Categories over those that apply only to very few categories.
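Principle 2 lends itself to a quick mechanical check. The sketch below, which assumes a catalog held as a plain dictionary of category to subcategory list (the sample data and the `attribute_candidates` name are hypothetical), flags subcategory lists that repeat under more than one category.

```python
from collections import defaultdict


def attribute_candidates(catalog: dict) -> dict:
    """Map each repeated subcategory list to the categories that share it.

    A list that appears under several categories is a candidate for
    conversion into an attribute (facet) instead of a hierarchy level.
    """
    shared = defaultdict(list)
    for category, subcategories in catalog.items():
        # frozenset ignores ordering, so reordered lists still match.
        shared[frozenset(subcategories)].append(category)
    return {subs: cats for subs, cats in shared.items() if len(cats) > 1}


# Hypothetical catalog: the finish list repeats under two categories.
catalog = {
    "Kitchen Faucets": ["Chrome", "Bronze", "Nickel"],
    "Bathroom Faucets": ["Nickel", "Chrome", "Bronze"],
    "Shower Heads": ["Handheld", "Fixed"],
}
candidates = attribute_candidates(catalog)
```

Here the Chrome/Bronze/Nickel list is reported for both faucet categories, which suggests a "Finish" attribute. The check only finds exact repeats; near-duplicates would still need editorial review.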
Product categories example: Wireless carrier
Products
- Accessories
  - Batteries
  - Cases
  - Chargers
  - Data
  - Hands-Free Headsets
  - Miscellaneous
- Content
  - Purchased
  - Subscription
- Phones
  - Versatile Phones
  - Smart Devices
  - Basic Phones
  - Prepaid Phones
  - International Only Phones
  - Mobile Broadband Cards
- Services
  - Conferencing
  - Internet / Data
  - Landline Phone
  - Network & Roaming
  - Relay Services
  - Solutions
  - Wireless Data
Product attributes example: Digital cameras in an electronics catalog
Types of attributes
- Generic attributes
  - Brand/Product Family/Model
  - Price Range
  - Usually Ships
- Merchandising attributes
  - Usage (E-mail, Internet Browsing, Programming, …)
  - Segment (Home, Business, Education, Government …)
  - Region & Country
  - Most Popular
  - New
  - Related Products
- Specialized attributes
  - Capacity (Battery; Memory; MB; GB; BPS, …)
  - Resolution (DPI; Megapixels; XGA, UXGA, …)
  - Size (Display; Screen; ...)
  - Standard (a, b, g, n, …; scsi, ata, sata, eide, …; dimm, simm, …)
  - Type (Camera; Battery; Display; Printer; Server; Storage; Switch; …)
Example facet display (item counts in parentheses)
- Resolution: 3 Megapixels (4); 4 Megapixels (5); 5 Megapixels (27); 6-8 Megapixels (21)
- Brand: Canon (15); Fuji (10); Kodak (17); Nikon (8); Olympus (9)
- Type: Point & Shoot (25); Digital SLR (10); Packages (5)
- Price Range: $100-250 (5); $250-500 (16); $500-1000 (19); More than $1000 (3)
Faceted taxonomy theory & practice
- How many terms are needed to provide sufficient granularity? Not as many as you think!
- Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
The power of faceted taxonomy
- 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4 = 10,000).
  - Easier to maintain
  - Easier to tag by content authors
  - Can be easier to navigate
- It’s more effective to increase the number of facets than to increase the number of terms per facet.
Example facets (10 terms each)
- Audience: Advocacy; Contractors & Grantees; Environmental Professionals; Federal Facilities; General Public; Industry; Kids; Researchers & Scientists; Small Business; Students
- Health: Advisory; Exposure; Food Safety; Health Assessment; Health Effect; Health Risk; Occupational Health; Pesticide Effects; Sun Protection; Toxicity
- Industry: Agriculture & Cattle; Automobile Repair; Chemical; Dry Cleaning; Electronics & Computer; Energy; Extractive Industries; Food Processing; Leather Tanning & Finishing; Metal Finishing
- Substance: Allergen; Biological Contaminant; Carcinogen; Chemical; Explosive; Liquid Waste; Microorganism; Ozone; Pesticide; Radioactive Waste
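The arithmetic behind this slide can be checked directly. The sketch below uses the four facets listed on the slide; the comparison at the end (spending ten extra terms on a fifth facet versus spreading them over the existing four) is an illustrative assumption, not a figure from the deck.

```python
from math import prod

# The four example facets from the slide, ten terms each.
facets = {
    "Audience": ["Advocacy", "Contractors & Grantees", "Environmental Professionals",
                 "Federal Facilities", "General Public", "Industry", "Kids",
                 "Researchers & Scientists", "Small Business", "Students"],
    "Health": ["Advisory", "Exposure", "Food Safety", "Health Assessment",
               "Health Effect", "Health Risk", "Occupational Health",
               "Pesticide Effects", "Sun Protection", "Toxicity"],
    "Industry": ["Agriculture & Cattle", "Automobile Repair", "Chemical",
                 "Dry Cleaning", "Electronics & Computer", "Energy",
                 "Extractive Industries", "Food Processing",
                 "Leather Tanning & Finishing", "Metal Finishing"],
    "Substance": ["Allergen", "Biological Contaminant", "Carcinogen", "Chemical",
                  "Explosive", "Liquid Waste", "Microorganism", "Ozone",
                  "Pesticide", "Radioactive Waste"],
}

# Terms to maintain add up; distinguishable combinations multiply.
terms_to_maintain = sum(len(terms) for terms in facets.values())
combinations = prod(len(terms) for terms in facets.values())

# Illustrative comparison: ten extra terms as a new fifth facet,
# versus the same ten terms spread across the existing four facets.
with_fifth_facet = prod([10, 10, 10, 10, 10])
with_longer_facets = prod([12, 12, 13, 13])
```

With 40 terms the four facets distinguish 10,000 combinations. Under the stated assumption, a fifth facet of ten terms gives 100,000 combinations, while lengthening the existing facets gives 24,336.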
Automatically created taxonomies
- Documents can be ‘clustered’ based on similarities and differences.
- Problems:
  - Typically only a single hierarchy
  - No overall plan
  - Results hard for people to navigate
What does “North” mean on this map?
Automatic taxonomy construction software
- Software can scan large quantities of content and extract statistically significant words and phrases.
- Example:
  - Archive of 10 publications analyzed for topics related to “copyright.”
- Software does a poor job of
  - De-duplication.
  - Turning significant words and phrases into a larger structure.
  - Discriminating between “gold” and “garbage.”
- Software is good for
  - Getting an understanding of the key noun phrases in a large collection.
  - Providing test cases for evaluating a taxonomy.
Source: Sample data courtesy of nStein.
Most popular flickr tags on 20 Feb 2007
http://www.flickr.com/photos/tags/
Sort flickr categories into 5 or fewer groups. Then label each group.
Taxonomy exercise: Facet grouping
- Universal taxonomy facets
  - By location (spatially)
  - By time (chronologically)
  - By type (genre)
  - By physical properties (size, color, shape, etc.)
  - By subject (topic)
Richard Saul Wurman. Information Architects (1996)
Taxonomy exercise: Facet grouping
Sort flickr categories into 5 or fewer groups. Then label each group.
Business case and motivations for taxonomies
- How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?
What technology analysts have said: Add metadata to search on!
- “Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
- “Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
- “Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
- “Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”
Fundamentals of taxonomy ROI
- Tagging content using a taxonomy is a cost, not a benefit.
- There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
- Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
- You need to determine those changes, and their costs, as part of the ROI.
Product utilization: Taxonomy compared to search
- Conversion rate increases.
  - HomeDepot.com: Double digit increase.
  - 1-800-Flowers.com: More than a 10% increase.
  - Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs): 130% increase.
- Lift in average order size.
Product catalog: Taxonomy compared to search
Benefit: Increased conversion rate & revenue lift
Web sales net income | | $80,000
Increased conversion rate | 30% | $24,000
Order size lift | 10% | $8,000
Potential revenue increase per year | | $32,000
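The table's arithmetic, written out as a short sketch. The variable names are hypothetical; the inputs are the slide's own figures, and both lifts are applied to the same net income base, as the slide does.

```python
# Inputs taken from the slide.
web_sales_net_income = 80_000
conversion_lift_pct = 30
order_size_lift_pct = 10

# Integer arithmetic keeps the dollar amounts exact.
conversion_gain = web_sales_net_income * conversion_lift_pct // 100
order_size_gain = web_sales_net_income * order_size_lift_pct // 100
potential_increase_per_year = conversion_gain + order_size_gain
```

Swapping in your own net income and measured lift percentages gives a first-pass benefit figure to set against the implementation costs.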
Usability research: Taxonomy compared to search
- “We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.” Marti Hearst (and others)
- “The category interface is superior to the list interface in both subjective and objective measures.” Hao Chen & Susan Dumais
Usability research: Taxonomy compared to search
[Chart: Median Search Time in Seconds, for targets “In top 20 results” and “Not in top 20 results”. Callouts: “Category is 48% faster” and “Category is 36% faster”.]
Source: Chen & Dumais
Time saved: Taxonomy compared to search
1 hour per day searching x 36% faster = 22 minutes each day
22 minutes x 250 working days per year = 5,500 minutes or 92 hours per year
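A short sketch of the same calculation, with hypothetical variable names. It follows the slide in treating "36% faster" as 36% of search time saved and in rounding the daily saving to whole minutes before multiplying.

```python
# Inputs taken from the slide.
search_minutes_per_day = 60
time_saved_pct = 36
working_days_per_year = 250

# The slide rounds the daily saving (21.6 minutes) up to 22 minutes.
saved_minutes_per_day = round(search_minutes_per_day * time_saved_pct / 100)
saved_minutes_per_year = saved_minutes_per_day * working_days_per_year
saved_hours_per_year = round(saved_minutes_per_year / 60)

# Without the intermediate rounding the yearly total is slightly lower.
unrounded_minutes_per_year = (
    search_minutes_per_day * time_saved_pct * working_days_per_year // 100
)
```

The rounded path reproduces the slide's 5,500 minutes (about 92 hours); the unrounded path gives 5,400 minutes (90 hours), so the estimate is best read as roughly 90 hours per person per year.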
Time saved: Taxonomy compared to search
Benefit: Increase service efficiency
Number of call center calls per month | 50,000
Average cost per call | $20
Call response costs per month | $1,000,000
Total call response costs per year | $12,000,000
Percentage of self-serviced calls due to improved information browsing | 30%
Service costs savings per year | $3,600,000
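The call center estimate can be parameterized so other volumes and deflection rates can be tried. This is a sketch only; the function name and signature are hypothetical, and the example call uses the slide's inputs.

```python
def call_center_savings(calls_per_month: int, cost_per_call: int,
                        self_service_pct: int) -> dict:
    """Estimate yearly savings when a share of calls is self-serviced instead."""
    monthly_cost = calls_per_month * cost_per_call
    yearly_cost = monthly_cost * 12
    # Integer percent keeps the dollar amounts exact.
    yearly_savings = yearly_cost * self_service_pct // 100
    return {
        "monthly_cost": monthly_cost,
        "yearly_cost": yearly_cost,
        "yearly_savings": yearly_savings,
    }


# Slide inputs: 50,000 calls per month, $20 per call, 30% self-serviced.
estimate = call_center_savings(50_000, 20, 30)
```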
Trusted advisers: Taxonomy avoids costs
- “The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …” Sue Feldman
- Sun’s usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year, about $500 per employee per year. Jakob Nielsen, useit.com
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
Knowledge workers spend more time re-creating existing content than creating new content
[Chart values: 25% and 8%]
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees | 100
Average employee salary | $80,000
Employee costs per year | $8,000,000
Increase in productivity from not recreating content | 25%
Employee cost savings per year | $2,000,000
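Written as a function, the productivity estimate makes the headcount, salary, and percentage easy to vary. The function name is hypothetical; the example call uses the slide's inputs.

```python
def recreation_savings(employees: int, average_salary: int,
                       productivity_gain_pct: int) -> tuple:
    """Return (yearly employee cost, yearly savings from not re-creating content)."""
    yearly_cost = employees * average_salary
    # Integer percent keeps the dollar amounts exact.
    yearly_savings = yearly_cost * productivity_gain_pct // 100
    return yearly_cost, yearly_savings


# Slide inputs: 100 employees, $80,000 average salary, 25% productivity gain.
employee_costs, cost_savings = recreation_savings(100, 80_000, 25)
```

The 25% figure is the most sensitive input: it assumes all time spent re-creating content is recovered, so a lower percentage is a more cautious planning number.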
Business case summary
1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace and on the web.
2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
3. Hierarchical or facet navigation can be guided by the user interface.
4. Facet navigation is best combined with keyword searching, e.g., keyword search followed by faceted navigation of results.
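Point 4, keyword search followed by faceted navigation of the results, can be sketched in a few lines. The documents, facet names, and helper functions below are hypothetical, and the substring match stands in for a real search engine.

```python
from collections import Counter

# Hypothetical tagged documents: free-text title plus facet values.
DOCS = [
    {"title": "Pesticide safety for farm workers",
     "Audience": "Industry", "Substance": "Pesticide"},
    {"title": "Pesticide facts for kids",
     "Audience": "Kids", "Substance": "Pesticide"},
    {"title": "Ozone and sun protection",
     "Audience": "General Public", "Substance": "Ozone"},
]


def keyword_search(docs, query):
    """Naive keyword search: case-insensitive substring match on the title."""
    q = query.lower()
    return [d for d in docs if q in d["title"].lower()]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(d[facet] for d in results if facet in d)


def refine(results, facet, value):
    """Narrow a result set to the documents tagged with one facet value."""
    return [d for d in results if d.get(facet) == value]


hits = keyword_search(DOCS, "pesticide")
audience_counts = facet_counts(hits, "Audience")
kids_hits = refine(hits, "Audience", "Kids")
```

The counts are what a faceted UI shows next to each value, such as "Kids (1)", and `refine` is what happens when the user clicks one.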
Taxonomy requires business processes
- Taxonomies must change, gradually, over time if they are to remain relevant.
- Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions.
Taxonomy governance can be viewed as a standards process
- Taxonomy must evolve, but in a predictable way.
- Team structure, with an appeals process.
  - Taxonomy stewardship is a part-time role at most organizations.
  - Team needs to make decisions based on costs and benefits.
- Documentation and educational materials.
- Comment-handling responsibilities (part of error-correction process).
- Issue logs.
- Release schedule.
Taxonomy governance: Change process overview
[Diagram: controlled vocabularies flow from CV Sources through the Taxonomy Governance Environment to CV Consumers]
1: External controlled vocabularies (CVs) change on their own schedule.
2: Taxonomy Team decides when to update snapshots of external CVs.
3: Team adds value to snapshots through definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of CVs are published to consumers.
- CV Sources: external standard vocabularies; internally created CVs; other CVs from other internal sources.
- Taxonomy Governance Environment: working copies of CVs, maintained in the Taxonomy Tool.
- CV Consumers: Site Search, Portal, Web CMS, DMS, DAM, Metatagging Tool, Search UI.
- Other diagram labels: Taxonomy Facets, Subject Codes, Working Papers, Project Archives, NASA Expertise Competencies.
CV = Controlled Vocabulary
Who should build the taxonomy?
- The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
- The team should plan on maintaining the taxonomy as well as building it.
  - Maintenance will not (usually) be anyone’s full-time job.
  - Exact mix of people on team will change.
- It should be built in an iterative fashion, with more content and broader review for each iteration.
Taxonomy governance: Generic team charter
- Taxonomy Team is responsible for maintaining:
  - The Taxonomy, a multi-faceted classification scheme.
  - Associated taxonomy materials, such as:
    - Editorial Style Guides.
    - Taxonomy Training Materials.
    - Metadata Standard.
  - Team rules and procedures for change management.
- Taxonomy Team will consider costs and benefits of suggested changes.
- Taxonomy Team will:
  - Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
  - Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
  - Promote awareness and use of the Taxonomy.
Taxonomy governance team: Generic roles
- Business Lead
  - Keeps committee on track with larger business objectives.
  - Balances cost/benefit issues to decide appropriate levels of effort.
  - Obtains needed resources if those on committee can’t accomplish a particular task.
- Technical Specialist
  - Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  - Helps obtain data from various systems.
- Content Specialist
  - Committee’s liaison to content creators.
  - Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
- Taxonomy Specialist
  - Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
  - Makes edits to taxonomy, installs into system with aid of IT specialist.
- Content Owners
  - Reality check on process change suggestions.
Where taxonomy changes come from
[Diagram labels: Firewall, Application UI, Application Logic, Content, Taxonomy, Tagging UI, Tagging Logic, End User, Tagging Staff, Taxonomy Editor, Taxonomy Team]
Inputs
- Query log analysis
- Staff notes on ‘missing’ concepts
- Requests from other parts of the organization
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms).
2. Large taxonomy changes (retagging, application changes).
3. New “best bets” content.
Team Considerations
1. Business goals.
2. Changes in user experience.
3. Retagging cost.
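Query log analysis is the most mechanical of these inputs. The sketch below assumes the log is simply a list of raw query strings and the vocabulary a list of preferred terms; both the data and the function names are hypothetical.

```python
from collections import Counter


def normalize(query: str) -> str:
    """Lower-case a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def top_queries(log, n):
    """Most frequent queries: candidates for 'best bets' content."""
    return Counter(normalize(q) for q in log).most_common(n)


def missing_concepts(log, vocabulary):
    """Queries that match no taxonomy term: candidate labels or synonyms."""
    known = {normalize(term) for term in vocabulary}
    counts = Counter(normalize(q) for q in log)
    return [(q, c) for q, c in counts.most_common() if q not in known]


# Hypothetical search log and vocabulary.
log = ["Pesticide", "pesticide ", "ozone", "Mold", "mold", "mold"]
vocabulary = ["Pesticide", "Ozone"]

frequent = top_queries(log, 2)
gaps = missing_concepts(log, vocabulary)
```

Here "mold" is both the most frequent query and absent from the vocabulary, so it is the kind of item an editor would bring to the Taxonomy Team. Exact matching is a simplification; real logs need stemming and synonym handling.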
Taxonomy maintenance processes
- Different organizations will need to consider their own change processes.
  - Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
  - Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
  - Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
- Change process MUST also consider cost of implementing the change.
  - Retagging data.
  - Reconfiguring auto-classifier.
  - Retraining staff.
  - Changes in user expectations.
Taxonomy maintenance workflow
[Flowchart in the Taxonomy Tool, with lanes for Analyst, Editor, Copywriter, and Sys Admin]
- Suggest new name/category
- Review name
- Copy edit new name
- Add to enterprise Taxonomy
Decision points are labeled “Problem?” with Yes/No branches.
Sample taxonomy editor: Data Harmony
[Screenshot panels: Hierarchy Browser, Standard Term Info]
Taxonomy editing tools vendors
An immature area: No vendors are in upper-right quadrant!
[Quadrant chart: Ability to Execute (low to high) vs. Completeness of Vision; quadrant labels include Niche Players and Visionaries]
- Most popular taxonomy editor is MS Excel.
- High functionality/high cost products ($100K+).
- MultiTes is widely used, cheap with functionality.
Taxonomy maturity model
- Taxonomy governance processes must fit the organization.
- As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
- Honestly assess your organization’s metadata maturity in order to design appropriate governance processes.
- We are starting to define a maturity model, similar to the Software Capability Maturity Model (CMM):
  - Initial: Ad hoc, each project begins from scratch.
  - Repeatable: Procedures defined and used, but not standardized across organization or are misapplied to projects.
  - Defined: Standard processes are tailored for project needs. Strategic training for long-range goals is in place.
  - Managed: Projects managed using quantitative quality measures. Process itself is measured and controlled.
  - Optimizing: Continual process improvement. Extremely accurate project estimation.
Purpose of maturity model
- Estimating the maturity of an organization’s information management processes tells us:
  - How involved the taxonomy development and maintenance process should be.
    - Overly sophisticated processes will fail.
  - What to recommend as first steps.
- Maturity is not a goal; it is a characterization of an organization’s methods for achieving particular goals.
- Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
- IT Maturity may not be core to your business.
Taxonomy maturity scorecard
[Scorecard grid: each row is marked at one of five levels: Initial, Repeatable, Defined, Managed, Optimizing]
Organizational Structure
- Executive Sponsorship
- Budgeting
- Hiring & Training
Quality Assurance
- Manual Processes
- Automated Processes (note 1)
Project Management
- Estimating & Scheduling
- Cost Control
- Project Methodology (note 2)
Design and Execution
- Planning
- Design Excellence
- Development Maturity
Notes
1. X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.
2. IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.
Taxonomy governance self-assessment
Background
1. Rate your organization’s overall taxonomy maturity from 1 to 10. (Immature 1 2 3 4 5 6 7 8 9 10 Mature)
2. What type of change was most recently made to your organization’s taxonomy management environment? (Functionality / Standards / Tools / People / Data Quality)
3. What is the area for your organization’s taxonomy management environment improvement? (Functionality / Standards / Tools / People / Data Quality)
Basic
3. Are system features and metadata fields added based on cost/benefit analysis, or because they are easy to do with the current applications and tools? (Cost/Benefit / Easy)
4. Are applications and tools acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money? (Requirements / Year-End)
5. Are there hiring and training practices for metadata and taxonomy positions? (Yes / No) If there is training, describe it briefly.
Intermediate and Advanced (grouping of questions by level not recoverable)
- Is there a process in place to examine search query logs? (Yes / No)
- Does the search engine index more than 4 repositories around the organization?
- Is there an organization-wide metadata standard, such as the “Dublin Core”, for use by search tools? (Yes / No)
- Are there established qualitative and quantitative measures of metadata quality? (Yes / No) If there are measures, describe them briefly.
- Is there an ongoing data cleansing procedure to look for any redundant, obsolete or trivial content (ROT)? (Yes / No) If there is a process, describe it briefly.
- Can the CEO explain the return on investment (ROI) for content management, search and metadata? (Yes / No)
2005 Maturity survey: Search practices (n=87)
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. | 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites. | 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Spell Checking. | 31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
Synonym Searching. | 41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. | 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined. | 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets) | 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. | 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. | 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. | 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)
2005 Maturity survey: Metadata practices (n=87)
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. | 22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. | 37% (22) | 20% (12) | … | 0% (0) | 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. | 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
Multiple repositories comply with metadata standard. | 52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. | 48% (29) | 20% (12) | … | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically. | 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. | 57% (34) | 17% (10) | … | 0% (0) | 10% (6)
Metadata is manually entered into web forms. | 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software. | 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction. | 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)
2005 Maturity survey: Taxonomy practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Org Chart Taxonomy: one based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
Products Taxonomy: one based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
Content Types Taxonomy: one based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
Topical Taxonomy: one based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
Faceted Taxonomy: one which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)
n=87

Taxonomy testing methods
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist; SME; Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough Taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms

Walk-through method: Show & explain
Example: ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Types
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petro-chemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands
Audience: All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time; Experienced; Advanced); Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region.Country: All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States

Walk-through method: Editorial rules consistency check
Rules to check:
- Abbreviations
- Ampersands
- Capitalization
- General…, More…, Other…
- Languages & character sets
- Length limits
- Multiple parents
- Plural vs. singular form
- Scope notes
- Serial comma
- Sources of terms
- Spaces
- Synonyms & acronyms
- Term order (Alphabetic or …)
- Term label order (Direct vs. inverted)
Rule Name | Editorial Rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
Ampersands | The ampersand [&] character shall be used instead of the word ‘and’. Example: Licensing & Compliance. NOT: Licensing and Compliance.
Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service.
General…, More…, Other… | The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable. Example: “Other Property”, “Other Services”, “General Information”, “General Audience”.
… | …

Task-based testing*
* Based on Donna Maurer’s usability work with the Australian government
- 15 representative questions were selected
  - Perspective of various organizational units
  - Most frequent website searches
  - Most frequently accessed website content
  - Correct answers to the questions were agreed in advance by team.
- 15 users were tested
  - Did not work for the organization
  - Represented target audiences
- Testers were asked “where would you look for …”
  - “under which facet… Topic, Commodity, or Geography?”
  - Then, “… under which category?”
  - Then, “… under which sub-category?”
  - Tester choices were recorded
- Testers were asked to “think aloud”
  - Notes were taken on what they said
- Pre- and post-test questions were asked
  - Tester answers were recorded

Task-based testing: Representative questions
1. How much cotton is imported from China?
2. What are the impacts of “mad cow” disease on U.S. meat production, sales?
3. What is the average farm income level in your state?
4. How much of our diet comes from fast food?
5. How many people receive WIC benefits (Special Supplemental Nutrition Program for Women, Infants, and Children)?
6. How much acreage is planted to genetically engineered corn?
7. What is the cost of foodborne illness in the United States?
8. What part of food costs go to farmers, retailers?
9. Which States produce the most tobacco?
10. What percentage of farms in the United States are small farms?
11. What are the costs and benefits associated with providing more traceability in the U.S. food supply?
12. How many people in America don’t get enough to eat?
13. What is behind the trade balance (surplus or deficit) in agricultural goods?
14. What is the extent of conservation compliance? How does that impact farmers’ decisions?
15. What are the impacts of foreign trade restrictions on U.S. farmers, U.S. food prices?

Task-based testing: Closed card sorting
Question 3. What is the average farm income level in your state?
Facets:
1. Topics
2. Commodities
3. Geographic Coverage
1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets
1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes

Task-based testing: Card sort analysis
Columns: Find-it Tasks | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton: Asia Cotton
2. Mad cow: Cattle Food Safety Cattle
3. Farm income: Farm Income US States Farm Income
4. Fast food: Food Consumption Diet Quality & Nutrition Food Expenditures Diet Quality & Nutrition
5. WIC: Program WIC Program
6. GE Corn: Corn
7. Foodborne illness: Foodborne Disease Consumer Food Safety Foodborne Disease Retailing & Wholesaling
8. Food costs: Food Prices Market Structure Market Analysis Food Expenditures
9. Tobacco: Tobacco
10. Small Farms: Farm Structure Farm Structure Food Safety Policy Food Prices
11. Traceability: Food System Labeling Policy Food Safety Innovations
12. Hunger: Food Security Food Security
13. Trade balance: Commodity Trade & Intl Markets Commodity Trade Market Analysis Commodity Trade
14. Conservations: Cropping Practices Conservation Policy
15. Trade restrictions: Trade Policy Food Safety & Trade WTO Market Analysis Commodity Trade

Task-based testing: Card sort results
- In 80% of the trials, users looked for information under the categories where we expected them to look.
- Breaking up topics into facets makes it easier to find information, especially information related to commodities.

Task-based testing: Card sort results
Test Questions | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100%
6. GE corn | 100%
7. Foodborne illness | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100%
10. Small farms | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91%
15. Trade restrictions | 55% | 36%
Callouts:
- Possible change required.
- Change required. Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
- On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
- Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”

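The slides do not define how % Correct and % Agree were calculated. The sketch below is one plausible reading, not the method used in the study: it assumes % Correct is the share of testers whose choice falls in a set of categories agreed in advance as acceptable, and % Agree is the share who picked the single most common category. The function name `score_question` and the sample choices are hypothetical.

```python
from collections import Counter


def score_question(choices, acceptable):
    """Return (pct_correct, pct_agree) for one test question.

    choices    -- non-empty list of the category each tester picked
    acceptable -- set of categories agreed in advance as correct (assumption)
    """
    total = len(choices)
    correct = sum(1 for choice in choices if choice in acceptable)
    # Size of the largest group of testers that picked the same category.
    top_count = Counter(choices).most_common(1)[0][1]
    return round(100 * correct / total), round(100 * top_count / total)


# Hypothetical choices from 11 testers for a single question.
choices = ["Cattle"] * 7 + ["Food Safety"] * 1 + ["Trade Policy"] * 3
pct_correct, pct_agree = score_question(choices, {"Cattle", "Food Safety"})
print(pct_correct, pct_agree)
```

Under this reading, a question with high % Correct but low % Agree points to several acceptable homes for the same content, while low values on both point to a category label that needs rework.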
Task-based testing: User satisfaction survey
- Was it easy, medium or difficult to choose the appropriate Topic?
  - Easy
  - Medium
  - Difficult
- Was it easy, medium or difficult to choose the appropriate Commodity?
  - Easy
  - Medium
  - Difficult
- Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
  - Easy
  - Medium
  - Difficult

User satisfaction survey: Results
(Chart; scale runs from More Difficult to Easier)

User interface survey: Which search UI is ‘better’?
- Criteria
  - User satisfaction
  - Success completing tasks
  - Confidence in results
  - Fewer dead ends
- Methodology
  - Design tasks from specific to general
  - Time performance
  - Calculate success rates
  - Survey subjective criteria
  - Pay attention to survey hygiene: participant selection, counterbalancing, T-scores
Source: Yee, Swearingen, Li, & Hearst

User interface survey: Results (1)
Which interface would you rather use for these tasks? | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…
Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst

User interface survey: Results (2)
(Chart comparing Google-like Baseline with Faceted Category)
Source: Yee, Swearingen, Li, & Hearst

Tagging samples: How many items?
Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies* | 3-10 × number of categories | Use computer-assisted methods when more than 10-20 categories. Pre-existing metadata is the most meaningful.
* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different than how “real” users would categorize content.

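The "3-10 × number of categories" rule of thumb for quantitative tests is simple arithmetic. A minimal sketch follows; the function name `quantitative_sample_range` and the 100-category example are illustrative assumptions, not figures from the slides.

```python
def quantitative_sample_range(num_categories, low=3, high=10):
    """Return (min_items, max_items) under the 3-10 x categories rule of thumb."""
    return low * num_categories, high * num_categories


# Hypothetical vocabulary with 100 categories.
min_items, max_items = quantitative_sample_range(100)
print(min_items, max_items)
```

At this scale the sample runs to hundreds of items, which is why the slide recommends computer-assisted tagging once a vocabulary has more than 10-20 categories.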
Tagging samples: Manually tagged metadata sample
Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999

Tagging samples: Spreadsheet for tagging 10’s-100’s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions

Rough bulk tagging: Facet demo (1)
- Collections: 4 content sources
  - NTRS, SIRTF, Webb, Lessons Learned
- Taxonomy
  - Converted MultiTes format into RDF for Seamark
- Metadata
  - Converted from existing metadata on web pages, or
  - Created using simple automatic classifier (string matching with terms & synonyms)
  - 250k items, ~12 metadata fields, 1.5 weeks effort
- OOTB Seamark user interface, plus logo

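The "simple automatic classifier (string matching with terms & synonyms)" can be sketched in a few lines. This is an illustration of the general technique only, not the classifier used in the demo; the vocabulary, the synonyms, and the function names `build_matcher` and `tag_text` are assumptions.

```python
import re


def build_matcher(vocabulary):
    """Compile one whole-word, case-insensitive pattern per preferred term.

    vocabulary -- dict mapping each preferred term to a list of synonyms
    """
    patterns = {}
    for term, synonyms in vocabulary.items():
        labels = [term] + list(synonyms)
        # Longest labels first so longer phrases win over shorter ones.
        labels.sort(key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in labels)
        patterns[term] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def tag_text(text, patterns):
    """Return the sorted preferred terms whose label or synonym occurs in text."""
    return sorted(term for term, pattern in patterns.items() if pattern.search(text))


# Hypothetical vocabulary fragment with synonyms.
vocabulary = {
    "Hubble Space Telescope": ["HST"],
    "Jupiter": ["Jovian"],
    "Voyager": [],
}
patterns = build_matcher(vocabulary)
tags = tag_text("New HST images of the Jovian ring system", patterns)
print(tags)
```

String matching of this kind is cheap enough to run over hundreds of thousands of items, but it only produces rough tags: it cannot tell an incidental mention from what the content is actually about.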
Rough bulk tagging: Facet demo (2)

Document distribution: How evenly does it divide the content?
- Documents do not distribute uniformly across categories
- Zipf (1/x) distribution is expected behavior
- 80/20 rule in action (actually 70/20 rule)
Chart labels: Leading candidate for splitting; Leading candidates for merging

Document distribution: How evenly does it divide the content?
- Methodology: 115 randomly selected URLs from corporate intranet search index were manually categorized. Inaccessible files and ‘junk’ were removed.
- Results: Slightly more uniform than Zipf distribution. Above the curve is better than expected.

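Comparing an observed document distribution with a Zipf (1/x) curve can be sketched as follows. This is a minimal illustration that assumes the expected share of the category at rank r is proportional to 1/r; the function names and the sample counts are hypothetical.

```python
def zipf_shares(num_categories):
    """Expected share of documents per rank under a 1/rank distribution."""
    weights = [1 / rank for rank in range(1, num_categories + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def compare_to_zipf(counts):
    """Return (rank, observed_share, expected_share) rows, largest category first."""
    observed = sorted(counts, reverse=True)
    total = sum(observed)
    expected = zipf_shares(len(observed))
    return [
        (rank, count / total, share)
        for rank, (count, share) in enumerate(zip(observed, expected), start=1)
    ]


# Hypothetical document counts for three categories.
for rank, observed_share, expected_share in compare_to_zipf([10, 40, 50]):
    print(rank, round(observed_share, 3), round(expected_share, 3))
```

Categories whose observed share sits well above the expected curve are natural candidates for splitting, and those far below it are candidates for merging.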
Document distribution: How does taxonomy “shape” match that of content?
Background:
- Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
- 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
- Counts of terms and documents summed within taxonomy hierarchy
Results:
- Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
- Mismatches between term% and document% flagged
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.

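Flagging mismatches between term% and document% can be automated once both percentages are known. The slide does not say what threshold was used, so the factor of 2 below is an assumption, as are the function name `flag_mismatches` and the "split"/"merge" labels. The rows are a subset of the table above.

```python
def flag_mismatches(rows, factor=2.0):
    """Flag term groups whose share of documents is far from their share of terms.

    rows   -- iterable of (group, pct_terms, pct_docs)
    factor -- assumed threshold; a ratio of 2 means twice (or half) as many docs as terms
    """
    flags = {}
    for group, pct_terms, pct_docs in rows:
        ratio = pct_docs / pct_terms
        if ratio >= factor:
            flags[group] = "split"  # many documents, few terms
        elif ratio <= 1 / factor:
            flags[group] = "merge"  # many terms, few documents
    return flags


rows = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Researchers", 2.2, 3.6),
    ("Students", 27.4, 7.0),
]
print(flag_mismatches(rows))
```

A flag is a prompt for review rather than a verdict: a group may legitimately need many terms for little content if that content is important to its audience.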
Usability testing: How intuitive (repeatable) are the categorizations (1)?
- Methodology: Closed Card Sort
  - For alpha test of a grocery site
  - 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
  - Categories where fewer than 14 of 15 testers put product into same category were flagged

Usability testing: How intuitive (repeatable) are the categorizations (2)?

Usability testing: How intuitive (repeatable) are the categorizations?
% of Testers | Cumulative % of Products | With Poly-Hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100%

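The cumulative agreement figures from a closed card sort can be derived from the raw sorts. The sketch below is a minimal illustration with hypothetical products and categories; it assumes agreement for a product is the number of testers who chose its most common category, and uses the slide's 14-of-15 threshold for flagging.

```python
from collections import Counter


def agreement_levels(sorts):
    """Map each product to the number of testers who chose its most common category."""
    return {
        product: Counter(categories).most_common(1)[0][1]
        for product, categories in sorts.items()
    }


def cumulative_share(levels, threshold):
    """Percent of products on which at least `threshold` testers agreed."""
    hits = sum(1 for count in levels.values() if count >= threshold)
    return round(100 * hits / len(levels))


# Hypothetical sorts: 15 testers, 4 product types.
sorts = {
    "Milk": ["Dairy"] * 15,
    "Eggs": ["Dairy"] * 14 + ["Meat"],
    "Salsa": ["Snacks"] * 9 + ["Condiments"] * 6,
    "Tofu": ["Produce"] * 12 + ["Deli"] * 3,
}
levels = agreement_levels(sorts)
flagged = sorted(product for product, count in levels.items() if count < 14)
print(cumulative_share(levels, 14), flagged)
```

Products that stay flagged are the usual candidates for poly-hierarchy, that is, for being listed under more than one category.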
The #1 underused source of quantitative information on how to improve your taxonomy?
Query Logs & Click Trails

Query log & click trail examination: Who are the users & what are they looking for?
- Only 30-40% of organizations regularly examine their logs*.
- Sophisticated software available, but don’t wait.
- 80% of value comes from basic reports

Query log & click trail examination: Query log
UltraSeek Reporting
- Top queries
- Queries with no results
- Queries with no click-through
- Most requested documents
- Query trend analysis
- Complete server usage summary

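The first three basic reports can be produced from almost any query log without special software. The sketch below assumes a hypothetical log already parsed into (query, result_count, clicked) tuples; it does not reflect the log format or reporting API of UltraSeek or any other product.

```python
from collections import Counter


def basic_reports(log, top_n=3):
    """Build simple query-log reports from (query, result_count, clicked) tuples."""
    # Normalize case and whitespace so trivially different queries count together.
    normalized = [(query.strip().lower(), count, clicked) for query, count, clicked in log]
    return {
        "top_queries": Counter(q for q, _, _ in normalized).most_common(top_n),
        "no_results": sorted({q for q, count, _ in normalized if count == 0}),
        "no_click_through": sorted(
            {q for q, count, clicked in normalized if count > 0 and not clicked}
        ),
    }


# Hypothetical log entries.
log = [
    ("Expense Report", 12, True),
    ("expense report", 12, True),
    ("holiday schedule", 0, False),
    ("org chart", 5, False),
    ("expense report ", 12, True),
]
reports = basic_reports(log, top_n=1)
print(reports)
```

Queries with no results often point to missing synonyms in the taxonomy, and queries with results but no click-through often point to poorly tagged or poorly titled content.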
Query log & click trail examination: Click trail packages
- iWebTrack
- NetTracker
- OptimalIQ
- SiteCatalyst
- Visitorville
- WebTrends

Summary: Start a “Measure & Improve” mindset
- Taxonomy changes do not stand alone
  - Search system improvements
  - Navigation improvements
  - Content improvements
  - Process improvements

Benchmarking exercise
- What are 5 representative questions that your users ask or tasks that your users do when using your application?
- Is it currently easy, medium or difficult to answer these questions or accomplish these tasks?
Worksheet columns: Questions or Tasks | Rating (Easy/Medium/Difficult)

Conclusion: What is a good taxonomy?
- Incremental, extensible process that identifies and enables owners, and engages stakeholders.
- Quick implementation that provides measurable results as quickly as possible.
- A means to an end, and not the end in itself.
- Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
- Improved over time, and maintained.

Tagging Overview
- Tagging is better than relying on the words that happen to occur in a piece of content.
- All tagging is useful
  - End user tagging
  - Tagging by librarians
  - Automated tagging by OS and algorithms
- Content should be tagged throughout its lifecycle, each time the content is handled and used, so that it accrues value or its significance is diminished.

MS Office: File Properties
Callout: How many people fill this in?

Organize
Callout: How many people click on this?

What is social tagging?
- End user tagging
- Easy, intuitive tagging interfaces
- Almost instantaneous feedback
  - Enables people to tag & re-tag content
  - … in response to seeing their tags in context with other tags.
- Emergent categories
  - Resembles open card sort process in which patterns emerge
  - … rather than validating categories using closed card sorts.

Social tagging innovators
- flickr founders
  - Caterina Fake
  - Stewart Butterfield
- del.icio.us founder
  - Joshua Schachter
- del.icio.us & flickr are now both part of Yahoo!
- As of April 2006 flickr had 130 million photos posted by 3 million registered users.

Four tagging rules for end users
Rule | Description
Use specific terms | Apply the most specific terms when tagging content. But do not tag every possible topic, just the ones that are most important or best characterize the content as a whole.
Use multiple terms | Use as many terms as necessary to describe overall What the content is about & Why it is important. Do not over-tag.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find it. Remember that search engines can only operate on explicit information.

Agenda
- Content Tagging
- Tagging Interface

Requirements for a tagging interface
- Automated form fill-in (automatically fills in known data)
- Tagging precedents (see tags already assigned by others)
- Controlled vocabularies, e.g., with pull-down list
- Multi-valued tags
- Geo-tagging
- Group tagging
- Clean-up tag tools, e.g., alpha list
- Batch editing
- Share/Don’t share (Public/Private)
- Identified owner (who can be emailed)
- Almost immediate feedback, e.g., tag cloud

Form fill-in: Automatically filled-in known data

Form fill-in: Automatically filled-in known data
Screenshot labels: Manual form fill-in w/ check boxes, pull-down lists, etc.; Auto keyword & summarization

Form fill-in: Automatically filled-in known data
Screenshot labels: Auto-categorization; Rules & pattern matching; Parse & lookup (recognize names)

Tagging precedents: See tags assigned by others

Multi-valued group tagging

Group geo-tagging

Clean up tag tools: Alpha list

Batch edit

Share or don’t share tagging

Bulk tagging
- ID collection of related content items by pattern or context
- Then, apply same attributes to all content items

Tag a folder
- Drag & drop content items into folder
- Then, content items inherit properties of folder

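Folder inheritance can be modelled as a merge of the folder's facet values into each item dropped into it. The sketch below is a minimal illustration under one assumed policy: the item keeps its own values and gains any folder values it lacks. The function name `inherit_tags` and the sample values are hypothetical.

```python
def inherit_tags(folder_tags, item_tags):
    """Return a new facet -> values mapping: folder values first, then item values."""
    merged = {facet: list(values) for facet, values in folder_tags.items()}
    for facet, values in item_tags.items():
        merged.setdefault(facet, [])
        for value in values:
            if value not in merged[facet]:  # avoid duplicate values within a facet
                merged[facet].append(value)
    return merged


# Hypothetical folder properties and one item's existing tags.
folder_tags = {"Organizations": ["Ames Research Center"], "Locations": ["Jupiter"]}
item_tags = {"Locations": ["Jupiter", "Io"], "Content Types": ["Images"]}
print(inherit_tags(folder_tags, item_tags))
```

A stricter policy would let folder values overwrite item values; which one is appropriate depends on whether the folder or the individual tagger is treated as the more reliable source.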
Workflow
- Approve & improve mindset
Diagram steps: Create Content; Add Metadata; Review & Improve; Publish; Review & Improve

Interactive rewards
- Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
- For example,
  - Most popular
  - Tag clouds
  - Alerts

Most popular
- Another example is most emailed from, e.g., the NY Times.

Tag cloud

Alerts
- New (content selected by date)
- Subscriptions (content selected by tags)
- Interest (content selected by other people)
- Individual (content selected for you by other people)

Is faceted indexing the future of social tagging?

Tagging exercise: Blog tagging (a)
ALA Tech Source. http://www.techsource.ala.org/blog/2007/04/google-buys-oclc-announces-new-products.html

Tagging exercise: Blog tagging (b)
HBSP. http://discussionleader.hbsp.com/davenport/2007/04/cause_and_effect_reporting_raw.html#comments

Tagging exercise: Taxonomy facets, definitions
Taxonomy Facets | Descriptions
Business activity | Use for common business function or activity such as finance, marketing and sales.
Industry / Product | Use for content that is about or related to an industrial sector or product such as construction equipment.
Geography | Use for content that is about a region, country or city.
Organization | Use for named organizations, brands and business entities.
Person / Role | Use for named people and the roles people have in organizations.
Content Type | Use for content genres such as letters, memos and reports.
Audience | Use to indicate the intended audience.
Topic | Use for other business and associated topics that the content is about or related to.

Tagging exercise: Taxonomy facets, values
Business activity: Accounting; Auditing; Finance; HR management; IT; Marketing; Operations management; Sales
Geography: Africa; Americas; Antarctica; Asia; Europe; Oceania; Global; Historical geography; Oceans & seas; Regions
Organization / Entity: Business entities; Companies & brands; Government agencies; International NGOs; Organization types
Industry / Product: Agriculture …; Mining; Utilities; Construction; Manufacturing; Wholesale trade; Retail trade; Transportation & warehousing; Information; Finance & insurance; Real estate; Professional; Management; Administrative support; Education; Health care; Arts, entertainment & recreation; Accommodation & food; Other services; Public administration
People / Role: Business Leaders; Thought Leaders; Political Leaders; Roles
Content Type: Basic facts & information; Blog; Brochure; Database; E-mail; Letter; Memo; Multimedia; Report; Newsletter; Podcast; Press Release; Research & Analysis; RSS Feed
Audience: Consumer; Employee; Manager; Executive
Worksheet: Taxonomy Facets | Tags
Business activity |
Industry / Product |
Geography |
Organization |
Person / Role |
Content Type |
Audience |
Topic |

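When the exercise is run against a controlled vocabulary, tags can be checked automatically. The sketch below is illustrative only: it uses the Business activity and Audience values listed above, while the function name `invalid_tags` and the sample tags ("Strategy", "Mood") are hypothetical.

```python
# Two facets from the exercise, with the values listed above.
FACET_VALUES = {
    "Business activity": {
        "Accounting", "Auditing", "Finance", "HR management",
        "IT", "Marketing", "Operations management", "Sales",
    },
    "Audience": {"Consumer", "Employee", "Manager", "Executive"},
}


def invalid_tags(tags):
    """Return (facet, value) pairs not in the vocabulary; value is None for unknown facets."""
    problems = []
    for facet, values in tags.items():
        allowed = FACET_VALUES.get(facet)
        if allowed is None:
            problems.append((facet, None))
            continue
        problems.extend((facet, value) for value in values if value not in allowed)
    return problems


# Hypothetical tags for one blog post.
tags = {
    "Business activity": ["Marketing", "Strategy"],
    "Audience": ["Executive"],
    "Mood": ["Upbeat"],
}
print(invalid_tags(tags))
```

Rejected values are worth collecting rather than discarding, since recurring ones are candidates for new terms or synonyms.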
Summary
- There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
- Document and content management system tagging must be simple, and it must make relevant work products easier to find almost instantaneously.

Questions?
Joseph A. Busch
+415-377-7912
jbusch@taxonomystrategies.com
http://www.taxonomystrategies.com


