
  • Number of slides: 55

Taxonomies and Metadata for Content Management
Michael Huff
Information Resource Officer
U.S. Department of State

E-Government Act of 2002
• The use of computers and the Internet is rapidly transforming societal interactions and the relationships among citizens, private businesses, and the Government.
• The Federal Government has had uneven success in applying advances in information technology to enhance governmental functions and services, achieve more efficient performance, increase access to Government information, and increase citizen participation in Government.
• Most Internet-based services of the Federal Government are developed and presented separately, according to the jurisdictional boundaries of an individual department or agency, rather than being integrated cooperatively according to function or topic.

Which U.S. Government organizations are experienced in using metadata & taxonomy tools?
– Defense Intelligence Agency
– USDA Economic Research Service (ERS)
– Federal Aviation Administration
– FirstGov
– NASA
– Small Business Administration
– Social Security Administration
– Department of State

Term | Definition
Metadata | Data about data: a label that describes a content object so unstructured content can be managed like structured content.
Taxonomy | The specification and classification of the names of people, places, things, and everything else that is needed to allow search engines and other content applications to work better.
Facet Classification | Discrete set of elements (or fields) for labeling content and content components.
Controlled Vocabulary | A managed set of terms for which there is an agreed-upon value or definition.

Metadata Field | Data Type / Source
Title | string
Creator | string
Identifier | URL
Date | date
Subject | Taxonomy (~10,000 categories)

Why use metadata?
• Adding metadata to unstructured content allows it to be managed like structured content.
• Enriching content with structured metadata is critical for supporting search and personalized content delivery.
• Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.

Where does metadata fit in the information system architecture?
• User experience: How content is presented and how users experience and interact with it dictates its perceived and actual value.
• Content architecture: Scalable metadata framework to enable content reuse, and handle changes in organization goals, user needs, and retrieval concerns.
• Tools and technology: The information supply-chain platform that enables workflows, and supports organizational and operational concerns.

What is Dublin Core?
• Dublin Core is the metadata standard for describing Internet resources so they are easy to find.
Timeline:
– 1995: Original workshop held in Dublin, Ohio.
– 2003: Dublin Core approved as ISO 15836.
– 2004: Shanghai meeting.
For more information: http://www.dublincore.org

Why is metadata important?
[Diagram: metadata categories plotted by Complexity against Enabled Functionality]
• Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
• Subject metadata – What & Why: Subject, Description, Coverage
• Use metadata – How can it be used: Rights & Permissions
• Relational metadata – Links between and to: Relation
Enabled functionality: more efficient editorial process; better navigation & discovery
http://dublincore.org/documents/dcmi-terms/

What is a taxonomy?
The specification of the names of people, places, things … and everything else that is needed to allow search engines and other content applications to work better.

Example: Linnaeus
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris

Example: UNSPSC
Segment: 44 - Office Equipment and Accessories and Supplies
Family: 12 - Office Supplies
Class: 17 - Writing Instruments
Commodity: 05 - Mechanical pencils; 06 - Wooden pencils; 07 - Colored pencils
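
The UNSPSC example reads naturally as a fixed-width hierarchical code. The sketch below is an illustration only: it assumes the four two-digit levels shown on the slide are concatenated into one 8-digit string (for example "44121705") and that broader levels are written by zero-padding the narrower positions. The names `UnspscCode`, `parse_unspsc`, and `broader_codes` are hypothetical.

```python
from typing import NamedTuple


class UnspscCode(NamedTuple):
    segment: str    # "44" = Office Equipment and Accessories and Supplies
    family: str     # "12" = Office Supplies
    class_: str     # "17" = Writing Instruments
    commodity: str  # "05" = Mechanical pencils


def parse_unspsc(code: str) -> UnspscCode:
    """Split an 8-digit code into its four 2-digit hierarchy levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected 8 digits, got {code!r}")
    return UnspscCode(code[0:2], code[2:4], code[4:6], code[6:8])


def broader_codes(code: str) -> list[str]:
    """Return the codes of the broader levels, broadest first.

    Assumption: a broader level is the same prefix padded with zeros.
    """
    c = parse_unspsc(code)
    return [
        c.segment + "000000",
        c.segment + c.family + "0000",
        c.segment + c.family + c.class_ + "00",
    ]


print(parse_unspsc("44121705"))
print(broader_codes("44121705"))
```

Because every level is recoverable from the code itself, content tagged at the commodity level can be rolled up to class, family, or segment without a lookup table.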

Sample Recipe Taxonomy
Facet categories (each with its controlled vocabulary):
• Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
• Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
• Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
• Courses: Appetizers, Beverages, Breads, Cheese, Cocktails, Desserts, Fish & Shellfish, Fruit, Hors d'Oeuvres, Meat, Pasta, Salad, Sandwiches, Soup, Vegetables
• Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry

The power of taxonomy facets
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Can be easier to navigate
Example facets:
– Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
– Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
– Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
– Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
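
The arithmetic behind the first bullet can be checked directly. This is a minimal sketch using placeholder term names rather than the recipe vocabularies, so that each facet has exactly 10 nodes as in the claim; `facet_stats` and `FACETS` are hypothetical names.

```python
from itertools import product
from math import prod


def facet_stats(facets: dict[str, list[str]]) -> tuple[int, int]:
    """Return (nodes to maintain, distinct tag combinations)."""
    sizes = [len(terms) for terms in facets.values()]
    return sum(sizes), prod(sizes)


# Four independent facets with 10 placeholder terms each.
FACETS = {
    name: [f"{name} term {i}" for i in range(1, 11)]
    for name in ("Main Ingredients", "Meal Type", "Cuisines", "Cooking Methods")
}

maintained, combinations = facet_stats(FACETS)

# Cross-check the product by enumerating every combination.
assert combinations == len(list(product(*FACETS.values())))

print(maintained, combinations)  # 40 10000
```

Only 40 terms have to be maintained, yet one term from each facet distinguishes 10 × 10 × 10 × 10 = 10,000 combinations, which is what a single hierarchy would need 10,000 leaf nodes to express.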

7 Common taxonomy facets
Personalized content delivery requires defining taxonomy facets … and re-use of existing vocabulary sources

Facet | Definition | Example Source
Products and Services | Names of products and services. | ERP system, your products and services, etc.
[Organization] | Organizational structure. | FIPS 95-2, your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, US Postal Service, etc.
[Function] | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, ERIC Thesaurus, ProQuest, etc.

Applying the facets to the Dublin Core metadata elements
Applied taxonomy metadata facilitates a multi-faceted view of content

Dublin Core Element | Definition | Vocabulary Source
Title | Resource name. | Not applicable
Creator | Content maker. | LDAP
Subject | Content topic. Keyword. | Topic facet
Description | Description of content, summary. | Not applicable
Publisher | Publisher of this manifestation. | Agency facet
Contributor | Content contributor. | LDAP
Date | Content lifecycle event for this manifestation. | Not applicable
Type | Genre. Form. | Type facet
Format | Format of this manifestation. | RFC 2045
Identifier | Reference for this manifestation, e.g., URL. | Not applicable
Source | Source from which this manifestation has been derived. | Not applicable
Language | Language of this manifestation. | ISO 639
Relation | Reference to related resource. | None
Coverage | Space, period, date, jurisdiction, etc. | Jurisdiction facet
Rights | Who has rights to use this manifestation. | Privacy level

Facets at work on FirstGov site
• Frequency
• Organization
• Audience
• Content Type
http://www.firstgov.gov

Powered by Guided Navigation
• 2-3 clicks to product
• No dead ends
http://www.tesco.com/winestore

http://www.towerrecords.com

Powered by http://www.fortunoff.com

Seven practical rules for taxonomies
1. Incremental, extensible process that identifies and enables owners, and engages stakeholders.
2. Quick implementation that provides measurable results as quickly as possible.
3. Not monolithic: has separately maintainable facets.
4. Re-uses existing IP as much as possible.
5. A means to an end, and not the end in itself.
6. Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
7. Improved over time, and maintained.

• What is the general purpose of the content you are managing?
• What types of content are you handling?
• Who is the audience for this content?
• What are the core organizational objectives that the content is related to?

• Creating a taxonomy is only part of the job
• How will it be put to use?
  – In a new application, or by modifying an existing application?
  – What's the effort around that?
• Additional issues
  – Tagging: Who will add the metadata and how?
Example uses:
– Browse by Topic
– Link to Bios from Personal Names
– Link to company data (quotes, news, ...) from Company names
– Link to info on Countries
– Alerts on People, Companies, and Topics

1. Identify Objectives: Conduct interviews
2. Inventory Content: ID sources, spider assets & extract metadata
3. Specify Metadata: Define fields & purpose
4. Model Content: Define content chunks & XML DTDs
5. Specify Vocabularies: Compile controlled vocabularies
6. Specify Procedures: Develop workflow, rules & procedures
7. Train Staff: Develop materials & train staff

Task 1 – Identify objectives
• What do you do?
• What kinds of digital assets are being produced? For what audiences?
• What is the business process for submitting, selecting, editing, maintaining digital assets?
• How many digital assets are there? How fast is this growing?
• Are there particular industry or other standards that are important?
• What types of assets are hard to search for (that should be easier to find)?
• What tools would be helpful in locating assets? Acronyms? Abbreviations? Nicknames? Glossary? Thesaurus? Taxonomy?
• Who else should we be talking to?

Task 2 – Inventory content
1. Identify target asset file path/URL.
2. Automatically generate inventory metadata by crawling file stores.
3. Audit assets using inventory.
4. Enhance metadata with new facets.
Inventory columns: Path/URL; Spider-generated; Audit process; New facets
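
Step 2 can be approximated with a short crawler over a local file store. The sketch below is one possible shape, not the tooling used in the presentation: the function name `inventory`, the record keys, and the empty `asset_type` and `audience` placeholders for the new facets are all assumptions made for illustration.

```python
import os
from datetime import datetime, timezone


def inventory(root: str) -> list[dict]:
    """Walk a file store and emit one inventory record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                # Spider-generated metadata
                "path": path,
                "format": os.path.splitext(name)[1].lower().lstrip("."),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
                # New facets, left empty for the audit step to fill in
                "asset_type": None,
                "audience": [],
            })
    return records


if __name__ == "__main__":
    for record in inventory("."):
        print(record["path"], record["format"], record["size_bytes"])
```

Keeping the spider-generated fields and the hand-assigned facets in the same record mirrors the slide's layout, where the audit adds columns to the crawled inventory rather than starting a separate list.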

Task 3 – Specify metadata
Element | Data Type | Length | Req. / Repeat | Source | Purpose
Identifier | String | 48 chars | 1 | System supplied | Basic accountability
Author | String | Variable | * | LDAP validated | Credits
Title | String | Variable | ? | User | Text search, results display
Embargo | Date | Fixed | ? | System | Obey rights
Description | String | Variable | ? | User | Text search, results display
Asset Type | List | Fixed | 1 | Asset Types vocabulary | Browse or group search results
Subject
Audience | List | Fixed | * | Audience vocabulary | Custom interface for group of users
Location | List | Fixed | * | ISO 3166 | Filter or rank search results
Organization... | List | Fixed | * | Organization vocabulary | Key index to retrieve & aggregate assets
Legend: ? – 1 or more; * – 0 or more
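
A specification like this can be enforced when records are tagged. The following sketch checks only the Req. / Repeat column, reading the markers exactly as the slide's legend gives them ("1" is taken to mean exactly one value, "?" one or more, "*" zero or more). `FieldSpec`, `SPEC`, and `validate` are hypothetical names, and only a few of the fields are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    repeat: str  # "1", "?" or "*", interpreted per the slide's legend


# (minimum, maximum) number of values; None means no upper bound.
BOUNDS: dict[str, tuple[int, Optional[int]]] = {
    "1": (1, 1),
    "?": (1, None),
    "*": (0, None),
}

SPEC = [
    FieldSpec("Identifier", "1"),
    FieldSpec("Author", "*"),
    FieldSpec("Title", "?"),
    FieldSpec("Asset Type", "1"),
    FieldSpec("Location", "*"),
]


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of cardinality problems; empty means the record passes."""
    problems = []
    for spec in SPEC:
        count = len(record.get(spec.name, []))
        low, high = BOUNDS[spec.repeat]
        if count < low:
            problems.append(f"{spec.name}: needs at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{spec.name}: allows at most {high}, found {count}")
    return problems


print(validate({"Identifier": ["a1"], "Title": ["Annual report"], "Asset Type": ["Report"]}))
print(validate({"Identifier": ["a1", "a2"]}))
```

A fuller version would also check data types, lengths, and whether list values come from the named vocabulary, but cardinality alone already catches untitled or untyped assets.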

Task 4 – Model content
• Factor asset types from inventory into canonical types.
• Select examples from inventory (possibly with spider).
• Identify useful chunks for each asset type.
• Factor chunks into element superset.
• Identify relationships between chunks.
• Iterate until agree on asset types, elements, and relationships.
Page areas in the example layout: Header area; Left navigation area; Main content area; Footer area

Task 5 – Specify vocabularies
• Develop broad taxonomy outline (13 levels deep)
• Review, revise, and approve taxonomy outline with stakeholders and subject matter experts.
• Fill in taxonomy outline
• Tag random samples from content inventory
• Review, revise, and approve draft taxonomy with stakeholders and subject matter experts.

Task 6 – Specify procedures
• Develop taxonomy style rules, ensure that the taxonomy follows them.
• Develop tagging rules and procedures, along with software to assist in the task.
• Specify taxonomy maintenance process and the update procedures to follow.

Task 6 – Governance & Maintenance
• The taxonomy must be changed over time.
• Suggestions for changes can come from users (through query log analysis) and from staff (through a feedback form).
• A governance structure is needed to make sure changes are justified.
Diagram components: Firewall; Application UI; Application Logic; Tagging UI; Tagging Logic; Content; Taxonomy; End User; Tagging Staff; Taxonomy Editor
Diagram inputs: Query log analysis; Staff notes 'missing' concepts
Recommendations by Editor:
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New 'best bets' content
Steering Committee considerations:
1. Business goals
2. Change in user experience
3. Retagging cost
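
Query log analysis can start very simply: count what users search for and list the frequent queries that match no taxonomy label or synonym. The sketch below assumes the log is already available as a list of raw query strings; the function name `missing_concepts` and the `min_count` threshold are illustrative choices, not part of the presentation.

```python
from collections import Counter


def missing_concepts(
    queries: list[str],
    taxonomy_terms: list[str],
    min_count: int = 2,
) -> list[tuple[str, int]]:
    """Return frequent queries that match no taxonomy label or synonym.

    Matching is exact after lower-casing and trimming, which is
    deliberately naive; results are sorted by descending frequency.
    """
    known = {term.strip().lower() for term in taxonomy_terms}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [
        (query, count)
        for query, count in counts.most_common()
        if count >= min_count and query not in known
    ]


log = ["visa", "Visa", "passport", "passports", "passports ", "embassy"]
print(missing_concepts(log, ["Passport", "Visa"]))  # [('passports', 2)]
```

In this example the plural "passports" surfaces as a candidate synonym, the kind of small change (labels, synonyms) the Taxonomy Editor can recommend without retagging.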

Task 6 – Steering Committee Roles
Business Lead
• Keeps committee on track with larger business objectives
• Balances cost/benefit issues to decide appropriate levels of effort
  – Specialists help in estimating costs
• Obtains needed resources if those in committee can't accomplish a particular task
Technical Specialist
• Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
• Helps obtain data from various systems
Content Specialist
• Committee's liaison to content creators
• Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Taxonomy Specialist
• Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
• Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
• Reality check on process change suggestions

Task 7 – Train staff
Staff will require training on:
• The UI they use to tag the content
• The rules to follow when deciding what codes to apply
• The end-effect of the codes they apply
• The structure of the taxonomy
Tagging examples come from the content inventory.
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.

Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.

Figure: Indexing UI
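
The specificity rule works because a hierarchy lets software derive the broader terms on its own, while nothing can derive a narrower term from a generic tag. A minimal sketch, assuming the taxonomy is stored as a child-to-parent map (the `PARENT` data reuses the UNSPSC labels from the taxonomy example; `broader_terms` is a hypothetical helper):

```python
# Child -> parent links for one branch of the taxonomy.
PARENT = {
    "Mechanical pencils": "Writing Instruments",
    "Writing Instruments": "Office Supplies",
    "Office Supplies": "Office Equipment and Accessories and Supplies",
}


def broader_terms(term: str, parent: dict[str, str] = PARENT) -> list[str]:
    """Return every ancestor of a term, nearest first."""
    chain = []
    seen = {term}
    while term in parent:
        term = parent[term]
        if term in seen:
            raise ValueError(f"cycle in taxonomy at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


# An asset tagged with the most specific term can be found at any level.
print(broader_terms("Mechanical pencils"))

# An asset tagged only with a generic term gains nothing from expansion.
print(broader_terms("Office Equipment and Accessories and Supplies"))  # []
```

A search application can index an asset under its tag plus the result of `broader_terms`, so taggers only ever need to choose the most specific term.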

What about Automatic Categorization?
• Automatic vs. manual categorization is a cost/benefit tradeoff
  – Semi-automated recommended over pure manual in production situations.
  – Automatic performance not bad, but not equal to trained manual tagging.
    • Software is not sane, so errors look crazy.
  – Large backlogs of content can't justify investment in high-quality manual tagging
    • Old articles rarely accessed.
• Recommend automated bulk tagging with error reporting and correction process.
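
One way to picture semi-automated tagging is a suggester that applies confident matches and routes weak ones to a person. The sketch below uses plain keyword counting, which is far cruder than a production categorizer and is only meant to show the accept/review split; `suggest_tags`, the trigger-phrase vocabulary format, and the `min_hits` threshold are all assumptions.

```python
def suggest_tags(
    text: str,
    vocabulary: dict[str, list[str]],
    min_hits: int = 2,
) -> tuple[list[str], list[str]]:
    """Split candidate terms into (auto-accepted, needs human review).

    vocabulary maps each taxonomy term to its trigger phrases.
    A term is accepted when its phrases occur at least min_hits times,
    and queued for review when they occur at least once.
    """
    lowered = text.lower()
    accepted, needs_review = [], []
    for term, phrases in vocabulary.items():
        hits = sum(lowered.count(phrase.lower()) for phrase in phrases)
        if hits >= min_hits:
            accepted.append(term)
        elif hits > 0:
            needs_review.append(term)
    return accepted, needs_review


vocab = {"Passports": ["passport"], "Visas": ["visa"], "Travel": ["airline"]}
doc = "Renew a passport by mail. Passport fees changed; visa rules did not."
print(suggest_tags(doc, vocab))  # (['Passports'], ['Visas'])
```

The review queue is where the error reporting and correction process plugs in: staff confirm or reject the weak suggestions instead of tagging every backlog item by hand.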

What about automatically-created taxonomies?
• Typically a single hierarchy with no overall plan
• Results hard for people to navigate
What about automatic categorization?
• Accuracy close to human levels, but errors are very different
• Cost/benefit tradeoff
• Semi-automation is best practice

Enterprise taxonomy maintenance workflow
[Flowchart] Steps: Suggest new name/category; Review name; Problem? (Yes/No); Copy edit new name; Add to enterprise taxonomy
Swimlanes: Taxonomy Tool; Analyst; Editor; Copywriter; Sys Admin

Categorize with a purpose
What is the problem you are trying to solve?
• Improve search
• Browse for content on an enterprise-wide portal
• Enable users to syndicate content
• Otherwise provide the basis for content re-use
How will you control the cost of creating and maintaining the metadata needed to solve these problems?
• CMS with metadata tagging products
• Semi-automated classification
• Taxonomy editing tools
• Guided navigation tools

How do you sell it?
• Don't sell the taxonomy, sell the vision of what you want to be able to do
• Clearly understand what the problem is and what the opportunities are
• Costs and benefits
• Design the taxonomy in relation to the value at hand

Internet Resources

U.S. Government Resources
• http://www.nasa.gov/home/index.html
• http://pub-lib.jpl.nasa.gov/pub-lib/dscgi/ds.py/View/Collection-10
• http://www.loc.gov/flicc/wg/taxonomy.html
• http://www.loc.gov/lexico/servlet/lexico/
• http://www.archives.gov/federal_register/code_of_federal_regulations/thesaurus.html
• http://feapmo.gov/
• http://www.km.gov/

Other Resources
• http://www.educause.edu/asp/taxonomy/show_taxonomy_links.asp?TREE=1&EXPAND=1
• http://databases.unesco.org/thesaurus/
• http://www.naa.gov.au/recordkeeping/control/functions_thesaur/contents.html
• http://www.taxonomystrategies.com/html/bibliography.htm

Summary
• Why taxonomies?
• Why metadata?

Shiyali Ramamrita Ranganathan

Ranganathan's Five Laws of Library Science
1. Books are for use (They don't belong on the shelf)
2. Books are for all; every reader his book (Every reader is unique)
3. Every book its reader (Every book is unique)
4. Save the time of the reader (Make libraries easy to use)
5. A library is a growing organism (Libraries are constantly changing to meet changing patron needs)

Thank you
Michael Huff
Information Resource Officer
U.S. Department of State
huffmp@state.gov