  • Number of slides: 29

Building Taxonomies Part 3
Alice Redmond-Neal, Access Innovations, Inc.
Enterprise Search Summit, New York City, May 21, 2006
Copyright © 2006 Access Innovations, Inc.

Build a taxonomy – simple steps
• Get paper and pencil
  – Sharpen pencil
• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
You're done!

Define subject field
• Review a representative collection of content
• Determine:
  – Core areas
  – Peripheral topics
• Scope can be modified later
(Diagram: Sociology, Psychology, Education, Law)

Before you go on: Build or buy?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for:
  – Scope
  – Depth
    • Make-or-break terms
  – Cost
Don't reinvent the wheel!

Collect terms
• Your documents and databases
• Departmental terminology
• Textbooks and their indexes (indices)
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons and glossaries on the topic
• Web resources
• Users and experts
• Search logs

Gather terms from search logs
• Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002)
  – Take the top ~100 search terms from the search logs
  – Match each one to the page on the web site with the appropriate answer
  – Use the matches as the basis for favorites or best bets, presented at the top of the results list (AKA behavior-based taxonomy)
• Not a thesaurus or taxonomy, but still a useful source of terms
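The first step of the Accidental Thesaurus approach is a frequency count of queries. The following is a minimal Python sketch that assumes the search log has already been reduced to a list of raw query strings. The function name `top_search_terms` and the sample log are hypothetical. Matching each top query to the right page remains an editorial step.

```python
from collections import Counter


def top_search_terms(queries, n=100):
    """Return the n most frequent queries, normalized for case and whitespace."""
    # Fold case and trim spaces so "Benefits " and "benefits" are counted together.
    normalized = (q.strip().lower() for q in queries)
    counts = Counter(q for q in normalized if q)  # skip blank queries
    return counts.most_common(n)


# Hypothetical log excerpt: one raw query string per search.
log = ["benefits", "Benefits ", "holiday schedule", "benefits", "parking", "holiday schedule"]
for term, hits in top_search_terms(log, n=2):
    print(term, hits)
```

Case folding is a design choice. A real log may also need spelling normalization before counting, since variants are themselves candidate equivalence terms.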

Organize terms – roughly
• Sort terms into several major categories
  – Logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10–20 categories to start
  – Consider moving proper names to authority files
• Result: a loose collection of terms under several main headings
  – Rough and tentative; see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed

Labelling a concept – cognitive linguistics
• The most-used labels fall in the middle of the range from abstract to specific, which matters for search
• A linguistic universal, true across cultures
• Levels:
  – Unique beginner
  – Life form
  – Generic
  – Specific
  – Varietal
Practical application?
  – Generic: Insurance
  – Specific: Health insurance
  – Varietal: Group health insurance

Craft the Top Terms
• Toughest job and most important step!
• Dictates further organization
• Determines how browsers/searchers perceive the taxonomy
  – Coverage
  – Formality
• Establish the concept first, tweak the wording later

The term record
• Main Term (MT) = subject term, heading, node, category, descriptor, class
• Top Term (TT)
• Broader Terms (BT)
• Narrower Terms (NT)
• Related Terms (RT)
  – See also (SA)
• Scope Note (SN)
• History (H)
• Non-Preferred Term (NP)
  – Used for (UF), See (S)
Taxonomy: MT, TT, BT, NT. Thesaurus: all of the above plus RT, SN, H, and NP.
See the Lexicographer's lexicon.
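A term record maps naturally onto a small data structure. The following is a minimal Python sketch. The class and field names are hypothetical, chosen only to mirror the slide's abbreviations. It is one possible layout, not a standard exchange format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """A term record; field names are illustrative, keyed to the slide's abbreviations."""
    main_term: str                                        # MT
    top_term: Optional[str] = None                        # TT
    broader: List[str] = field(default_factory=list)      # BT
    narrower: List[str] = field(default_factory=list)     # NT
    related: List[str] = field(default_factory=list)      # RT / See also (SA)
    scope_note: str = ""                                  # SN
    history: str = ""                                     # H
    used_for: List[str] = field(default_factory=list)     # UF: non-preferred terms


elections = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Presidential elections", "Gubernatorial elections", "Mayoral elections"],
)
print(elections.main_term, "BT", elections.broader)
```

`default_factory=list` gives every record its own empty lists, so editing one record's NTs never leaks into another's.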

Usefulness of a term – the “duh” factor
• Some terms are so basic to a domain that they have little or no value
  – “Sports” in Sports Illustrated
  – “Technology” in Technology Review
  – “Golf” in Golf Magazine
• How useful will the term be for indexing?
  – Would it apply to everything in the domain?
  – Does it distinguish important concepts?
  – If the term is needed, specify limited use conditions in a Scope Note
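The question "Would it apply to everything in the domain?" can be roughly screened by measuring how many documents mention a candidate term. The following is a minimal sketch, assuming plain-text documents and case-insensitive substring matching. The 0.9 threshold and the function names are hypothetical. A flagged term still needs an editor's judgment.

```python
def coverage(term, documents):
    """Fraction of documents whose text mentions the term (case-insensitive substring match)."""
    if not documents:
        return 0.0
    t = term.lower()
    hits = sum(1 for doc in documents if t in doc.lower())
    return hits / len(documents)


def too_general(term, documents, threshold=0.9):
    """Flag terms that would apply to nearly every document in the collection."""
    return coverage(term, documents) >= threshold


docs = [
    "Golf course design trends",
    "Golf swing basics for beginners",
    "Golf equipment review: drivers",
    "Golf tournament results",
]
print(too_general("golf", docs))     # True: every document mentions golf
print(too_general("drivers", docs))  # False
```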

Hierarchy structures – variations on a theme
• Not predetermined
  – Wines / type / variety / region / cost
  – Or Wines / cost / type…
• Varies by user group and needs
  – May have multiple views of the same content
  – Standard alphabetical view or customized notation
• Affects information architecture, i.e. how the web site functions
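The wine example shows that the same items can be browsed through differently ordered hierarchies. The following is a minimal sketch that assumes each item is a dictionary of attribute values. The wine names, cost bands, and the `build_view` function are hypothetical. Changing only the `levels` list produces a different view of the same content.

```python
from collections import defaultdict


def build_view(items, levels):
    """Nest item names under their attribute values, in the order given by `levels`."""
    if not levels:
        return [item["name"] for item in items]
    first, rest = levels[0], levels[1:]
    groups = defaultdict(list)
    for item in items:
        groups[item[first]].append(item)
    # Sort the keys so the view is deterministic.
    return {key: build_view(groups[key], rest) for key in sorted(groups)}


wines = [
    {"name": "Wine A", "type": "Red", "cost": "Under $20"},
    {"name": "Wine B", "type": "White", "cost": "Under $20"},
    {"name": "Wine C", "type": "Red", "cost": "$20 and up"},
]
print(build_view(wines, ["type", "cost"]))  # Wines / type / cost
print(build_view(wines, ["cost", "type"]))  # Wines / cost / type
```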

How do terms relate?
• Hierarchical relationships: parents and their children (TAXONOMY)
• Equivalence relationships: aliases
• Associative relationships: cousins
(THESAURUS: all three relationship types)

Hierarchical relationships
• Broader Term represents the category
• Narrower Term represents the specific
• Three types:
  – Generic relationship (BTG/NTG)
  – Whole-part relationship (BTP/NTP)
  – Instance relationship (BTI/NTI)
• BTs/NTs have a reciprocal relationship
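Because BT and NT are reciprocal, every BT entry implies a matching NT entry on the other term, and vice versa. The following is a minimal sketch, assuming a thesaurus stored as a dictionary from each term to its BT and NT sets. The function names are hypothetical. One function writes both sides at once, and the other reports one-way links left by hand edits.

```python
def add_hierarchy(thesaurus, broader, narrower):
    """Record a BT/NT pair in both directions so the relationship stays reciprocal."""
    thesaurus.setdefault(broader, {"BT": set(), "NT": set()})["NT"].add(narrower)
    thesaurus.setdefault(narrower, {"BT": set(), "NT": set()})["BT"].add(broader)


def missing_reciprocals(thesaurus):
    """List (term, relation, other) entries whose reciprocal is missing on the other term."""
    problems = []
    for term, rels in thesaurus.items():
        for nt in rels["NT"]:
            if term not in thesaurus.get(nt, {}).get("BT", set()):
                problems.append((term, "NT", nt))
        for bt in rels["BT"]:
            if term not in thesaurus.get(bt, {}).get("NT", set()):
                problems.append((term, "BT", bt))
    return problems


t = {}
add_hierarchy(t, "Elections", "Presidential elections")
add_hierarchy(t, "Elections", "Mayoral elections")
print(missing_reciprocals(t))  # [] because add_hierarchy writes both sides

# A hand-edited, one-way link is caught:
t["Elections"]["NT"].add("Gubernatorial elections")
print(missing_reciprocals(t))  # [('Elections', 'NT', 'Gubernatorial elections')]
```

The same kind of check would apply to Related Terms, which are also reciprocal.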

Broader to Narrower Terms
• Politics (generic)
  – Elections (specific)
    • Presidential elections (varietal)
    • Gubernatorial elections
    • Mayoral elections

Hierarchy – Generic (genus-species) relationship
• Inheritance or inclusion: what's true of the parent (BT) is true of all its children (NTs)
• Applies to entities, actions, properties, and agents, not just biological taxonomies
Examples:
• Value
  – Cultural value
  – Economic value
  – Moral value
  – Social value
• Teachers
  – Adult educators
  – School teachers
  – Special ed teachers
  – Student teachers
• Thinking
  – Contemplation
  – Divergent thinking
  – Lateral thinking
  – Reasoning

Generic relationship test – 1
• Both terms are in the same fundamental category
• “All-and-some” test:
  – Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
  – Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests

Generic relationship test – 2
(Rodents, Pests, Squirrels)
✓ ALL squirrels are rodents
✗ NOT ALL squirrels are pests
✗ NOT ALL pests are rodents
So Squirrels is a valid NT of Rodents, but not of Pests.
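The all-and-some test can be stated in terms of sets. A valid generic NT must be a proper subset of its BT: ALL of the narrower concept's members belong to the broader concept, but only SOME of the broader concept's members belong to the narrower one. The following is a minimal sketch, assuming each concept can be stood in for by a sample set of members. The animal lists and the function name are hypothetical. In practice the test is a conceptual judgment, not a computation over data.

```python
def passes_all_and_some(narrower_members, broader_members):
    """Generic (BTG/NTG) test: ALL narrower are broader, and only SOME broader are narrower."""
    all_narrower_are_broader = narrower_members <= broader_members
    some_broader_are_not_narrower = bool(broader_members - narrower_members)
    return all_narrower_are_broader and some_broader_are_not_narrower


# Hypothetical sample members standing in for each concept.
rodents = {"gray squirrel", "red squirrel", "house mouse", "beaver"}
squirrels = {"gray squirrel", "red squirrel"}
pests = {"gray squirrel", "house mouse", "cockroach"}

print(passes_all_and_some(squirrels, rodents))  # True: Rodents BTG Squirrels
print(passes_all_and_some(squirrels, pests))    # False: not all squirrels are pests
```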

Hierarchy – Whole-part relationship
• Also known as meronymy or partonomy
• Four types allowed in thesaurus standards:
  – Body systems and organs
    • Ear NT Middle ear
  – Geographical locations
    • Bernalillo County NT Albuquerque
  – Fields of study
    • Geology NT Physical geology
  – Hierarchical organizational/corporate/social/political structures
    • Diocese NT Parish

Hierarchy – Instance relationship
• General category (common noun) = BT
• Individual example (proper noun) = NT
Examples:
• Seas
  – Baltic Sea
  – Caspian Sea
  – Mediterranean Sea
• New York museums
  – Guggenheim
  – Museum of Modern Art
  – Museum of Natural History
Essentially identical to the “final node” in taxonomies. Best practice: move long lists of instances to an authority file.

Polyhierarchical relationship
• A term can logically fit under more than one Broader Term, i.e. it can have Multiple Broader Terms (MBT)
• New in the ANSI/NISO standards
Examples:
• Sporks: BT Spoons, BT Forks
• Nurse administrators: BT Health administrators, …
• Accounting: BT Finance, BT Careers
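With Multiple Broader Terms, a term no longer has a single path to the top of the hierarchy. The following is a minimal sketch that lists every upward path. It assumes the hierarchy is stored as a mapping from each term to its list of BTs and contains no cycles (the function would recurse forever on a cycle). The function name is hypothetical.

```python
def paths_to_top(term, broader):
    """Return every path from a term up to a term with no BT (assumes an acyclic hierarchy)."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    paths = []
    for bt in parents:
        for path in paths_to_top(bt, broader):
            paths.append([term] + path)
    return paths


# Each term maps to its Broader Terms; both entries have Multiple Broader Terms (MBT).
broader = {
    "Sporks": ["Spoons", "Forks"],
    "Accounting": ["Finance", "Careers"],
}
for path in paths_to_top("Sporks", broader):
    print(" > ".join(path))  # Sporks > Spoons, then Sporks > Forks
```

A browsing interface can display such a term in several places, because each path corresponds to one place in the tree.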

Equivalence relationship
• Preferred Term
  – Thesaurus term, valid for indexing
  – Thesaurus notation: USE (points from a Non-Preferred Term to the Preferred Term)
• Non-Preferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point that directs the user to the Preferred Term
  – Thesaurus notation: UF or NPT
Examples:
• Spiders UF Arachnids
• Plant pathology USE Phytopathology
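USE and UF are two views of the same links. USE sends an entry term to its Preferred Term, and UF lists the aliases on the Preferred Term's record. The following is a minimal sketch built from the slide's two examples. It assumes case-insensitive lookup and lets unknown entries pass through unchanged (a real system might reject them instead). The names `USE`, `to_preferred`, and `used_for` are hypothetical.

```python
# USE references: Non-Preferred Term -> Preferred Term (examples from the slide).
USE = {
    "Plant pathology": "Phytopathology",
    "Arachnids": "Spiders",
}
_USE_LOOKUP = {alias.lower(): preferred for alias, preferred in USE.items()}


def to_preferred(entry):
    """Send a user's entry term to the Preferred Term; unknown entries pass through unchanged."""
    key = entry.strip()
    return _USE_LOOKUP.get(key.lower(), key)


def used_for(use_map):
    """Invert USE into UF lists, so each Preferred Term record shows its aliases."""
    uf = {}
    for alias, preferred in use_map.items():
        uf.setdefault(preferred, []).append(alias)
    return uf


print(to_preferred("plant pathology"))  # Phytopathology
print(used_for(USE))  # {'Phytopathology': ['Plant pathology'], 'Spiders': ['Arachnids']}
```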

Equivalence – when to use
• Synonyms, slang, quasi-synonyms
• Scientific and trade names
  – Ibuprofen UF Motrin™
• Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
• Upward posting of narrow concepts not specified in the taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class
Get equivalent terms from search logs, brainstorming…

Associative relationship
• Related Terms (RTs) ~ cousins
• “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
  – Should siblings be Related Terms?
• Both terms are valid thesaurus terms for indexing and have a reciprocal relationship
• Expands the user's awareness and reflects thesaurus coverage of unanticipated areas
• Standards describe specific types (see Lexicon)

Sibling rivalry and facets
• Format and sense of sibling terms should be consistent
• If siblings don't coexist well, separate them
• Subdivide large groups of terms into facets: mutually exclusive subcategories
• Demand is growing along with faceted navigation
• Facet examples
  – Properties, Materials, Agents, Actions, Influence
  – Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

Faceted classification
• Pharmaceuticals
  – (by action)
    • Anti-inflammatory agents…
  – (by chemical structure)
    • Alkaloids…
  – (by indication)
    • Pain…
  – (by use)
    • Immunosuppression…
Facet indicators (aka node labels) are not to be used for indexing.
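Facet indicators organize the display but must never be assigned to documents. The following is a minimal sketch that marks each node as either a `term` or a `facet` and collects only the indexable terms. It descends through the facet nodes so the terms beneath them are still found. The nested-dictionary layout and the function name are hypothetical. Only the first term under each facet from the slide is included.

```python
# Each hierarchy node is either an indexing term or a facet indicator (node label).
PHARMACEUTICALS = {
    "term": "Pharmaceuticals",
    "children": [
        {"facet": "(by action)", "children": [{"term": "Anti-inflammatory agents", "children": []}]},
        {"facet": "(by chemical structure)", "children": [{"term": "Alkaloids", "children": []}]},
        {"facet": "(by indication)", "children": [{"term": "Pain", "children": []}]},
        {"facet": "(by use)", "children": [{"term": "Immunosuppression", "children": []}]},
    ],
}


def indexing_terms(node):
    """Collect terms valid for indexing, skipping facet indicators but descending through them."""
    found = [node["term"]] if "term" in node else []
    for child in node.get("children", []):
        found.extend(indexing_terms(child))
    return found


print(indexing_terms(PHARMACEUTICALS))
```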

Faceting challenge
Propose facet indicators and subgroup these paint varieties into facets.
• Paint
  – Oil paint
  – High-gloss paint
  – Interior paint
  – Matte paint
  – Latex paint
  – Semi-gloss paint
  – Exterior paint

Do you agree?
• Paint
  – (by type)
    • Oil paint
    • Latex paint
  – (by use)
    • Interior paint
    • Exterior paint
  – (by surface)
    • High-gloss paint
    • Matte paint
    • Semi-gloss paint
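A proposed faceting can be checked mechanically against the original list. Every variety should land in exactly one facet, with no overlaps and nothing left out. The following is a minimal sketch using the paint example. The function name `check_facets` is hypothetical. It only checks placement; whether the facet indicators themselves make sense is still an editorial call.

```python
PAINTS = {
    "Oil paint", "High-gloss paint", "Interior paint", "Matte paint",
    "Latex paint", "Semi-gloss paint", "Exterior paint",
}

PROPOSED_FACETS = {
    "(by type)": {"Oil paint", "Latex paint"},
    "(by use)": {"Interior paint", "Exterior paint"},
    "(by surface)": {"High-gloss paint", "Matte paint", "Semi-gloss paint"},
}


def check_facets(terms, facets):
    """Report terms placed in more than one facet and terms left out of every facet."""
    seen = {}
    for indicator, members in facets.items():
        for term in members:
            seen.setdefault(term, []).append(indicator)
    overlaps = {t: f for t, f in seen.items() if len(f) > 1}
    unplaced = terms - set(seen)
    return overlaps, unplaced


overlaps, unplaced = check_facets(PAINTS, PROPOSED_FACETS)
print(overlaps, unplaced)  # {} set()
```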

Scope Notes (SN)
• Indicate the meaning of the term in the context of this thesaurus, for this audience
  – Stress: Metal, Psychological, Physiological
• Indicate any restriction in meaning
• Indicate the range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use only as needed, not for every term
• Establish and stick with a consistent format
• Be concise