Enhancing search: An update on taxonomies, metadata and thesauri
Leonard Willpower Information

Summary
1. Metadata creation is cataloguing
2. Taxonomies are classifications
3. Thesauri and classifications are complementary ways of grouping concepts
4. Facet analysis is a useful technique for constructing schemes systematically
5. Most computer search interfaces are inadequate

Metadata = catalogue records
• Resources: any things that can be identified
  – documents, web pages, images, sound files, teaching packages, books, museum objects, people, organisations
• Metadata: structured information about resources
  – May be included with resources (e.g. “CIP”) or collected in separate “union catalogues” (e.g. OAI-PMH)
  – Some from the resource itself (size, format), some from external sources (provenance, location, accessibility)

Metadata standards
• Anglo-American Cataloguing Rules (AACR)
• Encoded Archival Description (EAD)
• Learning Object Metadata (LOM)
• Spectrum standard for museum information
• Friend of a Friend (FOAF) and vCard
• e-Government Metadata Standard (e-GMS)
• Dublin Core - lowest common denominator

Kinds of standards
• Content standards: which pieces of information are to be recorded (DC, AACR)
• Value standards: how the information is to be recorded (= DC encoding schemes)
  – formats (ISO date format, NCA name formats, AACR)
  – lists of valid values (thesauri, authority files)
• Structure standards: how the information is to be grouped and labelled for use by computers and humans (XML schemas, MARC)
• Application profiles: choices from the above

Dublin Core metadata
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
+ element refinements

Subject
“Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.”

Taxonomies = controlled vocabularies
• “Taxonomy”: woolly meaning -> confusion
  – keep it for biological classification systems
• Knowledge organization systems (KOS)
  – a better expression for the general concept
• Main types are
  – thesauri
  – classification schemes
  – ontologies

Thesauri and classification schemes
• Thesauri and classification schemes are alternative ways of showing concepts and their relationships
• They are complementary and both approaches are needed
• They can both be built on the principles of facet analysis

Building blocks of all knowledge organisation schemes
• concepts
• relationships

35 mm cameras
  CC: H012
  BT: film cameras

aqualungs
  CC: D002
  BT: diving equipment

camera accessories
  CC: H002
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras
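
The records above can be held in a small data structure. The following is only a sketch: the `Concept` class, its field names and the `missing_reciprocals` check are illustrative assumptions, not part of any thesaurus standard or package.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus record: a preferred term plus its relationships."""
    term: str
    cc: str = ""                               # classification code (CC)
    bt: list = field(default_factory=list)     # broader terms (BT)
    nt: list = field(default_factory=list)     # narrower terms (NT)
    rt: list = field(default_factory=list)     # related terms (RT)


# The three records shown on the slide.
records = [
    Concept("35 mm cameras", cc="H012", bt=["film cameras"]),
    Concept("aqualungs", cc="D002", bt=["diving equipment"]),
    Concept("camera accessories", cc="H002", bt=["photographic equipment"],
            nt=["flash guns", "light meters", "tripods"], rt=["cameras"]),
]
thesaurus = {c.term: c for c in records}


def missing_reciprocals(thesaurus):
    """List BT links whose inverse NT link is not recorded (hypothetical check)."""
    problems = []
    for concept in thesaurus.values():
        for broader in concept.bt:
            target = thesaurus.get(broader)
            if target is None or concept.term not in target.nt:
                problems.append((concept.term, "BT", broader))
    return problems
```

Because the slide shows only three records, every BT link here is reported as lacking its reciprocal; in a complete thesaurus the list should be empty.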

Relationships are between concepts, not words
[Diagram: two concepts linked by BT/NT. Broader concept, labels and notations: vehicles, road vehicles, conveyances, voitures; 388.34, 629.2. Narrower concept, labels and notations: cars, automobiles, autos, private cars; 388.342, 629.222.]
Choose one term as a descriptor to label the concept:
cars USE automobiles

Preferred term substitution
“Anything on farming?”
“I use the term agriculture for farming, so I’ll search for that.”
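
Preferred term substitution amounts to a lookup from non-preferred terms to descriptors. A minimal sketch, assuming a plain dictionary of USE references; the two entries come from the slides, while the `preferred` function name is hypothetical:

```python
# Non-preferred term -> preferred term (the USE relationship).
USE = {
    "farming": "agriculture",
    "cars": "automobiles",
}


def preferred(term: str) -> str:
    """Return the descriptor for a term; unknown terms pass through unchanged."""
    key = term.strip().lower()
    return USE.get(key, key)


query = preferred("Farming")  # the search is then run on "agriculture"
```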

Relationships between concepts
• Paradigmatic, or a priori: apply generally, independently of any specific document
  – shoes BT footwear
  – shoes RT shoemakers
  A thesaurus can show these
• Syntagmatic, or a posteriori: concepts that are related only in the context of a specific document
  – shoes : history
  – shoes : prices
  A classification scheme can also show these

Searching hierarchies
“I need information on road vehicles.”
“I know that buses, cars and lorries are all kinds of road vehicles, so I’ll search for these terms as well as for road vehicles.”
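
Hierarchical searching can be sketched as query expansion over NT links. The `NT` table below holds only the example from the slide, and `expand` is a hypothetical helper, not a function of any particular search package:

```python
# Broader term -> its narrower terms (NT).
NT = {
    "road vehicles": ["buses", "cars", "lorries"],
}


def expand(term, nt=NT):
    """Return the term followed by all of its narrower terms, at every level."""
    found, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(nt.get(current, []))
    return found


# Build an OR query from the expanded term list.
query = " OR ".join(expand("road vehicles"))
```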

Searching related terms
“Please give me information about agriculture.”
“OK, I’ll look for that. Would you also be interested in items dealing with forestry, livestock or pet breeding?”

Paradigmatic relationships in a thesaurus
• Many relationships are indicated as RT/RT, but their nature is not specified, so they cannot be used for systematic grouping (ontologies overcome this)
• The hierarchical generic-specific relationship (BT/NT) allows (requires) grouping of concepts into facets - the terms have to be in the same facet

What is a facet?
(Sometimes called a fundamental facet)
A high-level grouping of concepts of the same inherent category, e.g. activities, disciplines, people, materials, places, times.
For example:
· animals, mice, daffodils and bacteria could all be members of a living organisms facet;
· digging, writing and cooking could all be members of an activities facet;
· birthdays, wars and football matches could all be members of an events facet.
A concept cannot belong to more than one facet

What is an array?
(Sometimes called a subfacet)
A grouping of concepts within a facet by some stated characteristic of division.

vehicles
. bicycles
. tricycles
. four-wheeled vehicles
. . automobiles
. goods vehicles
. . lorries
. passenger vehicles
. . automobiles
. . buses

[Diagram annotations: each group of sibling terms is an array; node labels show the characteristics of division.]
A concept may occur in more than one array

Parametric search
• Searching for resources that have one or more specified characteristics
• e.g. vehicles which
  – have three wheels AND
  – are used for carrying passengers
• This is an important and useful aspect of post-coordinate searching, but it is not faceted classification
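
A parametric search like the one described can be sketched as filtering records on attribute values. The records, attribute names and the `parametric_search` function below are invented for illustration only:

```python
# Hypothetical catalogue records, each described by two parameters.
vehicles = [
    {"name": "bicycle", "wheels": 2, "carries": "passengers"},
    {"name": "tricycle", "wheels": 3, "carries": "passengers"},
    {"name": "three-wheeled van", "wheels": 3, "carries": "goods"},
    {"name": "bus", "wheels": 4, "carries": "passengers"},
]


def parametric_search(items, **criteria):
    """Return the items that match every given parameter (logical AND)."""
    return [
        item for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    ]


# "have three wheels AND are used for carrying passengers"
matches = parametric_search(vehicles, wheels=3, carries="passengers")
```

The filter combines independent characteristics of each resource; it says nothing about how the concepts themselves are grouped into facets, which is the distinction the slide draws.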

Ways of displaying concepts and their paradigmatic relationships
1. Alphabetically, with their relationships

35 mm cameras
  BT: film cameras

aqualungs
  BT: diving equipment

camera accessories
  BT: photographic equipment
  NT: flash guns
      light meters
      tripods
  RT: cameras

Ways of displaying concepts and their paradigmatic relationships
2. Hierarchically - one tree for each facet

(fields of work)
. diving
. photography
. physics
. . optics

(people)
. infants
. children
. adults
. divers
. models (people)
. photographers
. physicists

(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits
. . . wet suits
. . face masks
. photo equipment
. . cameras

Ways of displaying concepts and their paradigmatic relationships
3. In subject groups or categories (microthesauri) - one tree for each facet in each category

770: PHOTOGRAPHY
(fields of work)
. photography
. . colour photography
(people)
. models (people)
. photographers
(equipment)
. photo equipment
. . cameras

797.23: DIVING
(fields of work)
. diving
. . scuba diving
. . snorkel diving
(people)
. divers
(equipment)
. diving equipment
. . aqualungs
. . diving suits
. . . dry suits

Combining concepts : syntagmatic relationships

(places)
A1 Italy
A2 The Netherlands
A3 Russia

(people)
B1 potters
B2 repairers
B3 ceramicists

(activities)
C1 moulding
C2 throwing
C3 decoration

(objects)
D1 earthenware
D2 porcelain
D3 stoneware

[Diagram annotation: node labels show facet names.]

Combine to express compound subjects
either post-coordinate, for searching:
  porcelain AND decoration AND Russia
or pre-coordinate, for browsing:
  porcelain decoration in Russia: D2 C3 A3
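
Post-coordinate combination can be sketched as an AND over the concepts assigned to each document. The document identifiers and their indexing below are invented for illustration; only the concept names come from the slide:

```python
# Hypothetical documents, each indexed with concepts from different facets.
index = {
    "doc1": {"porcelain", "decoration", "Russia"},
    "doc2": {"porcelain", "moulding", "Italy"},
    "doc3": {"earthenware", "decoration", "Russia"},
}


def search_all(index, *concepts):
    """Return the documents indexed with every one of the given concepts."""
    wanted = set(concepts)
    return sorted(doc for doc, terms in index.items() if wanted <= terms)


hits = search_all(index, "porcelain", "decoration", "Russia")
```

The concepts are stored separately and combined only at search time, in contrast to the pre-coordinated string D2 C3 A3, which is assembled when the document is classified.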

Order of combining facets
thing - kind - part - property - material - process - operation - system operated on - product - byproduct - agent - space - time - form
e.g. porcelain (thing) decoration (process) in Russia (space)
A facet may occur more than once in a string
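
Applying this order can be sketched as a sort on each concept's role. The sketch assumes that "process" and "operation" are separate steps in the sequence; the `cite` function and the (term, role) pairs are illustrative, not a standard API:

```python
# The order of combination listed on the slide.
CITATION_ORDER = [
    "thing", "kind", "part", "property", "material", "process", "operation",
    "system operated on", "product", "byproduct", "agent", "space", "time",
    "form",
]


def cite(concepts):
    """Arrange (term, role) pairs in citation order.

    sorted() is stable, so when a role occurs more than once its terms
    keep the order in which they were supplied.
    """
    ranked = sorted(concepts, key=lambda pair: CITATION_ORDER.index(pair[1]))
    return [term for term, _role in ranked]


string = cite([("Russia", "space"), ("decoration", "process"),
               ("porcelain", "thing")])
```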

Faceted classification with processes subordinated to objects

(processes)
A       ceramic production processes in general
AA      forming in general
AAA     coiling
AAB     moulding
AAC     throwing
AB      decoration in general
ABA     glazing
ABB     transfer printing

(objects)
B       ceramics in general
BB      earthenware in general
  (processes)
BB.AA   forming of earthenware
BB.AAB  moulding of earthenware
BB.AB   decoration of earthenware
BB.ABA  glazing of earthenware
BB.ABB  transfer printing of earthenware
BC      porcelain in general
  (processes)
BC.AA   forming of porcelain
BC.AAB  moulding of porcelain

Words shown in blue on the slide may be omitted, as they are implied by the hierarchical structure.

Faceted classification: generation of subject strings

(objects)
B       ceramics
BB      earthenware
  (processes)
BB.AA   forming
BB.AAB  moulding
BB.AB   decoration
BB.ABA  glazing
BB.ABB  transfer printing
BC      porcelain
  (processes)
BC.AA   forming
BC.AAB  moulding

ceramics > earthenware > forming > moulding
ceramics > earthenware > decoration > glazing
ceramics > earthenware > decoration > transfer printing
ceramics > porcelain > forming > moulding
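
The subject strings can be derived mechanically from the notation, because each prefix of a code names a step in the hierarchy. A minimal sketch, assuming the notation is written without spaces (for example BB.AAB) and that the top-level process class A is left out of the string, as in the examples above; `prefixes` and `subject_string` are hypothetical names:

```python
# Captions for each notation element, taken from the schedule on the slide.
CAPTIONS = {
    "AA": "forming", "AAB": "moulding",
    "AB": "decoration", "ABA": "glazing", "ABB": "transfer printing",
    "B": "ceramics", "BB": "earthenware", "BC": "porcelain",
}


def prefixes(code, start=1):
    """Return the successive prefixes of a code, e.g. 'BB' -> ['B', 'BB']."""
    return [code[:n] for n in range(start, len(code) + 1)]


def subject_string(notation):
    """Expand a notation such as 'BB.AAB' into its chain of captions."""
    obj, _, proc = notation.partition(".")
    # Process prefixes start at length 2, so the general class "A" is skipped.
    steps = prefixes(obj) + (prefixes(proc, start=2) if proc else [])
    return " > ".join(CAPTIONS[step] for step in steps)


example = subject_string("BB.AAB")
```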

Alphabetical index

ceramic production processes                              A
ceramics                                                  B
coiling : forming : ceramic production                    AAA
decoration : ceramic production                           AB
decoration : earthenware : ceramics                       BB.AB
earthenware : ceramics                                    BB
forming : ceramic production                              AA
forming : earthenware : ceramics                          BB.AA
forming : porcelain : ceramics                            BC.AA
glazing : decoration : ceramic production                 ABA
glazing : decoration : earthenware : ceramics             BB.ABA
moulding : earthenware : ceramics                         BB.AAB
moulding : forming : ceramic production                   AAB
moulding : porcelain : ceramics                           BC.AAB
porcelain : ceramics                                      BC
throwing : forming : ceramic production                   AAC
transfer printing : decoration : ceramic production       ABB
transfer printing : decoration : earthenware : ceramics   BB.ABB
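
Index entries of this kind read the hierarchy from the specific term back to the general one. The sketch below reproduces only the three entries for the objects facet; it does not attempt the compound entries, whose construction is not spelled out on the slide. The `index_entry` function is a hypothetical name:

```python
# Captions for the objects facet, keyed by notation.
OBJECTS = {"B": "ceramics", "BB": "earthenware", "BC": "porcelain"}


def index_entry(notation, captions=OBJECTS):
    """Build an entry from the most specific caption back to the most general."""
    chain = [captions[notation[:n]] for n in range(1, len(notation) + 1)]
    return " : ".join(reversed(chain))


# Sort by entry text to obtain the alphabetical sequence.
entries = sorted((index_entry(code), code) for code in OBJECTS)
```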

The same concepts viewed in different ways

Classification view
· Good for browsing or surveying a topic
· Like a map
· Like a book’s contents page
· Shows related concepts together
· Usually arranged by discipline
· Shows syntagmatic and paradigmatic relationships
· Shows compound topics as pre-combined subject strings

Thesaurus view
· Good for searching if you know what you want
· Like a gazetteer
· Like a book’s index
· Gets quickly to individual concepts
· Usually arranged by facet
· Shows paradigmatic relationships
· Lets you combine concepts when searching

Some clarifications
• A classification can be both hierarchical and faceted
• A classification built on faceted principles can be enumerative
• A symbolic notation is not essential, and should not determine the structure
• A classification can arrange compound topics in a useful linear sequence - a thesaurus cannot
• One-to-one mapping between a thesaurus and a classification is not possible
• A “guide to popular topics” may be used to supplement a systematic classification

Use of a thesaurus
• A thesaurus as a search aid with unindexed material
  – Allows searching on terms linked to the term asked for
• Software support for formulating questions
  – Browsing the thesaurus to choose terms
  – Combining terms with AND, OR, NOT and ( )

An ambiguous search interface
Does this mean:
  (lorries OR cars) AND diesel ?
or does it mean:
  lorries OR (cars AND diesel) ?
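
The two readings return different results, which is why an interface needs explicit grouping. A small sketch over three invented documents; the titles and their term sets are assumptions made for illustration:

```python
# Hypothetical documents and the terms each one is indexed with.
docs = {
    "petrol lorries": {"lorries"},
    "diesel cars": {"cars", "diesel"},
    "petrol cars": {"cars"},
}


def reading_a(terms):
    """(lorries OR cars) AND diesel"""
    return ("lorries" in terms or "cars" in terms) and "diesel" in terms


def reading_b(terms):
    """lorries OR (cars AND diesel)"""
    return "lorries" in terms or ("cars" in terms and "diesel" in terms)


hits_a = sorted(name for name, terms in docs.items() if reading_a(terms))
hits_b = sorted(name for name, terms in docs.items() if reading_b(terms))
```

With these documents the first reading finds only "diesel cars", while the second also finds "petrol lorries".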

Thesaurus creation and management
• Standards
  – BS/ISO standards give helpful guidance
  – Draft revised BS standard now out for comments
• Software
  – Many packages available
  – Best if integrated with database used for cataloguing
• Cooperative thesaurus development and use
  – DIY is a major and continuing task

Thesaurus development never ends
• It is an ongoing task
• It needs a knowledgeable thesaurus editor
• It needs cooperation and input from indexers and users
• User feedback

What we need
• Software for the combined development of thesaurus and classification
  – Thesaurofacet; Classaurus; ROOT; Bliss; Taxomita
• Software support for combining facets when searching, using a thesaurus. Often referred to as faceted classification, but not the same thing
  – Flamenco; View-based searching; No zero match (NZM)
• Software support for browsing in a classified catalogue with notation, captions and an alphabetical index

Links and further information
<http://www.willpowerinfo.co.uk/>