4ebb9ce160990cc82e92c8f44357726e.ppt
- Number of slides: 34
Glasgow, 3-5 September, CILIP Cataloguing and Indexing Group Conference
CLASSIFICATION ON THE NETWORK: MACHINE READABLE, SHARED, CLONED AND HIDDEN
Aida Slavic, associate editor of the UDC, aida.slavic@udcc.org
CLASSIFICATION ON THE NETWORK
§ the Internet brought about the use of classification outside the bibliographic domain:
  • broad knowledge browsing and presentation (initially)
  • automatic classification
§ classification moves 'behind the screen' with digital repositories and cross-repository resource discovery:
  • information integration and searching across distributed collections
  • mapping between vocabularies
  • supporting cross-language searching
  • supplementing simple text retrieval techniques to enable search expansion
  • alerting services
  • filtering by subject area for various types of reports, auditing and statistics
  • a source of vocabulary for building new vocabularies
§ common to all of these is an interest in readily available, rich classification data on which services and tools can be built at lower cost
VOCABULARY SHARING ON THE NETWORK
§ Need for generally applicable standards for representing vocabularies in a machine-readable way
§ Preference for XML and XML/RDF technology, to promote domain, system and platform independence
§ Publishing, exposing and sharing controlled vocabularies on the network:
  • ISO/IEC 13250 Topic Maps
  • BS 8723 Structured vocabularies for information retrieval
  • Simple Knowledge Organization System (SKOS)
See: Sharing Vocabularies on the Web via SKOS
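As a rough illustration of what exposing a classification in SKOS involves, the sketch below serializes two UDC classes from the particle physics example later in this presentation as SKOS concepts in Turtle. The `ex:` namespace, the local identifiers and the helper names are hypothetical; only the SKOS terms (`skos:Concept`, `skos:ConceptScheme`, `skos:notation`, `skos:prefLabel`, `skos:inScheme`, `skos:broader`) come from the SKOS vocabulary itself. It builds strings directly instead of using an RDF library, so treat it as a sketch rather than a validated exporter.

```python
from typing import Optional

# The SKOS namespace is real; ex: is a hypothetical namespace for this sketch.
HEADER = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/udc/> .\n\n"
    "ex:scheme a skos:ConceptScheme .\n\n"
)


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def skos_concept(local_id: str, notation: str, label: str,
                 lang: str = "en", broader: Optional[str] = None) -> str:
    """Serialize one class as a skos:Concept (identifiers are hypothetical)."""
    statements = [
        "a skos:Concept",
        f"skos:notation {turtle_literal(notation)}",
        f"skos:prefLabel {turtle_literal(label)}@{lang}",
        "skos:inScheme ex:scheme",
    ]
    if broader is not None:
        # The hierarchical link is stated explicitly, not left implicit in the notation.
        statements.append(f"skos:broader ex:{broader}")
    return f"ex:{local_id} " + " ;\n    ".join(statements) + " .\n"


turtle = (HEADER
          + skos_concept("nucleons", "539.125", "Nucleons")
          + skos_concept("protons", "539.125.4", "Protons", broader="nucleons"))
print(turtle)
```

Because `skos:broader` is declared explicitly, a program consuming the data can navigate the hierarchy without knowing anything about UDC notation rules.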
VOCABULARY SERVICES & REGISTRIES
Making the content of knowledge organization systems (KOS) available through web services: an initiative by NKOS (Network Knowledge Organization Systems and Services, http://nkos.slis.kent.edu/)
For registries we need:
§ machine-accessible vocabularies using standard representations and access protocols
§ metadata for describing KOS (using a standard for identifying and describing vocabularies)
§ a business case / cost effectiveness
§ upload of vocabularies into registries by their owners, with regular maintenance and upload of new versions
See: Tudhope, D. Knowledge Organization System Services: brief review of NKOS activities and possibility of KOS registries http://www.iskouk.org/presentations/tudhope_ISKOUKseminar1.pdf
CLASSIFICATION & SEMANTIC WEB
§ classification's capacity to represent and control complex semantic relationships across the universe of knowledge is compatible with the goals of the semantic web: universal and meaningful linking of concepts
§ large collections of resources already organized according to classification schemes are a source of concept/subject relationships that can be used to improve automatic information integration
§ prerequisites: full machine readability of data; open access to classification data on the network
ABOUT CLASSIFICATION: OUTLINE
§ role of classification in supporting subject access
§ subject authority control: managing, sharing and reuse of classification
§ improving classification source data
SEMANTIC RELATIONSHIPS
Words alone can only be ordered alphabetically:
Antineutrinos, Antineutrons, Antiprotons, Atomic physics, Baryons, Beta-particles, Bosons, Electrons, Hadrons, Hyperons, Leptons, Mesons, Molecular physics, Muons, Neutrinos, Neutrons, Nuclear physics, Nuclei, Nucleons, Positrons, Protons, Resonances
Grouping concepts into classes according to similarity:
539.1 Nuclear physics. Atomic physics. Molecular physics
539.12 Elementary and simple particles
539.123/.124 Leptons. Including: Muons
539.123 Neutrinos
539.123.6 Antineutrinos
539.124 Electrons (including beta-particles)
539.124.6 Positrons
539.125/.126 Hadrons. Baryons and mesons
539.125 Nucleons
539.125.4 Protons
539.125.46 Antiprotons
539.125.5 Neutrons
539.125.56 Antineutrons
539.126.3 Mesons
539.126.4 Resonances
539.126.6 Hyperons
SEMANTIC RELATIONSHIPS
§ alphabetical order (Antineutrinos, Antineutrons, Antiprotons ... Resonances): no semantic relationships
§ systematic order (539.1 Nuclear physics ... 539.126.6 Hyperons): semantic relationships fixed by notation
§ NOTATION: enables mechanical ordering of subjects
WORDS
§ classification is 'language independent', but...
§ words are an essential part of every classification system
§ separating concepts from words by means of notation simply means that any number of natural-language expressions can be attached to every class notation to provide access points
§ verbal access points are managed separately as:
  • captions
  • subject-alphabetical index (relative index, chain index)
  • alphabetical controlled vocabularies (thesauri, subject headings)
  • folksonomy
HIERARCHICAL ORGANIZATION
6 Applied sciences. Medicine. Technology
62 Engineering. Technology in general
   Mechanical engineering in general. Nuclear technology. Electrical engineering. Machinery
   Machine elements. Motive power engineering. Materials handling. Fixings. Lubrication
   Fastening, fixing devices. Fasteners
621.882 Threaded fasteners. Screws. Nuts and bolts. Washers
   Screws, bolts according to head form. Screws and bolts for various materials
   Screws and bolts according to head form
   Other polygonal-headed screws and bolts
621.882.214.2 Screws and bolts with knurled or milled head. Thumb screws
§ freedom to choose and change the level of specificity
§ browsing function
§ semantic search expansion
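In the simplest case the hierarchy above can be recovered mechanically from the notation: each digit adds one level, so truncating the notation yields the chain of broader classes used for browsing and search expansion. The sketch below assumes exactly that (one digit per level, a point after every third digit) and ignores ranges such as 539.123/.124 and all auxiliaries; `broader_classes` is a hypothetical helper, not part of any UDC tool.

```python
def broader_classes(notation: str) -> list[str]:
    """Return the broader classes of a plain decimal notation, top level first.

    Assumes one digit per hierarchical level and a point after every third
    digit; ranges (539.123/.124) and auxiliaries are not handled.
    """
    digits = notation.replace(".", "")
    chain = []
    for length in range(1, len(digits)):
        prefix = digits[:length]
        # Re-insert the points: 6218822 -> 621.882.2
        groups = [prefix[i:i + 3] for i in range(0, len(prefix), 3)]
        chain.append(".".join(groups))
    return chain


print(broader_classes("621.882.214.2"))
```

For 621.882.214.2 this yields nine broader classes, from 6 down to 621.882.214, one for each caption above the thumb screws in the list.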
UNIVERSAL KNOWLEDGE CLASSIFICATION – ASPECT CLASSIFICATIONS
§ organizes the universe of knowledge by disciplines, based on some scientific and educational consensus (criticism!)
§ groups phenomena according to the way they are researched, described and studied in documents
§ assumption: collocating books by the field in which they are used saves the user's time. Users looking for books on managing rabbits as pests will not be interested in the fur industry or the physiology of rodents... They will find all books on rodent pest control in close proximity
§ the same phenomenon will find its place in every discipline in which it may be a subject of study
SUBJECT CONTEXT – ASSOCIATIVE RELATIONSHIPS
Zoology > Mammals > Rodentia. Lagomorpha > Myiomorpha > Muridae. Mice and rats > Mouse
   see also
Agriculture > Plant protection > Control of plant diseases and pests > Destruction of vertebrate pests > Mouse
   see also
Agriculture > Animal husbandry > Rodents kept for fur > Mouse
   see also
Chemical industry > Pest-control chemicals > Chemicals for controlling rodents. Rodenticides > Mouse
LINEAR PRESENTATION OF KNOWLEDGE
§ the role of classification is to establish a systematic, linear presentation of knowledge, i.e. an order of classes
§ with respect to the flexibility of access points, there are two types of classification:
  • enumerative: a single, pre-established order of simple and complex subjects (e.g. Dewey, LCC)
  • faceted and semi-faceted: a range of options in class ordering, control over access points to subjects, and unlimited combination of subjects
SUBJECT ACCESS POINTS
Bibliographic classifications are designed to denote the following elements of content:
§ subject and subject facets: entity (its parts, kinds), processes, materials, agents, operations, instruments, space, time
§ relationships between subjects treated within the document (influence, bias, application, comparison)
§ inner form of presentation: theoretical, historical, critical
§ outer form of presentation, such as audience, purpose, form of expression
§ manifestations: text, image, sound
§ carriers: paper, magnetic/optical discs, film, analogue recordings
CLASSIFICATION VOCABULARY (e.g. UDC)
§ MAIN CLASSES (DISCIPLINES)
§ COMMON AUXILIARY NUMBERS:
  • LANGUAGE =…
  • FORM (0…)
  • PLACE (1/9)
  • ETHNICS (=…)
  • TIME "…"
  • PROPERTIES -02
  • MATERIALS -03
  • RELATIONS -04
  • PERSONS -05
§ SPECIAL AUXILIARY NUMBERS: -1/-9, .0, '
SYNTHESIS
MAIN TABLES (Discipline 1, Discipline 2, Discipline 3 ...) + COMMON AUXILIARIES (Materials, Language, Time, Form, Place ...)
Example: 81 Linguistics and languages
§ main table numbers: 811.12.22, 811.12.24, 811.12.3/.4, 811.12.3, 811.12.4, 811.12.58 (German, Upper German, Middle German, Low German, Plattdeutsch, Frisian, Dutch, Dutch-based pidgin and creole)
§ SPECIAL AUXILIARY NUMBERS:
  • '1/'9 General theory of linguistics: '1 Metatheory; '2 Subject fields, facets of linguistics; '34 Phonetics. Phonology; '35 Graphemics. Orthography; '36 Grammar; '37 Semantics
  • -1/-9 Schools, trends, methods: -116 Structuralism; -116.2 Geneva school
  • '0 Origin and periods
§ COMMON AUXILIARIES OF PLACE (1/9): (4) Europe; (430) Germany; (436) Austria; (437.3) Czech Republic; (437.5) Slovakia; (438) Poland
RELATING SUBJECTS ACROSS DISCIPLINES = PHASE RELATIONSHIPS
§ 37:004 Education : Computers
§ 338.48:61 Tourism : Medicine
§ 602.72:17 Embryonic cloning : Ethics
-04 Relations, Processes and Operations
  -042 Phase relations
  -042.1 Bias phase
  -042.2 Comparison phase
  -042.3 Influence phase
  -042.4 Tool phase. Exposition phase
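One practical consequence of colon combination is that each component can serve as a separate access point. The following sketch indexes the three classmarks above so that, for example, a document classed 37:004 is retrieved both under 37 and under 004. The document identifiers and the helper names are hypothetical, and bracketing, auxiliaries and the -042 phase auxiliaries are deliberately ignored.

```python
def access_points(classmark: str) -> list[str]:
    """Split a colon-combined number into its component class numbers.

    Brackets, auxiliaries and the -042 phase auxiliaries are not handled.
    """
    return [part.strip() for part in classmark.split(":") if part.strip()]


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each component class number to the documents classed with it."""
    index: dict[str, set[str]] = {}
    for doc_id, classmark in records.items():
        for number in access_points(classmark):
            index.setdefault(number, set()).add(doc_id)
    return index


# Hypothetical document identifiers; the classmarks are those on the slide.
records = {"doc-1": "37 : 004", "doc-2": "338.48:61", "doc-3": "602.72 : 17"}
index = build_index(records)
print(sorted(index["004"]), sorted(index["37"]))  # ['doc-1'] ['doc-1']
```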
SUBJECT FACETS AND FLEXIBILITY OF ORDER
94(410.5)"18" History, Scotland, 19th century
94"18"(410.5) History, 19th century, Scotland
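The two classmarks differ only in citation order. A retrieval system that records which part of the notation is the place facet and which is the time facet can treat them as the same subject. The sketch below assumes only the two facet types shown here (place in parentheses, time in double quotes); the regular expression and the `facets` function are illustrative, not a full UDC parser.

```python
import re

# Assumed subset of UDC syntax: place auxiliaries in parentheses, e.g. (410.5),
# and time auxiliaries in double quotes, e.g. "18".
FACET = re.compile(r'\([^)]*\)|"[^"]*"')


def facets(classmark: str) -> tuple[str, frozenset[str]]:
    """Split a classmark into its main number and an unordered set of facets."""
    compact = classmark.replace(" ", "")
    found = FACET.findall(compact)
    main = FACET.sub("", compact)
    return main, frozenset(found)


a = facets('94 (410.5)"18"')
b = facets('94 "18" (410.5)')
print(a == b)  # True: same subject, different citation order
```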
FACETS OF PERSONS
-057 Persons according to occupation, work, livelihood, education
-057.17 Managers in general. The management
-057.177 Higher management. Top management
-057.177.3 Directors. Board members
-057.177.32 Non-executive directors
-057.177.321 Deputy directors. Assistant directors
-056 Persons according to constitution, health, disposition, hereditary or other traits
-056.2 Persons according to physical state and health
-056.25 Persons according to nourishment (nutritional state) or body weight
-056.257 Overweight persons. Overnourished. Fat. Obese. Hypertrophic
Persons according to age or age-groups
   Adults. Grown-ups
-053.88 Persons in late middle age (troisième âge)
Example: 612.12-009.92-057.177-053.88-056.257 Angina pectoris, Top management, Persons in late middle age, Overweight
FACETED STRUCTURE ALLOWS A FACET-BASED VIEW
MANAGING SUBJECT ACCESS
AUTHORITY FILE
UDC CLASS: 94(410)"19"
DESCRIPTION: History of the U.K.
BROADER: 94(4)
SEE ALSO: 94(73), 94(54), 94(366)
SEARCH TERMS: History; United Kingdom; Great Britain; 20th century
DISPLAY AS: United Kingdom - History
WAS BEFORE: 941.0
MAPPING TO: Dewey: 94; LCSH: History, 20th century United Kingdom; LCC: DA566-592
DOCUMENT (IS DESCRIBED BY / IS DESCRIBED IN)
METADATA: author, title, publisher, format...
SUBJECT CLASSMARK: 94(410)"19"
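The point of the authority file is that bibliographic records carry only the classmark, while display forms, search terms and links are kept in one place. A minimal sketch of that lookup follows; the field names, the `subject_of` helper and the document record are hypothetical, and the class data are as reconstructed from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One classification authority record; the field names are illustrative."""
    udc_class: str
    description: str
    broader: str = ""
    see_also: list[str] = field(default_factory=list)
    search_terms: list[str] = field(default_factory=list)


# The authority file is keyed by classmark (values as on the slide).
AUTHORITY = {
    '94(410)"19"': AuthorityRecord(
        udc_class='94(410)"19"',
        description="History of the U.K.",
        broader="94(4)",
        see_also=["94(73)", "94(54)", "94(366)"],
        search_terms=["History", "United Kingdom", "Great Britain", "20th century"],
    ),
}

# Hypothetical bibliographic record: it stores the classmark and nothing else about the subject.
document = {"title": "(any title)", "subject_classmark": '94(410)"19"'}


def subject_of(doc: dict) -> AuthorityRecord:
    """Resolve a document's classmark through the authority file."""
    return AUTHORITY[doc["subject_classmark"]]


record = subject_of(document)
print(record.description, "|", "; ".join(record.search_terms))
```

Editing the authority record (for instance, adding a search term) changes what every document carrying that classmark displays and matches, without touching the bibliographic records themselves.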
SEMANTIC SEARCH EXPANSION
Search: hadrons
SUBJECT: # HITS
539.12 Elementary and simple particles: 132
539.125/.126 Hadrons. Baryons and mesons: 5
539.125 Nucleons: 38
539.125.4 Protons: 58
539.125.46 Antiprotons: 2
539.125.5 Neutrons: 7
539.125.56 Antineutrons: 1
539.126.3 Mesons: 9
539.126.4 Resonances: 11
539.126.6 Hyperons: 6
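Expansion of this kind can be approximated by matching the query to a class and then collecting every class whose notation falls under it. The sketch below uses the classes from the slide, treats the range 539.125/.126 as the two sibling prefixes 539.125 and 539.126, and leaves the hit counts out. The names `RANGES` and `expand`, and the word-to-range lookup, are hypothetical.

```python
# Classes from the slide (notation -> caption).
CLASSES = {
    "539.12": "Elementary and simple particles",
    "539.125": "Nucleons",
    "539.125.4": "Protons",
    "539.125.46": "Antiprotons",
    "539.125.5": "Neutrons",
    "539.125.56": "Antineutrons",
    "539.126.3": "Mesons",
    "539.126.4": "Resonances",
    "539.126.6": "Hyperons",
}

# Hypothetical caption lookup: query word -> range it names (539.125/.126).
RANGES = {"hadrons": ("539.125", "539.126")}


def expand(first: str, last: str) -> list[str]:
    """Return the classes subsumed by the range first/last.

    Assumes both ends have the same number of digits, so a class falls inside
    the range if its digits start with any prefix from first to last.
    """
    width = len(first.replace(".", ""))
    low, high = int(first.replace(".", "")), int(last.replace(".", ""))
    # zfill keeps leading zeros for notations such as 004.
    prefixes = {str(n).zfill(width) for n in range(low, high + 1)}
    return sorted(n for n in CLASSES if n.replace(".", "")[:width] in prefixes)


for notation in expand(*RANGES["hadrons"]):
    print(notation, CLASSES[notation])
```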
ADVANTAGES IN RESOURCE DISCOVERY: DISAMBIGUATION
Search: mercury
SUBJECT: # HITS
523.41 ASTRONOMY. Mercury: 2
531.787.4 PHYSICS. Mercury barometers: 3
543.272.81 ANALYTICAL CHEMISTRY. Mercury: 38
546.49 INORGANIC CHEMISTRY. Mercury, Hg: 10
621.181.232 ENGINEERING. Mercury vapour generators: 9
66.095.712.49 CHEMICAL INDUSTRY. Mercuration: 3
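Behind a display like this is a simple operation: find every class whose caption matches the query and present the matches grouped by discipline, so that the user can pick the intended meaning. The sketch below uses the classes listed on the slide. The grouping key (the discipline word in capitals) and the function name are assumptions, and a real system would match against the subject-alphabetical index rather than against captions alone.

```python
from collections import defaultdict

# Classes listed on the slide for the query "mercury": notation -> (discipline, caption).
CAPTIONS = {
    "523.41": ("ASTRONOMY", "Mercury"),
    "531.787.4": ("PHYSICS", "Mercury barometers"),
    "543.272.81": ("ANALYTICAL CHEMISTRY", "Mercury"),
    "546.49": ("INORGANIC CHEMISTRY", "Mercury, Hg"),
    "621.181.232": ("ENGINEERING", "Mercury vapour generators"),
    "66.095.712.49": ("CHEMICAL INDUSTRY", "Mercuration"),
}


def disambiguate(stem: str) -> dict[str, list[str]]:
    """Group the classes whose caption contains `stem` by discipline."""
    groups = defaultdict(list)
    for notation, (discipline, caption) in CAPTIONS.items():
        if stem.casefold() in caption.casefold():
            groups[discipline].append(notation)
    return dict(groups)


# A truncated query, so that "Mercuration" is found as well as "Mercury".
for discipline, notations in disambiguate("mercur").items():
    print(discipline, notations)
```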
ADVANTAGES IN RESOURCE DISCOVERY: PRECISION
Search: rabbit
SUBJECT
569.32 Zoology: Rodentia and Lagomorpha
632.935.7 Protection of crops
636.92.045 Animal husbandry. Domestic rabbits. Pets
636.932 Animal husbandry. Rodents kept for fur
639.112 Hunting. Small game generally
641.8 Cooking. Main dishes
677.354 Textile industry. Hare fur. Rabbit fur
# HITS: 7, 3, 38, 10, 9, 22, 2, 8
SUPPORTING MULTILINGUAL SEARCHING
CLASSIFICATION AUTHORITY FILE (hierarchical organization of concepts by subject area):
§ ZOOLOGY 599.325.1: Lapin, Coniglio, Kaninchen, Rabbit...
§ ANIMAL HUSBANDRY 636.92: Lapin, Coniglio, Kaninchen, Rabbit...
§ FUR INDUSTRY 677.354: Lapin, Coniglio, Kaninchen, Rabbit...
SEARCHING INTERFACE (search terms):
  • Lang 1: Lapin
  • Lang 2: Coniglio
  • Lang 3: Kaninchen
  • Lang 4: Rabbit
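Because every class carries terms in several languages, a search term in any of them resolves to the same set of classes. The sketch below takes the notations, subject areas and terms from the slide. The language codes assigned to the four terms and the `build_term_index` helper are assumptions made for illustration.

```python
# Terms shown on the slide; the language codes are an assumption.
RABBIT_TERMS = {"fr": "Lapin", "it": "Coniglio", "de": "Kaninchen", "en": "Rabbit"}

# Authority file: notation -> (subject area, terms in several languages).
AUTHORITY = {
    "599.325.1": ("ZOOLOGY", RABBIT_TERMS),
    "636.92": ("ANIMAL HUSBANDRY", RABBIT_TERMS),
    "677.354": ("FUR INDUSTRY", RABBIT_TERMS),
}


def build_term_index(authority: dict) -> dict[str, list[str]]:
    """Map every term, in every language, to the notations it is attached to."""
    index: dict[str, list[str]] = {}
    for notation, (_area, terms) in authority.items():
        for term in terms.values():
            index.setdefault(term.casefold(), []).append(notation)
    return index


index = build_term_index(AUTHORITY)
# A German query reaches the same classes as an English, French or Italian one.
for notation in index["Kaninchen".casefold()]:
    print(notation, AUTHORITY[notation][0])
```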
INTEGRATION OF INFORMATION
Library classification is often used as a pivot, i.e. a central mapping structure for the alignment of different vocabularies
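The appeal of a pivot is arithmetic as well as semantic. With n vocabularies, aligning every pair needs n(n-1)/2 mappings (counting each pair once), while mapping each vocabulary to the classification needs only n. The sketch below shows both the count and a translation through the pivot. Vocabularies A and B and all their term identifiers are hypothetical; the UDC notations are taken from elsewhere in this presentation.

```python
def mappings_needed(n: int) -> tuple[int, int]:
    """Return (pairwise mappings, mappings via a pivot) for n vocabularies.

    A pairwise mapping is counted once per pair of vocabularies.
    """
    return n * (n - 1) // 2, n


# Hypothetical vocabularies A and B, each aligned once to UDC.
A_TO_UDC = {"A:rabbits": "599.325.1", "A:fur-trade": "677.354"}
B_TO_UDC = {"B:0042": "599.325.1", "B:0917": "677.354", "B:1203": "636.92"}


def translate(term_a: str) -> list[str]:
    """Find vocabulary-B terms equivalent to a vocabulary-A term via the pivot."""
    notation = A_TO_UDC.get(term_a)
    if notation is None:
        return []
    return [term_b for term_b, udc in B_TO_UDC.items() if udc == notation]


print(mappings_needed(10))     # (45, 10)
print(translate("A:rabbits"))  # ['B:0042']
```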
EXAMPLE
Nebis subject authority file record, ETH-Bibliothek (Zürich) http://www.ethbib.ethz.ch/index_e.html
MARC CLASSIFICATION FORMATS
§ MARC 21 Concise Format for Classification Data http://www.loc.gov/marc/classification/
§ Concise UNIMARC Classification Format http://www.ifla.org/VI/3/p1996-1/concise.htm
• these formats offer sufficient support for semantic relationships but no support for managing and exploiting the complexity of classification syntax or for managing global changes, i.e.:
  § the heading field is not structured and does not allow multidirectional access to the meaningful elements of a complex notation
REQUIREMENT
§ UDC number encoding for database management: data element identifiers for each part of a number such as 51(410)(091)
§ machine-readable identification of each structural part of the notation separates the display of numbers/symbols from their function
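In practice, "data element identifiers" could come from a tokenizer that labels each part of a classmark with its function, which keeps display and function separate. The sketch covers only a small, assumed subset of UDC syntax: main numbers, the colon, hyphenated auxiliaries, and the common auxiliaries of form, ethnic grouping, place, time and language. The role names and the `tokenize` function are hypothetical. Anything outside the subset, such as the apostrophe auxiliaries, is rejected with an error instead of being guessed.

```python
import re

# (role, pattern) pairs; order matters because the first matching alternative wins.
TOKEN_PATTERNS = [
    ("form", r"\(0[^)]*\)"),                  # (0...) common auxiliaries of form
    ("ethnic_grouping", r"\(=[^)]*\)"),       # (=...) common auxiliaries of ethnic grouping
    ("place", r"\([1-9][^)]*\)"),             # (1/9) common auxiliaries of place
    ("time", r'"[^"]*"'),                     # "..." common auxiliaries of time
    ("language", r"=\d+(?:\.\d+)*"),          # =... common auxiliaries of language
    ("relation", r":"),                       # colon: relation between subjects
    ("hyphen_auxiliary", r"-\d+(?:\.\d+)*"),  # e.g. -02, -057.177, -116
    ("main", r"\d+(?:\.\d+)*"),               # main table number, e.g. 51 or 539.125
]
MASTER = re.compile("|".join(f"(?P<{role}>{pattern})" for role, pattern in TOKEN_PATTERNS))


def tokenize(classmark: str) -> list[tuple[str, str]]:
    """Split a classmark into (role, text) pairs, left to right."""
    compact = classmark.replace(" ", "")
    tokens = []
    pos = 0
    while pos < len(compact):
        match = MASTER.match(compact, pos)
        if match is None:
            raise ValueError(f"unsupported notation at {compact[pos:]!r}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("51 (410) (091)"))  # [('main', '51'), ('place', '(410)'), ('form', '(091)')]
```

A record that stores these (role, text) pairs, rather than one unstructured heading string, can index the place facet (410) separately from the main number 51.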
NETWORK – INITIALLY RESOURCE DISCOVERY
German Harvest Automated Retrieval and Directory (GERHARD) subject gateway: automatic classification of the German web based on UDC data from the ETH library authority file (the GERHARD website was shut down in 2005). Read more at http://www.bis.uni-oldenburg.de/abt1/waetjen/publ/Article.pdf
TASKS FOR CLASSIFICATION DEVELOPERS
Improving classification data at their source:
• provide rich, machine-readable classification data that expose semantic relationships and provide multiple access points to notation and words
• enable sharing by distributing data in different standard formats
• find a way of releasing part of the data into the public domain for testing and training
• make sure that copyright regulations do not impede the use of classification in information integration and exchange
EXAMPLE - UDC
§ UDC Master Reference File (MRF) data has been distributed to users as a file since 1993
§ the data has been improved: a unique identifier for every class (independent of the notation), declared semantic and syntactic relationships, an improved syndetic structure
§ MRF 2008 exports will be available in MARC and SKOS formats or as on-demand SQL statements, plus various TEXT/XML outputs
§ pending:
  • improvement of verbal access (subject-alphabetical index)
  • merging the existing multilingual data into one database
§ future plans: inclusion of mappings to other vocabularies
§ looking for projects to test semantic technologies and how part of the UDC data can be tested in an open m2m (machine-to-machine) environment
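A class identifier that is independent of the notation means that links survive changes to the notation itself. The sketch below has documents point at a class identifier rather than at the notation, so a revised notation shows up everywhere on the next lookup. The identifier "cls-0042", the document record and the replacement notation "NEW-NOTATION" are hypothetical placeholders; the notation 539.125.4 and caption Protons come from the particle physics example.

```python
from dataclasses import dataclass


@dataclass
class UdcClass:
    identifier: str  # stable, never reused
    notation: str    # may change between editions
    caption: str


# Hypothetical identifier; notation and caption from the particle example.
classes = {"cls-0042": UdcClass("cls-0042", "539.125.4", "Protons")}

# Hypothetical document record: it refers to the identifier, not the notation.
documents = {"doc-7": ["cls-0042"]}


def display_subjects(doc_id: str) -> list[str]:
    """Build the subject display for a document from the current class data."""
    return [f"{classes[c].notation} {classes[c].caption}" for c in documents[doc_id]]


def revise_notation(identifier: str, new_notation: str) -> None:
    """Change a notation in one place; links held by identifier are unaffected."""
    classes[identifier].notation = new_notation


before = display_subjects("doc-7")
revise_notation("cls-0042", "NEW-NOTATION")  # placeholder, not a real UDC number
after = display_subjects("doc-7")
print(before, after)
```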
IN SUMMARY
§ the development of new standards opens new possibilities for sharing and using classification: new services and new solutions
§ to support new kinds of users, classification has to be exposed in a machine-readable, standardized format and made accessible to programs and services on the network
§ issues for owners: costs, copyright policy
--- END OF PRESENTATION ---