8bb66b692144cf899d80d19c334fd71d.ppt
- Number of slides: 21

International Atomic Energy Agency
Subject Analysis: Computer Assisted Indexing
Bekele Negeri, INIS Unit, Nuclear Information Specialist
(Adapted from A. Nevyjel's presentation)
INIS Training Seminar, 07–11 October 2013, Vienna, Austria

Subject Indexing Tools
There are two main INIS products used for indexing, WinFIBRE and CAI:
• WinFIBRE – for input preparation, covering both bibliographic description and subject indexing
• CAI (Computer Assisted Indexing) – for subject classification and indexing
The INIS/ETDE Thesaurus and the INIS Subject Category Codes are incorporated in both.

Indexing with FIBRE

Computer-Assisted Indexing (CAI)
• Kick-off meeting: Jan 2004
• Implementation and customisation: Jun 2004
• Production indexing: from Jun 2004, ongoing
• CAI version 1.0 final acceptance: Aug 2004
• Tuning of the system: from Aug 2004, ongoing
• CAI batch processing for Member States: Dec 2004
• CAI online (remote access) for Member States: Nov 2007

CAI Thesaurus Extension
• Thesaurus
  • Valid descriptors: 22,051
  • Forbidden terms: 8,675
  • Total: 30,726
• CAI
  • Hidden terms: ~35,000
Terminological Knowledge Base

CAI Thesaurus Extension
"Hidden terms" are character patterns representing the different ways a concept can appear in free text; each hidden term is mapped to one or more descriptors.
• handled like "forbidden terms", with one or more USE relations
• internal to CAI only:
  • not exported to the INIS production system
  • not exported to FIBRE
  • not printed in any edition of the thesaurus
• support the identification of descriptors in the free text

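As a rough illustration of how hidden terms could support descriptor identification, the Python sketch below stores a few hidden terms from these slides with their USE relations and looks them up in normalised free text. The table `HIDDEN_TERMS`, the functions `normalise` and `suggest_descriptors`, and the normalisation rules (lowercasing, treating hyphens as spaces, whole-word matching) are assumptions for illustration; this is not CAI's actual matching algorithm.

```python
import re
from typing import Dict, List

# Hypothetical hidden-term table: lowercase pattern -> descriptors (USE relations).
# Entries come from the slide examples; the real CAI table has ~35,000 entries.
HIDDEN_TERMS: Dict[str, List[str]] = {
    "caesium 137": ["CESIUM 137"],
    "cs 137": ["CESIUM 137"],
    "137 cs": ["CESIUM 137"],
    "burma": ["MYANMAR"],
    "ivory coast": ["COTE D'IVOIRE"],
    "behaviour": ["BEHAVIOR"],
}


def normalise(text: str) -> str:
    """Lowercase, treat hyphens as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower().replace("-", " ")).strip()


def suggest_descriptors(free_text: str, hidden: Dict[str, List[str]]) -> List[str]:
    """Return descriptors whose hidden terms occur as whole words in the text."""
    text = normalise(free_text)
    found: List[str] = []
    for pattern, descriptors in hidden.items():
        if re.search(r"\b" + re.escape(pattern) + r"\b", text):
            for descriptor in descriptors:
                if descriptor not in found:  # suggest each descriptor only once
                    found.append(descriptor)
    return found


print(suggest_descriptors("Uptake of Cs-137 in Burma", HIDDEN_TERMS))
# ['CESIUM 137', 'MYANMAR']
```

Matching is case-insensitive here for simplicity, which is exactly the kind of shortcut that produces some of the indexing problems listed on the later slides.
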
Hidden Terms: Compounds and Isotopes
Descriptor: hidden term → free text
• MAGNESIUM BORIDES: MgB_2 → MgB₂
• ACETIC ACID: C_2H_4O_2 → C₂H₄O₂
• CESIUM 137:
  • "1"3"7cs → ¹³⁷Cs
  • 137 caesium → 137 Caesium, 137-Caesium
  • caesium 137 → Caesium 137, Caesium-137
  • 137 cesium → 137 Cesium, 137-Cesium
  • cesium 137 → Cesium 137, Cesium-137
  • 137 cs → 137 Cs, 137-Cs
  • cs 137 → Cs 137, Cs-137
  • cs"1"3"7 → Cs¹³⁷

Hidden Terms: Elementary Particles and Countries
Descriptor: hidden term → free text
• ELECTRON NEUTRINOS: #nu#_e → νe
• MUON NEUTRINOS: #nu#_#mu# → νμ
• TAU NEUTRINOS: #nu#_#tau# → ντ
• RHO-770 MESONS: #rho#-770 → ρ-770
• OMEGA-782 MESONS: #omega#-782 → ω-782
Country names (descriptor: hidden term):
• CAMBODIA: kampuchea
• COTE D'IVOIRE: ivory coast
• GREECE: hellas
• MYANMAR: burma
• THAILAND: siam

Hidden Terms: UK/US Spellings
Descriptor: hidden term
• A CENTERS: a centres
• ACTIVITY METERS: activity metres
• ANALOG COMPUTERS: analogue computers
• ANESTHESIA: anaesthesia
• ARCHAEOLOGY: archeology
• AUSTRIAN ORGANIZATIONS: austrian organisations
• BALLISTIC MISSILE DEFENSE: ballistic missile defence
• BAYARD-ALPERT GAGES: bayard-alpert gauges
• BEAM ANALYZERS: beam analysers
• BEHAVIOR: behaviour
• CATALOGS: catalogues

Hidden Terms: Other Spellings
Descriptor: hidden terms
• Singular/plural
  • FUNGI: funguses
  • G MATRIX: g matrices, g matrixes
• Reverse sequence
  • ATOM-MOLECULE COLLISIONS: atom-molecule scattering, molecule-atom scattering, atom-molecule reactions, molecule-atom reactions, atom-molecule interactions, molecule-atom interactions

Further Improvements Necessary
• "+" and "-" signs
  • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
• Case sensitivity
  • TiN → TIN (instead of TITANIUM NITRIDES)
  • gas → GALLIUM SULFIDES
  • "…who is the…" → WHO (World Health Organization)
• Verbs versus nouns
  • "…this leads us to…" → LEAD
  • "…this leaves it…" → LEAVES
• Homographic terms
  • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
• Nuclear reactions, e.g. ¹⁴N(γ,α)¹⁰B
  • targets
  • beams
  • reactions

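Identifying targets, beams and reactions starts with splitting the compact A(b,c)D notation into its parts. The sketch below is a hypothetical regular-expression parser; it assumes mass numbers are written as plain digits before the element symbol (as in 14N(γ,α)10B) and that each particle is a single token. `parse_reaction` and `Reaction` are illustrative names, and mapping the parts to INIS descriptors is not attempted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for reactions written as A(b,c)D, e.g. "14N(γ,α)10B":
# mass number + element symbol, then (incident, outgoing) particles,
# then mass number + element symbol of the residual nucleus.
REACTION = re.compile(
    r"^(?P<target>\d+[A-Z][a-z]?)"
    r"\((?P<beam>[^,()]+),(?P<ejectile>[^,()]+)\)"
    r"(?P<residual>\d+[A-Z][a-z]?)$"
)


@dataclass
class Reaction:
    target: str    # target nucleus, e.g. "14N"
    beam: str      # incident particle, e.g. "γ"
    ejectile: str  # outgoing particle, e.g. "α"
    residual: str  # residual nucleus, e.g. "10B"


def parse_reaction(text: str) -> Optional[Reaction]:
    """Split a compact reaction string into its parts; None if it does not match."""
    m = REACTION.match(text.replace(" ", ""))  # tolerate "14 N(γ, α)10 B"
    if m is None:
        return None
    return Reaction(m["target"], m["beam"], m["ejectile"], m["residual"])


print(parse_reaction("14N(γ,α)10B"))
# Reaction(target='14N', beam='γ', ejectile='α', residual='10B')
```

Real free text also writes mass numbers as superscripts or after the symbol, so a production parser would need the same kind of variant handling that hidden terms provide for isotopes.
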
INDEXING PROBLEMS
• General terms (energy, physics, materials, uses, etc.)
• Misleading CAI suggestions from thesaurus terms:
  • PRODUCTION and PARTICLE PRODUCTION
  • SOLUTION and MATHEMATICAL SOLUTION
  • IGNITION and THERMONUCLEAR IGNITION
  • WALLS and THERMONUCLEAR REACTOR WALLS
  • PLANTS and NUCLEAR POWER PLANTS
  • MEMBRANES (classical sense) and membrane (in brane theory)
  • COLOR and COLOR MODEL (elementary particle characteristics)
  • TRANSPORT, etc.

INDEXING PROBLEMS
• Chemical compounds / case sensitivity / homonyms:
  • INDIUM IONS for "in ions"
  • ASTATINE 200 for "at 200 °C"
  • VISIBLE RADIATION for "light" as in "light weight"
  • HELIUM 6 for "consisting of 6 He 3 tubes"
• Temperature, pressure, etc. ranges
• Abbreviations:
  • TNA: Thermal Neutron Analysis or TRINONYLAMINE
  • MPA (Maximum Permissible Activity) versus MPa (megapascal)

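Several of the problems above (TiN versus TIN, gas versus GaS, who versus WHO, MPa versus MPA) come from folding case before lookup. One possible mitigation, sketched here under stated assumptions, is to keep case-sensitive tokens such as formulas and abbreviations in a separate exact-case table and fall back to case-insensitive lookup only for ordinary words. The tables `EXACT_CASE` and `FOLDED` and the function `match_tokens` are hypothetical and contain only examples taken from these slides.

```python
import re
from typing import Dict, List

# Hypothetical exact-case table: tokens whose meaning depends on letter case.
EXACT_CASE: Dict[str, str] = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO (World Health Organization)",
    "MPA": "MPA (Maximum Permissible Activity)",
}

# Hypothetical table for ordinary words, matched case-insensitively (lowercase keys).
FOLDED: Dict[str, str] = {
    "tin": "TIN",
}


def match_tokens(text: str) -> List[str]:
    """Return the labels suggested for the word tokens of `text`, in order."""
    hits: List[str] = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in EXACT_CASE:          # exact-case match wins
            hits.append(EXACT_CASE[token])
        elif token.lower() in FOLDED:    # otherwise fold case
            hits.append(FOLDED[token.lower()])
    return hits


print(match_tokens("TiN coatings on tin"))  # ['TITANIUM NITRIDES', 'TIN']
print(match_tokens("who is the gas supplier"))  # []
```

This does not solve everything: text set entirely in capitals, such as a title, would still turn "WHO IS THE" into WHO.
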
CAI online for Member States introduced in July 2007
• CAI batch used by: China, Czech Republic (seldom), Georgia (only in 2012), Germany, Iran, Uzbekistan, Vietnam
• CAI online in use by: Austria, Bulgaria, Cuba, Israel, Japan, Mexico, Netherlands, Uruguay (registering) (seldom)
CAI online and CAI batch are now regular services for Member States.

CAI Batch and Online Processing
• Input file name: MemSt-CC-yymmdd-xxxxxx
  • MemSt is a standard prefix (meaning "Member State")
  • CC is the country code
  • yymmdd is the date on which the file was generated
  • xxxxxx is any additional identification
• Examples:
  • MemSt-AR-041203-thisismytestfile
  • MemSt-FR-041212-fileidentification

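The naming convention can be checked mechanically before a file is sent. The sketch below parses a file name into its parts; it assumes the prefix is spelled MemSt, that CC is two capital letters (as in the AR and FR examples), and it accepts an optional leading underscore for CAI output files. `FILE_NAME`, `parse_file_name` and `CaiFileName` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical pattern for MemSt-CC-yymmdd-xxxxxx; a leading "_" marks CAI output.
FILE_NAME = re.compile(
    r"^(?P<output>_?)MemSt-(?P<cc>[A-Z]{2})-(?P<yymmdd>\d{6})-(?P<ident>.+)$"
)


@dataclass
class CaiFileName:
    is_output: bool  # True for "_MemSt-..." files returned by CAI batch
    country: str     # country code, e.g. "AR"
    generated: date  # date on which the file was generated
    ident: str       # any additional identification


def parse_file_name(name: str) -> CaiFileName:
    """Split a CAI batch file name into its parts or raise ValueError."""
    m = FILE_NAME.match(name)
    if m is None:
        raise ValueError(f"not a MemSt-CC-yymmdd-xxxxxx name: {name!r}")
    # strptime also raises ValueError for impossible dates such as month 13.
    generated = datetime.strptime(m["yymmdd"], "%y%m%d").date()
    return CaiFileName(m["output"] == "_", m["cc"], generated, m["ident"])


print(parse_file_name("MemSt-AR-041203-thisismytestfile"))
```

Using `strptime` rather than only a digit check means a name with a valid shape but an impossible date is rejected as well.
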
CAI Batch Processing
• Output file name: _MemSt-CC-yymmdd-xxxxxx
• These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##; for example:
  800^##CAI suggestions##; DESCRIPTOR 1; DESCRIPTOR 2; DESCRIPTOR 3; …
• The files are sent back to the Member State for review.

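A minimal sketch of writing and reading the tag-800 layout shown above, assuming the line starts with `800^` exactly as in the example and that descriptors are separated by semicolons. `format_tag_800` and `parse_tag_800` are hypothetical helpers, not part of WinFIBRE or CAI.

```python
from typing import List

MARKER = "##CAI suggestions##"


def format_tag_800(descriptors: List[str]) -> str:
    """Build a tag-800 line in the layout shown on the slide."""
    return "800^" + MARKER + "; " + "; ".join(descriptors)


def parse_tag_800(line: str) -> List[str]:
    """Extract descriptors from a tag-800 line, with or without the CAI marker."""
    if not line.startswith("800^"):
        raise ValueError("not a tag-800 line")
    body = line[len("800^"):]
    if body.startswith(MARKER):
        body = body[len(MARKER):]
    # Split on ";" and drop empty pieces, e.g. from a trailing separator.
    return [d.strip() for d in body.split(";") if d.strip()]


line = format_tag_800(["DESCRIPTOR 1", "DESCRIPTOR 2", "DESCRIPTOR 3"])
print(line)  # 800^##CAI suggestions##; DESCRIPTOR 1; DESCRIPTOR 2; DESCRIPTOR 3
print(parse_tag_800(line))  # ['DESCRIPTOR 1', 'DESCRIPTOR 2', 'DESCRIPTOR 3']
```
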
CAI Batch and Online Processing: Reviewing Process
• Delete all suggested descriptors that are too general
• Add relevant descriptors that were not found, e.g.:
  • numerical values, e.g. pressure ranges, temperature ranges, …
  • nuclear reactions
  • chemical compounds, alloys, etc.
• CAI cleans up BT/NTs (broader and narrower terms) among its own suggestions; clean up BT/NTs resulting from manual additions yourself
• Clean up suggestions arising from homographic terms

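The BT/NT clean-up can be illustrated by removing any descriptor that is a broader term, directly or transitively, of another descriptor assigned to the same record, so that only the more specific term remains. This is a sketch under that assumption: the toy hierarchy in `BROADER` is illustrative, not copied from the INIS Thesaurus, and `ancestors`, `clean_up_bt_nt` and the optional `too_general` set are hypothetical names.

```python
from typing import AbstractSet, Dict, List, Set

# Illustrative broader-term (BT) links: descriptor -> direct broader terms.
# Hypothetical toy hierarchy, not copied from the INIS Thesaurus.
BROADER: Dict[str, Set[str]] = {
    "CESIUM 137": {"CESIUM ISOTOPES"},
    "CESIUM ISOTOPES": {"ISOTOPES"},
}


def ancestors(term: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """All broader terms of `term`, following BT links transitively."""
    seen: Set[str] = set()
    stack = list(broader.get(term, ()))
    while stack:
        bt = stack.pop()
        if bt not in seen:  # also guards against cycles in bad data
            seen.add(bt)
            stack.extend(broader.get(bt, ()))
    return seen


def clean_up_bt_nt(
    descriptors: List[str],
    broader: Dict[str, Set[str]],
    too_general: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Drop too-general descriptors and any BT of another assigned descriptor."""
    redundant: Set[str] = set(too_general)
    for d in descriptors:
        redundant |= ancestors(d, broader)
    return [d for d in descriptors if d not in redundant]


print(clean_up_bt_nt(["ISOTOPES", "CESIUM 137", "CESIUM ISOTOPES", "ENERGY"],
                     BROADER, too_general={"ENERGY"}))
# ['CESIUM 137']
```

The remaining order follows the input list, so the reviewer's own ordering of descriptors is preserved.
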
CAI Batch and Online Processing: Finalisation Process
CAI batch
• When the review of a record is completed: delete "##CAI suggestions##"
• When the review of all records is completed: submit the file to the "INIS Input Box"
CAI online
• When the last record is reached: press the "export and exit" button
• The file goes directly to the INIS production system or, if required, is sent back to the Member State for review

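For CAI batch files, the two finalisation steps can be sketched as one function that deletes the marker from a reviewed record and one that checks, before submission, whether any record still carries it. The helper names `finalise_record` and `ready_to_submit`, and the assumption that the marker is followed by "; " as in the tag-800 example, are illustrative only.

```python
import re
from typing import Iterable

MARKER = "##CAI suggestions##"
# The marker plus an optional ";" and any whitespace that follows it.
MARKER_RE = re.compile(re.escape(MARKER) + r";?\s*")


def finalise_record(line: str) -> str:
    """Delete the CAI marker from a reviewed tag-800 line."""
    return MARKER_RE.sub("", line)


def ready_to_submit(lines: Iterable[str]) -> bool:
    """True once no record in the file still carries the CAI marker."""
    return not any(MARKER in line for line in lines)


reviewed = finalise_record("800^##CAI suggestions##; DESCRIPTOR 1; DESCRIPTOR 2")
print(reviewed)  # 800^DESCRIPTOR 1; DESCRIPTOR 2
print(ready_to_submit([reviewed]))  # True
```

Because a record is considered reviewed only once its marker is gone, `ready_to_submit` doubles as a simple check that no record was skipped.
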
Thank you!