- Number of slides: 26
Location Terminologies
Taxonomy Strategies LLC
ASIS&T Annual Meeting, Austin, TX, November 7, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.

Agenda
- Who we are
- Overview
- Using ISO 3166
- Accommodating special needs
Taxonomy Strategies LLC The business of organized information

Who we are: Ron Daniel, Jr.
- Over 15 years in the business of metadata & automatic classification
  - Principal, Taxonomy Strategies
  - Standards Architect, Interwoven
  - Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
  - Technical Staff Member, Los Alamos National Laboratory
  - Doctoral and post-doctoral research in pattern recognition
- Metadata and taxonomies community leadership
  - Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  - Acting chair, XML Linking working group
  - Member, RDF working groups
  - Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports

Recent & current projects

8 Common Taxonomy Facets

| Facet | Definition | Potential Sources |
|---|---|---|
| Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc. |
| Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc. |
| Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc. |
| Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc. |
| Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc. |
| Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc. |
| Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psychographics or personas, etc. |
| Products & Services | Names of products/programs & services. | ERP system, UNSPSC, Your products and services, etc. |

Potential facets in the petroleum industry
[Diagram: facets are marked as "Moderately related to location" or "Strongly related to location" and grouped as "Community Standard", "Should be part of community standard", and "Company Facets". Facets shown: Maint. Disciplines; Facilities; Production; Content Types; Lease Mgmt; Orgs.; Process Mgmt; Reserves; Human Resources; E&P Lifecycle; Hydrocarbon System; Geologic Age; Basins, Reservoirs & Fields; Wells; Locations; Company Org.]

Location names serve as surrogates for other things
- Company divisions
- Company facilities
- Regulatory regimes
- Currency regions
- Product marketing areas
- Sales territories
- Customer locations

What is a good taxonomy?
- A means to an end, and not the end in itself.
- Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
- Improved over time, and maintained.
- Built through an incremental, extensible process that identifies and enables owners, and engages stakeholders.
- Implemented quickly, so that it provides measurable results as soon as possible.
- Not monolithic: it has separately maintainable facets.
- Re-uses existing IP as much as possible.

Location names are used for different purposes
- Typical correspondence and shipping
  - “Libya”
  - “South Korea”
- Official correspondence with government ministers
  - “Great Socialist People's Libyan Arab Jamahiriya”
  - “Republic of Korea”
- Corporate division of responsibility
  - “Western Region”: does that include Montana?

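One way to serve several purposes from a single vocabulary is to key every name form to one stable identifier. The sketch below is only an illustration, not a description of any system from the talk: the `NAME_FORMS` table, the purpose labels `common` and `official`, and the function `country_name` are all hypothetical, and the keys are assumed to be ISO 3166-1 alpha-2 codes.

```python
# Hypothetical name forms, keyed by ISO 3166-1 alpha-2 code (an assumption).
NAME_FORMS = {
    "LY": {
        "common": "Libya",
        "official": "Great Socialist People's Libyan Arab Jamahiriya",
    },
    "KR": {
        "common": "South Korea",
        "official": "Republic of Korea",
    },
}


def country_name(alpha2: str, purpose: str = "common") -> str:
    """Return the name form for a purpose, falling back to the common form."""
    forms = NAME_FORMS[alpha2.upper()]
    return forms.get(purpose, forms["common"])


# Shipping labels and ministerial letters draw on the same record.
shipping_name = country_name("LY")
letter_name = country_name("LY", "official")
```

Keeping the code as the key means a new purpose adds a column rather than a new list. A label such as "Western Region" does not fit this shape, because it is a company-defined grouping rather than another name for one country.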
Location terminologies may be used to organize different collections of information
Example: facets for ABC Computers.com
- Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
- Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
- Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
- Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
- Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands
- Audience: All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier
- Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
- Region.Country: All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States

Location terminologies may be used to limit search results
- Category
- Company
- City
- State
- Salary

Problems with location vocabularies
- Placenames change over time
- Codes may be reused over time
- Familiarity leads to proliferation
  - Many versions of pseudo-standard lists
  - Guessing what the standard will become (e.g. KOS as a code for Kosovo)
- Natural messiness of human affairs
  - States vs. Provinces vs. Protectorates, Territories, Possessions, Tribal territories, …
  - Disputed territories (Palestine, Kashmir, Taiwan, Kurdistan)
  - Proto-states (Kosovo, Somaliland)
- Complexity tradeoff in software
  - Very few invariant properties of countries and their groupings
- Approximate alignment between placenames and business functions leads to errors when mapping data from one purpose to another
  - Geopolitical names get applied to sales territories with different company history and importance (e.g. Japan vs. Asia-Pac)
- Passions
  - Boycotts and death threats have been received by people who do or do not list particular places in their lists of ‘countries’

ISO 3166 is a fundamental vocabulary for dealing with locations
- UPS maintains a central World Wide Code Repository (WWCR) to store the metadata used throughout the corporation
  - Based on the data identified in the enterprise data models
- They also have a Corporate Code Table Database, populated via extract files from the WWCR.
  - These tables contain the complete list of standardized corporate code values for each code type.
  - Country codes are ISO 3166-1, with local extensions obeying ISO restrictions.
  - The data modeler for the Corporate Code Table Database is the primary contact from UPS to ISO and the UN with respect to codes for countries.
Source: Barbara La. Robardier, “Taxonomy and Metadata at United Parcel Service (UPS): World Wide Code Repository and Corporate Code Tables”; Semantic Technologies Conference, San Francisco, 2005.

ISO 3166 is the world's most widely used list of country names
- ISO 3166 is divided into 3 lists:
  - 3166-1: Countries
  - 3166-2: Country subdivisions (sub-regions)
  - 3166-3: Formerly used country names (changes)
- ISO 3166-1 gives three different codes for the same places:
  - alpha-2
  - alpha-3
  - numeric-3
- The source for the list is the UN Statistics Division

| Country or area name | numeric-3 | alpha-2 | alpha-3 |
|---|---|---|---|
| Afghanistan | 004 | AF | AFG |
| Åland Islands | 248 | AX | ALA |
| Albania | 008 | AL | ALB |
| Algeria | 012 | DZ | DZA |
| American Samoa | 016 | AS | ASM |
| Andorra | 020 | AD | AND |
| … | | | |
| Zimbabwe | 716 | ZW | ZWE |

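Because the three code types name the same places, a single index can accept any of them. The following is a minimal sketch using only the sample rows from the slide; `ROWS`, `BY_CODE`, and `find_country` are hypothetical names. Numeric codes are stored as strings on purpose, so that leading zeros such as `004` survive.

```python
# Sample rows from the slide: (name, numeric-3, alpha-2, alpha-3).
ROWS = [
    ("Afghanistan", "004", "AF", "AFG"),
    ("Åland Islands", "248", "AX", "ALA"),
    ("Albania", "008", "AL", "ALB"),
    ("Algeria", "012", "DZ", "DZA"),
    ("American Samoa", "016", "AS", "ASM"),
    ("Andorra", "020", "AD", "AND"),
    ("Zimbabwe", "716", "ZW", "ZWE"),
]

# One index for all three code types. Two letters, three letters, and three
# digits can never collide, so a single dictionary is enough.
BY_CODE = {}
for name, numeric3, alpha2, alpha3 in ROWS:
    record = {"name": name, "numeric3": numeric3, "alpha2": alpha2, "alpha3": alpha3}
    for key in (numeric3, alpha2, alpha3):
        BY_CODE[key] = record


def find_country(code):
    """Look up a record by alpha-2, alpha-3, or numeric-3 code; None if unknown."""
    text = str(code).strip().upper()
    if text.isdigit():
        text = text.zfill(3)  # restore leading zeros lost by integer storage
    return BY_CODE.get(text)
```

With this index, `find_country(4)`, `find_country("af")`, and `find_country("AFG")` all return the Afghanistan record.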
ISO 3166 codes change, and are even reassigned!

| Country | alpha-2 | Assigned | Removed |
|---|---|---|---|
| CZECHOSLOVAKIA | CS | 1974* | 1993 |
| SERBIA AND MONTENEGRO | CS | 2003-07-23 | 2006 |
| SERBIA | RS | 2006-09-26 | current |
| MONTENEGRO | ME | 2006-09-26 | current |

* ISO 3166 first published in 1974. Czechoslovakia dates from 1918.

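Since CS has meant two different countries, a code can only be interpreted together with a date. The sketch below shows one way to model that; `CodeAssignment`, `ASSIGNMENTS`, and `resolve` are hypothetical names. Where the slide gives only a year, the month and day used here are assumptions and are marked as such in the comments.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeAssignment:
    alpha2: str
    country: str
    start: date          # first day the code refers to this country
    end: Optional[date]  # first day it no longer does; None = still assigned


ASSIGNMENTS = [
    # Slide gives only the years 1974 and 1993; month and day are assumed.
    CodeAssignment("CS", "Czechoslovakia", date(1974, 1, 1), date(1993, 12, 31)),
    # Slide gives only the year 2006 for removal; the RS/ME assignment date is assumed.
    CodeAssignment("CS", "Serbia and Montenegro", date(2003, 7, 23), date(2006, 9, 26)),
    CodeAssignment("RS", "Serbia", date(2006, 9, 26), None),
    CodeAssignment("ME", "Montenegro", date(2006, 9, 26), None),
]


def resolve(alpha2: str, on: date) -> Optional[str]:
    """Return the country a code referred to on a given date, or None."""
    for a in ASSIGNMENTS:
        if a.alpha2 == alpha2.upper() and a.start <= on and (a.end is None or on < a.end):
            return a.country
    return None
```

Under these assumptions, `resolve("CS", date(1990, 5, 1))` gives Czechoslovakia, `resolve("CS", date(2005, 1, 1))` gives Serbia and Montenegro, and a date in the gap between the two assignments gives `None`.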
What is the code for Kosovo?
- No code currently exists for Kosovo, but “KS” is unassigned. Should we use it in the expectation that it will eventually be assigned?
- No. An unassigned code may later be assigned to a different place, so use a code from the ranges set aside for users instead.
- To quote from ISO 3166-1:1997, clause 8.1.3, User-assigned code elements: "If users need code elements to represent country names not included in this part of ISO 3166, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively and the series of numbers 900 to 999 are available."

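The ranges in the quoted clause are simple enough to check mechanically before a local code is accepted into a vocabulary. This is a minimal sketch of such a check; the function name `is_user_assigned` is hypothetical, and the ranges are taken directly from the quotation above.

```python
import re


def is_user_assigned(code) -> bool:
    """True if the code lies in a range ISO 3166-1 leaves to users (clause 8.1.3)."""
    text = str(code).strip().upper()
    if re.fullmatch(r"[0-9]{3}", text):
        # numeric-3: 900 to 999
        return 900 <= int(text) <= 999
    if re.fullmatch(r"[A-Z]{2}", text):
        # alpha-2: AA, QM to QZ, XA to XZ, ZZ
        return text in ("AA", "ZZ") or "QM" <= text <= "QZ" or "XA" <= text <= "XZ"
    if re.fullmatch(r"[A-Z]{3}", text):
        # alpha-3: AAA to AAZ, QMA to QZZ, XAA to XZZ, ZZA to ZZZ
        return (
            "AAA" <= text <= "AAZ"
            or "QMA" <= text <= "QZZ"
            or "XAA" <= text <= "XZZ"
            or "ZZA" <= text <= "ZZZ"
        )
    return False
```

Here `is_user_assigned("KS")` is `False`, which is the point of the slide: KS is merely unassigned, not reserved for users. A code such as `XA` or `QM` passes the check.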
There are many categories of ISO 3166-1 alpha-2 codes
[Figure: the ISO 3166-1 decoding table, a grid of every two-letter combination from AA to ZZ, color-coded by category.]
- Officially assigned code element: may be used without restriction
- User-assigned code element: may be used without restriction
- Exceptionally reserved code element: may be used but restrictions may apply
- Transitionally reserved code element: deleted from ISO 3166-1; stop using ASAP
- Indeterminately reserved code element: must not be used in ISO 3166-1
- Code elements not used at present stage: must not be used in ISO 3166-1
- Unassigned code elements: free for assignment (by ISO 3166/MA only!)
The user-assigned code elements are reserved for local extensions. Use them when you need a new code!
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/iso_3166-1_decoding_table.html#AW

Usual and unusual requirements for handling country names
- One client needed to maintain multiple country lists:
  - ISO 3166 used in most systems
  - Maintained a separate editorial style list for correspondence and reports
  - Still other lists were used for statistical information on country subdivisions and multi-country regions
- Sample entries across the lists (column headings on the slide: 3166 Short Name, Redbook Country Name, Redbook Short Form, Redbook Full Form, STA Code):
  - Afghanistan: Redbook short form “Afghanistan, I.S. of”; Redbook full form “Afghanistan, Islamic State of”; STA code 512
  - Åland Islands: not in Redbook
  - Albania: STA code 914
  - Aruba: Redbook full form “Kingdom of the Netherlands. Aruba”; STA code 314
  - …
- Organization maintained a variety of historical information on countries and regions:
  - Effective dates for codes were needed (note: dates were for codes within a system, not for the countries)
  - Mappings from old countries to successors were also needed

| Country | Alpha-2 | Start Date | End Date |
|---|---|---|---|
| Bosnia and Herzegovina | | 1992 | |
| Czech Republic | CZ | 1993-06-15 | |
| Czechoslovakia | CS | 1974 | 1993-06-15 |
| Yugoslavia | YU | 1974 | 2003 |
| USSR | | 1974 | 1992-08-30 |
| Zaire | ZR | 1974 | 1997-07-14 |
| Congo, Dem. Rep. of | CD | 1997-07-14 | |
| … | | | |

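Mappings from old countries to successors can chain, so a lookup has to follow the links until it reaches entries that are still current. The sketch below is illustrative only: `SUCCESSORS` and `current_successors` are hypothetical names, the table tracks code lineage rather than full political history, and Slovakia is added from general knowledge because it does not appear on the slide. It also assumes the table contains no cycles.

```python
# Hypothetical successor table: retired entry -> entries that replaced it.
SUCCESSORS = {
    "Zaire": ["Congo, Dem. Rep. of"],
    # Slovakia is not on the slide; added from general knowledge.
    "Czechoslovakia": ["Czech Republic", "Slovakia"],
    # Follows the code lineage only (YU replaced by CS in 2003), a simplification.
    "Yugoslavia": ["Serbia and Montenegro"],
    "Serbia and Montenegro": ["Serbia", "Montenegro"],
}


def current_successors(name: str) -> list:
    """Follow successor links until only entries without successors remain."""
    if name not in SUCCESSORS:
        return [name]  # already current
    result = []
    for successor in SUCCESSORS[name]:
        for leaf in current_successors(successor):
            if leaf not in result:
                result.append(leaf)
    return result
```

For example, `current_successors("Yugoslavia")` walks two links and returns Serbia and Montenegro as separate entries, while a name with no entry in the table is returned unchanged. A one-to-many result is a signal that old records cannot be remapped automatically and need a human decision.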
Problems when mapping between location terminologies
Lists compared: ISO 3166 (code, official short name, full name), Redbook (country name, full form), and STA name (60 chars).
- General. Issue: missing entities not listed in any of the recommended country lists (e.g. the Azores, Kosovo).
- CIV, CÔTE D'IVOIRE (ISO full name: Republic of Côte d'Ivoire). Other lists: Côte d’Ivoire; Côte d'Ivoire. Issue: use of accents in country names.
- BIH, BOSNIA AND HERZEGOVINA. Other lists: Bosnia and Herzegovina; Bosnia & Herzegovina. Issue: inconsistent use of conjunctions and special characters ('and' or ampersand ‘&’).
- TLS, TIMOR-LESTE (ISO full name: Democratic Republic of Timor-Leste). Other lists: Timor-Leste; Democratic Republic of Timor-Leste. Issue: direct order of official country name does not alphabetize where users expect to find it.
- HKG, HONG KONG (ISO full name: Hong Kong Special Administrative Region of China). Other lists: China, P.R.: Hong Kong SRA; China, P.R.: Hong Kong. Issue: variation between ISO and company practices.
- MKD, MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF (ISO full name: The former Yugoslav Republic of Macedonia). Other lists: Macedonia, former Yugoslav Republic of; Macedonia, FYR. Issue: long names are more frequently abbreviated.
- PSE, PALESTINIAN TERRITORY, OCCUPIED (ISO full name: Occupied Palestinian Territory). Other lists: West Bank and Gaza. Issue: unclear what the correct form of name is. Note: Redbook name is from front matter, not table.
- KNA, SAINT KITTS AND NEVIS. Other lists: St. Kitts and Nevis. Issue: ISO spells out “Saint” but company uses abbreviation.
- VNM, VIET NAM (ISO full name: Socialist Republic of Viet Nam). Other lists: Vietnam. Issue: spelling and name order variations between ISO and company practices.

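Several of the mismatches above (accents, apostrophe style, 'and' versus '&', "St." versus "Saint") are mechanical, so they can be removed by comparing a normalized key rather than the display name. The sketch below is one possible normalization, with `match_key` as a hypothetical name; it is not the method used by the client described in the talk.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Reduce a country name to a comparison key; never use it for display."""
    # Decompose accented letters, then drop the combining marks (Ô -> O).
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    text = text.replace("&", " and ")
    text = re.sub(r"\bst\.", "saint", text)
    # Apostrophes of either style, hyphens, and commas all become spaces.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


pairs = [
    ("CÔTE D'IVOIRE", "Côte d’Ivoire"),
    ("BOSNIA AND HERZEGOVINA", "Bosnia & Herzegovina"),
    ("SAINT KITTS AND NEVIS", "St. Kitts and Nevis"),
]
all_match = all(match_key(a) == match_key(b) for a, b in pairs)
```

Normalization does not resolve everything in the list. "VIET NAM" and "Vietnam" still produce different keys, as do inverted forms such as "Macedonia, FYR", so spelling variants and abbreviations still need an explicit synonym table maintained by the vocabulary's custodians.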
Enterprise taxonomy governance environment
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within the Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets are published to consuming applications.
CV (Controlled Vocabulary): the list of values for one facet in the Taxonomy.
[Diagram: external vocabularies (ISO 3166-1, Other External) and internal sources (Other Internal) feed a Vocabulary Management System inside the Taxonomy Governance Environment, which holds the CVs, Published Facets, and Other Controlled Items. Custodians exchange Change Requests & Responses with it, and Notifications go out to the Consuming Applications: Archives, Intranet Search, ERMS, ERP, Web CMS, Intranet Nav., DAM, …]

The client defined a process for country vocabulary changes
- The different vocabularies had different processes.
- Custodians of the different vocabularies communicate so that if one changes, the others know about it.
[Process diagram: the legend distinguishes role(s) from tool(s); one step is labeled “Notify Board”.]

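For custodians to tell each other what changed, each new version of a vocabulary has to be compared with the previous one. The sketch below shows one simple way to produce such a change summary. The function `diff_vocabulary` and the idea of representing a vocabulary version as a code-to-name dictionary are assumptions for illustration, not the client's actual process.

```python
def diff_vocabulary(old: dict, new: dict) -> dict:
    """Compare two versions of a code -> name vocabulary."""
    added = {code: new[code] for code in new.keys() - old.keys()}
    removed = {code: old[code] for code in old.keys() - new.keys()}
    renamed = {
        code: (old[code], new[code])
        for code in old.keys() & new.keys()
        if old[code] != new[code]
    }
    return {"added": added, "removed": removed, "renamed": renamed}


# Hypothetical before/after versions around the 2006 change to CS.
before = {"AL": "Albania", "CS": "Serbia and Montenegro"}
after = {"AL": "Albania", "RS": "Serbia", "ME": "Montenegro"}
changes = diff_vocabulary(before, after)
```

In this example `changes` reports RS and ME as added and CS as removed, with nothing renamed. A summary like this could be attached to a notification so that the custodians of the other lists can decide whether their own vocabulary needs a matching change.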
Conclusion
- Location terminologies are commonly used
  - They fulfill many different purposes
- Keeping up to date is an ongoing effort
  - The rate of change is low, but changes never stop
- The issues can be complex
  - Anything out of the ordinary will not be well served by off-the-shelf software
- Most organizations have a proliferation of pseudo-3166 vocabularies. Start there to get things under control.

Questions?
Ron Daniel, Taxonomy Strategies LLC
925-368-8371
rdaniel@taxonomystrategies.com


