- Number of slides: 59
Terminology and Metadata: Whys and Hows. Harold Solbrig, Apelon, Inc.
Outline • “Terminology” – Why does it matter? • Metadata and its relationship to terminology • Creating and managing terminological resources • Description of Apelon and its role in all of this
Terminology – why does it matter? • Information technology (IT) is about _____? • Depending on your perspective, information: (a) Reduces uncertainty on the part of the receiver (b) IS the reduction of uncertainty on the part of the receiver • The transfer of information between a sender and a receiver is known as “communication” • The business of IT is accurate, timely and relevant communication.
Communication and Language • Language - a “specification” that enables communication – Semantics - the association between signs or symbols and their intended “meaning” – Syntax - the rules for ordering and structuring the signs into phrases and sentences – Pragmatics - the relationship between signs and symbols and the recipient. Broadly, the shared context.
The Semiotic Triangle • A Symbol symbolises a Thought or Reference • A Thought or Reference refers to a Referent • A Symbol stands for a Referent • Example symbols: “Rose”, “ClipArt” • Source: C. K. Ogden and I. A. Richards, The Meaning of Meaning.
The Communication Process (diagram) • Two semiotic triangles, one for each party, each linking a Concept, a Symbol (“Rose”, “ClipArt”) and a Referent through “symbolises”, “refers to” and “stands for” • The message exchanged: “I see a ClipArt image of a rose” • Successive builds of the slide add the labels Semantics, Syntax, Context and Shared Context
Shared Context • Impacts how much information can be contained in a symbol. • Chart of information per symbol against the level of shared context: No Shared Context, Shared Universe, Shared Sun, Shared Planet, Shared Species, Common Culture, Common Language, Similar Education, Common Profession, Common Specialty
Shared Universe Pioneer 10 & 11 Voyager “Golden Record”
Common Specialty “Interferons are a family of cytokines that exerts antiviral, antitumor and immunomodulatory actions by inducing a complex set of proteins. One of the best known IFN-induced protein is the dsRNA-dependent protein kinase (PKR), that mediates both antiviral and anticellular activities. PKR inhibits translation initiation through the phosphorylation of the alpha subunit of the initiation factor eIF-2 (eIF-2α) and also controls the activation of several transcription factors such as NF-κB, p53, or STATs. …” Mariano Esteban. Induction of apoptosis by the dsRNA-dependent protein kinase (PKR): Mechanism of action. Apoptosis, Springer, Volume 5, Number 2, April 2000
The impact of context on communication Shared context: – Allows information to be communicated in larger, more succinct “chunks”. • Drug, analgesic and NSAID are all “chunks”, yet differ markedly in conceptual complexity. – Enables specialized symbol sets: • Contrast the amount of information contained in the formula E = mc² versus that contained in this presentation…
Contextual Formalism The degree of formality in a shared context can vary across a wide spectrum: – Tacit context which is simply presumed – Contextual negotiation preceding the actual message – Rigorous and formal rules and documents describing the form and possible meanings behind every message and phrase.
Factors Affecting the Degree of Contextual Formalism • Number of participating parties – Formalism needs to increase as the number of participants increases • Geographic, cultural and temporal proximity of communicators – The further apart communicators are, the less they can assume • Amount of shared context – The more you have, the more important it becomes to be organized
Factors Affecting the Degree of Contextual Formalism (continued) • The cost of imprecise communication – Poetry and literature - low cost (some may argue actual gain) – Technical and professional - high to very high cost • What is the cost of assuming the units of a thrust specification? • What is the cost of assuming the dose of a prescription? • What is the cost of assuming the century in which the communication originated?
Terminology • Symbols – Their encoding and decoding – Vocabularies, Dictionaries, Enumerations, Codes, … • Context – Recording and sharing – Glossaries, textbooks, college courses, operations manuals, information models
Terminology in the Digital Era • Multi-layered – We’ll ignore the lower layers – polarity of diodes representing bits, bits representing numbers, characters, …
Terminology in the Digital Era • Focus is on metadata – What is a particular data collection about? – What information can be found in it? – How is that information recorded? – What are the contextual assumptions?
The Communication Process Display Form CONCEPT Symbolises Refers To Symbolises Decode Stands For Referent Encode Transform Stands For
Metadata and the Communication Process • Metadata describes the forms, databases, encoding processes, etc. • Terminology is the component of metadata that: – Manages symbols and their “meanings” • For users (e.g. what are the possible choices for field ‘x’, and what does each of them mean) • For IT professionals (the Information Model) – Maintains context • What else does a given specialty, department, company, etc. assume is known beyond the simple definition of symbols
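The “possible choices for field ‘x’, and what does each of them mean” idea can be sketched as a small value domain. This is a minimal illustration only: the class names, the `dose_unit` field, its codes and the context string are all hypothetical and are not taken from the slides or from any standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PermissibleValue:
    code: str     # the symbol that is actually stored in the data
    meaning: str  # what that symbol is intended to stand for


@dataclass
class ValueDomain:
    name: str     # the field being described (hypothetical)
    context: str  # who shares the assumptions behind these meanings
    values: dict = field(default_factory=dict)

    def add(self, code: str, meaning: str) -> None:
        self.values[code] = PermissibleValue(code, meaning)

    def choices(self) -> list:
        """The possible choices for the field, for display to a user."""
        return sorted(self.values)

    def decode(self, code: str) -> str:
        """Return the meaning of a stored symbol; refuse to guess."""
        if code not in self.values:
            raise KeyError(f"{code!r} is not a permissible value of {self.name}")
        return self.values[code].meaning


# Hypothetical example field: the unit of a prescribed dose.
dose_unit = ValueDomain("dose_unit", context="hypothetical pharmacy department")
dose_unit.add("mg", "milligram")
dose_unit.add("mcg", "microgram")
```

Calling `dose_unit.decode("g")` raises a `KeyError` instead of assuming a meaning, which is the point of recording the permissible values explicitly.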
Terminology and Metadata • Standard modeling tools (UML, XML Schema, …) have provided a way to communicate the structure and content of data stores and messages. • Models, however, have to include information about their intended context and meaning to allow data sharing across domains. Terminology provides (or is, in some senses) this component.
Terminology and Metadata (continued) • Amongst other things, ISO 11179 provides a model of how terminology and metadata go together – It has the advantage of being (or being in the process of becoming) a standard – ISO 11179 also provides a standard model of terminology content, which would provide a vehicle for interchange in the appropriate contexts. • There are other models of interest as well…
Terminology and Metadata in 11179
Terminology Sounds easy enough – why not just put together a set of tables and get going? Because…
1) Terminology has to be shared across multiple domains. This, after all, is its raison d'être
   1. The model of the terminology itself has to be shareable.
   2. The semantics of the terminology have to be shareable.
2) Terminology and knowledge management are inextricably intertwined
   1. Fractal in nature – you can never stop adding
   2. Boundaries are imprecise and expand
   3. This means that there is no such thing as a “small terminology”
   4. The components of terminology can also be viewed as declarative programs.
   5. This means that the rigor of software development is applicable as well.
Terminology (continued) 3) The knowledge behind terminology needs to be shared – Terminology resources depend on specialists (e.g. doctors, physicists, biologists, geneticists, etc.) – Development is expensive – Maintenance is often very expensive.
Prerequisites to Terminology Creation
1. Know the standards
   1. General standards (SKOS, RDF, OWL, 11179, SBVR, XML, UML, XMI, …)
   2. Domain specific. Example: Medical – HL7, LQS, CTS-2, UMLS, SNOMED, …
2. Know the tools
   1. Development: TDE, Protégé, OBO-Edit, FaCT++, Racer, Jena, EVS, LexGrid, …
   2. Distribution: DTS, RDF, OWL, SKOS, …
3. Know the content
   1. General (Dublin Core, CYC, SUMO, …)
   2. Domain specific (Medical: NCIt, UMLS, ICDs, SNOMED-CT, Gene Ontology, …)
Terminology and Workflow • Terminology management includes: – Discovery – Federation – Authoring – Review – Distribution – Adoption
Process (Example Sequencing) Import Report Author Plan Federate Review Translate Incorporate Customize Approve Map Version Maintain Subscribe Submit Process Submissions Migrate Reevaluate Replace Transform Extract Load Post-coordinate Review in Context Publish Access
Content Update Applications Semantic MediaWiki (++) VOSER Annotations and Change Requests Status Report Core SME Submission Work Flow
Key Points • Terminology is a critical component for cross-discipline, cross-enterprise information sharing. • Terminology development is a non-trivial task – it needs to be done correctly. • Terminology resources need to be federated, shared and reused. • But… there’s help!
Apelon • Largest provider of terminology products and services • Unique expertise
Employees • Internationally known terminology experts • Regular contributors to industry standards, publications and conferences
Mission • Apelon software and services support the development, maintenance, and practical deployment of structured terminologies • Put another way, we help our customers create, maintain, and leverage standard and enterprise terminologies • It’s all about speaking the same language
Facts Most of the world’s standard healthcare terminology resources have been built and/or are maintained with Apelon tools, including – SNOMED – CPT – ICD-9-CM – NDF-RT – UMLS
Software Products 1. Terminology Development Environment (TDE) 2. Distributed Terminology System (DTS) 3. TermWorks
1 – Terminology Authoring (TDE) Author: ICD, CPT, SNOMED, NDF-RT, … • Tools to create and maintain structured terminologies • Improve productivity, data quality and scalability • Enhance the value of enterprise assets – Commercial product – CPT – Internal infrastructure – Kaiser Permanente CMT – Public benefit – SNOMED CT, NDF-RT, NCI Thesaurus
1 - TDE • Based on Description Logic (DL) – Automated classification – Identifies redundancy – Provably consistent terminology • Collaborative features – Distributed authoring – Workflow – Conflict identification / resolution • Version control • Customizable interface and constraints
1 – Automatic Classification Body Disease is-a part-of Cardiac Disease Heart part-of is-a Mitral Stenosis affects Mitral Valve
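The diagram above can be read as a tiny classification problem. The sketch below is a toy subsumption check, not a Description Logic reasoner and not how TDE works internally. It assumes, for illustration only, that Cardiac Disease is fully defined as “a Disease that affects the Heart”, that Mitral Stenosis is stated only as “a Disease that affects the Mitral Valve”, and that affecting a part counts as affecting the whole.

```python
# Stated part-of links from the diagram: part -> whole.
PART_OF = {"Heart": "Body", "Mitral Valve": "Heart"}

# Assumed definitions: concept -> (stated parent, anatomical site it affects).
DEFINITIONS = {
    "Cardiac Disease": ("Disease", "Heart"),
    "Mitral Stenosis": ("Disease", "Mitral Valve"),
}


def part_of_or_same(part, whole):
    """True if `part` is `whole` or reaches it by following part-of links."""
    current = part
    while current is not None:
        if current == whole:
            return True
        current = PART_OF.get(current)
    return False


def subsumes(general, specific):
    """True if every instance of `specific` must also be a `general`."""
    g_parent, g_site = DEFINITIONS[general]
    s_parent, s_site = DEFINITIONS[specific]
    return g_parent == s_parent and part_of_or_same(s_site, g_site)


def classify(concept):
    """Infer is-a parents that were never stated explicitly."""
    return sorted(
        other
        for other in DEFINITIONS
        if other != concept and subsumes(other, concept)
    )
```

Under these assumptions, `classify("Mitral Stenosis")` places it under Cardiac Disease even though nobody authored that link, which is the kind of inference the slide attributes to automated classification.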
2 – Terminology Deployment (diagram labels: Deploy, Customize) • Terminology servers reduce costs of terminology acquisition, integration and management • Applications – EMRs and CDRs • NextGen, VA – Knowledge repositories • CDC, NCI – Healthcare information portals • HKHA
2 – What is a Terminology Server? A terminology server is • a networked software component • that centralizes terminology content and reasoning • to provide (complete, consistent and effective) terminology services for other network applications
2 – How is a Terminology Server Used? • By informaticists to create, maintain, localize and map terminologies • By clinical applications and their users to select and record standardized data • By integration engines to map data elements between applications
2 - Examples of Terminology Services
• Term/name normalization: What is the SNOMED CT name for heart attack? → Myocardial Infarction
• Code translation: What is the ICD-9 code for Myocardial Infarction? → 410.9
• Grouping and aggregation: Is Myocardial Infarction a Cardiac Disease? → Yes
• Clinical knowledge: What drug treats Myocardial Infarction? → Streptokinase
• Local information: Add L227 as the local code for Serum Calcium. → OK
2 – Apelon’s DTS Product • Integrated repository for all terminologies – Varying release cycles → regular releases – Inconsistent data models → common object model – Independent views → integrated view with mappings – Current snapshot → version management • Extensible with local terminology and maps • Subsets • Easy subscription updates (with exception reports) • Desktop editor and webtop browser • Workflow support • Flexible import, export and integration • Open source
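The five example services can be mimicked with a toy in-memory lookup. This is a sketch for illustration only: the class and method names are hypothetical and do not correspond to the DTS, LQS or CTS APIs, and the data is limited to the examples on the slides.

```python
class ToyTerminologyServer:
    """In-memory stand-in for a terminology server (hypothetical API)."""

    def __init__(self):
        self.synonyms = {"heart attack": "Myocardial Infarction"}
        self.icd9 = {"Myocardial Infarction": "410.9"}
        self.parents = {
            "Myocardial Infarction": "Cardiac Disease",
            "Cardiac Disease": "Disease",
        }
        self.treated_by = {"Myocardial Infarction": ["Streptokinase"]}
        self.local_codes = {}

    def normalize(self, term):
        """Term/name normalization: map a synonym to its preferred name."""
        return self.synonyms.get(term.lower(), term)

    def translate(self, name):
        """Code translation: preferred name to ICD-9 code."""
        return self.icd9[name]

    def is_a(self, child, ancestor):
        """Grouping and aggregation: walk the is-a hierarchy upwards."""
        current = child
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents.get(current)
        return False

    def treatments(self, name):
        """Clinical knowledge: drugs recorded as treating the condition."""
        return list(self.treated_by.get(name, []))

    def add_local_code(self, code, name):
        """Local information: register a site-specific code."""
        self.local_codes[code] = name
        return "OK"


server = ToyTerminologyServer()
preferred = server.normalize("heart attack")
```

A client would chain the calls, for example `server.translate(server.normalize("heart attack"))`, so that an application never needs to know the local wording a user typed.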
Terminology Server Standards • OMG’s Lexicon Query Services (LQS) – AKA TQS • Health Level Seven (and ANSI) Common Terminology Services (CTS) – In ISO Standardization as well • CTS-II – In process – Led by Apelon
DTS and Standards CTS wrapper for DTS is available INTEL Healthcare SOA using DTS for CTS extensions – Currently ahead of CTS-II – Will be fed back into CTS-II
2 – Knowledge Base (KB) • Clinical (SNOMED CT) • Reimbursement (ICD, CPT, HCPCS) • Pharmaceuticals (Multum, NDF-RT) • Labs (LOINC) • Nursing (NIC, NOC, and NANDA) • Adverse events (MedDRA, COSTART, WHOART) • Extensive crosswalks • Mappings to MeSH and UMLS CUIs • Local additions
2 - Software Architecture DTS Database DTS Server Tomcat (DTS Client) DTS Editor DTS Browser DTS Client Application
3 - TermWorks • Terminology web service • Easy, low cost, rapid mapping solution • Web services standards (SOAP / WSDL) • Powerful matching capabilities • Comprehensive set of terminologies • Deployment via MS Office applications • Examples – Charge master management – Laboratory data integration – Legacy data mapping
3 - Easy to Use Excel Plug-in Apelon Excel front end Content Expert Web Services TermWorks Database And Tools
Terminology Consulting • Longstanding experience • Broad range of engagements – Project planning and management – Terminology modeling – Mapping of current code sets – Custom product additions – Tailored terminology applications • Field proven principles and practices
Representative Engagements • Department of Veterans Affairs – Enterprise reference terminology and software; extensive consulting • SNOMED International – Authoring software and custom enhancements for SNOMED CT • National Cancer Institute – Authoring and deployment software; extensive consulting • Department of Defense – Enterprise medication mediation server • Food and Drug Administration – Data element standardization of Structured Product Label • Accenture – Comprehensive data standardization for a National Health Information Network (NHIN) prototype • Department of Health and Human Services – Data element standardization • Kaiser Permanente – Authoring software for the Convergent Medical Terminology (CMT) • Care Science / Quovadx – Terminology server; consulting
Recent Engagements • National Cancer Institute – Restructuring the NCI Thesaurus for federation and integration with Open Biomedical Ontologies (OBO) and other resources – Development of a Semantic MediaWiki based collaboration environment • Cancer Bioinformatics Grid – Using terminology / metadata links for scientific reasoning. • Mayo Clinic (via NCI) – ICD-11 terminology development environment
Apelon Advantages • Deep understanding of terminology • Extensive practical experience • Well known, well connected, well informed • Stable organization • GSA listed