
- Number of slides: 47
THE REPRESENTATION OF ASSOCIATION SEMANTICS WITH ANNOTATIONS IN A BIODIVERSITY INFORMATICS SYSTEM
A dissertation defense presented to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy
David A. Gaitros
Dissertation Committee: Dr. Greg Riccardi, Dr. Fredrik Ronquist, Dr. Robert van Engelen, Dr. Ashok Srinivasan
December 8th, 2006
David A. Gaitros, Dissertation Defense, FSU, December 2006
Overview
• Acknowledgements
• Problem Definition
• Research Statement
• Goals and Challenges
• Semantic Associations
• Ontology in Semantic Associations
• MorphBank Architecture
• MorphBank Object Relations
• Annotation and Collections
• Semantic Annotations
• Example MorphBank Semantic Association
• Results
• Future work
• Questions
Acknowledgement
• MorphBank Primary Investigators: Dr. Fredrik Ronquist, Dr. Austin Mast, Dr. Corinne Jörgensen, Dr. Greg Erickson, Dr. Greg Riccardi, Dr. Robert van Engelen, Dr. Peter Jörgensen
• MorphBank Development Team: Mr. Wilfredo Blanco, Mr. Steve Winner, Mrs. Cynthia Gaitros, Ms. Katja Seltmann, Mrs. Neelima Jammigumpula, Mrs. Karolina Maneva-Jakimoska, Mrs. Debbie Paul, Mr. Chris Cprek
Acknowledgement (continued)
• Research Associates: Dr. Gordon Erlebacher, Dr. Matthew Buffington, Dr. Andy Deans, Mr. Shayne Steele
• Student Research Associates: Mr. Gabriel Logan, Mr. Stanislov Ustymenko, Ms. Allison von Eberstein, Mr. Jason Simmons, Mr. Wei Zhang, Ms. Janet Capps
Problem Statement
• Scientists can produce large amounts of data but cannot always process or search it.
• In biodiversity, specimens can be dissected, cataloged, photographed, analyzed, and stored in a variety of media.
– Much of the detailed knowledge of these specimens is still kept in personal journals, scientific logs, hand-written notes, and human memory.
– Such informal methods of storing and retrieving information present a problem when other biologists attempt to search for biodiversity subject matter.
How can we help solve this problem?
Research Statement
This research adds value to image repositories by collecting and publishing semantically rich, user-specified associations among images and other objects.
Research Goals
• Gather available data standards for biodiversity and semantic associative systems.
• Develop models
• Transform models into a relational database
• Develop data retrieval methods
• Research and develop methods to expose MorphBank data
• Develop a prototype semantic associative annotation tool
• Research automated object association
• Show that a semantically rich environment is useful to research scientists
Research Challenges
• Finding consensus on data naming standards
• Finding a flexible and reliable taxonomic name server
• Developing a model for semantic associations
• Developing a prototype of a functional semantic association annotation tool
• The magnitude of the work that must be accomplished:
– Management of a development team
– Creation of a development environment
– Creation of a commercial quality web site
– Population of the database
– Maintenance of a biodiversity system
– Attracting sufficient users to determine the feasibility of such a system
Semantic Associations
• Represent a very complex set of relations among objects
• Allow users to gain insight or query for interesting relationships among large amounts of data
• Inside a semantically rich environment, ontologies and context are preserved.
• The novel approach integrates ad-hoc annotation data with semantic associations, using tools that allow the relationships to be discovered.
Semantic Associations
Associations that have a direct relation are easy to find; others are not.
• View Information: Head posterior cleaned in alcohol
• Locality Information: Europe
• Specimen Data: Female, indeterminate, adult, Diplopepis rosae
• Contributor: Johan Liljeblad and Fredrik Ronquist
• General Comments: 12 records
• Determinations: 15 records
• Related Phylogenetic Characters: 1 record
• External Data Sources: 15 sources
Semantic Associations
What we would like to be able to find, starting from an image:
• Related specimens
• Data about the view
• Related comments
• Other images that use this view
• Other images by this contributor
• The contributor
• Other related objects
• The nature of the relationships
• Specimen data
• The place where this specimen was collected
Semantic Associations
What we would like to be able to find, starting from specimen data:
• Other taxonomic descriptions
• All related images
• Any phylogenetic characters/states
• All annotations
• All determination annotations
• External data links
• Associated publications
• Annotation contributors
• Other objects contributed
Ontology in Semantic Associations
• “Ontology is a specification of a specialization” (Charles Canton)
• Ontologies represent a community consensus among participants and there is pressure against change.
• Issues:
– What if someone desires to use a different taxonomic structure to describe a specimen?
– What if an error is discovered in the current ontology?
– How do you deviate from the current ontology without distorting the data and relationships?
Ontology in Semantic Associations
• MorphBank has several software and internal features that address this problem
• Through the use of Semantic Associations with Annotations, users can preserve the use of their own ontologies without inhibiting anyone else or corrupting data
• MorphBank allows for local modifications on external data references
MorphBank Architecture
[Diagram, MorphBank Version 2.5. Labels: Administrator, Group Coordinator, Lead Scientist, Unregistered User, ITIS, Login, About, News, Help, Contributor, MorphBank Security Service, Browse, Search, Upload, Admin, Annotation, MorphBank Data Service, Working Set, Under Review, Read Only, Browse, Search, Released]
MorphBank Object Model
MorphBank Inheritance Relationships
MorphBank Object Relationships
MorphBank Annotation Architecture
MorphBank Object Relationship Example: Image Annotation Overview Using an XML Schema
Annotations, Collections, and Associations
• The research program started with a concentration on annotations. However, the idea of a collection, and of building a relationship between the two, evolved over time.
– Annotation: A note that describes, explains, and/or evaluates the contents of a book, article, video, image, etc. This information is always accompanied by a citation. www.monroecc.edu/depts/library/LibraryGlossary.htm
– Collection: Several things grouped together or considered a whole. [Webster’s Dictionary]
– Associations: Phrases that lend meaning to information, making it understandable and actionable, and provide new and possibly unexpected insights [Boenerges Aleman-Meza]
Collection
[Screenshot callouts: Select New Taxon name; Associate Related Materials; Annotation; Related Determination Annotations; Title, comments, and image; Annotation]
Annotation
Annotation
Annotation
Annotation
• Text version of previous annotation
Specimen record 104260 of an adult female of form Indeterminate Pteroceraphron mirablipennis gathered by D. C. Darling of the institute CNCI. The specimen was gathered on August 4th, 1981, near Indiana: Porter Co.: Cowles Bog: Dune Acres, United States of America. This particular specimen is of class Insecta, order Hymenoptera, family Ceraphronidae, genus and species Pteroceraphron mirablipennis. This particular image (104272) was submitted by Dr. Andy Deans on August 8th, 2006 and released November 12th, 2006. The image shows a lateral view of the body, taken using auto-montage photography. No particular preparation. There are six related images of this same specimen. There are two related determination annotations: (1) one noting that the wings and antennae match the key, and (2) one stating that this diagnosis is for the genus Pteroceraphron.
Annotation
Semantic Annotation
Semantic Annotation
Research Results
• Collections are a form of annotation, because the items placed in a collection define a relationship.
• Inheritance is a strategy for annotation: we now know we can extend this model to carry new meanings.
• Through the baseObject class we can form complex relationships, and through annotations we can provide meaning to those relationships.
• Through Collections we can form relationships among objects that would otherwise have no direct links to each other, and through annotations we can provide meaning to those relationships.
• There is no limit to how this capability can be extended.
Research Results
• Through inheritance we restrict the semantics that are used with objects to improve context searches.
– Fields of data are not open to interpretation
– Fields are distinct to the objects they reference
– Example: Determination annotations inherit from Annotations and further restrict the meaning of that type of annotation.
• Tools can now be built that allow for more extensive and elaborate building of relationships.
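The inheritance idea on this slide can be illustrated with a minimal Python sketch. It is not the MorphBank schema: the class names `BaseObject`, `Image`, `Annotation`, `DeterminationAnnotation`, and `Collection`, and all of their fields, are assumptions chosen for illustration. It only shows how a subclass can narrow the meaning of a general annotation, and how a collection relates objects that have no direct link.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical stand-in for a central object catalog


@dataclass
class BaseObject:
    """Root of the hierarchy: every object receives a unique id."""
    id: int = field(default_factory=lambda: next(_ids), init=False)


@dataclass
class Image(BaseObject):
    caption: str = ""


@dataclass
class Annotation(BaseObject):
    """General annotation: free text attached to any BaseObject."""
    target: BaseObject = None
    title: str = ""
    comment: str = ""


@dataclass
class DeterminationAnnotation(Annotation):
    """Narrows Annotation: a determination must name a taxon."""
    taxon_name: str = ""

    def __post_init__(self):
        if not self.taxon_name:
            raise ValueError("a determination annotation requires a taxon name")


@dataclass
class Collection(BaseObject):
    """Groups arbitrary objects; membership itself expresses a relationship."""
    members: list = field(default_factory=list)


image = Image(caption="lateral view of the body")
determination = DeterminationAnnotation(
    target=image, title="Genus check", taxon_name="Pteroceraphron"
)
collection = Collection(members=[image, determination])
```

Because `DeterminationAnnotation` is still an `Annotation`, any tool written against the general type keeps working, while the extra required field keeps that annotation type from being open to interpretation.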
Research Results
– MorphBank versions 2.2 and 2.5 documented and released
• Currently working on subsequent versions
• Updating documentation
• Under Configuration and Control
– 300-600 hits on the web site per day
– 3 accepted conference papers, 1 Biodiversity Journal publication, 3 Taxonomic Data Working Group presentations, 1 ATOL/PBI presentation
– Over 100,000 data items
– Over 60,000 images
– 98 Groups
– 121 Registered users from 85 organizations (Ex: FSU, UF, Harvard, Yale, USC, American Museum of Natural History, Duke, Johns Hopkins)
– 350 Annotations
Research Results
• 336 Determination Annotations
• 1,544 distinct objects contained in 384 Collections
• Received very positive feedback from trial participants
• Received praise from the National Science Foundation for the quality and quantity of work accomplished to date
• First Biodiversity System to offer semantic association annotations, general annotations, legacy annotations, and determination annotations
• Being used currently by organizations for collaboration on specimen determinations
Research Results
• Developed prototype for semantic search on internally stored XML documents
• External objects are exposed through LSIDs in an RDF format. XML documents are being exposed and used by other organizations and data repositories
– Morphobank
– Genbank
– Provide direct links using the MorphBank “Show” function as URLs used in conference and journal papers
• As the amount of data grows in MorphBank so does the wealth of semantic associations.
Research Results: David Gaitros’ Contribution
– Analysis of the problem
• Analysis of the original MorphBank version 1.0.
• Analysis of data requirements and gathering of initial MorphBank requirements.
• Research of the current state of knowledge of annotations in scientific systems.
• Research of available taxonomic name servers.
– Modeling
• Creation of the MorphBank security model.
• Creation of the MorphBank data model and schema.
• Creation of the semantic association annotation model.
– Project Manager
• Leadership of the design team for the MorphBank system.
• Management of the production of MorphBank version 2.2 and 2.5.
• Procurement of hardware and software licenses.
• Management of the MorphBank NSF/BDI grant under the direction of the Primary Investigators.
• Oversight of the functional and design review meetings with users and primary investigators.
• Presentations of the project at conferences and workshops.
Research Results
– Software Design and Development
• Design and implementation of the initial MorphBank Administration Model.
• Design and implementation of the initial version of the Taxonomic name selection module.
• Design and implementation of the MorphBank Annotation Software.
• Design and implementation of the initial version of the MorphBank Collection module.
• Design of the external search and exposure feature for the release of MorphBank images in response to MorphOBank external references requirements.
• Design of the software test plans.
• Contributor to the MorphBank user’s manual.
Future Work
• Continue to extend the capability of Annotations and Collections
• Turn on the feature that allows for the annotation of any object
• Turn on the feature that allows for any object to be in a collection
• Research more efficient search techniques for semantic associations
• Complete development and release of phylogenetic character state software
• Research the possibility of further developing the extensible schema capability
• Analyze the complexity of relationships of the objects associated through collections and annotations
• Expand and mature the use of Life Science Identifiers
• Implement a security strategy that is separate from the implementation of the software
• Map the current data schema to the ABCD standard for the purpose of exporting data
• Publish results in a high quality journal
• Continue exposure at conferences and workshops
QUESTIONS
Environment Requirements
• One of the major problems with semantic associations is the complexity and reliability of the relationships
– Allowing unqualified individuals to make contributions to the data repositories introduces errors that make the data unreliable
– Relationship connections are easily corrupted if heuristics are not followed
Environment Requirements
• Features of MorphBank that satisfy environment requirements
– Secure login of ALL contributors
– Restriction of contributors to the area of their expertise
– Group membership and data ownership
– Categories of data
• In-progress
• Under review
• Released (cannot be altered, only annotated)
– Strict adherence to add, update, view, and delete heuristics
– All objects are centrally cataloged and uniquely identifiable
– All objects can be accessed via a globally unique identifier
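The three data categories listed above can be sketched as a small state model. This is an illustrative assumption, not MorphBank code: the `Status` enum, the `Record` class, and its methods are hypothetical names. The only rule taken from the slide is that released data cannot be altered, only annotated.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in-progress"
    UNDER_REVIEW = "under review"
    RELEASED = "released"


class Record:
    """Hypothetical data record that enforces the release rule."""

    def __init__(self, data):
        self.data = dict(data)
        self.status = Status.IN_PROGRESS
        self.annotations = []

    def update(self, **changes):
        # Released records are frozen: edits are rejected.
        if self.status is Status.RELEASED:
            raise PermissionError("released records cannot be altered, only annotated")
        self.data.update(changes)

    def annotate(self, text):
        # Annotations are allowed in every state, including released.
        self.annotations.append(text)


record = Record({"view": "lateral"})
record.update(view="dorsal")      # allowed while in progress
record.status = Status.RELEASED
record.annotate("Wings and antennae match the key")
```

Keeping corrections in annotations rather than in edits means that a released record stays stable for anyone who has already cited it.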
Semantic Annotation
• We want multiple annotations on any MorphBank object, so that scientists can add ad-hoc data to the database without creating new tables or adding columns to existing tables.
– How do we store and retrieve this information in an efficient and reliable manner?
– How do we relate these annotations correctly to all other objects?
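One common way to allow many annotations per object without touching existing tables is a single generic annotation table keyed on a shared object id. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`base_object`, `annotation`, `target_id`) and the helper functions are assumptions for illustration, not the MorphBank schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_object (
    id INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL
);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY REFERENCES base_object(id),
    target_id INTEGER NOT NULL REFERENCES base_object(id),
    title TEXT NOT NULL,
    comment TEXT
);
""")


def new_object(object_type):
    """Register any object centrally and return its unique id."""
    cur = conn.execute(
        "INSERT INTO base_object (object_type) VALUES (?)", (object_type,)
    )
    return cur.lastrowid


def annotate(target_id, title, comment):
    """Attach an annotation to any object; the annotation is an object too."""
    annotation_id = new_object("Annotation")
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)",
        (annotation_id, target_id, title, comment),
    )
    return annotation_id


def annotations_for(target_id):
    """Return the titles of all annotations on one object, oldest first."""
    rows = conn.execute(
        "SELECT title FROM annotation WHERE target_id = ? ORDER BY id",
        (target_id,),
    )
    return [row[0] for row in rows]


image_id = new_object("Image")
first = annotate(image_id, "Wing venation", "Matches the key")
annotate(image_id, "Antennae", "Matches the key")
annotate(first, "Follow-up", "Annotations can themselves be annotated")
```

Since every annotation is also registered in `base_object`, the same two tables cover annotations on images, specimens, or other annotations, with no schema change per new use.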
Semantic Annotation
• Most disciplines have a common language and phrases that they use in describing articles in their area.
– Example: Communication of a pilot to a control tower:
Pilot: Tallahassee ground control this is Cessna 3245 Yankee on ramp ready for taxi to active runway with information Bravo.
– We can pick out specific information that appears in an exact order. This system of formal semantics in aviation communication allows the participants to communicate efficiently and effectively without misunderstanding.
– We can schematize this conversation:
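As a rough illustration of how such a fixed-order phrase can be schematized, the following Python sketch extracts named fields from the pilot's call with a regular expression. The field names (`facility`, `aircraft`, `position`, `request`, `information`) are assumptions introduced here, not the schema shown in the presentation, and the pattern only handles calls worded like this one.

```python
import re

# Each named group captures one slot of the fixed-order phrase.
CALL_PATTERN = re.compile(
    r"(?P<facility>.+?) this is (?P<aircraft>.+?) on (?P<position>.+?) "
    r"ready for (?P<request>.+?) with information (?P<information>\w+)"
)


def schematize(call):
    """Return the call's fields as a dict, or None if it does not fit the pattern."""
    match = CALL_PATTERN.search(call)
    return match.groupdict() if match else None


call = (
    "Tallahassee ground control this is Cessna 3245 Yankee on ramp "
    "ready for taxi to active runway with information Bravo."
)
fields = schematize(call)
```

Once the call is split into fields, each slot can be searched or validated on its own, which is the property the slides want for annotation text.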
Semantic Annotation
Semantic Annotation
• With a biological image annotation we have several distinct parts:
– Specimen (biological item of interest)
– Image (a specimen may have more than one image)
– Type of Annotation
– Text Description of Annotation
– Title of Annotation
– Date (Time Stamp)
– Location (X/Y coordinate of the area on the image)
– Associated MorphBank Object (Image, Specimen, Publication, Group, User, Annotation, Location, View)
Semantic Annotation
• All aspects of the annotation can be placed into a schema and searched accurately.
• Searches on plain text present a problem.
– Example: Web Search for “Fruit Fly”
– Solution: Allow researchers to use restricted semantic annotation in writing the text description
• Place data in an XML document
• Items can be searched quickly and efficiently
• No restrictions on content
• New semantics can be added at any time.
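A minimal sketch of the XML approach, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`annotations`, `annotation`, `type`, `objectId`, `title`, `text`, `taxon`) are hypothetical and are not the MorphBank XML schema; the sample values are taken from the text annotation shown earlier in the presentation. Tagging the taxon inside the description lets a search target that element instead of scanning plain text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<annotations>
  <annotation type="Determination" objectId="104272">
    <title>Genus check</title>
    <text>Diagnosis is for the genus <taxon>Pteroceraphron</taxon></text>
  </annotation>
  <annotation type="General" objectId="104272">
    <title>Photography</title>
    <text>Lateral view of the body</text>
  </annotation>
</annotations>
"""

root = ET.fromstring(SAMPLE.strip())


def titles_by_type(tree_root, annotation_type):
    """Titles of annotations whose type attribute matches exactly."""
    path = f"./annotation[@type='{annotation_type}']"
    return [a.findtext("title") for a in tree_root.findall(path)]


def titles_mentioning_taxon(tree_root, name):
    """Titles of annotations containing a <taxon> element with this name."""
    return [
        a.findtext("title")
        for a in tree_root.findall("./annotation")
        if any(t.text == name for t in a.iter("taxon"))
    ]
```

New elements can be added inside `text` later without changing the two search functions, which matches the slide's point that new semantics can be added at any time.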
Semantic Annotation