

  • Number of slides: 47

THE REPRESENTATION OF ASSOCIATION SEMANTICS WITH ANNOTATIONS IN A BIODIVERSITY INFORMATICS SYSTEM
A dissertation defense presented to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy
David A. Gaitros
Dissertation Committee: Dr. Greg Riccardi, Dr. Fredrik Ronquist, Dr. Robert van Engelen, Dr. Ashok Srinivasan
December 8th, 2006
David A. Gaitros, Dissertation Defense, FSU, December 2006

Overview
• Acknowledgements
• Problem Definition
• Research Statement
• Goals and Challenges
• Semantic Associations
• Ontology in Semantic Associations
• MorphBank Architecture
• MorphBank Object Relations
• Annotations and Collections
• Semantic Annotations
• Example MorphBank Semantic Association
• Results
• Future Work
• Questions

Acknowledgements
MorphBank Principal Investigators: Dr. Fredrik Ronquist, Dr. Austin Mast, Dr. Corinne Jörgensen, Dr. Greg Erickson, Dr. Greg Riccardi, Dr. Robert van Engelen, Dr. Peter Jörgensen
MorphBank Development Team: Mr. Wilfredo Blanco, Mr. Steve Winner, Mrs. Cynthia Gaitros, Ms. Katja Seltmann, Mrs. Neelima Jammigumpula, Mrs. Karolina Maneva-Jakimoska, Mrs. Debbie Paul, Mr. Chris Cprek

Acknowledgements (continued)
Research Associates: Dr. Gordon Erlebacher, Dr. Matthew Buffington, Dr. Andy Deans, Mr. Shayne Steele
Student Research Associates: Mr. Gabriel Logan, Mr. Stanislov Ustymenko, Ms. Allison von Eberstein, Mr. Jason Simmons, Mr. Wei Zhang, Ms. Janet Capps

Problem Statement
• Scientists can produce large amounts of data but cannot always process or search it.
• In biodiversity, specimens can be dissected, cataloged, photographed, analyzed, and stored in a variety of media.
  – Much of the detailed knowledge about these specimens is still kept in personal journals, scientific logs, hand-written notes, and human memory.
  – Such informal methods of storing and retrieving information present a problem when other biologists attempt to search for biodiversity subject matter.
How can we help solve this problem?

Research Statement
This research adds value to image repositories by collecting and publishing semantically rich, user-specified associations among images and other objects.

Research Goals
• Gather available data standards for biodiversity and semantic associative systems.
• Develop models.
• Transform the models into a relational database.
• Develop data retrieval methods.
• Research and develop methods to expose MorphBank data.
• Develop a prototype semantic associative annotation tool.
• Research automated object association.
• Show that a semantically rich environment is useful to research scientists.

Research Challenges
• Finding consensus on data naming standards
• Finding a flexible and reliable taxonomic name server
• Developing a model for semantic associations
• Developing a functional prototype of a semantic association annotation tool
• The magnitude of the work that must be accomplished:
  – Management of a development team
  – Creation of a development environment
  – Creation of a commercial-quality web site
  – Population of the database
  – Maintenance of a biodiversity system
  – Attracting sufficient users to determine the feasibility of such a system

Semantic Associations
• Represent a very complex set of relations among objects
• Allow users to gain insight into, or query for, interesting relationships among large amounts of data
• Inside a semantically rich environment, ontologies and context are preserved.
• The novel approach is to integrate ad-hoc annotation data with semantic associations, together with tools that allow the relationships to be discovered.

Semantic associations that are direct relations are easy to find; others are not
• View Information: Head posterior cleaned in alcohol
• Locality Information: Europe
• Specimen Data: Female, indeterminate, adult, Diplolepis rosae
• Contributor: Johan Liljeblad and Fredrik Ronquist
• General Comments: 12 records
• Determinations: 15 records
• Related Phylogenetic Characters: 1 record
• External Data Sources: 15 sources
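This is a minimal sketch of why indirect associations are harder to find. A direct relation takes one lookup, but an indirect one requires a search across chains of relations. The sketch assumes a toy in-memory graph; every object name and relation label in it is hypothetical and is not MorphBank's schema. It uses breadth-first search to return the shortest chain of labeled relations between two objects.

```python
from collections import deque

# Toy object graph (hypothetical): object -> list of (relation, related object).
GRAPH = {
    "image:1": [("depicts", "specimen:1"), ("contributed_by", "user:1")],
    "specimen:1": [("collected_at", "locality:1"), ("determined_as", "taxon:1")],
    "user:1": [("contributed", "image:2")],
    "image:2": [("depicts", "specimen:2")],
    "specimen:2": [("determined_as", "taxon:1")],
}


def find_association(graph, start, goal):
    """Return the shortest chain of (subject, relation, object) triples
    linking start to goal, or None if no chain exists."""
    came_from = {start: None}  # object -> (previous object, relation)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while came_from[node] is not None:
                prev, relation = came_from[node]
                chain.append((prev, relation, node))
                node = prev
            return chain[::-1]
        for relation, neighbor in graph.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, relation)
                queue.append(neighbor)
    return None
```

For example, `find_association(GRAPH, "image:1", "specimen:2")` returns the three-step chain image:1 contributed_by user:1, user:1 contributed image:2, image:2 depicts specimen:2. The two objects share no direct link, so this association only appears through the search.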

Semantic Associations
What we would like to be able to find around a single image:
• The contributor, and other images by this contributor
• Specimen data, related specimens, and the place where this specimen was collected
• Data about the view, and other images that use this view
• Related comments
• Other related objects, and the nature of the relationships

Semantic Associations
What we would like to be able to find:
• Specimen data and other taxonomic descriptions
• All related images
• Any phylogenetic characters/states
• All annotations, including all determination annotations
• Annotation contributors
• Other objects contributed
• Associated publications
• External data links

Ontology in Semantic Associations
• “An ontology is a specification of a conceptualization” (Charles Canton)
• Ontologies represent a community consensus among participants, and there is pressure against changing them.
• Issues:
  – What if someone wants to use a different taxonomic structure to describe a specimen?
  – What if an error is discovered in the current ontology?
  – How do you deviate from the current ontology without distorting the data and relationships?

Ontology in Semantic Associations
• MorphBank has several software and internal features that address this problem.
• Through semantic associations with annotations, users can keep using their own ontologies without inhibiting anyone else or corrupting data.
• MorphBank allows local modifications to external data references.
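Here is one way to picture local modifications to external data, offered as a sketch only. The shared external record stays untouched, and each user's changes are layered over it. The example assumes a hypothetical record shape; the field names are illustrative and are not the ITIS schema. It uses Python's `collections.ChainMap`, which reads from the first mapping that contains a key and writes only to the first mapping.

```python
from collections import ChainMap

# Hypothetical record as fetched from an external name server such as ITIS;
# field names are illustrative only.
EXTERNAL_RECORD = {"name": "Pteroceraphron", "rank": "genus", "family": "Ceraphronidae"}


def local_view(external, overrides):
    """Return one user's view: overrides win on lookup, and any assignment
    through the view is stored in overrides, never in the shared record."""
    return ChainMap(overrides, external)


mine = {}
view = local_view(EXTERNAL_RECORD, mine)
view["family"] = "AlternativeFamily"  # hypothetical alternative placement
print(view["family"], EXTERNAL_RECORD["family"])  # AlternativeFamily Ceraphronidae
```

Every other user still sees the unmodified external record. A user's own ontology therefore lives entirely in that user's override layer.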

MorphBank Architecture
[Figure: MorphBank version 2.5 architecture. User roles: Administrator, Group Coordinator, Lead Scientist, Contributor, Unregistered User. Functions: Login, About, News, Help, Browse, Search, Upload, Admin, Annotation. Services: MorphBank Security Service, MorphBank Version 2.5 Data Service; external: ITIS. Data categories: Working Set, Under Review, Released (read-only Browse and Search).]

MorphBank Object Model

MorphBank Inheritance Relationships

MorphBank Object Relationships

MorphBank Annotation Architecture

MorphBank Object Relationship Example: Image Annotation Overview Using an XML Schema

Annotations, Collections, and Associations
• The research program started with a concentration on annotations. Over time, the idea of a collection, and of building a relationship between the two, evolved.
  – Annotation: A note that describes, explains, and/or evaluates the contents of a book, article, video, image, etc. This information is always accompanied by a citation. [www.monroecc.edu/depts/library/LibraryGlossary.htm]
  – Collection: Several things grouped together or considered as a whole. [Webster’s Dictionary]
  – Associations: Phrases that lend meaning to information, making it understandable and actionable, and provide new and possibly unexpected insights. [Boanerges Aleman-Meza]

Collection

Annotation
[Figure: annotation screen with callouts: Select New Taxon Name; Associate Related Materials; Annotation; Related Determination Annotations; Title, Comments, and Image.]

Annotation

Annotation

Annotation

Annotation
• Text version of the previous annotation:
Specimen record 104260 is an adult female of form Indeterminate, Pteroceraphron mirablipennis, gathered by D. C. Darling of the institution CNCI. The specimen was gathered on August 4th, 1981, near Indiana: Porter Co.: Cowles Bog: Dune Acres, United States of America. This particular specimen is of class Insecta, order Hymenoptera, family Ceraphronidae, genus and species Pteroceraphron mirablipennis. This particular image (104272) was submitted by Dr. Andy Deans on August 8th, 2006 and released on November 12th, 2006. The image shows the body in lateral view, taken with auto-montage photography, with no particular preparation. There are six related images of this same specimen. There are two related determination annotations: (1) one stating that the wings and antennae match the key, and (2) one stating that this diagnosis is for the genus Pteroceraphron.

Annotation
[XML view of the annotation above. Recoverable fragment: <annotation id='110537' type='image'> <imageid>104272</imageid> <object> <name>lanceolot wings</name> <location> … Recoverable element values: 104272; lanceolot wings; 25.2352.1; This is a lanceolot wing; 67572; Andy Deans; 3; HymAtol; 2006-08-04; …]

Semantic Annotation

Semantic Annotation

Research Results
• Collections are a form of annotation, because the items in a collection define a relationship.
• Inheritance is a strategy for annotation that lets us extend this model to new meanings.
• Through the baseObject class we can form complex relationships, and through annotations we can give meaning to those relationships.
• Through collections we can relate objects that would otherwise have no direct links to each other, and through annotations we can give meaning to those relationships.
• The model places no fixed limit on how this capability can be extended.

Research Results
• Through inheritance we restrict the semantics used with objects, which improves context searches.
  – Fields of data are not open to interpretation.
  – Fields are specific to the objects they reference.
  – Example: Determination annotations inherit from annotations and further restrict the meaning of that type of annotation.
• Tools can now be built that allow more extensive and elaborate building of relationships.
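The inheritance idea above can be sketched as a small class hierarchy. The class names mirror the slides' terms (baseObject, annotations, determination annotations, collections). Every field, and the rule that a determination must name a taxon, is a hypothetical illustration rather than MorphBank's actual schema. The subclass reuses all of its parent's fields but narrows what a valid instance may contain.

```python
from dataclasses import dataclass, field


@dataclass
class BaseObject:
    """Root class: every object has a unique id and can carry annotations."""
    object_id: int
    annotations: list = field(default_factory=list)


@dataclass
class Annotation(BaseObject):
    """A general annotation: free-form title and comment about a target object."""
    target_id: int = 0
    title: str = ""
    comment: str = ""


@dataclass
class DeterminationAnnotation(Annotation):
    """Inherits every Annotation field but narrows its meaning:
    it must name the taxon being proposed for the target specimen."""
    taxon_name: str = ""

    def __post_init__(self):
        if not self.taxon_name:
            raise ValueError("a determination annotation must name a taxon")


@dataclass
class Collection(BaseObject):
    """Groups objects that may have no direct link; membership is itself a relationship."""
    member_ids: list = field(default_factory=list)
```

Because `DeterminationAnnotation` is still an `Annotation` and a `BaseObject`, generic tools can handle it. A search that asks specifically for determinations, however, can rely on the `taxon_name` field always being present.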

Research Results
– MorphBank versions 2.2 and 2.5 documented and released
  • Currently working on subsequent versions
  • Updating documentation
  • Under configuration control
– 300-600 hits on the web site per day
– 3 accepted conference papers, 1 biodiversity journal publication, 3 Taxonomic Databases Working Group presentations, 1 ATOL/PBI presentation
– Over 100,000 data items
– Over 60,000 images
– 98 groups
– 121 registered users from 85 organizations (e.g., FSU, UF, Harvard, Yale, USC, American Museum of Natural History, Duke, Johns Hopkins)
– 350 annotations

Research Results
• 336 determination annotations
• 1,544 distinct objects contained in 384 collections
• Received very positive feedback from trial participants
• Received praise from the National Science Foundation for the quality and quantity of work accomplished to date
• First biodiversity system to offer semantic association annotations, general annotations, legacy annotations, and determination annotations
• Currently used by organizations to collaborate on specimen determinations

Research Results
• Developed a prototype for semantic search on internally stored XML documents
• External objects are exposed through LSIDs in RDF format. XML documents are being exposed to and used by other organizations and data repositories:
  – MorphoBank
  – GenBank
  – Direct links, using the MorphBank “Show” function, are provided as URLs in conference and journal papers
• As the amount of data in MorphBank grows, so does the wealth of semantic associations.
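LSIDs follow the URN form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The sketch below only splits an identifier into those parts. It does not resolve the identifier to its RDF metadata. The `example.org` authority in the usage line is a placeholder and is not MorphBank's actual authority.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]' into parts.
    The 'urn:lsid' prefix is matched case-insensitively."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


print(parse_lsid("urn:lsid:example.org:image:104272"))
# LSID(authority='example.org', namespace='image', object_id='104272', revision=None)
```

Resolving an LSID to data or RDF metadata is a separate step that goes through the authority's resolution service. This sketch leaves that step out.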

Research Results: David Gaitros’ Contribution
– Analysis of the problem
  • Analysis of the original MorphBank version 1.0
  • Analysis of data requirements and gathering of the initial MorphBank requirements
  • Research into the current state of knowledge of annotations in scientific systems
  • Research into available taxonomic name servers
– Modeling
  • Creation of the MorphBank security model
  • Creation of the MorphBank data model and schema
  • Creation of the semantic association annotation model
– Project management
  • Leadership of the design team for the MorphBank system
  • Management of the production of MorphBank versions 2.2 and 2.5
  • Procurement of hardware and software licenses
  • Management of the MorphBank NSF/BDI grant under the direction of the Principal Investigators
  • Oversight of the functional and design review meetings with users and Principal Investigators
  • Presentations of the project at conferences and workshops

Research Results
– Software Design and Development
  • Design and implementation of the initial MorphBank administration model
  • Design and implementation of the initial version of the taxonomic name selection module
  • Design and implementation of the MorphBank annotation software
  • Design and implementation of the initial version of the MorphBank collection module
  • Design of the external search and exposure feature that releases MorphBank images in response to MorphoBank external reference requirements
  • Design of the software test plans
  • Contributor to the MorphBank user’s manual

Future Work
• Continue to extend the capability of annotations and collections
• Turn on the feature that allows any object to be annotated
• Turn on the feature that allows any object to be in a collection
• Research more efficient search techniques for semantic associations
• Complete development and release of the phylogenetic character state software
• Research the possibility of further developing the extensible schema capability
• Analyze the complexity of relationships among objects associated through collections and annotations
• Expand and mature the use of Life Science Identifiers (LSIDs)
• Implement a security strategy that is separate from the implementation of the software
• Map the current data schema to the ABCD standard for the purpose of exporting data
• Publish results in a high-quality journal
• Continue exposure at conferences and workshops

QUESTIONS

Environment Requirements
• One of the major problems with semantic associations is the complexity and reliability of the relationships.
  – Allowing unqualified individuals to contribute to the data repositories introduces errors that make the data unreliable.
  – Relationship connections are easily corrupted if heuristics are not followed.

Environment Requirements
• Features of MorphBank that satisfy the environment requirements:
  – Secure login of ALL contributors
  – Restriction of contributors to their area of expertise
  – Group membership and data ownership
  – Categories of data:
    • In progress
    • Under review
    • Released (cannot be altered, only annotated)
  – Strict adherence to add, update, view, and delete heuristics
  – All objects are centrally cataloged and uniquely identifiable
  – All objects can be accessed via a globally unique identifier
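The three data categories, together with the ownership rule, can be sketched as a small state machine. The category names and the "released data cannot be altered, only annotated" rule come from the slide. The transition table (review can send work back, release is final), the owner-only update check, and the class names are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in progress"
    UNDER_REVIEW = "under review"
    RELEASED = "released"


# Assumed transitions: review can send work back; release is final.
ALLOWED = {
    Status.IN_PROGRESS: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.IN_PROGRESS, Status.RELEASED},
    Status.RELEASED: set(),
}


class Record:
    def __init__(self, owner, data):
        self.owner = owner
        self.data = dict(data)
        self.status = Status.IN_PROGRESS
        self.annotations = []

    def update(self, user, **changes):
        if self.status is Status.RELEASED:
            raise PermissionError("released data cannot be altered, only annotated")
        if user != self.owner:
            raise PermissionError("only the owner may update this record")
        self.data.update(changes)

    def annotate(self, user, text):
        # In this sketch annotation is allowed in any state, including after release.
        self.annotations.append((user, text))

    def advance(self, new_status):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.status = new_status
```

Keeping alteration and annotation as separate operations is what makes the released category safe. Other scientists can still attach new meaning to a released record, but they cannot change what it says.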

Semantic Annotation
• We want to allow multiple annotations on any MorphBank object, so that scientists can add ad-hoc data to the database without creating new tables or adding columns to existing tables. The challenges are:
  – How to store and retrieve this information efficiently and reliably
  – How to relate these annotations correctly to all other objects
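One common relational layout that meets this goal keeps a single annotation table whose rows point at any cataloged object. New kinds of ad-hoc data then become new rows (or XML in a text column) rather than new tables or columns. The sketch uses Python's built-in `sqlite3`. The table and column names are hypothetical and are not MorphBank's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE base_object (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL                 -- e.g. 'image', 'specimen', 'collection'
);
CREATE TABLE annotation (
    id        INTEGER PRIMARY KEY,
    target_id INTEGER NOT NULL REFERENCES base_object(id),
    title     TEXT NOT NULL,
    body      TEXT                     -- free text or an XML fragment
);
CREATE INDEX annotation_by_target ON annotation(target_id);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # reject annotations on unknown objects
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind):
    return conn.execute("INSERT INTO base_object (kind) VALUES (?)", (kind,)).lastrowid


def annotate(conn, target_id, title, body=""):
    return conn.execute(
        "INSERT INTO annotation (target_id, title, body) VALUES (?, ?, ?)",
        (target_id, title, body),
    ).lastrowid


def annotations_for(conn, target_id):
    return conn.execute(
        "SELECT title, body FROM annotation WHERE target_id = ? ORDER BY id",
        (target_id,),
    ).fetchall()
```

The foreign key speaks to the second challenge, relating annotations correctly: an annotation cannot point at an object that does not exist. The index on `target_id` keeps per-object retrieval fast as the number of annotations grows.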

Semantic Annotation
• Most disciplines have a common language and set of phrases that they use to describe items in their area.
  – Example: a pilot's transmission to a control tower:
    Pilot: “Tallahassee Ground Control, this is Cessna 3245 Yankee on the ramp, ready for taxi to the active runway with information Bravo.”
  – We can pick out specific items of information that appear in an exact order. This system of formal semantics in aviation communication allows the participants to communicate efficiently and effectively, without misunderstanding.
  – We can schematize this conversation:

Semantic Annotation
[XML schematization of the transmission above. Recoverable elements: <Pilot>, <Ground Communication>, <Taxitorunway>, <Airport Identifier>, <Authority>, <Aircraft>, <Make…>, … Recoverable values: Tallahassee; Ground Control; Cessna; 32345 Yankee; Taxi; Active Runway; Bravo.]
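Because the phraseology fixes the order of each item, a single pattern can pull the fields out and place them into a schema. This is a minimal sketch for this one type of transmission. The regular expression and the element names (`TaxiRequest`, `callsign`, `atis`, and so on) are hypothetical and are not the slide's schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for one fixed phraseology (taxi request to ground control).
TAXI_REQUEST = re.compile(
    r"(?P<airport>\w+) ground control,? this is "
    r"(?P<make>\w+) (?P<callsign>\d+ \w+) on (?:the )?ramp,? "
    r"ready for taxi to (?:the )?(?P<runway>active runway|runway \w+),? "
    r"with information (?P<atis>\w+)\.?$",
    re.IGNORECASE,
)


def schematize(transmission):
    """Turn a taxi request into an XML element, one child per field."""
    m = TAXI_REQUEST.match(transmission.strip())
    if m is None:
        raise ValueError("transmission does not follow the expected phraseology")
    root = ET.Element("TaxiRequest")
    for name in ("airport", "make", "callsign", "runway", "atis"):
        ET.SubElement(root, name).text = m.group(name)
    return root


root = schematize("Tallahassee ground control this is Cessna 3245 Yankee "
                  "on ramp ready for taxi to active runway with information Bravo.")
print(ET.tostring(root, encoding="unicode"))
```

The pattern only works because the field order never varies. Free-form scientific notes lack that guarantee, which is why restricted annotation vocabularies matter.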

Semantic Annotation
• A biological image annotation has several distinct parts:
  – Specimen (the biological item of interest)
  – Image (a specimen may have more than one image)
  – Annotation type
  – Text description of the annotation
  – Title of the annotation
  – Date (time stamp)
  – Location (X/Y coordinates of the area on the image)
  – Associated MorphBank object (Image, Specimen, Publication, Group, User, Annotation, Location, View)
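The parts listed above can be mapped one-to-one onto a record that serializes to XML. In this sketch the field list and the allowed object kinds come from the slide. The XML element and attribute names are hypothetical, and so is the choice to store coordinates as attributes.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Object kinds listed on the slide; anything else is rejected.
OBJECT_KINDS = {"Image", "Specimen", "Publication", "Group", "User",
                "Annotation", "Location", "View"}


@dataclass
class ImageAnnotation:
    specimen_id: int
    image_id: int
    annotation_type: str
    title: str
    description: str
    timestamp: str        # ISO 8601 date, e.g. "2006-08-04"
    x: int                # X/Y coordinates of the annotated area on the image
    y: int
    associated_kind: str
    associated_id: int

    def __post_init__(self):
        if self.associated_kind not in OBJECT_KINDS:
            raise ValueError(f"unknown object kind: {self.associated_kind}")

    def to_xml(self):
        root = ET.Element("annotation", type=self.annotation_type)
        ET.SubElement(root, "specimen").text = str(self.specimen_id)
        ET.SubElement(root, "image").text = str(self.image_id)
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "description").text = self.description
        ET.SubElement(root, "date").text = self.timestamp
        ET.SubElement(root, "location", x=str(self.x), y=str(self.y))
        ET.SubElement(root, "associated", kind=self.associated_kind).text = str(self.associated_id)
        return root
```

Each part becomes its own element or attribute, so a search can target, for example, only the `associated` kind or only the `date`.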

Semantic Annotation
• All aspects of the annotation can be placed into a schema and searched accurately.
• Searching plain text presents a problem.
  – Example: a web search for “Fruit Fly” matches any page containing those words, whatever their meaning.
  – Solution: allow researchers to use restricted semantic annotation when writing the text description:
    • Place data in an XML document
    • Items can be searched quickly and efficiently
    • No restrictions on content
    • New semantics can be added at any time
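The “Fruit Fly” problem can be reproduced with two tiny documents. Both documents and their element names (`taxon`, `commonName`, `description`) are hypothetical. A word-based plain-text search matches both, while a search restricted to one schema element matches only the document in which the phrase carries the intended meaning. Parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical annotation documents.
DOCS = [
    """<annotation><taxon>Drosophila melanogaster</taxon><commonName>fruit fly</commonName>
       <description>wing venation, lateral view</description></annotation>""",
    """<annotation><taxon>Rosa canina</taxon><commonName>dog rose</commonName>
       <description>fruit with a fly resting on it</description></annotation>""",
]


def plain_text_search(docs, phrase):
    """Indices of documents that contain every word of the phrase anywhere."""
    words = phrase.lower().split()
    return [i for i, d in enumerate(docs) if all(w in d.lower() for w in words)]


def field_search(docs, element, value):
    """Indices of documents where the named element's text equals value."""
    hits = []
    for i, d in enumerate(docs):
        root = ET.fromstring(d)
        if any((e.text or "").strip().lower() == value.lower() for e in root.iter(element)):
            hits.append(i)
    return hits


print(plain_text_search(DOCS, "fruit fly"))             # [0, 1]
print(field_search(DOCS, "commonName", "fruit fly"))    # [0]
```

The restriction applies only to where a term may appear. The text inside each element remains free, which matches the slide's point that there are no restrictions on content.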

Semantic Annotation
[Figure: DNA sequence annotation example. Recoverable XML elements: <DNA Sequence Annotation>, <Discriminatory Gene Analysis>, <DNA Sequence>, <URL>; image source: http://www.miltenyibiotec.com/service/memorec/bio_inf.jpg; panel labels A and B; color legend: Red, Green, Black.]