
- Number of slides: 73
Introduction to eScience and Semantic Web
Professor Deborah McGuinness
TA – Weijing Chen
Other lectures from Professor Joanne Luciano, grad student Jim McCusker, and possibly others from http://tw.rpi.edu/web/People
CSCI 6962-01, 86933; CSCI 4969-01, 87927; ITWS 6960-01, 87198; ITWS 4969-01, 87928
Week 1, initially August 29, 2011. Moved because of Hurricane Irene to Wednesday, August 31, 2011
Admin info (keep/print this slide)
• Class:
  – CSCI 6962-01, 86933; CSCI 4969-01, 87927
  – ITWS 6960-01, 87198; ITWS 4969-01, 87928
• Hours: 1 pm-3:50 pm Mondays (except after Columbus Day)
• Class location: Winslow 1140
• Instructors: Deborah McGuinness, TA Weijing Chen; guests: Joanne Luciano, Jim McCusker
• Contacts: [email protected]. rpi. edu, chenw [email protected]. edu, [email protected]. rpi. edu, mccusj@rpi.edu
• Contact locations: Winslow 2104 (DLM), 2143 (JSL)
For each class
• Titanpad – this week http://twc.titanpad.com/147
• Scribe for each class – this week Weijing
• After class – scribe copies notes over to the class page
• Class page: http://tw.rpi.edu/web/Courses/SemanticeScience/2011
• You will need an account on our site so that you can upload your homework and presentations – contact Patrick West, who is in class
• See http://tw.rpi.edu/web/Help/UploadLinkToMedia for uploading instructions
Quick hints (from Patrick)
• It's just a matter of adding a tag to the body of the Drupal page:
Introductions
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week and do you have any other conflicts or issues we should know about?
“Knowledge is the common wealth of humanity”* In the Earth and space sciences and elsewhere, ready and open access to the vast and growing collections of cross-disciplinary digital information is the key to understanding and responding to complex Earth system phenomena that influence human survival. We have a shared responsibility to create and implement strategies to realise the full potential of digital information and services for present and future generations. *Adama Samassekou, Convener of the UN World Summit on the Information Society
Brainstorming
• What do you think we need to address to start to realize the vision on the previous viewgraph?
Contents
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary
Outline of the course
• Topics for Semantic e-Science/Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – Workflow Management
  – Data life-cycle for e-Science
  – Data Mining and Knowledge Discovery
Background
People (scientists) should be able to access a global, distributed knowledge base of (scientific) data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
What do we need to achieve Semantic eScience? (in-class brainstorming exercise (2010))
• organization, leadership, management strategies, roles and assignment of roles
• dissemination strategy
• communication of ideas - machine level - human level
• conflict resolution
• cross-disciplinary collaboration
• flexible, adaptable, feedback
• extensible
• ability to filter information
• usage/application of resources, optimization
• facts, knowledge (domain knowledge)
• context, domain, scope
• goals, use cases
• metadata - data to describe data
• ability to link information
• ability to understand information
• ability to capture and represent conflicting ideas
• provenance - where data come from
• trust - reliable
• ability to capture intent (humanitarian aspect / responsibility)
• credibility of information
• interesting and appealing
• standardization
• education and outreach
• methods and metrics
• criteria for evaluation
The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries
Information products have Lots of Audiences
[Figure: audiences for information and data products, ranked from More Strategic to Less Strategic; SCIENTISTS TOO]
From “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005
Shifting the Burden from the User to the Provider
Fox CI and X-informatics - CSIG 2008, Aug 11
e-Science
• Emphasis is on Science
• Original narrative:
One of the key drivers behind the search for such new scientific tools is the imminent deluge of data from new generations of scientific experiments and surveys (*). In order to exploit and explore the petabytes of scientific data that will arise from these high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers. To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated. Scientists will create vast distributed digital repositories of scientific data requiring management services similar to those of more conventional digital libraries, as well as other data-specific services. The ability to search, access, move, manipulate, and mine such data will be a central requirement for this new generation of collaborative science software applications.
Hey and Trefethen, 2005
Evolving Science
• Thousand years ago: science was empirical, describing natural phenomena
• Last few hundred years: theoretical branch using models, generalizations
• Last few decades: a computational branch simulating complex phenomena
• Today: data exploration (eScience) synthesizing theory, experiment and computation with advanced data management and statistics; new algorithms!
Living in an Exponential World
• Scientific data doubles every year
  – caused by successive generations of inexpensive sensors + exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly hard to extract knowledge
• 20% of the world's servers go into huge data centers by the “Big 5”
  – Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!
Collecting Data
• Very extended distribution of data sets: data on all scales!
• Most datasets are small, and manually maintained (Excel spreadsheets)
• Total amount of data dominated by the other end (large multi-TB archive facilities)
• Most bytes today are collected via electronic sensors
Making Discoveries
• Where are discoveries made?
  – At the edges and boundaries
  – Going deeper, collecting more data, using more colors…
• Metcalfe's law
  – Utility of computer networks grows as the number of possible connections: O(N^2)
• Federating data (the connections!!)
  – Federation of N archives has utility O(N^2)
  – Possibilities for new discoveries grow as O(N^2)
• Many examples
  – Sky surveys – Galaxy Zoo… Very early discoveries from SDSS, 2MASS, DPOSS
  – Genomics + proteomics – Alzheimer's article in reading
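The O(N^2) growth quoted above can be made concrete by counting the distinct pairs of archives that could be cross-matched. The sketch below assumes that "possible connections" means unordered pairs, N(N-1)/2; the function name is illustrative only.

```python
def pairwise_connections(n: int) -> int:
    """Count the distinct unordered pairs among n federated archives."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # n * (n - 1) / 2 pairs, which grows as O(n^2)
    return n * (n - 1) // 2


for archives in (2, 10, 100):
    print(archives, "archives ->", pairwise_connections(archives), "possible pairings")
```

Going from 10 to 100 archives multiplies the number of archives by 10 but the number of possible pairings by roughly 100.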
Data Delivery: Hitting a Wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh!, and 1 PB ~4,000 disks
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
• … 2 days and 1 K$
• … 3 years and 1 M$
• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
• Take the analysis to the data!!
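A back-of-the-envelope check of the linear scaling behind these numbers, as a minimal sketch. It assumes a flat sustained rate of 1 GB per minute (taken from the slide) and decimal units; the slide's "2 days" and "3 years" are rounder, order-of-magnitude figures than what this flat rate gives.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

# Assumed sustained scan/transfer rate: 1 GB per minute, in bytes per second.
RATE_BYTES_PER_SECOND = 1e9 / 60


def scan_seconds(size_bytes: float, rate: float = RATE_BYTES_PER_SECOND) -> float:
    """Time for a sequential pass over size_bytes at a constant rate."""
    return size_bytes / rate


for label, size in (("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)):
    seconds = scan_seconds(size)
    print(f"{label}: {seconds / SECONDS_PER_DAY:.2f} days "
          f"({seconds / SECONDS_PER_YEAR:.2f} years)")
```

Whatever the exact rate, each factor of 1000 in data volume costs a factor of 1000 in time for a sequential scan, which is the argument for indices and for moving the analysis to the data.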
Mind the Gap!
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Ø Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. Wikipedia.
World-Wide Emerging Technology Trends
• Innovation will come from parts of the world other than the U.S.
• The Chinese have skipped the Internet first generation.
• Growth will occur in Asia, and continue to decrease in Western Europe.
• U.S. industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications to business processes.
• Networks are migrating to IP and optical networking technologies.
Cyberinfrastructure
• Data curation and storage
• Federated access
• Collaboration
• New uses in High Performance Computing
• Databases
• Web servers, services (software as service)
• Wiki
• Visualization
• All discipline neutral
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
SemantEco
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
Ex. 1: Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer using the user's preferred language: i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage -> thus part IT, part CI, part Informatics and all about doing new science
[Figure: layered virtual observatory architecture. From top to bottom: added value (education, clearinghouses, other services, disciplines, etc.); semantic mediation layer, mid-upper level; VO Portal, VO API and Web Services (semantic interoperability); semantic query, hypothesis and inference; semantic mediation layer, VSTO, low level; metadata, schema, data; query, access and use of data in DB1, DB2, DB3, …, DBn]
Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45 N) at any time of high geomagnetic activity.
  – Extract information from the use-case - encode knowledge
  – Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
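The "constraints in any order and in any combination" requirement can be illustrated with a small in-memory filter. This is only a sketch, not the observatory's SOAP interface: the `Record` type, the `query` function and the sample records are hypothetical, with instrument and parameter names borrowed from later slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    time: datetime


def query(records: Iterable[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[datetime] = None,
          end: Optional[datetime] = None) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied."""
    matches = []
    for record in records:
        if instrument is not None and record.instrument != instrument:
            continue
        if parameter is not None and record.parameter != parameter:
            continue
        if start is not None and record.time < start:
            continue
        if end is not None and record.time > end:
            continue
        matches.append(record)
    return matches


# Hypothetical sample holdings.
RECORDS = [
    Record("FPI", "NeutralTemperature", datetime(2003, 10, 31)),
    Record("ISR", "ElectronDensity", datetime(2003, 10, 31)),
    Record("FPI", "NeutralTemperature", datetime(2005, 1, 1)),
]

print(query(RECORDS, instrument="FPI"))
print(query(RECORDS, parameter="NeutralTemperature", end=datetime(2004, 1, 1)))
```

Because every constraint is an optional keyword argument, callers can supply them in any order and leave out any subset.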
Inferred plot type and return required axes data
Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries, which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations, etc.
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
But data has Lots of Audiences
[Figure: audiences ranked from More Strategic to Less Strategic]
From “Why EPO?”, a NASA internal report on science education, 2005
What is a Non-Specialist Use Case?
A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge
What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs, but the search for brightness or green/red line emission is mediated for them
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
Semantic Information Integration: Concept map for educational use of science data in a lesson plan
Ex 2 – SemantEco / SemantAqua
• Water Quality Portal Example from 2010
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• Came from hw assignment, proposed in class
• Generated papers in
  – Environmental Information Management 2011
  – Intl Semantic Web Conference 2011 (main conference and possibly poster session as well)
  – American Geophysical Union 2011
  – Plus invited presentations for water, health, etc.
Semantic Web Basics
• The triple: {subject-predicate-object}
  – Interferometer is-a optical instrument
  – Optical instrument has focal length
• An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
  – RDF - Resource Description Framework
  – OWL 1.0 - Web Ontology Language (OWL 2.0 on the way)
• Encode the knowledge in triples in a triple-store; software is built to traverse the semantic network, which can then be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’, to mediate the exchange
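A toy illustration of the triple idea, assuming nothing beyond the Python standard library. The `TripleStore` class below is a hypothetical in-memory stand-in, not RDF or any real triple-store API; it only shows how the two example statements above become data that can be pattern-matched.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("Interferometer", "is-a", "OpticalInstrument")
store.add("OpticalInstrument", "has", "FocalLength")

print(store.match(subject="Interferometer"))
print(store.match(predicate="has"))
```

Real triple stores add URIs, typed literals and a query language such as SPARQL on top of this basic subject-predicate-object pattern matching.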
Terminology
• Semantic Web
  – An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation
  – Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
  – Semantic services to use the resources of many computers connected by a network to solve large-scale computational/data problems
• Provenance
  – Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Service-oriented architecture
  – Provision of a capability over the internet via a ‘remote-procedure-call’ using prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
  – An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them
Terminology
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/evolving, SW promotes this
• Languages
  – OWL - Web Ontology Language (W3C)
  – RDF - Resource Description Framework (W3C)
  – OWL-S/SWSL - Web Services (W3C)
  – WSMO/WSML - Web Services (EC/W3C)
  – SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
  – PML - Proof Markup Language
• Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners
  – Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
  – SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
  – Search: SWOOGLE swoogle.umbc.edu
  – Collaboration: www.planetont.org
  – Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
  – Semantic wiki: OntoWiki, SemanticMediaWiki
• Emerging Semantic Standards for Earth Science
  – SWEET, VSTO, MMI, GeoSciML
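The closed-world versus open-world distinction above comes down to how a missing fact is treated. A minimal sketch, assuming facts are plain tuples; the function names and the example facts are hypothetical.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]

# Facts that have been asserted, and facts explicitly asserted to be false.
KNOWN_TRUE: Set[Fact] = {("Interferometer", "is-a", "OpticalInstrument")}
KNOWN_FALSE: Set[Fact] = {("Interferometer", "is-a", "Radar")}


def closed_world_holds(fact: Fact) -> bool:
    """Closed world: anything not asserted is taken to be false."""
    return fact in KNOWN_TRUE


def open_world_holds(fact: Fact) -> Optional[bool]:
    """Open world: a fact is false only if stated; otherwise it is unknown (None)."""
    if fact in KNOWN_TRUE:
        return True
    if fact in KNOWN_FALSE:
        return False
    return None


unstated = ("Interferometer", "locatedIn", "Norway")
print(closed_world_holds(unstated))  # False: absence is read as falsehood
print(open_world_holds(unstated))    # None: the knowledge base simply does not say
```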
Semantic Web Layers
[Figure: Semantic Web layer diagram]
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Application Areas for Semantics
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
• … and the list goes on
2007-2008 Hype Cycle for Emerging Semantic Web Technologies v0.6
[Figure: hype cycle curve. Vertical axis: Visibility. Horizontal axis: Time, with phases Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity. Legend: estimated years to mainstream adoption in Earth science (< 2 years, 2-5 years, 5-10 years, > 10 years, obsolete before plateau). Technologies plotted: Semantic Web Services; Semantic Wiki; Smart search, e.g. NOESIS; Rules/Logic, SWRL; Query Lang, SPARQL; Tagging/annotation; Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial; Ontology editor, SWOOP; Mid-level ES domain ontologies, e.g. GEON; Concept map, Cmap; OWL 1.0; RDF; Protégé; XML; DL Reasoners, e.g. Pellet, Racer; SKOS; Species Validators; FOAF; Query Lang, OWL-QL; OWL 1.1; Upper level ontologies, e.g. ABC, DOLCE, SUMO; Mid-level ES domain ontologies, e.g. SWEET; Natural Language Ontologies; Query Lang, Commercial embedded QL; Managing modular ontologies (ES and general)]
Produced for NASA TIWG semantic web subgroup
Semantic Web Roadmap (April 2008)
Time frames: Current; Near Term (0-2 yrs); Mid Term (2-5 yrs); Long Term (5+ yrs)
Results
• Outcome
  – Current: Improved Information Sharing
  – Near Term: Increased Collaboration & Interdisciplinary Science
  – Mid Term: Acceleration of Knowledge Production
  – Long Term: Revolutionizing how science is done
• Output
  – Current: Geospatial semantic services established
  – Near Term: Geospatial semantic services proliferate
  – Mid Term: Scientific semantic assisted services
  – Long Term: Autonomous inference of science results
Capability
• Assisted Discovery & Mediation
  – Current: Some common vocabulary based product search and access
  – Near Term: Semantic geospatial search & inference, access
  – Mid Term: Semantic agent-based searches
  – Long Term: Semantic agent-based integration
• Interoperable Information Infrastructure
  – Current: Local processing + data exchange
  – Near Term: Basic data tailoring services (data as service), verification/validation
  – Mid Term: Interoperable geospatial services (analysis as service), results explanation service
  – Long Term: Metadata-driven data fusion (semantic service chaining), trust
Technology
• Vocabulary
  – Current: SWEET core 1.0 based on GCMD/CF
  – Near Term: SWEET core 2.0 based on best practices decided from community
  – Mid Term: SWEET 3.0 with semantic callable interfaces via standard programming languages
  – Long Term: Reasoners able to utilize SWEET 4.0
• Languages/Reasoning
  – Current: RDF, OWL-S
  – Near Term: Geospatial reasoning, OWL-Time
  – Mid Term: Numerical reasoning
  – Long Term: Scientific reasoning
Semantic Web Roadmap (capability), April 2008
Capability areas: Assisted Discovery & Mediation; Interactive Data Analysis; Interoperable Information services; Responsive Delivery; Verifiable Quality; Assisted Knowledge Building; Seamless Data Access
Current
• Some common vocabulary based product search and access
• Some metadata and limited provenance available
• Verification is manual with minimal tool support
• Services must be hardwired and service agreements established
• Local processing + data exchange
• Limited metadata passed to analysis applications
• Access mediated by agreed standard vocabularies, hard-wired connections
Near Term (0-2 yrs)
• Semantic geospatial search & inference, access
• Ontologies for data mining, visualization and analysis emerging/maturing
• Ontologies for information quality developed
• Services annotated with resource descriptions
• Basic data tailoring services (data as service), verification/validation
• Tag properties, non-jargon vocabulary for non-specialist use
• Access mediated by common ontologies
Mid Term (2-5 yrs)
• Semantic agent-based searches
• Common terminology captured in ontologies, crossing domains
• Domain and range properties in ontologies used in tools
• Dynamic service discovery and mediation, and data scheduling
• Interoperable geospatial services (analysis as service), results explanation service
• Shared terminology for the visual properties of interface objects and graph types...
• Mediation aided by services with domain/range properties
Long Term (5+ yrs)
• Semantic agent-based integration
• Provenance/annotation with ontologies in user tools
• Service ontologies carry quality provenance
• Semantic markup of data latency (time lags) which adapt dynamically
• Metadata-driven data fusion (semantic service chaining), trust
• Semantic fields to describe tag key modal functions
• Key data access services are semantically mediated
Roadmap - from near-term to mid-term
Each entry: Near Term (0-2 yrs) capability -> requirement -> Mid Term (2-5 yrs) capability
• Semantic geospatial search & inference, access -> requires agent development and vocabulary for agent characterization -> Semantic agent-based searches
• Ontologies for data mining, visualization and analysis emerging/maturing -> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework -> Common terminology captured in ontologies, crossing domains
• Ontologies for information quality developed -> requires mature quality and uncertainty ontologies with domain and range properties added and populated -> Domain and range properties in ontologies used in tools
• Services annotated with resource descriptions -> requires semantic service (ontology) registry -> Dynamic service discovery and mediation, and data scheduling
• Basic data tailoring services (data as service), verification/validation -> requires service to implement v/v, new descriptions of analyses, developing explanation -> Interoperable geospatial services (analysis as service), results explanation service
• Tag properties, non-jargon vocabulary for non-specialist use -> requires development of portal modal function vocabulary and ontology, link to domain context and data structure -> Shared terminology for the visual properties of interface objects and graph types...
• Access mediated by common ontologies -> requires adding properties to classes in ontologies and populating instances with expert agreement -> Mediation aided by services with domain/range properties
Selected Technical Benefits
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html
1: Integrating Multiple Data Sources
• The Semantic Web lets us merge statements from different sources
• The RDF Graph Model allows programs to use data uniformly regardless of the source
• Figuring out where to find such data is a motivator for Semantic Web Services
[Figure: RDF graph for #Ionosphere with the statements hasCoordinates #magnetic; name “Terrestrial Ionosphere”; hasLowerBoundaryValue “100”; hasLowerBoundaryUnit “km”. Different line & text colors represent different data sources]
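Merging statements from different sources can be sketched as a set union over triples. The statements are the ones in the figure; which statement comes from which source is an assumption here, since the color coding did not survive extraction, and `describe` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed split of the figure's statements across two sources.
source_a: Set[Triple] = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b: Set[Triple] = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Because both sources use the same identifier for the subject,
# merging the graphs is just a union of their statements.
merged = source_a | source_b


def describe(subject: str, graph: Set[Triple]) -> Dict[str, str]:
    """Collect every property of a subject, regardless of which source stated it."""
    return {p: o for s, p, o in graph if s == subject}


print(describe("#Ionosphere", merged))
```

The merge works without any schema negotiation only because the two sources agree on the identifier `#Ionosphere`; that shared naming is what URIs provide on the Semantic Web.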
2: Drill Down / Focused Perusal
• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
• These can typically be resolved to get more information about the resource
• This essentially creates a web of data analogous to the web of text created by the World Wide Web
• Ontologies are represented using the same structure as content
  – We can resolve class and property URIs to learn about the ontology
[Figure: graph of linked resources with nodes …#NeutralTemperature, …#FPI, …#ISR, …#MillstoneHill, …#EISCAT, …#Norway and edges measuredby, type, operatedby, locatedIn]
3: Statements about Statements
• The Semantic Web allows us to make statements about statements
  – Timestamps
  – Provenance / Lineage
  – Authoritativeness / Probability / Uncertainty
  – Security classification
  – …
• This is an unsung virtue of the Semantic Web
[Figure: the statement #Aurora hascolor Red, annotated with hasSource #Danny's and hasDateTime 20031031]
Ontologies Workshop, APL May 26, 2006
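One way to picture statements about statements is to give each statement an identifier and attach further triples to that identifier. This is a simplified sketch in the spirit of RDF reification, not its actual vocabulary; the identifier `stmt1` and the helper name are hypothetical, and the values come from the figure.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# The base statement, stored under an identifier so it can be talked about.
statements: Dict[str, Triple] = {
    "stmt1": ("#Aurora", "hasColor", "Red"),
}

# Statements whose subject is another statement's identifier.
annotations: Set[Triple] = {
    ("stmt1", "hasSource", "#Danny"),
    ("stmt1", "hasDateTime", "20031031"),
}


def annotations_for(statement: Triple) -> Dict[str, str]:
    """Return the metadata (source, timestamp, ...) attached to a statement."""
    ids = {sid for sid, triple in statements.items() if triple == statement}
    return {p: o for s, p, o in annotations if s in ids}


print(annotations_for(("#Aurora", "hasColor", "Red")))
```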
4: Inference
• The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made
• Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, …
• SWRL allows us to make additional inferences beyond those provided by the ontology
[Figure: graph relating #MillstoneHill, #Interferometer and #VerticalMeans through the properties operatesInstrument, hasInstrument, isOperatedBy, measures, hasTypeofData, hasOperatingMo, hasMeasuredData]
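A minimal sketch of inferring an implicit statement from an explicit one. It assumes that `operatesInstrument` and `isOperatedBy` (property names taken from the figure) are declared as inverses; that declaration and the function name are assumptions for illustration, not something the slide states.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed ontology axiom: the two properties are inverses of each other.
INVERSE_OF: Dict[str, str] = {"operatesInstrument": "isOperatedBy"}

asserted: Set[Triple] = {
    ("#MillstoneHill", "operatesInstrument", "#Interferometer"),
}


def infer_inverses(triples: Set[Triple], inverse_of: Dict[str, str]) -> Set[Triple]:
    """Return the asserted triples plus those implied by inverse-property axioms."""
    # Make the inverse mapping symmetric so either direction can be inferred.
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})

    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both_ways:
            result.add((obj, both_ways[predicate], subject))
    return result


for triple in sorted(infer_inverses(asserted, INVERSE_OF)):
    print(triple)
```

Only one statement was asserted, yet a query for what operates `#Interferometer` can now be answered; a reasoner such as those listed earlier does this for a much richer set of axioms.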
5: Translation
• While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing
• There are multiple levels of mapping
  – Classes
  – Properties
  – Instances
  – Ontologies
• OWL supports equivalence and specialization; SWRL allows more complex mappings
[Figure: #precipitation with name ont1:Precipitation and ont1:EduLevel VO:Scientist; #precipitation with name ont2:Rain and ont2:EduLevel EduVO:K-12]
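The mapping idea can be sketched as a lookup from vocabulary-specific names to a shared resource. The table below mirrors the figure, where both `ont1:Precipitation` and `ont2:Rain` name `#precipitation`; treating them as interchangeable is an assumption made for illustration, and the function names are hypothetical.

```python
from typing import Dict

# Assumed mapping from terms in two ontologies to one shared resource.
TERM_TO_RESOURCE: Dict[str, str] = {
    "ont1:Precipitation": "#precipitation",  # vocabulary aimed at scientists
    "ont2:Rain": "#precipitation",           # vocabulary aimed at K-12 users
}


def resolve(term: str) -> str:
    """Map a vocabulary-specific term to the shared resource, if one is known."""
    return TERM_TO_RESOURCE.get(term, term)


def same_thing(term_a: str, term_b: str) -> bool:
    """Two terms denote the same thing when they resolve to the same resource."""
    return resolve(term_a) == resolve(term_b)


print(same_thing("ont1:Precipitation", "ont2:Rain"))
print(same_thing("ont1:Precipitation", "ont2:Snow"))
```

A search phrased in the K-12 vocabulary can then be translated to the scientists' vocabulary, and vice versa, by going through the shared resource.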
6: Smart (Focused) Search
• The Semantic Web associates 1 or more classes with each object
• We can use ontologies to enhance search by:
  – Query expansion
  – Sense disambiguation
  – Type with restrictions
  – …
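Query expansion, the first technique listed, can be sketched by widening a search term to include all of its subclasses. The class hierarchy below is hypothetical, loosely based on the "Aurora" search example earlier in the deck.

```python
from typing import Dict, Set

# Hypothetical subclass hierarchy: child class -> parent class.
SUBCLASS_OF: Dict[str, str] = {
    "AuroraBorealis": "Aurora",
    "AuroraAustralis": "Aurora",
    "Aurora": "AtmosphericPhenomenon",
}


def expand_query(term: str, subclass_of: Dict[str, str]) -> Set[str]:
    """Return the term together with all of its direct and indirect subclasses."""
    expanded = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return expanded


print(sorted(expand_query("Aurora", SUBCLASS_OF)))
```

A search for "Aurora" then also retrieves items tagged only with the more specific classes, without the user having to know those terms.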
7: Smarter Search / Configuration
GEONGRID Ontology Search and Data Integration Example
Uses emerging web standards to enable smart web applications
Given an upper-level domain choice
• Ecology
Illustrate or list contained concepts/hierarchy
• VegetationCover, TreeRings, etc.
Retrieve some specific options from web
• Maps, tree-ring data
• Info: https://portal.geongrid.org:8443/gridsphere
8: Proof
• The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust
• Proof and Trust are on-going research areas for the Semantic Web: e.g., see PML and Inference Web
[Figure: #CriticalDataset hasCalibration #FlatField; hasPeerReview; onPaper #SolarPhysics. Caption: “Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”]
Inference Web
Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange
• IWExplainer for generating and presenting interactive explanations from PML proofs, providing multiple dialogues and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase distributed repository of proof-related meta-data such as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web services, …
http://iw.rpi.edu
Inference Web Infrastructure (McGuinness et al., 2004, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)
[Figure: Inference Web architecture. Sources: Files/WWW; Semantic Discovery Service (DAML/SNRC), OWL-S/BPEL; CWM (NSF TAMI), N3; JTP (DAML/NIMD), KIF; SPARK (DARPA CALO), SPARK-L; UIMA (DTO NIMD Exp Aggregation), Text Analytics. Core: Proof Markup Language (PML) covering Trust, Justification, Provenance. Toolkit: IWTrust (trust computation); IW Explainer/Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)]
Framework for explaining question answering tasks by
• abstracting, storing, exchanging,
• combining, annotating, filtering, segmenting,
• comparing, and rendering proofs and proof fragments provided by question answerers.
SW Questions & Answers
Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.
[Figure: screenshot with the panels A question; An answer; A context for explaining the answer; An abstracted explanation]
(this graphical interface done by Battelle, supported by Stanford KSL)
Summary
• Semantics are a key ingredient for progress in informatics and e-Science
• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production
• This is what we will be teaching you in this class
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]
Outline of the course
• Topics for Semantic e-Science/Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – Workflow Management
  – Data life-cycle for e-Science
  – Data Mining and Knowledge Discovery
SeS Applications and Ontologies
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and Climate
• Semantic Web for Chemistry, Physics and Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific Publications
SeS Project options
• Configuration and Deployment of Semantic Virtual Observatories
  – Oceanography, astronomy, geology
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and Demonstration
Schedule – web page
• Reading assignments
• Assignments
  – Individual
  – Group
• Written assessments
• Presentation assessments
• Group assessments
What we expect
• Attend class, complete assignments
• Participate
• Ask questions – be honest with yourself and others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class sessions
Logistics summary
• Class - Monday 1-3:50 pm
• Office hours – by appointment, along with a regular time to be determined and tetherless night
• This week's assignment:
  – Reading - Ontologies 101*, Semantic Web, e-Science, RDFS
  – Turn in a one-page description of one of your favorite papers from the reading list AND WHY
• Next class (week 2 – September 12; note Labor Day):
  – Foundations I: Methodologies, Knowledge Representation
• If you have a background that you think needs some extra background reading, talk to us.
• Questions?
Extra
Digital natives expect services to accommodate their preferences.
• Information online, not “in line”
• Information on-demand, free of place or time
• Blended classroom and online experience
• Flexible schedule for working students
• Relevant and timely content
• More team collaboration
• More content from multiple sources
• Interactive content from voice, video and data
• Ability to contribute, as well as consume, content/knowledge
• Leads to virtual access…
Progression after progression
[Figure: progression from IT and Cyber Infrastructure through Cyber Informatics, Core Informatics and Science Informatics (aka Xinformatics) to Science and Societal Benefit Areas]
Summary
• The data and information challenges are (almost) being identified as increasingly common
• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure – evident due to the emergence of Xinformatics (world-wide)
• Informatics is a profession and a community activity; it requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic