
  • Number of slides: 45

XG Multimedia Semantic News Use Case
Thierry Declerck, DFKI GmbH, Language Technology Lab

Automatic Semantic Analysis of Metadata Associated with News Videos of Broadcasting Companies
On-going work in the projects K-Space and MESH

Metadata of News Broadcasters
• We analysed the metadata available from various broadcasters.
• Their data consists of audio/video material and textual metadata. This is a very valuable data set, since the textual metadata also includes manually annotated scene descriptions.
• This dataset can be used to build a training corpus for the automated alignment of video, audio and text data.
• The next slides show an abstraction over the various types of metadata provided.

The Metadata Labels
§ <DOC filename="0324000-3_Journal_ENG_F4001C_26122003_2000">
§ <TYPE>Earthquake Iran</TYPE>
§ Journal F: 4001 C
§ …

The Title Tag
<TITLE> TdT: Erdbeben /Iran/Zerstörungen in Bam/Trümmer/Ruinen/Opfer </TITLE>
(Earthquake / Iran / destruction in Bam / rubble / ruins / victims)
Extract: "Erdbeben" (keyword for the disaster ontology); location "Iran" (via named-entity detection). The roles of the other terms are still unclear.
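The title analysis above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables `DISASTER_KEYWORDS` and `KNOWN_LOCATIONS` are hypothetical stand-ins for the disaster ontology and the named-entity detector, whose actual contents the slides do not show.

```python
# Hypothetical lookup tables; the real ontology terms and NE detector
# are not given in the slides.
DISASTER_KEYWORDS = {"Erdbeben": "Earthquake"}
KNOWN_LOCATIONS = {"Iran"}


def parse_title(title: str) -> dict:
    """Split a slash-separated title and sort its terms into rough roles."""
    # Drop an optional programme prefix (the part before the first colon).
    body = title.split(":", 1)[1] if ":" in title else title
    terms = [t.strip() for t in body.split("/") if t.strip()]
    result = {"keywords": [], "locations": [], "unclassified": []}
    for term in terms:
        if term in DISASTER_KEYWORDS:
            result["keywords"].append(term)
        elif term in KNOWN_LOCATIONS:
            result["locations"].append(term)
        else:
            # Terms whose role is still unclear are kept for later analysis.
            result["unclassified"].append(term)
    return result


parsed = parse_title("TdT: Erdbeben /Iran/Zerstörungen in Bam/Trümmer/Ruinen/Opfer")
```

Terms that match neither table stay in `unclassified`, which mirrors the open question on the slide about the role of the remaining terms.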

The Description Tag
<DESCRIPTION> Ein schweres Erdbeben hat im Iran die Stadt Bam fast völlig zerstört. </DESCRIPTION>
(A severe earthquake has almost completely destroyed the city of Bam in Iran.)
Linguistic and semantic analysis: [SUBJ-NP Ein schweres Erdbeben] [V hat] [LOC-PP im Iran] [OBJ-NP die Stadt Bam] [ADV fast völlig] [V zerstört].
Extraction:
Who (causation): Erdbeben (earthquake)
What_action: zerstören (destroy)
What: Stadt Bam (city of Bam)
Where: Iran
From the "What" and "Where" slots the system can infer that Bam is located in Iran.

The Scenes Tag
Bam: Menschen sitzen zwischen Trümmern auf Boden verzweifelte Menschen sitzen am Strassenrand Schuttberge zerstörte Häuser rauchende Trümmer
Descriptions of the sequences of images displayed. Extracting related entities: people among ruins, desperate people, destroyed houses, smoking ruins, etc. All of these terms can be seen as "consequences of the earthquake". Equally important, they describe what can be seen in the video.

The Keywords Tag
<KEYWORDS> Naher Osten: Iran; Erdbeben </KEYWORDS>
The pattern of this tag's content allows us to infer that Iran is located in the Near East ("Naher Osten").
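A minimal sketch of this inference, assuming the content always follows the pattern `Region: Country; Topic` seen in the single example. The relation name `locatedIn` is a hypothetical label, not one taken from the MESH ontology.

```python
def parse_keywords(content: str) -> dict:
    """Parse 'Region: Country; Topic' and derive a located-in relation."""
    location_part, _, topic_part = content.partition(";")
    region, _, country = location_part.partition(":")
    region, country, topic = region.strip(), country.strip(), topic_part.strip()
    facts = {"topic": topic, "relations": []}
    # Only assert the relation when both sides of the colon are present.
    if region and country:
        facts["relations"].append((country, "locatedIn", region))
    return facts


facts = parse_keywords("Naher Osten: Iran; Erdbeben")
```

If the colon is missing, no relation is produced, so a tag that deviates from the assumed pattern does not yield a wrong geographic fact.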

Linguistic Knowledge Structures
§ Multiple layers and levels
  § Low-level linguistic features (tokenization, morphology, …)
  § Semantic properties of terms and phrases
  § Named Entities
  § Relation Extraction (incl. Grammatical Relations)
  § Semantic linking to domain ontologies
§ Can involve several abstraction layers connected through reasoning/mapping processes
§ Semantic linking to other media analysis
§ Associated with the domain ontology of MESH (natural disasters in the news)

Association of Ontologies

Semantic annotation of Text extracted from Images (Thierry Declerck, DFKI & Andreas Cobet, TUB)

Background
§ The data: the German broadcast news programme „Tagesthemen“
§ Extract text from key frames of shots and annotate those terms semantically
§ Analyse the position of the text and the kind of text extracted. Six cases detected so far:

Case 1: Above the picture, just a normal phrase, mostly a nominal phrase (NP)

Case 2: Below the picture: Name of a person and of the function of this person

Case 3: Below the picture: Name of a person and of a city/country

Case 4: Above the picture, just a normal nominal phrase, and below the picture, name of a person

Case 5: Below the picture the word „Bericht“ (or similar) and name of a person (=> journalist)

Case 6: A location name. No picture of a specific human
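The six cases can be read as a small rule set over where text appears and what kind of text it is. The sketch below assumes an upstream step has already labelled each extracted text with a kind; the labels `NP`, `PERSON`, `FUNCTION`, `LOCATION` and `REPORT_WORD`, as well as the `FrameText` structure itself, are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class FrameText:
    """Kinds of text found in one key frame (labels are illustrative)."""
    above: tuple = ()          # kinds of text above the picture
    below: tuple = ()          # kinds of text below the picture
    shows_person: bool = True  # whether a specific human is pictured


def classify(frame: FrameText) -> int:
    """Return the case number 1-6, or 0 if no case matches."""
    above, below = set(frame.above), set(frame.below)
    if above == {"NP"} and "PERSON" in below:
        return 4  # nominal phrase above, person name below
    if above == {"NP"} and not below:
        return 1  # nominal phrase above only
    if {"REPORT_WORD", "PERSON"} <= below:
        return 5  # "Bericht" plus a name: the person is the journalist
    if below == {"PERSON", "FUNCTION"}:
        return 2  # person and function
    if below == {"PERSON", "LOCATION"}:
        return 3  # person and city/country
    if (above | below) == {"LOCATION"} and not frame.shows_person:
        return 6  # location name only, no specific human
    return 0


cases = [
    classify(FrameText(above=("NP",))),
    classify(FrameText(below=("PERSON", "FUNCTION"))),
    classify(FrameText(below=("PERSON", "LOCATION"))),
    classify(FrameText(above=("NP",), below=("PERSON",))),
    classify(FrameText(below=("REPORT_WORD", "PERSON"))),
    classify(FrameText(below=("LOCATION",), shows_person=False)),
]
```

The rules are ordered so that the more specific combination (case 4) is tested before the simpler one (case 1); frames that fit none of the six patterns return 0 rather than being forced into a case.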

Cross-Media Ontologies
The next slides are by courtesy of Paul Buitelaar, Michael Sintek and Malte Kiesel (DFKI GmbH) from the project SmartWeb. Paper: „Feature Representation for Cross-Lingual, Cross-Media Semantic Web Applications“, presented at ESWC 2006.

Semiotic Triangle
§ See (Ogden & Richards, 1923), based on
  § Structural Linguistics (de Saussure, 1916)
  § philosophical work by Peirce (mostly 19th century)

Semiotic Triangle – the real world
... actual goalkeepers in the real world ...

Semiotic Triangle – concepts
... actual goalkeepers in the real world ...

Semiotic Triangle – words
goalkeeper (EN), Torwart (DE), doelman (NL)
... actual goalkeepers in the real world ...

Semiotic Triangle – images
... actual goalkeepers in the real world ...

Features
§ Multilingual Features
  § Terms with Linguistic Info and Context Models
  § Example: goalkeeper
    § part-of-speech: noun
    § morphology: goal-keeper
    § context (Google hit statistics): [ gets: 420000, holds: 212000, shoots: 55900, … ]
§ Multimedia Features
  § Images with Feature Models
  § Example: goalkeeper
    § color: #111111
    § shape: human
    § texture: “keypatch-set 223”

Representation – Proposal
§ Attach multilingual and multimedia features to classes and properties (and also instances)
§ Use of the meta-classes ClassWithFeats and PropertyWithFeats with the properties lingFeat and imgFeat (with ranges LingFeat and ImgFeat)
§ The classes LingFeat and ImgFeat are used for complex feature descriptions
[Diagram: the meta-classes feat:ClassWithFeats (rdfs:subClassOf rdfs:Class) and feat:PropertyWithFeats (rdfs:subClassOf rdfs:Property) each carry feat:lingFeat, pointing to lf:LingFeat (lf:term, lf:lang, …), and feat:imgFeat, pointing to if:ImgFeat (if:color, if:texture, …).]
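To make the proposal concrete, the following sketch builds the corresponding statements as plain (subject, predicate, object) tuples, without any RDF library. The prefixes `feat:`, `lf:` and `if:` and the property names come from the slide; the class name `ex:Goalkeeper`, the blank-node naming scheme and the function `attach_features` are assumptions made for illustration only.

```python
def attach_features(cls: str, ling_feats: list, img_feats: list) -> list:
    """Return triples that attach linguistic and image features to a class."""
    local = cls.split(":")[-1]
    # The class becomes an instance of the meta-class ClassWithFeats.
    triples = [(cls, "rdf:type", "feat:ClassWithFeats")]
    for i, feats in enumerate(ling_feats):
        node = f"_:{local}_lf{i}"  # one LingFeat node per term
        triples.append((cls, "feat:lingFeat", node))
        triples.append((node, "rdf:type", "lf:LingFeat"))
        triples.extend((node, f"lf:{key}", value) for key, value in feats.items())
    for i, feats in enumerate(img_feats):
        node = f"_:{local}_if{i}"  # one ImgFeat node per feature model
        triples.append((cls, "feat:imgFeat", node))
        triples.append((node, "rdf:type", "if:ImgFeat"))
        triples.extend((node, f"if:{key}", value) for key, value in feats.items())
    return triples


triples = attach_features(
    "ex:Goalkeeper",
    ling_feats=[
        {"term": "goalkeeper", "lang": "en"},
        {"term": "Torwart", "lang": "de"},
        {"term": "doelman", "lang": "nl"},
    ],
    img_feats=[{"color": "#111111", "texture": "keypatch-set 223"}],
)
```

Keeping each term in its own feature node, rather than as a bare label on the class, is what allows further linguistic detail (part of speech, morphology, context statistics) to be attached per language.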

Representation – Simplified Example

Representation – LingInfo Ontology
[Diagram: the LingInfo ontology. ClassWithFeats and PropertyWithFeats point to LingFeat via lingFeat. Classes shown: LingFeat, WordForm, InflectedWordForm, Stem, Root, Affix, Morpheme, Phrase, PhraseOrWordForm. Properties shown: term, lang {en_US, en_GB, en, de, fr, ...}, morphSyntDecomp, case, gender {neuter, female, male}, number {singular, plural}, orthographicForm, partOfSpeech {Adj, Verb, Noun, ...}, root, semantics, analysisIndex, function {modifier, head, negModifier}, phraseAnalysis, phraseCategory {S, NP, AP, PP, VG}, isComposedOf, inflection, wordForm.]

Example Instance: “Fußballspielers” (“of the football player”)
[Diagram: instance graph, connected by the properties morphSyntDecomp, wordForm, isComposedOf and root.
inst0: LingFeat (lang: de; term: Fußballspielers)
inst1: InflectedWordForm (case: genitive; gender: male; number: singular; orthographicForm: Fußballspielers; partOfSpeech: Noun)
inst2: Stem (case: nominative; gender: male; number: singular; orthographicForm: Fußballspieler; partOfSpeech: Noun; root: Spieler)
inst3: Stem (analysisIndex: 1; orthographicForm: Fußball; function: modifier)
inst8: Stem (analysisIndex: 2; orthographicForm: Spieler; function: head)
inst5: Stem (Fuß), inst6: Root (Fuß), inst7: Stem (Ball), inst4: Root (Ball)]

Features – Interacting Layers

Translating XBRL Into Description Logic
Thierry Declerck and Hans-Ulrich Krieger, DFKI GmbH

Motivation
§ Towards large, intelligent, web-based financial information and decision support systems in the MUSING project
§ Until now, a prototype based on XBRL (eXtensible Business Reporting Language), as developed within the eTen project WINS
§ There we experienced the limitations of the XBRL schema, due to the lack of reasoning support over XML-based data and information extracted from documents.
§ Need to translate XBRL into an ontology

XBRL Example: Header & Metadata
[XML listing; element names not recoverable. Surviving values: #001, 2001-01-01, 2001-12-31, ISO4217:EUR, …, XBRL Deutschland e.V., Düsseldorf]

XBRL: Financial Data
[XML listing; element names not recoverable. Surviving values: … 1338066 0 0 0 749385 259760 209343 0 …]

XBRL to OWL-XBRL
§ XBRL taxonomies make use of XML in order to describe the structure of an XBRL document as well as to define new datatypes and properties relevant to XBRL.
§ This allows checking whether a concrete (business) document conforms to the syntactic structure defined in the schema.
§ But there is a need for languages and tools that go beyond the syntactic expressive power of XML Schema.
§ OWL, the Web Ontology Language, is the new emerging language for the Semantic Web that originates from the DAML+OIL standardization. OWL still makes use of constructs from RDF and RDFS, such as rdf:resource, rdfs:subClassOf, or rdfs:domain.
§ Two important variants, OWL Lite and OWL DL, restrict the expressive power of RDFS, thereby ensuring decidability.
§ What makes OWL unique (as compared to RDFS or even XML Schema) is the fact that it can describe resources in more detail and that it comes with a well-defined model-theoretical semantics, inherited from description logic.

Actual Experiment with the Sesame DB
§ The basic idea during our (manual) effort was that even though we are developing an XBRL taxonomy in OWL using Protégé, the information that is stored on disk is still RDF at the syntactic level. We were thus interested in RDF database systems which make sense of the semantics of OWL and RDFS constructs such as rdfs:subClassOf or owl:equivalentClass.
§ Current experiment with the Sesame open-source middleware framework for storing and retrieving RDF data. Sesame partially supports the semantics of RDFS and OWL constructs via entailment rules that compute “missing” RDF triples (the deductive closure).
§ From an RDF point of view, 62,598 additional triples were generated through Sesame's deductive closure.

[Entailment rule in XML syntax; only the fragment <predicate var="?p"/> is recoverable.]

A Concrete Example of a Deduced Relation
§ Since we have classified hasPart (as well as partOf) as a transitive OWL property, the rule on the previous slide will fire, making implicit knowledge explicit and producing new triples such as … although only … can be found in the original XBRL specification.
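The effect of declaring hasPart transitive can be illustrated with a naive fixpoint computation over triples. This is a sketch of the general idea of a deductive closure, not Sesame's rule engine; the resource names `ex:Report`, `ex:BalanceSheet` and `ex:Assets` are hypothetical and do not come from the XBRL taxonomy.

```python
def transitive_closure(triples: set, predicate: str) -> set:
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) hold, until nothing changes."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current edges so the set can grow while we iterate.
        edges = [(s, o) for s, p, o in closure if p == predicate]
        for s1, o1 in edges:
            for s2, o2 in edges:
                if o1 == s2 and (s1, predicate, o2) not in closure:
                    closure.add((s1, predicate, o2))
                    changed = True
    return closure


asserted = {
    ("ex:Report", "hasPart", "ex:BalanceSheet"),
    ("ex:BalanceSheet", "hasPart", "ex:Assets"),
}
inferred = transitive_closure(asserted, "hasPart") - asserted
```

Here `inferred` holds the single new triple linking `ex:Report` to `ex:Assets`, which is the kind of implicit knowledge the entailment rule makes explicit.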

Translating the Base Taxonomy
§ In the GermanAP Commercial and Industrial (German Accounting Principles) taxonomy (http://www.xbrl-deutschland.de/xe news 2. htm), the file xbrlinstance.xsd specifies the XBRL base taxonomy using XML Schema.
§ It makes use of XML Schema datatypes, such as xsd:string or xsd:date, but also defines simple types (simpleType), complex types (complexType), elements (element), and attributes (attribute). Element and attribute declarations are used to restrict the usage of elements and attributes in XBRL XML documents.
§ Since OWL only knows the distinction between classes and properties, the correspondence between XBRL and OWL description primitives is not a one-to-one mapping:

Business Intelligence in MUSING
§ Next generation Business Intelligence: the MUSING European R&D project (MUlti-industry, Semantic-based next generation business INtelliGence). Towards a new generation of Business Intelligence (BI) tools and modules founded on semantic-based knowledge and content systems, enhancing the technological foundations of knowledge acquisition and reasoning in BI applications.

Application Domains in MUSING
§ The breakthrough impact of MUSING on semantic-based BI will be measured in three strategic, vertical domains:
  § Finance, through development and validation of next-generation (Basel II and beyond) semantic-based BI solutions, with particular reference to Credit Risk Management
  § Internationalisation (i.e., the process that allows an enterprise to evolve its business from a local to an international dimension, here expressly focusing on the information acquisition work concerning international partnerships, contracts, investments), through development and validation of next-generation semantic-based internationalisation platforms
  § Operational Risk Management, through development and validation of semantic-driven knowledge systems for measurement and mitigation tools, with particular reference to IT operational risks faced by IT-intensive organisations

Processing of Quantitative Data
§ Typical input: finance reports in PDF

PDF to XBRL (OWL-XBRL)
§ Mapping from PDF to HTML/XML
§ Detection in the HTML/XML of relevant layout information that helps in reconstructing the logical units of the original PDF documents (title, header/footer, footnote, tables, free text)
§ Mapping of terms found in the XML version of the document to XBRL labels, disambiguating where needed
§ Checking whether all the lines of the PDF documents are XBRL compliant. Non-compliant information is saved in a log file. Towards an XBRL checker for balance sheets delivered in proprietary formats.
§ Generation of the results of the PDFtoXBRL procedure in a multilingual setting

Processing of Qualitative Data
§ TURNOVER, INCOME, GROWTH: “State of revenues, if depurated from sales related to Consip contract award, which remarkably affected the turnover in 2003, would have, on the contrary, recorded an increase of 3,23% against that microinformatics market which recorded an increase of 3,2% (Sirmi, january 2005).”
§ Task: identify relevant expressions and classify them

Integration of Data
§ The challenge: merging data and information extracted from various types of documents, also in various languages. In the XG use case, this especially concerns integrating information from news wires.