XG Multimedia Semantic News Use Case
Thierry Declerck, DFKI GmbH, Language Technology Lab
Automatic Semantic Analysis of Metadata Associated with News Videos of Broadcasting Companies
Ongoing work in the projects K-Space and MESH
Metadata of News Broadcasters
• We analysed the metadata available from various broadcasters.
• Their data consists of audio/video material and textual metadata. This is a very valuable data set, since the textual metadata also includes manually annotated scene descriptions.
• This data set can be used to build a training corpus for the automated alignment of video, audio and text data.
• The next slides show an abstraction over the various types of metadata provided.
The Metadata Labels
The Title Tag
The Description Tag
Bam: Menschen sitzen zwischen Trümmern auf Boden (English: “Bam: people sitting among rubble on the ground”)
The Keywords Tag
Linguistic Knowledge Structures
§ Multiple layers and levels
  § Low-level linguistic features (tokenization, morphology, …)
  § Semantic properties of terms and phrases
  § Named Entities
  § Relation Extraction (incl. Grammatical Relations)
§ Semantic linking to domain ontologies
  § Can involve several abstraction layers connected through reasoning/mapping processes
§ Semantic linking to other media analysis
§ Associated with the domain ontology of MESH (natural disasters in the news)
Association of Ontologies
Semantic annotation of Text extracted from Images (Thierry Declerck, DFKI & Andreas Cobet, TUB)
Background
§ The data: the German broadcast news programme „Tagesthemen“
§ Extract text from key frames of shots and annotate the extracted terms semantically
§ Analysis of the position of the text and of the kind of text extracted. Six cases detected so far:
Case 1: Above the picture, just a normal phrase, mostly a nominal phrase (NP)
Case 2: Below the picture: name of a person and the function of this person
Case 3: Below the picture: name of a person and of a city/country
Case 4: Above the picture, just a normal nominal phrase, and below the picture, name of a person
Case 5: Below the picture, the word „Bericht“ (“report”, or similar) and name of a person (=> journalist)
Case 6: A location name. No picture of a specific human
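The six cases above can be read as a small decision procedure over the position and kind of the extracted text. The following is only a minimal sketch of that reading, not the detection method used in the project: the `OverlayText` record, the entity labels (`NP`, `PERSON`, `FUNCTION`, `LOCATION`), the `shows_person` flag and the trigger word list are all assumptions introduced for illustration, and the names in the usage lines are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical trigger words for Case 5 ("Bericht" or similar).
REPORT_WORDS = {"bericht"}


@dataclass
class OverlayText:
    position: str            # "above" or "below" the picture (assumed labels)
    entity_types: List[str]  # assumed labels: "NP", "PERSON", "FUNCTION", "LOCATION"
    text: str = ""


def classify_case(regions: List[OverlayText], shows_person: bool = True) -> Optional[int]:
    """Return the case number (1-6) for one key frame, or None if no case fits."""
    above = [r for r in regions if r.position == "above"]
    below = [r for r in regions if r.position == "below"]
    below_types = {t for r in below for t in r.entity_types}
    all_types = {t for r in regions for t in r.entity_types}
    below_words = {w.lower().strip(":") for r in below for w in r.text.split()}

    if not shows_person and all_types == {"LOCATION"}:
        return 6  # location name, no specific human shown
    if "PERSON" in below_types and below_words & REPORT_WORDS:
        return 5  # "Bericht" plus a person name: the journalist
    if above and "PERSON" in below_types:
        return 4  # phrase above, person name below
    if below_types == {"PERSON", "FUNCTION"}:
        return 2  # person name and function
    if below_types == {"PERSON", "LOCATION"}:
        return 3  # person name and city/country
    if above and not below:
        return 1  # just a phrase above the picture
    return None


case_1 = classify_case([OverlayText("above", ["NP"], "Erdbeben im Iran")])
case_5 = classify_case([OverlayText("below", ["PERSON"], "Bericht: Max Mustermann")])
```

The order of the checks matters in this sketch: the more specific cases (6, 5, 4) are tested before the more general ones so that a frame is not assigned to Case 1 or 2 too early.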
Cross-Media Ontologies
The next slides are courtesy of Paul Buitelaar, Michael Sintek and Malte Kiesel (DFKI GmbH) from the project SmartWeb. Paper: „Feature Representation for Cross-Lingual, Cross-Media Semantic Web Applications“, presented at ESWC 2006.
Semiotic Triangle
§ See (Ogden & Richards, 1923), based on
§ Structural Linguistics (de Saussure, 1916)
§ philosophical work by Peirce (mostly 19th century)
Semiotic Triangle – the real world: … actual goalkeepers in the real world …
Semiotic Triangle – concepts: … actual goalkeepers in the real world …
Semiotic Triangle – words: goalkeeper (EN), Torwart (DE), doelman (NL); … actual goalkeepers in the real world …
Semiotic Triangle – images: … actual goalkeepers in the real world …
Features
§ Multilingual Features
  § Terms with Linguistic Info and Context Models
  § Example: goalkeeper
    § part-of-speech: noun
    § morphology: goal-keeper
    § context (Google hit statistics): [ gets: 420000, holds: 212000, shoots: 55900, … ]
§ Multimedia Features
  § Images with Feature Models
  § Example: goalkeeper
    § color: #111111
    § shape: human
    § texture: “keypatch-set 223”
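To make the two feature kinds concrete, the goalkeeper example can be written down as plain records. This is a minimal sketch only: the class names `LingFeature` and `ImgFeature`, their field names and the `top_context` helper are assumptions made for illustration; the values are the ones given on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LingFeature:
    """A term with linguistic information and a simple context model."""
    term: str
    lang: str
    part_of_speech: str
    morphology: str
    context: Dict[str, int] = field(default_factory=dict)  # co-occurring word -> hit count

    def top_context(self, n: int = 1) -> List[str]:
        # Context words ordered by decreasing hit count.
        return sorted(self.context, key=self.context.get, reverse=True)[:n]


@dataclass
class ImgFeature:
    """An image feature model attached to the same concept."""
    color: str
    shape: str
    texture: str


goalkeeper_ling = LingFeature(
    term="goalkeeper",
    lang="en",
    part_of_speech="noun",
    morphology="goal-keeper",
    context={"gets": 420000, "holds": 212000, "shoots": 55900},
)
goalkeeper_img = ImgFeature(color="#111111", shape="human", texture="keypatch-set 223")
```

Keeping both records separate from the concept itself mirrors the idea that the same concept can carry several linguistic realisations and several image models.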
Representation – Proposal
§ Attach multilingual and multimedia features to classes and properties (and also instances)
§ Use of the meta-classes ClassWithFeats and PropertyWithFeats, with properties lingFeat and imgFeat (with ranges LingFeat and ImgFeat)
§ The classes LingFeat and ImgFeat are used for complex feature descriptions
[Diagram: meta-classes. feat:ClassWithFeats is rdfs:subClassOf rdfs:Class, and feat:PropertyWithFeats is rdfs:subClassOf rdfs:Property. Both carry feat:lingFeat and feat:imgFeat. if:ImgFeat has if:color, if:texture, …; lf:LingFeat has lf:term, lf:lang, …]
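The proposal can be illustrated with a handful of triples held in plain Python, without any RDF library. This is a sketch under stated assumptions: the `ex:` names (`ex:Goalkeeper`, `ex:goalkeeper_en`, `ex:goalkeeper_de`) and the helper functions are hypothetical, and the prefixed names are treated as opaque strings rather than resolved URIs.

```python
# Triples as (subject, predicate, object) tuples. The feat:, lf: and if: prefixes
# follow the slide; everything under ex: is a hypothetical example.
TRIPLES = {
    # Meta-classes (RDF itself defines Property in the rdf: namespace).
    ("feat:ClassWithFeats", "rdfs:subClassOf", "rdfs:Class"),
    ("feat:PropertyWithFeats", "rdfs:subClassOf", "rdf:Property"),
    ("feat:lingFeat", "rdfs:range", "lf:LingFeat"),
    ("feat:imgFeat", "rdfs:range", "if:ImgFeat"),
    # A domain class that is an instance of the meta-class and carries features.
    ("ex:Goalkeeper", "rdf:type", "feat:ClassWithFeats"),
    ("ex:Goalkeeper", "feat:lingFeat", "ex:goalkeeper_en"),
    ("ex:Goalkeeper", "feat:lingFeat", "ex:goalkeeper_de"),
    ("ex:goalkeeper_en", "rdf:type", "lf:LingFeat"),
    ("ex:goalkeeper_en", "lf:term", "goalkeeper"),
    ("ex:goalkeeper_en", "lf:lang", "en"),
    ("ex:goalkeeper_de", "rdf:type", "lf:LingFeat"),
    ("ex:goalkeeper_de", "lf:term", "Torwart"),
    ("ex:goalkeeper_de", "lf:lang", "de"),
}


def objects(subject, predicate):
    """All objects of the given subject/predicate pair, in sorted order."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def terms_for(cls):
    """Map language code -> term for every linguistic feature attached to cls."""
    return {
        objects(feat, "lf:lang")[0]: objects(feat, "lf:term")[0]
        for feat in objects(cls, "feat:lingFeat")
    }


goalkeeper_terms = terms_for("ex:Goalkeeper")
```

Because the features hang off the class itself, a lookup such as `terms_for("ex:Goalkeeper")` needs no separate lexicon keyed by class name.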
Representation – Simplified Example
Representation – LingInfo Ontology
[Diagram: the LingInfo ontology. Labels recoverable from the slide:
§ ClassWithFeats and PropertyWithFeats point to LingFeat via lingFeat
§ LingFeat: term (Literal), lang ({en_US, en_GB, en, de, fr, ...}), morphSyntDecomp (PhraseOrWordForm)
§ WordForm: case (Literal), gender ({neuter, female, male}), number ({singular, plural}), orthographicForm (Literal), partOfSpeech ({Adj, Verb, Noun, ...}), root (Root), semantics (Resource)
§ PhraseOrWordForm: analysisIndex (Integer), function ({modifier, head, negModifier})
§ Phrase: phraseAnalysis, phraseCategory ({S, NP, AP, PP, VG}), isComposedOf
§ Further classes: InflectedWordForm (with inflection and wordForm), Stem, Morpheme (orthographicForm: Literal), Root, Affix, connected by is-a links]
Example Instance: “Fußballspielers” (“of the football player”)
[Diagram: instance graph. Labels recoverable from the slide:
§ inst0 : LingFeat, lang: de, term: Fußballspielers, with a morphSyntDecomp link
§ inst1 : InflectedWordForm, case: genitive, gender: male, number: singular, orthographicForm: Fußballspielers, partOfSpeech: Noun
§ inst2 : Stem, case: nominative, gender: male, number: singular, orthographicForm: Fußballspieler, partOfSpeech: Noun
§ inst3 : Stem, analysisIndex: 1, orthographicForm: Fußball, function: modifier
§ inst8 : Stem, analysisIndex: 2, orthographicForm: Spieler, function: head, with a Root (Spieler)
§ inst7 : Stem (Ball), inst4 : Root (Ball), inst5 : Stem (Fuß), inst6 : Root (Fuß)
§ Links shown: wordForm, isComposedOf, root]
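The compound decomposition in this instance can be sketched as a small tree of stems. This is an illustrative reading only: the `Stem` record and the helpers `head_of` and `leaves` are assumptions, the nesting of Fuß and Ball under Fußball is inferred from the instance labels, and the inflected form and the roots are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stem:
    orthographic_form: str
    function: Optional[str] = None        # "modifier" or "head", where the slide gives one
    analysis_index: Optional[int] = None  # position within the parent compound
    is_composed_of: List["Stem"] = field(default_factory=list)


# Fußballspieler = Fußball (modifier, index 1) + Spieler (head, index 2);
# Fußball is itself composed of Fuß and Ball.
fussball = Stem("Fußball", "modifier", 1, [Stem("Fuß"), Stem("Ball")])
spieler = Stem("Spieler", "head", 2)
fussballspieler = Stem("Fußballspieler", is_composed_of=[fussball, spieler])


def head_of(stem: Stem) -> Optional[Stem]:
    """Return the direct component marked as head, if any."""
    for part in stem.is_composed_of:
        if part.function == "head":
            return part
    return None


def leaves(stem: Stem) -> List[str]:
    """Return the orthographic forms of the non-decomposable stems, left to right."""
    if not stem.is_composed_of:
        return [stem.orthographic_form]
    result: List[str] = []
    for part in stem.is_composed_of:
        result.extend(leaves(part))
    return result


compound_head = head_of(fussballspieler)
compound_leaves = leaves(fussballspieler)
```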
Features – Interacting Layers
Translating XBRL Into Description Logic
Thierry Declerck and Hans-Ulrich Krieger, DFKI GmbH
Motivation
§ Towards a large, intelligent, web-based financial information and decision support system in the MUSING project
§ Until now, a prototype based on XBRL (eXtensible Business Reporting Language), as developed within the eTen project WINS
§ There we experienced the limitations of the XBRL schema, due to the lack of reasoning support over XML-based data and over information extracted from documents.
§ Need to translate XBRL into an ontology
XBRL to OWL-XBRL
§ XBRL taxonomies make use of XML in order to describe the structure of an XBRL document as well as to define new datatypes and properties relevant to XBRL.
§ This allows checking whether a concrete (business) document conforms to the syntactic structure defined in the schema.
§ But there is a need for languages and tools that go beyond the syntactic expressive power of XML Schema.
§ OWL, the Web Ontology Language, is the new emerging language for the Semantic Web and originates from the DAML+OIL standardization. OWL still makes use of constructs from RDF and RDFS, such as rdf:resource, rdfs:subClassOf, or rdfs:domain.
§ Two important variants, OWL Lite and OWL DL, restrict the expressive power of RDFS, thereby ensuring decidability.
§ What makes OWL unique (as compared to RDFS or even XML Schema) is the fact that it can describe resources in more detail and that it comes with a well-defined model-theoretical semantics, inherited from description logic.
Actual Experiment with the Sesame DB
§ The basic idea during our (manual) effort was that even though we are developing an XBRL taxonomy in OWL using Protégé, the information that is stored on disk is still RDF at the syntactic level. We were thus interested in RDF database systems which make sense of the semantics of OWL and RDFS constructs such as rdfs:subClassOf or owl:equivalentClass.
§ Current experiment with the Sesame open-source middleware framework for storing and retrieving RDF data. Sesame partially supports the semantics of RDFS and OWL constructs via entailment rules that compute “missing” RDF triples (the deductive closure).
§ From an RDF point of view, an additional 62,598 triples were generated through Sesame's deductive closure.
A Concrete Example of a Deduced Relation
§ Since we have classified hasPart (as well as partOf) as a transitive OWL property, the rule on the previous slide will fire, making implicit knowledge explicit and producing new triples such as
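As a hedged illustration of this kind of inference, the sketch below computes the closure of a transitive property over a set of triples by repeated rule application. It is not Sesame's rule engine or its API: the function name, the `ex:` resources and the tiny input graph are assumptions chosen only to show how one asserted chain yields a new hasPart triple.

```python
def transitive_closure(triples, transitive_props):
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) hold for a transitive p.

    Repeats until no new triple can be derived (a naive fixpoint computation).
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(closed):
            if p not in transitive_props:
                continue
            for (c, q, d) in list(closed):
                if q == p and c == b and (a, p, d) not in closed:
                    closed.add((a, p, d))
                    changed = True
    return closed


# Hypothetical asserted triples; the ex: names are placeholders, not XBRL concepts.
asserted = {
    ("ex:BalanceSheet", "hasPart", "ex:Assets"),
    ("ex:Assets", "hasPart", "ex:CurrentAssets"),
    ("ex:BalanceSheet", "rdf:type", "ex:Report"),
}

closure = transitive_closure(asserted, {"hasPart"})
derived = closure - asserted
```

Here `derived` contains exactly the triple that was implicit in the asserted data: the balance sheet has the current assets as a part. A production store would compute this incrementally rather than with a nested loop.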
Translating the Base Taxonomy
§ In the GermanAP Commercial and Industrial (German Accounting Principles) taxonomy (http://www.xbrl-deutschland.de/xe news 2.htm), the file xbrlinstance.xsd specifies the XBRL base taxonomy using XML Schema.
§ It makes use of XML Schema datatypes, such as xsd:string or xsd:date, but also defines simple types (simpleType), complex types (complexType), elements (element), and attributes (attribute). Element and attribute declarations are used to restrict the usage of elements and attributes in XBRL XML documents.
§ Since OWL only knows the distinction between classes and properties, the correspondence between XBRL and OWL description primitives is not a one-to-one mapping.
Business Intelligence in MUSING
§ Next generation Business Intelligence: the MUSING European R&D project (MUlti-industry, Semantic-based next generation business INtelliGence).
§ Towards a new generation of Business Intelligence (BI) tools and modules founded on semantic-based knowledge and content systems, enhancing the technological foundations of knowledge acquisition and reasoning in BI applications.
Application Domains in MUSING
§ The breakthrough impact of MUSING on semantic-based BI will be measured in three strategic, vertical domains:
§ Finance, through development and validation of next-generation (Basel II and beyond) semantic-based BI solutions, with particular reference to Credit Risk Management;
§ Internationalisation (i.e., the process that allows an enterprise to evolve its business from a local to an international dimension, here expressly focusing on the information acquisition work concerning international partnerships, contracts and investments), through development and validation of next-generation semantic-based internationalisation platforms;
§ Operational Risk Management, through development and validation of semantic-driven knowledge systems for measurement and mitigation tools, with particular reference to IT operational risks faced by IT-intensive organisations.
Processing of Quantitative Data
§ Typical input: finance reports in PDF
PDF to XBRL (OWL-XBRL)
§ Mapping from PDF to HTML/XML
§ Detection in the HTML/XML of relevant layout information that helps in reconstructing the logical units of the original PDF documents (title, header/footer, footnote, tables, free text)
§ Mapping of terms found in the XML version of the document to XBRL labels, disambiguating where needed
§ Checking whether all the lines of the PDF documents are XBRL compliant. Non-compliant information is saved in a log file. Towards an XBRL checker for balance sheets delivered in proprietary formats.
§ Generation of the results of the PDFtoXBRL procedure in a multilingual setting
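The term-mapping and compliance-checking steps above can be sketched as a lookup with a log of unmatched lines. This is a minimal illustration, not the PDFtoXBRL procedure itself: the label table, the `ex:` label names, the line format ("term value") and the function name are all assumptions, and real disambiguation would need more than an exact lookup.

```python
# Hypothetical multilingual table from surface terms to (made-up) XBRL-style labels.
LABELS = {
    "revenue": "ex:Revenue",
    "turnover": "ex:Revenue",
    "total assets": "ex:TotalAssets",
    "bilanzsumme": "ex:TotalAssets",
}


def map_lines(lines):
    """Split each 'term value' line, map the term to a label, and log the rest.

    Returns (mapped, log): mapped is a list of (label, value) pairs,
    log is the list of lines whose term has no known label.
    """
    mapped, log = [], []
    for line in lines:
        term, _, value = line.rpartition(" ")  # the value is the last token
        key = term.strip().lower()
        if key in LABELS:
            mapped.append((LABELS[key], value))
        else:
            log.append(line)  # non-compliant line, kept for inspection
    return mapped, log


mapped, log = map_lines(["Revenue 1200", "Bilanzsumme 5000", "Goodwill adj. 3"])
```

Keeping the unmatched lines verbatim in `log` reflects the idea of a checker: nothing is silently dropped, and the log shows what the label table still fails to cover.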
Processing of Qualitative Data
§ TURNOVER, INCOME, GROWTH: “State of revenues, if depurated from sales related to Consip contract award, which remarkably affected the turnover in 2003, would have, on the contrary, recorded an increase of 3,23% against that microinformatics market which recorded an increase of 3,2% (Sirmi, January 2005).”
§ Task: identifying relevant expressions and classifying them
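A very rough sketch of the identification and classification task is shown below, using a regular expression for percentage expressions and keyword lists for the three categories. The keyword lists, function names and the shortened sample sentence are assumptions for illustration only; the slide does not describe the actual extraction method.

```python
import re

# Percentages written with a decimal comma or a decimal point, e.g. "3,23%" or "3.2 %".
PERCENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")

# Hypothetical keyword lists for the three categories named on the slide.
CATEGORY_KEYWORDS = {
    "TURNOVER": ["turnover", "revenues", "sales"],
    "INCOME": ["income", "profit"],
    "GROWTH": ["increase", "growth"],
}


def extract_percentages(text):
    """Return all percentage values in the text as floats."""
    return [float(m.replace(",", ".")) for m in PERCENT.findall(text)]


def classify(text):
    """Return the sorted list of categories whose keywords occur in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if any(word in lowered for word in words)
    )


sample = "The turnover recorded an increase of 3,23% against a market increase of 3,2%."
percentages = extract_percentages(sample)
categories = classify(sample)
```

Substring matching of keywords is deliberately naive here; a linguistic analysis of the kind described earlier in the deck would replace it in practice.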
Integration of Data
§ The challenge: merging data and information extracted from various types of documents, and in various languages. In the XG use case, this especially concerns integrating information from news wires.