The Mysteries of Metadata. Workshop at Content World 2001, Burlingame, CA. May 15, 2001. Amit Sheth, amit@taalee.com, Founder/CEO, Taalee (www.taalee.com) [Taalee is now Semagix: www.semagix.com]. Also, Director, Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia (lsdis.cs.uga.edu). Metadata Extraction is a patented technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc. Confidential HP

Workshop Agenda: What is Metadata?; Metadata Descriptions and Standards; Metadata Storage/Exchange/Infrastructure; (Automated) Metadata Creation/Extraction/Tagging; Metadata Usage/Applications

What is Metadata? Data about data; statements, contexts. Recursive: data about “data about data”. Applications: content management, cataloguing, information retrieval, search, … "A Web content repository without metadata is like a library without an index," - Jack Jia, IWOV

Information Interoperability: key metadata objective and benefit. System, Syntax, Structure, Semantics; Protocols, Metadata, Domain Modeling, Ontologies

Semantics: Meaning, Understanding; Facts, Context, Reasoning. Related to: exchange, usage, application

A metadata classification (annotation on the slide: "Move in this direction to tackle information overload!!")
• User
• Ontologies, Classifications, Domain Models
• Domain Specific Metadata: area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies
• Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
• Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
• Content Dependent Metadata (size, max colors, rows, columns...)
• Content Independent Metadata (creation-date, location, type-of-sensor...)
• Data (Heterogeneous Types/Media)

Types of Metadata for digital media: Media type-specific metadata, e.g., texture of images, font size…; Media processing-specific metadata, e.g., search, retrieval, personalized filtering; Content-specific metadata, e.g., rocket-related video and documents

Metadata for Digital Data

Types of Specs and Standards (or MetaModels)
• Domain Independent: (MCF), RDF, MOF, Dublin Core
• Media Specific: MPEG4, MPEG7, VoiceXML
• Domain/Industry Specific (metamodels): MARC (Library), FGDC and UDK (Geographic), NewsML (News), PRISM (Publishing)
• Application Specific: ICE (Syndication)
• Exchange/Sharing: XCM, XMI
• Orthogonal/(Other): RDFS, namespaces, ontologies, domain models, (DAML, OIL)

What can RDF do for metadata? It is designed to impose structural constraints on syntax to support consistent encoding, exchange and processing of metadata. It is a domain-independent metadata standard.

RDF (Resource Description Framework): Resource, Property, Value
• RDF data consists of nodes and attached attribute/value pairs.
• Nodes can be any web resources (pages, servers, basically anything for which you can give a URI), even other instances of metadata.
• Attributes are named properties of the nodes, and their values are either atomic (text strings, numbers, etc.) or other resources or metadata instances.

RDF Example 1 (graph): URI:TALK dc:title "Mysteries of Metadata"; URI:TALK dc:creator URI:AMIT

RDF Example 2 (graph): URI:TALK dc:title "Mysteries of Metadata"; URI:TALK dc:creator URI:AMIT; URI:AMIT BIB:Name "Amit Sheth"; URI:AMIT BIB:Email "amit@taalee.com"; URI:AMIT BIB:Aff URI:LIB
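
The resource/property/value model from the RDF slides can be sketched as a set of triples. The snippet below is a minimal illustration in plain Python, not an RDF library. It encodes one plausible reading of the graph in RDF Example 2: the edge from URI:AMIT to URI:LIB through BIB:Aff is an assumption about the diagram, and the helper `values` is a hypothetical name.

```python
# A minimal sketch of RDF's resource/property/value model as plain tuples.
# Identifiers such as "URI:TALK" are the placeholder names used on the slide.
triples = [
    ("URI:TALK", "dc:title", "Mysteries of Metadata"),
    ("URI:TALK", "dc:creator", "URI:AMIT"),
    ("URI:AMIT", "BIB:Name", "Amit Sheth"),
    ("URI:AMIT", "BIB:Email", "amit@taalee.com"),
    ("URI:AMIT", "BIB:Aff", "URI:LIB"),  # assumed edge, see lead-in
]


def values(resource, prop):
    """Return every value attached to `resource` through property `prop`."""
    return [v for (r, p, v) in triples if r == resource and p == prop]


# A value can itself be a resource: follow dc:creator, then read its name.
creator = values("URI:TALK", "dc:creator")[0]
creator_name = values(creator, "BIB:Name")[0]
print(creator, creator_name)
```

Because a value may be another node, the same lookup can be chained, which is what lets metadata describe other metadata.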

RDFS (RDF Schema): Enables resource description communities to define (and share) vocabularies (museum, library, e-commerce…). Vocabulary (in RDFS) = the meaning, characteristics, and relationships of a set of properties.

RDF Based Web: RDF Schemas; RDF/XML Descriptions; Resources; HTML. Source: http://www.w3c.rl.ac.uk

Dublin Core Metadata Initiative: Simple element set designed for resource description; international, inter-discipline, W3C community consensus; “Semantic” interface among resource description communities (very limited form of semantics). Source: www.desire.org

Dublin Core RDF <xml> <? namespace href =

MOF (Meta Object Facility) and XMI: MOF models metadata using a subset of UML that is relevant to modeling metadata (class models: classes, associations and subtyping), together with a set of rules for mapping the elements of the MOF Core to CORBA IDL. XML Metadata Interchange (XMI) is an extension of the MOF into the XML space.

NewsML is a packaging and metadata format for news content. NewsML is developed by the International Press Telecommunications Council (IPTC), a consortium of news providers, mostly in the print or wire-service industries. Since it deals only with packaging and metadata, NewsML is complementary both to news content formats like NITF and to syndication protocols like ICE.

NewsML… It can be used by news providers to combine their pictures, video, text, graphics and audio files in news output available on web sites, mobile phones, high-end desktops, interactive television and any other device. It provides an accurate, objective set of description tools, which help qualify the information and make the search more precise. NewsML allows a range of metadata to be attached to a multimedia story, including a detailed computer-readable description of what an item is about.

Example of the end-to-end flow - NewsML: The content provider supplies NewsML packaged media content to the operator. The content is categorized as current events, finance, sport, etc. and updated hourly. The operator receives NewsML data from the content provider. The content server automatically pushes updated news articles to all news service subscribers. Consumers sign up for the news service directly on the device. When using the news service, the user browses through the categories and reads the news articles. The news articles are presented in a continuous flow (one after the other) without end-user interaction. Source: http://www.mediabricks.com

PRISM: Publishing Requirements for Industry Standard Metadata. Version: 1.0, April 2001. Authors: IDEAlliance (Adobe, Vignette, Kinecta et al.). Idea: “a standard for interoperable content description, interchange, and reuse in both traditional and electronic publishing contexts”. Web site: http://www.prismstandard.org

PRISM Design: Built on existing standards like Dublin Core (DC), RDF, XML. Designed to be used in a simple, straightforward way over the Internet. Compatible with NewsML. Integrates easily with ICE (for syndication). Vocabulary: Basic: DC; Extensions: “Controlled Vocabularies”, e.g., “North American Industry Classification System” (NAICS)

PRISM Example (element values): "Photograph taken at 6:00 am on Corfu with two models"; "Walking on the Beach in Corfu"; "John Peterson"; "Sally Smith, lighting"; "image/jpeg". (Source: PRISM spec v. 1; http://www.prismstandard.org/techdev/prismspec1.asp)

VoiceXML
§ A language for specifying voice dialogs.
§ Voice dialogs use audio prompts and text-to-speech (TTS) for output; touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
§ Goal is to bring the advantages of web-based development and content delivery to interactive voice response applications.
§ High-level voice-specific language simplifies application development.
Source: http://www.voicexml.org

Voice Based Internet Applications. Source: http://www.voicexml.org

VoiceXML Metadata: Voice-specific metadata; supports syntactic interoperability; text data to voice data; VoiceXML = XML + Voice Metadata

VoiceXML: Possible Services
§ Information retrieval: news, sports, traffic, stock quotes.
§ e-Transactions (e-commerce, e-tailing, etc.): financial (banking, stock trading); catalog browsing (generally as an adjunct to paper).
§ Telephone services: personal voice dialing, one-number find-me services.
§ Intranet: inventory, HR services, corporate portals.
§ Unification: "My Whatever" (personal portals, personal agents, unified messaging).
Source: http://www.voicexml.org

Information and Content Exchange (ICE). Main Goal: efficient and extensible Content Syndication protocol for the Internet, using XML syntax. Authors: Adobe, Kinecta, MS, Sun, Vignette et al. Status: latest spec version 1.1, May 2000; submitted to W3C for review. Implementations: Vignette Syndication Server, MS BizTalk, Kinecta Interact, … Web Site: http://www.icestandard.org

What is the ICE Protocol? A syndication protocol for communication between Syndicators and Subscribers. Metadata to define: roles and responsibilities of involved parties (Subscriber vs. Syndicator, Requestor vs. Responder, Sender vs. Receiver); format and method of content exchange (e.g., sequenced packages, pull vs. push model)
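
The idea of sequenced packages exchanged between a syndicator and a subscriber can be sketched as follows. This is a rough illustration under stated assumptions, not the ICE wire format: real ICE payloads are XML documents, and the `Package` and `Subscription` classes, the `add`/`remove` command names, and the item identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical stand-in for a package: an ordered batch of update commands."""
    sequence: int
    commands: list  # tuples of (action, item_id, content)


@dataclass
class Subscription:
    """Subscriber-side state: the content delivered so far."""
    collection: dict = field(default_factory=dict)
    last_sequence: int = 0

    def apply(self, package):
        # Packages are sequenced, so anything out of order is rejected.
        if package.sequence != self.last_sequence + 1:
            raise ValueError("package out of sequence")
        for action, item_id, content in package.commands:
            if action == "add":
                self.collection[item_id] = content
            elif action == "remove":
                self.collection.pop(item_id, None)
            else:
                raise ValueError(f"unknown command: {action}")
        self.last_sequence = package.sequence


sub = Subscription()
sub.apply(Package(1, [("add", "a1", "Story A"), ("add", "a2", "Story B")]))
sub.apply(Package(2, [("remove", "a1", None), ("add", "a3", "Story C")]))
print(sorted(sub.collection))
```

In a pull model the subscriber would request the next package itself; in a push model the syndicator would deliver it. Either way the sequence check is what keeps both sides consistent.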

ICE Applications: ICE vocabulary + domain vocabulary = complete application. ICE establishes and manages the syndication, delivers data, logs events => content-independent metadata. Industry-specific vocabulary defines the content => domain-specific metadata. Source: http://www.icestandard.org

ICE Explained
• ICE: Information and Content Exchange protocol
• Syndicator: A content aggregator and distributor
• Subscriber: A content consumer
• Subscription: An agreement between a subscriber and a syndicator for the delivery of content according to the delivery policy and other parameters in the agreement
• Collection: The current content of a subscription
• ICE Package: A delivery of commands to update a collection, such as the addition of content items
• ICE Payload: The XML document used by ICE to carry protocol information. Examples include requests for packages, catalogs of subscription offers, usage logs and other management information
Sources: InternetWeek; "ICE Cookbook, version 1.0" http://www.internetweek.com/ebizapps01/ebiz050701-3.htm

… (ASCII-encoded image) Content (domain-specific metadata)

XCM (eXtended Content Management): a framework that allows customers to classify content management offerings according to the business problems they address. The segments of XCM are:
• Content Development: Developing static content and managing the process of its subsequent approval, versioning, storage, and retrieval.
• Application Content Management (Vignette): Deploying content dynamically to a Web site and managing that content throughout its online lifecycle.
• Content Delivery: Delivering content through multiple channels to minimize customer waiting time and improve Web site stability and scalability.
Source: http://www.vignette.com/CDA/Site/0,2097,1-1-30-1458-1146-1743,00.html

XCM (eXtended Content Management). Segments: Content Development Management; Application Content Management; Content Delivery. Components: Content Authoring, Digital Asset Management, Software Configuration Management, Document Process Management, Metadata Management, Recombination, Personalization, Edge Network Delivery, Streaming Media Delivery, Caching. Source: http://www.vignette.com/

Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain (Kansas State)
FGDC Metadata Model | UDK Metadata Model
Theme keywords: digital line graph, hydrography, transportation... | Search terms: digital line graph, hydrography, transportation...
Title: Dakota Aquifer | Title Topic: Dakota Aquifer
Online linkage: http://gisdasc.kgs.ukans.edu/dasc/ | Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
Direct Spatial Reference Method: Vector | Measuring Techniques: Vector
Horizontal Coordinate System Definition: Universal Transverse Mercator | Co-ordinate System: Universal Transverse Mercator
... | ...
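
One common way to reconcile two metadata models that name the same data differently is a tag crosswalk. The sketch below uses the tag pairs listed on the slide. The direction of the mapping (UDK to FGDC), the function name `udk_to_fgdc`, and the choice to pass unknown tags through unchanged are illustrative assumptions, not part of either standard.

```python
# Tag-name correspondences as listed on the slide (UDK tag -> FGDC tag).
UDK_TO_FGDC = {
    "Search terms": "Theme keywords",
    "Title Topic": "Title",
    "Adress Id": "Online linkage",
    "Measuring Techniques": "Direct Spatial Reference Method",
    "Co-ordinate System": "Horizontal Coordinate System Definition",
}


def udk_to_fgdc(record):
    """Rename UDK tags to their FGDC equivalents; unknown tags are kept as-is."""
    return {UDK_TO_FGDC.get(tag, tag): value for tag, value in record.items()}


udk_record = {
    "Title Topic": "Dakota Aquifer",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}
fgdc_record = udk_to_fgdc(udk_record)
print(fgdc_record)
```

A rename table like this only handles syntactic differences in tag names. Cases where the two models split or combine fields differently would need per-field conversion logic.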

Different views of Metadata: Domain Independent Specifications (RDF); Frameworks/Infrastructures (XCM); Application Specific (ICE); Media Specific (MPEG7, VoiceXML); Domain Specific (NewsML, FGDC/UDK)

Creating and Serving Metadata to Power the Life-cycle of Content. Taalee Infrastructure Services; Taalee Content Applications. Stages: Produce, Aggregate, Catalog/Index, Integrate, Syndicate, Personalize, Interactive Marketing. Questions: Where is the content? Whose is it? What is this content about? What other content is it related to? What is the right content for this user? What is the best way to monetize this interaction? Taalee Semantic MetaBase. Broadcast, Wireline, Wireless, Interactive TV

Taalee’s Intelligent Content Process

Metadata Creation and Semanticization
• Automatic Content Classification/Categorization
• Metadata Creation/Extraction: Types of metadata created

Forms/Types/Ingest of Content. Sources: Web Sites, Content Feeds and Private Repositories. Types: Text, Graphics, Audio, Video, Multimedia. Forms: Unstructured text, Semi-structured text, Structured text (+Media); Static or Dynamic. Ingest: Feed (push), Web (pull), Repository/Database (usually pull)

Content Handling/Ingest Infrastructure/Exchange: Feed Handlers; Crawlers/Screen Scrapers/Bots; Software Agents (Centralized, Distributed, Mobile/Migratory)

Information Extraction for Metadata Creation. Sources: Nexis, UPI, AP, Global/Enterprise Web Repositories, Digital Videos, Documents, Data Stores, Digital Maps, Digital Images, Digital Audios, ... EXTRACTORS => METADATA

Extracting a Text Document: Syntactic approach
Annotations on the slide: LAYOUT; Date => day month int ‘,’ int
INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT
NATIONAL PREPAREDNESS LEVEL II
CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection.
SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr… The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues.
CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning.
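
The slide's rule `Date => day month int ',' int` describes a purely syntactic pattern, which maps naturally onto a regular expression. The sketch below is one possible encoding using Python's standard `re` module. The group names and the function `extract_date` are hypothetical, and the pattern assumes full English day and month names as they appear in the report header.

```python
import re

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Date => day month int ',' int
DATE_RULE = re.compile(
    rf"\b(?P<day>{DAYS})\s+(?P<month>{MONTHS})\s+"
    rf"(?P<day_of_month>\d{{1,2}})\s*,\s*(?P<year>\d{{4}})\b"
)


def extract_date(text):
    """Return the first date matching the rule as a dict, or None."""
    match = DATE_RULE.search(text)
    return match.groupdict() if match else None


header = "INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT"
date = extract_date(header)
print(date)
```

A rule like this recognizes the form of a date but knows nothing about what the date refers to, which is the limitation that motivates the knowledge-based approaches later in the deck.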

Traditional Text Categorization: Customer Training Set; Statistical/AI Techniques; Customer Article Feed 4715; Classify; Place in a taxonomy; Routing/Distribution. Classification of Article 4715, Standard Metadata: Feed Source: iSyndicate; Posted Date: 11/20/2000

Taalee’s Categorization & Automatic Metadata Creation
• Knowledge-base & Statistical/AI Techniques; Taalee Training Set; Customer Training Set; Article Feed 4715
• Classify; Place in a taxonomy; Catalog Metadata; Automated Content Enrichment (ACE); Map to another taxonomy; Precise syndication/filtering; Routing/Distribution
• Taalee Enterprise Content Manager; Customization Suite
• Classification of Article 4715 (Article 4715 Metadata feed):
  Standard metadata: Feed Source: iSyndicate; Posted Date: 11/20/2000
  Semantic metadata: Company Name: France Telecom, Equant; Ticker Symbol: FTE, ENT; Exchange: NYSE; Topic: Company News
  FTE: Company Analysis, Conference Calls, Earnings, Stock Analysis
  ENT: Company Analysis, Conference Calls, Earnings, Stock Analysis
  NYSE: Member Companies, Market News, IPOs
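
The difference between standard and semantic metadata on this slide can be illustrated with a toy enrichment step. This is only a sketch of the general idea of knowledge-base lookup, not Taalee's patented extraction technology: the `KNOWLEDGE_BASE` structure, the `enrich` function, and the sample sentence are all invented for illustration, while the company names, tickers and exchange are the ones shown on the slide.

```python
# Toy knowledge base; entries mirror the slide, the structure is assumed.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(article_text, standard_metadata):
    """Add semantic metadata for every known company named in the text."""
    companies = [name for name in KNOWLEDGE_BASE if name in article_text]
    enriched = dict(standard_metadata)  # leave the caller's dict untouched
    enriched["Company Name"] = companies
    enriched["Ticker Symbol"] = [KNOWLEDGE_BASE[c]["ticker"] for c in companies]
    enriched["Exchange"] = sorted({KNOWLEDGE_BASE[c]["exchange"] for c in companies})
    return enriched


standard = {"Feed Source": "iSyndicate", "Posted Date": "11/20/2000"}
sample_text = "This article mentions France Telecom and Equant."  # invented text
result = enrich(sample_text, standard)
print(result)
```

Exact substring matching is the weakest possible recognizer. It stands in here for whatever classification and extraction techniques a real system would apply before consulting its knowledge base.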

Automatic Categorization & Metadata Tagging (unstructured text/transcript of A/V). Video Segment with Associated Text: ABSOLUTE CONTROL OF THE SENATE IS STILL IN QUESTION. AS OF TONIGHT, THE REPUBLICANS HAVE 50 SENATE SEATS AND THE DEMOCRATS 49. IN WASHINGTON STATE, THE SENATE RACE REMAINS TOO CLOSE TO CALL. IF THE DEMOCRATIC CHALLENGER UNSEATS THE REPUBLICAN INCUMBENT THE SENATE WILL BE EVENLY DIVIDED. IN MISSOURI, REPUBLICAN SENATOR JOHN ASHCROFT SAYS HE WILL NOT CHALLENGE HIS LOSS TO GOVERNOR MEL CARNAHAN WHO DIED IN A CRASH THREE WEEKS AGO. GOVERNOR CARNAHAN'S WIFE IS EXPECTED TO TAKE HIS PLACE. IN THE HIGHEST PROFILE SENATE EVENT OF THE NIGHT, HILLARY CLINTON WON THE NEW YORK SENATE SEAT. SHE IS THE FIRST LADY TO RUN MUCH LESS WIN. Panels: Segment Description; Auto Categorization; Semantic Metadata

Automatic Categorization & Metadata Tagging (Web page): Video with Editorialized Text on the Web; Auto Categorization; Semantic Metadata

Automatic Categorization & Metadata Tagging (Feed): Text from Bloomberg; Auto Categorization; Semantic Metadata

Taalee Extraction and Knowledgebase Enhancement: Web Page; Enhanced Metadata Asset; Extraction Agent

Basis for Semantics
A. Facts/Concepts/Terms/Entities: Dictionary, Thesaurus, Reference Data, Vocabulary
B. Facts with Relationships: Taxonomy/(Categories), Ontology, Domain Modeling (e.g., Golf = golfer, tournament name, golf course, event), Knowledge Base

Basis for Semantics
C. Reasoning/Inference: (Statistical) (Information Retrieval); Statistical Learning/AI (Bayesian, Neural Networks, HMM, …); Logic Based (Description Logic); Natural Language/Grammar (part of speech, ...)

Alternatives for Metadata Extraction (toward deeper understanding)
• Statistical methods/Cluster Analysis, Learning/AI and Collab. Filtering: Word or Phrase
• Reference data/Concept-terms/Dictionary/Thesaurus: By topic/industry/subject/domain
• Ontologies/Domain Models, KnowledgeBase: By Entities and Relationships

Open Directory Project (ODP): Classification/Taxonomy & Directory

Ontology: Standardize meaning, description, representation of involved attributes. Capture the semantics involved via domain characteristics. Allow knowledge sharing and reuse (Ontological Commitment)

An Ontology

Example: Interrelated ontologies

Large Vocabularies/Taxonomies/Ontologies: WordNet. The Medical Subject Headings (MeSH): NLM's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Year 2000 MeSH includes more than 19,000 main headings, 110,000 Supplementary Concept Records (formerly Supplementary Chemical Records), and an entry vocabulary of over 300,000 terms.

Metadata enabled Applications

Metadata Usage: Impact on Search & Query processing
• traditional queries based on keywords
• attribute based queries
• content-based queries

Oingo.com
• Oingo Ontology - ODP based(?), the database of millions of concepts and relationships that powers Oingo's semantic technology
• Oingo Seek - the database of millions of concepts and relationships that powers Oingo's semantic technology
• Oingo Sense - the knowledge extraction tool that uncovers the essential meaning of information by sensing concepts and context
• Oingo Lingua - the language of meaning used to state intent. The basis for intelligent interaction
Assets catalogued are Web sites or Web pages.

Use of Categories for Search: After 3 or 4 clicks

Metadata & Search: Metadata can improve search significantly, but metadata enables much more than search. Alternatives for improving search: clustering, link and other analysis (e.g., Google’s Link Flux analysis), classification as context, ontologies, metadata, knowledgebases …

Metadata Usage: Keyword, Attribute and Content Based Access

Keyword Search vs Attribute Search with Semantic metadata
Metadata from Typical Virage Cataloging of Football Assets (search on "football touchdown"):
• Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
• Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
Taalee Metadata on Football Assets (Rich Media Reference Page):
• Baltimore 31, Pit 24; http://www.nfl.com
• Quandry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens' lead to 31-24 in the third quarter.
• League: Professional; Teams: Ravens, Steelers; Score: Bal 31, Pit 24; Players: Quandry Ismail, Tony Banks; Event: Touchdown; Produced by: NFL.com; Posted date: 2/02/2000
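
The contrast between keyword search and attribute search can be made concrete with a small sketch. The asset records below reuse titles and field values from the slide, shortened where needed. The record layout and the functions `keyword_search` and `attribute_search` are hypothetical and are not meant to represent either vendor's system.

```python
assets = [
    {"title": "Brian Griese Interview Part Four",
     "description": "Brian Griese talks about the first touchdown he ever threw."},
    {"title": "Jimmy Smith Interview Part Seven",
     "description": "Jimmy Smith explains his philosophy on showboating."},
    {"title": "Baltimore 31, Pit 24",
     "description": "Quandry Ismail and Tony Banks hook up for their third long touchdown.",
     "Event": "Touchdown",
     "Teams": ["Ravens", "Steelers"],
     "Players": ["Quandry Ismail", "Tony Banks"]},
]


def keyword_search(query):
    """Match the query as a substring of the free text only."""
    q = query.lower()
    return [a["title"] for a in assets
            if q in (a["title"] + " " + a["description"]).lower()]


def attribute_search(**criteria):
    """Match on structured fields; list-valued fields match on membership."""
    def matches(asset):
        for key, wanted in criteria.items():
            value = asset.get(key)
            if isinstance(value, list):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True
    return [a["title"] for a in assets if matches(a)]


keyword_hits = keyword_search("touchdown")
attribute_hits = attribute_search(Event="Touchdown", Teams="Ravens")
print(keyword_hits)
print(attribute_hits)
```

The keyword query also returns an interview that merely mentions a touchdown, while the attribute query returns only the asset whose metadata says the event is a touchdown involving the Ravens.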

Taalee’s Semantic Search: Highly customizable, precise and freshest A/V search. Delightful, relevant information, exceptional targeting opportunity. Context and Domain Specific Attributes. Uniform Metadata for Content from Multiple Sources; can be sorted by any field

Creating a Web of related information: What can a context do?

Taalee Directory: Georgia Bulldogs. System recognizes ENTITY & CATEGORY

Taalee Directory: Careless whisper

Semantic Relationships

Metadata Application Example: Semantic Applications for highly relevant and fresh content: Personalization and Targeting/interactive marketing. Please contact Taalee for live demonstrations

Personalized Directory (Change Context): Obtain a whole universe of information (that you may not even have thought of) about some entities that have always been of interest to you. Please enter such semantic keywords below.

Personalized Queries & Hot Topics (PERSONALIZATION)
Personalized Queries
1. My Stock Portfolio: Microsoft suffers serious hack attack; Cisco Systems Inc; Analyst Safa Rashtchy on Yahoo!; PeopleSoft, Inc; AT&T Corp.; more…
2. My Football Fantasy Team: Gators' Spurrier ready for 'big' game; Tech's Vick looks to become complete QB; Bucs excited about Hamilton; Jasper Sanks rumbles into the end zone…; Edwards explains reasons for leaving BYU; more…
3. Julia Roberts Collection: Movie Trailer: "Notting Hill"; Trailer - Runaway Bride; Patrick; Movie Trailer: "Stepmom"; Conspiracy Theory; more…
4. Pink Floyd Collection: Set the Controls for the Heart of the Sun…; Wish You Were Here; Round Around; Keep Talking; The Post War Dream; more…
HOT Topics!!!
1. Election 2000: Video: Explaining the electoral map; Race for White House hots up; Seniors Give Gore Florida Edge; more…
2. Middle East Peace Conflict: More die as Israel steps up security; Israel braces for suicide bombs; Pentagon probes Cole's security; more…
3. Napster Controversy: The Brain Behind Napster; Napster Lawsuit; Creative Nomad II; more…

Metadata: Targeting

Semantic/Interactive Targeting: Buy Al Pacino Videos; Buy Russell Crowe Videos; Buy Christopher Plummer Videos; Buy Diane Venora Videos; Buy Philip Baker Hall Videos; Buy The Insider Video. Precisely targeted through the use of Structured Metadata and integration from multiple sources

Web: Extreme Personalization. Inputs: Realtime Feeds; Web sites and Pages; Content Databases; Interests, Preferences. Time-Shifted Content Aggregator; Semantic Engine™; Structured, Hi-Quality Semantic Metabase. Output: Personalized Content

Application of Semantic Metadata and Automatic Content Enrichment. Screen: MyMedia: MyStocks, News, Sports, Music. User has already completed Web-based registration and personalization at Voquette’s Enterprise Customer site. User’s “Wireless Home page” shows the categories for his interests. There is an alert (new content) for his stock and sports categories.

Application of Semantic Metadata and Automatic Content Enrichment. Screen: My Stocks: CSCO, NT, IBM, Market. Clicking on MyStocks brings down user’s Personal Portfolio list. The user wants to see news items about Cisco (see next slide). Search at the bottom is a semantic search that understands the financial domain, and the knowledge of user’s portfolio. Typically search can be done by typing one word or selecting from a dynamic, personalized menu.

Application of Semantic Metadata and Automatic Content Enrichment. Screen: CSCO: Analyst Call, Conf Call, Earnings. Different types of recent audio content about Cisco are available. The user clicks to see a listing of Analyst Calls on Cisco (next slide). Icons at the bottom of the screen enable contextually relevant functions: listen, set alert on story, add to playlist.

Application of Semantic Metadata and Automatic Content Enrichment. Screen: CSCO Analysis; listing: 11/08 ON24 Payne; 11/07 ON24 H&Q; CC 11/06 CBS Langlesis. Clicking on the link for Cisco Analyst Calls displays a listing sorted by date. Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., Analyst Call focuses on the source and analyst name or company. The icons denote additional metadata, such as “Strong Buy” by H&Q Analyst.

iTV: Taalee’s Extreme Personalization
Diagram labels:
Immediate Interests, Preferences
Content Provider (DBS, DISH, Wink, AOL-TV)
Content, “Programs”
Meta-Data Tagged Content
Personalized Content Capsules, Redirects and Programming
Semantic Engine™
Structured, Hi-Quality Semantic Metabase

Metadata for Automatic Content Enrichment: Interactive Television
This screen is customizable with an interactivity feature that uses metadata, such as whether there is a new Conference Call video on CSCO.
Part of the screen can be automatically customized to show conference-call-specific information, including transcript, participation, etc., all of which are relevant metadata.
The Conference Call itself can have embedded metadata to support personalization and interactivity.
This segment has embedded or referenced metadata that is used by the personalization application to show only the stocks that the user is interested in.

Metadata in Enterprise Apps
Diagram labels:
Stages: Collection, Processing, Production Support
Sony
Network Content, Affiliate Feeds, Public Sources
§ Categorize
§ Catalog
§ Integrate
Rich Data Metabase
Filter, Search, Consolidate, Personalize, Archive, Licensing, Syndication

Description
Clip listing (duration, date, source), in slide order:
(1.33) - 12/06/00 - ABC
(2.53) - 12/06/00 - CBS
(5.16) - 12/06/00 - ABC
(2.46) - 12/06/00 - FOX
(1.33) - 12/06/00 - NBC
-- Breaking News --
Gore Demands That Recount Restart
(5.33) - 12/06/00
(1.33) - 12/06/00 - CBS
(1.33) - 12/06/00 - ABC
Gore Says Fla. Can't Name Electors
(3.57) - 12/06/00 - CBS
(2.33) - 12/06/00 - CBS
Bush Meets Colin Powell at Ranch
(4.27) - 12/06/00 - ABC
(3.12) - 12/06/00 - NNS
Market Tumbles on Earnings Warning
(3.44) - 12/06/00 - FOX
(0.32) - 12/06/00 - CBS
Barak Outlines His Peace Plan
(1.33) - 12/06/00 - CBS
(7.24) - 12/06/00 - CBS
Metadata:
Produced by: CNN
Posted Date: 12/07/2000
Reporter: David Lewis
Event: Election 2000
Location: Tallahassee, Florida, USA
People: Al Gore
Article text:
TALLAHASSEE, Florida (CNN) – Though the two presidential candidates have until noon Wednesday to file briefs in Al Gore's appeal to the Florida Supreme Court, the outcome of two trials set on the same day in Leon County, Florida, may offer Gore his best hope for the presidency. Democrats in Seminole County are seeking to have 15,000 absentee ballots thrown out in that heavily Republican jurisdiction -- a move that would give Gore a lead of up to 5,000 votes statewide. Lawyers for the plaintiff, Harry Jacobs, claim the ballots should be rejected because they say County Elections Supervisor Sandra Goard allowed Republican workers to fill out voter identification numbers on 2,126 incomplete absentee ballot applications sent in by GOP voters, while refusing to allow Democratic workers to do the same thing for Democratic voters. The GOP says that suit, and one similar to it from Martin County, demonstrates Democratic Party politics at its most desperate. Gore is not a party to either of those lawsuits. On Tuesday, the judge in the

Metadata’s role in emerging iTV infrastructure
Diagram labels:
Video; Enhanced Digital Cable; MPEG-2/4/7
MPEG Encoder: Create Scene Description Tree
Taalee Semantic Engine: “Cisco Systems” Node
MPEG Decoder: Retrieve Scene Description Track
Node = AVO Object
Object Content Information (OCI)
Enhanced XML Description
“Cisco Systems” Metadata-rich Value-added Node
GREAT USER EXPERIENCE
Channel sales through Video Server Vendors, Video App Servers, and Broadcasters
License metadata decoder and semantic applications to device makers
Scene Description Tree:
§ Produced by: Fox Sports
§ Creation Date: 12/05/2000
§ League: NFL
§ Teams: Seattle Seahawks, Atlanta Falcons
§ Players: John Kitna
§ Coaches: Mike Holmgren, Dan Reeves
§ Location: Atlanta

Metadata for Intelligent Content
Diagram labels: Metadata Creation, Usage
Intelligent Content =
Content which does contain the words the user asked for (Extractor Agents)
+ Content which does not contain the words the user asked for, but is about what he asked for (Value-added Metadata)
+ Content the user did not think to ask for, but which he needs to know (Semantic Associations)

Intelligent Content via Value-Added Metadata

Example 1 – Snapshots (“Jamal Anderson”)
Search for ‘Jamal Anderson’ in ‘Football’.
Click on the first result for Jamal Anderson.
View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
View the metadata. Note that Team name and League name are also included in the metadata.

Example 2 – Snapshots (“Gary Sheffield”)
Search for ‘Gary Sheffield’ in ‘Baseball’.
Click on the first result for Gary Sheffield.
View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
View the metadata. Note that Team name and League name are also included in the metadata.

Intelligent Content – Value-Added Metadata
Some metadata are obtained explicitly from the asset. Others (not present in the asset) are added by Taalee using its semantic relationships. The asset is richly and fully described in the many ways users choose to interact.
Rich Media Sports Asset:
Producer: Name of content provider that produced the asset
Posted Date: Date of asset posting – Extracted automatically
Sport: Name of sport
Player Names: Names of players mentioned explicitly in the asset – Extracted automatically
Team Name: Name of team for which player plays – Not mentioned explicitly in asset – Value-added using Taalee’s semantic relationships
League Name: Name of league to which the player’s team belongs – Not mentioned explicitly in asset – Value-added by Taalee’s processing based on semantic associations
Legend: X → Y means Taalee uses X to add Y as value-added metadata to the asset
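The value-addition described on this slide (player name to team name to league name) can be sketched as a chain of lookups. This is only an illustration under stated assumptions: the two dictionaries stand in for a knowledge base, and the `enrich` helper and field names are hypothetical, not Taalee's API.

```python
# Toy knowledge base (assumed for illustration): player -> team, team -> league.
PLAYER_TEAM = {"Jamal Anderson": "Atlanta Falcons"}
TEAM_LEAGUE = {"Atlanta Falcons": "NFL"}


def enrich(metadata):
    """Return a copy of the extracted metadata with team and league names added."""
    enriched = dict(metadata)
    teams, leagues = [], []
    for player in metadata.get("players", []):
        team = PLAYER_TEAM.get(player)
        if team is None or team in teams:
            continue  # unknown player, or team already recorded
        teams.append(team)
        league = TEAM_LEAGUE.get(team)
        if league is not None and league not in leagues:
            leagues.append(league)
    enriched["teams"] = teams
    enriched["leagues"] = leagues
    return enriched


# Metadata extracted directly from the asset: no team or league is mentioned.
extracted = {"sport": "Football", "players": ["Jamal Anderson"]}
print(enrich(extracted))
```

A search for the team or the league can then match this asset even though neither name appears in the source page.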

Intelligent Content via Semantic Associations

Example (test on http://directory.mediaanywhere.com)
Search for company ‘Commerce One’.
The result includes links to news on companies that Commerce One competes against. (To view news on Ariba, click on the link for Ariba.)
Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically.

Metadata-centric Content Management Architecture
Diagram labels:
ASP/Enterprise hosted
Sources: Internal Source 1 (Research), Internal Source 2, External feeds/Web (e.g. Reuters)
Extractor Agent 1, Extractor Agent 2, Extractor Agent 3
World Model, Knowledge Base, Semantic Engine, Semantic Application
Taalee Metabase
XCM-compliant metadata, XML or other format
Third-party Content Mgmt and Syndication
Flow:
1. Cisco story (Story on Cisco) from PW Source 1 is passed on to add semantic associations.
2. The Semantic Engine consults the Knowledge Base for Cisco’s competition.
3. The Knowledge Base returns the result: Lucent is a competitor of Cisco.
4. The Lucent story (Story on Lucent) from external feeds is picked for publishing as “semantically related” to the Cisco story and passed on to the Dashboard.
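The Cisco/Lucent flow on this slide can be sketched in a few lines. The sketch assumes a knowledge base reduced to a single competitor table and stories represented as dictionaries; `related_stories` and the story fields are hypothetical names chosen for the example.

```python
# Minimal stand-in for the knowledge base: company -> set of competitors.
# The single entry mirrors the slide's example (Lucent is a competitor of Cisco).
COMPETITORS = {"Cisco": {"Lucent"}}


def related_stories(story, candidates, competitors=COMPETITORS):
    """Pick candidate stories that mention a competitor of any company in `story`."""
    rivals = set()
    for company in story["companies"]:
        rivals |= competitors.get(company, set())
    return [c for c in candidates if rivals & set(c["companies"])]


cisco_story = {"title": "Story on Cisco", "companies": ["Cisco"]}
feed = [
    {"title": "Story on Lucent", "companies": ["Lucent"]},
    {"title": "Story on weather", "companies": []},
]
for item in related_stories(cisco_story, feed):
    print(item["title"])
```

Note that the Lucent story is selected without containing the word "Cisco"; the association comes entirely from the knowledge base.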

Semantic Associations supported by Taalee Semantic Engine
Intelligent Content = What you asked for + What you need to know!
Diagram labels:
Related Stock News: COMPANY
Competition: COMPANIES in INDUSTRY with Competing PRODUCTS
Industry News: COMPANIES in Same or Related INDUSTRY
Technology: Products Important to INDUSTRY or COMPANY
Regulations (EPA, SEC): Impacting INDUSTRY or Filed By COMPANY

Semantic Web Application Example: Financial Advisor Research Dashboard
Automatic collation of semantically related digital media information from multiple sources
Semantically related news not specifically asked for
Research inferred automatically
Semantic search/personalization, etc.

A vision for the future
Semantic Web, Complex Relationships and Knowledge Discovery, e.g., the InfoQuilt project at LSDIS Lab, Univ. of Georgia

Beyond RDF – one proposal (cf: Ora Lassila)
§ Structural modeling obviously not enough
§ we need a “logic layer” on top of RDF
§ some type of description logic is a possibility
§ Exposing a wide variety of data sources as RDF is useful, particularly if we have logic/rules which allow us to draw inference from this data
§ RDF + DL = “Frame System for WWW”
Source: www.ontoknowledge.org/oil
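To illustrate what "logic/rules which allow us to draw inference" can mean in practice, the following sketch applies a single rule to a set of RDF-like triples: if x has type C and C is a subclass of D, then x has type D. It is a toy forward-chaining loop written for this example, not a description logic reasoner, and the triple vocabulary ("type", "subClassOf") and instance names are simplified assumptions.

```python
def infer_types(triples):
    """Forward-chain one rule: (x type C) and (C subClassOf D) => (x type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current facts so the set can be extended while looping.
        subclass = [(s, o) for s, p, o in facts if p == "subClassOf"]
        typed = [(s, o) for s, p, o in facts if p == "type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                if cls == sub and (instance, "type", sup) not in facts:
                    facts.add((instance, "type", sup))
                    changed = True
    return facts


# Illustrative data: class and instance names are made up for the example.
data = {
    ("giraffe", "subClassOf", "herbivore"),
    ("herbivore", "subClassOf", "animal"),
    ("g1", "type", "giraffe"),
}
inferred = infer_types(data) - data
print(sorted(inferred))
```

The two derived triples state that `g1` is a herbivore and an animal, facts that no source asserted directly.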

Semantic Web - next step in Web evolution
“A Web in which machine reasoning will be ubiquitous and devastatingly powerful.” [Berners-Lee]
“A place where the whim of a human being and the reasoning of a machine coexist in an ideal, powerful mixture.” [Berners-Lee]
“A semantic Web would permit more accurate and efficient Web searches, which are among the most important Web-based activities.” [Berners-Lee]
A personal definition: Semantic Web is the concept that Web-accessible content can be organized semantically, rather than through syntactic and structural methods.

What is DAML (DARPA Agent Markup Language)?
A proposal to create technologies that will enable software agents to dynamically identify and understand information sources, and to provide interoperability between agents in a semantic manner.
Based on RDF + XML
Agent-readable tags
www.daml.org

DAML Example
Source: http://www.zdnet.com/pcweek/stories/jumps/0,4270,2432946,00.html

Three-layered Architecture of Semantic Web
Logical Layer: Formal Semantics and Reasoning Support – OIL, DAML-O
Schema Layer: Definition of Vocabulary – RDF Schema
Data Layer: Simple data model and syntax for metadata – RDF

OIL – as RDF Extension
<rdfs:Class rdf:ID="herbivore"> <rdf:type rdf:resource="http://www. …

DAML and OIL – Evolving towards Semantic Web
OIL Mission: OIL is a Web-based representation and inference layer for ontologies, which combines the widely used modeling primitives from frame-based languages with the formal semantics and reasoning services provided by description logics.

Knowledge Discovery - Example
Earthquake Sources (USGS, NEIC)
Nuclear Test Sources (Oklahoma Observatory, etc.)
Nuclear tests may cause earthquakes. Is it really true?

Complex Relationships
A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.
NuclearTest Causes Earthquake <=
    dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30
    AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000
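The rule on this slide can be read as a predicate over a pair of events. The sketch below is one possible reading, with several assumptions made explicit: the date difference is measured in days and must be non-negative (the earthquake comes after the test), the distance is a great-circle distance in miles computed with the haversine formula, and the `Event` class, function names, and sample coordinates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Event:
    event_date: date
    latitude: float   # degrees
    longitude: float  # degrees


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def may_have_caused(test, quake, max_days=30, max_miles=10000):
    """Slide rule: quake within max_days after the test and within max_miles of it."""
    days = (quake.event_date - test.event_date).days
    if not 0 <= days < max_days:  # assumption: the quake must not precede the test
        return False
    return distance_miles(test.latitude, test.longitude,
                          quake.latitude, quake.longitude) < max_miles


# Illustrative events only; dates and coordinates are made up.
test = Event(date(1998, 5, 11), 27.0, 71.7)
quake = Event(date(1998, 5, 30), 37.1, 70.1)
print(may_have_caused(test, quake))
```

The thresholds are parameters so that the same predicate can be re-evaluated with a tighter time window or radius.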

Knowledge Discovery - Example
When was the first recorded nuclear test conducted?
1950
Find the total number of earthquakes with a magnitude 5.8 or higher on the Richter scale per year starting from 1900.
Increase in number of earthquakes since 1945

Knowledge Discovery - Example…
For each group of earthquakes with magnitudes in the ranges 5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale, find the average number of earthquakes per year starting from 1900.
The number of earthquakes with magnitude > 7 is almost constant, so nuclear tests probably only cause earthquakes with magnitude < 7.
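The grouping query on this slide amounts to binning earthquakes by magnitude and averaging the counts over the years considered. The sketch below is a minimal version under assumptions: each earthquake is a `(year, magnitude)` pair, bins are half-open intervals (so a magnitude of exactly 9 falls in the last bin, a slight departure from the slide's ">9"), and the sample records are made up.

```python
from collections import Counter

# Magnitude ranges from the slide, as half-open intervals [low, high).
BINS = [(5.8, 6.0), (6.0, 7.0), (7.0, 8.0), (8.0, 9.0), (9.0, float("inf"))]


def bin_index(magnitude):
    """Return the index of the bin containing `magnitude`, or None if below 5.8."""
    for index, (low, high) in enumerate(BINS):
        if low <= magnitude < high:
            return index
    return None


def average_per_year(quakes, first_year, last_year):
    """Average yearly earthquake count per magnitude bin over [first_year, last_year]."""
    counts = Counter()
    for year, magnitude in quakes:
        index = bin_index(magnitude)
        if index is not None and first_year <= year <= last_year:
            counts[index] += 1
    years = last_year - first_year + 1
    return [counts[index] / years for index in range(len(BINS))]


# Made-up records: (year, magnitude).
quakes = [(1900, 5.9), (1900, 6.5), (1901, 6.2), (1901, 7.1), (1901, 5.0)]
print(average_per_year(quakes, 1900, 1901))
```

Running the same function over the years before and after a chosen date gives the per-bin comparison that the slide's conclusion relies on.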

Knowledge Discovery - Example…
Find pairs of nuclear tests and earthquakes such that the earthquake occurred within 30 days after the test was conducted and its epicenter lies within a radius of 10000 miles of the test site.
Demo

Resources/References
RDF: www.w3.org/TR/REC-rdf-syntax/
ICE: www.icestandard.org
Meta Object Facility (MOF) Specification, Version 1.3, September 27, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-09-05
XML Metadata Interchange (XMI) Specification, Version 1.1, October 25, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-10-02 http://cgi.omg.org/cgi-bin/doc?ad/99-10-03
DAML: www.daml.org
NEWSML: newsshowcase.reuters.com
PRISM: www.prismstandard.org/techdev/prismspec1.asp
XCM: www.vignette.com
OIL: www.ontoknowledge.org/oil
SEMANTICWEB: www.semanticweb.org
VOICEXML: www.voicexml.org
MPEG7: www.darmstadt.gmd.de/mobile/MPEG7/
Taalee: www.taalee.com
Oingo: www.oingo.com

Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Amit Sheth and Wolfgang Klas, Eds., McGraw Hill, ISBN: 0-07-057735-8, 1998.