Oh! So That is What You Meant: The Interplay of Data Quality and Data Semantics
2003 International Conference on Conceptual Modeling (ER'03), Chicago, October 2003
Stuart Madnick, MIT Sloan School of Management
smadnick@mit.edu
© S. Madnick, 2003
Agenda
• Examples of Multiple Interpretations
  - No one “right” answer
  - “Wrong” answer = bad quality data
• Corporate Householding
  - Examples
  - Framework
• COntext INterchange (COIN) Approach
  - Basic concept
  - Applied to Corporate Householding
Example Questions
• Simple questions:
  - “How much did MIT buy from IBM last year?”
  - “How much did IBM sell to MIT last year?”
    [Do you expect the answers to be the same?]
  - “How much did Merrill Lynch loan to IBM last year?”
There can be multiple purposes and contexts, each with a different answer
[Picture: old lady or young lady?]
Data source ... (how do you cite it in a journal article?)
Role of Context
[Diagram: the date strings 02-01-03, 01-02-03, and 03-02-01 and the currency symbols $, £, and ¥, each attached to a different context]
Context variations:
- Geographic (US vs. UK)
- Functional (cash management vs. loans)
- Organizational (Citibank vs. Chase)
Data: databases, Web data, e-mail
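A minimal sketch of the date ambiguity on this slide: the same string, 02-01-03, is parsed under three reading conventions. The context labels ("US", "UK", "YMD") and the mapping of each label to a format are assumptions made for illustration, not something the slide specifies.

```python
from datetime import datetime

# Assumed reading conventions; only the US vs. UK contrast comes from the slide.
FORMATS = {
    "US": "%m-%d-%y",   # month-day-year
    "UK": "%d-%m-%y",   # day-month-year
    "YMD": "%y-%m-%d",  # year-month-day
}

def interpret(text):
    """Return the calendar date that `text` denotes in each context."""
    return {
        context: datetime.strptime(text, fmt).date()
        for context, fmt in FORMATS.items()
    }

for context, day in interpret("02-01-03").items():
    print(context, day.isoformat())
```

The string never changes; only the receiver's context does, and each context yields a different date.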
Types of Context
• Representational
  - Example: Currency: $ vs €; scale factor: 1 vs 1000
  - Temporal: Francs before 2000, € thereafter
• Ontological
  - Example: Revenue: includes vs excludes interest
  - Temporal: Revenue: excludes interest before 1994 but incl.
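The representational and temporal cases on this slide can be sketched as metadata attached to a reported number. This is only an illustration: the `Reported` type, the currency codes "FRF" and "EUR", and the choice to treat the year 2000 itself as a euro year are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reported:
    amount: float  # number as it appears in the source
    scale: int     # scale factor of the source context: 1 or 1000
    year: int      # reporting year

def currency_for(year):
    """Temporal context: francs before 2000, euros thereafter."""
    return "FRF" if year < 2000 else "EUR"

def in_units(reported):
    """Undo the scale factor and attach the currency implied by the year."""
    return reported.amount * reported.scale, currency_for(reported.year)

print(in_units(Reported(12.5, 1000, 1998)))  # value stated in thousands
print(in_units(Reported(3.0, 1, 2001)))      # value stated in units
```

Two sources can print the same number and still disagree, because the scale and the currency live in the context, not in the number.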
Example: Context Differences (from multiple Web sources)
Daimler Benz (DCX) financial data: P/E ratio
- ABC: 11.6
- Bloomberg: 5.57
- DBC: 19.19
- MarketGuide: 7.46
Additional Example
• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
  - Context 1: YUG as a geographic region bounded before the breakup
  - Context 2: YUG as a legal autonomous state
• Related effort: Laboratory for Information Globalization and Harmonization Technologies and Studies (LIGHTS) Project
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. ... The navigators (JPL) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin (the contractor).”
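A minimal sketch of how an explicit unit tag would catch this kind of mixup. The `Quantity` type and the unit labels "lbf*s" and "N*s" are illustrative assumptions; the conversion factor (1 lbf = 4.4482216152605 N) is the standard definition.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # newtons per pound-force, by definition

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # "lbf*s" or "N*s" in this sketch

def to_newton_seconds(q):
    """Convert an impulse to newton-seconds, refusing unknown units."""
    if q.unit == "N*s":
        return q
    if q.unit == "lbf*s":
        return Quantity(q.value * LBF_TO_NEWTON, "N*s")
    raise ValueError(f"unknown unit: {q.unit}")

print(to_newton_seconds(Quantity(1.0, "lbf*s")))
```

A bare number carries no context; read as newtons when it was supplied in pounds of force, it is off by a factor of about 4.45.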
Revisit Example Questions
• Simple questions:
  - “How much did MIT buy from IBM last year?”
  - “How much did IBM sell to MIT last year?”
  - “How much did Merrill Lynch loan to IBM last year?”
  - “How many employees does IBM have?”
• Even the definition of “employee” might be ambiguous
  - How do we count part-time employees?
  - How do we count full-time consultants?
  - (In a university) how do we count joint appointments or ...
An example of a “householding” issue
“Household Data”
• Example: letter from Smith Barney
  “To reduce mailing expense … We have adopted a policy known as ‘householding.’ … shareholders who are members of the same family, share the same address and have multiple accounts in the same Fund will receive only one copy of the annual prospectus.”
• In general, what is the definition of “household”?
“Household”: Definition?
Is it:
• Father, mother, and children living at the same address?
• What if father and mother have different last names?
• Father, mother, and children even if living at different addresses?
• Include cousin Alice visiting for 6 months?
• An unmarried couple living together (of any sex)?
• Roommates?
A Framework for Corporate Householding
a. Identical entity instance identification
   - Name: MIT; Addr: 77 Mass Ave
   - Name: Mass Inst of Tech; Addr: 77 Massachusetts
b. Entity aggregation
   - Name: MIT; Employees: 1200
   - Name: Lincoln Lab; Employees: 840
c. Transparency of inter-entity relationships
   [Diagram: MIT, MicroComputer, CompUSA, IBM]
a. Identical entity instance identification
   - Name: MIT; Addr: 77 Mass Ave
   - Name: Mass Inst of Tech; Addr: 77 Massachusetts
• Unambiguous universal identifiers are rare (or rarely used)
• Examples:
  - Massachusetts Institute of Technology
  - Mass Inst of Tech
  - MIT, M. I. T., M I T
• In practice, a frequent problem for mailing lists
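A minimal sketch of identical entity instance identification for the name variants on this slide. The alias table and the key "MIT" are hypothetical; a real system would also compare addresses and use fuzzier matching.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized name variant -> canonical entity key.
ALIASES = {
    "mit": "MIT",
    "massinstoftech": "MIT",
    "massachusettsinstituteoftechnology": "MIT",
}

def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def canonical_entity(name: str) -> Optional[str]:
    """Return the canonical key for a known variant, or None if unknown."""
    return ALIASES.get(normalize(name))

for variant in ["MIT", "M. I. T.", "M I T", "Mass Inst of Tech"]:
    print(variant, "->", canonical_entity(variant))
```

Stripping punctuation and spacing merges "MIT", "M. I. T.", and "M I T" automatically; abbreviations such as "Mass Inst of Tech" still need an explicit alias, which is why universal identifiers would help.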
b. Entity aggregation
   - Name: MIT; Employees: 1200
   - Name: Lincoln Lab; Employees: 840
• What should be included as part of an entity?
• Example: “Lincoln Lab” is a “Federally Funded R&D Center of MIT”
• Is Lincoln Lab included in the answer to questions such as:
  - How many employees does MIT have?
  - What was MIT’s total budget last year?
  - How much have we sold to MIT?
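A minimal sketch of context-dependent entity aggregation, using the employee counts shown on this slide. The context names ("campus_only", "with_ffrdc") and the rule table are hypothetical.

```python
# Employee counts as shown on the slide.
EMPLOYEES = {"MIT": 1200, "Lincoln Lab": 840}

# Candidate sub-entities that a context may or may not fold into the parent.
PARTS = {"MIT": ["Lincoln Lab"]}

# Hypothetical contexts: which sub-entities each one aggregates.
INCLUDE_RULES = {
    "campus_only": set(),
    "with_ffrdc": {"Lincoln Lab"},
}

def headcount(entity, context):
    """Sum employees of `entity` plus the parts this context includes."""
    total = EMPLOYEES[entity]
    for part in PARTS.get(entity, []):
        if part in INCLUDE_RULES[context]:
            total += headcount(part, context)
    return total

print(headcount("MIT", "campus_only"))
print(headcount("MIT", "with_ffrdc"))
```

Both numbers are valid answers to “How many employees does MIT have?”; the context selects which one the question means.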
Example: What is “IBM”?
What is the relationship among these entities (and the changes over time: “temporal context”)?
• International Business Machines Corporation
• IBM Global Services
• IBM Global Network (1999-)
• IBM de Colombia, S.A. (90%)
• Lotus Development Corporation (100%)
• Software Artistry, Inc. (1998+, 2000-)
• Dominion Semiconductor Company (50/50 jv)
• MiCRUS (majority jv)
• Computing-Tabulating-Recording Co.
c. Transparency of inter-entity relationships
   [Diagram: MIT, MicroComputer, CompUSA, IBM]
• Relationships might be direct or indirect
• Under what circumstances (i.e., contexts) should they be collapsed?
• This can be multi-leveled, especially in
  - financial transactions
  - supply chain management
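A minimal sketch of collapsing indirect relationships. It assumes the diagram means that MIT buys IBM products both directly and through the resellers MicroComputer and CompUSA; that reading, the record layout, and every amount below are hypothetical.

```python
# Hypothetical purchase records: (buyer, seller, manufacturer, amount).
PURCHASES = [
    ("MIT", "IBM", "IBM", 100.0),
    ("MIT", "MicroComputer", "IBM", 40.0),
    ("MIT", "CompUSA", "IBM", 25.0),
    ("MIT", "CompUSA", "Other", 10.0),
]

def bought_from(buyer, vendor, collapse_resellers):
    """Total bought from `vendor`, optionally counting sales via resellers."""
    total = 0.0
    for record_buyer, seller, manufacturer, amount in PURCHASES:
        if record_buyer != buyer:
            continue
        direct = seller == vendor
        indirect = collapse_resellers and manufacturer == vendor
        if direct or indirect:
            total += amount
    return total

print(bought_from("MIT", "IBM", collapse_resellers=False))
print(bought_from("MIT", "IBM", collapse_resellers=True))
```

Whether the reseller channel is collapsed into the manufacturer is a property of the question's context, so it is passed in rather than hard-coded.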
Types of Entities and Relationships
• Many possible “atoms”:
  - Locations (“branches”)
  - Scope (“divisions”)
  - Ownerships (“subsidiaries”), including fractionals
  - Joint ventures
  - ... Others?
Examples of multiple purposes ...
Corporate household/family structure purposes:
- Accounting: account consolidation
- Financial: risk (credit - bankruptcy, country)
- Marketing/Purchasing (multiple divisions & subsidiaries): customer & supplier consolidation (volume discounts); Customer Relationship Management (CRM)
- Managerial: regional and/or product separations
- Legal: liability (insurance); licensing (software, patents)
- Relationship: consultant conflict of interest & competition
And these are dynamic … changing over time
Account Consolidation Example: Should Company A consolidate accounts with Company B?
General Challenge: There is not a single right answer
• As seen with the “old woman or young woman” picture
• Need to be able to answer the question “in context”
• This issue appears in many situations …
Research Agenda
• We do not want to build a custom solution for each situation
  - Need to understand requirements
  - Need to understand sources
• Process
  1. Interviews: describe representative purposes / cases in detail
  2. Study use of Corporate Householding in organizations
  3. Identify key characteristics & typical kinds of rules
  4. Incorporate into a context knowledge management system
  5. Develop reasoning algorithm to map purposes onto data
The Context Interchange Approach
[Diagram labels:]
• Shared Ontologies (concept: Length; meters, feet)
• Conversion Libraries (function( ): meters, feet)
• Source Context, Receiver Context
• Context Mediator (context transformation)
• Source, Context Management, Application, Receiver
• Query: Select partlength From catalog Where partno=“12AY”
• Value shown for part length: 17
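A minimal sketch of the mediation idea in this diagram, not the COIN implementation. It assumes the source stores lengths in meters and the receiver expects feet, and that 17 is the stored value for part 12AY; the `Context` type, the `CATALOG` dictionary, and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    length_unit: str  # "meters" or "feet"

METERS_PER_FOOT = 0.3048  # exact by definition

# Conversion library: (source unit, receiver unit) -> conversion function.
CONVERSIONS = {
    ("meters", "feet"): lambda value: value / METERS_PER_FOOT,
    ("feet", "meters"): lambda value: value * METERS_PER_FOOT,
}

# Hypothetical source data, stored in the source's own context.
CATALOG = {"12AY": 17.0}

def mediate(value, source, receiver):
    """Transform a length from the source context to the receiver context."""
    if source.length_unit == receiver.length_unit:
        return value
    convert = CONVERSIONS[(source.length_unit, receiver.length_unit)]
    return convert(value)

def query_partlength(partno, source, receiver):
    """Answer 'Select partlength From catalog Where partno=...' in receiver units."""
    return mediate(CATALOG[partno], source, receiver)

print(query_partlength("12AY", Context("meters"), Context("feet")))
```

The receiver's query names no units at all; the contexts are declared once, and the mediator chooses the conversion.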
Corporate Household: Ontology, Relations and Elevations
[Diagram: Ontology, Elevations, CHH Relations, Application Relation(s)]
Example Contexts and Queries
COIN Demo: Result of Execution
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians said their forces would be in the field in Bavaria by Oct. 20. The Austrian staff planned based on that date in the Gregorian calendar. Russia, however, used the ancient Julian calendar, which lagged 12 days behind. The difference allowed Napoleon to surround Austrian General Mack's army at Ulm on Oct. 21, well before the Russian forces arrived.
Acknowledgements
Work reported has been supported, in part, by:
• Banco Santander Central Hispano, DARPA, D&B, Fleet Bank, Firstlogic, Merrill Lynch, MIT Total Data Quality Management (TDQM) Program, PricewaterhouseCoopers, Singapore-MIT Alliance (SMA), Suruga Bank, and USAF/Rome Laboratory.
For more information, go to http://web.mit.edu/TDQM and ...