0e58ff9562b438b63e58e12db10c111d.ppt
- Number of slides: 40
IBM Information Server Understanding Your Source Systems © 2005 IBM Corporation
Corporate View of Information Architecture Is Changing § Information is the key to Business Innovation – Organizations highly effective at driving information integration are 5 times more likely to drive value creation – Information architecture can’t exist in a vacuum – it needs to be tied to enterprise architecture 87% of CEOs believe fundamental change is required in next two years to drive innovation Over 60% of CEOs believe their organizations need to do a better job leveraging information Source: 2006 IBM Global CEO Survey 2
Customer Business Issues § Too much information and not knowing what’s important – Not using demand signals to drive supply chain – Not using customer analysis to tailor marketing and sales – Not leveraging valuable unstructured information § Multiple versions of the truth – Problems managing customer, product and partner interactions – Regulatory compliance inhibited by poor transparency § Lack of trusted information – Incomplete, out-of-date, inaccurate, misinterpreted data – Difficult to understand or control how information is used § Lack of agility – Inability to take advantage of opportunities for innovation – Escalating costs due to inflexible systems and changing needs 3
Today’s Information Challenge Organic growth, mergers & acquisitions, and composite applications have led to a vast and rapidly growing sea of information sources in our enterprises § Where is my information? § What does it mean? § Where is it used? How Can I Identify and Mitigate Data Risks When I Don’t Even Know Which Data I Have? 4
Today’s Information Challenge Access to all forms of information Data Sources: messages, data warehouses, XML, relational databases Content Sources: e-mail, spreadsheets, reports, apps, … fax, content repositories, @ office Extended Search Sources Many User roles: Software Architect, Developer, Data Architect, DBA, End Users, Business Analyst, Data Admin, IT Admin § 40% of IT budgets may be spent on integration § 30% of people’s time is searching for relevant information § 30% of development time is copy management 5
Key Enterprise Questions in Understanding Data In information and process driven business environments the ability to understand, standardize, and validate core enterprise information is of strategic importance. In most organizations, core information remains in data prisons, preventing a single, integrated enterprise-wide view of data across applications. § What are the systems of record for Customer, Product, and Item masters? § How do I ensure that replicated sources are consistent with the system of record? § How can I access a single view of my customer records and use it to enrich my customer interactions? § How do I understand whether the data is valid, accurate, and relevant? § How do I ensure that quality measures and metrics are consistent for the master data across data sources? § I have multiple implementations of the same application plus custom applications with their own data models; how do I exchange data between different applications? § How do we measure product and service effectiveness when our product definitions are not consistent? § Why do we re-invent the wheel for product and customer data when we develop a new application? 6
The “Understanding” of the System Developer Data Architect Trip Plan + Name : String + Start : Date + Stop : Date Attraction 0..* Restaurant + Ethnicity : String 0..* + Name : String + Addr : Address Amusement Park + Daily Admission Adult : Currency + Daily Admission Child : Currency Conceptual Logical Physical 7
The “Reality” of the Underlying Data Metadata Description: Name-Line 1, Name-Line 2, Name-Line 3, Address-Line 1, Address-Line 2, Address-Line 3 Actual Record Values: Robert A. Jones TTE Robert Jones Jr. First Natl Provident FBO Elaine & Michael Lincoln UTA DTD 3-30-89 59 Via Hermosa c/o Colleen Mailer Esq Seattle, WA 98101-2345 Investor Account Position Custodian Trustee Address Financial Instrument 8
Common Data Problems § Lack of information standards – Different formats & structures across different systems: Kate A. Roberts, 416 Columbus Ave #2, Boston, Mass 02116; Catherine Roberts, Four sixteen Columbus APT 2, Boston, MA 02116; Mrs. K. Roberts, 416 Columbus Suite #2, Suffolk County 02116 § Data surprises in individual fields – Data misplaced in the database: Name / Tax ID / Telephone: J Smith DBA Lime Cons. / 228-02-1975 / 6173380300; Williams & Co. C/O Bill / 025-37-1888 / 415-392-2000; 1st Natl Provident / 34-2671434 / 3380321; HP 15 State St. / 508-466-1200 / Orlando § Information buried in free-form fields: WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH; WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES; USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM; RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM) § Data myopia – Lack of consistent identifiers inhibits a single view: 19-84-103 RS232 Cable 6' M-F CandS; CS-89641 6 ft. Cable Male-F, RS232 #87951; C&SUCH 6 Male/Female 25 PIN 6 Foot Cable § The redundancy nightmare – Duplicate records with a lack of standards: 90328574 IBM, 187 N. Pk. Str. Salem NH 01456; 90328575 I.B.M. Inc., 187 N. Pk. St. Salem NH 01456; 90238495 Int. Bus. Machines, 187 No. Park St Salem NH 04156; 90233479 International Bus. M., 187 Park Ave Salem NH 04156; 90233489 Inter-Nation Consults, 15 Main Street Andover MA 02341; 90345672 I.B. Manufacturing, Park Blvd. Bostno MA 04106 9
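The telephone values on this slide illustrate why a field cannot be trusted by its name alone. The following is a minimal Python sketch, assuming North American 10-digit numbers; the function name `classify_phone` and the status labels are hypothetical, and the sample values are the telephone entries as best they can be read from the slide.

```python
import re

def classify_phone(raw):
    """Strip punctuation and classify a raw telephone field value.

    Assumption: valid numbers have 10 digits (area code + 7 digits).
    Returns a (standardized_value, status) pair.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}", "ok"
    if len(digits) == 7:
        return f"{digits[:3]}-{digits[3:]}", "missing area code"
    return raw, "not a phone number"

# Telephone values as read from the slide
samples = ["6173380300", "415-392-2000", "3380321", "Orlando"]
results = {raw: classify_phone(raw) for raw in samples}
```

Standardizing before comparison lets "6173380300" and "617-338-0300" be treated as the same value, while entries such as "Orlando" are flagged for review instead of being silently loaded.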
“Data Silos” compound the problem Multiple touchpoints across the enterprise: Online Purchases, Payments, Service and Support, Purchases, Online Registration, Sign up (Seminar, Newsletter, Promotion) ERP Internet Commerce Customer Service Purchasing Portal Marketing DW JP Morgan, USA Michael Johnson Cust ID: JP 003 Mike Johnson User id: mjohnson JP Morgan Contract: JP 987 JP Morgan Chase Corp Last Interaction: 4/11/03 (product not received) JP Morgan & Chase Michael Johnson Contact: Michael A Johnson, CIO User ID: Mjohnso ! Opt-Out flag ! Personalized access ! No Promotion flag 270 West St NY ! Gold Customer ! Sub: Newsletter 1 Customer and user data resides in multiple databases. Interaction between systems requires a custom interface 10
Domain Expertise constantly changes § Domain expertise is critical for understanding the data and the problem, and for interpreting the results • “The counter resets to 0 if the number of calls exceeds N.” • “The missing values are represented by 0, but the default billed amount is 0 too.” § Insufficient domain expertise and understanding is a primary cause of poor data quality – data becomes unusable § Usually in people’s heads – seldom documented § Fragmented across organizations § Lost during personnel and project transitions § If undocumented, knowledge and understanding deteriorate and become fuzzy over time Roles: Developer, End Users, Business Analyst, Software Architect, Data Admin, IT Admin 11
Today’s Information Challenge Inconsistent Understanding Across Enterprise Sources Different… § Data values that uniquely describe a business entity, used to tell one from another (customer name, address, date of birth…) § Identifiers assigned to each unique instance of a business entity § Relationships between business entities (two customers “householded” together at the same location) § Hierarchies among business entities (parent company owns other companies, different chart of accounts across operations) Legacy: Account (Product, Location) CRM: Account, Product, Contact, Household Finance: Account, Bill To, Ship To, Part ERP: Vendor, Contact, Location, Material 12
Use Case: Customer Credit Card Information & Risks § Customer example: – 4,500 databases. Multiple database vendors. – Tens of thousands of tables, many more views… § Where is customer credit card information stored and used? – Which tables? Views? Stored Procedures? Servers? – Applications? Messages? Partner interfaces? § What does it mean? – Different names/structure for same information, no conventions § How can I modify my data architecture to provide a more sustainable solution? Platforms shown: DB2, Oracle, SQL Server, Mainframe 13
Business Drivers for Investment Depend on Understanding • Empowering risk & compliance initiatives with the information they require • Optimizing Revenue Opportunities by ensuring effective and efficient interactions with customers, partners, and suppliers • Enabling collaborative business processes with consistent and trustworthy information § Reducing the total cost of ownership for maintaining consistent information across the enterprise 14
IBM Information Server A Platform for Understanding IBM Information Server Unified Deployment § Understand – Discover, model, and govern information structure and content § Cleanse – Standardize, merge, and correct information § Transform – Combine and restructure information for new uses § Deliver – Synchronize, virtualize and move information for in-line delivery Unified Metadata Management Parallel Processing Rich Connectivity to Applications, Data, and Content 15
Serrano-Hawk Version Information Integration: Understand Architects Data Analysts Subject Matter Experts Business Users Data Architect Information Analyzer Business Glossary Understand § Structure- and data-driven data modeling and management § Analyze data, and report and monitor based on integration and quality rules § Perform structure-driven reporting and annotation Unified Metadata Management 16
Physical Metadata: IBM Information Analyzer § Data-centric analysis of application, database and file-based sources § Secure, detailed profiling of fields, across fields, and across sources § Creation of metadata from profiling results § Results instantly promotable across IBM Information Server Subject Matter Experts Data Analysts Understand IBM Information Analyzer: Analyze source data structures, and monitor adherence to integration and quality rules Physical View 17
Business Metadata: IBM Business Glossary § Web-based authoring, managing & sharing of business metadata § Aligns the efforts of IT with the goals of the business § Provides business context to information technology assets § Establishes responsibility and accountability Subject Matter Experts Business Users Understand IBM Business Glossary: Create and manage business vocabulary and relationships, while linking to physical sources Technical: Database = DB2, Schema = NAACCT, Table = DLYTRANS, Column = ACCT_NO, data type = char(11) Business: GL Account Number. The ten digit account number. Sometimes referred to as the account ID. This value is of the form L-FIIIIVVVV. Business View 18
Logical Metadata: Rational Data Architect § Data modeling for data structures and federations § Federated data discovery § Metadata relationship discovery & mapping § Impact analysis, and synchronization across models § SQL & XML generation capabilities Subject Matter Experts Architects Rational Data Architect: Create and manage business vocabulary and relationships, while linking to physical sources Data Modeling & Mapping 19
Use Case: Customer Credit Card Information & Risks
Problem: Where is customer credit card information stored?
Solution: § Automated discovery of databases, tables, servers, views § Guided discovery to find and relate elements § Query/search of all discovered elements
Products: IBM WebSphere Information Analyzer, IBM Rational Data Architect, IBM WebSphere Business Glossary
Problem: What does it mean?
Solution: § Automated discovery of underlying data content and quality § Analysis of completeness, validity, conformity § Definition of glossary/taxonomy in business terms § Share with your team § Search, explore metadata using business terms
Products: IBM WebSphere Information Analyzer, IBM WebSphere Business Glossary
Problem: How can I modify my data architecture to provide a more sustainable solution?
Solution: § Visualize existing database design § Develop new design § Manage change § Establish baselines and measure data over time
Products: IBM Rational Data Architect, IBM WebSphere Information Analyzer
20
Profile data to analyze and understand data sources: Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = TAX_ID, data type = char(10). In 3% of the data, this number is only 9 characters long. There are 3 distinct data formats, but there should only be 1. 256 duplicate values exist in this field. Create a business glossary (common vocabulary between business & technical users): Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = CC_NO, data type = char(16). Credit Card Account Number: The sixteen digit account number. Sometimes referred to as the CC ID. Private. Access is restricted, need-to-know basis. Extend and map metadata to identify critical data elements, business rules, certification requirements ✓ “Which databases contain credit card information?” ✓ “Who is responsible for the stewardship of this information?” ✓ “What data contains ‘Credit Card Account’?” Monitor data quality to continually assess and certify critical business information: The number of TAX_ID exceptions has increased over 3 consecutive weeks and has now exceeded the baseline. The data validation report indicates a new data source is producing an unacceptable level of bad values. ✓ “Where do we need to implement more stringent controls?” 21
Step 1A: Automated discovery of databases, tables, servers, views Discover and remember which data sources contain credit information Physical Data Models: DB#1 Oracle, DB#2 SQL Server, DB#3 DB2 ✓ Automatically generate data models from schemas ✓ Discover and reverse engineer data sources ✓ Visualize and annotate data models ✓ Export models to metadata repository Metadata Repository 22
Step 1A details: Visualize & Search Data Structure to Aid Understanding § Overview diagram § Topology diagram § Search 23
Step 1B: Guided discovery to find and relate elements Build a standardized model for credit card information and semantically map existing schemas Standard Glossary Standard Data Model DB#1 DB2, DB#2 SQL Server, DB#3 Oracle ✓ Create Standard Schema with Glossary ✓ Map Database Schemas to Standard Schema using Glossary ✓ Visualize and annotate data models ✓ Export models to metadata repository ✓ Optionally deploy to federated server to facilitate review Metadata Repository 24
Step 2A: Automatically discover underlying data content & quality Use WebSphere Information Analyzer to profile data sources Many Systems and Sources Rapidly expand and deliver the knowledge base on critical data Metadata Repository WebSphere Information Analyzer 25
Step 2A details: Scalable Processing with Shared Metadata § High Concurrency § Scalable Architecture § Common Services Backbone § Common Metadata Repository § Common Connectivity § Parallel Engine 26
Step 2B: Analyze data to identify anomalies and expand knowledge Use WebSphere Information Analyzer to review data sources Database = DB2, Schema = CUSTMSTR, Table = CUSTOMER, Column = TAX_ID, data type = char(10). In 3% of the data, this number is only 9 characters long. There are 3 distinct data formats, but there should only be 1. 256 duplicate values exist in this field. Uncover unexpected or undocumented data anomalies Metadata Repository WebSphere Information Analyzer 27
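The kind of findings quoted for TAX_ID (short values, several formats, duplicates) can be approximated with a few lines of Python. This is only a sketch of the idea, not how Information Analyzer is implemented; the function names, the mask convention (digits become 9, letters become A) and the sample values are assumptions made for illustration.

```python
from collections import Counter

def format_pattern(value):
    """Generalize a value to a format mask: digits -> 9, letters -> A, others kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)

def profile_column(values, declared_length):
    """Summarize length, format, and duplicate statistics for one column."""
    counts = Counter(values)
    total = len(values)
    short = sum(1 for v in values if len(v) < declared_length)
    return {
        "rows": total,
        "pct_shorter_than_declared": 100.0 * short / total if total else 0.0,
        "formats": Counter(format_pattern(v) for v in values),
        # number of distinct values that occur more than once
        "duplicate_values": sum(1 for n in counts.values() if n > 1),
    }

# Hypothetical TAX_ID sample for a char(10) column
sample = ["34-2671434", "0253718880", "123456789", "34-2671434", "12-3456789"]
report = profile_column(sample, declared_length=10)
```

Here `report["formats"]` holds three masks (`99-9999999`, `9999999999`, `999999999`), which is the sort of evidence behind a statement such as "there are 3 distinct data formats, but there should only be 1".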
Step 2B details: Data Source(s) Full Volume Profiling all information Analysis Review Data Analysts Subject Matter Experts Metadata Integrity Domain Integrity • Completeness • Consistency Domain Integrity • Lexical Analysis • Pattern Consistency Domain Integrity • Business Rule Identification & Validation Structural Integrity • Table Analysis • Primary Key Analysis Entity Integrity • Duplicate Analysis • Targeted Data Accuracy Relational Integrity • Foreign Key Analysis • Cross-Domain Analysis • Redundancy Analysis Targeted Columns Targeted Entities Decisions Requirements Specifications Reference Tables 28
Step 2B details: Column Level Understanding Metadata Integrity § Value Frequencies § Data Classification § Data Properties § Common Formats 29
Step 2B details: Domain Level Understanding Domain Integrity § Completeness § Validity § Format Conformity § Mapping Values § Reference Tables 30
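Completeness, validity, and format conformity can each be expressed as a simple ratio over a column. The sketch below is illustrative only: the function `domain_check`, the state-code reference table, and the two-letter pattern are assumptions, not part of the product.

```python
import re

def domain_check(values, reference, pattern):
    """Return completeness, validity, and format-conformity ratios for a column.

    completeness      = non-empty values / all rows
    validity          = values found in the reference table / all rows
    format_conformity = values matching the expected pattern / all rows
    """
    total = len(values)
    present = [v for v in values if v is not None and v.strip() != ""]
    valid = [v for v in present if v in reference]
    conforming = [v for v in present if re.fullmatch(pattern, v)]

    def ratio(part):
        return len(part) / total if total else 0.0

    return {
        "completeness": ratio(present),
        "validity": ratio(valid),
        "format_conformity": ratio(conforming),
    }

# Hypothetical state column and reference table
states = ["MA", "NH", "Mass", "", None, "WA", "ma", "ZZ"]
reference_states = {"MA", "NH", "WA"}
result = domain_check(states, reference_states, r"[A-Z]{2}")
```

Note that "ZZ" conforms to the format but is not valid, which is why format conformity and validity are measured separately.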
Step 2B details: Table Level Understanding Structural Integrity § Single or Multi-Column Primary Keys § Uniqueness § Duplication 31
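Primary key analysis amounts to asking how unique a column, or a combination of columns, is. A minimal sketch, assuming rows are held as dictionaries; the function name, column names, and sample rows are hypothetical.

```python
from collections import Counter

def key_uniqueness(rows, columns):
    """Measure how well `columns` work as a (single or multi-column) key.

    Returns (uniqueness, duplicates), where uniqueness is
    distinct key values / rows and duplicates maps each repeated
    key value to its row count.
    """
    keys = Counter(tuple(row[c] for c in columns) for row in rows)
    duplicates = {k: n for k, n in keys.items() if n > 1}
    uniqueness = len(keys) / len(rows) if rows else 1.0
    return uniqueness, duplicates

# Hypothetical customer rows
rows = [
    {"cust_id": "JP003", "region": "US", "name": "Michael Johnson"},
    {"cust_id": "JP003", "region": "UK", "name": "Mike Johnson"},
    {"cust_id": "JP987", "region": "US", "name": "M. Johnson"},
    {"cust_id": "JP987", "region": "US", "name": "Michael A Johnson"},
]
single, single_dups = key_uniqueness(rows, ["cust_id"])
composite, composite_dups = key_uniqueness(rows, ["cust_id", "region"])
```

In this sample neither candidate is a true key (uniqueness below 1.0), and the remaining duplicates point to the records that need review.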
Step 2B details: Cross-Table/Cross-Source Level Understanding Relational Integrity § Foreign Keys – Single Column or Multi-Column Assessment – Referential Integrity § Cross-Domain – Commonality or Redundancy 32
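Referential integrity and cross-domain commonality can both be sketched with set operations. The functions and sample values below are assumptions for illustration, not the product's algorithm.

```python
def referential_integrity(child_values, parent_values):
    """Share of child rows whose key exists in the parent, plus the orphan keys."""
    parent = set(parent_values)
    matched = sum(1 for v in child_values if v in parent)
    orphans = sorted({v for v in child_values if v not in parent})
    integrity = matched / len(child_values) if child_values else 1.0
    return {"integrity": integrity, "orphans": orphans}

def commonality(a, b):
    """Share of distinct values in column `a` that also appear in column `b`."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0

# Hypothetical order and customer keys
order_customer_ids = ["C1", "C2", "C2", "C9"]
customer_ids = ["C1", "C2", "C3"]
fk_report = referential_integrity(order_customer_ids, customer_ids)
overlap = commonality(order_customer_ids, customer_ids)
```

A high commonality between two columns in different sources suggests either a foreign key relationship or redundant storage of the same domain; orphans such as "C9" are referential integrity violations.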
Step 2C: Define glossary/taxonomy in business terms Create a common vocabulary between business & technical users Credit Card Account Number: The sixteen digit account number. Sometimes referred to as the CC ID. Private. Access is restricted, need-to-know basis. Database = DB2, Schema = NAACCT, Table = DLYTRANS, Column = CC_NO, Data type = char(16), Duplicates = .012% Rational Data Architect WebSphere Information Analyzer WebSphere Business Glossary Metadata Repository 33
Step 2D: Search, explore metadata using business terms ✓ “What is our business definition of ‘Credit Card Account’?” ✓ “What data contains ‘Credit Card Account’?” ✓ “What are the security restrictions on credit card accounts?” ✓ “Who is responsible for the stewardship of this information?” ✓ “Which databases contain credit card information?” ✓ “Which reports include credit card information?” WebSphere Business Glossary Metadata Repository 34
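Questions like these become answerable once each business term is linked to the physical columns that hold it. The sketch below models that link with plain Python dataclasses; it is not the Business Glossary data model. The class names, the steward value, and the Oracle entry are hypothetical, while the two DB2 columns are the ones shown on earlier slides.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalColumn:
    database: str
    schema: str
    table: str
    column: str
    data_type: str

@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str          # hypothetical: the slides do not name a steward
    restriction: str
    columns: list = field(default_factory=list)

def databases_containing(glossary, term_name):
    """Answer: which databases contain data for this business term?"""
    return sorted({c.database for c in glossary[term_name].columns})

term = GlossaryTerm(
    name="Credit Card Account Number",
    definition="The sixteen digit account number. Sometimes referred to as the CC ID.",
    steward="(unassigned)",
    restriction="Private. Access is restricted, need-to-know basis.",
    columns=[
        PhysicalColumn("DB2", "CUSTMSTR", "CUSTOMER", "CC_NO", "char(16)"),
        PhysicalColumn("DB2", "NAACCT", "DLYTRANS", "CC_NO", "char(16)"),
        # Hypothetical third source, added only to show a multi-database answer
        PhysicalColumn("Oracle", "SALES", "PAYMENTS", "CARD_NUM", "varchar2(16)"),
    ],
)
glossary = {term.name: term}
found = databases_containing(glossary, "Credit Card Account Number")
```

The same structure answers the stewardship and security questions by reading `steward` and `restriction` from the term.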
Step 3: Monitor data over time for ongoing understanding Use WebSphere Information Analyzer to establish baselines of information to audit over time ✓ “Where do we need to implement more stringent controls?” ✓ “How do we ensure that critical data meets our standards?” Identify and mitigate risk. View differences between current state and the baseline 35
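Baseline monitoring can be reduced to a rule over a time series of exception counts. A minimal sketch, assuming weekly counts are already collected; the function name, the three-week window, and the sample numbers are illustrative assumptions.

```python
def exceeds_baseline(weekly_exceptions, baseline, weeks=3):
    """True if the exception count rose for `weeks` consecutive weeks
    and the latest count is above the baseline."""
    if len(weekly_exceptions) < weeks + 1:
        return False  # not enough history to see `weeks` increases
    recent = weekly_exceptions[-(weeks + 1):]
    rising = all(later > earlier for earlier, later in zip(recent, recent[1:]))
    return rising and recent[-1] > baseline

# Hypothetical weekly exception counts for one column
alert = exceeds_baseline([40, 42, 55, 71, 90], baseline=80)
no_alert_dip = exceeds_baseline([40, 42, 55, 50, 90], baseline=80)
no_alert_low = exceeds_baseline([10, 20, 30, 40], baseline=80)
```

Requiring both a sustained rise and a breach of the baseline avoids alerting on a single noisy week.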
Lessons Learned & Best Practice: Control Scope Ruthlessly / Focus on Benefits § Business must own scope – Business should be owners, not renters – IT maintains its independence by not taking sides – Controlling scope encourages project discipline § Continually Scope & Iterate – Don’t boil the ocean – Projects which try to do it all in one pass generally fail § Measure, Report, and Deliver benefits regularly – Initial projects must provide at least some benefit within 6-9 months (even if a small benefit) – Subsequent phases should provide benefits every 3-6 months 36
Summary § Information understanding is becoming an increasingly important organizational issue § Most critical business initiatives depend on quality information § Improving understanding requires a focused programmatic approach including business, data, and system levels § The IBM Information Server provides all of this in a unified platform § At the core of any source system analysis is a platform capable of providing ongoing understanding IBM Information Server 37
How Can IBM Help? § Comprehensive platform for data understanding § Experience and repeatable process for helping organizations set up source analysis and data quality programs § Domain and industry-specific expertise in establishing repeatable source analysis, semantic, and data quality services § Data quality assessment offering to report on existing data content and quality and establish the business value of ongoing source analysis through a data quality program § Contact your IBM representative for more information 38
Information On Demand 2006 IBM Information On Demand 2006, October 15-20, 2006, Anaheim, California Register Now: www.ibm.com/events/informationondemand § The premier information management event for business and IT executives, managers, professionals, DBAs and developers. § Select from over 800 sessions: a 2 1/2 day business leadership track with 180 sessions and a 5 day technical track with 650 sessions. § Latest strategy and product announcements § Large Expo Center, Hands on labs § One on ones with executives and specialists § Birds of a Feather roundtables Why attend: § Participate in the PREMIER discussion on the future of Information Management § Learn how the transformation to Information as a Service will help you unlock business value and drive competitive advantage § Hear how your peers are realizing ROI § Understand the roadmap to long term strategic advantage § Learn best practices in your industry § Receive the best in technical education and free certification § Extensive opportunities for networking with both your peers and industry experts 39