593bd8e23f45a5455d7747118d74491d.ppt
- Количество слайдов: 26
Building a Data Warehouse. . . Bring in the Sheaves January 13, 2004 EDUCAUSE Mid-Atlantic Conference Baltimore, Maryland Ella Smith U. S. Department of Agriculture Alan Harmon U. S. Naval Academy
Copyright Ella Smith and Alan Harmon, 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.
Agenda for this Session • Definition • • Overview of the Initial Process (Proof-of-Concept) Organizational Ownership Data Warehouse Architecture Project Team Composition The Process Wrap Up Questions
What is a Data Warehouse? u Definition: a repository of data derived from operational systems or external source; NOT an archive u Purpose: collect and report data in a consistent, centralized manner; mechanism for conducting longitudinal analysis u Strategy: Target key applications (Admissions, Registrar, Frozen Files), clean data and load.
Benefits of a Data Warehouse u Cost Savings by reducing the amount of manual time and effort required to compile, organize, and report the data. u Data Consistency among the different areas since the data will be synchronized upon entry into the data warehouse. u Access to the information will be faster since the process will be automated and available online (versus paper reports).
Loading and Cleaning Data Opportunity to Integrate, Correct, and Validate Data Live Data Sources Flat File Data Sources Data Extraction and Cleaning (can be very complex) Data in Databases u Applications SAP People. Soft Oracle Apps u u Integrate multiple data sources Correct data problems (cleanse) Validate Data Summarize and roll-up data Update Metadata Data Warehouse
Online Analytical Processing CLASS YEAR ADMISSIONS View M A R K E T D O PR M A JO R S Fast and Selective Access to Summarized Data STUDENTS REGISTRAR View TIME FINANCIAL View Ad Hoc View
DW Development Strategy u Think & Plan Big – Build In Small Steps; Don’t build a BARN! Not an archive system u Identify your audience u Use DW to address new areas, add new capabilities, and fix existing problems u Retain existing transactional systems u Iterative development approach – – – Address key needs Rapidly deliver capability to users Lower risk
Strategy (continued) u Evolve system in manageable phases – Identify questions you need to answer OR – Look at data to determine questions you can answer u Strategy – Develop an overall plan – Develop common metadata standards – Implement needed pieces mindful of integration and expansion
Initial Considerations u Vision u Proof-of-Concept / Phased Approach u Benefits u Strategy u Timeline u Cost u Issues – Data – Political constraints – Organizational Factors
st 1 Step: Proof-of-Concept u Develop a Stand Alone Proof-of-Concept u Develop model to demonstrate use of new tools to end users. u Provide benchmarks for future planning. u Low cost way to “test the waters” u Exposes YOUR data and ability to deal with it u Define number of tasks and deliverables.
Proof-of-Concept Timeline u 6 -8 Weeks for each increment – Requirements: gather and document – Data: identify source, construct model, extract data, cleanse data, transport data to database – Data Access: user interface, security, training, documentation
Proof-of-Concept Timeline
Proof-of-Concept ADMISSION DATA REGISTRAR DATA Logical Data Model SSN CLASS YEAR GPA MAJOR SSN STUDENT_FACT SATVHI SATMHI H. S. RANK H. S. CLASS SIZE DEMOGRAPHIC DATA SSN CLASS YEAR ETHNICITY GENDER HIGH SCHOOL REGION TIME SSN ACYR #ACYEAR CLASS YEAR
Proof-of-Concept ADMISSION DATA REGISTRAR DATA Physical Data Model #ADMISSION_SSN SCORE_CLASS SSN DEMOGRAPHIC DATA #DEMO_SSN DEMO_CLASS ETHNICITY GENDER HIGH SCHOOL REGION SSN STUDENT_FACT #ADMISSION_SSN #DEMO_SSN #REGISTRAR_SSN #ACYEAR SATVHI SATMHI H. S. RANK H. S. CLASS SIZE #REGISTRAR_SSN CLASS GPA MAJOR SSN ACYR TIME #ACYEAR CLASS_YEAR
Post-Po. C: DW Architecture u Many types of architecture – Star schema, Snowflake, Hybrid u Depends on: – – – Types of queries Size of database Capability of hardware and software u Basic Components: – Logical Model – Physical Model
Physical Data Warehouse Topology Admissions Public Affairs Academic Affairs Web. Server General Public Database Server Dean of Students Finance Office for remote connectivity Remote Laptop 1 Remote Laptop 2 HR President Instit Research
Meta. Data Definition: Information about your data u Centralized description of business rules – Describes data and transformations within DW – Captures changes in business rules over time to provide a level playing field for comparing data u Audit trail for data authentication u Bottom line – Increased trust in DW-based analysis results
Project Team Composition u Types of Personnel – – – and Analysis & Design Implementation Test & Quality Assurance Level of Skill (HIGH) (MED) (LOW) u Skill = $$$ u Vary Skill by Task to control cost
The Project Model “Roles and Responsibilities” Steering Committee Joint Client and Consultant Quality Assurance Test and Map to Requirements Modeler Prgmr Project Manager DBA Tool Prgmrs Planning, Reporting, Certification End. User Liaison Documentation
The Project Model “Roles and Responsibilities” Modeler Prgmr Tool Prgmrs DBA End. User Liaison Documentation Analysis Phase Scoping Infrastructure Scoping Modeling Building Documentation Architecture Phase Modeling Cleaning Capacity Planning Prototyping Implementation Phase ETL Implementation Building Transition Phase QA QA QA / Training
The Harvest! u Review requirements and results periodically – At end of each phase – Annually, taken as a whole u Optimize data warehouse – Response based on queries and load – Bring in-line with operational systems u Review and Adjust the DW mission as institutional mandates change
Cost Control u Start small and develop in phases u Bring in skill sets as needed remember: $$$ = (Skills) x (period of time) u Institutional staff should know the data u Organizational issues need to be resolved by the Project Manager and Steering Committee
Accountability u MUST show results (standard or adhoc reports) u Ensure complete documentation to maintain responsibility and association of data to departments u Establish a Return-on-Investment (ROI) whether tangible (number of reports) or intangible (executive support/decision making)
Issues u Security u Performance u Managing the metadata u Managing the data warehouse u Hardware/software configuration u Resources u Staying in the loop!
Building a Data Warehouse
593bd8e23f45a5455d7747118d74491d.ppt