
719b9a492390c88acff4fbb3f389b6d6.ppt
- Количество слайдов: 61
ISQS 6339, Data Management & Business Intelligence Data Preparation for Analytics Using SAS Zhangxi Lin Texas Tech University ISQS 6347, Data & Text Mining 1
Outline n n An overview of data preparation for analytics SAS Programming Essentials q q q n Running SAS programs Mastering fundamental concepts SAS program debugging Make use of SAS Enterprise Guide for programming ISQS 6347, Data & Text Mining 2
Structure and Components of Business Intelligence ISQS 6347, Data & Text Mining 3
Overview: From Data Warehousing to Data Analysis n Previous major topics in data warehousing (using SQL Server 2008) q q q n Dimensional model design ETL Cubes design and OLAP Data analysis topics (using SAS) q Data preparation n n q q Analytic business questions Data format and data conversion Data cleansing Data exploratory Data analysis Data visualization ISQS 6347, Data & Text Mining 4
Components of the SAS System Reporting And Graphics Data Access And Management User Interface Analytical Base SAS Application Development Visualization And Discovery Business Solutions Web Enablement ISQS 6347, Data & Text Mining 5
SAS Programming Essentials n Find more information from q http: //support. sas. com ISQS 6347, Data & Text Mining 6
Data-driven Tasks n The functionality of the SAS System is built around four data-driven tasks common to virtually any applications q q Data access Data management Data analysis Data presentation ISQS 6347, Data & Text Mining 7
Turning Data into Information n Process of delivery meaningful information q 80% data-related n n n q Access Scrub Transform Mange Store and retrieve 20% analysis ISQS 6347, Data & Text Mining 8
Turning Data into Information Data DATA Step SAS Data Sets PROC Steps Information ISQS 6347, Data & Text Mining 9
Design of the SAS System Multi. Vendor Architecture 90% independent PC 10% dependent Workstation Servers/ Midrange Mainframe ISQS 6347, Data & Text Mining Super Computer 10 . . .
Design of the SAS System Multi. Engine Architecture DB 2 Teradata SAP d. Base DATA ORACLE SYBASE Microsoft Excel ISQS 6347, Data & Text Mining 11
SAS Programming – Level I n n n Fundamentals (ch 1 -3) Producing list reports (ch 4) Enhancing output (ch 5) Creating data sets (ch 6) Data step programming (ch 7) q q q n n n Reading data Creating variables Conditional processing Keeping and dropping variables Reading Excel files Combining SAS data sets (ch 8) Producing summary reports (ch 9) SAS graphing (ch 10) ISQS 6347, Data & Text Mining 12
Course Scenario n. In this course, you work with business data from International Airlines (IA). The various kinds of data that IA maintains are listed below: q q q flight data passenger data cargo data employee data revenue data ISQS 6347, Data & Text Mining 13
Course Scenario n. The following are some tasks that you will perform: q q q importing data creating a list of employees producing a frequency table of job codes summarizing data creating a report of salary information ISQS 6347, Data & Text Mining 14
SAS Programs A SAS program is a sequence of steps that the user submits for execution. Raw Data DATA steps are typically used to create SAS data sets. DATA Step SAS Data Set PROC Step Report PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data). ISQS 6347, Data & Text Mining 15
SAS Programs data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; DATA Step proc print data=work. staff; run; proc means data=work. staff; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining PROC Steps 16
Step Boundaries SAS steps begin with either of the following: n DATA statement n PROC statement SAS detects the end of a step when it encounters one of the following: n a RUN statement (for most steps) n a QUIT statement (for some procedures) n the beginning of another step (DATA statement or PROC statement) ISQS 6347, Data & Text Mining 17
Step Boundaries data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; proc means data=work. staff; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining 18
Running a SAS Program You can invoke SAS in the following ways: n interactive windowing mode (SAS windowing environment) n interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF, or SAS/EIS software) n batch mode n noninteractive mode ISQS 6347, Data & Text Mining 19
Preparation of SAS Programming n n Data sets: SAS-Programming Create a user defined library reference n Statement LIBNAME libref ‘SAS-data-library’ <options>; n Example LIBNAME ia ‘c: workshopwinsasprog 1’; n Two-levels of SAS files names Libref. fielname ISQS 6347, Data & Text Mining 20
SAS Programming Essentials n n Demon: c 02 s 2 d 1 Exercise: c 02 ex 1 ISQS 6347, Data & Text Mining 21
Browsing the Descriptor Portion n General form of the CONTENTS procedure: n Example: n PROC CONTENTS DATA=SAS-data-set; RUN; proc contents data=work. staff; run; ISQS 6347, Data & Text Mining c 02 s 3 d 1 22
SAS Data Sets: Data Portion The data portion of a SAS data set is a rectangular table of character and/or numeric data values. Job. Title Salary TORRES LANGKAMM SMITH WAGSCHAL TOERMOEN JAN SARAH MICHAEL NADJA JOCHEN Pilot Mechanic Pilot 50000 80000 40000 77500 65000 Character values Variable values First. Name Variable names Last. Name Numeric values Variable names are part of the descriptor portion, not the data portion. 23 ISQS 6347, Data & Text Mining
SAS Variable Values There are two types of variables: character contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32, 767 bytes. One byte equals one character. numeric stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits. ISQS 6347, Data & Text Mining 24
SAS Data Set and Variable Names n. SAS names have these characteristics: q q can be 32 characters long. can be uppercase, lowercase, or mixed-case. are not case sensitive. must start with a letter or underscore. Subsequent characters can be letters, underscores, or numerals. ISQS 6347, Data & Text Mining 25
Valid SAS Names n Select the valid default SAS names. data 5 mon ISQS 6347, Data & Text Mining 26 . . .
Valid SAS Names n Select the valid default SAS names. data 5 mon ISQS 6347, Data & Text Mining 27 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata ISQS 6347, Data & Text Mining 28 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata ISQS 6347, Data & Text Mining 29 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 ISQS 6347, Data & Text Mining 30 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 ISQS 6347, Data & Text Mining 31 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data ISQS 6347, Data & Text Mining 32 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data ISQS 6347, Data & Text Mining 33 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data fivemonthsdata ISQS 6347, Data & Text Mining 34 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data fivemonthsdata ISQS 6347, Data & Text Mining 35 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data fivemonthsdata Five. Months. Data ISQS 6347, Data & Text Mining 36 . . .
Valid SAS Names n Select the valid default SAS names. data 5 monthsdata data#5 five months data fivemonthsdata Five. Months. Data ISQS 6347, Data & Text Mining 37 . . .
Missing Data Values A value must exist for every variable for each observation. Missing values are valid values. Last. Name First. Name Job. Title Salary TORRES LANGKAMM SMITH WAGSCHAL TOERMOEN JAN SARAH MICHAEL NADJA JOCHEN Pilot Mechanic Pilot 50000 80000. 77500 65000 A character missing value is displayed as a blank. ISQS 6347, Data & Text Mining A numeric missing value is displayed as a period. 39
Browsing the Data Portion n. The PRINT procedure displays the data portion of a SAS data set. n. By default, PROC PRINT displays the following: q q q all observations all variables an Obs column on the left side ISQS 6347, Data & Text Mining 40
Browsing the Data Portion n General form of the PRINT procedure: n Example: n PROC PRINT DATA=SAS-data-set; RUN; proc print data=work. staff; run; ISQS 6347, Data & Text Mining c 02 s 3 d 1 41
SAS Data Set Terminology SAS documentation and text in the SAS windowing environment use the following terms interchangeably: SAS Data Set SAS Table Variable Column Observation Row ISQS 6347, Data & Text Mining 42
SAS Syntax Rules SAS statements have these characteristics: n usually begin with an identifying keyword n always end with a semicolon data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; run; proc means data=work. staff; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining 43
SAS Syntax Rules SAS statements are free-format. n One or more blanks or special characters can be used to separate words. n They can begin and end in any column. n A single statement can span multiple lines. n Several statements can be on the same line. Unconventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc means data=work. staff; class Job. Title; ISQS 6347, Data & Text Mining var Salary; run; n 44 . . .
SAS Syntax Rules SAS statements are free-format. n One or more blanks or special characters can be used to separate words. n They can begin and end in any column. n A single statement can span multiple lines. n Several statements can be on the same line. Unconventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc means data=work. staff; class Job. Title; ISQS 6347, Data & Text Mining var Salary; run; n 46 . . .
SAS Syntax Rules SAS statements are free-format. n One or more blanks or special characters can be used to separate words. n They can begin and end in any column. n A single statement can span multiple lines. n Several statements can be on the same line. Unconventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc means data=work. staff; class Job. Title; ISQS 6347, Data & Text Mining var Salary; run; n 47 . . .
. . . SAS Syntax Rules SAS statements are free-format. n One or more blanks or special characters can be used to separate words. n They can begin and end in any column. n A single statement can span multiple lines. n Several statements can be on the same line. Unconventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc means data=work. staff; class Job. Title; ISQS 6347, Data & Text Mining var Salary; run; n 48 . . .
. . . SAS Syntax Rules SAS statements are free-format. n One or more blanks or special characters can be used to separate words. n They can begin and end in any column. n A single statement can span multiple lines. n Several statements can be on the same line. Unconventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc means data=work. staff; class Job. Title; ISQS 6347, Data & Text Mining var Salary; run; n 49
SAS Syntax Rules Good spacing makes the program easier to read. Conventional Spacing data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; run; proc means data=work. staff; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining 50
SAS Comments n Type /* to begin a comment. n Type your comment text. n Type */ to end the comment. /* Create work. staff data set */ data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; /* Produce listing report of work. staff */ proc print data=work. staff; run; ISQS 6347, Data & Text Mining c 02 s 3 d 2 51
Syntax Errors Syntax errors include the following: misspelled keywords n missing or invalid punctuation n invalid options daat work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; n proc print data=work. staff run; proc means data=work. staff average max; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining 52
Debugging a SAS Program c 02 s 4 d 1. sas userid. prog 1. sascode(c 02 s 4 d 1) c 02 s 4 d 2. sas userid. prog 1. sascode(c 02 s 4 d 2) n This demonstration illustrates how to submit a SAS program that contains errors, diagnose the errors, correct the errors, and save the corrected program. ISQS 6347, Data & Text Mining 53
Recall a Submitted Program statements accumulate in a recall buffer each time you issue a SUBMIT command. daat work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff run; proc means data=work. staff average max; class Job. Title; var Salary; run; data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; run; proc means data=work. staff mean max; class Jobtitle; var Salary; ISQS 6347, Data & Text Mining run; Submit Number 1 Submit Number 2 54
Recall a Submitted Program Issue the RECALL command once to recall the most recently submitted program. Submit Number 1 Issue RECALL once. Submit Number 2 data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; run; proc means data=work. staff mean max; class Job. Title; var Salary; run; Submit Number 2 statements are recalled. ISQS 6347, Data & Text Mining 55
Recall a Submitted Program Issue the RECALL command again to recall Submit Number 1 statements. Submit Number 1 Issue RECALL again. Submit Number 2 daat work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff run; proc means data=work. staff average max; class Job. Title; var Salary; run; data work. staff; infile 'raw-data-file'; input Last. Name $ 1 -20 First. Name $ 21 -30 Job. Title $ 36 -43 Salary 54 -59; run; proc print data=work. staff; run; proc means data=work. staff mean max; class Job. Title; var Salary; run; ISQS 6347, Data & Text Mining 56
Exercise 8: Basic SAS Programming n n n Define library IA and Out Go through all SAS programs in Chapter 2 -5. Write a SAS program to read a dataset created by yourself or simply use Person 0. txt in \Tech. SharecobadISQS 3358Other. Datasets. The dataset is output to your library Out. Try to apply whatever SAS features in Chapter 5 of Prog. I to general a nice looking report. Go through all exercises for Ch 2, 3, 4, 5, 6 (answer keys are available, so no need to submit the results) ISQS 6347, Data & Text Mining 57
Hands-on exercise n n n Write a SAS program to calculate the number of dates passed in 2012 to 3/3/2012. The input is in the format: date 9. 01 JAN 2012 03 MAR 2012 Answer: 62 days ISQS 6347, Data & Text Mining 58
Making Use of SAS Enterprise Guide Code n Import a text file q n Example: Orders. txt Import an Excel file q Example: Supply. Info. xls ISQS 6347, Data & Text Mining 59
Learn from Examples n SAS Help q q Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS “Base Usage Guide Examples” n Chapter 3, 4 ISQS 6347, Data & Text Mining 60
Import an Excel Sheet proc import out=work. commrex datafile ="C: LinSharedISQS 6339Commrex_3358. xls" dbms=excel replace; sheet="Company"; getnames=yes; mixed=no; scantext=yes; usedate=yes; scantime=yes; run; proc print data=work. commrex; run; ISQS 6347, Data & Text Mining 61
Excel SAS/ACCESS LIBNAME Engine libname xlsdata 'C: LinSharedISQS 6339Commrex_3358. xls'; proc print data=xlsdata. New 1; run; ISQS 6347, Data & Text Mining 62
Exercise 8: SAS Data Step Programming n http: //zlin. ba. ttu. edu/6339/Exercise. SASProgramming. htm ISQS 6347, Data & Text Mining 63
719b9a492390c88acff4fbb3f389b6d6.ppt