The Information Delivery Process - PowerPoint PPT Presentation

About This Presentation
Title:

The Information Delivery Process

Description:

0082 MCGWIER-WATTS CHRISTINA. 8. Reading Raw Data Files ... Read the value with the $16. informat and assign it to the variable MODEL. ... – PowerPoint PPT presentation

Number of Views:116
Avg rating:3.0/5.0
Slides: 73
Provided by: Kristin217
Category:

less

Transcript and Presenter's Notes

Title: The Information Delivery Process


1
The Information Delivery Process
Manage
Organize
Exploit
2
Turning Data Into Information
DATA Step
Data
Data
PROC Steps
Data
SAS Data Sets
PROC Steps
Information
Information
3
Turning Data Into Information
Process of delivering meaningful information
80 Data-related
  • Access
  • Scrub
  • Transform
  • Manage
  • Store and retrieve

20 Analysis
4
The Raw Data
Partial fixed-column raw data file
1
1
2
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
5
Browsing the Data Values
  • Listing of Flight Crew
    Employees
  • Obs empid lastname firstname jobcode
    salary
  • 1 0031 GOLDENBERG DESIREE PILOT1
    50221.62 2 0040
    WILLIAMS ARLENE M. FLTAT1 23666.12
    3 0071 PERRY
    ROBERT A. FLTAT1 21957.71
    4 0082 MCGWIER-WATTS CHRISTINA
    PILOT3 96387.39
    5 0091 SCOTT HARVEY F. FLTAT2
    32278.40 6 0106
    THACKER DAVID S. FLTAT1 24161.14
    7 0275 GRAHAM
    DEBORAH S. FLTAT2 32024.93
    8 0286 DREWRY SUSAN
    PILOT1 55377.00
    9 0309 HORTON THOMAS L. FLTAT1
    23705.12 10 0334
    DOWN EDWARD PILOT1 56584.87
    11 0347 CHERVENY
    BRENDA B. FLTAT2 38563.45
    12 0355 BELL THOMAS B.
    PILOT1 59803.16
    13 0366 GLENN MARTHA S. PILOT3
    120202.38
    14 0385 HOLMAN GREGORY A. PILOT2
    93001.09 15 0390
    NOE BARBARA E. FLTAT2 37101.32

6
Reading a Raw Data File
Raw Data File
SAS Data Set
7
Reading Raw Data Files
Raw Data File
SAS Data Set
DATA Step
empid lastname firstname 0031 GOLDENBERG
DESIREE0040 WILLIAMS ARLENE M. 0071 PERRY
ROBERT A. 0082 MCGWIER-WATTS CHRISTINA

data . . . infile . . . input . . .run
8
Reading Raw Data Files
  • In order to create a SAS data set from a raw data
    file, you must
  • start a DATA step and name the SAS data set
    being created (DATA statement)
  • identify the location of the raw data file to
    read (INFILE statement)
  • describe how to read the data fields from the
    raw data file (INPUT statement).

9
Creating a SAS Data Set with the DATA Statement
  • General form of the DATA statement
  • This DATA statement creates a SAS data set called
    WORK.EMPDATA
  • data work.empdata

DATA SAS-data-set(s)
10
Pointing to a Raw Data File with the INFILE
Statement
  • General form of the INFILE statement
  • Examples
  • OS/390 infile edc.prog1.employee
  • UNIX infile /user/prog1/employee.dat
  • WIN infile C\workshop\winsas\
    prog1\employee.dat

INFILE filename ltoptionsgt
11
Reading Raw Data Using Column Input
  • General form of column input
  • To read raw data values with column input,
  • 1. name the SAS variable you want to create
  • 2. use a dollar sign, , if the SAS variable is
    character
  • 3. specify the starting column, a dash, and the
    ending column of the raw data field.

INPUT variable startcol-endcol
12
Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
  • 0031GOLDENBERG DESIREE PILOT1 50221.62
  • input empid 1-4
  • lastname 5-17

13
Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
  • 0031GOLDENBERG DESIREE PILOT1 50221.62
  • input empid 1-4
  • lastname 5-17
  • firstname 18-30

14
Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
  • 0031GOLDENBERG DESIREE PILOT1 50221.62
  • input empid 1-4
  • lastname 5-17
  • firstname 18-30
  • jobcode 31-36

15
Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
  • 0031GOLDENBERG DESIREE PILOT1 50221.62
  • input empid 1-4
  • lastname 5-17
  • firstname 18-30
  • jobcode 31-36
  • salary 37-45

16
Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
  • 0031GOLDENBERG DESIREE PILOT1 50221.62
  • input empid 1-4
  • lastname 5-17
  • firstname 18-30
  • jobcode 31-36
  • salary 37-45

17
Business Scenario
  • International Airlines is preparing to review its
    flight crew. The immediate goal is to read the
    Excel spreadsheet and create a SAS data set.

18
What is the Import Wizard?
  • A point-and-click graphical interface that
    enables you to create a SAS data set from several
    types of external files including
  • dBASE file (.DBF)
  • Excel 97 Spreadsheet (.XLS)
  • Microsoft Access Table
  • Delimited file (.)
  • Comma Separated Values (.CSV)

19
The Raw Data
  • The aircraft data is stored in a fixed-column
    raw data file

Partial data
last maintenance date
aircraft ID
aircraft model
date in service
20
Using Formatted Input
Raw Data File
JetCruise LF5200 030003 04/05/1994
03/11/2001 JetCruise LF5200 030005 02/15/1999
07/05/2001 JetCruise LF5200 030008 03/06/1996
04/02/2002
DATA Step
  • The raw data file will be read with formatted
    input.

data sas-data-set-name infile raw-filename
input pointer-control variable
informat-namerun
SAS Data Set
21
What is a SAS Format?
  • A format is an instruction that the SAS System
    uses to write data values.
  • SAS formats have the following form
  • ltgtformatltwgt.ltdgt

22
SAS Formats
  • Selected SAS formats
  • w.d standard numeric format
  • w. standard character format
  • COMMAw.d commas in a number 12,234.21
  • DOLLARw.d dollar signs and commas in a
  • number 12,234.41

23
SAS Formats
24
Using Formatted Input
  • General form of the INPUT statement with
    formatted input
  • Pointer control
  • _at_n moves the pointer to column n.
  • n moves the pointer n positions.

INPUT pointer-control column informat ...
25
Using Formatted Input
  • Formatted input can be used to read non-standard
    data values by
  • moving the input pointer to the starting position
    of the field
  • specifying a column name
  • specifying an informat.
  • An informat specifies the width of the input
    field and how to read the data values that are
    stored in the field.

26
Using Formatted Input
  • General form of an informat
  • indicates a character format.
  • informat-name names the informat.
  • w is an optional field width.
  • . is the required delimiter.
  • d optionally, specifies a decimal for
    numeric informats.

informat-namew.d
27
Selected Informats
  • 7. or 7.0 reads seven columns of numeric
    data.
  • 7.2 reads seven columns of numeric data and
    inserts a decimal point in the data value.
  • 5. reads five columns of character data and
    removes leading blanks.
  • CHAR5. reads five columns of character data
    and preserves leading blanks.

28
Selected Informats
COMMA7. reads seven columns of numeric data and
removes selected nonnumeric characters, such as
dollar signs and commas. PD4. reads four
columns of packed decimal data. MMDDYY10. reads
dates of the form 01/20/2000.
29
Working with Date Values
  • The raw data file contains date values. These
    date values will be read with the MMDDYY10.
    informat
  • Jetcruise LF5200 030003 04/05/1990 3/11/2001
  • Jetcruise LF5200 030005 02/15/1990 7/05/2001
  • Jetcruise LF5200 030008 03/06/1990 4/02/2002

1
1
2
3
2
3
4
4
1---5----0----5----0----5----0----5----0----5
30
Converting Dates to SAS Date Values
  • SAS uses date informats to read and convert dates
    to SAS date values. For example,
  • Stored Value Informat Converted Value
  • 10/29/1999 MMDDYY10. 14546
  • 29OCT1999 DATE9. 14546
  • 29/10/1999 DDMMYY10. 14546

31
SAS Formats
  • Selected SAS date formats
  • MMDDYYw. 101692 (MMDDYY6.)
  • 10/16/92 (MMDDYY8.)
  • 10/16/1992 (MMDDYY10.)
  • DATEw. 16OCT92 (DATE7.)
  • 16OCT1992 (DATE9.)

32
Locating and Browsing the Raw Data File
  • Browse the raw data file and determine the column
    layout and type

Partial raw data file
last maintenance date
aircraft ID
aircraft model
date in service
33
Starting the DATA Step
  • Use the DATA statement to begin the DATA step and
    name the SAS data set
  • data work.aircraft
  • other SAS statements
  • run
  • Use the INFILE statement to identify the input
    raw data file
  • data work.aircraft infile aircraft.dat
    other SAS statementsrun

34
Writing the INPUT Statement
  • Use the INPUT statement and pointer control to
    read the record starting with the first column.
    Read the value with the 16. informat and assign
    it to the variable MODEL.
  • JetCruise LF5200 030003 04/05/1994 03/11/2001
  • data work.aircraft infile aircraft.dat
    input _at_1 model 16. other SAS statementsrun

35
Writing the INPUT Statement
  • Use the INPUT statement and pointer control to
    read the record starting with column 18. Read
    the value with the 6. informat and assign the
    value to AIRCRAFTID.
  • JetCruise LF5200 030003 04/05/1994 03/11/2001
  • data work.aircraft infile aircraft.dat
    input _at_1 model 16. _at_18 aircraftid 6. other
    SAS statementsrun

36
Writing the INPUT Statement
  • Use the INPUT statement and pointer control to
    read the record starting with column 25. Read
    the value with the MMDDYY10. informat and assign
    the value to INSERVICE.
  • JetCruise LF5200 030003 04/05/1994 03/11/2001
  • data work.aircraft infile aircraft.dat
    input _at_1 model 16. _at_18 aircraftid 6.
    _at_25 inservice mmddyy10. other SAS
    statementsrun

37
Writing the INPUT Statement
  • Use the INPUT statement and pointer control to
    read the record starting with column 36. Read
    the value with the MMDDYY10. informat and assign
    the value to LASTMAINT.
  • JetCruise LF5200 030003 04/05/1994 03/11/2001
  • data work.aircraft infile aircraft.dat
    input _at_1 model 16. _at_18 aircraftid 6.
    _at_25 inservice mmddyy10. _at_36 lastmaint
    mmddyy10.run

38
SAS Syntax Rules
SAS statements are free-format.
  • They can begin and end in any column.
  • One or more blanks or special characters can be
    used to separate words.
  • A single statement can span multiple lines.
  • Several statements can be on the same line.

Unconventional spacing
data work.mech_pilot infile 'c\coursedata\empli
st.dat' input lastname 1-20 firstname
21-30 jobtitle 36-43 salary 54-59 run proc
means datawork.mech_pilot n mean class
jobtitle var salaryrun
39
SAS Syntax Rules
SAS statements are free-format.
  • They can begin and end in any column.
  • One or more blanks or special characters can be
    used to separate words.
  • A single statement can span multiple lines.
  • Several statements can be on the same line.

Unconventional spacing
data work.mech_pilot infile 'c\coursedata\empli
st.dat' input lastname 1-20 firstname
21-30 jobtitle 36-43 salary 54-59 run proc
means datawork.mech_pilot n mean class
jobtitle var salaryrun
40
SAS Syntax Rules
  • SAS statements
  • usually begin with an identifying keyword
  • always end with a semicolon.

data work.mech_pilot infile
'c\coursedata\emplist.dat' input lastname
1-20 firstname 21-30 jobtitle 36-43
salary 54-59 run proc print datawork.mech_pilo
t run proc means datawork.mech_pilot n mean
class jobtitle var salary run
41
Adding a New Variable
Create a new variable by extracting the
four-digit year values from the SAS date values.
Aircraft Service Records
42
Using an Assignment Statement
  • An assignment statement evaluates an expression
    and assigns the resulting value to a variable.
  • General syntax of an assignment statement

variableexpression
43
Using Operators
  • Selected operators for basic arithmetic
    calculations in an assignment statement

44
Using SAS Functions
  • A SAS function is a routine that returns a value
    that is determined from specified arguments.
  • General syntax of a SAS function

function-name(argument1,argument2, . . .)
45
Using SAS Functions
  • SAS functions
  • perform arithmetic operations
  • compute statistics (for example, mean)
  • manipulate SAS dates and process character values
  • perform many other tasks.

46
Creating a Vertical Bar Chart
  • Use the GCHART procedure and the VBAR statement
    to create a vertical bar chart.
  • proc gchart datawork.aircraft vbar
    yrbeg_service title 'Aircraft In Service, by
    Year'run

47
Reading a Subset of Raw Data
  • Use the DATA step that was written earlier. Add
    a subsetting IF statement to process only the
    subset in which the value of AGE is at least 15.
  • data work.aircraft infile aircraft.dat
    input _at_1 model 16. _at_18 aircraftid 6.
    _at_25 inservice mmddyy10. _at_36 lastmaint
    mmddyy10. yrbeg_serviceyear(inservice)
    ageyear(today())-yrbeg_service if
    agegt15run

48
What Is a SAS Data Library?
49
What Is a SAS Data Library?
  • Regardless of which host operating system you
    use, you identify SAS data libraries by assigning
    each one a libref.

libref
50
What Is a SAS Data Library?
  • By default, SAS creates two SAS data libraries
  • a temporary library called WORK
  • a permanent library called SASUSER.

WORK
SASUSER
51
SAS Data Libraries
You can think of a SAS data library as a drawer
in a filing cabinet and a SAS data set as one of
the file folders in the drawer.
52
SAS Data Libraries
When you invoke SAS, you automatically have
access to a temporary and a permanent SAS data
library.
  • WORK - temporary library
  • SASUSER - permanent library

You can create and access your own permanent
libraries.
  • IA - permanent library

53
Reading a SAS Data Set
  • Input data set Output data set
  • SET statement DATA statement

Temporary SAS data set
Temporary SAS data set
Permanent SAS data set
Permanent SAS data set
54
Two-level SAS Filenames
Every SAS file has a two-level name.
libref.filename
  • The first name (libref) refers to the library.

The data set MECH_PILOT is a SAS file in the WORK
library.
  • The second name (filename) refers to the file in
    the library.

55
Browsing the Data Portion
  • The PRINT procedure displays the data portion of
    a SAS data set.
  • By default, PROC PRINT displays
  • all observations
  • all variables
  • OBS column on the left-hand side.

56
Browsing the Data Portion
  • General form of the PRINT procedure
  • Example
  • proc print datawork.empdatarun

PROC PRINT DATASAS-data-setRUN
57
Objectives
  • Generate list reports using the PRINT procedure.
  • Display selected variables in a list report using
    the VAR statement.
  • Display selected observations in a list report
    using the WHERE statement.
  • Sort the observations in a SAS data set using the
    SORT procedure.

58
Creating a List Report
PROC Step
proc print datawork.empdata var empid salary
jobcode run
59
Formatting Data Values
proc print datawork.empsort format salary
dollar11.2 run
60
Creating a Frequency Report
PROC Step
61
Creating a Frequency Report
  • The FREQ procedure displays frequency counts of
    the data values in a SAS data set.
  • General form of a simple PROC FREQ step
  • PROC FREQ DATASAS-data-set
  • RUN
  • Example
  • proc freq datawork.empsort
  • run

62
Creating a One-Way Frequency Report
  • Only variables listed on the TABLES statement are
    included in the frequency counts. These are
    typically variables that have a limited number of
    distinct values.
  • General form of a PROC FREQ step
  • PROC FREQ DATASAS-data-set
  • TABLES SAS-variables
  • RUN

63
Calculating Job Code Frequencies
  • Job Code Frequency Report
  • The FREQ Procedure
  • Job Code

  • Cumulative Cumulative
  • Job_Code Frequency Percent
    Frequency Percent
  • ------------------------------------------------
    -----------------------
  • Flight Attendant 199 71.33
    199 71.33
  • Pilot 80 28.67
    279 100.00

64
Calculating Salary Frequencies
  • Salary Frequency Report
  • The FREQ Procedure
  • Annual Salary
  • Cumulative
    Cumulative
  • Salary Frequency Percent Frequency
    Percent
  • ------------------------------------------------
    ---------------------
  • Low to 25,000 41 14.70
    41 14.70
  • 25,000 to 50,000 172 61.65
    213 76.34
  • 50,000 and up 66 23.66
    279 100.00

65
Calculating Job Code/Salary Frequencies
  • The FREQ Procedure
  • Table of Job_Code by
    Salary
  • Job_Code(Job Code) Salary(Annual
    Salary)
  • Frequency
  • Percent
  • Row Pct
  • Col Pct Low to 25,000
    50,000 Total
  • 25,000 to
    50,0and up
  • 00
  • ---------------------------------------
    -----
  • Flight Attendant 41 158
    0 199
  • 14.70 56.63
    0.00 71.33
  • 20.60 79.40
    0.00
  • 100.00 91.86
    0.00
  • ---------------------------------------
    -----

66
Creating a Frequency Report
  • By default, PROC FREQ
  • analyzes every variable in the SAS data set
  • displays each distinct data value
  • calculates the number of observations in which
    each data value appears (and corresponding
    percentage)
  • indicates for each variable how many
    observations have missing values.

67
Calculating Summary Statistics
  • The MEANS procedure displays simple descriptive
    statistics for the numeric variables in a SAS
    data set.
  • General form of a simple PROC MEANS step
  • PROC MEANS DATASAS-data-set RUN
  • Example
  • proc means dataia.aircraftcap
  • run

68
Calculating Summary Statistics
69
Calculating Summary Statistics
  • By default, PROC MEANS
  • analyzes every numeric variable in the SAS
    data set
  • prints the statistics N, MEAN, STD, MIN, and
    MAX
  • excludes missing values before calculating
    statistics.

70
Selecting Variables
71
Grouping Observations
proc means dataia.aircraftcap maxdec2 var
totpasscap class model run
72
Calculating Capacity Statistics for Each Type of
Plane
  • The SAS System
  • The MEANS Procedure
  • Analysis Variable totpasscap
  • N
  • size Obs N Mean Std Dev Minimum
    Maximum
  • ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
    ƒƒƒƒ
  • Large 16 16 230.13 32.39 207.00
    290.00
  • Medium 9 9 178.56 11.40 165.00
    188.00
  • Small 39 39 132.64 18.85 97.00
    150.00
  • ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
    ƒƒƒƒ
Write a Comment
User Comments (0)
About PowerShow.com