Topics in Data Management - PowerPoint PPT Presentation

About This Presentation
Title:

Topics in Data Management

Description:

Data available on common variables from different sources. ... Example - Brookstone Store Sales&Inventory. 8 EXCEL Spreadsheets - 4 Quarters X 2 Measures ... – PowerPoint PPT presentation

Number of Views:30
Avg rating:3.0/5.0
Slides: 17
Provided by: larryw4
Category:

less

Transcript and Presenter's Notes

Title: Topics in Data Management


1
Topics in Data Management
  • SAS Data Step

2
Combining Data Sets I - SET Statement
  • Data available on common variables from different
    sources. Multiple datasets with common variable
    names, possibly different sampling/experimental
    units
  • Exam scores from students in various sections of
    STA 2023
  • County level data from different state databases
  • Flight departure/arrival data from different
    months

3
Combining Data Sets I - SET Statement
options nodate nonumber ps54 ls80 data
one input student 1-8 idnum 9-12 exam1 14-16
exam2 18-20 exam3 22-24 section1 cards Amy
1456 98 78 84 Zed 2234 68 84
75 run data five input student 1-8 idnum
9-12 exam1 14-16 exam2 18-20 exam3
22-24 section5 cards Alex 3410 74 68
. Zach 4561 92 74 88 run data all set
one five run proc print run quit
4
Combining Data Sets I - SET Statement
The SAS System Obs student idnum
exam1 exam2 exam3 section
1 Amy 1456 98 78 84
1 2 Zed 2234 68
84 75 1 3 Alex
3410 74 68 . 5
4 Zach 4561 92 74
88 5
5
Combining Data Sets II - MERGE Statement
  • Data on common sampling/experimental units,
    different variables/characteristics measured in
    different datasets.
  • County data from different government sources
  • Store sales data updated over time

6
Combining Data Sets II - MERGE Statement
options nodate nonumber ps54 ls80 data
s2003 input store 1-8 sales03
10-14 cards Atlanta 1459 Zurich
1383 run data s2004 input store 1-8
sales04 10-14 cards Atlanta 1459 Zurich
1383 run proc sort datas2003 by store proc
sort datas2004 by store data s0304 merge
s2003 s2004 by store run proc
print run quit
The SAS System Obs store sales03
sales04 1 Atlanta 1459 1459
2 Zurich 1383 1383
7
Creating New Variables From Existing Ones
  • Creating Final Grade for Students (Exams 1 and 2
    Each Count 30, Exam 3 40)
  • Total (0.3Exam1)(0.3Exam2)(0.4Exam3)
  • Obtaining Sales Growth () for stores
  • Grow0403100(sales04-sales03)/sales03

8
Grades Example
data all set one five total(0.3exam1)(0.3exa
m2)(0.4exam3) run proc print var student
idnum total run quit
The SAS System Obs student
idnum total 1 Amy
1456 86.4 2 Zed
2234 75.6 3 Alex
3410 . 4 Zach 4561
85.0
9
Building Case Histories
  • Have multiple observations of same variable on
    individual units (not necessarily the same number
    across individuals).
  • Want to summarize the measurements for each
    individual and obtain single record.
  • Summary of all Delta flights for each ATL route
    to other cities for October 2004
  • Arrest record for juveniles over a 5 year period
  • Sales histories for individual stores in a retail
    chain

10
Building Case Histories
  • Step 1 SORT dataset on the variable(s) that
    define(s) the individual units/cases.
  • Step 2 Set the previous dataset into a new one,
    using the same BY statement as in the SORT.
  • The new dataset sees the old dataset as a
    series of blocks of measurements by individual
    cases
  • Step 3 Define any variables you want to use to
    summarize cases in RETAIN statement.
  • Step 4 At beginning of each individual, reset
    variables in Step 3 (typically to 0)
  • Step 5 At end of each individual OUTPUT record

11
Example - Brookstone Store SalesInventory
  • 8 EXCEL Spreadsheets - 4 Quarters X 2 Measures
  • 520 stores observed over 52 weeks
  • Typical Spreadsheet Portion (4 stores X 6 weeks)

Note that the company provides 13 columns
representing the 13 weeks in the quarter for each
storenot the way we want to analyze it. Also,
got rid of commas in EXCEL before exporting to
text file.
12
Reading the Data in SAS
Data inv1 infile filename input storeid 6-8
storename 10-38 _at_ do week1 to 13 input inv
_at_ output end run
This creates 13 observations per store and
single inv variable
13
Reading the Data in SAS
SET
SET
MERGE
14
Building a Store Record for Year
  • Suppose Management wants following summary
    measures for each store
  • Total sales
  • Average sales to inventory ratio
  • Mean and standard deviation of sales
  • Correlation between sales and inventory
  • We need the following quantities counted across
    weeks
  • SALES, SALES2, INV, INV2, SALESINV, SALES/INV

15
SAS Code to Obtain Measures by Store (P1)
Data inv set inv1-inv4 run proc sort by
storeid run Data sales set sales1-sales4 run
proc sort by storeid run Data invsales merge
inv sales by storeid run proc sort by
storeid run Data invsales1 set invsales by
storeid retain sumsales sumsales2 suminv suminv2
salesxinv sales_inv
16
If first.storeid then do sumsales0
sumsales20 suminv0 suminv20 salesxinv0
sales_inv0 end sumsalessumsalessales
sumsales2sumsales2(sales2) suminvsuminvinv
suminv2suminv2(inv2) salesxinvsalesxinv(s
alesinv) sales_invsales_inv(sales/inv) if
last.storeid then do totsalessumsales
meansal_invsales_inv/52 meansalestotsales/52
varsales(sumsales2-(sumsales2)/52)/51
stdsalessqrt(varsales) varinv(suminv2-(suminv
2)/52)/51 stdinvsqrt(varinv) covslinv(salesx
inv-(sumsalessuminv)/52)51 corrslinvcovslinv/(
stdsalesstdinv) output end run
Write a Comment
User Comments (0)
About PowerShow.com