PROC SQL: Tips and Translations for Data Step Users - PowerPoint PPT Presentation

About This Presentation
Title:

PROC SQL: Tips and Translations for Data Step Users

Description:

PROC SQL: Tips and Translations for Data Step Users. By: Gail Jorgensen. Susan Marcella. AGENDA. SQL Syntax Review. Joins Translated. SQL Strengths & Uses ... – PowerPoint PPT presentation

Number of Views:168
Avg rating:3.0/5.0
Slides: 25
Provided by: njs9
Category:

less

Transcript and Presenter's Notes

Title: PROC SQL: Tips and Translations for Data Step Users


1
PROC SQL Tips and Translations for Data Step
Users
By Gail Jorgensen Susan Marcella
2
AGENDA
  • SQL Syntax Review
  • Joins Translated
  • SQL Strengths Uses

3
PROC SQL Tips and Translations for Data Step
UsersSyntax
  • Proc SQL
  • create table/view newdsname as
  • select var1, var2, varN
  • from dsname
  • where condition
  • Quit

4
PROC SQL Tips and Translations for Data Step
Users JOIN vs MERGE
  • Types of JOINs
  • Inner Join selects only matching records
  • (same as if ina and inb)
  • Outer Join selects some non-matching records
  • Left Join selects all records from first table,
    only matching records from second
  • (same as if ina)
  • Right join selects all records from second
    table, only matching records from first
  • (same as if inb)
  • -- Full join selects all records from both
    tables
  • (same as having no if statement)

5
PROC SQL Tips and Translations for Data Step
UsersInner Join
idfld col2
1 M
1 N
2 O
3 P
4 Q
5 R
idno col3 col4
1 X C
2 X D
2 Y F
4 Z
5 Z
7 Z
Dataset DS_A1
Dataset DS_BB
data c merge ds_a1(inina)
ds_b(ininb rename(idnoidfld)) by idfld
if ina and inb run
proc sql create table ds_c as select
ds_a1., ds_b. from ds_a1, ds_b where
ds_a1.idfld ds_b.idno quit
idfld col2 col3 col4
1 M X C
1 N X C
2 O X D
2 O Y F
4 Q Z
5 R Z
idfld col2 idno col3 col4
1 M 1 X C
1 N 1 X C
2 O 2 X D
2 O 2 Y F
4 Q 4 Z
5 R 5 Z
6
PROC SQL Tips and Translations for Data Step
UsersLeft Join
idfld col3 col4
1 X C
2 X D
2 Y F
4 Z
5 Z
7 Z
idfld col2 col5
1 M A
1 N D
2 O
3 P J
4 Q K
5 R N
Dataset inf_a
Dataset inf_b
proc sql create table sql_left as select
a., b. from inf_a as a left join inf_b
as b on a.idfld b.idfld quit
data ds_left merge inf_a(inina)
inf_b(ininb) by idfld if ina run
idfld col2 col5 col3 col4
1 N D X C
1 M A X C
2 O Y F
2 O X D
3 P J
4 Q K Z
5 R N Z
idfld col2 col5 col3 col4
1 N D X C
1 M A X C
2 O Y F
2 O X D
3 P J
4 Q K Z
5 R N Z
Dataset sql_left
Dataset ds_left
7
PROC SQL Tips and Translations for Data Step
UsersRight Join
  • proc sql
  • create table sql_right as
  • select a., b.
  • from inf_a as a right join inf_b as b
  • on a.idfld b.idfld
  • quit

data ds_right merge inf_a(inina)
inf_b(ininb) by idfld if inb run
idfld col2 col5 col3 col4
1 M A X C
1 N D X C
2 O X D
2 O Y F
4 Q K Z
5 R N Z
7 Z
idfld col2 col5 col3 col4
1 N D X C
1 M A X C
2 O Y F
2 O X D
4 Q K Z
5 R N Z
. Z
8
PROC SQL Tips and Translations for Data Step
Users Full Join
CList07
CList08
Obs name recd sent
1 Amanda yes no
2 Gabi yes yes
3 Jan yes yes
4 Jim no yes
5 Pam no no
Obs name recd sent
1 Alison yes yes
2 Jan yes yes
3 Pam no no
4 Tom yes yes
proc sql create table sql_clist as select
c7.name, c7.recd as recd07, c8.recd as
recd08, c7.sent as sent07, c8.sent as
sent08 from clist07 as c7 full join clist08 as
c8 on c7.namec8.name quit
proc sort dataclist07 by name run proc sort
dataclist08 by name run data
data_clist merge clist07 clist08
(rename(recdrecd08 sentsent08)) by name run
9
PROC SQL Tips and Translations for Data Step
Users Full Join (Cont)
Sql_CList
Data_CList
Obs Name recd07 recd08 sent07 sent08
1 yes yes
2 Amanda yes no
3 Gabi yes yes
4 Jan yes yes yes yes
5 Jim no yes
6 Pam no no no no
7 yes yes
Obs name recd sent recd08 sent08
1 Alison yes yes
2 Amanda yes no
3 Gabi yes yes
4 Jan yes yes yes yes
5 Jim no yes
6 Pam no no no no
7 Tom yes yes
10
PROC SQL Tips and Translations for Data Step
UsersHandling Duplicate Variable Names
  • To always select the variable from one dataset
  • Drop unwanted version of variable
  • (PROC SQL permits all SAS dataset options)
  • Select variable from specific table
  • To keep variable from both tables
  • Rename the variable in one dataset
  • To select variable based on value
  • Use CASE statement

11
PROC SQL Tips and Translations for Data Step
UsersCASE Statement
proc sql create table NewCList as select case
when missing(c7.name) then c8.name else
c7.name end as name, c7.recd as recd07,
c8.recd as recd08, c7.sent as sent07, c8.sent
as sent08 from clist07 as c7 full join clist08 as
c8 on c7.namec8.name quit
proc sort dataclist07 by name run proc sort
dataclist08 by name run data
data_clist merge clist07 clist08
(rename(recdrecd08 sentsent08)) by name run
12
PROC SQL Tips and Translations for Data Step
UsersCASE Statement - Results
Obs name recd07 recd08 sent07 sent08
1 Alison yes yes
2 Amanda yes no
3 Gabi yes yes
4 Jan yes yes yes yes
5 Jim no yes
6 Pam no no no no
7 Tom yes yes
13
PROC SQL Tips and Translations for Data Step
UsersDown Calculations
  • PROC SORT datashs.exposure by subject_id run
  • DATA counters(KEEPTableName MAXOBS TOTOBS)
  • SET shs.exposure ENDLAST BY subject_id
  • length TableName 50
  • RETAIN MAXOBS OBSCNTR TOTOBS 0
  • TableNameexposure"
  • TOTOBS1
  • OBSCNTR1
  • IF LAST.subject_id THEN DO
  • IF MAXOBS lt OBSCNTR THEN MAXOBSOBSCNTR
  • OBSCNTR0
  • END
  • IF LAST THEN OUTPUT chemcnts
  • label maxobs'Maximum number of obs per person'
  • totobs'Total Number obs in table'
  • run
  • proc print datacounters
  • run

Obs TableName MAXOBS TOTOBS
1 exposure 14 2124
14
PROC SQL Tips and Translations for Data Step
UsersDown Calculations
sqlcounter
  • proc sql
  • create table sqlcounter as
  • select distinct subject_id, count() as subjcnt
  • from fshs.exposure
  • group by subject_id
  • select exposure" as TableName,
  • max(subjcnt) as MaxObs, sum(subjcnt) as TotObs
  • from sqlcounter
  • quit

Obs subject_id subjcnt
1 178 7
2 306 1
3 307 1
4 308 1
5 326 3
6 330 1
7 331 1
8 332 1
TableName MaxObs TotObs
exposure 14 2124
15
PROC SQL Tips and Translations for Data Step
UsersCounts and Nesting Queries
Genre
Alternative 9
Bluegrass 43
Blues 14
Children's Music 62
Christian Gospel 88
Classical 74
Country 77
Easy Listening 31
Electronic 1
Folk 16
General Folk 18
Gospel Religious 40
Hip Hop/Rap 2
Holiday 13
Inspirational 70
proc sql select distinct genre, count() from
itunes group by genre quit proc sql
outobs1 select (select count() from
itunes) as TotalSongs, (select count(distinct
genre) from itunes) as GenreCnt, (select
count(distinct artist) from itunes) as
ArtistCnt, (select count(distinct album) from
itunes) as AlbumCnt from itunes quit
TotalSongs GenreCnt ArtistCnt AlbumCnt
802 23 160 100
16
PROC SQL Tips and Translations for Data Step
UserDictionaries
proc sql create view detail as select
from dictionary.columns create view
extern as select from dictionary.members
create view tbl as select from
dictionary.tables create view gotem as
select trim(libname) as LibName,
trim(memname) as TableName, trim(name) as
ColName, label as ColLabel from
sashelp.vcolumn quit
SAS
17
PROC SQL Tips and Translations for Data Step
UserDictionaries Getting variable names
  • proc sql
  • / get names of all variables you want /
  • select name into drinkvars separated by ', '
  • from dictionary.columns
  • where libnameAUG' and memname'DEMOG' and
  • lowcase(name) contains ndrk'
  • / use your newly created macro variable in your
    select statement /
  • create table drinks as
  • select drinkvars
  • from aug.demog
  • quit

18
PROC SQL Tips and Translations for Data Step
UserDictionaries Getting variable names
  • proc sql
  • / add the table alias to the front of each
    variable name
  • as you create your macro variable /
  • select 'd.'name into aliasvars separated by ',
    '
  • from dictionary.columns
  • where libname'AUG' and memname'DEMOG' and
  • lowcase(name) contains ('ndrk')
  • / do your merge or whatever using the macro
    variable you just created /
  • create table newtable as
  • select aliasvars, c.expcategory
  • from aug.demog as d left join aug.exposure as c
  • on d.jcml_idc.jcml_id
  • quit

19
PROC SQL Tips and Translations for Data Step
UserViews
  • Views are virtual tables
  • Created with CREATE VIEW statement
  • Can be used as if they are normal physical tables
  • Enhance security
  • can construct a view of only fields and rows that
    user is allowed to view
  • Enhance ease-of-use
  • Can combine rows and columns from multiple tables
    into a single view
  • Facilitate data integrity
  • Can have several views on the same table, but
    only have to update the base table
  • Users always see up-to-date data

proc sql create view aug.testview as
select d.subject_id, d.case_id, d.age, e.job_num,
e.exposure_element from aug.demog as d,
aug.exposure as e where d.subject_ide.subje
ct_id quit
20
PROC SQL Tips and Translations for Data Step
Users Creating Data Source Indicators
dads
faminc
Obs famid name inc
1 2 Art 22000
2 1 Bill 30000
3 3 Paul 25000
4 4 Karl 95000
Obs famid faminc96 faminc97 faminc98
1 3 75000 76000 77000
2 1 40000 40500 41000
3 2 45000 45400 45800
4 5 55000 65000 70000
5 6 22000 24000 28000
proc sort datadads outsorted_dads by famid
run proc sort datafaminc outsorted_faminc
by famid run data ds_fj merge
sorted_dads(inin1) sorted_faminc(inin2) by
famid if in1 and in2 then indic1 else
indic0 dadindin1 famindin2
fidfamid run
proc sql create table sql_fj as select ,
(dads.famidfaminc.famid) as indic,
(dads.famid .) as dadind,
(faminc.famid .) as famind,
coalesce(dads.famid, faminc.famid) as fid from
dads full join faminc on dads.famidfaminc.famid
quit
21
PROC SQL Tips and Translations for Data Step
UsersFull Join - cont
Obs famid name inc faminc96 faminc97 faminc98 indic dadind famind fid
1 1 Bill 30000 40000 40500 41000 1 1 1 1
2 2 Art 22000 45000 45400 45800 1 1 1 2
3 3 Paul 25000 75000 76000 77000 1 1 1 3
4 4 Karl 95000 . . . 0 1 0 4
5 . . 55000 65000 70000 0 0 1 5
6 . . 22000 24000 28000 0 0 1 6
Sql_fj
Obs famid name inc faminc96 faminc97 faminc98 indic dadind famind fid
1 1 Bill 30000 40000 40500 41000 1 1 1 1
2 2 Art 22000 45000 45400 45800 1 1 1 2
3 3 Paul 25000 75000 76000 77000 1 1 1 3
4 4 Karl 95000 . . . 0 1 0 4
5 5 . 55000 65000 70000 0 0 1 5
6 6 . 22000 24000 28000 0 0 1 6
Ds_fj
22
PROC SQL Tips and Translations for Data Step
UsersAdditional Uses
Scenario For a case/control study, verify that
all controls have age within 5 to 10 years of the
related case age.
  • proc sql
  • title 'Bad Control Matches'
  • select c.subject_id, c.casenum, c.gender, age as
    CntlAge label'CntlAge',
  • (select age from cases where
    subject_idc.casenum) as CaseAge,
  • abs(cntlage-calculated caseage) as AgeDiff
  • from controls as c left join demog as d
  • on c.subject_idd.subject_id
  • where (not within5 and not within10)
  • quit

23
PROC SQL Tips and Translations for Data Step
UserMerging Multiple Tables
Obs name class miss_grade
1 Mary Art none missing
2 Olive Art none missing
3 Quincy Art none missing
4 Nat Art none missing
5 Pat Art none missing
6 Quincy Music missing 4
7 Richard Music none missing
8 Mary Math none missing
9 Nat Math none missing
10 Olive Math none missing
11 Pat Math none missing
12 Quincy Math missing 2
13 Richard Math none missing
14 Mary English none missing
15 Nat English none missing
16 Olive English none missing
17 Pat English none missing
18 Quincy English none missing
19 Richard English none missing
proc sql create table sql_c3 as select
a.name, b.class, case
when missing(c.grade1) then "missing 1
when missing(c.grade2) then "missing 2
when missing(c.grade3) then "missing 3
when missing(c.grade4) then "missing
4" else "none missing end as
miss_grade from indat_a as a, indat_b as b,
indat_c as c where c.classidb.classid and
c.perida.perid quit
24
  • SAS and all other SAS Institute Inc. product or
    service names are registered trademarks or
    trademarks of SAS Institute Inc. in the USA and
    other countries. indicates USA registration.
Write a Comment
User Comments (0)
About PowerShow.com