Ch. 2: Getting Your Data into SAS - PowerPoint PPT Presentation

1 / 33
About This Presentation
Title:

Ch. 2: Getting Your Data into SAS

Description:

At OSEDA we try to ease the burden of converting data by ... But it is still important that you know how to convert your own. ... Before v7 this was heresy. ... – PowerPoint PPT presentation

Number of Views:73
Avg rating:3.0/5.0
Slides: 34
Provided by: Blo100
Category:
Tags: sas | data | getting | heresy

less

Transcript and Presenter's Notes

Title: Ch. 2: Getting Your Data into SAS


1
Ch. 2 Getting Your Data into SAS
  • At OSEDA we try to ease the burden of converting
    data by having data already in SAS data set
    format in the archive.
  • But it is still important that you know how to
    convert your own.
  • If you understand input statements and proc
    import, youll be well on your way to how to use
    put statements and proc export.

2
Four Basic Methods
  • Direct data entry via Viewtable or SAS/FSP
    windows.
  • Reading raw data files (ascii) with input
    statements.
  • Converting files from other packages such as dbf,
    xls, wkn -- usually via proc import.
  • Accessing data in other DB formats directly via
    special engines that make them look like SAS data
    sets. (Oracle access, for example.)

3
Entering Data Directly via Windows
  • Not used much here. This is usually done as part
    of an on-line transaction system, which is not
    something we do much with SAS.
  • It can be done, however, and can be quite
    sophisticated.

4
Reading Raw Data with Input Stmts
  • This has been traditionally the most important
    way we capture our data.
  • This requires the most knowledge of SAS language
    to do. Sometimes the format of the data makes it
    quite challenging.
  • Reading .csv files has become very important and
    common. SAS makes it pretty easy.

5
Converting xls, dbf, etc.
  • DDE (Windows only) option. Requires having the
    other application running during conversion.
  • Proc import and the Import Data Wizard is
    easiest.
  • Exporting the other applications data to csv and
    then importing remains an option.

6
Reading SPSS, Oracle, etc.
  • We now have special engines that can be used to
    make things such as Oracle tables look just like
    a SAS data set.
  • We have had good success accessing Oracle tables
    via the engine.
  • Nice part of doing it this way is that if the
    source data changes you do not have to reconvert.
    Sometimes, thats a problem.

7
Reading Raw Data Files
  • For small data collections you can imbed the data
    lines in your own code, preceded by a datalines
    statement.
  • Most of the time youll be dealing with data
    stored in external files.
  • The infile statement is used to point SAS to the
    file to be read. The input statement does the
    actual reading.

8
Accessing the Sample Code
  • Go to URLhttp//ftp.sas.com/samples/A56649
  • Cut and paste to get individual programs

9
Long Records
  • An annoying gotcha is the default of truncating
    input lines at 256 bytes.
  • Easy to circumvent if you know to code
    lrecl2000 (or some large value) as option on
    the infile statement. Sets up the buffer size -
    it does not mean the records have to really be
    that long. They just cant be longer.

10
Space Separated Data
  • Avoid this except for really small sets of data
    that you are keying in yourself.
  • Too easy for something to go wrong.
  • Character data containing blanks are a big
    problem.

11
Reading Fixed Format File
  • These are files where the fields are in the
    same column locations on each input line.
  • Read using a combination of column and formatted
    read specifications.
  • A good idea is to use formatted input for
    everything. Keeps it simple.

12
Formatted vs. Column Input
  • Input id 5. 1 year 4. 1 sales 7.
  • Input id 1-5 year 7-10 sales 12-18
  • input id char5. _at_7 year 4. _at_12 sales 7.
  • input id char5. 1 year 4. 1 sales 7 -!-
  • vs. free format
  • input id year sales

13
SAS builtin formats
  • Used within input (and put) statements (and
    functions) to convert to/from external formats to
    SAS internal formats.
  • always used as 1st char of a character format
    or informat.
  • Most references to formats require a period at
    the end to distinguish what it is.
  • Input dob date7 lt--what happens?-

14
Rich Collection of Formats
  • As seen on pp. 34-35 of TLSB.
  • Most of these can be used to write data (in a put
    statement) as well as read data. They just
    reverse the conversion process.
  • Note the wide assortment of data and time related
    formats.

15
Converting Raw Data at OSEDA
  • Major purpose of creating data archive is to make
    the expensive and error-prone task of getting the
    data properly converted something that is done
    just once and by pro. programmers.
  • Goal is to replace infile/input with set
    statements as much as possible. Make
    record-layout type documentation obsolete.

16
Need to Carefully Review
  • It is extremely easy to make a mistake when
    coding complex input statements.
  • Always review the resulting data set(s) very
    carefully before going on.
  • Use Proc Contents and Proc Print in batch, or the
    corresponding interactive windows in DM to
    carefully inspect data sets.

17
SAS Data Sets
  • Absolutely essential that you become very
    comfortable with what these are, how they are
    created, referenced, etc.
  • For some reason, many new users see the way SAS
    handles data set references as really hard.
  • But its really pretty simple.

18
New in V7 Data Set Literals
  • Starting with v7 you can code data
    c\temp\boone_county
  • Before v7 this was heresy. You always had to
    define the data library separately and use a
    2-level name to reference a data set within that
    directory libname tempsas c\temp data
    tempsas.boone_county

19
One Level Data Set Names
  • References to SAS data sets that have only one
    level (I.e. no period such as save.set1) are
    assumed to be stored in a special temporary SAS
    data library named work.
  • An important exception to this is if you define
    the specific user libref. If such a library is
    defined, SAS will store and look for all 1-level
    names in that data library.

20
SAS Data Libraries
  • On Unix and Windows these are just collections of
    SAS data sets stored in directories.
  • At least it used to be that simple. With v8 we
    can now define SAS data libraries that include
    multiple directories.
  • But the latter kind are still rare around here.

21
Creating a Permanent SAS Data Set
  • Typically involves something such as libname
    mysas s\myname\mysas data mysas.data0901
  • The directory specified in quotes in the libname
    statement must be allocated outside SAS (using
    mkdir or the equivalent.)
  • The data set is stored as a system file within
    that directory with name data0901. The extension
    will vary with the engine/SAS version/platform.

22
Entering Data with VT Window
  • I have never done this or known anyone who has.
  • One of our goals at OSEDA is to be able to have a
    record of how the data we use relates to the
    original source.
  • Keying in data like this can be tricky in that it
    is way too easy to make a mistake that the
    software cannot detect.

23
Reading Multiple Input Lines
  • The input statement typically reads one line of
    data to create one observation in a SAS data set.
    However, .
  • You can use the / to tell SAS to go to the next
    input line, or 2 (or 3, etc.) to tell it to
    position itself at the 2nd line (relative to
    where it began reading.)

24
Load Data and Code for 2.12
  • Go to the sample code at http//ftp.sas.com/sample
    s/A56649
  • Search for text 2.12
  • Go to Windows to create c\MyRawData.
  • Type DM command note rawdata to open a SAS
    notepad window.
  • Copy the data from your browser and paste it to
    the notepad window.

25
Data Code for 2.12 - cont.
  • Use File-Save as to save the contents of the
    notepad window to c\MyRawData\temperature.dat
  • Go back and get the sample data step code from
    the browser. Come back to SAS and paste the code
    into the editor window.
  • Submit the code.

26
Multiple Obs Per Line of Raw Data
  • Use the double trailing _at_ to tell SAS to stay on
    this line, even across cycles of the DATA step.
    (Default is to always flush the current record at
    the end of a d-s cycle.)
  • data scores input name score
    _at__at_datalinesMike 25 Samuel 36 Melanie 40 Me
    2 proc print sum score run

27
Trailing _at_ to Read Part of Line
  • A single trailing _at_ in an input stmt says to
    leave the data pointer where it is.
  • Subsequent input statements in the same step will
    pick up where last input left off.
  • But record is released at end of data step cycle
    (unlike with double trailing _at_).
  • Allows you to read part of a record and then
    conditionally continue reading, or have
    alternative reads based on record type.

28
Infile Statement Options
  • Firstobs and obs commonly used to begin/end
    at specific locations within file.
  • Missover, truncover, and stopover (not in TLSB)
    specify how to handle case where SAS needs to go
    to a new line to complete the input read.

29
Reading Delimited (.csv) Files
  • You almost always want to use the dsd option.
    Add option dlm09x for tab-delimited files.
  • A problem with reading this way is that character
    variables may not get right length imputed.
  • Strongly recommend declaring all vars before
    reading (prior to input statement.)

30
Example of Reading csv File
  • data class
  • length stud_id 6 Name 24 Address 40 City 20
    State 2 zip 5 gpa 5
  • infile datalines dsd
  • input stud_id -- gpa
  • put / _all_
  • datalines
  • 001234, Joe Smith,123 S 5th St,Columbia,MO,65201,3
    .0
  • 003456, Mary Jones,909 Nifong, Jefferson
    City,MO,65103,3.5
  • run

31
Running the Samples
  • We have captured the sample code for the text
    (TLSB) and edited/extended it for local use.
  • See the code in file s\sas8\Ch2_Samples.sas s
    \sas8\Ch3_Samples.sas etc.

32
The copyto Macro
  • Utility macro used to copy the sample data to our
    local data directory, s\sas8\RawData
  • Uses a single positional parameter and a single
    keyword parameter.
  • Generates a simple SAS data step that reads data
    from datalines and writes a file in the RawData
    directory with name determined by the parameters.

33
Run the Samples - Ch 2
  • Use the new enhanced editor window.
  • Do a file-open and select the Ch2_Samples.sas
    program file.
  • Note that all the steps which invoke copyto have
    already been run and the results are already out
    there. But it will not hurt if you rerun.
  • You can edit the copyto macro and make your own
    personal copies which you can then play with.
Write a Comment
User Comments (0)
About PowerShow.com