Acquiring Data via FTP in SAS - PowerPoint PPT Presentation

1 / 10
About This Presentation
Title:

Acquiring Data via FTP in SAS

Description:

Goal: Create SAS Dataset from Files. Downloaded from an FTP Server ... 1000045|NICHOLAS FINANCIAL INC|SC 13G|2005-10-28|edgar/data/1000045/0000950134-05-019959.txt ... – PowerPoint PPT presentation

Number of Views:184
Avg rating:3.0/5.0
Slides: 11
Provided by: jzei
Category:
Tags: ftp | sas | acquiring | create | data | plotter | via

less

Transcript and Presenter's Notes

Title: Acquiring Data via FTP in SAS


1
Acquiring Data via FTP in SAS
  • James Zeitler
  • Baker Research Services
  • Harvard Business School
  • jzeitler_at_hbs.edu

2
Goal Create SAS Dataset from Files Downloaded
from an FTP Server
Master index of reporting forms filed with
Securities and Exchange Commission (SEC) during
the fourth quarter of 2005
3
The Master Index File
Description Master Index of EDGAR
Dissemination Feed Last Data Received
December 31, 2005 Comments
webmaster_at_sec.gov Anonymous FTP
ftp//ftp.sec.gov/edgar/ CIKCompany
NameForm TypeDate FiledFilename ---------------
--------------------------------------------------
--------------- 1000032BINCH JAMES
G42005-12-13edgar/data/1000032/0000720896-05-00
0046.txt 1000045NICHOLAS FINANCIAL
INC10-Q2005-11-10edgar/data/1000045/0000950144-
05-011577.txt 1000045NICHOLAS FINANCIAL
INC8-K2005-10-20edgar/data/1000045/0000950144-0
5-010504.txt 1000045NICHOLAS FINANCIAL INCSC
13G2005-10-28edgar/data/1000045/0000950134-05-01
9959.txt
  • Header
  • Description of file contents
  • Date of last filing received
  • Where to send comments and questions
  • FTP server address
  • Field names
  • Data
  • CIK (Central Index Key)
  • Company Name
  • Form type (E.g. Form 4, 10-Q)
  • Date filed
  • File name
  • Pipe () delimited

4
Reading the Master Index File (1)
data work.master_2005Q4 (drop TextTest)
infile "C\Documents and Settings\My
Documents\BASUG\CC_200912\master.idx" missover
lrecl 256 pad dlm '' dsd length
Description 60 LastDataDate 8
Comments 20 AnonymousFTP
30 CIK 12 CompanyName
80 FormType 12
DateFiled 8 Filename 80
retain Description "" LastDataDate .
Comments ""
AnonymousFTP "" informat DateFiled
yymmdd10. LastDataDate wrddat.
format DateFiled LastDataDate date9.
5
Reading the Master Index File (2)
input _at_1 TextTest 1-24 _at_ if TextTest
"Description" then input Description
24-77 else if TextTest "Last Data
Received" then input _at_24 LastDataDate
else if TextTest "Comments" then
input Comments 24-42 else if
TextTest "Anonymous FTP" then input
AnonymousFTP 24-52 else if TextTest ""
then delete else if TextTest
"CIKCompany" then delete else if
TextTest "----------" then delete
else do input _at_1 CIK CompanyName
FormType DateFiled
FileName output end run
  • Tricks
  • RETAIN header information
  • Read TextTest with trailing _at_
  • TextTest indicates record type
  • Tricks
  • Read different record types differently
  • OUTPUT one record for each data row

6
Using the FTP Access Method on the FILENAME
Statement (1)
filename mastidx ftp "master.idx" cd
"/edgar/full-index/2005/QTR4/" host
"edgar.sec.gov" user "anonymous"
pass ltmyemailaddressgt" run data
work.master_2005Q4 (drop TextTest) infile
mastidx missover lrecl 256 pad dlm ''
dsd ..........................................
  • The filename statement links the file reference
    (mastidx) to a file on the FTP server
  • The ftp file access method indicates that the
    referenced file is on an FTP server
  • master.idx names the file
  • cd (change directory)indicates the directory
    where the file is located on the server
  • host specifies the server address
  • user and pass log the user onto the FTP server
  • The infile statement refers to the fileref
    (mastidx) defined on the filename statement

7
Using the FTP Access Method on the FILENAME
Statement (2)
NOTE 220 FTP server ready. NOTE User anonymous
has connected to FTP server on Host www.sec.gov
. NOTE The infile MASTIDX is
Filenamemaster.idx, Pathname
"/edgar/full-index/2005/QTR4" is the current
director, Local Host Namejzeitler,
Local Host IP addr10.8.2.10, Service
Hostname Namewww.sec.gov, Service IP
addr162.138.185.32, Service
NameFTP,Service Portno21,Lrecl256,
RecfmVariable NOTE 240580 records were read
from the infile MASTIDX. The minimum record
length was 1. The maximum record length was
212. NOTE The data set WORK.MASTER_2005Q4 has
240570 observations and 9 variables. NOTE DATA
statement used (Total process time) real
time 122.29 cpu time
2.29 seconds
8
Reading Multiple Index Files
  • Use macro variables to point to the appropriate
    directories on the FTP host
  • yyyy cycles through the years from 1993 through
    2009
  • q cycles through the quarters 1 through 4 for
    each year
  • Use PROC APPEND to concatenate each temporary
    dataset after its read.
  • The variables YYYY and Q , based on the values of
    the macro variables, indicate the year and
    quarter each observation comes from. I can use
    these later to build directory names.

macro readidx do yyyy 1993 to 2009 do q
1 to 4 filename mastidx ftp "master.idx"
cd "/edgar/full-index/yyyy./QTRq./"
host "edgar.sec.gov" user
"anonymous" pass "jzeitler_at_hbs.edu" run
data work.master_yyyyqq(drop TextTest)
infile mastidx missover lrecl 256 pad dlm
' dsd
input _at_1 CIK
CompanyName
FormType DateFiled
FileName YYYY
"cmpres(yyyy.)" Q
"cmpres(q.)" output end
run proc append base work.master_idx
data work.master_yyyyqq force
run filename mastidx clear
end end mend readidx
9
More on Using FILENAME
  • Device types include
  • DDE
  • DISK
  • DUMMY
  • GTERM
  • PIPE
  • PLOTTER
  • PRINTER
  • TAPE
  • TEMP
  • TERMINAL
  • Access methods include
  • CATALOG
  • CLIPBOARD
  • EMAIL
  • FTP
  • SOCKET
  • URL
  • FILENAME fileref CLEAR_ALL_ CLEAR
  • FILENAME fileref LIST_ALL_ LIST

10
The User-Created wrddat INFORMAT

1. Here's an ingenious
way to read the date in October 6, 2009 format
by creating a user-defined informat
Taken from

Paper 101-27
Have a
Strange DATE? Create your own INFORMAT to Deal
with Her Venky Chakravarthy, Ann
Arbor, MI
http//www2.sas.com/proceedings/sugi27/p101-27.pd
f
data
work.WordDate retain fmtname "wrddat"
type "I" do Label "01Jan1960"d to
"31DEC2009"d Start trim(left(put(Label,wor
ddate19.))) output end run proc format
cntlin work.WordDate run
Write a Comment
User Comments (0)
About PowerShow.com