Why STATA - PowerPoint PPT Presentation

Provided by: michael748
1
Why STATA?
You have a Question
2
You have the Data

[Graphic: raw binary data]
You need The Answer
You have a Question
3
Data analysis software paradigm
HYS or BRFSS

[Graphic: raw binary data]
STATA software
Your Answer
Your Question
4
Data analysis software paradigm
DATA

[Graphic: raw binary data]
STATA software
Commands
Output
5
OK STATA, really?
6
Why not SPSS, SAS, SUDAAN or Excel?
  • STATA is cheaper (perpetual license)
  • STATA works easily with survey data (complex
    sampling designs)
  • STATA works fairly easily with data, in general
  • I only know STATA
  • STATA is cool

7
Orientation to STATA
8
What You Will Learn
  • How to Open and Close STATA
  • What STATA's windows include
  • What options are available on the Shortcut Bar
  • What options are available in the Drop Down Menus

9
Opening STATA
  • Like any other Windows app
  • Can be opened from the Start menu
  • Shortcut on desktop
  • Shortcut on task bar
  • Do not double-click a STATA data file to open
    STATA: it will open STATA, but most likely not
    your data file (you need to set STATA's reserved
    memory capacity first)

10
Closing STATA
  • Like any other Windows app
  • Using menus: File > Exit
  • Click on the X in the upper right of the window
  • Either option will prompt you to save the work on
    your data
  • NOTES
  • Always be careful when saving STATA data files:
    you may be saving changes that you don't want to
    keep. To be safe, save under another file
    name and always keep your original.

11
Look at STATA's Windows
  • Results
  • Biggest window (black background)
  • Where all the results appear
  • Command
  • Small window under Results window
  • Where you type in your STATA commands

12
Look at STATA's Windows
  • Review
  • Medium window in upper left corner
  • Where STATA documents every command you type into
    the Command window, including erroneous
    commands
  • Variable
  • Medium window in lower left corner
  • Where all the variables of an open dataset appear
    with variable descriptions

13
The Shortcut Bar
  • Functions like all Windows apps
  • Shortcut icons underneath menu bar
  • Clicking on a shortcut icon once will activate it
  • Holding your cursor over the icon tells you what
    it is
  • Not all shortcuts are useful

14
The Shortcut Bar (key features)
15
The Shortcut Bar (other features)
16
Drop Down Menus
  • Functions like all Windows apps
  • Menu bar is at the top of the window
  • Navigation through menu options works like
    all Windows apps
  • Most menus related to STATA commands are more
    trouble than they are worth

17
Drop Down Menu (key features)
18
Drop Down Menus (other features)
19
Start to use STATA to look at Healthy Youth Survey
20
What You Will Learn
  • How to use a Log file (records what you do)
  • How to open Data files
  • How to save Data files
  • How to explore variables
  • Generating new variables
  • Collapsing and recoding variables
  • Labeling variables

21
Before we begin: How we will be presenting this
to you
  • What is STATA language?
  • The stuff we type into the command window to tell
    STATA what to do
  • How we will learn STATA: via examples
  • On the slides, STATA language is in
  • typewriter font, AKA Courier font.

22
As we learn STATA, let's think about a research
question
How many students in each grade report they
smoked cigarettes on any days in the past 30
days?
23
Before Opening Data Files
  • Set STATA's storage capacity first
  • Usually setting 100 megabytes is okay
  • (Basically reserves memory from the system for
    opening data files)
  • set memory 100000k
  • OR
  • set mem 100m
  • NOTE: This step applies to Stata 11 and earlier;
    Stata 12 and later manage memory automatically

24
Log files: Keeping Track of What You Do
  • Log files document all your actions in STATA.
  • There are 2 types
  • .log files - open in Notepad, WordPad, MS
    Word, and other text editors
  • .smcl files - pronounced to rhyme with
    "pickle" - open in STATA only - great for copying
    and pasting tables into Excel
  • .log is recommended for general, portable
    documentation

25
Using and Manipulating Log files
  • Opening log files
  • Click the brown book icon in the toolbar
  • Then, in the Save dialog window, select .log from
    "Save as type"

26
Using and Manipulating Log files
  • Closing log files
  • Click the brown book icon in the toolbar and select
    "Close log file" OR
  • Type log close
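
The same open-and-close steps can also be typed as commands instead of clicking the book icon. This is a minimal sketch; the file name hys_session.log is a made-up example, not a name from the slides.

```stata
* Close any log left open from an earlier session (no error if none is open)
capture log close

* Start a plain-text .log file; "replace" overwrites an existing file of that name
* (hys_session.log is a hypothetical file name)
log using "hys_session.log", text replace

* Commands and results from here on are recorded in the log
display "Log file is recording"

* Stop recording and close the file
log close
```

Leaving out the text option would produce a .smcl log instead.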

27
Opening Data Files
  • 3 options
  • Two point and click options (The easiest)
  • Use menu option
  • File > Open
  • Click folder icon in toolbar
  • Line command - need filepath and filename
  • use "filepath\filename", clear
  • OKAY, let's open state 2008.dta

28
Saving Data Files
  • 3 options
  • Two point and click options (easiest)
  • Use menu option
  • File > Save As
  • Click disk icon in toolbar
  • NOTE: STATA will ask you if you want to save
    over the old file or create a new one
  • Line command - need filepath and filename
  • NOTE: STATA will not ask before overwriting an
    old file if the replace option is specified
  • save "filepath\filename" (add , replace to
    overwrite an existing file)

29
Saving Data Files
  • It is important to have a backup dataset and a
    working dataset.
  • You may (will) accidentally save over an old
    dataset and permanently change your data
  • Okay, let's save a working Healthy Youth Survey
    dataset
  • Save as state 2008 working.dta
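
The backup-plus-working-copy habit can be sketched with line commands. The HYS file is not assumed to be available here, so this example uses the auto dataset that ships with Stata as a stand-in; the file name "auto working.dta" is made up for illustration.

```stata
* Load a dataset that ships with Stata (stand-in for state 2008.dta)
sysuse auto, clear

* Save a working copy under a new name so the original stays untouched;
* quotes are needed because the file name contains a space
save "auto working.dta", replace

* Reopen the working copy; "clear" discards whatever is in memory first
use "auto working.dta", clear

* Quick check of what was loaded
describe, short
```

Without the replace option, save stops with an error if the file already exists, which is the safer default when you are unsure.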

30
Exploring Variables
  • Describing a variable using codebook
  • General info on variable, missing values, some
    labeling info, datatype
  • Example
  • codebook d14

31
Exploring Variables
  • More descriptive information using tab
  • Distribution of the values of a variable
    (percents)
  • Example
  • tab d14

32
Exploring Variables
  • Can't find the variable?
  • Use the data dictionary/codebook
  • Scroll through the variable window
  • Use a command
  • aorder alphabetizes variable names
  • You can search the variable list via key words
  • lookfor <type a key word>
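
The exploring commands from the last three slides can be tried on a tiny made-up dataset. The six rows and the variable label below are hypothetical stand-ins for the HYS file; only the variable name d14 comes from the slides.

```stata
* Build a small toy dataset (values are invented for illustration)
clear
input d14 grade
1 6
1 8
2 8
3 10
6 10
. 12
end
label variable d14 "Days smoked cigarettes, past 30 days"

* General information: type, range, missing values, labels
codebook d14

* Distribution of values; add "missing" to show the missing row too
tab d14
tab d14, missing

* Alphabetize the variable list
aorder

* Search variable names and labels for a key word
lookfor smoke
```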

33
What was our research question again?
How many students in each grade report they
smoked cigarettes on any days in the past 30
days?
34
Generating, Collapsing & Recoding Variables
  • Make a variable for tinkering using gen
  • Example: gen smokers = d14
  • Note
  • Use the tab command to check whether your new
    variable came out like you planned
  • tab d14 smokers

35
Now, let's change the coding of smokers
Generating, Collapsing & Recoding Variables
Change from: codebook d14
   Freq.   Numeric   Label
   26597         1   none
     943         2   1-2 days
     405         3   3-5 days
     301         4   6-9 days
     489         5   10-29 days
     689         6   all 30 days
     922         .
Change to: codebook smokers
   Freq.   Numeric   Label
   26597         0   No
    2827         1   Yes
     922         .
36
Generating, Collapsing & Recoding Variables
  • Modifying a tinkering variable using recode
  • Example
  • recode smokers 1=0 2=1 3=1 4=1 5=1 6=1

37
Generating, Collapsing & Recoding Variables
  • Let's check our recode
  • Use the tab command to check to see if your new
    variable came out like you planned
  • tab d14 smokers
  • NOTE New variables are listed at the bottom of
    the list in the Variables window

38
Generating, Collapsing & Recoding Variables
  • Shortcut - tinkering variable using recode
  • drop smokers
  • gen smokers = d14
  • shortcut
  • recode smokers 1=0 2/6=1
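
The copy, recode, and check sequence can be run end to end on a toy dataset. The seven rows below are made up (one per d14 code plus a missing value); the parenthesized rule form is an equivalent way of writing the recode rules.

```stata
* Toy data: one row per d14 code, plus one missing value (invented rows)
clear
input d14
1
2
3
4
5
6
.
end

* Copy d14 so the original stays untouched
gen smokers = d14

* Collapse: code 1 (none) becomes 0, codes 2 through 6 become 1
recode smokers (1=0) (2/6=1)

* Cross-check the new variable against the old one, including missing
tab d14 smokers, missing
```

Missing values are left as missing because no rule mentions them.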

39
Generating, Collapsing & Recoding Variables
  • NOTE regarding drop/keep
  • drop <variable name> - will drop the variable
    from your dataset; no undos
  • keep <variable name> - will drop all variables
    from your dataset except the variable you specify;
    no undos
  • drop if <conditional statement> - will drop all
    respondents from the dataset that do meet these
    criteria; no undos
  • keep if <conditional statement> - will drop all
    respondents from the dataset that do not meet these
    criteria; no undos
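
Because drop and keep cannot be undone, one cautious pattern is to wrap an experiment in preserve and restore, which take a temporary snapshot of the data in memory. This is a sketch on made-up rows, not a step from the original slides.

```stata
* Toy data (invented rows)
clear
input d14 grade
1 8
2 8
1 10
3 10
6 10
. 12
end

* Take a temporary snapshot of the data in memory
preserve

* Keep only 10th graders; everyone else is dropped
keep if grade == 10
count

* Bring back the snapshot: all six rows return
restore
count
```

preserve protects only the data in memory; it does not protect a file on disk from being overwritten by save.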

40
Generating, Collapsing & Recoding Variables
  • Relational operators that are used for if
    statements; especially handy for creating a
    variable from two different variables
  • > greater than
  • < less than
  • >= greater than or equal to
  • <= less than or equal to
  • == equal
  • ~= not equal
  • != not equal
  • Also & (and), | (or)

41
Generating, Collapsing & Recoding Variables
  • Using Relational operators
  • Example
  • gen nosmokers = 1 if (d14 == 1)
  • replace nosmokers = 0 if (d14 > 1 & d14 <= 6)
  • NOTE: STATA treats missing values as larger than
    any number, so d14 > 1 alone would also match
    missing values.

42
Generating, Collapsing & Recoding Variables
  • Using Relational operators (continued)
  • Example
  • gen nosmokers = 1 if (d14 == 1)
  • replace nosmokers = 0 if (d14 > 1 & d14 <= 6)
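
The point about missing values is easy to see on a few made-up rows: a bare "greater than" test also catches the missing value, which is why the example puts an upper bound on d14.

```stata
* Toy data: three valid codes and one missing value (invented rows)
clear
input d14
1
3
6
.
end

* Missing counts as larger than any number, so this matches 3 rows:
* the values 3 and 6, plus the missing value
count if d14 > 1

* Conditional recode with an upper bound that leaves missing untouched
gen nosmokers = 1 if d14 == 1
replace nosmokers = 0 if d14 > 1 & d14 <= 6

* Check the result, showing missing values
tab d14 nosmokers, missing
```

Writing the condition as d14 > 1 & !missing(d14) gives the same result here and does not depend on knowing the highest valid code.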

43
Generating, Collapsing & Recoding Variables
  • Let's check our conditional recode
  • Use the tab command to check to see if your new
    variable came out like you planned
  • tab d14 nosmokers

44
Labeling Variables
  • Labeling of variable using label variable
  • Example
  • label variable smokers "Current cigarette smoker"
  • NOTE: the label will appear in the codebook
    results in the upper right-hand corner. It will also
    appear in the Variables window

45
Labeling Variables
  • Generating a value label using label define
  • Example
  • label define yesno 1 "Yes" 0 "No"
  • NOTE: This is a named set of value labels that can
    be reused for multiple variables in your dataset

46
Labeling Variables
  • Attaching the value label to the variable using
  • label values
  • Example
  • label values smokers yesno
  • NOTE: you can call up all labels with label list
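
The three labeling steps fit together as follows. The four rows of smokers are made up for illustration; the label text and the yesno name come from the slides.

```stata
* Toy data (invented rows)
clear
input smokers
0
1
1
.
end

* 1) Describe the variable itself
label variable smokers "Current cigarette smoker"

* 2) Define a reusable set of value labels named yesno
label define yesno 1 "Yes" 0 "No"

* 3) Attach that set to the variable
label values smokers yesno

* Show the definition, then a table that now prints Yes/No
label list yesno
tab smokers
```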

47
Back to our research question
How many students in each grade smoke cigarettes?
tab smokers grade
48
Exercise 1: Generating, Collapsing, Recoding,
and Labeling
49
Running Basic Frequencies
50
What You Will Learn
  • Setting up STATA for survey analysis
  • Running basic frequencies

51
Preparing Survey Data for Analysis (finally)
  • For survey analysis, STATA needs to know about
  • 1) weights and 2) design information
  • NOTE: you should always check to see if STATA
    already knows
  • Here is how
  • Example: svyset
  • This command will tell us about weights, strata
    and primary sampling units (PSUs)

52
Setting up for HYS Analysis
  • For HYS analysis, your weighting and design
    depend on what type of data you have.
  • Weighting
  • For state, county, district and school building
    level analysis there is no weighting, so we
    create a fake weight that is equal to 1
  • gen fakewt = 1

53
Setting up for HYS Analysis
  • For HYS analysis, your weighting and design
    depend on what type of data you have.
  • Design information
  • For sampled data, psu = schgrd
  • For example, 2008 State sample,
  • King, Pierce, Snohomish (all grades),
  • Clark (6th & 8th)
  • Spokane, Thurston (6th)
  • For other counties, districts & buildings, psu =
    students
  • STATA defaults to individual students

54
Setting up for HYS Analysis
  • Here is how to tell STATA about your data
  • For state sample and sampled counties
  • svyset [pweight=fakewt], psu(schgrd)
  • For other counties, districts and school
    buildings
  • svyset [pweight=fakewt]
  • For ESDs: LOOK IN THE WA HYS Data Analysis
    Technical Assistance Manual (website link on
    last slide of Day 3)
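
A sketch of the state-sample setup on made-up rows. Current Stata versions put the PSU variable directly after svyset, so that form is used here rather than the psu() option shown on the slide; the six rows and their schgrd codes are invented.

```stata
* Toy data: three school-grade clusters with two students each (invented)
clear
input schgrd smokers
101 0
101 1
102 0
102 0
103 1
103 0
end

* No real weights, so create a fake weight equal to 1
gen fakewt = 1

* Declare the design: schgrd is the PSU, fakewt is the sampling weight
svyset schgrd [pweight = fakewt]

* With no arguments, svyset reports what has been declared
svyset
```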

55
Setting up for HYS Analysis
  • One last note about STATA and weights and design
    information
  • Strata and psu do not have to be designated in
    order for the svyset command to execute.
  • STATA will therefore not warn you if they are left
    out, and excluding these design variables could
    yield erroneous standard errors and confidence
    intervals.

56
Setting up for Analysis
  • How to change weighting and design variables
  • Clear it all out
  • svyset, clear
  • Then redefine weighting & design variables using
    svyset
  • Notes
  • STATA will remember your weighting and design
    variable designations as long as the data file is
    open
  • If data are saved after designation, then STATA
    will remember your designation next time.
  • Always type in svyset to find out what has been
    designated
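
The clear-and-redefine cycle, and the note that settings travel with a saved file, can be sketched like this. The rows and the file name svy_demo.dta are made up; svyset _n is the current-syntax way to declare individual observations as the sampling units.

```stata
* Toy data (invented rows)
clear
input schgrd smokers
101 0
101 1
102 0
102 0
103 1
103 0
end
gen fakewt = 1

* First designation: clusters as PSUs
svyset schgrd [pweight = fakewt]

* Clear it all out, then confirm nothing is set
svyset, clear
svyset

* Redefine: each observation (student) is its own sampling unit
svyset _n [pweight = fakewt]

* Save, reopen, and check that the designation came back with the file
* (svy_demo.dta is a hypothetical file name)
save "svy_demo.dta", replace
use "svy_demo.dta", clear
svyset
```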

57
Okay, Some useful survey data analysis commands
58
Before we get started, let's use a research
question again
What is the percent of 10th graders who smoke
cigarettes?
59
Running Basic Frequencies
  • One-way weighted tabulations with svytab
  • Example
  • svytab d14use if grade == 10

60
Running Basic Frequencies
  • Special NOTE re: use of if
  • Example
  • svytab d14use if grade == 10

NOTE: using an if statement in a tabulation
command may yield inaccurate standard errors
depending on sampling design. For HYS, grade was
part of the psu sampling strategy, so a
conditional statement here is okay. Normally
this is not recommended for survey data.
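
For designs where an if statement is not safe, Stata's svy prefix has a subpop() option that restricts the estimate while keeping every PSU in the variance calculation. The sketch below uses svy: tabulate, the current name for what these slides call svytab, on made-up rows; it is an illustration, not part of the original HYS instructions.

```stata
* Toy data: five clusters across two grades (invented rows)
clear
input schgrd grade smokers
1 10 0
1 10 1
2 10 0
2 10 0
3 10 1
3 10 1
4 8 0
4 8 0
5 8 1
5 8 0
end
gen fakewt = 1
svyset schgrd [pweight = fakewt]

* Subpopulation estimate: 10th graders only, all PSUs kept for the variance
svy, subpop(if grade == 10): tabulate smokers, se

* The slides' approach: drop everyone else from the calculation with "if"
svy: tabulate smokers if grade == 10, se
```

When the condition lines up with how the PSUs were drawn, as the note says grade does for HYS, restricting with if is acceptable.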
61
Output Formatting Options
  • Output options follow a comma after your
    svytab statement
  • col or row specify the direction of the
    proportional tabulation
  • ci specifies to include asymmetrical confidence
    intervals
  • se specifies to include the standard error
    (used for calculating symmetrical confidence
    intervals)
  • obs specifies that numbers of actual
    respondents are included in the results

62
Output Formatting Options
  • There are a number of different codes that you
    can include to format your STATA output
  • per will produce estimates as percents
  • format(%3.1f) will produce estimates rounded to
    one decimal place

63
Running Basic Frequencies
  • Let's add some options
  • Example
  • svytab d14use if grade == 10,
  • col ci se obs per
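
The options can be seen working together in a two-way table, where col has a visible effect. This sketch uses svy: tabulate (the current name for svytab) with the options spelled out in full, on made-up rows.

```stata
* Toy data: five clusters across two grades (invented rows)
clear
input schgrd grade smokers
1 10 0
1 10 1
2 10 0
2 10 0
3 10 1
3 10 1
4 8 0
4 8 0
5 8 1
5 8 0
end
gen fakewt = 1
svyset schgrd [pweight = fakewt]

* Column percents of smokers within each grade, with standard errors,
* confidence intervals, and respondent counts, shown to one decimal place
svy: tabulate smokers grade, column ci se obs percent format(%3.1f)
```

With so few clusters the intervals are very wide; the point is only to show where each option's output lands in the table.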

64
Standard error versus confidence interval
  • Standard error: designated in option of
    tabulation command with se
  • Noted in output as value in parentheses
  • (in crosstabs)
  • Used to create a margin of error
  • Multiply standard error by 1.96 to get 95% margin
    of error
  • Can be used to create symmetric confidence
    intervals using arithmetic and point prevalence
  • May cross 0% or 100%

65
Standard error versus confidence interval
  • Confidence interval: designated in option of
    tabulation command with ci
  • Noted in output as values in square brackets
  • (in crosstabs)
  • Calculated as asymmetric confidence intervals;
    never crosses 0% or 100%
  • Preferred for comparing confidence interval
    overlap to detect differences
  • Not as easy to communicate to lay-person
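
The difference between the two kinds of interval can be worked through by hand. The numbers below are hypothetical (a 2% prevalence with a standard error of 1.5 percentage points), chosen so the symmetric interval dips below zero. The asymmetric interval is built on the logit scale, which is the general approach Stata's survey tabulation takes; the fixed 1.96 multiplier is a simplification for illustration, since Stata's own output uses a critical value based on the survey design.

```stata
* Hypothetical estimate and standard error, as proportions
scalar p_hat  = 0.02
scalar se_hat = 0.015

* Symmetric interval: estimate plus or minus 1.96 standard errors
scalar lo_sym = p_hat - 1.96*se_hat
scalar hi_sym = p_hat + 1.96*se_hat

* Asymmetric interval: build it on the logit scale, then transform back
scalar se_logit = se_hat / (p_hat*(1 - p_hat))
scalar lo_asym  = invlogit(logit(p_hat) - 1.96*se_logit)
scalar hi_asym  = invlogit(logit(p_hat) + 1.96*se_logit)

display "Symmetric:  " %6.4f lo_sym  " to " %6.4f hi_sym
display "Asymmetric: " %6.4f lo_asym " to " %6.4f hi_asym
```

The symmetric lower bound comes out negative, which is impossible for a prevalence, while the asymmetric bounds stay between 0 and 1.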

66
Exercise 2: Survey Analysis - Running basic
frequencies