The Office of Institutional Research and Assessment


OIRA, Assessment, & Quasi-Experimental Designs: An Introduction. September 2012. Reuben Ternes, OIRA.


The Office of Institutional Research and Assessment
  • OIRA, Assessment, & Quasi-Experimental Designs
    An Introduction

September 2012, Reuben Ternes, OIRA
  • Today's presentation has 3 parts:
  • 1) A little bit about me
  • 2) A little bit about OIRA
  • 3) A little bit about quasi-experimental design
    research techniques

  • My background is in Quantitative Psychology.
  • I tell most people that I am a statistician,
    since nobody really knows what a Quantitative
    Psychologist is.
  • When people ask me what I do, I tell them that I
    work at a public university as an institutional
    researcher and that it is my job to figure out
    how the university can educate students better,
    faster, smarter, stronger, and cheaper using data
    that the university routinely collects about
    its own students.

More Me
  • My specialty: Quasi-Experimental Designs
  • Regression Discontinuity Designs
  • Propensity Score Matching
  • Time Series Analysis, etc.
  • I will talk more about these later.
  • Recent research: Data Mining/Machine Learning
    Algorithms (MLAs), mostly Random Forest.
  • These are algorithmic systems designed to predict
    outcomes from large amounts of data.
  • I won't be talking about MLAs much today.

  • The Office of Institutional Research and
    Assessment (OIRA):
  • Is the keeper of much of the university's
    official data.
  • Is responsible for a great deal of federal and
    state reporting that deals with official data.
  • Tracks enrollment, retention rates, graduation
    rates, etc.
  • Projects enrollments, surveys students (NSSE,
    CIRP, etc.), provides data to outside
    constituents (legislative requests, US News, etc.)

OIRA Website
  • Lots of good stuff on our website!
  • Current and historical data on:
  • Enrollment, degrees, grade data, admissions data
  • Survey results (NSSE, CIRP), IPEDS, CDS
  • Research reports, presentations (like this one)
  • Assessment data/information/resources
  • Plans, rubrics, example reports, etc.

What Else Does OIRA Do?
  • Occasionally, we help faculty and staff with both
    research design and statistical interpretation.
  • Surveys, design of experiments, statistical
    consultations, etc.
  • OIRA also helps the UAC and GEC handle
    Assessment, as required by the NCA/HLC and other
    accrediting bodies.
  • We also do a great deal of policy analysis.

Policy Analysis
  • What kinds of policy analysis?
  • Should we recommend that all students take 16 or
    more credits in their first semester, regardless
    of their incoming academic ability?
  • If we raise our students' incoming academic
    profile, what impact will that have on retention
    and graduation rates?
  • Does need-based aid increase retention rates?
    Does merit-based aid? Which one is better at
    attracting students?
  • Policy analysis and assessment have a lot in
    common, and the same tools can be applied to both.
  • The burden of proof might not be the same, but
    assessment of student learning outcomes can be
    greatly enhanced by thinking about it in terms of
    policy analysis.

Quasi-Experimental Designs
  • Much of the policy analysis research that I do is
    focused on causation, not correlation.
  • That is, does Policy X cause a change in
    behavior Y?
  • University policy-makers are reluctant to use
    random assignment to estimate the impact of
    their policies.
  • But how can we establish causation without
    experimental designs and random assignment?
  • Quasi-experimental designs!

What is a Quasi-Experimental Design (QED)?
  • Any research design that attempts to estimate
    causation without using random assignment.
  • This is my definition. No doubt, others have
    better definitions than this.
  • The best overall introduction to quasi-experimental
    design techniques that I know of is Shadish,
    Cook, & Campbell (2002).
  • Unfortunately, it's a bit outdated...

Random Assignment and Selection Bias
  • Most quasi-experimental designs focus on the
    issue of selection bias.
  • (i.e. bias that is introduced because
    participants in the treatment condition were
    selected non-randomly).
  • There are various ways to reduce, or
    theoretically eliminate, selection bias.
  • These techniques are what the rest of this
    presentation is about.
  • I'm about to throw a lot of acronyms around.
  • When you forget what they mean, stop me, and ask
    for clarification.
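A minimal simulation can make the selection-bias problem concrete. Everything in it is invented for illustration: a hypothetical "ability" variable drives both whether a student opts into a program and the outcome, and the program's true effect is fixed at 2.0. Because stronger students self-select into the program, the naive treated-minus-control difference mixes the program effect with the ability gap.

```python
import random
import statistics


def simulate_selection_bias(n=20000, true_effect=2.0, seed=1):
    """Return the naive treated-minus-control difference for a hypothetical,
    self-selected program (all parameters are invented for illustration)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # hypothetical confounder the analyst never sees
        # Non-random selection: above-average students opt in far more often.
        opts_in = rng.random() < (0.8 if ability > 0 else 0.2)
        outcome = 3.0 * ability + (true_effect if opts_in else 0.0) + rng.gauss(0, 1)
        (treated if opts_in else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


naive = simulate_selection_bias()
print(f"true effect = 2.0, naive estimate = {naive:.2f}")
```

With these made-up settings the naive estimate lands well above 2.0; each design discussed next is a different strategy for removing that gap.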

Overview of QED
  • I'll talk about 4 possible designs:
  • Propensity Score Matching (PSM)
  • Interrupted Time Series (ITS)
  • Instrumental Variables Approach (IV)
  • Regression Discontinuity Design (RDD)

Propensity Score Matching
  • Based on Matched Design Principles.
  • Construct a control group that is equivalent to
    the treatment group.
  • (This is what random assignment (RA) does as n → ∞.)
  • Do this by finding control group members that are
    identical to treatment participants.
  • For example, if we wanted to study the impact of
    a red meat diet on health outcomes
  • We might match participants on the basis of
    Gender, Age, and Smoking habits.
  • Matching procedures work well when the number of
    matching variables is small and the variables are
    well understood.
  • (They don't actually work well for most medical

Matching Overwhelming?
  • How do you match when you have 50 variables?
  • You can't match them all exactly.
  • Which variables are most important?
  • Solution: Propensity Score Matching (PSM)
  • Assigns a propensity score to each participant.
  • Score = probability of belonging to the treatment
    group.
  • Based on all of the collective variables that are
    related to receiving treatment.
  • Easier to match on this one score rather than on
    all of the variables.
  • It's a clever way to reduce the dimensionality of
    the matching procedure.
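The deck does not show code, so the following is only a minimal sketch of the two PSM steps just described, not OIRA's procedure. It estimates propensity scores with a plain logistic regression fit by gradient ascent, pairs each treated participant with the control whose score is closest (1:1 nearest-neighbour matching, with replacement), and averages the outcome differences to estimate the effect on the treated. The data generator, its two covariates, and the true effect of 1.0 are all hypothetical. As the deck notes, real analyses need to model interactions and non-linearities carefully.

```python
import bisect
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, iters=200, lr=1.0):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.
    Each row of X starts with a 1.0 intercept term."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            resid = ti - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += resid * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def psm_att(X, t, y):
    """Average treatment effect on the treated via 1:1 nearest-neighbour
    matching on the estimated propensity score (controls may be reused)."""
    w = fit_logistic(X, t)
    score = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
    controls = sorted((score[i], y[i]) for i in range(len(t)) if t[i] == 0)
    control_scores = [s for s, _ in controls]
    diffs = []
    for i in range(len(t)):
        if t[i] != 1:
            continue
        k = bisect.bisect_left(control_scores, score[i])
        # The closest control score is just below or just above the insertion point.
        nearby = [j for j in (k - 1, k) if 0 <= j < len(controls)]
        best = min(nearby, key=lambda j: abs(control_scores[j] - score[i]))
        diffs.append(y[i] - controls[best][1])
    return statistics.mean(diffs)


def make_data(n=2000, true_effect=1.0, seed=7):
    """Hypothetical data: two standardized covariates raise both the chance of
    treatment and the outcome, so a naive comparison is biased upward."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        ti = 1 if rng.random() < sigmoid(-0.5 + 1.2 * x1 + 0.8 * x2) else 0
        X.append([1.0, x1, x2])
        t.append(ti)
        y.append(2.0 * x1 + 1.5 * x2 + true_effect * ti + rng.gauss(0, 1))
    return X, t, y


X, t, y = make_data()
naive = (statistics.mean(yi for yi, ti in zip(y, t) if ti == 1)
         - statistics.mean(yi for yi, ti in zip(y, t) if ti == 0))
att = psm_att(X, t, y)
print(f"naive = {naive:.2f}, matched ATT = {att:.2f} (true effect 1.0)")
```

Swapping `fit_logistic` for another classifier (the deck suggests Random Forest) changes only how `score` is produced; the matching step stays the same.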

PSM in Action: Visual Estimation with LOESS Smoothing Curves

PSM Pros & Cons
  • PSM creates control groups that provide less
    biased estimates of treatment effects.
  • Bias is lessened, but never eliminated.
  • Needs lots of relevant variables.
  • Missing a critical explanatory variable related
    to treatment is bad.
  • Sample size hog. Generally inefficient.
  • Results can be sensitive to the way the
    propensity score was created.
  • Usually done through logistic regression.
  • Must model interactions and non-linearities.
  • I recommend Random Forest or similar techniques
    over logistic regression due to modeling

PSM References
  • Brief Introduction: Shadish, W.R., Cook, T.D., &
    Campbell, D.T. (2002). Experimental and
    quasi-experimental designs for generalized causal
    inference (5th ed.). Boston: Houghton Mifflin.
  • Essential Reading: Dehejia, R.H., & Wahba, S.
    (2002). Propensity Score Matching Methods for
    Non-Experimental Causal Studies. Review of
    Economics and Statistics, 84(1), 151-161.
  • Primer: Heinrich, C., Maffioli, A., & Vázquez, G.
    (2010). A Primer for Applying Propensity-Score
    Matching. Impact Evaluation Guidelines, Strategy
    Development Division, Technical Notes No.
    IDB-TN-161. Washington, D.C.: Inter-American
    Development Bank.

Interrupted Time Series Analysis
  • One of the most compelling ways to show causality
    is to introduce and suspend an effect repeatedly
    while measuring an important outcome.
  • When graphed, it should look a little bit like
    the rhythm of a heartbeat.

Real Example of ITS Design
(Figure annotations: "Math 3 implemented here"; "These students take Math 2"; "These students take Math 3".)
There was a significant drop in 2-year completion
rates for Math 1 just after the implementation of
the new Remedial Math 3 course.

ITS Pros & Cons
  • Many internal threats to validity are rendered
    unlikely with time series data.
  • Visually intuitive. Easy to explain!
  • Most problems with ITS occur when there is too
    little data.
  • Either not enough years to estimate trends
  • Or low sample size which produces too much
  • Estimating effect sizes can sometimes be tricky.
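One common way to put a number on an ITS effect is the segmented regression model of Wagner et al. (2002), listed in the ITS references: Y_t = b0 + b1·time + b2·post + b3·time_after + e, where b2 is the immediate level change and b3 the change in slope. The sketch below fits that model by ordinary least squares in plain Python on a simulated yearly series; the series, the intervention period, and the size of the drop are invented and are not OIRA's Math 1 data. It also ignores autocorrelation, which a real analysis would need to check.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (rhs[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta


def segmented_design(n_periods, start):
    """Design rows: [intercept, time, post-intervention dummy, periods since intervention]."""
    return [[1.0, float(t), float(t >= start), float(max(0, t - start))]
            for t in range(n_periods)]


# Hypothetical series: 20 cohorts, intervention at period 10, true level drop of 0.08.
rng = random.Random(42)
X = segmented_design(20, start=10)
y = [0.60 + 0.005 * row[1] - 0.08 * row[2] + rng.gauss(0, 0.01) for row in X]
b0, b1, b2, b3 = ols(X, y)
print(f"level change = {b2:.3f}, slope change = {b3:.4f}")
```

Because "periods since intervention" is 0 in the first post-intervention period, b2 measures the jump at that period relative to the extrapolated pre-intervention trend.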

ITS References
  • Brief Introduction: Shadish, W.R., Cook, T.D., &
    Campbell, D.T. (2002). Experimental and
    quasi-experimental designs for generalized causal
    inference (5th ed.). Boston: Houghton Mifflin.
  • ITS Segmented Regression: Wagner, A. K.,
    Soumerai, S. B., Zhang, F., & Ross-Degnan, D.
    (2002). Segmented regression analysis of
    interrupted time series studies in medication use
    research. Journal of Clinical Pharmacy and
    Therapeutics, 27, 299-309.

Instrumental Variables
  • Used in regression.
  • Estimates the causal impact of a treatment when
    the explanatory variables are correlated with the
    regression's error term.
  • Let's go over a classic example.
  • Does smoking harm health outcomes?

IV - Example
  • We can't estimate the effect of smoking on health
    outcomes simply by adding smoking behavior and
    other relevant variables (age, gender, etc.) into
    a regression.
  • Smoking behavior could be correlated with health
    outcomes even if smoking did not cause poor health.
  • For example, smoking could be correlated with
    lower SES, which is well known to also be
    correlated with health outcomes.

IV Selecting the Instrument
  • But what if you could find something that was
    related to smoking behavior, but not to the
    health outcome (except through smoking)?
  • Taxes?
  • Taxes change the cost of smoking, which should
    change smoking behavior.
  • Taxes change by state and by time, which allows
    analysis on multiple levels.
  • Including the IV allows for (theoretically)
    unbiased treatment estimates.
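A small simulation, with every number invented, shows why an instrument helps in the smoking example. Unobserved SES lowers smoking and improves health, so regressing health on smoking mixes the two; a standardized cigarette-tax variable that shifts smoking but has no direct path to health recovers the true effect. With one instrument and no other covariates, two-stage least squares reduces to the ratio cov(tax, health) / cov(tax, smoking), which is what the sketch computes.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=-2.0, seed=3):
    """Hypothetical data: SES confounds smoking and health; tax affects health only via smoking."""
    rng = random.Random(seed)
    tax, smoking, health = [], [], []
    for _ in range(n):
        ses = rng.gauss(0, 1)                               # unobserved confounder
        z = rng.gauss(0, 1)                                 # instrument: standardized cigarette tax
        x = -0.8 * z - 0.7 * ses + rng.gauss(0, 1)          # smoking falls with tax and with SES
        yv = true_effect * x + 1.5 * ses + rng.gauss(0, 1)  # health score
        tax.append(z)
        smoking.append(x)
        health.append(yv)
    return tax, smoking, health


tax, smoking, health = simulate()
ols_slope = cov(smoking, health) / cov(smoking, smoking)  # biased: SES is omitted
iv_slope = cov(tax, health) / cov(tax, smoking)           # single-instrument IV estimate
print(f"true = -2.00, OLS = {ols_slope:.2f}, IV = {iv_slope:.2f}")
```

The IV estimate is only as good as the assumption that tax affects health solely through smoking; the simulation builds that assumption in, which real data never guarantees.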

IV Pros & Cons
  • Including the IV allows for (theoretically)
    unbiased treatment estimates.
  • But only for large samples.
  • Can be used in conjunction with other types of
    analyses (RDD, ITS, etc.)
  • Good IVs can be difficult to find.
  • Weak IVs are sometimes ineffective, even if
  • Sometimes the IV may appear to be unrelated to
    the outcome, but it actually is.
  • Health-conscious states that have better health
    outcomes may have a tendency to raise taxes on
    cigarettes.

IV Resources
  • IV has changed a great deal in the last decade.
  • I haven't been able to keep up with it well.
  • There are probably better references than the
    ones I'm giving here.
  • IV and Smoking Example: Leigh, J.P., & Schembri, M.
    Instrumental variables technique: cigarette
    prices provided better estimate of effects of
    smoking on SF-12. J. Clin. Epidemiol. 2004, 57,
  • IV and Education Example: Bettinger, E., & Long,
    B. T. (2009). Addressing the Needs of
    Under-Prepared Students in Higher Education: Does
    College Remediation Work? Journal of Human
    Resources, 44(3), 736-771.
  • IV with RD Example: Martorell, P., & McFarlin, I.
    (2007). Help or hindrance? The effects of college
    remediation on academic and labor market outcomes
    (Working Paper). Dallas, TX: University of Texas
    at Dallas, Texas Schools Project.

Regression Discontinuity (RD) Designs
  • Often, we separate people into groups based on
    some arbitrary point along a continuous variable.
  • Medicare (65 and older)
  • Certification exams (if you reach a certain
    score, you pass; otherwise, you fail).
  • Scholarship criteria (need a certain ACT score
    or HS GPA).
  • Federal Aid programs (eligibility based on
  • Taxes (i.e. your tax bracket)
  • Placement of students into remedial courses
  • Admission criteria (MCATs, SATs, LSATs, ACTs,

Determining Effectiveness
  • It is difficult to determine the effectiveness of
    a policy or program when it uses a cut-score to
    assign treatment.
  • These groups are separated because we believe
    that they behave differently in some fashion.
  • Comparing them directly doesn't make sense,
    because we expect them to be different before
    treatment begins!

RD as a Solution
  • Regression Discontinuity (RD) designs exploit the
    assignment rule to estimate the causal impact of
    the program.
  • RDs can provide unbiased treatment estimates when
    certain assumptions are met.
  • The functional form must be modeled correctly.
  • The assignment rule must be exact, and modeled
    correctly.
  • Many real-world examples meet these assumptions.
  • Many statisticians consider RD to be equivalent
    to an experimental design with random assignment
    (assuming the assumptions can be met).
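For a sharp design with assignment score X_i, cut-off c, and treatment indicator D_i, one common specification (an illustrative choice; the point above is that whatever form is used must be correct) fits a separate line on each side of the cut-off:

```latex
Y_i = \beta_0 + \beta_1 (X_i - c) + \tau D_i + \beta_2 D_i (X_i - c) + \varepsilon_i,
\qquad D_i = \mathbf{1}\{X_i \ge c\}
```

Centering X_i at c makes τ the vertical gap between the two fitted lines at the cut-off, which is the RD treatment estimate. In the need-based aid example later in the deck, X would be the ACT score and c = 21.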

The Logic of RD
  • Participants just above the cut-off and just
    below the cut-off should be fairly similar to
    each other.
  • However, because of the assignment rule, they
    have vastly different experiences.
  • We can then compare participants very near to
    each other, but with different experiences, and
    use regression to compensate for any remaining
    differences based on the assignment score.
  • When we graph the data, a discontinuity between
    the regression lines is evidence of an effect.
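The logic above can be sketched as a local linear estimate: keep only observations within a bandwidth of the cut-off, fit a separate straight line on each side (equal weights, i.e. a rectangular kernel), and take the difference between the two fitted values at the cut-off. The data here are simulated, not OIRA's: a continuous assignment score with cut-off 21, a retention probability that rises smoothly with the score, and a hypothetical 10-point jump for students at or above the cut-off. The bandwidth of 4 is an arbitrary illustrative choice.

```python
import random


def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp RD: gap between the right-side and left-side lines at the cut-off.
    Scores are centred at the cut-off, so each intercept is the fitted value there."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    left_at_c, _ = fit_line(*zip(*left))
    right_at_c, _ = fit_line(*zip(*right))
    return right_at_c - left_at_c


# Hypothetical students: retained (1) or not (0), with a true jump of 0.10 at 21.
rng = random.Random(0)
scores, retained = [], []
for _ in range(40000):
    s = rng.uniform(12, 34)
    p = 0.45 + 0.015 * (s - 21) + (0.10 if s >= 21 else 0.0)
    scores.append(s)
    retained.append(1 if rng.random() < p else 0)

print(f"estimated jump at the cut-off: {rd_estimate(scores, retained, 21, 4):.3f}")
```

A narrower bandwidth compares students who are more alike but uses fewer of them, which is the bias/variance trade-off behind the bandwidth-optimality issue listed under RD weaknesses.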

A Fictitious Example of RD
RD Weaknesses
  • RD can only assess the impact of a program near
    the cut-off score.
  • The assignment rule is not always clear cut.
  • Such cases are called fuzzy RDs and have their
    own issues. See Trochim (1984).
  • Other minor issues may complicate RD (and make it
    time consuming).
  • Density tests.
  • Bandwidth optimality.
  • Alternative functional forms.
  • Relative to random assignment, RD still remains

RD Resources
  • Intro to RD: Shadish, W.R., Cook, T.D., &
    Campbell, D.T. (2002). Experimental and
    quasi-experimental designs for generalized causal
    inference (5th ed.). Boston: Houghton Mifflin.
  • Intro to RD: West, S.G., Biesanz, J.C., &
    Pitts, S.C. (2000). Causal Inference and
    Generalization in Field Settings: Experimental
    and Quasi-Experimental Designs. In H. T. Reis &
    C. M. Judd (Eds.), Handbook of research methods
    in social and personality psychology (pp. 40-84).
    New York: Cambridge University Press.
  • Advanced Discussion: Lee, D., & Lemieux, T.
    (2009). Regression Discontinuity Designs in
    Economics (NBER Working Paper). Cambridge, MA:
    National Bureau of Economic Research.
  • Advanced Discussion/Primer: Lesik, S. (2006).
    Applying the regression-discontinuity design to
    infer causality with non-random assignment.
    Review of Higher Education, 30(1), 1-19.
  • IV with RD Example: Martorell, P., & McFarlin, I.
    (2007). Help or hindrance? The effects of college
    remediation on academic and labor market outcomes
    (Working Paper). Dallas, TX: University of Texas
    at Dallas, Texas Schools Project.
  • Calcagno, J. C., & Long, B. T. (2008). The impact
    of postsecondary remediation using a regression
    discontinuity approach: Addressing endogenous
    sorting and noncompliance (NCPR Working Paper).
    New York: National Center for Postsecondary
    Research.
  • Fuzzy RD: Trochim, W. (1984). Research design for
    program evaluation: The regression-discontinuity
    approach. Beverly Hills, CA: Sage.
  • Formal Definition and Proof: Rubin, D.B. (1977).
    Assignment to Treatment Group on the Basis of a
    Covariate. Journal of Educational Statistics,
    Vol. 2, 4-58.
  • History of RD: Cook, T.D. "Waiting for life to
    arrive": A history of the regression-discontinuity
    design in psychology, statistics and economics.
    Journal of Econometrics, February 2008, 142 (2),

A Real World Example (If We Have Time)
  • Does need-based financial aid improve retention?
  • Let's examine the data visually.
  • Instead of imposing a functional form (like a
    line or a curve), I will explore the data with
    LOESS smoothing curves.
  • These are free-form curves that show the shape
    of the data without imposing strict assumptions
    about what the data must be like.
  • They help with visual estimation, but they
    won't give us any regression estimates.
  • They are good for exploration, to see whether
    linear regression is appropriate.
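A bare-bones version of the smoother may clarify what "free-form" means here: at each point, fit a weighted straight line to its nearest neighbours, using tricube weights that fade to zero at the edge of the neighbourhood. This sketch omits the robustness iterations of Cleveland's full LOWESS procedure, and `frac` (the share of points in each neighbourhood) is an illustrative parameter name. Production work would more likely use R's `loess` or statsmodels' `lowess`.

```python
def loess(xs, ys, frac=0.5):
    """Local linear smoothing with tricube weights (no robustness iterations).
    Returns one fitted value per input x."""
    n = len(xs)
    k = max(3, int(frac * n))  # neighbourhood size
    fitted = []
    for x0 in xs:
        dist = [abs(x - x0) for x in xs]
        h = sorted(dist)[min(k, n) - 1] or 1e-12  # distance to the k-th nearest point
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # weighted line evaluated at x0
    return fitted


# Sanity check: a local *linear* smoother reproduces a straight line
# (the printed maximum error should be at floating-point rounding level).
xs = [x / 10 for x in range(50)]
line = [2 * x + 1 for x in xs]
print(max(abs(f - y) for f, y in zip(loess(xs, line), line)))
```

On curved data the same function bends to follow the local shape, which is why a LOESS curve can hint at whether a straight-line regression is reasonable before one is fitted.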

First, Some Background
  • In order to qualify for some of our need-based
    aid programs (institutional grants), you must have
    an ACT score of 21 or higher and demonstrate
    financial need.
  • Students who are not eligible for federal or
    state need-based aid are also not eligible for
    OU's need-based aid.
  • The grant size is usually substantial.
  • (There's also a HS GPA requirement, but we will
    ignore that for now and focus only on ACT
    scores.)
  • The data represent about 5,000 students.

Retention Rates by ACT Score and Need Status
The Dashed Lines
  • The dashed lines represent students who have not
    demonstrated financial need.
  • They are separated into two groups: those with
    ACT scores of 21 and above,
  • and those with scores below 21.
  • Notice that there is NO gap, or discontinuity
    between the two groups.
  • The two dashed lines could be represented by the
    same color and you would never be able to tell
    where the discontinuity was.

Retention Rates by ACT Score and Need Status
The Solid Lines
  • The red line (on the left) represents students
    with ACT scores below 21.
  • These students did not qualify for a large
    portion of OU's need-based aid.
  • The black solid line (on the right) represents
    students with ACT scores of 21 or more.
  • These students potentially qualify for additional
    need-based aid.
  • Notice the HUGE gap between the red and black
    lines.
  • This gap is evidence that need-based aid had a
    positive impact on the retention of students near
    the cut-off.

Retention Rates by ACT Score and Need Status