
Basic Experimental Design

- Larry V. Hedges
- Northwestern University
- Prepared for the IES Summer Research Training Institute, July 8, 2008

What is Experimental Design?

- Experimental design includes both
- Strategies for organizing data collection
- Data analysis procedures matched to those data collection strategies
- Classical treatments of design stress analysis procedures based on the analysis of variance (ANOVA)
- Other analysis procedures, such as those based on hierarchical linear models or analysis of aggregates (e.g., class or school means), are also appropriate

Why Do We Need Experimental Design?

- Because of variability
- We wouldn't need a science of experimental design if
- All units (students, teachers, schools) were identical, and
- All units responded identically to treatments
- We need experimental design to control variability so that treatment effects can be identified

A Little History

- The idea of controlling variability through design has a long history
- In 1747, Sir James Lind's studies of scurvy
- "Their cases were as similar as I could have them. They all in general had putrid gums, spots and lassitude, with weakness of their knees. They lay together on one place and had one diet common to all" (Lind, 1753, p. 149)
- Lind then assigned six different treatments to groups of patients

A Little History

- The idea of random assignment was not obvious and took time to catch on
- In 1648 von Helmont carried out one randomization in a trial of bloodletting for fevers
- In 1904 Karl Pearson suggested matching and alternation in typhoid trials
- Amberson et al. (1931) carried out a trial with one randomization
- In 1937 Sir Bradford Hill advocated alternation of patients in trials rather than randomization
- Diehl et al. (1938) carried out a trial that is sometimes referred to as randomized, but it actually used alternation

A Little History

- The first modern randomized clinical trial in medicine is usually considered to be the trial of streptomycin for treating tuberculosis
- It was conducted by the British Medical Research Council in 1946 and reported in 1948

A Little History

- Experiments have been used longer in the behavioral sciences (e.g., psychophysics: Peirce and Jastrow, 1885)
- Experiments conducted in laboratory settings were widely used in educational psychology (e.g., McCall, 1923)
- Thorndike (early 1900s)
- Lindquist (1953)
- Gage field experiments on teaching (1978-1984)

A Little History

- Studies in Crop Variation I-VI (1921-1929)
- In 1919 a statistician named Fisher was hired at Rothamsted agricultural station
- They had a lot of observational data on crop yields and hoped a statistician could analyze it to find effects of various treatments
- All he had to do was sort out the effects of confounding variables

Studies in Crop Variation I (1921)

- Fisher does regression analyses (lots of them) to study (and get rid of) the effects of confounders
- soil fertility gradients
- drainage
- effects of rainfall
- effects of temperature and weather, etc.
- Fisher does qualitative work to sort out anomalies
- Conclusion
- The effects of confounders are typically larger than those of the systematic effects we want to study

Studies in Crop Variation II (1923)

- Fisher invents
- Basic principles of experimental design
- Control of variation by randomization
- Analysis of variance

Studies in Crop Variation IV and VI

- Studies in Crop Variation IV (1927)
- Fisher invents analysis of covariance to combine statistical control and control by randomization
- Studies in Crop Variation VI (1929)
- Fisher refines the theory of experimental design, introducing most other key concepts known today

Our Hero in 1929

Principles of Experimental Design

- Experimental design controls background variability so that systematic effects of treatments can be observed
- Three basic principles
- Control by matching
- Control by randomization
- Control by statistical adjustment
- Their importance is in that order

Control by Matching

- Known sources of variation may be eliminated by matching
- Eliminating genetic variation
- Compare animals from the same litter of mice
- Eliminating district or school effects
- Compare students within districts or schools
- However, matching is limited
- matching is only possible on observable characteristics
- perfect matching is not always possible
- matching inherently limits generalizability by removing (possibly desired) variation

Control by Matching

- Matching ensures that groups compared are alike on specific known and observable characteristics (in principle, everything we have thought of)
- Wouldn't it be great if there were a method of making groups alike on not only everything we have thought of, but everything we didn't think of too?
- There is such a method

Control by Randomization

- Matching controls for the effects of variation due to specific observable characteristics
- Randomization controls for the effects of all (observable or non-observable, known or unknown) characteristics
- Randomization makes groups equivalent (on average) on all variables (known and unknown, observable or not)
- Randomization also gives us a way to assess whether differences after treatment are larger than would be expected due to chance
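
One way to make the last point concrete is a randomization (permutation) test: re-shuffle the treatment labels many times and see how often chance alone produces a difference as large as the one observed. The sketch below is an illustration only, not part of the original slides; the function name `permutation_p_value` and the outcome scores are made up.

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_perm=5000, seed=None):
    """Approximate two-sided randomization test for a difference in means.

    Re-randomizes the group labels n_perm times and reports how often the
    re-randomized mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one hypothetical re-assignment of units to groups
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float ties
            extreme += 1
    return observed, extreme / n_perm

# Made-up outcome scores, for illustration only.
treated_scores = [10, 11, 12, 13, 14]
control_scores = [1, 2, 3, 4, 5]
effect, p_value = permutation_p_value(treated_scores, control_scores, seed=1)
print(effect, p_value)
```

A small proportion here means that few random re-assignments produce a gap as large as the observed one, which is the sense in which randomization supplies its own yardstick for chance.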

Control by Randomization

- Random assignment is not assignment with no particular rule. It is a purposeful process
- "Assignment is made at random. This does not mean that the experimenter writes down the names of the varieties in any order that occurs to him, but that he carries out a physical experimental process of randomization, using means which shall ensure that each variety will have an equal chance of being tested on any particular plot of ground" (Fisher, 1935, p. 51)

Control by Randomization

- Random assignment of schools or classrooms is not assignment with no particular rule. It is a purposeful process
- Assignment of schools to treatments is made at random. This does not mean that the experimenter assigns schools to treatments in any order that occurs to her, but that she carries out a physical experimental process of randomization, using means which shall ensure that each treatment will have an equal chance of being tested in any particular school (Hedges, 2007)

Control by Statistical Adjustment

- Control by statistical adjustment is a form of pseudo-matching
- It uses statistical relations to simulate matching
- Statistical control is important for increasing precision but should not be relied upon to control biases that may exist prior to assignment
- Statistical control is the weakest of the three experimental design principles because its validity depends on knowing a statistical model for responses

Using Principles of Experimental Design

- You have to know a lot (be smart) to use matching and statistical control effectively
- You do not have to be smart to use randomization effectively
- But
- Where all are possible, randomization is not as efficient (requires larger sample sizes for the same power) as matching or statistical control

Basic Ideas of Design: Independent Variables (Factors)

- The values of independent variables are called levels
- Some independent variables can be manipulated, others can't
- Treatments are independent variables that can be manipulated
- Blocks and covariates are independent variables that cannot be manipulated
- These concepts are simple, but are often confused
- Remember
- You can randomly assign treatment levels but not blocks

Basic Ideas of Design (Crossing)

- Relations between independent variables
- Factors (treatments or blocks) are crossed if every level of one factor occurs with every level of another factor
- Example
- The Tennessee class size experiment assigned students to one of three class size conditions. All three treatment conditions occurred within each of the participating schools
- Thus treatment was crossed with schools

Basic Ideas of Design (Nesting)

- Factor B is nested in factor A if every level of factor B occurs within only one level of factor A
- Example
- The Tennessee class size experiment actually assigned classrooms to one of three class size conditions. Each classroom occurred in only one treatment condition
- Thus classrooms were nested within treatments
- (But treatment was crossed with schools)
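
The definitions of crossing and nesting can be checked mechanically from a list of which levels occur together. The following is a minimal sketch, assuming the design is given as (level of A, level of B) pairs; the helper names `is_crossed` and `is_nested` and the labels `T1`, `school_1`, `class_1`, etc. are hypothetical placeholders, not data from the Tennessee experiment.

```python
from collections import defaultdict

def is_crossed(pairs):
    """True if every level of factor A occurs with every level of factor B."""
    a_levels = {a for a, _ in pairs}
    b_levels = {b for _, b in pairs}
    return set(pairs) == {(a, b) for a in a_levels for b in b_levels}

def is_nested(pairs):
    """True if every level of B (second item) occurs within only one level of A."""
    a_seen_with = defaultdict(set)
    for a, b in pairs:
        a_seen_with[b].add(a)
    return all(len(a_set) == 1 for a_set in a_seen_with.values())

# Treatment x school: all three conditions occur in each school.
treatment_by_school = [(t, s) for s in ("school_1", "school_2")
                       for t in ("T1", "T2", "T3")]

# Treatment x classroom: each classroom receives exactly one condition.
treatment_by_classroom = [("T1", "class_1"), ("T2", "class_2"),
                          ("T3", "class_3"), ("T1", "class_4")]

print(is_crossed(treatment_by_school), is_nested(treatment_by_school))
print(is_crossed(treatment_by_classroom), is_nested(treatment_by_classroom))
```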

Where Do These Terms Come From? (Nesting)

- An agricultural experiment where blocks are literally blocks or plots of land
- Here each block is literally nested within a treatment condition

Where Do These Terms Come From? (Crossing)

- An agricultural experiment
- Blocks were literally blocks of land and plots of land within blocks were assigned different treatments
- Here treatment literally crosses the blocks

Where Do These Terms Come From? (Crossing)

- The experiment is often depicted like this. What is wrong with this as a field layout?
- Consider possible sources of bias

Think About These Designs

- A study assigns a reading treatment (or control) to children in 20 schools. Each child is classified into one of three groups with different risk of reading failure.
- A study assigns T or C to 20 teachers. The teachers are in five schools, and each teacher teaches 4 science classes.
- Two schools in each district are picked to participate. Each school has two grade 4 teachers. One of them is assigned to T, the other to C.

Three Basic Designs

- The completely randomized design
- Treatments are assigned to individuals
- The randomized block design
- Treatments are assigned to individuals within blocks
- (This is sometimes called the matched design, because individuals are matched within blocks)
- The hierarchical design
- Treatments are assigned to blocks; the same treatment is assigned to all individuals in the block

The Completely Randomized Design

- Individuals are randomly assigned to one of two treatments

The Randomized Block Design

The Hierarchical Design

Randomization Procedures

- Randomization has to be done as an explicit process devised by the experimenter
- Haphazard is not the same as random
- Unknown assignment is not the same as random
- "Essentially random" is technically meaningless
- Alternation is not random, even if you alternate from a random start
- This is why R.A. Fisher was so explicit about randomization processes

Randomization Procedures

- R.A. Fisher on how to randomize an experiment with small sample size and 5 treatments
- "A satisfactory method is to use a pack of cards numbered from 1 to 100, and to arrange them in random order by repeated shuffling. The varieties [treatments] are numbered from 1 to 5, and any card such as the number 33, for example, is deemed to correspond to variety [treatment] number 3, because on dividing by 5 this number is found as the remainder." (Fisher, 1935, p. 51)
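
Fisher's card procedure translates almost directly into code. The sketch below is an illustration, not Fisher's own specification: the function name `fisher_card_assignment` is made up, and mapping a remainder of 0 to treatment 5 is an assumption, since the quoted passage only shows the remainder-3 case.

```python
import random
from collections import Counter

def fisher_card_assignment(n_units, n_treatments=5, n_cards=100, seed=None):
    """Shuffle cards 1..n_cards and map each dealt card to a treatment by remainder."""
    if n_units > n_cards:
        raise ValueError("cannot deal more cards than the pack contains")
    rng = random.Random(seed)
    cards = list(range(1, n_cards + 1))
    rng.shuffle(cards)  # "arrange them in random order by repeated shuffling"
    treatments = []
    for card in cards[:n_units]:
        remainder = card % n_treatments
        # Assumption: remainder 0 is read as the last treatment (here, 5).
        treatments.append(remainder if remainder != 0 else n_treatments)
    return treatments

print(33 % 5)  # card 33 corresponds to treatment 3
print(Counter(fisher_card_assignment(100, seed=7)))
```

Because the numbers 1 to 100 contain each remainder exactly 20 times, dealing the whole pack gives every treatment the same number of units.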

Randomization Procedures

- You may want to use a table of random numbers, but be sure to pick an arbitrary start point!
- Beware of random number generators: they typically depend on seed values, so be sure to vary the seed value (if they do not do it automatically)
- Otherwise you can reliably generate the same sequence of random numbers every time
- It is no different than starting in the same place in a table of random numbers
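
The seed warning is easy to demonstrate with Python's standard `random` module. This is a small illustration only; the helper `draws` is hypothetical, and drawing a fresh seed with `secrets.randbits` and recording it is one possible practice, not something the slides prescribe.

```python
import random
import secrets

def draws(seed, k=10):
    """Return k random numbers from 1 to 2 generated from the given seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 2) for _ in range(k)]

# Reusing a seed reproduces the sequence exactly, like always starting at the
# same place in a table of random numbers.
same_a = draws(12345)
same_b = draws(12345)
print(same_a == same_b)

# One option: draw a fresh seed from the operating system for each new
# allocation and write it down so the allocation can be audited later.
fresh_seed = secrets.randbits(64)
allocation_draws = draws(fresh_seed)
print("seed used for this allocation:", fresh_seed)
```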

Randomization Procedures

- Completely Randomized Design
- (2 treatments, 2n individuals)
- Make a list of all individuals
- For each individual, pick a random number from 1 to 2 (odd or even)
- Assign the individual to treatment 1 if even, 2 if odd
- When one treatment is assigned n individuals, stop assigning more individuals to that treatment

Randomization Procedures

- Completely Randomized Design (pn individuals, p treatments)
- Make a list of all individuals
- For each individual, pick a random number from 1 to p
- One way to do this is to get a random number of any size and divide by p; the remainder R is between 0 and (p - 1), so add 1 to the remainder to get R + 1
- Assign the individual to treatment R + 1
- Stop assigning individuals to any treatment after it gets n individuals
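
The sequential procedure for the completely randomized design (which covers the two-treatment case with p = 2) might be sketched as follows. The function name `completely_randomized` and the student labels are hypothetical; `randint(1, p)` stands in for the divide-and-take-the-remainder step, and a draw that lands on an already full treatment is simply repeated, which is one reading of "stop assigning".

```python
import random
from collections import Counter

def completely_randomized(individuals, p, seed=None):
    """Assign pn individuals to p treatments, n per treatment, one at a time."""
    individuals = list(individuals)
    if len(individuals) % p != 0:
        raise ValueError("need pn individuals for p treatments")
    n = len(individuals) // p
    rng = random.Random(seed)
    counts = {t: 0 for t in range(1, p + 1)}
    assignment = {}
    for person in individuals:
        while True:
            treatment = rng.randint(1, p)   # random number from 1 to p
            if counts[treatment] < n:       # skip treatments that are full
                break
        assignment[person] = treatment
        counts[treatment] += 1
    return assignment

students = [f"student_{i}" for i in range(1, 13)]  # hypothetical labels
plan = completely_randomized(students, p=3, seed=2008)
print(Counter(plan.values()))
```

A common alternative, not described in the slides, is to write out a list containing each treatment label n times and shuffle it; that also yields exactly n individuals per treatment.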

Randomization Procedures

- Randomized Block Design with 2 Treatments
- (m blocks per treatment, 2n individuals per block)
- Make a list of all individuals in the first block
- For each individual, pick a random number from 1 to 2 (odd or even)
- Assign the individual to treatment 1 if even, 2 if odd
- Stop assigning a treatment after it is assigned n individuals in the block
- Repeat the same process with every block

Randomization Procedures

- Randomized Block Design with p Treatments
- (m blocks per treatment, pn individuals per block)
- Make a list of all individuals in the first block
- For each individual, pick a random number from 1 to p
- Assign the individual to the treatment corresponding to the number
- Stop assigning a treatment after it is assigned n individuals in the block
- Repeat the same process with every block
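
Randomizing within blocks means repeating the individual-level randomization separately inside each block. The sketch below swaps in a shuffle of a balanced label list for the draw-and-stop procedure in the slides (a different technique with the same balance guarantee); `randomized_block` and the school and student names are hypothetical.

```python
import random
from collections import Counter

def randomized_block(blocks, p, seed=None):
    """blocks maps block name -> list of individuals; returns {(block, person): treatment}."""
    rng = random.Random(seed)
    assignment = {}
    for block, members in blocks.items():
        if len(members) % p != 0:
            raise ValueError(f"block {block!r} cannot be split evenly into {p} treatments")
        n = len(members) // p
        # Balanced list of labels: each treatment appears n times in this block.
        labels = [t for t in range(1, p + 1) for _ in range(n)]
        rng.shuffle(labels)  # independent randomization within each block
        for person, treatment in zip(members, labels):
            assignment[(block, person)] = treatment
    return assignment

schools = {
    "school_A": ["a1", "a2", "a3", "a4"],
    "school_B": ["b1", "b2", "b3", "b4", "b5", "b6"],
}
plan = randomized_block(schools, p=2, seed=42)
for school in schools:
    print(school, Counter(t for (b, _), t in plan.items() if b == school))
```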

Randomization Procedures

- Hierarchical Design with 2 Treatments
- (m blocks per treatment, n individuals per block)
- Make a list of all blocks
- For each block, pick a random number from 1 to 2
- Assign the block to treatment 1 if even, treatment 2 if odd
- Stop assigning a treatment after it is assigned m blocks
- Every individual in a block is assigned to the same treatment

Randomization Procedures

- Hierarchical Design with p Treatments
- (m blocks per treatment, n individuals per block)
- Make a list of all blocks
- For each block, pick a random number from 1 to p
- Assign the block to the treatment corresponding to the number
- Stop assigning a treatment after it is assigned m blocks
- Every individual in a block is assigned to the same treatment
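
In the hierarchical design the unit of randomization is the block, and individuals simply inherit their block's treatment. A minimal sketch, again using a shuffled balanced list rather than the sequential draws described above; `hierarchical_assignment` and the school and student names are hypothetical.

```python
import random
from collections import Counter

def hierarchical_assignment(blocks, p, seed=None):
    """Assign whole blocks to p treatments (m blocks each); individuals inherit it."""
    names = list(blocks)
    if len(names) % p != 0:
        raise ValueError("need pm blocks for p treatments")
    m = len(names) // p
    rng = random.Random(seed)
    labels = [t for t in range(1, p + 1) for _ in range(m)]
    rng.shuffle(labels)
    block_treatment = dict(zip(names, labels))
    # Every individual in a block gets the same treatment as the block.
    individual_treatment = {
        person: block_treatment[block]
        for block, members in blocks.items()
        for person in members
    }
    return block_treatment, individual_treatment

schools = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2", "b3"],
    "school_C": ["c1", "c2", "c3"],
    "school_D": ["d1", "d2", "d3"],
}
by_school, by_student = hierarchical_assignment(schools, p=2, seed=3)
print(by_school)
```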

Sampling Models

Sampling Models in Educational Research

- Sampling models are often ignored in educational research
- But
- Sampling is where the randomness comes from in social research
- Sampling therefore has profound consequences for statistical analysis and research designs

Sampling Models in Educational Research

- Simple random samples are rare in field research
- Educational populations are hierarchically nested
- Students in classrooms in schools
- Schools in districts in states
- We usually exploit the population structure to sample students by first sampling schools
- Even then, most samples are not probability samples, but they are intended to be representative (of some population)

Sampling Models in Educational Research

- Survey research calls this strategy multistage (multilevel) clustered sampling
- We often sample clusters (schools) first, then individuals within clusters (students within schools)
- This is a two-stage (two-level) cluster sample
- We might sample schools, then classrooms, then students
- This is a three-stage (three-level) cluster sample

Precision of Estimates Depends on the Sampling Model

- Suppose the total population variance is $\sigma_T^2$ and the ICC is $\rho$
- Consider two samples of size $N = mn$
- A simple random sample or stratified sample
- The variance of the mean is $\sigma_T^2/mn$
- A clustered sample of n students from each of m schools
- The variance of the mean is $(\sigma_T^2/mn)[1 + (n - 1)\rho]$
- The inflation factor $[1 + (n - 1)\rho]$ is called the design effect
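
A quick numeric check shows how fast the design effect grows. The sketch below assumes the standard two-level formula, with the variance of the mean equal to the total variance over mn times 1 + (n - 1) times the ICC; the function names and the example values (20 students per school, ICC of 0.2) are illustrative choices, not figures from the slides.

```python
def design_effect(n, rho):
    """Two-level design effect: 1 + (n - 1) * rho."""
    return 1 + (n - 1) * rho

def var_mean_srs(sigma2_total, m, n):
    """Variance of the mean for a simple random sample of size m * n."""
    return sigma2_total / (m * n)

def var_mean_clustered(sigma2_total, m, n, rho):
    """Variance of the mean for n students sampled from each of m schools."""
    return var_mean_srs(sigma2_total, m, n) * design_effect(n, rho)

# Illustrative values: 10 schools, 20 students each, ICC = 0.2, unit variance.
print(design_effect(20, 0.2))
print(var_mean_srs(1.0, 10, 20), var_mean_clustered(1.0, 10, 20, 0.2))
```

With these values the clustered sample's variance is nearly five times that of a simple random sample of the same total size.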

Precision of Estimates Depends on the Sampling Model

- Suppose the population variance is $\sigma_T^2$
- School level ICC is $\rho_S$, class level ICC is $\rho_C$
- Consider two samples of size $N = mpn$
- A simple random sample or stratified sample
- The variance of the mean is $\sigma_T^2/mpn$
- A clustered sample of n students from p classes in m schools
- The variance is $(\sigma_T^2/mpn)[1 + (pn - 1)\rho_S + (n - 1)\rho_C]$
- The three level design effect is $[1 + (pn - 1)\rho_S + (n - 1)\rho_C]$
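
The three-level design effect can be evaluated the same way. This sketch assumes the form 1 + (pn - 1) times the school-level ICC plus (n - 1) times the class-level ICC; the function name and the example values are illustrative only.

```python
def design_effect_three_level(p, n, rho_school, rho_class):
    """Three-level design effect: 1 + (p*n - 1)*rho_school + (n - 1)*rho_class."""
    return 1 + (p * n - 1) * rho_school + (n - 1) * rho_class

def var_mean_three_level(sigma2_total, m, p, n, rho_school, rho_class):
    """Variance of the mean for n students in each of p classes in each of m schools."""
    srs_variance = sigma2_total / (m * p * n)
    return srs_variance * design_effect_three_level(p, n, rho_school, rho_class)

# Illustrative values: 3 classes of 20 students, school ICC 0.10, class ICC 0.05.
print(design_effect_three_level(3, 20, 0.10, 0.05))
```

Setting the school-level ICC to zero recovers the two-level formula with classes as the clusters.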

Precision of Estimates Depends on the Sampling Model

- Treatment effects in experiments and quasi-experiments are mean differences
- Therefore precision of treatment effects and statistical power will depend on the sampling model

Sampling Models in Educational Research

- The fact that the population is structured does not mean the sample must be a clustered sample
- Whether it is a clustered sample depends on
- How the sample is drawn (e.g., are schools sampled first, then individuals randomly within schools)
- What the inferential population is (e.g., is the inference to these schools studied or a larger population of schools)

Sampling Models in Educational Research

- A necessary condition for a clustered sample is that it is drawn in stages using population subdivisions
- schools then students within schools
- schools then classrooms then students
- However, if all subdivisions in a population are present in the sample, the sample is not clustered, but stratified
- Stratification has different implications than clustering
- Whether there is stratification or clustering depends on the definition of the population to which we draw inferences (the inferential population)

Sampling Models in Educational Research

- The clustered/stratified distinction matters because it influences the precision of statistics estimated from the sample
- If all population subdivisions are included in every sample, there is no sampling (or exhaustive sampling) of subdivisions
- therefore differences between subdivisions add no uncertainty to estimates
- If only some population subdivisions are included in the sample, it matters which ones you happen to sample
- thus differences between subdivisions add to uncertainty

Inferential Population and Inference Models

- The inferential population or inference model has implications for analysis and therefore for the design of experiments
- Do we make inferences to the schools in this sample or to a larger population of schools?
- Inferences to the schools or classes in the sample are called conditional inferences
- Inferences to a larger population of schools or classes are called unconditional inferences

Inferential Population and Inference Models

- Note that the inferences (what we are estimating) are different in conditional versus unconditional inference models
- In a conditional inference, we are estimating the mean (or treatment effect) in the observed schools
- In unconditional inference we are estimating the mean (or treatment effect) in the population of schools from which the observed schools are sampled
- We are still estimating a mean (or a treatment effect) but they are different parameters with different uncertainties

Fixed and Random Effects

- When the levels of a factor (e.g., particular blocks included) in a study are sampled and the inference model is unconditional, that factor is called random and its effects are called random effects
- When the levels of a factor (e.g., particular blocks included) in a study constitute the entire inference population and the inference model is conditional, that factor is called fixed and its effects are called fixed effects

Applications to Experimental Design

- We will look in detail at the two most widely used experimental designs in education
- Randomized blocks designs
- Hierarchical designs

Experimental Designs

- For each design we will look at
- Structural Model for data (and what it means)
- Two inference models
- What does treatment effect mean in principle
- What is the estimate of treatment effect
- How do we deal with context effects
- Two statistical analysis procedures
- How do we estimate and test treatment effects
- How do we estimate and test context effects
- What is the sensitivity of the tests

The Randomized Block Design

- The population (the sampling frame)
- We wish to compare two treatments
- We assign treatments within schools
- Many schools with 2n students in each
- Assign n students to each treatment in each school

The Randomized Block Design

- The experiment
- Compare two treatments in an experiment
- We assign treatments within schools
- With m schools with 2n students in each
- Assign n students to each treatment in each school

The Randomized Block Design

- Diagram of the design (School 1)

The Conceptual Model

- The statistical model for the observation on the kth person in the jth school in the ith treatment is
- $Y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}$
- where
- $\mu$ is the grand mean,
- $\alpha_i$ is the average effect of being in treatment i,
- $\beta_j$ is the average effect of being in school j,
- $\alpha\beta_{ij}$ is the difference between the average effect of treatment i and the effect of that treatment in school j,
- $\varepsilon_{ijk}$ is a residual

Effect of Context

Context Effect

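Two-level Randomized Block Design With No Covariates (HLM Notation)

- Level 1 (individual level)
- $Y_{ijk} = \beta_{0j} + \beta_{1j}T_{ijk} + \varepsilon_{ijk}$, $\varepsilon \sim N(0, \sigma_W^2)$
- Level 2 (school level)
- $\beta_{0j} = \pi_{00} + \xi_{0j}$, $\xi_{0j} \sim N(0, \sigma_S^2)$
- $\beta_{1j} = \pi_{10} + \xi_{1j}$, $\xi_{1j} \sim N(0, \sigma_{T \times S}^2)$
- If we code the treatment $T_{ijk} = 1/2$ or $-1/2$, then the parameters are identical to those in standard ANOVA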

Effects and Estimates

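- The population mean of treatment 1 in school j is
- $\alpha_1 + \alpha\beta_{1j}$
- The population mean of treatment 2 in school j is
- $\alpha_2 + \alpha\beta_{2j}$
- The estimate of the mean of treatment 1 in school j is
- $\alpha_1 + \alpha\beta_{1j} + \bar{\varepsilon}_{1j\bullet}$
- The estimate of the mean of treatment 2 in school j is
- $\alpha_2 + \alpha\beta_{2j} + \bar{\varepsilon}_{2j\bullet}$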

Effects and Estimates

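- The comparative treatment effect in any given school j is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1j} - \alpha\beta_{2j})$
- The estimate of comparative treatment effect in school j is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1j} - \alpha\beta_{2j}) + (\bar{\varepsilon}_{1j\bullet} - \bar{\varepsilon}_{2j\bullet})$
- The mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$
- The estimate of the mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$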

Inference Models

- Two different kinds of inferences about effects
- Unconditional Inference (Schools Random)
- Inference to the whole universe of schools
- (requires a representative sample of schools)
- Conditional Inference (Schools Fixed)
- Inference to the schools in the experiment
- (no sampling requirement on schools)

Statistical Analysis Procedures

- Two kinds of statistical analysis procedures
- Mixed Effects Procedures (Schools Random)
- Treat schools in the experiment as a sample from a population of schools
- (only strictly correct if schools are a sample)
- Fixed Effects Procedures (Schools Fixed)
- Treat schools in the experiment as a population

Unconditional Inference (Schools Random)

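- The estimate of the mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- The average treatment effect we want to estimate is
- $(\alpha_1 - \alpha_2)$
- The term $(\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$ depends on the students in the schools in the sample
- The term $(\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$ depends on the schools in the sample
- Both $(\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$ and $(\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$ are random and average to 0 across students and schools, respectively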

Conditional Inference (Schools Fixed)

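- The estimate of the mean treatment effect in the experiment is still
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- Now the average treatment effect we want to estimate is
- $(\alpha_1 + \alpha\beta_{1\bullet}) - (\alpha_2 + \alpha\beta_{2\bullet}) = (\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$
- The term $(\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$ depends on the students in the schools in the sample
- The term $(\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$ depends on the schools in the sample, but the treatment effect in the sample of schools is the effect we want to estimate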

Expected Mean Squares: Randomized Block Design (Two Levels, Schools Random)

Mixed Effects Procedures (Schools Random)

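- The test for treatment effects has
- $H_0: (\alpha_1 - \alpha_2) = 0$
- Estimated mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- The variance of the estimated treatment effect is
- $2[\sigma_W^2 + n\sigma_{T \times S}^2]/mn = 2[1 + (n\omega_S - 1)\rho]\sigma^2/mn$
- Here $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$ and $\rho = \sigma_S^2/(\sigma_S^2 + \sigma_W^2) = \sigma_S^2/\sigma^2$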
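
The two forms of this variance (one in terms of the within-school and treatment-by-school variance components, one in terms of the intraclass correlation) can be checked against each other numerically. The sketch below is an illustration under stated assumptions: `omega_s` is a name chosen here for the ratio of the treatment-by-school variance to the school variance, the total variance is taken as the school variance plus the within-school variance, and the variance components are made-up values.

```python
def var_effect_rbd_mixed(sigma2_within, sigma2_txs, m, n):
    """Variance of the treatment effect: 2 * (sigma_W^2 + n * sigma_TxS^2) / (m * n)."""
    return 2 * (sigma2_within + n * sigma2_txs) / (m * n)

def var_effect_rbd_mixed_icc(sigma2_total, rho, omega_s, m, n):
    """Same variance written with the ICC: 2 * [1 + (n*omega_s - 1)*rho] * sigma^2 / (m*n)."""
    return 2 * (1 + (n * omega_s - 1) * rho) * sigma2_total / (m * n)

# Made-up variance components, for illustration only.
sigma2_within, sigma2_school, sigma2_txs = 0.8, 0.2, 0.1
sigma2_total = sigma2_school + sigma2_within   # total variance
rho = sigma2_school / sigma2_total             # intraclass correlation
omega_s = sigma2_txs / sigma2_school           # assumed name for this ratio

v_components = var_effect_rbd_mixed(sigma2_within, sigma2_txs, m=10, n=20)
v_icc = var_effect_rbd_mixed_icc(sigma2_total, rho, omega_s, m=10, n=20)
print(v_components, v_icc)
```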

Mixed Effects Procedures

- The test for treatment effects
- $F_T = MS_T/MS_{T \times S}$ with (m - 1) df
- The test for context effects (treatment by schools interaction) is
- $F_{T \times S} = MS_{T \times S}/MS_{WS}$ with 2m(n - 1) df
- Power is determined by the operational effect size
- where $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$ and $\rho = \sigma_S^2/(\sigma_S^2 + \sigma_W^2) = \sigma_S^2/\sigma^2$

Expected Mean Squares: Randomized Block Design (Two Levels, Schools Fixed)

Fixed Effects Procedures

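- The test for treatment effects has
- $H_0: (\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) = 0$
- Estimated mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- The variance of the estimated treatment effect is
- $2\sigma_W^2/mn$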

Fixed Effects Procedures

- The test for treatment effects
- $F_T = MS_T/MS_{WS}$ with m(n - 1) df
- The test for context effects (treatment by schools interaction) is
- $F_C = MS_{T \times S}/MS_{WS}$ with 2m(n - 1) df
- Power is determined by the operational effect size
- with m(n - 1) df

Comparing Fixed and Mixed Effects Statistical Procedures (Randomized Block Design)

- Conditional and unconditional inference models
- estimate different treatment effects
- have different contaminating factors that add uncertainty
- Mixed procedures are good for unconditional inference
- The fixed procedures are good for conditional inference
- The fixed procedures have higher power
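
One way to see where the extra power comes from is to compare the two variances of the estimated treatment effect given for this design: $2[\sigma_W^2 + n\sigma_{T \times S}^2]/mn$ under the mixed procedure and $2\sigma_W^2/mn$ under the fixed procedure. The ratio below is a short worked step added for illustration, not part of the original slides.

```latex
\frac{\operatorname{Var}_{\text{mixed}}}{\operatorname{Var}_{\text{fixed}}}
  = \frac{2\,[\sigma_W^2 + n\,\sigma_{T \times S}^2]/mn}{2\,\sigma_W^2/mn}
  = 1 + \frac{n\,\sigma_{T \times S}^2}{\sigma_W^2} \;\ge\; 1
```

The two variances agree only when there is no treatment-by-school variance; otherwise the mixed procedure carries the additional uncertainty from sampling schools, which is the price of the broader (unconditional) inference.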

The Hierarchical Design

- The universe (the sampling frame)
- We wish to compare two treatments
- We assign treatments to whole schools
- Many schools with n students in each
- Assign all students in each school to the same treatment

The Hierarchical Design

- The experiment
- We wish to compare two treatments
- We assign treatments to whole schools
- Assign 2m schools with n students in each
- Assign all students in each school to the same treatment

The Hierarchical Design

- Diagram of the experiment (Treatment 1 schools and Treatment 2 schools)

The Conceptual Model

- The statistical model for the observation on the kth person in the jth school in the ith treatment is
- $Y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{jk(i)} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{jk(i)}$
- $\mu$ is the grand mean,
- $\alpha_i$ is the average effect of being in treatment i,
- $\beta_j$ is the average effect of being in school j,
- $\alpha\beta_{ij}$ is the difference between the average effect of treatment i and the effect of that treatment in school j,
- $\varepsilon_{jk(i)}$ is a residual
- Or $\beta_{j(i)} = \beta_j + \alpha\beta_{ij}$ is a term for the combined effect of schools within treatments


Context Effects

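Two-level Hierarchical Design With No Covariates (HLM Notation)

- Level 1 (individual level)
- $Y_{ijk} = \beta_{0j} + \varepsilon_{ijk}$, $\varepsilon \sim N(0, \sigma_W^2)$
- Level 2 (school level)
- $\beta_{0j} = \pi_{00} + \pi_{01}T_j + \xi_{0j}$, $\xi \sim N(0, \sigma_S^2)$
- If we code the treatment $T_j = 1/2$ or $-1/2$, then
- $\pi_{00} = \mu$, $\pi_{01} = \alpha_1$, $\xi_{0j} = \beta_{j(i)}$
- The intraclass correlation is $\rho = \sigma_S^2/(\sigma_S^2 + \sigma_W^2) = \sigma_S^2/\sigma^2$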

Effects and Estimates

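- The comparative treatment effect in any given school j is still
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1j} - \alpha\beta_{2j})$
- But we cannot estimate the treatment effect in a single school because each school gets only one treatment
- The mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)})$
- $= (\alpha_1 - \alpha_2) + (\beta_{1\bullet} - \beta_{2\bullet}) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$
- The estimate of the mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$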

Inference Models

- Two different kinds of inferences about effects (as in the randomized block design)
- Unconditional Inference (schools random)
- Inference to the whole universe of schools
- (requires a representative sample of schools)
- Conditional Inference (schools fixed)
- Inference to the schools in the experiment
- (no sampling requirement on schools)

Unconditional Inference (Schools Random)

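- The average treatment effect we want to estimate is
- $(\alpha_1 - \alpha_2)$
- The term $(\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$ depends on the students in the schools in the sample
- The term $(\beta_{\bullet(1)} - \beta_{\bullet(2)})$ depends on the schools in the sample
- Both $(\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$ and $(\beta_{\bullet(1)} - \beta_{\bullet(2)})$ are random and average to 0 across students and schools, respectively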

Conditional Inference (Schools Fixed)

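- The average treatment effect we want to (can) estimate is
- $(\alpha_1 + \beta_{\bullet(1)}) - (\alpha_2 + \beta_{\bullet(2)}) = (\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)})$
- $= (\alpha_1 - \alpha_2) + (\beta_{1\bullet} - \beta_{2\bullet}) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$
- The term $(\beta_{\bullet(1)} - \beta_{\bullet(2)})$ depends on the schools in the sample, but we want to estimate the effect of treatment in the schools in the sample
- Note that this treatment effect is not quite the same as in the randomized block design, where we estimate
- $(\alpha_1 - \alpha_2) + (\alpha\beta_{1\bullet} - \alpha\beta_{2\bullet})$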

Statistical Analysis Procedures

- Two kinds of statistical analysis procedures (as in the randomized block design)
- Mixed Effects Procedures
- Treat schools in the experiment as a sample from a universe
- Fixed Effects Procedures
- Treat schools in the experiment as a universe

Expected Mean Squares: Hierarchical Design (Two Levels, Schools Random)

Mixed Effects Procedures (Schools Random)

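- The test for treatment effects has
- $H_0: (\alpha_1 - \alpha_2) = 0$
- Estimated mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- The variance of the estimated treatment effect is
- $2[\sigma_W^2 + n\sigma_S^2]/mn = 2[1 + (n - 1)\rho]\sigma^2/mn$
- where $\rho = \sigma_S^2/(\sigma_S^2 + \sigma_W^2) = \sigma_S^2/\sigma^2$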
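
The step from the variance-component form of this variance to the intraclass-correlation form is a one-line substitution. Writing $\sigma^2 = \sigma_S^2 + \sigma_W^2$ for the total variance and $\rho = \sigma_S^2/\sigma^2$ for the intraclass correlation (so that $\sigma_S^2 = \rho\sigma^2$ and $\sigma_W^2 = (1 - \rho)\sigma^2$), the worked step, added here for illustration, is:

```latex
\frac{2\,[\sigma_W^2 + n\,\sigma_S^2]}{mn}
  = \frac{2\,[(1 - \rho)\,\sigma^2 + n\,\rho\,\sigma^2]}{mn}
  = \frac{2\,[1 + (n - 1)\,\rho]\,\sigma^2}{mn}
```

The bracketed factor is the same two-level design effect that inflates the variance of a mean under clustered sampling.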

Mixed Effects Procedures (Schools Random)

- The test for treatment effects
- $F_T = MS_T/MS_{BS}$ with (m - 2) df
- There is no omnibus test for context effects
- Power is determined by the operational effect size
- where $\rho = \sigma_S^2/(\sigma_S^2 + \sigma_W^2) = \sigma_S^2/\sigma^2$

Expected Mean Squares: Hierarchical Design (Two Levels, Schools Fixed)

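Fixed Effects Procedures (Schools Fixed)

- The test for treatment effects has
- $H_0: (\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)}) = 0$
- Note that the school effects are confounded with treatment effects
- Estimated mean treatment effect in the experiment is
- $(\alpha_1 - \alpha_2) + (\beta_{\bullet(1)} - \beta_{\bullet(2)}) + (\bar{\varepsilon}_{1\bullet\bullet} - \bar{\varepsilon}_{2\bullet\bullet})$
- The variance of the estimated treatment effect is
- $2\sigma_W^2/mn$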

Fixed Effects Procedures (Schools Fixed)

- The test for treatment effects
- $F_T = MS_T/MS_{WS}$ with m(n - 1) df
- There is no omnibus test for context effects, because each school gets only one treatment
- Power is determined by the operational effect size
- and m(n - 1) df

Comparing Fixed and Mixed Effects Statistical Procedures (Hierarchical Design)

- Conditional and unconditional inference models
- estimate different treatment effects
- have different contaminating factors that add uncertainty
- Mixed procedures are good for unconditional inference
- The fixed procedures are not generally recommended
- The fixed procedures have higher power

Comparing Hierarchical Designs to Randomized Block Designs

- Randomized block designs usually have higher power, but assignment of different treatments within schools or classes may be
- practically difficult
- politically infeasible
- theoretically impossible
- It may be methodologically unwise because of potential for
- Contamination or diffusion of treatments
- compensatory rivalry or demoralization

Applications to Experimental Design

- We will address the two most widely used experimental designs in education
- Randomized blocks designs with 2 levels
- Randomized blocks designs with 3 levels
- Hierarchical designs with 2 levels
- Hierarchical designs with 3 levels
- We also examine the effect of covariates
- Hereafter, we generally take schools to be random

Complications

- Which matchings do we have to take into account in design (e.g., schools, districts, regions, states, regions of the country, country)?
- Ignore some, control for effects of others as fixed blocking factors
- Justify this as part of the population definition
- For example, we define the inference population as these five districts within these two states
- But doing so obviously constrains generalizability

Precision of the Estimated Treatment Effect

- Precision is the standard error of the estimated treatment effect
- Precision in simple (simple random sample) designs depends on
  - Standard deviation in the population $\sigma$
  - Total sample size N
- The precision is
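
As a hedged illustration of how the standard deviation and N combine, assume a balanced two-group comparison: N/2 units per group, a common standard deviation $\sigma$, and independent observations. These assumptions are added here for the sketch and are not stated on the slide. Under them, the standard error of the difference in means is:

```latex
\[
SE = \sqrt{\frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2}} = \frac{2\sigma}{\sqrt{N}}
\]
```

Under this sketch, halving the standard error requires quadrupling the total sample size.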

Precision of the Estimated Treatment Effect

- Precision in complex (clustered sample) designs depends on
  - The (total) standard deviation $\sigma_T$
  - Sample size at each level of sampling (e.g., m clusters, n individuals per cluster)
  - Intraclass correlation structure
- It is a little harder to compute than in simple designs, but important because it helps you see what matters in design

Intraclass Correlations in Two-level Designs

- In two-level designs the intraclass correlation structure is determined by a single intraclass correlation
- This intraclass correlation is the proportion of the total variance that is between schools (clusters)
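
Written out in symbols, the verbal definition above takes the following form. The sketch assumes the total variance splits into a between-school component $\sigma_S^2$ and a within-school component $\sigma_W^2$, the notation used in the HLM slides of this deck:

```latex
\[
\rho = \frac{\sigma_S^2}{\sigma_S^2 + \sigma_W^2} = \frac{\sigma_S^2}{\sigma_T^2}
\]
```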

Precision in Two-level Hierarchical Design With No Covariates

- The standard error of the treatment effect is
- SE decreases as m (number of schools) increases
- SE decreases as n increases, but only up to a point
- SE increases as $\rho$ increases
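
The three qualitative claims above can be checked numerically. The sketch below assumes the standard result for a balanced cluster-randomized design, $SE = \sigma_T \sqrt{2/(mn)}\,\sqrt{1 + (n-1)\rho}$, with m schools per treatment arm and n students per school. The function name `se_two_level` and the planning values are hypothetical, chosen only for illustration.

```python
import math


def se_two_level(m, n, rho, sigma_t=1.0):
    """SE of the treatment effect, balanced two-level hierarchical design.

    Assumed inputs: m schools per treatment arm, n students per school,
    rho = intraclass correlation, sigma_t = total standard deviation.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return sigma_t * math.sqrt(2.0 / (m * n)) * math.sqrt(1.0 + (n - 1) * rho)


# Hypothetical planning values: 20 schools per arm, rho = 0.2.
# Raising n shrinks the SE only toward the floor sigma_t * sqrt(2 * rho / m).
for n in (10, 50, 1000):
    print(n, round(se_two_level(m=20, n=n, rho=0.2), 4))
print("floor", round(math.sqrt(2 * 0.2 / 20), 4))
```

Under this formula, adding schools always helps, while adding students per school stops paying off once the between-school term dominates.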

Statistical Power

- Power in simple (simple random sample) designs depends on
  - Significance level
  - Effect size
  - Sample size
- Look power up in a table for sample size and effect size

Fragment of Cohen's Table 2.3.5

Computing Statistical Power

- Power in complex (clustered sample) designs depends on
  - Significance level
  - Effect size d
  - Sample size at each level of sampling (e.g., m clusters, n individuals per cluster)
  - Intraclass correlation structure
- This makes it seem a lot harder to compute

Computing Statistical Power

- Computing statistical power in complex designs is only a little harder than computing it for simple designs
- Compute operational effect size (incorporates sample design information) $\Delta_T$
- Look power up in a table for operational sample size and operational effect size
- This is the same table that you use for simple designs

Power in Two-level Hierarchical Design With No Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the two-level hierarchical design with no covariates
- Operational sample size is number of schools (clusters)
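
A minimal sketch of this two-step recipe follows. It assumes the design effect for the balanced two-level hierarchical design is $\sqrt{n/(1+(n-1)\rho)}$, a common form in cluster-randomized power analysis that is not shown on the slide. It also swaps the table lookup for a normal approximation to the two-sample test, which slightly overstates power when the number of schools is small. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def operational_effect_size(d, n, rho):
    """Delta_T = d * (design effect), assuming the design effect is
    sqrt(n / (1 + (n - 1) * rho)) with n students per school."""
    design_effect = math.sqrt(n / (1.0 + (n - 1) * rho))
    return d * design_effect


def approx_power(delta_t, m, alpha=0.05):
    """Approximate two-sided power with operational sample size m
    (schools per treatment arm). Normal approximation: ignores the
    t-distribution's degrees of freedom and the opposite-tail rejections.
    """
    z = NormalDist()
    noncentrality = delta_t * math.sqrt(m / 2.0)
    return z.cdf(noncentrality - z.inv_cdf(1.0 - alpha / 2.0))


# Hypothetical planning values, not taken from the slides.
delta_t = operational_effect_size(d=0.25, n=25, rho=0.15)
print(round(delta_t, 3), round(approx_power(delta_t, m=20), 3))
```

The operational sample size m plays the role that the per-group sample size plays in a simple design, which is why the same power table applies.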

Power in Two-level Hierarchical Design With No Covariates

- As m (number of schools) increases, power increases
- As effect size increases, power increases
- Other influences occur through the design effect
  - As $\rho$ increases the design effect (and power) decreases
  - No matter how large n gets the maximum design effect is
- Thus power only increases up to some limit as n increases
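
The ceiling on the design effect can be worked out explicitly under one common form of that effect. The expression below is an assumption drawn from standard cluster-randomized power formulas (n students per school, intraclass correlation $\rho$); it is a sketch, not a quotation of the slide.

```latex
\[
\text{Design Effect} = \sqrt{\frac{n}{1 + (n-1)\rho}},
\qquad
\lim_{n \to \infty} \sqrt{\frac{n}{1 + (n-1)\rho}} = \frac{1}{\sqrt{\rho}}
\]
```

For example, with $\rho = 0.2$ the design effect under this form cannot exceed $1/\sqrt{0.2} \approx 2.24$, however many students are sampled per school.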

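Two-level Hierarchical Design With Covariates (HLM Notation)

- Level 1 (individual level)
  - $Y_{ijk} = \beta_{0j} + \beta_{1j} X_{ijk} + \varepsilon_{ijk}$, $\varepsilon \sim N(0, \sigma_{AW}^2)$
- Level 2 (school level)
  - $\beta_{0j} = \pi_{00} + \pi_{01} T_j + \pi_{02} W_j + \xi_{0j}$, $\xi \sim N(0, \sigma_{AS}^2)$
  - $\beta_{1j} = \pi_{10}$
- Note that the covariate effect $\beta_{1j} = \pi_{10}$ is a fixed effect
- If we code the treatment $T_j = \tfrac{1}{2}$ or $-\tfrac{1}{2}$, then the parameters are identical to those in standard ANCOVA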

Precision in Two-level Hierarchical Design With Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as n increases, but only up to a point
- SE increases as $\rho$ increases
- SE decreases as $R_W^2$ and $R_S^2$ increase
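
To see how the covariates enter, the sketch below assumes that a covariate explaining a proportion $R_W^2$ of the within-school variance and $R_S^2$ of the between-school variance leaves adjusted components $\sigma_{AW}^2 = (1-R_W^2)\sigma_W^2$ and $\sigma_{AS}^2 = (1-R_S^2)\sigma_S^2$, and that the design is balanced with m schools per arm and n students per school. The function name `se_two_level_cov` is hypothetical.

```python
import math


def se_two_level_cov(m, n, rho, r2_w, r2_s, sigma_t=1.0):
    """SE of the covariate-adjusted treatment effect (two-level hierarchical).

    Assumes Var = 2 * (sigma_AW^2 + n * sigma_AS^2) / (m * n), where the
    adjusted variances are the unadjusted ones scaled by (1 - R^2).
    """
    within = (1.0 - r2_w) * (1.0 - rho)   # adjusted within-school share
    between = n * (1.0 - r2_s) * rho      # adjusted between-school share, times n
    return sigma_t * math.sqrt(2.0 / (m * n)) * math.sqrt(within + between)


# Hypothetical planning values: a school-level pretest with R_S^2 = 0.5.
print(round(se_two_level_cov(20, 25, 0.2, r2_w=0.0, r2_s=0.0), 4))
print(round(se_two_level_cov(20, 25, 0.2, r2_w=0.3, r2_s=0.5), 4))
```

Because the between-school term is multiplied by n, a covariate that explains school-level variance usually buys more precision than one that explains only student-level variance.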

Power in Two-level Hierarchical Design With Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the two-level hierarchical design with covariates
- The covariates increase the design effect

Power in Two-level Hierarchical Design With Covariates

- As m and effect size increase, power increases
- Other influences occur through the design effect
  - As $\rho$ increases the design effect (and power) decrease
  - Now the maximum design effect as n gets big is
  - As the covariate-outcome correlations $R_W^2$ and $R_S^2$ increase the design effect (and power) increases
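
One way to make the covariate-adjusted ceiling concrete is sketched below. It assumes the adjusted variance components are $(1-R_W^2)\sigma_W^2$ and $(1-R_S^2)\sigma_S^2$ and that there are n students per school; the slide's own expression is not reproduced here.

```latex
\[
\text{Design Effect} = \sqrt{\frac{n}{(1-R_W^2)(1-\rho) + n(1-R_S^2)\rho}},
\qquad
\lim_{n \to \infty} \text{Design Effect} = \frac{1}{\sqrt{(1-R_S^2)\rho}}
\]
```

Under this form only the school-level $R_S^2$ raises the ceiling; the within-school $R_W^2$ matters less and less as n grows.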

Three-level Hierarchical Design

- Here there are three factors
- Treatment
- Schools (clusters) nested in treatments
- Classes (subclusters) nested in schools
- Suppose there are
- m schools (clusters) per treatment
- p classes (subclusters) per school (cluster)
- n students (individuals) per class (subcluster)

Three-level Hierarchical Design With No Covariates

- The statistical model for the observation on the lth person in the kth class in the jth school in the ith treatment is
  - $Y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(ij)} + \varepsilon_{ijkl}$
- where
  - $\mu$ is the grand mean,
  - $\alpha_i$ is the average effect of being in treatment i,
  - $\beta_{j(i)}$ is the average effect of being in school j, in treatment i,
  - $\gamma_{k(ij)}$ is the average effect of being in class k in treatment i, in school j,
  - $\varepsilon_{ijkl}$ is a residual

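Three-level Hierarchical Design With No Covariates (HLM Notation)

- Level 1 (individual level)
  - $Y_{ijkl} = \beta_{0jk} + \varepsilon_{ijkl}$, $\varepsilon \sim N(0, \sigma_W^2)$
- Level 2 (classroom level)
  - $\beta_{0jk} = \gamma_{0j} + \eta_{0jk}$, $\eta \sim N(0, \sigma_C^2)$
- Level 3 (school level)
  - $\gamma_{0j} = \pi_{00} + \pi_{01} T_j + \xi_{0j}$, $\xi \sim N(0, \sigma_S^2)$
- If we code the treatment $T_j = \tfrac{1}{2}$ or $-\tfrac{1}{2}$, then
  - $\pi_{00} = \mu$, $\pi_{01} = \alpha_1$, $\xi_{0j} = \beta_{j(i)}$, $\eta_{0jk} = \gamma_{k(ij)}$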

Three-level Hierarchical Design Intraclass Correlations

- In three-level designs there are two levels of clustering and two intraclass correlations
- At the school (cluster) level
- At the classroom (subcluster) level

Precision in Three-level Hierarchical Design With No Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as $\rho_S$ and $\rho_C$ increase
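
The same bookkeeping extends to three levels. The sketch below assumes a balanced design (m schools per arm, p classes per school, n students per class) with school-level and classroom-level intraclass correlations $\rho_S = \sigma_S^2/\sigma_T^2$ and $\rho_C = \sigma_C^2/\sigma_T^2$, so that $SE = \sigma_T\sqrt{2/(mpn)}\,\sqrt{1 + (pn-1)\rho_S + (n-1)\rho_C}$. The function name `se_three_level` and the planning values are hypothetical.

```python
import math


def se_three_level(m, p, n, rho_s, rho_c, sigma_t=1.0):
    """SE of the treatment effect, balanced three-level hierarchical design.

    Assumes Var = 2 * (sigma_W^2 + n*sigma_C^2 + p*n*sigma_S^2) / (m*p*n),
    rewritten in terms of the two intraclass correlations.
    """
    if rho_s < 0 or rho_c < 0 or rho_s + rho_c > 1:
        raise ValueError("need rho_s, rho_c >= 0 and rho_s + rho_c <= 1")
    inflation = 1.0 + (p * n - 1) * rho_s + (n - 1) * rho_c
    return sigma_t * math.sqrt(2.0 / (m * p * n)) * math.sqrt(inflation)


# Hypothetical planning values: doubling schools vs. doubling class size.
print(round(se_three_level(m=10, p=4, n=20, rho_s=0.15, rho_c=0.10), 4))
print(round(se_three_level(m=20, p=4, n=20, rho_s=0.15, rho_c=0.10), 4))
print(round(se_three_level(m=10, p=4, n=40, rho_s=0.15, rho_c=0.10), 4))
```

With a single class per school and no classroom-level variance, the expression collapses to the two-level formula, which is a useful sanity check.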

Power in Three-level Hierarchical Design With No Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the three-level hierarchical design with no covariates
- The operational sample size is the number of schools

Power in Three-level Hierarchical Design With No Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
  - As $\rho_S$ or $\rho_C$ increases the design effect decreases
  - No matter how large n gets the maximum design effect is
- Thus power only increases up to some limit as n increases
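
A hedged sketch of where the three-level ceiling comes from, assuming p classes per school, n students per class, and the intraclass correlations $\rho_S$ and $\rho_C$ defined as shares of the total variance (these forms are standard but are not reproduced from the slide):

```latex
\[
\Delta_T = d \sqrt{\frac{pn}{1 + (pn-1)\rho_S + (n-1)\rho_C}},
\qquad
\lim_{n \to \infty} \Delta_T = d \sqrt{\frac{p}{p\,\rho_S + \rho_C}}
\]
```

If the number of classes p also grows without bound, the limit under this form becomes $d/\sqrt{\rho_S}$, so the school-level intraclass correlation sets the final ceiling.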

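Three-level Hierarchical Design With Covariates (HLM Notation)

- Level 1 (individual level)
  - $Y_{ijkl} = \beta_{0jk} + \beta_{1jk} X_{ijkl} + \varepsilon_{ijkl}$, $\varepsilon \sim N(0, \sigma_{AW}^2)$
- Level 2 (classroom level)
  - $\beta_{0jk} = \gamma_{00j} + \gamma_{01j} Z_{jk} + \eta_{0jk}$, $\eta \sim N(0, \sigma_{AC}^2)$
  - $\beta_{1jk} = \gamma_{10j}$
- Level 3 (school level)
  - $\gamma_{00j} = \pi_{00} + \pi_{01} T_j + \pi_{02} W_j + \xi_{0j}$, $\xi \sim N(0, \sigma_{AS}^2)$
  - $\gamma_{01j} = \pi_{01}$
  - $\gamma_{10j} = \pi_{10}$
- The covariate effects $\beta_{1jk} = \gamma_{10j} = \pi_{10}$ and $\gamma_{01j} = \pi_{01}$ are fixed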

Precision in Three-level Hierarchical Design With Covariates

- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as the intraclass correlations $\rho_S$ and $\rho_C$ increase
- SE decreases as $R_W^2$, $R_C^2$, and $R_S^2$ increase
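
The effect of covariates at each of the three levels can be explored with a short calculation. The sketch assumes each covariate-adjusted variance component equals the unadjusted one scaled by $(1 - R^2)$ at its level, in a balanced design with m schools per arm, p classes per school, and n students per class. The function name `se_three_level_cov` and the planning values are hypothetical.

```python
import math


def se_three_level_cov(m, p, n, rho_s, rho_c, r2_w, r2_c, r2_s, sigma_t=1.0):
    """SE of the covariate-adjusted treatment effect (three-level hierarchical).

    Assumes Var = 2 * (s_AW^2 + n*s_AC^2 + p*n*s_AS^2) / (m*p*n), with each
    adjusted variance equal to (1 - R^2) times the unadjusted variance.
    """
    student = (1.0 - r2_w) * (1.0 - rho_s - rho_c)  # adjusted student-level share
    classroom = n * (1.0 - r2_c) * rho_c            # adjusted classroom share, times n
    school = p * n * (1.0 - r2_s) * rho_s           # adjusted school share, times p*n
    total = student + classroom + school
    return sigma_t * math.sqrt(2.0 / (m * p * n)) * math.sqrt(total)


# Hypothetical comparison: the same R^2 = 0.5 applied at one level at a time.
args = dict(m=10, p=4, n=20, rho_s=0.15, rho_c=0.10)
print(round(se_three_level_cov(**args, r2_w=0.5, r2_c=0.0, r2_s=0.0), 4))
print(round(se_three_level_cov(**args, r2_w=0.0, r2_c=0.5, r2_s=0.0), 4))
print(round(se_three_level_cov(**args, r2_w=0.0, r2_c=0.0, r2_s=0.5), 4))
```

In this sketch the school-level term carries the weight pn, so explaining school-level variance tends to reduce the SE the most.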

Power in Three-level Hierarchical Design With Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the three-level hierarchical design with covariates
- The operational sample size is the number of schools

Power in Three-level Hierarchical Design With Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
  - As $\rho_S$ or $\rho_C$ increases the design effect decreases
  - No matter how large n gets the maximum design effect is
- Thus power only increases up to some limit as n increases

Randomized Block Designs

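Two-level Randomized Block Design With No Covariates (HLM Notation)

- Level 1 (individual level)
  - $Y_{ijk} = \beta_{0j} + \beta_{1j} T_{ijk} + \varepsilon_{ijk}$, $\varepsilon \sim N(0, \sigma_W^2)$
- Level 2 (school level)
  - $\beta_{0j} = \pi_{00} + \xi_{0j}$, $\xi_{0j} \sim N(0, \sigma_S^2)$
  - $\beta_{1j} = \pi_{10} + \xi_{1j}$, $\xi_{1j} \sim N(0, \sigma_{T \times S}^2)$
- If we code the treatment $T_{ijk} = \tfrac{1}{2}$ or $-\tfrac{1}{2}$, then the parameters are identical to those in standard ANOVA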

Randomized Block Designs

- In randomized block designs, as in hierarchical designs, the intraclass correlation has an impact on precision and power
- However, in randomized block designs there is also a parameter reflecting the degree of heterogeneity of treatment effects across schools
- We define this heterogeneity parameter $\omega_S$ in terms of the amount of heterogeneity of treatment effects relative to the heterogeneity of school means
- Thus
  - $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$

Precision in Two-level Randomized Block Design With No Covariates

- The standard error of the treatment effect
- SE decreases as m (number of schools) increases
- SE decreases as n increases, but only up to a point
- SE increases as $\rho$ increases
- SE increases as $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$ increases
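
The role of the heterogeneity parameter can be illustrated with a rough calculation. The sketch below rests on several assumptions that are not stated on the slide: treatment is coded $\pm\tfrac{1}{2}$, so the school-specific treatment effect has variance $\sigma_{T \times S}^2$; there are m schools with n students per treatment condition in each school; schools are random; and $\sigma_T^2 = \sigma_S^2 + \sigma_W^2$. Other parameterizations of the interaction variance change the constants. The function name `se_randomized_block` is hypothetical.

```python
import math


def se_randomized_block(m, n, rho, omega_s, sigma_t=1.0):
    """SE of the average treatment effect, two-level randomized block design.

    Under the stated assumptions each school's estimated effect has variance
    sigma_TxS^2 + 2 * sigma_W^2 / n, and the overall estimate averages m schools.
    """
    var_s = rho * sigma_t ** 2            # between-school variance
    var_w = (1.0 - rho) * sigma_t ** 2    # within-school variance
    var_txs = omega_s * var_s             # treatment-by-school variance
    return math.sqrt((var_txs + 2.0 * var_w / n) / m)


# Hypothetical planning values: homogeneous vs. heterogeneous treatment effects.
print(round(se_randomized_block(m=20, n=25, rho=0.2, omega_s=0.0), 4))
print(round(se_randomized_block(m=20, n=25, rho=0.2, omega_s=0.5), 4))
```

In this sketch the school-means variance $\sigma_S^2$ affects the SE only through $\omega_S$, which is the sense in which blocking removes between-school variation from the treatment comparison.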

Power in Two-level Randomized Block Design With No Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the two-level randomized block design with no covariates
- Operational sample size is number of schools (clusters)

Precision in Two-level Randomized Block Design With Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as n increases, but only up to a point
- SE increases as $\rho$ increases
- SE increases as $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$ increases
- SE (generally) decreases as $R_W^2$ and $R_S^2$ increase

Power in Two-level Randomized Block Design With Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the two-level randomized block design with covariates
- The covariates increase the design effect

Three-level Randomized Block Designs

Three-level Randomized Block Design With No Covariates

- Here there are three factors
- Treatment
- Schools (clusters) nested in treatments
- Classes (subclusters) nested in schools
- Suppose there are
- m schools (clusters) per treatment
- 2p classes (subclusters) per school (cluster)
- n students (individuals) per class (subcluster)

Three-level Randomized Block Design With No Covariates

- The statistical model for the observation on the lth person in the kth class in the ith treatment in the jth school is
  - $Y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \alpha\beta_{ij} + \varepsilon_{ijkl}$
- where
  - $\mu$ is the grand mean,
  - $\alpha_i$ is the average effect of being in treatment i,
  - $\beta_j$ is the average effect of being in school j,
  - $\gamma_k$ is the effect of being in the kth class,
  - $\alpha\beta_{ij}$ is the difference between the average effect of treatment i and the effect of that treatment in school j,
  - $\varepsilon_{ijkl}$ is a residual

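Three-level Randomized Block Design With No Covariates (HLM Notation)

- Level 1 (individual level)
  - $Y_{ijkl} = \beta_{0jk} + \varepsilon_{ijkl}$, $\varepsilon \sim N(0, \sigma_W^2)$
- Level 2 (classroom level)
  - $\beta_{0jk} = \gamma_{00j} + \gamma_{01j} T_j + \eta_{0jk}$, $\eta \sim N(0, \sigma_C^2)$
- Level 3 (school level)
  - $\gamma_{00j} = \pi_{00} + \xi_{0j}$, $\xi_{0j} \sim N(0, \sigma_S^2)$
  - $\gamma_{01j} = \pi_{10} + \xi_{1j}$, $\xi_{1j} \sim N(0, \sigma_{T \times S}^2)$
- If we code the treatment $T_j = \tfrac{1}{2}$ or $-\tfrac{1}{2}$, then
  - $\pi_{00} = \mu$, $\pi_{10} = \alpha_1$, $\xi_{0j} = \beta_j$, $\xi_{1j} = \alpha\beta_{ij}$, $\eta_{0jk} = \gamma_k$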

Three-level Randomized Block Design Intraclass Correlations

- In three-level designs there are two levels of clustering and two intraclass correlations
- At the school (cluster) level
- At the classroom (subcluster) level

Three-level Randomized Block Design Heterogeneity Parameters

- In three-level designs, as in two-level randomized block designs, there is also a parameter reflecting the degree of heterogeneity of treatment effects across schools
- We define this parameter $\omega_S$ in terms of the amount of heterogeneity of treatment effects relative to the heterogeneity of school means (just like in two-level designs)
- Thus
  - $\omega_S = \sigma_{T \times S}^2/\sigma_S^2$

Precision in Three-level Randomized Block Design With No Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as $\omega_S$ increases
- SE increases as $\rho_S$ and $\rho_C$ increase

Power in Three-level Randomized Block Design With No Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the three-level randomized block design with no covariates
- The operational sample size is the number of schools

Power in Three-level Randomized Block Design With No Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
  - As $\rho_S$ or $\rho_C$ increases the design effect decreases
  - No matter how large n gets the maximum design effect is
- Thus power only increases up to some limit as n increases

Precision in Three-level Randomized Block Design With Covariates

- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as $\rho$ and $\omega_S$ increase
- SE decreases as $R_W^2$, $R_C^2$, and $R_S^2$ increase

Power in Three-level Randomized Block Design With Covariates

- Basic Idea
  - Operational Effect Size = (Effect Size) × (Design Effect)
  - $\Delta_T = d \times (\text{Design Effect})$
- For the three-level randomized block design with covariates
- The operational sample size is the number of schools

Power in Three-level Randomized Block Design With Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
  - As $\rho_S$ or $\rho_C$ increases the design effect decreases
  - No matter how large n gets the maximum design effect is
- Thus power only increases up to some limit as n increases

What Unit Should Be Randomized? (Schools, Classrooms, or Students)

- Experiments cannot estimate the causal effect on any individual
- Experiments estimate average causal effects on the units that have been randomized
  - If you randomize schools, the (average) causal effects are effects on schools
  - If you randomize classes, the (average) causal effects are on classes
  - If you randomize individuals, the (average) causal effects estimated are on individuals

What Unit Should Be Randomized? (Schools, Classrooms, or Students)

- Theoretical Considerations
  - Decide what level you care about, then randomize at that level
  - Randomization at lower levels may impact generalizability of the causal inference (and it is generally a lot more trouble)
- Suppose you randomize classrooms: should you also randomly assign students to classes?
  - It depends: are you interested in the average causal effect of treatment on naturally occurring classes or on randomly assembled ones?

What Unit Should Be Randomized? (Schools, Classrooms, or Students)

- Relative power/precision of treatment effect
  - Assign Schools (Hierarchical Design)
  - Assign Classrooms (Randomized Block)
  - Assign Students (Randomized Block)

What Unit Should Be Randomized? (Schools, Classrooms, or Students)

- Precision of estimates or statistical power dictates assigning at the lowest level possible
- But the individual (or even classroom) level will not always be feasible or even theoretically desirable

Questions and Answers About Design

Questions and Answers About Design

- Is it OK to match my schools (or classes) before I randomize to decrease variation?
- I assigned treatments to schools and am not using classes in the analysis. Do I have to take them into account in the design?
- I am assigning schools, and using every class in the school. Do I have to include classes as a nested factor?
- My schools all come from two districts, but I am randomly assigning the schools. Do I have to take district into account some way?

Questions and Answers About Design

- I didn't really sample the schools in my experiment (who does?). Do I still have to treat schools as random effects?
- I didn't really sample my schools, so what population can I generalize to anyway?
- I am using a randomized block design with fixed effects. Do you really mean I can't say anything about effects in schools that are not in the sample?

Questions and Answers About Design

- We randomly assigned, but our assignment was corrupted by treatment switchers. What do we do?
- We randomly assigned, but our assignment was corrupted by attrition. What do we do?
- We randomly assigned but got a big imbalance on characteristics we care about (gender, race, language, SES). What do we do?
- We randomly assigned but when we looked at the pretest scores, we saw that we got a big imbalance (a bad randomization). What do we do?

Questions and Answers About Design

- We care about treatment effects, but we really want to know about mechanism. How do we find out if implementation impacts treatment effects?
- We want to know where (under what conditions) the treatment works. Can we analyze the relation between conditions and treatment effect to find this out?
- We have a randomized block design and find heterogeneous treatment effects. What can we say about the main effect of treatment in the presence of interactions?

Questions and Answers About Design

- I prefer to use regression and I know that regression and ANOVA are equivalent. Why do I need all this ANOVA stuff to design and analyze experiments?
- Don't robust standard errors in regression solve all these problems?
- I have heard of using school fixed effects to analyze a randomized block design. Is this a good alternative to ANOVA or HLM?
- Can I use school fixed effects in a hierarchical design?

Questions and Answers About Design

- We want to use covariates to improve precision, but we find that they act somewhat differently in different groups (have different slopes). What do we do?
- We get somewhat different variances in different groups. Should we use robust standard errors?
- We get somewhat different answers with different analyses. What do we do?

- Thank You !