
Basic Experimental Design

- Larry V. Hedges
- Northwestern University
- Prepared for the IES Summer Research Training Institute, July 26, 2010

Institute Schedule

Monday, 26-Jul
- 8:00-10:00: Basic Design I (Hedges)
- 10:30-12:30: Basic Design II (Hedges)
- 12:30-1:30: Lunch
- 1:30-3:30: Basic Design III (Hedges)
- 4:00-5:30: Introduce Group Projects (Cordray)
- 6:00: Dinner

Tuesday, 27-Jul
- 8:00-10:00: Sample/Power I (Bloom)
- 10:30-12:30: Sample/Power II (Bloom)
- 12:30-1:30: Lunch
- 1:30-3:30: Sampling/External Validity (Bloom)
- 4:00-5:30: Group Project Meeting (Cordray and others)
- 6:00: Dinner

Wednesday, 28-Jul
- 8:00-10:00: Growth Modeling (Hedges)
- 10:30-12:30: Analysis Lab I (Hedges, Konstantopoulos)
- 12:30-1:30: Lunch
- 1:30-3:30: Analysis Lab II (Hedges, Konstantopoulos)
- 4:00-5:30: Group Project Meeting (Cordray and others)
- Dinner at Carmen's

Thursday, 29-Jul
- 8:00-10:00: Power Lab I (Spybrook)
- 10:30-12:30: Power Lab II (Spybrook)
- 12:30-1:30: Lunch
- 1:30-3:30: Mediation Models (Beretvas)
- 4:00-5:30: Group Project Meeting (Cordray and others)
- 6:00: Dinner

Friday, 30-Jul
- 8:00-10:00: Specify Models (Lipsey)
- 10:30-12:30: Describe Outcomes (Lipsey)
- 12:30-1:30: Lunch
- 1:30-3:30: Model Cause (Cordray)
- 4:00-5:30: Group Project Meeting (others)
- Dinner at Stained Glass

Institute Schedule

Monday, 2-Aug
- 8:00-10:00: Missing Data I (Graham)
- 10:30-12:30: Missing Data II (Graham)
- 12:30-1:30: Lunch
- 1:30-3:30: Analyzing Fidelity (Cordray)
- 4:00-5:30: Group Project Meeting (Cordray and others)
- Dinner at Mt Everest

Tuesday, 3-Aug
- 8:00-10:00: Moderator Analysis (Konstantopoulos)
- 10:30-12:30: Alternate Designs I (Lipsey)
- 12:30-1:30: Lunch
- 1:30-3:30: Alternate Designs II (Lipsey)
- 4:00-5:30: Group Project Meeting (Cordray and others)
- 6:00: Dinner

Wednesday, 4-Aug
- 8:00-10:00: Finalize Group Projects
- 10:30-12:30: Finalize Group Projects
- 12:30-1:30: Lunch
- 1:30-3:30: Group 1 Presents (faculty feedback)
- 4:00-5:30: Group 2 Presents (faculty feedback)
- 6:00: Dinner

Thursday, 5-Aug
- 8:00-10:00: Group 3 Presents (faculty feedback)
- 10:30-12:30: Group 4 Presents (faculty feedback)
- 12:30-1:30: Lunch
- 1:30-3:30: Group 5 Presents (faculty feedback)
- 4:00-5:30: Course Evaluation / Debrief
- Dinner / Graduation

What is Experimental Design?

- Experimental design includes both
- strategies for organizing data collection, and
- data analysis procedures matched to those data collection strategies
- Classical treatments of design stress analysis procedures based on the analysis of variance (ANOVA)
- Other analysis procedures, such as those based on hierarchical linear models or on analysis of aggregates (e.g., class or school means), are also appropriate

Why Do We Need Experimental Design?

- Because of variability
- We wouldn't need a science of experimental design if
- all units (students, teachers, schools) were identical, and
- all units responded identically to treatments
- We need experimental design to control variability so that treatment effects can be identified

A Little History

- The idea of controlling variability through design has a long history
- In 1747, James Lind's studies of scurvy
- "Their cases were as similar as I could have them. They all in general had putrid gums, spots and lassitude, with weakness of their knees. They lay together on one place and had one diet common to all" (Lind, 1753, p. 149)
- Lind then assigned six different treatments to groups of patients

A Little History

- The idea of random assignment was not obvious and took time to catch on
- In 1648 von Helmont carried out one randomization in a trial of bloodletting for fevers
- In 1904 Karl Pearson suggested matching and alternation in typhoid trials
- Amberson et al. (1931) carried out a trial with one randomization
- In 1937 Sir Bradford Hill advocated alternation of patients in trials rather than randomization
- Diehl et al. (1938) carried out a trial that is sometimes referred to as randomized, but it actually used alternation

A Little History

- The first modern randomized clinical trial in medicine is usually considered to be the trial of streptomycin for treating tuberculosis
- It was conducted by the British Medical Research Council in 1946 and reported in 1948

A Little History

- Experiments have been used longer in the behavioral sciences (e.g., psychophysics: Peirce and Jastrow, 1885)
- Experiments conducted in laboratory settings were widely used in educational psychology (e.g., McCall, 1923)
- Thorndike (early 1900s)
- Lindquist (1953)
- Gage's field experiments on teaching (1978, 1984)

A Little History

- Studies in Crop Variation I-VI (1921-1929)
- In 1919 a statistician named Fisher was hired at Rothamsted agricultural station
- They had a lot of observational data on crop yields and hoped a statistician could analyze it to find the effects of various treatments
- All he had to do was sort out the effects of confounding variables

Studies in Crop Variation I (1921)

- Fisher does regression analyses (lots of them) to study (and get rid of) the effects of confounders
- soil fertility gradients
- drainage differences
- effects of rainfall
- effects of temperature and weather, etc.
- Fisher does qualitative work to sort out anomalies
- Conclusion
- The effects of confounders are typically larger than those of the systematic effects we want to study

Studies in Crop Variation II (1923)

- Fisher invents
- Basic principles of experimental design
- Control of variation by randomization
- Analysis of variance

Studies in Crop Variation IV and VI

- Studies in Crop Variation IV (1927)
- Fisher invents analysis of covariance to combine statistical control and control by randomization
- Studies in Crop Variation VI (1929)
- Fisher refines the theory of experimental design, introducing most other key concepts known today

Our Hero in 1929

Principles of Experimental Design

- Experimental design controls background variability so that systematic effects of treatments can be observed
- Three basic principles
- Control by matching
- Control by randomization
- Control by statistical adjustment
- Their importance is in that order

Control by Matching

- Known sources of variation may be eliminated by matching
- Eliminating genetic variation
- Compare animals from the same litter of mice
- Eliminating district or school effects
- Compare students within districts or schools
- However, matching is limited
- matching is only possible on observable characteristics
- perfect matching is not always possible
- matching inherently limits generalizability by removing (possibly desired) variation

Control by Matching

- Matching ensures that the groups compared are alike on specific known and observable characteristics (in principle, everything we have thought of)
- Wouldn't it be great if there were a method of making groups alike not only on everything we have thought of, but on everything we didn't think of too?
- There is such a method

Control by Randomization

- Matching controls for the effects of variation due to specific observable characteristics
- Randomization controls for the effects of all (observable or non-observable, known or unknown) characteristics
- Randomization makes groups equivalent (on average) on all variables (known and unknown, observable or not)
- Randomization also gives us a way to assess whether differences after treatment are larger than would be expected due to chance

Control by Randomization

- Random assignment is not assignment with no particular rule. It is a purposeful process
- "Assignment is made at random. This does not mean that the experimenter writes down the names of the varieties in any order that occurs to him, but that he carries out a physical experimental process of randomization, using means which shall ensure that each variety will have an equal chance of being tested on any particular plot of ground" (Fisher, 1935, p. 51)

Control by Randomization

- Random assignment of schools or classrooms is not assignment with no particular rule. It is a purposeful process
- "Assignment of schools to treatments is made at random. This does not mean that the experimenter assigns schools to treatments in any order that occurs to her, but that she carries out a physical experimental process of randomization, using means which shall ensure that each treatment will have an equal chance of being tested in any particular school" (Hedges, 2007)

Control by Statistical Adjustment

- Control by statistical adjustment is a form of pseudo-matching
- It uses statistical relations to simulate matching
- Statistical control is important for increasing precision, but it should not be relied upon to control biases that may exist prior to assignment
- Statistical control is the weakest of the three experimental design principles because its validity depends on knowing a statistical model for the responses

Using Principles of Experimental Design

- You have to know a lot (be smart) to use matching and statistical control effectively
- You do not have to be smart to use randomization effectively
- But
- Where all are possible, randomization is not as efficient as matching or statistical control (it requires larger sample sizes for the same power)

Basic Ideas of Design: Independent Variables (Factors)

- The values of independent variables are called levels
- Some independent variables can be manipulated; others can't
- Treatments are independent variables that can be manipulated
- Blocks and covariates are independent variables that cannot be manipulated
- These concepts are simple, but they are often confused
- Remember
- You can randomly assign treatment levels but not blocks

Basic Ideas of Design (Crossing)

- Relations between independent variables
- Factors (treatments or blocks) are crossed if every level of one factor occurs with every level of another factor
- Example
- The Tennessee class size experiment assigned students to one of three class size conditions. All three treatment conditions occurred within each of the participating schools
- Thus treatment was crossed with schools

Basic Ideas of Design (Nesting)

- Factor B is nested in factor A if every level of factor B occurs within only one level of factor A
- Example
- The Tennessee class size experiment actually assigned classrooms to one of three class size conditions. Each classroom occurred in only one treatment condition
- Thus classrooms were nested within treatments
- (But treatment was crossed with schools)

Where Do These Terms Come From? (Nesting)

- An agricultural experiment where blocks are literally blocks or plots of land
- Here each block is literally nested within a treatment condition

  Block 1 (all plots T1)   Block 2 (all plots T2)   ...   Block n (all plots T1)

Where Do These Terms Come From? (Crossing)

- An agricultural experiment
- Blocks were literally blocks of land, and plots of land within blocks were assigned different treatments

  Block 1 (plots T1, T2)   Block 2 (plots T2, T1)   ...   Block n (plots T1, T2)

Where Do These Terms Come From? (Crossing)

- Blocks were literally blocks of land, and plots of land within blocks were assigned different treatments
- Here treatment literally crosses the blocks

  Block 1 (plots T1, T2)   Block 2 (plots T2, T1)   ...   Block n (plots T1, T2)

Where Do These Terms Come From? (Crossing)

- The experiment is often depicted like this. What is wrong with this as a field layout?
- Consider possible sources of bias

  Blocks 1 through n
  Treatment 1: the first plot in every block
  Treatment 2: the second plot in every block

Blocking Variables

- We often exploit natural structure by adding blocking variables to the design
- Examples
- districts
- states
- regions
- This may be a good idea if they explain variation
- But it raises issues in the analysis about how you think about the blocks (fixed or random effects)
- We will talk about that later

Think About These Designs

- A study was to assign schools to treatments, but you decide to block by districts before assignment to treatments
- A study was to have assigned individuals (students) to treatments within schools, but you decide to block by districts before assignment to treatments
- Both of these designs occur frequently
- Which design would you expect to be the most sensitive?

Districts As Blocks Added to a Hierarchical Design

- D1 D2
- T1 T2 T1 T2
- S1 S2 S3 S4 S5 S6 S7 S8

Districts As Blocks Added to a Randomized Blocks Design

- D1 D2
- T1 T2 T1 T2
- S1 S2 S1 S2 S3 S4 S3 S4

Think About These Designs

- 1. A study assigns T or C to 20 teachers. The teachers are in five schools, and each teacher teaches 4 science classes
- 2. A study assigns a reading treatment (or control) to children in 20 schools. Each child is classified into one of three groups with different risk of reading failure
- 3. Two schools in each of 10 districts are picked to participate. Each school has two grade 4 teachers. One of them is assigned to T, the other to C

Three Basic Designs

- The completely randomized design
- Treatments are assigned to individuals
- The randomized block design
- Treatments are assigned to individuals within blocks
- (This is sometimes called the matched design, because individuals are matched within blocks)
- The hierarchical design
- Treatments are assigned to blocks; the same treatment is assigned to all individuals in the block

The Completely Randomized Design

- Individuals are randomly assigned to one of two

treatments

Treatment Control

Individual 1 Individual 1

Individual 2 Individual 2

Individual nT Individual nC

The Randomized Block Design

              Block 1             ...   Block m
Treatment 1:  Individual 1              Individual 1
              ...                       ...
              Individual n1             Individual nm
Treatment 2:  Individual n1 + 1         Individual nm + 1
              ...                       ...
              Individual 2n1            Individual 2nm

The Hierarchical Design

Treatment arm: Block 1, ..., Block m        Control arm: Block m + 1, ..., Block 2m
(each block j contains Individuals 1, 2, ..., nj)

Randomization Procedures

- Randomization has to be done as an explicit process devised by the experimenter
- Haphazard is not the same as random
- Unknown assignment is not the same as random
- "Essentially random" is technically meaningless
- Alternation is not random, even if you alternate from a random start
- This is why R.A. Fisher was so explicit about randomization processes

Randomization Procedures

- R.A. Fisher on how to randomize an experiment with a small sample size and 5 treatments
- "A satisfactory method is to use a pack of cards numbered from 1 to 100, and to arrange them in random order by repeated shuffling. The varieties [treatments] are numbered from 1 to 5, and any card such as the number 33, for example, is deemed to correspond to variety [treatment] number 3, because on dividing by 5 this number is found as the remainder" (Fisher, 1935, p. 51)

Randomization Procedures

- Think about Fisher's description
- Does it worry you in any way?

Randomization Procedures

- You may want to use a table of random numbers, but be sure to pick an arbitrary starting point!
- Beware random number generators: they typically depend on seed values, so be sure to vary the seed value (if they do not do it automatically)
- Otherwise you can reliably generate the same sequence of random numbers every time
- It is no different than starting in the same place in a table of random numbers
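The seed pitfall is easy to demonstrate. Below is a minimal Python sketch (illustrative only; the presentation does not prescribe any particular software):

```python
import random
import time

# Two generators given the same seed produce the same "random" sequence --
# exactly like starting at the same place in a table of random numbers.
rng_a = random.Random(12345)
rng_b = random.Random(12345)
seq_a = [rng_a.randint(1, 100) for _ in range(5)]
seq_b = [rng_b.randint(1, 100) for _ in range(5)]
same = seq_a == seq_b  # True: same seed, same sequence

# Varying the seed (here, taken from the clock) gives a fresh sequence.
rng_c = random.Random(time.time_ns())
seq_c = [rng_c.randint(1, 100) for _ in range(5)]
```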

Randomization Procedures

- Completely Randomized Design (2 treatments, 2n individuals)
- Make a list of all individuals
- For each individual, pick a random number from 1 to 2 (odd or even)
- Assign the individual to treatment 1 if even, treatment 2 if odd
- When one treatment has been assigned n individuals, stop assigning more individuals to that treatment
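These steps can be sketched in Python (an illustrative implementation; the function name and data structures are my own, not from the presentation):

```python
import random

def completely_randomized(individuals, n, rng=random):
    """Assign 2n individuals to treatments 1 and 2, n per treatment.

    Follows the slide's procedure: draw a random 1-or-2 (odd/even) for
    each individual, but once a treatment has n members, all remaining
    individuals go to the other treatment.
    """
    counts = {1: 0, 2: 0}
    assignment = {}
    for person in individuals:
        t = rng.randint(1, 2)
        if counts[t] >= n:   # this treatment is full; use the other one
            t = 3 - t
        assignment[person] = t
        counts[t] += 1
    return assignment

groups = completely_randomized([f"student{i}" for i in range(20)], n=10)
```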

Randomization Procedures

- Completely Randomized Design (pn individuals, p treatments)
- Make a list of all individuals
- For each individual, pick a random number from 1 to p
- One way to do this is to take a random number of any size and divide by p; the remainder R is between 0 and (p − 1), so add 1 to the remainder to get R + 1
- Assign the individual to treatment R + 1
- Stop assigning individuals to any treatment after it gets n individuals
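The remainder arithmetic can be checked directly (a small illustration; the specific numbers are arbitrary):

```python
# The remainder rule from the slide: for any random integer x, the
# remainder on dividing by p is between 0 and p - 1, so adding 1
# yields a valid treatment label between 1 and p.
p = 5
labels = [(x % p) + 1 for x in (33, 40, 7, 99)]
in_range = all(1 <= t <= p for t in labels)
```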

Randomization Procedures

- Randomized Block Design with 2 Treatments (m blocks per treatment, 2n individuals per block)
- Make a list of all individuals in the first block
- For each individual, pick a random number from 1 to 2 (odd or even)
- Assign the individual to treatment 1 if even, treatment 2 if odd
- Stop assigning a treatment once it has been assigned n individuals in the block
- Repeat the same process with every block
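In Python, the same odd/even procedure run independently within each block might look like this (again an illustrative sketch, not code from the presentation):

```python
import random

def randomized_blocks(blocks, n, rng=random):
    """blocks: dict mapping a block id to its list of 2n individuals.

    Runs the slide's procedure separately in each block, so each block
    ends up with exactly n individuals per treatment.
    """
    assignment = {}
    for block_id, members in blocks.items():
        counts = {1: 0, 2: 0}
        for person in members:
            t = rng.randint(1, 2)
            if counts[t] >= n:   # this arm of the block is full
                t = 3 - t
            assignment[(block_id, person)] = t
            counts[t] += 1
    return assignment

blocks = {b: [f"{b}-pupil{i}" for i in range(6)] for b in ("schoolA", "schoolB")}
result = randomized_blocks(blocks, n=3)
```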

Randomization Procedures

- Randomized Block Design with p Treatments (m blocks per treatment, pn individuals per block)
- Make a list of all individuals in the first block
- For each individual, pick a random number from 1 to p
- Assign the individual to the treatment corresponding to that number
- Stop assigning a treatment once it has been assigned n individuals in the block
- Repeat the same process with every block

Randomization Procedures

- Hierarchical Design with 2 Treatments (m blocks per treatment, n individuals per block)
- Make a list of all blocks
- For each block, pick a random number from 1 to 2 (odd or even)
- Assign the block to treatment 1 if even, treatment 2 if odd
- Stop assigning a treatment after it has been assigned m blocks
- Every individual in a block is assigned to the same treatment

Randomization Procedures

- Hierarchical Design with p Treatments (m blocks per treatment, n individuals per block)
- Make a list of all blocks
- For each block, pick a random number from 1 to p
- Assign the block to the treatment corresponding to the number
- Stop assigning a treatment after it has been assigned m blocks
- Every individual in a block is assigned to the same treatment
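Because whole blocks are the units of assignment here, a sketch differs from the earlier ones only in looping over blocks rather than individuals (an illustration; the names are hypothetical):

```python
import random

def hierarchical_assign(blocks, p, m, rng=random):
    """Assign whole blocks to treatments 1..p, m blocks per treatment.

    Every individual in a block then receives its block's treatment,
    as the slide specifies.
    """
    counts = {t: 0 for t in range(1, p + 1)}
    assignment = {}
    for block in blocks:
        t = rng.randint(1, p)
        while counts[t] >= m:   # that treatment already has m blocks; redraw
            t = rng.randint(1, p)
        assignment[block] = t
        counts[t] += 1
    return assignment

assignment = hierarchical_assign([f"school{i}" for i in range(6)], p=3, m=2)
```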

Randomization Procedures

- What if I get a big imbalance by chance?
- Classical answers
- If there are random assignments you wouldn't like, include blocking variables
- OR
- Use statistical control
- More complicated alternatives
- Adaptive randomization methods (e.g., Efron's)

Sampling Models

Sampling Models in Educational Research

- Sampling models are often ignored in educational research
- But
- Sampling is where the randomness comes from in social research
- Sampling therefore has profound consequences for statistical analysis and research designs

Sampling Models in Educational Research

- Which is a better simple random sample (which sample will provide a more precise estimate)?
- Sample A, with N = 1,000
- Sample B, with N = 2,000

Sampling Models in Educational Research

- Why?
- Because if the population variance is σT²
- We know that the variance of the sample mean from a sample of size N is σT²/N
- But...

Sampling Models in Educational Research

- Simple random samples are rare in field research
- Educational populations are hierarchically nested
- Students in classrooms in schools
- Schools in districts in states
- We usually exploit the population structure to sample students by first sampling schools
- Even then, most samples are not probability samples, but they are intended to be representative (of some population)

Sampling Models in Educational Research

- Survey research calls this strategy multistage (multilevel) clustered sampling
- We often sample clusters (schools) first, then individuals within clusters (students within schools)
- This is a two-stage (two-level) cluster sample
- We might sample schools, then classrooms, then students
- This is a three-stage (three-level) cluster sample

Sampling Models in Educational Research

- Which is a better two-stage sample (which sample will provide a more precise estimate)?
- Sample A, with N = 1,000
- Sample B, with N = 2,000
- Now we cannot tell unless we know the number of clusters (m) and the number of units (n) in each cluster

Precision of Estimates Depends on the Sampling Model

- Suppose the total population variance is σT² and the ICC is ρ
- Consider two samples of size N = mn
- A simple random sample or stratified sample
- The variance of the mean is σT²/mn
- A clustered sample of n students from each of m schools
- The variance of the mean is (σT²/mn)[1 + (n − 1)ρ]
- The inflation factor 1 + (n − 1)ρ is called the design effect

Precision of Estimates Depends on the Sampling Model

- Suppose the population variance is σT²
- The school-level ICC is ρS and the class-level ICC is ρC
- Consider two samples of size N = mpn
- A simple random sample or stratified sample
- The variance of the mean is σT²/mpn
- A clustered sample of n students from p classes in m schools
- The variance is (σT²/mpn)[1 + (pn − 1)ρS + (n − 1)ρC]
- The three-level design effect is 1 + (pn − 1)ρS + (n − 1)ρC
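As a quick numerical illustration of the three-level formula (the ICC values here are hypothetical, chosen only for the arithmetic):

```python
# Three-level design effect: 1 + (p*n - 1)*rho_s + (n - 1)*rho_c
def design_effect_3level(p, n, rho_s, rho_c):
    return 1 + (p * n - 1) * rho_s + (n - 1) * rho_c

# e.g., p = 2 classes per school, n = 25 students per class,
# school ICC 0.15, class ICC 0.10 (hypothetical values):
deff = design_effect_3level(p=2, n=25, rho_s=0.15, rho_c=0.10)
# 1 + 49 * 0.15 + 24 * 0.10 = 10.75
```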

Example

- For example, suppose ρ = 0.20
- Sample A
- Suppose m = 100 and n = 10, so N = 1,000; then the variance of the mean is
- (σT²/(100 × 10))[1 + (10 − 1)0.20] = (σT²/1000)(2.8)
- Sample B
- Suppose m = 20 and n = 100, so N = 2,000; then the variance of the mean is
- (σT²/(20 × 100))[1 + (100 − 1)0.20] = (σT²/2000)(20.8) = (σT²/1000)(10.4)
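The arithmetic above is easy to reproduce (a sketch; variances are expressed in units of σT², so no particular value of σT² is assumed):

```python
rho = 0.20

def rel_variance_of_mean(m, n, rho):
    """Variance of the sample mean, in units of sigma_T^2:
    the design effect divided by the total sample size m*n."""
    return (1 + (n - 1) * rho) / (m * n)

var_a = rel_variance_of_mean(m=100, n=10, rho=rho)  # (1/1000) * 2.8
var_b = rel_variance_of_mean(m=20, n=100, rho=rho)  # (1/1000) * 10.4
# Sample B has twice as many students but a much larger variance.
```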

Precision of Estimates Depends on the Sampling Model

- The total variance can be partitioned into between-cluster (σB²) and within-cluster (σW²) variance
- We define the intraclass correlation ρ as the proportion of total variance that is between clusters: ρ = σB²/(σB² + σW²)
- There is typically much more variance within clusters (σW²) than between clusters (σB²)
- School-level intraclass correlation values are typically 0.10 to 0.25
- This means that σW² is between 9 and 3 times as large as σB²

Precision of Estimates Depends on the Sampling Model

- So why does σB² have such a big effect?
- Because averaging (independent things) reduces variance
- The variance of the mean of a sample of m clusters of size n can be written as σB²/m + σW²/mn
- The cluster effects are only averaged over the number of clusters

Precision of Estimates Depends on the Sampling Model

- Treatment effects in experiments and quasi-experiments are mean differences
- Therefore the precision of treatment effect estimates and statistical power will depend on the sampling model

Sampling Models in Educational Research

- The fact that the population is structured does not mean the sample must be a clustered sample
- Whether it is a clustered sample depends on
- How the sample is drawn (e.g., are schools sampled first, then individuals randomly within schools?)
- What the inferential population is (e.g., is the inference to the schools studied or to a larger population of schools?)

Sampling Models in Educational Research

- A necessary condition for a clustered sample is that it is drawn in stages using population subdivisions
- schools, then students within schools
- schools, then classrooms, then students
- However, if all subdivisions in a population are present in the sample, the sample is not clustered but stratified
- Stratification has different implications than clustering
- Whether there is stratification or clustering depends on the definition of the population to which we draw inferences (the inferential population)

Sampling Models in Educational Research

- The clustered/stratified distinction matters because it influences the precision of statistics estimated from the sample
- If all population subdivisions are included in every sample, there is no sampling (or exhaustive sampling) of subdivisions
- therefore differences between subdivisions add no uncertainty to estimates
- If only some population subdivisions are included in the sample, it matters which ones you happen to sample
- thus differences between subdivisions add to uncertainty

Inferential Population and Inference Models

- The inferential population, or inference model, has implications for analysis and therefore for the design of experiments
- Do we make inferences to the schools in this sample or to a larger population of schools?
- Inferences to the schools or classes in the sample are called conditional inferences
- Inferences to a larger population of schools or classes are called unconditional inferences

Inferential Population and Inference Models

- Note that the inferences (what we are estimating) are different in conditional versus unconditional inference models
- In conditional inference, we are estimating the mean (or treatment effect) in the observed schools
- In unconditional inference, we are estimating the mean (or treatment effect) in the population of schools from which the observed schools are sampled
- We are still estimating a mean (or a treatment effect), but they are different parameters with different uncertainties

Fixed and Random Effects

- When the levels of a factor (e.g., the particular blocks included) in a study are sampled and the inference model is unconditional, that factor is called random and its effects are called random effects
- When the levels of a factor (e.g., the particular blocks included) in a study constitute the entire inference population and the inference model is conditional, that factor is called fixed and its effects are called fixed effects

Fixed and Random Effects

- Remember the idea of adding blocking variables
- Technically, if blocking variables (e.g., district) are
- fixed effects: generalizations are limited to the districts observed
- random effects: generalizations are to a larger universe of districts
- These technicalities are often ignored
- The key point is that generalizations are not supported by sampling

Applications to Experimental Design

- We will look in detail at the two most widely used experimental designs in education
- Randomized block designs
- Hierarchical designs

Experimental Designs

- For each design we will look at
- The structural model for the data (and what it means)
- Two inference models
- What does "treatment effect" mean in principle?
- What is the estimate of the treatment effect?
- How do we deal with context effects?
- Two statistical analysis procedures
- How do we estimate and test treatment effects?
- How do we estimate and test context effects?
- What is the sensitivity of the tests?

The Randomized Block Design

- The population (the sampling frame)
- We wish to compare two treatments
- We assign treatments within schools
- Many schools with 2n students in each
- Assign n students to each treatment in each school

The Randomized Block Design

- The experiment
- Compare two treatments in an experiment
- We assign treatments within schools
- With m schools with 2n students in each
- Assign n students to each treatment in each school

The Randomized Block Design

- Diagram of the design

  Schools: 1, 2, ..., m
  Treatment 1: n students within every school
  Treatment 2: n students within every school

The Randomized Block Design

- School 1: both treatment groups occur within this single school

The Conceptual Model

- The statistical model for the observation on the kth person in the jth school in the ith treatment is
- Yijk = µ + αi + βj + αβij + eijk
- where
- µ is the grand mean,
- αi is the average effect of being in treatment i,
- βj is the average effect of being in school j,
- αβij is the difference between the average effect of treatment i and the effect of that treatment in school j,
- eijk is a residual

Effect of Context

Context Effect

Two-level Randomized Block Design With No Covariates (HLM Notation)

- Level 1 (individual level)
- Yijk = β0j + β1jTijk + eijk,  eijk ~ N(0, σW²)
- Level 2 (school level)
- β0j = π00 + u0j,  u0j ~ N(0, σS²)
- β1j = π10 + u1j,  u1j ~ N(0, σTxS²)
- If we code the treatment Tijk = ½ or −½, then the parameters are identical to those in standard ANOVA

Effects and Estimates

- The population mean of treatment 1 in school j is α1 + αβ1j
- The population mean of treatment 2 in school j is α2 + αβ2j
- The estimate of the mean of treatment 1 in school j is α1 + αβ1j + e1j·
- The estimate of the mean of treatment 2 in school j is α2 + αβ2j + e2j·
- (A dot in place of an index denotes averaging over that index; e.g., e1j· is the mean residual for treatment 1 in school j)

Effects and Estimates

- The comparative treatment effect in any given school j is (α1 − α2) + (αβ1j − αβ2j)
- The estimate of the comparative treatment effect in school j is (α1 − α2) + (αβ1j − αβ2j) + (e1j· − e2j·)
- The mean treatment effect in the experiment is (α1 − α2) + (αβ1· − αβ2·)
- The estimate of the mean treatment effect in the experiment is (α1 − α2) + (αβ1· − αβ2·) + (e1·· − e2··)

Inference Models

- Two different kinds of inferences about effects
- Unconditional Inference (Schools Random)
- Inference to the whole universe of schools
- (requires a representative sample of schools)
- Conditional Inference (Schools Fixed)
- Inference to the schools in the experiment
- (no sampling requirement on schools)

Statistical Analysis Procedures

- Two kinds of statistical analysis procedures
- Mixed Effects Procedures (Schools Random)
- Treat the schools in the experiment as a sample from a population of schools
- (only strictly correct if the schools are a sample)
- Fixed Effects Procedures (Schools Fixed)
- Treat the schools in the experiment as a population
Unconditional Inference (Schools Random)

- The estimate of the mean treatment effect in the experiment is (α1 − α2) + (αβ1· − αβ2·) + (e1·· − e2··)
- The average treatment effect we want to estimate is (α1 − α2)
- The term (e1·· − e2··) depends on the students in the schools in the sample
- The term (αβ1· − αβ2·) depends on the schools in the sample
- Both (e1·· − e2··) and (αβ1· − αβ2·) are random and average to 0 across students and schools, respectively

Conditional Inference (Schools Fixed)

- The estimate of the mean treatment effect in the experiment is still (α1 − α2) + (αβ1· − αβ2·) + (e1·· − e2··)
- Now the average treatment effect we want to estimate is (α1 + αβ1·) − (α2 + αβ2·) = (α1 − α2) + (αβ1· − αβ2·)
- The term (e1·· − e2··) depends on the students in the schools in the sample
- The term (αβ1· − αβ2·) depends on the schools in the sample, but the treatment effect in this sample of schools is exactly the effect we want to estimate

Expected Mean Squares: Randomized Block Design (Two Levels, Schools Random)

Source          df           EMS
Treatment (T)   1            σW² + nσTxS² + nmΣαi²
Schools (S)     m − 1        σW² + 2nσS²
T × S           m − 1        σW² + nσTxS²
Within Cells    2m(n − 1)    σW²

Mixed Effects Procedures (Schools Random)

- The test for treatment effects has H0: (α1 − α2) = 0
- The estimated mean treatment effect in the experiment is (α1 − α2) + (αβ1· − αβ2·) + (e1·· − e2··)
- The variance of the estimated treatment effect is 2[σW² + nσTxS²]/mn = 2[1 + (nωS − 1)ρ]σ²/mn
- Here ωS = σTxS²/σS² and ρ = σS²/(σS² + σW²) = σS²/σ²
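The equality of the two expressions for this variance can be verified numerically (the variance-component values below are hypothetical, chosen only to check the algebra):

```python
# Check that 2*[sigma_W^2 + n*sigma_TxS^2]/(m*n) equals
# 2*[1 + (n*omega_S - 1)*rho]*sigma^2/(m*n), where
# omega_S = sigma_TxS^2 / sigma_S^2 and rho = sigma_S^2 / sigma^2.
s_w2, s_s2, s_txs2 = 0.8, 0.2, 0.05  # hypothetical variance components
m, n = 20, 30

s2 = s_s2 + s_w2          # total variance sigma^2
rho = s_s2 / s2
omega_s = s_txs2 / s_s2

v1 = 2 * (s_w2 + n * s_txs2) / (m * n)
v2 = 2 * (1 + (n * omega_s - 1) * rho) * s2 / (m * n)
```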

Mixed Effects Procedures

- The test for treatment effects: FT = MST/MSTxS with (m − 1) df
- The test for context effects (the treatment-by-schools interaction): FTxS = MSTxS/MSWS with 2m(n − 1) df
- Power is determined by the operational effect size
- where ωS = σTxS²/σS² and ρ = σS²/(σS² + σW²) = σS²/σ²

Expected Mean Squares: Randomized Block Design (Two Levels, Schools Fixed)

Source          df           EMS
Treatment (T)   1            σW² + nmΣαi²
Schools (S)     m − 1        σW² + 2nΣβj²/(m − 1)
S × T           m − 1        σW² + nΣΣαβij²/(m − 1)
Within Cells    2m(n − 1)    σW²

Fixed Effects Procedures

- The test for treatment effects has H0: (α1 − α2) + (αβ1· − αβ2·) = 0
- The estimated mean treatment effect in the experiment is (α1 − α2) + (αβ1· − αβ2·) + (e1·· − e2··)
- The variance of the estimated treatment effect is 2σW²/mn

Fixed Effects Procedures

- The test for treatment effects: FT = MST/MSWS with 2m(n − 1) df
- The test for context effects (the treatment-by-schools interaction): FC = MSTxS/MSWS with 2m(n − 1) df
- Power is determined by the operational effect size, with 2m(n − 1) df

Comparing Fixed and Mixed Effects Statistical Procedures (Randomized Block Design)

                          Fixed                          Mixed
Inference model           Conditional                    Unconditional
Estimand                  (α1 − α2) + (αβ1· − αβ2·)      (α1 − α2)
Contaminating factors     (e1·· − e2··)                  (αβ1· − αβ2·) + (e1·· − e2··)
Operational effect size
df                        2m(n − 1)                      m − 1
Power                     higher                         lower

Comparing Fixed and Mixed Effects Procedures (Randomized Block Design)

- Conditional and unconditional inference models
- estimate different treatment effects
- have different contaminating factors that add uncertainty
- Mixed procedures are good for unconditional inference
- Fixed procedures are good for conditional inference
- Fixed procedures have higher power

The Hierarchical Design

- The universe (the sampling frame)
- We wish to compare two treatments
- We assign treatments to whole schools
- Many schools with n students in each
- Assign all students in each school to the same treatment

The Hierarchical Design

- The experiment
- We wish to compare two treatments
- We assign treatments to whole schools
- Assign 2m schools with n students in each
- Assign all students in each school to the same treatment

The Hierarchical Design

- Diagram of the experiment
- Schools are numbered 1, 2, …, m, m+1, m+2, …, 2m
- Treatment 1 is assigned to schools 1 through m
- Treatment 2 is assigned to schools m+1 through 2m

The Hierarchical Design

- Treatment 1 schools
- (diagram: schools 1 through m, all receiving treatment 1)

The Hierarchical Design

- Treatment 2 schools
- (diagram: schools m+1 through 2m, all receiving treatment 2)

The Conceptual Model

- The statistical model for the observation on the kth person in the jth school in the ith treatment is
- Yijk = μ + αi + βj + αβij + ejk(i) = μ + αi + βj(i) + ejk(i)
- μ is the grand mean,
- αi is the average effect of being in treatment i,
- βj is the average effect of being in school j,
- αβij is the difference between the average effect of treatment i and the effect of that treatment in school j,
- eijk is a residual
- Or βj(i) = βj + αβij is a term for the combined effect of schools within treatments


Context Effects

Two-level Hierarchical Design With No Covariates (HLM Notation)

- Level 1 (individual level)
- Yijk = β0j + eijk,  e ~ N(0, σW²)
- Level 2 (school level)
- β0j = π00 + π01Tj + η0j,  η ~ N(0, σS²)
- If we code the treatment Tj = ½ or −½, then
- π00 = μ, π01 = α1, η0j = βj(i)
- The intraclass correlation is ρ = σS²/(σS² + σW²) = σS²/σ²
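Not on the slides, but useful alongside this definition: ρ can be estimated from one-way ANOVA mean squares with the standard moment estimator. The function name and example variance values below are illustrative.

```python
# ANOVA (moment) estimator of the intraclass correlation rho.
# With m schools of n students each, E[MS_between] = sigma_W^2 + n*sigma_S^2
# and E[MS_within] = sigma_W^2, so:
#   rho_hat = (MSB - MSW) / (MSB + (n - 1) * MSW)

def icc_from_mean_squares(msb: float, msw: float, n: int) -> float:
    """Estimate rho = sigma_S^2 / (sigma_S^2 + sigma_W^2)."""
    return (msb - msw) / (msb + (n - 1) * msw)

# Example: sigma_W^2 = 0.8, sigma_S^2 = 0.2 (so rho = 0.2) with n = 10
# gives expected MSB = 0.8 + 10 * 0.2 = 2.8 and MSW = 0.8.
print(round(icc_from_mean_squares(2.8, 0.8, 10), 6))  # 0.2
```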

Effects and Estimates

- The comparative treatment effect in any given school j is still
- (α1 − α2) + (αβ1j − αβ2j)
- But we cannot estimate the treatment effect in a single school because each school gets only one treatment
- The mean treatment effect in the experiment is
- (α1 − α2) + (β̄(1) − β̄(2)) = (α1 − α2) + (β̄1• − β̄2•) + (αβ̄1• − αβ̄2•)
- The estimate of the mean treatment effect in the experiment is
- (α1 − α2) + (β̄(1) − β̄(2)) + (ē1•• − ē2••)

Inference Models

- Two different kinds of inferences about effects (as in the randomized block design)
- Unconditional Inference (schools random)
- Inference to the whole universe of schools
- (requires a representative sample of schools)
- Conditional Inference (schools fixed)
- Inference to the schools in the experiment
- (no sampling requirement on schools)

Unconditional Inference (Schools Random)

- The average treatment effect we want to estimate is
- (α1 − α2)
- The term (ē1•• − ē2••) depends on the students in the schools in the sample
- The term (β̄(1) − β̄(2)) depends on the schools in the sample
- Both (ē1•• − ē2••) and (β̄(1) − β̄(2)) are random and average to 0 across students and schools, respectively

Conditional Inference (Schools Fixed)

- The average treatment effect we want to (and can) estimate is
- (α1 + β̄(1)) − (α2 + β̄(2)) = (α1 − α2) + (β̄(1) − β̄(2))
- = (α1 − α2) + (β̄1• − β̄2•) + (αβ̄1• − αβ̄2•)
- The term (β̄(1) − β̄(2)) depends on the schools in the sample, but we want to estimate the effect of treatment in the schools in the sample
- Note that this treatment effect is not quite the same as in the randomized block design, where we estimate
- (α1 − α2) + (αβ̄1• − αβ̄2•)

Statistical Analysis Procedures

- Two kinds of statistical analysis procedures (as in the randomized block design)
- Mixed Effects Procedures
- Treat schools in the experiment as a sample from a universe
- Fixed Effects Procedures
- Treat schools in the experiment as a universe

Expected Mean Squares: Hierarchical Design (Two Levels, Schools Random)

Source           df           EMS
Treatment (T)    1            σW² + nσS² + nmΣαi²
Schools (S)      2(m − 1)     σW² + nσS²
Within Schools   2m(n − 1)    σW²

Mixed Effects Procedures (Schools Random)

- The test for treatment effects has
- H0: α1 − α2 = 0
- Estimated mean treatment effect in the experiment is
- (α1 − α2) + (β̄(1) − β̄(2)) + (ē1•• − ē2••)
- The variance of the estimated treatment effect is
- 2(σW² + nσS²)/mn = 2[1 + (n − 1)ρ]σ²/mn
- where ρ = σS²/(σS² + σW²) = σS²/σ²

Mixed Effects Procedures (Schools Random)

- The test for treatment effects is
- FT = MST/MSBS with 2(m − 1) df
- There is no omnibus test for context effects
- Power is determined by the operational effect size
- where ρ = σS²/(σS² + σW²) = σS²/σ²

Expected Mean Squares: Hierarchical Design (Two Levels, Schools Fixed)

Source           df           EMS
Treatment (T)    1            σW² + nmΣ(αi + β̄(i))²
Schools (S)      2(m − 1)     σW² + nΣΣβj(i)²/2(m − 1)
Within Schools   2m(n − 1)    σW²

Fixed Effects Procedures (Schools Fixed)

- The test for treatment effects has
- H0: (α1 − α2) + (β̄(1) − β̄(2)) = 0
- Note that the school effects are confounded with treatment effects
- Estimated mean treatment effect in the experiment is
- (α1 − α2) + (β̄(1) − β̄(2)) + (ē1•• − ē2••)
- The variance of the estimated treatment effect is
- 2σW²/mn

Fixed Effects Procedures (Schools Fixed)

- The test for treatment effects is
- FT = MST/MSWS with 2m(n − 1) df
- There is no omnibus test for context effects, because each school gets only one treatment
- Power is determined by the operational effect size
- and 2m(n − 1) df

Comparing Fixed and Mixed Effects Procedures (Hierarchical Design)

                          Fixed                          Mixed
Inference Model           Conditional                    Unconditional
Estimand                  (α1 − α2) + (β̄(1) − β̄(2))     (α1 − α2)
Contaminating Factors     (ē1•• − ē2••)                  (β̄(1) − β̄(2)) + (ē1•• − ē2••)
Effect Size
df                        2m(n − 1)                      2(m − 1)
Power                     higher                         lower

Comparing Fixed and Mixed Effects Statistical

Procedures (Hierarchical Design)

- Conditional and unconditional inference models
- estimate different treatment effects
- have different contaminating factors that add uncertainty
- Mixed procedures are good for unconditional inference
- The fixed procedures are not generally recommended
- The fixed procedures have higher power

Comparing Hierarchical Designs to Randomized

Block Designs

- Randomized block designs usually have higher power, but assignment of different treatments within schools or classes may be
- practically difficult
- politically infeasible
- theoretically impossible
- It may be methodologically unwise because of the potential for
- contamination or diffusion of treatments
- compensatory rivalry or demoralization

Comparing Hierarchical Designs to Randomized

Block Designs

- But even when there is substantial contamination, Chris Rhoads has shown that
- even though randomized block designs underestimate the treatment effect
- randomized block designs can have higher power than hierarchical designs
- This is not widely known yet, but is important to remember

Applications to Experimental Design

- We will address the two most widely used families of experimental designs in education
- Randomized block designs with 2 levels
- Randomized block designs with 3 levels
- Hierarchical designs with 2 levels
- Hierarchical designs with 3 levels
- We also examine the effect of covariates
- Hereafter, we generally take schools to be random

Complications

- Which groupings do we have to take into account in the design (e.g., schools, districts, states, regions of the country, the country)?
- Ignore some, control for the effects of others as fixed blocking factors
- Justify this as part of the population definition
- For example, we define the inference population as these five districts within these two states
- But doing so obviously constrains generalizability

Precision of the Estimated Treatment Effect

- Precision is the standard error of the estimated treatment effect
- Precision in simple (simple random sample) designs depends on
- the standard deviation in the population, σ
- the total sample size, N
- The precision is SE = 2σ/√N (for two equal treatment groups of size N/2)
Precision of the Estimated Treatment Effect

- Precision in complex (clustered sample) designs depends on
- the (total) standard deviation σT
- sample size at each level of sampling
- (e.g., m clusters, n individuals per cluster)
- the intraclass correlation structure
- It is a little harder to compute than in simple designs, but important because it helps you see what matters in design

Intraclass Correlations in Two-level Designs

- In two-level designs the intraclass correlation structure is determined by a single intraclass correlation, ρ = σS²/(σS² + σW²)
- This intraclass correlation is the proportion of the total variance that is between schools (clusters)
- Typical values of ρ are 0.1 to 0.25, so σS² is typically 1/9 to 1/3 of σW², but it has a big impact

Precision in Two-level Hierarchical Design With

No Covariates

- The standard error of the treatment effect is SE = √(2[1 + (n − 1)ρ]σ²/mn)
- SE decreases as m (number of schools) increases
- SE decreases as n increases, but only up to a point
- SE increases as ρ increases

How Does Between-Cluster Variance Impact Precision?

- Think about the standard error again: SE = √(2(σW² + nσS²)/mn)
- So even though σS² is smaller than σW², it has a bigger impact on the uncertainty of the treatment effect
- Suppose σS² is 1/10 of σW² (a pretty small value of ρ); if n = 30, σS² will have 3 times as big an effect on the standard error as will σW²
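The 3× claim follows from a line of arithmetic on the variance expression σW² + nσS²; the variance values below are illustrative:

```python
# Check of the slide's claim: the SE contains sigma_W^2 + n*sigma_S^2,
# so with sigma_S^2 = sigma_W^2 / 10 and n = 30 students per school,
# the between-school component contributes 3x as much as the within part.

sigma_W2 = 1.0
sigma_S2 = sigma_W2 / 10
n = 30

between_contribution = n * sigma_S2   # enters the variance as n * sigma_S^2
within_contribution = sigma_W2        # enters the variance as sigma_W^2
print(round(between_contribution / within_contribution, 9))  # 3.0
```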

Statistical Power

- Power in simple (simple random sample) designs depends on
- significance level
- effect size
- sample size
- Look power up in a table for the sample size and effect size

Fragment of Cohen's Table 2.3.5

n      d = 0.10   d = 0.20   d = 0.80   d = 1.00   d = 1.20   d = 1.40
8      .05        .07        .31        .46        .60        .73
9      .06        .07        .35        .51        .65        .79
10     .06        .07        .39        .56        .71        .84
11     .06        .07        .43        .63        .76        .87

Computing Statistical Power

- Power in complex (clustered sample) designs depends on
- significance level
- effect size d
- sample size at each level of sampling
- (e.g., m clusters, n individuals per cluster)
- intraclass correlation structure
- This makes it seem a lot harder to compute

Computing Statistical Power

- Computing statistical power in complex designs is only a little harder than computing it for simple designs
- Compute the operational effect size δT (which incorporates sample design information)
- Look power up in a table for the operational sample size and operational effect size
- This is the same table that you use for simple designs

Power in Two-level Hierarchical Design With No

Covariates

- Basic Idea
- Operational Effect Size = (Effect Size) × (Design Effect)
- δT = d × (Design Effect)
- For the two-level hierarchical design with no covariates, the design effect is √(n/[1 + (n − 1)ρ])
- The operational sample size is the number of schools (clusters)

Power in Two-level Hierarchical Design With No

Covariates

- As m (number of schools) increases, power increases
- As the effect size increases, power increases
- Other influences occur through the design effect
- As ρ increases, the design effect (and power) decreases
- No matter how large n gets, the maximum design effect is 1/√ρ
- Thus power only increases up to some limit as n increases
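A minimal power sketch for this design, assuming the design effect √(n/[1 + (n − 1)ρ]) and a normal approximation to the noncentral t in place of the table lookup (the function names are mine):

```python
import math
from statistics import NormalDist

def design_effect(n: int, rho: float) -> float:
    # Design effect for the two-level hierarchical design (no covariates);
    # its limit as n grows is 1/sqrt(rho), so power plateaus in n.
    return math.sqrt(n / (1.0 + (n - 1) * rho))

def power_two_level(d: float, rho: float, m: int, n: int,
                    alpha: float = 0.05) -> float:
    """Approximate two-sided power with m schools per treatment arm."""
    delta_T = d * design_effect(n, rho)   # operational effect size
    lam = delta_T * math.sqrt(m / 2.0)    # approximate noncentrality
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(lam - z_crit)

# Example: d = 0.25, rho = 0.20, 30 schools per arm, 30 students per school
print(round(power_two_level(0.25, 0.20, 30, 30), 3))
```

The normal approximation slightly overstates power at small m relative to the exact noncentral t, but it shows the qualitative behavior: power rises with m and d, and saturates in n.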

Optimal Allocation in the Two-level Hierarchical

Design

- Many different combinations of m and n give the same power or precision
- How should we choose?
- Optimal allocation gives some guidance
- Suppose the cost per individual is c1 and the cost per school is c2, so the total cost is 2mc2 + 2mnc1
- Then n = √[(c2/c1)(1 − ρ)/ρ] gives the optimal n (most precision with smallest cost)

Optimal Allocation in the Two-level Hierarchical

Design

- The optimal sample size n is often much smaller than you might think
- For example, if ρ = 0.20
- nO = 14 if c2 = 50c1
- nO = 6 if c2 = 10c1
- nO = 2 if c2 = c1
- But remember that optimality is only one factor in choosing sample sizes
- Practicality and robustness of the sample (e.g., to attrition) are also important considerations
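The rule behind these numbers is n_opt = √[(c2/c1)(1 − ρ)/ρ], which a quick check confirms against the slide's own examples:

```python
import math

# Optimal cluster size for the two-level hierarchical design (no covariates):
#   n_opt = sqrt((c2/c1) * (1 - rho) / rho)
# c1 = cost per individual, c2 = cost per school.

def optimal_n(c2_over_c1: float, rho: float) -> float:
    return math.sqrt(c2_over_c1 * (1.0 - rho) / rho)

# Reproduce the slide's examples for rho = 0.20:
for ratio in (50, 10, 1):
    print(ratio, round(optimal_n(ratio, 0.20)))  # 14, 6, 2
```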

Two-level Hierarchical Design With Covariates

(HLM Notation)

- Level 1 (individual level)
- Yijk = β0j + β1jXijk + eijk,  e ~ N(0, σAW²)
- Level 2 (school level)
- β0j = π00 + π01Tj + π02Wj + η0j,  η ~ N(0, σAS²)
- β1j = π10
- Note that the covariate effect β1j = π10 is a fixed effect
- If we code the treatment Tj = ½ or −½, then the parameters are identical to those in standard ANCOVA

Precision in Two-level Hierarchical Design With

Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as n increases, but only up to a point
- SE increases as ρ increases
- SE decreases as RW² and RS² increase

Power in Two-level Hierarchical Design With

Covariates

- Basic Idea
- Operational Effect Size = (Effect Size) × (Design Effect)
- δT = d × (Design Effect)
- For the two-level hierarchical design with covariates, the covariates increase the design effect

Power in Two-level Hierarchical Design With

Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
- As ρ increases, the design effect (and power) decrease
- Now the maximum design effect as n gets large is 1/√[ρ(1 − RS²)]
- As the covariate-outcome correlations RW² and RS² increase, the design effect (and power) increase
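One common form of the covariate-adjusted design effect, a sketch rather than a formula read off the slides (RW² and RS² are the student- and school-level covariate-outcome squared correlations):

```python
import math

# Design effect for the two-level hierarchical design with covariates,
# using covariate-adjusted variances (an assumed standard form):
#   DE = sqrt(n / (n*rho*(1 - R_S2) + (1 - rho)*(1 - R_W2)))
# With R_S2 = R_W2 = 0 it reduces to the no-covariate design effect
# sqrt(n / (1 + (n - 1)*rho)).

def design_effect_cov(n: int, rho: float,
                      R_W2: float = 0.0, R_S2: float = 0.0) -> float:
    return math.sqrt(n / (n * rho * (1 - R_S2) + (1 - rho) * (1 - R_W2)))

print(round(design_effect_cov(30, 0.20), 3))             # no covariates
print(round(design_effect_cov(30, 0.20, 0.5, 0.5), 3))   # covariates help
```

Note that the large-n ceiling becomes 1/√[ρ(1 − RS²)], so a good school-level covariate raises the limit that cluster size alone cannot overcome.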

Optimal Allocation in the Two-level Hierarchical

Design With Covariates

- Optimal allocation can also be computed when there are covariates to give some guidance on cluster size (n)
- Suppose the cost per individual is c1 and the cost per school is c2, so the total cost is 2mc2 + 2mnc1
- Then the optimal cluster size n = √[(c2/c1)(1 − ρ)(1 − RW²)/ρ(1 − RS²)] gives the optimal n (most precision with smallest cost)

Three-level Hierarchical Design

- Here there are three factors
- Treatment
- Schools (clusters) nested in treatments
- Classes (subclusters) nested in schools
- Suppose there are
- m schools (clusters) per treatment
- p classes (subclusters) per school (cluster)
- n students (individuals) per class (subcluster)

Three-level Hierarchical Design With No Covariates

- The statistical model for the observation on the lth person in the kth class in the jth school in the ith treatment is
- Yijkl = μ + αi + βj(i) + γk(ij) + eijkl
- where
- μ is the grand mean,
- αi is the average effect of being in treatment i,
- βj(i) is the average effect of being in school j in treatment i,
- γk(ij) is the average effect of being in class k in school j in treatment i,
- eijkl is a residual

Three-level Hierarchical Design With No

Covariates (HLM Notation)

- Level 1 (individual level)
- Yijkl = β0jk + eijkl,  e ~ N(0, σW²)
- Level 2 (classroom level)
- β0jk = γ0j + η0jk,  η ~ N(0, σC²)
- Level 3 (school level)
- γ0j = π00 + π01Tj + ξ0j,  ξ ~ N(0, σS²)
- If we code the treatment Tj = ½ or −½, then
- π00 = μ, π01 = α1, η0jk = γk(ij), ξ0j = βj(i)

Three-level Hierarchical Design Intraclass

Correlations

- In three-level designs there are two levels of clustering and two intraclass correlations
- At the school (cluster) level: ρS = σS²/(σS² + σC² + σW²)
- At the classroom (subcluster) level: ρC = σC²/(σS² + σC² + σW²)

Precision in Three-level Hierarchical Design With

No Covariates

- The standard error of the treatment effect
- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as ρS and ρC increase

Power in Three-level Hierarchical Design With No

Covariates

- Basic Idea
- Operational Effect Size = (Effect Size) × (Design Effect)
- δT = d × (Design Effect)
- For the three-level hierarchical design with no covariates
- The operational sample size is the number of schools

Power in Three-level Hierarchical Design With No

Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
- As ρS or ρC increases, the design effect decreases
- No matter how large n gets, the maximum design effect is √(p/[pρS + ρC])
- Thus power only increases up to some limit as n increases
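A sketch of the three-level operational effect size implied by the variance decomposition (total variance split into the ρS and ρC shares plus a within-class remainder); the function name is mine:

```python
import math

# Operational effect size for the three-level hierarchical design
# (no covariates), assuming the decomposition
#   delta_T = d * sqrt(p*n / (1 + (n - 1)*rho_C + (p*n - 1)*rho_S))
# where p = classes per school and n = students per class.

def delta_three_level(d: float, p: int, n: int,
                      rho_S: float, rho_C: float) -> float:
    return d * math.sqrt(p * n / (1.0 + (n - 1) * rho_C + (p * n - 1) * rho_S))

# With rho_C = 0 and p = 1 this reduces to the two-level formula
print(round(delta_three_level(0.25, 2, 20, 0.15, 0.10), 3))
```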

Optimal Allocation in the Three-level

Hierarchical Design With No Covariates

- Optimal allocation can also be computed in three-level designs to give guidance on p and n
- Suppose the cost per individual is c1, the cost per class is c2, and the cost per school is c3, so the total cost is 2mc3 + 2mpc2 + 2mpnc1
- Then the optimal sample sizes (most precision with smallest cost) are n = √[(c2/c1)σW²/σC²] and p = √[(c3/c2)σC²/σS²]
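The optimal sizes can be sketched in terms of adjacent cost and variance ratios, an assumption consistent with the two-level rule rather than formulas read off the slide:

```python
import math

# Sketch of optimal subcluster sizes for the three-level hierarchical design:
#   n_opt = sqrt((c2/c1) * sigma_W^2 / sigma_C^2)
#   p_opt = sqrt((c3/c2) * sigma_C^2 / sigma_S^2)
# Each size depends only on the cost and variance ratios of adjacent levels.

def optimal_n_p(c1, c2, c3, var_W, var_C, var_S):
    n_opt = math.sqrt((c2 / c1) * var_W / var_C)
    p_opt = math.sqrt((c3 / c2) * var_C / var_S)
    return n_opt, p_opt

# Illustrative values: classes cost 10x an individual, schools 10x a class;
# variance split sigma_W^2 = 0.70, sigma_C^2 = 0.10, sigma_S^2 = 0.20.
n_opt, p_opt = optimal_n_p(1, 10, 100, 0.70, 0.10, 0.20)
print(round(n_opt, 1), round(p_opt, 1))
```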

Three-level Hierarchical Design With Covariates

(HLM Notation)

- Level 1 (individual level)
- Yijkl = β0jk + β1jkXijkl + eijkl,  e ~ N(0, σAW²)
- Level 2 (classroom level)
- β0jk = γ00j + γ01jZjk + η0jk,  η ~ N(0, σAC²)
- β1jk = γ10j
- Level 3 (school level)
- γ00j = π00 + π01Tj + π02Wj + ξ0j,  ξ ~ N(0, σAS²)
- γ01j = π01
- γ10j = π10
- The covariate effects β1jk = γ10j = π10 and γ01j = π01 are fixed

Precision in Three-level Hierarchical Design With

Covariates

- SE decreases as m increases
- SE decreases as p and n increase, but only up to a point
- SE increases as ρS and ρC increase
- SE decreases as RW², RC², and RS² increase

Power in Three-level Hierarchical Design With

Covariates

- Basic Idea
- Operational Effect Size = (Effect Size) × (Design Effect)
- δT = d × (Design Effect)
- For the three-level hierarchical design with covariates
- The operational sample size is the number of schools

Power in Three-level Hierarchical Design With

Covariates

- As m and the effect size increase, power increases
- Other influences occur through the design effect
- As ρS or ρC increases, the design effect decreases
- No matter how large n gets, the design effect remains bounded
- Thus power only increases up to some limit as n increases

Optimal Allocation in the Three-level

Hierarchical Design With Covariates

- Optimal allocation can also be computed in three-level designs to give guidance on p and n
- Suppose the cost per individual is c1, the cost per class is c2, and the cost per school is c3, so the total cost is 2mc3 + 2mpc2 + 2mpnc1
- Then the optimal sample sizes (most precision with smallest cost) are the same expressions as before, with the covariate-adjusted variances σAW², σAC², and σAS² in place of σW², σC², and σS²

Randomized Block Designs

Two-level Randomized Block Design With No Covariates (HLM Notation)

- Level 1 (individual level)
- Yijk = β0j + β1jTijk + eijk,  e ~ N(0, σW²)
- Level 2 (school level)
- β0j = π00 + η0j,  η0j ~ N(0, σS²)
- β1j = π10 + η1j,  η1j ~ N(0, σTxS²)
- If we code the treatment Tijk = ½ or −½, then the parameters are identical to those in standard ANOVA
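The level-1 and level-2 equations above can be simulated directly; this generation sketch (with illustrative parameter values) checks that averaging the within-school treatment vs. control differences recovers π10:

```python
import random
import statistics

# Simulation sketch of the two-level randomized block model:
#   Y_ijk = b0_j + b1_j * T_ijk + e_ijk, with T coded +1/2 or -1/2,
#   b0_j ~ N(pi00, sigma_S^2), b1_j ~ N(pi10, sigma_TxS^2).
random.seed(1)
pi00, pi10 = 0.0, 0.30            # grand mean and average treatment effect
sigma_S, sigma_TxS, sigma_W = 0.5, 0.25, 1.0
m, n = 200, 20                    # schools; students per treatment per school

diffs = []
for j in range(m):
    b0 = random.gauss(pi00, sigma_S)       # school mean
    b1 = random.gauss(pi10, sigma_TxS)     # school-specific treatment effect
    treated = [b0 + 0.5 * b1 + random.gauss(0, sigma_W) for _ in range(n)]
    control = [b0 - 0.5 * b1 + random.gauss(0, sigma_W) for _ in range(n)]
    diffs.append(statistics.mean(treated) - statistics.mean(control))

# Within-school differencing cancels b0, so the mean difference estimates pi10
print(round(statistics.mean(diffs), 2))
```

Because each school contributes both conditions, the school effects b0 cancel exactly, which is the precision advantage of blocking that the following slides quantify.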

Randomized Block Designs

- In randomized block designs, as in hierarchical designs, the intraclass correlation has an impact on precision and power
- However, in randomized block designs there is also a parameter reflecting the degree of heterogeneity of treatment effects across schools
- We define this heterogeneity parameter ωS in terms of the amount of heterogeneity of treatment effects relative to the heterogeneity of school means
- Thus
- ωS = σTxS²/σS²

Randomized Block Designs

- There are other ways to express this heterogeneity of treatment effects parameter
- For example, (random effects) meta-analyses may give you direct access to an estimate of the varian