Statistical Methods in Computer Science presentation

1
Statistical Methods in Computer Science

The Basis for Experiment Design
Ido Dagan

2
Experimental Lifecycle

[Diagram: a cycle linking Model/Theory, Hypothesis, Experiment, and Analysis]

3
Proving a Theory?

We've discussed 4 methods of proving a proposition:
  Everyone knows it
  Someone specific says it
  An experiment supports it
  We can mathematically prove it
Some propositions cannot be verified empirically
  e.g., "This compiler has linear run-time"
  Infinite possible inputs -> cannot prove it empirically
But they may still be disproved
  e.g., by code that causes the compiler to run non-linearly
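
Because the claim quantifies over infinitely many inputs, an experiment can only hunt for a counterexample. A minimal sketch of such a hunt, assuming we can count a program's basic steps for a given input size (the step-counting functions below are hypothetical stand-ins, and the 2.5x tolerance is an arbitrary choice): double the input size and flag any size where the work grows much faster than 2x.

```python
from typing import Callable, Optional


def find_nonlinear_size(
    count_steps: Callable[[int], int],
    sizes: list[int],
    tolerance: float = 2.5,
) -> Optional[int]:
    """Search for a counterexample to a linear-time claim.

    Returns the first n in `sizes` whose step count grows by more than
    `tolerance` times when the input size doubles from n to 2n, or None
    if no such n was found among the sizes tried.
    """
    for n in sizes:
        # For a linear-time program, doubling n roughly doubles the work.
        ratio = count_steps(2 * n) / count_steps(n)
        if ratio > tolerance:
            return n  # evidence against "this program runs in linear time"
    return None  # no refutation found; this is NOT a proof of linearity


def quadratic_steps(n: int) -> int:
    # Hypothetical program whose work grows as n * n.
    return n * n


def linear_steps(n: int) -> int:
    # Hypothetical program whose work grows as 3n + 5.
    return 3 * n + 5
```

Here `find_nonlinear_size(quadratic_steps, [10, 100, 1000])` returns 10 (the work quadruples when n doubles), while `linear_steps` yields None, which only means that no refutation was found among the sizes tried.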

4
Karl Popper's Philosophy of Science

Popper advanced a particular philosophy of science
Falsifiability
  For a theory to be considered scientific, it must be falsifiable
  There must be some way to refute it, in principle
  Not falsifiable <=> not scientific
Examples
  "All crows are black": falsifiable by finding a white crow
  "Compiles in linear time": falsifiable by non-linear performance
A theory is tested on its predictions

5
Proving by disproving...

Platt (Strong Inference, 1964) offers a specific method:
  1. Devise alternative hypotheses for observations
  2. Devise experiment(s) allowing elimination of hypotheses
  3. Carry out experiments to obtain a clean result
  4. Go to 1.
The idea is to eliminate (falsify) hypotheses
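
The elimination loop can be sketched in Python, assuming (hypothetically) that each hypothesis can be written as a function predicting the outcome of a named experiment, and that `run` performs an experiment and reports the observed outcome:

```python
from typing import Callable

# Hypothetical model: a hypothesis maps an experiment name to the outcome
# (True/False) it predicts for that experiment.
Hypothesis = Callable[[str], bool]


def strong_inference(
    hypotheses: dict[str, Hypothesis],
    experiments: list[str],
    run: Callable[[str], bool],
) -> dict[str, Hypothesis]:
    """Carry out each experiment and eliminate every hypothesis whose
    prediction disagrees with the observed outcome. Returns the survivors."""
    surviving = dict(hypotheses)
    for experiment in experiments:
        observed = run(experiment)  # carry out the experiment
        surviving = {
            name: predict
            for name, predict in surviving.items()
            if predict(experiment) == observed  # falsified ones are dropped
        }
    # "Go to 1": in practice, devise new alternatives for the survivors
    # and new experiments that can tell them apart, then repeat.
    return surviving
```

An experiment on which all surviving hypotheses predict the same outcome eliminates nothing, so the real work lies in devising experiments where the predictions differ.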

6
Forming Hypotheses

So, to support theory X, we:
  Construct falsification hypotheses X1, ..., Xn, ...
  Systematically experiment to disprove X, by proving Xi
  If all falsification hypotheses are eliminated, this lends support to the theory
Note that future falsification hypotheses may be formed
  The theory must continue to hold against attacks
  Popper: scientific evolution, "survival of the fittest" theory
  E.g., Newton's theory
How does this view hold in computer science?

7
Forming Hypotheses in CS

Carefully identify the theoretical object we are studying
  e.g., "the relation between input size and run-time is linear"
  e.g., "the display improves user performance"
Identify the falsification hypothesis (null hypothesis) H0
  e.g., "there is an input size for which run-time is non-linear"
  e.g., "the display will have no effect on user performance"
Now, experiment to eliminate H0

8
The Basics of Experiment Design

Experiments identify a relation between variables X, Y, ...
Simple experiments: provide an indication of a relation
  Better/worse, linear or non-linear, ...
Advanced experiments: help identify causes, interactions
  e.g., linear in input size, but the constant factor depends on the type of data

9
Types of Experiments and Variables

Manipulation experiments
  Manipulate (set the value of) independent variables (e.g., input size)
  Observe (measure the value of) dependent variables (e.g., run time)
Observation experiments
  Observe predictor variables (e.g., a person's height)
  Observe response variables (e.g., running speed)
  Also running time, if observing a system in actual use
Other variables
  Endogenous: on the causal path between the independent and dependent variables
  Exogenous: other variables influencing the dependent variables
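
A manipulation experiment on run time might look like the sketch below. The reversed-integer input and the best-of-three timing are illustrative assumptions; taking the minimum over repeated trials is one common way to damp interference from other processes on the machine.

```python
import time
from typing import Callable


def manipulation_experiment(
    program: Callable[[list[int]], object],
    sizes: list[int],
    trials: int = 3,
) -> dict[int, float]:
    """Set the independent variable (input size) and measure the dependent
    variable (best-of-`trials` wall-clock run time, in seconds)."""
    results: dict[int, float] = {}
    for n in sizes:
        data = list(range(n, 0, -1))  # assumed input: n integers in reverse order
        timings = []
        for _ in range(trials):
            start = time.perf_counter()
            program(list(data))  # copy, so every trial sees the same input
            timings.append(time.perf_counter() - start)
        results[n] = min(timings)
    return results
```

For example, `manipulation_experiment(sorted, [1_000, 10_000, 100_000])` returns a mapping from input size (independent variable) to run time in seconds (dependent variable).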

10
An example of observation experiment

Theory: gender affects score performance
Falsifying hypothesis: gender does not affect performance
  I.e., men and women perform the same
Cannot use manipulation experiments
  Cannot control gender
Must use observation experiments

11
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)
Highlighted on this slide: Independent (Predictor) Variables

Variable             Child 1    Child 2
Siblings             2          3
Teacher's attitude
Mother               artist     doctor
Test score           650        720
Gender               male       female
Child confidence
Height               145 cm     135 cm

12
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)
Same two-child example as slide 11; highlighted on this slide: Dependent (Response) Variables

13
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)
Same two-child example as slide 11; highlighted on this slide: Endogenous Variables

14
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)
Same two-child example as slide 11; highlighted on this slide: Exogenous Variables

15
Experiment Design Introduction

Different experiment types explore different hypotheses
For instance, a very simple design: the treatment experiment
  Sometimes known as a lesion study

  Condition   Independent   Exogenous            Dependent
  treatment   Ind1          Ex1, Ex2, ..., Exn   Dep1
  control     Not(Ind1)     Ex1, Ex2, ..., Exn   Dep2

Treatment condition: independent variable set to "with treatment"
Control condition: independent variable set to "no treatment"

[Diagram: variables V0, V1, V2, ..., Vn and the dependent variable]

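
As a hypothetical lesion study, take memoization as the treatment: the same computation (fib(n) for a fixed n, standing in for the fixed exogenous setting) is run with and without a cache, and the dependent variable is the number of times the function body executes. This is an illustrative choice, not an example from the slides.

```python
from functools import lru_cache


def count_calls(n: int, memoize: bool) -> int:
    """Dependent variable: executions of the fib body needed for fib(n).
    Independent variable: `memoize` (treatment = True, control = False)."""
    calls = 0

    def fib(k: int) -> int:
        nonlocal calls
        calls += 1
        return k if k < 2 else fib_impl(k - 1) + fib_impl(k - 2)

    # Treatment wraps the recursion in a cache; control leaves it untouched.
    fib_impl = lru_cache(maxsize=None)(fib) if memoize else fib
    fib_impl(n)
    return calls


def treatment_control(n: int) -> tuple[int, int]:
    # Same exogenous setting (n) in both conditions; only the treatment differs.
    return count_calls(n, memoize=True), count_calls(n, memoize=False)
```

For n = 10, `treatment_control(10)` returns (11, 177): with the cache each fib(k) for k = 0..10 is computed once, without it the naive recursion makes 177 calls.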
16
Single-Factor Treatment Experiments

A generalization of treatment experiments
Allows comparison of different conditions

  Condition    Independent   Exogenous            Dependent
  treatment1   Ind1          Ex1, Ex2, ..., Exn   Dep1
  treatment2   Ind2          Ex1, Ex2, ..., Exn   Dep2
  control      Not(Ind)      Ex1, Ex2, ..., Exn   Dep3

Compare performance of algorithm A to B to C, ...
Control condition: optional (e.g., to establish a baseline)
Determine the relation between the categorical variable V0 and the dependent variable

[Diagram: variables V0, V1, V2, ..., Vn and the dependent variable]

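
A single-factor sketch, assuming the categorical variable V0 is the choice of search algorithm and the dependent variable is the number of probes (both hypothetical choices); every level of the factor sees exactly the same items and targets:

```python
from typing import Callable


def linear_search_probes(items: list[int], target: int) -> int:
    # Number of elements inspected by a left-to-right scan.
    probes = 0
    for x in items:
        probes += 1
        if x == target:
            break
    return probes


def binary_search_probes(items: list[int], target: int) -> int:
    # Number of midpoints inspected by binary search on a sorted list.
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def single_factor(
    algorithms: dict[str, Callable[[list[int], int], int]],
    items: list[int],
    targets: list[int],
) -> dict[str, float]:
    """Mean probes per level of the factor (algorithm name), all levels
    run on identical inputs so only the factor varies."""
    return {
        name: sum(alg(items, t) for t in targets) / len(targets)
        for name, alg in algorithms.items()
    }
```

A control condition, such as a trivial baseline strategy, could be added as one more entry in `algorithms`.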
17
Careful !

An effect on the dependent variable may not be as expected
Example experiment
  Hypothesis: a fly's ear is on its wings
  Fly with two wings. Make a loud noise. Observe flight.
  Fly with one wing. Make a loud noise. No flight.
  Conclusion: a fly with only one wing cannot hear!
What's going on here?
  First, interpretation by the experimenter
  But also, lack of sufficient falsifiability: there are other possible
  explanations for why the fly wouldn't fly

18
Controlling for other factors

Often, we cannot manipulate all exogenous variables
Then, we need to make sure they are sampled randomly
  Randomization averages out their effect
This can be difficult
  e.g., suppose we are trying to relate gender and math
  We control for the effect of the number of siblings by random sampling
  But the number of siblings may be related to gender
    Parents continue to have children hoping for a boy (Beal 1994)
    Thus the number of siblings is tied to gender
  Must separate results based on the number of siblings
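
Separating results by number of siblings amounts to stratification: compare genders only within groups that share the same sibling count, so that the link between sibling count and gender cannot masquerade as a gender effect. A minimal sketch, assuming each record is a dict with hypothetical keys `siblings`, `gender`, and `score`:

```python
from collections import defaultdict
from statistics import mean


def stratified_means(records: list[dict]) -> dict[int, dict[str, float]]:
    """Mean score per gender, computed separately within each sibling-count
    stratum. The record keys used here are hypothetical."""
    strata: dict = defaultdict(lambda: defaultdict(list))
    for r in records:
        strata[r["siblings"]][r["gender"]].append(r["score"])
    return {
        siblings: {gender: mean(scores) for gender, scores in by_gender.items()}
        for siblings, by_gender in sorted(strata.items())
    }
```

Gender differences are then read off within each stratum rather than from the pooled data.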

19
Factorial Experiment Designs

Every combination of factor values is sampled
  The hope is to exclude or reveal interactions
This creates a combinatorial number of experiments
  N factors, k values each: k^N combinations
Strategies for eliminating values
  Merge values, categories. Skip values.
  Focus on extremes, to get a general trend
    But this may hide behavior at intermediate values
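
Enumerating a full factorial design is a Cartesian product over the factor levels; with N factors of k levels each this yields k^N configurations. A sketch using the standard library (the factor names in the usage note are hypothetical):

```python
from itertools import product
from math import prod


def factorial_design(factors: dict[str, list]) -> list[dict]:
    """Full factorial design: one configuration per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]


def design_size(factors: dict[str, list]) -> int:
    # Product of the number of levels per factor; k**N when all N factors have k levels.
    return prod(len(levels) for levels in factors.values())
```

For factors `{"algorithm": ["A", "B"], "input_size": [100, 1000, 10000], "data_type": ["random", "sorted"]}`, `design_size` gives 2 * 3 * 2 = 12 configurations; merging or skipping values shrinks the corresponding factor's level list.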

20
Tips for Factorial Experiments

For numerical variables, 2 value ranges are not enough
  They don't give a good sense of the function relating the variables
Measure, measure, measure
  Piggybacking measurements on planned experiments is cheaper than re-running experiments
Simplify comparisons
  Use the same number of data points (trials) for all configurations
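
The last two tips can be combined in a small runner: every configuration gets the same number of trials, and each trial records every measurement it can cheaply report, not just the one of current interest. `run_trial` is a hypothetical callback supplied by the experimenter that returns a dict of measurements:

```python
from typing import Callable


def run_balanced(
    configs: list[dict],
    run_trial: Callable[[dict, int], dict],
    trials: int,
) -> list[dict]:
    """Run exactly `trials` trials of every configuration, keeping one row
    per trial with the configuration, the trial index, and all measurements."""
    rows: list[dict] = []
    for config in configs:
        for trial in range(trials):
            measurements = run_trial(config, trial)  # hypothetical callback
            rows.append({**config, "trial": trial, **measurements})
    return rows
```

Keeping every measurement in the rows means a later question (say, about memory instead of run time) may be answerable without re-running the experiment.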

21
Experiment Validity

Types of validity: internal and external
Internal validity
  The experiment shows the relationship (the independent variable causes the dependent)
External validity
  The degree to which results generalize to other conditions
Threats: uncontrolled conditions threatening validity

22
Internal validity threats: Examples

Order effects
  Practice effects in human or animal test subjects
    E.g., user performance improves in user interface tasks
    Solution: randomize the order of presentation to subjects
  Bugs or side effects in the testing system leave it unclean for the next trial
    Solution: clean the system between experiments
  If treatment/control are given in two different orders
    E.g., runs with/without the new algorithm operating, for the same users
    The order may be good for treatment, bad for control (or vice versa)
    Solution: counter-balancing (all possible orders)
Demand effects
  The experimenter influences the subject
  e.g., by guiding subjects
Confounding effects: variable relations aren't clear
  See the fly example: "a fly with only one wing cannot hear"
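
Both solutions for order effects can be sketched with the standard library. Full counter-balancing cycles subjects through all k! orders of the k conditions (which quickly becomes impractical as k grows); randomization instead draws an order per subject. The subject names and seeds are illustrative assumptions:

```python
import random
from itertools import permutations


def counterbalanced_orders(
    conditions: list[str], subjects: list[str]
) -> dict[str, tuple[str, ...]]:
    """Assign subjects to every possible presentation order in turn, so each
    order is used (nearly) equally often."""
    orders = list(permutations(conditions))
    return {s: orders[i % len(orders)] for i, s in enumerate(subjects)}


def randomized_order(conditions: list[str], seed: int) -> list[str]:
    """Alternative: a random presentation order for one subject
    (seeded so the assignment can be reproduced)."""
    order = list(conditions)
    random.Random(seed).shuffle(order)
    return order
```

With conditions `["treatment", "control"]` and four subjects, `counterbalanced_orders` gives each of the two orders to two subjects.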

23
External threats to validity

Sampling bias: non-representative samples
  e.g., non-representative external factors
Floor and ceiling effects
  Test problems are too hard, or too easy
Regression effects
  Results have no way to go but up (or down)
Solution approach: run pilot experiments

24
Sampling Bias

The setting prefers measuring specific values over others
For instance:
  Random selection of mice from a cage for an experiment
    Specific values: slow, doesn't bite (not aggressive), ...
  Including only the results that were found by some deadline
Solution: detect, and remove
  e.g., by visualization, looking for non-normal distributions
  e.g., a surprising distribution of dependent data for different values of the independent variable
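
Visual inspection is the approach named here; as a rough numeric complement, one could compute the sample skewness of the dependent data separately for each value of the independent variable and flag groups that are strongly lopsided. The skewness statistic and the threshold of 1.0 are assumptions for illustration, not a prescribed test:

```python
from collections import defaultdict


def skewness(xs: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def flag_skewed_groups(
    pairs: list[tuple[object, float]], threshold: float = 1.0
) -> list:
    """Group dependent values by independent-variable value and return the
    values whose dependent data look strongly skewed (assumed threshold)."""
    groups: dict = defaultdict(list)
    for independent, dependent in pairs:
        groups[independent].append(dependent)
    return [k for k, ys in groups.items() if abs(skewness(ys)) > threshold]
```

A flagged group is a prompt to look at the data (and how it was collected), not proof of sampling bias.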

25
Baselines: Floor and Ceiling Effects

How do we know A is good? Bad?
  Maybe the problems are too simple? Too hard?
For example:
  A new machine learning algorithm has 95% accuracy
  Is this good?
Controlling for floor/ceiling
  Establish baselines
  Show whether a silly approach achieves a close result
  Comparison to a strawman (easy) and an ironman (hard)
  May be misleading if not chosen appropriately
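
One especially simple "silly approach" for classification is to always predict the most frequent label. A sketch of that baseline's accuracy (the label values in the test data are hypothetical):

```python
from collections import Counter


def majority_baseline_accuracy(labels: list) -> float:
    """Accuracy of a classifier that always predicts the most common label."""
    if not labels:
        raise ValueError("need at least one label")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)
```

If 95 of 100 test examples carry the same label, this baseline already scores 0.95, so a 95% accuracy by itself says nothing about the new algorithm.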

26
Regression Effects

General phenomenon: regression towards the mean
  Repeated measurement converges towards mean values
Example threat: run a program on 100 different inputs
  Problems 6, 14, 15 get a very low score
  We now fix the problem that affected only these inputs, and want to re-test
  If chance has anything to do with scoring, then we must re-run all of them
  Why?
    Scores on 6, 14, 15 have nowhere to go but up
    So re-running only these problems will show improvement by chance
Solution
  Re-run the complete tests, or sample conditions uniformly
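
A small simulation shows the threat under assumed numbers: every problem has the same true score (50), each run adds Gaussian noise (standard deviation 10), the 3 lowest of 100 problems are selected, and only those are re-run with no fix applied at all.

```python
import random


def regression_demo(
    n_problems: int = 100, k_lowest: int = 3, seed: int = 0
) -> tuple[float, float]:
    """Returns (mean first-run score, mean re-run score) of the k lowest
    problems, where the re-run applies NO fix (assumed noise model)."""
    rng = random.Random(seed)
    true_score = 50.0  # hypothetical: identical true score for every problem
    first = [true_score + rng.gauss(0, 10) for _ in range(n_problems)]
    lowest = sorted(range(n_problems), key=lambda i: first[i])[:k_lowest]
    rerun = [true_score + rng.gauss(0, 10) for _ in lowest]
    before = sum(first[i] for i in lowest) / k_lowest
    after = sum(rerun) / k_lowest
    return before, after
```

With these settings the re-run mean of the selected problems is expected to come out well above their first-run mean even though nothing changed; re-running all problems, or sampling uniformly, avoids selecting on the noise in the first place.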

27
Summary

Defensive thinking
  If I were trying to disprove the claim, what would I do?
  Then think of ways to counter any possible attack on the claim
Strong Inference, Popper's falsification ideas
  Science moves by disproving theories (empirically)
Experiment design
  Ideal independent variables: easy to manipulate
  Ideal dependent variables: measurable, sensitive, and meaningful
Carefully think through threats
Next week: hypothesis testing
