# Statistical Methods in Computer Science

1
Statistical Methods in Computer Science
• The Basis for Experiment Design
• Ido Dagan

2
Experimental Lifecycle
[Cycle diagram: Model/Theory → Hypothesis → Experiment → Analysis → back to Model/Theory]
3
Proving a Theory?
• We've discussed 4 methods of proving a proposition
• Everyone knows it
• Someone specific says it
• An experiment supports it
• We can mathematically prove it
• Some propositions cannot be verified empirically
• e.g., "This compiler has linear run-time"
• Infinite possible inputs → cannot prove empirically
• But they may still be disproved
• e.g., by code that causes the compiler to run non-linearly

4
Karl Popper's Philosophy of Science
• Popper advanced a particular philosophy of science
• Falsifiability
• For a theory to be considered scientific, it must be falsifiable
• There must be some way to refute it, in principle
• Not falsifiable ⇔ not scientific
• Examples
• "All crows are black": falsifiable by finding a white crow
• "Compiles in linear time": falsifiable by observing non-linear performance
• A theory is tested on its predictions

5
Proving by disproving...
• Platt ("Strong Inference", 1964) offers a specific method:
1. Devise alternative hypotheses for the observations
2. Devise experiment(s) allowing elimination of hypotheses
3. Carry out experiments to obtain a clean result
4. Go to 1.
• The idea is to eliminate (falsify) hypotheses

6
Forming Hypotheses
• So, to support theory X, we:
• Construct falsification hypotheses X1, ..., Xn, ...
• Systematically experiment to disprove X by trying to prove each Xi
• If all falsification hypotheses are eliminated, this lends support to the theory
• Note that future falsification hypotheses may be formed
• The theory must continue to hold against attacks
• Popper: scientific evolution, survival of the fittest theory
• e.g., Newton's theory
• How does this view hold in computer science?

7
Forming Hypotheses in CS
• Carefully identify the theoretical object we are studying
• e.g., "the relation between input size and run-time is linear"
• e.g., "the display improves user performance"
• Identify the falsification hypothesis (null hypothesis) H0
• e.g., "there is an input size for which run-time is non-linear"
• e.g., "the display will have no effect on user performance"
• Now, experiment to eliminate H0
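The run-time example can be sketched in code. This is a minimal, hypothetical probe (the `work` function merely stands in for the system under test): measure run-time at several input sizes and check whether time grows proportionally. At best this fails to find evidence for H0 at the sizes tested; it can never prove linearity for all inputs.

```python
import time
import random

def work(n):
    # Hypothetical system under test; stands in for e.g. a compiler run.
    total = 0.0
    for _ in range(n):
        total += random.random()
    return total

# Measure run-time at several input sizes.
sizes = [10_000, 20_000, 40_000, 80_000]
times = []
for n in sizes:
    start = time.perf_counter()
    work(n)
    times.append(time.perf_counter() - start)

# If time/n is roughly constant as n grows, the data are consistent with
# linearity at these sizes; a drifting ratio would support H0.
ratios = [t / n for t, n in zip(times, sizes)]
print(ratios)
```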

8
The Basics of Experiment Design
• Experiments identify a relation between variables X, Y, ...
• Simple experiments: provide an indication of a relation
• Better/worse, linear or non-linear, ...
• Advanced experiments: help identify causes, interactions
• e.g., linear in input size, but the constant factor depends on the type of data

9
Types of Experiments and Variables
• Manipulation experiments
• Manipulate (= set the value of) independent variables (e.g., input size)
• Observe (= measure the value of) dependent variables (e.g., run time)
• Observation experiments
• Observe predictor variables (e.g., a person's height)
• Observe response variables (e.g., running speed)
• Also running time, if observing a system in actual use
• Other variables
• Endogenous: on the causal path between independent and dependent variables
• Exogenous: other variables influencing the dependent variables

10
An example of observation experiment
• Theory: gender affects test performance
• Falsifying hypothesis: gender does not affect performance
• i.e., men and women perform the same
• Cannot use manipulation experiments
• Cannot control gender
• Must use observation experiments

11
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)

| Variable           | Child A | Child B |
| ------------------ | ------- | ------- |
| Siblings           | 2       | 3       |
| Teacher's attitude |         |         |
| Mother             | artist  | doctor  |
| Test score         | 650     | 720     |
| Gender             | male    | female  |
| Child confidence   |         |         |
| Height             | 145 cm  | 135 cm  |

Highlighted on this slide: the Independent (Predictor) Variables
12
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)

[Same two-child table as slide 11, now highlighting the Dependent (Response) Variables]
13
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)

[Same two-child table as slide 11, now highlighting the Endogenous Variables]
14
An example observation experiment (à la Empirical Methods in AI, Cohen 1995)

[Same two-child table as slide 11, now highlighting the Exogenous Variables]
15
• Different experiment types explore different hypotheses
• For instance, a very simple design: the treatment experiment
• Sometimes known as a lesion study

| Condition | Independent var | Exogenous vars     | Measured |
| --------- | --------------- | ------------------ | -------- |
| treatment | Ind1            | Ex1, Ex2, ..., Exn | Dep1     |
| control   | not(Ind1)       | Ex1, Ex2, ..., Exn | Dep2     |

• Treatment condition: independent variable set to "with treatment"
• Control condition: independent variable set to "no treatment"

[Diagram: variables V0, V1, V2, ..., Vn feeding into the dependent variable]
16
Single-Factor Treatment Experiments
• A generalization of treatment experiments
• Allows comparison of different conditions

| Condition  | Independent var | Exogenous vars     | Measured |
| ---------- | --------------- | ------------------ | -------- |
| treatment1 | Ind1            | Ex1, Ex2, ..., Exn | Dep1     |
| treatment2 | Ind2            | Ex1, Ex2, ..., Exn | Dep2     |
| control    | not(Ind)        | Ex1, Ex2, ..., Exn | Dep3     |

• Compare performance of algorithm A to B to C, ...
• Control condition: optional (e.g., to establish a baseline)
• Determines the relation between a categorical variable V0 and the dependent variable

[Diagram: V0 with values V1, V2, ..., Vn, each mapped to the dependent variable]
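The single-factor design can be sketched in code. This is a minimal sketch with three hypothetical algorithms A, B, and C: the single factor is "algorithm", and every condition runs on exactly the same inputs, so exogenous variation is held constant across conditions.

```python
import random
import statistics

random.seed(1)

def algo_a(xs): return sum(xs)          # stand-in algorithms: what matters
def algo_b(xs): return sum(sorted(xs))  # here is the experiment structure,
def algo_c(xs): return sum(xs[::-1])    # not what they compute

conditions = {"A": algo_a, "B": algo_b, "C": algo_c}
inputs = [[random.random() for _ in range(50)] for _ in range(20)]

# Dependent variable: a per-run score (here, a dummy error measure).
scores = {name: [] for name in conditions}
for xs in inputs:                        # same inputs for every condition
    truth = sum(xs)
    for name, algo in conditions.items():
        scores[name].append(abs(algo(xs) - truth))

for name, vals in scores.items():
    print(name, statistics.mean(vals))
```

The key design point: the loop over conditions is nested inside the loop over inputs, so each condition sees identical exogenous inputs and only the factor of interest varies.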
17
Careful !
• An effect on the dependent variable may not be as expected
• Example experiment:
• Hypothesis: a fly's ears are on its wings
• Fly with two wings. Make a loud noise. Observe flight.
• Fly with one wing. Make a loud noise. No flight.
• Conclusion: a fly with only one wing cannot hear!
• What's going on here?
• First, the conclusion is the experimenter's interpretation
• But also, a lack of sufficient falsifiability
• There are other possible explanations for why the fly wouldn't fly

18
Controlling for other factors
• Often, we cannot manipulate all exogenous variables
• Then, we need to make sure they are sampled randomly
• Randomization averages out their effect
• This can be difficult
• e.g., suppose we are trying to relate gender and math
• We control for the effect of the number of siblings by random sampling
• But the number of siblings may be related to gender
• Parents continue to have children hoping for a boy (Beal 1994)
• Thus the number of siblings is tied to gender
• Must separate results based on the number of siblings
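A minimal sketch of the two controls in code, on made-up data: sample randomly over the exogenous variable, then additionally stratify the results by it in case it is tied to the variable of interest. Here `group` plays the role of gender and `siblings` the exogenous variable; all names and values are synthetic.

```python
import random
import statistics

random.seed(0)

population = [
    {"group": random.choice(["A", "B"]),
     "siblings": random.randint(0, 4),
     "score": random.gauss(70, 10)}
    for _ in range(1000)
]

# Random sampling averages out the number of siblings
# *if* it is independent of group...
sample = random.sample(population, 200)

# ...but if it may be related to group, separate (stratify)
# the results by the exogenous variable.
by_stratum = {}
for person in sample:
    key = (person["group"], person["siblings"])
    by_stratum.setdefault(key, []).append(person["score"])

for key in sorted(by_stratum):
    print(key, round(statistics.mean(by_stratum[key]), 1))
```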

19
Factorial Experiment Designs
• Every combination of factor values is sampled
• The hope is to exclude or reveal interactions
• This creates a combinatorial number of experiments
• N factors, k values each: k^N combinations
• Strategies for eliminating values:
• Merge values, categories. Skip values.
• Focus on extremes, to get a general trend
• But this may hide behavior at intermediate values
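Enumerating a full factorial design is a one-liner with `itertools.product`; the factor names and levels below are hypothetical.

```python
from itertools import product

# A full factorial design: every combination of factor values
# is one experimental configuration (k^N of them).
factors = {
    "input_size": [1_000, 10_000, 100_000],    # hypothetical levels
    "data_type": ["sorted", "random", "reversed"],
    "algorithm": ["A", "B"],
}

names = list(factors)
configurations = [dict(zip(names, combo))
                  for combo in product(*factors.values())]

# 3 * 3 * 2 = 18 configurations
print(len(configurations))
```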

20
Tips for Factorial Experiments
• For numerical variables, 2 value ranges are not enough
• They don't give a good sense of the function relating the variables
• Measure, measure, measure
• Piggybacking measurements on planned experiments is cheaper than re-running experiments
• Simplify comparisons
• Use the same number of data points (trials) for all configurations

21
Experiment Validity
• Types of validity: internal and external
• Internal validity
• The experiment shows the claimed relationship (the independent variable causes the dependent)
• External validity
• The degree to which results generalize to other conditions
• Threats: uncontrolled conditions that undermine validity

22
Internal validity threats Examples
• Order effects
• Practice effects in human or animal test subjects
• e.g., user performance improves with practice in a user-interface study
• Solution: randomize the order of presentation to subjects
• A bug or side-effect in the tested system leaves the system unclean for the next trial: need to clean the system between experiments
• If treatment/control are given in two different orders
• e.g., run with/without the new algorithm operating, for the same users
• An order may be good for treatment, bad for control (or vice versa)
• Solution: counterbalancing (all possible orders)
• Demand effects
• The experimenter influences the subjects
• e.g., guiding subjects
• Confounding effects: the relations between variables aren't clear
• See "fly with one wing cannot hear"
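The two defenses against order effects can be sketched as follows; the subject names and conditions are hypothetical.

```python
import random
from itertools import permutations

random.seed(42)
conditions = ["control", "treatment"]

# (1) Per-subject randomization: shuffle presentation order.
subject_order = random.sample(conditions, k=len(conditions))

# (2) Counterbalancing: cycle subjects through every possible order,
# so each order is used for an equal number of subjects.
all_orders = list(permutations(conditions))
subjects = ["s1", "s2", "s3", "s4"]
assignment = {s: all_orders[i % len(all_orders)]
              for i, s in enumerate(subjects)}
print(assignment)
```

With more than two conditions, the number of orders grows factorially, which is why partial counterbalancing schemes (e.g., Latin squares) exist; a full enumeration is shown here only because two conditions keep it small.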

23
External threats to validity
• Sampling bias: non-representative samples
• e.g., non-representative external factors
• Floor and ceiling effects
• The problems tested are too hard or too easy
• Regression effects
• Results have no way to go but up (or down)
• Solution approach: run pilot experiments

24
Sampling Bias
• The setting prefers measuring specific values over others
• For instance:
• "Random" selection of mice from a cage for an experiment
• Specific values preferred: slow, doesn't bite (not aggressive), ...
• Including results that were found by some
• Solution: detect, and remove
• e.g., by visualization, looking for non-normal distributions
• e.g., a surprising distribution of the dependent data, for different values of the independent variable

25
Baselines Floor and Ceiling Effects
• How do we know A is good? Bad?
• Maybe the problems are too simple? Too hard?
• For example:
• A new machine learning algorithm has 95% accuracy
• Is this good?
• Controlling for floor/ceiling:
• Establish baselines
• Show that a "silly" approach achieves a close result
• Comparison to a strawman (easy) or an ironman (hard)
• May be misleading if not chosen appropriately
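A majority-class baseline is one such "silly" approach; the label distribution below is made up to show why 95% accuracy can be unimpressive.

```python
from collections import Counter

# If 94% of test labels belong to one class, a classifier that always
# predicts that class already scores 94%, so a reported 95% accuracy
# barely beats the baseline.
labels = ["neg"] * 94 + ["pos"] * 6           # hypothetical test-set labels

majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)
print(majority_class, baseline_accuracy)      # neg 0.94
```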

26
Regression Effects
• General phenomenon: regression towards the mean
• Repeated measurement converges towards mean values
• Example threat: run a program on 100 different inputs
• Problems 6, 14, 15 get a very low score
• We now fix the problem that affected only these inputs, and want to re-test
• If chance has anything to do with scoring, then we must re-run all
• Why?
• Scores on 6, 14, 15 have nowhere to go but up
• So re-running only these problems will show improvement by chance
• Solution:
• Re-run the complete tests, or sample conditions uniformly
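The regression effect can be demonstrated with a small simulation: every input has the same true score, so the worst scorers on the first run were simply unlucky, and re-testing only them tends to show spurious "improvement". The data and constants here are synthetic and arbitrary.

```python
import random
import statistics

random.seed(7)

def measure():
    # Every input has true score 50; measurements are noisy.
    return 50 + random.gauss(0, 10)

first = [measure() for _ in range(100)]
worst = sorted(range(100), key=lambda i: first[i])[:3]   # the "problems 6, 14, 15"

# Re-test only the worst scorers, with nothing actually fixed.
retest = {i: measure() for i in worst}

before = statistics.mean(first[i] for i in worst)
after = statistics.mean(retest.values())
# "after" tends to be higher purely by chance: the selected scores
# had nowhere to go but up.
print(round(before, 1), round(after, 1))
```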

27
Summary
• Defensive thinking
• "If I were trying to disprove the claim, what would I do?"
• Then think of ways to counter any possible attack on the claim
• Strong Inference, Popper's falsification ideas
• Science moves by disproving theories (empirically)
• Experiment design
• Ideal independent variables: easy to manipulate
• Ideal dependent variables: measurable, sensitive, and meaningful
• Carefully think through threats
• Next week: hypothesis testing