Transcript and Presenter's Notes
1
Statistical Methods in Computer Science
  • The Basis for Experiment Design
  • Ido Dagan

2
Experimental Lifecycle
  • [Cycle diagram: Model/Theory, Hypothesis, Experiment, Analysis]
3
Proving a Theory?
  • We've discussed four methods of "proving" a
    proposition
  • Everyone knows it
  • Someone specific says it
  • An experiment supports it
  • We can mathematically prove it
  • Some propositions cannot be verified empirically
  • e.g., "This compiler has linear run-time"
  • Infinitely many possible inputs --> cannot prove
    empirically
  • But they may still be disproved
  • e.g., code that causes the compiler to run
    non-linearly

4
Karl Popper's Philosophy of Science
  • Popper advanced a particular philosophy of
    science
  • Falsifiability
  • For a theory to be considered scientific, it must
    be falsifiable
  • There must be some way to refute it, in principle
  • Not falsifiable <-> not scientific
  • Examples
  • "All crows are black": falsifiable by finding a
    white crow
  • "Compiles in linear time": falsifiable by
    non-linear performance
  • A theory is tested on its predictions

5
Proving by disproving...
  • Platt ("Strong Inference", 1964) offers a
    specific method
  1. Devise alternative hypotheses for the observations
  2. Devise experiment(s) allowing elimination of the
     hypotheses
  3. Carry out the experiments to obtain a clean result
  4. Go to 1.
  • The idea is to eliminate (falsify) hypotheses

6
Forming Hypotheses
  • So, to support theory X, we
  • Construct falsification hypotheses X1, ..., Xn, ...
  • Systematically experiment to disprove X by trying
    to prove the Xi
  • If all falsification hypotheses are eliminated,
    this lends support to the theory
  • Note that future falsification hypotheses may be
    formed
  • The theory must continue to hold against attacks
  • Popper: scientific evolution, survival of the
    fittest theory
  • e.g., Newton's theory
  • How does this view hold in computer science?

7
Forming Hypotheses in CS
  • Carefully identify the theoretical claim we are
    studying
  • e.g., "the relation between input size and
    run-time is linear"
  • e.g., "the display improves user performance"
  • Identify the falsification hypothesis (null
    hypothesis) H0
  • e.g., "there is an input size for which run-time
    is non-linear"
  • e.g., "the display will have no effect on user
    performance"
  • Now, experiment to eliminate H0 (a sketch follows)
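As a concrete illustration, here is a minimal Python sketch of attacking the linearity H0: time the system at doubling input sizes and watch the growth ratio. `make_input` and `run_compiler` are hypothetical stand-ins for the system under test.

```python
# Hedged sketch: probing H0 ("some input size makes run-time
# non-linear") by timing the system at doubling input sizes.
# make_input and run_compiler are hypothetical stand-ins.
import time

def make_input(n):
    return "x = 1\n" * n            # hypothetical: n-line source file

def run_compiler(src):
    compile(src, "<test>", "exec")  # stand-in for the real compiler

def median_time(n, trials=5):
    """Median wall-clock time for one run at input size n."""
    times = []
    for _ in range(trials):
        src = make_input(n)
        start = time.perf_counter()
        run_compiler(src)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# If run-time is linear, doubling n should roughly double the time;
# a ratio drifting well above 2 is evidence for H0.
prev = None
for n in (1_000, 2_000, 4_000, 8_000):
    t = median_time(n)
    ratio = "" if prev is None else f" (x{t / prev:.2f})"
    print(f"n={n}: {t:.4f}s{ratio}")
    prev = t
```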

8
The Basics of Experiment Design
  • Experiments identify a relation between variables
    X, Y, ...
  • Simple experiments: provide an indication of a
    relation
  • Better/worse, linear or non-linear, ...
  • Advanced experiments: help identify causes and
    interactions
  • e.g., linear in input size, but the constant
    factor depends on the type of data

9
Types of Experiments and Variables
  • Manipulation experiments
  • Manipulate (= set the value of) independent
    variables (e.g., input size)
  • Observe (= measure the value of) dependent
    variables (e.g., run time)
  • Observation experiments
  • Observe predictor variables (e.g., a person's
    height)
  • Observe response variables (e.g., running speed)
  • Also running time, if observing a system in
    actual use
  • Other variables
  • Endogenous: on the causal path between the
    independent and dependent variables
  • Exogenous: other variables influencing the
    dependent variables

10
An example of observation experiment
  • Theory: gender affects test-score performance
  • Falsifying hypothesis: gender does not affect
    performance
  • i.e., men and women perform the same
  • Cannot use manipulation experiments
  • Cannot control gender
  • Must use observation experiments

11
An example observation experiment (à la Empirical
Methods in AI, Cohen 1995)

  Variable             Child A    Child B
  Siblings             2          3
  Teacher's attitude
  Mother               artist     doctor
  Test score           650        720
  Gender               Male       Female
  Child confidence
  Height               145 cm     135 cm

  Highlighted on this slide: the Independent
  (Predictor) Variables
12
An example observation experiment (à la Empirical
Methods in AI, Cohen 1995)

  Variable             Child A    Child B
  Siblings             2          3
  Teacher's attitude
  Mother               artist     doctor
  Test score           650        720
  Gender               Male       Female
  Child confidence
  Height               145 cm     135 cm

  Highlighted on this slide: the Dependent
  (Response) Variables
13
An example observation experiment (à la Empirical
Methods in AI, Cohen 1995)

  Variable             Child A    Child B
  Siblings             2          3
  Teacher's attitude
  Mother               artist     doctor
  Test score           650        720
  Gender               Male       Female
  Child confidence
  Height               145 cm     135 cm

  Highlighted on this slide: the Endogenous Variables
14
An example observation experiment (à la Empirical
Methods in AI, Cohen 1995)

  Variable             Child A    Child B
  Siblings             2          3
  Teacher's attitude
  Mother               artist     doctor
  Test score           650        720
  Gender               Male       Female
  Child confidence
  Height               145 cm     135 cm

  Highlighted on this slide: the Exogenous Variables
15
Experiment Design Introduction
  • Different experiment types explore different
    hypotheses
  • For instance, a very simple design: the treatment
    experiment
  • Sometimes known as a lesion study

    Condition    Independent    Exogenous             Dependent
    treatment    Ind1           Ex1, Ex2, ..., Exn    Dep1
    control      Not(Ind1)      Ex1, Ex2, ..., Exn    Dep2

  • Treatment condition: independent variable set to
    "with treatment"
  • Control condition: independent variable set to
    "no treatment"

  [Chart: the dependent variable plotted for
  variables V0, V1, V2, ..., Vn]
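A minimal analysis sketch for this design, assuming SciPy is available; the measurements below are invented for illustration.

```python
# Hedged sketch: comparing treatment vs. control on the dependent
# variable. The measurements are invented for illustration.
from scipy import stats

treatment = [12.1, 11.8, 12.6, 11.9, 12.3]  # Dep with Ind1 set
control   = [13.0, 12.8, 13.4, 12.9, 13.1]  # Dep with Not(Ind1)

# Two-sample t-test of H0 "no difference between conditions";
# a small p-value is evidence against H0.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```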
16
Single-Factor Treatment Experiments
  • A generalization of treatment experiments
  • Allows comparison of different conditions

    Condition     Independent    Exogenous             Dependent
    treatment1    Ind1           Ex1, Ex2, ..., Exn    Dep1
    treatment2    Ind2           Ex1, Ex2, ..., Exn    Dep2
    control       Not(Ind)       Ex1, Ex2, ..., Exn    Dep3

  • Compare performance of algorithm A to B to C ...
  • Control condition: optional (e.g., to establish a
    baseline)
  • Determines the relation between the categorical
    variable V0 and the dependent variable

  [Chart: the dependent variable plotted for factor
  values V0, V1, V2, ..., Vn]
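With several treatment levels, one common analysis is a one-way ANOVA over the conditions. A minimal sketch, assuming SciPy; the numbers are invented.

```python
# Hedged sketch: single-factor design with three conditions
# (algorithms A, B, and a baseline); the values are invented.
from scipy import stats

algo_a   = [0.81, 0.79, 0.83, 0.80]
algo_b   = [0.85, 0.88, 0.86, 0.87]
baseline = [0.80, 0.82, 0.79, 0.81]

# One-way ANOVA of H0 "all conditions share the same mean".
f_stat, p_value = stats.f_oneway(algo_a, algo_b, baseline)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```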
17
Careful !
  • An effect on the dependent variable may not have
    the cause you expect
  • Example: an experiment
  • Hypothesis: a fly's ears are on its wings
  • Fly with two wings. Make loud noise. Observe
    flight.
  • Fly with one wing. Make loud noise. No flight.
  • Conclusion: a fly with only one wing cannot hear!
  • What's going on here?
  • First, over-interpretation by the experimenter
  • But also, lack of sufficient falsifiability
  • There are other possible explanations for why the
    fly wouldn't fly.

18
Controlling for other factors
  • Often, we cannot manipulate all exogenous
    variables
  • Then, we need to make sure they are sampled
    randomly
  • Randomization averages out their effect
  • This can be difficult
  • e.g., suppose we are trying to relate gender and
    math scores
  • We control for the effect of # of siblings by
    random sampling
  • But # of siblings may be related to gender
  • Parents continue to have children hoping for a
    boy (Beal 1994)
  • Thus # of siblings is tied to gender
  • Must separate results based on # of siblings
    (sketched below)
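A minimal sketch of such stratification, with invented records of (gender, # of siblings, test score):

```python
# Hedged sketch: separating (stratifying) results by # of siblings
# before comparing genders; the records are invented.
from collections import defaultdict
from statistics import mean

records = [("M", 1, 640), ("F", 1, 660), ("M", 3, 610),
           ("F", 3, 650), ("M", 1, 655), ("F", 3, 645)]

strata = defaultdict(lambda: defaultdict(list))
for gender, n_sib, score in records:
    strata[n_sib][gender].append(score)

# Compare genders within each stratum, never across the whole pool.
for n_sib in sorted(strata):
    means = {g: round(mean(s), 1) for g, s in strata[n_sib].items()}
    print(f"# of siblings = {n_sib}: {means}")
```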

19
Factorial Experiment Designs
  • Every combination of factor values is sampled
    (enumeration sketched below)
  • The hope is to exclude or reveal interactions
  • This creates a combinatorial number of
    experiments
  • N factors with k values each: k^N combinations
  • Strategies for eliminating values
  • Merge values into categories; skip values
  • Focus on extremes, to get a general trend
  • But this may hide behavior at intermediate values
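A minimal sketch of enumerating a full factorial design; the factor names and values are invented, and `run_trial` is hypothetical.

```python
# Hedged sketch: a full factorial design enumerates every
# combination of factor values; N factors of k values => k^N runs.
from itertools import product

factors = {
    "input_size": [1_000, 1_000_000],   # extremes only
    "data_type":  ["sorted", "random"],
    "cache":      ["on", "off"],
}

configs = [dict(zip(factors, combo))
           for combo in product(*factors.values())]
print(len(configs), "configurations")   # 2^3 = 8 here
for config in configs:
    # run_trial(config) would execute one experiment (hypothetical)
    print(config)
```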

20
Tips for Factorial Experiments
  • For numerical variables, two value ranges are not
    enough
  • They don't give a good sense of the function
    relating the variables
  • Measure, measure, measure
  • Piggybacking measurements on planned experiments
    is cheaper than re-running experiments
  • Simplify comparisons
  • Use the same number of data points (trials) for
    all configurations

21
Experiment Validity
  • Two types of validity: internal and external
  • Internal validity
  • The experiment shows the claimed relationship
    (the independent variable causes the dependent
    one)
  • External validity
  • The degree to which results generalize to other
    conditions
  • Threats: uncontrolled conditions that undermine
    validity

22
Internal validity threats Examples
  • Order effects
  • Practice effects in human or animal test subjects
  • e.g., user performance improves over successive
    user-interface tasks
  • Solution: randomize the order of presentation to
    subjects
  • A bug or side effect in the testing system leaves
    the system unclean for the next trial: need to
    clean the system between experiments
  • If treatment/control are given in two different
    orders
  • e.g., runs with/without the new algorithm, for
    the same users
  • The order may be good for treatment, bad for
    control (or vice versa)
  • Solution: counterbalancing (all possible orders;
    sketched below)
  • Demand effects
  • The experimenter influences the subjects
  • e.g., guiding subjects
  • Confounding effects: relations between variables
    aren't clear
  • See "fly with only one wing cannot hear"
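A minimal counterbalancing sketch: every possible presentation order is generated, and subjects are assigned to orders round-robin. The subject and condition names are invented.

```python
# Hedged sketch: counterbalancing spreads subjects across all
# possible presentation orders so order effects average out.
from itertools import permutations

conditions = ["with_algorithm", "without_algorithm"]
orders = list(permutations(conditions))   # every possible order

subjects = ["s1", "s2", "s3", "s4", "s5", "s6"]
for i, subject in enumerate(subjects):
    order = orders[i % len(orders)]       # alternate the orders
    print(subject, "->", " then ".join(order))
```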

23
External threats to validity
  • Sampling bias: non-representative samples
  • e.g., non-representative external factors
  • Floor and ceiling effects
  • The problems tested are too hard or too easy
  • Regression effects
  • Results have nowhere to go but up (or down)
  • Solution approach: run pilot experiments

24
Sampling Bias
  • The setting favors measuring specific values over
    others
  • For instance
  • Random selection of mice from a cage for an
    experiment
  • favors specific mice: slow, doesn't bite (not
    aggressive), ...
  • Including only results that were found by some
    deadline
  • Solution: detect and remove
  • e.g., by visualization, looking for non-normal
    distributions (sketched below)
  • e.g., a surprising distribution of the dependent
    data for different values of the independent
    variable
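A minimal sketch of one such check, assuming SciPy; a normality test is one possible formal complement to visualization. The scores are invented and deliberately bimodal.

```python
# Hedged sketch: flagging a suspicious (non-normal) distribution of
# the dependent variable; the sample values are invented.
from scipy import stats

scores = [51, 52, 50, 53, 49, 52, 51, 90, 91, 89]

w_stat, p_value = stats.shapiro(scores)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.4f}")
# A very small p flags non-normality; here the lump near 90 may
# indicate the sampling favored certain values.
```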

25
Baselines Floor and Ceiling Effects
  • How do we know A is good? Bad?
  • Maybe the problems are too simple? Too hard?
  • For example
  • A new machine learning algorithm has 95% accuracy
  • Is this good?
  • Controlling for floor/ceiling effects
  • Establish baselines
  • Show whether a "silly" approach achieves a close
    result (sketched below)
  • Comparison to a strawman (easy) or an ironman
    (hard)
  • May be misleading if not chosen appropriately
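A minimal sketch of the simplest baseline, the majority-class guesser; the labels are invented for illustration.

```python
# Hedged sketch: if always predicting the most common label already
# scores 93%, a learner's 95% accuracy is far less impressive.
from collections import Counter

test_labels = ["spam"] * 93 + ["ham"] * 7

label, count = Counter(test_labels).most_common(1)[0]
print(f"majority baseline ({label}): {count / len(test_labels):.0%}")
```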

26
Regression Effects
  • General phenomenon: regression towards the mean
  • Repeated measurements converge towards mean
    values
  • Example threat: run a program on 100 different
    inputs
  • Problems 6, 14, 15 get a very low score
  • We now fix the problem that affected only these
    inputs, and want to re-test
  • If chance has anything to do with scoring, then
    we must re-run all
  • Why?
  • Scores on 6, 14, 15 have nowhere to go but up
  • So re-running only these problems will show
    improvement by chance (simulated below)
  • Solution
  • Re-run the complete tests, or sample conditions
    uniformly
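A minimal simulation of the effect: all 100 inputs have identical true quality plus noise, yet re-running only the worst three "improves" them.

```python
# Hedged sketch: regression toward the mean. Every input has the
# same true quality plus noise, so re-running only the worst
# scorers looks like improvement with no real fix at all.
import random

random.seed(1)

def score(input_id):
    return 70 + random.gauss(0, 10)   # identical quality + noise

first = {i: score(i) for i in range(100)}
worst = sorted(first, key=first.get)[:3]

for i in worst:
    print(f"input {i}: first run {first[i]:.1f}, "
          f"re-run {score(i):.1f}")
# The re-runs almost always look better, purely by chance.
```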

27
Summary
  • Defensive thinking
  • "If I were trying to disprove the claim, what
    would I do?"
  • Then think of ways to counter any possible attack
    on the claim
  • Strong Inference, Popper's falsification ideas
  • Science moves by disproving theories
    (empirically)
  • Experiment design
  • Ideal independent variables: easy to manipulate
  • Ideal dependent variables: measurable, sensitive,
    and meaningful
  • Carefully think through the threats
  • Next week: hypothesis testing