Ockham's Razor: What It Is, What It Isn't, How It Works, and How It Doesn't

- Kevin T. Kelly
- Department of Philosophy
- Carnegie Mellon University
- www.cmu.edu

Further Reading

- "Efficient Convergence Implies Ockham's Razor," Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.
- (with C. Glymour) "Why Probability Does Not Capture the Logic of Scientific Justification," in C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
- "Justification as Truth-finding Efficiency: How Ockham's Razor Works," Minds and Machines 14, 2004, pp. 485-505.
- "Learning, Simplicity, Truth, and Misinformation," The Philosophy of Information, under review.
- "Ockham's Razor, Efficiency, and the Infinite Game of Science," proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.

Which Theory to Choose?

Compatible with data

???

Use Ockham's Razor

Complex

T1

T2

T3

Simple

Dilemma

- If you know the truth is simple,
- then you don't need Ockham.

Complex

T1

T2

T3

Simple

Dilemma

- If you don't know the truth is simple,
- then how could a fixed simplicity bias help you if the truth is complex?

Complex

T1

T2

T3

Simple

T4

T5

Puzzle

- A fixed bias is like a broken thermometer.
- How could it possibly help you find unknown truth?

Cold!

I. Ockham Apologists

Wishful Thinking

- Simple theories are nice if true:
- Testability
- Unity
- Best explanation
- Aesthetic appeal
- Compress data
- So is believing that you are the emperor.

Overfitting

- Maximum likelihood estimates based on overly complex theories can have greater predictive error (AIC, cross-validation, etc.).
- Same is true even if you know the true model is complex.
- Doesn't converge to true model.
- Depends on random data.

Thanks, but a simpler model still has lower predictive error.
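The overfitting point can be checked with a small calculation. The sketch below is an illustration under stated assumptions, not the deck's own example: it assumes a single normal mean with known standard deviation, a "simple" model that fixes the mean at 0, and a "complex" model that estimates it by the sample mean. The function names are hypothetical.

```python
def risk_simple(mu: float) -> float:
    """Expected squared error of always estimating the mean as 0 (assumed simple model)."""
    return mu ** 2


def risk_mle(n: int, sigma: float = 1.0) -> float:
    """Expected squared error of the sample mean of n draws (assumed complex model)."""
    return sigma ** 2 / n


def simpler_model_predicts_better(mu: float, n: int, sigma: float = 1.0) -> bool:
    """True when the simple model has lower expected error although the truth is complex."""
    return risk_simple(mu) < risk_mle(n, sigma)


# The true mean is 0.05, so the complex model is the true one.
print(simpler_model_predicts_better(mu=0.05, n=100))    # small sample
print(simpler_model_predicts_better(mu=0.05, n=1000))   # larger sample
```

With a true mean of 0.05 the simple model wins at n = 100 and loses at n = 1000, so the advantage is a matter of sample size and says nothing about which model is true.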

The truth is complex. -God-

Ignorance = Knowledge

- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)

unity

Ignorance = Knowledge

- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)

1/3

1/3

1/3

Ignorance = Knowledge

- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)

2/6

2/6

1/6

1/6
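The fractions 2/6, 2/6, 1/6, 1/6 match a Carnap-style prior that is flat over structures (how many individuals have a property) and then flat within each structure. The sketch below reproduces them under the assumption of two individuals and one predicate; that reading of the figure, and the function name, are assumptions rather than something the slides state.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def structure_first_prior(n_individuals: int) -> dict:
    """Give each structure equal weight, then split it evenly over its worlds."""
    worlds = list(product([0, 1], repeat=n_individuals))
    # A structure is identified by how many individuals have the property.
    sizes = Counter(sum(world) for world in worlds)
    share = Fraction(1, len(sizes))
    return {world: share / sizes[sum(world)] for world in worlds}


prior = structure_first_prior(2)
for world, weight in prior.items():
    print(world, weight)
```

The two uniform ("tidy") worlds receive 1/3 = 2/6 each, and the two mixed ("messy") worlds receive 1/6 each.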

Depends on Notation

- But mess depends on coding,
- which Goodman noticed, too.
- The picture is inverted if
- we translate green to grue.
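A toy version of the inversion, under assumptions of my own: a hypothesis is a pair (colour before time t, colour after time t), and a description costs 1 token if the two predicates agree and 3 tokens (two predicates plus a switch time) if they differ. The cost measure and all names are hypothetical.

```python
ALL_GREEN = ("green", "green")        # colour before and after the critical time t
GREEN_THEN_BLUE = ("green", "blue")   # i.e. "all emeralds are grue"


def to_grue_language(hypothesis):
    """Restate a green/blue hypothesis using the predicates grue and bleen."""
    before, after = hypothesis
    before_map = {"green": "grue", "blue": "bleen"}   # before t, green things are grue
    after_map = {"green": "bleen", "blue": "grue"}    # after t, green things are bleen
    return (before_map[before], after_map[after])


def description_cost(hypothesis):
    """Assumed cost: 1 token for a constant predicate, 3 for a switch at t."""
    before, after = hypothesis
    return 1 if before == after else 3


for h in (ALL_GREEN, GREEN_THEN_BLUE):
    print(h, description_cost(h), to_grue_language(h), description_cost(to_grue_language(h)))
```

In the green/blue language the all-green hypothesis is cheaper, and in the grue/bleen language the ranking is reversed.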

2/6

2/6

Notation Indicates truth?

1/6

1/6

Same for Algorithmic Complexity

- Goodman's problem works against every fixed simplicity ranking (independent of the processes by which data are generated and coded prior to learning).
- Extra problem: any pair-wise ranking of theories can be reversed by choosing an alternative computer language.
- So how could simplicity help us find the true theory?

Notation Indicates truth?

Just Beg the Question

- Assign high prior probability to simple theories.
- Why should you?
- Preference for complexity has the same explanation.

You presume simplicity. Therefore you should presume simplicity!

Miracle Argument

- Simple data would be a miracle if a complex theory were true (Bayes, BIC, Putnam).

Begs the Question

- Fairness between theories ⇒ bias against complex worlds.

S

C

Two Can Play That Game

- Fairness between worlds ⇒ bias against simple theory.

S

C
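The two notions of fairness can be compared with exact fractions. This is a minimal sketch assuming a finite stand-in: the simple theory S is true in exactly one world and the complex theory C is true in k worlds. The function names are hypothetical.

```python
from fractions import Fraction


def fair_between_theories(k: int):
    """Prior 1/2 per theory; returns (weight of the simple world, weight of each complex world)."""
    return Fraction(1, 2), Fraction(1, 2) / k


def fair_between_worlds(k: int):
    """Equal weight per world; returns (prior of theory S, prior of theory C)."""
    per_world = Fraction(1, k + 1)
    return per_world, per_world * k


print(fair_between_theories(9))  # the simple world outweighs each complex world
print(fair_between_worlds(9))    # the complex theory outweighs the simple theory
```

With k = 9, fairness between theories gives the simple world nine times the weight of any complex world, while fairness between worlds gives the complex theory nine times the prior of the simple one.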

Convergence

- At least a simplicity bias doesn't prevent convergence to the truth (MDL, BIC, Bayes, SGS, etc.).
- Neither do other biases.
- May as well recommend flat tires since they can be fixed.

Complex

Simple

Does Ockham Have No Frock?

Ash Heap of History

Philosopher's stone, Perpetual motion, Free lunch, Ockham's Razor???

II. How Ockham Helps You Find the Truth

What is Guidance?

- Indication or tracking: too strong. A fixed bias can't indicate anything.
- Convergence: too weak. True of other biases.
- Straightest convergence: just right?

A True Story

Niagara Falls

Clarion

Pittsburgh

Ask directions!

A True Story

Wheres

What Does She Say?

Turn around. The freeway ramp is on the left.

You Have Better Ideas

Phooey! The Sun was on the right!

Stay the Course!

Ahem

Don't Flip-flop!

Then Again

@@!

One Good Flip Can Save a Lot of Flop

The U-Turn

Told ya!

Your Route

Needless U-turn

The Best Route

Told ya!

The Best Route Anywhere from There

Told ya!

The Freeway to the Truth

Told ya!

- Fixed advice for all destinations
- Disregarding it entails an extra course reversal

- even if the advice points away from the goal!

Counting Marbles

May come at any time

Ockham's Razor

- If you answer, answer with the current count.

3

?

Analogy

- Marbles = detectable effects.
- Late appearance = difficulty of detection.
- Count = model (e.g., causal graph).
- Appearance times = free parameters.

Analogy

- U-turn = model revision (with content loss).
- Highway = revision-efficient truth-finding method.

T

T?

The U-turn Argument

- Suppose you converge to the truth but violate Ockham's razor along the way.
- Where is that extra marble, anyway?
- It's not coming, is it?
- If you never say 2, you'll never converge to the truth.
- That's it. You should have listened to Ockham.
- Oops! Well, no method is infallible!
- If you never say 3, you'll never converge to the truth.
- Embarrassing to be back at that old theory, eh?
- And so forth.

[Figure: your successive answers 3, 2, 2, 2, 3, 4, 5, 6, 7]

The Score

Subproblem

- You: 3, 2, 2, 2, 3, 4, 5, 6, 7
- Ockham: 2, 2, 2, 3, 4, 5, 6, 7

Ockham is Necessary

- If you converge to the truth,
- and
- you violate Ockham's razor
- then
- some convergent method beats your worst-case revision bound in each answer in the subproblem entered at the time of the violation.
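The marble-counting version of this argument can be replayed in a few lines. The sketch below is an illustration under assumptions: the violator is a hypothetical method that guesses one unseen marble for a few stages, and the demon is represented by a fixed stream chosen against that particular violator rather than by a general adaptive strategy.

```python
from typing import Callable, List

History = List[int]  # 1 = a marble appeared at this stage, 0 = nothing appeared


def ockham(history: History) -> int:
    """Ockham's razor for marble counting: answer with the current count."""
    return sum(history)


def make_violator(stage: int, patience: int) -> Callable[[History], int]:
    """Hypothetical violator: from `stage` on, posits one unseen marble for `patience` stages."""
    def method(history: History) -> int:
        n = len(history)
        extra = 1 if stage <= n < stage + patience else 0
        return sum(history) + extra
    return method


def answers(method: Callable[[History], int], stream: History) -> List[int]:
    """Answers produced after each stage of the stream."""
    return [method(stream[:n]) for n in range(1, len(stream) + 1)]


def revisions(method: Callable[[History], int], stream: History) -> int:
    """Number of times the method changes its answer along the stream."""
    out = answers(method, stream)
    return sum(1 for a, b in zip(out, out[1:]) if a != b)


# Two marbles, then a pause long enough for this violator to give up, then two more.
demon_stream = [1, 1, 0, 0, 0, 0, 0, 1, 1]
violator = make_violator(stage=2, patience=3)

print(answers(violator, demon_stream), revisions(violator, demon_stream))
print(answers(ockham, demon_stream), revisions(ockham, demon_stream))
```

On this stream the violator passes through 3, 2, 3, 4 after its leap while the Ockham method passes through 2, 3, 4, so the violator pays one extra revision.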

Ockham is Sufficient

- If you converge to the truth,
- and
- you never violate Ockham's razor
- then
- you achieve the worst-case revision bound of each convergent solution in each answer in each subproblem.

Efficiency

- Efficiency = achievement of the best worst-case revision bound in each answer in each subproblem.

Ockham Efficiency Theorem

- Among the convergent methods, Ockham = Efficient!

Efficient

Inefficient

Mixed Strategies

- Mixed strategy = chance of output depends only on actual experience.
- Convergence in probability = chance of producing true answer approaches 1 in the limit.
- Efficiency = achievement of best worst-case expected revision bound in each answer in each subproblem.

Ockham Efficiency Theorem

- Among the mixed methods that converge in probability, Ockham = Efficient!

Efficient

Inefficient

Dominance and Support

- Every convergent method is weakly dominated in revisions by a clone who says ? until stage n.
- Convergence: must leap eventually.
- Efficiency: only leap to simplest.
- Dominance: could always wait longer.

Can't wait forever!

III. Ockham on Steroids

Ockham Wish List

- General definition of Ockham's razor.
- Compare revisions even when not bounded within answers.
- Prove theorem for arbitrary empirical problems.

Empirical Problems

- Problem = partition of a topological space.
- Potential answers = partition cells.
- Evidence = open (verifiable) propositions.

Example: Symmetry

Example: Parameter Freeing

- Euclidean topology.
- Say which parameters are zero.
- Evidence = open neighborhood.
- Curve fitting.

[Figure: the (a1, a2) plane divided into the answers a1 = 0, a2 = 0; a1 > 0, a2 = 0; a1 = 0, a2 > 0; a1 > 0, a2 > 0.]

The Players

- Scientist: produces an answer in response to current evidence.
- Demon: chooses evidence in response to scientist's choices.

Winning

- Scientist wins
- by default if demon doesn't present an infinite nested sequence of basic open sets whose intersection is a singleton;
- else by merit if scientist eventually always produces the true answer for world selected by demon's choices.

Comparing Revisions

- One answer sequence maps into another iff there is an order- and answer-preserving map from the first to the second (? is wild).
- Then the revisions of the first are as good as those of the second.

Comparing Revisions

- The revisions of the first are strictly better if, in addition, the latter doesn't map back into the former.
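For finite answer sequences the comparison can be sketched as a subsequence test. This is one reading of the definition, and an assumption on my part: the map is taken to be strictly order-preserving, and a "?" in the first sequence may be sent to any entry of the second. The function names are hypothetical.

```python
def maps_into(first, second, wild="?"):
    """True if `first` embeds in `second`, preserving order and answers ('?' matches anything)."""
    j = 0
    for answer in first:
        # Greedily take the earliest remaining position in `second` that can receive `answer`.
        while j < len(second) and not (answer == wild or second[j] == answer):
            j += 1
        if j == len(second):
            return False
        j += 1
    return True


def strictly_better(first, second):
    """The first maps into the second, but the second does not map back."""
    return maps_into(first, second) and not maps_into(second, first)


print(maps_into([2, 3, 4], [3, 2, 3, 4]))
print(strictly_better([2, 3, 4], [3, 2, 3, 4]))
print(maps_into(["?", "?", 3], [1, 2, 3]))
```

Under this reading the Ockham sequence 2, 3, 4 is strictly better than the violator's 3, 2, 3, 4, and a method that says ? before leaping maps into the sequence of one that answered earlier.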

Comparing Methods

- F is as good as G iff each output sequence of F is as good as some output sequence of G.

F

as good as

G

Comparing Methods

- F is better than G iff
- F is as good as G and
- G is not as good as F

F

not as good as

G

Comparing Methods

- F is strongly better than G iff each output sequence of F is strictly better than an output sequence of G, but no output sequence of G is as good as any of F.

Terminology

- Efficient solution = as good as any solution in any subproblem.

What Simplicity Isn't

Only by accident!!

- Syntactic length.
- Data-compression (MDL).
- Computational ease.
- Social entrenchment (Goodman).
- Number of free parameters (BIC, AIC).
- Euclidean dimensionality

What Simplicity Is

- Simpler theories are compatible with deeper problems of induction.

Worst demon

Smaller demon

Problem of Induction

- No true information entails the true answer.
- Happens in answer boundaries.

Demonic Paths

A demonic path from w is a sequence of alternating answers that a demon can force an arbitrary convergent method through starting from w.

01234

Simplicity Defined

The A-sequences are the demonic sequences beginning with answer A. A is as simple as B iff each B-sequence is as good as some A-sequence.

(3) < (2, 3); (3, 4) < (2, 3, 4); (3, 4, 5) < (2, 3, 4, 5); . . .

So 2 is simpler than 3!
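The definition can be checked mechanically for the counting problem. The sketch below makes several assumptions that go beyond the slide: sequences beginning with count a are taken to be the runs (a), (a, a+1), (a, a+1, a+2), and so on, they are truncated at a finite length, and "as good as" is read as an order- and answer-preserving embedding. All names are hypothetical.

```python
def maps_into(first, second):
    """Order- and answer-preserving embedding of `first` into `second`."""
    remaining = iter(second)
    # `x in remaining` consumes the iterator up to and including the first match.
    return all(answer in remaining for answer in first)


def count_sequences(start, max_len):
    """Assumed sequences for counting: start, start+1, ... up to max_len entries."""
    return [list(range(start, start + k)) for k in range(1, max_len + 1)]


def as_simple_as(a, b, max_len=5):
    """A is as simple as B iff each B-sequence maps into some A-sequence (bounded check)."""
    a_sequences = count_sequences(a, max_len + abs(a - b))
    return all(
        any(maps_into(b_seq, a_seq) for a_seq in a_sequences)
        for b_seq in count_sequences(b, max_len)
    )


print(as_simple_as(2, 3))  # each 3-sequence embeds in a 2-sequence
print(as_simple_as(3, 2))  # the sequence (2) embeds in no 3-sequence
```

Within the bound, 2 is as simple as 3 but not conversely, which is the sense in which 2 is simpler than 3.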

Ockham Answer

- An answer as simple as any other answer, e.g., the number of observed particles.

(n) < (2, ..., n); (n, n+1) < (2, ..., n, n+1); (n, n+1, n+2) < (2, ..., n, n+1, n+2); . . .

So 2 is Ockham!

Ockham Lemma

A is Ockham iff for all demonic p, (A, p) maps into some demonic sequence.

I can force you through 2 but not through 3,2.

So 3 isn't Ockham

3

Ockham Answer

E.g., only the simplest curve compatible with data is Ockham.

a1

Demonic sequence

Non-demonic sequences

a2

0

General Efficiency Theorem

- If the topology is metrizable and separable and the question is countable, then Ockham = Efficient.
- Proof uses Martin's Borel Determinacy theorem.

Stacked Problems

- There is an Ockham answer at every stage.

1

Non-Ockham ⇒ Strongly Worse

- If the problem is a stacked countable partition over a restricted Polish space,
- each Ockham solution is strongly better than each non-Ockham solution in the subproblem entered at the time of the violation.

Simplicity ≠ Low Dimension

- Suppose God says the true parameter value is rational.
- Topological dimension and integration theory dissolve.
- Does Ockham?
- The proposed account survives in the preserved limit point structure.

IV. Ockham and Symmetry

Respect for Symmetry

- If several simplest alternatives are available, don't break the symmetry.
- Count the marbles of each color.
- You hear the first marble but don't see it.
- Why red rather than green?

Respect for Symmetry

- Before the noise, (0, 0) is Ockham.
- After the noise, no answer is Ockham

Demonic

Non-demonic

(0, 0)

(1, 0)

(1, 0) (0, 1)

(0, 1)

(0, 1) (1, 0)

Right!

Goodman's Riddle

- Count oneicles: a oneicle is a particle at any stage but one, when it is a non-particle.
- Oneicle translation is an auto-homeomorphism that does not preserve the problem.
- Unique Ockham answer is current oneicle count.
- Contradicts unique Ockham answer in particle counting.

Supersymmetry

- Say when each particle appears.
- Refines counting problem.
- Every auto-homeomorphism preserves problem.
- No answer is Ockham.
- No solution is Ockham.
- No method is efficient.

Dual Supersymmetry

- Say only whether particle count is even or odd.
- Coarsens counting problem.
- Particle/oneicle auto-homeomorphism preserves problem.
- Every answer is Ockham.
- Every solution is Ockham.
- Every solution is efficient.

Broken Symmetry

- Count the even or just report odd.
- Coarsens counting problem.
- Refines the even/odd problem.
- Unique Ockham answer at each stage.
- Exactly Ockham solutions are efficient.

Simplicity Under Refinement

- Supersymmetry (no answer is Ockham): time of particle appearance.
- Particle counting; oneicle counting; twoicle counting.
- Broken symmetry (unique Ockham answer): particle counting or odd particles; oneicle counting or odd oneicles; twoicle counting or odd twoicles.
- Dual supersymmetry (both answers are Ockham): even/odd.

Proposed Theory is Right

- Objective efficiency is grounded in problems.
- Symmetries in the preceding problems would wash out stronger simplicity distinctions.
- Hence, such distinctions would amount to mere conventions (like coordinate axes) that couldn't have anything to do with objective efficiency.

Furthermore

- If Ockham's razor is forced to choose in the supersymmetrical problems, then either
- following Ockham's razor increases revisions in some counting problems,
- or
- Ockham's razor leads to contradictions as a problem is coarsened or refined.

V. Conclusion

What Ockham's Razor Is

- Only output Ockham answers.
- Ockham answer = a topological invariant of the empirical problem addressed.

What It Isn't

- preference for
- brevity,
- computational ease,
- entrenchment,
- past success,
- Kolmogorov complexity,
- dimensionality, etc.

How it Works

- Ockham's razor is necessary for minimizing revisions prior to convergence to the truth.

How It Doesn't

- No possible method could
- Point at the truth
- Indicate the truth
- Bound the probability of error
- Bound the number of future revisions.

Spooky Ockham

- Science without support or safety nets.

VI. Stochastic Ockham

Mixed Strategies

- Mixed strategy = chance of output depends only on actual experience:

$P_e(M = H \text{ at } n) = P_{e|n}(M = H \text{ at } n)$.

Stochastic Case

- Ockham: at each stage, you produce a non-Ockham answer with probability 0.
- Efficiency: achievement of the best worst-case expected revision bound in each answer in each subproblem over all methods that converge to the truth in probability.

Stochastic Efficiency Theorem

- Among the stochastic methods that converge in probability, Ockham = Efficient!

Efficient

Inefficient

Stochastic Methods

- Your chance of producing an answer is a function

of observations made so far.

2

p

Urn selected in light of observations.

Stochastic U-turn Argument

- Suppose you converge in probability to the truth but produce a non-Ockham answer (here, 3) with probability r > 0.
- Choose small ε > 0. Consider answer 4.
- By convergence in probability to the truth, the answers 2, 3, 4 are each eventually produced with probability p > 1 - ε/3.
- Etc.
- Since ε can be chosen arbitrarily small,
- sup prob of 3 revisions ≥ r.
- sup prob of 2 revisions = 1.

Stochastic U-turn Argument

- So sup expected revisions is ≥ 2 + 3r.
- But for Ockham it is 2.

Subproblem

VII. Statistical Inference

(Beta Version)

The Statistical Puzzle of Simplicity

- Assume Normal distribution, σ = 1, μ ≥ 0.
- Question: μ = 0 or μ > 0?
- Intuition: μ = 0 is simpler than μ > 0.

[Figure: distribution of the mean, centered at μ = 0.]

Analogy

- Marbles = potentially small effects.
- Time = sample size.
- Simplicity = fewer free parameters tied to potential effects.
- Counting = freeing parameters in a model.

U-turn in Probability

- Convergence in probability = chance of producing true model goes to unity.
- Retraction in probability = chance of producing a model drops from above r > .5 to below 1 - r.

[Figure: chance (from 0 to 1, with levels r and 1 - r marked) against sample size, for the chance of producing the true model and the chance of producing an alternative model.]

Suppose You (Probably) Choose a Model More Complex than the Truth

[Figure: true mean μ = 0; the sample mean falls in the zone for choosing μ > 0 with chance > r. Revision counter: 0.]

Eventually You Retract to the Truth (In Probability)

[Figure: true mean μ = 0; the sample mean falls in the zone for choosing μ = 0 with chance > r. Revision counter: 1.]

So You (Probably) Output an Overly Simple Model Nearby

[Figure: true mean μ > 0; the sample mean falls in the zone for choosing μ = 0 with chance > r. Revision counter: 1.]

Eventually You Retract to the Truth (In Probability)

[Figure: true mean μ > 0; the sample mean falls in the zone for choosing μ > 0 with chance > r. Revision counter: 2.]

But Standard (Ockham) Testing Practice Requires Just One Retraction!

[Figure: true mean μ = 0; the sample mean falls in the zone for choosing μ = 0 with chance > r. Revision counter: 0.]

In The Simplest World, No Retractions

[Figure: true mean μ = 0; the sample mean falls in the zone for choosing μ = 0 with chance > r. Revision counter: 0.]

In Remaining Worlds, at Most One Retraction

[Figure: true mean μ > 0; zone for choosing μ > 0. Revision counter goes from 0 to 1.]

So Ockham Beats All Violators

- Ockham: at most one revision.
- Violator: at least two revisions in worst case.

Summary

- Standard practice is to test the point hypothesis rather than the composite alternative.
- This amounts to favoring the simple hypothesis a priori.
- It also minimizes revisions in probability!
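The retraction count for standard testing can be computed directly from the normal distribution. The sketch below is a simplification under stated assumptions: σ = 1, a one-sided z-test with a fixed critical value of 1.645 (rather than a significance level that shrinks with sample size), a threshold r = 0.9, and a hypothetical grid of sample sizes.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chance_of_simple(mu: float, n: int, z: float = 1.645) -> float:
    """Chance of answering 'mu = 0': the sample mean falls below z / sqrt(n)."""
    return normal_cdf(z - mu * math.sqrt(n))


def retractions_in_probability(chances, r: float) -> int:
    """Count drops of one answer's chance from above r to below 1 - r."""
    count, held = 0, False
    for p in chances:
        if p > r:
            held = True
        elif p < 1 - r and held:
            count += 1
            held = False
    return count


def total_retractions(mu: float, sample_sizes, r: float = 0.9) -> int:
    """Retractions of the simple answer plus retractions of the complex answer."""
    simple = [chance_of_simple(mu, n) for n in sample_sizes]
    complex_ = [1.0 - p for p in simple]
    return retractions_in_probability(simple, r) + retractions_in_probability(complex_, r)


SAMPLE_SIZES = [1, 10, 100, 1_000, 10_000, 100_000]
print(total_retractions(0.0, SAMPLE_SIZES))   # simplest world
print(total_retractions(0.05, SAMPLE_SIZES))  # a nearby complex world
```

In the simplest world the test never retracts; in the world with μ = 0.05 it retracts the simple answer once, as the sample grows large enough to detect the effect.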

Two Dimensional Example

- Assume independent bivariate normal distribution of unit variance.
- Question: how many components of the joint mean are zero?
- Intuition: more nonzeros = more complex.
- Puzzle: how does it help to favor simplicity in less-than-simplest worlds?

A Real Model Selection Method

- Bayes Information Criterion (BIC):
- $\mathrm{BIC}(M, \text{sample}) = -\log(\text{max prob that } M \text{ can assign to sample}) + \tfrac{1}{2}\,\log(\text{sample size}) \times \text{model complexity}$.
- BIC method: choose M with least BIC score.
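A minimal sketch of the BIC method for the two-dimensional example, under assumptions: unit variance, models identified by which mean components are free, and an idealized sample whose mean equals the true mean exactly (no sampling noise). Terms of the log-likelihood that are the same for every model are dropped, since they do not affect the choice. The function names are hypothetical.

```python
import math
from itertools import combinations


def bic_score(sample_mean, n, free):
    """BIC up to a model-independent constant: misfit of the zeroed components plus penalty."""
    misfit = 0.5 * n * sum(m * m for i, m in enumerate(sample_mean) if i not in free)
    penalty = 0.5 * math.log(n) * len(free)
    return misfit + penalty


def bic_choice(sample_mean, n):
    """Return the tuple of free component indices with the least BIC score."""
    d = len(sample_mean)
    models = [c for k in range(d + 1) for c in combinations(range(d), k)]
    return min(models, key=lambda free: bic_score(sample_mean, n, free))


for n in (2, 100, 30_000, 4_000_000):
    print(n, bic_choice((0.05, 0.005), n))
```

With the mean (.05, .005), the choice stays with the simplest model at n = 2 and n = 100, frees the first component at n = 30,000, and frees both at n = 4,000,000: two revisions along the way.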

Official BIC Property

- In the limit, minimizing BIC finds a model with maximal conditional probability when the prior probability is flat over models and fairly flat over parameters within a model.
- But it is also revision-efficient.

AIC in Simplest World

- μ = (0, 0).
- n = 2: retractions = 0.
- n = 100: retractions = 0.
- n = 4,000,000: retractions = 0.

BIC in Simplest World

- μ = (0, 0).
- n = 2: retractions = 0.
- n = 100: retractions = 0.
- n = 4,000,000: retractions = 0.
- n = 20,000,000: retractions = 0.

Performance in Complex World

- μ = (.05, .005).
- n = 2: retractions = 0.
- n = 100: retractions = 0.
- n = 30,000: retractions = 1.
- n = 4,000,000 (!): retractions = 2.

Question

- Does the statistical retraction minimization story extend to violations in less-than-simplest worlds?
- Recall that the deterministic argument for higher retractions required the concept of minimizing retractions in each subproblem.
- A subproblem is a proposition verified at a given time in a given world.
- Some analogue in probability is required.

Subproblem

- H is an α-subproblem in w at n iff
- there is a likelihood ratio test of w at significance < α such that this test has power < 1 - α at each world in H.

[Figure: worlds w and H at sample size n, with the test's accept and reject zones marked; labels > 1 - α and > α.]

Significance Schedules

- A significance schedule α(·) is a monotone decreasing sequence of significance levels converging to zero that drop so slowly that power can be increased monotonically with sample size.

[Figure: significance levels α(n) and α(n+1) at sample sizes n and n+1.]

Ockham Violation ⇒ Inefficient

[Figure: the (μX, μY) plane, showing the subproblem entered at sample size n, the time of the violation.]

- Ockham violation: probably say blue hypothesis at white world (p > r).
- Then: probably say blue, probably say white, probably say blue.

Oops! Ockham ⇒ Inefficient

[Figure: the (μX, μY) plane with the subproblem marked. Two retractions.]

Local Retraction Efficiency

- Ockham does as well as best subproblem performance in some neighborhood of w.

[Figure: the (μX, μY) plane with the subproblem marked; regions labeled "At most one retraction" and "Two retractions".]

Ockham Violation ⇒ Inefficient

- Note: no neighborhood around w avoids extra retractions.

[Figure: the (μX, μY) plane, showing w and the subproblem at the time of the violation.]

Gonzo Ockham ⇒ Inefficient

- Gonzo = probably saying simplest answer in entire subproblem entered in simplest world.

[Figure: the (μX, μY) plane with the subproblem marked.]

Balance

- Be Ockham (avoid complexity).
- Don't be Gonzo Ockham (avoid bad fit).
- Truth-directed: sole aim is to find true model with minimal revisions!
- No circles: totally worst-case; no prior bias toward simple worlds.

THE END