Why Machine Intelligence is Very Hard

1
Why Machine Intelligence is Very Hard

Theo Pavlidis
Distinguished Professor Emeritus
Dept. of Computer Science
t.pavlidis_at_ieee.org
http://theopavlidis.com

2
Limitations of Computers

Some tasks (e.g. number factorization) are very
hard for computers (unless it is proven that NP =
P), but they are also very hard for humans.
Some tasks are quite easy for humans but
very hard for computers.
Examples: language translation, image analysis or
understanding, speech recognition, game playing,
etc. (Often grouped under Artificial Intelligence,
AI.)
Why are they hard?

3
The State of Machine Vision

There have been some successes, notably in
industrial inspection and reading of printed text
but a lot of problems remain open.
Reading distorted text (CAPTCHA) is so hard that
it is used as a security device.
Content Based Image Retrieval (CBIR) is
hopelessly behind content based text retrieval.
Face recognition programs are known mainly for
their failure to perform outside the laboratory.

4
CAPTCHA

Completely Automated Public Turing test to
tell Computers and Humans Apart

5
Content-based Image Retrieval (CBIR)

Given an image, find those that are similar to it
in a database of images. (If the images are
labeled, the problem is reduced to text search.)
Many systems have been advertised but they do
well only on rather trivial queries.
This should be contrasted with the success of
text retrieval, not only Google but earlier
programs such as the Unix grep.

6
Example - 1
7
Example - 2
8
Reasons for the Poor Results in Machine Vision
and CBIR

Images are represented by statistics of pixel
values (e.g. color histogram, texture histogram,
etc.).
Such statistics are unrelated to human
perception.
Papers describing CBIR methods use trivial
queries (e.g. show me all pictures with a lot of
green).

9
Perceptual versus Computational Similarity

Two pictures may differ a lot in their pixel
values but appear similar to a person. (They
have the same meaning.)
Two pictures may differ in very few pixels but
they have different meaning. (Face portraits of
two different people in front of the same
background.)
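
The gap between the two kinds of similarity can be illustrated with a small sketch. The toy 4x4 binary "images" and the two distance functions below are assumptions made for illustration; they are not taken from any CBIR system discussed in the talk.

```python
from collections import Counter

def histogram(image):
    """Count how often each pixel value occurs, ignoring position."""
    return Counter(value for row in image for value in row)

def histogram_distance(a, b):
    """Sum of absolute differences between the two value histograms."""
    ha, hb = histogram(a), histogram(b)
    return sum(abs(ha[v] - hb[v]) for v in set(ha) | set(hb))

def pixel_distance(a, b):
    """Number of positions at which the two images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

# Hypothetical toy images: a vertical bar, the same bar shifted by one
# column, and a horizontal bar.
vertical_bar = [[0, 1, 0, 0] for _ in range(4)]
shifted_bar = [[0, 0, 1, 0] for _ in range(4)]
horizontal_bar = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

print(histogram_distance(vertical_bar, horizontal_bar))  # 0
print(pixel_distance(vertical_bar, shifted_bar))         # 8
print(pixel_distance(vertical_bar, horizontal_bar))      # 6
```

In this toy case the histogram cannot tell any of the three images apart, and the pixel-wise count rates the shifted copy of the same bar as further away than a bar with a different orientation.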

10
Perceptual versus Computational Similarity
Perceptually close
Pixel-wise close
11
Text versus Pictures

In text files each byte (or two) is a numerical
code for a character. Therefore strings of bytes
correspond to words that carry semantic meaning.
In pictures each byte (or group thereof)
represents the color at a particular location
(pixel). Pixels are quite far from the components
that have a semantic meaning.
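
A minimal sketch of the contrast, assuming ASCII text and a hypothetical row-major layout with three bytes (red, green, blue) per pixel:

```python
# Text: each byte is the code of a character, so a run of bytes is a word.
word = "dog"
word_bytes = word.encode("ascii")
print(list(word_bytes))            # [100, 111, 103]
print(word_bytes.decode("ascii"))  # dog

# Picture: each group of bytes is only the color at one location.
# Hypothetical 2x2 image, 3 bytes per pixel, stored row by row.
picture_bytes = bytes([255, 0, 0,   0, 255, 0,
                       0, 0, 255,   255, 255, 255])

def pixel_at(data, width, x, y):
    """Return the (red, green, blue) triple stored for pixel (x, y)."""
    offset = 3 * (y * width + x)
    return tuple(data[offset:offset + 3])

print(pixel_at(picture_bytes, 2, 1, 0))  # (0, 255, 0)
```

Decoding the text bytes yields a word directly. Decoding the picture bytes yields only colors, and nothing in them says whether the picture shows a dog.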

12
We do not do that well in text!

It is hard to search for concepts unless we
can map concepts into words.
Example 1: Find all articles critical of the
government policy in dealing with the banking
crisis.
Example 2: Find all articles about a dog named
Lucy. Amongst the Google returns was an article
with the phrase "Lucy and I spent the weekend
alone together. We have a dog named Kyler."
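
The false hit in Example 2 is easy to reproduce with a deliberately naive matcher. The sketch below assumes a search that only checks whether every query word occurs somewhere in the document; it is not a description of how Google or any real engine works.

```python
import re

def words(text):
    """Lower-cased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_match(query, document):
    """True when every query word occurs somewhere in the document."""
    return words(query) <= words(document)

query = "dog named Lucy"
article = ("Lucy and I spent the weekend alone together. "
           "We have a dog named Kyler.")

print(keyword_match(query, article))  # True
```

All three words are present, so the article matches, although the dog in it is named Kyler. The words were found, but not the concept.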

13
Human Intelligence made simple
Input
Concept
Input
Output
14
The Big Difference

The transformation of input to concept is a
complex process (binding), barely understood by
neuroscientists. (In spite of claims to the
contrary by some computer scientists.)
It is hard to develop algorithms for a barely
understood process.
Humans can transform concepts into formal
entities (words in a language) and then code them
in computer readable form.
Computers can deal with such formal input.

15
What Neuroscientists Say

"Perceptions emerge as a result of reverberations
of signals between different levels of the
sensory hierarchy, indeed across different
senses." The author then goes on to criticize the
view that sensory processing involves a one-way
cascade of information (processing).
Source: V.S. Ramachandran and S. Blakeslee,
Phantoms in the Brain, William Morrow and Company
Inc., New York, 1998 (p. 56).

16
What Do You See?
17
Reading Demo - 1
18
Reading Demo - 1
Tentative binding on the letter shapes (bottom
up) is finalized once a word is recognized (top
down). Word shape and meaning over-ride early
cues.
19
Reading Demo - 2

New York State lacks proper facilities for the
mentally III.
The New York Jets won Superbowl III.
Human readers may ignore entirely the shape of
individual letters if they can infer the meaning
through context.

20
The Importance of Context

Human intelligence almost always thrives on
context while computers work on abstract numbers
alone. Independence from context is in fact a
great strength of mathematics.
Source: Arno Penzias, Ideas and Information,
Norton, 1989, p. 49.

21
The Challenges

We need to replicate complex transformations that
the (human/animal) brain has evolved to do over
millions of years.
We have to deal with the fact that the processing
is not unidirectional and is also affected by
factors other than the input (context). (Such
factors cause visual illusions.)

22
A time scale

The human visual system has evolved from animal
visual systems over a period of more than 100
million years.
Speech is barely over 100 thousand years old.
Written text is no more than 10 thousand years
old.

23
A note on brain models

There is a history of considering the latest
technology to be a model of the human brain. For
example, in the 16th century irrigation networks
were considered to be models of the brain.
If someone claims to have a machine modeling the
human brain, ask how the machine could be
modified to model the brain of a dog (since a dog
cannot learn to write poetry, play chess, etc.).

24
A Note on Neural Nets
Is this a model of the brain?
As much as a table is a model of a dog.
25
Simplified model of a small part of the brain
26
A Dubious Approach

Training on large numbers of samples has been
used as a way out of finding a way to understand
what is going on.
But humans (and animals) do not need to be
trained on large numbers of samples.
Rats trained to distinguish between a square and
a rectangle perform quite well when faced with
skinnier rectangles. They have the concept of
rectangle!

27
Distinguish Rectangles from Squares: The
Artificially Intelligent Approach

Take a hundred (or more) pictures of rectangles
and squares, compute several statistics on each
picture and for each picture create a feature
vector F. Then compute a vector W so that FW >
0 for squares and FW < 0 for rectangles.
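
A minimal sketch of this recipe, using the classic perceptron rule to find a weight vector W whose product with the feature vector F is positive for one class and negative for the other. Everything specific here is an assumption made for illustration: the single "elongation" statistic, the eight training shapes given directly as (width, height) pairs rather than pictures, and the convention that squares get the positive sign.

```python
def features(width, height):
    """Feature vector F: a constant bias term plus one shape statistic."""
    elongation = 1.0 - min(width, height) / max(width, height)
    return [1.0, elongation]

def dot(f, w):
    return sum(fi * wi for fi, wi in zip(f, w))

def train_perceptron(samples, max_epochs=100):
    """Search for W with F.W > 0 for label +1 and F.W < 0 for label -1."""
    w = [0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for f, label in samples:
            if label * dot(f, w) <= 0:  # misclassified: nudge W toward label
                w = [wi + label * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return w

# Hypothetical training set: +1 for squares, -1 for rectangles.
squares = [(10, 10), (20, 20), (5, 5), (8, 8)]
rectangles = [(10, 20), (30, 10), (5, 15), (8, 4)]
samples = ([(features(w, h), +1) for w, h in squares]
           + [(features(w, h), -1) for w, h in rectangles])

W = train_perceptron(samples)

def classify(width, height):
    return "square" if dot(features(width, height), W) > 0 else "rectangle"

print(classify(12, 12))  # square
print(classify(10, 40))  # rectangle
```

The classifier works here only because the hand-picked statistic already encodes the aspect ratio. The training step itself contributes nothing more than a threshold on that statistic.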

28
Distinguish Rectangles from Squares: The Natural
Approach

Find the outline of a shape (if one exists in a
picture) and fit a rectangle to it. Then compute
the aspect ratio of the rectangle. If it is near
1 (for some given tolerance), then it is called a
square, otherwise a rectangle.
Criticism: Method lacks generality!!!
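
The direct approach can be sketched in a few lines. As a simplifying assumption, the shape is given as a binary mask and the fitted rectangle is the axis-aligned bounding box of its nonzero pixels; outline finding and fitting a rotated rectangle are left out. The `filled` helper and the tolerance value are hypothetical.

```python
def bounding_box(mask):
    """Smallest axis-aligned box (left, top, right, bottom) around nonzero pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return min(cols), min(rows), max(cols), max(rows)

def classify_shape(mask, tolerance=0.1):
    """Call the shape a square when its aspect ratio is within tolerance of 1."""
    box = bounding_box(mask)
    if box is None:
        return "no shape"
    left, top, right, bottom = box
    aspect_ratio = (right - left + 1) / (bottom - top + 1)
    return "square" if abs(aspect_ratio - 1.0) <= tolerance else "rectangle"

def filled(width, height, canvas=8):
    """Hypothetical test image: a filled width x height block on a blank canvas."""
    return [[1 if x < width and y < height else 0 for x in range(canvas)]
            for y in range(canvas)]

print(classify_shape(filled(4, 4)))  # square
print(classify_shape(filled(6, 2)))  # rectangle
```

No training samples are involved, and the same code handles rectangles of any proportions.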

29
No Generality in Nature

Animal visual systems have many special areas
for visual tasks (about 30 in the human case).
We have already seen examples where high level
(context) recognition quickly takes over from the
low level data processing.

30
Negator of Generality
31
The Learning Machine (neural net) Approach

It has the appeal of getting something for
nothing, so it is kept alive.
We can solve a problem without really
understanding it.
Give a learning machine enough samples and a
classifier will be found!!!
(Forget about the rat who only needs two samples.)

32
Criteria for Choosing a Problem to Work on

Context should either be known or not important.
Processing of the input should be relatively
simple (it should be clear what kind of
information we need to extract).
For an example relying heavily on context see
technology/BoxDimensions/overview.htm on my web
site.
Comments on major areas in the next few slides.

33
Speech Recognition

Grammar driven models (using low level context)
have been quite successful.
High level context is even better. For example,
matching a speech fragment to a name on a list.
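
Why a list helps can be shown with a rough sketch. A real recognizer would compare acoustic scores; as a stand-in, the code below assumes the recognizer has already produced a (possibly wrong) spelling and uses string similarity from Python's standard `difflib` module. The name list and cutoff are hypothetical.

```python
import difflib

def match_to_list(heard, names, cutoff=0.6):
    """Return the listed name closest to what was heard, or None."""
    by_key = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(heard.lower(), list(by_key),
                                     n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None

names = ["Pavlidis", "Penzias", "Langmuir", "Ramachandran"]

print(match_to_list("pavlidus", names))  # Pavlidis
print(match_to_list("xyzzy", names))     # None
```

Because the answer must be one of a few known names, an imperfect low-level result is enough to pick the right one.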

34
Optical Character Recognition (OCR)

Printed text characters have small shape
variability and high contrast with the
background. (CAPTCHA systems negate these
properties)
Spelling checkers (or ZIP code directories in
postal applications) introduce low level context.
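
A minimal sketch of how a directory supplies context, assuming the OCR stage reports, for each character position, the set of characters it cannot rule out. The candidate sets and the small directory are made up for illustration.

```python
def consistent_entries(candidates, directory):
    """Directory entries whose every character is among the OCR candidates."""
    return [entry for entry in directory
            if len(entry) == len(candidates)
            and all(ch in options for ch, options in zip(entry, candidates))]

# Hypothetical OCR output: position 2 could be "1" or "7",
# position 5 could be "0" or "4".
candidates = ["1", "17", "7", "9", "04"]

# Hypothetical ZIP code directory.
directory = ["11794", "10027", "17704", "11733"]

print(consistent_entries(candidates, directory))  # ['11794']
```

On its own the OCR output allows four readings. Only one of them is in the directory.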

35
An example of heavy use of context

Reading of the checks sent for payment to
American Express.
Because payments are supposed to be in full and
the amount due is known, the number written on a
check is analyzed to confirm whether it matches
the amount due or not.
(But direct payment is used more and more!)
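
The difference between reading a number and confirming a known number can be sketched as follows. This is not the actual American Express system: the per-digit scores, the threshold, and the amounts are hypothetical.

```python
def best_reading(scores):
    """Open recognition: take the top-scoring digit at each position."""
    return "".join(max(position, key=position.get) for position in scores)

def supports(scores, expected, min_score=0.2):
    """Verification: is every digit of the expected amount plausible?"""
    if len(expected) != len(scores):
        return False
    return all(position.get(digit, 0.0) >= min_score
               for position, digit in zip(scores, expected))

# Hypothetical recognizer scores for a five-digit amount in cents.
scores = [
    {"1": 0.55, "7": 0.45},
    {"2": 0.90, "3": 0.10},
    {"1": 0.60, "7": 0.40},  # a handwritten 7 that looks like a 1
    {"5": 0.95, "6": 0.05},
    {"0": 0.70, "8": 0.30},
]

print(best_reading(scores))       # 12150
print(supports(scores, "12750"))  # True
print(supports(scores, "92750"))  # False
```

Open recognition misreads the amount, but checking against the amount due only has to answer yes or no, which is a much easier question.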

36
An Aside: Why did OCR mature when the need for
it had diminished?

The algorithms used in the products of the 1990s
were known earlier but they were too complex to
be implemented effectively with the digital
technology of earlier times.
When computer hardware became cheap enough for
good OCR, it also became cheap enough for PCs
and the Internet.
Keep this in mind in your business plans!

37
Face Recognition

It took over forty years to build acceptable
quality machines that recognize written symbols.
What makes us think that we can solve the much
more complex problem of distinguishing human
faces?
Neuroscientists point out that humans have
special neural circuitry for face recognition.

38
How do these two faces differ?
39
How about these two?
40
Face Recognition and Scalability

The population samples in published studies are
relatively small and include men and women of
different races with different hairstyles, etc.
I have never seen a study where all the subjects
are similar. For example, white blond men between
the ages of 20 and 30 with long hair and beards.
Subjects in published studies are cooperative.

41
Face Detection

Before proceeding with face recognition we need
to find the faces in a picture (face detection).
CMU has a web site where the public may submit
pictures and get back results with a green
square overlaid on faces facing front and green
pentagons on profiles.
Results are not robust.

42
Glimpses from the Face Detection Gallery - 1
43
Glimpses from the Face Detection Gallery - 3
They got the wrong person
44
Concluding Remarks

Before we try to build a machine to achieve a
goal we must ask ourselves whether that goal is
compatible with the laws of nature. (Not because
people can do it.)
While such laws are clear in Physics and
Chemistry, they are not in the field of
Computation except in some extreme cases.

45
Human Credulity - 1

In spite of well understood laws of physics,
inventors persist in offering designs that
violate them, and they find takers.
Therefore fundamental advances in Computer
Science are likely to reduce but not to eliminate
preposterous claims.

46
Human Credulity - 2

50 years ago Langmuir (in Pathological Science)
debunked UFOs but also predicted that UFOs will
be with us for a long time because it is too good
a story for the news media to let go.
The view of computers as giant brains that are
able to out-think and replace humans is about as
valid as visits by extraterrestrials, but it
makes too good a story for the news media to let
go.

47
The End