Title: Aristotle's Dream: The behaviorome project. A Language for Action (The grounding problem)
1Aristotle's Dream: The behaviorome project. A
Language for Action (The grounding problem)
- Yiannis Aloimonos CVL
- The space of actions (visual or motoric) is
described by a language with its own phonology,
morphology and syntax
2Thanks to
- Abhijit Ogale, Gutemberg Guerra,
- Alap Karapurkar
- Ken Nakayama
- NSF, ARDA
3(No Transcript)
4Neuromorphic Engineering, with Kwabena Boahen,
UPenn/Stanford
5Active Perception
- Perception/Action Cycle
- Kant's a priori
- Task decomposition
- Consider the body!
- Japanese Delegation
- Intelligence Agencies
6Mirror Neurons
7Action Representations
Linguistic
Visuo-motor
Visual
8Embodiment and Representations
embodiment
embodiment
9(No Transcript)
10Motion capture
11Motion capture
12(No Transcript)
13(No Transcript)
14Rotational Data
15Motor Domain Verb Hierarchy
16Visuo-Motor Verbal Lexicon
- 1121136 <L0 C1 4> consume ingest take_in take have: serve oneself to, or consume regularly
- 1806883 <L0 C2 8> stop halt: come to a halt, stop moving
- 2062507 <L0 C3 9> cause_to_be_perceived: have perceptible qualities
- 0010141 <L0 C4 9> act behave do: behave in a certain manner; show a certain behavior
- 1432719 <L0 C5 9> let_go_of let_go release relinquish: release, as from one's grip
- 1179760 <L0 C6 15> hold take_hold: have or hold in one's hands or grip
- 1951556 <L0 C7 19> leave go_forth go_away: go away from a place
- 1750193 <L0 C8 19> express_emotion express_feelings: give verbal expression to one's feelings
- 1504229 <L0 C9 20> lie: be lying, be prostrate; be in a horizontal position
- 0001740 <L0 C10 21> breathe take_a_breath respire suspire: draw air into, and expel out of
- 0121430 <L0 C11 23> change alter modify: cause to change; make different; cause transformation
- 1175818 <L0 C12 24> seize prehend clutch: take hold of; grab
- 2068725 <L0 C13 25> look: perceive with attention; direct one's gaze towards
- 0946463 <L0 C14 25> pronounce articulate enounce sound_out enunciate say: speak in a certain way
- 0102093 <L0 C15 31> discharge expel eject release: eliminate (substances) from the body
- 0951670 <L0 C16 62> utter emit let_out let_loose: express audibly; utter sounds
- 0106479 <L0 C9 71> change: undergo a change; become different in essence
- 0909455 <L0 C17 125> express verbalize verbalise utter give_tongue_to: articulate
- 1169634 <L0 C18 137> touch: make physical contact with, come in contact with
17Human Activity Language
- Phonology
- Morphology
- Syntax
18Phonology
19Phonological Rules
20Morphology
jog B0 G0 R2 Y3 B3 V0 G3 R0 Y0
jump G0 R0 Y0 B2 G1 V0 B5 G3 R0 Y0 R10 Y5 B0 G0 R0 Y0 B0 G1 B0 G0 R0 Y1 B0 V0 B0 V0 G0 R0 Y0 B0 G0 R0 Y0 V0
run B4 G7 R1 Y2 B1 G0 R3 Y5 V0 Y0
scuff R0 V0 Y0 V1 Y0 V0 R0 V0 Y0 V0 R0 V0 Y0 B0 V0 G0 V1 G0 V0 G0 ?0 V0 ?0
stomp G0 R0 Y0 V0 R1 V0 Y2 V1 R0 V0 Y1 V0 Y0 B0 G0 V0 B1 V0 G1 V0 B1 V0 G2 R2 Y0 B0
swagger Y0 V0 R1 V0 Y0 V0 R0 V0 Y0 V0 ?0 R0 V0 Y0 V0 R0 V0 Y0 B2 G2 V0 B0 V0 G0 V0 B0 V0 G0 V0 G0 R1
tiptoe B0 G1 V1 B0 V0 G1 ?0 B0 V0 G0 R0 V0 Y0 V0 Y0 V0 R0 V0 Y0 B0 V0 G0 R0 V0 Y0
toe R0 V1 R0 V0 Y0 V0 R0 Y0 B0 G1 V2 G1 R0 V0 Y0 B0 G0
troop R0 Y0 B0 G0 R1 V3 Y3 V1 Y0 B5 G3 R0 Y0 B2 G3 R2 V0 Y2 B0 G0
walk R0 V0 Y0 V1 Y0 V0 R0 V0 Y0 B2 G2 V0 B0 V0 G0 R0 V0 Y0 ?0 R0 Y1 V0 R0 V0 Y1
L01 V0 Y1
jog B0 G0 R2 Y3 B3 V0 G3 R0 Y0
jump G0 R0 Y0 B2 G1 V0 B5 G3 R0 Y0 R9 Y5 B0 G0 R0 Y0 B0 G1 B0 G0 R0 Y1 B0 V0 B0 V0 G0 R0 Y0 B0 G0 R0 Y0 V0
run B4 G7 R1 Y2 B1 G0 R3 Y5 L01
scuff R0 L01 L01 V0 R0 L01 V0 R0 L01 B0 V0 G0 V1 G0 V0 G0 ?0 V0 ?0
stomp G0 R0 Y0 V0 R1 L01 V1 R0 L01 L01 B0 G0 V0 B1 V0 G1 V0 B1 V0 G2 R2 Y0 B0
swagger Y0 V0 R1 L01 V0 R0 L01 V0 ?0 R0 L01 V0 R0 L01 B2 G2 V0 B0 V0 G0 V0 B0 V0 G0 V0 G0 R1
tiptoe B0 G1 V1 B0 V0 G1 ?0 B0 V0 G0 R0 L01 L01 V0 R0 L01 B0 V0 G0 R0 L01
toe R0 V1 R0 L01 V0 R0 Y0 B0 G1 V2 G1 R0 L01 B0 G0
troop R0 Y0 B0 G0 R1 V3 Y3 L01 B5 G3 R0 Y0 B2 G3 R2 L01 B0 G0
walk R0 L01 L01 V0 R0 L01 B2 G2 V0 B0 V0 G0 R0 L01 ?0 R0 Y1 V0 R0 L01
21Morphology
L01 V0 Y1
L02 R0 L01
L03 B1 G1
L04 V1 G1
L05 R1 Y1
L06 V0 L02
L07 B0 L04
L08 L05 L03
L09 V0 L07
L10 L03 L09
L11 L01 L06
L12 L05 L06
L13 L02 L11
L14 L08 R2
L15 V0 0
L16 B5 G3
L17 L16 L05
L18 L09 L08
L19 L12 L06
L20 L14 Y4
jog B3 V0 G3 L20
jump L08 V0 L17 R9 Y5 L03 L08 L03 L05 B0 L18 L05 L04
run B4 G7 L20 L01
scuff L13 L06 L07 L04 L04 0 L15
stomp L19 L01 L10 L18
swagger L15 L02 L06 L10 L09 L04 L19
tiptoe L10 0 L07 L13 L07 L02
toe R0 L06 V0 L08 L04 L02 L03
troop L14 V3 Y3 L01 L17 B2 G3 R2 L01 L03
walk L13 L10 L02 0 L12
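The morphology slides replace recurring symbol pairs with labels such as L01 and L02. The slides do not name the procedure, so the following is only a minimal sketch under the assumption that a greedy "replace the most frequent adjacent pair" scheme (in the style of Re-Pair) is close enough for illustration. The function names `replace_pair` and `induce_rules`, the `min_count` threshold, and the toy strings are all hypothetical and are not the data from the slides.

```python
from collections import Counter

def replace_pair(seq, pair, name):
    """Replace non-overlapping occurrences of `pair` in `seq`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def induce_rules(sequences, min_count=2):
    """Greedy pairwise substitution (an assumed method, not the slides' stated one)."""
    seqs = {verb: list(tokens) for verb, tokens in sequences.items()}
    rules = {}
    while True:
        # Count every adjacent pair across all verb strings.
        counts = Counter()
        for tokens in seqs.values():
            counts.update(zip(tokens, tokens[1:]))
        if not counts:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        pair, count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < min_count:
            break
        name = f"L{len(rules) + 1:02d}"
        rules[name] = pair
        seqs = {verb: replace_pair(tokens, pair, name) for verb, tokens in seqs.items()}
    return rules, seqs

# Hypothetical toy strings, not the data from the slides.
toy = {
    "walk": "R0 V0 Y1 R0 V0 Y1".split(),
    "run": "B4 G7 V0 Y1".split(),
}
rules, compressed = induce_rules(toy)
print(rules)
print(compressed)
```

On the toy input the first rule groups the pair shared by both verbs, and the second rule builds on the first, which is the kind of nesting visible in the rule list above.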
22Nouns
- Body parts active during the execution of a
human activity
23Noun Morphology
24Noun Morphology2
64 16 18
65 21 64
66 17 65
67 23 28
68 46 47
69 7 58
70 19 66
71 37 39
72 26 67
73 13 15
74 22 30
75 31 71
76 32 69
77 20 70
78 72 75
79 45 68
80 2 10
81 42 73
82 35 76
83 57 59
84 3 74
85 29 78
86 9 77
87 44 79
88 38 82
89 50 62
90 53 81
91 56 83
92 12 80
93 24 85
94 1 84
95 87 91
96 41 90
97 86 93
98 14 96
99 33 88
100 54 95
101 94 97
102 5 89
25Noun Morphology3
26Adjectives
- Initial posture: specifies the initial state of
the active joints
27Adverbs
- String of multiplicative constants for the
execution time of coordinated segments
28View-Invariant Properties
29(No Transcript)
30(No Transcript)
31 B7 G6 R8 Y6 B29 G25 R31 Y26 B8 G7 R8 Y6 B29 G26
R33 Y24 B10 G9 R9 Y8 B6 G8
B10 G7 B14 G10 R20 Y23 B2 B11 G8 B12 G12 R20
Y27 B2 B10 G8 B14 G10
37 94 74 69 63 53 60 57
B7 G6 R8 Y6 B29 G25 R31 Y26 B8 G7 R8 Y6 B29 G26
R33 Y24 B10 G9 R9 Y8 B6 G8
123 97 75 18 18 65
B10 G7 B14 G10 R20 Y23 B2 B11 G8 B12 G12 R20
Y27 B2 B10 G8 B14 G10
38Syntax
Parallel (actuator): S(t,j), S(t,j+1); N(t,j) ? N(t,j+1)
Sequential (time): S(t,j) S(t+1,j); NP(t,j), VP(t,j), NP(t+1,j), VP(t+1,j)
- Noun: Body parts active during the execution of
a human activity
- Verb: Changes each active joint experiences
during the activity execution
- Adjective: Specifies the initial state of the
active joints (initial posture)
- Adverb: String of multiplicative constants for
execution time of coordinated segments
- Syntax involves the conceptual system
39Experiments in
40(No Transcript)
41(No Transcript)
42(No Transcript)
43(No Transcript)
44(No Transcript)
45- These few frames (or silhouettes or poses) are
related in an action through a grammar, a
probabilistic grammar. Recognizing the action
amounts to successful parsing.
- What are these key poses or silhouettes? They are
the anchors of action. Currently studying two
candidates: max/min of motion and phase space.
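One of the two candidates named above, max/min of motion, can be sketched as picking the frames where a per-frame motion measure reaches a local extremum. This is only an illustration: the slides do not define the motion measure, so the `motion` list below is a hypothetical per-frame magnitude, and `key_frames` is a made-up helper name.

```python
def key_frames(motion):
    """Return indices of strict interior local maxima and minima of a motion signal."""
    keys = []
    for i in range(1, len(motion) - 1):
        prev, cur, nxt = motion[i - 1], motion[i], motion[i + 1]
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if is_max or is_min:
            keys.append(i)
    return keys

# Hypothetical motion magnitudes for eight consecutive frames.
motion = [0.0, 0.5, 1.0, 0.6, 0.2, 0.4, 0.9, 0.3]
print(key_frames(motion))
```

A real signal would likely need smoothing before the extrema are taken; the sketch skips that step to keep the idea visible.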
46Jumping
47Kneeling
48Turning
49(No Transcript)
50(No Transcript)
51(No Transcript)
52(No Transcript)
53(No Transcript)
54(No Transcript)
55(No Transcript)
56Hidden Markov Model Implementation
- The solution resembles speech recognition, where
instead of phonemes one has silhouettes.
- Matching against the silhouettes in the database of
models produces several candidate matches, which are
disambiguated by the HMM or the grammars.
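The disambiguation step can be illustrated with a small Viterbi decoder, the algorithm named on a later slide. The sketch below assumes each keyframe has already been scored against every model pose; the pose names come from the keyframe slide (walk, bentknees, stand), but every probability and match score is a hypothetical number chosen for the example, and the silent states and viewpoints mentioned in the slides are left out.

```python
def viterbi(states, start_p, trans_p, match_scores):
    """Most likely pose sequence given per-frame silhouette match scores."""
    best = {s: start_p[s] * match_scores[0][s] for s in states}
    back = []
    for scores in match_scores[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Best predecessor of state s at this frame.
            p = max(states, key=lambda q: prev_best[q] * trans_p[q][s])
            pointers[s] = p
            best[s] = prev_best[p] * trans_p[p][s] * scores[s]
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

states = ["stand", "bentknees", "walk"]
# Hypothetical model parameters, not taken from the slides.
start_p = {"stand": 0.8, "bentknees": 0.1, "walk": 0.1}
trans_p = {
    "stand": {"stand": 0.6, "bentknees": 0.3, "walk": 0.1},
    "bentknees": {"stand": 0.1, "bentknees": 0.5, "walk": 0.4},
    "walk": {"stand": 0.1, "bentknees": 0.1, "walk": 0.8},
}
# Hypothetical match scores; the third keyframe matches two poses equally well.
match_scores = [
    {"stand": 0.7, "bentknees": 0.2, "walk": 0.1},
    {"stand": 0.4, "bentknees": 0.5, "walk": 0.1},
    {"stand": 0.1, "bentknees": 0.45, "walk": 0.45},
    {"stand": 0.1, "bentknees": 0.2, "walk": 0.7},
]
print(viterbi(states, start_p, trans_p, match_scores))
```

The third keyframe is ambiguous on its own, and the transition structure together with the following frame resolves it, which is the role the slide assigns to the HMM.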
57Making Models (learning)
- Observe 11 actions of 10 different people from 8
points of view.
58Sample Pickup Video
59Sample walk video
60Input keyframes silhouettes from different actors
(walk, bentknees, stand)
61Part of the HMM
62(No Transcript)
63(No Transcript)
64 Making Models (learning)
- Observe 11 actions of 10 different people from 8
points of view.
- Extract silhouettes
- Identify common silhouettes
- Each state of the HMM corresponds to a pose and a
viewpoint.
- Build transitions, set up probabilities,
introduce silent states.
- For each pose and view, average the silhouettes
from different actors.
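The last step in the list, averaging silhouettes per pose and view across actors, can be sketched as a pixel-wise mean of binary masks. This is a minimal sketch that assumes the silhouettes are already aligned and equally sized; `build_pose_models` and the tiny 2x2 masks are hypothetical and stand in for real extracted silhouettes.

```python
from collections import defaultdict

def build_pose_models(samples):
    """Average binary silhouettes that share the same (pose, view) key.

    samples: iterable of (pose, view, silhouette), silhouette being a 2D list of 0/1.
    Returns a dict mapping (pose, view) to a 2D list of values in [0, 1].
    """
    groups = defaultdict(list)
    for pose, view, sil in samples:
        groups[(pose, view)].append(sil)
    models = {}
    for key, sils in groups.items():
        h, w = len(sils[0]), len(sils[0][0])
        models[key] = [
            [sum(s[y][x] for s in sils) / len(sils) for x in range(w)]
            for y in range(h)
        ]
    return models

# Hypothetical 2x2 silhouettes from two actors for one pose and view, plus a second pose.
samples = [
    ("stand", 0, [[0, 1], [1, 1]]),
    ("stand", 0, [[0, 1], [0, 1]]),
    ("walk", 0, [[1, 1], [1, 0]]),
]
models = build_pose_models(samples)
print(models[("stand", 0)])
```

Pixels where the actors disagree end up with fractional values, so each averaged model is a soft mask rather than a binary one.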
65Fuzzy poses hidden states
66Matching silhouettes
67Phase correlation
Given images i1 and i2:
F1 = FFT(i1)
F2 = FFT(i2)
F = F1 x conj(F2)
F = F / |F|
Phasecorr(x,y) = inverseFFT(F)
Phase correlation gives a delta function at the
correct 2D translation
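The phase-correlation steps on this slide can be tried on a tiny example. The sketch below is not the authors' implementation: it uses a naive DFT from the standard library instead of a real FFT so that it stays self-contained, it assumes a circular (wrap-around) shift, and it takes the cross spectrum as F1 times the conjugate of F2, which is the usual convention. The helper names `dft2`, `shift`, and `phase_correlation` and the random 8x8 test image are hypothetical.

```python
import cmath
import random

def dft2(img, inverse=False):
    """Naive 2D discrete Fourier transform; only suitable for tiny images."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * cmath.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(sign * 1j * angle)
            out[u][v] = acc / (h * w) if inverse else acc
    return out

def phase_correlation(i1, i2):
    """Return (dy, dx) such that i2 is i1 circularly shifted by (dy, dx)."""
    h, w = len(i1), len(i1[0])
    f1, f2 = dft2(i1), dft2(i2)
    # Cross spectrum F1 x conj(F2), normalized to unit magnitude.
    cross = [[f1[u][v] * f2[u][v].conjugate() for v in range(w)] for u in range(h)]
    unit = [[c / abs(c) if abs(c) > 1e-12 else 0j for c in row] for row in cross]
    corr = dft2(unit, inverse=True)
    # The correlation surface is (close to) a delta function; locate its peak.
    py, px = max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: corr[p[0]][p[1]].real,
    )
    # With this conjugation the peak sits at the negated shift, so flip the sign.
    return (-py) % h, (-px) % w

def shift(img, dy, dx):
    """Circularly shift an image down by dy rows and right by dx columns."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

rng = random.Random(0)
i1 = [[rng.random() for _ in range(8)] for _ in range(8)]
i2 = shift(i1, 2, 3)
print(phase_correlation(i1, i2))
```

Which spectrum is conjugated only decides the sign of the recovered shift, which is why the function negates the peak position before returning it.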
68Phase correlation of logpolar images using
Fourier transform
Scaling becomes y-translation
Delta function on y-axis recovers the scale
Logpolar image
Ordinary image
Z-rotation becomes x-translation
Delta function on x-axis recovers the angle
Logpolar image
Ordinary image
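Why log-polar resampling turns rotation and scaling into translations can be checked at the level of a single point. The sketch below is a coordinate-level illustration only, not an image resampler: it assumes the log-polar x-axis holds the angle and the y-axis holds the log radius, matching the slide, and the helper names and sample values are hypothetical.

```python
import math

def to_logpolar(x, y):
    """Map a Cartesian point to (angle, log radius): angle on the x-axis, log radius on the y-axis."""
    return math.atan2(y, x), math.log(math.hypot(x, y))

def rotate_and_scale(x, y, angle, scale):
    """Rotate a point about the origin (z-rotation) and scale it uniformly."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

# Hypothetical point and transform parameters.
x, y = 1.0, 2.0
angle, scale = 0.3, 1.5
a0, r0 = to_logpolar(x, y)
a1, r1 = to_logpolar(*rotate_and_scale(x, y, angle, scale))
# The x-shift equals the rotation angle; the y-shift equals log(scale).
print(a1 - a0, r1 - r0, math.log(scale))
```

Because every point moves by the same offset in these coordinates, phase correlation of the two log-polar images can recover the angle and the scale as a single 2D translation.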
69(No Transcript)
70A sequence of a few actions
71Background subtraction (left) and keyframes
(right)
72Result after matching keyframes with model poses
in memory and Viterbi algorithm
73A sequence of a few actions
74Background subtraction (left) and keyframes (right)
75Matching keyframes with model poses in memory
76Changing viewpoint
77Background subtraction (left) and keyframes
(right)
78Matching keyframes with model poses in memory
79Occlusion
80Background subtraction (left) and keyframes (right)
81Matching keyframes with model poses in memory
82Leg
Hand
Torso
Head
Full Body Video
83Leg
Hand
Torso
Head
Full Body Video
84Visual Action
- Phonemes: body parts (appearance/movement)
- Morphemes: silhouettes
- Syntax: as in motoric action (it involves the
conceptual system: grounding)
85VHF Filters for detecting humans and their parts
Basic pipeline
An example using depth
Key poses (depth maps)
TRAINING POSES Motion/Depth/Appearance from
various viewpoints
Dimensionality reduction (linear/nonlinear)
Views
View-invariant filters for detecting humans
Apply to a new video
86(No Transcript)
87Putting together visuomotor action and language: Grounding
The Behavior-ome Experiment
First: Apply phonological filters and morphological rules
Second: Learning
Result: Action grammars
Wordnet lexicon
actions
start
end
Annotation: For a given action, denote starting
and ending point
88(No Transcript)
89(No Transcript)
90(No Transcript)
91Keyframe Extraction
92Spatial Layer
P1 Right Hand Surface
P1 Right Leg Surface
P1 Left Leg Surface
Book Surface
P2 Right Hand Surface
Spatial Relation Layer
Contact
Relative Angle
Concept Propagation
Contact
Learnt Object Layer (Independent of Surface)
P1
P2
O1
Learnt Object Relations (Spatial Relations
transferred over Spatial-Learnt Object Mappings)
Holds
Holds
Signature of Give Concept
93Plato's mistake?
- Languages of the mind (visual, auditory, speech,
motoric, tactile, ...)
- Is the mind flat?
- From philosophy to Math/Physics