Title: Modeling the Motivation-Learning Interface in Learning and Decision Making (FA9550-06-1-0204)
1Modeling the Motivation-Learning Interface in Learning and Decision Making (FA9550-06-1-0204)
- PIs
  - W. Todd Maddox (University of Texas, Austin)
  - Arthur B. Markman (University of Texas, Austin)
2Motivation-Learning Interface (Maddox/Markman)
Objective
To understand the influence of motivational incentives on learning and performance through empirical and computational model-based analyses. To improve mathematical models of learning and performance based on data.
Technical Approach
Manipulate people's motivational state through global and local incentive manipulations. Conduct experiments on choice and signal detection to understand how motivation affects the optimality of performance and the exploration/exploitation tradeoff.
DoD Benefit
Motivational states guide actions but differ across military and non-military settings. The goal is to identify motivational states that optimize performance in each setting. Behavioral and model-based analyses illuminate these effects and characterize them along an exploration-exploitation continuum.
Budget (Actual/Planned K)
                                   FY06  FY07  FY08
Budget                             152   152   152
Annual Progress Report Submitted?  Y     Y     N
Project End Date: 2/28/09
3List of Project Goals
- Develop and test a choice/gambling task
- Examine and model motivational influences on this choice task
- Examine and model variants of this task
- Explore social influences on motivation, learning and performance
- Extend to and model related tasks (signal detection, decision criterion learning, dynamic decision making)
4Progress Towards Goals (or New Goals)
- Develop and test a choice/gambling task: Done
- Examine and model motivational influences on this choice task: Initial studies completed and published
- Examine and model variants of this task: Some completed and published; others in progress
- Explore social influences on motivation, learning, and performance: Initial studies completed and published; others in progress
- Extend to and model related tasks (signal detection, decision criterion learning, dynamic decision making): Work in progress
5Research Questions
- What does it mean to motivate someone to do well?
- How do we achieve this aim?
6Layman's Answer
- What does it mean to motivate someone to do well?
  - Get them to try harder (maximize number correct, targets destroyed, etc.)
- How do we achieve this aim?
  - Give them an incentive for maximizing (raise, promotion, etc.)
- Our research suggests that offering a global incentive (raise) for maximizing a local incentive (number correct) is too simple a story and is misleading, even if we define trying harder as attempting to respond optimally.
7Three-Factor Framework
- Influence of motivating incentives on performance involves a complex three-way interaction between three factors
- Global incentives (Factor 1)
  - Approach some global reward (raise), or
  - Avoid losing some reward (avoid a pay cut)
- Local incentives (Factor 2)
  - Maximize gains (maximize points earned)
  - Minimize losses (minimize points lost)
- Task demand (What strategy is optimal?) (Factor 3)
  - Exploration or exploitation optimal
8Overview of this talk
- Three factor (regulatory fit) framework
- Studies of choice
- Extensions of choice task and model
- Social influences on motivation
- Extensions to signal detection, decision
criterion learning, dynamic decision making
9Global Incentives (Regulatory Focus)
Approach (Promotion Focus): Achieve Global Task Performance Criterion -> Raffle ticket for 50
Avoidance (Prevention Focus): Achieve Global Task Performance Criterion -> Keep 50 raffle ticket given initially
10Task Reward Structure (Local Trial-by-trial Task Goal)
Gains: Earn points for all responses (earn more points for a correct choice than for an incorrect choice)
Losses: Lose points for all responses (lose fewer points for a correct choice than for an incorrect choice)
11Consider the bigger picture
- Hypothesis: Fit increases exploration
- Exploration can be defined within tasks
  - Willingness to shift strategies
  - Willingness to explore a set of options
12Consider the bigger picture
Fit
- Almost all cognitive research involves a promotion focus and a gains reward structure
  - Promotion focus: small monetary reward or social contract with experimenter.
  - Gains: reward for correct response, no reward for error
13Three-way interaction
Exploration optimal
             Gains            Losses
Promotion    Fit: Good        Mismatch: Poor
Prevention   Mismatch: Poor   Fit: Good
Exploitation optimal
             Gains            Losses
Promotion    Fit: Poor        Mismatch: Good
Prevention   Mismatch: Good   Fit: Poor
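The three-way pattern in these tables can be written as a small lookup, sketched here in Python. The function names and string labels are illustrative assumptions, not part of the study materials. The only logic encoded is that fit (promotion with gains, prevention with losses) is taken to increase exploration, which helps when exploration is optimal and hurts when exploitation is optimal.

```python
def regulatory_fit(focus: str, reward_structure: str) -> bool:
    """Return True when the global focus matches the local reward structure."""
    fit_pairs = {("promotion", "gains"), ("prevention", "losses")}
    return (focus, reward_structure) in fit_pairs


def predicted_performance(focus: str, reward_structure: str, optimal_strategy: str) -> str:
    """Predicted performance ("good" or "poor") under the three-factor framework.

    optimal_strategy is "exploration" or "exploitation" (hypothetical labels).
    """
    fit = regulatory_fit(focus, reward_structure)
    exploration_is_optimal = optimal_strategy == "exploration"
    # Fit is assumed to produce exploration, so it pays off only when
    # exploration is the optimal strategy; mismatch pays off otherwise.
    return "good" if fit == exploration_is_optimal else "poor"


for strategy in ("exploration", "exploitation"):
    for focus in ("promotion", "prevention"):
        for structure in ("gains", "losses"):
            print(strategy, focus, structure, predicted_performance(focus, structure, strategy))
```

Running the loop reproduces all eight cells of the two tables.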
14Choice/Gambling task
Worthy, Maddox, Markman (2007, PBR)
- Does Regulatory Fit affect choice?
- Two-Deck variant of Iowa Gambling task
  - Task 1: Exploration Optimal
  - Task 2: Exploitation Optimal
- Regulatory Focus (Global incentive)
  - Earn ticket or avoid losing ticket
- Reward Structure (Local incentive)
  - Gains vs. Losses
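15Gains Condition Example
16PICK A CARD! [screen: Yes/Bonus/No meter scaled from 0 to 450; running total 174]
17[feedback screen: card worth 7; total rises from 174 to 181; labeled "Correct"]
18PICK A CARD! [screen: running total 181]
19[feedback screen: card worth 3; total rises from 181 to 184; labeled "Correct"]
20PICK A CARD! [screen: running total 184]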
21Losses Condition Example
22PICK A CARD! [screen: Yes/Bonus/No meter scaled from 0 to -450; running total -174]
23PICK A CARD! [screen: card worth -7; total falls from -174 to -181]
24PICK A CARD! [screen: running total -181]
25Regulatory Fit and Choice
- At any moment, you have an estimate of the relative goodness of the decks
- If you choose deterministically from the better deck, you are exploiting
- If you choose more probabilistically, you are exploring
- Does regulatory fit lead to more exploration than regulatory mismatch?
26Modeling Choice Behavior
- EVs of each option are updated via a recency-weighted algorithm

$$EV_{\text{new}} = EV_{\text{current}} + a\,(\text{Reward} - EV_{\text{current}})$$

  where a is the recency parameter
- If reward is greater than the current EV, the EV increases
- If reward is less than the current EV, the EV decreases
27- a is a free parameter constrained to be between 0 and 1
- Higher a values give greater weight to recent rewards
- When a = 1, the updating equation reduces to $EV_{\text{new}} = \text{Reward}$
- Alternatively, when a = 0, the updating equation reduces to $EV_{\text{new}} = EV_{\text{current}}$
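A minimal Python sketch of the recency-weighted update may help make the two limiting cases concrete. It assumes the standard delta-rule form, new EV = current EV + recency x (reward - current EV); the function name `update_ev` and the reward sequence are hypothetical and chosen only for illustration.

```python
def update_ev(current_ev: float, reward: float, recency: float) -> float:
    """Recency-weighted update of an option's expected value (EV)."""
    if not 0.0 <= recency <= 1.0:
        raise ValueError("recency parameter must lie between 0 and 1")
    # The EV moves toward the reward by a fraction given by the recency parameter.
    return current_ev + recency * (reward - current_ev)


# Hypothetical reward sequence for one deck, starting from an EV of 0.
ev = 0.0
for reward in [7, 3, 9]:
    ev = update_ev(ev, reward, recency=0.5)
    print(ev)  # prints 3.5, then 3.25, then 6.125

# Limiting cases: recency 1 tracks only the last reward, recency 0 never updates.
print(update_ev(5.0, 9.0, recency=1.0))  # 9.0
print(update_ev(5.0, 9.0, recency=0.0))  # 5.0
```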
28Action Selection
- Action selection is probabilistically determined via choice rules (e.g., Luce, 1959)
Softmax Rule

$$P(A) = \frac{e^{g \cdot EV_A}}{\sum_j e^{g \cdot EV_j}}$$

where P(A) is the probability of choosing option A, EV_A is the EV for option A, g is the exploitation parameter, and the sum in the denominator runs over all options.
- Higher g values indicate greater exploitation
- Lower g values indicate greater exploration
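The effect of the exploitation parameter can be seen in a short Python sketch of a softmax choice rule. This assumes the usual exponential form, in which each option's EV is scaled by the exploitation parameter before normalizing; the function name and the example EVs are hypothetical.

```python
import math


def softmax_choice_probs(evs, exploitation):
    """Return choice probabilities for each option under a softmax rule."""
    # Subtracting the largest EV leaves the probabilities unchanged
    # (the common factor cancels) and avoids overflow for large positive values.
    largest = max(evs)
    weights = [math.exp(exploitation * (ev - largest)) for ev in evs]
    total = sum(weights)
    return [w / total for w in weights]


evs = [6.0, 4.0]  # hypothetical EVs for two decks
exploring = softmax_choice_probs(evs, exploitation=0.1)
exploiting = softmax_choice_probs(evs, exploitation=5.0)
print(exploring)   # close to equal probabilities: more exploration
print(exploiting)  # almost always the better deck: more exploitation
```

With an exploitation parameter of 0 the rule chooses among the options with equal probability, regardless of their EVs.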
29Exploration Optimal
- Predictions
- Regulatory Fit should perform better
- Regulatory Fit should yield small exploitation
parameter (defined shortly)
30Exploration optimal - Points Analysis
31Exploitation Parameter (larger value = greater exploitation)
32Exploitation Optimal Results (Gains only)
33Summary
- Regulatory Fit -> Exploratory Behavior
- Fit -> Good performance when Exploration Optimal
- Fit -> Poor performance when Exploitation Optimal
- Replicates pattern seen in classification (Maddox et al., 2006; Grimm et al., 2008)
34Affect and Choice
Worthy, Maddox, Markman (in preparation)
- Four-Deck variant of Iowa Gambling task
- Exploitation Optimal
- Alternative method for inducing regulatory focus
- Smile vs. Frown faces on all cards
- Reward Structure
- Gains vs. Losses
35PICK A CARD! [screen: Yes/Bonus/No meter scaled from 0 to 450; running total 174]
36PICK A CARD! [screen: Yes/Bonus/No meter scaled from 0 to 450; running total 174]
37Predictions
Worthy, Maddox, Markman (in preparation)
- Since exploitation is optimal, and assuming
  - Smile -> promotion
  - Frown -> prevention
- Predictions
  - Regulatory Fit should perform worse
  - Regulatory Fit should yield small exploitation parameter
38Points Analysis
39Exploitation Parameter
40Summary
- Predictions supported
- Same behavioral and model pattern for regulatory focus and affect manipulation
- Follow-up studies (running)
  - Exploration optimal task in progress for affect task.
  - Model comparison project
  - Feedback/ITI delays
41Social Motivation and Cognition
Grimm, Markman, Maddox (2009 JPSP)
- Choice studies so far
  - Explicit incentives to induce regulatory focus
  - Affect to induce regulatory focus
- Other social factors can affect regulatory focus
- Stereotype threat
  - Negative self-relevant stereotype -> poor performance
  - Negative stereotypes may induce a prevention focus
  - If so, losses environment should attenuate effect.
  - DoD relevant due to hierarchical structure
42Stereotype Threat: Math problems
43Exploration Optimal Classification
Task requires exploration of space of possible rules.
- Category A: long, steep lines
- Category B: all others
44Possible Rule-based Strategies
83% accuracy
100% accuracy
45Experiment Screen Sample
Gains
Losses
46Method
- Three-dimensional classification task
- Exploration is optimal
- Arbitrary stereotypes given to participants
- Women are better
- Men are better
- Manipulated gains and losses of points
- Predictions
- Traditional stereotype threat result for gains
- Reversed stereotype threat result for losses
47Task Accuracy
Experiment 1: Women are Better
Experiment 2: Men are Better
48Model-Analyses - CJ Use
Experiment 1: Women are Better
Experiment 2: Men are Better
49Work In Progress
- Exploitation optimal task in progress
  - Involves information-integration classification
  - Prediction: Pattern should completely reverse
- End of semester effect
  - Prevention focused so better with losses
  - Supported
50State and Trait Factors Affect Global Incentive Focus
- Manipulate global incentive focus (state variables)
  - Explicit monetary
  - Affect/Social stereotype
- Trait variables
  - Procrastinators (end-of-semester)
  - Personality characteristics
    - Impulsivity, sensation seeking, anxiety, depression
    - IMPASS -> bias toward simple rules (Tharp, Pickering, Maddox, under review)
51Task and Model Extensions
52Signal Detection
- Two-stimulus identification (line length)
- Promotion/Prevention x Gains/Losses
- Biased payoffs so accuracy-maximization must be
abandoned (exploration optimal)
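To illustrate why biased payoffs pull the best criterion away from the accuracy-maximizing one, the following Python sketch uses a textbook equal-variance Gaussian signal detection model. Everything specific here is an assumption made for illustration: the two stimuli are placed at -d'/2 and +d'/2 on the decision axis, errors pay nothing, and the 3:1 payoff ratio and d' = 1 are hypothetical values, not the parameters of the experiments.

```python
import math
from statistics import NormalDist


def optimal_criterion(d_prime, payoff_a, payoff_b, p_a=0.5):
    """Reward-maximizing criterion location on the decision axis.

    Respond "A" below the criterion and "B" above it. With stimulus A at
    -d'/2 and B at +d'/2, the log likelihood ratio is d' * x, so the
    criterion is ln(beta) / d'.
    """
    beta = (p_a / (1.0 - p_a)) * (payoff_a / payoff_b)
    return math.log(beta) / d_prime


def expected_reward(criterion, d_prime, payoff_a, payoff_b, p_a=0.5):
    """Expected payoff per trial when only correct responses are rewarded."""
    z = NormalDist()
    p_correct_a = z.cdf(criterion + d_prime / 2.0)        # x < criterion given A
    p_correct_b = 1.0 - z.cdf(criterion - d_prime / 2.0)  # x >= criterion given B
    return p_a * payoff_a * p_correct_a + (1.0 - p_a) * payoff_b * p_correct_b


d_prime, payoff_a, payoff_b = 1.0, 3.0, 1.0  # hypothetical values
c_opt = optimal_criterion(d_prime, payoff_a, payoff_b)
reward_at_accuracy_criterion = expected_reward(0.0, d_prime, payoff_a, payoff_b)
reward_at_optimal_criterion = expected_reward(c_opt, d_prime, payoff_a, payoff_b)
print(c_opt, reward_at_accuracy_criterion, reward_at_optimal_criterion)
```

Under these assumptions the criterion that maximizes accuracy (0, midway between the stimuli) earns less than the shifted criterion, so a participant has to give up some accuracy to maximize reward.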
53Preliminary Results
- Early learning effect on sensitivity.
- Fit leads to increased sensitivity.
- No systematic effects on bias.
54Extended training
- Effect emerges on bias with extended training
- Fit leads to bias shift toward optimal.
- No systematic effects on sensitivity
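Sensitivity and bias in a two-stimulus task are conventionally estimated from hit and false-alarm rates. The Python sketch below uses the standard equal-variance formulas, d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2. Treating "long" responses to long lines as hits and "long" responses to short lines as false alarms is an assumption made here for illustration, as are the example rates.

```python
from statistics import NormalDist


def sensitivity_and_bias(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) from hit and false-alarm rates.

    Rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises
    StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa             # separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)  # 0 means no bias toward either response
    return d_prime, criterion


# Hypothetical rates: unbiased observer versus one biased toward responding "long".
d_unbiased, c_unbiased = sensitivity_and_bias(0.80, 0.20)
d_biased, c_biased = sensitivity_and_bias(0.90, 0.40)
print(d_unbiased, c_unbiased)
print(d_biased, c_biased)
```

Separating the two measures in this way is what allows an effect on sensitivity early in learning to be distinguished from an effect on bias after extended training.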
55Confidence paradigm
- Classification and Confidence judgment obtained
56Nested Modeling Approach (derived from Mueller & Weidmann and Maddox & Bohil)
57Preliminary Model Results
- Fit -> increased classification and confidence noise
- Likely due to increased exploration
58Followup
- Incorporate into Maddox and Bohil's Hybrid Model
- External decision criterion
- .
59Summary LOCUS Plot
Exploration-optimal tasks
Strong interactions; no consistent main effects
Exploitation-optimal tasks
Zhang, et al (1997 Journal of Neuroscience)
60Summary
- What does it mean to motivate someone to do well?
- How do we achieve this aim?
- It is complex, but systematic and understandable.
- It involves a three-way interaction of
- Global incentives
- Local incentive
- Task demand (i.e., optimal classifier strategy)
61Summary (cont.)
- Regulatory Fit (interaction between global and local incentives) leads to increased exploration.
- Exploration can be advantageous or disadvantageous, depending upon the task demands.
62Summary (cont.)
- We successfully applied a reinforcement learning model to choice and identified an exploitation parameter that tracks regulatory fit effects.
- We applied classification learning models to stereotype threat data and found that regulatory fit affects the flexibility of hypothesis-testing.
- We are extending the approach to more basic tasks such as signal detection and criterion learning and are generalizing relevant models to account for regulatory fit effects.
- Finally, we are extending the approach to more dynamic decision making tasks, and model development is ongoing.
63Future Directions
- Continue model development
- Applications to resource acquisition (foraging)
- Exploration of other social effects on motivation
- Social influences on choking under pressure.
64Interaction with Other Groups and Organizations
- Interactions with AFOSR recipient (Brad Love)
- Interactions with the Institute for Advanced Technology (IAT) at UT-Austin, an Army UARC
- Interactions with the Institute for Innovation, Creativity and Capital (IC2) at UT-Austin
- Interactions with the Imaging Research Center (IRC) at UT-Austin
- Interactions with the Institute for Neuroscience (INS) at UT-Austin
- Interactions with the Center for Perceptual Systems (CPS) at UT-Austin
- Interactions with Veterans Affairs Medical Center (VAMC) at UC-San Diego
65List of Publications Attributed to the Grant (2008-9)
- Peer-Reviewed Manuscripts
  - Grimm, L.R., Markman, A.B., Maddox, W.T., & Baldwin, G.C. (2008). Differential effects of regulatory fit on classification learning. Journal of Experimental Social Psychology, 44, 920-927.
  - Worthy, D.A., Maddox, W.T., & Markman, A.B. (2008). Ratio and Difference Comparisons of Expected Reward in Decision Making Tasks. Memory & Cognition, 36, 1460-1469.
  - Grimm, L.R., Markman, A.B., Maddox, W.T., & Baldwin, G.C. (in press). Stereotype threat reinterpreted as regulatory fit. Journal of Personality and Social Psychology.
  - Worthy, D.A., Markman, A.B., & Maddox, W.T. (in press). What is pressure? Evidence for social pressure as a type of regulatory focus. Psychonomic Bulletin and Review.
  - Maddox, W.T., Glass, B.D., & Markman, A.B. (under revision). Regulatory fit effects on stimulus identification.
  - Grimm, L.R., Markman, A.B., & Maddox, W.T. (under revision). Regulatory fit created by time of semester and task reward structure influences test performance.
  - Glass, B.D., Markman, A.B., & Maddox, W.T. (under review). The generalized exploration model (GEM): A model of human foraging for empirical analysis.
  - Markman, A.B., Beer, J.S., Grimm, L.R., Rein, J.R., & Maddox, W.T. (under review). The optimal level of fuzz: Case studies in a methodology for psychological research.
- Conference Presentations
  - Worthy, D., Markman, A.B., & Maddox, W.T. Are reward expectancies in choice tasks processed as ratios or differences? Implications for theories of reward processing in the orbitofrontal cortex. Poster presented at the Annual Meeting of the Cognitive Neuroscience Society, San Francisco, CA, April, 2008.
  - Worthy, D.A., Maddox, W.T., & Markman, A.B. (2007). The length of feedback interval and inter-trial interval affects decision-making in choice tasks. Poster to be presented at the Annual Meeting of the Society for Neuroeconomics, September 27-30, 2008, Hull, Massachusetts.
  - Worthy, D.A., Maddox, W.T., & Markman, A.B. What is pressure? Relating social pressure to regulatory focus. Poster presented at the 49th Annual Meeting of the Psychonomic Society, Chicago, IL, November, 2008.
  - Grimm, L.R., Markman, A.B., & Maddox, W.T. Minimizing Losses Improves End of Semester GRE Performance. Presentation at the Society for Personality and Social Psychology, Tampa, Florida, February 2009.
  - Glass, B.D., Filoteo, J.V., Markman, A.B., & Maddox, W.T. Regulatory focus and executive functions. Poster presented at the Annual Meeting of the Cognitive Neuroscience Society, San Francisco, CA, March, 2009.
67End-of-Semester
Grimm, Markman, Maddox (under review)
- End of semester participants are bad, unmotivated
- Maybe in a prevention focus?
- So mismatch with most task reward structures (gains).
- GRE math problems
68Regulatory Fit -> Exploration: Why?
- Empirical support in several domains
- Connection to Neuroscience
  - Positive affect-frontal exploration hypothesis (Isen, Ashby, etc.)
  - Regulatory focus-frontal activation findings (Amodio, Cunningham, etc.)
  - LC-NE-exploration/exploitation relation (Aston-Jones, Cohen, Daw)
69Feedback Delay, ITI and Choice
Worthy, Markman, Maddox (2008 SFN)
- Increased ITI shown to increase exploitation in
an exploration optimal task (Bogacz et al, 2007)
70Design and Results
- Four-deck exploitation optimal task (gains only)
- Increased feedback duration -> less switching, less exploration.
71Risky Decisions/Feedback Interval
- Each deck has a partner
  - Same EV, but one low and one high variance
- Short ITI only (gains only)
- Replicate effect: Increased feedback duration -> less exploration.
- Increased feedback duration -> fewer risky choices.
72Followups in progress
- Losses variants
- Exploration optimal variants
73Extending Models
Worthy, Maddox, Markman (2008 MC)
- Choice models use one of two decision rules
  - Matching rules: these rules predict that choices are affected by scalar additions to reward values, but not by scalar multiplications.
  - Difference rules: these rules predict that choices are affected by scalar multiplications of reward values, but not by scalar additions.
74Testing influence of reward value
Worthy, Maddox, Markman (2008 MC)
- Exp 1 (Exploitation optimal)
- Exp 2 (Exploration optimal)
- Control: Deck values 1-10
  - Deck A EV = 6, Deck B EV = 4
- Distance-Preserving: Deck values 81-90
  - Deck A EV = 86, Deck B EV = 84
- Ratio-Preserving: Deck values 10-100
  - Deck A EV = 60, Deck B EV = 40
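The logic of the three conditions can be checked numerically with a short Python sketch. It applies a simple matching (ratio) rule and a two-option softmax (difference) rule directly to the deck EVs listed above. Applying the rules to the true EVs rather than to learned estimates, and the exploitation value of 0.1, are simplifying assumptions made only for illustration.

```python
import math


def matching_prob(ev_a, ev_b):
    """Matching rule: probability of choosing A depends on the ratio of EVs."""
    return ev_a / (ev_a + ev_b)


def difference_prob(ev_a, ev_b, exploitation=0.1):
    """Two-option softmax: probability of choosing A depends on the EV difference."""
    return 1.0 / (1.0 + math.exp(-exploitation * (ev_a - ev_b)))


conditions = {
    "control": (6, 4),
    "distance_preserving": (86, 84),
    "ratio_preserving": (60, 40),
}
for name, (ev_a, ev_b) in conditions.items():
    print(name,
          round(matching_prob(ev_a, ev_b), 3),
          round(difference_prob(ev_a, ev_b), 3))
```

Under these assumptions the matching rule predicts the same preference for Deck A in the control and ratio-preserving conditions and a weaker preference in the distance-preserving condition, while the difference rule predicts the same preference in the control and distance-preserving conditions and a stronger preference in the ratio-preserving condition.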
75Results
- Altering ratios has largest effect on performance
76Summary
- Support for 3-way interaction in choice
- Fit -> exploration
- Affect manipulation similar to focus
- Feedback delay increases exploitation, reduces risky choices
  - Implications for training.
- Ratio preserving models supported
77Otto, Gureckis, Markman, Love (in preparation)
Rising Optimum Task
- Optimal response allocation requires escaping a local minimum of reward (task taken from Bogacz et al., 2007, and Montague and Berns, 2002)
- Exploration optimal
- Prediction: Regulatory fit should perform better
- Are people in regulatory fit less sensitive to local changes in payoffs? If they are, they will be able to overcome the local minimum
78Preliminary Results
Model development in progress