Title: Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs
1. Artificial Agents Play the Beer Game, Eliminate the Bullwhip Effect, and Whip the MBAs
- Steven O. Kimbrough
- D.-J. Wu
- Fang Zhong
- FMEC, Philadelphia, June 2000 (file: beergameslides.ppt)
2. The MIT Beer Game
- Players
- Retailer, Wholesaler, Distributor and Manufacturer.
- Goal
- Minimize system-wide (chain) long-run average cost.
- Information sharing: Mail.
- Demand: Deterministic.
- Costs
- Holding cost: 1.00/case/week.
- Penalty cost: 2.00/case/week.
- Lead time: 2 weeks physical delay.
3. Timing
- 1. New shipments delivered.
- 2. Orders arrive.
- 3. Fill orders plus backlog.
- 4. Decide how much to order.
- 5. Calculate inventory costs.
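The five steps above can be sketched for a single player as follows. This is a minimal illustration, not the authors' simulator: the `Stage` class, its field names, the starting inventory of 12 and the starting pipeline of two shipments of 4 are all assumptions made for the example. Only the cost rates and the two-week delay come from the game description.

```python
from collections import deque
from dataclasses import dataclass, field

HOLDING_COST = 1.00   # per case per week (from the game description)
PENALTY_COST = 2.00   # per backlogged case per week (from the game description)


@dataclass
class Stage:
    """One player in the chain (hypothetical structure, for illustration only)."""
    inventory: int = 12          # assumed starting stock
    backlog: int = 0
    # Shipments in transit; the leftmost arrives next. Two entries = 2-week delay.
    pipeline: deque = field(default_factory=lambda: deque([4, 4]))
    total_cost: float = 0.0

    def week(self, incoming_order, policy):
        """Run one week; return (cases shipped downstream, order placed upstream)."""
        # 1. New shipments delivered.
        self.inventory += self.pipeline.popleft()
        # 2. Orders arrive.
        owed = incoming_order + self.backlog
        # 3. Fill orders plus backlog.
        shipped = min(self.inventory, owed)
        self.inventory -= shipped
        self.backlog = owed - shipped
        # 4. Decide how much to order (orders cannot be negative).
        order = max(0, policy(incoming_order))
        # 5. Calculate inventory costs.
        self.total_cost += HOLDING_COST * self.inventory + PENALTY_COST * self.backlog
        return shipped, order


retailer = Stage()
for demand in [4, 4, 8, 8]:
    shipped, order = retailer.week(demand, policy=lambda d: d)  # order what was ordered
    retailer.pipeline.append(order)  # assumption: the supplier always ships in full
print(retailer.inventory, retailer.backlog, retailer.total_cost)
```

In a full four-player game the shipment appended to the pipeline would be whatever the upstream player actually managed to ship, not the full order.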
4. Game Board
5. The Bullwhip Effect
- Order variability is amplified upstream in the supply chain.
- Industry examples (P&G, HP).
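One common way to put a number on this amplification, assumed here rather than taken from the slides, is the ratio of the variance of the orders a player places to the variance of the demand it sees. The function name and both data series below are hypothetical.

```python
from statistics import pvariance


def bullwhip_ratio(orders, demand):
    """Variance of orders placed divided by variance of demand seen.

    A ratio above 1 means the player amplifies variability; a ratio of 1
    means it passes variability upstream unchanged.
    """
    return pvariance(orders) / pvariance(demand)


demand = [4, 4, 8, 8, 8, 8]               # hypothetical customer demand
overreacting = [4, 4, 12, 10, 8, 6]       # hypothetical orders that overshoot the step
pass_through = list(demand)               # orders that simply mirror demand

print(bullwhip_ratio(overreacting, demand))
print(bullwhip_ratio(pass_through, demand))
```

Computing the same ratio stage by stage (Retailer, Wholesaler, Distributor, Manufacturer) shows whether variability grows as orders move upstream.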
6. Observed Bullwhip effect from undergraduates' game playing
7. Bullwhip Effect Example (P&G)
- Lee et al., 1997, Sloan Management Review
8. Analytic Results: Deterministic Demand
- Assumptions
- Fixed lead time.
- Players work as a team.
- Manufacturer has unlimited capacity.
- 1-1 policy is optimal -- order whatever amount your customer orders from you.
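As a small illustration of the 1-1 policy, the sketch below passes a demand series up the chain through one policy per player. Order delays are deliberately ignored, and `propagate` and the policy dictionary are hypothetical names introduced for the example.

```python
def one_to_one(incoming_order):
    """1-1 policy: order exactly what your customer ordered from you."""
    return incoming_order


def propagate(customer_demand, policies):
    """Return the order series each player places, from Retailer upstream.

    Simplification (assumption): orders are passed on within the same week,
    so only the size of the orders is shown, not their timing.
    """
    orders = {}
    incoming = list(customer_demand)
    for name, policy in policies.items():
        incoming = [policy(q) for q in incoming]
        orders[name] = incoming
    return orders


chain = {
    "Retailer": one_to_one,
    "Wholesaler": one_to_one,
    "Distributor": one_to_one,
    "Manufacturer": one_to_one,
}
demand = [4, 4, 8, 8]
orders = propagate(demand, chain)
print(orders["Manufacturer"])
```

With every player on 1-1, the Manufacturer sees exactly the customer's demand series, so no variability is added along the way.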
9. Analytic Results: Stochastic Demand (Chen, 1999, Management Science)
- Additional assumptions
- Only the Retailer incurs penalty cost.
- Demand distribution is common knowledge.
- Fixed information lead time.
- Decreasing holding costs upstream in the chain.
- Order-up-to (installation base-stock) policy is optimal.
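An order-up-to rule can be sketched in its generic textbook form: each week, order enough to bring the local inventory position back to a target level. The function and parameter names are assumptions, and the target of 20 is an arbitrary example value, not a level from Chen (1999).

```python
def order_up_to(base_stock, on_hand, on_order, backlog):
    """Order enough to raise the inventory position to `base_stock`.

    Inventory position = stock on hand + stock already on order - backlog.
    Uses only the player's own (installation) quantities.
    """
    inventory_position = on_hand + on_order - backlog
    return max(0, base_stock - inventory_position)


print(order_up_to(base_stock=20, on_hand=8, on_order=6, backlog=0))    # position 14
print(order_up_to(base_stock=20, on_hand=0, on_order=6, backlog=4))    # position 2
print(order_up_to(base_stock=20, on_hand=15, on_order=10, backlog=0))  # position 25
```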
10. Agent-Based Approach
- Agents work as a team.
- No agent has knowledge of the demand distribution.
- No information sharing among agents.
- Agents learn via genetic algorithms.
- Fixed or stochastic lead time.
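The slides do not specify the genetic algorithm's operators, so the following is only a generic sketch of how an agent could search over bit-string rules. Truncation selection, one-point crossover, bit-flip mutation, elitism, all parameter values and the toy cost function are assumptions. In the real setting the cost would be a simulated long-run average cost.

```python
import random


def evolve(cost, n_bits=5, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimize `cost` over bit lists with a basic generational GA (assumed design)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]           # truncation selection
        children = [pop[0][:]]                   # elitism: keep the best rule so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return min(pop, key=cost)


def toy_cost(bits):
    """Placeholder cost: the magnitude encoded by all bits after the first."""
    return int("".join(map(str, bits[1:])), 2)


best = evolve(toy_cost)
print(best, toy_cost(best))
```

Because the best rule is copied unchanged into each new generation, the best cost in the population never gets worse from one generation to the next.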
11. Research Questions
- Can the agents track the demand?
- Can the agents eliminate the Bullwhip effect?
- Can the agents discover the optimal policies if they exist?
- Can the agents discover reasonably good policies under complex scenarios where analytical solutions are not available?
12. Flowchart
13. Agents' Coding Strategy
- Bit-string representation with fixed length n.
- Leftmost bit represents the sign, + or -.
- The remaining bits represent how much to order.
- Rule x+1 means: if demand is x, then order x+1.
- Rule search space is 2^(n-1) - 1.
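The encoding can be illustrated with a short decoder. Two points are assumptions, since the slide does not state them: a leftmost bit of 1 is taken to mean "+", and an order that would be negative is clamped to zero. The function names are hypothetical.

```python
def decode_rule(bits):
    """Return the offset k of the rule 'if demand is x, order x + k'."""
    sign = 1 if bits[0] == "1" else -1    # assumption: '1' encodes '+', '0' encodes '-'
    magnitude = int(bits[1:], 2)          # remaining bits: how much to add or subtract
    return sign * magnitude


def apply_rule(bits, demand):
    """Order quantity for the given demand (assumption: never below zero)."""
    return max(0, demand + decode_rule(bits))


print(decode_rule("10001"))      # rule x+1
print(decode_rule("00011"))      # rule x-3
print(apply_rule("10001", 7))    # demand 7 under rule x+1
print(2 ** (5 - 1) - 1)          # largest offset magnitude for n = 5
```

With n = 5, the four magnitude bits cover offsets from 0 to 15 in either direction.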
14. Experiment 1a: First Cup
- Environment
- Deterministic demand with fixed lead time.
- Fix the policy of Wholesaler, Distributor and Manufacturer to be 1-1.
- Only the Retailer agent learns.
- Result: Retailer Agent finds 1-1.
15. Experiment 1b
- All four Agents learn under the environment of experiment 1a.
- Über rule for the team.
- All four agents find 1-1.
16. Result of Experiment 1b
- All four agents can find the optimal 1-1 policy.
17. Artificial Agents Whip the MBAs and Undergraduates in Playing the MIT Beer Game
18. Stability (Experiment 1b)
- Fix any three agents to be 1-1, and allow the fourth agent to learn.
- The fourth agent minimizes its own long-run average cost rather than the team cost.
- No agent has any incentive to deviate once the others are playing 1-1.
- Therefore 1-1 is apparently a Nash equilibrium.
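The stability test described above amounts to a unilateral-deviation check, sketched here in generic form. The `is_nash` helper is a hypothetical name, and `toy_own_cost` is a placeholder, not the game's cost. A real check would plug in each agent's simulated long-run average cost for the full rule profile.

```python
def is_nash(profile, candidates, own_cost):
    """True if no agent can lower its own cost by changing only its own rule."""
    for i in range(len(profile)):
        baseline = own_cost(i, profile)
        for alternative in candidates:
            trial = profile[:i] + (alternative,) + profile[i + 1:]
            if own_cost(i, trial) < baseline:
                return False    # agent i has an incentive to deviate
    return True


def toy_own_cost(i, profile):
    """Placeholder: over-ordering by k costs 1.00 * k, under-ordering 2.00 * |k|."""
    k = profile[i]              # offset of agent i's rule from 1-1
    return 1.00 * k if k >= 0 else 2.00 * -k


offsets = range(-3, 4)          # candidate rules x-3 ... x+3
print(is_nash((0, 0, 0, 0), offsets, toy_own_cost))
print(is_nash((1, 0, 0, 0), offsets, toy_own_cost))
```

Note that a search over a finite candidate set can only show that no tested deviation pays, which matches the cautious wording "apparently" on the slide.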
19. Experiment 2: Second Cup
- Environment
- Demand uniformly distributed on [0, 15].
- Fixed lead time.
- All four Agents make their own decisions as in experiment 1b.
- Agents eliminate the Bullwhip effect.
- Agents find better policies than 1-1.
20. Artificial agents eliminate the Bullwhip effect.
21. Artificial agents discover a better policy than 1-1 when facing stochastic demand with penalty costs for all players.
22. Experiment 3: Third Cup
- Environment
- Lead time uniformly distributed on [0, 4].
- The rest as in experiment 2.
- Agents find better policies than 1-1.
- No Bullwhip effect.
- The policies discovered by agents are Nash.
23. Artificial agents discover stable policies that are better than 1-1 when facing stochastic demand and stochastic lead time.
24. Artificial agents are able to eliminate the Bullwhip effect when facing stochastic demand with stochastic lead time.
25. Agents' learning
26. The Columbia Beer Game
- Environment
- Information lead time: (2, 2, 2, 0).
- Physical lead time: (2, 2, 2, 3).
- Initial conditions set as in Chen (1999).
- Agents find the optimal policy: order whatever is ordered, with a time shift, i.e.,
- Q_1(t) = D(t-1), Q_i(t) = Q_{i-1}(t - l_{i-1}).
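Reading the rule above as Q_1(t) = D(t-1) and Q_i(t) = Q_{i-1}(t - l_{i-1}), where l_i is the information lead time of stage i, the resulting order streams can be sketched as below. The function name is hypothetical, and the value used for weeks before the start (`initial=0`) is an assumption. The slide's actual initial conditions follow Chen (1999) and are not reproduced here.

```python
INFO_LEAD = (2, 2, 2, 0)   # information lead times l_1..l_4 from the slide


def time_shifted_orders(demand, info_lead=INFO_LEAD, initial=0):
    """Return orders[i][t]: the order of stage i+1 in week t under the time-shift rule."""
    weeks = len(demand)

    def at(series, t):
        # Weeks before the start of the run use the assumed initial value.
        return series[t] if t >= 0 else initial

    # Stage 1 orders last week's customer demand: Q_1(t) = D(t-1).
    orders = [[at(demand, t - 1) for t in range(weeks)]]
    # Stage i repeats stage i-1's order, delayed by that stage's information lead time.
    for i in range(1, len(info_lead)):
        previous = orders[i - 1]
        orders.append([at(previous, t - info_lead[i - 1]) for t in range(weeks)])
    return orders


demand = [5, 3, 7, 4, 6, 2, 8, 1]      # hypothetical demand series
for stage, series in enumerate(time_shifted_orders(demand), start=1):
    print(stage, series)
```

Each stage's series is an exact, delayed copy of the customer demand, so order sizes are not amplified as they move upstream.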
27. Ongoing Research: More Beer
- Value of information sharing.
- Coordination and cooperation.
- Bargaining and negotiation.
- Alternative learning mechanisms: Classifier systems.
28. Summary
- Agents are capable of playing the Beer Game
- Track demand.
- Eliminate the Bullwhip effect.
- Discover the optimal policies if they exist.
- Discover good policies under complex scenarios where analytical solutions are not available.
- Intelligent and agile supply chain.
- Multi-agent enterprise modeling.
29. A framework for multi-agent intelligent enterprise modeling