Mining Airfare Data to Minimize Ticket Purchase Price presentation

Title: Mining Airfare Data to Minimize Ticket Purchase Price

1
Mining Airfare Data to Minimize Ticket Purchase
Price

Oren Etzioni
Craig Knoblock
Alex Yates
Rattapoom Tuchinda

2
Outline

Introduction and Motivation
Data Mining Methods for Airfare Data
Experiments
Conclusion

3
Introduction

Corporations often use complex policies to vary
product prices over time.
The airline industry often uses dynamic pricing
strategies to maximize its revenue (based on
seasons, seats available, and the pricing of other
airlines).
In 1988, Bell Laboratories patented a
mathematical formula that could perform rapid
calculations on fare problems with thousands of
variables.

4
Introduction Cont.

Airlines have learned to more effectively price
their product, instead of attacking each other to
gain market share.
Today, airlines use sophisticated software to
track their competitors' pricing history and
propose adjustments that optimize their overall
revenue.

5
Pricing Strategies

Prices of flights that depart around holidays or
weekends appear to change more.
7- and 14-day advance purchase (airlines
usually increase ticket prices two weeks before
the departure date, and ticket prices are at a
maximum on the departure date).
Weekend Boosts

6
Example

Price change over time for American Airlines
flight 192223, LAX-BOS, departing on Jan. 2.
(This example shows the rapid price change in
the days prior to the New Year.)

7
Categories of airlines

Big players (United Airlines, American Airlines)
Smaller airlines that concentrate on selling
low-price tickets (AirTran, Southwest)
Different types of tickets (economy, business,
first class, restricted or unrestricted)

8
Is Airfare Prediction Possible?

Airlines have tons of historical data.
Airlines have many fare classes for seats on the
same flight and use different sales channels (e.g.,
travel agents, priceline.com).
Airfare price information is becoming increasingly
available on the Web.
But some information is not available (e.g., the
number of unsold seats on a particular flight).

9
Motivation

The goal of this paper is to learn whether to buy
a ticket or wait at a particular time point, for
a particular flight, given the current price
history recorded.

10
Advisor Model

Consumer wants to buy a ticket.
Hamlet: "buy" (this is a good price).
Or: "wait" (a better price will emerge).
Notify consumer when price drops.

11
Will Flights sell out?

Watch the number of empty seats.
Upgrade to business class.
Place on another flight
In our experiment upgrades were sufficient

12
Flight Data

Airlines typically use the same flight number
(e.g., NW17) to refer to multiple flights with
the same route that depart at the same time on
different dates.
In this paper, a particular flight refers to
a combination of its flight number and date.
In the training dataset, the same class is the set
of states with the same flight number and the same
hours before takeoff but different departure dates.
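The two definitions above can be sketched in Python. This is only an illustration: the State record, its field names, and the sample dates and prices are assumptions made for the example, not values from the paper's dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class State:
    """One price observation (hypothetical record layout)."""
    flight_number: str          # e.g. "NW17", shared across departure dates
    departure_date: date
    hours_before_takeoff: int
    price: float

    @property
    def flight(self):
        # A particular flight is its flight number plus its departure date.
        return (self.flight_number, self.departure_date)


def group_by_class(states):
    """Group states with the same flight number and hours before takeoff."""
    classes = defaultdict(list)
    for s in states:
        classes[(s.flight_number, s.hours_before_takeoff)].append(s)
    return dict(classes)


# Illustrative observations only.
states = [
    State("NW17", date(2003, 1, 2), 48, 310.0),
    State("NW17", date(2003, 1, 3), 48, 295.0),
    State("NW17", date(2003, 1, 3), 24, 340.0),
]
classes = group_by_class(states)
```

The first two states belong to different flights (different dates) but fall into the same class, because they share a flight number and the same hours before takeoff.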

13
Data Set

Used Fetch.com's data collection infrastructure.
Collected over 12,000 price observations
over a 41-day period, every 3 hours.
Lowest available fare for a one-week roundtrip.
LAX-BOS and SEA-IAD.
6 airlines including American, United, etc.

14
Learning Task Formulation

Input: price observation data (we built a flight
data collection agent that runs at a scheduled
interval, extracts the pricing data, and stores
the results in a database).
Algorithm: label observations, then run the
learner (Hamlet).
Output: for each purchase point, buy vs. wait.

15
Formulation cont.

Want to use the latest information.
Run learner daily to produce new model.
Learner is trained on data gathered to date.
Learned policy is a collection of these models.
Our experiments evaluate the savings on the
flights.
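The daily retraining scheme can be sketched as follows. The learner here (train_mean_price_model) is a hypothetical stand-in, not Hamlet, and the choice to train on observations strictly before each day is an assumption about what "gathered to date" means.

```python
def train_mean_price_model(observations):
    """Hypothetical stand-in learner: compares a price to the mean seen so far."""
    prices = [price for _, price in observations]
    mean = sum(prices) / len(prices)
    return lambda current_price: "buy" if current_price <= mean else "wait"


def learn_policy(observations, days, train=train_mean_price_model):
    """Return {day: model}; each model is trained on data gathered before that day."""
    policy = {}
    for day in days:
        seen = [(d, p) for d, p in observations if d < day]
        if seen:  # no model until at least one observation exists
            policy[day] = train(seen)
    return policy


# (day, price) pairs; illustrative values only.
observations = [(0, 300.0), (1, 320.0), (2, 280.0), (3, 400.0)]
policy = learn_policy(observations, days=range(1, 5))
```

The learned policy is the whole dictionary: a purchase decision on a given day uses that day's model, so later days benefit from more training data.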

16
Candidate Approaches

By hand: an expert looks at the data.
Time series:
Not effective at price jumps!
Reinforcement learning: Q-learning.
Used in computational finance.
Rule learning: Ripper.
17
RIPPER

RIPPER (Repeated Incremental Pruning to Produce
Error Reduction) is an efficient rule learning
algorithm that can process noisy datasets
containing thousands of examples.
First, the algorithm partitions the examples into a
growing set and a pruning set. Next, a rule is
grown by adding features and then pruned. The
process repeats until RIPPER outputs the final
rule set.
RIPPER is well suited to two-class learning
problems.
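The grow-then-prune loop can be illustrated with a heavily simplified sketch. It is not the real RIPPER: as assumptions for brevity, it uses plain precision instead of an information-gain criterion when growing, a fixed interleaved split instead of a random partition, equality tests only, and it omits RIPPER's stopping criterion and rule-set optimization passes. All function names are hypothetical.

```python
def covers(rule, x):
    """A rule is a list of (feature, value) tests that must all hold."""
    return all(x[f] == v for f, v in rule)


def precision(rule, data):
    labels = [y for x, y in data if covers(rule, x)]
    return sum(labels) / len(labels) if labels else 0.0


def grow(grow_set):
    """Greedily add the test that most improves precision on the growing set."""
    candidates = sorted({(f, v) for x, _ in grow_set for f, v in x.items()})
    rule = []
    while precision(rule, grow_set) < 1.0:
        unused = [c for c in candidates if c not in rule]
        best = max(unused, key=lambda c: precision(rule + [c], grow_set), default=None)
        if best is None or precision(rule + [best], grow_set) <= precision(rule, grow_set):
            break
        rule.append(best)
    return rule


def prune(rule, prune_set):
    """Keep the rule prefix that maximizes (p - n) / (p + n) on the pruning set."""
    def score(r):
        labels = [y for x, y in prune_set if covers(r, x)]
        p = sum(labels)
        n = len(labels) - p
        return (p - n) / (p + n) if labels else -1.0
    if not rule:
        return rule
    keep = max(range(1, len(rule) + 1), key=lambda k: (score(rule[:k]), -k))
    return rule[:keep]


def learn_rules(data, max_rules=10):
    remaining, rules = list(data), []
    while any(y for _, y in remaining) and len(rules) < max_rules:
        grow_set = [ex for i, ex in enumerate(remaining) if i % 3 != 2]
        prune_set = [ex for i, ex in enumerate(remaining) if i % 3 == 2]
        rule = prune(grow(grow_set), prune_set or grow_set)
        if not rule or precision(rule, remaining) <= 0.5:
            break  # the new rule is no better than guessing; stop
        rules.append(rule)
        remaining = [ex for ex in remaining if not covers(rule, ex[0])]
    return rules


# Toy two-class data: label True ("wait") exactly when the trend is "falling".
data = [({"airline": a, "trend": t}, t == "falling")
        for a in ("American", "United")
        for t in ("falling", "rising")
        for _ in range(3)]
rules = learn_rules(data)
```

On this toy data the sketch learns a single rule that tests only the trend feature, since the airline feature carries no information about the label.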

18
RIPPER Cont.

In this study, features include price, airline,
route, hours-before-takeoff, etc.
Learned 20-30 rules

19
Simple Time Series

Predict the price using a fixed window of k price
observations weighted by α.
We used a linearly increasing function for α.
The time series model makes its decision based on
a one-step prediction of the ticket price change:
IF p_{t+1} > p_t THEN buy, ELSE wait.
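A minimal sketch of this predictor, assuming the linearly increasing weights are simply 1, 2, ..., k over the window (the slide does not give the exact weighting function) and that the prediction is the weighted average of the window. The function names and the default k = 3 are illustrative.

```python
def predict_next_price(prices, k=3):
    """Weighted moving average over the last k prices, newest weighted most."""
    window = prices[-k:]
    weights = range(1, len(window) + 1)  # linearly increasing: 1, 2, ..., k
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)


def time_series_decision(prices, k=3):
    """Buy if the one-step prediction exceeds the current price, else wait."""
    return "buy" if predict_next_price(prices, k) > prices[-1] else "wait"


rising = [300, 300, 360]   # prediction 330 is below the current 360
falling = [400, 400, 340]  # prediction 370 is above the current 340
decisions = (time_series_decision(rising), time_series_decision(falling))
```

Because the prediction is a smoothed average, it lags behind sudden jumps, which is consistent with the earlier remark that time series methods are not effective at price jumps.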

20
Reinforcement Learning

Reinforcement learning is learning what to
do---how to map situations to actions---so as to
maximize a numerical reward signal.
Supervised learning is learning from examples
provided by some knowledgeable external
supervisor.

21
Q-learning

Q-learning is a method for addressing the task of
reinforcement learning. The main idea is to
approximate the function assigning each
state-action pair the highest possible payoff.
Q is that function; a separate transition function
maps each state-action pair to a successor state.
S denotes a finite set of states.
A denotes a finite set of actions.
R denotes a reward function.

22
Q-learning Cont.

Standard Q-learning formula:
Q(s, a) = R(s, a) + γ max_{a'} Q(s', a')
s' is the state resulting from taking action a in
state s.
γ is the discount factor for future rewards; in
this paper, γ = 1.
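As an illustration, the update can be applied to a single known price path with the discount factor set to 1. The reward design below is an assumption made for the sketch, not the paper's exact formulation: buying yields minus the current price and ends the episode, waiting yields zero and moves to the next observation, and waiting past the last observation incurs a hypothetical missed_flight_penalty.

```python
ACTIONS = ("buy", "wait")


def q_values(prices, gamma=1.0, missed_flight_penalty=10_000.0):
    """Repeatedly apply Q(s, a) = R(s, a) + gamma * max Q(s', a') on a price chain."""
    n = len(prices)
    q = {(s, a): 0.0 for s in range(n) for a in ACTIONS}
    for _ in range(n):  # n sweeps are enough for values to propagate along the chain
        for s in range(n):
            q[(s, "buy")] = -prices[s]  # terminal action: pay the current price
            if s + 1 < n:
                future = max(q[(s + 1, a)] for a in ACTIONS)
                q[(s, "wait")] = 0.0 + gamma * future
            else:
                q[(s, "wait")] = -missed_flight_penalty
    return q


def greedy_action(q, s):
    return max(ACTIONS, key=lambda a: q[(s, a)])


prices = [400.0, 350.0, 380.0]  # illustrative price path
q = q_values(prices)
plan = [greedy_action(q, s) for s in range(len(prices))]
```

With these illustrative prices the greedy policy waits at the first observation and buys at the second, where the price is lowest.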

23
Hand-Crafted Rule

A fairly simple policy, devised in consultation
with travel agents, used as a baseline for
comparison with the data mining algorithms:
IF ExpPrice(s0,t0) < CurPrice AND s0 > 7 days
THEN wait ELSE buy
ExpPrice(s0,t0) denotes the average over all
MinPrice(s,t) for flights in the training set
with that flight number, where MinPrice(s,t) is
the minimum price of that flight over the
interval starting from s days before departure up
until time t
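The rule can be sketched in Python, reading its two comparisons as "less than" and "greater than". As assumptions for the example, each training flight's history is a list of (days before departure, price) pairs, and both s and t are measured in days before departure; the function names mirror the slide's notation but are otherwise hypothetical.

```python
def min_price(history, s, t):
    """Minimum price observed from s days before departure down to t days before."""
    return min(price for days, price in history if t <= days <= s)


def exp_price(training_histories, s0, t0):
    """Average of MinPrice(s0, t0) over training flights with the same flight number."""
    mins = [min_price(h, s0, t0) for h in training_histories]
    return sum(mins) / len(mins)


def hand_crafted_decision(training_histories, s0, t0, cur_price):
    """Wait only if a lower price is expected and departure is over 7 days away."""
    if exp_price(training_histories, s0, t0) < cur_price and s0 > 7:
        return "wait"
    return "buy"


# Two past flights with the same flight number; illustrative prices only.
histories = [
    [(14, 400.0), (10, 350.0), (3, 500.0)],
    [(14, 420.0), (10, 370.0), (3, 520.0)],
]
early_expensive = hand_crafted_decision(histories, s0=14, t0=0, cur_price=400.0)
early_cheap = hand_crafted_decision(histories, s0=14, t0=0, cur_price=340.0)
late = hand_crafted_decision(histories, s0=5, t0=0, cur_price=600.0)
```

The third case buys even though the price is above the expected minimum, because the rule never waits inside the final 7 days.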

24
Hamlet

Stacking with three base learners
Ripper (e.g., R = wait)
Time series
Q-learning (e.g., Q = buy)
Using multiple data mining methods and combining
their outputs
Output classifies each purchase point as buy
or wait.

25
A Sample Rule Generated by Hamlet

IF hours-before-takeoff > 480
AND airline = United
AND price > 360
AND TS = buy
AND QL = wait
THEN wait
TS is the output of the Time Series algorithm,
QL is the output of Q-Learning.
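A sketch of how such a stacked rule might be evaluated, with the base learners' outputs fed in as extra features. The comparison operators are read as strict "greater than", and the fallback decision of "buy" when no rule fires is an assumption; the dictionary keys and the stub base learners are hypothetical.

```python
def sample_rule(features, ts_output, ql_output):
    """Return "wait" if the sample rule fires, otherwise None."""
    if (features["hours_before_takeoff"] > 480
            and features["airline"] == "United"
            and features["price"] > 360
            and ts_output == "buy"
            and ql_output == "wait"):
        return "wait"
    return None


def hamlet_decision(features, base_learners, rules, default="buy"):
    """Run the base learners, then apply the first rule that fires."""
    outputs = {name: learner(features) for name, learner in base_learners.items()}
    for rule in rules:
        decision = rule(features, outputs["TS"], outputs["QL"])
        if decision is not None:
            return decision
    return default  # assumed fallback when no rule matches


# Stub base learners standing in for the time series and Q-learning models.
base_learners = {"TS": lambda f: "buy", "QL": lambda f: "wait"}
expensive = {"hours_before_takeoff": 500, "airline": "United", "price": 400}
cheap = dict(expensive, price=300)
decisions = (
    hamlet_decision(expensive, base_learners, [sample_rule]),
    hamlet_decision(cheap, base_learners, [sample_rule]),
)
```

Note that the rule overrides the time series output: TS says buy, yet the combined evidence leads to wait.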

26
Ticket Purchasing Simulation

Real price data (collected from the Web)
Simulated passengers (a passenger is a person
wanting to buy a ticket on a particular flight at
a particular date and time)
Hamlet is run once per day (training data: all
data gathered in the past)

27
Saving Experiments

Savings for a simulated passenger is the
difference between the price of a ticket at the
earliest purchase point and the price of the
ticket at the point when the predictive model
recommends buying.
Net savings is savings net of both losses and
upgrade costs.
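These definitions can be written out as a short calculation. The sketch assumes that a negative price difference counts as a loss and that each passenger record is a hypothetical (earliest price, price paid, upgrade cost) triple; the numbers are illustrative.

```python
def summarize(passengers):
    """Total savings, losses, upgrade costs, and net savings over all passengers."""
    savings = losses = upgrades = 0.0
    for earliest_price, price_paid, upgrade_cost in passengers:
        diff = earliest_price - price_paid
        if diff >= 0:
            savings += diff   # the model's advice paid off
        else:
            losses += -diff   # the price rose while waiting
        upgrades += upgrade_cost
    return {
        "savings": savings,
        "losses": losses,
        "upgrade_cost": upgrades,
        "net_savings": savings - losses - upgrades,
    }


passengers = [
    (400.0, 350.0, 0.0),    # waited and saved 50
    (300.0, 330.0, 0.0),    # waited and lost 30
    (500.0, 500.0, 120.0),  # flight sold out, paid 120 to upgrade
]
totals = summarize(passengers)
```

A model can therefore show positive savings yet negative net savings if it waits too long on flights that sell out.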

28
Effectiveness

Hamlet's savings were 61.8% of optimal!
Savings: buy immediately vs. Hamlet.
Optimal: buy at the best possible time (knowing
the future price information).
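A small sketch of how a "fraction of optimal" figure can be computed for one passenger. The price path and the model's purchase point are made-up values, and the helper names are hypothetical.

```python
def optimal_savings(prices):
    """Savings of an oracle that buys at the lowest price from the first point on."""
    return prices[0] - min(prices)


def fraction_of_optimal(model_savings, optimal):
    """Share of the oracle's savings that the model captured (0 if none possible)."""
    return model_savings / optimal if optimal else 0.0


prices = [400.0, 380.0, 350.0, 420.0]  # prices[0] is the earliest purchase point
model_buy_index = 1                    # suppose the model says "buy" here
model_savings = prices[0] - prices[model_buy_index]
share = fraction_of_optimal(model_savings, optimal_savings(prices))
```

Here the oracle saves 50 and the model saves 20, so the model achieves 40 percent of optimal on this passenger.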

29
Savings by Method

Savings over "buy now".
Penalty for sell-out: upgrade cost.
Total ticket cost is $4,579,600.

30
Savings by Method Cont.

Table below shows the savings, losses, upgrade
costs, and net savings achieved in the simulation
by each predictive model

31
Sensitivity Analysis

Varying two key parameters to test the robustness
of the results to changes in the simulation:
Change the distribution of passengers requesting
flight tickets (e.g., uniform, linear
decrease/increase) to check the performance on
multiple flights in a three-hour interval.
Allow a passenger to purchase a ticket at any
time during a three-hour interval (e.g., the
passenger specifies flying in the morning,
afternoon, or evening).

32
Sensitivity Analysis Cont.
Legend: Time Series, Q-Learning, By Hand, Ripper, Hamlet, Optimal
33
Upgrade Penalty

Most algorithms avoided the costly upgrades
almost all the time. (Upgrades are shown as a
fraction of the number of test passengers; there
were 4,488 of them.)

34
Discussion

76% of the time: no savings possible.
(Prices never dropped from the earliest
purchase point until the flight departed.)
Uniform distribution over 21 days.
33% of the passengers arrived in the last week.
No passengers arrived 28 days before.
Simulation understates possible savings!

35
Savings on Feasible Flights

Comparison of Net Savings (as a percent of total
ticket price) on Feasible Flights (price saving
is possible)

36
Related Work

Trading agent competition.
Auction strategies
Temporal data mining.
Time Series.
Computational finance.

37
Future Work

More tests are necessary: international,
multi-route, hotels, etc.
Cost sensitive learning
Additional base learners
Bagging/boosting
Refined predictions
Commercialization: patent, license.

38
Conclusions

Dynamic pricing is prevalent.
Price mining a-la-Hamlet is feasible.
Price drops can be surprisingly predictable.
Need additional studies and algorithms.
Great potential to help consumers!?

39
But

Airlines may introduce noise into their pricing
patterns to fool a price miner.
Demand and supply of seats are uncertain. (Good
prediction based on good assumptions)
Who can earn most benefit in the end?
(Airlines, consumers, price miner?)

40
John Nash said
It's a GAME!
41
References

Fast Effective Rule Induction. In A. Prieditis
and S. Russell, editors, Proc. of the 12th ICML,
1995.
A General Method for Making Classifiers
Cost-Sensitive. In Proc. of the Fifth ACM SIGKDD,
1999.
Reinforcement Learning: An Introduction. MIT
Press, Cambridge, MA, 1998.
The Analysis of Time Series: An Introduction.
Chapman and Hall, London, UK, 1989.
Airlines Rely on Technology to Manipulate Fare
Structure. By Scott McCartney, Wall Street
Journal, November 3, 1997.

42
End

Thank you!!
