Title: Mining Airfare Data to Minimize Ticket Purchase Price
 1Mining Airfare Data to Minimize Ticket Purchase Price
- Oren Etzioni 
 - Craig Knoblock 
 - Alex Yates 
 - Rattapoom Tuchinda
 
  2Outline
- Introduction and Motivation 
 - Data Mining Methods for Airfare Data 
 - Experiments 
 - Conclusion 
 
  3Introduction
- Corporations often use complex policies to vary product prices over time.
 - The airline industry often uses dynamic pricing strategies to maximize its revenue (based on season, seats available, and the pricing of other airlines).
 - In 1988, Bell Laboratories patented a mathematical formula that could perform rapid calculations on fare problems with thousands of variables.
  4Introduction Cont.
- Airlines have learned to price their product more effectively, instead of attacking each other to gain market share.
 - Today, airlines use sophisticated software to track their competitors' pricing history and propose adjustments that optimize their overall revenue.
  5Pricing Strategies
- Prices of flights that depart around holidays or weekends appear to change more.
 - 7- and 14-day advance purchase (airlines usually increase the ticket price two weeks before the departure date, and ticket prices are at a maximum on the departure date).
 - Weekend boosts
 
  6Example
- Price change over time for American Airlines flight 192223, LAX-BOS, departing on Jan. 2.
 -  (This example shows the rapid price change in the days prior to the New Year.)
  7Categories of airlines
- Big players (United Airlines, American Airlines)
 - Smaller airlines that concentrate on selling low-price tickets (AirTran, Southwest)
 - Different types of tickets (economy, business, first class, restricted or unrestricted)
  8Is Airfare Prediction Possible?
- Airlines have tons of historical data.
 - Airlines have many fare classes for seats on the same flight and use different sales channels (e.g., travel agents, priceline.com).
 - Airfare price information is becoming increasingly available on the Web.
 - But some information is not available (e.g., the number of unsold seats on a particular flight).
  9Motivation 
- The goal of this paper is to learn whether to buy 
a ticket or wait at a particular time point, for 
a particular flight, given the current price 
history recorded. 
  10Advisor Model
- Consumer wants to buy a ticket.
 - Hamlet: buy (this is a good price).
 - Or: wait (a better price will emerge).
 - Notify consumer when price drops.
 
  11Will Flights sell out?
- Watch the number of empty seats. 
 - Upgrade to business class. 
 - Place on another flight 
 - In our experiment upgrades were sufficient
 
  12Flight Data 
- Airlines typically use the same flight number (e.g., NW17) to refer to multiple flights with the same route that depart at the same time on different dates.
 - In this paper, a particular flight is identified by the combination of its flight number and date.
 - In the training dataset, a class is the set of states with the same flight number and the same hours before takeoff but different departure dates.
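The grouping described above can be sketched in Python. This is only an illustration: the record fields (`flight_number`, `departure_date`, `hours_before_takeoff`, `price`) and the sample values are hypothetical, and it assumes "hours" means hours before takeoff.

```python
from collections import defaultdict


def group_into_classes(observations):
    """Group price observations that share a flight number and hours before takeoff."""
    classes = defaultdict(list)
    for obs in observations:
        # The departure date is deliberately left out of the key, so
        # observations from different departure dates land in the same class.
        key = (obs["flight_number"], obs["hours_before_takeoff"])
        classes[key].append(obs)
    return dict(classes)


# Hypothetical observations: one flight number, two departure dates.
observations = [
    {"flight_number": "NW17", "departure_date": "day 1", "hours_before_takeoff": 48, "price": 350},
    {"flight_number": "NW17", "departure_date": "day 2", "hours_before_takeoff": 48, "price": 370},
    {"flight_number": "NW17", "departure_date": "day 1", "hours_before_takeoff": 24, "price": 400},
]
classes = group_into_classes(observations)
```

Here the two observations taken 48 hours before takeoff fall into one class even though they belong to different flights (different dates).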
  13Data Set 
- Used Fetch.com's data collection infrastructure.
 - Collected over 12,000 price observations 
 - 41 day period, every 3 hours. 
 - Lowest available fare for a one-week roundtrip. 
 - LAX-BOS and SEA-IAD. 
 - 6 airlines including American, United, etc. 
 
  14Learning Task Formulation
- Input: price observation data (built a flight data collection agent that runs at a scheduled interval, extracts the pricing data, and stores the results in a database).
 - Algorithm: label observations, run learner (Hamlet).
 - Output: for each purchase point,
 -  buy vs. wait
 
  15Formulation cont.
- Want to use the latest information. 
 - Run learner daily to produce new model. 
 - Learner is trained on data gathered to date. 
 - Learned policy is a collection of these models. 
 - Our experiments evaluate the savings on the 
flights.  
  16Candidate Approaches
- By hand: an expert looks at the data.
 - Time series
 - Not effective at price jumps!
 - Reinforcement learning: Q-learning.
 - Used in computational finance.
 - Rule learning: Ripper,
 
  17RIPPER
- RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is an efficient rule learning algorithm that can process noisy datasets containing thousands of examples.
 - First, the algorithm partitions the examples into a growing set and a pruning set. Next, a rule is grown by adding features and then pruned. The process repeats until RIPPER outputs the final rule set.
 - RIPPER is well suited to two-class learning problems.
  18RIPPER Cont.
- In this study, features include price, airline, route, hours-before-takeoff, etc.
 - Learned 20-30 rules
 
  19Simple Time Series
- Predict price using a fixed window of $k$ price observations weighted by $\alpha$.
 - We used a linearly increasing function for $\alpha$.
 - The time series model makes its decision based on a one-step prediction of the ticket price change:
 -  IF $P_{t+1} > P_t$ THEN buy, ELSE wait.
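A minimal Python sketch of this time series rule. It assumes the prediction is a weighted average of the last k prices, with weights 1, 2, ..., k as one possible linearly increasing choice; the function names and sample fares are hypothetical.

```python
def predict_next_price(prices, k):
    """One-step prediction: weighted average of the last k observed prices."""
    window = prices[-k:]
    # Assumed linearly increasing weights: 1 for the oldest price in the
    # window, len(window) for the most recent one.
    weights = range(1, len(window) + 1)
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)


def time_series_advice(prices, k):
    """Return 'buy' if the predicted next price exceeds the current price, else 'wait'."""
    predicted = predict_next_price(prices, k)
    return "buy" if predicted > prices[-1] else "wait"


# Hypothetical fares, oldest first.
print(time_series_advice([300, 320, 340, 330], k=3))
print(time_series_advice([340, 320, 300, 310], k=3))
```

Because the prediction averages recent prices, this sketch only says buy when the current price sits below the recent weighted average, which is one reason such a model reacts poorly to sudden price jumps.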
 
  20Reinforcement Learning 
- Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.
 - Supervised learning is learning from examples provided by some knowledgeable external supervisor.
  21Q-learning
- Q-learning is a method that addresses the task of reinforcement learning. The main idea is to approximate the function Q that assigns each state-action pair the highest possible payoff.
 - The transition function maps each state-action pair to a successor state
 - S denotes a finite set of states
 - A denotes a finite set of actions
 - R denotes a reward function
 
  22Q-learning Cont.
- Standard Q-learning formula: $Q(a, s) = R(s, a) + \gamma \max_{a'} Q(a', s')$
 - $s'$ is the state resulting from taking action $a$ in state $s$.
 - $\gamma$ is the discount factor for future rewards; in this paper, $\gamma = 1$.
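To make the recursion concrete, here is a hedged Python sketch that applies the Q-learning update with a discount factor of 1 by backward induction over a single observed price history. The reward design is an assumption for illustration: buying is worth the negative of the price paid, waiting has no immediate reward, and waiting at the last purchase point incurs a hypothetical `sell_out_penalty`. The names `q_values`, `q_advice`, and `gamma` are not from the slides.

```python
def q_values(prices, sell_out_penalty=-10000.0, gamma=1.0):
    """Compute Q(buy, t) and Q(wait, t) for every purchase point t."""
    n = len(prices)
    # Assumed reward for buying: the negative of the price paid.
    q_buy = [-float(p) for p in prices]
    q_wait = [0.0] * n
    # Assumption: waiting at the last purchase point means no ticket at all.
    q_wait[-1] = sell_out_penalty
    for t in range(n - 2, -1, -1):
        # Waiting has no immediate reward; its value is the discounted
        # value of the best action available at the next purchase point.
        q_wait[t] = gamma * max(q_buy[t + 1], q_wait[t + 1])
    return q_buy, q_wait


def q_advice(prices):
    """Recommend the action with the higher Q value at each purchase point."""
    q_buy, q_wait = q_values(prices)
    return ["buy" if b >= w else "wait" for b, w in zip(q_buy, q_wait)]


print(q_advice([300, 250, 400]))  # hypothetical fares
```

With these hypothetical fares, waiting at the first point is worth -250 (the best later price), which beats buying at 300, so the sketch recommends waiting once and then buying.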
  23Hand-Crafted Rule
- A fairly simple policy, devised after consulting with travel agents and used as a baseline for comparison with the data mining algorithms:
 -  IF ExpPrice(s0, t0) < CurPrice AND s0 > 7 days
 -  THEN wait ELSE buy
 -  ExpPrice(s0, t0) denotes the average over all MinPrice(s, t) for flights in the training set with that flight number, where MinPrice(s, t) is the minimum price of that flight over the interval starting from s days before departure up until time t.
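The hand-crafted rule can be sketched in Python as follows. This is an illustration under stated assumptions: each training flight's history is a hypothetical dict mapping days before departure to price, and t is also expressed in days before departure, so the interval covers all days d with t <= d <= s.

```python
from statistics import mean


def min_price(history, s, t):
    """Minimum price from s days before departure up until t days before departure."""
    return min(price for days, price in history.items() if t <= days <= s)


def exp_price(training_histories, s0, t0):
    """Average of MinPrice(s0, t0) over training flights with the same flight number."""
    return mean(min_price(h, s0, t0) for h in training_histories)


def hand_rule(training_histories, cur_price, s0, t0):
    """IF ExpPrice(s0, t0) < CurPrice AND s0 > 7 days THEN wait ELSE buy."""
    if exp_price(training_histories, s0, t0) < cur_price and s0 > 7:
        return "wait"
    return "buy"


# Hypothetical histories for two past flights with the same flight number.
training = [
    {14: 400, 10: 350, 7: 380},
    {14: 420, 10: 390, 7: 370},
]
print(hand_rule(training, cur_price=400, s0=14, t0=7))
print(hand_rule(training, cur_price=340, s0=14, t0=7))
```

In the first call the expected minimum (360) is below the current price and departure is more than 7 days away, so the rule waits; in the second the current price already beats the expected minimum, so it buys.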
  24Hamlet
- Stacking with three base learners
 - Ripper (e.g., R=wait)
 - Time series
 - Q-learning (e.g., Q=buy)
 - Combines the outputs of multiple data mining methods
 - Output: classifies each purchase point as buy or wait.
  25A Sample Rule Generated by Hamlet
- IF hours-before-takeoff > 480
 -  AND airline = United
 -  AND price > 360
 -  AND TS = buy
 -  AND QL = wait
 -  THEN wait
 -  TS is the output of the Time Series algorithm; QL is the output of Q-Learning.
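As an illustration, the sample rule can be written as a Python predicate over the raw features and the base learners' outputs. The function name and the convention of returning `None` when the rule does not fire are assumptions; the thresholds are read from the rule, taking "gt" as "greater than".

```python
def sample_rule(hours_before_takeoff, airline, price, ts, ql):
    """Return 'wait' if the sample rule fires, else None (rule does not apply)."""
    if (
        hours_before_takeoff > 480
        and airline == "United"
        and price > 360
        and ts == "buy"    # output of the Time Series learner
        and ql == "wait"   # output of the Q-Learning learner
    ):
        return "wait"
    return None


print(sample_rule(500, "United", 400, ts="buy", ql="wait"))
print(sample_rule(500, "United", 300, ts="buy", ql="wait"))
```

Note that the rule overrides the time series learner's "buy" here: stacking lets the combined model learn when a base learner's output should not be trusted.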
  26Ticket Purchasing Simulation
- Real price data (collected from the Web)
 - Simulated passengers (a passenger is a person wanting to buy a ticket on a particular flight at a particular date and time)
 - Hamlet is run once per day (training data: all data gathered in the past)
  27Saving Experiments
- Savings for a simulated passenger is the difference between the price of a ticket at the earliest purchase point and the price of the ticket at the point when the predictive model recommends buying.
 - Net savings is savings net of both losses and upgrade costs.
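These definitions can be sketched in Python. The sketch assumes a passenger arrives at the first purchase point, follows the model's advice, and is forced to buy at the last point if no "buy" was recommended (a simplification; the function names and sample fares are hypothetical). A negative difference is counted as a loss.

```python
def simulate_passenger(prices, advice):
    """Return (savings, loss) for one passenger.

    prices[i] is the fare at purchase point i; advice[i] is 'buy' or 'wait'.
    """
    first_price = prices[0]
    for price, action in zip(prices, advice):
        if action == "buy":
            paid = price
            break
    else:
        # Assumption: with no 'buy' advice, the passenger buys at the last point.
        paid = prices[-1]
    diff = first_price - paid
    return (diff, 0) if diff >= 0 else (0, -diff)


def net_savings(outcomes, upgrade_costs=0):
    """Net savings: total savings minus total losses minus upgrade costs."""
    total_savings = sum(s for s, _ in outcomes)
    total_losses = sum(l for _, l in outcomes)
    return total_savings - total_losses - upgrade_costs


outcomes = [
    simulate_passenger([300, 250, 400], ["wait", "buy", "buy"]),   # price dropped
    simulate_passenger([300, 320, 350], ["wait", "wait", "buy"]),  # price rose
]
print(outcomes, net_savings(outcomes, upgrade_costs=10))
```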
  28Effectiveness
-  Hamlet's savings were 61.8% of optimal!
 -  Savings: buy immediately vs. Hamlet.
 -  Optimal: buy at the best possible time (knowing the future price information)
  29Savings by Method
- Savings over "buy now".
 - Penalty for sell-out = upgrade cost.
 - Total ticket cost is $4,579,600.
 
  30Savings by Method Cont.
- Table below shows the savings, losses, upgrade 
costs, and net savings achieved in the simulation 
by each predictive model 
  31Sensitivity Analysis
- Vary two key parameters to test the robustness of the results to changes in the simulation.
 - Change the distribution of passengers requesting flight tickets (e.g., uniform, linearly decreasing or increasing) to check the performance on multiple flights in a three-hour interval.
 - Allow a passenger to purchase a ticket at any time during a three-hour interval (e.g., the passenger specifies flying in the morning, afternoon, or evening).
  32Sensitivity Analysis Cont.
Legend: Time Series, Q-Learning, By Hand, Ripper, Hamlet, Optimal
 33Upgrade Penalty
- Most algorithms avoided the costly upgrades almost all the time. (Upgrades are measured as a fraction of the number of test passengers; there were 4,488 of them.)
  34Discussion
- 76% of the time, no savings were possible.
 -  (Prices never dropped from the earliest purchase point until the flight departed.)
 - Uniform distribution over 21 days.
 - 33% of the passengers arrived in the last week.
 - No passengers arrived 28 days before departure.
 - Simulation understates possible savings!
 
  35Savings on Feasible Flights
- Comparison of Net Savings (as a percent of total 
ticket price) on Feasible Flights (price saving 
is possible)  
  36Related Work
- Trading agent competition. 
 - Auction strategies 
 - Temporal data mining. 
 - Time Series. 
 - Computational finance. 
 
  37Future Work
- More tests are necessary: international, multi-routes, hotels, etc.
 - Cost-sensitive learning
 - Additional base learners
 - Bagging/boosting
 - Refined predictions
 - Commercialization: patent, license.
 
  38Conclusions
- Dynamic pricing is prevalent. 
 - Price mining a-la-Hamlet is feasible. 
 - Price drops can be surprisingly predictable. 
 - Need additional studies and algorithms. 
 - Great potential to help consumers!? 
 
  39But
- Airlines may introduce noise into their pricing patterns to fool a price miner.
 - Demand and supply of seats are uncertain. (Good predictions depend on good assumptions.)
 - Who benefits most in the end?
 -  (Airlines, consumers, price miner?)
 
  40John Nash said
 It's a GAME!
 41References
- Fast Effective Rule Induction. In A. Prieditis and S. Russell, editors, Proc. of the 12th ICML, 1995.
 - A General Method for Making Classifiers Cost-Sensitive. In Proc. of the Fifth ACM SIGKDD, 1999.
 - Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
 - The Analysis of Time Series: An Introduction. Chapman and Hall, London, UK, 1989.
 - Airlines Rely on Technology To Manipulate Fare Structure. By Scott McCartney, Wall Street Journal, November 3, 1997.
  42End