Data Mining to Improve Forecast Accuracy in the Airline Business

1
Data Mining to Improve Forecast Accuracy in the Airline Business
Hans Feyen, Christoph Hüglin
Atraxis AG, CKCB/Data Mining and Analysis
CH-8058 Zürich Airport
Tel. 41 1 812 45 45
www.atraxis.com
2
Outline
  • What is Data Mining
    • Process Oriented
  • Overview of problems tackled using Data Mining
  • Forecasting of No Show passengers based on PNR information
    • Methodology
    • Preliminary Results
  • Forecasting of Group Utilisation Ratios at time of request
    • Methodology
    • Results

3
Data Mining is the Process of Discovering hidden
Information in Data
Data Mining maximizes the value of a Data
Warehouse
4
Key Success Factors of a Data Mining Project
  • Business Knowledge
    • Business specialists and Data Mining specialists work together
  • Communication and Co-operation with Customers
    • Data mining is used to discover patterns and relationships in data in order to help you make better business decisions
  • Data Handling (Extracting, Deployment)
    • Efficient and system-independent handling (extracting, merging, filtering, aggregating) of distributed data is absolutely necessary
  • Methodological Knowledge
    • Select the right approach and method for each problem
  • Data Quality
    • Garbage in, garbage out

5
Customer benefits of a Data Mining Project
Data Mining: better exploitation of information
  • Forecast/Prediction
  • Recognition
  • Deviation
Improve Airline / Marketing Profitability: better decisions and more targeted marketing actions
6
Case 1: PNR Based No Show Forecast
  • Traditionally, No Show rates are forecast at segment level using historical flight information.
  • The probability that a passenger is a No Show depends on individual passenger behavior.
  • In our approach, we use passenger information (available via the PNR) and schedule information to better exploit the available data and to obtain more accurate forecasts.

Standard Method: Historical Flights
Our Approach: Historical Flights, Booking DW, PAX, Schedule Info
7
Case 1: PNR Based No Show Forecast
  • Data Source
    • We extracted two random samples from a Bookings Data Warehouse.
    • The first sample served as a training sample to set up the forecasting models.
    • The second sample served as a test sample to validate the forecasting models.
    • Each contained about 70,000 (historical) segments and had the same (historical) No Show rate.
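One way to obtain two disjoint random samples with the same No Show rate is stratified sampling, sketched below. The slide does not say how the samples were drawn, so the stratification, the function names and the record layout (a dict with a `no_show` flag) are assumptions made for illustration.

```python
import random


def stratified_two_samples(segments, size, seed=0):
    """Draw two disjoint random samples that share the overall no-show rate.

    Assumption: each segment is a dict with a boolean 'no_show' key.
    """
    rng = random.Random(seed)
    shows = [s for s in segments if not s["no_show"]]
    no_shows = [s for s in segments if s["no_show"]]
    rng.shuffle(shows)
    rng.shuffle(no_shows)

    # Number of no-shows each sample needs to match the overall rate.
    rate = len(no_shows) / len(segments)
    n_ns = round(size * rate)
    n_s = size - n_ns

    train = no_shows[:n_ns] + shows[:n_s]
    test = no_shows[n_ns:2 * n_ns] + shows[n_s:2 * n_s]
    return train, test


def no_show_rate(sample):
    """Share of segments in the sample that are no-shows."""
    return sum(s["no_show"] for s in sample) / len(sample)


# Synthetic data: every tenth segment is a no-show (10 % overall).
segments = [{"id": i, "no_show": i % 10 == 0} for i in range(1000)]
train, test = stratified_two_samples(segments, size=200)
```

Both samples are shuffled within each stratum before slicing, so they stay random while the No Show share is fixed by construction.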

8
Case 1: PNR Based No Show Forecast
  • Data preparation: derive additional attributes from PNR information (using the Trip Analyser and OD Builder algorithms)

Attributes available from PNR (variable type)
  • Booking time prior to departure (count)
  • Booking class (categorical)
  • Service class (categorical)
  • Number of passengers in PNR (count)
  • Origin City (categorical)
  • Destination airport (categorical)
  • Board point airport (categorical)
  • Off point airport (categorical)
  • Weekday of departure (categorical)

Generated Attributes (variable type)
  • Was PNR split during booking history? (binary)
  • Origin region (categorical)
  • Destination region (categorical)
  • Board point region (categorical)
  • Off point region (categorical)
  • Grouped Board point airports (categorical)
  • Position of segment within OD (count)
  • Total travel time for OD (continuous)
  • Total flight time for OD (continuous)
  • Flight time (segment) / Flight time (OD) (categorical)
  • Connection time between segments in OD (categorical)
  • More than one airline used in OD? (binary)
  • Is segment part of round trip? (binary)
  • Does segment belong to return portion of trip? (binary)
  • Purpose of trip (business, leisure, or mixed) (categorical)
  • Total time for trip (continuous)
  • Number of segments in trip (count)
  • Number of scheduled flights per week (count)
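A few of these attributes can be derived from segment times alone. The sketch below is only an illustration: it does not reproduce the Trip Analyser or OD Builder, and the record layout (dicts with hypothetical `airline`, `dep` and `arr` keys, all times in one time zone) is an assumption.

```python
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600.0


def derive_attributes(segment, od_segments, booked_at):
    """Derive some model attributes for one segment of an OD.

    Assumption: od_segments is the ordered list of segments of the OD and
    each segment is a dict with 'airline', 'dep' and 'arr' (datetime).
    """
    total_flight_time = sum(hours(s["arr"] - s["dep"]) for s in od_segments)
    return {
        "days_prior": (segment["dep"] - booked_at).days,
        "weekday": WEEKDAYS[segment["dep"].weekday()],
        "position_in_od": od_segments.index(segment) + 1,
        "multi_airline": len({s["airline"] for s in od_segments}) > 1,
        "total_travel_time_h": hours(od_segments[-1]["arr"] - od_segments[0]["dep"]),
        "flight_time_ratio": hours(segment["arr"] - segment["dep"]) / total_flight_time,
    }


# Hypothetical two-segment OD with placeholder airline codes.
seg1 = {"airline": "XX", "dep": datetime(2000, 3, 6, 8, 0), "arr": datetime(2000, 3, 6, 9, 0)}
seg2 = {"airline": "YY", "dep": datetime(2000, 3, 6, 11, 0), "arr": datetime(2000, 3, 6, 20, 0)}
attrs = derive_attributes(seg1, [seg1, seg2], booked_at=datetime(2000, 2, 25, 12, 0))
```

In the deck the ratio and the connection time are treated as categorical, so a real pipeline would bin these continuous values afterwards.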
9
Case 1: PNR Based No Show Forecast
  • Exploratory Data Analysis (based on training
    sample)

10
Case 1: PNR Based No Show Forecast
  • Exploratory Data Analysis (based on training
    sample)

11
Case 1: PNR Based No Show Forecast
  • Data Mining Methodologies
    • Combined use of Decision Tree algorithms and Logistic Regression
    • Decision Tree algorithms are very useful to
      • identify which variables influence No-Show probability,
      • reduce the number of levels of categorical variables to construct more meaningful new variables,
      • identify interactions between variables that are likely to be important terms in a regression model.
    • The aim is to prepare the data set in such a manner that the logistic regression can reach optimal accuracy.
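Reducing the levels of a categorical variable can be illustrated with a single tree-style split: order the levels by No Show rate and keep the cut with the lowest weighted Gini impurity. This is a simplified stand-in for what a decision tree algorithm does at one node, not the tool used in the project; the function names and the record layout are assumptions.

```python
from collections import defaultdict


def gini(no_shows, total):
    """Gini impurity of a binary outcome."""
    if total == 0:
        return 0.0
    p = no_shows / total
    return 2.0 * p * (1.0 - p)


def best_level_grouping(records, attribute):
    """Split the levels of a categorical attribute into two groups.

    Levels are sorted by no-show rate; every cut of that ordering is scored
    by weighted Gini impurity and the best cut is returned as two sets.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [no_shows, total]
    for r in records:
        counts[r[attribute]][0] += int(r["no_show"])
        counts[r[attribute]][1] += 1
    levels = sorted(counts, key=lambda lv: counts[lv][0] / counts[lv][1])

    n = len(records)
    total_ns = sum(c[0] for c in counts.values())
    best = None
    left_ns = left_n = 0
    for i, lv in enumerate(levels[:-1]):
        left_ns += counts[lv][0]
        left_n += counts[lv][1]
        right_ns, right_n = total_ns - left_ns, n - left_n
        impurity = (left_n * gini(left_ns, left_n)
                    + right_n * gini(right_ns, right_n)) / n
        if best is None or impurity < best[0]:
            best = (impurity, set(levels[:i + 1]), set(levels[i + 1:]))
    if best is None:  # only one level, nothing to split
        return set(levels), set()
    return best[1], best[2]


def make(level, n, k):
    """n synthetic records for one board point, the first k being no-shows."""
    return [{"board_point": level, "no_show": i < k} for i in range(n)]


records = make("A", 100, 5) + make("B", 100, 6) + make("C", 100, 30) + make("D", 100, 28)
low, high = best_level_grouping(records, "board_point")
```

The two resulting groups could then enter the logistic regression as one binary variable instead of four separate levels.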

12
Case 1: PNR Based No Show Forecast
  • Results of the Modeling
    • Attributes used in logistic regression model (in sequence of importance)

1. Connection time between segments in OD
2. Booking time prior to departure
3. Segment part of round trip?
4. Purpose of trip
5. Grouped board point airports
6. Return portion of trip?
7. PNR split indicator
8. Number of segments in trip
9. Origin region
10. Flight time (segment) / Flight time (OD)
11. Number of scheduled flights per week
12. Destination region
13. More than one airline used in OD
14. Booking class
15. Number of passengers in PNR
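For reference, the general form of a logistic regression model is given below. The slides do not show the fitted equation, so the symbols are generic assumptions: the x_i stand for the (dummy-coded) attributes listed above and the β_i for the estimated coefficients.

```latex
P(\text{no-show} \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{i=1}^{k} \beta_i x_i)\bigr)}
```

A categorical attribute with m levels contributes m - 1 dummy variables, so k is usually larger than the number of attributes.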
13
Case 1: PNR Based No Show Forecast
  • Logistic Regression Results (test sample)

[Figure: observed no-show frequency (%), axis from 0 to 70, by estimated probability class (%), with classes running from 0-2 to 40-50.]
Comparison of observed and estimated (by the logistic regression model) no-show probabilities. Compared are the observed no-show frequencies by probability classes.
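The comparison behind this chart can be reproduced by grouping test-sample records into classes of estimated probability and computing the observed no-show frequency in each class. The sketch below assumes class edges inferred from the visible axis labels (steps of 2 up to 20, then wider classes); the edges and function name are assumptions, not taken from the slides.

```python
from bisect import bisect_right

# Assumed class edges in percent, inferred from the axis labels of the chart.
EDGES = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 50, 100]


def calibration_table(predicted, observed):
    """Observed no-show frequency (%) per class of predicted probability (%).

    predicted: estimated no-show probabilities in [0, 1]
    observed:  0/1 flags, 1 meaning the passenger was a no-show
    """
    counts = {}
    for p, y in zip(predicted, observed):
        # Index of the class whose lower edge is the last one <= p.
        i = min(bisect_right(EDGES, 100.0 * p) - 1, len(EDGES) - 2)
        label = f"{EDGES[i]}-{EDGES[i + 1]}"
        n, k = counts.get(label, (0, 0))
        counts[label] = (n + 1, k + int(y))
    return {label: 100.0 * k / n for label, (n, k) in counts.items()}


predicted = [0.01, 0.01, 0.05, 0.05, 0.22, 0.22, 0.22, 0.22]
observed = [0, 0, 0, 1, 1, 0, 0, 0]
table = calibration_table(predicted, observed)
```

For a well-calibrated model the observed frequency of each class falls inside, or close to, the class boundaries.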
14
Deployment of Forecasted No Show rates
Interface from Revenue Management System:
  • Flight level: ALC, FLN, DEP Date/Time
  • Comp. level: Phys. Cap.
  • RBD level: Seats Sold, Constr. Demand Forecast
[Diagram: components RES, RMS, Booking DW, Flight selection filter, PNR Data, Schedule Info and No-Show Forecast; time stamps T1 to T4; labels "Hand back to RMS" and "Interface to RMS".]
Timing: T4 > T3 ≥ max(T1, T2) (processing time of the No-Show forecast)
15
Case 2: Group Show Up Forecast
  • Background Information
    • What is a Utilization Rate (DCP: Data Collection Point)
    • How are Utilisation Rate forecasts currently made?
      • Applicable records in history are selected based on customer type, group type, agency, DOW, market OD, POS, period, days prior...

16
Case 2: Group Show Up Forecast
The Life of a Group Request: a minority of requested groups survives
[Figure: bar chart of all requested and accepted groups; count (0 to 50,000) by "Completely Cancelled Group?" (Yes/No), distinguishing completely cancelled groups, surviving groups with all seats filled, and surviving groups with not all seats filled.]
17
Case 2: Group Show Up Forecast
  • Utilisation Rates depend on the view!

[Figure: observed Utilisation Rate by Group Type (A = Adhoc, S = Series), in two panels: "All Requested and Accepted Groups" and "Only Surviving Groups".]
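The difference between the two views can be made concrete with a small calculation. The slides do not define the Utilisation Rate, so the sketch assumes it is seats filled divided by seats requested, and that a group "survives" when it is not completely cancelled; the record keys are hypothetical.

```python
def utilisation_rate(groups, only_surviving=False):
    """Seats filled divided by seats requested (assumed definition).

    Assumption: each group is a dict with 'seats_requested', 'seats_filled'
    and a boolean 'cancelled' flag for completely cancelled groups.
    """
    if only_surviving:
        groups = [g for g in groups if not g["cancelled"]]
    requested = sum(g["seats_requested"] for g in groups)
    filled = sum(g["seats_filled"] for g in groups)
    return filled / requested


groups = [
    {"seats_requested": 20, "seats_filled": 0, "cancelled": True},
    {"seats_requested": 20, "seats_filled": 15, "cancelled": False},
    {"seats_requested": 10, "seats_filled": 10, "cancelled": False},
]
rate_all = utilisation_rate(groups)
rate_surviving = utilisation_rate(groups, only_surviving=True)
```

Dropping the completely cancelled groups removes their requested seats from the denominator, which is why the surviving-only view shows a higher rate.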
18
Case 2: Group Show Up Forecast
  • Ideally, forecasting Utilization Rates is composed of two steps
    • STEP 1: Forecast whether a Group survives or not (at time of request).
    • STEP 2: Forecast whether a Seat is filled if the Group survives.

[Figure: forecast and observed Utilization Rate for both steps combined, by Origin Region (SWIDO, FEST, EU NORTH, BENELUX, AFRICA, MEST, EU SOUTH, EU EAST, AMERICAS).]
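The two steps can be combined by multiplying the probabilities, as sketched below. This combination rule is an assumption (the slides only label the result "Combined"), and the function names and example values are hypothetical.

```python
def combined_utilisation(p_survive, p_filled_given_survive):
    """Probability that a requested seat is eventually filled.

    Step 1 gives P(group survives); step 2 gives P(seat filled | survives).
    Assumption: a seat in a cancelled group is never filled, so the two
    probabilities multiply.
    """
    return p_survive * p_filled_given_survive


def expected_seats(seats_requested, p_survive, p_filled_given_survive):
    """Expected number of filled seats for one group request."""
    return seats_requested * combined_utilisation(p_survive, p_filled_given_survive)


# Hypothetical request: 25 seats, 50 % survival chance, 80 % fill if it survives.
rate = combined_utilisation(0.5, 0.8)
seats = expected_seats(25, 0.5, 0.8)
```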
19
Case 2: Group Show Up Forecast
  • Data Mining Methodologies
    • Combined use of Decision Tree algorithms and Logistic Regression (as before)
    • Since the purpose of the project was not to identify a predictive model but to get a better understanding of the factors influencing the Group Show Up rate, we only pulled a training sample.

20
Case 2: Group Show Up Forecast
Attributes used in the logistic regression model (in sequence of importance). This model forecasts the utilisation rate at the time of request!
  • Period (0-23, half-months, expressing seasonality)
  • Origin Region (at time of request)
  • Seats Requested on Master PNR (in categories)
  • Booking Region (from axsWizard, at time of request)
  • Customer Type (leisure, shareholders, etc.)
  • Reservation System (1A, 1G, Others)
  • Days prior (in classes: 0-18, 19-33, 34-47, 48-59, 60-73, 74-155, 156-244, >244)
  • Destination Region (at time of request)
  • Group Type (Ad-hoc or series)
  • Day Of Week (0-6)
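Two of these attributes are encodings of raw values, which can be sketched as follows. The days-prior classes are taken from the list above; the half-month cut at day 15 and the function names are assumptions, since the slides only state that the period runs from 0 to 23.

```python
from bisect import bisect_left
from datetime import date

# Upper bounds of the days-prior classes listed on the slide.
DAYS_PRIOR_UPPER = [18, 33, 47, 59, 73, 155, 244]
DAYS_PRIOR_LABELS = ["0-18", "19-33", "34-47", "48-59", "60-73",
                     "74-155", "156-244", ">244"]


def days_prior_class(days_prior):
    """Map the number of days between request and departure to its class."""
    return DAYS_PRIOR_LABELS[bisect_left(DAYS_PRIOR_UPPER, days_prior)]


def half_month_period(d):
    """Half-month index 0-23 (assumed cut: days 1-15 vs. 16 to month end)."""
    return (d.month - 1) * 2 + (1 if d.day > 15 else 0)


example_class = days_prior_class(40)
example_period = half_month_period(date(2000, 7, 16))
```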
21
Case 2: Group Show Up Forecast
Attribute: Seats requested on Master PNR (in categories, percentiles: 0-10, 11-20, 21-30, >30)
  • Higher Show-Up rates for smaller requested Groups

[Figure: forecast and observed Utilization Rate for all groups (STEP1, STEP2 and both steps combined) by Number of Seats Requested on Master PNR (PERC_25, PERC_50, PERC_75, PERC_Hi).]
22
Case 2: Group Show Up Forecast
Attribute: Origin Region
  • Show-Up rates differ by Booking Region

[Figure: forecast and observed Utilization Rate for all groups (STEP1, STEP2 and both steps combined) by Origin Region (SWIDO, FEST, EUNORTH, BENELUX, AFRICA, MEST, EUSOUTH, EUEAST, AMERICAS).]
23
Case 2: Group Show Up Forecast
Attribute: Number of days prior to departure at which a Group Request was made
  • Late requests result in higher Show Up figures

[Figure: forecast and observed Utilization Rate for all groups (STEP1, STEP2 and both steps combined) by Days Prior (categories LE_18, LE_33, LE_47, LE_59, LE_73, LE_155, LE_244, X_245; LE_18 means that the request took place 18 days or less before departure).]
24
Case 2: Group Show Up Forecast
  • Hint for further improvement: consider changes in the shells to produce better forecasts

[Figure: Utilization Rate (axis from 0 to 80) by number of times a Master PNR is split (0 to 8).]
25
Conclusions
  • A Data Mining process provides a structured way of analyzing data with the final purpose of making more accurate forecasts (as in the two presented cases).
  • Data Mining is best suited to extracting more business understanding from large data sets. Often the most important result of a data mining study is not a model but knowledge about influencing factors.
  • By combining Data Mining methodologies, data can be optimally prepared for modeling. Apart from Decision Trees and Logistic Regression, we often use visualisation techniques, Bayesian networks, Neural networks, Self Organizing Maps, etc.
  • We have used Data Mining successfully for a wide range of projects: CRM (Segmentation of FFP Members, FFP Tier level Scenario Calculations, Customer Value Prediction), Performance Measurement of Revenue Management Systems (based on Wizard), Clustering of Booking Curves, Identification of Airport Catchments, Yield Monitoring, and E-business applications.

26
Atraxis and IT works