Enhancing the Quality of Transferred Household Travel Survey Data: A Bayesian Updating Approach Using MCMC with Gibbs Sampling

1
Enhancing the Quality of Transferred Household
Travel Survey Data: A Bayesian Updating Approach
Using MCMC with Gibbs Sampling
  • Yongping Zhang
  • Kouros Mohammadian, PhD
  • Department of Civil and Materials Engineering
  • University of Illinois at Chicago

The 11th TRB National Transportation Planning
Applications Conference, May 7, 2007
2
Data Transferability
  • The idea is to use data collected in one context
    in a new context. This can reduce or eliminate
    the need for a large data collection effort in
    the application context.
  • Previous studies
      • ITE trip generation tables
      • NCHRP 365 (Nancy McGuckin, et al.)
          • Highly aggregate
      • ORNL's NPTS/NHTS transferability study (Pat Hu,
        et al.)
          • Aggregate, census tract (CT) level
      • Data simulation (Stopher and Greaves)
          • Disaggregate, household (HH) level; CRT
            (classification and regression tree)
            method; limited number of independent
            variables

3
Project Approach
  • Consider a larger set of variables
      • NHTS and CTPP datasets
  • Use quantifiable variables that can be easily
    predicted or are available from other sources
    (e.g., PUMS)
  • Consider variables representing land use, urban
    form, and transportation system characteristics
  • Use advanced clustering, updating, and simulation
    approaches

4
Data
  • Data sources
      • 2001 NHTS, 2000 CTPP, PUMS, 2003 TTI, TIGER/Line
        GIS data files
  • Data cleaning
  • 33 variables of demographics, socioeconomics,
    and land use
      • Individual level: age group, race/ethnicity,
        education, occupation
      • Household level: HH size, income, adults,
        vehicles, drivers, workers
      • Census tract level: housing, employment, and
        population densities
  • New variables

5
New Variables
  • Intersection density (TIGER/Line)
      • No. of intersections / area
  • Road density (TIGER/Line)
      • Road length / area
  • Pedestrian environment (TIGER/Line)
      • Block size: road length / no. of intersections
  • Transit-friendly environment (CTPP)
      • Transit users / total no. of workers
      • Transit trips / total no. of trips
  • Congestion factor
      • Travel time index (TTI report for 85 MSAs):
        avg. travel time / free-flow travel time in
        that region
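The contextual variables on this slide are simple ratios. A minimal sketch of computing them for one census tract follows; the field names, the units (square miles, miles), and the input values are all hypothetical, chosen only to make the arithmetic easy to check. The travel time index is reported by TTI per MSA rather than computed from tract data, so it appears as a separate helper.

```python
from dataclasses import dataclass


@dataclass
class TractInputs:
    """Hypothetical per-tract inputs (units assumed: square miles and miles)."""
    area_sq_mi: float
    n_intersections: int
    road_length_mi: float
    transit_users: int   # workers who commute by transit (CTPP)
    total_workers: int
    transit_trips: int
    total_trips: int


def built_environment_indicators(t: TractInputs) -> dict:
    """Compute the slide's contextual variables as plain ratios."""
    return {
        "intersection_density": t.n_intersections / t.area_sq_mi,
        "road_density": t.road_length_mi / t.area_sq_mi,
        # Block size: road length per intersection (larger means bigger blocks)
        "block_size": t.road_length_mi / t.n_intersections,
        "transit_user_share": t.transit_users / t.total_workers,
        "transit_trip_share": t.transit_trips / t.total_trips,
    }


def travel_time_index(avg_travel_time: float, free_flow_travel_time: float) -> float:
    """Congestion factor: average travel time relative to free-flow travel time."""
    return avg_travel_time / free_flow_travel_time


tract = TractInputs(area_sq_mi=2.0, n_intersections=120, road_length_mi=30.0,
                    transit_users=150, total_workers=1000,
                    transit_trips=400, total_trips=5000)
print(built_environment_indicators(tract))
print(travel_time_index(26.0, 20.0))
```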

6
Dependent Variables
  • Travel characteristics (from the NHTS trip file,
    aggregated to the HH level)
      • VMT for each household
      • No. of trips
      • No. of mandatory trips
      • No. of maintenance trips
      • No. of discretionary trips
      • No. of transit trips in the HH
      • No. of private vehicle trips
      • No. of non-motorized (bicycle and walk) trips
      • No. of tours
      • Average trips per tour
      • Average trip distance in miles for all HH members
      • No. of transit users in the HH
      • No. of carpool users in the HH
      • Percentage of public transit usage in the HH
      • Percentage of carpool usage among workers in
        the HH
      • Total commute distance in the HH
      • Average commute distance in the HH
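Rolling the trip file up to households is a group-by over household IDs. The sketch below assumes a hypothetical record layout `(hh_id, purpose, mode, distance_mi)` with made-up purpose and mode labels (these are not NHTS variable names), and it treats every mode other than transit, walk, or bike as a private vehicle, which is a simplification.

```python
from collections import defaultdict

# Hypothetical trip records: (hh_id, purpose, mode, distance_mi)
trips = [
    (1, "mandatory", "auto", 12.0),
    (1, "maintenance", "auto", 3.5),
    (1, "discretionary", "walk", 0.5),
    (2, "mandatory", "transit", 8.0),
    (2, "discretionary", "bike", 2.0),
]

NON_MOTORIZED = {"walk", "bike"}


def aggregate_to_household(records):
    """Roll trip-level records up to a few household-level travel characteristics."""
    hh = defaultdict(lambda: defaultdict(float))
    for hh_id, purpose, mode, dist in records:
        h = hh[hh_id]
        h["trips"] += 1
        h[f"{purpose}_trips"] += 1
        if mode == "transit":
            h["transit_trips"] += 1
        elif mode in NON_MOTORIZED:
            h["non_motorized_trips"] += 1
        else:  # simplification: everything else counts as a private vehicle trip
            h["private_vehicle_trips"] += 1
        h["total_distance"] += dist
    for h in hh.values():
        h["avg_trip_distance"] = h["total_distance"] / h["trips"]
    return {k: dict(v) for k, v in hh.items()}


print(aggregate_to_household(trips))
```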

7
(No Transcript)
8
Clustering
  • The classification scheme is a critical issue
  • Clustering methods tested include K-means,
    hierarchical, CRT, TwoStep, and ANN
  • 11 clusters were generated using the TwoStep
    clustering method
  • Only national data are used
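TwoStep is an SPSS procedure and is not reproduced here. As an illustration of another method the slide lists as tested, here is plain K-means (Lloyd's algorithm) on standardized household attributes. The attributes and values are hypothetical, and standardizing first is an assumption, made so that income does not dominate household size and vehicle counts.

```python
import math
import random


def standardize(rows):
    """Z-score each column so variables on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical households: (HH size, income in $1,000s, vehicles)
rows = [[1, 20, 0], [1, 25, 0], [2, 22, 1], [4, 120, 3], [5, 110, 2], [4, 130, 3]]
labels, centroids = kmeans(standardize(rows), k=2)
print(labels)
```

On these rows the two clusters separate the three small, lower-income households from the three larger, higher-income ones.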

9
Clusters
  • Rich and Smart
      • middle-aged families
      • professional or managerial white-collar jobs
      • graduate degrees
      • high incomes
      • majority live in suburbs
      • greater part are White, but also some Asian
  • Young Achievers
      • young couples without children or mainly with
        pre-school children
      • college degrees
      • white-collar jobs in sales, service, technical,
        and professional occupations
      • mid-range income
      • higher percentages live in suburban or rural
        areas
  • Kids-centered Families
      • middle-aged, working-class families
      • pre-school and school-age children
      • usually have college education
      • mid-range to high income
      • primarily White; live in suburbs or towns

10
Clusters, cont.
  • Rural Blues
      • working-class, middle-aged families
      • pre-school and school-age children
      • mainly high school graduates
      • blue-collar jobs (farming, manufacturing, etc.)
      • low to mid-range income
      • greater part are White and mainly live in rural
        areas or small towns
  • Working Mixing Pot
      • working-class White, Black, Asian, or Hispanic
      • single adults or couples
      • college or high school education
      • low to mid-range income
  • Mainstream Families
      • mid-scale, upper-middle-aged, White
      • large working-class couples or families with
        older children
      • college or high school education
      • mid-range to high income
      • suburban or rural areas

11
Clusters, cont.
  • Senior Couples
      • senior couples
      • majority working and some retired
      • greater part are White, but include some Black,
        Asian, or American Indian households
      • suburban or rural areas
  • Sustaining Minority Families
      • low income
      • middle-aged, working-class families
      • mainly Hispanic or Black, but also some Asian and
        White
      • majority have not finished high school
      • service, sales, manufacturing, farming, or
        construction jobs
  • Forever Youngs
      • White senior couples, empty nesters
      • mostly retired, but some have sales, service, or
        managerial jobs
      • low to mid-range income

12
Clusters, cont.
  • Traditional Seniors
      • mainly retired single individuals and some
        retired couples
      • low income
      • majority are White, but some are Black, Asian, or
        American Indian
  • Neo Urbans
      • small families/couples or single individuals
      • dense urban areas
      • college education
      • low to mid-range income
      • sales, service, or professional jobs
      • dominant race is White, but a significant number
        are Black, Asian, and Hispanic

13
Cluster-Based Travel Characteristics
14
(No Transcript)
15
Transferability
  • An ANN model (with a genetic algorithm) is used to
    predict cluster membership as a function of 11
    factors for each HH in the add-on datasets
  • The model has a prediction potential of 92.4%
  • Travel characteristics are transferred from the
    national clusters to the add-on data according to
    each HH's cluster membership
  • Weighted observed and predicted travel
    characteristics are compared
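Once each add-on household has a predicted cluster, the transfer is a lookup of the national cluster value, followed by a weighted comparison with what was observed. A minimal sketch follows. The ANN membership model is not reproduced, so memberships are supplied directly, and the cluster means, expansion weights, and observations are hypothetical.

```python
def transfer_by_cluster(predicted_cluster, national_cluster_means):
    """Assign each household the national mean of its predicted cluster."""
    return [national_cluster_means[c] for c in predicted_cluster]


def weighted_mean(values, weights):
    """Survey-weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


national_means = {1: 4.0, 2: 3.0}   # hypothetical trips per person by cluster
clusters = [1, 1, 2, 2]             # predicted memberships (given, not modeled here)
weights = [1.0, 2.0, 1.0, 1.0]      # hypothetical expansion weights
observed = [4.5, 3.5, 2.5, 3.0]     # hypothetical observed add-on values

predicted = transfer_by_cluster(clusters, national_means)
print(weighted_mean(observed, weights), weighted_mean(predicted, weights))
```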

16
Comparison of Weighted Trip Count per Person
17
Comparison of Weighted Mandatory Trips per Person
18
Original Comparison of Transit Usage
Not so good! Some clusters need improvement.
  • Compared to the no. of trips, the prediction of
    transit usage is not as good.
  • Clusters 5, 8, 10, and 11 show significant
    differences and need improvement.

19
Improvement to Clusters Using CRT
  • 1. The first level of the tree splits on the
    no. of vehicles in the household (owns a
    vehicle or not).
  • 2. The improvement of the model due to this level
    is defined as improvement / (variance of Node 0).
  • For example, here 0.0017 corresponds to 13.3%,
    0.0009 to 7.05%, and 0.0002 to 1.57%.
  • The total model improvement is about 22%.
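The improvement measure is a plain ratio. The slide does not report the Node 0 variance, so the sketch back-calculates it from the first pair (0.0017 and 13.3%) as an assumption. It reads the second improvement as 0.0009, the value consistent with 7.05% under that root variance. The three percentages then come out at roughly 13.3%, 7.0%, and 1.6%, summing to about 22%.

```python
def relative_improvements(improvements, root_variance):
    """Express each split's improvement as a percentage of the Node 0 variance."""
    return [100.0 * imp / root_variance for imp in improvements]


# Assumption: Node 0 variance back-calculated from the first reported pair
root_var = 0.0017 / 0.133
pct = relative_improvements([0.0017, 0.0009, 0.0002], root_var)
print([round(p, 2) for p in pct], round(sum(pct), 1))
```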

20
Considering Distributions: Trip Rate
A nice match! However, this is not always the case.
How can transferability be improved?
21
Considering Distributions: Trip Distance
Not so good! Needs to be improved.
22
Considering Distributions
  • Various distributions were fitted to the data,
    including
      • Normal, Gamma, Weibull, Exponential, Max Extreme,
        Lognormal, Logistic, Student's t, Min Extreme,
        Triangular, General Beta, Pareto, Uniform,
        Binomial, Geometric, Hypergeometric, and
        Poisson.
  • The fitting results are interpreted by
      • examining the rankings of the three fit
        statistics: Anderson-Darling (A-D),
        Kolmogorov-Smirnov (K-S), and chi-squared
      • visually judging plots of the density and
        cumulative curves
      • p-values and critical values at different
        significance levels
  • Non-normal distributions (e.g., Gamma) are
    dominant
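Of the three fit statistics, Kolmogorov-Smirnov is the simplest to show. The sketch computes the one-sample K-S statistic D for any candidate CDF and compares an exponential fit with a normal fit on a right-skewed sample. The sample is deterministic (exponential quantiles), so the result is reproducible. A-D and chi-squared are not shown. When a candidate's parameters are estimated from the same data, the standard K-S critical values become conservative (the Lilliefors issue).

```python
import math


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare with the empirical CDF just after and just before the jump at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def normal_cdf(mu, sigma):
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(rate):
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


n = 50
sample = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]  # exponential quantiles
mu = sum(sample) / n
sd = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
d_exp = ks_statistic(sample, exponential_cdf(1.0))
d_norm = ks_statistic(sample, normal_cdf(mu, sd))
print(d_exp, d_norm)  # the smaller D ranks as the better fit
```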

23
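Gamma Distribution
Gamma function: $\Gamma(k) = \int_0^{\infty} t^{k-1} e^{-t}\,dt$
$k > 0$ is the shape parameter, $\theta > 0$ is the scale parameter, and the
location parameter $L$ determines where the origin is located.
PDF: $f(x) = \frac{(x-L)^{k-1}\, e^{-(x-L)/\theta}}{\theta^{k}\,\Gamma(k)}$, for $x > L$
CDF: $F(x) = \frac{\gamma\left(k,\,(x-L)/\theta\right)}{\Gamma(k)}$, where $\gamma(k,z) = \int_0^{z} t^{k-1} e^{-t}\,dt$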
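The "Mean" columns in the summary of updating results can be checked against the three-parameter gamma. With location L, shape k, and scale θ, the mean is L + kθ and the standard deviation is √k·θ. For national cluster 2 trip rates, -0.83 + 5.42 × 0.88 ≈ 3.94, as reported. A minimal sketch of the density and these moments, using only the standard library:

```python
import math


def gamma3_pdf(x, loc, shape, scale):
    """Three-parameter gamma density; zero at or below the location parameter."""
    z = x - loc
    if z <= 0:
        return 0.0
    log_f = ((shape - 1) * math.log(z) - z / scale
             - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_f)


def gamma3_mean_sd(loc, shape, scale):
    """Mean = loc + shape*scale; SD = sqrt(shape)*scale (location only shifts)."""
    return loc + shape * scale, math.sqrt(shape) * scale


# National trip-rate parameters for cluster 2 (location, shape, scale)
print(gamma3_mean_sd(-0.83, 5.42, 0.88))
```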
24
Fitted Distribution with Parameters for each
Variable by Cluster
25
(No Transcript)
26
Bayesian Updating
  • Local updating can significantly improve the
    quality of the transferred data
  • Bayesian updating is used
  • Traditionally, the transferability literature has
    studied only normally distributed variables,
    because combining a normal prior with a normal
    likelihood yields a posterior that is simple to
    calculate.
  • In practice, the variables of interest (i.e., the
    likelihood) can take various distributional
    forms.
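For reference, the traditional normal-normal case has a closed-form posterior. Assume the data variance σ² is known, there is a prior N(μ0, τ0²) on the mean, and n observations have sample mean x̄. The posterior precision is then 1/τ0² + n/σ², and the posterior mean is the precision-weighted average of μ0 and x̄. A minimal sketch with hypothetical numbers:

```python
def normal_normal_update(prior_mean, prior_var, sample, known_var):
    """Conjugate update of a normal mean when the data variance is known."""
    n = len(sample)
    xbar = sum(sample) / n
    post_precision = 1.0 / prior_var + n / known_var
    post_mean = (prior_mean / prior_var + n * xbar / known_var) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical prior (mean 4.0, variance 0.25) and a small local sample
post_mean, post_var = normal_normal_update(4.0, 0.25, [3.0, 3.5, 3.2, 3.3], known_var=1.0)
print(post_mean, post_var)
```

The closed form exists only because both prior and likelihood are normal, which is the restriction the slide points out.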

27
Bayesian Updating
  • Bayes' theorem:
    $k(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{\int f(x \mid \theta)\, g(\theta)\, d\theta}$
  • $f(x \mid \theta)$ is the probability function for the
    observed data $x$ (i.e., the local sample), given the
    unknown parameter $\theta$,
  • $g(\theta)$ is the prior distribution for $\theta$,
  • $k(\theta \mid x)$ is the posterior distribution for
    $\theta$ given the observed data $x$
  • The technique can be extended to situations in
    which no prior data are available.
  • The analyst can perform successive updating,
    using new information without losing the gains
    from earlier updates.
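Successive updating is easiest to see with a conjugate pair. The sketch uses a Gamma prior on a Poisson trip rate. This is an illustrative choice, not the model used in this study. The posterior after observing counts is Gamma(α + Σx, β + n), with β as a rate, so updating batch by batch gives the same posterior as updating with all the data at once. The prior and counts are hypothetical.

```python
def gamma_poisson_update(alpha, beta, counts):
    """Gamma(alpha, rate=beta) prior on a Poisson rate -> posterior after counts."""
    return alpha + sum(counts), beta + len(counts)


prior = (8.0, 2.0)                 # hypothetical prior: mean rate 4 trips
batch1, batch2 = [3, 5, 4], [2, 6]

step1 = gamma_poisson_update(*prior, batch1)
sequential = gamma_poisson_update(*step1, batch2)
all_at_once = gamma_poisson_update(*prior, batch1 + batch2)
print(sequential, all_at_once)     # identical: earlier gains are not lost
```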

28
Bayesian Updating (2)
  • The national sample of the 2001 NHTS is used as
    the source of the prior information
  • A small local sample is randomly selected from
    the NY add-on, leaving the rest for validation
  • The bootstrap method is used to resample the data
    and justify the prior distribution assumptions for
    the parameters of interest (i.e., the shape and
    scale of the Gamma distribution)
  • A normal distribution is fitted to the estimates
    of each parameter across the resampled datasets.
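One reading of this step is as follows. Refit the variable's distribution to each bootstrap resample, then summarize the spread of the resulting parameter estimates with a normal distribution that serves as the prior. The sketch follows that reading under simplifying assumptions: a two-parameter gamma (no location), method-of-moments estimates instead of whatever estimator the authors used, and synthetic data.

```python
import math
import random


def gamma_moments_fit(xs):
    """Method-of-moments (shape, scale) for a two-parameter gamma (an assumption)."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m * m / v, v / m


def bootstrap_parameter_priors(xs, n_boot=200, seed=42):
    """Refit to bootstrap resamples; summarize each parameter as normal (mean, sd)."""
    rng = random.Random(seed)
    shapes, scales = [], []
    for _ in range(n_boot):
        resample = rng.choices(xs, k=len(xs))
        k, theta = gamma_moments_fit(resample)
        shapes.append(k)
        scales.append(theta)

    def mean_sd(vals):
        mu = sum(vals) / len(vals)
        return mu, math.sqrt(sum((v - mu) ** 2 for v in vals) / (len(vals) - 1))

    return {"shape": mean_sd(shapes), "scale": mean_sd(scales)}


# Synthetic "national" data: gammavariate(alpha, beta) uses beta as the scale
gen = random.Random(0)
data = [gen.gammavariate(3.0, 1.5) for _ in range(2000)]
priors = bootstrap_parameter_priors(data)
print(priors)
```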

29
Bayesian Updating (3)
  • Then, Markov chain Monte Carlo (MCMC) simulation
    with Gibbs sampling is used to update the prior
    with the small local sample.
  • Assuming the updated variables of interest are
    still Gamma distributed, the posterior
    distributions of the parameters are used to derive
    the updated means and SDs of the variables.
  • The updated parameters are then compared with the
    validation data and the national data to test the
    effectiveness of the updating procedure.
  • The comparisons show that significant
    improvement is achieved.
  • The improvement increases with the local sample
    size
  • A relatively cost-effective sample size is
    suggested
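The slide does not say how the Gibbs steps were implemented. Because the gamma shape has no conjugate prior, a common workaround is Metropolis-within-Gibbs: shape and scale are updated one at a time, each with a Metropolis step that targets its full conditional. The sketch does that under several assumptions. Shape and scale get normal priors (for example, from a bootstrap step). The location is held fixed at its national value, since the national and national-updated location columns in the summary table coincide. Step size, iteration counts, and the demo data are arbitrary, and every observation must exceed the location.

```python
import math
import random


def gamma3_loglik(xs, loc, shape, scale):
    """Log-likelihood of a three-parameter gamma with the location held fixed."""
    if shape <= 0 or scale <= 0:
        return -math.inf
    const = shape * math.log(scale) + math.lgamma(shape)
    total = 0.0
    for x in xs:
        z = x - loc
        if z <= 0:
            return -math.inf
        total += (shape - 1) * math.log(z) - z / scale - const
    return total


def normal_logpdf(v, mean, sd):
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def update_gamma(xs, loc, prior_shape, prior_scale,
                 n_iter=4000, burn_in=1000, step=0.05, seed=1):
    """Metropolis-within-Gibbs for (shape, scale); priors are (mean, sd) pairs.

    The chain starts at the prior means, so every observation must exceed loc.
    """
    rng = random.Random(seed)

    def log_post(k, t):
        ll = gamma3_loglik(xs, loc, k, t)
        if ll == -math.inf:
            return ll
        return ll + normal_logpdf(k, *prior_shape) + normal_logpdf(t, *prior_scale)

    shape, scale = prior_shape[0], prior_scale[0]
    current = log_post(shape, scale)
    kept = []
    for it in range(n_iter):
        # Shape update, conditional on the current scale (random-walk Metropolis)
        k_new = shape + rng.gauss(0.0, step * prior_shape[0])
        cand = log_post(k_new, scale)
        if math.log(1.0 - rng.random()) < cand - current:
            shape, current = k_new, cand
        # Scale update, conditional on the current shape
        t_new = scale + rng.gauss(0.0, step * prior_scale[0])
        cand = log_post(shape, t_new)
        if math.log(1.0 - rng.random()) < cand - current:
            scale, current = t_new, cand
        if it >= burn_in:
            kept.append((shape, scale))

    n = len(kept)
    return {
        "shape": sum(k for k, _ in kept) / n,
        "scale": sum(t for _, t in kept) / n,
        # Posterior means of the implied variable mean and SD
        "mean": sum(loc + k * t for k, t in kept) / n,
        "sd": sum(math.sqrt(k) * t for k, t in kept) / n,
    }


# Demo: 75 synthetic local households; prior SDs (0.5, 0.1) are hypothetical
gen = random.Random(7)
local = [-0.3 + gen.gammavariate(3.0, 1.2) for _ in range(75)]
post = update_gamma(local, loc=-0.83, prior_shape=(5.42, 0.5), prior_scale=(0.88, 0.1))
print(post)
```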

30
  • The root mean square error (RMSE) decreases as
    the sample size increases.
  • Results are unstable when a cluster has fewer
    than 45 observations.
  • A sample size of 75 per cluster seems to be the
    most cost-effective plan.

31
Updating Results
  • Updated mean values move significantly closer to
    the validation data.

32
Summary of Updating Results

Trip Rates per Person
        |           National            |       National-updated        |       State of New York
Cluster | Location  Shape  Scale   Mean | Location  Shape  Scale   Mean | Location  Shape  Scale   Mean
--------+-------------------------------+-------------------------------+-------------------------------
2       |    -0.83   5.42   0.88   3.94 |    -0.83   5.15   0.92   3.91 |    -0.30   3.47   1.14   3.66
3       |    -3.13  12.31   0.61   4.38 |    -3.13  12.05   0.61   4.22 |    -1.66   8.44   0.67   3.99
4       |    -0.99   6.42   0.77   3.95 |    -0.99   6.05   0.80   3.85 |    -0.42   4.43   0.89   3.53
8       |    -0.13   3.14   1.15   3.48 |    -0.13   2.90   1.12   3.13 |     0.18   2.40   1.24   3.16
11      |     0.04   2.52   1.47   3.75 |     0.04   2.44   1.45   3.58 |     0.32   2.20   1.40   3.39

Trip Distance per Person
        |           National            |       National-updated        |       State of New York
Cluster | Location  Shape  Scale   Mean | Location  Shape  Scale   Mean | Location  Shape  Scale   Mean
--------+-------------------------------+-------------------------------+-------------------------------
2       |    -0.09   1.45  21.28  30.67 |    -0.09   1.34  21.04  28.10 |    -0.07   1.32  20.84  27.33
3       |    -0.49   1.68  18.91  31.18 |    -0.49   1.62  18.93  30.18 |     0.11   1.53  19.31  29.62
4       |    -0.22   1.61  18.55  29.59 |    -0.22   1.45  19.98  28.75 |    -0.02   1.30  20.59  26.67
5       |    -0.09   1.20  24.93  29.93 |    -0.09   1.20  24.03  28.84 |    -0.09   1.19  23.97  28.36
6       |    -0.43   1.91  18.12  34.18 |    -0.43   1.89  18.22  34.01 |    -0.08   1.58  21.40  33.69
7       |     0.11   1.48  22.69  33.58 |     0.11   1.54  21.69  33.51 |    -0.08   1.52  20.75  31.55
8       |    -0.12   1.06  24.08  25.38 |    -0.12   1.03  24.03  24.63 |    -0.09   0.90  22.91  20.53
9       |    -0.09   1.16  21.43  24.72 |    -0.09   1.16  22.23  25.65 |    -0.03   1.17  22.17  25.91
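The RMSE criterion can be illustrated on the trip-rate means in this summary table, treating the State of New York column as the reference. This is only an illustration: the study's RMSE was computed against its validation sample, and the table does not say whether that is the same data. For these five clusters, the national means give an RMSE of about 0.36 trips per person and the national-updated means about 0.23.

```python
import math


def rmse(estimates, reference):
    """Root mean square error between paired estimates and reference values."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / len(reference))


# Trip-rate means for clusters 2, 3, 4, 8, 11, copied from the summary table
national = [3.94, 4.38, 3.95, 3.48, 3.75]
updated = [3.91, 4.22, 3.85, 3.13, 3.58]
new_york = [3.66, 3.99, 3.53, 3.16, 3.39]

print(rmse(national, new_york), rmse(updated, new_york))
```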
33
(No Transcript)
34
Population Synthesizing and Travel Data Simulation
  • Using PUMS data, the NYC population is
    synthesized.
  • All of the contextual factors were calculated for
    each HH.
  • A synthetic population with all 33 required
    variables was generated.
  • Using the ANN model, cluster memberships are
    obtained.
  • Travel data are simulated for each HH using Monte
    Carlo simulation of each travel attribute with
    the updated parameters of the fitted distributions.
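The simulation step draws each household's attribute from its cluster's fitted distribution. A minimal sketch follows, using the national-updated trip-rate parameters for clusters 2 and 3 from the summary table. The household list is hypothetical. Python's `random.gammavariate(alpha, beta)` takes shape and scale, so the location is added afterwards. Because some locations are negative, a small share of draws can fall below zero, and the slides do not say how such draws were handled.

```python
import random


def simulate_attribute(cluster_ids, params, seed=0):
    """Draw one value per household from its cluster's three-parameter gamma.

    params maps cluster -> (location, shape, scale).
    """
    rng = random.Random(seed)
    draws = []
    for c in cluster_ids:
        loc, shape, scale = params[c]
        # gammavariate(alpha, beta): alpha is the shape, beta is the scale
        draws.append(loc + rng.gammavariate(shape, scale))
    return draws


# National-updated trip-rate parameters (location, shape, scale)
updated_params = {2: (-0.83, 5.15, 0.92), 3: (-3.13, 12.05, 0.61)}
households = [2] * 10000 + [3] * 10000  # hypothetical cluster memberships
draws = simulate_attribute(households, updated_params)
print(sum(draws[:10000]) / 10000, sum(draws[10000:]) / 10000)
```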

35
Comparison of Simulated and Add-on NYC Samples
(Trips per Person)
36
Comparison of Simulated and Add-on NYC Samples
(Trip Distance per Person)