Title: Estimating Phone Service and Usage Percentages: How to Weight the Data from a Local, Dual-Frame Sample Survey of Cellphone and Landline Telephone Users in the United States
1. Estimating Phone Service and Usage Percentages: How to Weight the Data from a Local, Dual-Frame Sample Survey of Cellphone and Landline Telephone Users in the United States
Thomas M. Guterbock TomG_at_virignia.edu
- Presented at AAPOR 2009, Hollywood, FL, May 14, 2009
2. The Problem
- Dual-frame telephone surveys are becoming more prevalent in U.S. survey research.
- The rising percentages and distinctive demographics of cellphone-only (CPO) households make it imperative that sample designs cover them.
- Sample frames: landline RDD + cellphone RDD.
- Result: sample data for 3 phone-service segments:
  - CPO, overlap (dual-phone), landline-only (LLO)
- Problem: what is the correct population distribution across the 3 phone-service segments?
3. National data? No problem
- National Health Interview Survey (NHIS) data are the gold standard.
- Uses a very large N, continuous sampling, and in-person mode to establish household phone service.
- NHIS provides fairly current data on cellphone coverage, percent CPO, and phone segment distributions.
- NHIS data are available for the U.S. and for four census regions.
- State estimates were released in 2009 using CPS and NHIS data.
- SOLUTION: Weight phone-service segments in the national sample to NHIS percents for the U.S.
4. What about local studies?
- We cannot assume that the local phone-service segment distribution is the same as national or regional averages.
- Cellphone penetration and CPO lifestyle adoption vary considerably across areas.
- Cell penetration is higher in high-density areas, metro areas, high-income areas, flat terrain, and near interstates.
- CPO percentage varies with age, ethnicity, urbanicity, and landline phone costs.
- NHIS: strong phone-service variation across regions and states.
- Variation within states is probably similar in magnitude.
5. Why not use percents from the local sample data?
- In a local dual-frame sample, we will directly observe CPO in the cell sample and LLO in the landline sample.
- But estimation from these observed percents is problematic for several reasons.
- If we just combine the two samples, we overlook the fact that overlap households are double-sampled.
- It's not intuitively obvious how to calculate the percentages for the combined sample from the split-sample results.
6. Why not use percents from the local sample data?
- Cellphone-only cases are substantially overcounted in a cellphone sample.
- CPOs have different telephone behaviors. They are more likely than dual-phone users . . .
  - To have the phone with them
  - To have the phone turned on
  - To accept calls from unknown numbers
- Cellphone samples are usually kept small because of higher per-completion cost.
- So we can't just add up the segment counts from the two samples.
7. Can we use the local sample data?
- Collected data from the two realized, local samples surely contain useful information about local phone-service segments.
- Overcounts of CPO and LLO distort these data.
- We have to do the math correctly.
- IDEA: Estimate the amount of CPO and LLO overcount in national dual-frame studies, and then apply an adjustment to the local sample data to arrive at local estimates for CPO and LLO.
8. Overview: A proposed solution
- Develop an algebraic solution for combining the two sample results from a dual-frame design into an overall phone-service segment distribution, assuming equal response rates.
- Develop an algebraic solution for combining the two samples when response rates are NOT equal.
  - Higher response rates (overcounts) are assumed for CPO and LLO (compared to overlap).
- Compare 2007 CHIS to 2007 NHIS (West region) to estimate response rate ratios that correspond to the observed overcount.
- Apply these ratios to newly collected dual-frame survey data from three counties in Virginia.
- Result: plausible, locality-specific estimates of phone segments.
9. Key assumptions
- Local phone-service segment distributions vary.
  - Forcing NHIS segment distributions onto local data would distort results.
- Response rate ratios (rates of overcount) are constant across surveys.
  - If fielding and screening procedures are similar
- Sampling variability is ignorable.
  - In comparison of NHIS to CHIS
  - In projection from the local samples to the local population
10. How to combine dual-frame sample results (equal response rates)
11. The universe of telephone households
[Diagram: all telephone households, 100%]
12. Cell phone samples include some that are also in the RDD frame
[Diagram: cell phones (Frame 1) cover 81.1% of telephone households]
- Landline-only households are excluded.
13. RDD samples cover all landline households
[Diagram: RDD (Frame 2) covers 86.8% of telephone households]
- Cell-phone-only households are excluded.
14. RDD and cell samples overlap, yield complete coverage
[Diagram: RDD frame and cell phone frame overlapping; segments a, ab, and b]
- LLO (landline only), segment b: 18.9%, PbT = .189
- OVERLAP (cell + landline), segment ab: 67.9%, PabT = .679
- CPO (cell only), segment a: 13.2%, PaT = .132
These proportions define the population distribution of segments.
All percentages are from 2007 NHIS data (West region).
15. With equal response rates, cell sample would show
[Diagram: cell phone frame (Frame 1) covers 81.1%: CPO PaT = .132 plus OVERLAP PabT = .679; LLO PbT = .189 falls outside the frame]
- OVERLAP as percent of Frame 1: Pab' = .679/.811 = .837
- CPO as percent of Frame 1: Pa' = .132/.811 = .163
All percentages are from 2007 NHIS data (West region).
16. With equal response rates, RDD sample would show
[Diagram: RDD frame (Frame 2) covers 86.8%: LLO PbT = .189 plus OVERLAP PabT = .679; CPO PaT = .132 falls outside the frame]
- OVERLAP as percent of Frame 2: Pab'' = .679/.868 = .783
- LLO as percent of Frame 2: Pb'' = .189/.868 = .218
All percentages are from 2007 NHIS data (West region).
17. So, if response rates were equal, we would have . . .
Segment    True values (NHIS West 2007)    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   13.2                      Pa'   16.3
Overlap    PabT  67.9                      Pab'  83.7                   Pab''  78.3
LLO        PbT   18.9                                                   Pb''   21.7
Total            100.0                           100.0                         100.0
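The equal-response-rate table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the author's code: the function name `observed_in_frame` and the dictionary keys are illustrative assumptions, and the population shares are the 2007 NHIS West-region values quoted on the slides.

```python
# Sketch: what each frame would show if every segment responded at the same rate.
# Population shares are the 2007 NHIS West-region values from the slides.
true_share = {"cpo": 0.132, "overlap": 0.679, "llo": 0.189}


def observed_in_frame(shares, segments):
    """Renormalise the population shares of the segments a frame can reach."""
    covered = sum(shares[s] for s in segments)  # frame coverage, e.g. 0.811 for cell
    return {s: shares[s] / covered for s in segments}


cell_view = observed_in_frame(true_share, ["cpo", "overlap"])  # Frame 1 (cell)
rdd_view = observed_in_frame(true_share, ["overlap", "llo"])   # Frame 2 (RDD)

for name, view in (("cell", cell_view), ("rdd", rdd_view)):
    print(name, {seg: round(100 * p, 1) for seg, p in view.items()})
```

The printed percentages should agree with the table to within rounding of the last digit.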
18. How do we get from observed percentages to population percents?
Segment    True values (NHIS West 2007)    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   ??                        Pa'   16.3
Overlap    PabT  ??                        Pab'  83.7                   Pab''  78.3
LLO        PbT   ??                                                     Pb''   21.7
Total            100.0                           100.0                         100.0
19. Formulas for calculating underlying population distribution
With PabT and PaT evaluated, the third segment follows as the remainder: PbT = 1 - PabT - PaT.
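The formulas on this slide did not survive extraction. The following is a reconstruction from the definitions on the preceding slides, offered as a sketch rather than the author's exact expressions. It assumes equal response rates, writes single and double primes for the cell-sample and RDD-sample observed shares, and uses the fact that the three population shares sum to 1.

```latex
% Observed shares under equal response rates (definitions from slides 15-16):
\begin{align*}
P_a'    &= \frac{P_{aT}}{P_{aT} + P_{abT}}, &
P_{ab}' &= \frac{P_{abT}}{P_{aT} + P_{abT}}, \\
P_b''    &= \frac{P_{bT}}{P_{bT} + P_{abT}}, &
P_{ab}'' &= \frac{P_{abT}}{P_{bT} + P_{abT}}.
\end{align*}
% Taking the ratio within each frame removes the frame-coverage denominator:
\begin{align*}
P_{aT} &= P_{abT}\,\frac{P_a'}{P_{ab}'}, &
P_{bT} &= P_{abT}\,\frac{P_b''}{P_{ab}''}.
\end{align*}
% Imposing P_{aT} + P_{abT} + P_{bT} = 1 gives the overlap share:
\begin{equation*}
P_{abT} = \frac{1}{1 + \dfrac{P_a'}{P_{ab}'} + \dfrac{P_b''}{P_{ab}''}}.
\end{equation*}
```

As a check against the NHIS West figures: .163/.837 + .217/.783 is about .472, so the overlap share is 1/1.472, or about .679, which matches the true value in the table.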
20. Combining dual-frame sample results when response rates are not equal
21. Three segments, four response rates
[Diagram: RDD frame and cell phone frame overlapping; segments a (CPO), ab (overlap), and b (LLO)]
- Cell sample response rate for CPOs: ra
- Cell sample response rate for overlap: rab'
- RDD sample response rate for overlap: rab''
- RDD sample response rate for LLOs: rb
22. 4 response rates, 2 response rate ratios
- Reduction in base response for dual-phone in the cell sample is r1 = rab'/ra.
  - This is the response rate ratio that applies to the cellphone sample.
- Reduction in base response for dual-phone in the RDD sample is r2 = rab''/rb.
  - This is the response rate ratio for the RDD sample.
23. It follows that . . .
- And our expressions for calculating true population phone-service segments are modified by incorporating the response rate ratios.
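The modified expressions are missing from the extracted slide. A reconstruction under stated assumptions: r1 is taken to be the overlap response rate divided by the CPO response rate in the cell sample, and r2 the overlap response rate divided by the LLO response rate in the RDD sample, which is consistent with the later statement that the overlap rate is only 37% of the CPO rate. Single and double primes mark cell-sample and RDD-sample quantities.

```latex
% Observed odds in each frame are the population odds scaled by the response rates:
\begin{align*}
\frac{P_a'}{P_{ab}'}  &= \frac{r_a\,P_{aT}}{r_{ab}'\,P_{abT}}  = \frac{P_{aT}}{r_1\,P_{abT}},
  & r_1 &= \frac{r_{ab}'}{r_a}, \\
\frac{P_b''}{P_{ab}''} &= \frac{r_b\,P_{bT}}{r_{ab}''\,P_{abT}} = \frac{P_{bT}}{r_2\,P_{abT}},
  & r_2 &= \frac{r_{ab}''}{r_b}.
\end{align*}
% Solving for the population shares and imposing that they sum to 1:
\begin{align*}
P_{abT} &= \frac{1}{1 + r_1\,\dfrac{P_a'}{P_{ab}'} + r_2\,\dfrac{P_b''}{P_{ab}''}}, \\
P_{aT}  &= r_1\,P_{abT}\,\frac{P_a'}{P_{ab}'}, \qquad
P_{bT}   = r_2\,P_{abT}\,\frac{P_b''}{P_{ab}''}.
\end{align*}
```

Setting r1 = r2 = 1 recovers the equal-response-rate case.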
24. How to calculate response rate ratios
- Now assume that we have observed results from a dual-frame phone survey.
- We also know the true population distribution.
- We can calculate the response rate ratios.
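The expressions for the ratios are not in the extracted text. One way to write them, assuming r1 and r2 are the overlap response rates relative to the CPO rate (cell sample) and the LLO rate (RDD sample), is to compare population odds with observed odds in each frame:

```latex
\begin{align*}
r_1 &= \frac{P_{aT}}{P_{abT}} \cdot \frac{P_{ab}'}{P_a'}, &
r_2 &= \frac{P_{bT}}{P_{abT}} \cdot \frac{P_{ab}''}{P_b''}.
\end{align*}
```

Here the T-subscripted shares are the known population values, and the primed shares are those observed in the cell sample (single prime) and the RDD sample (double prime).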
25. Deriving response rate ratios by comparing CHIS 2007 to NHIS
26. CHIS 2007 (California Health Interview Survey)
Segment    True values (NHIS West 2007)    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   13.2                      Pa'   34.6
Overlap    PabT  67.9                      Pab'  65.4                   Pab''  68.3
LLO        PbT   18.9                                                   Pb''   32.7
Total            100.0                           100.0                         100.0
With equal response rates, the observed values would instead be 16.3 (CPO, cell sample) and 21.7 (LLO, RDD sample).
27. From these data we can evaluate r1 and r2
In the cellphone sample, the overlap response rate is only 37% of the CPO rate.
In the RDD sample, the overlap response rate is about 60% of the LLO rate.
- Overcount of CPOs is greater than overcount of LLOs.
- This shows many dual-phone users still use the cellphone as a secondary device.
28. Calculating local area estimates of population phone-service segment distributions
29. 2008 Prince William County Survey
- Citizen satisfaction survey in a large, suburban county in Northern Virginia
- N = 1,666
- Triple-frame design: cellphone, landline RDD, and directory-listed sample
- Here we combine the landline samples and treat as a dual-frame design
- Screening questions patterned after those on CHIS
30. 2008 Results for Prince William County, VA
Segment           Observed thru cell sample    Observed thru RDD sample
CPO (PaT)         Pa'   40.6                          0.7
Overlap (PabT)    Pab'  59.4                   Pab''  88.5
LLO (PbT)                                      Pb''   10.5
Total                   100.0                         100.0
31. 2008 Results for Prince William County, VA
Segment    True values for PWC    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   ??               Pa'   40.6                          0.7
Overlap    PabT  ??               Pab'  59.4                   Pab''  88.5
LLO        PbT   ??                                            Pb''   10.5
Total            100.0                  100.0                         100.0
32. Apply formulas given above
Calculations based on r1 = .368, r2 = .598
33. 2008 Results for Prince William County, VA
Segment    True values for PWC    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   19.0             Pa'   40.6                          0.7
Overlap    PabT  75.3             Pab'  59.4                   Pab''  88.5
LLO        PbT   5.7                                           Pb''   10.5
Total            100.0                  100.0                         100.0
34. 2008 Albemarle County Survey
- Citizen satisfaction survey
- Suburban and rural county surrounding the City of Charlottesville, VA
- Similar triple-frame design as in PWC survey
- Smaller sample size: n = 700
35. 2008 Results for Albemarle County, VA
Segment           Observed thru cell sample    Observed thru RDD sample
CPO (PaT)         Pa'   21.9                          0.2
Overlap (PabT)    Pab'  78.1                   Pab''  82.7
LLO (PbT)                                      Pb''   17.2
Total                   100.0                         100.0
36. 2008 Results for Albemarle County, VA
Segment    True values for Albemarle    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   8.4                    Pa'   21.9                          0.2
Overlap    PabT  81.4                   Pab'  78.1                   Pab''  82.7
LLO        PbT   10.2                                                Pb''   17.2
Total            100.0                        100.0                         100.0
37. 2008 Chesterfield County Survey
- Citizen satisfaction survey
- Suburban county adjacent to Richmond, VA
- Similar triple-frame design as in PWC survey
- Treated as dual frame here
- n = 1,600
38. 2008 Results for Chesterfield County, VA
Segment           Observed thru cell sample    Observed thru RDD sample
CPO (PaT)         Pa'   20.4                          0.1
Overlap (PabT)    Pab'  79.6                   Pab''  87.6
LLO (PbT)                                      Pb''   12.4
Total                   100.0                         100.0
39. 2008 Results for Chesterfield County, VA
Segment    True values for Chesterfield    Observed thru cell sample    Observed thru RDD sample
CPO        PaT   8.0                       Pa'   20.4                          0.1
Overlap    PabT  84.8                      Pab'  79.6                   Pab''  87.6
LLO        PbT   7.2                                                    Pb''   12.4
Total            100.0                           100.0                         100.0
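The county estimates can be approximated with a short calculation. This is a sketch under assumptions, not the author's implementation: the function name `estimate_segments` is hypothetical, and the formula is a reconstruction (population odds equal observed odds scaled by the response rate ratio, with the three shares summing to 1). The small share of CPO cases reached through the RDD sample is ignored here.

```python
def estimate_segments(pa_cell, pab_cell, pb_rdd, pab_rdd, r1=0.368, r2=0.598):
    """Estimate population shares (CPO, overlap, LLO) as proportions.

    pa_cell, pab_cell: observed CPO and overlap shares in the cell sample.
    pb_rdd, pab_rdd:   observed LLO and overlap shares in the RDD sample.
    r1, r2:            response rate ratios quoted on the slides.
    """
    cpo_odds = r1 * pa_cell / pab_cell  # estimated PaT / PabT
    llo_odds = r2 * pb_rdd / pab_rdd    # estimated PbT / PabT
    overlap = 1.0 / (1.0 + cpo_odds + llo_odds)
    return overlap * cpo_odds, overlap, overlap * llo_odds


# Chesterfield County 2008 observed percentages from the table above.
cpo, overlap, llo = estimate_segments(20.4, 79.6, 12.4, 87.6)
print([round(100 * share, 1) for share in (cpo, overlap, llo)])
```

For Chesterfield this lands close to the tabulated 8.0 / 84.8 / 7.2. The other two counties agree to within a few tenths of a point, so the original calculation may have differed in detail.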
40. Contrasting results
Segment           NHIS     CHIS (NHIS)    Prince William    Albemarle    Chesterfield
CPO (PaT)         13.2     13.2           19.0              8.4          8.0
Overlap (PabT)    67.9     67.9           75.3              81.4         84.8
LLO (PbT)         18.9     18.9           5.7               10.2         7.2
Total             100.0    100.0          100.0             100.0        100.0
41. Using the estimated segment distribution to weight the sample data
42. Example: PWC 2008
Segment    Observed thru cell sample    Observed thru RDD sample    Combined sample (unweighted)
           n       %                    n       %                   n       %
CPO        76      40.6                 11      0.7                 87      5.3
Overlap    111     59.4                 1303    88.5                1414    85.4
LLO                                     154     10.5                154     9.3
Total      187     100.0                1468    100.0               1655    100.0
43. 3-segment weights: PWC 2008
Segment    Combined sample (unweighted)    True values for PWC (%)    Weight    Weighted N
           n       %                                                            n       %
CPO        87      5.3                     19.0                       3.61      314     19.0
Overlap    1414    85.4                    75.3                       .88       1247    75.3
LLO        154     9.3                     5.7                        .61       94      5.7
Total      1655    100.0                   100.0                                1655    100.0
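The 3-segment weights are each segment's target share divided by its unweighted sample share. A minimal sketch using the PWC counts and estimated shares from the table above (the variable names are illustrative):

```python
# Combined unweighted counts and estimated population shares for PWC 2008.
counts = {"CPO": 87, "Overlap": 1414, "LLO": 154}
target_pct = {"CPO": 19.0, "Overlap": 75.3, "LLO": 5.7}

total_n = sum(counts.values())

# Weight = target share / unweighted sample share, computed from raw counts
# so that rounding in the printed percentages does not affect the result.
weights = {seg: (target_pct[seg] / 100) / (counts[seg] / total_n) for seg in counts}
weighted_n = {seg: counts[seg] * weights[seg] for seg in counts}

for seg in counts:
    print(f"{seg:8s} weight={weights[seg]:.2f} weighted_n={weighted_n[seg]:.0f}")
```

Because the target shares sum to 100%, the weighted counts sum back to the original sample size.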
44. But wait . . . We have 4 segments
Segment             Observed thru cell sample    Observed thru RDD sample    Combined sample (unweighted)
                    n       %                    n       %                   n       %
CPO                 76      40.6                 11      0.7                 87      5.3
Overlap via cell    111     59.4                                             111     6.7
Overlap via RDD                                  1303    88.5                1303    78.7
LLO                                              154     10.5                154     9.3
Total               187     100.0                1468    100.0               1655    100.0
45. If 2 frames split the overlap equally
Segment             Combined sample (unweighted)    True values for PWC (%)    Weight    Weighted N
                    n       %                                                            n       %
CPO                 87      5.3                     19.0                       3.61      314     19.0
Overlap via cell    111     6.7                     37.7                       5.62      623     37.7
Overlap via RDD     1303    78.7                    37.7                       .48       623     37.7
LLO                 154     9.3                     5.7                        .61       94      5.7
Total               1655    100.0                   100.0                                1655    100.0
46. If overlap-cell segment gets weight 2
Segment             Combined sample (unweighted)    True values for PWC (%)    Weight    Weighted N
                    n       %                                                            n       %
CPO                 87      5.3                     19.0                       3.61      314     19.0
Overlap via cell    111     6.7                     75.3 (combined)            2.00      222     13.4
Overlap via RDD     1303    78.7                    75.3 (combined)            .79       1025    61.9
LLO                 154     9.3                     5.7                        .61       94      5.7
Total               1655    100.0                   100.0                                1655    100.0
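The two ways of splitting the overlap weight can be compared directly. A minimal sketch, assuming the PWC counts and the 75.3% overlap target from the tables above; the variable names are illustrative, and the cap of 2 is the value used on the slide, not a general recommendation.

```python
total_n = 1655
overlap_target_n = 0.753 * total_n   # weighted overlap cases needed to reach 75.3%
n_cell, n_rdd = 111, 1303            # overlap completes obtained via each frame

# Scheme 1: each frame supplies half of the weighted overlap cases.
w_cell_equal = (overlap_target_n / 2) / n_cell
w_rdd_equal = (overlap_target_n / 2) / n_rdd

# Scheme 2: fix the cell-frame overlap weight at 2; the RDD overlap absorbs the rest.
w_cell_fixed = 2.0
w_rdd_fixed = (overlap_target_n - w_cell_fixed * n_cell) / n_rdd

print(f"equal split: cell={w_cell_equal:.2f}, rdd={w_rdd_equal:.2f}")
print(f"fixed at 2:  cell={w_cell_fixed:.2f}, rdd={w_rdd_fixed:.2f}")
```

The equal split puts a weight above 5 on only 111 cell-frame overlap cases, which inflates variance. Fixing that weight at 2 trades some of the balance between frames for more stable estimates.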
47. In Summary . . .
48. Problem and solution
- We don't have gold standard data by which to weight the results of a dual-frame telephone survey in a local area.
- Weighting to national or state averages might not be accurate.
- We developed needed formulas that relate observed percentages to underlying population phone segment distributions.
- We calculated response rate ratios by comparing CHIS 2007 to regional NHIS 2007 results.
- We applied these ratios to calculate underlying distributions in three local telephone surveys.
49. Results
- The estimates for three suburban counties in Virginia are quite different from national phone-segment distributions, and from each other.
  - Cellphone penetration is higher in Northern Virginia than in downstate suburbs, or in national estimates.
  - CPO lifestyle has been adopted by fewer people in the downstate suburbs.
- The estimates can guide weighting of sample data.
  - But we must use caution in weighting our cellphone samples up too much.
  - Larger cellphone samples are needed in the future.
50. Future research
- This is a time of rapid change in the telephone system.
- We are just learning how to deal with the weighting issues in cellphone surveys.
- We need to look at optimization of our dual-frame designs (cf. Hartley 1962).
- Estimates of response rate ratios can be updated using more current national phone surveys compared to NHIS.
- Results would be strengthened if external local data were available to validate the estimates.