Title: Relationship Between Length of Data Collection Period, Field Costs, and Data Quality
1 Relationship Between Length of Data Collection Period, Field Costs, and Data Quality
- J. Michael Brick, Leslie J. Christovich, Kenneth Herrell, Benmei Liu, and Timothy Smith
- Presented at ICES-III
- Montréal, Québec, Canada
- June 18-21, 2007
2 Discussion topics
- Overview of two web-based NSF establishment surveys
- Response rates and field period
- Definition of response levels for analysis
- Examination of data quality at lower response levels
- Examination of costs at lower response levels
- Conclusions
3 Survey of Science and Engineering Research Facilities
- Biennial survey conducted since 1986
- Census of eligible academic and biomedical research institutions
- Key survey variables include amount and condition of research space by field of S&E, expenditures for construction and repair/renovation projects, source of funds for construction and repair/renovation projects, and computing and networking environment
- Completed by institutional coordinators, with contributions from others at the institutions
- Burden estimate is 41 hours for academic institutions and 7 hours for biomedical institutions
4 Survey of Research and Development Expenditures at Universities and Colleges
- Annual survey conducted since 1972
- Census of eligible U.S. universities and colleges
- Some of the key survey variables include R&D expenditures by source of funds (e.g., federal, state/local, industry) and character of work (e.g., basic research, applied R&D), by field of S&E
- Completed by institutional coordinators
- Burden estimate is 22 hours per institution
5 Response rates
- Final response rates for most recent two cycles of each survey
- Question: What would be the consequences to costs and data quality if the data collection had been terminated earlier?

Survey          Cycle   Final response rate (%)
Facilities      2003    92
Facilities      2005    94
Expenditures    2003    95
Expenditures    2004    94
6 Definition of response levels for analysis
- Three cumulative response levels, consisting of the first 75%, 88%, and 90% to respond
- At each response level, final nonrespondents and institutions that responded later than the cut point were considered as nonrespondents
- Weighting was used to compensate for these nonrespondents at each response level
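The cut-point rule above can be sketched as follows. This is a minimal illustration, assuming each eligible institution has a known response week (or none, for a final nonrespondent) and that the response level is a percentage of eligible institutions. The function names, the data layout, and the simple ratio adjustment are hypothetical; the slides do not describe the weighting procedure actually used.

```python
def cut_week(response_weeks, level_pct):
    """Week by which `level_pct` percent of eligible cases had responded.

    response_weeks holds one entry per eligible institution: the week of
    response, or None for a final nonrespondent (hypothetical layout).
    """
    n_eligible = len(response_weeks)
    responded = sorted(w for w in response_weeks if w is not None)
    # Smallest respondent count that reaches the target rate (integer ceiling).
    needed = (level_pct * n_eligible + 99) // 100
    if needed == 0 or needed > len(responded):
        raise ValueError("response level not reachable with these data")
    return responded[needed - 1]


def respondent_flags(response_weeks, level_pct):
    """True for cases counted as respondents at this response level."""
    cut = cut_week(response_weeks, level_pct)
    # Later responders and final nonrespondents are both treated as nonrespondents.
    return [w is not None and w <= cut for w in response_weeks]


def adjustment_factor(flags):
    """Assumed single-cell ratio adjustment: eligible cases / respondents."""
    return len(flags) / sum(flags)


# Hypothetical example: eight eligible institutions, one final nonrespondent.
example_weeks = [3, 5, 8, 10, 12, 15, 20, None]
example_flags = respondent_flags(example_weeks, 75)
print(cut_week(example_weeks, 75), sum(example_flags),
      round(adjustment_factor(example_flags), 3))
```

In practice a weighting adjustment would usually be applied within classes (for example, by institution size) rather than in a single cell; the single-cell version is used here only to keep the sketch short.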
7 Field period by response level
- Table 1. Response-level assignment dates (weeks from initial contact), by survey cycle, for the Facilities and R&D Expenditures Surveys

Response level   Facilities Survey      R&D Expenditures Survey
                 FY 2003    FY 2005     FY 2003    FY 2004
75%              19         17          15         16
88%              26         22          26         31
90%              27         23          32         33
Full             33         31          47         37
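To make the field-period trade-off concrete, the short sketch below computes how many weeks would have been saved by stopping at each response level instead of continuing to the full response level. The week counts are copied from Table 1; the dictionary layout and function name are illustrative only.

```python
# Weeks from initial contact, copied from Table 1.
weeks_to_level = {
    "Facilities FY 2003": {"75": 19, "88": 26, "90": 27, "Full": 33},
    "Facilities FY 2005": {"75": 17, "88": 22, "90": 23, "Full": 31},
    "R&D Expenditures FY 2003": {"75": 15, "88": 26, "90": 32, "Full": 47},
    "R&D Expenditures FY 2004": {"75": 16, "88": 31, "90": 33, "Full": 37},
}


def weeks_saved(cycle, level):
    """Weeks between reaching `level` and the end of the full field period."""
    row = weeks_to_level[cycle]
    return row["Full"] - row[level]


for cycle in weeks_to_level:
    saved = {level: weeks_saved(cycle, level) for level in ("75", "88", "90")}
    print(cycle, saved)
```

For the 90 percent level the differences are 6, 8, 15, and 4 weeks across the four survey cycles.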
8 Differences in response rates, by institution size and response level
9 Differences in response rates, by institution size and response level (continued)
10 Data quality measure - Absolute Relative Difference (ARD)
- Differences between the weighted estimates at the full response level and estimates at each lower response level were used to approximate the relative bias
- Absolute relative difference (bias) defined as
  ARD = |estimate at lower response level - estimate at full response level| / estimate at full response level
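A minimal sketch of the ARD calculation, assuming the measure is the absolute difference between the weighted estimate at a lower response level and the full-response estimate, divided by the full-response estimate (the usual form of an absolute relative difference). The function name and the example values are hypothetical and are not survey results.

```python
def absolute_relative_difference(estimate_at_level, estimate_full):
    """ARD of a lower-response-level estimate against the full-response estimate.

    Assumed form: |estimate_at_level - estimate_full| / |estimate_full|.
    """
    if estimate_full == 0:
        raise ValueError("full-response estimate must be nonzero")
    return abs(estimate_at_level - estimate_full) / abs(estimate_full)


# Hypothetical weighted totals at the 75 percent and full response levels.
ard = absolute_relative_difference(980.0, 1000.0)
print(f"ARD = {ard:.1%}")
```

Because the full-response estimate stands in for the unknown true value, the ARD approximates relative bias rather than measuring it directly.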
11 ARD for Facilities Survey estimates for academic institutions, by field and response level
12 ARD for Facilities Survey estimates for academic institutions, by field and response level (continued)
13 ARD for Expenditures Survey estimates for academic institutions, by field and response level
14 Discussion of relative differences
- The ARD patterns are not consistent
- The 75% response level generally exhibits larger bias than the other response levels, but the bias is generally not statistically significant
- No significant bias if data collection were terminated at the 88% response level
15 Costs
- Analysis of Facilities Survey only
- Based on labor costs associated with data collection (i.e., nonresponse followup) and data retrieval
- Only variable costs are considered (i.e., fixed costs are excluded)
16 Definition of costs
- Data collection (DC) costs
  - Costs associated with nonresponse followup
  - Daily costs = the sum of the labor hours billed each month by labor category multiplied by the hourly rate for each category, and then divided by the number of days in the month
  - Total costs for institution X = the sum of daily data collection costs in each month for the number of days the case was active in the month, summed across all months of activity
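The data collection cost definitions above can be sketched as below. This follows the wording of the slide literally; whether the daily cost is further divided among the cases active on a given day is not stated, so no such division is applied here. Labor categories, hours, rates, and dates are all hypothetical.

```python
import calendar


def daily_dc_cost(hours_by_category, rate_by_category, year, month):
    """Monthly labor cost (hours x hourly rate, summed over labor categories)
    divided by the number of days in the month."""
    labor_cost = sum(hours * rate_by_category[category]
                     for category, hours in hours_by_category.items())
    days_in_month = calendar.monthrange(year, month)[1]
    return labor_cost / days_in_month


def institution_dc_cost(active_days_by_month, daily_cost_by_month):
    """Sum over months of (daily cost x days the case was active that month)."""
    return sum(daily_cost_by_month[month] * days
               for month, days in active_days_by_month.items())


# Hypothetical rates and billed hours for two months of followup.
rates = {"interviewer": 20.0, "supervisor": 40.0}
daily_costs = {
    (2004, 4): daily_dc_cost({"interviewer": 120, "supervisor": 30}, rates, 2004, 4),
    (2004, 5): daily_dc_cost({"interviewer": 95, "supervisor": 30}, rates, 2004, 5),
}
# Hypothetical case: active 10 days in April and 5 days in May.
total = institution_dc_cost({(2004, 4): 10, (2004, 5): 5}, daily_costs)
print(daily_costs, total)
```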
17 Definition of costs (continued)
- Data retrieval (DR) costs
  - Costs associated with identifying and contacting institutions with inconsistent or incomplete responses, and updating responses
  - Cost per retrieval element = the labor hours billed by each labor category multiplied by the hourly rate for each category, and divided by the number of data retrieval elements across all institutions
  - Total costs for institution X = the cost per element multiplied by the number of elements needing retrieval at the institution
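A companion sketch for the data retrieval cost definitions, under the same caveats: the labor category, hours, rate, and element counts are hypothetical, and the functions only restate the arithmetic described on the slide.

```python
def cost_per_retrieval_element(hours_by_category, rate_by_category, total_elements):
    """Total DR labor cost divided by the number of retrieval elements
    across all institutions."""
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    labor_cost = sum(hours * rate_by_category[category]
                     for category, hours in hours_by_category.items())
    return labor_cost / total_elements


def institution_dr_cost(cost_per_element, elements_needing_retrieval):
    """Cost per element times the elements needing retrieval at one institution."""
    return cost_per_element * elements_needing_retrieval


# Hypothetical inputs: 50 editor hours at 30.0 per hour, 300 elements overall.
per_element = cost_per_retrieval_element({"data_editor": 50}, {"data_editor": 30.0}, 300)
# Hypothetical institution with 12 elements needing retrieval.
dr_total = institution_dr_cost(per_element, 12)
print(per_element, dr_total)
```

Under this definition every retrieval element carries the same average cost, so an institution's DR cost depends only on how many of its items needed follow-up.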
18 Definition of costs (continued)
- Total costs for each response level include both DC and DR costs for respondents and nonrespondents that were active during the response period.
19 Costs for FY 2003 Facilities Survey
20 Conclusions
- Little risk of bias: No strong evidence that nonresponse bias was related to the response level within these categories
- Diminishing returns: Once a 75% response rate is obtained, the relative increase in costs is greater than the relative decrease in bias
- Extension of field period: Efforts to obtain responses from the last response group extended the field period by at least 4 to 8 weeks
21 Conclusions (continued)
- Other factors might influence decision
  - Need for timely data
  - Uses of survey data for other purposes (e.g., frames)