Issues in Analysis and Presentation

Slides: 48

Provided by: ibmsSin


Issues in Analysis and Presentation
of Dietary Data
Nutritional Epidemiology
Walter Willett

The underlying objectives of data analysis and
presentation are to learn as much as possible
from the available data and to present what has
been learned to readers completely and w/ maximum
clarity.
The approaches for presentation should vary
depending on the intended readership
- simpler analytic approaches and greater use
of figures may be appropriate for a general
medical journal
- for an epidemiologic publication, more
complex methods and primarily tabular results may
be best.

DATA CLEANING: BLANKS and OUTLIERS
A common issue w/ dietary data is the treatment
of questionnaires in which some food items have
been left blank.
Two issues arise frequently: (1) should subjects
w/ more than a specific number of blanks be
excluded, and (2) how should blanks be treated in
calculating nutrient intakes?
It is useful to understand why participants may
not have completed a response for a specific
food. This could be due to inattention or
carelessness or because the participant did not
eat the food (even though they should have
answered "never").
Several patterns can be seen when examining
questionnaires w/ multiple blank items:
- For many forms, blank items are interspersed
w/ plausible and seemingly carefully completed
responses to other foods and the "never" category
is not used, suggesting that blanks meant that
the food was not consumed.
- In occasional questionnaires, whole sections
are left blank, suggesting that they were missed.
→ In calculating nutrient intakes, it seems best
to consider intermittent blanks as no consumption
of the food.
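The blank-handling rule above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical encoding in which each questionnaire section is a list of item responses (servings per day) and `None` marks a blank; the function names are illustrative and do not come from any existing package.

```python
from typing import Optional

# Hypothetical encoding: section name -> item responses (servings/day); None = blank.
Responses = dict[str, list[Optional[float]]]


def has_blank_section(sections: Responses) -> bool:
    """True if any whole section was left blank (suggesting it was missed)."""
    return any(all(ans is None for ans in items) for items in sections.values())


def count_blanks(sections: Responses) -> int:
    """Total number of blank items across all sections."""
    return sum(ans is None for items in sections.values() for ans in items)


def fill_intermittent_blanks(sections: Responses) -> dict[str, list[float]]:
    """Treat scattered blanks as no consumption (frequency 0)."""
    return {
        name: [0.0 if ans is None else ans for ans in items]
        for name, items in sections.items()
    }


ffq = {"dairy": [1.0, None, 0.5], "fruit": [0.2, None]}
print(has_blank_section(ffq), count_blanks(ffq))  # False 2
print(fill_intermittent_blanks(ffq))  # {'dairy': [1.0, 0.0, 0.5], 'fruit': [0.2, 0.0]}
```

Whether a questionnaire flagged by `has_blank_section` is excluded or followed up is a study-specific decision; the sketch only separates the two patterns described above.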

A firm rule for allowable number of blanks cannot
be made for all situations, and it is desirable
to conduct evaluations of decision rules whenever
possible
- The Nurses' Health Study allowed up to 70
blank items (out of about 130) as long as no
whole sections or pages were blank.
- This criterion has been evaluated empirically
within a validation study by examining the
correlation between number of blanks on a
questionnaire and measurement error (calculated
for each person as the absolute value of the
difference between the FFQ and diet record
values).
- For all nutrients examined there was no
appreciable correlation.
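The empirical evaluation described above (number of blanks versus per-person measurement error) might be sketched as follows. This assumes one nutrient at a time, with paired FFQ and diet-record values for each validation-study participant; the function names are hypothetical, and the default limit of 70 blanks simply mirrors the Nurses' Health Study rule quoted above.

```python
import math


def is_eligible(n_blanks: int, blank_section: bool, max_blanks: int = 70) -> bool:
    """Decision rule of the kind described above: at most `max_blanks` blank
    items and no entirely blank section or page."""
    return n_blanks <= max_blanks and not blank_section


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def blanks_vs_error(blanks: list[int], ffq: list[float], record: list[float]) -> float:
    """Correlate number of blanks with measurement error, computed for each
    person as |FFQ value - diet-record value|."""
    errors = [abs(f - r) for f, r in zip(ffq, record)]
    return pearson([float(b) for b in blanks], errors)
```

A correlation near zero across nutrients, as reported above, would support keeping questionnaires with many intermittent blanks.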

Once nutrients are calculated, some responses
will be implausibly high or low, necessitating
additional decisions regarding allowable ranges.
- The use of total energy intake as a primary
criterion can be justified because it is the only
nutrient for which intake is physiologically
fixed within a fairly narrow and predictable
range.
- It is generally considered that total energy
intakes below approximately 1.2 times the resting
or basal metabolic rate estimated from age,
gender and weight are unlikely to be correct, and
intakes of >4000 kcal/day are unlikely to be true
for even relatively active men.
- Usually an arbitrary allowable range of 500
to 3500 kcal/day for women and 800 to 4000
kcal/day for men is used. Although values near
the extremes of this range are rarely correct,
adjustment of nutrient intakes for total energy
intake will, to a large extent, compensate for
overall under- or overreporting.
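Applying the sex-specific energy ranges quoted above is a simple filter. A minimal sketch, assuming participants are coded as `"F"`/`"M"` and that the range limits are inclusive (both are assumptions, not part of the text):

```python
# Sex-specific allowable ranges for total energy (kcal/day), as quoted above.
ENERGY_RANGE = {"F": (500.0, 3500.0), "M": (800.0, 4000.0)}


def energy_plausible(sex: str, kcal: float) -> bool:
    """True if reported total energy lies inside the allowable range (inclusive)."""
    low, high = ENERGY_RANGE[sex]
    return low <= kcal <= high


participants = [("F", 1800.0), ("F", 4200.0), ("M", 750.0), ("M", 2600.0)]
kept = [p for p in participants if energy_plausible(*p)]
print(kept)  # [('F', 1800.0), ('M', 2600.0)]
```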

Dietary data are highly sensitive to coding and
data entry errors because nutrients are
calculated from large numbers of foods.
- Miscoding a teaspoon as a cup for one food on
a questionnaire or in a 1-week food record can
seriously misclassify an individual for many
nutrients.
- Multiple choice formats and machine-readable
questionnaires are less prone to such errors, but
if any hand-coded or open-ended questions are
included, such errors may occur.
- Extreme values are primarily at the high end
of the distribution due to the skewed
distributions of most nutrients, and they can be
heavily influential when nutrients are considered
as continuous variables.
- Sometimes such extreme values will be
indicative of improper completion of
questionnaires, such as marking the top category
for all foods in a section.
- In other cases, coding, data entry, or food
composition database errors may be discovered.
- Some values will just reflect unusual food
intake patterns without obvious error.

CATEGORIZED vs. CONTINUOUS PRESENTATION of
INDEPENDENT VARIABLES
Intakes of nutrients and food groups are
primarily continuous.
As the traditional presentation of epidemiologic
data has been in the form of rate ratios and rate
differences for levels of exposure, and
statistical methods have been developed for such
purposes, it is not surprising that most
continuous dietary data have been categorized for
analysis in nutritional epidemiologic studies.
Approaches for the creation of categories: (1)
use of arbitrarily defined quantiles (e.g.,
quartiles or quintiles); (2) use of standard
round-numbered cutpoints; (3) use of cutpoints
that are determined a priori to have biologic
relevance, such as the RDA or the intake at which
an enzyme is saturated.
- Finer divisions of extreme categories may
often be useful to extend an examination of the
dose-response relationship.
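Quantile categories, and the median intake within each category, can be computed with the Python standard library. The within-category medians are commonly used as plotting positions or as the scores in a test for trend. This is a sketch only: it uses `statistics.quantiles` with its default "exclusive" method, and other quantile definitions can shift cut points slightly in small samples.

```python
import statistics


def quintile_categories(values: list[float]) -> list[int]:
    """Assign each value to a quintile (1-5) using sample quintile cut points.
    Values equal to a cut point fall into the lower category."""
    cuts = statistics.quantiles(values, n=5)  # 4 cut points
    return [1 + sum(v > c for c in cuts) for v in values]


def category_medians(values: list[float], cats: list[int]) -> dict[int, float]:
    """Median intake within each category."""
    groups: dict[int, list[float]] = {}
    for v, c in zip(values, cats):
        groups.setdefault(c, []).append(v)
    return {c: statistics.median(vs) for c, vs in sorted(groups.items())}


intake = [float(v) for v in range(1, 11)]
cats = quintile_categories(intake)
print(cats)                            # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(category_medians(intake, cats))  # {1: 1.5, 2: 3.5, 3: 5.5, 4: 7.5, 5: 9.5}
```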

Arguments for using continuous variables:
(1) the greatest statistical power is
provided by a continuous variable if the function
reasonably fits the data, although this advantage
may be slight w/ the use of five or more
categories combined w/ an overall test for trend.
(2) when used as a covariate, a crudely
categorized variable may not fully account for
the effect of that variable, resulting in
possible residual confounding.
(3) the use of continuous variables may
facilitate comparisons among studies because a
single relative risk is reported for an
arbitrarily specified increment of intake (e.g.,
RR for 100 mg of cholesterol per day) that does
not depend on the distribution of the dietary
factor in the particular population or on the
choice of cut-points of individual studies.
Nonlinearity can be evaluated by, for example,
adding a quadratic term to the model and testing
its coefficient.
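Reporting a single RR for a specified increment (e.g., 100 mg of cholesterol per day) only requires rescaling the per-unit log-RR coefficient. A minimal sketch, assuming a log-linear model (such as logistic or Cox regression) has already produced a coefficient `beta` and its standard error per 1 unit of intake; the numbers below are hypothetical.

```python
import math


def rr_per_increment(beta: float, se: float, increment: float, z: float = 1.96):
    """Rescale a log-RR coefficient (per 1 unit of intake) to an RR and
    approximate 95% C.I. for a chosen positive increment of intake."""
    log_rr = beta * increment
    half_width = z * se * increment
    return math.exp(log_rr), math.exp(log_rr - half_width), math.exp(log_rr + half_width)


# Hypothetical coefficient per mg/day of cholesterol, reported per 100 mg/day.
rr, lo, hi = rr_per_increment(beta=0.002, se=0.001, increment=100)
print(f"RR per 100 mg/day: {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# RR per 100 mg/day: 1.22 (95% CI 1.00-1.49)
```

Because the increment is fixed in advance, the reported RR does not depend on the intake distribution of a particular study, which is the comparability argument made above.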

GRAPHICAL PRESENTATION of DATA
The central data of epidemiologic studies should
be presented in numerical form to provide the
actual numbers of exposed subjects and the
numbers of endpoints.
Judicious ancillary use of figures summarizing
the primary findings can be helpful to many
readers. Also, a clear and attractive summary
figure is likely to enhance the probability that
others will include your data in their
presentations.
Single Variable Effects
For presenting the effects of one or a few
dichotomous exposure variables, a graphic display
provides little additional perspective and tables
should be used.
With multiple ordinal categories, a figure can
assist in visualizing an overall relationship
(Fig 13-1).
The use of histograms to present RRs has
generally been disfavored because this parameter
is more correctly represented as a point, and
C.I.s are less readily displayed.


It might be argued that, if absolute risk is
really of interest, then the dependent variable
should be expressed as absolute risk. However,
one reason that relative rather than absolute
risks are generally utilized in epidemiologic
studies is that age is usually the most powerful
determinant; thus, absolute risks are usually
arbitrary, depending on the age to which the data
have been standardized.
The actual width of categories of the exposure
variable should be represented in the graphic
display.
For example, in Fig 13-1, trans-fatty acid intake
was divided into quintiles, and the distances
between quintiles in the figure were made
proportional to the differences between quintile
medians.
Usual quantiles: two groups, tertiles, quartiles,
quintiles.

Actual vs. Predicted Relationships
In graphic displays of the relationship between
two variables, one issue that naturally arises is
whether to provide the actual data or the
prediction from a model derived from the data.
- Many epidemiologists believe that the actual
data should be provided so that the reader can
view the findings w/o being forced to assume the
appropriateness of any model assumptions, such as
whether a linear relationship adequately
describes the data.
- The best solution may be to do both: for
example, provide the data points for categories
and superimpose the fitted regression line (Fig
13-2).


One clearly inappropriate approach is to
analyze the data as continuous, but then present
the findings as though they were categorical, for
example, by displaying the odds ratios (and C.I.s)
for multiple discrete levels of intake that are
all based on a single regression coefficient.
→ This gives the potentially misleading
impression of a clearly monotonic relationship,
and C.I.s that are too narrow for any specific
level because they are based on the overall data.

Display of Joint Effects
A common approach to represent the joint effects
of two exposure variables is the 3-D histogram
(Fig 13-3A).
- This has been criticized by some for the same
reasons that the use of histograms to present
univariate findings has been discouraged.
→ An alternative display uses points (Fig
13-3B).
- Displaying C.I.s is usually incompatible w/
the 3-D histogram, and it may also be
problematic w/ points because their C.I.s
frequently overlap.
→ Critical C.I.s will usually need to be
presented in tabular form or in text.


Locally Smoothed Regression Curves and Regression
Splines
Locally smoothed regression curves and regression
splines have been used increasingly to display
epidemiologic findings w/ a continuous exposure
variable.
- With smoothed regression curves, the values
of the dependent variable, such as a relative
risk, are estimated for a continuously moving
window of values of the independent variable.
- In using regression splines, separate linear
or nonlinear functions are fit between specified
points (knots) on the exposure distribution,
and functions are connected at the knots to
produce a smooth curve.
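The two ideas above can be illustrated with small building blocks. This is a sketch under stated assumptions, not the method used for the figures: the spline uses a cubic truncated-power basis (one of several possible bases; restricted cubic splines are another common choice), and the moving-window smoother simply averages a 0/1 outcome (i.e., risk) within each window rather than estimating an RR. In practice the spline columns would be entered into a regression model such as logistic or Cox regression, which is not shown. The knot positions and window width are arbitrary inputs, which is exactly the sensitivity concern raised below.

```python
def cubic_spline_basis(x: float, knots: list[float]) -> list[float]:
    """Truncated-power basis for a cubic regression spline: x, x^2, x^3, plus
    (x - k)^3 for x above each knot k. A model that is linear in these columns
    is a piecewise cubic whose pieces join smoothly at the knots."""
    return [x, x ** 2, x ** 3] + [max(0.0, x - k) ** 3 for k in knots]


def moving_window_mean(xs: list[float], ys: list[float],
                       centers: list[float], half_width: float) -> list[float]:
    """Crude running-mean smoother: for each center, average the outcome over
    observations whose exposure lies within +/- half_width."""
    out = []
    for c in centers:
        window = [y for x, y in zip(xs, ys) if abs(x - c) <= half_width]
        out.append(sum(window) / len(window) if window else float("nan"))
    return out


print(cubic_spline_basis(3.0, knots=[1.0, 5.0]))  # [3.0, 9.0, 27.0, 8.0, 0.0]
print(moving_window_mean([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1], [1.5], 1.0))  # [0.5]
```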
The principal advantages of these approaches are
that a priori assumptions are not imposed
regarding the shape of the dose-response
relationship and that maximal use is made of the
continuous nature of the independent (exposure)
variable.
Example: relation of alcohol intake to risk of
breast cancer (Fig 13-4).

In this example, the data fit using regression
splines give a strong sense that the
relationship is approximately linear and that a
significant increase in risk is seen even at
about 10 g (about 1 drink) per day.

Example 2: use of smoothing techniques to
evaluate the relationship between vitamin A
intake and risk of neural crest congenital
malformations (Fig 13-5).
- This analysis suggested the existence of a
threshold at 10,000 IU, above which risk
increased substantially.
- One possible concern is that inflection
points may be accepted too literally, especially
when the data are sparse (e.g., the method does
not provide a C.I. for the apparent threshold,
which would probably be quite wide).
→ This might be regarded as over-fitting of
the data (analogous to an extreme form of
selecting optimal cut-points to demonstrate a
relationship).

The degree to which the conclusions are affected
by somewhat arbitrary choices such as the width
of the window used for smoothing and the number
and spacing of knots deserves further
consideration.
Although smoothing methods and regression splines
for analyses involving dietary intake may prove
valuable, particularly in exploratory data
analysis, their use deserves further evaluation.

EXAMINATION of FOODS and NUTRIENTS
A full evaluation of the relationship between
diet and a disease should involve the analysis of
data on both food and nutrient intakes.
- If an association w/ disease is found for a
specific nutrient, it is important to examine and
report whether the major foods contributing to
this nutrient (as seen in the same dataset,
defined in terms of either absolute contribution
or contribution to between-person variance) are
also related similarly to risk of disease.

One serious problem w/ analyses of specific foods
is the large number of items on a typical
questionnaire.
- Some argue that the p value used for
statistical significance should be adjusted
according to the number of variables examined.
- The general consensus in epidemiology is that
this unduly reduces power and that individual
associations should be evaluated on their own
merits, and conclusions should be made in the
light of consistency w/ other information
internal and external to the study.
- When a large number of foods (or nutrients)
are screened for associations w/o an a priori
hypothesis, the likelihood that some
statistically significant relationships will
occur by chance must be considered when
interpreting the findings.
- This issue is complicated by the large number
of foods: because reporting the association for
each food on a questionnaire, and possibly for
groups of foods, is impossible in most journals,
an association w/ a food is generally more likely
to be reported if it is statistically significant,
particularly if it is consistent w/ prior
expectations.
→ The published literature on specific foods is
likely to be highly biased, and any summary of
that literature cannot avoid this bias.
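The scale of the chance-finding problem can be made concrete with simple arithmetic. A sketch, assuming independent tests at alpha = 0.05; foods on a questionnaire are intercorrelated, so the independence assumption is only a rough approximation. The Bonferroni cutoff shows the kind of p-value adjustment some argue for, which, as noted above, the general consensus regards as unduly reducing power.

```python
def expected_chance_findings(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results if no true associations exist."""
    return n_tests * alpha


def prob_any_chance_finding(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one chance finding, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_cutoff(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold under a Bonferroni adjustment."""
    return alpha / n_tests


n_foods = 130  # roughly the number of items on the FFQ mentioned earlier
print(expected_chance_findings(n_foods))  # about 6.5
print(round(prob_any_chance_finding(n_foods), 3))
print(bonferroni_cutoff(n_foods))
```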

No simple solution exists for publication bias:
- A partial solution may be to deposit data in
the National Auxiliary Publications Service or on
the internet for all foods when an analysis on a
particular disease is published.
→ At least such data will then be available to
others attempting to summarize the literature.
- The best approach for avoiding publication
bias on specific foods is probably to analyze
collaboratively the primary data from all
available studies on a topic.

The widespread use of multiple vitamins and other
nutritional supplements adds complexity to
dietary analyses, but can also provide important
insight by greatly extending the range of
observable nutrient intakes.
- Details on dose and duration of supplement
use are usually obtainable w/ substantially
greater precision than is possible for foods.
- When examining associations w/ foods or w/
nutrient intakes, not including supplements, it
will be important to conduct analyses excluding
supplement users, because any effects of
nutrients from foods may be swamped by the
relatively high levels of intakes from
supplements.
- Stratification by nutrient intake from foods
may also be important when examining the effect
of the same nutrient from supplements because
little effect of supplementation might be
expected when intake from foods is high.
- The greatest contrast in risk would usually
be expected when long-term supplement users w/
high intakes from diet are compared w/
nonsupplement users w/ low intakes from diet.

The Effect of Time
Dietary factors may operate at various stages of
the sequence of events (e.g., an antioxidant
might reduce the effect of ionizing radiation,
which is an early event, or alcohol may influence
endogenous hormone metabolism, which is likely to
be most important later in carcinogenesis).
The effect of diet may also be cumulative, so that
risk is related to a function of both dose and
duration of exposure.
Dietary factors may also have effects at specific
periods in life far removed from the time of
diagnosis (e.g., high growth rates before puberty
appear to increase breast cancer risk by
advancing the age at menarche, and effects of
maternal diet during pregnancy on the offspring's
risk of breast cancer have been hypothesized).
Because our knowledge is often not sufficient to
be confident that an effect of diet would be
limited to a particular period, it will usually
be difficult to exclude an effect of a dietary
factor until a fairly wide range of temporal
relationships has been examined.

Ideally, a comprehensive dietary assessment would
include a measurement of current diet and also
diet at various times in the past.
- Unfortunately, a comprehensive assessment of
even current diet is already a major burden on
participants. Moreover, the validity of recall
diminishes w/ time, and the reporting of past
diet is heavily influenced by current diet, so
that truly independent retrospective assessments
of various periods appear impossible.
- In practice, in cohort studies an assessment
of current diet (usually over the past year) is
used, and in case-control studies a period in the
past thought to be most plausibly relevant to the
disease (typically 5 to 10 years ago for cancer)
is the focus of recall.
For vitamin and mineral supplements,
information on duration of use can be readily
collected and can be critical (e.g., for vitamin E
supplements in relation to CHD risk and for
vitamin C supplements in relation to risk of
cataracts, the associations were limited to
longer-term users).

Prospective studies provide important additional
means of assessing temporal relationships w/
diet.
- Baseline data on current diet can be examined
in relation to disease incidence at various
follow-up periods. Under the reasonable
assumption that diet varies over time, the period
w/ the maximum relative risk should provide
information on the true induction period.
- The limitation of this approach is largely
practical as few cohorts will be sufficiently
large to provide statistically stable estimates
of risk during multiple time periods.
Prospective studies w/ replicate dietary
assessment provide the opportunity to examine
various intervals between dietary intake and
disease diagnosis w/ much greater power.
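Examining baseline diet against incidence in successive follow-up intervals amounts to computing a rate ratio per interval. The counts below are entirely hypothetical, and the sketch computes crude incidence rate ratios only (no age or covariate adjustment, no C.I.s); with realistic numbers of cases per interval, the instability mentioned above would show up as wide C.I.s.

```python
def rate_ratio(cases_hi: int, py_hi: float, cases_lo: int, py_lo: float) -> float:
    """Incidence rate ratio: (cases / person-years) in the high-intake group over the low."""
    return (cases_hi / py_hi) / (cases_lo / py_lo)


# Hypothetical counts by years since baseline:
# (cases, person-years) for high baseline intake, then for low baseline intake.
follow_up = {
    "0-4": (30, 10000.0, 20, 10000.0),
    "4-8": (40, 9500.0, 20, 9800.0),
    "8-12": (25, 9000.0, 20, 9500.0),
}

rrs = {period: rate_ratio(*counts) for period, counts in follow_up.items()}
for period, rr in rrs.items():
    print(period, round(rr, 2))
print("Interval with the largest RR:", max(rrs, key=rrs.get))  # 4-8
```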

THE USE of MULTIPLE DIETARY ASSESSMENTS in
PROSPECTIVE STUDIES
A powerful feature of cohort studies is the
opportunity to collect repeated dietary data over
time. Such repeated measurements of dietary
intake provide many analytic opportunities to
reduce the effects of measurement error and to
evaluate various hypothesized temporal
relationships between the dietary factor and the
disease outcome (Table 13-2).

Measured changes in diets of individuals over
time are a mix of true variation and measurement
error.
- The comparison of persons whose intakes are
consistently high w/ those whose intakes are
consistently low provides a strong test of
cumulative exposure, as well as of both long and
short latency, because it is highly likely that
these persons were truly high or truly low over
long durations.
- The major limitation of this strategy is the
loss of power due to the exclusion of the many
persons who changed categories and the need to
exclude cases that occur before the repeated
measurement if the analyses are to be truly
prospective.
The use of cumulative average measurements (i.e.,
the average of all measurements for an individual
up to the start of each follow-up interval) takes
advantage of all prior data and thus should
provide a statistically more powerful test of an
association of cumulative exposure.
- This approach deserves further methodologic
development, though, to take into account the
different degrees of measurement error and
information provided at each interval.
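Both strategies above can be sketched directly. The cumulative average follows the definition given (the average of all measurements up to the start of each follow-up interval); the consistency grouping keeps only persons in the same extreme category at every assessment. Intake values and category codes are hypothetical, and the sketch gives every questionnaire equal weight, so it does not address the differing measurement error the text says still needs methodologic work.

```python
def cumulative_averages(measurements: list[float]) -> list[float]:
    """Average of all measurements up to each questionnaire cycle. The k-th value
    is the exposure for the follow-up interval that starts after cycle k."""
    out, total = [], 0.0
    for k, m in enumerate(measurements, start=1):
        total += m
        out.append(total / k)
    return out


def consistency_group(categories: list[int], low: int, high: int) -> str:
    """Classify a person by intake category across repeated assessments."""
    if all(c == high for c in categories):
        return "consistently high"
    if all(c == low for c in categories):
        return "consistently low"
    return "changed"  # excluded from the consistent-high vs consistent-low contrast


print(cumulative_averages([2.0, 4.0, 6.0]))         # [2.0, 3.0, 4.0]
print(consistency_group([5, 5, 5], low=1, high=5))  # consistently high
```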

Because our understanding of disease etiology is
often inadequate to specify a temporal
relationship w/ confidence, the use of several
rather than just one analytic strategy to examine
various temporal relationships will generally be
appropriate.
Clear evidence that an association is strongest
w/ a particular temporal relationship can provide
important information on the pathogenetic process
and possibilities for intervention.
- If no association is observed, the
demonstration that this lack of relationship is
seen when a full range of temporal relationships
is examined provides the most compelling evidence
that an important association has not been
missed.
Example: analysis of the relationship between
coffee consumption and risk of CHD in women
(Table 13-3).


MULTIVARIATE ANALYSES
Multivariate methods may be particularly
important in nutritional epidemiology because
dietary factors tend to be intercorrelated,
sometimes strongly so.
A common reason for using multivariate analysis
in a study of diet and disease is to address the
question of whether an observed association
between a specific dietary factor and disease
risk is only secondary to its correlation w/
another, truly causal dietary factor.
- A standard approach is simply to include both
variables together in the same model.
- As the focus will typically be on dietary
composition rather than on absolute amounts, the
specific nutrients should usually be expressed as
energy-adjusted residuals or nutrient densities,
and total energy should be included as a term
unless it is unrelated to disease risk.
- Many possible alternative dietary factors
could be considered as potential confounding
variables, and the temptation may be to simply
include these all simultaneously in a model.

A problem w/ including a large number of dietary
factors simultaneously is that the remaining
independent variation in the primary dietary
factor may become quite small because,
collectively, the other variables can account for
almost all of its variation.
An alternative strategy is to conduct a series of
analyses, each including the standard non-dietary
factors plus one (or a few) alternative dietary
factors at a time. In this process, it may be
possible to eliminate several or all alternative
variables by showing that they have no independent
association w/ disease and that the association
w/ the primary variable remains.
There are no clear limits to the number of
nutrients that may be included simultaneously, as
this will depend on their intercorrelation as
well as the size of the dataset. However, because
many dietary variables are strongly correlated w/
many others, the maximum number is likely to be
modest before C.I.s become uninformatively wide.
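How quickly precision is lost can be sketched with the variance inflation factor, a standard regression diagnostic not named in the text. If the other dietary variables in the model explain a fraction R^2 of the primary factor's variance, only 1 - R^2 remains, and the standard error of its coefficient grows by roughly 1 / sqrt(1 - R^2). This is an approximation for linear models, holding sample size and residual variance fixed; logistic and Cox models behave similarly but not exactly.

```python
import math


def se_inflation(r_squared: float) -> float:
    """Factor by which the standard error of the primary dietary factor's
    coefficient grows when covariates explain r_squared of its variance
    (the square root of the variance inflation factor)."""
    return 1.0 / math.sqrt(1.0 - r_squared)


for r2 in (0.25, 0.5, 0.75, 0.9):
    print(f"R^2 = {r2:.2f}: remaining variance {1 - r2:.2f}, SE x {se_inflation(r2):.2f}")
```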

Another common situation arises when one or more
dietary factors are subcomponents of another
(e.g., saturated, monounsaturated, and
polyunsaturated fats are the components of total
fat); entering all four variables simultaneously
is impossible because they are redundant.
- Options (using types of fat as an example)
are listed in Table 13-4.
- Model 1a (Total fat + Sat fat + Energy) is a
standard multivariate model and does address the
independent effect of saturated fat. However, the
term for total fat no longer has the biologic
meaning of total fat because a major component,
saturated fat, is included separately; its
meaning then becomes mono- and polyunsaturated
fat.

- Model 1b: Total fat_res(E) + Sat fat_res(total fat)
+ Energy, where X_res(Y) denotes the residual from
regressing X on Y and E is total energy.
Here the residual from the regression of
energy-adjusted saturated fat on energy-adjusted
total fat is included. This provides the same
coefficient for saturated fat as in model 1a, but
the full biologic meaning of total fat is
retained → this model describes disease risk in
relation both to the total fat composition of the
diet and to the type of fat.
- Model 1c: Total fat/E + Sat fat/E + Energy
(nutrient densities, E = total energy). The term
for saturated fat (as a nutrient density) can be
interpreted as substituting a certain percentage
of energy from saturated fat for the same amount
of other types of fat. The term for total fat as
a nutrient density reflects primarily the energy
density of mono- and polyunsaturated fats.
- Models 2a: Total fat + Sat fat + Poly fat + Energy
2b: Total fat_res(E) + Sat fat_res(total fat)
+ Poly fat_res(total fat) + Energy
2c: Total fat/E + Sat fat/E + Poly fat/E + Energy
These are analogous to models 1a-1c, but more
specifically address the substitution of
saturated fat for monounsaturated fat because
polyunsaturated fat is included as a separate
term. They can be used to test the general
question: Does the type of fat add independently
to the prediction of disease above and beyond
total fat?

- Models 3a: Sat fat + Mono fat + Poly fat + Energy
3b: Sat fat_res(E) + Mono fat_res(E)
+ Poly fat_res(E) + Energy
3c: Sat fat/E + Mono fat/E + Poly fat/E + Energy
These are not fat substitution models because the
total fat composition of the diet is not
constrained. They address a somewhat different
question: Is each type of fat, substituted for
other sources of energy, independently
associated w/ disease risk?
Another common example of a dietary factor w/
nested subcomponents is alcohol intake, where the
question frequently arises whether an observed
association is due to a certain type of alcoholic
beverage.
The interpretation of multivariate analyses
including two or more dietary factors should
always be tempered by knowledge that none of the
dietary variables is measured perfectly.
Moreover, the degree of measurement error may
vary among different dietary factors.
- One dietary factor may appear to be the true
predictor and the other a confounder only because
the former is better measured.

EMPIRICAL DIETARY SCORES
Two traditional methods of combining data on
intakes of various foods have been (1) to compute
nutrient intakes using food composition tables,
and (2) to create food groups based on
similarities in nutrient content.
- Use of global scores to describe dietary
patterns or quality has also been suggested.
Factor Analysis
Can be used to identify two or more uncorrelated
dietary patterns based on foods that tend to be
used (or avoided) by the same persons.
A score is created for each person for each
factor by assigning weights to their frequency of
use of each food. Once the scores are computed
for each person, their relation to risk of
disease can be examined.
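Once weights for a pattern are available, computing each person's score is a weighted sum. The sketch below assumes the weights (loadings) have already been obtained from a factor analysis, which is not shown, and it standardizes each food's frequency before weighting. That is one common convention, not necessarily the one used in any particular study. Food names, weights, and frequencies are hypothetical.

```python
import statistics


def standardize(column: list[float]) -> list[float]:
    """Convert food frequencies to z-scores (mean 0, SD 1)."""
    mu, sd = statistics.fmean(column), statistics.stdev(column)
    return [(v - mu) / sd for v in column]


def pattern_scores(freqs: dict[str, list[float]], loadings: dict[str, float]) -> list[float]:
    """Per-person score: sum over foods of (weight x standardized frequency)."""
    z = {food: standardize(freqs[food]) for food in loadings}
    n_people = len(next(iter(z.values())))
    return [sum(loadings[f] * z[f][i] for f in loadings) for i in range(n_people)]


# Hypothetical weights for one pattern and three respondents.
loadings = {"vegetables": 0.8, "fish": 0.6, "processed_meat": -0.5}
freqs = {
    "vegetables": [1.0, 2.0, 3.0],
    "fish": [0.1, 0.2, 0.3],
    "processed_meat": [1.0, 0.5, 0.0],
}
print([round(s, 2) for s in pattern_scores(freqs, loadings)])
```

The resulting scores would then be categorized or used as continuous variables in the disease model, like any other exposure.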

The role of factor analysis or other multivariate
methods (e.g., principal components or cluster
analysis) to create scores for dietary patterns
in nutritional epidemiology remains unclear.
- In contrast to calculations of nutrient
intakes, there is no biologic basis for these
scores.
- This approach may be useful for describing
the intercorrelation of foods and thus the
identification of potential confounders.
Empirically Selected Variable Score
A tempting strategy for developing a prediction
score is to examine the relation of each food (or
nutrient) w/ risk of disease, pick the
significant associations, and create a summary
score composed of these variables.
- The problem w/ this approach can be
appreciated by considering that, if 100 foods are
examined as disease predictors, by chance alone
about 5 will be statistically significant. A
score based on these 5 variables will be
extremely significantly predictive of disease,
all on the basis of chance.

One common strategy for cross-validation is to
divide the dataset into halves, create an
empirical prediction score in one half (training
set) and evaluate the score in the other half
(test or validation set).
More statistically efficient alternatives for
cross-validation exist, such as the jackknife,
which involves successively leaving out one
observation, fitting the model w/ the remaining
data, and predicting the omitted observation.
Nutrient Prediction Models Using an Independent
Gold Standard
In calculating nutrient intakes from a FFQ, foods
are weighted by their frequency of use and their
nutrient content using a food composition
database.
- Ideally, the weight would also take into
account the validity w/ which intake of each food
was assessed and the bioavailability of the
nutrient from each food.
- This additional weighting can be accomplished
by using an independent quantitative assessment
of nutrient intake or a biochemical indicator of
nutrient intake in a sample of the population.

Diet records, which are often used to assess the
validity of a food-frequency instrument, could be
used as an independent estimate of true nutrient
intake to develop a prediction score from foods
on a FFQ.
- To do so, the nutrient intake from the diet
record would be used as the dependent variable
in a multiple regression analysis, w/ all foods
from the FFQ allowed to enter as predictors in a
stepwise procedure. Foods that explain the most
between-person variance in the nutrient intake
enter first.
- If the validity of a food item on the
questionnaire is low (e.g., if it was worded
poorly), it should not contribute appreciably to
the prediction of the nutrient.
- The nutrient score based on the coefficients
from the stepwise regression could then be
computed for each person and used in analyses
predicting disease.
- Limitations of this approach: (1) a large
number of subjects is needed to provide stable
estimates of regression coefficients, probably
considerably larger than most validation studies;
(2) the regression coefficients will reflect in
part the characteristics of the respondents,
which is a desirable feature because these may
influence validity, but this means that the
coefficients may not be generalizable to other
populations.

A biochemical indicator could also serve as the
standard for developing a prediction score from
foods on a FFQ.
- Such an approach would take into account
factors such as the bioavailability of the
nutrient in each food and the validity of each
food item.
Example: Giovannucci et al. (1995) used this
approach to develop a prediction score for
lycopene intake using plasma lycopene levels as a
standard. They found that cooked tomato products
predicted plasma lycopene levels better than did
raw tomato products, so this was incorporated
into the empirical prediction equation. When
this empirical score was used to examine the
relation between lycopene intake and risk of
prostate cancer in the total cohort, a stronger
association was observed than w/ the standard
calculation of intake.
- The size of the substudy is a critical issue
because it determines the precision of the
coefficients, but the desirable size is not
clear.

SUBGROUP ANALYSES AND INTERACTIONS
The effects of most dietary factors are likely to
vary among subgroups, depending on the intake of
other dietary factors and characteristics of the
subjects. This issue is generally known as
effect modification or interaction.
- A fundamental issue is whether the
interaction should be assessed on an absolute
scale (whether the rate difference is constant
across categories of the third variable) or
relative scale (whether the rate ratio or RR is
constant across these categories).
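The difference between the two scales can be shown with hypothetical rates. Within each level of a third variable, the sketch computes the rate difference (absolute scale) and the rate ratio (relative scale).

```python
def stratum_contrasts(rate_exposed: float, rate_unexposed: float) -> tuple[float, float]:
    """Rate difference (absolute scale) and rate ratio (relative scale)."""
    return rate_exposed - rate_unexposed, rate_exposed / rate_unexposed


# Hypothetical rates per 1,000 person-years, by level of a third variable.
strata = {"modifier absent": (10.0, 5.0), "modifier present": (20.0, 10.0)}
for name, (exposed, unexposed) in strata.items():
    rd, rr = stratum_contrasts(exposed, unexposed)
    print(f"{name}: RD = {rd:.1f}, RR = {rr:.1f}")
# modifier absent: RD = 5.0, RR = 2.0
# modifier present: RD = 10.0, RR = 2.0
```

Here the RR is constant, so there is no interaction on the relative scale. The RD doubles, so there is interaction on the absolute scale. The same data can show effect modification on one scale but not the other, which is why the choice of scale matters.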
One general concern has been that an extensive
search for interactions and associations within
subgroups of other variables creates a high
likelihood of statistically significant
associations arising by chance.
Although there is no simple solution for
evaluating subgroups w/ confidence, this should
not deter investigators from examining them.
- Some subgroup analyses are so important a
priori that they must be examined, and failure to
find evidence of effect modification may even
cast doubt on an association.

When two dietary factors act by a similar
mechanism, it may be difficult to observe an
association by examining only one of these
variables at a time; an examination of joint
exposures may be most powerful.
Purely exploratory analysis of associations within
subgroups of known risk factors is also good
practice, even when little a priori reason exists,
because new knowledge may be gained.
- Such explorations w/o strong prior
expectations should be clearly described as such,
and the reader should be skeptical of any
findings.
- Some would suggest not reporting p values in
such circumstances.
Ultimately, the only fail-safe protection against
spurious conclusions based on subgroups is
demonstration of reproducibility, perhaps over
time within the same study and, most importantly,
in other independent datasets.

ERROR CORRECTION
Methods to correct observed associations for
errors in measurement of exposure variables
require data on either reproducibility or
validity; this information is now being collected
as part of many large studies.
The most common use of error correction
procedures is the de-attenuation of correlation
coefficients, probably because this only requires
replicates of one or both measurements being
compared.
- This procedure has become quite routine in
validation studies, where a small number of days
of diet records or 24-hour recall data are
collected as an independent representation of
true long-term intake.
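The de-attenuation usually applied in this setting can be sketched as follows. It assumes random within-person variation that is independent of true intake. Here lambda is the ratio of within-person to between-person variance for a measure, and n is the number of replicate days averaged. The observed correlation is multiplied by sqrt(1 + lambda_x/n_x). When the second measure is also replicated, it is multiplied by the analogous factor for that measure as well. The lambda value in the example is hypothetical.

```python
import math

def deattenuate_r(r_obs, lambda_x, n_x, lambda_y=0.0, n_y=1):
    """Correct an observed correlation for random within-person variation.

    lambda_*: ratio of within- to between-person variance for a measure
    n_*:      number of replicate measurements averaged for that measure
    lambda_y = 0 (the default) corrects only the x measurement.
    """
    factor_x = 1.0 + lambda_x / n_x
    factor_y = 1.0 + lambda_y / n_y
    return r_obs * math.sqrt(factor_x * factor_y)

# Hypothetical: observed r = 0.5 against 4 days of records, lambda_x = 2.0
print(round(deattenuate_r(0.5, lambda_x=2.0, n_x=4), 2))
```

Here the corrected correlation is 0.5 * sqrt(1.5), about 0.61. Because lambda is itself estimated, a de-attenuated correlation can occasionally exceed 1.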
In studies of disease incidence, the calculation
of RRs and C.I.s adjusted for measurement error
has usually been done as a secondary analysis.
- These analyses address 3 interrelated
objectives: (1) to obtain the best estimate of
the RR after accounting for attenuation due to
imperfect measurement of the primary exposure;
(2) to obtain the best estimate of the true C.I.,
which is particularly important when little
association is seen, because the central question
becomes whether the adjusted C.I. is
sufficiently narrow to be informative; and (3) to
account for residual confounding by imperfectly
measured covariates.
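Regression calibration is one widely used approach to objectives (1) and (2) for a single exposure. The sketch below rests on these assumptions:
- the questionnaire error is nondifferential;
- lambda is the slope from regressing reference-method intake (e.g., diet records) on questionnaire intake in a validation substudy;
- the main-study and validation estimates are independent.

The corrected log RR is beta/lambda. The delta-method variance adds a term for uncertainty in lambda, so the corrected C.I. is wider on the log scale. Objective (3) needs a multivariate extension, which is not shown.

```python
import math

def calibrate_rr(beta_obs, se_beta_obs, lam, se_lam, z=1.96):
    """Regression-calibration correction of a log RR for exposure measurement error.

    beta_obs, se_beta_obs: log RR per unit of the questionnaire measure, and its SE
    lam, se_lam:           calibration slope from a validation substudy, and its SE
    Returns (corrected RR, lower limit, upper limit).
    """
    beta_c = beta_obs / lam
    # Delta-method variance; assumes beta_obs and lam are estimated independently.
    var_c = se_beta_obs ** 2 / lam ** 2 + beta_obs ** 2 * se_lam ** 2 / lam ** 4
    se_c = math.sqrt(var_c)
    return math.exp(beta_c), math.exp(beta_c - z * se_c), math.exp(beta_c + z * se_c)
```

Take a hypothetical observed RR of 1.2 per unit, with SE(log RR) = 0.05, lambda = 0.5, and SE(lambda) = 0.1. In that case, calibrate_rr(math.log(1.2), 0.05, 0.5, 0.1) gives a corrected RR of 1.44 (that is, 1.2 squared) with a wider interval.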

ROLE of META-ANALYSIS and POOLED ANALYSIS in
NUTRITIONAL EPIDEMIOLOGY
The place of meta-analysis in epidemiology has
been controversial.
- Some have argued that the combining of data
from randomized trials is appropriate because
statistical power is increased w/o concern for
validity, since the comparison groups have been
randomized, but that in observational
epidemiology validity is determined
largely by confounding and bias rather than by
limitations of statistical power; thus the great
statistical precision obtained by combining
data may be misleading, because the findings
may still be invalid.
- The combining of all available epidemiologic
data can be of great value, particularly when a
body of evidence becomes substantial and
difficult to assimilate at once, provided that the
potential for bias is not ignored.
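Published RRs are commonly combined by taking an inverse-variance weighted average of the log RRs, a fixed-effect summary. The sketch below recovers each standard error from the reported 95% C.I., assuming the interval is symmetric on the log scale. The summary C.I. narrows as studies are added. However, it reflects only sampling variation, so bias or confounding shared across studies passes straight into a precise-looking estimate.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate SE of a log RR from a reported 95% C.I."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def fixed_effect_pool(studies, z=1.96):
    """Inverse-variance fixed-effect summary of published RRs.

    studies: list of (rr, ci_lower, ci_upper) tuples.
    Returns (summary RR, lower limit, upper limit).
    """
    weights, log_rrs = [], []
    for rr, lo, hi in studies:
        se = se_from_ci(lo, hi, z)
        weights.append(1.0 / se ** 2)
        log_rrs.append(math.log(rr))
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_rrs)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    return math.exp(pooled), math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)
```

The sketch also assumes that every RR refers to the same exposure contrast. Published studies often do not meet this assumption.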

An alternative to the combining of published
epidemiologic data is to pool and analyze the
primary data from all available studies on a
topic that meet specified criteria.
- Because of the complexity of dietary data,
this approach has great advantages in nutritional
epidemiology and can address many limitations of
individual studies.
Any attempt to combine published data on diet and
disease is immediately confronted w/ the problem
that various investigators have usually used
different approaches for presenting their
findings, which makes them difficult to combine.
- Sometimes RRs are given for arbitrary
quantiles and other times for specified
increments using continuous variables.
- Adjustments for total energy intake are often
done using a variety of methods or not at all,
and the inclusion of other covariates typically
differs among studies.
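Studies sometimes report RRs for different increments of a continuous variable. These can be put on a common increment only under an assumed log-linear dose-response, as sketched below. RRs for study-specific quantiles cannot be converted this way unless the intake levels the quantiles represent are known. Differences in energy adjustment and covariates are not removed at all.

```python
def rescale_rr(rr, old_increment, new_increment):
    """Re-express an RR per old_increment units as an RR per new_increment units.

    Valid only under a log-linear dose-response (log RR = beta * intake):
    RR_new = exp(beta * new) = RR_old ** (new / old).
    """
    return rr ** (new_increment / old_increment)
```

For example, under this assumption a hypothetical RR of 1.21 per 10 g/day corresponds to 1.10 per 5 g/day.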
A major advantage of pooling primary data is that
all data can be analyzed simultaneously using
common approaches and definitions of exposure.
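The sketch below illustrates applying a common definition of exposure. It computes energy-adjusted intake by the residual method: regress nutrient intake on total energy intake, then add each residual to the intake predicted at mean energy. The same function is then applied within every study. The data layout and function names are hypothetical. The sketch does not settle two analytic choices: whether to fit the regression within each study or in the combined data, and whether to log-transform intakes first.

```python
from statistics import mean

def residual_energy_adjust(nutrient, energy):
    """Energy-adjusted intake by the residual method (simple OLS on total energy)."""
    e_bar, n_bar = mean(energy), mean(nutrient)
    sxy = sum((e - e_bar) * (n - n_bar) for e, n in zip(energy, nutrient))
    sxx = sum((e - e_bar) ** 2 for e in energy)
    slope = sxy / sxx
    intercept = n_bar - slope * e_bar
    expected_at_mean = intercept + slope * e_bar  # equals n_bar under OLS
    return [n - (intercept + slope * e) + expected_at_mean
            for e, n in zip(energy, nutrient)]

def harmonize(studies):
    """Apply the same adjustment within each study.

    studies: {study_name: (nutrient_list, energy_list)} (hypothetical layout).
    """
    return {name: residual_energy_adjust(nut, en) for name, (nut, en) in studies.items()}
```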

In a pooled analysis, the range of intake of dietary
factors that can be addressed can be considerably
greater than in the separate analyses, because any
one study will have few subjects at the extremes
of intake and, sometimes, because the studies
will vary in the distribution of dietary factors.
For example, in the pooled analysis of
prospective studies of diet and breast cancer, it
was possible to evaluate associations from <15%
to >45% of energy from fat, a far
greater range than was possible in the individual
studies.
- Although few pooled analyses have been
conducted at the level of specific foods, such
analyses can provide a less biased assessment of
relationships based on the total body of
evidence.
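The gain in range can be made concrete in two steps. First, express each subject's fat intake as a percentage of energy, using 9 kcal per gram of fat. Second, assign everyone, in every study, to the same categories. The cutpoints and data structure below are hypothetical. Counting subjects per category, by study and in the pooled data, shows which extreme categories are sparse in each study but usable once the studies are combined.

```python
import bisect
from collections import Counter

KCAL_PER_G_FAT = 9.0                      # energy density of fat
CUTPOINTS = [15, 20, 25, 30, 35, 40, 45]  # hypothetical common boundaries (% energy)

def pct_energy_from_fat(fat_g, total_kcal):
    return 100.0 * fat_g * KCAL_PER_G_FAT / total_kcal

def fat_category(pct):
    """0 = <15%, 1 = 15 to <20%, ..., 7 = >=45%."""
    return bisect.bisect_right(CUTPOINTS, pct)

def category_counts(studies):
    """studies: {name: list of (fat_g, total_kcal)}. Returns (by_study, pooled)."""
    by_study, pooled = {}, Counter()
    for name, subjects in studies.items():
        counts = Counter(fat_category(pct_energy_from_fat(f, kcal)) for f, kcal in subjects)
        by_study[name] = counts
        pooled.update(counts)
    return by_study, pooled
```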
Evaluation of the consistency of findings in
subgroup analyses across studies will reduce the
likelihood of overinterpreting findings that may
have occurred by chance.
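One common way to quantify how consistent a subgroup finding is across studies is Cochran's Q statistic together with I^2. Both are computed from the study-specific estimates. These could be, for example, each study's log RR within the subgroup, or each study's difference in log RR between subgroups. The sketch assumes approximately normal estimates with known standard errors. Q can be referred to a chi-squared distribution with k - 1 degrees of freedom; that step is omitted here to avoid a dependency.

```python
def heterogeneity(estimates, ses):
    """Cochran's Q, its degrees of freedom, and I^2 for study-specific estimates."""
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2
```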

The data quality may differ among studies due to
differences in the questionnaires used, study
designs, or populations.
- This can be addressed if each study includes
a validation/calibration substudy so that
corrections can be made for the study-specific
measurement error, and the studies w/ more valid
assessments of diet can be given more weight.
- The advantages of pooled analyses in
nutritional epidemiology are so substantial that
this should become common practice for important
issues.
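Below is a minimal sketch of the approach described above. It assumes each study supplies its log RR (beta) and SE, plus a calibration slope (lam) with its SE from the study's own validation/calibration substudy. The dictionary keys are hypothetical. Each estimate is corrected by regression calibration before inverse-variance pooling. Other things being equal, the corrected variance grows as lam shrinks. As a result, studies with less valid dietary assessment receive less weight.

```python
import math

def calibrated_pooled_log_rr(studies):
    """Pool log RRs after study-specific regression-calibration correction.

    studies: list of dicts with keys beta, se_beta, lam, se_lam (hypothetical).
    Returns (pooled log RR, its SE, relative weight of each study).
    """
    corrected, weights = [], []
    for s in studies:
        b_c = s["beta"] / s["lam"]
        # Delta-method variance of beta / lam (independent estimates assumed).
        var_c = s["se_beta"] ** 2 / s["lam"] ** 2 + s["beta"] ** 2 * s["se_lam"] ** 2 / s["lam"] ** 4
        corrected.append(b_c)
        weights.append(1.0 / var_c)
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, corrected)) / total
    return pooled, math.sqrt(1.0 / total), [w / total for w in weights]
```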