Statistical Errors in Publications

1
Statistical Errors in Publications
  • October 2010

2
  • OVERVIEW
  • Greater emphasis on sections dealing with
  • Design
  • Sample size
  • Statistical methodology
  • Results (Presentation/Interpretation)
  • Discussion/Conclusion.

3
  • SAMPLE PAPERS
  • Sample 1: Randomised controlled trial of the
    management of ankle sprains comparing elastic
    support bandage vs. Aircast ankle brace (Br J
    Sports Med, 2005)
  • Sample 2: Study to assess variables which
    predict chronic neck pain disability (Arch Phys
    Med Rehabil, 2004).

4
  • PREVALENCE OF STATISTICAL ERRORS
  • Concerns about the misuse of statistics date
    back over 70 years (Altman, 2004)
  • Despite greater awareness of statistical issues
    (e.g. through CONSORT), such concerns have not
    diminished

5
  • Prevalence of Statistical Errors (contd)
  • Serious statistical errors were found in 40% of
    164 articles published in psychiatry (Altman,
    2002)
  • At least one serious statistical error occurred
    in 38% and 25% of papers in Nature and the BMJ
    respectively (Garcia-Berthou and Alcaraz, 2004)
  • Many surveys of statistical errors report error
    rates ranging from 30-90% (Altman, 1991; Gore
    et al., 1976; Pocock et al., 1987; MacArthur,
    1984).

6
  • Why are there so many errors? (Altman, 2004)
  • Many investigators are not professional
    researchers; they are primarily clinicians
  • Training is usually a single course in
    statistics
  • Training focuses on data analysis, but issues
    such as statistical reporting and
    interpretation are not addressed
  • The statistical content and complexity of
    medical research have increased steadily over
    recent decades.

7
  • (Altman, 2004)
  • "... When I tell friends outside medicine that
    many papers published in medical journals are
    misleading because of methodological
    weaknesses, they are rightly shocked."
  • "Huge sums of money are spent annually on
    research that is seriously flawed through the
    use of inappropriate designs, unrepresentative
    samples, small samples, incorrect methods ..."

8

[Diagram: the research process. Stages: Observe
(natural course of disease) → Hypothesize (frame
research question) → Test (conduct experiment/
clinical trial) → Conclude (validate or modify
hypothesis). Corresponding activities: personal
scientific experience and concept development;
research planning, grant writing and protocol
development; experimental design, data collection
and analysis; statistical inference; journal
articles and scientific meetings.]
9
  • DESIGN
  • Population: a population is a group of
    individuals (persons, objects, or items) from
    which samples are taken
  • Sample: a sample is a finite part of a
    statistical population whose properties are
    studied to gain information about the whole
  • Sampling: sampling is the process of selecting
    a suitable sample, or a representative part of
    a population, for the purpose of determining
    parameters or characteristics of the whole
    population.
  • Purpose of sampling: to draw conclusions about
    populations from samples, we must use
    inferential statistics, which enable us to
    determine a population's characteristics by
    directly observing only a portion (or sample)
    of the population.

10
  • Design (contd)
  • Sampling error: what can make a sample
    unrepresentative of its population? One of the
    most frequent causes is sampling error.
  • Two types of sampling error:
  • Chance: the error that occurs simply through
    bad luck.
  • Bias: sampling bias is a tendency to favour the
    selection of units that have particular
    characteristics (as a result of a poor sampling
    plan)
  • To avoid sampling error: plan carefully!
    Select participants at random (see the sketch
    below).
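To make "random selection" concrete, here is a minimal Python sketch of drawing a simple random sample (the population size, sample size, and seed are made-up illustrations, not values from the slides):

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 10,000 patients
population = list(range(1, 10_001))

random.seed(42)  # fixed seed only so the example is reproducible

# Simple random sample of 200 participants, drawn without replacement,
# so every patient has the same chance of selection (guards against bias)
sample = random.sample(population, k=200)

print(len(sample), sample[:5])
```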

11
  • SAMPLE SIZE
  • Sample size may be determined by various
    practical constraints
  • Financial
  • Resources
  • Too small a sample is not representative of a
    population
  • Too large a sample results in wastefulness and is
    unethical
  • The larger the sample size, the more likely the
    results will reflect what will happen in the
    population

12
  • Sample size (Power Calculation) (contd)
  • Difference: the clinically important difference
  • Significance threshold: the type I error rate,
    conventionally set at 0.01 or 0.05
  • Power: 1 - β, where β is the type II error
    rate; conventionally 80% or 90%. Power is how
    confident you are that the sample will detect a
    difference, if one really exists in the
    population
  • Variability: the less variability among
    patients within each group, the more likely
    the sample reflects the overall population.

13
  • Sample size (Power Calculation) (contd)
  • The required sample size increases with:
  • A smaller clinically relevant difference
  • An increase in power
  • Greater variability
  • A reduction in the type I error rate
  • An allowance for dropouts and/or withdrawals
    (a power-calculation sketch follows below)
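As a hedged illustration of how these inputs combine, here is a short Python sketch using statsmodels' power machinery (the difference, SD, power, α, and dropout figures are arbitrary examples, not taken from the sample papers):

```python
from statsmodels.stats.power import TTestIndPower

# Standardised effect size = clinically relevant difference / within-group SD
effect_size = 2.0 / 4.0  # e.g. a difference of 2 units with an SD of 4

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size,
                                   power=0.90,   # 90% power
                                   alpha=0.05,   # 5% two-sided type I error
                                   alternative='two-sided')

# Inflate the total to allow for an anticipated 20% dropout rate
n_total = 2 * n_per_group / (1 - 0.20)
print(f"~{n_per_group:.0f} per group; ~{n_total:.0f} recruited in total")
```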

14
  • Sample size (contd)
  • Review the two articles in terms of
  • Design
  • Sample size

15
  • Sample size (contd)
  • "... A major concern in the design of studies
    is the almost universal lack of reporting of
    how the sample size was obtained ..." (Altman,
    2000).
  • The basis of the power calculation is often
    inadequately described (Malachy, 2004; Vail et
    al., 2003) (both sample papers)
  • Quite often sample size calculations are
    computed without allowing for dropouts
    (McGuigan, 1995) (both sample papers)

16
  • Sample size (contd)
  • Small studies
  • Small trials have low power (a high type II
    error rate)
  • If no sample size calculation is provided, the
    conclusions of the study have little value (as
    in sample 2)
  • If the study is underpowered, the conclusions
    should be taken with caution and the results
    are inconclusive (as in sample 1)

17
  • Sample size (contd)
  • A description of the sample size in the
    literature should contain, for example:
  • "The mean and SD for the RMQ on the active
    management arm are 5.91 and 4.27 respectively
    (Oxfordshire Low Back Pain trial, BMJ, 2005).
    The smallest difference between the two
    therapies which is clinically relevant is
    approximately 2.0. Using this information, the
    total number of participants required for this
    study will be 700, allowing for a 25%
    loss-to-follow-up and using 90% power with a 1%
    type I error rate (significance level)."
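For reference, the textbook two-arm formula underlying such calculations is sketched below (the published total will also reflect trial-specific design choices, such as the allocation ratio and the dropout allowance, so it need not match the bare formula):

$$ n \text{ per group} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2} $$

where σ is the within-group standard deviation, δ is the smallest clinically relevant difference, and z denotes standard normal quantiles.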

18
  • METHODS
  • "... All of the problems hinge on the
    understanding of what a statistical test is
    doing and what a p-value means ..."
  • (Murphy, 2004)

19
  • METHODS
  • A statistical test is a procedure used to
    compute a probability (a p-value) that
    quantifies the evidence against the null
    hypothesis

20
  • Methods (contd)
  • e.g. H0: no difference between the groups
  • H1: a difference exists between the groups
  • Test statistic: e.g. the two-sample t-test
  • The test statistic is transformed into a
    p-value (see the sketch below)
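A minimal Python sketch of this pipeline, using simulated data (the group means, SD, and sizes are made-up illustrations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical outcome scores for two treatment groups
group_a = rng.normal(loc=5.9, scale=4.3, size=50)
group_b = rng.normal(loc=4.0, scale=4.3, size=50)

# Two-sample t-test of H0: equal means; the t statistic
# is transformed into a p-value
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```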

21
  • Methods (contd)
  • P-value: the strength of the evidence
    (quantified by a probability) against the null
    hypothesis.
  • Neither the statistical test nor the p-value
    PROVES/DISPROVES the null hypothesis; they
    provide EVIDENCE against it.

22
  • Methods (contd)
  • Review the two articles in terms of
  • Methods
  • Results (including figures and tables)

23
  • Methods (contd)
  • "... A further issue is the copying of
    incorrect or inappropriate methods. Once
    incorrect procedures become common, it is hard
    to stop them from spreading through the medical
    literature like a genetic mutation ..."
    (Altman, 2002).
  • (as in sample 1)
  • Schwartzer et al. (2000) found that most papers
    made important errors in the application of new
    technology such as models for longitudinal data
    (Altman, 2000).
  • (e.g. hierarchical models in sample 1; ROC
    curves in sample 2)

24
  • Methods (contd)
  • Most common errors in the Methods section:
  • Failure to check assumptions (the Nature survey
    found the most common error was not checking
    for a normal distribution and not stating how
    normality was tested)
  • Using linear regression analysis without first
    establishing that the relationship is linear
  • Ignoring paired or ordered categories and
    therefore using an inappropriate test
  • Arbitrarily dividing continuous data into
    ordinal categories without explanation (data
    dredging)
  • Multiple comparisons (which increase the
    likelihood of a spuriously significant result)
    (sample 2; a correction sketch follows this
    list)
  • And many more: sub-group analyses, ignoring
    repeated measures designs, non-matched analysis
    for matched data, modelling incorrectly (e.g.
    interactions not included) ...
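To illustrate the multiple-comparison point, here is a sketch of a Bonferroni adjustment using statsmodels (the five p-values are invented for illustration):

```python
from statsmodels.stats.multitest import multipletests

# Invented p-values from five separate comparisons within one study
p_values = [0.012, 0.034, 0.045, 0.21, 0.78]

# Bonferroni correction: each p-value is effectively compared
# against alpha / 5 rather than alpha
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method='bonferroni')
print(p_adjusted)  # e.g. 0.012 * 5 = 0.06, so no longer significant
print(reject)      # all False: none survive the correction
```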

25
  • Methods (contd)
  • Begin a statistical analysis with data
    exploration
  • Check assumptions (see the sketch below)
  • Type of data: continuous, binary, ordinal,
    repeated over time, etc.
  • Missing values, outliers, number of withdrawals
  • Be careful with computer output (it often helps
    to do simple calculations by hand first).
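As one concrete instance of assumption-checking, here is a Python sketch that tests normality before choosing between a t-test and a rank-based alternative (the data are simulated, and the 0.05 threshold is a conventional choice, not one from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated, clearly skewed outcome data (illustrative only)
outcome = rng.exponential(scale=2.0, size=40)

# Shapiro-Wilk test of normality, run before selecting a method
w_stat, p_norm = stats.shapiro(outcome)
if p_norm < 0.05:
    print(f"Normality doubtful (Shapiro-Wilk p = {p_norm:.3f}); "
          "consider a rank-based test or a transformation")
else:
    print("No strong evidence against normality; a t-test may be reasonable")
```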

26
  • RESULTS
  • "... The results section must be written so
    that the average reader can understand the
    study findings" (Cummings, 2003).
  • "... poorly written with excessive jargon"
    (Byrne, 2000).
  • (sample 1 and sample 2)
  • "... A major bias is cherry-picking results"
    (Malachy, 2004).

27
  • Results (contd)
  • Common language pitfalls:
  • Avoid non-technical uses of technical terms
    such as "normal", "significant", "sample"
  • "No difference" should mean evidence of a lack
    of a statistically significant difference
  • (Sample 1)
  • Report p-values to 2-digit precision (e.g.
    p = 0.82)
  • Do not reduce p-values to "non-significant" or
    "NS"
  • Report a quantity to a scientifically relevant
    precision (e.g. a mean blood pressure of 115.73
    mmHg should be reported as 115.7 mmHg or even
    116 mmHg)

28
  • Results (contd)
  • P-values
  • Over-emphasis on the p-value
  • An arbitrary division of results into
    "significant" and "non-significant" according
    to the p-value was not the intention of the
    founders of statistical inference
  • Smaller p-values indicate stronger evidence
    against the null hypothesis.

29
  • Results (contd)
  • Confidence Intervals
  • A confidence interval is simply a range of
    values that is likely to enclose the population
    value
  • Confidence intervals are preferable to
    p-values, as they tell us the range of possible
    effect sizes compatible with the data
  • The larger the sample size, the narrower the
    confidence interval
  • A confidence interval for a difference (e.g. a
    treatment difference) that contains 0, or for a
    ratio (e.g. an odds ratio) that contains 1,
    implies a lack of evidence of a statistically
    significant difference (see the sketch below).
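A Python sketch of a 95% confidence interval for a difference in means, using the standard normal-approximation formula (the data are simulated and all numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(5.9, 4.3, size=80)   # hypothetical outcome scores
group_b = rng.normal(4.5, 4.3, size=80)

diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / len(group_a)
             + group_b.var(ddof=1) / len(group_b))
z = stats.norm.ppf(0.975)                 # about 1.96 for a 95% interval

lo, hi = diff - z * se, diff + z * se
print(f"difference = {diff:.2f}, 95% CI ({lo:.2f} to {hi:.2f})")
# If the interval contains 0, the data are compatible with no difference
```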

30
  • Results (contd)
  • ... and many more pitfalls:
  • testing baseline values (sample 1)
  • not reporting missing data
  • lack of statistical power not considered
  • misinterpreting and misunderstanding results
    from models (e.g. no interactions included).

31
  • PRESENTATION
  • In tables that compare groups, include counts
    (of patients or events) and column percentages
  • Use appropriate statistics (the median rather
    than the mean for non-normal data)
  • In tables of column percentages, do not include
    a row of counts and percentages of missing data
    (doing this will distort the other percentages
    in the table)
  • Statistical software packages provide a large
    amount of output; be selective about what is
    presented
  • Use graphs as an alternative to tables with
    many entries; do not duplicate graphs and
    tables.
  • Label graphs and tables correctly (sample 1 and
    sample 2)

32
  • INTERPRETATION AND DISCUSSION
  • Put the study sample in the context of the
    population
  • Avoid interpreting studies with non-significant
    results and low statistical power as "negative"
    (when they are inconclusive): the absence of
    proof is not proof of absence
  • Errors encountered in the design and analysis
    of a study can also carry through to errors in
    interpretation (Rushton, 1999)
  • Weaknesses in study design and study strengths
    should be stated so that a clear and accurate
    impression of the reliability of the data can
    be formed.

33
  • And finally ...
  • The misuse of statistics is an important
    problem
  • Statisticians should be involved in research at
    some stage, preferably as early as possible
  • Most errors are relatively unimportant
  • Some can have a major bearing on the validity
    of the study.
  • So ...

34
  • "There are three kinds of lies: lies, damned
    lies and statistics."
  • Benjamin Disraeli.
