CRITICAL ANALYSIS

Slides: 55

Provided by: FP79

1
CRITICAL ANALYSIS

WHICH RESEARCH DESIGN FOR
WHICH CLINICAL PROBLEM?

6
Appraising a Clinical Experimental Study

Population/Subjects
What was the source population?
What were the inclusion/exclusion criteria?
Were they a representative and relevant
sample?
How long was the follow-up period?

Are the Results of the Study Valid?
Randomization? Was the randomization list hidden?
Were baseline characteristics of the groups the same
at start?
Was there an intention-to-treat analysis?
Were interventions & outcomes clearly defined &
replicable?
How complete was blinding? Was it assessed at the end?
Apart from the experimental intervention, were
the groups treated equally? i.e. same
co-interventions?
Was the comparison group contaminated with the main
interventions?
Was compliance with interventions
measured/assured?
Were all subjects accounted for at the end? (Was
follow-up complete?)
Was follow-up time sufficient to detect relevant
outcomes?

Results
How large were the intervention effects?
What measure(s) of 'event rate' or outcome were
used?
What was the NNT or NNH?
How accurate were the estimates of the intervention
effects?
e.g. p-values, confidence intervals
Did the study have sufficient power?

Applicability and Conclusions
Applicability & Relevance?
(to your patients, and is the treatment
feasible & available)
Were all important outcomes considered?
Are the likely treatment benefits worth the
potential harms & costs? (adverse effects)
Strengths and weaknesses?

10
Appraising a Diagnostic Study

Population/Subjects/Setting
What was the source population tested?
What were the inclusion/exclusion criteria?
Were subjects a representative and relevant
sample?
How did they recruit subjects?

Validity
Was there a comparison with a 'Gold Standard
Test'?
How did they define 'caseness' to be detected by
the test?
If there is no 'Gold Standard', can the test be
validated in other ways?
Was there blinding of Subjects and of
Investigators to theory?
How thorough was this and was it assessed at the
end?
Was Sample Size OK re Power?
Did all subjects get both new test and Gold
Standard test?
Was there testing by 2 independent investigators?
Was there planning for any adverse effects &
dropouts?
Statistical analysis sensible?
Test-retest issues discussed?

Conclusions
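Sensitivity - Proportion of people who have the
condition who are correctly identified as positive
(true positives) by a test or by epidemiological
screening.
Specificity - Proportion of people who do not have
the condition who are correctly identified as
negative (true negatives) by a test or by
epidemiological screening.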
Did the test work as well as Gold Standard?
Benefits vs harm?
Relevance? Practicality in the real world?
Are the likely clinical benefits worth the
potential harms & costs? (e.g. adverse effects)
Strengths and weaknesses?
How could it be improved?
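
The definitions of sensitivity and specificity above can be checked with a few lines of arithmetic. This is a minimal sketch only: the function names and the 2x2 counts are hypothetical and do not come from any study discussed here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people WITH the condition whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people WITHOUT the condition whom the test flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts from comparing a new test against a gold standard.
tp, fn = 90, 10    # 100 people who have the condition
tn, fp = 160, 40   # 200 people who do not have the condition

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both figures are needed together: a test can reach high sensitivity simply by calling everyone positive, at the cost of poor specificity.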

13
APPRAISING A CAUSATION STUDY

Population/Subjects
What is the source population being studied?
Did they define an 'exposed' group vs a 'comparison'
group? (cohort study)
Or define controls? (case-control study) - any
randomisation?
What were the inclusion/exclusion criteria?
Were subjects a representative and relevant
sample?
How did they recruit subjects and
comparisons/controls?

Basic Structure of Study
Cohort study?
A longitudinal study is one in which groups of
people are interviewed repeatedly over a period
of time - respondents usually share a common
characteristic. Where the same group of people
is followed up over time this is known as a
cohort study. If a group of different people is
interviewed in each wave of a survey this is known
as a trend design.
Case-control study? (did exposure precede
outcome?)
Cross-sectional study? (did exposure precede
outcome?)
Did Researchers Define
The causal factor studied - is their theory
sensible?
The 'outcome' caused by causal factor?
Often the Risk Ratio is discussed (A comparison
of the risk of some health-related event such as
disease or death in two groups)
Was there Blinding?
Re the hypothesis - ideally both subjects &
assessors
How good was this and was it assessed at the end?

Data Validity
Was Sample Size OK re Power?
Did they follow-up long enough?
How did they allow for and manage dropouts?
Significance? C.I.s? dose-response? Specificity?
Conclusions
Relevance & usefulness?
Strengths and Weaknesses of study?
How could it be improved?

16
APPRAISING A PROGNOSIS STUDY

A Prognostic Factor is a patient characteristic
that can predict the patient's eventual outcome
demographic - e.g. sex, age, race
disease-specific - e.g. tumour stage, symptom
pattern
comorbidity - other co-existing conditions
Articles that report prognostic factors often use
two independent patient samples
derivation sets ask - "what factors might
predict patient outcomes?"
validation sets ask - "do these prognostic
factors predict patient outcomes accurately?"

Methods
Design? (cohort / case series / prospective vs.
retrospective)
Setting? hospital / location / clinic
Patient Population? - number / screening or
enrollment methods / number screened vs number
enrolled
Description of prognostic or outcome factors
considered
Prognostic Outcome Factors are the numbers of
events that occur over time, expressed in
absolute terms - e.g. 5 year survival rate
relative terms - e.g. risk from prognostic factor
survival curves - a curve that starts at 100% of
the study population and shows the % of the
population still surviving at successive times.
Applied to onset of a disease, complication or
some other endpoint (e.g. time before relapse)

Validity
Was a defined, representative sample of patients
assembled at a common (usually early) point of
the illness ?
Inclusion and exclusion criteria?
Selection biases?
Stage of disease?
Was patient follow-up sufficiently long &
complete?
Reasons for incomplete follow-up?
Prognostic factors similar for patients lost and
not-lost to follow-up?
Were objective & unbiased outcome criteria used?
Outcomes defined at start of study?

Validity
Assessors and subjects blinded to prognostic
factor theory?
Statistical models seem OK?
Follow-up duration / completeness / accounting
for patients
If subgroups with different prognoses were
identified
Was there adjustment for important prognostic
factors?
Are the (hopefully valid) results of this
prognosis study important? i.e.
How large is the likelihood of the outcome
event(s) in a specified time?
Survival curves?
How precise are prognostic estimates?
Confidence intervals?

Conclusions
Strengths and Weaknesses of Study
In context of other studies &/or current standard
of care?
Next steps for further study of this problem?
Can you apply the (hopefully valid & important)
results of this study to caring for your own
patients? - i.e.
were the study patients similar to your own?
patients similar for demographics, severity,
co-morbidity, and other prognostic factors?
will this evidence make a clinically important
impact on your views on what to tell or to offer
your patients?
Compelling reason why the results should not be
applied?
Will the results lead directly to you selecting
or avoiding therapy?
Are the results useful for reassuring or
counselling patients?

Incidence
can be defined as the number of new
occurrences of a phenomenon, e.g. illness, in a
defined population in a specified period. An
incidence rate would be the rate at which new
cases of the phenomenon occur in a given
population.
Prevalence (also called Prevalence Rate re
prevalence across time)
the number of cases (or events, or conditions)
within a specified time period, e.g. prevalence
of a condition includes all people with the
condition even if the condition started prior to
the start of the specified time period.
Period prevalence = The amount of a particular
disease present in a population over a period of
time.
Point prevalence = The amount of a particular
disease present in a population at a single point
in time.
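
To make the difference between incidence and prevalence concrete, here is a small sketch. All numbers are hypothetical, and it assumes a fixed population with nobody recovering, dying or moving during the year, which real studies cannot assume.

```python
# Hypothetical community followed for one year (simplifying assumptions:
# fixed population, no recoveries, deaths or migration).
population = 10_000
cases_at_start = 200        # people who already have the condition on day 1
new_cases_in_year = 50      # people who develop it during the year

# Point prevalence: existing cases at a single point in time.
point_prevalence = cases_at_start / population

# Incidence: NEW cases only, among those at risk (not already cases).
at_risk = population - cases_at_start
incidence = new_cases_in_year / at_risk

# Period prevalence: everyone who had the condition at any time in the year,
# including cases that started before the period began.
period_prevalence = (cases_at_start + new_cases_in_year) / population

print(f"point prevalence  = {point_prevalence:.3f}")
print(f"1-year incidence  = {incidence:.4f}")
print(f"period prevalence = {period_prevalence:.3f}")
```

A chronic condition can therefore have a high prevalence but a low incidence, because old cases accumulate while few new ones appear.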

22
Appraising Systematic Reviews (of Treatment /
Intervention Studies)

What were the relevant population(s)?
What were the main exposure(s)?
What were the comparison(s)?
What were the outcome(s)?
Design of the Studies
experimental or non-experimental ?
cross-sectional or longitudinal ?
All trials included in a review should first have
been appraised using the model for experimental
studies

Validity of Review Results
were the criteria used to select studies for
inclusion in Review both explicit and
appropriate?
Is it likely that any important, relevant studies
were missed? (completeness of literature search)
Was the validity of the included studies
appraised?
Were assessments of the studies reproducible?
(documented and replicated)
Were the results similar from study to study?
(tests of heterogeneity)

Results (Size of Effects and Precision)
What were the overall results of the review - how
large were the effects ?
How precise were the results ?
Applicability & Relevance
Are the results applicable in normal practice?
Were all important outcomes considered?
Are the likely treatment benefits worth the
potential harms & costs? (e.g. adverse effects
etc.)
Strengths & weaknesses of the Review?
How could the Review be improved?

25
Critical Appraisal - NNTs & NNHs

Decide from reading the study if the experimental
group had a better outcome than the control group
- if so, do the NNT
Or
if the control group had a better outcome than
the experimental group - if so, do the NNH
When the experimental treatment decreases risk of
an undesirable outcome NNT and RBI (relative
benefit increase) are useful
Number Needed to Treat = number of patients who
need to be treated to cause 1 good outcome
Number Needed to Harm = number of patients who
need to be treated to cause 1 bad outcome

EER = event rate in the experimental group
CER = event rate in the control group
When working out the difference, ignore minus signs
except as a reminder as to whether treatment was
overall helpful or harmful
E (event) = outcome (express it as a
decimal, e.g. 40% occurrence as 0.4)
e.g. in a study comparing mood stabilisers,
a bad outcome might be that the manic state does
not improve with the treatment, or gets worse
Absolute Benefit Increase = when the treatment
benefits more experimental subjects than occurs
with those in the control group
ABI = EER - CER
Relative Benefit Increase = fewer bad outcomes in
the experimental group compared with the control
group
RBI = (EER - CER) / CER
NNT = 1 / ABI

EXAMPLE
Treatment of acute mania.
Results are a reduction of a certain amount on
the Young Mania Rating Scale (YMRS) after 1 week
DRUG A
65% OF SUBJECTS HAD OUTCOME
PLACEBO
30% OF SUBJECTS HAD OUTCOME
EER = event rate in the experimental group
CER = event rate in the control group
E (event) = outcome = 65% (S) 30% (C)
EER IS THUS 0.65, CER IS
THUS 0.30
ABI = EER - CER = 0.35
NNT = 1 / ABI = 1 / 0.35 = 2.86
So number needed to treat is close to 3 - i.e. we
have to treat 3 patients for 1 to get benefit.
This would be an extremely good and impressive
NNT.
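
The arithmetic of the worked example above can be reproduced as follows. This is only a sketch: the function names are illustrative, and rounding the NNT up to the next whole patient is a common convention assumed here rather than something stated on the slide.

```python
import math


def abi(eer: float, cer: float) -> float:
    """Absolute benefit increase: difference in event rates (EER - CER)."""
    return eer - cer


def rbi(eer: float, cer: float) -> float:
    """Relative benefit increase: the difference relative to the control rate."""
    return (eer - cer) / cer


def nnt(eer: float, cer: float) -> float:
    """Number needed to treat: 1 / ABI (minus sign ignored)."""
    return 1 / abs(eer - cer)


# Values from the acute mania example: 65% responders on drug A, 30% on placebo.
eer, cer = 0.65, 0.30

example_abi = abi(eer, cer)
example_rbi = rbi(eer, cer)
example_nnt = nnt(eer, cer)
whole_patients = math.ceil(example_nnt)  # assumed convention: round up

print(f"ABI = {example_abi:.2f}")
print(f"RBI = {example_rbi:.2f}")
print(f"NNT = {example_nnt:.2f} (about {whole_patients} patients)")
```

The same `nnt` function gives an NNH when the event counted is a harm, since the sign of the difference is ignored.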

28
Asking a Research Question

What is the Question? (the Clinical Problem to be
answered)
What sort of issue is being investigated?
An Intervention or Treatment ?
A Diagnostic Test or Instrument ?
A Causal factor ?
A Prognostic Factor ?
What is the main alternative for comparison?
A Control group?
A Comparison group?
A Placebo group?
Comparing 2 interventions?
What is the main Outcome or Outcomes?

29
Examples

You are sure that on-call nights for
psychiatric registrars and crisis nurses are
always busier when there is a full moon. How
would you try to determine whether this is in
fact the case?

You are working in the C-L service of a
general hospital. Budget cuts are threatened and
you have to justify maintaining the C-L service
to several medical wards. One ward refers to C-L
a lot, and the other hardly ever. You feel that
your service's C-L input shortens the length of
stay for patients with delirium and self-harm.
How could you demonstrate this in time for next
year's budgeting round in 9 months' time?

31
Significance - p values

The statistical significance of a result is judged
from the probability of seeing a relationship (e.g.,
between variables) or difference (e.g., between
means) at least as large as the one observed in the
sample by pure chance, if in the population from
which the sample was drawn no such relationship or
difference exists.
The p-value is that probability. The smaller it is,
the less plausible it is that the observed result is
a chance finding rather than "representative of the
population."

32
P-values

A p-value of 0.05 (1 in 20) indicates that, if there
were no real relation between the variables in the
population, a result like the one found in our sample
would still turn up as a "fluke" 5% of the time.
p values of <0.05 are by convention 'just'
significant
but this level of significance still involves a
pretty high probability of error (5%).
Results that are significant at the p <0.01 level
are considered by convention statistically
significant, and p <0.005 or p <0.001 levels are
often called highly significant.

33
Data-mining and spurious significance

The more analyses you perform on a data set, the
more results will "by chance" meet the
conventional significance level.
For example, if you calculate correlations
between ten variables (i.e., 45 different
correlation coefficients), then you should expect
to find by chance that about two (i.e., one in
every 20) correlation coefficients are
significant at the p <0.05 level, even if the
values of the variables were totally random and
don't correlate in the population.
Some statistical methods that involve many
comparisons include some "correction" for the
total no. of comparisons - but not all do.
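
The ten-variable example can be worked through numerically. The sketch below counts the pairwise correlations and the number expected to reach p <0.05 by chance. The "at least one" probability treats the 45 tests as independent, which is an assumption made for illustration (correlations among the same ten variables are not truly independent), and the Bonferroni threshold is shown as one well-known example of a "correction", not as the method any particular study used.

```python
from math import comb

n_variables = 10
alpha = 0.05

# Number of distinct pairs of variables, i.e. correlation coefficients.
n_tests = comb(n_variables, 2)

# Expected number of "significant" results if every null hypothesis is true.
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, ASSUMING the tests are independent.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni_threshold = alpha / n_tests

print(f"tests performed:            {n_tests}")
print(f"expected false positives:   {expected_false_positives:.2f}")
print(f"P(at least one) if indep.:  {p_at_least_one:.2f}")
print(f"Bonferroni per-test level:  {bonferroni_threshold:.5f}")
```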

35
Correlation Coefficients

Shows the extent to which a change in one
variable is associated with change in another
variable - the relationship between them.
Best to have +/-0.90 and above to show a
correlation
Range from -1.00 to +1.00.
-1.00 = perfect (strong) negative relationship.
+1.00 = perfect (strong) positive relationship.
0.00 (midpoint) = no relationship at all.
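
The three landmark values of the correlation coefficient can be illustrated with a hand-rolled Pearson correlation. This is a sketch that assumes the slide means Pearson's r; the data are made up purely to hit the three landmark values.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]

r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises in step with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls in step with x
r_none = pearson_r(xs, [2, 1, 3, 1, 2])        # no linear trend

print(f"{r_positive:+.2f}  {r_negative:+.2f}  {r_none:+.2f}")
```

A value near zero only rules out a straight-line relationship; a curved relationship can still exist.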

36
Strength vs Reliability of a Relationship
Between Variables

In general, in a sample of a particular size, the
larger the size of the relationship between
variables, the more reliable the relationship.
If there are few observations, then there are
also few possible combinations of values, so the
probability of a chance combination showing a
strong correlation is high - so small 'n' studies
are statistically weak.
If a correlation between variables in question is
very small in the population, then there's no way
to identify it in a study unless the sample is
very large.
Similarly, if a correlation is very large in the
population, then it can be found to be highly
significant even in a very small sample.
If a coin is slightly asymmetrical, and when
tossed is slightly more likely to produce heads
than tails (e.g. 60% vs. 40%), then ten tosses
would not be enough to show that the coin is
asymmetrical. But if the coin is weighted to
almost always fall as heads, then ten tosses
would be quite enough to show this.
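
The coin example can be made exact with the binomial distribution. The sketch below assumes a one-sided test of "the coin is fair" at the 0.05 level, and uses 95% heads as a stand-in for a coin that "almost always" falls heads; both choices are illustrative assumptions.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k heads in n tosses) when each toss lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_heads(n: int, alpha: float = 0.05) -> int:
    """Smallest head count that a fair coin reaches with probability <= alpha."""
    for k in range(n + 1):
        if prob_at_least(k, n, 0.5) <= alpha:
            return k
    raise ValueError("no head count is rare enough at this alpha")


tosses = 10
k_crit = critical_heads(tosses)

# Power: chance of reaching the critical count when the coin really is biased.
power_slight_bias = prob_at_least(k_crit, tosses, 0.60)
power_heavy_bias = prob_at_least(k_crit, tosses, 0.95)

print(f"need at least {k_crit} heads out of {tosses} to reject 'fair coin'")
print(f"power if P(heads) = 0.60: {power_slight_bias:.2f}")
print(f"power if P(heads) = 0.95: {power_heavy_bias:.2f}")
```

With only ten tosses the slightly biased coin is almost never detected, while the heavily weighted coin is detected most of the time.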

38

Other terms and concepts to learn
Measures of Central Tendency and of Variability
Types of Data

39
Confidence Interval

If the Confidence Interval for a difference does not
overlap zero, the effect is said to be statistically
significant (for a ratio such as an odds ratio, the
no-effect value is 1 rather than zero)
CI is a range of values within which we're fairly
sure the true value of the parameter being
investigated lies.
If independent samples are taken repeatedly from
the population & a Confidence Interval calculated
for each, a certain percentage (the confidence
level) of the intervals will include the unknown
population parameter. Confidence intervals are
usually calculated so that this percentage is 95%.
Width of the confidence interval shows how
uncertain we are about the unknown parameter.
Very wide interval: more data should be
collected before anything definite can be said
about the parameter.
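
The link between sample size, interval width and "overlapping zero" can be sketched for a difference between two response rates. The counts are hypothetical, and the interval uses the simple normal approximation, which is an assumption made here for brevity and is rough for small samples.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(events_a: int, n_a: int, events_b: int, n_b: int,
            level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for the difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    diff = p_a - p_b
    return diff - z * se, diff + z * se


# Same response rates (60% vs 30%), different sample sizes.
large_study = diff_ci(60, 100, 30, 100)
small_study = diff_ci(6, 10, 3, 10)

print("n=100 per group: %.2f to %.2f" % large_study)
print("n=10 per group:  %.2f to %.2f" % small_study)
```

The larger study gives a narrow interval that excludes zero; the smaller one, with identical rates, gives a wide interval that includes zero, so nothing definite can be said.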

40
Odds Ratios

Compares frequency of exposure to risk factors in
epidemiological studies
The odds ratio is a reasonable approximation of
the relative risk when the outcome is relatively
rare (e.g., when less than 1% of the people
exposed to an agent develop disease). The odds
ratio produces larger errors as the outcome rate
rises above 1%.
You can say that a proposed risk factor acts as a
significant risk for disease if
odds ratio is >1
lower edge of the C.I. is >1
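
How closely the odds ratio tracks the relative risk depends on how common the outcome is. The sketch below compares the two on a pair of hypothetical 2x2 tables, both with a true relative risk of 2; the counts and function names are illustrative only.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Relative risk. a, b = exposed with/without disease; c, d = unexposed with/without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Rare outcome: 1% of exposed vs 0.5% of unexposed develop disease.
rare = (10, 990, 5, 995)
# Common outcome: 40% of exposed vs 20% of unexposed develop disease.
common = (400, 600, 200, 800)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print(f"rare outcome:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

For the rare outcome the two measures nearly coincide; for the common outcome the odds ratio overstates the relative risk.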

41
VARIOUS TESTS

Have some idea what each is for -
A reasonable reference is
http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/
Parametric Tests and Non-Parametric Tests
Nonparametric methods are used when we know
nothing about the distribution of the variable in
the population. It is not so much that they are for
non-normally distributed data, but that they make
no assumption of a normal distribution
Parametric tests are used where a normal
distribution can be assumed

42
Parametric vs Non-Parametric tests

Memorize a name of each sort e.g.

43
Null Hypothesis

The alternative to the researcher's theory. It
usually assumes that
there is no relationship between the dependent
and independent variables. The null hypothesis is
assumed to be correct until research demonstrates
that it is incorrect. This process is known as
falsification.

44
POWER

Type I Error Rate (Alpha)
The probability of incorrectly rejecting a true
null hypothesis (a Type I error gives a false
positive result)
Type II Error Rate (Beta)
The probability of incorrectly accepting a false
null hypothesis (a Type II error gives a false
negative result)

In the social sciences there are conventions
that
alpha, the Type I error rate (risk of a false
positive) - must be kept at or below 0.05 (5%)
beta, the Type II error rate (risk of a false
negative) - must be kept low as well (20% or
less, generally)
Statistical Power is equal to 1 - beta
and must be kept correspondingly high
Power should be at least 0.80 (80%) to detect a
reasonable departure from the null hypothesis
Statistical Power = The probability of
rejecting a false null hypothesis

46
In Reject-Support (RS) research (the usual kind)

(the opposite is true in Accept-Support AS
research)
The researcher wants to reject the null
hypothesis
"Society" wants to control Type I error (false
positives)
The researcher is very concerned about Type II
error (false negative - missing a real effect -
since a result that supports your theory is
much more likely to get published)
High sample size works for the researcher
But if there is "too much power", trivial effects
become "highly significant"

47
Factors influencing power in a statistical test

1. What kind of statistical test is being used
2. Sample size
3. Size of the experimental effect
4. Level of error in experimental measurements
A Sampling Distribution =
the distribution of a statistic over repeated
samples
The Standard Error of the Proportion =
the standard deviation of the distribution of the
sample proportion over repeated samples
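
These two definitions can be seen in action with a small simulation. The sketch draws repeated samples from a population with a known proportion, and compares the spread of the sample proportions with the textbook formula sqrt(p(1-p)/n), which is quoted here as background rather than taken from the slides. The population proportion, sample size, number of repetitions and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import stdev


def sample_proportions(p: float, n: int, repetitions: int, seed: int = 0) -> list[float]:
    """Draw `repetitions` samples of size n and return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repetitions)
    ]


true_p, sample_size = 0.30, 100
proportions = sample_proportions(true_p, sample_size, repetitions=2000)

# Spread of the simulated sampling distribution...
empirical_se = stdev(proportions)
# ...versus the standard error predicted by the formula.
theoretical_se = sqrt(true_p * (1 - true_p) / sample_size)

print(f"empirical SE   = {empirical_se:.4f}")
print(f"theoretical SE = {theoretical_se:.4f}")
```

Because n sits under the square root, quadrupling the sample size only halves the standard error, which is one reason sample size has such a strong influence on power.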

48
Power Analysis in Studies

In planning a study, one must estimate
What would be the reasonable minimum
experimental effect that one wants to detect
A minimum Power to detect that effect
The sample size that will achieve that desired
level of Power

49
Steps required for Power analysis and sample size
estimation

The type of analysis and the null hypothesis are
specified
Power and required sample size for a reasonable
range of likely experimental effects is
investigated
The sample size required to detect a reasonable
experimental effect (i.e. departure from the null
hypothesis) with a reasonable level of power is
calculated, while allowing for a reasonable
margin of error

Method (Excerpt) Statistical analysis
It was estimated that in order to detect a
30% difference between the percentage of
responders in the control group compared with
that in the exercise group at the P=0.05 level of
significance, a sample size of 40 subjects per
group would be required to give a power of 90%.
Data on poorly responsive depression are scant
but the proportion of responders in the control
group was reasonably anticipated to be 10%,
compared with an anticipated 40% in the exercise
group.

Was a power analysis done prior to the study?
What is the main implication?
Yes. The power was set at 0.9 (90%)
Power = 1 - beta (beta is the probability of making
a Type-II error)
So, 0.9 = 1 - beta, or beta = 1 - 0.9, which is
0.1 or 10%. Thus the risk of making a Type-II
error in this study was 10%, as opposed to most
studies which set Power at 0.8 - i.e. they
tolerate a risk of 20% of making a Type-II error
(a false negative)
Main Implication was that the study did have
enough power to detect a significant improvement,
which it did not do
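
The sample size quoted in the method excerpt can be sanity-checked with a standard textbook approximation for comparing two proportions. The excerpt does not say which formula the authors used, so the sketch below is an assumption: a two-sided test, the normal approximation, and unpooled variances. It should land near, not exactly on, the 40 per group quoted.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate subjects per group needed to detect p_treated vs p_control."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_treated - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Anticipated responder rates from the excerpt: 10% control, 40% exercise.
n_at_90 = n_per_group(0.10, 0.40, power=0.90)
n_at_80 = n_per_group(0.10, 0.40, power=0.80)

print(f"per group at 90% power: {n_at_90}")
print(f"per group at 80% power: {n_at_80}")
```

Asking for 90% rather than the more usual 80% power raises the required sample size, which is the price of halving the tolerated Type-II error risk from 20% to 10%.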

52
Ethics in Research

http://www.wma.net/e/policy/b3.htm - World
Medical Association Helsinki principles for
research in humans
Ethics Committees - Think
about their role and how to design studies to
meet these requirements
RANZCP principles from
Code of Ethics - Psychiatrists involved in
clinical research shall adhere to those relevant
ethical principles embodied in national and
international guidelines

53
College Code of Ethics (paraphrased)

It's done on people so high standards are needed
and must be scientifically justified
Must be OKd by an Ethics Committee
Minimize any harm to subjects
The interests of subjects always take precedence
over science or society's interests
Informed consent must be obtained from people
participating in research
Special care to be taken with consent from those
in dependent relationships, e.g. students,
prisoners, the elderly
For minors - consent from parent/guardian

54
College Code of Ethics (paraphrased)

If subjects aren't competent to consent get this
from a relative or guardian
Subjects can withdraw at any time & it won't
jeopardise their care
If a researcher uncovers clinically relevant
information needing acting on, researcher should
tell the patient & their doctor
Confidential information obtained from the
research stays within the study
No plagiarism, acknowledge all references
Research reports to be truthful and accurate
Ensure participants are deidentified
Declare any conflict of interest in all
publications