Mark Haggard, Helen Spencer - PowerPoint PPT Presentation


1
OM8-30 for assessment & outcome in OME: origins, applications & ultra-short OM2-13
Mark Haggard, Helen Spencer & Mariella Gregori
MRC Multi-centre Otitis Media Study Group, Cambridge, UK
with Wendy Floate & Cathy Harper
Mid-Cheshire Hospitals NHS Trust & Kingston Hospital NHS Trust
Eurotitis 2 Study Group Collaborators
BACDA 20th Anniversary Meeting, London, 27 January; revised Autumn 2006 for website
2
It is necessary to have
  • Valid and reliable measures of outcome in important domains, for clinical trials & other types of study
  • Standardised efficient assessment tools
  • defining cases as to health and development status, not just pathology
  • as clinical indicators for treatment decisions
  • International comparability on such measures (for general communication about case types, or multi-national studies that seek to accumulate or contrast)

3
The next 5 slides convey
  • The starting point in deciding what facets and domains to include in a measure, usually done by interviewing about 20 people (qualitative preliminaries). We wanted greater comprehensiveness and a quantitative estimate of relative importance, so gave an open-ended questionnaire to over 1000
  • The overview of the multiple stages gone through in the psychometric development. This is highly simplified and each stage actually consisted of many sub-stages. The latter had occasionally to differ according to the aim of the resulting measure, eg behaviour (generic, developed on unaffected children) vs reported hearing difficulties (RHD items scaled, see below, against HL to maximise correlation) vs all others (conventional internal consistency within clinical sample)
  • The resulting mixture of facets served by items in OM8-30
  • Their breakdown into reliably supportable domains
  • The high concurrent (criterion) validity of the 32-item short form against the 83-item long form used in the TARGET trial for maximum inclusiveness. Diminishing returns are met, in that the gain in reliability from adding back in the extra 51 items does not make the full set much better in various applications such as showing group or treatment differences (examples of equivalence not given here)

4
Content validity: most often mentioned categories of parent concern (N = 1100)
Long-form (TARGET trial) & short-form (OM8-30) measures take weighting from % of mentions

Hearing 20.8          School progress 20.1
Behaviour 8.3         Family Impact 10.9
Speech/Lang 6.3       Child QoL 13.1
Safety 3.9            Physical health 5.5
Miscellaneous 5.7     L-T ear/hearing 3.2
Balance 0.9           Sleep pattern 1.3

90% of concerns covered by OM8-30 measures
  • Comprising ear symptoms, respiratory symptoms, global health and other physical problems
  • Covers missing out socially, missing out, ambiguous reaction of others to child, vague future, child quality of life
  • Aligns with Parent Quality of Life, when some of below are included
  • Covers ambiguous communication, non-acknowledgement, service delivery, treatment anxieties, other
5
[Flow diagram: psychometric development, from the Item Pool via item scaling, item selection and item weighting (items not used set aside), through a provisional internally constructed scale (definition of scale unit, external scale) to the developed Score and various types of validation]
  • Quantification of each response level
  • Response rate
  • Range
  • Consistency
  • Reliability
  • Validity
  • Formulation of scaled score by principal component
6
Given 6 domains, numbers of items in each
mini-measure must be small
Behaviour (6)
Parent QoL (5)
School progress (1)
Speech/language (3)
Sleep pattern (3)
Reported Hearing Difficulty (4)
Ear symptoms (3)
Global health (1)
Respiratory Symptoms (5)
7
2-factor summary of impact with 27 items
Developmental impact (15):
  • Behaviour (6)
  • Parent QoL (5)
  • School progress (1)
  • Speech/language (3)
Physical health (12):
  • Sleep pattern (3)
  • Ear symptoms (3)
  • Global health (1)
  • Respiratory Symptoms (5)
For bias adjustment (4):
  • Reported Hearing Difficulties
8
Concurrent validity: correlation of bias-adjusted total score with bias-adjusted 83-item TARGET total (unadjusted even higher)
Weighted bias-adjusted total (83 items) vs (bias-adjusted) 27-item score from OM8-30:
r = 0.90, N = 324
9
The next 6 slides convey
  • The general format of the items, and the need not to assume that the separations between the adjacent pairs of response levels are necessarily all equal (eg 1.00)
  • The idea of optimally scaling the response levels for a Likert item, to maximise its discriminating potential. This is done by a regression between the item, distinguishing its response levels initially as floating categories, and the raw total count (ie for each individual in a very large sample). Thus the best spacing between the response levels is determined by the average spacing in the item count for similar items, as this is what maximises the correlation. The particular example shows an additional sophistication: contingent scoring for one response category. That has now been abandoned, as the multiplication involved has been found to add unnecessary variability; "frequent colds" is now scored as a separate additive item like any other, and a single scale value attributed to the "only when colds" answers
  • A graphic representation of this idea, whereby for the underlying more sensitive scale the equality assumption is wrong (orange), whereas the scaled version with empirically assigned spacings between response levels is correct, in the sense that it maps the item more efficiently onto a better and more highly aggregated version of what it is trying to measure. One could go round the iterative loop one or more further times, eg basing the total not on the raw dichotomy items but on such scaled versions, and then re-scaling. However this is labour-intensive and automatic algorithms to do this are perhaps not to be trusted at this stage. Maximum gain comes from the first stage as described
  • Using the expected moderate correlation of a reported scale (RHD) with a measured one (HL) as a test, the enhancement to the correlation from scaling (compare penultimate with last column) is worth having. Although 7% does not sound huge, it can be mapped into substantial savings in sample size, hence feasibility of studies. Evident in the comparison table is the equivalence of the gain from scaling to an approximate doubling of the length of that part of the questionnaire by adding back in the less good 5 items discarded for OM8-30
  • Clarification of the distinction between the purpose and content of the long & short forms
  • The particular items before and after the selection for the short form (OM8-30)

10
Response options and item scaling
Typical questionnaire format: How often does your child <ACTION>?
Typical response options: Never / Sometimes / Often / Always
Typical item coding for data entry: 0 Never, 1 Sometimes, 2 Often, 3 Always
Possible item scaling, obtained by predicting HL (categorical regression): 0 Never, 0.1 Sometimes, 0.5 Often, 0.6 Always
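The scaling step on this slide can be sketched as follows: regressing the criterion (here HL) on dummy-coded response categories yields, as fitted values, the within-category HL means, which then replace the equispaced entry codes. The anchoring of the lowest level at 0 and the highest at 1 is an illustrative choice, and the data below are synthetic, not the TARGET sample.

```python
import numpy as np

def scale_item(responses, hl, levels):
    """Optimal scaling of a Likert item by categorical regression:
    the fitted value for each response category is the mean of the
    criterion (here HL) within that category. Anchoring the lowest
    level at 0 and the highest at 1 gives data-driven spacings in
    place of the equispaced 0,1,2,3 data-entry codes."""
    means = np.array([hl[responses == lv].mean() for lv in levels])
    scaled = means - means[0]           # anchor lowest level at 0
    scaled = scaled / scaled[-1]        # assumes means increase with level
    return dict(zip(levels, np.round(scaled, 2)))

# Synthetic demonstration: 200 children per response level
rng = np.random.default_rng(0)
levels = ["Never", "Sometimes", "Often", "Always"]
responses = np.repeat(levels, 200)
hl = np.concatenate([rng.normal(m, 2, 200) for m in (10, 13, 22, 24)])
print(scale_item(np.array(responses), hl, levels))
```

The point of the sketch is that the recovered spacings are uneven, as on the slide (eg 0, 0.1, 0.5, 0.6), rather than the equispaced 0, 1, 2, 3 of naive coding.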
11
Item format & scaling for scoring
12
Scaling by categorical regression: frequency-based Likert item
[Plot: mean HL (transformed) against response level from Never to Always, comparing equispaced data-entry coding with the justified final coding, eg 0.6 in place of 3 for Always]
13
Item scaling improves score reliability & validity by 7%

                                 RHD-9 Scaled   RHD-4 Scaled   RHD-4 Not scaled
Correlation with avHL, set A     0.400          0.398          0.352
Correlation with avHL, set B     0.530          0.527          0.466

r (RHD-9 with RHD-4), set A = 0.93
14
Developing RHD-9 and RHD-4
  • RHD-9 Best way of totalling 9 best items on
    Reported Hearing Difficulties
  • Comprehensive and reliable for clinical
    research
  • Items scaled and weighted for optimality
  • RHD-4 Simple short-form (the 4 items given
    highest weight by 1st principal component)
  • Efficient and simple for routine practice
  • Items scaled, but for simplicity not weighted
    as optimum weights are highly similar
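The RHD-4 selection rule above (take the items given highest weight by the 1st principal component) can be sketched in a few lines; the item data below are synthetic, with the first four items deliberately loading a common factor, not the TARGET items.

```python
import numpy as np

def top_pc_items(item_scores, k=4):
    """Short-form selection sketch: compute the 1st principal component
    of the standardised item scores (n_subjects x n_items) and return
    the indices of the k items with the largest absolute loadings."""
    z = (item_scores - item_scores.mean(0)) / item_scores.std(0)
    cov = np.cov(z, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    first_pc = vecs[:, -1]                 # loadings on the 1st component
    return np.argsort(np.abs(first_pc))[::-1][:k]

# Synthetic 9-item example: items 0-3 load a common factor strongly
rng = np.random.default_rng(2)
n = 2000
f = rng.normal(size=n)
X = np.empty((n, 9))
for j in range(9):
    load = 0.9 if j < 4 else 0.2
    X[:, j] = load * f + rng.normal(scale=0.5, size=n)
print(sorted(top_pc_items(X, 4)))
```

With enough subjects the four high-loading items are recovered, mirroring how the 4 best of the 9 RHD items were identified.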

15
RHD hearing items (abbreviated)
1) How would you describe your child's hearing?
2) Has hearing ability varied?
3) Speaks unduly loudly?
4) Raises sound level of TV/radio?
5) Responds when called in a normal voice?
6) Mishears words when not looking at you?
7) Turns wrong way to a call or sound?
8) Difficulty hearing when spoken to face to face in quiet room?
9) Difficulty hearing when with a group of people?
10) Asks for things to be repeated?

16
The next 4 slides convey
  • Eight (the 8 in the title OM8-30) a priori facets of presentation were used to cluster the items and allow them to help select one another on the basis of internal consistency. However, if you only have time and data-entry capacity for 30 items you cannot reliably support measures of 8 facets, an all-too-infrequently recognised limit. For extremely large samples it would be worth comparing sub-scores of down to 3 items, but more generally about 6 items per score is a limit, and roughly double this is a safe and reliable way to proceed. Thus an empirical evidence base is sought for a 2-domain summary of the items that are scored for this purpose. Structural equation modelling (SEM, summarised in the path diagram) has shown on the full data that a 2-domain summary is indeed more efficient and parsimonious than either a 1- or a 3-domain summary. This is not surprising, and is a result often found in development of outcome measures, for example even in the generic SF-36, which produces a physical and a mental domain. Blue arrows are assumed causal, red ones a matter of marking a construct by contribution, and green and purple arrows show correlated residuals, strictly outside the model, but which might be of either type or a third type: joint manifestations of a common unmarked cause
  • The simplest view is that the relation between the two domains is causal. However developmental impact also has many other determinants than physical health in OME (psychological and social), as we have documented in detail elsewhere
  • One material part of the model's structure is the strong linkage between the part of reported hearing difficulties that is NOT explained by the HL, and the parts of all other measures that are not explained by their structural relations to each other. This is most economically expressed on the underlying variables (in effect, the totals) for the 2 main summary domains (see curves in light green). This is the evidence that response bias can and must be extracted: the excess of report over what is measurable may not all be due to pure bias, but it is usefully interpreted as a bias adjustment. Failure to consider this contribution to all scores results in unnecessarily high error and failure to reflect expected and generally confirmed relationships. Where bias exists it is consequently reduced by fitting the bias term
  • Specifically for OM8-30, a hierarchical view of this structure and the scores possible is useful, one in which the RHD items are set apart for bias adjustment and not totalled into impact

17
Impacts of OME: SEM defines the best summary measures
[Path diagram with nodes: Average Hearing Level (HL), Reported hearing difficulty, Respiratory Symptoms, Sleep Pattern, Ear Symptoms, Social confidence, Global Health Rating (1), Physical health (& sleep), Developmental outcomes, Speech & Language, Parent Quality of Life (QoL), Age, Anxiety, Schooling Concerns (1), Balance Problems, Context-directed Behaviour]
18
Construct validity: structural equation modelling on full 11 TARGET measures did confirm that they are best summarised in 2, not 1 nor 3, factors
19
Enabling an efficient summary of all variables that is rooted in basic biological constraints
[Path diagram excerpt: Reported hearing difficulty, Average Hearing Level (HL) and Age, with path coefficients 0.23 and 0.71]
Fits details of data extremely well, and is parsimonious (only 20, not 13 × 14/2 = 91, links between mini-measures)
20
Hierarchical simplification for OM8-30
TOTAL
  Physical Health
    • Sleep (3)
    • Respiratory (5)
    • Ear Infections (3)
    • Global Health (1)
  General Developmental Impact
    • Speech & Language (3)
    • Behaviour (6)
    • School Progress (1)
    • Parent QoL (5)
Reported Hearing Difficulties (4), set apart for bias adjustment
21
Current activities & agenda: OM8-30
  • Re-standardisation in UK and NZ with the items in their current number (30) & positions in sequence, no other context
  • Further development of facility for adjustment of parental response bias (field version using tympanometry not HL)
  • Completing programme of translations for European languages and getting large enough national standardisation samples
  • Piloting applications in audit

22
Bias-adjustment: what & why?
  • Physical and developmental impact can be measured objectively, but this is totally infeasible and unaffordable for clinical work
  • So such information, essential for assessment and decision, must be provided by reports: answers to Qs
  • Patients'/parents' differing mental standards affect their mentioning & quantitative responses
  • Knowledge about such distortions in judgements can be used to improve validity, hence value, of (self-)report
  • specifically, by adjusting for bias,
  • using discrepancies between quantitative report and objectively predicted values

23
RHD-4 score within OM8-30: curves summarise the distributions at various HLs
[Scatter plot: RHD-4 (1st clinical visit) against Average HL (1st clinical visit), 0-60 dB, with 10th, 50th and 90th percentile curves]
24
RHD-4 (best hearing difficulty questions) & HL: curve zones show residual as bias estimator
[Same plot with 10th, 25th, 75th and 90th percentile curves: top zone = over-concerned 10%, bottom zone = under-aware 10%]
25
Response bias adjustment
STEP 1: regress Reported Hearing Difficulties (RHD) on Hearing Level or weighted tymp score
STEP 2: take the residuals (RB estimate) and predicted values
26
Residuals: discrepancies of data-points from the best-fit regression line
[STEP 2 illustrated: RHD (Reported Hearing Difficulties) plotted against Hearing Level, 0-40 dB, with the predicted-value (regression) line; a point above the line, eg residual (RB) = +10, is over-concerned; one below, eg residual (RB) = -5, is under-aware]
27
Response bias adjustment (RBA)
STEP 2: expected influences & parental bias (RB) enter an adjusting analysis (multiple regression), giving a better model and estimate of each particular influence, adjusted (minimising subjective bias)
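The two steps can be sketched with simple linear fits on synthetic data; the published model uses more elaborate weighting, so this is an illustration of the logic only.

```python
import numpy as np

def bias_adjust(rhd, hl, outcome):
    """Sketch of the two-step response-bias adjustment.
    Step 1: regress reported hearing difficulty (RHD) on measured
    hearing level (HL); the residual estimates each parent's response
    bias (RB): positive = over-concerned, negative = under-aware.
    Step 2: include RB when modelling a reported outcome, and return
    the outcome with the estimated bias contribution removed."""
    # Step 1: RB = RHD minus its HL-predicted value
    b1, b0 = np.polyfit(hl, rhd, 1)
    rb = rhd - (b0 + b1 * hl)
    # Step 2: remove the RB component from the reported outcome
    c1, c0 = np.polyfit(rb, outcome, 1)
    return outcome - c1 * rb, rb

# Synthetic data: a shared parental-bias term inflates both RHD and
# the reported outcome, obscuring the outcome's true relation to HL
rng = np.random.default_rng(3)
hl = rng.uniform(0, 40, 2000)
bias = rng.normal(0, 1, 2000)
rhd = 2.0 * hl + 10.0 * bias
outcome = 0.1 * hl + 2.0 * bias + rng.normal(0, 0.5, 2000)
adjusted, rb = bias_adjust(rhd, hl, outcome)
```

On data like this the bias-adjusted outcome correlates considerably more strongly with HL than the raw outcome does, which is the "stronger model if RB included" point of the following slides.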
28
Good distribution of OM8-30 total score after bias-adjustment
[Histogram of case frequency: baseline data from children included in TARGET]
29
But how to do bias-adjustment without having to obtain HL, eg with tympanogram types A, C1, C2, B, not giving a continuous scale?
Make them do so: you don't even need ME Pressure & Max Compliance.
We derived a formula predicting 2-ear, 4-freq mean HL in 1489 cases 25dBHL, thus estimating weights for A, C1, C2, B on each ear.
Distribution of this tymp-based score is slightly lumpy, but with care can be used in samples up to 30dB HL severity.
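The weight-estimation idea can be sketched with least squares on synthetic data. The tymp-type effects below are made-up numbers, not the published formula; the point is only that regressing measured HL on per-ear indicator variables yields a weight per type per ear, turning categorical tymp results into a continuous pseudo-HL.

```python
import numpy as np

TYMP_TYPES = ["A", "C1", "C2", "B"]

def fit_tymp_weights(left, right, mean_hl):
    """Regress measured 2-ear mean HL on indicator (dummy) variables
    for the tympanogram type of each ear; the fitted coefficients are
    the per-type, per-ear weights of a tymp-based pseudo-HL."""
    n, k = len(mean_hl), len(TYMP_TYPES)
    X = np.zeros((n, 2 * k))
    for i, (l, r) in enumerate(zip(left, right)):
        X[i, TYMP_TYPES.index(l)] = 1.0
        X[i, k + TYMP_TYPES.index(r)] = 1.0
    w, *_ = np.linalg.lstsq(X, np.asarray(mean_hl), rcond=None)
    return w

def pseudo_hl(l, r, w):
    """Pseudo-HL for one child: sum of the two fitted ear weights."""
    k = len(TYMP_TYPES)
    return w[TYMP_TYPES.index(l)] + w[k + TYMP_TYPES.index(r)]

# Synthetic check: recover hypothetical per-ear contributions
rng = np.random.default_rng(1)
true = {"A": 1.0, "C1": 4.0, "C2": 8.0, "B": 14.0}
left = rng.choice(TYMP_TYPES, 500)
right = rng.choice(TYMP_TYPES, 500)
hl = np.array([true[l] + true[r] for l, r in zip(left, right)])
hl = hl + rng.normal(0, 1, 500)
w = fit_tymp_weights(left, right, hl)
```

Note the slide's caveat survives in the sketch: the pseudo-HL can take only 16 distinct values (4 types × 2 ears), hence the "slightly lumpy" distribution.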
30
Showing the value of bias-adjustment: epidemiological associations
Age, SEG, sex, referral source, etc as predictors
Parents' response biases (RB) at V2
Adjusting regression to predict (1) physical health or (2) developmental impact at V2
Stronger model if RB included
31
Severity distributions for this test sample: actual HL (2-ear ave) & tymp-based prediction of HL
The many B-tymps create a nasty spike of undifferentiable pseudo-HLs in the distribution, but we now have a method to predict HL for a child with B tymps, so can disperse the spike, giving a more continuous & powerful measure
32
Bias-adjusting one risk-factor model gives a power gain of 26% from reducing error
(model: sex, age, referral source, SEG → impact)

                            Multiple R   % variance explained   Residual error in dev. units
Unadjusted                  0.31         10                     0.44
Bias-adjusted (via tymps)   0.57         32                     0.38
Bias-adjusted (via HL)      0.58         34                     0.38

Sex becomes NS with adjustment, giving a simpler model, using the pseudo-HL described in previous slides. Bias-adjustment uses reported hearing difficulties on the same occasion (TARGET V2) as developmental impact.
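A reading of the quoted power gains that is consistent with the residual errors on this slide and the next is the proportional reduction in squared residual error when the bias term enters the model; this interpretation is our inference from the tabled numbers, not stated on the slide.

```python
def power_gain(err_unadjusted, err_adjusted):
    """Proportional reduction in residual (error) variance when the
    response-bias term is added: 1 - (e_adj / e_unadj)^2."""
    return 1.0 - (err_adjusted / err_unadjusted) ** 2

# This slide: residual error 0.44 unadjusted vs 0.38 adjusted
# gives ~0.25 (quoted as 26%, presumably from unrounded errors);
# next slide: 2.15 vs 1.96 gives ~0.17 (quoted 17%)
print(power_gain(0.44, 0.38), power_gain(2.15, 1.96))
```

Because required sample size scales inversely with such variance ratios, a gain of this size translates directly into fewer subjects for the same statistical power.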
33
Bias-adjusting another risk-factor model (age, SEG → physical health) gains 17% in power

                            Multiple R   % variance explained   Residual error in dev. units
Unadjusted                  0.29         9                      2.15
Bias-adjusted (via tymps)   0.51         26                     1.96
Bias-adjusted (via HL)      0.52         27                     1.94

  • Using RHD-4 from OM8-30 on the same occasion (TARGET V2) as physical outcome. N = 244, up to 30dB HL, including some unaffected children.
  • On full RHD-9 & physical health scores (9 & 19 items) for TARGET V1 & V2 average with N = 1261, this model gave R = 0.42, 18% variance explained, 13% power gain

34
HL no better here to derive bias-adjustment term than tymp-based pseudo-HL
Regression to predict residual from derivation formula via performance determinants:
Age, concentration, time to do audiogram, method (play etc); SEG marginal; Type NS (play/conventional)
...significantly predict residual from regression of HL with tympanogram data
Multiple R = 0.23, fairly good model (but other sources of discrepancy too)
35
Why avoid HL? It is needed at many points, but typical time for 4-frequency, 2-ear PTA is just under 15 minutes, so bias-adjusting OM8-30 from tympanogram is worthwhile
36
Import of this work
  • The performance factors in measured HL can be teased out with a simple set of variables
  • Bias-adjustment works, reducing variability in reported measures
  • In the ranges of age and of HL typical of OME
  • Tymp-based adjustment is as good as HL-based
  • We know why (irrelevant performance variation in HL)
  • This points to a possible clinical approach, re-distributing time productively away from just HL measurement
  • Obtain OM8-30, tymps
  • Confirm diagnosis from these
  • If B&B or B&C2, & impact > a certain criterion, test HL at 1 & 4 kHz and note average
  • If difference > Y, do bone conduction to identify any SNHL underlay
  • Save expensive clinician and audiologist time for more important activities, including further tests in some children, eg Speech-in-Noise
  • Use more of cheaper time of computer & clerical assistant

37
International validations basis: (1) TARGET sample severe for OME; (2) Finland special wrt RAOM
  • TARGET visit 1 selected by UK gatekeeping, visit 2 by trial entry criterion (2 X 20dBHL, BE)
  • RAOM seen in secondary care in Finnish system due to high staffing & minimal gatekeeping to meet demand
  • Finland differs on OM8-30 RAOM items (p<0.01)
  • Literature reflects greater frequency of AOM seen in recent years, despite no increased virulence
  • Joki-Erkkila VP, Laippala P, Pukander J. Increase in paediatric acute otitis media diagnosed by primary care in two Finnish municipalities--1994-5 versus 1978-9. Epidemiol Infect 1998;121:529-534
  • Joki-Erkkila VP, Pukander J, Laippala P. Alteration of clinical picture and treatment of pediatric acute otitis media over the past two decades. Int J Pediatr Otorhinolaryngol 2000;55:197-201
  • Blomgren K, Pohjavuori S, Poussa T, Hatakka K, Korpela R, Pitkäranta A. Effect of accurate diagnostic criteria on incidence of acute otitis media in otitis-prone children. Scand J Infect Dis 2004;36:6-9

38
Two extreme samples revealed by OM8-30 in 2-D plot, when scores adjusted for bias
Strong negative correlation (r = -0.88) between physical health and development across centres, both adjusted for their significant determinants (age, response bias, but also development adjusted for selectivity & prior op/2nd visit)
[Scatter plot of centre means, physical health (3.6-4.4) against developmental impact (square root, 1.9-2.4): Finland and UK ENT (TARGET) at the extremes; Netherlands, France, Kingston, Belgium and Cheshire in between]
A negative correlation is expected because a child will be referred either for RAOM or OME symptoms, or a mixture
39
...pattern less sensible if not bias-adjusted
Raw centre means for physical health against developmental impact, not adjusted for bias or severities (r = 0.17)
[Scatter plot of raw centre means, physical health (3-5) against developmental impact (square root, 1.9-2.5): UK (TARGET), Finland, UK Cheshire, Netherlands, Belgium, UK Kingston, France]
40
National variation is more in the difference (physical - developmental) than in the sum
[Same 2-D plot of physical health against developmental impact (square root), annotated with diagonal axes: physical health worse vs development worse]
41
Value of Eurotitis 2 international standardisation study to OM8-30
  • Factor structure, hence basis of scoring, highly similar across language translations and healthcare cultures
  • 2-D plot of (phys + dev) vs (phys - dev), ie total vs difference, best way to think about impact
  • Adjusting for simple clinical differences, and particularly for parental response bias
  • Reveals expected known effects much better
  • Leaves little purely national variation
  • This makes standardisation, including internationally, feasible without needing vast N or vast budget
  • Model for impact (HL, RAOM, URTI → developmental impact) strong in TARGET and Eurotitis 2 data
  • Ease of application of bias-adjustment in OM8-30 considerably boosted by demonstration (in Eurotitis 2 data also) that tymp-based formula pseudo-HL gives results similar to true HL for samples varying through normal, marginal, & definite OME (not so applicable if many HLs go above 30 dB)
  • Principal component version of factor structure allows selection of best items for ultra-short-form OM2-13; Eurotitis 2 data support very similar selection, hence next logical development

42
OM8-30 does several useful things, as recently shown in several languages & health(care) cultures. But it has 32 items (8-10 minutes). Some clinicians and public health types will expect ultra-short Q-aires to do more than reliability permits. So, can an ultra-short form be useful at all?
YES !
43
Why two ultra-short forms ?
  • Small number of items cannot reliably cover multiple measures (or aims)
  • Indicators reside in pathophysiology & quite precise reportable symptoms
  • Outcomes must be broader: many things influence them, hence variable
  • Different items, and more of them, required (for reliability, given the variability)
  • But two domains supportable (just !)

44
OM2-9 INDICATORS (9 standard history items from OM8-30, to guide decision on VTs ± adenoidectomy)
Give ad with ± VTs, if history bad (eg score TARGET median) & HL 20dB
Give VTs-only if history bad & HL 20dB
Respiratory infection history (6)
Ear infection history (3)
Evidence for these criteria described previously (eg BACO) - at end if time
45
OM2-13 OUTCOMES (summary domains only, no bias-adjustment nor treatment indicators)
No items on schooling, speech/language kept. Slight content shift places a ceiling on correlation (0.837), so underestimating concurrent validity
Developmental Impact (7)
No reported hearing difficulty items, so no bias-adjustment
Physical Health (6)
Concurrent validity: correlation with phys in OM8-30 r = 0.938
46
Use of short forms to refine, revise or justify clinical policies via audit studies, ie not throw out baby with bathwater
47
Audits require such tools PLUS
  • Ownership by interested group of Drs, & some incentive to a high participation rate
  • Slight resources
  • Remind, manage, enter data, convene review
  • Database tool (existing from OM8-30)
  • We will modify for OM2-9 & 2-13 if demand justifies
  • Usually, ethical approval not needed
  • Sufficient numbers (ie power calculation needed)
  • 1 or more appropriate audit questions, eg
  • Is a rule/guideline being followed ? (OM2-9)
  • Are X% of outcomes above standard ? (OM2-13)
  • Example from New Zealand available
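The "power calculation needed" point can be illustrated with a minimal sketch: the standard normal approximation for comparing two group means. The effect size, SD, and the z-values 1.96 and 0.84 (two-sided alpha = 0.05, 80% power) are illustrative assumptions, not figures from this presentation.

```python
import math

def n_per_group(delta, sd, z_alpha=1.96, z_power=0.84):
    """Approximate subjects per group to detect a mean difference
    `delta` between two groups with common SD `sd`, two-sided
    alpha = 0.05 and 80% power (normal approximation):
    n = 2 * (z_alpha + z_power)^2 * (sd / delta)^2."""
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd / delta) ** 2)

# eg to detect a half-SD difference in a hypothetical audit comparison
print(n_per_group(0.5, 1.0))
```

Halving the detectable difference quadruples the required numbers, which is why an audit question must be fixed before deciding how many cases to collect.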

48
Future prospects for ultra-short forms
  • Audit via OM2-9 the application of the
    evidence-based case selection criteria (RAOM,
    URTI, developmental impact) for grommets, based
    on TARGET RCT treatment analyses
  • Audit service quality via OM2-13, but for general
    quality of outcomes only, as these items were not
    selected for differential indication
  • For both forms, develop further and make more
    robust the already automated scoring software and
    evaluate time and errors of manual scoring by
    professionals
  • Develop automated data capture
  • Scoring routines (eg optical/magnetic reading)
  • For many parents in future, on-line administration

49
Parting shot: given current NHS changes, there may be increased need to show
  • Research base & appropriate tools (OM8-30)
  • The capacity to monitor outcomes (OM2-13)
  • Do hospitals differ in outcomes ?
  • In TARGET, outcomes did differ, but only slightly
  • This was due mostly to the rate of discretionary adenoidectomy (longer-term benefits on HL etc)
  • Benchmarks for quality useful, implying reference data, even if patient choice agenda is not pursued
  • The capacity to accurately select patients for the ability to benefit (HL & OM2-9)
50
Contact for further information
MRC Multi-centre Otitis Media Study Group, Cambridge, UK
mark.haggard@mrc-cbu.cam.ac.uk