Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application

1
Detecting Multi-Item Associations and Temporal
Trends Using the WebVDME/MGPS Application
  • DIMACS Tutorial on Statistical and Other Analytic
    Health Surveillance Methods
  • 18 June 2003
  • Richard Ferris

2
Pharmaceutical post-marketing surveillance
  • Companies and regulatory agencies collect
    databases of spontaneous adverse reaction reports
  • Relevant exposure data not readily available (the
    denominator problem)
  • Can drug-event combinations of potential interest
    be identified from internal evidence alone?
  • Approach
  • Use an internally defined denominator
  • Construct set of expected counts using a
    stratified independence model

3
Computation of Expected Counts
  • The expected count for a given drug-event
    combination is determined by the overall count
    for the particular drug (across all events) and
    the overall count of the particular event (across
    all drugs)
  • For example, if 2% of all reports have PROZAC as
    a drug, and 3% of all reports have RASH as an
    event, then one would expect that 0.06%
    (0.02 × 0.03 = 0.0006) of the reports will include this
    combination (PROZAC in combination with RASH)
  • (MGPS carries out this computation separately for
    each distinct stratum and sums the
    strata-specific expected counts to obtain an
    overall expected count)
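A minimal Python sketch of the stratified computation described above, assuming each stratum is summarized by three hypothetical inputs: the drug's marginal count, the event's marginal count, and the stratum total. Within a stratum the expected count is n_drug × n_event / n_total, and the overall expected count is the sum over strata. The two-stratum figures are invented for illustration.

```python
def expected_count(strata):
    """Sum of stratum-specific expected counts for one drug-event pair.

    strata: iterable of (n_drug, n_event, n_total) tuples, one per stratum
    (a hypothetical input format chosen for this sketch).
    """
    total = 0.0
    for n_drug, n_event, n_total in strata:
        # Independence within the stratum: n_total * p(drug) * p(event).
        total += n_drug * n_event / n_total
    return total


# Unstratified PROZAC/RASH example: 10,000 reports,
# 2% with the drug (200) and 3% with the event (300).
print(expected_count([(200, 300, 10_000)]))  # 6.0, i.e. 0.06% of 10,000

# Hypothetical split into two strata of 5,000 reports each, with the drug
# and the event concentrated in different strata.
print(expected_count([(150, 50, 5_000), (50, 250, 5_000)]))  # 4.0
```

In this made-up split the stratified E (4.0) is smaller than the pooled E (6.0), so the same observed count would yield a larger RR once strata are taken into account.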

4
Comparing Observed and Expected Counts: Relative
Reporting Rate
  • Relative Reporting Rate (RR): $\mathrm{RR}_{ij} = N_{ij} / E_{ij}$
  • Easy to interpret, easy to compute
  • Statistically unstable if N is small or E is very
    small
  • The following all have RR = 100:
  • N = 1000, E = 10
  • N = 100, E = 1
  • N = 10, E = 0.1
  • N = 1, E = 0.01
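The RR arithmetic is easy to reproduce. This short Python sketch (the function name is hypothetical) computes RR for the four cases above and illustrates the instability: one extra report moves RR from 100 to about 200 when E = 0.01, but only to about 100.1 when E = 10.

```python
def relative_reporting_rate(n, e):
    """RR_ij = N_ij / E_ij: observed count over expected count."""
    return n / e


for n, e in [(1000, 10), (100, 1), (10, 0.1), (1, 0.01)]:
    print(f"N = {n}, E = {e}: RR = {relative_reporting_rate(n, e):g}")

# One additional report changes RR very differently depending on E.
print(relative_reporting_rate(1001, 10))  # about 100.1
print(relative_reporting_rate(2, 0.01))   # about 200
```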

5
Comparing Observed and Expected Counts:
Statistical Significance
  • What is the probability that $N_{ij}$ would be
    observed by chance (sampling error) when the
    expected value is $E_{ij}$? (p-value for testing a
    null hypothesis)
  • Harder to interpret (not expressed in the same units
    as RR)
  • Produces absurdly small probabilities that have no
    practical meaning
  • N = 100, E = 1 produces a p-value below $10^{-158}$!
  • A small RR can be very significant (small p-value)
    when the sample size is very large
  • N = 2000, E = 1000, RR = 2 is more
    significant than
  • N = 10, E = 0.1, RR = 100
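The p-values on this slide can be checked with a one-sided Poisson tail probability, P(N ≥ n) with N ~ Poisson(E). The slide does not name the test, so this Python sketch assumes that one-sided Poisson tail; it works in log space because terms such as 1000^2000 overflow floating point. Function names are hypothetical.

```python
import math


def log_poisson_pmf(k, mu):
    """Natural log of P(N = k) for N ~ Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def log10_upper_tail(n, mu, n_terms=5000):
    """log10 P(N >= n) for N ~ Poisson(mu).

    Sums n_terms terms starting at k = n with the log-sum-exp trick; assumes
    n > mu, so the terms shrink and the truncated remainder is negligible.
    """
    logs = [log_poisson_pmf(k, mu) for k in range(n, n + n_terms)]
    peak = max(logs)
    total = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return total / math.log(10)


for n, mu in [(100, 1), (2000, 1000), (10, 0.1)]:
    print(f"N = {n}, E = {mu}: log10 p = {log10_upper_tail(n, mu):.1f}")
```

The N = 2000, E = 1000 case comes out many orders of magnitude smaller than N = 10, E = 0.1, which is the slide's point: significance grows with sample size even when RR is modest.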

6
Comparing Observed and Expected Counts: Empirical
Bayes Multi-Item Gamma Poisson Shrinker
  • Try for the best of both previous approaches:
  • interpretability of the relative rate
  • proper adjustment for sampling variation
  • Focus on the distribution, across the set of
    drug-event combinations, of the ratios
  • Estimate $\lambda_{ij} = \mu_{ij} / E_{ij}$, where $N_{ij} \sim \mathrm{Poisson}(\mu_{ij})$
  • Fit a parameterized prior distribution (a mixture
    of two gamma distributions) to the empirical
    distribution of the $\lambda$s
  • Find the posterior distribution of $\lambda$ after
    observing N = n
  • Use this to obtain the posterior mean of
    $\log \lambda$ given the observed $N_{ij}$
  • Exponentiating this gives what we call EBGM
    (Empirical Bayes Geometric Mean); we also get lower
    and upper 95% confidence bounds (EB05, EB95: the
    5th and 95th percentiles of the posterior)
  • EBGM is termed the shrinkage estimate for RR
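A minimal Python sketch of the posterior computation for a single cell, assuming the two-gamma mixture prior has already been fitted. Each gamma component has shape α and rate β, so the marginal distribution of N is negative binomial, the posterior is again a two-component gamma mixture with parameters (α + n, β + E), and EBGM is exp of the posterior mean of log λ. The values in PRIOR are hypothetical placeholders, not hyperparameters fitted to any database. Fitting the prior and computing EB05/EB95 (which needs gamma-mixture quantiles) are omitted, and the digamma function is hand-rolled only to avoid a SciPy dependency.

```python
import math


def digamma(x):
    """Digamma for x > 0 via upward recurrence plus the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))


def log_nb(n, e, alpha, beta):
    """Log P(N = n) when lambda ~ Gamma(alpha, rate beta) and
    N | lambda ~ Poisson(lambda * e): a negative binomial marginal."""
    return (-n * math.log1p(beta / e) - alpha * math.log1p(e / beta)
            + math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1))


def ebgm(n, e, prior):
    """Posterior geometric mean of lambda under a two-gamma mixture prior.

    prior = (alpha1, beta1, alpha2, beta2, p), p = weight of component 1.
    """
    a1, b1, a2, b2, p = prior
    d = (math.log(1 - p) + log_nb(n, e, a2, b2)) - (math.log(p) + log_nb(n, e, a1, b1))
    # Posterior weight of component 1, written to avoid overflow in exp().
    q = 1.0 / (1.0 + math.exp(d)) if d < 0 else math.exp(-d) / (1.0 + math.exp(-d))
    # For Gamma(a, rate b), E[log lambda] = digamma(a) - log(b).
    mean_log = (q * (digamma(a1 + n) - math.log(b1 + e))
                + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(mean_log)


PRIOR = (0.2, 0.1, 2.0, 4.0, 1 / 3)  # hypothetical hyperparameters, not fitted

for n, e in [(1000, 10), (1, 0.01)]:
    print(f"N = {n}, E = {e}: RR = {n / e:g}, EBGM = {ebgm(n, e, PRIOR):.2f}")
```

Under these assumed hyperparameters both cells have RR = 100, but the N = 1, E = 0.01 cell is shrunk far below 100 while the N = 1000, E = 10 cell stays close to it, which is the shrinkage behavior the slide describes.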

7
Multi-Item Associations vs. Pairwise Associations
  • Consider the case of an item triplet, e.g., two
    drugs and an event
  • $\mathrm{RR}_{ijk} = N_{ijk} / E_{ijk}$, where $E_{ijk}$ is based on the
    independence model
  • $\mathrm{EBGM}_{ijk}$ = shrinkage estimate of $\mathrm{RR}_{ijk}$
  • Suppose a particular itemset (drug A, drug B,
    event C = kidney failure) is unusually frequent
    (EBGM for the triplet is $\gg 2$)
  • Important to ask:
  • Is this merely the result of one or more of the
    pairs (AB, AC, BC) being unusually frequent? Or
  • Is this a drug-drug interaction?
  • Compare the Empirical Bayes estimate of the frequency
    count of the triplet to the prediction from the
    all-2-factor log-linear model
  • $\mathrm{EXCESS2} = (\mathrm{EBGM} \times E) - E_{\mathrm{All2F}}$
  • E is the expected count from independence
  • Computation of $E_{\mathrm{All2F}}$ uses shrinkage estimates of
    pairwise counts
  • EXCESS2 is an estimate of how many extra cases
    were observed over what was expected using the
    all-2-factor model
  • Alternate approach: define $E_{ijk}$ from the predictions
    of the all-2-factor model, in which case the resulting
    EBGM directly measures divergence of the observed
    count from the all-2-factor prediction
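The all-2-factor prediction for a triplet can be sketched with iterative proportional fitting (IPF) on the 2 × 2 × 2 table of reports classified by presence of drug A, drug B and event C. This Python sketch is an illustration under stated assumptions: it fits the raw pairwise margins, whereas the slide notes that MGPS uses shrinkage estimates of the pairwise counts; the counts are hypothetical; and as a stand-in for EBGM it uses the raw ratio N/E, so EBGM × E reduces to the observed count N.

```python
import itertools

# Cells are (drug A, drug B, event C), with 1 = item present on the report.
CELLS = list(itertools.product((0, 1), repeat=3))
PAIRS = ((0, 1), (0, 2), (1, 2))  # AB, AC and BC margins


def fit_all_two_factor(obs, iters=500):
    """Fit the all-2-factor (no three-way interaction) log-linear model to a
    2x2x2 table by IPF. obs maps cell -> count; two-way margins must be > 0."""
    fit = {c: 1.0 for c in CELLS}
    for _ in range(iters):
        for i, j in PAIRS:
            for key in itertools.product((0, 1), repeat=2):
                members = [c for c in CELLS if (c[i], c[j]) == key]
                ratio = sum(obs[c] for c in members) / sum(fit[c] for c in members)
                for c in members:
                    fit[c] *= ratio
    return fit


def independence_expected(obs):
    """E for cell (1, 1, 1) under full independence: N * p(A) * p(B) * p(C)."""
    n = sum(obs.values())
    expected = n
    for axis in range(3):
        expected *= sum(v for c, v in obs.items() if c[axis] == 1) / n
    return expected


def excess2(ebgm, e, e_all2f):
    """EXCESS2 = EBGM * E - E_All2F."""
    return ebgm * e - e_all2f


# Hypothetical report counts.
obs = {
    (0, 0, 0): 9000, (0, 0, 1): 300, (0, 1, 0): 400, (0, 1, 1): 20,
    (1, 0, 0): 250, (1, 0, 1): 10, (1, 1, 0): 15, (1, 1, 1): 12,
}
fit = fit_all_two_factor(obs)
e_ind = independence_expected(obs)
raw_rr = obs[(1, 1, 1)] / e_ind  # stand-in for EBGM (no shrinkage)
print(f"N = {obs[(1, 1, 1)]}, E = {e_ind:.2f}, E_All2F = {fit[(1, 1, 1)]:.2f}")
print(f"EXCESS2 = {excess2(raw_rr, e_ind, fit[(1, 1, 1)]):.2f}")
```

A positive EXCESS2 here means the triplet occurs more often than its three pairwise associations alone would predict.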

8
Health Authority Adoption of Signal Detection
Technologies
  • FDA
  • CDER
  • Experimented in Office of Biostatistics with GPS
    for several years
  • Validated GPS
  • Moving to production
  • Have published data mining results on internal
    web for almost all products
  • CBER
  • initial GPS implementation (VAERS)
  • CRADA between Lincoln and FDA to further develop
    methodology and tools
  • CDC
  • Collaborative GPS methodology development with
    FDA
  • Includes simulation capability
  • WHO Uppsala Monitoring Centre
  • Production safety signal generation mechanism
    using BCPNN

9
FDA/GPS Validation Activities
  • Positive controls
  • Examine data mining results for drug-event
    combinations corresponding to known labeled
    adverse reactions
  • Negative controls
  • Examine data mining results for several drugs
    (with differing safety profiles) given for the
    same indication
  • Roll back database in time to determine when
    method would have provided first signal

10
Databases of Spontaneous AE Reports
  • FDA Spontaneous Report System (SRS)
  • Post-Marketing Surveillance of all Drugs since
    1969
  • Dates from the mid-1960s through 1997
  • 1.5 Million Reports
  • Encoded in COSTART
  • FDA Adverse Event Reporting System (AERS)
  • US cases, serious unlabeled events from all
    manufacturers.
  • All products sold in the US 5000 Rxs
  • Replaced SRS in 1997
  • Reactions coded as MedDRA PTs
  • Quarterly Updates, 4-6 month delay
  • Drugs are Verbatim
  • Includes initial and some follow-up reports
  • Includes Demographics, Reactions, Drugs,
    Outcomes, etc.
  • FDA/CDC Vaccine Adverse Events (VAERS)
  • Stricter Laws for Vaccine Adverse Event Reporting

11
Signal Detection Demonstration Using VAERS Data
12
Significant EBGM and even extremely
conservative EB05 with small N
13
Simple Rankings by Signal Strength
14
Evolution of Signals Over Time
15
Multi-Symptom Syndromes (Higher Order
Associations)
16
The Serotonin Syndrome
  • Could MGPS be used to identify unknown syndromes?
  • Try mining the AERS data for significant event
    triples using a known syndrome.
  • "The symptoms of the serotonin syndrome are
    euphoria, drowsiness, sustained rapid eye
    movement, overreaction of the reflexes, rapid
    muscle contraction and relaxation in the ankle
    causing abnormal movements of the foot,
    clumsiness, restlessness, feeling drunk and
    dizzy, muscle contraction and relaxation in the
    jaw, sweating, intoxication, muscle twitching,
    rigidity, high body temperature, mental status
    changes were frequent (including confusion and
    hypomania - a 'happy drunk' state), shivering,
    diarrhea, loss of consciousness and death."
    (The Serotonin Syndrome, Am J Psychiatry, June 1991)
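A toy Python sketch of the bookkeeping behind mining for event triples: count every event triple that co-occurs within a report and compare it with the count expected if the three events occurred independently. The reports are invented, and the sketch uses the raw RR with an unstratified independence model rather than MGPS shrinkage or the all-2-factor comparison, so it illustrates only the counting step.

```python
from collections import Counter
from itertools import combinations


def rank_event_triples(reports, min_count=2):
    """Rank event triples by raw RR = N / E, where E assumes the three
    events occur independently across reports (no strata, no shrinkage)."""
    n_reports = len(reports)
    singles = Counter()
    triples = Counter()
    for events in reports:
        unique = sorted(set(events))
        singles.update(unique)
        triples.update(combinations(unique, 3))
    ranked = []
    for triple, count in triples.items():
        if count < min_count:
            continue
        expected = n_reports
        for event in triple:
            expected *= singles[event] / n_reports
        ranked.append((triple, count, expected, count / expected))
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Invented reports: a cluster of serotonin-syndrome-like terms among common events.
reports = (
    [["agitation", "hyperthermia", "myoclonus"]] * 5
    + [["headache", "nausea"]] * 20
    + [["headache", "nausea", "rash"]] * 10
    + [["rash"]] * 5
)
for triple, count, expected, rr in rank_event_triples(reports):
    print(triple, count, round(expected, 3), round(rr, 1))
```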

17
(No Transcript)
18
Using Simulation to Test the Signal Detection
Process
19
Interpreting Simulation Parameters
                      Outcome
                      Yes      No          Total
  Exposure   Yes      R        P-R         P
             No       Q-R      1-P-Q+R     1-P
             Total    Q        1-Q         1
  1. As $R \to P$ and $(Q-R) \to (1-P)$ $\Rightarrow$ No Signal
  2. As $R \to P$ and $(Q-R) \ll (1-P)$ $\Rightarrow$ Strong Signal
  3. When $R \ll P$ and $(Q-R) \to (1-P)$ $\Rightarrow$ No Signal
  4. When $R \ll P$ and $(Q-R) \ll (1-P)$ $\Rightarrow$ Rare event
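The table can be read as two conditional outcome rates: among exposed records the outcome rate is R/P, and among unexposed records it is (Q - R)/(1 - P). The four cases above compare these two rates. This Python sketch (function names hypothetical) builds the cells from P, Q and R and reports the design's implied ratio R/(PQ), i.e. the joint rate relative to its value under independence.

```python
def cell_probabilities(p, q, r):
    """Joint cell probabilities of the 2x2 exposure-by-outcome table.

    p = P(exposure), q = P(outcome), r = P(exposure and outcome).
    """
    if not (0 <= r <= min(p, q) and p + q - r <= 1):
        raise ValueError("P, Q and R do not define a valid table")
    return {
        ("exposed", "outcome"): r,
        ("exposed", "no outcome"): p - r,
        ("unexposed", "outcome"): q - r,
        ("unexposed", "no outcome"): 1 - p - q + r,
    }


def outcome_rates(p, q, r):
    """Outcome rate among exposed (R/P) and unexposed ((Q-R)/(1-P)) records."""
    return r / p, (q - r) / (1 - p)


def implied_rr(p, q, r):
    """Ratio of the joint rate R to its independence value P*Q."""
    return r / (p * q)


for r in (0.00003, 0.000003):
    exposed, unexposed = outcome_rates(0.003, 0.001, r)
    print(f"R = {r}: exposed {exposed:.4f}, unexposed {unexposed:.6f}, "
          f"RR = {implied_rr(0.003, 0.001, r):.1f}")
```

With P = 0.003 and Q = 0.001, R = 0.00003 gives RR = 10 and R = 0.000003 gives RR = 1.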

20
Using Simulation to Create a Receiver Operating
Characteristic (ROC) Curve for EBGM
  • An ROC curve displays the true-positive rate
    (sensitivity) versus the false-positive rate
    (1 - specificity) for a statistic
  • Ran a 20-iteration simulation using P = 0.003,
    Q = 0.001 and R = 0.00003 (RR = 10) to check the
    true-positive rate
  • Ran a 20-iteration simulation using P = 0.003,
    Q = 0.001 and R = 0.000003 (RR = 1) to check the
    false-positive rate
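Given one score per simulated run (for example, the EBGM of the injected pair), an ROC curve can be traced by sweeping a threshold. In this Python sketch, runs simulated with RR = 10 are treated as positives and runs with RR = 1 as negatives; the score lists are placeholders, not results from the simulation described here, and the function names are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """(false-positive rate, true-positive rate) pairs from sweeping a
    threshold over every observed score; a run is flagged if score >= threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Placeholder scores: one per simulated run.
signal_runs = [3.1, 4.5, 2.2, 5.0]   # runs simulated with RR = 10
null_runs = [1.0, 0.8, 2.4, 1.2]     # runs simulated with RR = 1
curve = roc_points(signal_runs, null_runs)
print(curve)
print(f"AUC = {auc(curve):.4f}")
```

Sweeping over every observed score gives the full empirical curve; tied scores produce a diagonal segment, which credits ties with half a point in the AUC.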

21
ROC Curve Based on Simulated Injection of Signals
22
Simulating a Rare Event
  • Sample 100,000 records from VAERS data
  • Set P = 0.003, Q = 0.001, R = 0.00003
  • Iterate 20 Monte Carlo simulations
  • Expect (on average):
  • 0.003 × 100,000 = 300 rare exposures
  • 0.001 × 100,000 = 100 rare outcomes
  • 0.00003 × 100,000 = 3 rare exposure/rare
    outcome combinations
  • E = (300 × 100) / 100,000 = 0.3
  • RR = 3 / 0.3 = 10
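A pure-Python sketch of one way to run these Monte Carlo iterations: each of 100,000 records independently receives the rare exposure and/or the rare outcome with the joint probabilities implied by P, Q and R, and each iteration records the observed combination count N and its expected count E. Unlike the application, it does not sample real VAERS records or use strata; the function names and the seed are hypothetical.

```python
import random


def expected_design(n_records, p, q, r):
    """Average counts implied by P, Q, R, plus E and RR for the rare pair."""
    exposures, outcomes, both = p * n_records, q * n_records, r * n_records
    e = exposures * outcomes / n_records
    return exposures, outcomes, both, e, both / e


def simulate(n_records, p, q, r, iterations, seed=0):
    """Return (N, E) for the rare exposure/outcome pair in each iteration."""
    rng = random.Random(seed)
    runs = []
    for _ in range(iterations):
        n_exp = n_out = n_both = 0
        for _ in range(n_records):
            u = rng.random()
            if u < r:                # exposure and outcome (probability R)
                n_both += 1
                n_exp += 1
                n_out += 1
            elif u < p:              # exposure only (probability P - R)
                n_exp += 1
            elif u < p + q - r:      # outcome only (probability Q - R)
                n_out += 1
        runs.append((n_both, n_exp * n_out / n_records))
    return runs


print(expected_design(100_000, 0.003, 0.001, 0.00003))
runs = simulate(100_000, 0.003, 0.001, 0.00003, iterations=20)
mean_n = sum(n for n, _ in runs) / len(runs)
mean_e = sum(e for _, e in runs) / len(runs)
print(f"mean N = {mean_n:.2f}, mean E = {mean_e:.3f}")
```

Individual iterations scatter widely around N = 3 because the combination is so rare, which is why the slides average over 20 iterations.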

23
Base Simulation on VAERS Data
24
Sample Cases From VAERS
25
Sample 100,000 Cases
26
P = 0.003, Q = 0.001, R = 0.00003
27
20 Monte Carlo Iterations
28
RareExposure: Expected N = 300
29
RareOutcome: Expected N = 100
30
RareExposure/RareOutcome: Expected N = 3,
Expected RR = 10
31
Technical Details
  • William DuMouchel. Bayesian Data Mining in Large
    Frequency Tables (with Discussion). The American
    Statistician (1999) pp 177-190.
  • William DuMouchel and Daryl Pregibon. Empirical
    Bayes Screening for Multi-Item Associations.
    Proceedings of KDD 2001.

32
Methodology History and Key Contributors
  • Stephen Evans
  • MCA, UK
  • Proportional reporting ratio (PRR) with chi-square
    (χ²) analyses
  • Simple, highly intuitive, can be calculated by
    hand
  • Bate, Lindquist, Edwards et al.
  • WHO Uppsala Monitoring Centre
  • Bayesian neural network method for adverse drug
    reaction signal generation
  • Ana Szarfman, FDA (CDER) and Bill DuMouchel (AT&T)
  • Empirical Bayes, more robust than PRR for small n
  • The MGPS method's statistical parameter is EBGM
  • William DuMouchel. Bayesian Data Mining in Large
    Frequency Tables (with Discussion). The American
    Statistician (1999) pp 177-190.
  • William DuMouchel and Daryl Pregibon. Empirical
    Bayes Screening for Multi-Item Associations.
    Proceedings of KDD 2001.
  • Multidimensional analyses possible
  • Interactions, gender and other demographic
    associates, syndrome identification
  • Can directly compare EBGM values of different
    drugs, as well as for a specific drug
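The PRR listed above is simple enough to compute by hand from a 2 × 2 table of report counts. This Python sketch computes it together with a Pearson chi-square statistic without continuity correction (an assumption: implementations differ, and some apply Yates' correction); the counts and the function name are hypothetical.

```python
def prr_chi2(a, b, c, d):
    """PRR and Pearson chi-square (no continuity correction) for a 2x2 table.

    a: reports with the drug and the event   b: drug, other events
    c: other drugs with the event            d: other drugs, other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, chi2


prr, chi2 = prr_chi2(20, 980, 100, 98_900)  # hypothetical counts
print(f"PRR = {prr:.1f}, chi-square = {chi2:.1f}")
```

Unlike EBGM, nothing here shrinks the estimate when a is small, which is the contrast the slide draws between PRR and the Empirical Bayes approach.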

33
Key Contributors (continued)
  • WHO Collaborating Center for International Drug
    Monitoring: M. Lindquist, M. Stahl, A. Bate, R.
    Edwards, R.H. Meyboom
  • Bayesian confidence propagation neural network
    (BCPNN). The Information Component (IC) statistic is
    the measure of the strength of the drug-event (DE)
    relationship
  • Iterative approach
  • L. Gould. Comparison and refinement of Bayesian
    approaches for evaluating spontaneous reports of
    ADRs. DIA Annual Meeting, July 2001 (Denver)
  • EB vs BCPNN similar results
  • Thakrar, BT, Blesch, KS, Sacks, ST, Wilcock, K
    (2001)
  • (ISPE, Pharmacoepid. Drug Safety 10),
  • PRR vs. EB similar sensitivity, EB better at
    ranking events based on small N.