Title: Meta-Analysis, Data Mining, and Scientific Reasoning Academy Colloquium: Research Methodology Royal N
1Meta-Analysis, Data Mining, and Scientific Reasoning
Academy Colloquium: Research Methodology
Royal Netherlands Academy of Arts and Sciences
Amsterdam, The Netherlands
January 2000
John E. Cornell
Geriatric Research, Education, and Clinical Center
South Texas Veterans Health Care System
and
Department of Medicine
University of Texas Health Science Center
2The Agony and The Ecstasy
- Value and limitations of secondary analyses of
large databases as evidence-based vehicles to
inform health care policy and practice
- Secondary analyses are often related to uses and
questions that fall outside the original purpose
and study design
3Oh, the Agony!
- the agony of trying to explain to critics why
the attained sample, although considerably
incomplete is still suitable for the purposes
envisioned in the investigation.
- (Feinleib, 1984, p. 784)
4Meta-Analysis and Data Mining
- Retrospective observational studies
- Estimation of causal effects
- Uncover interesting patterns
- Operate on multiple, heterogeneous data sources
- Provide a quantitative basis for decision making
5Meta-Analysis
- Meta-analysis is a set of quantitative methods
that addresses the fundamental problem of
replication in scientific inquiry.
- Interpret patterns of results and combine
evidence from different experimental studies to
assess the validity and strength of the evidence
for or against a hypothesis.
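One common way to combine evidence across studies is inverse-variance (fixed-effect) pooling. The sketch below is illustrative only: the function name `pool_fixed_effect` and the three study values are hypothetical, and the slides do not say which pooling model the speaker had in mind.

```python
import math

def pool_fixed_effect(log_odds_ratios, standard_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_odds_ratios)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors from three studies.
effects = [-0.40, -0.10, -0.25]
ses = [0.20, 0.10, 0.25]

pooled, pooled_se = pool_fixed_effect(effects, ses)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

More precise studies (smaller standard errors) dominate the pooled estimate, which is why the second study carries most of the weight here.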
6Goals of Meta-Analysis
- Efficiently integrate published research findings
- Establish consistency of treatment effects across
populations, settings, and differences in the way
treatments are implemented
- Explore effects of explanatory variables that may
influence variation in treatment effects
- Employ methods that minimize bias and random
errors in abstraction, summarization, and
presentation of research evidence
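Consistency of treatment effects is often summarized with Cochran's Q and the I² statistic. The following is a minimal sketch under that assumption; the slides do not name a specific heterogeneity measure, and the function name and inputs are hypothetical.

```python
def heterogeneity(effects, standard_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond what chance alone would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical effects (log odds ratios) and standard errors.
q, df, i2 = heterogeneity([-0.40, -0.10, -0.25], [0.20, 0.10, 0.25])
print(f"Q = {q:.2f} on {df} df, I^2 = {100 * i2:.0f}%")
```

A large I² would suggest looking for explanatory variables (population, setting, implementation) before reporting a single pooled effect.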
7Data Mining
- A set of statistical methods and computer
algorithms that produce an enumeration of
patterns in a set of data.
- Assesses the validity, novelty, potential
usefulness, and understandability of the patterns
- A pattern recognition paradigm whose primary goal
is to detect unsuspected relationships in large
databases that are of interest or value to the
owner
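To make "an enumeration of patterns" concrete, here is a brute-force sketch that lists every itemset meeting a minimum support threshold. Frequent-itemset enumeration is only one kind of data mining, chosen here for illustration; the records, item names, and threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(records, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(records)
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support: fraction of records containing every item in the candidate.
            count = sum(1 for r in records if set(candidate) <= r)
            if count / n >= min_support:
                result[candidate] = count / n
                found_any = True
        if not found_any:
            break  # if no itemset of this size is frequent, no larger one can be
    return result

# Hypothetical patient records, each a set of recorded attributes.
records = [
    {"diabetes", "hypertension", "statin"},
    {"diabetes", "hypertension"},
    {"hypertension", "statin"},
    {"diabetes", "hypertension", "statin"},
]

patterns = frequent_itemsets(records, min_support=0.75)
for itemset, support in patterns.items():
    print(itemset, support)
```

The enumeration itself says nothing about whether a pattern is valid, novel, or useful; those judgments are a separate step.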
8Data Mining
9Concept of Interestingness
- Evidence: Statistical significance
- Redundancy
- Similarity to other findings
- Time ordering
- Usefulness: Relation to goals of user
- Novelty: Deviation from prior knowledge
- Generality: Fraction of population the finding
relates to
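Some of these criteria have simple numeric proxies for a rule of the form "antecedent implies consequent". In the sketch below, support stands in for generality, and lift (deviation from statistical independence) is used as a rough stand-in for novelty. That mapping is an assumption made for illustration, not something stated on the slide, and the records are hypothetical.

```python
def rule_measures(records, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_b = sum(1 for r in records if consequent <= r)
    n_ab = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_ab / n                         # fraction of records the rule covers
    confidence = n_ab / n_a if n_a else 0.0    # P(consequent | antecedent)
    lift = confidence / (n_b / n) if n_b else 0.0  # 1.0 means independence
    return {"support": support, "confidence": confidence, "lift": lift}

# Hypothetical patient records, each a set of recorded attributes.
records = [
    {"diabetes", "hypertension", "statin"},
    {"diabetes", "hypertension"},
    {"hypertension", "statin"},
    {"diabetes", "hypertension", "statin"},
]

measures = rule_measures(records, {"diabetes"}, {"statin"})
print(measures)
```

Usefulness and time ordering depend on the user's goals and on domain knowledge, so they are not reducible to counts like these.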
10Meta-Analysis Data Mining
- Meta-Analysis
- Generate Well-Formulated Research Questions
- Identify Relevant Databases and Develop Efficient
Search Strategies
- Review Abstracts to Determine Eligibility
- Apply Strict Inclusion/Exclusion Criteria and
Abstract the Data
- Select Meta-Analytic Model(s)
- Perform the Required Analyses
- Interpret the Results
- Determine Implications for Health Care Policy and
Practice
- Data Mining
- Collect Information on Application Domain
- Create Target Database
- Clean Data
- Reduce Data
- Select Data Mining Approach/Algorithm
- Execute the Algorithm
- Interpret the Patterns Discovered
- Determine Appropriate Actions
11Knowledge Summing Up vs. Knowledge Discovery
12Knowledge Summing Up
- Meta-Analysis facilitates our discovering what is
known
- Scientific knowledge
- Replication
- Cumulative
- Hypothetico-deductive approach
13Knowledge Summing Up
- Synthesis of existing evidence and the
reconciliation or explanation of contradictory
findings is driven and directed by a priori
articulation of precise hypotheses that make
explicit statements about the expected results
derived from a set of research findings.
14Forest Plot
15Knowledge Discovery
- Data Mining seeks to uncover novel associations
that represent new knowledge
- Scientific knowledge
- Atheoretical, Empirical
- Novel, Serendipitous
- Inductive approach
16Knowledge Discovery
- The knowledge domain provides a context that
guides the search process and provides criteria
- Rarely does the data analyst start with a
specified set of hypotheses to be confirmed or
disconfirmed by the data
- Knowledge is derived from the natural
associations that appear in the data stream
17Uncertainty
18Sources of Uncertainty
19Publication Bias
- A selection bias in the published literature
such that publication of research depends on the
nature and direction of the study results (e.g.,
statistically significant findings, language
bias, covert multiple publication, etc.).
20Publication Bias
[Figure: odds ratio (log scale) plotted against standard error]
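A plot of effect size against standard error can be checked for asymmetry numerically. One widely used check is Egger's regression, which regresses the standardized effect on precision and examines the intercept. The slides do not name this method, so it is offered here only as one possible approach; the data are hypothetical, and the sketch returns the coefficients without a significance test.

```python
def egger_intercept(effects, standard_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope). An intercept far from zero suggests
    small-study asymmetry; the slope estimates the underlying effect.
    """
    x = [1.0 / se for se in standard_errors]                   # precision
    y = [e / se for e, se in zip(effects, standard_errors)]    # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical log odds ratios in which less precise studies report larger effects.
ses = [0.1, 0.2, 0.3, 0.4, 0.5]
effects = [0.2 + 1.5 * se for se in ses]

intercept, slope = egger_intercept(effects, ses)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Asymmetry is not proof of publication bias: genuine heterogeneity or differences in study quality can produce the same pattern.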
21Meta-Analysis, Data Mining and Scientific
Reasoning
22Challenges
- Research methodology is the integration of
philosophy of science with mathematics
- New methodologies challenge existing ideas about
the nature of scientific reasoning
- Meta-Analysis and Data Mining elicit criticism
because their use runs counter to accepted models
of scientific reasoning
23Nature of Evidence
- Property of data that influences our beliefs
- Nature of Data
- Information
- Random or Fixed
- Nature of Beliefs
- Hypotheses
- Random or Fixed
24Probabilistic Nature of Scientific Reasoning
- In a multitude of circumstances the physicist is
often in the same position as the gambler who
reckons up his chances. Every time he reasons by
induction, he more or less consciously requires
the calculus of probability.
- (Poincaré, 1905)
25Evidence and Statistical Reasoning
- Classical approach
- Likelihood approach
- Bayesian approach
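The three approaches can be contrasted on a single small data set. The sketch below uses a hypothetical outcome of 14 successes in 20 trials, a null value of 0.5, an alternative of 0.7, and a uniform Beta(1, 1) prior. All of these choices are assumptions made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def classical_p_value(k, n, p0):
    """Two-sided exact p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p0)
    return sum(binom_pmf(i, n, p0) for i in range(n + 1)
               if binom_pmf(i, n, p0) <= observed * (1 + 1e-9))

def likelihood_ratio(k, n, p1, p0):
    """Relative support the data give to p1 over p0."""
    return binom_pmf(k, n, p1) / binom_pmf(k, n, p0)

def beta_posterior(k, n, a=1.0, b=1.0):
    """Parameters of the Beta posterior after a Beta(a, b) prior."""
    return a + k, b + (n - k)

k, n = 14, 20  # hypothetical data
p_value = classical_p_value(k, n, p0=0.5)
lr = likelihood_ratio(k, n, p1=0.7, p0=0.5)
post_a, post_b = beta_posterior(k, n)
posterior_mean = post_a / (post_a + post_b)

print(f"classical: two-sided p = {p_value:.3f}")
print(f"likelihood: L(0.7) / L(0.5) = {lr:.2f}")
print(f"Bayesian: posterior mean = {posterior_mean:.3f}")
```

The classical answer depends on outcomes that were not observed, the likelihood ratio compares two specific hypotheses on the observed data alone, and the Bayesian answer additionally depends on the prior.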
26Meta-Analysis and Data Mining
- Prior belief going into this exercise
- Meta-Analysis is a hypothetico-deductive
enterprise
- Data Mining is a strictly empirical inductive
enterprise
- Posterior belief based on the evidence
- Deductive/Inductive logic distinction still is
useful
- Rethink the nature of data and the nature of
scientific reasoning
28Directed Graphical Model