Towards Evidence-Based Discovery - PowerPoint PPT Presentation

Title: Towards Evidence-Based Discovery


1
Towards Evidence-Based Discovery
  • Catherine Blake
  • School of Information and Library Science
  • University of North Carolina at Chapel Hill
  • http://www.ils.unc.edu/cablake
  • cablake_at_email.unc.edu

2
Motivation
  • Relentless increase in electronically available
    text
  • Life Sciences
  • 17 millionth entry added in April 2007
  • 5,200 journals indexed
  • 12,000 new articles each week!
  • Chemistry: more than 110,000 articles in 1 year
    alone
  • Consequences
  • Hundreds of thousands of relevant articles
  • Implicit connections between literature go
    unnoticed

Shift from Retrieval to Synthesis
3
Information Overload
  • "One of the diseases of this age is the
    multiplicity of books; they doth so overcharge
    the world that it is not able to digest the
    abundance of idle matter that is every day
    hatched and brought forth into the world."
  • - Barnaby Rich, 1613

4
Evidence-Based Discovery
"If I have seen further than others, it is by standing upon the shoulders of giants." - Sir Isaac Newton
"We can't solve problems using the same kind of thinking we used when we created them." - Albert Einstein
5
(No Transcript)
6
Outline
  • Motivation
  • Case Studies
  • METIS
  • Human synthesis
  • Natural language processing
  • Claim Jumping through Scientific Literature
  • Next Steps
  • Summary

7
Systematic Review Process
  • Formulate the problem
  • Locate and select studies
  • Assess quality of studies
  • Collect data
  • Analyze and present results
  • Interpret results
  • Improve and update review

28 months from initial idea to publication
Increased demand due to evidence-based medicine
8
Manual Synthesis
"Guesswork guided by scientifically trained
intuition" - Rescher (1978)
Select
Verify
Extract
Analyze
9
Context Information
  • Study Information
  • e.g. date, location, ...
  • Population Information
  • e.g. gender, age, ...
  • Risk Factor or Intervention
  • e.g. duration of exposure, confounders
  • Disease
  • e.g. stage, confounders

Loosely coupled to review focus
Tightly coupled to review focus
10
Collaborative Information Synthesis
11
Key: Estimate Missing Information
1. What are people with Breast Cancer exposed to?
   Source: studies with Breast Cancer patients
  • Facts for each study
  • number of patients
  • age of patients
  • geographic location
  • risk-factor exposure

2. What are people in a similar population exposed to?
   Source: database of risk factors (BRFSS)
  • Codebook
  • question asked
  • age, gender
  • responses

3. Are these rates significantly different?
T. Tengs & N. D. Osgood (2001). The Link between
Smoking and Impotence: Two Decades of Evidence.
Preventive Medicine, 32:447-52
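
The question "Are these rates significantly different?" can be illustrated with a two-proportion z-test. This is only a sketch: the slides do not say which statistical test METIS applies, and the function name and all counts below are made up for illustration.

```python
import math

def two_proportion_z_test(exposed_a, n_a, exposed_b, n_b):
    """Two-sided z-test for a difference between two exposure rates.

    Assumption: a pooled-variance z-test is used purely as an example;
    it is not taken from the presentation.
    """
    p_a = exposed_a / n_a
    p_b = exposed_b / n_b
    pooled = (exposed_a + exposed_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: smokers among study patients vs. a survey population.
z, p = two_proportion_z_test(exposed_a=120, n_a=400, exposed_b=2300, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest that the exposure rate among patients differs from the rate in the comparison population.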
12
More than Automated Meta-Analysis
  • Traditional analysis
  • same study design
  • medicine: RCT
  • epidemiology: cohort
  • Information Synthesis
  • any study that includes required information
  • augment missing information

Systematic Review
Key
Main topic
Entire study
Secondary Information
External database
13
(No Transcript)
14
METIS Information Extractor
  • Semantic Grammar
  • Features: words, numbers, and semantic types in
    the Unified Medical Language System (UMLS)
  • Information extracted
  • risk factor exposure (tobacco and alcohol)
  • gender
  • age (min, max, mean)
  • start and end dates
  • number of subjects with medical condition
  • geographical location

Example rule: term "age", term "of", number (10 < n2 < 110), term "to", number (10 < n2 < 110)
Example sentence: The age of breast cancer subjects ranged between 20 to 64 years old.
Semantic type: neoplastic process, or disease
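
The age example on this slide can be approximated with a regular expression. The sketch below is not the METIS semantic grammar: it ignores UMLS semantic types entirely, and the pattern, function name, and bounds handling are assumptions made for illustration. It only mirrors the idea of matching the term "age" followed by two numbers that each fall between 10 and 110.

```python
import re

# Hypothetical pattern: the word "age" (or "ages"/"aged"), then, within the
# same sentence, two numbers joined by "to", "and", or "-".
AGE_RANGE = re.compile(
    r"\bage[sd]?\b[^.]*?\b(\d{1,3})\s*(?:to|and|-)\s*(\d{1,3})\b",
    re.IGNORECASE,
)

def extract_age_range(sentence, low=10, high=110):
    """Return (min_age, max_age) for the first plausible range, else None."""
    for match in AGE_RANGE.finditer(sentence):
        a, b = int(match.group(1)), int(match.group(2))
        # Numeric constraint from the slide: each number lies in (10, 110).
        if low < a < high and low < b < high and a <= b:
            return a, b
    return None

sentence = "The age of breast cancer subjects ranged between 20 to 64 years old."
print(extract_age_range(sentence))
```

The numeric bounds keep years such as 1990 from being read as ages, which is presumably why the rule on the slide constrains the numbers.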
15
METIS Info Extractor Evaluation
  • Diverse text corpus
  • epidemiology, surgery, biology, ...
  • cohort studies, case-control trials, ...
  • Evaluation
  • Metrics (precision, recall)
  • Annotators (developer, domain expert, expert
    annotator, novice)
  • Primary topic (breast cancer, impotence)
  • Secondary information (tobacco and alcohol
    consumption)
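
Precision and recall for an extractor can be computed by comparing its output with an annotator's gold standard. A minimal sketch, assuming each extracted fact is represented as a (field, value) pair; that representation and the example values are hypothetical, not taken from the METIS evaluation.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted facts against gold annotations."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical facts for one article.
extracted = {("age_min", 20), ("age_max", 64), ("gender", "female"), ("location", "Ohio")}
gold = {("age_min", 20), ("age_max", 64), ("gender", "female"), ("n_subjects", 250)}

precision, recall = precision_recall(extracted, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```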

16
METIS Info Extractor Recall
17
METIS Info Extractor Precision
18
METIS Verifier
Converted Article
Electronic version of article
Verify information extracted
19
METIS Verifier
20
METIS Analyzer
  • Meta-Analysis
  • Developed for agricultural application
  • Requires empirical studies with a quantitative
    outcome
  • Unit of study is an article - not a person
  • Result: a unitless metric called an effect size
  • Two common meta-analysis techniques
  • Fixed-effects model
  • Random-effects model

Evaluation: compared generated effect sizes with
examples in textbooks and published articles.
Result: same effect size.
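
The two meta-analysis techniques named on this slide can be sketched with inverse-variance weighting. The slides do not give the formulas METIS uses, so this is an assumption: the fixed-effects estimate below is the standard inverse-variance pooled mean, and the random-effects estimate uses the DerSimonian-Laird estimator of between-study variance. The input numbers are invented.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect size and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    weights = [1.0 / v for v in variances]
    fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q measures heterogeneity around the fixed-effects estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    adjusted = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(adjusted, effects)) / sum(adjusted)
    se = math.sqrt(1.0 / sum(adjusted))
    return pooled, se, tau2

# Hypothetical per-article effect sizes and their variances.
effects = [0.2, 0.4, 0.35]
variances = [0.01, 0.02, 0.015]
print(fixed_effect(effects, variances))
print(random_effects(effects, variances))
```

When the studies are homogeneous, tau^2 is zero and the two estimates coincide; otherwise the random-effects model widens the standard error.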
21
Synthetic Estimate Evaluation
Tobacco Consumption
22
(No Transcript)
23
(No Transcript)
24
Outline
  • Motivation
  • Case Studies
  • METIS
  • Claim Jumping
  • Human discovery
  • Natural language processing
  • Human-assisted discovery and synthesis
  • Next Steps
  • Summary

25
(No Transcript)
26
Human Discovery
  • Day-to-day activities of scientists reflect
  • the complex socio-technical environments in which
    successful creativity tools will eventually be
    embedded
  • the human cognitive processing surrounding
    creativity
  • Unit of analysis: a paper or grant proposal

How do chemists arrive at their research question?
How do chemists transform an idea into a
publication?
27
Approach
  • Recruitment
  • experienced scientists (7-45 yrs)
  • local chemists and chemical engineers
  • response rate 84% (21/25)
  • Semi-structured interviews
  • Critical incident technique
  • seminal paper in their field
  • recent paper authored by the participant
  • paper authored by the participant that they were
    particularly proud of

28
Interview Questions
  • Discovery Questions
  • What is your definition of discovery?
  • What evidence convinced you that the paper
    addressed the initial research questions?
  • What factors limited the adoption and deployment
    of the discovery?
  • How did you arrive at the research question?
  • What, if any, existing evidence prompted the
    study/experiment?
  • Were there any alternative explanations?
  • Information Usage Questions
  • Other than the scientific literature, what
    information resources do you draw from to aid in
    your research processes?
  • How many articles did you read last month that
    related to each of those projects?
  • Is that typical of how many articles you read in
    a month for research projects?
  • Do you read articles for another purpose? If so,
    what?
  • How many hours do you spend reading journal
    articles for research projects?
  • Which journals do you typically read and draw
    from?
  • How would you characterize the journals that you
    read: are they only within your domain, or do you
    read journals that would be considered
    non-traditional in your research?
  • If you only have a few minutes to read an
    article, what parts would you read?
  • What do you do with the article once you have
    read it?

29
Chemists and Chemical Engineers
  • Compared with other scientists, chemists and
    chemical engineers:
  • read more (Brown, 1999)
  • have more personal subscriptions to journals
    (Noble & Coughlin, 1997)
  • spend more time reading (Tenopir & King, 2003)
  • visit the library more often (Brown, 1999)
  • Consequences
  • information is disseminated quickly
  • information has a relatively short lifespan

30
Human Discovery Findings
  • Discovery definition
  • Novelty - Balance theory and experimentation
  • Build on existing ideas - Practical application
  • Simplicity
  • Hypothesis generation
  • Discussion - Previous experiments
  • Combine expertise - Read literature
  • Hypothesis validation
  • Iterative - Tightly coupled

31
(No Transcript)
32
Causal Relationships
  • Newspaper genre
  • Causal relationships (Khoo, Chan, & Niu, 1998)
  • Biomedical genre
  • Causes and treats (Price & Delcambre, 2005)
  • Causal knowledge (Khoo, Chan, & Niu, 2000)
  • Universal Grammar
  • Causatives (Comrie, 1974, 1981)
  • Action verbs (Thomson, 1987)

33
Claim Definition
  • To assert in the face of possible contradiction
  • Example sentence reporting a claim
  • "This study showed that Tamoxifen reduces the
    breast cancer risk"
  • Example Claim Framework
  • Tamoxifen: agent
  • reduces: change
  • breast cancer risk: object
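
The agent, change, and object facets of a claim map naturally onto a small record type. A minimal sketch: the class name, field names, and helper method are hypothetical and are not part of the Claim Framework as presented; only the three facets and the Tamoxifen example come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """One claim with the three facets shown in the example."""
    agent: str          # what acts, e.g. a drug
    change: str         # how the agent affects the object
    obj: str            # what is affected ("object" facet)
    sentence: str = ""  # source sentence, kept for verification

    def as_triple(self):
        return (self.agent, self.change, self.obj)

claim = Claim(
    agent="Tamoxifen",
    change="reduces",
    obj="breast cancer risk",
    sentence="This study showed that Tamoxifen reduces the breast cancer risk",
)
print(claim.as_triple())
```

Keeping the source sentence alongside the facets lets a reader check each extracted claim against the text it came from.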

34
The Claim Framework
  • Goal
  • go beyond genes and proteins
  • differentiate between different levels of
    confidence in the claim
  • consider claims made in the full text
  • Working hypothesis
  • literature will report findings using constructs
    within the Claim Framework
  • human annotators will agree on facets

35
Preliminary Results
  • 29 articles from TREC Genomics
  • Total number of sentences: 5535
  • Sentences with ≥1 claim: 1250 (22.6%)
  • Total number of claims: 3228
  • Average claims per sentence: 2.51
  • Claims that did not fit in the Framework: 31
  • Per article
  • Average number of sentences: 191
  • Average number of sentences with ≥1 claim: 43

36
Distribution of Claim Categories
Category       Total (n)  Total (%)  Pilot (n)  Pilot (%)  Main (n)  Main (%)
Explicit            2489      77.11        332      83.42      2157     76.63
Implicit              87       2.70          3       0.75        84      2.98
Observation          298       9.23         24       6.03       274      9.73
Correlation          174       5.39         12       3.02       162      5.75
Comparison           165       5.11         27       6.85       138      4.9
Total               3228     100           398     100         2830    100
37
All Documents
Annotation          Total (n)  Total (%)  Words (n)  Words (avg)
Agent                    2894      89.65       5221         1.80
Agent Direction           285       8.83        291         1.02
Agent Modifier           1246      38.60       4448         3.57
Object                   3197      99.04       6849         2.14
Object Direction          271       8.40        283         1.04
Object Modifier          1561      48.36       5383         3.44
Change                   1897      58.77       1953         1.03
Change Direction         1337      41.42       1358         1.02
Change Modifier          1147      35.53       1618         1.41
Claim Basis               165       5.11        394         2.39
Claim Basis Dir.           42       1.30         43         1.02
Claim Basis Mod.           86       2.66        266         3.09
Total                    3228                 28107         8.70
38
Inter Annotator Agreement
Information Facet     Kappa  Agreement
Agent                 0.71   substantial
Object                0.77   substantial
Change                0.57   moderate
Change + ChangeDir    0.88   almost perfect
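
Kappa values like these can be computed from two annotators' labels. The sketch below assumes Cohen's kappa for two annotators and the Landis and Koch (1977) descriptive scale, whose labels match the ones on this slide; the slides do not state which kappa variant was used, and the example labels are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Descriptive agreement label on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical judgments: did each annotator mark an Agent in the sentence?
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa, landis_koch(kappa))
```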

39
Location of Claims
Total Sentences
Section        With claim  Total  % of section  % of claim sentences
Abstract               98    309         31.72                  7.84
Introduction          357    979         36.47                 28.56
Method                  6   1121          0.54                  0.48
Result                293   1829         16.02                 23.44
Discussion            539   1406         38.34                 43.12
Total                1250   5535         22.58                100.00
40
(No Transcript)
41
User Study
Steven W. Matson, Ph.D.; Professor and Chair, Department of Biology
Robert C Millikan, DVM, PhD; Barbara Sorenson Hulka Distinguished Professor, Department of Epidemiology, School of Public Health
Dr. Rosa Perelmuter, PhD; Director, Moore Undergraduate Research Apprentice Program; Professor of Spanish and Assistant Dean, Academic Advising Program
Jan F. Prins, PhD; Professor of Computer Science and Chairman, Department of Computer Science
Alexander Tropsha, Ph.D.; Professor and Chair; Director, Laboratory for Molecular Modeling
Suzanne West, PhD; Researcher, Health, Social and Economics Research, RTI International
  • Timothy S. Carey, MD, MPH
  • Sarah Graham Kenan Professor of Medicine
  • Director, Cecil G Sheps Center for Health
    Services Research
  •  
  • Ila Cote, PhD, DABT
  • Acting Division Director
  • US Environmental Protection Agency
  • National Center for Environmental Assessment
  •  
  • Michael T Crimmins PhD.
  • Mary Ann Smith Distinguished Professor of
    Chemistry UNC and Department Chair, Department
    of Chemistry
  •  
  • Paul Jones
  • Clinical Associate Professor
  • School of Information and Library Science
  • Director of ibiblio.org
  •  
  • Rudy L Juliano PhD.
  • Boshamer Distinguished Professor of Pharmacology

42
(No Transcript)
43
Closing Comments
  • Accelerate synthesis
  • Breast cancer study without METIS would take >13
    years
  • Without synthetic estimate systematic review
  • Accelerate discovery
  • Connections between literature
  • Speculative and orthogonal views
  • Human discovery and synthesis
  • As important if not more so than automation

"Tap the vast reservoir of human knowledge" - Louis
Round Wilson, 1929
44
Acknowledgements
  • Claim Jumping
  • Funded in part by
  • Faculty fellowship from the Renaissance Computing
    Institute
  • UNC Faculty Award
  • Thanks to collaborators
  • Nassib Nassar and Mats Rynge (RENCI)
  • Amol Bapat and Ryan Jones (SILS)
  • Chemists and Chemical Engineers Study
  • Funded in part by
  • NSF Center for Environmentally Responsible
    Solvents and Processes
  • METIS
  • Funded in part by
  • California Breast Cancer Research program
  • University of California, Irvine
  • Thanks to user groups
  • Particularly to Dr. Adams and Dr. Tengs
  • Academic mentoring
  • Primary Advisor: Dr. Wanda Pratt
  • Medical Mentor: Dr. Catherine Carpenter
  • Co-Advisors: Dr. Dennis Kibler and Dr. Michael
    Pazzani
  • Committee Member: Dr. Paul Dourish

45
Questions and Comments Welcome
  • Catherine Blake
  • cablake_at_email.unc.edu
  • School of Information and Library Science
  • University of North Carolina at Chapel Hill
  • http://www.ils.unc.edu/cablake

46
Publication Bias
  • Studies that find a correlation between a risk
    factor and disease are more likely to be
    published (Easterbrook et al., 1991; Ingelfinger
    et al., 1994)
  • METIS provides a new way to explore this bias

Bias introduced by authors, editors, funding, ...