1
The BioPortal Informatics Infrastructure for
Syndromic Surveillance and Biodefense
  • Hsinchun Chen, Ph.D.
  • Artificial Intelligence Lab, U. of Arizona
  • NSF BioPortal Center

2
Medical Informatics: The computational,
algorithmic, database, and information-centric
approach to the study of medical and health
care problems.
From Medical Informatics to Infectious Disease Informatics
3
Syndromic Surveillance
  • A syndrome is a set of symptoms or conditions
    that occur together and suggest the presence of a
    certain disease or an increased chance of
    developing the disease (from NIH/NLM)
  • Syndromic surveillance is based on health-related
    data that precede diagnosis and signal a
    sufficient probability of a case or an outbreak
    to warrant further public health response (from
    CDC)
  • Targeting investigation of potential cases
  • Detecting outbreaks associated with bioterrorism

4
Syndromic Surveillance Data Sources in Different
Stages of Developing a Disease → Reaching
Situational Awareness
Reproduced from Mandl et al. (2004)
5
Syndromic Surveillance Systems
  • Generation 1, paper-based: paper, fax, telephone,
    telephone directory, etc.
  • Generation 2, email-based: email, Word/Access,
    pager, cell phone, etc.
  • Generation 3, database-driven: database,
    standards, messaging, tabulation, GIS, graphs,
    text, etc.
  • Generation 4, search engine-based: real-time,
    interactive, web services, visualized, GIS,
    graphs, texts, sequences, etc.

6
Syndromic Surveillance System Survey
7
Sample Systems and Data Sources Utilized
8
  • BioPortal: Overview, WNV, BOT
  • (real-time information collection, sharing,
    access, visualization, and analysis; epi data)
  • Architecture, Information Sharing, Information
    Retrieval, Standards, Policy, Privacy, Data
    Mining, Visualization, HCI

9
Project Background
  • In September 2002, representatives of 18
    different agencies, including DOD, DOE, DOJ, DHS,
    NIH/NLM, CDC, CIA, NSF, and NASA, convened to
    discuss disease surveillance
  • The AI Lab was chosen as the technical integrator
    to work with the states of New York and California
    to develop a prototype system targeting West Nile
    virus and botulism

10
BioPortal Project Goals
  • Demonstrate and assess the technical feasibility
    and scalability of an infectious disease
    information sharing (across species and
    jurisdictions), alerting, and analysis framework.
  • Develop and assess advanced data mining and
    visualization techniques for infectious disease
    data analysis and predictive modeling.
  • Identify important technical and policy-related
    challenges in developing a national infectious
    disease information infrastructure.

11
Information Sharing Infrastructure Design
[Architecture diagram; labels: Portal Data Store (MS SQL 2000); Data Ingest Control Module (Cleansing / Normalization); Info-Sharing Infrastructure; three Adaptors; SSL/RSA links; XML/HL7 Network; PHINMS Network; sources NYSDOH, CADHS, and "New"]
12
Data Access Infrastructure Design
13
Communications/Messaging
  • Messaging infrastructure installed and tested
      • NYSDOH-UA: PHIN MS
      • CADHS-UA: Regional message broker
      • NWHC-UA: PHIN MS
  • XML generation/conversion tested
      • NY_DeadBird, NY_Alerts, NY_BotHuman, NY_WNVHuman,
        NY_CaptiveAnimal, NY_Mosquito
      • CA_BotHuman, CA_WNVHuman, CA_DeadBird,
        CA_Chicken, CA_Mosquito
      • USGS_Epizoo
  • A scalable, flexible, light-weight messaging
    framework!
  • Easy to include new diseases, new jurisdictions,
    and new techniques!
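
As a rough illustration of the XML generation step, the sketch below serializes one surveillance record with Python's standard library. The element names (`Message`, `Record`), the field names, and the sample values are all hypothetical; the slides do not show the actual message schemas, so this only conveys the general shape of a per-dataset adaptor.

```python
import xml.etree.ElementTree as ET


def record_to_xml(dataset: str, record: dict) -> str:
    """Serialize one record as a small XML message (hypothetical layout)."""
    root = ET.Element("Message", attrib={"dataset": dataset})
    rec = ET.SubElement(root, "Record")
    # Sort fields so the output is stable regardless of dict ordering.
    for field, value in sorted(record.items()):
        child = ET.SubElement(rec, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up sample values, used only to exercise the function.
sample = {"County": "Albany", "ReportDate": "2002-05-26", "Species": "Crow"}
xml_text = record_to_xml("NY_DeadBird", sample)
```

Adding a new dataset in this style would mean supplying a new dataset name and field mapping rather than changing the serializer.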

14
Data Sharing Agreements - MOUs
  • Agreement reached on data sharing principles
    between partner entities
  • Consortium agreement will allow new partners to
    join without full renegotiation
  • Current data sharing agreements and MOU focus on
    institutional level data access
  • NYS template adapted to guide and regulate
    individual user data access/security practice
  • Two levels of data access: aggregate and detail
  • A scalable, bottom-up, policy-guided information
    sharing framework!

15
Spatial-Temporal Visualization
  • Integrates four visualization techniques
      • GIS View
      • Periodic Pattern View
      • Timeline View
      • Central Time Slider
  • Visualizes the events in multiple dimensions to
    identify hidden patterns
      • Spatial
      • Temporal
      • Hotspot analysis
      • Phylogenetic (planned)

16
BioPortal Prototype Systems
17
Outbreak Detection: Hotspot Analysis
  • A hotspot is a condition indicating some form of
    clustering in a spatial and temporal distribution
    (Rogerson & Sun 2001; Theophilides et al. 2003;
    Patil & Taillie 2004; Zeng et al. 2004; Chang et
    al. 2005)
  • For WNV, localized clusters of dead birds
    typically identify high-risk disease areas
    (Gotham et al. 2001); automatic detection of
    dead bird clusters can help predict disease
    outbreaks and allocate prevention/control
    resources effectively
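
The idea of flagging dead bird clusters relative to a baseline can be sketched with a simple grid count comparison. This is not RSVC, SaTScan, or CrimeStat; it is a much simpler stand-in that only illustrates "more new cases than the baseline would suggest". The cell size, thresholds, and coordinates are hypothetical.

```python
from collections import Counter


def grid_cell(x, y, cell_size):
    """Map a coordinate to the index of the grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def find_hotspots(baseline, new_cases, cell_size=1.0, min_ratio=2.0, min_cases=3):
    """Return {cell: ratio} for cells whose new-case count is high relative to baseline."""
    base_counts = Counter(grid_cell(x, y, cell_size) for x, y in baseline)
    new_counts = Counter(grid_cell(x, y, cell_size) for x, y in new_cases)
    hotspots = {}
    for cell, n_new in new_counts.items():
        # Add 1 to the baseline count so empty baseline cells do not divide by zero.
        ratio = n_new / (base_counts[cell] + 1)
        if n_new >= min_cases and ratio >= min_ratio:
            hotspots[cell] = ratio
    return hotspots


# Made-up coordinates: four new reports fall in one cell with a sparse baseline.
baseline = [(0.2, 0.3), (0.5, 0.5), (3.1, 3.2)]
new_cases = [(3.0, 3.5), (3.2, 3.1), (3.4, 3.9), (3.6, 3.3), (0.1, 0.9)]
hotspots = find_hotspots(baseline, new_cases)
```

A fixed grid is sensitive to where cell boundaries fall, which is one reason scan-statistic and clustering methods are used in practice.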

18
Retrospective Hotspot Analysis: Problem Statement
19
Risk-Adjusted Support Vector Clustering (RSVC)
[Figure; labels: Estimate baseline density; High baseline density makes two points far apart in feature space; Feature space; Minimum sphere; Split into several clusters]
20
Study II: NY WNV
  • On May 26, 2002, the first dead bird with WNV was
    found in NY
  • Based on NY's test dataset

[Timeline figure; labels: March 5, May 26, July 2; baseline, new cases; 140 records, 224 records]
21
Dead Bird Hotspots Identified
25
BioPortal HotSpot Analysis: RSVC, SaTScan, and
CrimeStat Integrated (first visual, real-time
hotspot analysis system for disease surveillance)
  • West Nile virus in California

26
Hotspot Analysis-Enabled STV
27
  • BioPortal Livestock
  • (syndromic category)
  • Information Sharing and Information Retrieval

28
Kansas RSVP-A System
29
Rapid Syndrome Validation Project Animal
(RSVP-A) System
  • URL: http://clh.vet.ksu.edu
  • Main function: allows vets to enter syndromic
    observations and retrieve statistics and bar charts
  • A complete system with administrative functions
    such as profile editing
  • Provides 2 web-based interfaces
      • Regular browser
      • Mobile devices (WAP)
  • Current users: 17 vets in 29 counties
  • Projected users: 200 (Kansas), 10K (nationwide)

30
BioPortal Integration Livestock
31
Data Characteristics
  • Time period: 7/16/2003 to 10/17/2005
  • Covers 2 states and 29 counties in Kansas and New
    Mexico

32
Imported Attributes
  • RSVP-A monitors 6 syndromes
      • Non-neonatal diarrhea
      • Neurologic/recumbent
      • Unexpected deaths
      • Weight loss/feed refusal
      • Abortion/birth defect
      • Erosive lesions

33
Records
34
  • BioPortal FMD
  • (phylogenetic tree and news)
  • Information Extraction, Information Sharing, and
    Data/Text Mining

35
FMD global surveillance lessons identified
  • Must understand risks, and nature of changing
    risks, in order to develop strategies for
    prevention and mitigation on a global scale
  • Must understand the global situation in order to
    prepare locally
  • United Kingdom FMD outbreak, 2001: 12B, 50-60%
    of 4M farm animals (cows, pigs, sheep)
    slaughtered

36
International FMD BioPortal
  • Real time web-based situational awareness of FMD
    outbreaks worldwide through the establishment of
    an international information technology system.
  • FMDv characterization at the genomic level
    integrated with associated epidemiological
    information and modeling tools to forecast
    national, regional and/or international spread
    and the prospect of importation into the US and
    the rest of North America.
  • Web-based crisis management of resources:
    facilities, personnel, diagnostics, and therapeutics.

37
Preliminary Global FMD Dataset
  • Provider: UC Davis FMD Lab
  • Information sources: reference labs and OIE
  • Coverage: 28 countries globally
  • Time span: May 1905 to March 2005
  • Dataset size: 30,000 records, of which 6,789
    records are complete
  • Host species: Cattle, Caprine, Ovine, Bovine,
    Swine, NK, Elephant, Buffalo, Sheep, Camelidae,
    Goat

38
Global FMD Coverage in BioPortal
39
FMD Migration Visualization using BioPortal
(cases in South Asia)
FMD Cases travel back and forth between countries
40
BioPortal-Afghanistan
41
International FMD News
  • Provider: UC Davis FMD Lab
  • Information sources: Google, Yahoo, and open
    Internet sources
  • Time span: Oct 4, 2004 to present (real-time
    messaging under development)
  • Data size: 460 events (6/21/05)
  • Coverage: 51 countries
    (Africa: 11, Asia: 16, Europe: 12, Americas: 12)

42
Searching FMD News
  • http://fmd.ucdavis.edu/
  • Searchable by
      • Date range
      • Country
      • Keyword

43
Visualizing FMD News on BioPortal
44
FMD Genetic Visualization
  • Goal: Extend STV to incorporate a 3rd dimension,
    phylogenetic distance
      • Include a phylogenetic tree.
      • Identify phylogenetic groups and color-code the
        isolate points on the map.
      • Leverage available NCBI tools such as BLAST.
  • Proof of concept: SAT 2 & 3 analysis
      • Data: 54 partial DNA sequence records in South
        Africa received from UC Davis FMD Lab
        (Bastos, A.D. et al. 2000, 2003)
      • Date range: 1978-1998
      • Countries covered: South Africa, Zimbabwe,
        Zambia, Namibia, Botswana

45
Sample FMD Sequence Records
Color-coded View (MEGA3)
Textual View of Gene Sequence
46
Phylogenetic Tree of Sample FMD Data
Identify 6 groups within 2 major families (MEGA3,
based on sequence similarity)
[Tree figure; branch labels: Group 1, Group 2, Group 3, Group 4, Group 5, Group 6]
47
Genetic, Spatial, and Temporal Visualization of
FMD Data
Phylogenetic tree color coded
Isolates locations color coded
Isolates appearances in time
48
FMD Time Sequence Analysis
First family cases appeared throughout the period
Second family cases existed before 1993 and
reappeared after 1997
49
FMD Periodic Pattern Analysis
2nd family concentrated in Feb. while 1st family
spread evenly
50
Locations of Family 1 records
Selected only groups 1, 2, and 3 and found a
spatial cluster
51
Locations of Family 2 records
Sparse isolate locations
Selected only groups 4, 5, and 6
52
FMD BioPortal activity
  • Launched January 5, 2007
  • 65 users from >15 countries
      • Belgium, Brazil, Canada, France, Germany,
        Italy, India, Iran, Netherlands, Pakistan,
        Paraguay, South Africa, Sweden, U.S., U.K.
      • Research institutes, diagnostic labs,
        government and international agencies and
        organizations, universities

53
  • BioPortal Arizona
  • (chief complaint syndromic surveillance)
  • Text Mining, Ontology

54
Chief Complaints As a Data Source
  • Chief complaints (CCs) are short free-text
    phrases entered by triage practitioners
    describing the reasons for patients' ER visits
  • Examples: "lt foot pain" → "left foot pain";
    "cp" → "chest pain"; "sob" → "shortness of
    breath"; "so" → should be "sob"; "poss uti" →
    "possibly urinary tract infection"
  • Advantages of using CCs for surveillance purposes
      • Timeliness: diagnosis results are on average 6
        hours slower than CCs
      • Availability and low cost: most hospitals have
        free-text CCs available in electronic form
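
The abbreviation examples above suggest a lookup-based normalization step. The sketch below is a minimal version that uses only the abbreviations listed on the slide; the function name and the token-by-token approach are assumptions, and a real system would need a far larger table and context-aware handling (for instance, "so" is not always a misspelling of "sob").

```python
# Abbreviation table built only from the examples given on the slide.
ABBREVIATIONS = {
    "lt": "left",
    "cp": "chest pain",
    "sob": "shortness of breath",
    "so": "shortness of breath",  # treated as a misspelling of "sob"
    "poss": "possibly",
    "uti": "urinary tract infection",
}


def normalize_cc(cc: str) -> str:
    """Lowercase a chief complaint and expand known abbreviations token by token."""
    tokens = cc.lower().split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


expanded = [normalize_cc(cc) for cc in ["lt foot pain", "cp", "SOB", "poss uti"]]
```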

55
Existing CC Classification Methods
56
Syndromic Categories in Different Systems
57
Overall System Design
Chief Complaints
58
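A Stage 2 Example: CC Concepts → Symptom Group
Concepts
[Figure: CC concepts linked to symptom group concepts through UMLS. Concept labels: coagulopathy, purpura, ecchymosis, blood in urine, ureteral stone, coma, pass out. Edge labels: 4, 5, 6. Symptom group scores: bleeding = 1/4 + 1/5 + 1/6 = 0.62; other = 1/5 = 0.2; coma = 1/5 = 0.2; dead = 1/5 = 0.2; altered_mental_status = 1/5 = 0.2]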
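
The scores in this example are consistent with summing reciprocal distances: 1/4 + 1/5 + 1/6 ≈ 0.62 for bleeding and 1/5 = 0.2 for the others. The sketch below assumes that reading, namely that each link from a CC concept to a symptom group contributes 1/distance. The input format and function name are hypothetical.

```python
from collections import defaultdict


def symptom_group_scores(links):
    """Sum 1/distance per symptom group.

    links: iterable of (symptom_group, distance) pairs (assumed input format).
    """
    scores = defaultdict(float)
    for group, distance in links:
        scores[group] += 1.0 / distance
    return dict(scores)


# Distances taken from the figure labels: three links into "bleeding", one into "other".
links = [("bleeding", 4), ("bleeding", 5), ("bleeding", 6), ("other", 5)]
scores = symptom_group_scores(links)
```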
59
System Benchmarks
  • Both RODS (Tsui et al., 2003) and EARS (CDC,
    2006; Hutwagner et al., 2003) serve as the
    benchmarks
      • RODS uses a supervised learning method
      • EARS uses a rule-based method
      • Both systems are available for testing
  • Performance criteria are calculated by comparing
    system outputs with the gold standard

60
Syndromic Categories in Different Systems
61
Expert Agreement by Syndromic Category
  • Syndromic categories with kappa lower than 0.7
    and the "Other" category were excluded from the
    evaluation

62
Performance Criteria
  • Sensitivity (recall) = TP / (TP + FN)
  • Specificity (negative recall) = TN / (FP + TN)
  • Precision = TP / (TP + FP)
  • F-measure = 2 × Precision × Recall / (Precision
    + Recall)
  • In the context of syndromic surveillance,
    sensitivity is more important than precision and
    specificity (Chapman, 2005). Thus, the F2-measure
    is used
  • The F2-measure weights recall twice as much as
    precision.
  • F2-measure = (1 + 2^2) × Precision × Recall /
    (2^2 × Precision + Recall)
  • Note: TP = True Positive, TN = True Negative,
    FP = False Positive, FN = False Negative
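
These criteria can be computed directly from confusion-matrix counts. The sketch below uses the standard F-beta definition, (1 + β²) × P × R / (β² × P + R), with β = 2 for the F2-measure; the counts are made up for illustration.

```python
def sensitivity(tp, fn):
    """Recall: fraction of true cases that the classifier catches."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Negative recall: fraction of non-cases correctly left unflagged."""
    return tn / (fp + tn)


def precision(tp, fp):
    """Fraction of flagged records that are true cases."""
    return tp / (tp + fp)


def f_beta(p, r, beta=1.0):
    """Standard F-beta; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Hypothetical counts: recall 0.8, precision 0.5.
tp, fp, fn, tn = 40, 40, 10, 910
p, r = precision(tp, fp), sensitivity(tp, fn)
f1, f2 = f_beta(p, r, 1.0), f_beta(p, r, 2.0)
```

With recall higher than precision, F2 comes out above F1, which is the behavior wanted when missed cases cost more than false alarms.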

63
Comparing BioPortal to RODS
Significance levels reported: p-value < 0.1, p-value < 0.05,
p-value < 0.01. Statistical tests are based on 2,500
bootstrap resamples.
64
Comparing BioPortal to EARS
Significance levels reported: p-value < 0.1, p-value < 0.05,
p-value < 0.01. Statistical tests are based on 2,500
bootstrap resamples.
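
The slides do not describe the bootstrap procedure, so the following is only one common way such a comparison could be run: a paired bootstrap over test records, estimating how often system A fails to beat system B on F2. The function names, the one-sided formulation, and the toy labels are all assumptions.

```python
import random


def f2_score(gold, pred):
    """F2 from binary labels, using F2 = 5*TP / (5*TP + 4*FN + FP)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def paired_bootstrap_p(gold, pred_a, pred_b, n_resamples=2500, seed=0):
    """Fraction of resamples in which A's F2 is not higher than B's."""
    rng = random.Random(seed)
    n = len(gold)
    not_better = 0
    for _ in range(n_resamples):
        # Resample record indices with replacement; both systems see the same sample.
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        if f2_score(g, a) - f2_score(g, b) <= 0:
            not_better += 1
    return not_better / n_resamples


# Toy data: system A is perfect, system B never flags anything.
gold = [1] * 50 + [0] * 50
pred_a = list(gold)
pred_b = [0] * 100
p_value = paired_bootstrap_p(gold, pred_a, pred_b)
```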
65
  • BioPortal Taiwan (international CC syndromic
    surveillance)
  • Data/Text Mining, Information Retrieval and
    Visualization

66
Multi-lingual Chief Complaints: Chinese Example
  • Data characteristics
      • Mixed expressions in both Chinese and English
          • ????FEVER???????????(?)
          • ??,?????A/W,????,????
      • 18% of CC records from NTU Med. Center contain
        Chinese expressions.
      • Some hospitals have 100% of CC records in Chinese
        (for example, ??????)
      • Misspellings and typographic errors are not
        serious

67
Prevalence of Chinese Chief Complaints
  • Medical centers: ?????? (100%), ???? (18%), ??????
    (8%)
  • Regional hospitals: ???? (99%), ??????? (87%),
    ?????? (72%), ?????? (50%), etc.
  • Local hospitals: ?????? (100%), ???? (93%), ??????
    (88%), etc.

68
The Role of Chinese Chief Complaints in Syndromic
Surveillance Systems
  • The most important role of Chinese words/phrases
    is describing symptom-related information
      • Example: ?????? ???????? ????? ??
  • Chinese punctuation
  • Named entities
      • Example: Diarrhea SINCE THIS MORNING. Group
        poisoning. Having dinner at ??? restaurant.

69
Chinese CC Preprocessing System Design
[Flow diagram; recoverable stages and labels:
  Input: Raw Chinese CCs (Chinese Chief Complaints)
  Stage 0.1: Separate Chinese and English Expressions → English Expressions, Chinese Expressions
  Stage 0.2: Chinese Phrase Segmentation → Segmented Chinese Phrases
  Stage 0.3: Chinese Phrase Translation → Translated Chinese Phrases
  Resources: Mutual Info., Chinese Medical Phrases, Common Chinese Phrases, Chinese to English Dictionary]
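
The first preprocessing stage, separating Chinese from English expressions, can be sketched with a regular expression over the CJK Unified Ideographs block. This is an assumption about how the split might be done, not the system's documented method; it ignores Chinese punctuation and rarer character ranges, and the sample complaint is made up.

```python
import re

# Runs of characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def separate_expressions(cc: str):
    """Split a mixed chief complaint into Chinese runs and the remaining English text."""
    chinese = CJK_RUN.findall(cc)
    # Replace Chinese runs with spaces, then collapse repeated whitespace.
    english = " ".join(CJK_RUN.sub(" ", cc).split())
    return chinese, english


# Made-up mixed complaint, for illustration only.
chinese_parts, english_part = separate_expressions("fever 咳嗽 since yesterday")
```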
70
Chinese Phrase Segmentation
  • Technology used
      • MI (Mutual Information)
  • Test bed
      • 1978 records from hospital A
      • 18% of records have Chinese expressions
  • Results
      • 726 phrases extracted
      • 370 (51%) are medically related
  • Example
      • Input: ????, ???????,???
      • Output: ?-?-?? , ?-??-???-? , ???

71
Chinese Phrase Translation
  • Recruited 3 physicians to help translate the 370
    extracted Chinese terms
  • 280 (76%) terms have a consistent translation
  • Example
      • Input:
        ?-?-?? , ?-??-???-? , ???
      • Intermediate output:
        N/A-N/A-fighting , N/A-N/A-head injury-N/A ,
        epistaxis
      • Final result:
        fighting , head injury , epistaxis
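
The translation example above follows a lookup pattern: each segment is looked up in a dictionary, untranslated segments become N/A, and N/A entries are dropped from the final result. A minimal sketch of that pattern follows. The original Chinese segments were lost in this transcript, so the characters and dictionary entries below are illustrative stand-ins, not the source data.

```python
def translate_segments(segments, dictionary):
    """Translate each segment, marking unknown ones as N/A, then drop the N/A entries."""
    intermediate = [dictionary.get(seg, "N/A") for seg in segments]
    final = [term for term in intermediate if term != "N/A"]
    return intermediate, final


# Illustrative dictionary and segmented phrase (not taken from the slides).
zh_en = {"打架": "fighting", "流鼻血": "epistaxis"}
intermediate, final = translate_segments(["與", "人", "打架"], zh_en)
```

Dropping N/A segments keeps the symptom-bearing terms while discarding function words that have no dictionary entry.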

72
Results: Self-Validation
  • Use the 280 translations against 1978 chief
    complaints from hospital A
  • 1610 (82%) records are in English
  • 368 (18%) records contain Chinese
  • 36% contain trivial info.
      • E.g., r/o septic shock ????
  • 64% contain non-trivial info.
      • E.g., poor intake and ????
  • 67 with complete translation
  • 2 with partial translation
  • 20 with no translation

73
Group by Hospital
74
Group by Syndrome Classification
75
  • BioPortal Taiwan
  • (SARS, social network visualization)
  • Information Extraction, Data Mining, and
    Visualization

76
The Taiwan SARS Data
  • Collected by the Graduate Institute of
    Epidemiology at National Taiwan University. The
    data contain the interview records of 1582
    suspected cases, including 479 confirmed SARS
    cases, of which the sources of infection for 89
    cases are still unknown.
  • Three kinds of information
      • Symptom information
          • Symptoms
          • The onset dates of the symptoms
      • Contact information
          • Family members
          • Roommates
          • Classmates/colleagues
          • The two-day contact history before the onset
            of symptoms
      • Visiting information
          • Foreign countries
          • Hospitals/high-risk areas
          • Transportation

77
SARS Spreading Network Based On Family Members
The Family of Nurse ?
78
SARS Spreading Network Based On Family Members
and Visiting Info of High-Risk Areas
The Family of Nurse ?
The Bridges Formed from the Visit to Peace Hosp.
on 4/21
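
The way visiting information bridges otherwise separate family clusters can be sketched as a small graph problem: link patients who share a family, link patients who visited the same place on the same date, then count connected components. The data structures, function names, and patient identifiers below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(families, visits):
    """families: {family_id: [patient, ...]}; visits: [(patient, place, date), ...]."""
    graph = {}

    def link_all(members):
        for m in members:
            graph.setdefault(m, set())
        for a, b in combinations(members, 2):
            graph[a].add(b)
            graph[b].add(a)

    for members in families.values():
        link_all(members)
    # Patients who visited the same place on the same date are linked to each other.
    by_visit = defaultdict(list)
    for patient, place, date in visits:
        by_visit[(place, date)].append(patient)
    for patients in by_visit.values():
        link_all(patients)
    return graph


def connected_components(graph):
    """Return a list of node sets, one per connected cluster."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Made-up example: two families, joined by a shared hospital visit on 4/21.
families = {"F1": ["p1", "p2"], "F2": ["p3", "p4"]}
visits = [("p2", "Hospital X", "4/21"), ("p3", "Hospital X", "4/21")]
family_only = connected_components(build_network(families, []))
with_visits = connected_components(build_network(families, visits))
```

In this toy case the family-only network has two clusters, and adding the visit records merges them into one.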
79
  • BioPortal
  • Towards building integrated, real-time situation
    awareness for syndromic surveillance and
    biodefense

80
BioPortal Information
  • Hsinchun Chen, hchen@eller.arizona.edu
  • AI Lab, http://ai.arizona.edu
  • BioPortal Demo: http://bioportal.org