Title: BioPortal: Disease and Bioagent Information Sharing, Surveillance, Analysis, and Visualization
1BioPortal: Disease and Bioagent Information
Sharing, Surveillance, Analysis, and Visualization
- Research Team
- University of Arizona
- University of California, Davis
- Kansas State University
- Arizona Department of Public Health
- University of Utah
- New York State Department of Health/HRI
- California Department of Health Services/PHFE
- U.S. Geological Survey
- The SIMI Group
- Acknowledgements: NSF, ITIC, DHS, DOD/AFMIC, IDIWC, AZDPS
2Research Partners and Supporters
- University of Arizona
- University of California, Davis
- Kansas State University
- University of Utah
- Arizona Department of Public Health
- New York State Department of Health/HRI
- California Department of Health Services/PHFE
- U.S. Geological Survey
- The SIMI Group
- NSF
- CIA/ITIC
- DHS
- DOD/AFMIC
- CDC
- AZDPS
3UA Team Members
- Dr. Hsinchun Chen
- Dr. Daniel Zeng
- Lu Tseng
- Cathy Larson
- Kira Joslin
- Wei Chang
- James Ma
- Hsinmin Lu
- Ping Yan
- Aaron Sun
- Keith Alcock
- Sapna Brahmanandam
- Milind Chabbi
- Yuan Wang
4Outline
- Project Background
- BioPortal V1.0 Achievements
- System Architecture
- System Functionalities
- BioPortal Collaboration Framework
- New Developments
- International Foot-and-mouth Disease Monitoring
- Syndromic Surveillance
- Livestock Health Surveillance
5BioPortal Background
Acknowledgment: NSF, ITIC, NYSDH, CDHS, USGS (Drs. Kvach and Ascher)
6Background (I)
- In September 2002, representatives of 18 different agencies, including DOD, DOE, DOJ, DHS, NIH/NLM, CDC, CIA, NSF, and NASA, were convened to discuss disease surveillance.
- An interagency working group called the Disease Informatics Senior Coordinating Committee (DISCC) was established.
- DISCC established an Infectious Disease Informatics Working Committee (IDIWC) to survey the field and identify gaps.
- IDIWC developed requirements for a National Infectious Disease Informatics Infrastructure (NIDII).
7Background (II)
- In June 2003, IDIWC was charged with the task of developing one or more rapid prototype systems to demonstrate interoperability and innovation across species and jurisdictions.
- Botulism and West Nile virus were selected as diseases.
- States of New York and California were selected as partners.
- The University of Arizona was chosen as integrator and was provided with a supplement to an existing NSF grant.
8BioPortal Project Goals
- Demonstrate and assess the technical feasibility and scalability of an infectious disease information sharing (across species and jurisdictions), alerting, and analysis framework.
- Develop and assess advanced data mining and visualization techniques for infectious disease data analysis and predictive modeling.
- Identify important technical and policy-related challenges in developing a national infectious disease information infrastructure.
9BioPortal V1.0 Accomplishments
- Prototype system design and development
- Initial design and implementation of interoperable messaging backbones
- Live prototype systems
- Preliminary user evaluation
- Information sharing
- Data sharing agreements/memoranda of understanding (MOUs) developed
- Many disease datasets integrated into the portal
- Analysis and visualization
- Hotspot analysis research
- Spatial-Temporal Visualizer (STV)
10Information Sharing Infrastructure Design
[Architecture diagram. Labels: Portal Data Store (MS SQL 2000); Data Ingest Control Module (Cleansing / Normalization); Info-Sharing Infrastructure; three Adaptors; SSL/RSA links; XML/HL7 Network; PHINMS Network; data sources NYSDOH, CADHS, and New.]
11Data Access Infrastructure Design
12BioPortal Collaboration Framework
- A Memorandum of Understanding (MOU) is used to document the relationship between parties that will be sharing data:
- Who the entities are and how they will act independently and cooperatively
- What the mutual interests, benefits, and purposes of sharing data are
- How each party will maintain control over and share their resources, and what each party shall provide to the other (e.g., system accounts, portal access)
- Which types of data are to be shared (e.g., dead bird surveillance)
13Summary of MOU
- Confidentiality
- Data is not to be shared outside of the project.
- Data is to be returned or destroyed after 5 years.
- Ownership
- Original data is owned by providers.
- Data analysis is jointly owned.
- Scope
- Specific diseases are listed.
- Additional diseases can be added.
- Parties agree separately on which data elements can be shared (e.g., species, gender, etc.)
- Purpose
- Data may be used for system development, for example.
14Datasets Integrated: WNV, BOT
15Communications/Messaging
- Scalable, flexible, lightweight, and extensible. Easy to include:
- New diseases
- New jurisdictions
- New techniques!
- Messaging infrastructure installed and tested
- NYSDOH-UA: PHIN MS
- CADHS-UA: Regional message broker
- NWHC-UA: PHIN MS
- XML generation/conversion
- NY_DeadBird, NY_Alerts, NY_BotHuman, NY_WNVHuman, NY_CaptiveAnimal, NY_Mosquito
- CA_BotHuman, CA_WNVHuman, CA_DeadBird, CA_Chicken, CA_Mosquito
- USGS_Epizoo
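The XML generation/conversion step could look roughly like the sketch below. The slides do not show the actual message schema, so the field names and the sample record are hypothetical; only the message type name NY_DeadBird comes from the list above.

```python
import xml.etree.ElementTree as ET


def record_to_xml(message_type, record):
    """Wrap one flat record (a dict) in an XML element named after its message type."""
    root = ET.Element(message_type)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical dead-bird sighting; field names and values are illustrative only.
sample = {"County": "Albany", "Species": "American Crow", "FoundDate": "2002-05-26"}
xml_text = record_to_xml("NY_DeadBird", sample)
```

A real adaptor would also validate each record against the agreed schema before sending it over the messaging network.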
16BioPortal Research Framework
- BioPortal Demo: Develop the system for demonstration purposes using scrubbed data. Refine system functionality and performance based on user feedback.
- BioPortal Operation: Develop the system for production mode with real data and real users.
- BioPortal Research: Continue to develop advanced technologies and practical sharing policies. Expand to new diseases and jurisdictions.
17BioPortal Prototype Systems
18Spatio-Temporal Data Mining: Hotspot Analysis
- A hotspot is a condition indicating some form of clustering in a spatial and temporal distribution (Rogerson & Sun 2001; Theophilides et al. 2003; Patil & Taillie 2004).
- For WNV, localized clusters of dead birds typically identify high-risk disease areas (Gotham et al. 2001).
- Automatic detection of dead bird clusters using hotspot analysis can help predict disease outbreaks and aid in effective allocation of prevention/control resources.
19Existing Hotspot Analysis Approach: SaTScan
- The spatial scan statistical techniques implemented in SaTScan are widely used to detect and evaluate disease outbreaks (Kulldorff 2001).
- NYSDOH has used SaTScan to develop an early warning system for WNV (Gotham et al. 2001).
- An important factor considered by spatial scan statistical analysis is the baseline.
- The significance of the density of dead birds depends on the historical distribution of bird deaths, human population, and so on.
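To illustrate why the baseline matters, here is a minimal sketch of a Poisson log-likelihood ratio of the kind used in spatial scan statistics. It scores a fixed set of zones only. SaTScan itself scans many candidate windows and assesses significance by Monte Carlo simulation, which this sketch omits. The zone names and counts are made up, and every baseline count is assumed to be positive.

```python
import math


def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for one zone; 0 when the zone has no excess cases."""
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    if observed == total:
        return inside
    outside = (total - observed) * math.log((total - observed) / (total - expected))
    return inside + outside


def score_zones(cases, baseline):
    """Score each zone, taking expected counts proportional to the baseline."""
    total_cases = sum(cases.values())
    total_base = sum(baseline.values())
    scores = {}
    for zone, observed in cases.items():
        expected = total_cases * baseline[zone] / total_base
        scores[zone] = poisson_llr(observed, expected, total_cases)
    return scores


# Hypothetical dead-bird counts per zone (new cases vs. historical baseline).
cases = {"A": 2, "B": 12, "C": 3}
baseline = {"A": 10, "B": 10, "C": 10}
scores = score_zones(cases, baseline)
hottest = max(scores, key=scores.get)
```

With equal baselines, zone B stands out because its 12 new cases far exceed its expected share; a zone with the same count but a much larger baseline would score lower.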
20Other Hotspot Analysis Approaches: CrimeStat and RSVC
- Hotspot analysis techniques applied to crime analysis: CrimeStat (Levine 2002).
- CrimeStat's Risk-Adjusted Nearest Neighbor Hierarchical Clustering (RNNH): Uses a kernel density estimate obtained from baseline data to adjust the threshold that controls whether data points can be grouped together.
- Risk-Adjusted Support Vector Machine Clustering (RSVC): Combines the power and flexibility of support vector machine-based clustering with the risk adjustment idea of RNNH.
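The risk-adjustment idea can be sketched as follows. This is a simplified illustration, not CrimeStat's actual procedure: it assumes a Gaussian kernel, and it takes the linking threshold to be the mean nearest-neighbour distance of a random pattern at the local baseline density. The bandwidth and the sample points are hypothetical.

```python
import math


def kde_density(point, baseline_points, bandwidth):
    """2-D Gaussian kernel density estimate (events per unit area) at a point."""
    x, y = point
    norm = 2.0 * math.pi * bandwidth ** 2
    total = 0.0
    for bx, by in baseline_points:
        d2 = (x - bx) ** 2 + (y - by) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2)) / norm
    return total


def local_threshold(point, baseline_points, bandwidth):
    """Linking distance that shrinks where the baseline is dense.

    Assumption of this sketch: threshold = 0.5 / sqrt(local density).
    """
    density = kde_density(point, baseline_points, bandwidth)
    if density == 0.0:
        return math.inf  # no baseline activity nearby: any pair may link
    return 0.5 / math.sqrt(density)


def can_group(p, q, baseline_points, bandwidth):
    """True if two new events are closer than the stricter of their thresholds."""
    limit = min(local_threshold(p, baseline_points, bandwidth),
                local_threshold(q, baseline_points, bandwidth))
    return math.dist(p, q) < limit


# Hypothetical baseline events concentrated around the origin.
baseline_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
near_baseline = can_group((0.0, 0.0), (0.5, 0.0), baseline_points, 0.5)
far_from_baseline = can_group((3.0, 3.0), (3.5, 3.0), baseline_points, 0.5)
```

Both pairs are 0.5 units apart, but only the pair far from the historical concentration is grouped: where the baseline is already dense, events must be much closer together to count as unusual.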
21Case Study (NY WNV)
- On May 26, 2002, the first dead bird with WNV was found in NY
- Based on NY's test dataset
[Timeline figure. Labels: March 5, May 26, July 2; baseline; new cases; 140 records; 224 records.]
23Dead Bird Hotspots Identified
24Hotspot Analysis Findings
- RSVC delivers similar recall levels and higher precision than SaTScan.
- RNNH matches RSVC precision, but has very low recall.
- RSVC significantly outperforms other methods in the F-measure.
- Techniques could be complementary for different hotspot analysis tasks.
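For reference, precision, recall, and the F-measure used in this comparison can be computed as in the sketch below. The slides do not say how detected hotspots were matched against true ones, so this assumes a simple set comparison over hypothetical area identifiers.

```python
def precision_recall_f(detected, actual):
    """Compare detected hotspot areas against the true hotspot areas."""
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical area IDs: four flagged by a method, five truly in the hotspot.
precision, recall, f_measure = precision_recall_f({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```

The F-measure is the harmonic mean of the two, so a method with high precision but very low recall still scores poorly on it.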
25Spatial-Temporal Visualization
- Integrates four visualization techniques
- GIS View
- Periodic Pattern View
- Timeline View
- Central Time Slider
- Visualizes the events in multiple dimensions to identify hidden patterns
- Spatial
- Temporal
- Hotspot analysis
- Phylogenetic (planned)
29BioPortal HotSpot Analysis: RSVC, SaTScan, and CrimeStat Integrated (first visual, real-time hotspot analysis system for disease surveillance)
- West Nile virus in California
30Hotspot Analysis-Enabled STV
31BioPortal New Developments
- NSF Infectious Disease Informatics Grant (2004-9)
- International Foot-and-Mouth Disease BioPortal (2005-6): FMD Lab, UC Davis
- Human Syndromic Surveillance System: Arizona State Department of Health (2005-6)
- Livestock Syndromic Surveillance System: Kansas State University RSVP-A (2005-6)
32New Research Directions
- Analytical Algorithms
- Prospective hotspot analysis: auto baseline discovery
- Spatial-Temporal correlation analysis
- Dynamic Network Analysis
- Visualization
- International FMD news visualization
- Phylogenetic Spatial-Temporal visualization
- Syndromic Surveillance
- Syndromic surveillance system survey
- Emergency room chief complaint syndromic classification
- Livestock syndromic surveillance
33Extended BioPortal Research Framework
- BioPortal Demo
- BioPortal Operation
- BioPortal Research
- FMD BioPortal: A dedicated instance of BioPortal customized for international foot-and-mouth disease monitoring. Additional functionalities such as gene sequence analysis and FMD News are added.
- BioPortal Syndromic Surveillance: A specialized BioPortal instance that processes chief complaints using a hybrid method of ontology and knowledge rules.
- BioPortal Livestock: A BioPortal instance devoted to livestock syndromic surveillance case management and data analysis.
34International FMD BioPortal
Acknowledgment: DHS, DOD, UC Davis (Drs. Thurmond and Lynch)
35Introduction
- Foot-and-mouth disease (FMD), which can infect all cloven-hoofed animals, is the top disease on the Office International des Epizooties (OIE) List A.
- FMD is the most contagious infectious disease of livestock animals
- Massive shedding of virus and contamination of the environment.
- Transmitted by direct or indirect contact (droplets), animate vectors (humans), inanimate vectors (vehicles)
- Serologically diverse with seven distinct types (A, O, C, SAT1, SAT2, SAT3, Asia1), which makes diagnosis and vaccination problematic, and genetic diversity likely.
- Endemic in Africa, Asia, the Middle East, and South America
- Potential cost for U.S. outbreaks: >$10 billion
- Broader economic impact: trade and travel restrictions.
36FMD Information Model
37International FMD BioPortal Goals
- Real-time, web-based situational awareness of FMD outbreaks worldwide through the establishment of an international information sharing and analysis system
- FMDv characterization at the genomic level integrated with associated epidemiological information and modeling tools to forecast national, regional, and/or international spread and the prospect of import into the U.S. and the rest of North America
- Web-based crisis management of resources: facilities, personnel, diagnostics, and therapeutics
38Research Plans
- Global FMD epidemiological data
- (Near) real-time data collection
- Web-based information sharing and analysis
- International FMD news
- Indexed collection of global FMD news
- Search and visualization of the FMD news via the web
- FMD genetic/sequence data
- Predictive model using phylogenetic, spatial, and temporal information to stop FMD at the border
- Visualization of FMD events in time, space, and genetic space
39Preliminary Global FMD Dataset
- Provider: UC Davis FMD Lab
- Information sources: reference labs and OIE
- Coverage: 28 countries globally
- Time span: May 1905 to March 2005
- Dataset size: 30,000 records, of which 6,789 are complete
- Host species: Cattle, Caprine, Ovine, Bovine, Swine, NK, Elephant, Buffalo, Sheep, Camelidae, Goat
40Global FMD Coverage in BioPortal
41FMD Migration Visualization using BioPortal
(cases in South Asia)
FMD Cases travel back and forth between countries
42International FMD News
- Provider: UC Davis FMD Lab
- Information sources: Google, Yahoo, and open Internet sources
- Time span: Oct 4, 2004 to present (real-time messaging under development)
- Data size: 460 events (6/21/05)
- Coverage: 51 countries (Africa: 11, Asia: 16, Europe: 12, Americas: 12)
43Searching FMD News
- http://fmd.ucdavis.edu/
- Searchable by
- Date range
- Country
- Keyword
44Visualizing FMD News on BioPortal
45FMD Genetic Information Analysis
- Genome clustering analysis
- Phylogenetic clustering
- Spatial clustering
- Temporal clustering
- Hotspot detection among gene sequences
- Create a tree structure based on semantic distance between gene sequences.
- Automatically detect the dense portion of the tree.
- Identify the connection between the semantic cluster and the geographic pattern of gene sequences.
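The grouping step above can be sketched as follows. The slides do not define the distance measure or the clustering method, so this assumes the proportion of differing sites between aligned, equal-length sequences and single-linkage grouping under a distance threshold. The toy sequences are invented for illustration.

```python
def p_distance(a, b):
    """Proportion of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_by_threshold(sequences, threshold):
    """Single-linkage grouping: link any pair whose distance is within the threshold."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if p_distance(sequences[first], sequences[second]) <= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy aligned fragments (not real FMD sequences).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TGCATGCA", "s4": "TGCATGCC"}
tight = group_by_threshold(toy, 0.2)
loose = group_by_threshold(toy, 0.9)
```

Raising the threshold merges groups, which is the same knob an interactive tree view would expose as an adjustable grouping threshold.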
46FMD Genetic Visualization
- Goal: Extend STV to incorporate a third dimension, phylogenetic distance
- Include a phylogenetic tree.
- Identify phylogenetic groups and color-code the isolate points on the map.
- Leverage available NCBI tools such as BLAST.
- Proof of concept: SAT 2 & 3 analysis
- Data: 54 partial DNA sequence records in South Africa received from UC Davis FMD Lab (Bastos, A.D. et al. 2000, 2003)
- Date range: 1978-1998
- Countries covered: South Africa, Zimbabwe, Zambia, Namibia, Botswana
47Sample FMD Sequence Records
Color-coded View (MEGA3)
Textual View of Gene Sequence
48Phylogenetic Trees
49Interactive Phylogenetic Tree
Color coding shows similarity of sequences
User-adjustable grouping threshold to change
clusters
50Phylogenetic Tree of Sample FMD Data
Identify 6 groups within 2 major families (MEGA3, based on sequence similarity)
[Tree figure. Cluster labels: Group1, Group2, Group3, Group4, Group5, Group6.]
51Genetic, Spatial, and Temporal Visualization of
FMD Data
Phylogenetic tree color coded
Isolates locations color coded
Isolates appearances in time
52FMD Time Sequence Analysis
First family cases appeared throughout the period
Second family cases existed before 1993 and reappeared after 1997
53FMD Periodic Pattern Analysis
2nd family concentrated in Feb. while 1st family
spread evenly
54Locations of Family 1 records
Selected only groups 1, 2, and 3 and found a
spatial cluster
55Locations of Family 2 records
Sparse isolate locations
Selected only groups 4, 5, and 6
56Syndromic Surveillance
- What is a syndrome?
- A syndrome is a group of symptoms that indicates the possibility of a certain disease
- What is syndromic surveillance?
- The term syndromic surveillance applies to surveillance using health-related data that precede diagnosis and signal a sufficient probability of a case or an outbreak to warrant further public health response.
- Why use syndromic surveillance?
- Traditional disease surveillance systems mainly rely on physicians to report occurrences of reportable diseases. This process is too slow and does not leave enough response time.
57Syndromic Surveillance System Survey
58Syndromic Surveillance Data Sources
59Syndrome Classification
- Goal: tag emergency room chief complaints with a syndrome label
- Example: coughing with high fever -> upper respiratory
- Challenges
- Free text: no standard language. Example: chst pn, CP, c/p, chest pai, chert pain, chest/abd pain, chest discomfort all mean chest pain
- Very short: usually no more than 3 words
- Multiple languages
- Existing approaches
- Rule-based classification (EARS)
- Bayesian Net classification (RODS)
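The free-text challenge can be illustrated with a small normalization sketch. It is not the EARS or RODS approach. It assumes a tiny hand-built abbreviation table and vocabulary (both hypothetical) and uses fuzzy string matching from Python's standard difflib module to absorb misspellings.

```python
import difflib
import re

# Hypothetical lookup tables; a real system would draw on a much larger lexicon.
ABBREVIATIONS = {"cp": "chest pain", "c/p": "chest pain"}
TOKEN_EXPANSIONS = {"chst": "chest", "pn": "pain", "abd": "abdominal"}
VOCABULARY = ["chest", "pain", "abdominal", "discomfort"]


def normalize_complaint(text):
    """Map a raw chief complaint to a cleaned, space-separated phrase."""
    cleaned = text.strip().lower()
    if cleaned in ABBREVIATIONS:
        return ABBREVIATIONS[cleaned]
    words = []
    for token in re.split(r"[\s/]+", cleaned):
        if token in TOKEN_EXPANSIONS:
            words.append(TOKEN_EXPANSIONS[token])
            continue
        # Fall back to the closest vocabulary word, or keep the token as typed.
        close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
        words.append(close[0] if close else token)
    return " ".join(words)


variants = ["chst pn", "CP", "c/p", "chest pai", "chert pain"]
normalized = [normalize_complaint(v) for v in variants]
```

Whole-string abbreviations are checked first because a token-level split would break "c/p" apart at the slash.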
60Proposed Syndrome Classification
[Pipeline diagram. Labels: Chief Complaints; CC Cleansing; EMT-P; UMLS concepts; UMLS hierarchy; Symptom Groups; Rule processing engine (defined by public health officers); example symptoms fever and cough; example syndrome Upper Respiratory Syndrome.]
Rule example: If fever and (cold or cough), then upper respiratory
UMLS: Unified Medical Language System
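The rule shown on this slide can be expressed as a small rule table. This is a minimal sketch of the rule-processing idea only. The rule representation is an assumption, and the mapping from chief complaints to symptom groups (the cleansing and UMLS steps) is taken as already done.

```python
# Each rule pairs a syndrome label with a predicate over a set of symptom groups.
# The single rule below is the one given on the slide; the structure is hypothetical.
RULES = [
    ("upper respiratory",
     lambda symptoms: "fever" in symptoms and ("cold" in symptoms or "cough" in symptoms)),
]


def classify(symptoms):
    """Return every syndrome whose rule is satisfied by the given symptom groups."""
    found = set(symptoms)
    return [syndrome for syndrome, rule in RULES if rule(found)]


with_cough = classify(["fever", "cough"])
fever_only = classify(["fever"])
```

Keeping rules as data rather than code paths lets public health officers add or change them without touching the engine.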
61Kansas RSVP-A System
62Rapid Syndrome Validation Project Animal
(RSVP-A) System
- URL: http://clh.vet.ksu.edu
- Main function: allows vets to enter syndromic observations and retrieve statistics as bar charts
- A complete system with administrative functions such as profile editing
- Provides 2 web-based interfaces
- Regular browser
- Mobile devices (WAP)
- Current users: 17 vets in 29 counties
- Projected users: 200 (Kansas), 10K (nationwide)
63BioPortal Integration Livestock
64Data Characteristics
- Time period: 7/16/2003 to 10/17/2005
- Spans 2 states and 29 counties in Kansas and New Mexico
65Imported Attributes
- RSVP-A monitors 6 syndromes
- Non-neonatal diarrhea
- Neurologic / recumbent
- Unexpected deaths
- Weight loss/feed refusal
- Abortion/birth defect
- Erosive lesions
66Records
67STV Map View
68Seasonal Effect
- Tuesdays and Fridays have high volumes
- Dec and Jan have higher volume
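Day-of-week and month effects like these can be found by tallying report dates, as in the sketch below. The dates are hypothetical placeholders, not RSVP-A records.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def volume_profiles(report_dates):
    """Count reports per weekday and per calendar month."""
    by_weekday = Counter(WEEKDAYS[d.weekday()] for d in report_dates)
    by_month = Counter(MONTHS[d.month - 1] for d in report_dates)
    return by_weekday, by_month


# Hypothetical report dates for illustration only.
reports = [date(2004, 12, 7), date(2004, 12, 10), date(2005, 1, 4), date(2004, 7, 14)]
by_weekday, by_month = volume_profiles(reports)
```

Comparing these tallies against what an even spread would give is a quick check before building any formal periodic-pattern view.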
69BioPortal Surveillance Future Work
- Complete systems comparison and user evaluation
- Multi-lingual chief complaints tagging and analysis
- Establish real-time data messaging
- Integrate potential new datasets
- Plant disease surveillance in Great Plains
- Hantavirus in South America
70BioPortal Future Work
- Complete open source, generic BioPortal architecture and system
- Develop multi-lingual BioPortal
- Incorporate other diseases, e.g., avian influenza, SARS
- Solicit partners and expand test sites
- Continue infectious disease informatics research
71For more information
BioPortal web site: http://www.bioportal.org
AI Lab web site: http://ai.arizona.edu
hchen_at_eller.arizona.edu