BioPortal: Disease and Bioagent Information Sharing, Surveillance, Analysis, and Visualization

1
BioPortal: Disease and Bioagent Information
Sharing, Surveillance, Analysis, and Visualization
  • Research Team
  • University of Arizona
  • University of California, Davis
  • Kansas State University
  • Arizona Department of Public Health
  • University of Utah
  • New York State Department of Health/HRI
  • California Department of Health Services/PHFE
  • U.S. Geological Survey
  • The SIMI Group
  • Acknowledgements: NSF, ITIC, DHS, DOD/AFMIC,
    IDIWC, AZDPS

2
Research Partners and Supporters
  • University of Arizona
  • University of California, Davis
  • Kansas State University
  • University of Utah
  • Arizona Department of Public Health
  • New York State Department of Health/HRI
  • California Department of Health Services/PHFE
  • U.S. Geological Survey
  • The SIMI Group
  • NSF
  • CIA/ITIC
  • DHS
  • DOD/AFMIC
  • CDC
  • AZDPS

3
UA Team Members
  • Dr. Hsinchun Chen
  • Dr. Daniel Zeng
  • Lu Tseng
  • Cathy Larson
  • Kira Joslin
  • Wei Chang
  • James Ma
  • Hsinmin Lu
  • Ping Yan
  • Aaron Sun
  • Keith Alcock
  • Sapna Brahmanandam
  • Milind Chabbi
  • Yuan Wang

4
Outline
  • Project Background
  • BioPortal V1.0 Achievements
  • System Architecture
  • System Functionalities
  • BioPortal Collaboration Framework
  • New Developments
  • International Foot-and-mouth Disease Monitoring
  • Syndromic Surveillance
  • Livestock Health Surveillance

5
BioPortal Background
Acknowledgment: NSF, ITIC, NYSDH, CDHS,
USGS (Drs. Kvach and Ascher)
6
Background (I)
  • In September 2002, representatives of 18
    different agencies, including DOD, DOE, DOJ, DHS,
    NIH/NLM, CDC, CIA, NSF, and NASA, were convened
    to discuss disease surveillance.
  • An interagency working group called Disease
    Informatics Senior Coordinating Committee (DISCC)
    was established.
  • DISCC established an Infectious Disease
    Informatics Working Committee (IDIWC) to survey
    the field and identify gaps.
  • IDIWC developed requirements for a National
    Infectious Disease Informatics Infrastructure
    (NIDII).

7
Background (II)
  • In June 2003, IDIWC was charged with the task of
    developing one or more rapid prototype systems to
    demonstrate interoperability and innovation
    across species and jurisdictions.
  • Botulism and West Nile virus were selected as
    diseases.
  • States of New York and California were selected
    as partners.
  • The University of Arizona was chosen as
    integrator and was provided with a supplement to
    an existing NSF grant.

8
BioPortal Project Goals
  • Demonstrate and assess the technical feasibility
    and scalability of an infectious disease
    information sharing (across species and
    jurisdictions), alerting, and analysis framework.
  • Develop and assess advanced data mining and
    visualization techniques for infectious disease
    data analysis and predictive modeling.
  • Identify important technical and policy-related
    challenges in developing a national infectious
    disease information infrastructure.

9
BioPortal V1.0 Accomplishments
  • Prototype system design and development
  • Initial design and implementation of
    interoperable messaging backbones
  • Live prototype systems
  • Preliminary user evaluation
  • Information sharing
  • Data sharing agreements/memoranda of
    understanding (MOUs) developed
  • Many disease datasets integrated into the portal
  • Analysis and visualization
  • Hotspot analysis research
  • Spatial-Temporal Visualizer (STV)

10
Information Sharing Infrastructure Design
[Diagram. Labels: Portal Data Store (MS SQL 2000); Data Ingest Control Module (Cleansing / Normalization); Info-Sharing Infrastructure; Adaptors; SSL/RSA; XML/HL7 Network; PHINMS Network; sources NYSDOH, CADHS, and New]
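
The ingest path named on this slide (adaptor, then cleansing/normalization, then the portal data store) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BioPortal implementation: the XML element names (`record`, `county`, `species`, `date`), the sample message, and the helper functions are all hypothetical, since the slides do not show the actual message schemas.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical message; the real NY_DeadBird schema is not shown in the slides.
SAMPLE_MESSAGE = """
<NY_DeadBird>
  <record><county> albany </county><species>American Crow</species><date>05/26/2002</date></record>
  <record><county>ERIE</county><species>blue jay</species><date>2002-06-01</date></record>
</NY_DeadBird>
"""

# Assumed input date formats; a real adaptor would be configured per source.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(elem, source):
    """Cleanse one <record> element into a flat dict ready for loading."""
    return {
        "source": source,
        "county": elem.findtext("county", default="").strip().title(),
        "species": elem.findtext("species", default="").strip().lower(),
        "date": normalize_date(elem.findtext("date", default="")),
    }


def ingest(message):
    """Parse one XML message and return its normalized records."""
    root = ET.fromstring(message.strip())
    return [normalize_record(rec, root.tag) for rec in root.iter("record")]


if __name__ == "__main__":
    for row in ingest(SAMPLE_MESSAGE):
        print(row)
```

Keeping the per-source parsing in the adaptor and the cleansing in one shared step is one way to make adding a new jurisdiction mostly a matter of writing a new adaptor.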
11
Data Access Infrastructure Design
12
BioPortal Collaboration Framework
  • A Memorandum of Understanding (MOU) is used to
    document the relationship between parties that
    will be sharing data
  • Who the entities are and how they will act
    independently and cooperatively
  • What the mutual interests, benefits, and purposes
    of sharing data are
  • How each party will maintain control over and
    share their resources, and what each party shall
    provide to the other (e.g., system accounts,
    portal access)
  • Which types of data are to be shared (e.g., dead
    bird surveillance)

13
Summary of MOU
  • Confidentiality
  • Data is not to be shared outside of the project.
  • Data is to be returned or destroyed after 5
    years.
  • Ownership
  • Original data is owned by providers.
  • Data analysis is jointly owned.
  • Scope
  • Specific diseases are listed.
  • Additional diseases can be added.
  • Parties agree separately on which data elements
    can be shared (e.g., species, gender, etc.)
  • Purpose
  • Data may be used for system development, for
    example.

14
Datasets Integrated: WNV, BOT
15
Communications/Messaging
  • Scalable, flexible, lightweight, and extensible.
    Easy to include:
  • New diseases
  • New jurisdictions
  • New techniques!
  • Messaging infrastructure installed and tested
  • NYSDOH-UA: PHIN MS
  • CADHS-UA: Regional message broker
  • NWHC-UA: PHIN MS
  • XML generation/conversion
  • NY_DeadBird, NY_Alerts, NY_BotHuman, NY_WNVHuman,
    NY_CaptiveAnimal, NY_Mosquito
  • CA_BotHuman, CA_WNVHuman, CA_DeadBird,
    CA_Chicken, CA_Mosquito
  • USGS_Epizoo

16
BioPortal Research Framework
  • BioPortal Demo: Develop the system for
    demonstration purposes using scrubbed data.
    Refine system functionality and performance based
    on user feedback.
  • BioPortal Operation: Develop the system for
    production mode with real data and real users.
  • BioPortal Research: Continue to develop
    advanced technologies and practical sharing
    policies. Expand to new diseases and
    jurisdictions.

17
BioPortal Prototype Systems
18
Spatio-Temporal Data Mining: Hotspot Analysis
  • A hotspot is a condition indicating some form of
    clustering in a spatial and temporal distribution
    (Rogerson & Sun 2001; Theophilides et al. 2003;
    Patil & Taillie 2004).
  • For WNV, localized clusters of dead birds
    typically identify high-risk disease areas
    (Gotham et al. 2001).
  • Automatic detection of dead bird clusters using
    hotspot analysis can help predict disease
    outbreaks and aid in effective allocation of
    prevention/control resources.

19
Existing Hotspot Analysis Approach: SaTScan
  • The spatial scan statistical techniques
    implemented in SaTScan are widely used to detect
    and evaluate disease outbreaks (Kulldorff 2001).
  • NYSDOH has used SaTScan to develop an early
    warning system for WNV (Gotham et al. 2001).
  • An important factor considered by spatial scan
    statistical analysis is the baseline.
  • The significance of the density of dead birds
    depends on the historical distribution of bird
    deaths, human population, and so on.
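
The role of the baseline can be made concrete with the Poisson log-likelihood ratio commonly used by spatial scan statistics. For a candidate window with c observed cases and E expected cases, out of C cases overall, the ratio is c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise. The sketch below is not SaTScan code: the function names are hypothetical, the expected count is assumed to be proportional to the window's share of baseline records, and the Monte Carlo significance test is omitted.

```python
import math


def expected_from_baseline(baseline_in, baseline_total, total_cases):
    """Expected cases in a window, assuming cases follow the baseline share."""
    return total_cases * baseline_in / baseline_total


def poisson_llr(cases_in, expected_in, total_cases):
    """Poisson log-likelihood ratio for one candidate window.

    Returns 0.0 when the window holds no more cases than expected,
    because only excess-risk windows count as candidate hotspots.
    """
    c, e, n = cases_in, expected_in, total_cases
    if e <= 0 or e >= n or c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (n - c) * math.log((n - c) / (n - e)) if n > c else 0.0
    return inside + outside


if __name__ == "__main__":
    # Made-up counts: the same 30 new cases score differently
    # depending on how much baseline activity the window had.
    total_new, total_base = 224, 140
    for base_in in (10, 20):
        e = expected_from_baseline(base_in, total_base, total_new)
        print(base_in, e, poisson_llr(30, e, total_new))
```

With these invented numbers, a window with little baseline activity scores as a candidate hotspot, while the same case count in a window with heavy baseline activity does not.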

20
Other Hotspot Analysis Approaches: CrimeStat and
RSVC
  • Hotspot analysis techniques applied to crime
    analysis: CrimeStat (Levine 2002).
  • CrimeStat's Risk-Adjusted Nearest Neighbor
    Hierarchical Clustering (RNNH): Uses a kernel
    density estimate obtained from baseline data to
    adjust the threshold that controls whether data
    points can be grouped together.
  • Risk-Adjusted Support Vector Machine Clustering
    (RSVC): Combines the power and flexibility of
    support vector machine-based clustering with the
    risk adjustment idea of RNNH.
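
The risk-adjustment idea can be illustrated with a small kernel density sketch. It assumes a Gaussian kernel and sets the grouping threshold to the expected nearest-neighbor distance under spatial randomness, 0.5 / sqrt(density), so that points must be closer together to form a cluster where the baseline is already dense. This is an illustrative stand-in, not CrimeStat's exact rule, and the function names, bandwidth, and coordinates are hypothetical.

```python
import math


def gaussian_kde(point, baseline, bandwidth):
    """Estimated baseline intensity (points per unit area) at `point`."""
    x, y = point
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for bx, by in baseline:
        d2 = (x - bx) ** 2 + (y - by) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total


def local_threshold(point, baseline, bandwidth):
    """Distance below which two new cases near `point` may be grouped.

    Uses the mean nearest-neighbor distance of a random pattern with the
    local baseline intensity; denser baseline gives a smaller threshold.
    """
    density = gaussian_kde(point, baseline, bandwidth)
    if density <= 0.0:
        return math.inf
    return 0.5 / math.sqrt(density)


if __name__ == "__main__":
    # Made-up baseline: four records near the origin, one far away.
    baseline = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (5.0, 5.0)]
    print(local_threshold((0.0, 0.0), baseline, bandwidth=0.5))
    print(local_threshold((5.0, 5.0), baseline, bandwidth=0.5))
```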

21
Case Study (NY WNV)
  • On May 26, 2002, the first dead bird with WNV was
    found in NY
  • Based on NY's test dataset

[Timeline figure. Dates: March 5, May 26, July 2. Period labels: baseline, new cases. Counts: 140 records, 224 records.]
23
Dead Bird Hotspots Identified
24
Hotspot Analysis Findings
  • RSVC delivers similar recall levels and higher
    precision than SaTScan.
  • RNNH matches RSVC precision, but has very low
    recall.
  • RSVC significantly outperforms other methods in
    the F-measure.
  • Techniques could be complementary for different
    hotspot analysis tasks.
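
For reference, the three measures compared above can be computed as in the sketch below. It assumes, as a simplification, that each method's output and the true hotspot are both represented as sets of case identifiers; the slides do not state how the study defined them, and the identifiers are made up.

```python
def precision_recall_f(detected, actual):
    """Precision, recall, and F-measure of a detected hotspot.

    `detected` holds the cases a method placed in hotspots and
    `actual` holds the cases that truly belong to hotspots.
    """
    detected, actual = set(detected), set(actual)
    true_pos = len(detected & actual)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up case IDs: 2 of 4 detected cases are correct, 2 of 6 true cases found.
    print(precision_recall_f({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```

Because the F-measure is the harmonic mean of precision and recall, a method with high precision but very low recall still scores poorly on it.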

25
Spatial-Temporal Visualization
  • Integrates four visualization techniques
  • GIS View
  • Periodic Pattern View
  • Timeline View
  • Central Time Slider
  • Visualizes the events in multiple dimensions to
    identify hidden patterns
  • Spatial
  • Temporal
  • Hotspot analysis
  • Phylogenetic (planned)

29
BioPortal HotSpot Analysis: RSVC, SaTScan, and
CrimeStat Integrated (first visual, real-time
hotspot analysis system for disease surveillance)
  • West Nile virus in California

30
Hotspot Analysis-Enabled STV
31
BioPortal New Developments
  • NSF Infectious Disease Informatics Grant (2004-9)
  • International Foot-and-Mouth Disease BioPortal
    (2005-6): FMD Lab, UC Davis
  • Human Syndromic Surveillance System: Arizona
    State Department of Health (2005-6)
  • Livestock Syndromic Surveillance System: Kansas
    State University RSVP-A (2005-6)

32
New Research Directions
  • Analytical Algorithms
  • Prospective hotspot analysis: auto baseline
    discovery
  • Spatial-Temporal correlation analysis
  • Dynamic Network Analysis
  • Visualization
  • International FMD news visualization
  • Phylogenetic Spatial-Temporal visualization
  • Syndromic Surveillance
  • Syndromic surveillance system survey
  • Emergency room chief complaint syndromic
    classification
  • Livestock syndromic surveillance

33
Extended BioPortal Research Framework
  • BioPortal Demo
  • BioPortal Operation
  • BioPortal Research
  • FMD BioPortal: A dedicated instance of
    BioPortal customized for international
    foot-and-mouth disease monitoring. Additional
    functionalities such as gene sequence analysis
    and FMD News are added.
  • BioPortal Syndromic Surveillance: A specialized
    BioPortal instance that processes chief
    complaints using a hybrid method of ontology
    and knowledge rules.
  • BioPortal Livestock: A BioPortal instance
    devoted to livestock syndromic surveillance case
    management and data analysis.

34
International FMD BioPortal
Acknowledgment: DHS, DOD, UC Davis (Drs. Thurmond
and Lynch)
35
Introduction
  • Foot-and-mouth disease (FMD), which can infect
    all cloven-hoofed animals, is the top disease on
    the Office International des Epizooties (OIE)
    List A.
  • FMD is the most contagious infectious disease of
    livestock animals.
  • Massive shedding of virus and contamination of
    the environment.
  • Transmitted by direct or indirect contact
    (droplets), animate vectors (humans), and
    inanimate vectors (vehicles).
  • Serologically diverse, with seven distinct types
    (A, O, C, SAT1, SAT2, SAT3, Asia1), which makes
    diagnosis and vaccination problematic and
    genetic diversity likely.
  • Endemic in Africa, Asia, the Middle East, and
    South America.
  • Potential cost of U.S. outbreaks: > $10 billion.
  • Broader economic impact: trade and travel
    restrictions.

36
FMD Information Model
37
International FMD BioPortal Goals
  • Real-time, web-based situational awareness of FMD
    outbreaks worldwide through the establishment of
    an international information sharing and analysis
    system
  • FMDv characterization at the genomic level
    integrated with associated epidemiological
    information and modeling tools to forecast
    national, regional, and/or international spread
    and the prospect of import into the U.S. and the
    rest of North America
  • Web-based crisis management of resources:
    facilities, personnel, diagnostics, and
    therapeutics

38
Research Plans
  • Global FMD epidemiological data
  • (Near) real-time data collection
  • Web-based information sharing and analysis
  • International FMD news
  • Indexed collection of global FMD news
  • Search and visualization of the FMD news via the
    web
  • FMD genetic/sequence data
  • Predictive model using phylogenetic, spatial, and
    temporal information to stop FMD at the border
  • Visualization of FMD events in time, space, and
    genetic space

39
Preliminary Global FMD Dataset
  • Provider: UC Davis FMD Lab
  • Information sources: reference labs and OIE
  • Coverage: 28 countries globally
  • Time span: May 1905 to March 2005
  • Dataset size: 30,000 records, of which 6,789
    are complete
  • Host species: Cattle, Caprine, Ovine, Bovine,
    Swine, NK, Elephant, Buffalo, Sheep, Camelidae,
    Goat

40
Global FMD Coverage in BioPortal
41
FMD Migration Visualization using BioPortal
(cases in South Asia)
FMD Cases travel back and forth between countries
42
International FMD News
  • Provider: UC Davis FMD Lab
  • Information sources: Google, Yahoo, and open
    Internet sources
  • Time span: Oct 4, 2004 to present (real-time
    messaging under development)
  • Data size: 460 events (6/21/05)
  • Coverage: 51 countries
    (Africa: 11, Asia: 16, Europe: 12, Americas: 12)

43
Searching FMD News
  • http://fmd.ucdavis.edu/
  • Searchable by
  • Date range
  • Country
  • Keyword

44
Visualizing FMD News on BioPortal
45
FMD Genetic Information Analysis
  • Genome clustering analysis
  • Phylogenetic clustering
  • Spatial clustering
  • Temporal clustering
  • Hotspot detection among gene sequences
  • Create a tree structure based on semantic
    distance between gene sequences.
  • Automatically detect the dense portion of the
    tree.
  • Identify the connection between the semantic
    cluster and the geographic pattern of gene
    sequences.
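
The grouping step can be sketched with a simple distance measure and an adjustable threshold. The slides do not specify the distance or the tree-building method, so this example assumes aligned sequences of equal length, uses the proportion of differing sites (p-distance), and links isolates by single linkage. The isolate names and sequences are invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_by_threshold(seqs, threshold):
    """Single-linkage groups: link two isolates when distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Invented toy sequences, not real FMD data.
    toy = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAA",
        "iso3": "TTGTACCTGG",
        "iso4": "TTGTACCTGA",
    }
    for t in (0.05, 0.2, 0.45):
        print(t, group_by_threshold(toy, t))
```

Raising the threshold merges groups and lowering it splits them, which is the effect a user-adjustable grouping threshold would have on the clusters.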

46
FMD Genetic Visualization
  • Goal: Extend STV to incorporate a 3rd dimension,
    phylogenetic distance
  • Include a phylogenetic tree.
  • Identify phylogenetic groups and color-code the
    isolate points on the map.
  • Leverage available NCBI tools such as BLAST.
  • Proof of concept: SAT 2 & 3 analysis
  • Data: 54 partial DNA sequence records in South
    Africa received from UC Davis FMD Lab
    (Bastos, A.D. et al. 2000, 2003)
  • Date range: 1978-1998
  • Countries covered: South Africa, Zimbabwe,
    Zambia, Namibia, Botswana

47
Sample FMD Sequence Records
Color-coded View (MEGA3)
Textual View of Gene Sequence
48
Phylogenetic Trees
49
Interactive Phylogenetic Tree
Color coding shows similarity of sequences
User-adjustable grouping threshold to change
clusters
50
Phylogenetic Tree of Sample FMD Data
Identify 6 groups within 2 major families (MEGA3,
based on sequence similarity)
[Tree figure labels: Group1, Group2, Group3, Group4, Group5, Group6]
51
Genetic, Spatial, and Temporal Visualization of
FMD Data
Phylogenetic tree color coded
Isolates locations color coded
Isolates appearances in time
52
FMD Time Sequence Analysis
First family: cases appeared throughout the period
Second family: cases existed before 1993 and
reappeared after 1997
53
FMD Periodic Pattern Analysis
Second family is concentrated in February, while
the first family is spread evenly
54
Locations of Family 1 records
Selected only groups 1, 2, and 3 and found a
spatial cluster
55
Locations of Family 2 records
Sparse isolate locations
Selected only groups 4, 5, and 6
56
Syndromic Surveillance
  • What is a syndrome?
  • A syndrome is a group of symptoms that indicates
    the possibility of a certain disease
  • What is syndromic surveillance?
  • The term syndromic surveillance applies to
    surveillance using health-related data that
    precede diagnosis and signal a sufficient
    probability of a case or an outbreak to warrant
    further public health response.
  • Why use syndromic surveillance?
  • Traditional disease surveillance systems mainly
    rely on physicians to report occurrences of
    reportable diseases. This process is too
    lengthy and does not leave enough response time.

57
Syndromic Surveillance System Survey
58
Syndromic Surveillance Data Sources
59
Syndrome Classification
  • To tag emergency room chief complaints with
    syndrome labels
  • Ex.: coughing with high fever -> upper
    respiratory
  • Challenges
  • Free text, no standard language. Ex.: chst pn,
    CP, c/p, chest pai, chert pain, chest/abd pain,
    and chest discomfort all mean chest pain
  • Very short: usually no more than 3 words
  • Multiple languages
  • Existing approaches
  • Rule-based classification (EARS)
  • Bayesian Net classification (RODS)

60
Proposed Syndrome Classification
[Diagram. Labels: Chief Complaints; CC Cleansing; EMT-P; UMLS concepts; UMLS hierarchy; Symptom Groups (fever, cough); Rule processing engine (rules defined by public health officers); Upper Respiratory Syndrome]
Example rule: if fever and (cold or cough) -> then upper respiratory
UMLS: Unified Medical Language System
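
A rough sketch of this pipeline follows: cleanse the chief complaint, map its words to symptom groups, then apply a rule. It is not the system's implementation. The UMLS concept and hierarchy lookup is replaced by a tiny hypothetical dictionary, the cleansing table only covers a few of the chest-pain variants listed earlier, and the single rule is the fever/cold/cough example from the slide.

```python
import re

# Variant spellings taken from the chest-pain examples in the slides.
CLEANSING = {
    "chst pn": "chest pain",
    "cp": "chest pain",
    "c/p": "chest pain",
    "chest pai": "chest pain",
    "chert pain": "chest pain",
}

# Hypothetical stand-in for a UMLS concept/hierarchy lookup.
SYMPTOM_GROUPS = {
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "coughing": "cough",
    "cold": "cold",
}


def cleanse(chief_complaint):
    """Lowercase, collapse whitespace, and expand known variants."""
    text = re.sub(r"\s+", " ", chief_complaint.strip().lower())
    return CLEANSING.get(text, text)


def symptoms(chief_complaint):
    """Map the words of a cleansed complaint to symptom groups."""
    words = re.findall(r"[a-z]+", cleanse(chief_complaint))
    return {SYMPTOM_GROUPS[w] for w in words if w in SYMPTOM_GROUPS}


def classify(chief_complaint):
    """Apply the example rule; return a syndrome label or None."""
    found = symptoms(chief_complaint)
    # Rule from the slide: if fever and (cold or cough) then upper respiratory.
    if "fever" in found and ("cold" in found or "cough" in found):
        return "upper respiratory"
    return None


if __name__ == "__main__":
    for cc in ("Coughing with high FEVER", "chst pn", "fever"):
        print(repr(cc), "->", classify(cc))
```

Keeping the rules separate from the concept lookup means public health officers could, in principle, change a rule without touching the text-cleansing step.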
61
Kansas RSVP-A System
62
Rapid Syndrome Validation Project Animal
(RSVP-A) System
  • URL: http://clh.vet.ksu.edu
  • Main function: allows vets to enter syndromic
    observations and retrieve statistics bar charts
  • A complete system with administrative functions
    such as profile editing
  • Provides 2 web-based interfaces
  • Regular browser
  • Mobile devices (WAP)
  • Current users: 17 vets in 29 counties
  • Projected users: 200 (Kansas), 10K (nationwide)

63
BioPortal Integration: Livestock
64
Data Characteristics
  • Time Period
  • 7/16/2003 to 10/17/2005
  • Across 2 states and 29 counties in Kansas and New
    Mexico

65
Imported Attributes
  • RSVP-A monitors 6 syndromes
  • Non-neonatal diarrhea
  • Neurologic / recumbent
  • Unexpected deaths
  • Weight loss/feed refusal
  • Abortion/birth defect
  • Erosive lesions

66
Records
67
STV Map View
  • Geocoded at county level

68
Seasonal Effect
  • Tuesdays and Fridays have high volumes
  • Dec and Jan have higher volume
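
Patterns like these come from counting records by day of week and by month. A minimal sketch, assuming each record carries a report date; the function name and sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def volume_by_weekday_and_month(report_dates):
    """Count records per weekday name and per month abbreviation."""
    by_weekday = Counter(WEEKDAYS[d.weekday()] for d in report_dates)
    by_month = Counter(MONTHS[d.month - 1] for d in report_dates)
    return by_weekday, by_month


if __name__ == "__main__":
    # Made-up report dates, not RSVP-A data.
    sample = [date(2005, 1, 4), date(2005, 1, 7),
              date(2004, 12, 7), date(2005, 7, 13)]
    weekdays, months = volume_by_weekday_and_month(sample)
    print(weekdays.most_common())
    print(months.most_common())
```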

69
BioPortal Surveillance Future Work
  • Complete systems comparison and user evaluation
  • Multi-lingual chief complaints tagging and
    analysis
  • Establish real-time data messaging
  • Integrate potential new datasets
  • Plant disease surveillance in Great Plains
  • Hantavirus in South America

70
BioPortal Future Work
  • Complete open source, generic BioPortal
    architecture and system
  • Develop multi-lingual BioPortal
  • Incorporate other diseases, e.g., avian
    influenza, SARS
  • Solicit partners and expand test sites
  • Continue infectious disease informatics research

71
For more information
BioPortal web site: http://www.bioportal.org
AI Lab web site: http://ai.arizona.edu
hchen_at_eller.arizona.edu