1
XML, Standards, and Ontologies
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255, Storrs, CT 06269-2155
steve@engr.uconn.edu  http://www.engr.uconn.edu/~steve  (860) 486-4818
2
Overview
  • What is XML? How is it Used Today?
  • XML Databases
  • HL7 and CDA
  • Other Standards
  • MeSH
  • Unified Medical Language System
  • ICD9 and ICD9-CM (Intl. Classification Diseases)
  • ICD10 and ICD10-CM
  • SNOMED-CT (Clinical Terms)
  • National Drug Codes (NDC)
  • Ontologies: Biomedical and Clinical
  • What are they?
  • How are they Used?
  • Can they be Improved?

3
What is one Possible Solution?
  • Standards and Usage of XML
  • XML Is Used in a Myriad of Contexts
  • Modeling and Information Exchange (XML Schemas
    and Instances)
  • XML Standards
  • XACML: eXtensible Access Control Markup Language
  • OWL: Web Ontology Language
  • HL7/CDA
  • XML Databases
  • What is/will be its Eventual Role in BMI?

4
Overview of XML
  • XML Overview: Tags, Schema
  • XML Query Languages: XPath, XQuery
  • XML Data Models
  • Storage Strategy: XML DBMS
  • Relational, CMS, Native
  • Native XML DBMS: Pros/Cons
  • Biomedical Information and Databases
  • BMI Standards and Examples: HL7 and CDA
  • Survey of Technology

5
XML overview
  • eXtensible Markup Language
  • Similar to HTML
  • Meta-language that describes the content of the
    document (self-describing)
  • XML is primarily used as a data storage and
    interchange medium
  • XML exists in plain-text format, though it may
    be compressed or otherwise encoded for transfer

6
XML overview cont.
  • XML itself has no predefined tags or grammar
  • XML tags give an XML document structure and
    meaning
  • Available tags are defined by a schema
  • All tags in an XML document come in pairs, open
    and close (an empty element may use the
    self-closing form)
  • Tags are properly nested, so there is no
    ambiguity in their order
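
The pairing and nesting rules on this slide are what a parser checks as well-formedness. A minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are illustrative assumptions, not part of the slides:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tags come in pairs and are properly nested.
good = "<order><item>pen</item></order>"
# The close tags cross each other, so the nesting is broken.
bad = "<order><item>pen</order></item>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Well-formedness is independent of any schema: a document can pass this check and still fail validation against its schema.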

7
XML tags
  • XML tags may have attributes, which store
    information or metadata within the tag
  • Plain text can be placed between tags; text
    inside a CDATA section is not parsed as markup
  • CDATA is character data
  • As an attribute type, this means that any string
    of non-markup characters is legal as the
    attribute's value
  • The ENTITY attribute type indicates that the
    attribute refers to an external entity declared
    for the document
  • The ID attribute type specifies a unique
    identifier for each element

8
XML Schema
  • The structure of an XML document is defined by
    its schema.
  • Dozens of languages exist for defining XML schemas
  • DTD
  • W3C XML Schema (XSD)
  • RELAX NG
  • A schema file can be used to validate any
    instance of an XML document against it
  • The schema also defines the allowable tags

9
Sample XML Structure
  • XML employs a tree-structure model for
    representing data

[Figure: tree diagram of the shiporder document, with nodes shiporder, orderid, orderperson, shipto, name, address, city, country, item, title, quantity, and price]
10
Schema Example (XSD)
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="note" type="xs:string" minOccurs="0"/>
              <xs:element name="quantity" type="xs:positiveInteger"/>
              <xs:element name="price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
11
Querying XML - XPath
  • Many languages exist for querying XML
  • XPath and XQuery are W3C standards
  • XPath is a compact notation for traversing the
    tree shown earlier
  • Designed to be usable within URIs/URLs
  • /shiporder/item/name → view all items' names
  • Extensible with user-defined behaviors
  • Treats each tag as a node in the tree
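
A path query of this kind can be tried with Python's standard library, which implements a subset of XPath. This is a sketch under assumptions: the instance document below is hypothetical, and it uses title as the child of item because that is the element name declared in the XSD example, so the path becomes /shiporder/item/title.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document shaped like the shiporder schema.
SHIPORDER = """
<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item><title>Book A</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Book B</title><quantity>2</quantity><price>9.90</price></item>
</shiporder>
"""

root = ET.fromstring(SHIPORDER)  # root is the <shiporder> node

# ElementTree evaluates paths relative to an element, so the absolute
# path /shiporder/item/title is written as ./item/title from the root.
titles = [node.text for node in root.findall("./item/title")]
print(titles)

# Attributes are read from the node itself.
print(root.get("orderid"))
```

Each step of the path selects child nodes of the previous step, which is the "each tag is a node in the tree" view from the slide.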

12
Querying XML - XQuery
  • Functional extension of XPath
  • XML equivalent of SQL
  • Navigate and manipulate document nodes.
  • Works on collections of documents, or even
    fragments.
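
The FOR/WHERE/RETURN query on this slide is a filter over book elements followed by a projection of title. A rough procedural equivalent in Python's standard library, as a sketch only: the inline BIB document and its Book A/B/C titles are made-up stand-ins for bib.xml, and titles_by is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of bib.xml.
BIB = """
<bib>
  <book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>2001</year></book>
  <book><title>Book C</title><publisher>Addison-Wesley</publisher><year>1998</year></book>
</bib>
"""

def titles_by(doc, publisher, year):
    """FOR each book, WHERE publisher and year match, RETURN its title."""
    root = ET.fromstring(doc)
    return [
        book.findtext("title")
        for book in root.iter("book")  # plays the role of //book
        if book.findtext("publisher") == publisher
        and book.findtext("year") == year
    ]

print(titles_by(BIB, "Morgan Kaufmann", "1998"))
```

An XQuery engine evaluates the same logic declaratively and can apply it across a collection of documents rather than one string.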

FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
RETURN $b/title
13
XML Models
  • Naively there are two models of XML use
  • Data-centric
  • Document-centric
  • In reality, most XML use is a hybrid of the two
  • More important is the database strategy used with
    XML
  • Relational
  • Content Management
  • Native XML

14
Data Centric Model
  • Information is generally stored in a relational
    database
  • XML is the transport medium, nothing more
  • It is irrelevant to the application that the data
    exists as XML for some period of time
  • Characteristics
  • Fine grained data.
  • Data relationship is insignificant.
  • Need to transfer relational information.
  • Means of storing new information.

15
Document Centric Model
  • When XML is used solely as a document
  • Example: this presentation in OpenOffice
  • The documents are stored and retrieved in part
    or in full
  • Does not originate from a relational database
  • Document is intended for human consumption
  • Usually information is written by hand in a format
    like PDF or RTF, then converted to XML

16
Reality: Hybrid Model
  • Most documents like a PDF will also contain small
    grained information (last edited date, character
    set)
  • Data from a relational DB may even be a document,
    or require self description
  • Various database technologies support all models
  • Important to understand your data, and choose db
    technology that is most compatible

17
XML as Data Exchange Medium
  • Widespread Usage Across Computing
  • UML Tools have Standardized on XML Schema
  • Export Given UML Design to XML Instances
  • Track Both Design Data and Graphical Data
  • Database Interactions via XML
  • Import from XML into a Relational Schema
  • Export from a Relational DB into XML Schema and
    Instances
  • Web Services
  • Exchange of Information
  • SOAP, WSDL, and UDDI
  • Facilitates Information Exchange and Portability

18
Medical Data Model
  • Medical data is non-homogeneous
  • But there exist general trends in medical data
  • Fine-grained data such as dates, times, images
  • Documents and human-generated descriptions and
    observations
  • Human interaction creates semi-structured data
  • Ability to transfer information is essential
  • Medical data fits into hybrid model

19
Data Centric Comparison
  • Advantages
  • Utilizes existing database software (IBM, Oracle,
    SQL Server)
  • Quick (existing DBs are already fast)
  • Dual role (not limited only to XML)
  • Many even support XQuery
  • Disadvantages
  • More configuration (mapping relational -> XML)
  • Slower when creating complex XML files due to
    the intermediate mapping step

20
Document Centric Comparison
  • Advantages
  • Good integration into workflow
  • Document management made easy
  • Collaboration, and web publishing
  • Disadvantages
  • Not able to extract data from document directly
  • Not designed for high availability, high load
    systems
  • Non-uniformity in implementations

21
Storage Strategy: Relational
  • Utilizing a relational database to store XML
    documents and data is very popular
  • In a very data centric application this
    approach is intuitive
  • Most top tier database applications support XML
    in some way
  • Oracle, SQL server, IBM, etc...
  • Software is highly supported and well developed.

22
XML Schema Mapping
  • Using a relational DB requires mapping the XML
    schema to the DB schema
  • Table based
  • Often implemented as a middleware layer
  • Schema structure must follow row-column
    convention
  • Object relational
  • XML is a tree of objects
  • Mapped to DB using well established OR methods
  • Natively supported in some DB apps
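
The table-based mapping can be sketched with Python's built-in sqlite3 and ElementTree. Everything here is an assumption made for illustration: the two-table layout, the shred helper, and the sample order are hypothetical, and a real middleware layer would drive the mapping from the schema rather than hard-code it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical shiporder instance to be stored relationally.
ORDER = """
<shiporder orderid="889923">
  <orderperson>John Smith</orderperson>
  <item><title>Book A</title><quantity>1</quantity><price>10.90</price></item>
  <item><title>Book B</title><quantity>2</quantity><price>9.90</price></item>
</shiporder>
"""

def shred(conn, xml_text):
    """Map one shiporder document onto two relational tables."""
    root = ET.fromstring(xml_text)
    order_id = root.get("orderid")
    conn.execute("INSERT INTO shiporder VALUES (?, ?)",
                 (order_id, root.findtext("orderperson")))
    for item in root.findall("item"):
        # Each repeating child element becomes a row keyed by the parent id.
        conn.execute("INSERT INTO item VALUES (?, ?, ?, ?)",
                     (order_id, item.findtext("title"),
                      int(item.findtext("quantity")),
                      float(item.findtext("price"))))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shiporder (orderid TEXT PRIMARY KEY, orderperson TEXT)")
conn.execute("CREATE TABLE item (orderid TEXT, title TEXT, quantity INTEGER, price REAL)")
shred(conn, ORDER)

# Rebuilding the hierarchy takes a join across the two tables.
rows = conn.execute(
    "SELECT s.orderperson, i.title, i.quantity FROM shiporder s "
    "JOIN item i ON i.orderid = s.orderid ORDER BY i.title").fetchall()
print(rows)
```

The nesting in the XML turns into a foreign-key relationship, which is why the schema must fit the row-column convention for this approach to work.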

23
Storage Strategy: CMS
  • CMS: Content Management System
  • Used exclusively in the document-centric model
  • Various programs allow indexing, storage,
    manipulation, and publication of XML documents
  • Application specific
  • Numerous implementations, most recently Open
    Office and MS Word 2007
  • Not very interesting or useful in context of
    biomedical information

24
Storage Strategy: Native
  • Semi-structured data
  • Mapping to a relational DB causes inflation and
    null space
  • Need more functionality and granularity than CMS
  • Performance increase over relational DB by
    avoiding joins
  • Assuming data is in appropriate order on disk
  • Only returns XML; results must be converted for
    non-XML manipulation
  • Development still in infancy as of Winter 2007

25
Native XML Databases
  • Definition
  • A database that has an XML document as its
    fundamental unit of (logical) storage and defines
    a (logical) model for an XML document, as opposed
    to the data in that document, and stores and
    retrieves documents according to that model. At a
    minimum, the model must include elements,
    attributes, PCDATA, and document order.
  • Data types: no support in XML itself, so a
    mapping is needed
  • Document or database schema can be used
  • External user-defined mapping
  • Not necessary when only transferring data
  • No requirement on underlying medium or
    implementation
  • Two architectures: text-based and model-based

26
Native Text-based
  • Use any DB
  • Rather than mapping schemas, store entire XML
    documents
  • Usually involves saving entire document as a BLOB
    / Character LOB
  • Utilize various text field searches to retrieve
    info from XML document
  • Some DB text-search facilities are being made
    XML aware
  • Speed: the document is located together on disk,
    which favors full or partial document retrieval
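
The text-based approach can be sketched with sqlite3: whole documents go into a single text column, a plain-text search narrows the candidates, and the application parses the hits. The table name, the find_orders helper, and the two sample documents are assumptions for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# The whole document is stored as one character LOB; no schema mapping.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
DOCS = [
    '<shiporder orderid="1"><orderperson>Ann</orderperson></shiporder>',
    '<shiporder orderid="2"><orderperson>Bob</orderperson></shiporder>',
]
conn.executemany("INSERT INTO docs (body) VALUES (?)", [(d,) for d in DOCS])

def find_orders(conn, person):
    """Text prefilter in the database, then an XML-aware check in the application."""
    rows = conn.execute("SELECT body FROM docs WHERE body LIKE ?",
                        (f"%{person}%",))
    hits = []
    for (body,) in rows:
        root = ET.fromstring(body)
        # Confirm the text match sits in the intended element.
        if root.findtext("orderperson") == person:
            hits.append(root.get("orderid"))
    return hits

print(find_orders(conn, "Bob"))
```

The second check matters because a plain text search cannot tell an element's content from a tag or attribute that happens to contain the same characters.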

27
Native Model-based
  • Internal object model of the document schema
  • Store this model in a database
  • Relational / object-oriented database
  • Proprietary
  • Performance similar to chosen db engine
  • Still limited by the hierarchy of XML data
  • Retrieving all orderids from hundreds of docs is
    slow
  • Support for common XML query languages
  • XPath, XQuery, etc...

28
Native XML: TLC
  • In the traditional database world, transactions,
    locking, and concurrency (TLC) are paramount
  • Native XML databases aren't mature enough to
    support everything
  • Most support transactions, but what about locking
    and concurrency?
  • Document-level locking is easy, but too coarse
  • Only a few implementations support node-level
    locking
  • Commercial products generally support ACID; free
    ones are just starting to (2008)
  • ACID: Atomicity, Consistency, Isolation, Durability

29
Native XML: APIs
  • Ubiquity of ODBC interfaces
  • Still applies to native XML databases
  • Most implementations provide their own interface
    for a variety of languages
  • Industry standardization
  • XML:DB API from XMLDB.org, programming-language
    neutral
  • JSR 225: XQuery API for Java (XQJ); IBM and Oracle

30
Native XML: The Rest
  • Referential integrity is supported in an ad hoc
    manner at best
  • Database cannot enforce user defined (via schema)
    integrity
  • Some standard mechanisms allow it
  • Eventually both mechanisms will be supported
  • Currently relies heavily on application for
    normalization and integrity
  • Certainly a drawback for medical applications

31
Native XML: Scalability
  • Limitation of any DB is time spent seeking on the
    hard disk
  • An XML DB only needs to find the pointer to the
    head of the doc
  • Therefore an XML DB should scale well in the
    context of retrieving data
  • The only caveat is when the retrieval breaks the
    document hierarchy
  • More pointers must be followed, potentially
    slowing retrieval greatly
  • Where there is money, there is a way

32
Biomedical Information
  • Overview of the field.
  • Data storage and transfer problem.
  • XML as a solution.
  • BMI XML examples.
  • Next section: choosing a native DB.

33
BMI Overview
  • The convergence of computation and biomedicine
  • The NIH BMI Science and Tech Initiative
  • Define biomedical computing as a science
  • Many sources of information
  • Clinical, surgical, genetics, drug design,
    biology
  • Standardization in software
  • Algorithm development, high speed computing
  • All of this relies on efficient storage and
    transfer of information

34
BMISTI: Databases
  • "Biomedical computing is entering an age where
    creative exploration of huge amounts of data will
    lay the foundation of hypotheses." (NIH Director)
  • Problems
  • Standards. Terminology, syntax and semantics need
    to be defined and agreed upon to allow
    integration of data
  • Curation. Database submissions need to be checked
    and cross-referenced to avoid the transitive
    propagation of error
  • Interoperability. Data should be as consistent as
    possible across databases so that researchers can
    compare and contrast it
  • Computational and systems issues
  • Utilize and manipulate information.
  • Process large volumes of information.

35
BMI XML
  • Data sharing and semantic interoperability
  • Case study: Electronic Health Record
  • The development and use of an integrated health
    record for a patient
  • Heterogeneous data, e.g. clinical, clinical-trial,
    genomic data
  • Primary obstacle: proprietary data formats
  • Uniformity on the technical level: text files
  • Step towards semantic goal

36
XML in Clinical Data
  • HL7 standards organization.
  • V2: ASCII vertical-bar format. Example:

HL7V312.02 Message2.16.840.1.113883.1122CNTRL-
34562002081614303516- ---gt
06003.02.16.840.1.113883POLB_IN004410
PIERER respondToRSPtel555-555-5555WP
entit yRspFAMHippocratesGIVHaroldGIVH
SFXACMDtel555-555-5555WP senderSNDnfs127
.127.127.255 device2.16.840.1.113883.1122GHH
LABGIVAn Entit y NameLtel555-555-2005H
agencyFor representedOrganization\NOTH\
location2.16.840.1.113883.1122ELAB-3GHH
LabTN receiverRCVnfs127.127.127.0
device2.16.840.1.113883.1122GHH O
EGIVAn Entit y NameLtel555-555-2005H
agencyFor representedOrganization2.16.840.1.1
13883.19.3.1001GHH Outpatient ClinicTN
location2.16.840.1.113883.1122BLDG4GHH
Outpatient ClinicTN
  • Awkward, inflexible, unclear meaning of values.

37
HL7 V3 Specification
  • Built around Reference Information Model
  • Entity, Role, Participation, and Act
  • Utilizes dedicated vocabularies and data types
  • Every specification must derive from the RIM
  • Clinical Document Architecture
  • Utilizes XML with tags like observation, code,
    value, and id

<observation classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>
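
Because the observation carries its code system and units as attributes, a receiver can interpret it without knowledge of positional fields. A minimal sketch of reading the fragment in Python; read_observation is a hypothetical helper, and the xsi namespace is declared on the fragment itself only so that it parses on its own (an assumption; a full document would declare it on an enclosing element).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The slide's observation, with xmlns:xsi added locally (assumption).
OBSERVATION = f"""
<observation xmlns:xsi="{XSI}" classCode="OBS" moodCode="EVN">
  <id root="10.23.4573.15879"/>
  <code code="313193002" codeSystem="2.16.840.1.113883.6.96"
        codeSystemName="SNOMED CT" displayName="Peak flow"/>
  <effectiveTime value="20000407"/>
  <value xsi:type="RTO_PQ_PQ">
    <numerator value="260" unit="l"/>
    <denominator value="1" unit="min"/>
  </value>
</observation>
"""

def read_observation(xml_text):
    """Pull the coded concept and the ratio value out of one observation."""
    obs = ET.fromstring(xml_text)
    code = obs.find("code")
    value = obs.find("value")
    num, den = value.find("numerator"), value.find("denominator")
    return {
        "concept": code.get("displayName"),
        "code_system": code.get("codeSystemName"),
        # Namespaced attributes are addressed as {namespace-uri}localname.
        "value_type": value.get(f"{{{XSI}}}type"),
        "amount": float(num.get("value")) / float(den.get("value")),
        "unit": f'{num.get("unit")}/{den.get("unit")}',
    }

print(read_observation(OBSERVATION))
```

The value is a ratio of two physical quantities, so the sketch divides numerator by denominator and joins the units.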
38
XML in Clinical Trials
  • Example: drug studies
  • Utilizing XML would eliminate manual
    transcription when moving data from one system to
    another
  • XML is a universal datatype as it stores
    everything in text
  • Therefore it can handle new technology seamlessly
  • Clinical Data Interchange Standards Consortium
  • Industry standardization

39
CDISC ODM
  • Operational Data Model
  • XML based
  • Facilitate moving data from any collection system
    to clinical trial sponsor
  • Addresses real world issues
  • Incomplete data
  • Partial data transfer
  • Versioning and branching
  • ODM 1.1 is the current version

40
ODM Layout
41
XML in Genomic Data
  • Various groups export their data in XML
  • NCBI, EBI
  • They do not follow the same schema, which allows
    only partial semantic interoperability
  • Microarray Gene Expression Group (MAGE) publishes
    a schema
  • MAGE files are often several gigabytes
  • Illustrates the overhead of XML; however,
    researchers still use it because of
    interoperability

42
XML Complexity
  • Clinical Genomics Special Interest Group (HL7)
  • Use genomic data in the clinical environment
  • Utilize several models such as MAGE, BSML (for
    DNA sequences)
  • Not all information in the raw models is necessary
  • "Bubbling up" analyzes large raw data sets and
    extracts useful information
  • Transfer useful information to a new schema / model
  • Bottom line: there exist complex workflows to
    extract usable information

43
XML BMI Issues
  • Clinical information like a verbal description or
    advice is unstructured
  • How do you query this?
  • Schemas and Models are extremely complex, with
    nesting, recursion and compound data types
  • Difficult mapping to relational databases
  • XML instances may be gigabytes in size
  • What database solutions exist to handle such
    large files?

44
XML BMI Examples
  • A closer look at the Clinical Document
    Architecture
  • Mayo Clinic's implementation of CDA
  • Case study using native XML database to
    facilitate research based upon clinical texts
  • Tamino XML DB
  • Querying native DB
  • UCONN BMI, CSE 300 Spring 2008

45
XML BMI CDA
  • A clinical document has the following
    characteristics
  • Persistence: exists for a defined time period
  • Stewardship: maintained by a designated
    caretaker
  • Potential for authentication: may be legally
    authenticated
  • It must be human readable in a standard web
    browser
  • Utilizes standard XML syntax
  • www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf

46
XML BMI CDA
  • Mayo Clinic's use of CDA
  • www.hl7.de/iamcda2004/finalmat/day1/Calvin%20Beebe%20CDA%20Update.pdf

47
Survey of Native XML DBMS
  • Comprehensive List
  • http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native
  • Commercial
  • Tamino XML Server
  • Well developed, supported, many tools available
  • Open Source
  • Sedna: fully supports ACID, XQuery
  • eXist: great management, documentation, indexing

48
eXist
http://www.rpbourret.com/xml/ProdsNative.htm#exist
  • Proprietary data store (B+ trees).
  • Supports XQuery/XPath 2.0
  • Full text searches.
  • XMLDB API.
  • Document level concurrency.
  • Complete documentation.
  • Incomplete transaction support.

49
Sedna
http://www.rpbourret.com/xml/ProdsNative.htm#sedna
  • Underlying data storage based on DataGuide
  • Supports XQuery/XPath 2.0
  • Full text searches.
  • Custom API for various languages.
  • Command line admin.
  • Transaction support.

50
XML References
  • Canonical XML Version 1.0. John Boyer. 15 March
    2001. W3C.
  • XML Path Language (XPath) 2.0. W3C Working
    Draft. 2 May 2003. W3C.
  • XML Schema. XML Schema Working Group. 1 January
    2008. W3C. <http://www.w3.org/XML/Schema>
  • XML Schema: Formal Description. Brown, Fuchs,
    et al. 25 September 2001. W3C.
    <http://www.w3.org/TR/xmlschema-formal/>
  • Extensible Markup Language (XML). 1 January
    2008. W3C. <http://www.w3.org/XML/>
  • http://www.25hoursaday.com/StoringAndQueryingXML.html
  • http://www.nih.gov/about/director/060399.htm
  • http://www.research.ibm.com/journal/sj/452/shabo.html
  • Overview of the CDISC Operational Data Model.
    26 April 2002. CDISC.

51
What is one Possible Solution?
  • Standards and Usage of XML
  • Consider CDA: Clinical Document Architecture
  • Standard for Clinical (Provider) Medical Record
  • Clinical Record Organized as
  • <patient_encounter> - location
  • <legal_authenticator> - MD
  • <originating_organization> and <provider>
  • <patient> - name, birthdate, gender
  • <body_confidentiality-CONF1> - note
  • History
  • Past Medical History
  • Medications
  • Allergies
  • Social History
  • Physical Exam
  • Vitals (BP, Resp, Temp, HR)
  • Etc...

52
What is one Possible Solution?
  • Let's Explore this in Greater Detail
  • Starting with the CDA Header

<?xml version="1.0"?>
<!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
<levelone>
  <clinical_document_header>
    <id EX="a123" RT="2.16.840.1.113883.3.933"/>
    <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
    <version_nbr V="2"/>
    <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
    <origination_dttm V="2000-04-07"/>
    <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
    <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
    <document_relationship>
      <document_relationship.type_cd V="RPLC"/>
      <related_document>
        <id EX="a234" RT="2.16.840.1.113883.3.933"/>
        <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
        <version_nbr V="1"/>
      </related_document>
    </document_relationship>
    <fulfills_order>
      <fulfills_order.type_cd V="FLFS"/>
      <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
      <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
    </fulfills_order>
53
CDA Example - Continued
54
CDA Example - Continued
55
CDA Example - Continued
56
CDA Example - Continued
57
CDA Example - Continued
58
CDA Example - Continued
59
CDA Example - Continued
60
CDA Example - Continued
61
Other Relevant Standards of Note
  • MeSH
  • Unified Medical Language System
  • ICD9 and ICD9-CM (Intl. Classification Diseases)
  • ICD10 and ICD10-CM
  • SNOMED-CT (Clinical Terms)
  • National Drug Codes (NDC)

62
MeSH
  • The Medical Subject Headings (MeSH) thesaurus is
    a controlled vocabulary produced by the National
    Library of Medicine and used for indexing,
    cataloging, and searching for biomedical and
    health-related information and documents.
  • 2011 MeSH includes the subject descriptors
    appearing in MEDLINE/PubMed, the NLM catalog
    database, and other NLM databases.
  • Many synonyms, near-synonyms, and closely related
    concepts are included as entry terms to help
    users find the most relevant MeSH descriptor for
    the concept they are seeking.
  • http://www.nlm.nih.gov/mesh/

63
Descriptor Data Elements
64
Qualifier Data Elements
65
Supplementary Concepts
66
MeSH in ASCII
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT DU EC HI IM IP ME PD PK PO RE SD ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S,3S),8beta(R),9beta,11alpha))-
RN = 52665-69-7
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)
67
MeSH in ASCII
MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91 was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75) was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MED = 62
MED = 847
M90 = 299
M90 = 2405
M85 = 454
M85 = 2878
M80 = 316
M80 = 1601
M75 = 300
M75 = 823
M66 = 1
M66 = 3
ETC
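
A record in this ASCII form can be read line by line. The sketch below assumes each line has the shape FIELD = value and that ENTRY subfields are separated by |; the shortened RECORD and the parse_record helper are illustrative, not an NLM tool.

```python
from collections import defaultdict

# Shortened copy of the Calcimycin record, for illustration.
RECORD = """\
*NEWRECORD
RECTYPE = D
MH = Calcimycin
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
MN = D03.438.221.173
PA = Anti-Bacterial Agents
PA = Ionophores
RN = 52665-69-7
"""

def parse_record(text):
    """Collect repeatable fields of one record into lists keyed by field name."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if " = " not in line:
            continue  # skips the *NEWRECORD marker
        key, value = line.split(" = ", 1)
        fields[key].append(value)
    return dict(fields)

rec = parse_record(RECORD)
# The first subfield of an ENTRY line is the entry term itself.
entry_terms = [e.split("|")[0] for e in rec["ENTRY"]]
print(rec["MH"], rec["PA"], entry_terms)
```

Every field is kept as a list because fields such as ENTRY and PA repeat within one record. The meaning of the remaining ENTRY subfields is positional, which is the kind of implicit structure the XML form makes explicit.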
68
MeSH in XML - desc2011.dtd
<!-- MeSH DTD file for Descriptor records. desc2011.dtd -->
<!-- Author: MeSH -->
<!-- Effective: 09/01/2010 -->
<!-- #PCDATA: parseable character data = text
     occurrence indicators (default: required, not repeatable):
     ?: zero or one occurrence, i.e., at most one (optional)
     *: zero or more occurrences (optional, repeatable)
     +: one or more occurrences (required, repeatable)
     |: choice, one or the other, but not both
-->
<!ENTITY % DescriptorReference "(DescriptorUI, DescriptorName)">
<!ENTITY % normal.date "(Year, Month, Day)">
<!ENTITY % ConceptReference "(ConceptUI,ConceptName,ConceptUMLSUI?)">
<!ENTITY % QualifierReference "(QualifierUI, QualifierName)">
<!ENTITY % TermReference "(TermUI, String)">
69
MeSH in XML - desc2011.dtd
<!ELEMENT DescriptorRecordSet (DescriptorRecord*)>
<!ATTLIST DescriptorRecordSet LanguageCode
    (cze|dut|eng|fin|fre|ger|ita|jpn|lav|por|scr|slv|spa) #REQUIRED>
<!ELEMENT DescriptorRecord (%DescriptorReference;,
    DateCreated,
    DateRevised?,
    DateEstablished?,
    ActiveMeSHYearList,
    AllowableQualifiersList?,
    Annotation?,
    HistoryNote?,
    OnlineNote?,
    PublicMeSHNote?,
    PreviousIndexingList?,
    EntryCombinationList?,
    SeeRelatedList?,
    ConsiderAlso?,
    PharmacologicalActionList?,
    RunningHead?,
    TreeNumberList?,
    RecordOriginatorsList,
    ConceptList) >
<!ATTLIST DescriptorRecord DescriptorClass (1 | 2 | 3 | 4) "1">
70
MeSH in XML - desc2011.dtd
<!ELEMENT ActiveMeSHYearList (Year+)>
<!ELEMENT AllowableQualifiersList (AllowableQualifier+) >
<!ELEMENT AllowableQualifier (QualifierReferredTo,Abbreviation )>
<!ELEMENT Annotation (#PCDATA)>
<!ELEMENT ConsiderAlso (#PCDATA) >
<!ELEMENT Day (#PCDATA)>
<!ELEMENT DescriptorUI (#PCDATA) >
<!ELEMENT DescriptorName (String) >
<!ELEMENT DateCreated (%normal.date;) >
<!ELEMENT DateRevised (%normal.date;) >
<!ELEMENT DateEstablished (%normal.date;) >
<!ELEMENT DescriptorReferredTo (%DescriptorReference;) >
<!ELEMENT EntryCombinationList (EntryCombination+) >
<!ELEMENT EntryCombination (ECIN, ECOUT)>
<!ELEMENT ECIN (DescriptorReferredTo,QualifierReferredTo) >
<!ELEMENT ECOUT (DescriptorReferredTo,QualifierReferredTo? ) >
<!ELEMENT HistoryNote (#PCDATA)>
<!ELEMENT Month (#PCDATA)>
<!ELEMENT OnlineNote (#PCDATA)>
ETC
71
MeSH in XML - Sample
<?xml version="1.0"?>
<!DOCTYPE DescriptorRecordSet SYSTEM "desc2011.dtd">
<DescriptorRecordSet LanguageCode = "eng">
<DescriptorRecord DescriptorClass = "1">
  <DescriptorUI>D000001</DescriptorUI>
  <DescriptorName>
    <String>Calcimycin</String>
  </DescriptorName>
  <DateCreated>
    <Year>1974</Year>
    <Month>11</Month>
    <Day>19</Day>
  </DateCreated>
  <DateRevised>
    <Year>2006</Year>
    <Month>07</Month>
    <Day>05</Day>
  </DateRevised>
  <DateEstablished>
    <Year>1984</Year>
    <Month>01</Month>
    <Day>01</Day>
  </DateEstablished>
72
MeSH in XML - Sample
  <ActiveMeSHYearList>
    <Year>2007</Year>
    <Year>2008</Year>
    <Year>2009</Year>
    <Year>2011</Year>
  </ActiveMeSHYearList>
  <AllowableQualifiersList>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000008</QualifierUI>
        <QualifierName>
          <String>administration &amp; dosage</String>
        </QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AD</Abbreviation>
    </AllowableQualifier>
    <AllowableQualifier>
      <QualifierReferredTo>
        <QualifierUI>Q000009</QualifierUI>
        <QualifierName>
          <String>adverse effects</String>
        </QualifierName>
      </QualifierReferredTo>
      <Abbreviation>AE</Abbreviation>
    </AllowableQualifier>
ETC
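
With element names carrying the meaning, extracting a descriptor's allowable qualifiers takes a few path lookups. A sketch using Python's standard library; the SAMPLE below is an abbreviated, hypothetical record that follows the sample's element names but omits the DOCTYPE and most required children, and qualifiers_by_descriptor is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated descriptor record following the sample's element names.
SAMPLE = """
<DescriptorRecordSet LanguageCode="eng">
  <DescriptorRecord DescriptorClass="1">
    <DescriptorUI>D000001</DescriptorUI>
    <DescriptorName><String>Calcimycin</String></DescriptorName>
    <AllowableQualifiersList>
      <AllowableQualifier>
        <QualifierReferredTo>
          <QualifierUI>Q000008</QualifierUI>
          <QualifierName><String>administration &amp; dosage</String></QualifierName>
        </QualifierReferredTo>
        <Abbreviation>AD</Abbreviation>
      </AllowableQualifier>
      <AllowableQualifier>
        <QualifierReferredTo>
          <QualifierUI>Q000009</QualifierUI>
          <QualifierName><String>adverse effects</String></QualifierName>
        </QualifierReferredTo>
        <Abbreviation>AE</Abbreviation>
      </AllowableQualifier>
    </AllowableQualifiersList>
  </DescriptorRecord>
</DescriptorRecordSet>
"""

def qualifiers_by_descriptor(xml_text):
    """Map each descriptor name to its {abbreviation: qualifier name} table."""
    result = {}
    for rec in ET.fromstring(xml_text).iter("DescriptorRecord"):
        name = rec.findtext("DescriptorName/String")
        result[name] = {
            q.findtext("Abbreviation"):
                q.findtext("QualifierReferredTo/QualifierName/String")
            for q in rec.iter("AllowableQualifier")
        }
    return result

print(qualifiers_by_descriptor(SAMPLE))
```

The parser resolves the &amp; entity, so the qualifier name comes back with a plain ampersand.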
73
Unified Medical Language System
  • UMLS, the Unified Medical Language System, was
    developed by the National Library of Medicine

Disease is a semantic type with around 392
relations (109 semantic relations and 22 other
relations). Pneumonia is categorized under one
semantic type, Disease, but has hundreds of
relations.
74
UMLS Concepts, Semantic Types/Relations
75
ICD9 Respiratory Diseases
76
ICD10 Respiratory Diseases
77
SNOMED-CT
  • SNOMED-CT stands for Systematized Nomenclature of
    Medicine Clinical Terms. SNOMED-CT is the result
    of merging two ontologies: SNOMED-RT and Clinical
    Terms.
  • http://www.ihtsdo.org/snomed-ct/

78
SNOMED-CT
  • Composed of Concepts, Terms, and Relationships
  • Precisely Represent Clinical Information Across
    Scope of Health Care
  • Content Coverage Divided into Hierarchies

79
SNOMED Example
80
National Drug Codes
  • Tracking of Drugs (Prescription and OTC)
  • From Submittal Through Approval
  • Keeps Track of Many Details on Medication
  • Each Drug by Manufacturer has Unique NDC
    Identifier
  • See
  • http://www.fda.gov/Drugs/InformationOnDrugs/ucm142438.htm
  • Searchable Database
  • http://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm

81
NDC Examples
82
Biomedical Clinical Ontologies
  • Evolution of WWW
  • Ontology
  • Definition and Description.
  • Example.
  • Present Biomedical Ontology
  • Need for Integration
  • Application of Biomedical Ontology
  • Clinical Trials
  • OASIS Integration Technique
  • Clinical Decision Support System
  • Summary
  • Presentation from Rishi Saripalle, Spring 2008

83
Current Information Systems on WWW
  • First Generation
  • Raw data, largely hand-coded by the user, was
    published online
  • For example, static web pages
  • Second Generation
  • Dynamic content generation driven by MDA and
    databases
  • Machines generate the respective HTML
  • Third Generation: Semantic Web
  • Generates machine-processable information whose
    content is machine understandable, enabling
    intelligent services such as information brokers,
    search agents, and information filters to process
    domain-related information

84
What are Ontologies?
  • Definition (from Philosophy)
  • Ontology is the study of being or existence and
    forms the basic subject matter of metaphysics.
    It seeks to describe the basic categories and
    relationships of being or existence to define
    entities and types of entities within its
    framework.
  • Definition (from Computer Science)
  • In computer science, an ontology is a
    specification of a conceptualization: a data
    model that represents a set of concepts within a
    domain and the relationships between those
    concepts.

85
Advantages of Ontology
  • Semantic way of representing knowledge of the
    domain
  • Intelligent systems can apply reasoning to make
    inferences within the ontology
  • To share a common structure of information
  • To reuse similar domain ontologies

86
Development of Ontology
  • Determine the domain and scope (range) of the
    knowledge
  • Look for an existing ontology in a similar
    domain
  • List all the terms or concepts of the
    domain
  • List all the classes and instances to be created
    in the ontology
  • Create the properties that relate these
    concepts in the ontology

87
Example of Ontology
[Figure: example wine ontology with the labels Wine, Australian Yellow Tail, Class, Individual, Grape, Properties, Maker, Color, Flavor, German, Yellow, Delicate, Australia]
88
What are RDF and OWL?
  • Researchers proposed Semantic Web Stack
    illustrating hierarchy of languages, where each
    layer exploits and uses capabilities of the
    layers below
  • OWL and RDF belong to the family of knowledge
    representation languages
  • RDF: Resource Description Framework
  • http://www.w3.org/RDF/
  • OWL: Web Ontology Language
  • http://www.w3.org/TR/owl-features/
  • RDF is reminiscent of the Semantic Networks that
    were popular in the 1970s

89
Introduction to RDF / OWL
90
RDF: Resource Description Framework
  • RDF represents knowledge in triple
    format: Subject, Predicate, Object
  • For example: Students (Subject) registerTo
    (Predicate) Classes (Object)
  • One triple in RDF is referred to as a statement
  • RDF is a grammar-based language with a syntax
    similar to XML
  • RDFS (RDF Schema) has a syntax similar to RDF and
    provides a schema grammar for RDF, for example
    rdfs:Class, rdfs:subClassOf, etc.
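
The triple model can be illustrated without any RDF library. In this sketch a statement is a plain tuple and a query is a pattern with wildcards; the Triple type, the match helper, and the two rdf:type statements are assumptions added for illustration, and only the registerTo statement comes from the slide.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Only the first statement comes from the slide; the rest are hypothetical.
GRAPH = [
    Triple("Students", "registerTo", "Classes"),
    Triple("Students", "rdf:type", "rdfs:Class"),
    Triple("Classes", "rdf:type", "rdfs:Class"),
]

def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the statements matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]

print(match(GRAPH, p="registerTo"))
print([t.subject for t in match(GRAPH, p="rdf:type", o="rdfs:Class")])
```

In real RDF each name would be a full URI, which is what lets statements from different sources refer to the same concept.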

91
RDF Resource Description Framework
  • RDF syntax of the above example
  • All the concepts described in RDF are
    identified using a URI (e.g.,
    http://www.example.com/example#Students).
  • RDF can be viewed as a standardized framework for
    attaching metadata to domain concepts.

<rdfs:Class rdf:about="http://www.example.com/example#Students" rdfs:label="Students"> </rdfs:Class>
<rdfs:Class rdf:about="http://www.example.com/example#Classes" rdfs:label="Classes"> </rdfs:Class>
92
OWL Web Ontology Language
  • OWL is placed at the top of the Semantic Web
    Stack, utilizing the features
    offered by the layers below (RDF, RDFS, XML)
  • OWL's design has been influenced by description
    logic and other knowledge representation paradigms
  • SHIQ, Semantic Networks, Frames, SHOE, DAML, OIL,
    DAML+OIL.
  • OWL provides richer semantic capabilities than
    its predecessor RDF
  • For example, in the previous example, the
    predicate registerTo is of type rdf:Property.

93
OWL Web Ontology Language
  • OWL differentiates between properties by defining
  • owl:ObjectProperty - for connecting two concepts
    (e.g., registerTo) and
  • owl:DatatypeProperty - for connecting a concept
    to a datatype (taken from XML Schema)
  • Both of these property types inherit from rdf:Property
  • OWL also defines owl:AnnotationProperty for
    attaching metadata to classes, rules, and
    axioms
  • The following slide illustrates the use of OWL,
    RDF, and RDFS (taken from a cardiac ontology built
    in OWL using the Protégé tool)

94
OWL Web Ontology Language
  • Pulmonary Vein is a sub-class of Vein, which is a
    sub-class of Heart.
  • The next slide illustrates OWL properties and the
    expressive power of OWL to restrict the domain
    and range values accepted by these properties.
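Because the sub-class relation is transitive, a chain such as the one above implies that Pulmonary Vein is also a sub-class of Heart. A minimal sketch of walking that chain follows; the dictionary layout and the `superclasses` helper are assumptions made for illustration, and each class is assumed to have at most one direct parent.

```python
from typing import Dict, List

# Direct sub-class links as stated on the slide (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "Pulmonary_Vein": "Vein",
    "Vein": "Heart",
}


def superclasses(cls: str, subclass_of: Dict[str, str]) -> List[str]:
    """Follow direct links upward and return all ancestors, nearest first."""
    result: List[str] = []
    seen = set()
    while cls in subclass_of and cls not in seen:
        seen.add(cls)           # guard against accidental cycles
        cls = subclass_of[cls]  # step to the direct parent
        result.append(cls)
    return result


ancestors = superclasses("Pulmonary_Vein", SUBCLASS_OF)
```

A reasoner performs the same kind of inference over the full class graph, including classes with several parents.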

BioMedical Informatics
95
OWL Web Ontology Language
<owl:ObjectProperty rdf:ID="Complications">
  <rdfs:domain rdf:resource="#Cardiology_Diseases"/>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Cardiology_Complications"/>
        <owl:Class rdf:about="#Cardiology_Diseases"/>
        <owl:Class rdf:about="#Cardiology_Causes"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
</owl:ObjectProperty>
  • The object property Complications takes
    domain values from the class Cardiology_Diseases
    and range values from the union of the classes
    Cardiology_Complications, Cardiology_Diseases, and
    Cardiology_Causes
  • OWL combined with RDF/RDFS provides an
    environment for developing domain ontologies by
    organizing and describing the domain concepts
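The effect of a domain and a union-valued range can be sketched as a simple membership check. This is a simplification under stated assumptions: an OWL reasoner uses domain and range to infer the types of individuals rather than to reject statements, and the `CLASS_OF` assignments and the `is_valid_assertion` helper below are hypothetical.

```python
from typing import Dict, Set

# The Complications property: one domain class, a union of three range classes.
COMPLICATIONS: Dict[str, Set[str]] = {
    "domain": {"Cardiology_Diseases"},
    "range": {
        "Cardiology_Complications",
        "Cardiology_Diseases",
        "Cardiology_Causes",
    },
}

# Hypothetical class assignments for a few individuals (illustration only).
CLASS_OF: Dict[str, str] = {
    "Atrial_septal_defect": "Cardiology_Diseases",
    "Arrhythmia": "Cardiology_Complications",
    "Dyspnea": "Cardiology_Symptoms",
}


def is_valid_assertion(prop: Dict[str, Set[str]], subject: str, obj: str,
                       class_of: Dict[str, str]) -> bool:
    """True if the subject's class is in the domain and the object's class is in the range."""
    return (class_of.get(subject) in prop["domain"]
            and class_of.get(obj) in prop["range"])


ok = is_valid_assertion(COMPLICATIONS, "Atrial_septal_defect", "Arrhythmia", CLASS_OF)
bad = is_valid_assertion(COMPLICATIONS, "Atrial_septal_defect", "Dyspnea", CLASS_OF)
```

Here the first statement is accepted because Arrhythmia falls in one of the three range classes, while the second is rejected because Cardiology_Symptoms is not part of the union.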

96
Disease Ontology
Sub-Classes of Cardiology Diseases
Instances of Mitral_Valve_Disorders
Hierarchical organization of Cardiology Diseases
97
Disease Ontology
Property Defined
Representation of Mitral_Valve_Prolapse
knowledge using properties and instances
98
Implemented Ontology in OWL Format
..
<Congenital_Heart_Disease rdf:ID="Atrial_septal_defect">
  <Complications>
    <Cardiac_Arrhythmias rdf:ID="Arrhythmia">
      <Has_Intervention rdf:datatype="http://www.w3.org/2001/XMLSchema#string">defibrillation</Has_Intervention>
      <Have_Symptoms>
        <Cardiology_Symptoms rdf:ID="Dyspnea"/>
      </Have_Symptoms>
      <Has_Diagnosis_Test>
        <Cardiology_Diagnosis_Test rdf:ID="Coronary_Angiography">
          <Has_Synonyms rdf:datatype="http://www.w3.org/2001/XMLSchema#string">coronary catheterization</Has_Synonyms>
..
99
Bio-Medical Ontologies
  • Review a Wide Range of Available Ontologies and
    Standards
  • OpenCyc
  • WordNet
  • Galen
  • UMLS
  • SNOMED CT
  • FMA
  • Gene Ontology

100
Open Cyc
  • Open Cyc is an upper-level ontology developed by
    Cycorp Inc.
  • Open Cyc contains 6,000 concepts and 60,000
    hand-coded assertions that capture common-sense
    knowledge, so that AI algorithms can perform
    human-like reasoning

101
Example of Open Cyc
102
WordNet
  • WordNet is an electronic lexical database
    developed at Princeton University that
    serves as a resource for applications in
    natural language processing and information
    retrieval.

WordNet senses of "cancer":
  • cancer, malignant neoplastic disease: any
    malignant growth or tumor caused by abnormal and
    uncontrolled cell division; it may spread to
    other parts of the body through the lymphatic
    system or the blood stream
  • Cancer, Crab: (astrology) a person who is born
    while the sun is in Cancer
  • Cancer: a small zodiacal constellation in the
    northern hemisphere between Leo and Gemini
  • Cancer, Cancer the Crab, Crab: the fourth sign of
    the zodiac; the sun is in this sign from about
    June 21 to July 22
  • Cancer, genus Cancer: type genus of the family
    Cancridae
103
Unified Medical Language System
  • UMLS was developed by the National Library of
    Medicine

Disease is a semantic type with around 392
relations (109 semantic relations and 22 other
relations). Pneumonia is categorized under one
semantic type, Disease, but has hundreds of
relations.
104
SNOMED-CT
  • SNOMED-CT stands for Systematized Nomenclature of
    Medicine Clinical Terms. SNOMED-CT is the
    result of merging two ontologies: SNOMED-RT and
    Clinical Terms.

105
Ontology Integration
  • All the ontologies developed have a common aim,
    describing the domain knowledge
  • Integration of ontologies is becoming very
    critical
  • Applications tend to use multiple ontologies
  • Concepts in the various ontologies overlap, or the
    same concept is described in multiple ways.
  • For example, the concept Blood is described
    differently:
  • Fluid in one ontology
  • Substance in another ontology
  • Semi-solid in a third ontology
  • Need to Reconcile these Differences When
    Attempting to Combine data that Originates from
    Different Ontologies
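One way to surface such differences before combining data is to compare how each ontology categorizes a shared concept. The sketch below uses the Blood example from this slide; the ontology names (`ontology_1` and so on) and the `find_conflicts` helper are placeholders introduced for illustration, not references to real ontologies.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical ontologies, each mapping a concept to its parent category.
ONTOLOGIES: Dict[str, Dict[str, str]] = {
    "ontology_1": {"Blood": "Fluid"},
    "ontology_2": {"Blood": "Substance"},
    "ontology_3": {"Blood": "Semi-solid"},
}


def find_conflicts(ontologies: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return the concepts that are categorized differently across ontologies."""
    by_concept: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name, concepts in ontologies.items():
        for concept, category in concepts.items():
            by_concept[concept][name] = category
    # Keep only concepts with more than one distinct category.
    return {concept: views for concept, views in by_concept.items()
            if len(set(views.values())) > 1}


conflicts = find_conflicts(ONTOLOGIES)
```

Detecting the conflict is the easy part; deciding which categorization to keep, or how to relate the three, is the reconciliation work the slide refers to.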

106
Ontology Integration
  • Semantic vs. Structural Integration?
  • Difficulties arise when integrating similar,
    identical, and complementary ontologies.

107
OASIS
  • Ontology Mapping and Integration Framework

108
Application of Ontologies
  • Randomized Clinical Trials (RCTs) are one of the
    least biased sources of clinical research evidence,
    and are therefore a critical resource for the
    practice of evidence-based medicine
  • The scientific community is trying to encode the
    findings in a computer-processable language
  • However, for evidence to be put into practice, one
    has to analyze the data. The canonical
    practice for trial interpretation is called
    systematic reviewing.
  • Source for Data Specification
  • Trial Reports
  • Trial Databases.

109
Life Cycle of Clinical Trials
Ontology Specifications
110
Designing the Ontology
  • RCT ontology specifications are obtained from
  • Trial Reports
  • Trial Databases - ClinicalTrials.gov, PDQ etc.
  • The ontology is created by dividing the task into
    Sub-Tasks and Methods. This recursive process is
    called Competency Decomposition.
  • RCT decomposition methods combine Generic Tasks
    and Competency Questions.

111
Defining the Schema
Intervention -ARM
  • - Frames
  • 601 - Slots

Administrative Concept
TRIAL
Outcome- Concept
Excluded Population
Population
Analyzed Population
112
Matching Patient Records to Clinical Trials
  • Low participation in clinical trials is a major
    problem in clinical and translational
    research.
  • Matching patient records to clinical trials
    is presently a manual procedure, and it is
    tedious.
  • Need a Semantic Bridge between Clinical
    Ontologies (SNOMED CT, etc.) and
    raw patient data for
  • retrieving matching patient records, clinical
    guidelines, and clinical decision support systems
    (CDSS).

113
Technical Challenges
  • Challenges faced in real-time
    scenarios
  • Knowledge Engineering
  • Scalability
  • Noisy or Incomplete Data
  • Knowledge Engineering
  • The clinical ontology has the concept Drug, which
    describes the active composition of the various drugs
  • However, the patient record contains
    vendor-specific drug names
  • The clinical ontology describes the cause of the
    disorder. The patient records only specify the
    presence or absence of the disorder and where
    the clinical test was conducted.

114
Architecture of Solution
Clinical Trials
115
Implementation Approach
  • Mapping Patient Data Terminology to SNOMED-CT
  • Using UMLS as an intermediate target.
  • NLP mapping techniques
  • Manual Mapping
  • Map the raw patient data to SNOMED-CT
    terminology.
  • Example: Cerner Drug: Lactulose Syrup 20G/30ml
  • SNOMED-CT: administeredSubstance
  • Allow the user to specify which terms in the
    definition are to be matched.
  • Last Bullet Means Ontology Matching is NOT Fully
    Automated!
  • This is a Real Problem for Interoperating Data!
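The two-step mapping with a manual fallback can be sketched as two lookups plus an override table. Everything in this block is a hypothetical stand-in: the table contents, the placeholder identifiers, and the `map_term` function are assumptions for illustration and do not reflect real UMLS or SNOMED-CT codes or any real mapping tool.

```python
from typing import Dict, Optional

# Step 1 (hypothetical): local vendor term -> intermediate concept identifier.
LOCAL_TO_INTERMEDIATE: Dict[str, str] = {
    "lactulose syrup 20g/30ml": "INTERMEDIATE:lactulose",
}

# Step 2 (hypothetical): intermediate concept -> target ontology expression.
INTERMEDIATE_TO_TARGET: Dict[str, str] = {
    "INTERMEDIATE:lactulose": "administeredSubstance Lactulose",
}


def map_term(local_term: str,
             manual_overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Map a local term to the target ontology, preferring manual mappings."""
    key = local_term.strip().lower()
    if manual_overrides and key in manual_overrides:
        return manual_overrides[key]  # a human-specified mapping wins
    intermediate = LOCAL_TO_INTERMEDIATE.get(key)
    if intermediate is None:
        return None  # unmapped: needs manual review
    return INTERMEDIATE_TO_TARGET.get(intermediate)


mapped = map_term("Lactulose Syrup 20G/30ml")
unmapped = map_term("Unknown Drug 5mg")
```

Terms that return `None` are the ones a person has to map by hand, which is why the matching is not fully automated.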

116
Contrast in Representation
  • Example
  • SNOMED-CT: Disease1
  • hasAgent Virus007
  • Infection due to Bacteria001
  • Infection due to MicroBacteria007
  • Patient Record: Disease1 Positive.
  • Because the patient record contains so little
    information, the query reasoner cannot find
    records with partial data.

117
How are Observations Reconciled?
Clinical Trials Description
NCT00084266 Patients with MRSA
NCT00288808 Patients with warfarin
NCT00298870 Patients on steroids
NCT00304382 Patients with Pneumonia,source of Blood or Sputum
? associatedObservation MRSA
? associatedObservation Pneumococcal
Pneumonia ? ?
hasSpecimanSource Blood ? Sputum
118
Clinical Decision Support System
  • Clinical Decision Support Systems (CDSS) are
  • Interactive computer programs
  • Designed to assist physicians and other health
    professionals with decision making tasks
  • Components of CDSS
  • Knowledge Base
  • Rule Based Engine
  • Case Base
  • Business Models

119
Example of Usage of Rules
IF RULE 1 AND RULE 2 AND RULE 3 .. AND RULE n
THEN INTERVENTION 1 or RULE M
IF p.getGender() == male AND p.getAge() == 34 AND
p.getBP() < 140 AND p.getInsulinLevel() < 20 THEN
Asthma Intervention Level 2
Class Patient: hasGender male AND hasAge 34 AND
hasBP MoreThan 140 AND hasInsulinLevel MoreThan 20
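A rule of this IF/THEN form can be sketched as a predicate over a patient record. This is an illustration only, assuming that the conditions in the rule are joined by conjunction, that gender and age are tested for equality, and that blood pressure and insulin level are tested with "less than" as in the IF rule; the `Patient` dataclass and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    gender: str
    age: int
    bp: int             # blood pressure reading
    insulin_level: int


def rule_applies(p: Patient) -> bool:
    """All four conditions must hold for the rule to fire."""
    return (p.gender == "male"
            and p.age == 34
            and p.bp < 140
            and p.insulin_level < 20)


def recommend(p: Patient) -> Optional[str]:
    """Return the intervention named by the rule, or None if it does not fire."""
    return "Asthma Intervention Level 2" if rule_applies(p) else None


matching = recommend(Patient(gender="male", age=34, bp=120, insulin_level=15))
non_matching = recommend(Patient(gender="male", age=34, bp=150, insulin_level=15))
```

A rule-based engine in a CDSS would hold many such rules and could chain them, so that the outcome of one rule feeds the conditions of another.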
120
Summary - Ontologies
  • Ontology
  • Definition and Descriptions.
  • Example.
  • Biomedical Ontology
  • Open Cyc
  • WordNet
  • GALEN
  • SNOMED - CT
  • Integration of Ontologies
  • Application of Biomedical Ontology
  • Clinical Trials.
  • OASIS Integration Technique.
  • Clinical Decision Support System.

121
Concluding Remarks XML/Standards
  • Explored Usage of XML Including
  • Basic XML Concepts
  • XML Tools and Standards
  • XML Databases
  • Use of XML in BMI
  • Reviewed HL7 and CDA
  • Examined Numerous Standards
  • Reviewed Ontology Concepts