Semantic Web: Promising Technologies, Current Applications

1
Semantic Web: Promising Technologies, Current
Applications and Future Directions
  • Invited and colloquium talks at: Swinburne
    Institute of Technology, Melbourne (July 18);
    University of Adelaide, Adelaide (July 23);
    University of Melbourne, Melbourne (July 31);
    Victoria University, Melbourne
  • Australia, 2008
  • Amit P. Sheth
  • amit.sheth@wright.edu
  • Kno.e.sis Center, Comp. Sc. & Engg.
  • Wright State University, Dayton OH, USA
  • Thanks: Kno.e.sis team and collaborators

2
Outline
  • Semantic Web key capabilities and technologies
  • Real-world applications demonstrating the benefits
    of semantic web technologies
  • Exciting ongoing research

3
Evolution of the Web
Web as an oracle / assistant / partner: ask
the Web, using semantics to leverage text + data +
services (Powerset)
(Figure: timeline, 1997 to 2007)
4
  • 1, 2, 3 of Semantic Web

5
1
  • Ontology: agreement with a common
    vocabulary/nomenclature, conceptual models and
    domain knowledge
  • Schema + Knowledge base
  • Agreement is what enables interoperability
  • Formal description: machine processability is
    what leads to automation
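The split between a shared schema and a knowledge base populated against it can be sketched in Python, assuming a simple (subject, predicate, object) triple representation. Every class, relationship and instance name below is hypothetical; the sketch only shows how agreement on a schema lets a program check assertions mechanically.

```python
# Hypothetical schema: the agreed vocabulary of classes and the
# relationship types allowed between them.
SCHEMA = {
    ("Drug", "treats", "Disease"),
    ("Drug", "interacts_with", "Drug"),
}

# Hypothetical knowledge base: instances described with that vocabulary.
KNOWLEDGE_BASE = {
    ("drug_a", "type", "Drug"),
    ("drug_b", "type", "Drug"),
    ("disease_x", "type", "Disease"),
    ("drug_a", "treats", "disease_x"),
    ("drug_a", "interacts_with", "drug_b"),
}

def classes_of(entity, kb):
    """Classes an entity is explicitly typed with."""
    return {o for s, p, o in kb if s == entity and p == "type"}

def instances_of(cls, kb):
    """Entities explicitly typed as `cls`, sorted for stable output."""
    return sorted(s for s, p, o in kb if p == "type" and o == cls)

def conforms(triple, kb, schema):
    """True if the subject's and object's classes are linked by the same
    relationship type in the schema, i.e. the assertion uses the shared
    vocabulary the way it was agreed."""
    s, p, o = triple
    return any((cs, p, co) in schema
               for cs in classes_of(s, kb)
               for co in classes_of(o, kb))
```

For example, `conforms(("disease_x", "treats", "drug_a"), KNOWLEDGE_BASE, SCHEMA)` is `False` because the schema only allows a Drug to treat a Disease.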

6
2
  • Semantic Annotation (Metadata Extraction):
    associating meaning with data, or labeling data
    so it is more meaningful to the system and
    people.
  • Can be manual, semi-automatic (automatic with
    human verification), or automatic.
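Here is a minimal sketch of fully automatic annotation. It assumes a hypothetical lexicon that maps surface forms and synonyms to made-up ontology concept IDs. Real annotators draw on much richer ontologies and add disambiguation. In the semi-automatic mode, the matches returned here would go to a person for verification.

```python
import re

# Hypothetical lexicon: surface forms (including synonyms) -> concept IDs.
LEXICON = {
    "aspirin": "drug:Aspirin",
    "acetylsalicylic acid": "drug:Aspirin",
    "migraine": "disease:Migraine",
    "headache": "disease:Headache",
}

def annotate(text, lexicon=LEXICON):
    """Return (start, end, surface, concept) for each lexicon match.
    Longer surface forms are tried first so multi-word terms win over
    shorter overlapping ones."""
    annotations = []
    taken = [False] * len(text)
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps an annotation already made
            for i in range(m.start(), m.end()):
                taken[i] = True
            annotations.append((m.start(), m.end(), m.group(0), lexicon[term]))
    return sorted(annotations)
```

Usage: `annotate("Acetylsalicylic acid relieved the migraine.")` labels the first two words with `drug:Aspirin` and "migraine" with `disease:Migraine`.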

7
3
  • Reasoning/Computation: semantics-enabled search,
    integration, answering complex queries,
    connections and analyses (paths, subgraphs),
    pattern finding, mining, hypothesis validation,
    discovery, visualization

8
Different foci
  • TBL: focus on data, the Data Web ("In a way, the
    Semantic Web is a bit like having all the
    databases out there as one big database.")
  • Others: focus on reasoning and intelligent
    processing

9
Maturing Capabilities and Ongoing Research
  • Text Mining for Semantic Computing: Entity
    Recognition, Relationship Extraction
  • Multi-modal Data Integration: aligning text,
    experimental and curated data, multimedia data
  • Clinical and Scientific Workflows with semantic
    web services
  • Hypothesis-driven retrieval of scientific
    literature, discovering undiscovered public
    knowledge

10
SW Stack Architecture, Standards
11
From Syntax to Semantics
12
A little bit about ontologies
13
Open Biomedical Ontologies
Many Ontologies Available Today
Open Biomedical Ontologies, http://obo.sourceforge.net/
14
From simple ontologies
15
Drug Ontology Hierarchy (showing is-a
relationships)
interaction_with_non_drug_reactant
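Querying an is-a hierarchy mostly comes down to following links transitively. The fragment below is hypothetical and is not the Drug Ontology shown on the slide. It is stored as a map from each concept to its direct parents.

```python
from collections import deque

# Hypothetical is-a fragment: concept -> list of direct parents.
IS_A = {
    "metoprolol": ["beta_blocker"],
    "beta_blocker": ["antihypertensive"],
    "antihypertensive": ["cardiovascular_drug"],
    "cardiovascular_drug": ["drug"],
}

def ancestors(concept, hierarchy=IS_A):
    """All concepts reachable by following is-a links upward
    (the transitive closure for one concept), via breadth-first search."""
    seen, queue = set(), deque(hierarchy.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(hierarchy.get(parent, []))
    return seen

def subsumed_by(concept, ancestor, hierarchy=IS_A):
    """True if `concept` is-a `ancestor`, directly or indirectly."""
    return ancestor in ancestors(concept, hierarchy)
```

Storing only direct links and deriving the rest keeps the ontology easy to maintain. Queries such as "all cardiovascular drugs" still see every descendant.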
16
to complex ontologies
17
N-Glycosylation metabolic pathway
GNT-I attaches GlcNAc at position 2
18
A little bit about semantic metadata extraction
and annotations
19
Extraction for Metadata Creation
Create/extract as much (semantic) metadata
automatically as possible. Use ontologies to
improve and enhance extraction.
(Figure: extractors producing metadata)
20
Automatic Semantic Metadata Extraction/Annotation
21
Semantic Web in Action
  • Supporting Clinical Decision Making

22
1. Supporting Clinical Decision Making
  • Status: In use today
  • Where: Athens Heart Center
  • What: Use of Semantic Web technologies for
    clinical decision support

23
Operational Since January 2006
24
Active Semantic Electronic Medical Records (ASEMR)
  • Goals:
  • Increase efficiency with decision support
  • formulary, billing, reimbursement
  • real-time chart completion
  • automated linking with billing
  • Reduce Errors, Improve Patient Satisfaction and
    Reporting
  • drug interactions, allergy, insurance
  • Improve Profitability
  • Technologies:
  • Ontologies, semantic annotations and rules
  • Service Oriented Architecture

Thanks: Dr. Agrawal, Dr. Wingeth, and others.
ISWC 2006 paper
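The kind of rule-based check described for ASEMR (drug interactions, allergies) can be sketched as follows. The drug names, classes and interaction pairs are placeholders, not clinical data. This is not the ASEMR implementation.

```python
# Placeholder data for illustration only (not clinical knowledge).
INTERACTING_PAIRS = {frozenset({"drug_a", "drug_b"})}
DRUG_CLASS = {"drug_a": "class_x", "drug_b": "class_y", "drug_c": "class_x"}

def check_order(new_drug, current_drugs, allergy_classes):
    """Return warnings for a new prescription: known pairwise interactions
    with current medications, and an allergy recorded against the new
    drug's class."""
    warnings = []
    for drug in sorted(current_drugs):
        if frozenset({new_drug, drug}) in INTERACTING_PAIRS:
            warnings.append(f"interaction: {new_drug} with {drug}")
    drug_class = DRUG_CLASS.get(new_drug)
    if drug_class is not None and drug_class in allergy_classes:
        warnings.append(f"allergy: {new_drug} belongs to {drug_class}")
    return warnings
```

Recording the allergy against a class rather than a single drug is where the ontology pays off. Every drug classified under that class is caught, including ones added to the formulary later.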
25
ASEMR - Demonstration

26
ASEMR Efficiency
27
Further Opportunity: Clinical and Biomedical Data
  • Health Information Services: Elsevier iConsult
  • Scientific Literature: PubMed, 300 documents
    published online each day
  • User-contributed Content (informal): GeneRIFs
  • NCBI Public Datasets: Genome and Protein DBs, new
    sequences daily
  • Laboratory Data: lab tests, RT-PCR, mass spec
  • Clinical Data: personal health history
Search, browsing, complex query, integration,
workflow, analysis, hypothesis validation,
decision support.
28
Semantic Web in Action
  • Genome and Pathway Information Integration

29
Data Sources
Reactome, KEGG, HumanCyc, Entrez Gene
  • pathway, protein, pmid
GeneOntology, HomoloGene
  • GO ID, HomoloGene ID

30
2. Genome and Pathway Data Integration
  • Status: Completed research
  • Where: NIH/NIDA
  • What: Genome and pathway data integration to
    facilitate understanding the genetic basis of
    nicotine dependence
  • How: Semantic Web technologies (especially RDF,
    OWL, and SPARQL) support information integration
    and make it easy to create semantically
    integrated resources
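A minimal sketch shows why shared identifiers make integration cheap with RDF-style data. Each source is lifted into triples whose subjects use the same gene identifier, so a plain set union acts as the merge. The records, field names and predicates are hypothetical. Real EKoM/BioPAX data would use URIs.

```python
# Hypothetical extracts from two independent sources that identify genes
# by the same gene ID.
PATHWAY_SOURCE = [
    {"gene_id": "G1", "pathway": "pathway_alpha"},
    {"gene_id": "G2", "pathway": "pathway_beta"},
]
GO_SOURCE = [
    {"gene_id": "G1", "go_term": "go_term_1"},
]

def to_triples(records, field, predicate):
    """Lift one source's records into (subject, predicate, object) triples,
    minting the subject identifier from the shared gene ID."""
    return {(f"gene:{r['gene_id']}", predicate, r[field]) for r in records}

def describe(graph, subject):
    """Everything the merged graph says about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

# Merging is a set union because both sources use the same subjects.
graph = (to_triples(PATHWAY_SOURCE, "pathway", "in_pathway")
         | to_triples(GO_SOURCE, "go_term", "has_go_annotation"))
```

`describe(graph, "gene:G1")` now returns facts that came from both sources. No source-specific join code was needed.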

31
Entrez Knowledge Model (EKoM)
BioPAX ontology
32
Gene-Pathway Data Integration (Collaborators: NIDA,
NLM)
  • Biological Significance:
  • Understand the role of genes in nicotine
    addiction
  • Treatment of drug addiction based on genetic
    factors
  • Identify important genes and use them for
    pharmaceutical production

33
Semantic Web in Action
  • Querying Integrated Data Sources

34
3. Querying Integrated Data Sources
  • Status: Completed research
  • Where: NIH
  • What: Querying integrated data sources
  • Enriching data with ontologies for integration,
    querying, and automation
  • Ontologies beyond vocabularies: the power of
    relationships

35
Using Data to Test a Hypothesis
Link between glycosyltransferase activity and
congenital muscular dystrophy?
Glycosyltransferase
Congenital muscular dystrophy
Adapted from Olivier Bodenreider, presentation
at HCLS Workshop, WWW07
36
On the World Wide Web
GeneID 9215 has_associated_disease Congenital
muscular dystrophy, type 1D
GeneID 9215 has_molecular_function
Acetylglucosaminyl-transferase activity
Adapted from Olivier Bodenreider, presentation
at HCLS Workshop, WWW07
37
With the Semantically Enhanced Data
SELECT DISTINCT ?t ?g ?d WHERE {
  ?t is_a GO:0016757 .
  ?g has_molecular_function ?t .
  ?g has_associated_phenotype ?b2 .
  ?b2 has_textual_description ?d .
  FILTER (regex(?d, "muscular dystrophy", "i"))
  FILTER (regex(?d, "congenital", "i"))
}
From medinfo paper. Adapted from Olivier
Bodenreider, presentation at HCLS Workshop, WWW07
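To make the query's semantics concrete, here is a small Python evaluator for the same basic graph pattern and filters. It runs over triples that restate the GeneID 9215 example. It is a sketch, not a SPARQL engine. The identifiers `acetylglucosaminyltransferase_activity` and `phenotype:1` are made up, and the is_a link to GO:0016757 is assumed to be direct (intermediate GO terms are omitted).

```python
import re

# Triples restating the slide's example (hypothetical identifiers).
TRIPLES = [
    ("acetylglucosaminyltransferase_activity", "is_a", "GO:0016757"),
    ("gene:9215", "has_molecular_function", "acetylglucosaminyltransferase_activity"),
    ("gene:9215", "has_associated_phenotype", "phenotype:1"),
    ("phenotype:1", "has_textual_description", "Congenital muscular dystrophy, type 1D"),
]

def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None.
    Terms starting with '?' are variables."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return new

def select(patterns, triples, filters=()):
    """Evaluate a basic graph pattern by joining one triple pattern at a
    time, then keep only bindings that pass every filter."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return [b for b in bindings if all(f(b) for f in filters)]

rows = select(
    [("?t", "is_a", "GO:0016757"),
     ("?g", "has_molecular_function", "?t"),
     ("?g", "has_associated_phenotype", "?b2"),
     ("?b2", "has_textual_description", "?d")],
    TRIPLES,
    filters=[lambda b: re.search("muscular dystrophy", b["?d"], re.I),
             lambda b: re.search("congenital", b["?d"], re.I)],
)
distinct = sorted({(b["?t"], b["?g"], b["?d"]) for b in rows})  # SELECT DISTINCT
```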
38
Semantic Web in Action
  • Knowledge Driven Query Formulation

39
4. Knowledge driven query formulation
  • Status: Research prototype and in progress
  • Workflow with semantic annotation of experimental
    data already in use
  • Where: UGA
  • What: Knowledge-driven query formulation
  • Semantic Problem Solving Environment (PSE) for
    Trypanosoma cruzi (Chagas disease)

40
Knowledge-driven query formulation
  • Complex queries can also include:
  • on-the-fly Web services execution to retrieve
    additional data
  • inference rules to make implicit knowledge
    explicit
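Inference rules that make implicit knowledge explicit can be sketched as forward chaining to a fixpoint. The `located_in` facts below are hypothetical and are not taken from the T. cruzi PSE.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear (a fixpoint)."""
    facts = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        derived -= facts
        if not derived:
            return facts
        facts |= derived

def transitive(predicate):
    """Build a rule: (a p b) and (b p c) imply (a p c)."""
    def rule(facts):
        return {(a, predicate, c)
                for a, p1, b in facts if p1 == predicate
                for b2, p2, c in facts if p2 == predicate and b2 == b}
    return rule

# Hypothetical explicit facts.
FACTS = {
    ("protein_p", "located_in", "organelle_o"),
    ("organelle_o", "located_in", "cell_c"),
    ("cell_c", "located_in", "tissue_t"),
}

inferred = forward_chain(FACTS, [transitive("located_in")])
```

The first pass derives the two-step facts, and the second pass derives `("protein_p", "located_in", "tissue_t")`. A query can then match implicit facts as if they had been stated.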

41
T. cruzi PSE Query Interface
42
N-Glycosylation Process (NGP)
Cell Culture → extract → Glycoprotein Fraction →
proteolysis → Glycopeptides Fraction (1) →
Separation technique I → Glycopeptides Fraction (n)
→ PNGase → Peptide Fraction (n) → Separation
technique II → Peptide Fraction (n×m) → Mass
spectrometry → ms data, ms/ms data → Data
reduction → ms peaklist, ms/ms peaklist
Downstream stages: binning, N-dimensional array,
peptide identification, peptide list, signal
integration, data correlation, glycopeptide
identification and quantification
43
Semantic Web Process to Incorporate Provenance
Semantic Annotation Applications
44
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) Data: ms/ms peaklist data
830.9570 194.9604 2   (parent ion m/z, parent ion abundance, parent ion charge)
580.2985 0.3592       (fragment ion m/z, fragment ion abundance)
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
45
ProPreO: Ontology-mediated provenance
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
Ontological Concepts
Semantically Annotated MS Data
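The step from a raw peak list to an annotated document can be sketched with Python's `xml.etree.ElementTree`. The sketch uses the values and element names shown on the two slides and only writes the XML. Real ProPreO annotation would also link each element to ontology concepts, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Peak list from the slide: first line is parent ion (m/z, abundance,
# charge); each later line is a fragment ion (m/z, abundance).
PEAKLIST = """830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043"""

def annotate_peaklist(text, instrument, mode="ms-ms"):
    """Turn a whitespace-separated peak list into <ms-ms_peak_list> XML."""
    lines = [line.split() for line in text.strip().splitlines()]
    root = ET.Element("ms-ms_peak_list")
    ET.SubElement(root, "parameter", {"instrument": instrument, "mode": mode})
    mz, abundance, charge = lines[0]
    ET.SubElement(root, "parent_ion", {"m-z": mz, "abundance": abundance, "z": charge})
    for mz, abundance in lines[1:]:
        ET.SubElement(root, "fragment_ion", {"m-z": mz, "abundance": abundance})
    return ET.tostring(root, encoding="unicode")

xml_text = annotate_peaklist(
    PEAKLIST, "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer")
```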
46
Semantic Web in Action: Industry Examples
47
  • Zemanta
  • Twine
  • Digger
  • Calais (Thomson Reuters)
  • Powerset
  • Talis

48
Emerging Research Areas
49
Fact Extraction and Schema Creation
  • Knowledge Extraction from Community-Generated
    Content

50
Motivation
  • What will you want to know tomorrow?

51
A Field of Interest
  • Based on a simple domain query, grow a field of
    interest

52
Fact Extraction From Community Content
  • Search helps us find relevant pages/articles
  • But it doesn't answer questions.

53
Fact Extraction From Community Content
  • Fact Extraction is the first step towards
    answering questions.
  • A well-known new company that does fact extraction
    from Wikipedia is Powerset: http://www.powerset.com

54
Fact Extraction From Community Content
  • Problem: without a guiding schema, extracted
    predicates are just terms
  • → useful for humans, but not for machines

55
Fact Extraction From Community Content
  • Expert-created schemas are expensive and usually
    very restricted

56
Fact Extraction From Community Content
  • Solution: Have a community-generated schema
  • → Wikipedia hierarchy for terms and concepts

57
Hierarchy Creation
Query cognition
58
Fact Extraction From Community Content
  • Solution: Have a community-generated schema
  • → Wikipedia hierarchy for terms and concepts
  • See Automatic Domain Model creation
  • → Wikipedia Infoboxes for relationship types

Relationship types
59
Learn Patterns that indicate Relationships
  • in Sydney, New South Wales, Australia
  • Sydney is the most populous city in Australia
  • Canberra, the Australian capital city
  • Canberra is the capital city of the Commonwealth
    of Australia
  • Canberra, the Australian capital

We know that countries have capitals. Which one
is Australia's?
60
Add relationships
61
Fact Extraction From Community Content
  • Different textual patterns give different levels
    of support for different relationship types
  • "Sydney, Australia" gives some support towards the
    capital relationship, because it indicates that
    Sydney is in Australia
  • Can see it as: necessary, but not sufficient

62
Fact Extraction From Community Content
  • Many textual patterns may indicate a
    relationship type

63
Fact Extraction From Community Content
  • The accumulation of many pattern occurrences gives
    the necessary support
  • "Canberra, Australia" → minimal positive support
  • "The Australian capital of Canberra" → additional
    major support
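Accumulating evidence from many pattern occurrences can be sketched as a weighted vote. The regular expressions and weights below are hypothetical stand-ins for learned patterns. The sentences come from the slides.

```python
import re
from collections import defaultdict

# Hypothetical patterns for "X is the capital of Australia" with hand-set
# weights; weak ones only show that X is in Australia.
PATTERNS = [
    (re.compile(r"\b([A-Z][a-z]+), Australia\b"), 0.1),
    (re.compile(r"\b([A-Z][a-z]+) is the most populous city in Australia\b"), 0.1),
    (re.compile(r"\b([A-Z][a-z]+), the Australian capital\b"), 1.0),
    (re.compile(r"\b[Tt]he Australian capital of ([A-Z][a-z]+)\b"), 1.0),
    (re.compile(r"\b([A-Z][a-z]+) is the capital city of (?:the Commonwealth of )?Australia\b"), 1.0),
]

SENTENCES = [
    "in Sydney, New South Wales, Australia",
    "Sydney is the most populous city in Australia",
    "Canberra, the Australian capital city",
    "Canberra is the capital city of the Commonwealth of Australia",
    "Canberra, the Australian capital",
    "Canberra, Australia",
    "The Australian capital of Canberra",
]

def capital_support(sentences, patterns=PATTERNS):
    """Sum the weights of every pattern occurrence per candidate city."""
    support = defaultdict(float)
    for sentence in sentences:
        for regex, weight in patterns:
            for m in regex.finditer(sentence):
                support[m.group(1)] += weight
    return dict(support)
```

"Wales" also picks up a little support from "New South Wales, Australia". That is the necessary but not sufficient effect in action, and a threshold on accumulated support would filter it out.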

64
Summary
  • Create Domain models from seed queries or seed
    concepts
  • Connect the concepts in the created domain models
    with valid relationships
  • Learn pertinent patterns for relationships
  • Find evidence for relationships in text
  • Wikipedia
  • WWW

65
Discovering Undiscovered Knowledge
  • Connecting the Dots

66
How are Harry Potter and Dan Brown related?
67
Motivation
  • Undiscovered Public Knowledge [Swanson 89]
  • Hidden connections in text
  • Our objective: build mechanisms to reveal these
    connections
  • Our approach:
  • Populate existing ontology schemas via
    information extraction from text
  • Use the extracted information to:
  • Support browsing
  • Text retrieval
  • Knowledge discovery

68
Populating existing ontologies via information
extraction
  • Information Extraction:
  • Entities: the artifacts that are being talked
    about
  • Relationships: how these entities are connected

69
Discovering the Undiscovered Knowledge
  • Swanson's discoveries: associations between
    migraine and magnesium [Hearst99]
  • stress is associated with migraines
  • stress can lead to loss of magnesium
  • calcium channel blockers prevent some migraines
  • magnesium is a natural calcium channel blocker
  • spreading cortical depression (SCD) is implicated
    in some migraines
  • high levels of magnesium inhibit SCD
  • migraine patients have high platelet
    aggregability
  • magnesium can suppress platelet aggregability
  • Data sets generated using these entities (marked
    red above) as boolean keyword queries against
    PubMed
  • Bidirectional breadth-first search used to find
    paths in the resulting RDF
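The path search mentioned on this slide can be sketched as a bidirectional breadth-first search. The graph below encodes the listed associations as undirected edges. It is not the RDF generated from PubMed.

```python
from collections import defaultdict

# Associations listed on the slide, treated as undirected edges.
EDGES = [
    ("migraine", "stress"), ("stress", "magnesium"),
    ("migraine", "calcium channel blockers"), ("magnesium", "calcium channel blockers"),
    ("migraine", "spreading cortical depression"), ("magnesium", "spreading cortical depression"),
    ("migraine", "platelet aggregability"), ("magnesium", "platelet aggregability"),
]

GRAPH = defaultdict(set)
for a, b in EDGES:
    GRAPH[a].add(b)
    GRAPH[b].add(a)

def _expand(graph, frontier, parents, other_parents):
    """Expand one BFS layer; stop as soon as we touch the other search."""
    next_frontier = []
    for node in frontier:
        for nbr in graph[node]:
            if nbr not in parents:
                parents[nbr] = node
                if nbr in other_parents:
                    return next_frontier, nbr
                next_frontier.append(nbr)
    return next_frontier, None

def _join(meet, parents_fwd, parents_bwd):
    """Stitch the two half-paths together at the meeting node."""
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = parents_fwd[node]
    path.reverse()
    node = parents_bwd[meet]
    while node is not None:
        path.append(node)
        node = parents_bwd[node]
    return path

def bidirectional_bfs(graph, source, target):
    """Return a shortest path from source to target, or None if none exists."""
    if source == target:
        return [source]
    parents_fwd, parents_bwd = {source: None}, {target: None}
    frontier_fwd, frontier_bwd = [source], [target]
    while frontier_fwd and frontier_bwd:
        if len(frontier_fwd) <= len(frontier_bwd):  # grow the smaller side
            frontier_fwd, meet = _expand(graph, frontier_fwd, parents_fwd, parents_bwd)
        else:
            frontier_bwd, meet = _expand(graph, frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            return _join(meet, parents_fwd, parents_bwd)
    return None
```

Growing the smaller frontier first keeps both searches shallow. That is the usual reason to prefer the bidirectional variant on large, densely linked graphs.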

70
(No Transcript)
71
Background Knowledge Used
  • UMLS: a high-level schema of the biomedical
    domain
  • 136 classes and 49 relationships
  • Synonyms of all relationships using variant
    lookup (tools from NLM)
  • 49 relationships + their synonyms: 350, mostly
    verbs
  • MeSH
  • 22,000 topics organized as a forest of 16 trees
  • Used to query PubMed
  • PubMed
  • Over 16 million abstracts
  • Abstracts annotated with one or more MeSH terms

T147: effect, induce, etiology, cause, effecting,
induced
72
Method: Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
  • Entities (MeSH terms) in sentences occur in
    modified forms
  • "adenomatous" modifies "hyperplasia"
  • "An excessive endogenous or exogenous
    stimulation" modifies "estrogen"
  • Entities can also occur as composites of 2 or
    more other entities
  • "adenomatous hyperplasia" and "endometrium"
    occur as "adenomatous hyperplasia of the
    endometrium"

(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous)) (NN stimulation))
            (PP (IN by) (NP (NN estrogen))))
        (VP (VBZ induces)
            (NP (NP (JJ adenomatous) (NN hyperplasia))
                (PP (IN of) (NP (DT the) (NN endometrium)))))))
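Extracting a relationship from this parse can be sketched by reading the subject NP, the verb and the object NP off the tree. This is a simplification, not how SS-Parser output was actually processed. The naive lemmatizer is hypothetical, and the T147 label and its variants are taken from the slide on background knowledge.

```python
import re

TREE = ("(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
        "(JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) "
        "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) "
        "(PP (IN of) (NP (DT the) (NN endometrium)))))))")

def parse(text):
    """Parse a bracketed tree into (label, children); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return label, children

    return node()

def words(n):
    """Leaf words under a node, left to right."""
    return [n] if isinstance(n, str) else [w for c in n[1] for w in words(c)]

def first_child(n, prefix):
    """First child subtree whose label starts with `prefix`."""
    return next(c for c in n[1] if not isinstance(c, str) and c[0].startswith(prefix))

def subject_verb_object(text):
    s = first_child(parse(text), "S")
    vp = first_child(s, "VP")
    return (" ".join(words(first_child(s, "NP"))),
            " ".join(words(first_child(vp, "VB"))),
            " ".join(words(first_child(vp, "NP"))))

# Relationship variants listed under T147 on the slide.
RELATION_VARIANTS = {v: "T147" for v in
                     ["effect", "induce", "etiology", "cause", "effecting", "induced"]}

def relation_for(verb):
    lemma = verb[:-1] if verb.endswith("s") else verb  # naive: "induces" -> "induce"
    return RELATION_VARIANTS.get(verb) or RELATION_VARIANTS.get(lemma)
```

The subject and object strings would still need to be mapped to MeSH entities, which is where the modifier and composite handling above comes in.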
73
(No Transcript)
74
Utilizing Extracted Knowledge
  • Supporting browsing, querying and knowledge
    discovery
  • Semantic Browser
  • Query semi-structured representations
  • SPARQL
  • Hypothesis-Driven Retrieval
  • Discover complex connection patterns
  • Knowledge Discovery operators

75
The projected future of research in Biology
  • From
  • Hypothesis-driven wet-lab experiments
  • To
  • Data-driven reduction/pruning of the hypothesis
    space, leading to new insight and possibly
    discovery

76
Example - Evaluating Hypotheses
Keyword query: Migraine[MH], Magnesium[MH]
PubMed
77
Example - Semantic Browser

78
Web 2.0
  • Man Meets Machine

79
Putting the man back in Semantics
  • Semantic Web focuses on artificial agents
  • "Web 2.0 is made of people" (Ross Mayfield)
  • "Web 2.0 is about systems that harness collective
    intelligence." (Tim O'Reilly)
  • The relationship web combines the skills of
    humans and machines

81
Going places
Formal
Powerful
Social, Informal
Implicit
82
A Community's Pulse
  • Wealth of information available in blogs, social
    networks, chats, etc.
  • Free medium of self-expression makes mass
    opinions / interests available
  • Polling for popular culture opinions is easier
  • Social production undeniably affects markets
  • Results of analysis can be more effectively
    tailored to specific audiences: geo-specific
    retail ads, demographic interests in music

83
Buzz on MySpace
  • Mining artist popularity from chatter on MySpace
  • Lists close to listeners' preferences
  • vs.
  • Billboard

84
The How
  • Metadata Extraction from Comments
  • Artist and track names in comments are common
    words
  • "Keep your smile on Lil." (Artist: Lily Allen,
    Track: Smile)
  • Necessitates a combination of linguistic,
    statistical, domain knowledge and domain-specific
    rules to do well
  • Detecting and discarding spam
  • Accurate popularity estimates
  • Transliterating slang
  • I say: "Your music is wicked"
  • What I really mean: "Your music is good"
  • Hypercube: demographics of users who post,
    non-spam positive and negative sentiment comment
    counts
  • Lets one ask questions like: Who is the most
    popular artist among 19-year-olds in New
    York?
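Here is a minimal sketch of the hypercube idea. It assumes comments have already been resolved to artists and classified for sentiment, with spam flagged. The records are invented, and ranking artists by positive minus negative counts is an assumed scoring choice.

```python
from collections import Counter

# Hypothetical processed comments: (age, city, artist, sentiment, is_spam)
COMMENTS = [
    (19, "New York", "Artist A", "positive", False),
    (19, "New York", "Artist A", "positive", False),
    (19, "New York", "Artist A", "positive", False),
    (19, "New York", "Artist A", "negative", False),
    (19, "New York", "Artist B", "positive", False),
    (19, "New York", "Artist B", "positive", True),
    (19, "New York", "Artist B", "positive", True),
    (19, "New York", "Artist B", "positive", True),
    (20, "New York", "Artist C", "positive", False),
]

def build_cube(comments):
    """Count non-spam comments per (age, city, artist, sentiment) cell."""
    return Counter((age, city, artist, sentiment)
                   for age, city, artist, sentiment, is_spam in comments
                   if not is_spam)

def most_popular(cube, age, city):
    """Slice the cube on age and city, then rank artists by positive
    minus negative comment counts."""
    score = Counter()
    for (a, c, artist, sentiment), n in cube.items():
        if a == age and c == city:
            score[artist] += n if sentiment == "positive" else -n
    return score.most_common(1)[0][0] if score else None
```

`most_popular(build_cube(COMMENTS), 19, "New York")` picks Artist A. Counting the three spam comments would have put Artist B first, which is why spam detection matters for accurate popularity estimates.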

85
Opportunities
  • Casual text is more and more pervasive
  • Extracting semantic metadata: a whole different
    problem
  • What works for news articles and scientific
    literature does not work well for content that
    does not follow the rules of edited text
  • Need to systematically understand the differences
    between these types of text in order to improve
    enablers like entity extraction

86
Event Web and the Semantic Sensor Web
  • Time, Space and Theme

87
Events: Spatial, Temporal and Thematic
Spatial
Temporal
Thematic
88
Events and STT Dimensions
  • Powerful mechanism to integrate content
  • Describes Real-World occurrences
  • Can have video, images, text, audio (same event)
  • Search and Index based on events and STT
    relations
  • Many relationship types
  • Spatial
  • What events happened near this event?
  • What entities/organizations are located nearby?
  • Temporal
  • What events happened before/after/during this
    event?
  • Thematic
  • What is happening?
  • Who is involved?

Going further: Can we use Who? Where? What?
Why? When? How? Use integrated STT
analysis to explore cause and effect
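Spatial, temporal and thematic questions like these can be sketched over a simple event record. The events, coordinates and participant names are hypothetical. Distance uses the haversine formula with a mean Earth radius of 6371 km, and the temporal tests are simplified interval relations.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start: float            # hypothetical time units (e.g. hours)
    end: float
    lat: float              # degrees
    lon: float              # degrees
    participants: frozenset

EARTH_RADIUS_KM = 6371.0

def km_between(a, b):
    """Great-circle distance between two events (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def before(a, b):               # temporal: a ends before b starts
    return a.end < b.start

def during(a, b):               # temporal: a lies within b (inclusive bounds)
    return b.start <= a.start and a.end <= b.end

def near(a, b, km=10.0):        # spatial: within a distance threshold
    return km_between(a, b) <= km

def shared_participants(a, b):  # thematic: who is involved in both
    return a.participants & b.participants

E1 = Event("E1", 0, 2, -37.81, 144.96, frozenset({"unit_8"}))
E2 = Event("E2", 1, 1.5, -37.82, 144.97, frozenset({"unit_8", "soldier_1"}))
E3 = Event("E3", 5, 6, -34.93, 138.60, frozenset({"soldier_3"}))
```

Combining the three tests, for example events that are near E1, during E1 and share a participant with it, is the kind of integrated STT query the slide points toward.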
89
Events and STT Dimensions
(Figure: example STT graph. Soldiers E1, E2 and E3
are dynamic entities; addresses E4 and E6 are named
places; battles E5 and E7 are spatial occurrents.
Residency: Soldier lives_at Address, Address
located_at a region of a georeferenced coordinate
space. Battle Participation: Soldier assigned_to
Military_Unit E8, Military_Unit participates_in
Battle, Battle occurred_at a spatial region.)
90
Scenario: Sensor Data Fusion and Analysis
High-level Sensor
Low-level Sensor
  • How do we determine if the three images depict
  • the same time and same place?
  • the same entity?
  • a serious threat?

91
Data Pyramid
Sensor Data Pyramid (expressiveness increases from
data to knowledge)
  • Knowledge: Ontology Metadata
  • Information: Entity Metadata, Feature Metadata
  • Data: Raw Sensor (Phenomenological) Data
92
What is Sensor Web Enablement?
http://www.opengeospatial.org/projects/groups/sensorweb
93
SWE Components: Languages
  • Information Model for Observations and Sensing:
    Observations & Measurements (O&M)
  • Sensor and Processing Description Language:
    SensorML (SML)
  • Common Model for Geographical Information:
    GeographyML (GML)
  • Real Time Streaming Protocol
Sam Bacharach, GML by OGC to AIXM 5 UGM, OGC,
Feb. 27, 2007.
94
SWE Components: Web Services
  • Sensor Observation Service: access sensor
    description and data
  • Sensor Planning Service: command and task sensor
    systems
  • Discover services, sensors, providers, data
  • Sensor Alert Service: dispatch sensor alerts to
    registered users
  • Accessible from various types of clients, from
    PDAs and cell phones to high-end workstations
Sam Bacharach, GML by OGC to AIXM 5 UGM, OGC,
Feb. 27, 2007.
95
Semantic Sensor Web
96
Data-to-Knowledge Architecture
  • Knowledge
  • Object-Event Relations
  • Spatiotemporal Associations
  • Provenance/Context

Data Storage (Raw Data, XML, RDF)
Semantic Analysis and Query
  • Information
  • Entity Metadata
  • Feature Metadata

Feature Extraction and Entity Detection
Semantic Annotation
  • Data
  • Raw Phenomenological Data

Sensor Data Collection
Ontologies
  • Space Ontology
  • Time Ontology
  • Domain Ontology

97
Semantic Sensor Observation Service
  • S-SOS Client: HTTP-GET request; OM-S or SML-S
    response
  • Service operations: Get Capabilities, Describe
    Sensor, Get Observation
  • Collect Sensor Data: BuckeyeTraffic.org, Oracle
    SensorDB
  • Semantic Annotation Service: SWE → Annotated SWE
  • Ontology, Rules: Weather, Time, Space
98
Standards Organizations
  • W3C Semantic Web: Resource Description Framework,
    RDF Schema, Web Ontology Language, Semantic Web
    Rule Language
  • Web Services: Web Services Description Language,
    REST
  • OGC Sensor Web Enablement: SensorML, O&M,
    TransducerML, GeographyML
  • National Institute of Standards and Technology:
    Semantic Interoperability Community of Practice,
    Sensor Standards Harmonization
  • Semantic annotations: SAWSDL, SA-REST, SML-S,
    OM-S, TML-S
  • Sensor Ontology
SAWSDL is now a W3C Recommendation
99
Current Research - STT Relationship Analysis
  • Modeling spatial and temporal data using SW
    standards (RDF(S)) [1]
  • Upper-level ontology integrating thematic and
    spatial dimensions
  • Use Temporal RDF [3] to encode temporal properties
    of relationships
  • Graph pattern queries over spatial and temporal
    RDF data [2]
  • Extended ORDBMS to store and query spatial and
    temporal RDF
  • User-defined functions for graph pattern queries
    involving spatial variables and spatial and
    temporal predicates
  • Implementation of temporal RDFS inferencing
  • Extended SPARQL for STT queries
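The core idea of Temporal RDF, triples labeled with the time during which they hold, can be sketched with inclusive year intervals. The triples echo the soldier/unit/battle example but are hypothetical. The chaining rule is an illustrative temporal inference that intersects intervals. It is not the temporal RDFS rule set or the ORDBMS implementation described here.

```python
from typing import NamedTuple

class TTriple(NamedTuple):
    s: str
    p: str
    o: str
    start: int   # first year the triple holds (inclusive)
    end: int     # last year the triple holds (inclusive)

# Hypothetical temporal triples.
GRAPH = [
    TTriple("soldier_1", "assigned_to", "unit_8", 1940, 1943),
    TTriple("soldier_1", "assigned_to", "unit_9", 1944, 1945),
    TTriple("unit_8", "participates_in", "battle_5", 1942, 1942),
    TTriple("unit_9", "participates_in", "battle_7", 1942, 1942),
]

def snapshot(graph, year):
    """The plain (non-temporal) graph that holds in a given year."""
    return {(t.s, t.p, t.o) for t in graph if t.start <= year <= t.end}

def chain(graph, p1, p2, p_new):
    """Derive (a p_new c) from (a p1 b) and (b p2 c), valid only on the
    intersection of the two intervals; disjoint intervals derive nothing."""
    derived = []
    for x in graph:
        if x.p != p1:
            continue
        for y in graph:
            if y.p == p2 and y.s == x.o:
                start, end = max(x.start, y.start), min(x.end, y.end)
                if start <= end:
                    derived.append(TTriple(x.s, p_new, y.o, start, end))
    return derived
```

Here soldier_1 is inferred to participate in battle_5 in 1942 but not in battle_7. unit_9's battle predates soldier_1's assignment to it.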

100
Conclusion
101
Take Home Points
  • Semantics: from documents, to entities, to
    relationships
  • Richer, meaningful representations offer more
    insight and powerful reasoning capabilities
  • Semantics and Web technologies for integration
    of information from disparate sources, often
    created for very different purposes, with less
    human involvement
  • Semantic Web is highly interdisciplinary: uses
    IR, AI, KR, DB, DC, ...
  • Increasing mesh of semantics, services and people
    for better exploitation of resources (data,
    sensors, services, people)

102
Kno.e.sis Labs (3rd floor, Joshi)
Semantic Sciences Lab (Dr Sheth)
Bioinformatics Lab (Dr Raymer)
Semantic Web Lab (Dr Sheth & Dr S. Wang)
Service Research Lab (Dr Sheth)
Metadata and Languages Lab (Dr Prasad)
Data Mining Lab (Dr Dong)
Sensor Networking (Bin Wang)
Joint Proposals With Each
103
Kno.e.sis Members (a subset)
104
References
  • Projects: http://knoesis.org/research/
  • Demos: http://knoesis.wright.edu/library/demos/
  • Publications: http://knoesis.wright.edu/library
  • Rest: http://knoesis.org
  • Thanks to our key sponsors: National Science
    Foundation, National Institutes of Health, AFRL
    and industry partners.