After OWL: defacto standards - PowerPoint PPT Presentation

About This Presentation
Title:

After OWL: defacto standards

Description:

Kalina Bontcheva, Valentin Tablan, Diana Maynard, Wim Peters, Niraj ... Acronym soup: GATE: HLT API 4 SDK SW & KT. An application: Ontology-Based IE in KIM ... – PowerPoint PPT presentation

Slides: 25
Provided by: ham48

Transcript and Presenter's Notes



1
After OWL: de facto standards for semantic technologies
(or what do you get for €40m EU research money?)
http://gate.ac.uk/  http://nlp.shef.ac.uk/
Hamish Cunningham, Kalina Bontcheva, Valentin Tablan,
Diana Maynard, Wim Peters, Niraj Aswani, Milena
Yankova, Yaoyong Li, Akshay Java, Michael Dowman
ILASH workshop, March 2004
2
Structure of the talk
  • Context:
  • increasing use of semantic technology in IT
  • the role(s) of human language technology
  • substantial investment in the next phase of
    semantic web research
  • Semantic Web moving on from formal standards
  • Acronym soup:
  • GATE: HLT API 4 SDK SW & KT
  • An application: Ontology-Based IE in KIM
  • Issues in API design, next steps

3
The Knowledge Economy and Human Language
  • Gartner, December 2002:
  • taxonomic and hierarchical knowledge mapping and
    indexing will be prevalent in almost all
    information-rich applications
  • through 2012 more than 95% of human-to-computer
    information input will involve textual language
  • A contradiction:
  • to deal with the information deluge we need
    formal knowledge in semantics-based systems
  • our information spaces are in informal and
    ambiguous natural language
  • The challenge: to reconcile these two phenomena

4
HLT: Closing the Loop
KEY: MNLG = Multilingual Natural Language Generation;
OIE = Ontology-aware Information Extraction;
AIE = Adaptive IE; CLIE = Controlled Language IE
[Diagram: a loop linking Human Language and Formal Knowledge (ontologies and instance bases) through (M)NLG, OIE, (A)IE and CLIE (via Controlled Language); also shown: Semantic Web, Semantic Grid, Semantic Web Services]
5
SEKT: Semantic Knowledge Technology
  • 6th Framework IP project
  • Duration: 36 months from 1/1/04, €12.5m
  • http://sekt.semanticweb.org/
  • Improve automation of ontology and metadata
    generation
  • Develop highly-scalable solutions
  • Research sound inferencing despite inconsistent
    models
  • Develop semantic knowledge access tools
  • Develop methodology for deployment

6
PrestoSpace (20th Century Rot)
  • 20th Century audio-visual media is rapidly
    disappearing
  • Preservation and restoration are high cost
  • The costs must be justified by increased access
  • Metadata: descriptive information about content
  • PrestoSpace (€9m IP, 40 months from 02/04):
  • rich metadata and semantic access
  • cross-lingual access
  • syndicated delivery
  • repurposeable content

7
The SDK research cluster
  • Building the European Research Area in KM
    through collaboration with related IP and NoE
    projects in this area for a coordinated impact
    strategy
  • SEKT, DIP, KnowledgeWeb: SDK cluster,
    http://sdk.semanticweb.org/
  • Other related projects:
  • AceMedia IP (semantic knowledge systems)
  • PrestoSpace IP (cultural heritage / digital
    libraries)
  • BRICKS IP (cultural heritage / digital libraries)
  • Total EU/6FP investment in semantic tech.
    research: €40m, with the potential to influence
    the emergence of de facto standards

8
Next step for semantic tech: from formal to
de facto standards?
  • Computer scientists love standards, so we have
    many
  • For any given problem there are usually 3
    standards
  • OWL is no exception: Lite, DL, Full
  • There are good reasons, but cf. RDF(S)
    implementation history: applications will of
    necessity mix and match
  • If we can achieve standard practice and libraries
    in applications, we will have taken the next step
    and will promote take-up
  • (Pathological) example: TCP/IP vs. OSI

9
HLT API 4 SDK SW & KT
  • What sorts of software do we need?
  • Ontology and metadata management: storage,
    versioning, caching, inferencing, etc. (below)
  • Human language technology components and services
    (not monolithic systems, not unproven research
    prototypes)
  • The role of measurement in scaling and
    robustness: in HLT this means MUC, TREC, ACE,
    TIDES, ...
  • Here's one we baked earlier...

10
GATE (the Volkswagen Beetle of Language
Processing) is:
  • Eight years old, with the largest user
    constituency of its type
  • An architecture: a macro-level organisational
    picture for LE software systems.
  • A framework: for programmers, GATE is an
    object-oriented class library that implements the
    architecture.
  • A development environment: for language engineers,
    computational linguists et al., a graphical
    development environment.
  • Some free components... ...and wrappers for other
    people's components
  • Tools for: evaluation, visualise/edit,
    persistence, IR, IE, dialogue, ontologies, etc.
  • Free software (LGPL). Download at
    http://gate.ac.uk/download/

11
Critical mass: 000s of people, 00s of sites
  • GATE users: a significant proportion of the
    community. A small sample:
  • the American National Corpus project
  • the Perseus Digital Library project, Tufts
    University, US
  • Longman Pearson publishing, UK
  • Merck KGaA, Germany
  • Canon Europe, UK
  • Knight Ridder, US
  • BBN (leading HLT research lab), US
  • SMEs: Melandra, SG-MediaStyle, ...
  • Imperial College, London, the University of
    Manchester, UMIST, the University of Karlsruhe,
    Vassar College, the University of Southern
    California and a large number of other UK, US and
    EU Universities
  • UK and EU projects inc. MyGrid, CLEF, dotkom,
    AMITIES, CubReporter, Poesia...
  • GATE team projects. Past:
  • Conceptual indexing: MUMIS, automatic semantic
    indices for sports video
  • MUSE: cross-genre entity finder
  • HSL: health-and-safety IE
  • Old Bailey: collaboration with HRI on 17th
    century court reports
  • Multiflora: plant taxonomy text analysis for
    biodiversity research e-science
  • EMILLE: S. Asian language corpus
  • ACE / TIDES: Arabic, Chinese NE
  • JHU summer workshop on semtagging
  • Present:
  • Advanced Knowledge Technologies: 12m UK five-site
    collaborative project
  • ETCSL: Sumerian digital library
  • MiAKT: medical informatics / AKT
  • SEKT: Semantic Knowledge Tech
  • PrestoSpace: AV Preservation
  • KnowledgeWeb, h-TechSight

12
  • Architectural principles:
  • Non-prescriptive, theory neutral (strength and
    weakness)
  • Re-use, interoperation, not reimplementation
    (e.g. diverse XML support, integration of
    Protégé, Jena, Weka, interoperation with SCHUG in
    MUMIS)
  • (Almost) everything is a component, and component
    sets are user-extendable
  • (Almost) all operations are available both from
    API and GUI
  • Why does this matter? It means that GATE works
    well with other tools, embeds easily, and
    achieves robustness through focus (API
    requirements)

13
All the world's a Java Bean...
  • CREOLE: a Collection of REusable Objects for
    Language Engineering
  • GATE components: modified Java Beans with XML
    configuration
  • The minimal component: 10 lines of Java, 10
    lines of XML, 1 URL
  • Why bother?
  • Allows the system to load arbitrary language
    processing components
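To make "load arbitrary components from configuration" concrete, here is a minimal Python sketch of the pattern. It is not GATE's Java API or its creole.xml schema: the XML element names are only loosely modelled on a CREOLE-style entry, and `REGISTRY`, `component`, `WhitespaceTokeniser` and `load_component` are hypothetical names. A registry lookup stands in for Java class loading from a plugin URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: stands in for Java class loading from a plugin URL.
REGISTRY = {}

def component(cls):
    """Class decorator that registers a component under its class name."""
    REGISTRY[cls.__name__] = cls
    return cls

@component
class WhitespaceTokeniser:
    """A toy 'minimal component': splits text on whitespace."""

    def __init__(self, lowercase=False):
        self.lowercase = lowercase

    def execute(self, text):
        tokens = text.split()
        return [t.lower() for t in tokens] if self.lowercase else tokens

# Descriptor loosely modelled on a CREOLE-style XML entry (not the real schema).
CONFIG = """
<RESOURCE>
  <NAME>Simple tokeniser</NAME>
  <CLASS>WhitespaceTokeniser</CLASS>
  <PARAMETER NAME="lowercase">true</PARAMETER>
</RESOURCE>
"""

def load_component(xml_text):
    """Instantiate whichever registered class the descriptor names."""
    root = ET.fromstring(xml_text.strip())
    cls = REGISTRY[root.findtext("CLASS")]
    params = {}
    for param in root.findall("PARAMETER"):
        value = (param.text or "").strip()
        # Minimal type coercion: "true"/"false" become booleans, all else stays str.
        params[param.get("NAME")] = value == "true" if value in ("true", "false") else value
    return cls(**params)

tokeniser = load_component(CONFIG)
print(tokeniser.execute("Gordon Brown met George Bush"))
# ['gordon', 'brown', 'met', 'george', 'bush']
```

The loader never names `WhitespaceTokeniser` directly; adding a new component only means registering another class and writing another descriptor.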

14
Language Resource Layer (LRs)
[Diagram: Web Services; GATE APIs; LRs: Ontology, Protégé Ontology, WordNet, Gazetteers, ...]
  • NOTES (2):
  • e.g. Protégé LR and VR both wrapped in the
    Resource (bean) API
  • ontology repositories and inference: the same
    applies (KAON, Sesame, Orenge?)
  • NOTES
  • everything is a replaceable bean
  • all communication via fixed APIs
  • low coupling, high modularity, high
    extensibility
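The "replaceable bean behind a fixed API" idea can be sketched as an abstract interface with two interchangeable backends. Everything here (`OntologyLR`, `DictOntology`, `TripleOntology`, `count_subclasses`) is hypothetical Python, not GATE's ontology API; the triple backend additionally assumes an acyclic hierarchy.

```python
from abc import ABC, abstractmethod

class OntologyLR(ABC):
    """Hypothetical fixed API shared by every ontology language resource."""

    @abstractmethod
    def subclasses(self, cls):
        """Return all (transitive) subclasses of `cls` as a set."""

class DictOntology(OntologyLR):
    """Backend 1: maps each parent class to a list of its direct children."""

    def __init__(self, children):
        self.children = children

    def subclasses(self, cls):
        result, stack = set(), list(self.children.get(cls, []))
        while stack:
            child = stack.pop()
            if child not in result:
                result.add(child)
                stack.extend(self.children.get(child, []))
        return result

class TripleOntology(OntologyLR):
    """Backend 2: (child, "subClassOf", parent) triples; assumes no cycles."""

    def __init__(self, triples):
        self.triples = triples

    def subclasses(self, cls):
        direct = {s for s, p, o in self.triples if p == "subClassOf" and o == cls}
        result = set(direct)
        for child in direct:
            result |= self.subclasses(child)
        return result

def count_subclasses(onto, cls):
    # Client code: depends only on the OntologyLR API, never on a backend class.
    return len(onto.subclasses(cls))

dict_onto = DictOntology({"Entity": ["Location", "Company"],
                          "Location": ["City", "Country"]})
triple_onto = TripleOntology({("Location", "subClassOf", "Entity"),
                              ("Company", "subClassOf", "Entity"),
                              ("City", "subClassOf", "Location"),
                              ("Country", "subClassOf", "Location")})
print(count_subclasses(dict_onto, "Entity"), count_subclasses(triple_onto, "Entity"))
# 4 4
```

Because the client function only sees `OntologyLR`, either backend can be swapped in without touching it, which is the low-coupling point made in the notes.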

15
Issues (1): a common HLT API
  • OGSA, WSMO in the web services layer?
  • Eclipse: less code for us, more services for
    users? (A free OWL/UML drawing tool, for example)
  • ISO TC37/SC4, JNLE special issue, LIRICS consortium

16
API application: Ontology-based IE
XYZ was established on 03 November 1978 in
London. It opened a plant in Bulgaria in
[Figure: Ontology and KB. Classes: Location, Company, City, Country. Relations: HQ, partOf, establOn (03/11/1978); instances linked to their classes by type]
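A minimal sketch of the kind of KB the figure depicts, written as (subject, predicate, object) triples in Python. The predicate names follow the slide (`type`, `HQ`, `partOf`, `establOn`); the triple `London partOf UK` and the helper functions are assumptions added for illustration, since the example sentence is cut off. The query follows `partOf` links transitively to find the country of the company's headquarters.

```python
# Toy KB of (subject, predicate, object) triples for the XYZ example.
KB = {
    ("XYZ", "type", "Company"),
    ("XYZ", "HQ", "London"),
    ("XYZ", "establOn", "1978-11-03"),
    ("London", "type", "City"),
    ("London", "partOf", "UK"),  # assumption: not stated in the example text
    ("UK", "type", "Country"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the KB."""
    return {o for s, p, o in KB if s == subject and p == predicate}

def containing(place):
    """Every place reachable from `place` by following partOf links."""
    seen, frontier = set(), [place]
    while frontier:
        for parent in objects(frontier.pop(), "partOf"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen

def hq_country(company):
    """Country in which the company's HQ lies, or None if the KB cannot say."""
    for hq in objects(company, "HQ"):
        for place in {hq} | containing(hq):
            if "Country" in objects(place, "type"):
                return place
    return None

print(hq_country("XYZ"))  # UK
```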
17
Classes, instances & metadata
Gordon Brown met George Bush during his two day
visit.
<metadata>
  <DOC-ID>http:// 1.html</DOC-ID>
  <Annotation>
    <s_offset> 0 </s_offset>
    <e_offset> 12 </e_offset>
    <string>Gordon Brown</string>
    <class>Person</class>
    <inst>Person12345</inst>
  </Annotation>
  <Annotation>
    <s_offset> 18 </s_offset>
    <e_offset> 32 </e_offset>
    <string>George Bush</string>
    <class>Person</class>
    <inst>Person67890</inst>
  </Annotation>
</metadata>
[Figure: classes and instances before and after annotation (Bush)]
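The annotation record can be generated programmatically; the sketch below uses Python's `xml.etree.ElementTree` to emit the same element names as the slide. Offsets are computed with `str.find` as zero-based, end-exclusive character positions, a convention that reproduces the 0 to 12 span for "Gordon Brown"; the slide does not state its convention, so other spans may not match its figures. The document id `doc1.html` is a placeholder, and only the first occurrence of each string is annotated.

```python
import xml.etree.ElementTree as ET

def annotate(doc_id, text, mentions):
    """Build a <metadata> element with one <Annotation> per (string, class, inst)."""
    root = ET.Element("metadata")
    ET.SubElement(root, "DOC-ID").text = doc_id
    for string, cls, inst in mentions:
        start = text.find(string)  # first occurrence only
        if start < 0:
            continue  # mention not present in the text: skip it
        ann = ET.SubElement(root, "Annotation")
        ET.SubElement(ann, "s_offset").text = str(start)
        ET.SubElement(ann, "e_offset").text = str(start + len(string))
        ET.SubElement(ann, "string").text = string
        ET.SubElement(ann, "class").text = cls
        ET.SubElement(ann, "inst").text = inst
    return root

sentence = "Gordon Brown met George Bush during his two day visit."
meta = annotate("doc1.html", sentence, [
    ("Gordon Brown", "Person", "Person12345"),
    ("George Bush", "Person", "Person67890"),
])
print(ET.tostring(meta, encoding="unicode"))
```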
18
OBIE in KIM
  • An ontology (KIMO) and a KB of 200K instances
  • High ambiguity of instances with the same label:
    uses a disambiguation step
  • Lookup phase marks mentions from the ontology
  • Combined with GATE-based IE system to recognise
    new instances of concepts and relations
  • KB enrichment stage where some of these new
    instances are added to the KB
  • Disambiguation uses an Entity Ranking algorithm,
    i.e., priority ordering of entities with the same
    label based on corpus statistics (e.g., Paris)

Popov et al., KIM, ISWC 2003
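The lookup-then-rank idea can be sketched as follows. This is not KIM's implementation: it is a simple frequency-based ranking in the spirit of the Entity Ranking step described above, with a hypothetical label index and made-up corpus counts, and it only matches single-token labels.

```python
# Hypothetical label index: surface label -> candidate KB instances.
LABELS = {
    "Paris": ["Paris_France", "Paris_Texas"],
    "London": ["London_UK", "London_Ontario"],
}

# Hypothetical corpus statistics: how often each instance has been seen.
CORPUS_COUNTS = {"Paris_France": 950, "Paris_Texas": 12,
                 "London_UK": 1200, "London_Ontario": 40}

def lookup(tokens):
    """Lookup phase: mark every token that matches a known label."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in LABELS]

def rank(label):
    """Order a label's candidate instances by corpus frequency, highest first."""
    return sorted(LABELS[label], key=lambda inst: CORPUS_COUNTS.get(inst, 0),
                  reverse=True)

def disambiguate(text):
    """Resolve each matched label to its top-ranked instance."""
    tokens = text.replace(".", " ").split()
    return {label: rank(label)[0] for _, label in lookup(tokens)}

print(disambiguate("She flew from London to Paris."))
# {'London': 'London_UK', 'Paris': 'Paris_France'}
```

A fixed priority order like this always picks the same instance for a label; context-sensitive evidence would be needed to ever choose Paris, Texas.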
19
OBIE in KIM (2)
Popov et al., KIM, ISWC 2003
20
KIM demo...
Next steps in OBIE
  • Continue to exploit the pluggability and
    community effects of GATE (and Sesame, Lucene,
    ...)
  • SWAN: Semantic Web Annotator at DERI/Galway
  • Syndication
  • Social networking
  • Evaluation (below)

21
(The P in OLP) Challenge: Evaluating Richer NE
Tagging
  • Need for new metrics when evaluating
    hierarchy/ontology-based NE tagging
  • Need to take into account distance in the
    hierarchy
  • Tagging a company as a charity is less wrong than
    tagging it as a person
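One simple way to give partial credit by hierarchy distance is to count the edges between the gold and predicted classes through their lowest common ancestor and score 1/(1 + distance). This is an illustrative choice, not the metric proposed in the talk, and the small class hierarchy is hypothetical.

```python
# Hypothetical class hierarchy: class -> parent (None marks the root).
PARENT = {
    "Entity": None,
    "Person": "Entity",
    "Organisation": "Entity",
    "Company": "Organisation",
    "Charity": "Organisation",
}

def ancestors(cls):
    """Path from cls up to the root, cls included."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path

def distance(a, b):
    """Edges between a and b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(c for c in up_a if c in up_b)
    return up_a.index(common) + up_b.index(common)

def credit(gold, predicted):
    """1.0 for an exact match, shrinking as the classes move apart."""
    return 1.0 / (1 + distance(gold, predicted))

print(credit("Company", "Company"))  # 1.0
print(credit("Company", "Charity"))  # 0.3333333333333333
print(credit("Company", "Person"))   # 0.25
```

Tagging a company as a charity (two edges apart) thus scores higher than tagging it as a person (three edges apart), matching the intuition on the slide.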

22
SW IE Evaluation tasks
  • Detection of entities and events, given a target
    ontology of the domain.
  • Disambiguation of the entities and events from
    the documents with respect to instances in the
    given ontology. For example, measuring whether
    the IE system correctly resolved "Cambridge" in
    the text to the right instance: Cambridge, UK vs.
    Cambridge, MA.
  • Deciding when a new instance needs to be added to
    the ontology, because the text contains an
    instance that does not already exist in the
    ontology.
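A sketch of scoring the second and third tasks against a gold standard, assuming each mention has already been aligned to a gold decision that is either an existing instance id or a "new instance" marker. Mention ids, instance ids and metric names are hypothetical; detection (the first task) is not covered.

```python
NEW = "NEW"  # marker: this mention needs a new instance in the ontology

def score(gold, system):
    """Both dicts map mention id -> instance id or NEW."""
    linked = [m for m in gold if gold[m] != NEW]
    correct_links = sum(system.get(m) == gold[m] for m in linked)
    predicted_new = {m for m, v in system.items() if v == NEW}
    gold_new = {m for m, v in gold.items() if v == NEW}
    tp = len(predicted_new & gold_new)
    return {
        "link_accuracy": correct_links / len(linked) if linked else 0.0,
        "new_precision": tp / len(predicted_new) if predicted_new else 0.0,
        "new_recall": tp / len(gold_new) if gold_new else 0.0,
    }

gold = {"m1": "Cambridge_UK", "m2": "Cambridge_MA", "m3": NEW}
system = {"m1": "Cambridge_UK", "m2": "Cambridge_UK", "m3": NEW}
print(score(gold, system))
# {'link_accuracy': 0.5, 'new_precision': 1.0, 'new_recall': 1.0}
```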

23
Issues (2): a common OMM API
  • Two design approaches:
  • the richest-set-of-features approach: pool
    experience, cover all the bases, be relevant to
    very many users (top-down)
  • the highest-common-factors approach: analyse
    software, pick common features, create a
    pluggability layer (bottom-up)
  • Both are useful and can be combined
  • Approach B has some key advantages:
  • leads to a quicker version 1.0
  • minimises arguments (criterion: "feature exists
    in several systems", not "is good")
  • Problems:
  • features present in several places but not all:
    operation not supported?
  • new work not prefigured in version 1.0:
    roadmaps, placeholders
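The bottom-up, highest-common-factors approach can be sketched in a few lines: count which features appear in several systems, keep those as the common API, and let the pluggability layer raise an error when a backend lacks one of them (the "operation not supported?" problem). The store names and feature sets are hypothetical, and `call` returns a placeholder string where a real layer would dispatch to the backend.

```python
from collections import Counter

# Hypothetical feature inventories of three ontology/metadata stores.
FEATURES = {
    "StoreA": {"add_triple", "query", "infer", "version"},
    "StoreB": {"add_triple", "query", "infer"},
    "StoreC": {"add_triple", "query", "cache"},
}

def common_api(features, min_systems=2):
    """Bottom-up: keep every feature found in at least `min_systems` systems."""
    counts = Counter(f for feats in features.values() for f in feats)
    return {f for f, n in counts.items() if n >= min_systems}

class PluggableStore:
    """Uniform layer over one backend; unsupported operations raise."""

    def __init__(self, name, api):
        self.name = name
        self.api = api
        self.supported = FEATURES[name]

    def call(self, operation):
        if operation not in self.api:
            raise AttributeError(f"{operation} is not part of the common API")
        if operation not in self.supported:
            # Feature exists in several systems, but not in this one.
            raise NotImplementedError(f"{self.name} does not support {operation}")
        return f"{self.name}.{operation}()"  # placeholder for real dispatch

api = common_api(FEATURES)
print(sorted(api))                                  # ['add_triple', 'infer', 'query']
print(PluggableStore("StoreB", api).call("infer"))  # StoreB.infer()
```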

24
The end
  • Tutorial on HLT for the Semantic Web at the European
    Semantic Web Symposium: http://www.esws2004.org/
  • These slides:
    http://gate.ac.uk/sale/talks/ilash-semweb-mar2004.ppt
  • More information: http://gate.ac.uk/
    http://nlp.shef.ac.uk/

25
What's the difference between Tony Blair and
Mother Teresa?
  • There's good news and bad news...
  • The good news: the Semantic Web is now a major
    focus of some of the world leaders in AI research
  • The bad news: AI always fails
  • (Or what succeeds doesn't get called AI any
    more)
  • How does the machine tell the difference between
    "Mother Teresa is a saint" and "Tony Blair is a
    saint"? (It doesn't: it has no sense of irony!)
  • Needed: clever applications of simple semantics
    (contrast the success of RSS or DC with more
    complex schemes)
  • De facto standards: when we do the simple
    stuff robustly and in the large