Inside semantic Web search engines: between semantic annotation and Natural Language Processing


Web search engines are constantly being developed to meet user needs. This development process focuses not only on lexical pattern matching, but also on ...

1
Inside semantic Web search engines: between
semantic annotation and Natural Language
Processing
  • ISKO Italia meeting, Torino, 3 April 2009

Talk by Mela Bosch, melabosch_at_europe.com

2
Terminology of Web search engines
  • Text search engines based on lexical analysis.
    The main aim of lexical analysis is to divide
    the text into paragraphs, sentences and words,
    and also into entities such as e-mail addresses
    or URLs. All these elements are known as tokens;
    the search engine parses them and applies
    statistical parameters to produce a ranked list
    of links in response to a query.
  • Latent semantic indexing (LSI), based on latent
    semantic analysis (LSA). LSI is a Natural
    Language Processing (NLP) technique that uses an
    indexed collection of documents to find terms
    that occur in similar contexts. It can find
    synonyms and then return the best-matching
    websites for the query. LSI does not require
    exact word matches to rank results.
  • Semantic Web search engines take the sense of
    a word into account in their rankings, or offer
    the user a choice among the senses of a word or
    phrase.
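The lexical analysis described in the first bullet can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular engine. The regular expressions, the helper names `tokenize` and `rank`, the example documents, and the use of TF-IDF as the "statistical parameters" are all hypothetical choices.

```python
import math
import re
from collections import Counter

# Deliberately simple, assumed patterns; production tokenizers are far richer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[a-z0-9]+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def tokenize(text):
    """Split text into paragraphs, sentences, entity tokens and word tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sentences = [s for p in paragraphs for s in SENTENCE_END.split(p) if s]
    entities = EMAIL.findall(text) + URL.findall(text)
    # Blank out entities first so their fragments are not counted as words.
    rest = URL.sub(" ", EMAIL.sub(" ", text))
    words = WORD.findall(rest.lower())
    return {"paragraphs": paragraphs, "sentences": sentences,
            "entities": entities, "words": words}


def rank(query, docs):
    """Order document ids by a TF-IDF score for the query words, best first."""
    bags = {doc_id: Counter(tokenize(text)["words"]) for doc_id, text in docs.items()}
    n = len(bags)
    scores = {}
    for doc_id, bag in bags.items():
        total = sum(bag.values()) or 1
        score = 0.0
        for term in set(WORD.findall(query.lower())):
            df = sum(1 for b in bags.values() if term in b)
            if df:
                # Smoothed idf: log((1 + N) / df) stays positive even if df == N.
                score += bag[term] / total * math.log((1 + n) / df)
        scores[doc_id] = score
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a.html": "Semantic search engines rank pages. Contact info@example.org.",
    "b.html": "Lexical analysis splits text into tokens.\n\nSee https://example.org/tokens",
}
print(rank("lexical tokens", docs))  # ['b.html', 'a.html']
```

Removing e-mail addresses and URLs before extracting words keeps them as single entity tokens. Without that step, their fragments (such as "example" or "org") would inflate the word statistics.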

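The LSI bullet can be illustrated with a toy, dependency-free Python sketch built on several assumptions:

* The term-document matrix holds raw term counts.
* A truncated SVD is obtained by power iteration on A^T A. This stands in for the optimized SVD routines a real system would use.
* Query folding-in follows the classic LSI formulation.
* The corpus is an invented four-document collection.

All function names are hypothetical. The point is that the query "car" matches "automobile engine repair" even though that document does not contain the word.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_eigenpairs(m, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix.

    Power iteration with Gram-Schmidt against the vectors already found;
    assumes k does not exceed the rank of m (otherwise normalize divides by 0).
    """
    rng = random.Random(seed)
    vals, vecs = [], []
    for _ in range(k):
        v = normalize([rng.random() for _ in m])
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            for u in vecs:
                c = dot(w, u)
                w = [a - c * b for a, b in zip(w, u)]
            v = normalize(w)
        vecs.append(v)
        vals.append(dot(v, [dot(row, v) for row in m]))  # Rayleigh quotient
    return vals, vecs


def lsi_scores(docs, query, k=2):
    """Cosine similarity of the folded-in query to each document in k-dim LSI space."""
    vocab = sorted({w for d in docs for w in d.split()})
    # A: term-document count matrix (terms x docs).
    a = [[d.split().count(t) for d in docs] for t in vocab]
    n = len(docs)
    # Eigenvectors of A^T A (docs x docs) are the right singular vectors V of A,
    # and its eigenvalues are the squared singular values.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    lams, vs = top_eigenpairs(ata, k)
    q = [query.split().count(t) for t in vocab]
    atq = [sum(row[j] * qt for row, qt in zip(a, q)) for j in range(n)]
    # Fold in the query: q_hat_i = (u_i . q) / sigma_i = (v_i . A^T q) / lambda_i.
    q_hat = [dot(v, atq) / lam for v, lam in zip(vs, lams)]
    scores = []
    for j in range(n):
        d_hat = [v[j] for v in vs]  # document j's coordinates (row j of V_k)
        denom = math.sqrt(dot(q_hat, q_hat) * dot(d_hat, d_hat)) or 1.0
        scores.append(dot(q_hat, d_hat) / denom)
    return scores


docs = ["car engine repair", "automobile engine repair",
        "banana fruit smoothie", "banana fruit salad"]
# Document 1 never contains "car", yet it scores about as high as document 0,
# while the two fruit documents score about 0.
print(lsi_scores(docs, "car"))
```

With k = 2, the two latent dimensions separate the vehicle and fruit topics. In this tiny corpus the two top singular values happen to be equal. That does not change the scores, because cosines are unaffected by a rotation within the shared subspace.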
3
Semantic Web search engines, or third-generation
search engines
  • Three types
  • User-oriented Semantic Web search engines. They
    return links to web pages and may internally use
    both Semantic Web technologies and LSI. Ex.
    True Knowledge, Hakia and Powerset.
  • Semantic Web Services-oriented engines. They
    return links to ontologies, OWL files and RDF
    instances, so they are not suited to end users.
    Ex. SOWL, WSE, Watson, Falcons, Sindice and
    Swoogle. The idea is to provide ways for
    businesses to interoperate across domains or
    services.
  • Social-semantic Web-oriented engines. The
    socio-semantic web (S2W) uses classifications and
    ontologies in very practical situations. S2W
    search engines aim to complement the formal
    Semantic Web vision with a pragmatic
    collaborative tagging (folksonomy) approach. The
    main interest is to enable users to share
    knowledge. Ex. http://www.stumpedia.com/
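The collaborative-tagging (folksonomy) approach can be illustrated with a toy Python index. The sketch assumes that a tag query ranks resources simply by how many distinct users applied that tag. The `Folksonomy` class and the example.org URLs are hypothetical, and the slides do not describe how any real S2W engine ranks results.

```python
from collections import defaultdict


class Folksonomy:
    """Hypothetical tag store: tag -> resource -> set of users who applied the tag."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def tag(self, user, resource, *tags):
        for t in tags:
            # Normalize tags so "Semantic" and "semantic " count as the same tag.
            self._index[t.strip().lower()][resource].add(user)

    def search(self, tag):
        """Resources carrying the tag, most widely agreed-upon first."""
        by_resource = self._index.get(tag.strip().lower(), {})
        return sorted(by_resource, key=lambda r: (-len(by_resource[r]), r))


f = Folksonomy()
f.tag("ann", "http://example.org/a", "semantic", "search")
f.tag("bob", "http://example.org/a", "Semantic")
f.tag("bob", "http://example.org/b", "semantic")
print(f.search("semantic"))  # ['http://example.org/a', 'http://example.org/b']
```

Counting distinct users rather than raw tag applications means that one user tagging the same page repeatedly does not push it up the ranking.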

4
Semantic Web search engines: why all these
differences?
The components of Semantic Web search engines
  • Semantic Web means many things to different
    people
  • It is about artificial intelligence: computer
    programs solving complex optimization problems
  • It is about web services, in terms of end-user
    value
  • It is the web of data, where information is
    represented in RDF, OWL or microformats
  • See http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
  • Natural Language Processing (NLP)
  • Annotation

5
Annotation
  • Free-text annotation
  • The annotations can be comments, notes,
    explanations, references, examples, advice,
    corrections or any other type of external remark
    that can be attached to or embedded in a Web
    document or a selected part of the document.
  • See http://www.ncb.ernet.in/groups/dake/annotate/intro.shtml
  • Semantic annotation in general
  • Semantic annotation is the association of a data
    entity with an element from a classification
    scheme, ontology or other knowledge repository
  • Examples of semantic annotation
  • the assignment of MeSH descriptors to citations
    in MEDLINE
  • the assignment of Gene Ontology terms to gene
    products in UniProt
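The general definition (a data entity associated with an element of a classification scheme, ontology or other knowledge repository) can be sketched as a small record type plus a concept index. The index lets all entities annotated with a given concept be retrieved, much as a descriptor retrieves citations. The `Annotation` class and `concept_index` function are hypothetical. The identifiers are placeholders, not real MeSH, MEDLINE, Gene Ontology or UniProt identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    entity: str   # the annotated data entity, e.g. a citation or a gene product
    scheme: str   # the knowledge repository, e.g. a thesaurus or an ontology
    concept: str  # the element of that scheme assigned to the entity


def concept_index(annotations):
    """Map each (scheme, concept) pair to the set of entities annotated with it."""
    index = defaultdict(set)
    for a in annotations:
        index[(a.scheme, a.concept)].add(a.entity)
    return index


# Placeholder identifiers (hypothetical, for illustration only).
annotations = [
    Annotation("citation-1", "MeSH", "descriptor-X"),
    Annotation("citation-2", "MeSH", "descriptor-X"),
    Annotation("protein-7", "GO", "term-Y"),
]
index = concept_index(annotations)
print(sorted(index[("MeSH", "descriptor-X")]))  # ['citation-1', 'citation-2']
```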

6
Semantic Web Annotation
It is the technique of publishing
machine-understandable data on the Web by
creating metadata through semantic tagging
  • A semantic annotation is a formal annotation,
    where the predicate is an ontological term, and
    the object conforms to an ontological definition.
  • The term annotation can denote both the process
    of annotating and the result of that process.
  • Giving useful meaning to data or to
    unstructured text is crucial to fulfilling the
    Semantic Web vision
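The formal definition in the first bullet can be checked mechanically: the predicate must be an ontology term, and the object must conform to an ontological definition. In this hypothetical Python sketch, the toy ontology, the `ex:` names and the `is_formal_annotation` helper are all assumptions. The sketch also treats a property's range as a constraint, which is a simplification: in RDFS a range is used for inference rather than validation.

```python
# Hypothetical toy ontology: property -> class that its object must belong to.
PROPERTY_RANGE = {"ex:authoredBy": "ex:Person", "ex:mentionsPlace": "ex:City"}
# Hypothetical class membership of known instances.
INSTANCE_OF = {"ex:MelaBosch": "ex:Person", "ex:Torino": "ex:City"}


def is_formal_annotation(subject, predicate, obj):
    """True if the predicate is an ontology term and the object fits its range.

    The subject can be any annotated resource, so it is not checked here.
    """
    expected = PROPERTY_RANGE.get(predicate)
    return expected is not None and INSTANCE_OF.get(obj) == expected


print(is_formal_annotation("doc:slides", "ex:authoredBy", "ex:MelaBosch"))  # True
print(is_formal_annotation("doc:slides", "ex:authoredBy", "ex:Torino"))     # False
print(is_formal_annotation("doc:slides", "ex:likes", "ex:MelaBosch"))       # False
```

The second call fails because the object is not of the required class. The third fails because the predicate is not an ontological term at all, which is what separates a free-text remark from a formal annotation.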

7
Semantic Web Annotation
The Semantic Web Annotation process includes
three components
  • an ontology which describes the domain of
    interest
  • a data instance recognition process that
    discovers all instances of interest in target web
    documents based on the defined ontology
  • an annotation generation process that creates a
    semantic meaning disclosure file for each
    annotated document. Through this file, any
    ontology-aware machine agent can understand the
    target document.
  • See http://www.deg.byu.edu/ding/research/SemanticAnnotation.html
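The three components can be sketched as a toy pipeline. It rests on three assumptions:

* The ontology is reduced to classes paired with lexical patterns.
* Instance recognition is plain regular-expression matching.
* The "semantic meaning disclosure file" is written as N-Triples-style lines.

The `example.org` IRIs, the `foundIn` property, the patterns and the function names are hypothetical. Only the `rdf:type` and `rdfs:label` IRIs are standard.

```python
import re

# 1. Ontology (hypothetical fragment): class IRI -> pattern recognizing its instances.
ONTOLOGY = {
    "http://example.org/onto#Date":
        r"\b\d{1,2} (?:January|February|March|April|May|June|July|August"
        r"|September|October|November|December) \d{4}\b",
    "http://example.org/onto#EmailAddress": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOUND_IN = "http://example.org/onto#foundIn"  # hypothetical property


def recognize(text):
    """2. Data instance recognition: (class, matched text, offset) for every match."""
    found = []
    for cls, pattern in ONTOLOGY.items():
        for m in re.finditer(pattern, text):
            found.append((cls, m.group(), m.start()))
    return sorted(found, key=lambda f: f[2])


def disclosure_file(doc_iri, text):
    """3. Annotation generation: N-Triples-style statements about each instance."""
    lines = []
    for i, (cls, value, _) in enumerate(recognize(text)):
        node = f"<{doc_iri}#instance{i}>"
        lines.append(f"{node} <{RDF_TYPE}> <{cls}> .")
        # Literal values are not escaped: fine for these patterns, not in general.
        lines.append(f'{node} <{RDFS_LABEL}> "{value}" .')
        lines.append(f"{node} <{FOUND_IN}> <{doc_iri}> .")
    return "\n".join(lines)


text = "ISKO Italia met on 3 April 2009; write to someone@example.org."
print(disclosure_file("http://example.org/doc1", text))
```

Any agent that knows the ontology can read the generated statements without parsing the original page, which is the role the slide gives to the disclosure file.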

8
Annotations can be generated manually,
automatically or semi-automatically
  • The process of annotating requires semantic
    annotation tools


Types of semantic annotation tools
  • Inline annotation means that the original
    document is augmented with metadata information
    (embedded metadata).
  • It focuses on annotating information on pages
    using RDF so that it is machine-readable.
  • Also called semantic authoring or the bottom-up
    approach.

9
Types of semantic annotation tools
  • Standoff annotation means that the metadata is
    stored separately from the original document
    (attached metadata).
  • The annotations are then stored in a database
    that is made available to users via websites and
    sometimes via web services.
  • It is generally preferable from the point of view
    of interoperability.
  • Also called the top-down approach. Its focus is
    on leveraging information from existing web pages
    to derive meaning automatically.

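The difference between inline (embedded) and standoff (attached) metadata can be shown on a single annotation. In this hypothetical Python sketch, the inline form wraps the annotated span in an RDFa-style `<span>` element inside a copy of the document. The standoff form leaves the document untouched and serializes the character offsets as a separate record, as an annotation database might store them. The `ex:location` property and the helper names are assumptions.

```python
import html
import json

doc = "The meeting took place in Torino in 2009."
# One annotation: character offsets of the annotated span plus a property.
annotation = {"start": 26, "end": 32, "property": "ex:location"}  # hypothetical property


def inline(text, ann):
    """Inline: return a copy of the document with the metadata embedded as markup."""
    s, e = ann["start"], ann["end"]
    return (html.escape(text[:s]) + f'<span property="{ann["property"]}">'
            + html.escape(text[s:e]) + "</span>" + html.escape(text[e:]))


def standoff(doc_id, text, ann):
    """Standoff: leave the document alone and serialize the metadata separately."""
    record = {"doc": doc_id, **ann, "text": text[ann["start"]:ann["end"]]}
    return json.dumps(record)


print(inline(doc, annotation))
# The meeting took place in <span property="ex:location">Torino</span> in 2009.
print(standoff("doc-1", doc, annotation))
```

The standoff record needs only the document identifier and offsets. This is one reason it is easier to share across systems, although the offsets become stale if the source page changes.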
10
There are several choices for annotation

11
The components of Semantic Web search engines
  • Natural Language Processing (NLP)
  • Initially, NLP was conceived as a support for
    linguistic studies and aimed at using computers
    to interpret and manipulate words as part of a
    language: a powerful method for investigating
    and evaluating human language itself, i.e.
    enhanced study of large text corpora.
  • Then, Artificial Intelligence defined NLP as the
    use of computers to process written and spoken
    language for practical purposes, such as
    translating languages or carrying on
    conversations with machines.

12
The components of Semantic Web search engines
  • Natural Language Processing (NLP)
  • After the Web explosion, NLP has been used to
    develop natural language understanding systems
    that convert samples of human language into
    more formal representations that are easier for
    computer programs to manipulate.
  • Now, thanks to NLP techniques, algorithms such as
    chunking, clustering, parsing, spellchecking,
    tagging and word sense disambiguation are used
    to handle text intelligently and to extract
    information from the Web and from text databanks
    in order to answer questions.
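Word sense disambiguation, one of the techniques listed, can be illustrated with a simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most content words with the surrounding sentence. The two-sense inventory for "bank", its glosses and the stopword list are hypothetical. Real systems draw on resources such as WordNet and use much better overlap measures.

```python
import re

# Hypothetical sense inventory: word -> sense id -> gloss.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and makes loans",
        "bank/river": "the sloping land beside a river or stream of water",
    }
}
STOPWORDS = {"a", "an", "and", "at", "from", "i", "of", "or", "the", "to", "we"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence.

    On a tie, the sense listed first in the inventory wins.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I deposited money at the bank"))          # bank/finance
print(simplified_lesk("bank", "We fished from the bank of the river"))   # bank/river
```

Without stemming, "deposited" does not match "deposits". In the first sentence only "money" links the context to the financial gloss, which shows how fragile pure word overlap can be.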

13
Conclusion
  • However, both methodologies are now being
    combined.
  • Semantic Web search engines need many pages to
    be annotated, which requires an enormous effort,
    so NLP becomes an important aid for automatic or
    semi-automatic annotation.
  • At the same time, the precision of text analysis
    can be improved through the assignment of
    annotations by users and professionals.
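The combination described above can be sketched as a confidence-threshold workflow: NLP proposes annotations, and users or professionals refine them. Several details are assumptions made for illustration: the NLP step attaches a confidence score to each proposal, proposals above a threshold are accepted automatically, and the rest wait for a human decision. The threshold of 0.8, the `Candidate` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    span: str
    concept: str
    confidence: float  # score assumed to come from an automatic NLP annotator


def triage(candidates, threshold=0.8):
    """Split NLP proposals into auto-accepted annotations and a human review queue."""
    accepted = [c for c in candidates if c.confidence >= threshold]
    review = [c for c in candidates if c.confidence < threshold]
    return accepted, review


def apply_review(accepted, review, decisions):
    """Add the queued candidates a curator approved; `decisions` maps span -> bool."""
    return accepted + [c for c in review if decisions.get(c.span, False)]


cands = [Candidate("Torino", "ex:City", 0.95),
         Candidate("Mela Bosch", "ex:Person", 0.55),
         Candidate("ISKO", "ex:Organization", 0.40)]
auto, queue = triage(cands)
final = apply_review(auto, queue, {"Mela Bosch": True})
print([c.span for c in final])  # ['Torino', 'Mela Bosch']
```

Raising the threshold sends more proposals to human reviewers. This trades annotation effort for precision, which is the balance the slide describes.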

In conclusion, the trend is toward collective
knowledge systems that improve as more people
participate, because they are based on human
contributions. All of this may eventually be
integrated by NLP algorithms.

14
References

Iskold, Alex (2006). Semantic Web Patterns: A Guide to Semantic Technologies. http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php

Atanas, K. et al. (2005). Semantic Annotation, Indexing, and Retrieval. Ontotext Lab. http://www.ontotext.com/publications/SemAIR_ISWC169.pdf

Vehvilainen, A. et al. (2006). Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service. Helsinki University. http://www.seco.tkk.fi/publications/2006/vehvilainen-hyvonen-alm-semi-automatic-semantic-annotation-and-authoring-tool.pdf

Maynard, Diana (2005). Benchmarking ontology-based annotation tools for the Semantic Web. Department of Computer Science, University of Sheffield, UK. http://gate.ac.uk/sale/ahm05/ahm.pdf

Good, Benjamin M.; Kawas, Edward; Wilkinson, Mark (2007). Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer. http://precedings.nature.com/documents/945/version/2/html

Useful links
  • http://www.semanticfocus.com/
  • http://logic.stanford.edu/oem/projects.html_Coordinating_Collective_Work
  • http://semantic-mediawiki.org/wiki/Semantic_MediaWiki