Title: Inside semantic Web search engines: between semantic annotation and Natural Language Processing
- ISKO Italia meeting, Turin, 3 April 2009
Talk by Mela Bosch, melabosch_at_europe.com
2 Terminology of Web Search Engines
- Text search engines based on lexical analysis. The main aim of lexical analysis is to divide the text into paragraphs, sentences and words, and also into entities such as e-mail addresses or URLs. All these elements are known as tokens; the search engine parses them and applies statistical weighting to produce a ranked list of links in response to a query.
- Latent semantic indexing (LSI), based on latent semantic analysis (LSA). LSI is a Natural Language Processing (NLP) technique that works on an indexed collection of documents: it builds a term-document matrix and reduces it, by singular value decomposition, to a small number of latent dimensions in which terms that occur in similar contexts lie close together. It can therefore match a synonym of a query term and return the best-matching websites for the query; LSI does not require exact word matches to rank results.
- Semantic Web search engines take the sense of a word into account when ranking results, or offer the user a choice among the senses of a word or phrase.
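The lexical analysis described in the first bullet can be sketched as a small tokenizer. This is a minimal illustration under our own assumptions (paragraphs separated by blank lines, sentences ending in ".", "!" or "?", and deliberately simple regular expressions for URLs and e-mail addresses); it does not reproduce how any particular search engine tokenizes text.

```python
import re

# Assumed token patterns, tried left to right at each position: URLs and
# e-mail addresses come first so they survive as single tokens instead of
# being split at '.', '/' or '@'. A URL may not end in punctuation, so a
# sentence-final period is not swallowed into the token.
TOKEN_RE = re.compile(
    r"(?P<url>https?://\S*[^\s.,;:!?])"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<word>\w+(?:'\w+)?)"
)


def split_paragraphs(text):
    """Assumption: paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: a sentence ends with '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def tokenize(sentence):
    """Return (kind, token) pairs, where kind is 'url', 'email' or 'word'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sentence)]


def lexical_analysis(text):
    """Paragraphs -> sentences -> typed tokens."""
    return [[tokenize(s) for s in split_sentences(p)] for p in split_paragraphs(text)]
```

For example, `lexical_analysis("Write to mela@example.org. Visit https://example.org/page now!")` keeps `mela@example.org` and `https://example.org/page` as single tokens. A real engine would then weight these tokens statistically (for instance by term frequency) before ranking links.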
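The LSI bullet can be made concrete with a toy term-document matrix. The sketch below is a minimal pure-Python illustration, assuming raw term counts, whitespace tokenization and a tiny corpus; it finds the left singular vectors U_k as eigenvectors of A·Aᵀ by power iteration, then compares query and documents after projecting both onto U_k. This is one common variant; the textbook formulation also rescales by the singular values. Real systems use an optimized SVD library instead.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else v


def top_eigenvectors(c, k, iters=500):
    """Top-k eigenvectors of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (adequate for toy sizes only;
    assumes k does not exceed the rank of the matrix)."""
    c = [row[:] for row in c]
    n = len(c)
    vectors = []
    for _ in range(k):
        v = normalize([1.0 + 0.1 * i for i in range(n)])  # arbitrary start vector
        for _ in range(iters):
            v = normalize([dot(row, v) for row in c])
        lam = dot(v, [dot(row, v) for row in c])  # Rayleigh quotient = eigenvalue
        vectors.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def lsi_index(docs, k):
    """Build the term-document matrix A and return (vocabulary, U_k)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    a = [[d.lower().split().count(t) for d in docs] for t in vocab]  # terms x docs
    # The left singular vectors of A are the eigenvectors of A A^T.
    aat = [[dot(r1, r2) for r2 in a] for r1 in a]
    return vocab, top_eigenvectors(aat, k)


def project(vocab, u_k, text):
    """Map a bag of words onto the k latent dimensions: U_k^T x."""
    words = text.lower().split()
    x = [words.count(t) for t in vocab]
    return [dot(u, x) for u in u_k]


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0
```

With the hypothetical corpus `["car engine repair", "automobile engine repair", "banana fruit salad"]` and `k=2`, the query "car" scores close to 1.0 against the "automobile" document even though the two share no word, because "car" and "automobile" co-occur with the same context words. The fruit document scores close to 0.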
3 Semantic Web search engines, or third-generation search engines
- Three types
  - User-oriented Semantic Web search engines return links to web pages. Internally they can use both Semantic Web technologies and LSI. Examples: True Knowledge, Hakia and PowerSet.
  - Semantic Web Services-oriented engines return links to ontologies, OWL files and RDF instances. They are not suited to end users. Examples: SOWL, WSE, Watson, Falcons, Sindice and Swoogle. The idea is to provide ways for businesses to interoperate across domains or services.
  - Social-semantic Web-oriented engines. The socio-semantic web (s2w) uses classification schemes and ontologies in very practical situations. The aim of s2w search engines is to complement the formal Semantic Web vision with a pragmatic, collaborative tagging (folksonomy) approach; the main interest is to enable users to share knowledge. Example: http://www.stumpedia.com/
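The collaborative tagging (folksonomy) approach of social-semantic engines can be sketched as a set of (user, tag, resource) assignments from which a tag index is derived. The data, URLs and ranking rule below are hypothetical: resources are ranked by how many distinct users applied the tag, and tags are case-folded. Real s2w engines may normalize and rank tags quite differently.

```python
from collections import Counter, defaultdict

# Hypothetical tag assignments: (user, tag, resource URL).
ASSIGNMENTS = [
    ("anna", "semanticweb", "http://example.org/rdf-primer"),
    ("bruno", "semanticweb", "http://example.org/rdf-primer"),
    ("anna", "SemanticWeb", "http://example.org/rdf-primer"),  # duplicate after case-folding
    ("carla", "rdf", "http://example.org/rdf-primer"),
    ("anna", "semanticweb", "http://example.org/owl-guide"),
    ("bruno", "cooking", "http://example.org/pasta"),
]


def build_index(assignments):
    """Map each case-folded tag to a Counter of resources, counting each user once."""
    seen = set()
    index = defaultdict(Counter)
    for user, tag, resource in assignments:
        key = (user, tag.lower(), resource)
        if key in seen:
            continue  # the same user tagging the same resource twice adds nothing
        seen.add(key)
        index[tag.lower()][resource] += 1
    return index


def search(index, tag):
    """Resources tagged with `tag`, those agreed on by most users first."""
    return [resource for resource, _ in index.get(tag.lower(), Counter()).most_common()]
```

Here `search(build_index(ASSIGNMENTS), "SemanticWeb")` lists the RDF primer (tagged by two users) before the OWL guide (one user): the shared judgment of many users acts as a lightweight, informal classification.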
4 Semantic Web search engines: what are all these differences for?
The components of Semantic Web search engines
- Semantic Web, which means many things to different people:
  - it is about artificial intelligence, with computer programs solving complex optimization problems
  - it is about web services, in terms of end-user value
  - it is the web of data, where information is represented in RDF, OWL or microformats
  - see http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
- Natural Language Processing (NLP)
- Annotation
5 Annotation
- Free-text annotation
  - Annotations can be comments, notes, explanations, references, examples, advice, corrections or any other type of external remark that can be attached to, or embedded in, a Web document or a selected part of the document.
  - See http://www.ncb.ernet.in/groups/dake/annotate/intro.shtml
- Semantic annotation in general
  - Semantic annotation is the association of a data entity with an element from a classification scheme, ontology or other knowledge repository.
  - Examples of semantic annotation:
    - the assignment of MeSH descriptors to citations in MEDLINE
    - the assignment of Gene Ontology terms to gene products in UniProt
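Why associating documents with elements of a classification scheme helps retrieval can be shown with a toy example. The hierarchy and document identifiers below are hypothetical and only loosely styled after MeSH headings; they are not actual MeSH data. The sketch retrieves documents annotated with a concept or with any narrower concept, so a search for the broader heading finds documents whose text never contains it.

```python
# Hypothetical fragment of a classification scheme: concept -> broader concept.
BROADER = {
    "Breast Neoplasms": "Neoplasms",
    "Lung Neoplasms": "Neoplasms",
    "Neoplasms": None,
    "Influenza": None,
}

# Hypothetical semantic annotations: document id -> concepts assigned by an indexer.
ANNOTATIONS = {
    "doc1": {"Breast Neoplasms"},
    "doc2": {"Lung Neoplasms", "Influenza"},
    "doc3": {"Influenza"},
}


def ancestors(concept):
    """All broader concepts of `concept`, walking up the hierarchy."""
    result = []
    parent = BROADER.get(concept)
    while parent:
        result.append(parent)
        parent = BROADER.get(parent)
    return result


def find(concept):
    """Documents annotated with `concept` or with any narrower concept."""
    return sorted(
        doc
        for doc, concepts in ANNOTATIONS.items()
        if any(c == concept or concept in ancestors(c) for c in concepts)
    )
```

`find("Neoplasms")` returns `["doc1", "doc2"]`, although neither document was annotated with that exact heading.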
6 Semantic Web Annotation
Semantic Web annotation is the technique of publishing machine-understandable data on the Web by creating metadata through semantic tagging.
- A semantic annotation is a formal annotation, where the predicate is an ontological term and the object conforms to an ontological definition.
- The term annotation can denote both the process of annotating and the result of that process.
- Giving useful meaning to data or to unstructured text is crucial to the fulfillment of the Semantic Web.
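The definition of a formal annotation (predicate taken from the ontology, object conforming to an ontological definition) can be expressed as a simple check. The mini-ontology, the `ex:` prefix and the instance names are hypothetical; the sketch only compares the declared domain and range of a property with the classes of subject and object, without any subclass reasoning.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology; "ex:" stands for an illustrative namespace.
PROPERTIES = {"ex:bornIn": ("ex:Person", "ex:City")}  # property -> (domain, range)
INSTANCES = {"ex:Alice": "ex:Person", "ex:Turin": "ex:City"}  # instance -> class


@dataclass(frozen=True)
class Annotation:
    subject: str    # the annotated entity
    predicate: str  # should be a property defined in the ontology
    obj: str        # should be an instance of the property's range


def is_formal(a: Annotation) -> bool:
    """True if the predicate is an ontology term and subject/object fit its domain/range."""
    if a.predicate not in PROPERTIES:
        return False
    domain, rng = PROPERTIES[a.predicate]
    return INSTANCES.get(a.subject) == domain and INSTANCES.get(a.obj) == rng
```

`Annotation("ex:Alice", "ex:bornIn", "ex:Turin")` passes; swapping subject and object, or using a predicate that is not in the ontology, fails.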
7 Semantic Web Annotation
The Semantic Web annotation process includes three components:
- an ontology, which describes the domain of interest
- a data instance recognition process, which discovers all instances of interest in target web documents based on the defined ontology
- an annotation generation process, which creates a semantic meaning disclosure file for each annotated document. Through this file, any ontology-aware machine agent can understand the target document.
- See http://www.deg.byu.edu/ding/research/SemanticAnnotation.html
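The three components can be wired together in a few lines. In this sketch, which rests on several assumptions, the ontology is reduced to two hypothetical classes, each paired with a regular expression that recognizes its instances, and the "semantic meaning disclosure file" is written as N-Triples-style lines. The namespace `http://example.org/onto#` is illustrative, and recognized spans are identified by character offsets in the style of RFC 5147 fragments. Production systems rely on far richer recognizers.

```python
import re

# 1. Ontology (hypothetical): class IRI -> pattern that recognizes its instances.
EX = "http://example.org/onto#"
ONTOLOGY = {
    EX + "Date": re.compile(
        r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December) \d{4}\b"
    ),
    EX + "EmailAddress": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


# 2. Instance recognition: find every instance of an ontology class in the text.
def recognize(text):
    found = []
    for cls, pattern in ONTOLOGY.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), cls, m.group()))
    return sorted(found)


# 3. Annotation generation: one N-Triples-style disclosure file per document.
def generate_annotations(doc_iri, text):
    lines = []
    for start, end, cls, surface in recognize(text):
        node = f"<{doc_iri}#char={start},{end}>"
        literal = surface.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{node} <{RDF_TYPE}> <{cls}> .")
        lines.append(f'{node} <{RDFS_LABEL}> "{literal}" .')
    return "\n".join(lines)
```

Any agent that knows the ontology can then read the generated lines without parsing the original page.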
8 Annotation can be generated manually, automatically or semi-automatically
- The process of annotating requires semantic annotation tools
Types of semantic annotation tools
- Inline annotation means that the original document is augmented with metadata information (embedded metadata). It focuses on annotating information on pages using RDF so that it is machine-readable.
  Also called Semantic Authoring or the bottom-up approach.
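Inline annotation changes the document itself. The sketch below wraps annotated character spans in HTML `span` elements carrying an RDFa-style `typeof` attribute; the class IRIs and offsets are hypothetical, spans are assumed not to overlap, and a real tool would follow the full RDFa processing rules rather than this single attribute.

```python
import html


def annotate_inline(text, annotations):
    """Return HTML with each annotated span embedded in the document itself.

    annotations: list of (start, end, class_iri), assumed non-overlapping.
    """
    out, pos = [], 0
    for start, end, cls in sorted(annotations):
        out.append(html.escape(text[pos:start]))
        out.append(f'<span typeof="{html.escape(cls)}">{html.escape(text[start:end])}</span>')
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

`annotate_inline("Turin hosted ISKO Italia.", [(0, 5, "http://example.org/onto#City")])` yields the same sentence with "Turin" wrapped in a typed `span`: the metadata now travels inside the page.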
9 Types of semantic annotation tools
- Standoff annotation means that the metadata is stored separately from the original document (attached metadata). The annotations are then stored in a database that is made available to users via websites and sometimes via web services.
  - It is generally preferable from the point of view of interoperability.
  Also called the top-down approach. Its focus is on leveraging information from existing web pages to derive meaning automatically.
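Standoff annotation leaves the document untouched and keeps the metadata in a separate store. As a minimal sketch, assuming character offsets as anchors and an in-memory SQLite table standing in for the shared annotation database (table layout, document URI and concept IRIs are hypothetical):

```python
import sqlite3

EX = "http://example.org/onto#"  # hypothetical ontology namespace


def create_store():
    """An in-memory table standing in for the shared annotation database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE annotation ("
        " doc_uri TEXT NOT NULL,"
        " start_offset INTEGER NOT NULL,"
        " end_offset INTEGER NOT NULL,"
        " concept TEXT NOT NULL)"
    )
    return db


def add_annotation(db, doc_uri, start, end, concept):
    db.execute(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)", (doc_uri, start, end, concept)
    )


def annotations_for(db, doc_uri, text):
    """Re-attach stored annotations to the unmodified document text."""
    rows = db.execute(
        "SELECT start_offset, end_offset, concept FROM annotation"
        " WHERE doc_uri = ? ORDER BY start_offset",
        (doc_uri,),
    )
    return [(text[s:e], concept) for s, e, concept in rows]
```

Because the page itself is never rewritten, several tools can annotate the same document independently; the price is that offsets break if the document changes later.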
10 There are several choices for annotation
11 The components of Semantic Web search engines
- Natural Language Processing (NLP)
- Initially, NLP
  - was conceived as a support for linguistic studies
  - aimed at using computers to interpret and manipulate words as part of a language
  It offered a powerful method for investigating and evaluating human language itself, i.e. enhanced study of large corpora of texts.
- Then
  - Artificial Intelligence defines NLP as the use of computers to process written and spoken languages for some practical purpose, such as translating languages or carrying on conversations with machines.
12 The components of Semantic Web search engines
- Natural Language Processing (NLP)
- After the Web explosion, NLP has been used to develop natural language understanding systems that convert samples of human language into more formal representations that are easier for computer programs to manipulate.
- Now
  - Thanks to NLP, algorithms such as chunking, clustering, parsing, spellchecking, tagging and word sense disambiguation are used to handle text intelligently and to extract information from the Web and from text databanks in order to answer questions.
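Word sense disambiguation, one of the algorithms listed above, is what lets a search engine take the sense of a word into account. A classic baseline is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The two-sense inventory and stopword list below are hypothetical; real systems draw glosses from a lexical resource. Ties go to the first sense listed.

```python
import re

# Hypothetical sense inventory: each sense of "bank" with a short gloss.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and lends money",
        "bank/river": "the sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "on", "at", "in", "to", "and", "or", "that"}


def content_words(text):
    """Lower-cased words minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

"She deposited money at the bank" resolves to `bank/finance` (shared word: "money"), while "We had a picnic on the bank of the river" resolves to `bank/river`.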
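Converting human language into a more formal representation, as in the natural language understanding systems mentioned above, can be illustrated at its very simplest with hand-written patterns that map a sentence to a subject-predicate-object triple. The two patterns and predicate names are hypothetical; genuine systems rely on parsing and statistical models rather than fixed templates.

```python
import re

# A capitalized name of one or more words, e.g. "Rome" or "Ada Lovelace".
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical surface patterns, each mapped to a predicate name.
PATTERNS = [
    (re.compile(rf"(?P<s>{NAME}) is the capital of (?P<o>{NAME})\.?"), "capitalOf"),
    (re.compile(rf"(?P<s>{NAME}) was born in (?P<o>{NAME})\.?"), "bornIn"),
]


def to_triple(sentence):
    """Return (subject, predicate, object) for a matching sentence, else None."""
    for pattern, predicate in PATTERNS:
        m = pattern.fullmatch(sentence.strip())
        if m:
            return (m.group("s"), predicate, m.group("o"))
    return None
```

"Rome is the capital of Italy." becomes `("Rome", "capitalOf", "Italy")`, a form a program can store, query or link to an ontology; sentences outside the patterns return `None`.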
13 Conclusion
- Annotation and NLP are now being combined:
  - Semantic Web search engines need many pages to be annotated, which requires an enormous effort,
  - so NLP becomes an important help in automatic or semi-automatic annotation.
  - At the same time, the precision of text analysis can be improved through assignments (annotations) provided by users and professionals.
In conclusion, the trend is towards collective knowledge systems that improve as more people participate, since they are based on human contributions. All of this may in turn be integrated by NLP algorithms.
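The combination described in the conclusion, where NLP proposes annotations and users or professionals confirm them, can be sketched as a triage loop. The confidence threshold, the suggestion data and the gold-standard set are hypothetical; the point is only that human review of uncertain suggestions raises the precision of the final annotations.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    doc: str
    concept: str
    confidence: float  # score from a hypothetical NLP annotator, 0..1


def triage(suggestions, threshold=0.9):
    """Split machine suggestions into auto-accepted ones and ones for human review."""
    accepted = [s for s in suggestions if s.confidence >= threshold]
    review = [s for s in suggestions if s.confidence < threshold]
    return accepted, review


def apply_review(accepted, review, decisions):
    """decisions: {(doc, concept): bool}, given by users or professionals."""
    confirmed = [s for s in review if decisions.get((s.doc, s.concept), False)]
    return accepted + confirmed


def precision(annotations, gold):
    """Share of annotations that appear in the gold-standard set."""
    if not annotations:
        return 0.0
    return sum((a.doc, a.concept) in gold for a in annotations) / len(annotations)
```

Accepting every machine suggestion in a small example gives a precision of 2/3; routing the low-confidence ones through human decisions brings it to 1.0.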
14 References
- Iskold, Alex. (2006) Semantic Web Patterns: A Guide to Semantic Technologies. http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
- Atanas, K. et al. (2005) Semantic Annotation, Indexing, and Retrieval. Ontotext Lab. http://www.ontotext.com/publications/SemAIR_ISWC169.pdf
- Vehvilainen, A. et al. (2006) Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service. Helsinki University. http://www.seco.tkk.fi/publications/2006/vehvilainen-hyvonen-alm-semi-automatic-semantic-annotation-and-authoring-tool.pdf
- Maynard, Diana. (2005) Benchmarking ontology-based annotation tools for the Semantic Web. Department of Computer Science, University of Sheffield, UK. http://gate.ac.uk/sale/ahm05/ahm.pdf
- Good, Benjamin M.; Kawas, Edward; Wilkinson, Mark. (2007) Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer. http://precedings.nature.com/documents/945/version/2/html
Useful links
- http://www.semanticfocus.com/
- http://logic.stanford.edu/oem/projects.html_Coordinating_Collective_Work
- http://semantic-mediawiki.org/wiki/Semantic_MediaWiki