Title: Towards the Semantic Web, Chapter 6: Generating Ontologies for the Semantic Web: OntoBuilder. R.H.P. Engels and T.Ch. Lech
1. The overall
- OntoBuilder
  - Extraction of information from texts for building knowledge bases.
  - Consists of the two modules OntoExtract and OntoWrapper.
1.1 The overall architecture
1.2 OntoExtract and OntoWrapper (1/2)
- OntoExtract
  - Semi-automatic ontology construction from unstructured information (natural language sources).
- OntoWrapper
  - Semi-automatic ontology construction from semi-structured and structured information sources.
  - Extracts information from particular places on specific sites (e.g. names, email addresses, telephone numbers).
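The kind of extraction OntoWrapper performs on semi-structured pages can be illustrated with a minimal sketch. It assumes plain regular expressions stand in for the tool's actual extraction rules, which the slides do not describe; the patterns, the function name `extract_contacts` and the sample text are all hypothetical.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration):
# they are not complete validators for email addresses or phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")


def extract_contacts(page_text):
    """Return the email addresses and phone numbers found in page_text."""
    return {
        "emails": EMAIL_RE.findall(page_text),
        "phones": [p.strip() for p in PHONE_RE.findall(page_text)],
    }


# Invented sample text standing in for a fixed place on a staff page.
sample = "Contact: Jane Doe, jane.doe@example.org, tel. +47 22 33 44 55"
contacts = extract_contacts(sample)
print(contacts)
```

Names are harder to capture with a pattern alone, which is one reason a wrapper typically ties its rules to known positions in a page layout.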
1.2 OntoExtract and OntoWrapper (2/2)
- CORPORUM depends on a linguistic analysis of a given text, comprising normalization, tokenization and part-of-speech tagging.
- Relations between concepts are defined (e.g. subClassOf relations or InstanceOf relations).
- Through semantic analysis of a domain, the tool can automatically generate relations between words within that domain.
- Visualization of such semantic structures can then be used for navigation and browsing through document sets.
2. OntoExtract (1/3)
- OntoExtract supports the analysis of natural language texts and generates lightweight, domain-specific ontologies from these texts (utilizing already existing knowledge from a central data repository).
- OntoExtract is able to
  - analyse natural language,
  - provide initial ontologies,
  - refine existing ontologies,
  - find relations between key terms in documents,
  - find instances of concepts within documents,
  - find classes and sub-class relationships.
2. OntoExtract (2/3)
- How does OntoExtract currently work?
  - parses, tokenizes and analyses text,
  - generates nodes and relations between them,
  - enhances specific aspects of the discovered knowledge items using a background repository (containing general knowledge of the world, represented in Sesame),
  - and submits the final analysis results to the RDFS server Sesame.
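The last two steps above (nodes and relations turned into RDFS statements) can be sketched as follows. This is only an illustration under stated assumptions: the namespace `http://example.org/onto#`, the function `to_ntriples` and the sample concepts are hypothetical, the output is written as N-Triples text, and the actual submission to Sesame is left out because the slides do not show that interface.

```python
# Standard RDF / RDFS vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace for the extracted concepts (an assumption).
NS = "http://example.org/onto#"


def to_ntriples(classes, subclass_pairs, instances):
    """Serialise extracted nodes and relations as N-Triples lines."""
    lines = []
    for cls in sorted(classes):
        lines.append(f"<{NS}{cls}> <{RDF_TYPE}> <{RDFS_CLASS}> .")
    for child, parent in sorted(subclass_pairs):
        lines.append(f"<{NS}{child}> <{RDFS_SUBCLASS}> <{NS}{parent}> .")
    for inst, cls in sorted(instances):
        lines.append(f"<{NS}{inst}> <{RDF_TYPE}> <{NS}{cls}> .")
    return "\n".join(lines)


# Invented extraction result: two classes, one sub-class link, one instance.
document = to_ntriples(
    classes={"Tool", "Software"},
    subclass_pairs={("Tool", "Software")},
    instances={("OntoExtract", "Tool")},
)
print(document)
```

The resulting text could then be handed to whatever upload mechanism the RDFS repository offers.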
Sesame domain knowledge
Sesame background knowledge
3. OntoWrapper
- OntoWrapper
  - deals with the analysis of structured pages,
  - allows the user to define XML/RDF templates, variables and rule sets to perform a structured analysis of a specific domain,
  - generates the merged output and sends it to the Sesame repository as data statements about specific pages.
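How templates, variables and rule sets might fit together can be shown with a small sketch. It does not reproduce OntoWrapper's own template or rule syntax, which the slides do not give. It assumes one regular expression per variable as the rule set and Python's `string.Template` as the template mechanism; the `ex:` element names, the page URL and the HTML snippet are hypothetical.

```python
import re
from string import Template
from xml.sax.saxutils import escape

# Hypothetical rule set: one regular expression per variable (an assumption).
RULES = {
    "name": re.compile(r"<h1>(.*?)</h1>"),
    "email": re.compile(r'href="mailto:([^"]+)"'),
}

# Hypothetical RDF/XML fragment used as the output template.
TEMPLATE = Template(
    '<rdf:Description rdf:about="$page">\n'
    "  <ex:name>$name</ex:name>\n"
    "  <ex:email>$email</ex:email>\n"
    "</rdf:Description>"
)


def wrap_page(page_url, html):
    """Apply every rule to the page and merge the results into the template."""
    variables = {"page": escape(page_url, {'"': "&quot;"})}
    for var, rule in RULES.items():
        match = rule.search(html)
        # A rule that does not fire leaves its variable empty.
        variables[var] = escape(match.group(1)) if match else ""
    return TEMPLATE.substitute(variables)


html = '<h1>Jane Doe</h1><a href="mailto:jane@example.org">mail</a>'
statement = wrap_page("http://example.org/staff/jane", html)
print(statement)
```

Keeping rules and template separate means a new site layout only needs a new rule set, while the shape of the generated statements stays the same.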
4.1 Generating Semantic Structures (1/2)
- Generation of semantic knowledge in information extraction is based upon the results of parsing steps that can be of varying analysis depth.
- Levels of linguistic analysis
  - Tokenization
  - Lexical/Morphological Analysis
  - POS tagging
  - Syntactic Analysis
  - Semantic/Pragmatic Analysis
  - Discourse Analysis
- CORPORUM's lexical analysis includes
  - text normalization, tokenization, POS tagging
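The three lexical steps named above can be sketched in a few lines. This is a toy illustration, not CORPORUM's implementation: normalization is assumed to mean Unicode and whitespace clean-up, tokenization is a single regular expression, and the tagger is a lookup in a hypothetical four-word lexicon (`TOY_LEXICON`), where a real tagger would use a trained model.

```python
import re
import unicodedata

# Hypothetical toy lexicon (an assumption for illustration only).
TOY_LEXICON = {
    "the": "DET",
    "tool": "NOUN",
    "generates": "VERB",
    "ontologies": "NOUN",
}


def normalize(text):
    """Unify Unicode forms and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Look each token up in the toy lexicon; unknown words get UNK."""
    tagged = []
    for token in tokens:
        fallback = "UNK" if token.isalnum() else "PUNCT"
        tagged.append((token, TOY_LEXICON.get(token.lower(), fallback)))
    return tagged


tagged = tag(tokenize(normalize("The  tool\ngenerates ontologies.")))
print(tagged)
```

Each step consumes the output of the previous one, which is why errors made early in normalization or tokenization carry through to the deeper analysis levels.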
4.1 Generating Semantic Structures (2/2)
- In OntoExtract the initially analysed and annotated text is transformed into an internal representation that makes use of a variety of linguistic analysis steps to arrive at an initial interpretation of what is written.
- The representation contains the original text and its annotations, but also the resolutions performed on it.
- The semantic structures are then translated into a more formal representation.
4.2 Generating Ontologies from Textual Resources
- How can the translation from linguistics into formalisms be done properly?
  - Problem of representation level: what knowledge should be represented at the ontology level and what at the fact level (what represents an instance, what a concept)?
  - Problem of dealing with inheritance.
  - Consistency between extracted ontologies and their truth within specific domains.
- Ontologies are extracted from single documents taken from the web (concepts are extracted and created). These are set into relation with each other and augmented with properties, and found instances are hooked up to them.
4.3 Visualization and Navigation
- The exported semantic network structures can be run through a graph layout algorithm in order to generate visualizations (with CCA viewer).
- Intercluster relationships are used to navigate from one cluster to another via relevant concepts.
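Navigation via intercluster relationships can be sketched as a lookup over concept pairs that cross a cluster boundary. The sketch assumes every concept has already been assigned to exactly one cluster; the function names, cluster labels and concepts are hypothetical, and the graph layout step itself is not covered.

```python
from collections import defaultdict


def intercluster_links(cluster_of, relations):
    """Collect, per ordered cluster pair, the concepts that bridge them."""
    links = defaultdict(set)
    for a, b in relations:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            # Going from ca to cb lands on concept b, and vice versa.
            links[(ca, cb)].add(b)
            links[(cb, ca)].add(a)
    return links


def reachable_clusters(cluster, links):
    """Map each neighbouring cluster to the concepts that lead into it."""
    return {
        dst: sorted(concepts)
        for (src, dst), concepts in links.items()
        if src == cluster
    }


# Invented example: two clusters joined by one relation.
cluster_of = {"ontology": "A", "RDF": "A", "parser": "B", "token": "B"}
relations = [("ontology", "RDF"), ("RDF", "parser"), ("parser", "token")]
links = intercluster_links(cluster_of, relations)
print(reachable_clusters("A", links))
```

Relations inside a single cluster are ignored here, since only the crossing ones offer a jump to another part of the document set.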
5. Issues in Using Automated Text Extraction for Ontology Building using IE on Web Resources
- The Internet poses an additional challenge: the multi-cultural background of the authors.
- Generated ontologies can be used as seed ontologies, automatically generated from a variety of user-defined documents.