CONCEPT MODELING: presentation

1
CONCEPT MODELING
A Research Review
Đorđe Popović, Ognjen Šćekić, Veljko Milutinović
January-July 2006
2
Initial Assignment

January 2006, initial assignment: Get acquainted with different ways of concept modeling, in general.
More specifically, explore the possibilities offered by RDF and OWL.
One of the ideas: use the 7 Ws, i.e. WHAT, WHO, WHEN, WHERE, WHY, WHICH, (W)HOW.

3
What is concept modeling?

A way of modeling reality
Identifying concepts
Identifying relations among concepts
Organizing the concepts in a knowledge-base,
allowing an "intelligent" way to search and
process this data.
Why do we need concept modeling? To make electronic resources not only machine-processable, but also machine-understandable!

4
Challenges

How to create a model that has a uniform
structure, and is powerful enough to capture the
essence of any concept?
How should these models be linked into an
efficient structure?
How can we bridge the gap between natural language and a machine-processable model?

5
Why start with patents?

Described by a very formal, structured language: claims.
Each patent is a novel concept.
The definition of one patent is usually based on another one.

6
Structure of a Patent Document
General info about the patent
Description
References to related patents
Claims: primary target for WHAT
Abstract of the patent

7
Conceptual Indexing

What is conceptual indexing?
"New technique for organizing information to support subsequent access that can dramatically improve your ability to find the information you need, with less hassle and with better results." (William A. Woods)
Conceptual indexing combines techniques of:
Knowledge representation
Natural language processing
Classical techniques for indexing words and phrases
It bridges the gap between natural language and a machine-processable model.

8
Conceptual Indexing

Conceptual indexing technology is a combination of:
Concept extractor: identifies phrases to be indexed.
Concept assimilator: analyzes a concept phrase to determine its place in the conceptual taxonomy.
Conceptual retrieval system: uses the conceptual taxonomy to make connections between requested and indexed items.

Figure 1: Main components of a conceptual indexer
9
Hybrid Approach: Indices and RDF/OWL

Conceptual indices
RDF/OWL
Motivation: Use the advantages of one approach to eliminate the drawbacks of the other.

10
Conceptual Indices vs. RDF/OWL
                  | Conceptual indices                               | RDF/OWL ontologies
Major advantages  | Linear-complexity structures                     | Very expressive and precise
                  | Provide basic subsumption relations              | Based on First-Order Logic
                  | Provide built-in knowledge on low-level concepts | Supported by W3C
Major drawbacks   | Incapability of establishing explicit relations among high-level concepts | Great complexity
                  | Incapability to create precise models            |
11
Why not use ontologies alone?

If we want to use an ontology, we have two choices:
Use an existing, well-established ontology that might not suit our needs.
Create a new ontology that does suit our needs.
We can create several different ontologies, depending on how we want to capture the information.
Problems arise when we want to merge ontologies.
This approach works fine within a closed community with specific needs:
There already exists a well-defined basic ontology structure.
Community members have a good knowledge of how to model new concepts in terms of the existing ones.

12
Why not use indices alone?

For example, let us take the simplest possible definition of a bird:
bird [1]: a creature with wings and feathers that lays eggs and can usually fly.
Our index might then contain the following associations: creature, wings, feathers, eggs, fly.
A conceptual index does not offer the possibility to state the fact that some birds do not fly!

[1] Word definition taken from Longman Dictionary of Contemporary English, 3rd edition, 1995.
13
Hybrid Approach

An index of associations represents a simple model, similar to what humans have in mind when they first think of a bird.
Given enough associations, one can create a model with a considerable degree of accuracy.
RDF/OWL statements provide a means for expressing additional (but very important) information (e.g. there are birds that cannot fly!).
We believe this is good enough for most applications.
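
To make the hybrid idea concrete, here is a minimal sketch, assuming plain Python tuples stand in for RDF/OWL statements. The concept "penguin" and the predicate names `is_a` and `lacks_association` are hypothetical illustrations, not part of the presented system.

```python
# Association index for "bird", taken from the dictionary definition on the
# previous slide.
bird_index = {"creature", "wings", "feathers", "eggs", "fly"}

# Plain tuples standing in for RDF/OWL statements; "penguin" and the
# predicate names are hypothetical.
statements = {
    ("penguin", "is_a", "bird"),
    ("penguin", "lacks_association", "fly"),
}

def associations(concept, base_name, base_index):
    """Inherit the base index, then remove explicitly negated associations."""
    if concept != base_name and (concept, "is_a", base_name) not in statements:
        return set()
    negated = {obj for subj, pred, obj in statements
               if subj == concept and pred == "lacks_association"}
    return set(base_index) - negated
```

The index carries the cheap, approximate model; the few explicit statements carry the exceptions the index cannot express.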

14
Hybrid Approach

It is important to keep track of how many times a term is mentioned, because it affects its descriptive power.
Example: U.S. Patent 6,989,179, Synthetic grass sport surfaces, claims section
1. synthetic grass (10 occurrences)
2. playing surface (9 occurrences)
These terms represent the essence of what is being described!
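
Counting term occurrences can be sketched as follows. This is only an illustration: the sample text is invented (it is not the text of the patent), and a real conceptual indexer would find the candidate phrases itself instead of receiving them as a list.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count case-insensitive whole-phrase occurrences of each term."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    # Most frequent terms first: these carry the most descriptive power.
    return counts.most_common()

# Invented sample text, for illustration only.
sample = ("A synthetic grass playing surface comprising synthetic grass panels. "
          "The playing surface is formed of synthetic grass sections.")
ranking = count_terms(sample, ["synthetic grass", "playing surface"])
```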

15
Hybrid Approach

However, this is only because we know what synthetic grass and playing surface are!
Therefore, at some level, we need to have some intrinsic, built-in knowledge base of basic concepts!
All the other concepts can then be described in terms of these basic concepts.
Solution: Conceptual indexers are equipped with a knowledge base of basic terms.

16
Patent Model: Conceptual Index

A patent's Claims section is scanned and processed by a conceptual indexer.
The result is a descriptive index, associated with the patent (its size is approx. 1-5% of the full text).
This index can be seen as an ordered list of the patent's WHAT associations (terms, phrases, sentence fragments).
An entry in the descriptive index contains a low-level concept and the number of its occurrences.

17
Patent Model: RDF/OWL

For a different application, a different RDF/OWL model needs to be devised.
For describing patents, this model could be used to capture explicitly stated information:
Patent number and other numbers (→ WHICH)
Inventor, examiner, attorney, ... (→ WHO)
Date when the patent was filed (→ WHEN)
Explicit references to similar patents (→ WHICH)
etc.
Each W can have multiple sub-categories that are application-specific!
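
One way to picture the W-based structure is a list of statements grouped by W category. The sketch below uses plain tuples in place of real RDF/OWL; every identifier, property name and value is a hypothetical placeholder, not data from an actual patent record.

```python
from collections import defaultdict

# (subject, W category, property, value); all names and values are
# hypothetical placeholders.
statements = [
    ("patent:X", "WHICH", "patentNumber", "<patent number>"),
    ("patent:X", "WHO", "inventor", "<inventor name>"),
    ("patent:X", "WHO", "examiner", "<examiner name>"),
    ("patent:X", "WHEN", "filingDate", "<filing date>"),
    ("patent:X", "WHICH", "references", "patent:Y"),
]

def group_by_w(statements, subject):
    """Collect the (property, value) pairs of one subject under each W."""
    grouped = defaultdict(list)
    for subj, w, prop, value in statements:
        if subj == subject:
            grouped[w].append((prop, value))
    return dict(grouped)

model = group_by_w(statements, "patent:X")
```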

18
Patent Model: Creation
Figure 2: Creation of a patent model. The Claims section is processed by the conceptual indexer to produce an index associated with the patent. Additional information about the concept is captured by RDF/OWL statements, into a predefined, application-specific structure.
19
Patent Model: Result
Figure 3: Patent model. WHAT associations are contained in a descriptive index. Other Ws are expressed through RDF/OWL statements.
20
Patent Model: Big Picture

Descriptive indices are re-processed by the conceptual indexer to form the system index.
Each entry in the system index retains links to the descriptive indices it originates from, and vice versa.
This structure allows us to:
Perform quick searches of the existing patents
Add/remove patents easily
21
Figure 4: Top-level scheme
22
Patent Model: Implicit Links

Descriptions of similar concepts (patents) usually make frequent use of similar or even identical terms.
By determining overlapping terms we create dynamic, implicit links among similar concepts.
The number of such implicit links can be used to express similarity among concepts.
The algorithm for determining the similarity needs to be tweaked empirically.
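
As one possible starting point for such an algorithm, the sketch below counts shared terms and normalizes by the size of the combined vocabulary (a Jaccard ratio). The normalization is an assumption made here for illustration; the slides do not specify a formula, and the sample indices are invented.

```python
def implicit_links(index_a, index_b):
    """Terms that appear in both descriptive indices."""
    return set(index_a) & set(index_b)

def similarity(index_a, index_b):
    """Shared-term count divided by the number of distinct terms overall."""
    union = set(index_a) | set(index_b)
    if not union:
        return 0.0
    return len(implicit_links(index_a, index_b)) / len(union)

# Invented descriptive indices (term -> occurrence count).
patent_a = {"synthetic grass": 10, "playing surface": 9, "panel": 4}
patent_b = {"playing surface": 3, "panel": 1, "turf": 2}
```

Occurrence counts are ignored in this sketch; weighting shared terms by their counts is one of the knobs that would need empirical tuning.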

23
Advantages and Drawbacks

Advantages
Reduced complexity (a great reduction of direct
links between concepts)
Fast search and retrieval (as the result of
using indices)
Scalability
Drawbacks
Use of indices implies loss of precision

24
New Assignment

May 2006
Specific assignment
Find ways of extracting prior art from previously
filed patents.
Use the results to determine novel art in the
descriptions of patents that have yet to be
filed.
Generate new claims from newly found novel art, to be submitted for new patents.

25
Determining prior and novel art

This work is currently done by experts.
Requires great knowledge of the subject, and much time spent searching various databases of existing patents.
Both time-consuming and costly!

26
Determining prior and novel art

Existing tools use statistical, data-mining
techniques.
Very efficient and fast algorithms available for
extracting relevant keyphrases.
But limited capabilities of establishing any other than basic relations among concepts.
Usually undefined relations.
Problem: How to determine more complex relations among concepts to create claims (sentences)?
Solution: Additional Natural Language Processing (NLP) techniques required!

27
Proposed solution: Stage 1

Statistical analysis: seed extraction
Process the text with a statistical analysis tool (in our case, KEA 3.0).
The output of such tools is an index of relevant words/phrases (keywords), each associated with a score.
Ideally, by using a conceptual indexer, the output would be a much more expressive conceptual index.
Composite keywords are turned into a single keyword and its descriptors.
Use empirical rules on word scores and composite phrases to determine the most relevant keywords, and declare them to be the seeds for further analysis.
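
Seed selection could look roughly like the sketch below. It does not call KEA; it assumes the tool's output is already available as (phrase, score) pairs. The scores, the threshold, and the "last word is the keyword" rule are illustrative assumptions standing in for the empirical rules.

```python
def split_composite(phrase):
    """Treat the last word as the keyword, the preceding words as descriptors."""
    *descriptors, head = phrase.split()
    return head, descriptors

def select_seeds(scored_phrases, min_score=0.5):
    """Keep phrases scoring at least min_score and merge them by keyword."""
    seeds = {}
    for phrase, score in scored_phrases:
        if score < min_score:
            continue
        head, descriptors = split_composite(phrase)
        entry = seeds.setdefault(head, {"score": 0.0, "descriptors": set()})
        entry["score"] = max(entry["score"], score)
        entry["descriptors"].update(descriptors)
    return seeds

# Invented scores; phrases are assumed to be non-empty strings.
scored = [("synthetic grass", 0.9), ("playing surface", 0.8),
          ("support surface", 0.6), ("color", 0.1)]
seeds = select_seeds(scored)
```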

Three stages:
1. Statistical analysis: seed extraction
2. Construction of Claims table
3. Creation of claims
28
Proposed solution: Stage 1

Tools such as KEA require initial training and tweaking to achieve maximum performance.
We trained KEA on a set of 12 relevant Sun's patents.
All extracted seeds are kept in a database, so that they are available later when needed.

29
Proposed solution: Stage 2

Construction of Claims table
Text is processed once more to eliminate the sentences not containing any of the seeds.
Each seed is assigned an entry in the claims table, and its occurrences in the text are marked with a unique marker.
The text is then analyzed sentence by sentence.
Each sentence is decomposed into its functional parts: subject fragments, object fragments, predicate fragments and different adverbial fragments. (NLP: the hardest part!)
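
The first two steps (dropping sentences without seeds and marking seed occurrences) can be sketched as below. The `[[id:word]]` marker format and the punctuation-based sentence splitter are assumptions made for the example; the decomposition into functional parts is not attempted here.

```python
import re

def mark_seeds(text, seeds):
    """Keep sentences that contain a seed and tag every seed occurrence."""
    if not seeds:
        return []
    seed_ids = {seed.lower(): i for i, seed in enumerate(seeds)}
    # Longer seeds first, so a longer phrase wins over a shorter prefix.
    alternatives = "|".join(
        re.escape(s) for s in sorted(seed_ids, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    kept = []
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if pattern.search(sentence):
            kept.append(pattern.sub(
                lambda m: f"[[{seed_ids[m.group(1).lower()]}:{m.group(1)}]]",
                sentence))
    return kept
```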

30
0. Grass (WHAT)
   TYPE: synthetic
1. Surface(s) (WHAT)
   TYPE: 0, support, playing
   are manufactured from s.g. panels 2 (predicate)
2. Panel(s) (WHAT)
   TYPE: 0
   are placed side-by-side (predicate)
      to form continuous support surface 1 (WHY)
   form continuous support surface (predicate)
   are formed of grass sections 3 (predicate)
   are square OR rectangular (predicate)
   have different color tones (predicate)
3. Section(s) (WHAT)
   TYPE: 0
   are cut from grass panels from 2 (predicate)
   are sewn OR glued OR attached together (predicate)
      by a hook and loop attachment (HOW)
      in a criss crossed way (HOW)
      to create a checkered pattern (WHY)
   create checkered pattern (predicate)
   are assembled with ribbons OR fibers (predicate)
      lying in different directions (HOW)
4. Ribbon(s) (WHAT)
   TYPE: 2
   lie in different directions (predicate)
   are fibrillated (predicate)
      to remove the grain directions (WHY)
etc.
Figure 5: U.S. Patent 6,989,179, Synthetic grass sport surfaces, Claims table (part of)
31
Proposed solution: Stage 3

Creating claims once the table is complete is straightforward.
Here are some of the claims created from the previously shown table:
1. A synthetic grass surface manufactured from synthetic grass panels.
2. A synthetic grass playing surface as defined in claim 1, wherein said synthetic grass panels are placed side by side to form a continuous support surface.
3. A synthetic grass playing surface as defined in claim 2, wherein said synthetic grass panels are formed of synthetic grass sections.
Generated claims are compared against the prior-art database to select only those claims describing potential novel art.
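
A rough sketch of how claim sentences might be assembled from table entries is given below. It is a simplification under stated assumptions: the function name is hypothetical, a single subject phrase is reused for every claim, and each dependent claim refines the one immediately before it.

```python
def generate_claims(subject, base_predicate, refinements):
    """Build one independent claim and a chain of dependent claims."""
    claims = [f"A {subject} {base_predicate}."]
    for refinement in refinements:
        previous = len(claims)  # number of the claim being refined
        claims.append(
            f"A {subject} as defined in claim {previous}, wherein {refinement}."
        )
    return claims

claims = generate_claims(
    "synthetic grass playing surface",
    "manufactured from synthetic grass panels",
    [
        "said synthetic grass panels are placed side by side "
        "to form a continuous support surface",
        "said synthetic grass panels are formed of synthetic grass sections",
    ],
)
```

The comparison against the prior-art database is a separate step and is not covered by this sketch.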

32
Problems

Major obstacles that needed to be overcome were:
How to determine prior art
Concept classifier
Sentence Template Tool (NLP)
How to determine functional parts of a sentence
Sentence Analyzer (NLP)

33
Figure 6: Top-level scheme
Patent description is processed by KEA and the
Sentence template tool to extract relevant
keywords (seeds).
Seeds are compared against prior art contained in
the database.
NLP processing
Claims table is created by analyzing sentences
containing seeds.
Generate new claims from the table.
34
Implementation of NLP parts

A subgroup of the research team began working on
the NLP tools.
After extensive research we adopted the Stanford parser as the base tool for our work (http://nlp.stanford.edu).
The parser analyzes single sentences. Its output is a tree structure showing types of words and sentence fragments.
It can also determine basic grammar relations.
Our plan: Use the first output to create the template tool, and both outputs to determine functional parts of a sentence.

35
Stanford parser: an example
"One implementation of the snapshot copy process provides a two-table approach."

(ROOT
  (S
    (NP
      (NP (CD One) (NN implementation))
      (PP (IN of)
        (NP (DT the) (NN snapshot) (NN copy) (NN process))))
    (VP (VBZ provides)
      (NP (DT a) (JJ two-table) (NN approach)))
    (. .)))

num(implementation-2, One-1)
nsubj(provides-8, implementation-2)
det(process-7, the-4)
nn(process-7, snapshot-5)
nn(process-7, copy-6)
prep_of(implementation-2, process-7)
det(approach-11, a-9)
amod(approach-11, two-10)
dobj(provides-8, approach-11)

Grammar relations can be used to determine the main functional parts of sentences.
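
To illustrate how the grammar relations lead to functional parts, the sketch below reads the relations exactly as printed on this slide and picks out subject, predicate and object heads. It only processes the textual output; it does not call the parser, and the function names are hypothetical.

```python
import re

# One relation looks like: nsubj(provides-8, implementation-2)
RELATION = re.compile(r"(\w+)\(([^\s,()]+)-(\d+),\s*([^\s,()]+)-(\d+)\)")

def parse_relations(text):
    """Return (relation, governor, governor_pos, dependent, dependent_pos)."""
    return [(rel, gov, int(gov_pos), dep, int(dep_pos))
            for rel, gov, gov_pos, dep, dep_pos in RELATION.findall(text)]

def functional_parts(relations):
    """Pick the head words of subject, predicate and direct object."""
    parts = {}
    for rel, governor, _, dependent, _ in relations:
        if rel == "nsubj":
            parts["predicate"] = governor
            parts["subject"] = dependent
        elif rel == "dobj":
            parts["object"] = dependent
    return parts

output = """num(implementation-2, One-1) nsubj(provides-8, implementation-2)
det(process-7, the-4) nn(process-7, snapshot-5) nn(process-7, copy-6)
prep_of(implementation-2, process-7) det(approach-11, a-9)
amod(approach-11, two-10) dobj(provides-8, approach-11)"""

parts = functional_parts(parse_relations(output))
```

Expanding each head word into its full fragment (for example "implementation" into "One implementation of the snapshot copy process") would require following the remaining relations or the tree.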
36
Sentence Template Tool

Motivation: In a single patent document, authors often use the same sentence templates for describing various patent parts.
This tool allows the users to specify the
sentence templates to find, and the parts they
want extracted.

37
Sentence Template Tool

Example from U.S. Patent No. 6,804,755:

FIG. 1 is a pictorial representation of a distributed data processing system in which the present invention may be implemented
FIG. 2 is a block diagram of a storage subsystem in accordance with a preferred embodiment of the present invention
. . .
FIG. 10 is an exemplary block diagram of a multi-layer mapping table in accordance with a preferred embodiment of the present invention
FIG. 11 is an exemplary illustration of FlexRAID in accordance with the preferred embodiment of the present invention
. . . etc.

There are more than 20 sentences of the same structure in this patent description!

38
Sentence Template Tool

This sentence structure is typical for many
patent descriptions, when the inventor is
describing what the pictures represent.
Picture description sentences may contain
important novel concepts.
Novel concepts from already filed patents can be treated as prior art for the analyses of future patents.

39
Sentence Template Tool

For example:
"FIG. 10 is an exemplary block diagram of a multi-layer mapping table in accordance with a preferred embodiment of the present invention."
The query that would return the underlined sentence part might look like this:
Fig is <NounPhrase><Preposition><?NounPhrase><.>
We developed a comprehensive query syntax for comparing parsed sentence trees, similar to the one shown here.
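
The idea of a sentence template can be approximated with a regular expression over the raw text. This is a swapped-in technique for illustration only: the actual tool compares parsed sentence trees using its own query syntax, which is not reproduced here, and the group names below are hypothetical.

```python
import re

# Rough textual stand-in for the template "FIG. N is <kind> of <subject> ...".
FIG_TEMPLATE = re.compile(
    r"FIG\.\s*(?P<number>\d+) is (?P<kind>an? .+?) of (?P<subject>.+?)"
    r"(?: in (?:accordance with|which) .*)?\.?$"
)

def extract_figure_subject(sentence):
    """Return (figure number, kind of drawing, described subject) or None."""
    match = FIG_TEMPLATE.match(sentence.strip())
    if match is None:
        return None
    return int(match.group("number")), match.group("kind"), match.group("subject")
```

A plain regular expression is brittle (it depends on literal words such as "of" and "in accordance with"), which is one reason to match against parse trees instead.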

40
Advantages

Frequently used queries can be stored for later
use.
If this tool is to be used primarily within a company, people working for the company can be given guidelines on how to describe certain parts of the patent, to make the use of this tool easier and more efficient.
The key advantage of this approach is that it is much more accurate than statistical tools, because it is controlled by humans.

41
An Unfortunate Turn . . .

Unfortunately, the funding for the project was not approved.
Our goal now is to use the accumulated knowledge in a somewhat different direction!

42
Future plans

Use the results returned by Google, refine them by applying semantic analysis, and give immediate answers to user queries!
Users should be able to use the query syntax to
specify not merely the keywords, but also
require the terms to appear in a specified
context, or ask specific questions.

43
Future plans

This kind of analysis requires an enormous amount of CPU time, and should therefore be performed only for specific searches:
Patents
Legal acts and documents
Newspaper and other archives
Deep internet search
etc.

44
Future plans

Possible solution: Each document should contain an additional metadata section, which would contain the parsed data from the plain text contained in it.
That way, documents that change rarely should be processed only once.
Additional storage costs should be outweighed by the increased search performance.
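
The "process only once" idea can be sketched as a cache keyed by a hash of the document text. This is an in-memory illustration with hypothetical names; the proposal above stores the parsed data inside the document itself, which this sketch does not model.

```python
import hashlib

class ParsedMetadataCache:
    """Parse a document once and reuse the result until its text changes."""

    def __init__(self, parse):
        self._parse = parse      # any function: text -> parsed data
        self._store = {}         # content hash -> parsed data
        self.parse_calls = 0     # counts how often real parsing happened

    def metadata_for(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.parse_calls += 1
            self._store[key] = self._parse(text)
        return self._store[key]
```

Hashing the text means an edited document gets re-parsed automatically, while unchanged documents cost only a hash computation and a lookup.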

45
Future plans

Our idea is still in the first stage of
development.
Further research is needed to explore the quality
and feasibility of the proposed solution.
However, we expect to produce some interesting results.

46
CONCEPT MODELING
A Research Review
Đorđe Popović, popajce_at_ptt.yu
Ognjen Šćekić, ogi_at_cg.yu
Veljko Milutinović, vm_at_etf.bg.ac.yu
Thank you!
