Title: OSHL: A Propositional Prover with Semantics for First-Order Logic
1. OSHL: A Propositional Prover with Semantics for First-Order Logic
- David A. Plaisted
- UNC Chapel Hill
2. Current theorem provers
- Largely syntactic
- Resolution-based or ME (model elimination, tableau-based)
- First-order provers are often poor on non-Horn clauses
- Rarely can solve hard problems
- Human interaction needed for hard problems
3. Unit Resolution and General Resolution
- Resolution is efficient for Horn and renamable Horn problems.
- Resolution is efficient if the proof can be found by UR (unit-resulting) resolution.
- Hard problems tend not to be Horn, renamable Horn, or UR resolvable.
- Of the 1697 TPTP problems provable by Otter in 30 seconds, 1042 can be proved by UR resolution.
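A minimal propositional sketch of why Horn problems are easy for resolution: unit resolution is refutation-complete for Horn clause sets, so a simple unit-propagation loop decides them (renamable Horn sets reduce to this case after flipping the signs of some atoms). The DIMACS-style integer encoding and the names `is_horn` and `unit_refutes` are illustrative assumptions, not Otter's implementation.

```python
# Clauses are sets of nonzero ints (DIMACS style): 3 is atom 3, -3 is its negation.

def is_horn(clauses):
    """Horn: every clause has at most one positive literal."""
    return all(sum(1 for lit in c if lit > 0) <= 1 for c in clauses)

def unit_refutes(clauses):
    """Propositional unit resolution to a fixpoint; True if the empty clause is derived.

    Unit resolution is refutation-complete for Horn sets, so for Horn input a
    False answer means the set is satisfiable. For non-Horn input it can miss proofs."""
    units = set()
    changed = True
    while changed:
        changed = False
        for c in clauses:
            # Resolve away every literal whose complement is already a derived unit.
            remaining = {lit for lit in c if -lit not in units}
            if not remaining:
                return True                 # empty clause derived
            if len(remaining) == 1:
                (lit,) = remaining
                if lit not in units:
                    units.add(lit)          # new unit clause
                    changed = True
    return False
```

On the non-Horn set {p, q}, {p, ¬q}, {¬p, q}, {¬p, ¬q}, which is unsatisfiable, `unit_refutes` finds no unit clause to start from and returns False, illustrating why non-Horn problems need more than unit resolution.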
4. Unit Resolution and General Resolution
- Of the 1697 problems provable by Otter, only 297 were both non-Horn and had a rating greater than zero.
- Of these 297, at most 215 are not UR resolvable.
- Otter can do hundreds of thousands of resolutions in 30 seconds on this machine.
- Resolution is inefficient on hard, non-UR-resolvable problems.
- Need for new approaches.
5. How do humans prove theorems?
- Semantics
- Case analysis
- Sequential search through space of possible structures
- Focus on the theorem
6. "Systematic methods can now routinely solve verification problems with thousands or tens of thousands of variables, while local search methods can solve hard random 3SAT problems with millions of variables." (from a conference announcement)
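7. DPLL Example
Clause set: {p, r}, {¬p, ¬q, r}, {p, ¬r}
- Branch p = T: {T, r}, {¬T, ¬q, r}, {T, ¬r}; simplify to {¬q, r}
- Branch p = F: {F, r}, {¬F, ¬q, r}, {F, ¬r}; simplify to {r}, {¬r}; simplify again (contradiction)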
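The splitting and simplification steps of the example can be sketched as a small recursive DPLL procedure. This is a textbook-style sketch (unit propagation plus splitting, no clause learning or heuristics), assuming clauses are sets of nonzero integers where `-a` denotes the negation of atom `a`; `simplify` and `dpll` are hypothetical names.

```python
def simplify(clauses, lit):
    """Make `lit` true: drop clauses containing it, delete its complement elsewhere."""
    return [c - {-lit} for c in clauses if lit not in c]

def dpll(clauses):
    """Return True if the clause set (sets of nonzero ints) is satisfiable."""
    clauses = [frozenset(c) for c in clauses]
    if not clauses:
        return True            # every clause satisfied
    if any(not c for c in clauses):
        return False           # empty clause: this branch is refuted
    # Unit propagation: a one-literal clause forces its literal.
    for c in clauses:
        if len(c) == 1:
            (lit,) = c
            return dpll(simplify(clauses, lit))
    # Split on an atom: try it true, then false, as in the example tree.
    atom = abs(next(iter(clauses[0])))
    return dpll(simplify(clauses, atom)) or dpll(simplify(clauses, -atom))
```

With p, q, r numbered 1, 2, 3, the p = F branch `simplify(S, -1)` leaves {r}, {¬r} and is refuted, while the full set is satisfiable through the p = T branch.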
8. Hyper-Linking
9.
- Eliminating Duplication with the Hyper-Linking Strategy, Shie-Jue Lee and David A. Plaisted, Journal of Automated Reasoning 9 (1992) 25-42.
10. Definition Detection
11.
- Replacement Rules with Definition Detection, David A. Plaisted and Yunshan Zhu, in Caferra and Salzer, eds., Automated Deduction in Classical and Non-Classical Logics, LNAI 1761 (1998) 80-94.
12. More Definitions
S1 ? S2 ? ? Sn  Sn ? Sn-1 ? ? S1 (left associative)

n | OSHL time | OSHL Gen | OSHL Kept | Otter time | Otter Gen | Otter Kept | Vampire time | Vampire Gen | Vampire Kept | E-SETHEO time | DCTP time
2 | 0.175 | 41 | 36 | 600 | 100303 | 24712 | 0.00 | 103 | 90 | 0.0 | 0.01
3 | 0.678 | 85 | 80 | 600 | 66753 | 31496 | 70.1 | 3606742 | 50382 | 0.3 | 300
4 | 2.107 | 141 | 136 | 600 | 47219 | 22119 | 300 | 25898955 | 68385 | 0.3 | 300
5 | 5.317 | 207 | 202 | 600 | 46054 | 20941 | 300 | 25298293 | 67864 | 2.6 | 300
6 | 12.02 | 283 | 278 | 600 | 60247 | 22923 | 300 | 25612105 | 68457 | 300 | 300
7 | 38.97 | 7 | 3 | 600 | 56299 | 19660 | 300 | 25641650 | 67977 | 300 | 300
8 | 77.94 | 7 | 3 | 600 | 56352 | 18932 | 300 | 25863117 | 68542 | 300 | 300
13. More Definitions
- Similar results for other definitions
- S1 ? S2 ? ? Sn  Sn ? Sn-1 ? ? S1, left side left associated, right side right associated
- S1 ? S2 ? ? Sn  S1 ? S2 ? ? Sn ? S1 ? S2 ? ? Sn, both sides associated to the left
- S1 ? S2 ? ? Sn  S1 ? S2 ? ? Sn ? S1 ? S2 ? ? Sn, left side left associated, right side right associated
- Similar results for n
14. Later propositional strategies
- Billon's disconnection calculus, derived from hyper-linking
- Disconnection calculus theorem prover (DCTP), derived from Billon's work
- FDPLL
15. Performance of DCTP on TPTP, 2003
- DCTP 1.3 first in EPS and EPR (largely propositional)
- DCTP 10.2p third in FNE (first-order, no equality), solving the same number as the best provers
- DCTP 10.2p fourth in FOF and FEQ (all first-order formulae, and formulae with equality)
- DCTP 1.3 is a single-strategy prover.
16. Strategy Selection in E
17. Strategy Selection
- Schulz, Stephan, E - A Brainiac Theorem Prover, AI Communications 15(2/3):111-126, 2002.
18. Strategy Selection
- The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:
- Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
- Set-of-support strategy.
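The age-weight ratio can be illustrated with a generic given-clause queue: with ratio a:w, out of every a + w selections, a take the oldest passive clause and w take the lightest one. This is a sketch of the general technique (similar in spirit to Otter's pick-given ratio), not Vampire's code; the class name `GivenClauseQueue` and the caller-supplied integer weight are assumptions.

```python
import heapq
import itertools

class GivenClauseQueue:
    """Passive clauses selectable by age (insertion order) or by weight."""

    def __init__(self, age, weight):
        # Repeating pattern of selections, e.g. (1, 2) -> age, weight, weight, age, ...
        self.pattern = itertools.cycle(["age"] * age + ["weight"] * weight)
        self.by_age, self.by_weight = [], []
        self.counter = itertools.count()   # insertion order = clause age
        self.selected = set()              # ids already taken via either heap

    def add(self, clause, weight):
        cid = next(self.counter)
        heapq.heappush(self.by_age, (cid, clause))
        heapq.heappush(self.by_weight, (weight, cid, clause))

    def pop(self):
        """Return the next given clause, or None if no passive clause is left."""
        heap = self.by_age if next(self.pattern) == "age" else self.by_weight
        while heap:
            entry = heapq.heappop(heap)
            cid, clause = entry[-2], entry[-1]
            if cid not in self.selected:   # lazily skip clauses chosen via the other heap
                self.selected.add(cid)
                return clause
        return None
```

A larger weight share prefers light clauses more strongly; the age picks guarantee that every clause is eventually selected (fairness).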
19. Strategy Selection
- The automatic mode of Vampire 7.0 is derived from
extensive experimental data obtained on problems
from TPTP v2.6.0. Input problems are classified
taking into account simple syntactic properties,
such as being Horn or non-Horn, presence of
equality, etc. Additionally, we take into account
the presence of some important kinds of axioms,
such as set theory axioms, associativity and
commutativity. Every class of problems is
assigned a fixed schedule consisting of a number
of kernel strategies called one by one with
different time limits.
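The class-based scheduling described above can be sketched as: classify the problem by simple syntactic properties, look up a fixed schedule for that class, and run each strategy with its own time limit until one succeeds. The feature set, strategy names, time limits, and the `run_strategy` callback below are hypothetical placeholders, not Vampire's actual classes or schedules.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class ProblemClass:
    horn: bool
    equality: bool

def classify(clauses):
    """Clauses are lists of (positive, predicate) pairs; '=' marks an equality literal."""
    horn = all(sum(1 for positive, _ in c if positive) <= 1 for c in clauses)
    equality = any(pred == "=" for c in clauses for _, pred in c)
    return ProblemClass(horn, equality)

# Hypothetical schedules: (strategy name, time limit in seconds), tried in order.
SCHEDULES = {
    ProblemClass(horn=True, equality=False): [("unit_preferring", 10), ("default", 20)],
    ProblemClass(horn=True, equality=True): [("superposition", 30)],
    ProblemClass(horn=False, equality=False): [("hyper", 10), ("set_of_support", 20)],
    ProblemClass(horn=False, equality=True): [("superposition", 15), ("discount", 15)],
}

def run_schedule(clauses, run_strategy: Callable[[str, list, float], Optional[str]]):
    """Call run_strategy(name, clauses, limit) per scheduled strategy; stop at the first result."""
    for name, limit in SCHEDULES[classify(clauses)]:
        result = run_strategy(name, clauses, limit)
        if result is not None:
            return name, result
    return None
```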
20. Various Provers
- PTTP solved 999 of 2200 tested problems.
- Otter proved 1595.
- leanCoP proved 745.
- Source: Jens Otten and Wolfgang Bibel. leanCoP: Lean Connection-Based Theorem Proving. Journal of Symbolic Computation, Volume 36, pages 139-161. Elsevier Science, 2003.
- Vampire 6.0: 3286 refutations of 7267 problems, more solved
21. DCTP Strategy Selection
- DCTP 1.31 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language.
- DCTP 1.31 is a single-strategy prover. Individual strategies are started by DCTP 10.21p using the schedule-based resource allocation scheme known from the E-SETHEO system. Of course, different schedules have been precomputed for the syntactic problem classes. The problem classes are more or less identical with the sub-classes of the competition organisers.
- In CASC-J2 DCTP 10.21p performed substantially better.
22. Semantics
- Gelernter 1959 Geometry Theorem Prover
- Adapt semantics to clause form
- An interpretation (semantics) I is an assignment of truth values to literals so that I assigns opposite truth values to L and ¬L for atoms L.
- The literals L and ¬L are said to be complementary.
23. Semantics
- We write I ⊨ C (I satisfies C) to indicate that semantics I makes the clause C true.
- If C is a ground clause then I satisfies C if I satisfies at least one of its literals.
- Otherwise I satisfies C if I satisfies all ground instances D of C. (Herbrand interpretations.)
- If I does not satisfy C then we say I falsifies C.
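The satisfaction definitions above can be sketched for a Herbrand interpretation given as the set of ground atoms it makes true. A ground clause is checked literal by literal; for a non-ground clause every ground instance must be satisfied, and because the Herbrand universe is usually infinite, this sketch only checks instances over a caller-supplied finite list of ground terms (an assumption made to keep it executable). The term encoding and function names are illustrative.

```python
from itertools import product

# Terms: variables are strings starting with an uppercase letter, constants are
# lowercase strings, compound terms are tuples (functor, arg, ...).
# A literal is (positive, atom); I is the set of ground atoms that are true.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def variables(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(a) for a in t[1:]))
    return set()

def substitute(t, s):
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t

def satisfies_ground(I, clause):
    """I satisfies a ground clause if it satisfies at least one of its literals."""
    return any((atom in I) == positive for positive, atom in clause)

def satisfies(I, clause, ground_terms):
    """I satisfies C if it satisfies every ground instance of C (over ground_terms only)."""
    vs = sorted(set().union(*(variables(atom) for _, atom in clause)))
    for values in product(ground_terms, repeat=len(vs)):
        s = dict(zip(vs, values))
        if not satisfies_ground(I, [(p, substitute(a, s)) for p, a in clause]):
            return False    # this ground instance is falsified, so I falsifies C
    return True
```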
24. Example Semantics
- Specify I by interpreting symbols
- Interpret predicate p(x,y) as x y
- Interpret function f(x,y) as x y
- Interpret a as 1, b as 2, c as 3
- Then p(f(a,b),c) interprets to TRUE but p(a,b) interprets to FALSE
- Thus I satisfies p(f(a,b),c) but I falsifies p(a,b)
25. Obtaining Semantics
- Humans using mathematical knowledge
- Automatic methods (finite models)
- Trivial semantics
26. Goal of OSHL
- First-order logic
- Clause form
- Propositional efficiency
- Semantics
- Requires ground decidability
27. Structure of OSHL
- Goal sensitivity if the semantics is chosen properly
- Choose the initial semantics to satisfy the axioms
- Use of natural semantics
- For group theory problems, can specify a group
- Sequential search through possible interpretations
- Thus similar to Davis and Putnam's method
- Propositional efficiency
- Constructs a semantic tree
28. Ordered Semantic Hyper-Linking (OSHL)
- Reduces a first-order logic problem to a propositional problem
- Imports propositional efficiency into first-order logic
- The algorithm:
- Imposes an ordering on clauses
- Progresses by generating instances and refining interpretations
29. OSHL
- I0 is specified by the user
- Di is chosen minimal so that Ii falsifies Di
- Di is an instance of a clause in S
- Ii is chosen minimal so that Ii satisfies Dj for all j < i
- Let Ti be {D0, D1, …, Di-1}.
- Ii falsifies Di but satisfies Ti
- When Ti is unsatisfiable OSHL stops and reports that S is unsatisfiable.
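For a finite set of ground clauses, the OSHL loop above can be sketched directly: keep T, compute Ii as the minimal model of T (scanning atoms in increasing order and keeping I0's value whenever some model of T allows it), and add a minimal clause of S that Ii falsifies. This is a propositional sketch only, assuming S is already ground, clauses are ordered by number of literals as a stand-in for the size orderings, and satisfiability is checked by brute force; it omits instance generation and the U rules.

```python
from itertools import product

# Ground clauses are frozensets of nonzero ints: a means atom a, -a means "not a".
# An interpretation is a dict atom -> bool; I0 must mention every atom of S.

def satisfies(interp, clause):
    """interp satisfies a ground clause if it makes at least one literal true."""
    return any(interp[abs(lit)] == (lit > 0) for lit in clause)

def extendable(T, fixed):
    """Brute force: can the partial assignment `fixed` be extended to a model of T?"""
    free = sorted({abs(lit) for c in T for lit in c} - set(fixed))
    for values in product([False, True], repeat=len(free)):
        interp = {**fixed, **dict(zip(free, values))}
        if all(satisfies(interp, c) for c in T):
            return True
    return False

def minimal_model(T, I0):
    """Minimal model of T in the semantic ordering, or None if T is unsatisfiable."""
    fixed = {}
    for atom in sorted(I0):
        fixed[atom] = I0[atom]              # prefer agreeing with I0 on smaller atoms
        if not extendable(T, fixed):
            fixed[atom] = not I0[atom]
            if not extendable(T, fixed):
                return None
    return fixed

def oshl(S, I0):
    """Propositional OSHL loop: returns ("unsat", T) or ("sat", model)."""
    order = sorted(S, key=lambda c: (len(c), sorted(c)))   # smaller clauses first
    T, I = [], dict(I0)
    while True:
        falsified = [D for D in order if not satisfies(I, D)]
        if not falsified:
            return "sat", I                  # I satisfies every clause of S
        T.append(falsified[0])               # Di: minimal clause falsified by Ii
        I = minimal_model(T, I0)             # next interpretation: minimal model of T
        if I is None:
            return "unsat", T
```

Each added Di is falsified by a model of the current T, so it is new, and the loop terminates on a finite S.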
30. Clause Ordering
- |L|_lin
- |P(f(x),g(x,c))|_lin = 6
- |L|_dag
- |P(f(x),f(x))|_dag = 4
- Extend to clauses additively, ignoring negations
- OSHL chooses Di minimal in such an ordering
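The |L|_lin measure can be sketched as a count of symbol occurrences, extended additively to clauses with negation signs ignored; OSHL then prefers instances with the smallest total. Encoding terms as nested tuples `(symbol, arg, ...)` is an assumption of this sketch.

```python
def lin_size(t):
    """Symbol occurrences in a term or atom; compound terms are tuples (symbol, args...)."""
    if isinstance(t, tuple):
        return 1 + sum(lin_size(arg) for arg in t[1:])
    return 1  # a constant or variable counts as one symbol

def clause_size(clause, size=lin_size):
    """Extend a literal measure to clauses additively, ignoring negation signs."""
    return sum(size(atom) for _positive, atom in clause)
```

For P(f(x),g(x,c)), encoded as `("P", ("f", "x"), ("g", "x", "c"))`, `lin_size` counts P, f, x, g, x, c and returns 6, matching the slide.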
31. Alternate version of OSHL
- Want to keep the size of T small
- Do this by throwing away clauses of T subject to the condition:
- The minimal model of Ti+1 is larger than the minimal model of Ti for all i.
- This guarantees completeness.
- Leads to a formulation using sequences of clauses and resolutions between clauses.
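32. Rules of OSHL
- Start with the empty sequence ().
- (C1, C2, …, Cn) → (C1, C2, …, Cn, D), where I is the minimal model of (C1, C2, …, Cn) and D is minimal such that D contradicts I.
- (C1, C2, …, Cn, D) → (C1, C2, …, Cn-1, D), if Cn is not needed.
- (C1, C2, …, Cn, D) → (C1, C2, …, Cn-1, res(Cn, D, L)), if max resolution (on literal L) is possible.
- Proof if the empty clause is derived.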
33. Propositional Example
(I0 ⊨ p for all atoms p)
- ()
- ({¬p1, ¬p2, ¬p3}), model I0[¬p3]
- ({¬p1, ¬p2, ¬p3}, {¬p4, ¬p5, ¬p6}), model I0[¬p3, ¬p6]
- (…, …, {¬p7}), model I0[¬p3, ¬p6, ¬p7]
- (…, …, {¬p7}, {p3, p7})
- (…, {¬p4, ¬p5, ¬p6}, {p3})
- ({¬p1, ¬p2, ¬p3}, {p3})
- ({¬p1, ¬p2}), model I0[¬p2]
34. Semantics
- Trivial semantics
- Positive: choose I0 to falsify all atoms; the first D is all positive. Forward chaining.
- Negative: choose I0 to satisfy all atoms; the first D is all negative. Backward chaining.
- Natural semantics: I0 chosen by the user
35. Semantics Ordering
- <t: a well-founded ordering on atoms, extended to literals
- Extend <t to interpretations as follows:
- I and J agree on L if they interpret L the same
- Suppose I0 is given
- I <t J if I and J are not identical, A is the minimal atom on which they disagree, and I agrees with I0 on A
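For finitely many atoms, the <t comparison can be sketched by scanning atoms in the given order up to the first disagreement. Representing interpretations as dicts from atoms to booleans, with a finite ordered atom list, is an assumption of this sketch; `less_t` is a hypothetical name.

```python
def less_t(I, J, I0, atoms):
    """I <t J: I and J differ, and on the first atom (in order) where they
    disagree, I agrees with I0. `atoms` lists atoms from smallest to largest."""
    for a in atoms:
        if I[a] != J[a]:
            return I[a] == I0[a]
    return False  # identical on all atoms: not related
```

Because the first disagreement decides, an interpretation that departs from I0 only on larger atoms is smaller, which is why minimal models change the largest atoms they can.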
36. Semantics Ordering
- <t is not a well-founded ordering on interpretations. But <t-minimal models of T always exist.
- Ii is always chosen as the <t-minimal model of T.
- Theorem: such Ii always has the form I0[L1 … Lm], where the Li are literals of clauses of T.
- I0[L1 … Lm] ⊨ L iff at(L) ∉ at({L1, …, Lm}) and I0 ⊨ L, or L = Li for some i.
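The characterization of I0[L1 … Lm] in the theorem translates directly into an evaluation function. The sketch assumes propositional literals encoded as nonzero integers and a consistent list L1, …, Lm (no atom occurs with both signs); `modified` is a hypothetical name.

```python
def modified(I0, lits):
    """Return I0[L1 ... Lm] as a truth function on literals.

    Literals are nonzero ints (-a negates atom a); I0 maps atoms to bool.
    Assumes the Li are consistent (no atom occurs with both signs)."""
    changed_atoms = {abs(lit) for lit in lits}
    chosen = set(lits)

    def holds(L):
        if abs(L) not in changed_atoms:
            return I0[abs(L)] == (L > 0)   # at(L) not among at(L1..Lm): I0 decides
        return L in chosen                 # otherwise L holds iff L is some Li
    return holds
```

For example, with I0 making every atom true, `modified(I0, [-3, -6])` is the interpretation I0[¬p3, ¬p6]: it makes p3 and p6 false and agrees with I0 elsewhere.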
37. Instantiation Example
- Suppose I0 interprets arithmetic in the standard way.
- Suppose S contains axioms of arithmetic and the clause X+3≠5.
- Then the first instance chosen could be 2+3≠5, (1+1)+3≠5, (3-1)+3≠5, et cetera, but it could not be 3+3≠5, nor could it be an instance of an axiom.
38. Instantiation Example
- Suppose the first instance chosen is 2+3≠5.
- Then I1 is I0[2+3≠5], which interprets all atoms as in standard arithmetic except that the statement 2+3≠5 is true.
- The next instance chosen might be 2+3-1 = 5-1 → 2+3 = 5. This contradicts I1. It is an instance of the clause X-1 = Y-1 → X = Y and corresponds to generating the subgoal 2+3-1 = 5-1.
39. U Rules
- Choose clause instances to match existing literals. Look for a contradiction.
- Basic clauses and U clauses
- Basic clauses are used in the three rules given above
- The sequence can also have U clauses on the end
- U clauses have a selected literal
- In basic clauses the maximal literal is selected
- In U clauses other literals can be selected.
- Significant performance enhancement.
40. U Rules
- UR resolution: Find C in S having a ground UR resolvent with selected literals. Let C' be the corresponding instance of C. Add C' to the end of the sequence of clauses and select the UR resolvent from it.
- Filtering: Find C in S such that NIL is derivable by unit resolution from selected literals and C. Let C' be the corresponding instance of C. Add C' to the end of the sequence of clauses. Select a literal from it.
41. U Rules
- Case Analysis: Find C in S and L in C such that L has all the variables of C. Find an instance L' of L that is complementary to a selected literal of some clause in the sequence. Let C' be the corresponding instance of C. Add C' to the end of the sequence and select a literal from it.
- This rule expands definitions.
42. Examples of U Rules
- UR resolution: Given the sequence (s(a), p(b), t(a), q(b)) and the clause {not p(X), not q(X), r(X)}, create the sequence (s(a), p(b), t(a), q(b), {not p(b), not q(b), r(b)}).
- Filtering: Given the sequence (s(a), p(b), t(a), q(b)) and the clause {not p(X), not q(X)}, create the sequence (s(a), p(b), t(a), q(b), {not p(b), not q(b)}).
43. Examples of U Rules
- Case analysis: Given the sequence (s(a), p(b), t(a), q(b)) and the clause {not q(X), r(X), s(X)}, create the sequence (s(a), p(b), t(a), q(b), {not q(b), r(b), s(b)}).
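A small sketch of the three U rules on the examples above, using one-way matching of clause literals against the selected ground literals. Simplifications assumed here: Filtering is taken as "every literal of C' is complementary to some selected literal" (the ground case of deriving NIL by unit resolution from the selected literals), only the first match found is returned, and the term encoding and function names are illustrative.

```python
# Terms: variables are strings starting with an uppercase letter, constants are
# lowercase strings, compound terms are tuples (functor, arg, ...).
# A literal is (positive, atom) with atom = (predicate, arg, ...).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def term_vars(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(term_vars(a) for a in t[1:]))
    return set()

def apply(t, s):
    """Apply substitution s (dict variable -> term) to t."""
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return tuple(apply(a, s) for a in t)
    return t

def match(pattern, ground, s):
    """One-way matching: extend s so that apply(pattern, s) == ground, else None."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == ground else None
        return {**s, pattern: ground}
    if isinstance(pattern, tuple):
        if not isinstance(ground, tuple) or len(pattern) != len(ground):
            return None
        for p, g in zip(pattern, ground):
            s = match(p, g, s)
            if s is None:
                return None
        return s
    return s if pattern == ground else None

def complement_matches(lits, selected, s):
    """Yield substitutions making every literal in lits complementary to a selected literal."""
    if not lits:
        yield s
        return
    positive, atom = lits[0]
    for sel_positive, sel_atom in selected:
        if sel_positive != positive:
            s2 = match(atom, sel_atom, s)
            if s2 is not None:
                yield from complement_matches(lits[1:], selected, s2)

def instance(clause, s):
    return [(positive, apply(atom, s)) for positive, atom in clause]

def ur_resolution(clause, selected):
    """All literals but one match complements of selected literals; the rest is a ground unit."""
    for i, (positive, atom) in enumerate(clause):
        for s in complement_matches(clause[:i] + clause[i + 1:], selected, {}):
            if not term_vars(apply(atom, s)):
                return instance(clause, s), (positive, apply(atom, s))
    return None

def filtering(clause, selected):
    """Every literal of the instance is complementary to a selected literal."""
    for s in complement_matches(clause, selected, {}):
        return instance(clause, s)
    return None

def case_analysis(clause, selected):
    """Match a literal containing all variables of C against the complement of a selected literal."""
    all_vars = set().union(*(term_vars(atom) for _, atom in clause))
    for lit in clause:
        if term_vars(lit[1]) == all_vars:
            for s in complement_matches([lit], selected, {}):
                return instance(clause, s)
    return None
```

On the UR resolution example, matching not p(X) and not q(X) against p(b) and q(b) binds X to b, so the sketch returns the instance {not p(b), not q(b), r(b)} with r(b) as the selected resolvent.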
44. Example Proof Using U Rules
- All positive semantics
- Clauses
- A1. ¬(X ⊆ Y), ¬(Y ⊆ X), X = Y
- A2. ¬(Z ∈ X), ¬(X ⊆ Y), Z ∈ Y
- A3. g(X,Y) ∈ X, X ⊆ Y
- A4. ¬(g(X,Y) ∈ Y), X ⊆ Y
- A5. ¬(Z ∈ X), Z ∈ X ∪ Y
- A6. ¬(Z ∈ Y), Z ∈ X ∪ Y
- A7. ¬(Z ∈ X ∪ Y), Z ∈ X, Z ∈ Y
- T. ¬(A ∪ B = B ∪ A)
45. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A) (T)
- 2. ¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Case Analysis, A1)
- 3. ¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A (UR resolution, A4)
- 4. g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g() ∈ B) (UR resolution, A5)
- 5. g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g() ∈ A) (UR resolution, A6)
- 6. g() ∈ B, g() ∈ A, ¬(g() ∈ A ∪ B) (UR resolution, A7)
- 7. A ∪ B ⊆ B ∪ A, g() ∈ A ∪ B (Filtering, A3)
46. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A)
- 2. ¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Case Analysis)
- 3. ¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A (UR resolution)
- 4. g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g() ∈ B) (UR resolution)
- 5. g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g() ∈ A) (UR resolution)
- 8. g() ∈ B, g() ∈ A, A ∪ B ⊆ B ∪ A (Resolution of 6. and 7.)
47. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A)
- 2. ¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Case Analysis)
- 3. ¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A (UR resolution)
- 4. g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g() ∈ B) (UR resolution)
- 9. g(A ∪ B, B ∪ A) ∈ B ∪ A, g() ∈ B, A ∪ B ⊆ B ∪ A (Resolution of 8. and 5.)
48. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A)
- 2. ¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Case Analysis)
- 3. ¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A (UR resolution)
- 10. g(A ∪ B, B ∪ A) ∈ B ∪ A (Resolution of 9. and 4.)
49. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A)
- 2. ¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Case Analysis)
- 11. A ∪ B ⊆ B ∪ A (Resolution of 10. and 3.)
50. Example Proof Using U Rules
- 1. ¬(A ∪ B = B ∪ A)
- 12. ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A (Resolution of 11 and 2)
Now the other half of the proof will be done. Note that there is only one ascending sequence of clauses constructed by OSHL and we are only indicating part of it.
51. Implementation Results
- Slower implementation speed of OSHL
- Uniform strategy versus strategy selection
- The choice of Otter
- Influence of U rules on an earlier version:
- Without U rules: 233 proofs in 30 seconds on TPTP problems
- With U rules: 900 proofs in 30 seconds
- All results for trivial semantics
52. Implementation Results
- OSHL has no special data structures.
- Implemented in OCaml
- No special equality methods
- Semantics was implemented, but frequently only trivial semantics was used.
- Thus significant performance improvements are possible.
53. Implementation Results
Domain | Problems | Otter: All | Otter: Horn | Otter: Non-Horn, all | Otter: Non-Horn, R 0 | Otter: Non-Horn, R > 0 | OSHL: All | OSHL: Horn | OSHL: Non-Horn, all | OSHL: Non-Horn, R 0 | OSHL: Non-Horn, R > 0
All | 4417 | 1697 | 764 | 933 | 297 | 636 | 1027 | 311 | 716 | 265 | 451
FLD | 143 | 28 | 0 | 28 | 11 | 17 | 68 | 0 | 68 | 47 | 21
SET | 604 | 168 | 2 | 166 | 40 | 126 | 211 | 2 | 209 | 93 | 116
Total Number of Proofs, 30 seconds
54. Implementation Results
- Shows that a prover working entirely at the ground level can come into the range of performance of a respectable resolution theorem prover.
- DCTP and FDPLL probably perform better than OSHL.
- DCTP and FDPLL do not work entirely at the ground level and do not use natural semantics.
55. Implementation Results
 | All | Horn | Non-Horn | R 0 | R > 0 | Non-Horn, R > 0
Clauses, Otter | 3483094 | 215290 | 3267804 | 915737 | 2567357 | 2460992
Clauses, OSHL | 17212 | 8110 | 9102 | 14888 | 2324 | 2216
Ratio | 202 | 26.6 | 359 | 61.5 | 1105 | 1111
For problems for which both provers found proofs in 30 seconds.
56. Implementation Results
- In a given number of inferences OSHL finds more proofs than Otter for non-Horn problems
57. Summary of theoretical results about semantics
- Several results show that OSHL with an appropriate semantics is implicitly performing unifications. Thus the choice of semantics has a profound effect on the operation of OSHL.
- OSHL has some features of propositional methods and some features of unification-based methods.
- Semantics might significantly improve OSHL.
58. Number of Clauses Generated
Problem | Clauses, Otter | Clauses, OSHL with semantics
GRP005-1 | 57 | 3
GRP006-1 | 62 | 7
GRP007-1 | 85 | 22
GRP018-1 | 266 | 16
GRP019-1 | 267 | 15
GRP020-1 | 265 | 18
GRP021-1 | 264 | 19
GRP023-1 | 79 | 22
GRP032-3 | 83 | 14
GRP034-3 | 141 | 30
GRP034-4 | 222 | 6
GRP042-2 | 21 | 15
GRP043-2 | 80 | 81
GRP136-1 | 0 | 8
GRP137-1 | 0 | 8
59. Lifting Semantics