% "A Complete Method for Program Specialization Based on Unfolding"
% S.-H. Nienhuys-Cheng, R. de Wolf.
% For ECAI'96 (compile with latex209).
\documentstyle[epsfig,times]{ecai96}
\begin{document}
\input psfig
\newtheorem{define}{Definition}
\newtheorem{proposition}{Proposition}
\newtheorem{corollary}{Corollary}
\newenvironment{definition}
{\begin{define}\rm}
{{\hfill $\Diamond$}
\end{define}}
\newcommand{\lproof}{$\Leftarrow$:}
\newcommand{\rproof}{$\Rightarrow$:}
% like enumerate, but with less vertical space between items
\newenvironment{shortenum}
{\begin{enumerate}
\setlength{\topsep}{0in}
\setlength{\topskip}{0in}
\setlength{\parskip}{0in}
\setlength{\parsep}{0in}
\setlength{\partopsep}{0in}
\setlength{\itemsep}{0in}
\vspace{-1ex}}
{\vspace{-1ex}
\end{enumerate}}
\newcounter{vectory}
\newcommand{\sldstep}[3]{
\setcounter{vectory}{#1}
\addtocounter{vectory}{-10}
\put(0,#1){\makebox(0,0){#2}}
\put(0,\value{vectory}){\vector(0,-1){40}}
\put(100,#1){\makebox(0,0)[l]{#3}}
\put(100,\value{vectory}){\vector(-2,-1){80}}
}
\newcommand{\nodea}[2]{
\put(0,60){\makebox(0,0){#1}}
\put(0,50){\vector(0,-1){50}}
\put(-5,25){\makebox(0,0){#2}}
}
\newcommand{\nodeb}[3]{
\put(0,60){\makebox(0,0){#1}}
\put(0,50){\vector(-1,-1){50}}
\put(-35,25){\makebox(0,0){#2}}
\put(0,50){\vector(1,-1){50}}
\put(35,25){\makebox(0,0){#3}}
}
\ecai{438} %Set start page number
\title{A Complete Method for\\ Program Specialization Based on Unfolding}
\author{Shan-Hwei Nienhuys-Cheng\institute{Department of Computer Science,
H4-19, Erasmus University of Rotterdam,
P.O.~Box 1738, 3000 DR Rotterdam, the Netherlands,
{\tt \{cheng,bidewolf\}@cs.few.eur.nl}.}
\and
Ronald de Wolf}
\sectionfoot{Machine Learning and Knowledge Acquisition}
\authorfoot{S.-H. Nienhuys-Cheng and R. de Wolf}
\maketitle
\bibliographystyle{ecai96}
\begin{abstract}
We discuss the problem of specializing a definite program with
respect to sets of positive and negative examples.
This problem is relevant in the field of inductive learning.
To solve such specialization problems, we define UDS-specialization,
based on unfolding, clause deletion, and subsumption,
and prove that any (solvable) specialization problem
has a UDS-specialization as a solution.
\end{abstract}
\medskip
{\noindent\bf Keywords:}
Inductive logic programming, Machine learning, Induction.
\section{INTRODUCTION}
This article discusses the {\em specialization problem}, which concerns
the specialization of a definite program (i.e., a {\sc Prolog} program)
with respect to sets of positive and negative {\em examples}.
These examples are usually expressed as ground atoms.
Suppose $E^{+}$ is a set of positive examples, and $E^{-}$ a set of
negative examples. These sets may be infinite.
One of the problems that researchers in Inductive Logic Programming (ILP) are
interested in is finding a definite program which is {\em correct} with
respect to the examples.
That is, the problem is to find a program that implies all members of $E^{+}$,
and none of the members of $E^{-}$.
Our hope is that such a program will have captured some of the structure
and relationships among the examples. If so, the program can be used for
predicting truth-values of ground atoms outside the set of examples.
Thus finding a correct program is a form of inductive learning.
For example, let $E^{+}$ contain $Odd(s(0))$, $Odd(s^{3}(0))$,
$Odd(s^{5}(0))$, \ldots and let $E^{-}$ contain $Odd(0)$, $Odd(s^{2}(0))$,
$Odd(s^{4}(0))$, \ldots Then $\{Odd(s(0)),
(Odd(s^{2}(x))\leftarrow Odd(x))\}$ would be a correct program.
Often, it is the case that we initially have a program $T$ that is
sufficiently strong with respect to the examples
(i.e., $T\models E^{+}$ and possibly $T\models A$ for some $A\in E^{-}$).
The problem is then to construct a new program $T'$ from $T$,
such that $T'$ is correct w.r.t.\ the examples.
$T'$ will be called a {\em specialization} of $T$.
A natural way to specialize $T$ is, first, to replace a clause in $T$ by all
its resolvents upon some body-atom in this clause.
Constructing these resolvents is called {\em unfolding}~\cite{kn:tamaki&sato}.
Currently, there seems to be an increasing interest in unfolding as a
specialization method
\cite{kn:bostrom&idestam,kn:bostrom2,kn:bostrom1,kn:alexin&gyimothy&bostrom}.
The new program obtained after unfolding a clause in $T$,
is clearly implied by $T$.
The function of the replaced clause is taken over by the set of resolvents
produced by unfolding.
We can then, secondly, delete from the program some of the new clauses that
are responsible for the negative examples, thus specializing the program.
Hopefully, after repeating these two steps a number of times, we
can get rid of all negative examples.
This method was introduced in~\cite{kn:bostrom&idestam}.
For simplicity, let all examples be ground instances of the atom
$P(x_{1},\ldots,x_{n})$, for some predicate $P$. The motivation for the
method described above is that it can be used to prune
negative examples from the SLD-tree for $T\cup\{\leftarrow
P(x_{1},\ldots,x_{n})\}$.\footnote{See~\cite{kn:lloyd87} for the definitions
of SLD-resolution and SLD-trees.}
We will illustrate this by an example.
Consider the program $T$, consisting of the following clauses:
\begin{quote}
$C_{1}=P(x,y)\leftarrow Q(x,y)$\\
$C_{2}=Q(b,b)\leftarrow Q(a,a)$\\
$C_{3}=Q(a,a)$
\end{quote}
and $E^{+}=\{P(b,b)\}$, $E^{-}=\{P(a,a)\}$.
The SLD-tree for $T\cup\{\leftarrow P(x,y)\}$ is shown on the left of
figure~\ref{threetrees}.
The labels on the branches correspond to the input clauses used; the
computed answer of each success branch is shown below that branch.
For instance, in the rightmost branch, first $C_{1}$ and then $C_{3}$
are used as input clause, which leads to the computed answer
$\{x/a,y/a\}$, meaning that $T\models P(a,a)$.
We have indicated the computed answer corresponding to the positive example
with a `$+$', and the one corresponding to the negative example with a `$-$'.
\begin{figure}[hbt]
\centering
\setlength{\unitlength}{0.22mm}
\begin{picture}(290,235)
{\scriptsize
\put(0,0){
\begin{picture}(115,235)
\put(60,165){\nodea{$\leftarrow P(x,y)$}{1}}
\put(60,95){\nodeb{$\leftarrow Q(x,y)$}{2}{3}}
\put(115,85){\makebox(0,0){$\Box$}}
\put(100,70){\makebox(0,0){$\{x/a,y/a\},-$}}
\put(0,25){\nodea{$\leftarrow Q(a,a)$}{3}}
\put(0,15){\makebox(0,0){$\Box$}}
\put(0,0){\makebox(0,0){$\{x/b,y/b\},+$}}
\end{picture}}
\put(170,0){
\begin{picture}(120,230)
\put(60,165){\nodeb{$\leftarrow P(x,y)$}{1,2}{1,3}}
\put(120,155){\makebox(0,0){$\Box$}}
\put(120,140){\makebox(0,0){$\{x/a,y/a\},-$}}
\put(0,95){\nodea{$\leftarrow Q(a,a)$}{3}}
\put(0,85){\makebox(0,0){$\Box$}}
\put(10,70){\makebox(0,0){$\{x/b,y/b\},+$}}
\end{picture}}
%\put(340,0){
%\begin{picture}(120,230)
%\put(60,165){\nodea{$\leftarrow P(x,y)$}{\hspace{-2mm}1,2}}
%\put(60,95){\nodea{$\leftarrow Q(a,a)$}{3}}
%\put(60,85){\makebox(0,0){$\Box$}}
%\put(60,70){\makebox(0,0){$\{x/b,y/b\},+$}}
%\end{picture}}
}
\end{picture}
\caption{The SLD-trees for $T$ (left) and $T'$ (right)}
\label{threetrees}
\end{figure}
\noindent $P(a,a)$ is a negative example, so we would like to remove this by weakening
the program. This could be done by deleting $C_{1}$ or $C_{3}$ from $T$.
However, this would also make the positive example $P(b,b)$ no longer derivable.
Another way to specialize is, first, to unfold $C_{1}$ upon $Q(x,y)$.
The following $C_{1,2}$ and $C_{1,3}$ are the two clauses produced by
unfolding $C_{1}$.
\begin{quote}
$C_{1,2}=P(b,b)\leftarrow Q(a,a)$ (resolvent of $C_{1}$ and $C_{2}$)\\
$C_{1,3}=P(a,a)$ (resolvent of $C_{1}$ and $C_{3}$)
\end{quote}
Now we replace the unfolded clause $C_{1}$ by its resolvents $C_{1,2}$ and
$C_{1,3}$. This results in $T'=\{C_{2},C_{3},C_{1,2},C_{1,3}\}$.
The SLD-tree for $T'\cup\{\leftarrow P(x,y)\}$ is shown on the right of
figure~\ref{threetrees}. In this tree, the negative example is directly
connected to the root, via the branch that uses $C_{1,3}$.
Now the negative example can be pruned from the tree by deleting $C_{1,3}$
from $T'$, which does not affect the positive example.
Then we obtain $T''=\{C_{2},C_{3},C_{1,2}\}$, which is correct w.r.t.\
$E^{+}$ and $E^{-}$.
The SLD-tree for $T''\cup\{\leftarrow P(x,y)\}$ is simply the tree for $T'$,
after the rightmost branch has been pruned.
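The correctness claims in this example can also be verified by computing least
Herbrand models directly. The sketch below is our own encoding, not part of the
paper: atoms are tuples, clauses are head/body pairs, variables range over the
constants $a$ and $b$, and the least Herbrand model is obtained by iterating
the immediate-consequence operator to a fixpoint.

```python
from itertools import product

CONSTS = ('a', 'b')

def is_var(t):
    return t in ('x', 'y')

def ground_instances(clause):
    """All ground instances of a (head, body) clause over CONSTS."""
    head, body = clause
    vs = sorted({t for atom in [head] + body for t in atom[1:] if is_var(t)})
    for vals in product(CONSTS, repeat=len(vs)):
        s = dict(zip(vs, vals))
        g = lambda atom: (atom[0],) + tuple(s.get(t, t) for t in atom[1:])
        yield g(head), [g(b) for b in body]

def least_model(program):
    """Least Herbrand model, computed by naive forward chaining."""
    model, changed = set(), True
    while changed:
        changed = False
        for clause in program:
            for head, body in ground_instances(clause):
                if head not in model and all(b in model for b in body):
                    model.add(head)
                    changed = True
    return model

C1  = (('P', 'x', 'y'), [('Q', 'x', 'y')])
C2  = (('Q', 'b', 'b'), [('Q', 'a', 'a')])
C3  = (('Q', 'a', 'a'), [])
C12 = (('P', 'b', 'b'), [('Q', 'a', 'a')])   # resolvent of C1 and C2

T   = [C1, C2, C3]
Tpp = [C2, C3, C12]                          # T'' = {C2, C3, C_{1,2}}

assert ('P', 'a', 'a') in least_model(T)     # T implies the negative example
assert ('P', 'b', 'b') in least_model(Tpp)   # T'' still implies P(b,b)
assert ('P', 'a', 'a') not in least_model(Tpp)
```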
The idea behind this method is the following:
\begin{shortenum}
\item Unfolding removes some internal nodes from the SLD-tree, for instance,
the internal node $\leftarrow Q(x,y)$ in the tree on the left of
figure~\ref{threetrees}.
This tends to separate the positive from the negative examples, and also
brings them closer to the root of the tree.
\item If a negative example hangs directly from the root, and its input
clause $C$ is not used elsewhere in the tree for a positive example, then
the program can be specialized by deleting $C$.
\end{shortenum}
In~\cite{kn:bostrom&idestam}, the algorithm {\sc Spectre} is presented, which
implements this specialization technique, using an information-gain
heuristic to guide the search.
The authors also present some very encouraging experimental results.
{\sc Spectre} generally outperforms an alternative algorithm,
which is based on {\em covering} the examples rather than on separating
the positive from the negative examples.
In their experiments, the examples are taken from real-world domains such
as data on when to allow the Space Shuttle to land automatically,
and Congressional voting records. The programs produced by
{\sc Spectre} achieve higher accuracy on the test set, and contain far
fewer clauses than the programs produced by the covering algorithm.
Similar experiments are described in~\cite{kn:bostrom2}.
In~\cite{kn:bostrom1}, the algorithm {\sc Spectre ii} is presented, which
overcomes some difficulties of {\sc Spectre} concerning recursive clauses,
and which can be applied to multiple-predicate learning (i.e., the
members of $E^{+}$ and $E^{-}$ need no longer be ground instances
of the same atom $P(x_{1},\ldots,x_{n})$).
\cite{kn:bostrom1} describes a special case in which {\sc Spectre ii} is
complete, and again contains some experimental results.
As we have seen, the use of unfolding as a specialization tool can be
motivated by looking at SLD-trees and the SLD-refutations those trees
contain. However, the perspective of SLD-trees also has a disadvantage.
As some examples given later on will show, unfolding
and clause deletion by themselves are not sufficient for a complete
specialization method---some specialization problems cannot be
solved in this way.
Yet by looking at the SLD-tree, it is not clear what should be added to
unfolding and clause deletion to obtain a complete method.
Therefore, in this paper we look at program specialization from a more
general perspective, namely the perspective of SLD-derivations (rather
than refutations) and the {\em Subsumption Theorem}.
From this perspective, it becomes much clearer which operations
really matter for program specialization, and which do not.
We use the Subsumption Theorem for SLD-resolution to show that {\em
subsumption} is what we need to make our specialization technique complete.
Thus we define UDS-specialization here, which is a technique based on
{\bf U}nfolding, clause {\bf D}eletion, and {\bf S}ubsumption.
We prove that UDS-specialization is complete: every solvable specialization
problem has a UDS-spe\-ci\-a\-li\-za\-tion as a solution.
The paper is organized as follows.
In the next section we formally introduce the specialization problem.
Then we define unfolding, and examine some of its properties.
In Section~\ref{secuds} we define UDS-specialization and prove it to be
complete. Finally, in Section~\ref{secinvres} we discuss the relation
of duality between program specialization by unfolding, and program
generalization by {\em inverse resolution}.
A more detailed discussion and proofs concerning these topics
can be found in~\cite{kn:cheng&wolf:wrunf}.
\section{THE SPECIALIZATION PROBLEM}
\begin{definition}
Let $T$ be a definite program, and $E^{+}$ and $E^{-}$ sets of ground atoms.
We say $T$ is {\em sufficiently strong w.r.t.~$E^{+}$ and $E^{-}$} if
$T\models A$ for all $A\in E^{+}$.
$T$ is {\em correct w.r.t.~$E^{+}$ and $E^{-}$} if $T\models A$
for all $A\in E^{+}$ and $T\not\models A$ for all $A\in E^{-}$.
\end{definition}
\noindent A correct program is a special case of a sufficiently strong program.
We let $M_{T}$ denote the least Herbrand model of a definite program
$T$.
For ground atoms $A$, we have $A\in M_{T}$
iff $T\models A$, so $T$ is correct w.r.t.\ $E^{+}$ and $E^{-}$ iff
$E^{+}\subseteq M_{T}$ and $E^{-}\cap M_{T}=\emptyset$.
The specialization problem is the problem of specializing a sufficiently
strong (possibly {\em too} strong) program to a correct program.
The following is the statement of the specialization problem in terms
of least Herbrand models, as given in~\cite{kn:bostrom&idestam}.
This definition is also used by others in ILP.
\begin{quote}
{\bf Given}: a definite program $T$ and two disjoint sets of ground atoms
$E^{+}$ and $E^{-}$ (positive and negative examples), such that $T$ is
sufficiently strong w.r.t.\ $E^{+}$ and $E^{-}$.%
\footnote{Note that {\em background knowledge} can be included in $T$, so
we do not need to discuss the use of background knowledge separately.
It is assumed that $E^{+}$ and $E^{-}$ contain only {\em ground} instances
of atoms. The examples may be instances of different predicates (we could
for example have $E^{+}=\{P(a),Q(a,b),R(f(b)),\ldots\}$),
so multiple-predicate learning is incorporated.}\\
{\bf Find}: a definite program $T'$, called a {\em specialization} of
$T$ w.r.t.\ $E^{+}$ and $E^{-}$, such that $M_{T'}\subseteq M_{T}$,
$E^{+}\subseteq M_{T'}$ and $M_{T'}\cap E^{-}=\emptyset$.
\end{quote}
We will make two modifications to this statement of the specialization
problem. Firstly, if $E^{+}$ is required to be finite, a solution always
exists (simply put $T'=E^{+}$).
But if we allow infinite sets $E^{+}$ and $E^{-}$, then a correct program
does not always exist.
For instance, suppose our language contains a function symbol of arity $\geq 1$.
Then the set $\cal A$ of ground atoms in the language will be countably
infinite. But every (possibly infinite) subset of ground atoms represents an
Herbrand interpretation. Thus the set of all possible Herbrand
interpretations is the power set of $\cal A$, which will then be
{\em uncountable}. Because the set of definite
programs (i.e., the set of {\em finite} sets of definite clauses) is only
countably infinite, there are many more possible Herbrand interpretations
than there are definite programs. Moreover, every definite program
has only one least Herbrand model.
Thus there is an Herbrand interpretation $I$ (in fact, there are uncountably
many such interpretations) for which there is no program with $I$ as
least Herbrand model.
Now if we put $E^{+}=I$ and $E^{-}={\cal A}\backslash I$,
then there is no program that is correct w.r.t.\ $E^{+}$ and $E^{-}$.
Allowing infinite examples is theoretically important.
Recall the example concerning $Odd$ from the Introduction.
By including all instances of $Odd$ in the examples, we would be
sure that a correct program for $Odd$ is learned.
(This is related to Shapiro's~\cite{kn:shapiro81} use of the notions of an
infinite {\em enumeration} of examples, and {\em convergence in the limit}).
Accordingly, in order to be able to include the possibility of infinite sets
of examples, we have to presuppose the {\em existence} of a correct
specialization $T'$.
A second modification concerns the use of least Herbrand models.
Suppose we have $T=\{P(a), (P(x)\leftarrow P(f(x)))\}$ and
$T'=\{P(x)\leftarrow P(g(x))\}$.
Then $M_{T'}=\emptyset\subset\{P(a)\}=M_{T}$.
Yet $T$ and $T'$ have little to do with each other,
since neither $T\models T'$ nor $T'\models T$.
It is doubtful whether $T'$ should always be called a
{\em specialization} of $T$ if only $M_{T'}\subseteq M_{T}$ holds.
In our opinion, it is more appropriate to define a specialization in terms
of logical implication (i.e., $T'$ is a specialization of $T$ iff
$T\models T'$) than in terms of least Herbrand models.
These two modifications suggest a restatement of the (solvable)
specialization problem, as follows:
\begin{quote}
{\bf Given}: a definite program $T$ and two disjoint sets of ground atoms
$E^{+}$ and $E^{-}$ (positive and negative examples), such that $T$ is
sufficiently strong w.r.t.\ $E^{+}$ and $E^{-}$,
and suppose there exists a program $T'$ such that $T\models T'$ and
$T'$ is correct w.r.t.\ $E^{+}$ and $E^{-}$.\\
{\bf Find}: one such $T'$.
\end{quote}
\section{UNFOLDING}\label{secunf}
In this section, we define unfolding,
which will be used in the next section to solve specialization problems.
\begin{definition}
Let $T$ be a definite program, $C=A\leftarrow B_{1},\ldots,\linebreak[1]
B_{n}$ a definite program clause in $T$, and $B_{i}$ the $i$-th atom in
the body of $C$. Let $\{C_{1},\ldots,C_{m}\}$ be the set of clauses in $T$
whose head can be unified with $B_{i}$.
Then {\em unfolding $C$ upon $B_{i}$ in $T$} means constructing the set
$U_{C,i}=\{D_{1},\ldots,\linebreak[1]D_{m}\}$, where each $D_{j}$ is the resolvent of
$C_{j}$ and $C$, using $B_{i}$ and the head of $C_{j}$ as the literals
resolved upon.
\end{definition}
\noindent Note that $U_{C,i}$ may be the empty set. This is the case if
there is no program clause whose head unifies with the $i$-th atom in the
body of $C$.
Note also that a unit clause cannot be unfolded, since it has no body-atoms.
For an example of unfolding, let $T$ consist of the following clauses:
\begin{quote}
$C_{1}=P(f(x))\leftarrow P(x),Q(x)$\\
$C_{2}=Q(x)\leftarrow R(x,a)$\\
$C_{3}=P(f(a))$\\
$C_{4}=Q(b)$
\end{quote}
Suppose we want to unfold $C_{1}$ upon $Q(x)$ in $T$.
$C_{2}$ and $C_{4}$ are the clauses in $T$ whose heads can be unified
with $Q(x)$, so $U_{C_{1},2}=\{(P(f(x))\leftarrow P(x),R(x,a)),
(P(f(b))\leftarrow P(b))\}$.
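One step of unfolding can be sketched as follows. The encoding is our own, not
the paper's: variables are uppercase strings, compound terms are tuples,
clauses from $T$ are standardized apart before resolving, and the occurs check
is omitted for brevity. Run on the program above, it reproduces $U_{C_{1},2}$
up to variable renaming.

```python
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def subst(t, s):
    """Apply substitution s (a dict) to term t."""
    if is_var(t):
        return subst(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def unify(t1, t2, s):
    """Most general unifier extending s (no occurs check), or None."""
    t1, t2 = subst(t1, s), subst(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and \
       t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def rename(clause, suffix):
    """Standardize a clause apart by tagging its variables."""
    def r(t):
        if is_var(t):
            return t + suffix
        if isinstance(t, tuple):
            return (t[0],) + tuple(r(a) for a in t[1:])
        return t
    head, body = clause
    return r(head), [r(b) for b in body]

def unfold(C, i, T):
    """U_{C,i}: all resolvents of C, upon its i-th body atom, with clauses of T."""
    head, body = C
    U = []
    for j, Cj in enumerate(T):
        hj, bj = rename(Cj, '_%d' % j)
        s = unify(body[i - 1], hj, {})
        if s is not None:
            U.append((subst(head, s),
                      [subst(b, s) for b in body[:i - 1]] +
                      [subst(b, s) for b in bj] +
                      [subst(b, s) for b in body[i:]]))
    return U

# The example program: C1 = P(f(x)) <- P(x), Q(x), etc.
C1 = (('P', ('f', 'X')), [('P', 'X'), ('Q', 'X')])
C2 = (('Q', 'X'), [('R', 'X', 'a')])
C3 = (('P', ('f', 'a')), [])
C4 = (('Q', 'b'), [])

U = unfold(C1, 2, [C1, C2, C3, C4])   # unfold C1 upon Q(x)
assert len(U) == 2                    # only the heads of C2 and C4 unify with Q(x)
```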
Using the set $U_{C,i}$, we can construct a new program from $T$
in two ways. The first way, used in~\cite{kn:bostrom&idestam,kn:tamaki&sato},
{\em replaces} $C$ by $U_{C,i}$, thus obtaining the program
$(T\backslash\{C\})\cup U_{C,i}$.
We will call this the {\em type 1} program, denoted by $T_{u1,C,i}$.
The second way {\em adds} $U_{C,i}$ to $T$, without deleting the unfolded
clause $C$ from the program.
We will call the program $T\cup U_{C,i}$ obtained in this way the
{\em type 2} program, denoted by $T_{u2,C,i}$.
The next proposition is proved in~\cite{kn:tamaki&sato}.
A different proof can be found in~\cite{kn:cheng&wolf:wrunf}.
\begin{proposition}
Let $T$ be a definite program, $C\in T$, and $T_{u1,C,i}$ the type~1 program
resulting from unfolding $C$ upon $B_{i}$ in $T$.
Then $M_{T}=M_{T_{u1,C,i}}$.
\end{proposition}
\noindent Thus constructing the type 1 program preserves the least Herbrand
model. It does not preserve logical equivalence.
Take for instance $T=\{C=P(f(x))\leftarrow P(x)\}$.
Then $T_{u1,C,1}=\{P(f^{2}(x))\leftarrow P(x)\}$.
Now $M_{T}=M_{T_{u1,C,1}}=\emptyset$, but $T\not\Leftrightarrow T_{u1,C,1}$
since $T_{u1,C,1}\not\models T$.
Note that this means that a specialization of $T$ need not be a
specialization of $T_{u1,C,i}$.
This is one of the reasons why type~1 unfolding and
clause deletion cannot solve all specialization problems
(Section~\ref{secuds}).
On the other hand, constructing the type~2 program {\em does} preserve
logical equivalence. Since $T\subseteq T_{u2,C,i}$ we have
$T_{u2,C,i}\models T$; and because $T_{u2,C,i}\backslash T$ is a set of
resolvents of clauses in $T$, we also have $T\models T_{u2,C,i}$.
\begin{proposition}
Let $T$ be a definite program, $C\in T$, and $T_{u2,C,i}$ the type~2 program
resulting from unfolding $C$ upon $B_{i}$ in $T$.
Then $T\Leftrightarrow T_{u2,C,i}$.
\end{proposition}
\section{UDS-SPECIALIZATION}\label{secuds}
As the example in the Introduction to this paper showed, the combination
of constructing the type~1 program and clause deletion can be used to
specialize definite programs that are too general.
This combination is used in \cite{kn:bostrom&idestam,kn:bostrom1,kn:bostrom2}.
The combination of type~1 unfolding and clause deletion is not complete:
it cannot solve all specialization problems.
Consider $T=\{(P(f(x))\leftarrow P(x)), P(a)\}$.
Then $M_{T}=\{P(a), P(f(a)),\linebreak[1] P(f^{2}(a)),\linebreak[2] P(f^{3}(a)),\ldots\}$.
Let $E^{+}=M_{T}\backslash\{P(f^{2}(a))\}$, $E^{-}=\{P(f^{2}(a))\}$,
and $T_{1}=T$.
The only clause which can be unfolded is $P(f(x))\leftarrow P(x)$.
Unfolding this results in the next type~1 program:
\medskip
\noindent $T_{2}=\{(P(f^{2}(x))\leftarrow P(x)), P(f(a)), P(a)\}.$
\medskip
\noindent Then unfolding $P(f^{2}(x))\leftarrow P(x)$ gives
\medskip
\noindent $T_{3}=\{(P(f^{4}(x))\leftarrow P(x)), P(f^{3}(a)), P(f^{2}(a)),
P(f(a)), P(a)\}.$
\medskip
\noindent Notice that $M_{T_{1}}=M_{T_{2}}=M_{T_{3}}$, but unfolding has
nevertheless weakened the program: $T_{1}\models T_{2}\models T_{3}$, but
$T_{2}\not\models T_{1}$ and $T_{3}\not\models T_{2}$.
In $T_{3}$, $P(f^{4}(x))\leftarrow P(x)$ can be unfolded, etc.
It is not difficult to see that any program which can be constructed from $T$
by type~1 unfolding and clause deletion is a subset of
\medskip
\noindent $\{(P(f^{2^{n}}(x))\leftarrow P(x)), P(f^{2^{n}-1}(a)),
P(f^{2^{n}-2}(a)), \ldots,$\\
$P(f^{2}(a)), P(f(a)), P(a)\},$
\medskip
\noindent for some $n$.
In order to specialize this program such that $P(f^{2}(a))$ is no longer
derivable, we must in any case remove $P(f^{2}(a))$. However, this would also
prune some of the positive examples (such as $P(f^{2^{n}+2}(a))$) from the
program via the clause $P(f^{2^{n}}(x))\leftarrow P(x)$.
Thus type~1 unfolding and clause deletion are not sufficient for this
particular specialization problem.
But suppose we use the type~2 program instead of the type~1 program.
That is, suppose we do not immediately delete the unfolded clause from
the program.
In this case, we {\em can} find a correct specialization w.r.t.\ the
examples given above, as follows.
We start with $T'_{1}=T$, and unfold $P(f(x))\leftarrow P(x)$ without
removing the unfolded clause.
This gives $T'_{2}$:
\medskip
\noindent $T'_{2}=\{(P(f^{2}(x))\leftarrow P(x)), (P(f(x))\leftarrow P(x)),
P(f(a)), P(a)\}.$
\medskip
\noindent Now we unfold $P(f^{2}(x))\leftarrow P(x)$, again without removing
the unfolded clause. This gives $T'_{3}$:
\medskip
\noindent $T'_{3} = \{(P(f^{4}(x))\leftarrow P(x)), (P(f^{3}(x))\leftarrow P(x)),$
$(P(f^{2}(x))\leftarrow P(x)), (P(f(x))\leftarrow P(x)), P(f^{3}(a)),P(f^{2}(a)),P(f(a)),P(a)\}.$
\medskip
\noindent If we delete some clauses from $T'_{3}$, we obtain $T''$:
\medskip
\noindent $T''=\{(P(f^{4}(x))\leftarrow P(x)), (P(f^{3}(x))\leftarrow P(x)),
P(f(a)),P(a)\}.$
\medskip
\noindent This is a correct specialization of $T$ w.r.t.\ $E^{+}$ and $E^{-}$:
$T''\models E^{+}$, and $T''\not\models P(f^{2}(a))$.
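That $T''$ is indeed correct can be checked by a simple backward-chaining test.
The sketch below is our own illustration: the ground atom $P(f^{n}(a))$ is
encoded by the integer $n$, and the two recursive clauses of $T''$ are applied
backwards.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def derives(n):
    """True iff T'' = {P(f^4(x)) <- P(x), P(f^3(x)) <- P(x), P(f(a)), P(a)}
       implies P(f^n(a))."""
    if n in (0, 1):                       # the unit clauses P(a) and P(f(a))
        return True
    return (n >= 3 and derives(n - 3)) or \
           (n >= 4 and derives(n - 4))    # the two recursive clauses, backwards

assert not derives(2)                         # the negative example P(f^2(a))
assert all(derives(n) for n in range(3, 60))  # the other examples stay derivable
```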
Yet the combination of type~2 unfolding and clause deletion is still
not sufficient. Consider $T=\{P(x)\}$, $E^{+}=\{P(f(a)),\linebreak[1]
P(f^{2}(a))\}$ and $E^{-}=\{P(a)\}$.
$T'=\{P(f(x))\}$ is a solution for this specialization problem.
But since $T$ contains only a single unit clause, no unfolding can take
place here. Hence the only two programs which can be obtained by type~2
unfolding and clause deletion, are $T$ itself and the empty set,
neither of which is correct.
In order to solve this specialization problem, we have to allow the
possibility of taking a {\em subsumption step}.\footnote{Subsumption can be
seen as a solution to the problem of {\em ambivalent} leaves in an SLD-tree
\cite{kn:bostrom&idestam,kn:bostrom1}.}
A clause $C$ {\em subsumes} a clause $D$ if there is a substitution $\theta$
such that $C\theta\subseteq D$. If we allow subsumption, we can construct
$T'$ from $T$, thus obtaining a correct specialization.
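A subsumption test can be sketched as follows. The encoding is again our own:
variables are uppercase strings, compound terms are tuples, and a definite
clause is a (head, body) pair. Since only the variables of $C$ may be
instantiated, the test reduces to one-way matching of $C$'s literals against
those of $D$.

```python
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def match(pat, t, s):
    """Extend s so that pat instantiated by s equals t, binding only
       variables of pat; D's variables are treated as constants."""
    if is_var(pat):
        if pat in s:
            return s if s[pat] == t else None
        return {**s, pat: t}
    if isinstance(pat, tuple) and isinstance(t, tuple) and \
       pat[0] == t[0] and len(pat) == len(t):
        for a, b in zip(pat[1:], t[1:]):
            s = match(a, b, s)
            if s is None:
                return None
        return s
    return s if pat == t else None

def subsumes(C, D):
    """Does the definite clause C = (head, body) subsume D, i.e. is there a
       substitution theta with C.theta a subset of D (as sets of literals)?"""
    def cover(atoms, s):
        if not atoms:
            return True
        return any(s2 is not None and cover(atoms[1:], s2)
                   for d in D[1] for s2 in [match(atoms[0], d, s)])
    s0 = match(C[0], D[0], {})   # positive literals: head must match head
    return s0 is not None and cover(C[1], s0)

# P(x) subsumes P(f(x)) via {x/f(x)}, but not conversely:
assert subsumes((('P', 'X'), []), (('P', ('f', 'Y')), []))
assert not subsumes((('P', ('f', 'X')), []), (('P', 'Y'), []))
```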
In general, we can define UDS-specialization ({\bf U}nfolding,
clause {\bf D}eletion, {\bf S}ubsumption) as follows:
\begin{definition}
Let $T$ and $T'$ be definite programs.
We say $T'$ is a {\em UDS-speciali\-zation} of $T$, if there exists a
sequence $T_{1}=T,T_{2},\ldots,T_{n}=T'$ ($n\geq 1$) of definite programs,
such that for each $j=1,\ldots,n-1$, one of the following holds:
\begin{tabbing}
1. \= $T_{j+1}=(T_{j})_{u2,C,i}$ for some $C\in T_{j}$ and body-atom $B_{i}$ of $C$.\\
2. \> $T_{j+1}=T_{j}\backslash\{C\}$ for some $C\in T_{j}$.\\
3. \> $T_{j+1}=T_{j}\cup\{C\}$ for a $C$ that is subsumed by some $D\in T_{j}$.
\end{tabbing}
\vspace*{-3mm}
\end{definition}
\noindent UDS-specialization is indeed complete.
For the proof of completeness, we need the Subsumption Theorem for
SLD-reso\-lu\-tion, which we have proved in~\cite{kn:cheng&wolf:csn95-s}.
Here an SLD-derivation of a clause $D$ from a set $T$ of Horn clauses
is a sequence $R_{0},\ldots,R_{n}=D$ of clauses, such that
$R_{0}\in T$, and each $R_{i+1}$ is a resolvent of $R_{i}$ and
a definite program clause $C_{i+1}\in T$, resolved upon the head of
$C_{i+1}$ and a {\em selected atom} in the body of $R_{i}$.
See figure~\ref{slddeduc} for illustration.
The Subsumption Theorem is then the following completeness result:
\begin{theorem}[Subsumption Theorem for SLD]
Let $T$ be a set of Horn clauses, and $C$ be a Horn clause.
Then $T\models C$ iff $C$ is a tautology, or there exists an SLD-derivation
from $T$ of a clause $D$ which subsumes $C$.
\end{theorem}
\begin{figure}[hbt]
\centering
\setlength{\unitlength}{0.23mm}
\begin{picture}(100,225)
{\scriptsize
\put(0,0){
\begin{picture}(0,225)
\sldstep{225}{$R_{0}\in T$}{$C_{1}\in T$}
\put(0,165){\makebox(0,0){$R_{1}$}}
\put(0,145){\makebox(0,0){$\vdots$}}
\sldstep{120}{$R_{n-1}$}{$C_{n}\in T$}
\put(0,60){\makebox(0,0){$R_{n}=D$}}
\put(0,50){\vector(0,-1){40}}
\put(30,30){\makebox(0,0){subsumes}}
\put(0,0){\makebox(0,0){$C$}}
\end{picture}}
}
\end{picture}
\caption{An SLD-derivation of $D$ from $T$, $D$ subsumes $C$}
\label{slddeduc}
\end{figure}
\noindent This Subsumption Theorem for SLD-resolution is part of our
more general research concerning several versions of the Subsumption
Theorem (also for non-Horn clauses,
see~\cite{kn:cheng&wolf:ilp95,kn:cheng&wolf:acsc}).
Using this theorem, we can now prove the completeness of UDS-specialization.
Suppose there are definite programs $T$ and $T'$, such that $T'$ contains no
tautologies. Furthermore, suppose $T\models T'$.
Then for every $C\in T'$, we have $T\models C$.
Let $C$ be some particular clause in $T'$ that is not in $T$.
Then by the Subsumption Theorem, there exists a derivation from $T$ of a
clause $D$, which subsumes $C$ (as in figure~\ref{slddeduc}).
Since $R_{1}$ is a resolvent of $R_{0}$ and $C_{1}$ (upon the selected
atom $B_{i}$ in $R_{0}$), if we unfold $R_{0}$ in $T$ upon $B_{i}$
we get the program $T_{u2,R_{0},i}$ which contains $R_{1}$.
Now if we unfold $R_{1}$ in $T_{u2,R_{0},i}$, we get a program which
contains $R_{2}$, etc.
Thus after $n$ applications of unfolding, we can produce a program
(a superset of $T$) containing the clause $R_{n}=D$.
Since $D$ subsumes $C$, we can add $C$ to the program
by the third item in the definition of UDS-specialization.
If we do this for every $C\in T'$ that is not in $T$, we get a program
$T''$ which contains every clause in $T'$. Since $T''$ is obtained from $T$
by a finite number of applications of unfolding and subsumption, $T''$
is a UDS-specialization of $T$. Now delete from $T''$ all those clauses
that are not in $T'$. Then we obtain $T'$ as a UDS-specialization of $T$.
Thus if $T\models T'$, then $T'$ is a UDS-specialization of $T$.
Obviously, the converse also holds, so we have the following powerful result:
\begin{theorem}\label{impliffuds}
Let $T$ and $T'$ be definite programs, such that $T'$ contains no tautologies.
Then $T\models T'$ iff $T'$ is a UDS-specialization of $T$.
\end{theorem}
\noindent Now suppose we are given $T$, $T'$, $E^{+}$ and $E^{-}$,
such that $T\models T'$ and $T'$ is correct w.r.t.\ $E^{+}$ and $E^{-}$.
We can assume $T'$ contains no tautologies.
Then it follows from the previous theorem that $T'$ is a UDS-specialization
of $T$. This shows that UDS-specialization is complete:%
\footnote{UDS-specialization need not specialize {\em minimally} in the
sense advocated in \cite{kn:wrobel:minimal}: a UDS-specialization of an
initial $T$ may be considerably different from $T$.
On the other hand, the approach of~\cite{kn:wrobel:minimal} has the
disadvantage that each clause in a specialized theory should be equal to or
subsumed by a clause in the initial $T$ (p.~71, postulate~1), which
is quite restrictive.}
\begin{corollary}[Completeness of UDS]\label{udscomplete}
Every (solvable) specialization problem has a UDS-specialization as a solution.
\end{corollary}
\noindent{\bf Efficiency:}\\
Note that if we want to unfold some particular clause $C$, we actually only
need to consider the resolvents of $C$ and clauses from the original $T$.
This is clear from figure~\ref{slddeduc}, since in order to produce
$R_{i+1}$, we only need to resolve $R_{i}$ against $C_{i+1}$,
which is a member of the original $T$.
In other words, we only need to add a subset of $U_{C,i}$ to the program.
We might define $U'_{C,i}$ as the set of resolvents upon $B_{i}$ of
$C$ and clauses from the original $T$, and then use
$T_{j+1}=T_{j}\cup U'_{C,i}$ instead of $T_{j}\cup U_{C,i}$.
This reduces the number of clauses that unfolding produces,
and hence improves efficiency.
\section{RELATION WITH INVERSE RESOLUTION}\label{secinvres}
In ILP, there are basically two possible approaches: the top-down approach
(of which UDS-specialization is an example) which starts with an overly
general program and specializes this, and the bottom-up approach which starts
with an overly specific program and generalizes this.
There is an interesting relation between our previous analysis of program
specialization on the one hand, and program generalization by {\em inverse
resolution} (see for instance \cite{kn:muggleton&buntine,kn:muggleton-ilp,%
kn:rouveirol,kn:sato&akiba}) on the other hand.
The inversion of resolution is a well-known approach towards
generalization in ILP. Here the inversion of a resolution-step can
be viewed as the dual of unfolding.
However, in the same way as specialization by unfolding is not complete
without subsumption, its dual also needs (the inversion of) subsumption.
Most research in inverse resolution has focused on inverting resolution
steps, mostly ignoring the inversion of the final subsumption step.
By the previous analysis, inverting a subsumption step will be
necessary for completeness.
For example, we cannot generalize $T=\{P(f(x))\}$ to $T'=\{P(x)\}$ just
by inverting resolution steps.
\section{CONCLUSION}
In this paper, we have discussed the problem of specializing a definite
program with respect to sets of positive and negative examples.
Following~\cite{kn:bostrom&idestam}, we used unfolding as a specialization
tool. Looking at unfolding from the perspective of the Subsumption Theorem,
rather than from the perspective of refutations and SLD-trees, it became
clear that subsumption should be added to unfolding to construct a
complete specialization technique.
We defined UDS-specialization, a combination of {\bf U}nfolding,
clause {\bf D}eletion and {\bf S}ubsumption, and showed it to be complete:
every (solvable) specialization problem has a UDS-specialization as a
solution. Finally, we briefly discussed the duality between program
specialization by unfolding and program generalization by inverse resolution.
%\bibliography{unfold}
\begin{thebibliography}{10}
\bibitem{kn:alexin&gyimothy&bostrom}
Zolt{\'{a}}n Alexin, Tibor Gyim{\'{o}}thy, and Henrik Bostr{\"{o}}m,
`Integrating algorithmic debugging and unfolding transformation in an
interactive learner', in {\em Proc.\ of the 5th Int.\ Workshop on Inductive
Logic Programming (ILP-95)}, ed., Luc De~Raedt, pp. 437--453. Katholieke
Universiteit Leuven, (1995).
\bibitem{kn:bostrom2}
Henrik Bostr{\"{o}}m, `Covering vs.\ divide-and-conquer for top-down induction
of logic programs', in {\em Proc.\ of the 14th Int.\ Joint Conference on
Artificial Intelligence (IJCAI-95)}, pp. 1194--1200. Morgan Kaufmann, (1995).
\bibitem{kn:bostrom1}
Henrik Bostr{\"{o}}m, `Specialization of recursive predicates', in {\em Proc.\
of the 8th European Conference on Machine Learning (ECML-95)}, eds.,
N.~Lavra{\v{c}} and S.~Wrobel, volume 912 of {\em Lecture Notes in Computer
Science}, pp. 92--106. Springer-Verlag, (1995).
\bibitem{kn:bostrom&idestam}
Henrik Bostr{\"{o}}m and Peter Idestam-Almquist, `Specialization of logic
programs by pruning {SLD}-trees', in {\em Proc.\ of the 4th Int.\ Workshop on
Inductive Logic Programming (ILP-94)}, ed., Stefan Wrobel, pp. 31--48, Bad
Honnef/Bonn, (1994).
\bibitem{kn:lloyd87}
John~W. Lloyd, {\em Foundations of Logic Programming}, Springer-Verlag, Berlin,
2nd edn., 1987.
\bibitem{kn:muggleton-ilp}
Stephen Muggleton, `Inductive logic programming', in {\em Inductive Logic
Programming}, ed., Stephen Muggleton, volume~38 of {\em APIC Series},
pp. 3--27, Academic Press, (1992).
\bibitem{kn:muggleton&buntine}
Stephen Muggleton and Wray Buntine, `Machine invention of first-order
predicates by inverting resolution', in {\em Proc.\ of the 5th Int.\
Conference on Machine Learning (ICML-88)}, ed., John Laird, pp. 339--352.
Morgan Kaufmann, (1988).
\bibitem{kn:cheng&wolf:ilp95}
Shan-Hwei Nienhuys-Cheng and Ronald~de Wolf, `The subsumption theorem
in inductive logic programming: Facts and fallacies',
in {\em Advances in Inductive Logic Programming}, ed.,
Luc De~Raedt, pp. 265--276. IOS Press, (1996).
\bibitem{kn:cheng&wolf:csn95-s}
Shan-Hwei Nienhuys-Cheng and Ronald~de Wolf, `The subsumption theorem
revisited: Restricted to {SLD}-resolution', in {\em Proc.\ of Computing
Science in the Netherlands (CSN-95)}, ed., J.~C.~van Vliet, pp. 143--154.
SION, (1995).
\bibitem{kn:cheng&wolf:acsc}
Shan-Hwei Nienhuys-Cheng and Ronald~de Wolf, `The equivalence of the
subsumption theorem and the refutation-completeness for unconstrained
resolution', in {\em Proc.\ of the Asian Computer Science Conference
(ACSC-95)}, eds., K.~Kanchanasut and J.-J.~L\'{e}vy,
volume 1023 of {\em Lecture Notes in Computer Science},
pp. 269--285. Springer-Verlag, (1995).
\bibitem{kn:cheng&wolf:benelearn95}
Shan-Hwei Nienhuys-Cheng and Ronald~de Wolf,
`Specializing definite programs by unfolding',
in {\em Proc.\ of Benelearn'95}.
Workreport of the Universit\'e Libre de Bruxelles, (Sept.\ 1995).
\bibitem{kn:cheng&wolf:wrunf}
Shan-Hwei Nienhuys-Cheng and Ronald~de Wolf,
`The specialization problem and the completeness of unfolding',
Workreport EUR-CS-96-01, Erasmus University Rotterdam, (1996).
\bibitem{kn:rouveirol}
C{\'{e}}line Rouveirol, `Extensions of inversion of resolution applied to
theory completion', in {\em Inductive Logic Programming}, ed., Stephen
Muggleton, volume~38 of {\em APIC Series}, pp. 63--92, Academic Press, (1992).
\bibitem{kn:sato&akiba}
Taisuke Sato and Sumitaka Akiba, `Inductive resolution', in {\em Proc.\ of
the 4th Int.\ Workshop on Algorithmic Learning Theory (ALT-93)},
volume 744 of {\em Lecture Notes in Artificial Intelligence},
pp. 101--110. Springer-Verlag, (1993).
\bibitem{kn:shapiro81}
Ehud~Y. Shapiro, `Inductive inference of theories from facts', Research Report
192, Yale University, (1981).
\bibitem{kn:tamaki&sato}
Hisao Tamaki and Taisuke Sato, `Unfold/fold transformation of logic programs',
in {\em Proc.\ of the 2nd Int.\ Logic Programming Conference}, ed.,
Sten-{\AA}ke T{\"{a}}rnlund, pp. 127--138, Uppsala, (1984). Uppsala
University.
\bibitem{kn:wrobel:minimal}
Stefan Wrobel, `On the proper definition of minimality in specialization
and theory revision', in {\em Proc.\
of the 6th European Conference on Machine Learning (ECML-93)}, ed.,
P.~Brazdil, volume 667 of {\em Lecture Notes in Computer
Science}, pp. 65--82. Springer-Verlag, (1993).
\end{thebibliography}
\end{document}