Original post

In logic and computer science, unification is a process of automatically solving
equations between symbolic terms. Unification has several interesting
applications, notably in logic and type inference. In this post I want to
present the basic unification algorithm with a complete implementation.

constants, variables and function applications:

• A lowercase letter represents a constant (could be any kind of constant, like
an integer or a string)
• An uppercase letter represents a variable
• f(...) is an application of function f to some parameters, which
are terms themselves

This representation is borrowed from first-order logic and is also used in the
Prolog programming language. Some examples:

• V: a single variable term
• foo(V, k): function foo applied to variable V and constant k
• foo(bar(k), baz(V)): a nested function application

## Pattern matching

Unification can be seen as a generalization of pattern matching, so let’s

We’re given a constant term and a pattern term. The pattern
term has variables. Pattern matching is the problem of finding a variable
assignment that will make the two terms match. For example:

• Constant term: f(a, b, bar(t))
• Pattern term: f(a, V, X)

Trivially, the assignment V=b and X=bar(t) works here. Another name to
call such an assignment is a substitution, which maps variables to their
assigned values. In a less trivial case, variables can appear multiple times in
a pattern:

• Constant term: f(top(a), a, g(top(a)), t)
• Pattern term: f(V, a, g(V), t)

Here the right substitution is V=top(a).

Sometimes, no valid substitutions exist. If we change the constant term in the
latest example to f(top(b), a, g(top(a)), t), then there is no valid
substitution becase V would have to match top(b) and top(a)
simultaneously, which is not possible.

## Unification

Unification is just like pattern matching, except that both terms can contain
variables. So we can no longer say one is the pattern term and the other the
constant term. For example:

• First term: f(a, V, bar(D))
• Second term f(D, k, bar(a))

Given two such terms, finding a variable substitution that will make them
equivalent is called unification. In this case the substitution is {D=a,
V=k}
.

Note that there is an infinite number of possible unifiers for some solvable
unification problem. For example, given:

• First term: f(X, Y)
• Second term: f(Z, g(X))

We have the substitution {X=Z, Y=g(X)} but also something like {X=K, Z=K,
Y=g(K)}
and {X=j(K), Z=j(K), Y=g(j(K))} and so on. The first substitution
is the simplest one, and also the most general. It’s called the most general
unifier
or mgu. Intuitively, the mgu can be turned into any other unifier
by performing another substitution. For example {X=Z, Y=g(X)} can be turned
into {X=j(K), Z=j(K), Y=g(j(K))} by applying the substitution {Z=j(K)}
to it. Note that the reverse doesn’t work, as we can’t turn the second into the
first by using a substitution. So we say that {X=Z, Y=g(X)} is the most
general unifier for the two given terms, and it’s the mgu we want to find.

## An algorithm for unification

Solving unification problems may seem simple, but there are a number of subtle
corner cases to be aware of. In his 1991 paper Correcting a Widespread Error in
Unification Algorithms
,
Peter Norvig noted a common error that exists in many books presenting the
algorithm, including SICP.

The correct algorithm is based on J.A. Robinson’s 1965 paper “A machine-oriented
logic based on the resolution principle”. More efficient algorithms have been
developed over time since it was first published, but our focus here will be
on correctness and simplicity rather than performance.

The following implementation is based on Norvig’s, and the full code (with
tests) is available on Github.
This implementation uses 3, while Norvig’s original is in Common Lisp.
There’s a slight difference in representations too, as Norvig uses the Lisp-y
(f X Y) syntax to denote an application of function f. The two
representations are isomorphic, and I’m picking the more classical one which is
used in most papers on the subject. In any case, if you’re interested in the
more Lisp-y version, I have some Clojure code online that
ports Norvig’s implementation more directly.

We’ll start by defining the data structure for terms:

```class Term:
pass

class App(Term):
def __init__(self, fname, args=()):
self.fname = fname
self.args = args

# Not shown here: __str__ and __eq__, see full code for the details...

class Var(Term):
def __init__(self, name):
self.name = name

class Const(Term):
def __init__(self, value):
self.value = value
```

An App represents the application of function fname to a sequence of
arguments.

```def unify(x, y, subst):
"""Unifies term x and y with initial subst.

Returns a subst (map of name->term) that unifies x and y, or None if
they can't be unified. Pass subst={} if no subst are initially
known. Note that {} means valid (but empty) subst.
"""
if subst is None:
return None
elif x == y:
return subst
elif isinstance(x, Var):
return unify_variable(x, y, subst)
elif isinstance(y, Var):
return unify_variable(y, x, subst)
elif isinstance(x, App) and isinstance(y, App):
if x.fname != y.fname or len(x.args) != len(y.args):
return None
else:
for i in range(len(x.args)):
subst = unify(x.args[i], y.args[i], subst)
return subst
else:
return None
```

unify is the main function driving the algorithm. It looks for a
substitution, which is a Python dict mapping variable names to terms.
When either side is a variable, it calls unify_variable which is shown next.
Otherwise, if both sides are function applications, it ensures they apply the
same function (otherwise there’s no match) and then unifies their arguments
one by one, carefully carrying the updated substitution throughout the process.

```def unify_variable(v, x, subst):
"""Unifies variable v with term x, using subst.

Returns updated subst or None on failure.
"""
assert isinstance(v, Var)
if v.name in subst:
return unify(subst[v.name], x, subst)
elif isinstance(x, Var) and x.name in subst:
return unify(v, subst[x.name], subst)
elif occurs_check(v, x, subst):
return None
else:
# v is not yet in subst and can't simplify x. Extend subst.
return {**subst, v.name: x}
```

The key idea here is recursive unification. If v is bound in the
substitution, we try to unify its definition with x to guarantee consistency
throughout the unification process (and vice versa when x is a variable).
There’s another function being used here – occurs_check; I’m retaining its
classical name from early presentations of unification. Its goal is to guarantee
that we don’t have self-referential variable bindings like X=f(X) that would

```def occurs_check(v, term, subst):
"""Does the variable v occur anywhere inside term?

Variables in term are looked up in subst and the check is applied
recursively.
"""
assert isinstance(v, Var)
if v == term:
return True
elif isinstance(term, Var) and term.name in subst:
return occurs_check(v, subst[term.name], subst)
elif isinstance(term, App):
return any(occurs_check(v, arg, subst) for arg in term.args)
else:
return False
```

Let’s see how this code handles some of the unification examples discussed
earlier in the post. Starting with the pattern matching example, where variables
are just one one side:

```>>> unify(parse_term('f(a, b, bar(t))'), parse_term('f(a, V, X)'), {})
{'V': b, 'X': bar(t)}
```

Now the examples from the Unification section:

```>>> unify(parse_term('f(a, V, bar(D))'), parse_term('f(D, k, bar(a))'), {})
{'D': a, 'V': k}
>>> unify(parse_term('f(X, Y)'), parse_term('f(Z, g(X))'), {})
{'X': Z, 'Y': g(X)}
```

Finally, let’s try one where unification will fail due to two conflicting
definitions of variable X.

```>>> unify(parse_term('f(X, Y, X)'), parse_term('f(r, g(X), p)'), {})
None
```

Lastly, it’s instructive to trace through the execution of the algorithm for
a non-trivial unification to see how it works. Let’s unify the terms
f(X,h(X),Y,g(Y)) and f(g(Z),W,Z,X):

• unify is called, sees the root is an App of function f and loops
over the arguments.

• unify(X, g(Z)) invokes unify_variable because X is a variable,
and the result is augmenting subst with X=g(Z)
• unify(h(X), W) invokes unify_variable because W is a variable,
so the subst grows to {X=g(Z), W=h(X)}
• unify(Y, Z) invokes unify_variable; since neither Y nor Z
are in subst yet, the subst grows to {X=g(Z), W=h(X), Y=Z} (note that
the binding between two variables is arbitrary; Z=Y would be equivalent)
• unify(g(Y), X) invokes unify_variable; here things get more
interesting, because X is already in the subst, so now we call
unify on g(Y) and g(Z) (what X is bound to)

• The functions match for both terms (g), so there’s another loop over
arguments, this time only for unifying Y and Z
• unify_variable for Y and Z leads to lookup of Y in the
subst and then unify(Z, Z), which returns the unmodified subst;
the result is that nothing new is added to the subst, but the unification
of g(Y) and g(Z) succeeds, because it agrees with the existing
bindings in subst
• The final result is {X=g(Z), W=h(X), Y=Z}

## Efficiency

The algorithm presented here is not particularly efficient, and when dealing
with large unification problems it’s wise to consider more advanced options. It
does too much copying around of subst, and also too much work is repeated
because we don’t try to cache terms that have already been unified.

For a good overview of the efficiency of unification algorithms, I recommend
checking out two papers:

• “An Efficient Unificaiton algorithm” by Martelli and Montanari
• “Unification: A Multidisciplinary survey” by Kevin Knight