λ-calculus

Motivated by the nice handwritten formulas in Turing's thesis ("Systems of Logic Based on Ordinals", 1938), I have prepared a very basic introduction to $\lambda$-calculus using a toy implementation that I wrote some time ago, with illustrative examples that are automatically computed and inserted into the document by this tool.

The $\lambda$-calculus is a theoretical model of computation based on a very simple concept of function. We are all very familiar with function definition, application, and composition, both in maths and in programming. For example, $f(x) = \sin(2x-3)$ defines a function called $f$ by composing other available functions as described in the right-hand side. It is interesting to investigate how far we can get if we can only define and compose functions "from scratch", without any primitive or built-in function available. In this setting, functions have nothing to operate on but other functions, and they produce other functions as results.

The important thing about a function is not its name but the syntactic substitution rule used to get the result, so it is written anonymously as $(\lambda x . e)$, where $e$ is some expression depending on $x$. The name of the argument is also not important (it is a "dummy" variable), but different names must be used to distinguish the arguments of different functions in combined expressions. Application is denoted simply by juxtaposition. Functions of several variables can be created from nested functions of a single variable (currying). In the following we use a simplified notation, omitting the $\lambda$ symbols and unnecessary parentheses.

The simplest function is the identity:

i =
x.x

We can apply it to itself:

i i →
(x.x)(x.x)
x.x

Another simple function is the constant function, which takes two arguments and always returns the first:

k =
xy.x

For example:

k i k →
(xy.x)(x.x)(xy.x)
(yx.x)(xy.x)
x.x
k k i →
(xy.x)(xy.x)(x.x)
(yxy.x)(x.x)
xy.x

(These two functions, together with the substitution combinator $S = \lambda xyz.xz(yz)$, are enough to define any other computable function, but that is another story.)
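As a quick sanity check, these combinators can be mimicked with ordinary anonymous functions in most languages. A minimal sketch in Python (my illustration, not part of the toy implementation), using strict Python lambdas in place of pure λ-terms:

```python
# The basic combinators as curried Python lambdas.
I = lambda x: x                                  # identity
K = lambda x: lambda y: x                        # constant: keeps the first argument
S = lambda x: lambda y: lambda z: x(z)(y(z))     # substitution

# k i k reduces to i, as in the derivation above:
assert K(I)(K) is I

# S K K behaves as the identity on any argument:
assert S(K)(K)(42) == 42
```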

arithmetic

The natural numbers can be encoded as functions using a clever idea: The number $n$ is the higher-order function that applies another function $n$ times to its argument:

zero =
xz.z
one =
xz.xz
two =
xz.x(xz)
seven =
yx.y(y(y(y(y(y(yx))))))
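The same encodings run unchanged as curried Python lambdas. In this sketch (an illustration of mine, not the article's tool) the assumed `decode` helper recovers an ordinary integer by applying a numeral to the built-in successor:

```python
# Church numerals: n applies its first argument n times to the second.
zero  = lambda f: lambda x: x
one   = lambda f: lambda x: f(x)
two   = lambda f: lambda x: f(f(x))
seven = lambda f: lambda x: f(f(f(f(f(f(f(x)))))))

# Decode to an ordinary int by counting the applications.
decode = lambda n: n(lambda k: k + 1)(0)

assert decode(zero) == 0
assert decode(two) == 2
assert decode(seven) == 7
```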

The successor function adds a new application:

suc =
zyx.y(zyx)
suc three =
(zyx.y(zyx))(yx.y(y(yx)))
suc three →
yx.y(y(y(yx)))

The full sequence of reductions is:

suc three →
(zyx.y(zyx))(yx.y(y(yx)))
yx.y((yx.y(y(yx)))yx)
yx.y((x.y(y(yx)))x)
yx.y(y(y(yx)))
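The successor can be checked the same way in Python (again a sketch of mine with an assumed `decode` helper, not the tool itself):

```python
# suc wraps one more application of f around n f x.
suc = lambda n: lambda f: lambda x: f(n(f)(x))

three  = lambda f: lambda x: f(f(f(x)))
decode = lambda n: n(lambda k: k + 1)(0)   # numeral -> ordinary int

assert decode(suc(three)) == 4
```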

Now we can define addition as repeated increments:

add = lam[x] (x!suc)

(White code boxes show the definitions in the source code, using previously defined functions for clarity.) The expanded $\lambda$ expression is:

add =
x.x(zyx.y(zyx))

And this is the full computation of 2+3:

add two three →
(x.x(zyx.y(zyx)))(xz.x(xz))(yx.y(y(yx)))
(xz.x(xz))(zyx.y(zyx))(yx.y(y(yx)))
(z.(zyx.y(zyx))((zyx.y(zyx))z))(yx.y(y(yx)))
(zyx.y(zyx))((zyx.y(zyx))(yx.y(y(yx))))
yx.y((zyx.y(zyx))(yx.y(y(yx)))yx)
yx.y((yx.y((yx.y(y(yx)))yx))yx)
yx.y((x.y((yx.y(y(yx)))yx))x)
yx.y(y((yx.y(y(yx)))yx))
yx.y(y((x.y(y(yx)))x))
yx.y(y(y(y(yx))))
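In Python lambdas the same definition of addition reads (a sketch; `decode` is my assumed helper for turning numerals back into ints):

```python
suc = lambda n: lambda f: lambda x: f(n(f)(x))

# add n = n suc : adding n is incrementing n times.
add = lambda n: n(suc)

two    = lambda f: lambda x: f(f(x))
three  = lambda f: lambda x: f(f(f(x)))
decode = lambda n: n(lambda k: k + 1)(0)

assert decode(add(two)(three)) == 5
```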

Multiplication is easy with this encoding of numbers:

mul =
xyz.x(yz)
mul two three →
(xyz.x(yz))(xz.x(xz))(yx.y(y(yx)))
(yz.(xz.x(xz))(yz))(yx.y(y(yx)))
z.(xz.x(xz))((yx.y(y(yx)))z)
zz'.(yx.y(y(yx)))z((yx.y(y(yx)))zz')
zz'.(x.z(z(zx)))((yx.y(y(yx)))zz')
zz'.z(z(z((yx.y(y(yx)))zz')))
zz'.z(z(z((x.z(z(zx)))z')))
zz'.z(z(z(z(z(zz')))))
mul (add three two) three →
zx.z(z(z(z(z(z(z(z(z(z(z(z(z(z(zx))))))))))))))
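The multiplication rule is just function composition of numerals; a Python sketch (my illustration):

```python
# mul n m f = n (m f): iterating "apply f m times" n times gives n*m applications.
mul = lambda n: lambda m: lambda f: n(m(f))

two    = lambda f: lambda x: f(f(x))
three  = lambda f: lambda x: f(f(f(x)))
decode = lambda n: n(lambda k: k + 1)(0)   # numeral -> ordinary int

assert decode(mul(two)(three)) == 6
```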

And powers are even easier:

pow =
xy.yx
pow two three →
xz.x(x(x(x(x(x(x(xz)))))))
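And the same for powers, in a Python sketch of mine (`pow_` avoids shadowing the built-in `pow`):

```python
# pow n m = m n : composing "n-fold application" with itself m times
# applies the underlying function n^m times.
pow_ = lambda n: lambda m: m(n)

two    = lambda f: lambda x: f(f(x))
three  = lambda f: lambda x: f(f(f(x)))
decode = lambda n: n(lambda k: k + 1)(0)

assert decode(pow_(two)(three)) == 8
```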

logic

Conditional evaluation is easily encoded. The logical constants "true" and "false" are functions which select the first or second argument as the branch to run (the typical "if-then-else" construction is not explicitly needed).

true =
xy.x
false =
xy.y

The logical connectives "and", "or", and "not" can also be defined in a natural way. Using this Boolean logic we can define useful predicates over the numbers. The simplest predicate is testing if a number is zero:

isZero =
x.x(yxy.y)(xy.x)
isZero zero →
(x.x(yxy.y)(xy.x))(xz.z)
(xz.z)(yxy.y)(xy.x)
(z.z)(xy.x)
xy.x
isZero seven →
(x.x(yxy.y)(xy.x))(yx.y(y(y(y(y(y(yx)))))))
(yx.y(y(y(y(y(y(yx)))))))(yxy.y)(xy.x)
(x.(yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)x)))))))(xy.x)
(yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)((yxy.y)(xy.x)))))))
xy.y
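These selectors can also be tried directly as Python lambdas (a sketch of mine): applying `isZero n` to two ordinary values picks one of them as the "branch":

```python
# Church booleans: two-argument selectors.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y

# isZero n = n (lambda _: false) true : a single application already yields false.
isZero = lambda n: n(lambda _: false)(true)

zero  = lambda f: lambda x: x
seven = lambda f: lambda x: f(f(f(f(f(f(f(x)))))))

assert isZero(zero)("yes")("no") == "yes"
assert isZero(seven)("yes")("no") == "no"
```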

Computing the predecessor of a number is not trivial. A possible solution is based on conditional evaluation and pairs.

pre = lam [x] (x!phi!(lam[z] (z!zero!zero))! false)
  where
    phi =  lam [y,z] (z!(suc!(y!true))!(y!true)) 
pre =
x.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y)
pre zero →
(x.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y))(xz.z)
(xz.z)(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y)
(z.z)(z.z(xz.z)(xz.z))(xy.y)
(z.z(xz.z)(xz.z))(xy.y)
(xy.y)(xz.z)(xz.z)
(y.y)(xz.z)
xz.z
pre two →
(x.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y))(xz.x(xz))
(xz.x(xz))(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y)
(z.(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))z))(z.z(xz.z)(xz.z))(xy.y)
(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z)))(xy.y)
(z.z((zyx.y(zyx))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x)))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x)))(xy.y)
(xy.y)((zyx.y(zyx))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x)))((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x))
(y.y)((yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x))
(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.x)
(z.z((zyx.y(zyx))((z.z(xz.z)(xz.z))(xy.x)))((z.z(xz.z)(xz.z))(xy.x)))(xy.x)
(xy.x)((zyx.y(zyx))((z.z(xz.z)(xz.z))(xy.x)))((z.z(xz.z)(xz.z))(xy.x))
(y.(zyx.y(zyx))((z.z(xz.z)(xz.z))(xy.x)))((z.z(xz.z)(xz.z))(xy.x))
(zyx.y(zyx))((z.z(xz.z)(xz.z))(xy.x))
yx.y((z.z(xz.z)(xz.z))(xy.x)yx)
yx.y((xy.x)(xz.z)(xz.z)yx)
yx.y((yxz.z)(xz.z)yx)
yx.y((xz.z)yx)
yx.y((z.z)x)
yx.yx
pre seven →
yx.y(y(y(y(y(yx)))))
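The pair-shifting idea behind this definition can be sketched in Python (my formulation, equivalent to the article's term up to encoding details): fold the step $(a,b) \mapsto (a+1, a)$ over $(0,0)$; after $n$ steps the second component holds $n-1$, and zero stays at zero:

```python
# Pairs encoded as functions; fst/snd select the components.
pair = lambda a: lambda b: lambda z: z(a)(b)
fst  = lambda p: p(lambda a: lambda b: a)
snd  = lambda p: p(lambda a: lambda b: b)

zero   = lambda f: lambda x: x
suc    = lambda n: lambda f: lambda x: f(n(f)(x))
seven  = lambda f: lambda x: f(f(f(f(f(f(f(x)))))))
decode = lambda n: n(lambda k: k + 1)(0)

# phi shifts (a, b) to (a+1, a); n steps from (0, 0) end at (n, n-1).
phi = lambda p: pair(suc(fst(p)))(fst(p))
pre = lambda n: snd(n(phi)(pair(zero)(zero)))

assert decode(pre(seven)) == 6
assert decode(pre(zero)) == 0
```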

Based on the predecessor function we can define subtraction and the predicate less-than.

sub two seven →
yx.y(y(y(y(yx))))

The equality predicate is now easily defined in terms of "and" and "less-than". The expanded $\lambda$ expression is impenetrable

 eq
xy.(xy.xy(xy.y))((xy.(x.x(yxy.y)(xy.x))(x(x.x(yz.z((zyx.y(zyx))(y(xy.x
)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y))y))xy)((xy.(x.x(yxy.y)(xy.x))(x(x
.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y))y))yx)

but it works as expected:

eq (add one two) three →
xy.x
eq (add one two) (mul two two) →
xy.y
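Subtraction and equality can be sketched the same way in Python (my names and formulation; `sub m n` computes $n-m$ truncated at zero, matching the argument order of `sub two seven` above):

```python
pair = lambda a: lambda b: lambda z: z(a)(b)
fst  = lambda p: p(lambda a: lambda b: a)
snd  = lambda p: p(lambda a: lambda b: b)

zero   = lambda f: lambda x: x
suc    = lambda n: lambda f: lambda x: f(n(f)(x))
decode = lambda n: n(lambda k: k + 1)(0)

phi = lambda p: pair(suc(fst(p)))(fst(p))
pre = lambda n: snd(n(phi)(pair(zero)(zero)))

# sub m n = m pre n : apply the predecessor m times to n.
sub = lambda m: lambda n: m(pre)(n)

true   = lambda x: lambda y: x
false  = lambda x: lambda y: y
isZero = lambda n: n(lambda _: false)(true)
and_   = lambda p: lambda q: p(q)(false)

# eq m n : both truncated differences are zero.
eq = lambda m: lambda n: and_(isZero(sub(m)(n)))(isZero(sub(n)(m)))

two   = suc(suc(zero))
three = suc(two)
seven = suc(suc(suc(suc(three))))

assert decode(sub(two)(seven)) == 5
assert eq(three)(three)("equal")("different") == "equal"
assert eq(two)(three)("equal")("different") == "different"
```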

data structures

Pairs of things (tuples) can also be expressed as higher-order functions, together with functions to extract their elements. And using pairs we can define the "cons cells" typically used to create lists.

headL (tailL (cons one (cons two (cons three nil)))) →
xz.x(xz)
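A Python sketch of the cons-cell encoding (simplified by me: list elements are plain ints and `nil` is a placeholder rather than a full Church encoding):

```python
# Cons cells as pairs of head and tail; extraction by selector functions.
cons  = lambda h: lambda t: lambda z: z(h)(t)
headL = lambda l: l(lambda h: lambda t: h)
tailL = lambda l: l(lambda h: lambda t: t)
nil   = None   # placeholder; a full encoding would provide a Church nil + isNil

lst = cons(1)(cons(2)(cons(3)(nil)))
assert headL(tailL(lst)) == 2
```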

Any data structure can actually be defined using $\lambda$ expressions.

(In this kind of untyped $\lambda$-calculus there are no built-in checks of type consistency, so we can freely apply by mistake, for example, an arithmetic function to a list of logical values, and we will obtain a perfectly normal-looking but absurd $\lambda$ expression.)

add one (cons three nil) →
yx.y(y(xy'.y')(z.z(y'x.y'(y'(y'x)))(z.z(xy'.x)(xy'.x)))x)

local scope and closures

In the following pseudo-code a variable in scope is captured by a function. This is directly available in the $\lambda$ formalism:

y = 3

f x = x+y
let f = lam[y] (lam[x] (add!x!y)  ) ! three
f =
(yx.(x.x(zyx.y(zyx)))xy)(yx.y(y(yx)))
f two →
yx.y(y(y(y(yx))))

undefined computations

There are some peculiar expressions which don't have a normal form, yielding an infinite sequence of reductions. For instance, the well-known "self-applying" function ω works perfectly on many arguments:

ω =
x.xx
ω i →
(x.xx)(x.x)
(x.x)(x.x)
x.x
ω two →
(x.xx)(xz.x(xz))
(xz.x(xz))(xz.x(xz))
z.(xz.x(xz))((xz.x(xz))z)
zz'.(xz.x(xz))z((xz.x(xz))zz')
zz'.(z'.z(zz'))((xz.x(xz))zz')
zz'.z(z((xz.x(xz))zz'))
zz'.z(z((z'.z(zz'))z'))
zz'.z(z(z(zz')))

but when applied to itself the reduction never terminates:

(x.xx)(x.xx)
(x.xx)(x.xx)
(x.xx)(x.xx)
(x.xx)(x.xx)

The existence of this kind of divergent computation is a natural consequence of the $\lambda$-calculus being Turing-complete.

Expressions containing the dangerous (ωω) or equivalent infinite loops must be handled with care. For example, consider the following expression:

k i (ω ω) =
(xy.x)(x.x)((x.xx)(x.xx))

Using the applicative evaluation order found in many common programming languages, in which the arguments are evaluated before being substituted into the function body, we would fall into an infinite loop. But using normal order, where the function is applied before the arguments are evaluated, we get the expected result:

k i (ω ω) →
(xy.x)(x.x)((x.xx)(x.xx))
(yx.x)((x.xx)(x.xx))
x.x

Normal-order evaluation always reaches the normal form, if one exists: an expression that cannot be further reduced. This is related to the greater expressive power of lazy languages.
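Python itself is applicative-order, so building ω ω directly never returns (it overflows the recursion limit). Wrapping the dangerous argument in an explicit thunk simulates normal order; this is a sketch of mine, not the article's reducer:

```python
I = lambda x: x
omega = lambda x: x(x)

# A version of k that never calls its (thunked) second argument:
k_lazy = lambda x: lambda y_thunk: x

# The divergent omega(omega) is hidden behind a thunk and simply discarded:
assert k_lazy(I)(lambda: omega(omega)) is I
```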

self-reference

Many interesting computations are best described by means of recursive functions, defined in terms of themselves. For example:

fact 0 = 1
fact n = n * fact (n-1)

take 0 l = []
take n l = head l : take (n-1) (tail l)

range a b = if a > b then [] else a : range (a+1) b

ones = 1 : ones

(In "imperative" languages these things are usually expressed using iteration and destructive update, but most languages natively allow recursive definitions.)

In the austere formalism of $\lambda$-calculus all functions are anonymous, so this kind of self-reference is apparently difficult to achieve.

One possibility would be to extend the $\lambda$ expressions with function names or abbreviations, to be expanded only when actually needed to reduce an expression.

However, this is not necessary: anonymous functions are actually expressive enough to define any recursive computation with the help of the fixed-point combinator $Y$. It satisfies $Y f = f (Y f)$, so $Y f$ is the solution to the equation $x = f x$. A possible implementation of $Y$ is the following:

yC
y.(x.y(xx))(x.y(xx))

As an example, the factorial function can be defined in a natural way as:

fact = yC ! lam[f,x] (isZero!x!one!(mul!x!(f!(pre!x))))

It expands to a fairly complex expression

 fact
(y.(x.y(xx))(x.y(xx)))(fx.(x.x(yxy.y)(xy.x))x(xz.xz)((xyz.x(yz))x(f((x
.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))(xy.y))x))))

and works as expected:

fact zero →
xz.xz
fact three →
zx.z(z(z(z(z(zx)))))
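The same trick can be played in Python, with one caveat: under strict evaluation the $Y$ combinator above diverges, so this sketch of mine uses the eta-expanded variant $Z$, and ordinary ints instead of Church numerals for readability:

```python
# Strict (applicative-order) fixed-point combinator: Y eta-expanded.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written without naming itself: the recursion comes from Z.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

assert fact(0) == 1
assert fact(3) == 6
```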

Another example is the range function, useful to create a list of successive integers:

range = lam[x,y] (yC!(g!y)!x)
  where 
    g = lam[y,f,x] (ge ! x ! (suc!y) ! nil ! (cons ! x ! (f!(suc!x))))
headL (tailL (range three seven)) →
yx.y(y(y(yx)))

Other examples are takeL and at.

takeL = yC! lam[f,y,x] (isZero!y!nil!(cons!(headL!x)!(f!(pre!y)!(tailL!x))))
sumL (takeL three (range two seven)) →
yx.y(y(y(y(y(y(y(y(yx))))))))

(sumL will be defined below.)

at = yC ! lam[f,y,x] (isZero!y!(headL!x)!(f!(pre!y)!(tailL!x)))
at three (range two seven) →
yx.y(y(y(y(yx))))

Note that the Y combinator itself and many generated functions $Y f$ diverge if evaluated without providing the required arguments. Turing-complete behavior is both powerful and risky and can be achieved from very simple computational ingredients.

Finally, we don't have any problem with infinite lists:

ones = yC ! (cons!one)
ones =
(y.(x.y(xx))(x.y(xx)))((xy.(xyz.zxy)(xy.y)((xyz.zxy)xy))(xz.xz))
at seven ones →
xz.xz

This is one of the simplest examples of the fixed-point combinator: the solution of the equation $x = f(x)$, where $f$ = cons 1, is an infinite list of ones. (This is interpreted as a constant value, but it is encoded by a function, like everything else.)

The eigenvectors of a linear operator $L$ are the solutions (up to scale) of the equation $x = Lx$. Remarkably, the $Y$ combinator is an algorithmic solver, able to find the "eigenfunction" $g$ of a (higher-order) function $f$, such that $g = Y f$.

fold

The fold function is a universal tool for list processing.

foldL = lam[f,z] (yC!(g!f!z))
  where
    g = lam[f,z,y,x] (isNil!x!z! (f!(headL!x)!(y!(tailL!x))) )
 foldL
fz.(y.(x.y(xx))(x.y(xx)))((fzyx.(x.x(xy.x))xz(f((xyz.x(yz))(x.x(xy.x))
(x.x(xy.y))x)(y((xyz.x(yz))(x.x(xy.y))(x.x(xy.y))x))))fz)

Many useful recursive functions can now be easily defined without explicitly using the Y combinator:

sumL = foldL!add!zero
sumL  (cons seven (cons two (cons three nil))) →
yx.y(y(y(y(y(y(y(y(y(y(y(yx)))))))))))
lengthL = foldL!(k!(add!one))!zero
 lengthL   (range   three   seven) →
yx.y(y(y(y(yx))))
mapL = lam[f] (foldL!(lam[x,y] (cons!(f!x)!y)) ! nil)
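A Python sketch of the same fold, using my simplified list encoding and plain ints (the $Z$ combinator replaces `yC` because Python is strict; `nil` is a placeholder tested with `is` instead of a Church `isNil`):

```python
# Strict fixed-point combinator (eta-expanded Y).
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

cons  = lambda h: lambda t: lambda z: z(h)(t)
headL = lambda l: l(lambda h: lambda t: h)
tailL = lambda l: l(lambda h: lambda t: t)
nil   = None   # placeholder nil

# foldL f z replaces every cons by f and the final nil by z.
foldL = lambda f: lambda z: Z(
    lambda rec: lambda l: z if l is nil else f(headL(l))(rec(tailL(l))))

sumL    = foldL(lambda a: lambda b: a + b)(0)
lengthL = foldL(lambda _: lambda b: 1 + b)(0)

lst = cons(7)(cons(2)(cons(3)(nil)))
assert sumL(lst) == 12
assert lengthL(lst) == 3
```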

At this point any computation can be expressed in high-level terms. For example, something like

sum . map (^2) $ [1..3]

translates to the following $\lambda$ expression

 sumL!((mapL!(lam[x](pow!x!two)))!(range ! one! three))
(fz.(y.(x.y(xx))(x.y(xx)))((fzyx.(x.x(xy.x))xz(f((xyz.x(yz))(x.x(xy.x)
)(x.x(xy.y))x)(y((xyz.x(yz))(x.x(xy.y))(x.x(xy.y))x))))fz))(x.x(zyx.y(
zyx)))(xz.z)((f.(fz.(y.(x.y(xx))(x.y(xx)))((fzyx.(x.x(xy.x))xz(f((xyz.
x(yz))(x.x(xy.x))(x.x(xy.y))x)(y((xyz.x(yz))(x.x(xy.y))(x.x(xy.y))x)))
)fz))(xy.(xy.(xyz.zxy)(xy.y)((xyz.zxy)xy))(fx)y)((xyz.zxy)(xy.x)(xy.x)
))(x.(xy.yx)x(xz.x(xz)))((xy.(y.(x.y(xx))(x.y(xx)))((yfx.(xy.(x.x(yxy.
y)(xy.x))(x(x.x(yz.z((zyx.y(zyx))(y(xy.x)))(y(xy.x)))(z.z(xz.z)(xz.z))
(xy.y))y))x((zyx.y(zyx))y)((xyz.zxy)(xy.x)(xy.x))((xy.(xyz.zxy)(xy.y)(
(xyz.zxy)xy))x(f((zyx.y(zyx))x))))y)x)(xz.xz)(yx.y(y(yx)))))

which, after "only" 3614 reduction steps, evaluates to:

sumL ((mapL (lam[x](pow x two))) (range   one  three)) →
yx.y(y(y(y(y(y(y(y(y(y(y(y(y(yx)))))))))))))

implementation and extensions

This simple implementation of $\lambda$-calculus requires just a few lines of Haskell code. The key step is the substitution subroutine, which must be aware of free and bound variables.
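For illustration, the core of such a substitution subroutine might look as follows in Python (my own sketch; the term representation and names are assumptions, not the actual Haskell code):

```python
import itertools

_ids = itertools.count()   # source of fresh variable names

def free_vars(t):
    """Free variables of a term: a str is a variable,
    ('lam', v, body) an abstraction, ('app', f, a) an application."""
    if isinstance(t, str):
        return {t}
    if t[0] == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, v, s):
    """Capture-avoiding substitution of term s for variable v in t."""
    if isinstance(t, str):
        return s if t == v else t
    if t[0] == 'app':
        return ('app', subst(t[1], v, s), subst(t[2], v, s))
    _, w, body = t
    if w == v:                       # v is re-bound here: nothing to substitute
        return t
    if w in free_vars(s):            # rename the binder to avoid capture
        w2 = w + str(next(_ids))
        body = subst(body, w, w2)
        w = w2
    return ('lam', w, subst(body, v, s))

# Substituting y for x under a binder named y must rename that binder:
r = subst(('lam', 'y', ('app', 'x', 'y')), 'x', 'y')
assert r[1] != 'y' and r[2][1] == 'y'
```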

It could be easily extended with "primitive" functions and data for improved efficiency, and we could also add function names and other syntactic sugar to get something closer to a practical programming language.