These are notes for myself to help me build intuition. Use at your own peril.
The theory
The identity function $id(x)$ evaluates to its input:
def id(x): return x
The function $f(x)$ evaluates not to $x$, but to $f(x)$:
def f(x): return f(x)
We say that an expression is self-replicating if it evaluates to itself. For example, $f(42)$ is a self-replicating expression because $f(42) = f(42)$ (by construction).
In lambda calculus, we cannot name functions, so we cannot invoke the function $f$ by name inside its own definition. The question is: does a self-replicating expression exist under this constraint?
The answer is yes.
def f(x): return x(x)
$f(f)$ evaluates to $f(f)$ so $f(f)$ is a self-replicating expression, and $f$ does not invoke itself by name so it is a valid function in lambda calculus. $f(f)$ is known as the Ω combinator. Obviously it doesn't make sense to evaluate it in Python, since it's an expression that recurses infinitely. But it is interesting as an object of study, because it's an example of recursion where we haven't called a function from inside itself by name.
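Although the Ω combinator can't be meaningfully evaluated, we can still write it as an anonymous Python expression and watch it blow the stack (a small sketch, not from the original post):

```python
# omega is f from above, written anonymously: it applies its argument to itself
omega = lambda x: x(x)

# omega(omega) is the Omega combinator; evaluating it recurses forever,
# so Python's recursion guard fires
try:
    omega(omega)
except RecursionError:
    print("RecursionError, as expected")
```

Note that no function here ever refers to itself by name; the recursion comes purely from self-application.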
A fixed-point combinator is a function $Y(f)$ such that $Y(f) = f(Y(f))$. Again, we could define
def Y(f): return f(Y(f))
But this definition would be illegal in lambda calculus because we use $Y$ by name inside its definition. The question is: can we define a function $Y$ such that $Y(f) = f(Y(f))$, under the constraint?
Yes we can. Consider
def Y(f):
    def g(x):
        return f(x(x))
    return g(g)
Then $Y(f) = g(g)$. But evaluating the RHS using the definition of $g$ gives $g(g) = f(g(g))$, and $f(g(g)) = f(Y(f))$. Therefore $Y(f) = f(Y(f))$ (for now we are treating this as merely a mathematical statement, without considering its computational implications). $Y$ is known as the Y combinator.
The general principle at work in both of these examples: although a function cannot reference itself inside its own definition, we can work around this by passing the function to itself as a parameter. Then the function can invoke its parameter, i.e. invoke itself. This is what gives us recursion, and it is what both $f(f)$ and $g(g)$ do.
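As a minimal sketch of this principle (my own toy example, not from the post): a countdown function that never names itself, and instead recurses through the parameter it receives.

```python
# countdown takes itself as the parameter f and recurses through it,
# never mentioning its own name in its body
countdown = lambda f, n: [] if n == 0 else [n] + f(f, n - 1)

# to use it, we pass it to itself
print(countdown(countdown, 3))  # [3, 2, 1]
```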
Anonymous recursion without the Y combinator
Consider the traditional recursive factorial definition:
>>> def fac(n):
...     return 1 if n == 0 else n * fac(n-1)
...
>>> fac(5)
120
What's the smallest change we can make to avoid explicit self-reference by name? The factorial function must invoke a function recursively, but it cannot explicitly invoke itself by name. Let's introduce a new parameter $f$ for this function. In order to use the factorial function, we must now pass it to itself so that it knows how to recurse:
>>> def fac(f, n):
...     return 1 if n == 0 else n * f(f, n-1)
...
>>> fac(fac, 5)
120
In lambda calculus, all functions take exactly one parameter. We can use currying to express the logic above using only functions of one parameter:
>>> def fac(f):
...     def inner(n):
...         return 1 if n == 0 else n * f(f)(n-1)
...     return inner
...
>>> fac(fac)(5)
120
To demonstrate that this corresponds to a valid expression in lambda calculus, we can rewrite the exact same thing using only single-argument anonymous functions, with a modest loss of readability:
>>> (lambda f: lambda n: (lambda n: 1 if n == 0 else n * f(f)(n-1))(n))(
...     lambda f: lambda n: (lambda n: 1 if n == 0 else n * f(f)(n-1))(n))(5)
120
The point here is that functionally, the Y combinator isn't strictly necessary to obtain recursion in lambda calculus. Instead, it's enough to use the principle of passing a function to itself so it can invoke itself.
...so why do we even need the Y combinator?
Our example works, but it’s ugly:
- Firstly, we need to explicitly include the function parameter $f$ and thread it into the recursive call to itself. This threading action is auxiliary to the actual recursive business logic we're encoding, yet it ends up conflated with it. This function-threading is a pattern that needs to happen for every recursive function we write, so if possible, we'd like to abstract it out.
- Secondly, we are forced to state the function fac twice: the first time is when we define it, and the second is the parameter that we pass into the invocation of the definition. This is especially evident in the lambda form, where the full function definition must be written twice. In lambda calculus, everything is written in its full expanded form, so this deficiency is especially bothersome.
The Y combinator, then, can be introduced as a convenient, reusable tool that encapsulates the pattern of passing functions as parameters into themselves, allowing us to write recursive functions more easily and cleanly.
We return to the earlier curried example, but attempt to fix the deficiencies above: we'll remove the function-threading and the repeated statement of the function.
>>> def fac(f):
...     def inner(n):
...         return 1 if n == 0 else n * f(n-1)
...     return inner
...
>>> fac(5)
<function fac.<locals>.inner at 0x10440dd30>
Now the code is free of the deficiencies above. Unfortunately we can't actually invoke the function in this form.
This is where the Y-combinator comes in.
>>> def Y(f):
...     def g(x):
...         return f(x(x))
...     return g(g)
...
>>> factorial = Y(fac)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in Y
  File "<stdin>", line 3, in g
  File "<stdin>", line 3, in g
  File "<stdin>", line 3, in g
  [Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
What's happening here is that $Y(f)$ evaluates to $f(Y(f))$, which evaluates to $f(f(Y(f)))$, and so on, infinitely. If we were working in a language with lazy evaluation, such as Haskell, this would work, because the argument to $f$ would only be evaluated if needed. But Python evaluates eagerly, so it attempts the infinite expansion.
We can bolt lazy evaluation onto the Python example by judiciously passing defensive lambdas as function arguments instead of actual values. Our functions can then choose to evaluate or not evaluate their parameters.
>>> def lazy_fac(f):
...     def inner(n):
...         val = 1 if n == 0 else n * f()(n-1)
...         return val
...     return inner
...
>>> def Y(f):
...     def g(x):
...         val = f(lambda: x()(x))
...         return val
...     return g(lambda: g)
...
>>> fact = Y(lazy_fac)
>>> fact(5)
120
And we're done!
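As an aside (my addition, not part of the original derivation): instead of threading explicit lazy lambdas through, a common alternative is to eta-expand the self-application $x(x)$ into a lambda, so it's only evaluated when the recursive call actually happens. This variant is known as the Z combinator, and it works under Python's eager evaluation with the original non-lazy fac:

```python
def Z(f):
    def g(x):
        # eta-expansion: wrapping x(x) in a lambda defers its evaluation
        # until the recursive call is actually made
        return f(lambda n: x(x)(n))
    return g(g)

def fac(f):
    def inner(n):
        return 1 if n == 0 else n * f(n-1)
    return inner

print(Z(fac)(5))  # 120
```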
Let's take one last look at how the Y combinator actually does its thing. We'll consider the non-lazy version because it's cleaner – even though it doesn't actually run in Python.
We have a function $fac$ which takes a recursive argument $f$, and returns a function $inner$ that invokes $f$. When we call $fac$ with a realised function $f$, the returned $inner$ is actually a closure with its $f$ bound.
$Y(f)$ evaluates to $f(Y(f))$. So when we call $Y(fac)$, the returned value is $fac(Y(fac))$. In other words, the returned value is the closure $inner$, with its recursive function $f$ bound to $Y(fac)$. But this $Y(fac)$ is also the closure $inner$ with the same bound function. And so on.
This means that the value $Y(fac)$ is the closure $inner$ with its free variable $f$ bound to the closure $inner$. In other words, when $inner$ calls $f(n-1)$, it is in effect just calling $inner(n-1)$.
The Y combinator allows the factorial function to behave exactly as it would with named recursion. But instead of encoding the named recursion into the code, it effectively binds the name into the function at runtime using the mechanics of closures, achieving the same thing. Neat!
Acknowledgement goes to this post for some of this intuition.
The Wikipedia page on anonymous recursion does a similar breakdown to here.