Blog

Jim Weirich

This is a 3 part article on understanding the Y-Combinator, based on the Y-Not? talk I gave at RubyConf 2012.

In Part I we will cover the historical background for the Y-Combinator.

In Part II we will lay down a foundation of technical topics needed to understand the Y-Combinator

In Part III we will derive the Y-Combinator from scratch.

Getting Started

Though many have heard of the Y-Combinator, few understand what it is for, let alone how it works. This article is an attempt to remove some of the mystery around the Y-Combinator and to make it more accessible to your average programmer.

We are going to build up to Y slowly, touching on several topics along the way. But before we are ready to talk about the Y-Combinator, we need to delve into the history of computability first.

What is Computable? (Turing's Work)

In the 1930s, mathematicians were struggling with the question of what algorithms were computable and which were not, and if different types of logic systems (remember, this was before the introduction of computers) compute different sets of algorithms.

The most famous of these attempts at defining computability is probably Turing's effort, from whence we get the term "Turing Machine".

A Turing Machine is a device that has access to an infinitely long tape. A head on the Turing Machine can only read or write from the position on the tape directly under the head. The tape can be advanced forward or backward one position at a time. A program stored on the Turing Machine directs the machine to read, write or move the tape, and can make decisions based on what was read from the tape.

(Note: If you want to see a Turing machine in action, here is a good video)

Turing then showed that this extremely simple, trivial device is able to compute "anything that is computable". What's really amazing is that the most powerful and expressive computer language we use today cannot not compute anything that the Turing Machine cannot also compute.

What is Computable? (Church's Work)

Around the same time Alan Turing was playing with his infinite tapes, Alonzo Church was also thinking about the problem of computability. But Church's approach was quite different from that of Turing's.

Church designed a language that had a single data type, a function. No strings, no integers, no floating point numbers ... just functions. Here's the identify function in Church's notation:

  λx.x

The lambda character ("λ") introduces a function. The "x" immediately following the lambda is the name of the argument to the function. The dot (".") separates the argument from the body. The final "x" is the body of the function, in this case returning original argument.

Since we have no integers in Church's "lambda calculus", we need a way to represent numbers. Here is one common convention:

  ZERO = λf.λv.v
  ONE  = λf.λv.f v
  TWO  = λf.λv.f (f v)
  TEN  = λf.λv.f (f (f (f (f (f (f (f (f (f v)))))))))

The number ONE is a function that takes two arguments, a function and a value. It then applies the function to the value once. TWO applies the function to the value twice. TEN applies the function 10 times. With this scheme, we can represent the number zero by having it return the value without applying the function at all.

If/then/else can easily be represented in lambda calculus very easily:

  TRUE  = λx.λy.x
  FALSE = λx.λy.y

TRUE is a function of two arguments that returns the first argument. FALSE takes two arguments and returns the second. How is that helpful? Here is a function that returns ONE if its argument is true and ZERO if the argument is false:

  ONE_IF_TRUE = λb.b (λf.λv.f v) (λf.λv.v)

Just a note, the "NAME=" notation we are using here isn't really part of lambda calculus. We are just using it to give names to convenient fragments of lambda expressions.

Lambda calculus expressions are evaluated by substituting the argument value into the body of the function. Essentially, lambda calculus is nothing more than simple macro expansions.

For example, suppose we wanted to evaluate

  ONE_IF_TRUE TRUE

First we need to replace our convenient names with the real lambda expressions.

  (λb.b (λf.λv.f v) (λf.λv.v)) λx.λy.x

Now replace b in the body with "λx.λy.x":

  (λx.λy.x) (λf.λv.f v) (λf.λv.v)

Now replace x in the body with the first argument:

  λy.(λf.λv.f v) (λf.λv.v)

Finally replace y in the function body with the remaining argument. This is easy because y doesn't occur in the function at all.

  λf.λv.f v

And we are left with the value of ONE. So, "ONE_IF_TRUE TRUE" evaluates to ONE. Not a big surprize there. What is interesting about this is that the whole process of evaluating lambda expressions nothing more than basic text replacement. That's it.

The Missing Piece

So Church claims that his Lambda Calculus is able to compute anything that is computable, just like a Turing Machine.

(An historical aside here, Church actually made his claim first, before Turing. It was number of years later that everyone finally decided that the set of things computable by Lambda Calculus and the set computable by a Turing Machine were actually the same set. But that doesn't matter to our story).

But there is a problem with lambda calculus. Let me demonstrate with a little bit of Ruby code.

Here is the definition of the factorial function using Ruby's anonymous functions.

  fact = ->(n) { n.zero? ? 1 : n * fact.(n-1) }
  fact.(5)     # => 120

In Lambda Calculus terms, we have cheated here. Assigning the factorial function to a variable allows us to reference that variable inside the function. This is strictly forbidden in Lambda Calculus. The problem becomes clear if we inline the definition of factorial:

  ->(n) { n.zero? ? 1 : n * fact.(n-1) }.(5)

If you were to actually run this code, you would get a "undefined local variable or method `fact' " error. Since we no longer assign a value to fact, we can't use fact inside its own defintion. This is a fundamental limitation of Lambda Calculus, you cannot recursively call anonymous functions.

What's Next?

But wait! If Lambda Calculus is truly Turing Complete, then there must be some way of getting around this "no recursion" problem. Don't worry, there is. But first in Part II we must digress a bit and cover some technical ground to lay some foundations.