If one searches for ‘Y combinator’ now, it is likely that the first hit is Paul Graham’s incubator company Y Combinator. I am not sure whether Paul likes this fact or not—as a Lisp hacker, he must like the Y combinator very much to name his company with it, but now people looking for information for the mathematical/programmatic concept may be distracted by his company information. If he is truly still a Lisp hacker, he may even regret it a bit—though it apparently benefits his company.
Y combinator is an intriguing concept. When I first saw it, I spent the whole weekend reading and studying about it. Still, I did not fully get it. People think the Y combinator is an important concept in functional programming^{1}:
… we can similarly use knowledge of the Y combinator as a dividing line between programmers who are “functionally literate” (i.e. have a reasonably deep knowledge of functional programming) and those who aren’t.
There are a lot of existing materials to introduce the Y combinator, but I have not yet known one that is written for C++ programmers. I would like to fill this gap.
This said, I still will start from functional languages, but you, dear reader, are not required to know one in advance. The focus is on the concept, and I will show running C++ code soon.
Now the journey begins.
Lambda Calculus
Of course you can find a huge amount of materials about lambda calculus, and I will describe just enough for C++ programmers to have the basic concept. Instead of the more common way of defining a square function as ‘sqr(x) = x · x’, one can define it as a lambda expression:
sqr = λx.x · x
And that is exactly how languages like Haskell and Scheme define such a function, bar the syntax difference.
Haskell:
sqr = \ x > x * x
Scheme:
(define sqr (lambda (x) (* x x)))
The key benefit is that you do not need a name for functions now. In order to apply the sqr function to 3, one can simply write (you may notice that parentheses are only used to override associativity, but not required around 3: this is the usual convention):
(λx.x · x) 3
It is also intuitive how it is evaluated (called βreduction in the terminology of lambda calculus): one replaces the occurrences of x with 3, and removes the initial λx. So the result is simply 3 · 3 = 9.
The C++ code for printing the result would be (assuming int
type; also note that parentheses are required around ‘3’ in C++):
cout << [](int x) { return x * x; } (3) << endl;
However, considering that C++’s grammar is not really helpful to see lambda expressions clearly, it is probably better to use a name wherever appropriate:
auto const sqr = [](int x) { return x * x; }; cout << sqr(3) << endl;
Please be aware that auto
is used here, as each C++ lambda expression has a unique type. However, a lambda expression can be converted to a function
object (defined in <functional>). The example above can be changed as follows (‘std::
’ is omitted), and the code will still compile and run (maybe with a negligible overhead):
function<int(int)> const sqr = [](int x) { return x * x; }; cout << sqr(3) << endl;
Fixedpoint Combinator
A fixedpoint combinator^{2} is a higherorder function that satisfied the following equation:
y f = f ( y f )
One can see immediately that this definition can lead to infinite expansion:
y f = f ( y f ) = f ( f ( y f )) = f (… f ( y f )…)
And that is exactly where the power lies. If the function y is given, it is possible to define a recursive function in a nonrecursive way! Let us take factorial as an example. The traditional definition should be like:
fact n = If IsZero n then 1 else n × fact (n − 1)
Assume a function F exists so that fact = y F:
( y F ) n = If IsZero n then 1 else n × ( y F ) (n − 1)
F ( y F ) n = If IsZero n then 1 else n × ( y F ) (n − 1)
Replace y F with f :
F f n = If IsZero n then 1 else n × f (n − 1)
We now get a function F that can be defined without any recursions. We can define factorial exactly this way in Haskell:
fix f = f (fix f) fact = fix $ \ f n > if (n == 0) then 1 else n * f (n  1)
This definition has two problems:
 It works only in a lazy language^{3} like Haskell.
 Recursion occurs in the definition of
fix
.
I will address the first problem in the next section, and leave the second to the section after.
How to Overcome Laziness
The laziness of Haskell is great power, but it is not available in most other languages. As I mentioned in my last blog^{4}, the solution is quite simple: an additional layer of indirection. A lambda expression is not evaluated before the argument is given, so the following transformation will help (ηabstraction; assuming f takes one argument):
f ⇒ λx.f x
The following program works in lazy Scheme:
(define Y (lambda (f) (f (Y f)))) (define F (lambda (f) (lambda (n) (cond ((eq? n 0) 1) (else (* n (f ( n 1)))))))) (define fact (Y F))
Before going ahead to transform the code for strict^{5} Scheme, let us analyse the involved types first. That will be very important when we try to implement it in C++, and the misunderstanding of the types led me to think that the Y combinator was unimplementable in C++. A Haskelllike notation is used below:
 f (in F) ∷ int → int
 F f ∷ int → int
 Y F ∷ int → int
 F ∷ (int → int) → int → int
 Y ∷ ((int → int) → int → int) → int → int
We can see that f (see line 10 in the code above), F f (see line 7), and Y F (which is the factorial function) are all firstorder functions that take an integer and return an integer. Therefore, F is a secondorder function that takes such a firstorder function and returns a firstorder function. Our Y takes a secondorder function like F, and returns a firstorder function.
Seeing that Y f is a firstorder function, we can now apply the ηabstraction on it to eliminate the infinite expansion. The following definition of Y makes the program work again in strict Scheme:
(define Y (lambda (f) (f (lambda (x) ((Y f) x)))))
In the meantime, we have prepared enough for an implementation in C++, and I can show you the code now:
#include <iostream> #include <functional> using namespace std; template <typename T, typename R> function<R(T)> Y( function<function<R(T)>(function<R(T)>)> f) { // Y f = f (λx.(Y f) x) return f([=](T x) { return Y(f)(x); }); } typedef function<int(int)> fn_1ord; typedef function<fn_1ord(fn_1ord)> fn_2ord; fn_2ord almost_fact = [](fn_1ord f) { return [f](int n) { if (n == 0) return 1; else return n * f(n  1); }; }; int main() { fn_1ord fact = Y(almost_fact); cout << "fact(5) = " << fact(5) << endl; }
Hopefully everything now looks familiar and natural (except the C++ syntax, which is still quite cumbersome for the job).
Curry’s Y Combinator
Now we can come back and address the recursive definition problem. It is probably not a problem for practical usage, but it is very interesting indeed. The solution we encounter most often, the Y combinator, is attributed to Haskell Curry^{6} (yes, the Haskell language is named after him). It is also mindboggling, like any of the selfreferencing puzzles and paradoxes. I am amazed by it, and that is why I want to write this blog.
There are some good references about deriving the Y combinator^{7} ^{8} ^{9}, and I will simply present the result here:
Y = λf.(λx.f (x x)) (λx.f (x x))
I also want to show to you that this definition satisfies the fixedpoint equation:
Y F = (λf.(λx.f (x x)) (λx.f (x x))) F (expand Y )
= (λx.F (x x)) (λx.F (x x)) (βreduction on the bound variable f )
= F ((λx.F (x x)) (λx.F (x x))) (βreduction on the first bound variable x)
= F (Y F)
Notice there is an equivalent form (βreduce it once to get the Y above):
Y = λf.(λx.x x) (λx.f (x x))
Basically, this is the definition in lazy Scheme:
(define Y (lambda (f) ((lambda (x) (x x)) (lambda (x) (f (x x))))))
Recalling that the function argument of Y—which is f above—takes a firstorder function as argument, we know x x should return a firstorder function. So we can apply the ηabstraction to get the strict Scheme version:
(define Y (lambda (f) ((lambda (x) (x x)) (lambda (x) (f (lambda (y) ((x x) y)))))))
Before implementing it in C++, we have one more difficulty. A statically typed language cannot make a function call like x x, at least not directly so, as the type system will forbid it. Even Haskell, which normally allows people to write very succinct code, cannot express the Y as directly as Scheme. One has to fool the type system with a proxy object.
Here is how it goes in C++:
template <typename F> struct self_ref_func { function<F(self_ref_func)> fn; };
I.e. an object of this type contains a function that takes an object of its own type and returns an object of the template type, which is again a function. E.g.:
typedef function<int(int)> fn_1ord; typedef self_ref_func<fn_1ord> fn_self_ref; fn_self_ref x = { … }; // assign a suitable value to x x.fn(x); // (x x)
With the help of this selfreferencing type, Curry’s Y combinator can finally be implemented in C++:
template <typename T, typename R> function<R(T)> Y( function<function<R(T)>(function<R(T)>)> f) { // Y = λf.(λx.x x) (λx.f (λy.(x x) y)) typedef function<R(T)> fn_1ord; typedef self_ref_func<fn_1ord> fn_self_ref; fn_self_ref r = { [f](fn_self_ref x) { // λx.f (λy.(x x) y) return f(fn_1ord([x](T y) { return x.fn(x)(y); })); } }; return r.fn(r); }
Summary
I have shown how the famous Y combinator can be implemented in C++, and what techniques are required to do it. I wish you found the information useful. If you have gone thus far, you will also probably be interested in reading about practical uses of the Y combinator^{10}, and you will probably want a decent Haskell implementation^{11} and Scheme implementation^{12}. I have also put the C++ code for Y (slightly changed) in a public repository^{13}, as readytouse library code. People have actually implemented the Y combinator in more than five dozen languages^{14}.
My C++ code is tested on Clang 3.5 and GCC 4.9 in C++11 mode. Haskell code is tested on GHC 7.8.3, and Scheme code is tested on DrRacket 6.1.
Have fun!
Update (5 May 2016)
The code here only demonstrates what can be done in C++, but is not optimized for speed. I have just noticed that there is now a proposal^{15} to add Y combinator to the C++ standard library. To my amazement, the reference implementation is both simple and fast—in my test, its speed is close to that of native recursive functions (O3
under Clang, or O2
under GCC). You probably want to check it out!
 Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Why Y?’. ↩
 Wikipedia: Fixedpoint combinator. ↩
 Wikipedia: Lazy evaluation. ↩
 Yongwei Wu: Study Notes: Functional Programming with C++. ↩
 Wikipedia: Strict programming language. ↩
 Felice Cardone and J. Roger Hindley: History of Lambdacalculus and Combinatory Logic (PDF), pp. 8–9. ↩
 Mike Vanier: The Y Combinator (Slight Return) or How to Succeed at Recursion Without Really Recursing, section ‘Deriving the Y combinator’. ↩
 Matthias Felleisen: A Lecture on the Why of Y (PS). ↩
 Daniel P. Friedman and Matthias Felleisen: The Little Schemer, chapter 9. MIT Press, Cambridge, Mass., 4th edition, 1995. ↩
 Bruce J. McAdams: That About Wraps it Up: Using FIX to Handle Errors Without Exceptions, and Other Programming Tricks. ↩
 GHC is the standard Haskell implementation. ↩
 Racket is a nice Scheme platform. ↩

functional.h contains
fix_simple
,fix_curry
, andfix_function_converter
make_curry
, among other templates. ↩  Rosetta Code: Y Combinator. ↩
 Yegor Derevenets: A Proposal to Add Y Combinator to the Standard Library. ↩
In Wikipedia, the Y combinator is an “implementation of the fixed point combinator”, which in turn is defined as a “higherorder function that returns some fixed point of its argument function, if one exists”. The property Yf=f(Yf) is a sequence there. You use Yf=f(Yf) as a definition. The Y combinator you implemented works for your definition, but it’s not a fixed point combinator as it does not return the fixed point of a function. Is the Wikipedia definition wrong?
LikeLike
y f = f (y f) is the (symbolic) definition of a fixedpoint combinator, but it is not the definition of the Y combinator. The Y combinator refers the Curry’s version, λf.(λx.f (x x)) (λx.f (x x)), and it is a fixedpoint combinator.
LikeLike
Well λf.(λx.f (x x)) (λx.f (x x)) seems right, but above you cite Y = λf.(λx.x x) (λx.f (x x)) and comment in the code you comment is as Y = λf.(λx.x x) (λx.f (λy.(x x) y)). This is hard to get. Did you check f(Yf), f(f(Yf)), …?
LikeLike
@Pavel, it seems you have missed this sentence in the article: ‘Notice there is an equivalent form (βreduce it once to get the Y above)’.
LikeLike