A couple of weeks ago I ended up on this page about smoothsort, thanks hackernews you faithful time sync. Smoothsort is widely considered to be the greatest of the greats when it comes to both execution time and memory. It manages optimized for best case and memory.
The short story is that it uses an in-place heap sort to achieve this but instead of a boring old binary heap it uses a mighty Leonardo heap.
I will not go in depth about Leonardo heaps or smoothsort as I want to follow up with a post about implementing all of this in Rust as my hello world, instead I will talk about the math behind this wonderfully quirky data structure.
The lesser known Leonardo numbers
Everybody knows about Fibonacci, but the Leonardo numbers have one up on them… Sorry couldn’t help myself.
Which means that the first few Leonardo numbers are:
A Leonardo heap consists of one or more binary trees such that the number of nodes in each tree is a Leonardo number. Now in order for Leonardo heaps to work we must prove that:
- Any number can be written as a sum of distinct Leonardo numbers. Thus we can take any set of numbers and arrange them into binary trees each having nodes, such that is a Leonardo number.
- For any , the number of binary trees needed to form the heap () is at most . Knowing how many binary trees our Leonardo heap contains will help when calculating the asymptotic complexity of the push and pop operations.
I really wanted to prove these two conjectures myself so I stopped reading Keith’s explanation and picked up a pencil. I have to say it was harder than it should have been.
Caveat: I’m pretty rusty when it comes to math so pretty please let me know if I get stuff wrong. There’s a “Did I get something wrong?” link at the end that points to the GitHub page where you can open an issue or submit a pull-request for this very article.
That’s the gist of what we want to prove. In English, we need to prove that for any natural number there exists a sequence of numbers denoted such that the sum of the -th Leonardo numbers () for is , also we restrict the sequence to be of distinct monotonically increasing numbers by adding the condition .
I stared at these equations for a while. I knew there was some sort of induction proof out there. The Leonardo sequence formula hinted at that especially hard with the plus one in . But I couldn’t figure it out. It felt like you transition from to by collapsing two consecutive Leonardo numbers into the next one, but there was one piece of the puzzle missing.
With a little help from React I built this nifty widget that computes the desired sequence for consecutive . The first column is , the next are the members of the set. You can use the toggle at the top to switch between the actual Leonardo numbers or the Leonardo sequence indices.
Source on github.
You might have noticed that for the I actually use instead of this is intentional and actually part of the nifty solution. You can already start to see some pretty clear patterns and recursion emerge. If you look closely at the numbers you’ll see that between any two consecutive only one of two things happens:
= , aka the first to indices are consecutive and they collapse into the next Leonardo number, because if we have two consecutive indices there
We add or at the front of the sequence, if both were already there that would fall into (1).
The first situation isn’t intuitively true because, when trying to transition from to , collapsing two consecutive Leonardo numbers might give you another number that’s already in the sequence. But that’s actually the missing piece. If you look at all the numbers in the box you’ll notice that we only ever get consecutive Leonardo numbers in the first two positions of the sequence, check it out. Thus we can prove the conjecture by further restricting its conditions:
Let’s break it down. The first condition stays the same as before. The 2nd and 3rd ensure not only that the sequence is monotonically increasing but that two consecutive indices will only occur at the beginning of the sequence. The 4th is a little helper condition that enforces to be part of the sequence only if already is. It helps us have a single valid transition when neither or are part of the sequence for and we’re transitioning to .
For the induction part we say that for , the empty sequence fulfils all conditions. Now we need to prove that for any with a sequence , fulfilling the conditions, there’s a sequence fulfilling the same conditions for . This becomes trivial after the observations I made above, when looking at the changes in structure between consecutive values of . We have these cases:
- = + 1, the first two elements of are consecutive
If that’s the case we can defined a sequence with elements. and . It follows that:
- = 1 and case (1) does not apply, which means that we can define with elements such that and , it follows that:
- ≠ , from that follows that ≠ otherwise the 4th condition would imply and case (1) would apply. So let there be a sequence with elements such that and , it follows that:
Which wraps up our proof nicely. All that’s left now is to figure out if for the series , but I’ll leave this as an exercise to the reader1.
Tune in next time when I take all this further and talk about the Leonardo heap and how that’s used in implementing the elusive smoothsort.
Hint: write Leonardo numbers as a function of Fibonacci numbers.↩