Another great algorithms question I heard recently:
Given a set $S$ of real numbers, call a pair of numbers $x,y \in S$ close if $x$ and $y$ appear consecutively in a sorted representation of $S$ (i.e. $x,y$ are close iff $S \cap (x,y) = \emptyset$). Find a close pair $x,y \in S$ that maximises $|x-y|$.
The trivial $O(n \log n)$ solution is to sort $S$ using any $O(n \log n)$ comparison sort, iterate through each pair of elements in $S$, and pick the pair whose absolute difference is greatest. However, there is a simple and elegant $O(n)$ solution.
If $S$ contained integers bounded in a reasonably-sized interval, we could modify the trivial solution described above to use a $O(n)$ sort such as radix sort. Unfortunately the elements of $S$ are arbitrary real numbers so no such property holds. However, we will try to use a bucket sort approach anyway. The first step is to decide on a bucketing scheme, and this is where the first "aha!" observation in the solution comes into play:
Claim: a solution $x,y$ to this problem satisfies $|x-y| \geq \frac{\max(S) - \min(S)}{|S|-1}$.
Where does that right-hand-side expression come from? To me it represents a sort of discrete mean value theorem. There are $|S|-1$ close pairs in $S$: the number of consecutive pair of elements in the sorted representation of $S$. Each of these pairs represents a disjoint interval in $\left[\min(S), \max(S)\right]$, and together they partition this entire interval. Say we divide the entire contiguous interval covered by $S$ into $|S|-1$ disjoint uniform partitions of length $L$. In any other partitioning of this interval (including the partitioning denoted by the close pairs in $S$), at least one partition must have length at least $L$ (similarly, at least one partition must have length at most $L$). This is intuitive and easy to visualise, but for the sticklers there's an easy proof by contradiction: suppose that the close pair $x,y \in S$ maximising $|x-y|$ has $|x-y| < \frac{\max(S) - \min(S)}{|S|-1}$. Then the size of the union of all the $|S|-1$ close pair intervals in $S$ is less than $\max(S) - \min(S)$. But the close pair intervals partition $\left[\min(S), \max(S)\right]$ so the size of the union of all close pair intervals must equal $\left|\left[\min(S), \max(S)\right]\right| = \max(S) - \min(S)$.
We will proceed with our bucketing using $|S|-1$ buckets each of size $\frac{\max(S) - \min(S)}{|S|-1} = M$. The first bucket contains elements $s \in S$ where $s < \min(S) + M$, the second bucket contains $s \in S$ where $\min(S) + M \leq s < \min(S) + 2M$, etc. The bucketing process is a linear time operation because for each $s \in S$, a simple mathematical expression tells us which bucket will contain $s$, and we can put it in a bucket in constant time. (Buckets would probably be implemented as linked lists or arrays).
Now what? Well, we've asserted that the elements in the solution pair have absolute distance at least $M$, so they can't both be in the same bucket. We only have to consider pairs of elements in different buckets. Also, if a bucket contains more than 2 elements, we can ignore all but the minimum and maximum in that bucket: no element $x$ in the bucket can form a close pair with an element $y$ of another bucket, because either the minimum or the maximum in the bucket will lie in between $x$ and $y$. So we will proceed as follows: for each bucket, find the minimum in that bucket and the maximum in that bucket, and delete all other elements of the bucket. This is a linear-time operation as every element in $S$ is processed at most twice (once to check if it's a maximum or minimum in its bucket, and potentially once more to erase it from its bucket).
Now we have $|S|-1 = O(n)$ buckets, each with at most two elements. We can simply iterate through each bucket in increasing order and build a list of the elements in all buckets, by adding first the minimum then the maximum in each bucket. This list has $O(n)$ elements, more importantly it's sorted, and most importantly, both the elements in the solution pair are guaranteed to be in the list. To finish off, we simply iterate through each consecutive pair of the list and pick the pair with the greatest difference.
In my opinion this is too much of an "aha!" problem to give a useful signal in an interview (either you see the $M$ trick or you don't), but it's fascinating nonetheless.
Edit: a colleague of mine found a similar solution that follows easier reasoning. Use $|S|+1$ buckets of equal size across the range $\left[\min(S), \max(S)\right]$. Since there are only $|S|$ elements, by [an alternative formulation of] the pigeonhole principle at least one of the buckets must be empty, so the difference between the closest pair is at least one bucket-width. This allows us to ignore all elements that aren't the min or max in a bucket. We now proceed as before.