Algorithmic Gymnastics: September 2013

A while ago, I made the claim on this blog that "Quad-tree Range Queries are O(sqrt(S)) Time". Today, I thought I'd prove that this is true. Technically, I'll prove a much more formal statement, and then hand-wave a bit to give the above. We will prove that:

The tightest attainable upper-bound on the size of the minimum set of quad-tree nodes whose union is exactly the span of a given rectangle in an $N\times N$ space is $N$.

To prove this, we will consider the general statement "the tightest attainable upper-bound on the number of quad-tree nodes whose union spans a rectangle in an $N\times N$ space is $K$". Then we will show that $K \leq N$ and furthermore that $N \leq K$, allowing us to conclude that $N=K$.

First, we show that $K \leq N$. To do this, it is sufficient to show that there is at least one rectangle that requires a minimum set of nodes of size at least $N$. We can easily do this by choosing any rectangle of height 1 and width $N$. Since each quad-tree node spans a square, we will need each node's corresponding square to have a height of exactly 1 (so it doesn't exceed the height of our rectangle). But since they're squares, their width must also be 1 – so we'd need $N$ of these nodes to describe the rectangle. Thus our minimum set is of size $N$, so $K \leq N$.

Now we will show that $N \leq K$. Intuitively this seems more difficult because we have to prove that for all rectangles, we can find a minimum spanning set of nodes of size at most $N$. We will show this result by using the steps of the standard recursive algorithm for querying a rectangle.

Recall that the algorithm to operate on a quad-tree range looks like:

FUNCTION RangeOperate(node, query_rect)
  IF node.rect is completely contained within query_rect:
    DO OPERATION ON node // This node has been 'selected' by the query
  ELSE IF node.rect is completely disjoint from query_rect:
    RETURN // Nothing more to do
  ELSE:
    RangeOperate(node.quadrant_1, query_rect)
    RangeOperate(node.quadrant_2, query_rect)
    RangeOperate(node.quadrant_3, query_rect)
    RangeOperate(node.quadrant_4, query_rect)

We seek an upper bound on the number of nodes that will be operated upon. Each time RangeOperate recurses, it shifts its attention from an $n\times n$ node to an $\frac{n}{2} \times \frac{n}{2}$ node. Because the function can initially be called on a rectangle of size $N\times N$, and since it can't go any deeper than $1\times 1$ nodes, the function can recurse to a maximum depth of $\log_2 N$. Keep this in mind, we'll need it later.

We make at most 4 recursive sub-calls per call of RangeOperate. Let's list all the possible types of rectangles that query_rect could be, and for each one, consider what these sub-calls will do.

If query_rect completely contains or is completely disjoint from node.rect, we don't recurse at all. Call this the termination condition.
If query_rect lies within only one of the four quadrants of node, then all but one of the four sub-calls will encounter the termination condition. The other will continue recursing.
If query_rect lies within two of the four quadrants of node, then two of the four sub-calls will encounter the termination condition. The other two will continue recursing.
If query_rect lies within all four quadrants of node, none of the four sub-calls will terminate. However, the query_rect for each of these sub-calls will overlap the sub-call's node.rect at one of its corners. Let's use subnode for a sub-call's node rect; query_rect still refers to the same rectangle. Without loss of generality, let's assume that query_rect overlaps the top-right corner of subnode.rect (the three other cases are rotations of this case).
- If query_rect only overlaps with quadrant 1 of subnode, then only one sub-call of this sub-call (the sub-sub-call if you will) will not meet the termination condition.
- If query_rect overlaps with quadrants 1 and 2 of subnode or quadrants 1 and 3 of subnode, then only these two sub-sub-calls will not meet their termination condition.
- If query_rect overlaps with all four quadrants of subnode, then quadrant 1 of subnode must be completely within the query_rect; so this sub-sub-call meets the termination condition. The sub-sub-calls on quadrants 2 and 3 have query_rect overlapping two corners of subsubnode.rect, and thinking about this for a moment will convince you that at least two of the sub-sub-sub-calls will then meet their termination condition.

By visualising the cases, you'll quickly convince yourself that it is impossible for query_rect to lie within exactly three of node's quadrants, so this list is exhaustive. What was the point of all this? Well, we established that after recursing down three levels from any call of RangeOperate, at most 8 sub-sub-sub-calls won't have met their termination conditions. We can amortize this to prove that at most 2 sub-calls will remain active from any given call of RangeOperate – and since our recursion tree has depth $\log_2 N$ and branching factor 2, the total number of nodes operated on is at most $2^{\log_2 N} = N$.

Of course, there are other interesting questions raised by this exercise:

Do these results generalise to higher dimensions? Oct-trees and such?
What can we say about quad-trees that span non-square spaces? (Hint: if the space spanned is $W\times H$, we can adapt what we've done already to show $K=\max(W, H)$.
We've shown that the upper-bound on the minimum set size is $N$, and we have (implicitly) shown that the standard quad-tree querying algorithm never queries a set of nodes larger than $N$ by proving the $N \leq K$ result using the steps of that same algorithm. But what guarantee do we have that this quad-tree querying algorithm will find a minimum set for any rectangle? For instance, say we had two algorithms, the standard query algorithm $S$ and another query algorithm $P$. We have established that the number of nodes queried by each of these algorithms is at most $N$, but suppose that $S$ queries on average $N/2$ nodes and $Q$ queries on average $N/3$ nodes. Can we prove that $S$ does in fact select a minimum set for any rectangle query?

Algorithmic Gymnastics

Monday, September 30, 2013

Proving a Bound on Quad-tree Queries