Algorithmic Gymnastics: Finding Circuits in Points on a Plane

The Japanese Olympiad in Informatics Open Contest, held today, was absurdly hard. Of around 140 contestants, two solved Q1. No one solved Q3. 24 contestants, including myself, solved Q2. This post takes a formal approach to solving Q2.

There are N (1 ≤ N ≤ 10⁵) points positioned at integer coordinates on a plane. Starting at any point, find a route that connects all points exactly once via line segments, without any intersections, and returns to the starting point.

Two examples of valid routes through a given set of points. Note that the points will not always be spaced out in a grid.

Breaking Down the Problem

First, let's consider a simpler problem: ignore the requirement of having to return to the start point. All we have to do is connect up all the points with straight lines, without intersections, without "taking the pen off the page". Is it always possible to do this?

Intersections arise when you double back on yourself. If you only moved in one direction -- say, to the right -- then you'd never re-encounter a line you drew, so you'd never create an intersection. If there was a way to connect all points such that your line segments extend only vertically or to the right, then this route would be intersection-less. Clearly, we have to be somewhat clever about how we do this; for example, the case below cannot be completed if we're at position 9.

We can't get to the 3 or 2 without intersecting something.

Concept & Proof

What if we started at the leftmost point, and processed all the points sequentially from left to right? In theory, this would never result in a case such as the one above. Here's a handwavy inductive proof outlining why we can pick all points sequentially:

Lemma: All possible line segments representing connections between points, lie inside the convex hull of those points. This is the inverse of one one of the definitions of the convex hull ("A set of points is defined to be convex if it contains the line segments connecting each pair of its points." -- Wikipedia)

First, we must establish a starting point. If there is a point with a unique minimum x-coordinate, then we can pick that. If there are several points that share a minimum x-coordinate, then we cannot pick a 'middle' point, or else we can't access those above and below without either travelling left later or retracing a vertical line. So we will pick a point on a vertical extremity, the bottom-most left-most point.
Assume we've connected all points to the left and vertically below this point, and none vertically above or to the right. The next sequential point (sorting first by x ascending, then y ascending) is either above us or to the right somewhere. If it is above us, we can always simply draw the next line segment upwards to reach it; because all other line segments only lie inside the convex hull of points vertically below or to the left (by the lemma), there is no risk of intersecting this hull, and thus no risk of intersecting any of the connections inside this hull. By the same logic, if the next point is to the right, we can also connect it. In the new scenario, all points to the left and below are still connected, and none are connected above or to the right, but we've connected one more point.

By this reasoning, all points can be connected from left to right. The first and last points in this ordering will be the first and last points in our route, respectively. However, as the diagram shows, this algorithm often won't allow us to draw a non-intersecting line that returns to the start point. At this point, if you don't have the full solution yet, I encourage you to stop and think about it a bit before moving on.

Connecting the Dots (heh)

To solve the problem, there is one crucial observations to be made:

A circuit can be formed by partitioning the set of nodes into two subsets. One subset represents the points you take while moving right, the other represents those you take while moving left. The 'pivot' point, at which you stop going right and begin going left, can be part of either subset. However, you'd need to make sure that the connections of one subset don't intersect with the connections of another. A simple way to ensure this is to ensure their convex hulls don't intersect (as per the lemma above) -- in other words, partition them into two disjoint convex sets.

This is easily accomplished by simply drawing a line through the plane, and dividing the points into subsets based on what side of the line they're on. The line must be placed so that you can get from the rightmost point in the 'going right' subset to the rightmost point in the 'going left' subset, and also the leftmost point in the 'going left' subset to the leftmost point in the 'going right' subset, without intersecting any other connections.

Failing to fulfil the latter requirement above. No way to get from 7 to 12, or 1 to 6.

This latter requirement is interesting. As the diagram above demonstrates, we need to avoid situations where connections in the 'going right' subset (subset containing the point 1) cannot access the rightmost point in the 'going left' subset (in this case, point 12 by our sort).

The fundamental insight, at least for me, came when I considered what happens when we first use the all-connected-without-circuit algorithm outlined above, then we try to connect the rightmost point to the leftmost point. This final line will usually intersect with many of the previous segments drawn in by the algorithm. However, each of the line segments that intersect with it connect two points, and these points must be on opposing sides of the line. If we were able to reroute these connections so they only connected pairs of points above the line or pairs of points below the line, then no connections would intersect our rightmost-leftmost line.

Now, the inductive proof above tells us that it is possible to connect all the points above the line from left to right, and similarly for the points below the line.

Instead of acting as a connection in itself, the leftmost-rightmost line is acting as a guide. Having connected all the points on top of the line, and also all the points below the line, all we have to do now is connect the top set of points, the bottom set of points, and the leftmost and rightmost points, without creating any more intersections.

The leftmost-rightmost line ensures the convex hulls of the two sets are disjoint -- thus, we don't have to worry about intersections between the top and bottom sets. Consider the potential connection we wish to forge between the set of points above the line, and the rightmost point (that lies on the line). The rightmost element in the set fulfils the condition that all other points in its set are connected, and lie directly below or to the left of it. The rightmost point on the line, being the rightmost point overall, also lies to the right of our rightmost set point. So we can definitely establish a connection between the rightmost point of our above-line set and the rightmost line point without intersecting any segments occuring below the leftmost-rightmost line or any of the segments to the left or directly below our current point in this set; this covers all existing connections. Hence, we can connect our above-line set to the rightmost point on the line without creating any intersections.

An identical argument can be made for the leftmost and rightmost points for the above-line and below-line sets. None of them interfere with each other, due to the disjointedness of their convex hulls. This creates a full circuit, thus solving the general case of the problem.

Here's the 100%-scoring code I wrote for it in the competition:

#include <cstdio>
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;
 
typedef long long ll;
 
typedef pair<int, int> point;
#define x first
#define y second
#define SIZE(z) ((int)z.size())
 
int N;
pair<point, int> points[100005];
 

int pos(point p1, point p2, point p3) {
    double a = p2.y - p1.y;
    double b = -(p2.x - p1.x);
    double c = ((ll)(p2.x - p1.x)*(ll)p1.y - (ll)(p2.y-p1.y) * (ll)p1.x);
    
    if (a*p3.x + b*p3.y + c > 0) return -1;
    if (a*p3.x + b*p3.y + c < 0) return 1;
    return 0; 
}
 

int marked[100005];
int main() {
    scanf("%d", &N);
    for (int i=0 ; i<N ; i++) {
        scanf("%d %d", &points[i].first.x, &points[i].first.y);
        points[i].second = i+1;
    }
 
    sort(points, points+N);
 
    point first_point = points[0].first;
    point last_point = points[N-1].first;
    vector<int> line_points, above_points, below_points;

    bool all_on_line = true;
    for (int i=0 ; i<N ; i++) {

        // Get relative position (above, below, or on line)
        int position = pos(first_point, last_point, points[i].first);
        
        if (position == 0) {
            // Point lies on the line
            above_points.push_back(i);
        } else if (position == 1) {
            // Point lies above the line
            above_points.push_back(i);
            all_on_line = false;
        } else {
            // Point lies below the line
            below_points.push_back(i);
            all_on_line = false;
        }
    }
    if (all_on_line) {
        // All points are collinear. Impossible.
        printf("0\n");
        return 0;
    }
 
    for (int i=0 ; i<SIZE(above_points) ; i++) {
        printf("%d\n", points[above_points[i]].second);
    }
    for (int i=SIZE(below_points)-1 ; i>=0 ; i--) {
        printf("%d\n", points[below_points[i]].second);
    }
    
    return 0;
}

Algorithmic Gymnastics

Monday, September 17, 2012

Finding Circuits in Points on a Plane

Breaking Down the Problem

Concept & Proof

Connecting the Dots (heh)

No comments:

Post a Comment