Monday, September 17, 2012

Finding Circuits in Points on a Plane

The Japanese Olympiad in Informatics Open Contest, held today, was absurdly hard. Of around 140 contestants, two solved Q1. No one solved Q3. 24 contestants, including myself, solved Q2. This post takes a formal approach to solving Q2.
There are N (1 ≤ N ≤ 10^5) points positioned at integer coordinates on a plane. Starting at any point, find a route that visits every point exactly once via straight line segments, without any intersections, and returns to the starting point.
Two examples of valid routes through a given set of points. Note that the points will not always be spaced out in a grid.

Breaking Down the Problem

First, let's consider a simpler problem: ignore the requirement of having to return to the start point. All we have to do is connect up all the points with straight lines, without intersections, without "taking the pen off the page". Is it always possible to do this?

Intersections arise when you double back on yourself. If you only moved in one direction -- say, to the right -- then you'd never re-encounter a line you drew, so you'd never create an intersection. If there was a way to connect all points such that your line segments extend only vertically or to the right, then this route would be intersection-less. Clearly, we have to be somewhat clever about how we do this; for example, the case below cannot be completed if we're at position 9.

We can't get to the 3 or 2 without intersecting something.

Concept & Proof

What if we started at the leftmost point, and processed all the points sequentially from left to right? In theory, this would never result in a case such as the one above. Here's a handwavy inductive proof outlining why we can pick all points sequentially:

Lemma: All line segments connecting pairs of the points lie inside the convex hull of those points. This follows directly from one of the definitions of the convex hull ("A set of points is defined to be convex if it contains the line segments connecting each pair of its points." -- Wikipedia)
  1. First, we must establish a starting point. If there is a point with a unique minimum x-coordinate, then we can pick that. If there are several points that share a minimum x-coordinate, then we cannot pick a 'middle' point, or else we can't access those above and below without either travelling left later or retracing a vertical line. So we will pick a point on a vertical extremity, the bottom-most left-most point.
  2. Assume we've connected all points to the left and vertically below this point, and none vertically above or to the right. The next sequential point (sorting first by x ascending, then y ascending) is either above us or to the right somewhere. If it is above us, we can always simply draw the next line segment upwards to reach it; because all other line segments only lie inside the convex hull of points vertically below or to the left (by the lemma), there is no risk of intersecting this hull, and thus no risk of intersecting any of the connections inside this hull. By the same logic, if the next point is to the right, we can also connect it. In the new scenario, all points to the left and below are still connected, and none are connected above or to the right, but we've connected one more point.
By this reasoning, all points can be connected from left to right. The first and last points in this ordering will be the first and last points in our route, respectively. However, as the diagram shows, this algorithm often won't allow us to draw a non-intersecting line that returns to the start point. At this point, if you don't have the full solution yet, I encourage you to stop and think about it a bit before moving on.
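
In code, this partial (no-return-trip) solution is little more than a sort. Here's a minimal Python sketch of the ordering (the point list is just an illustrative placeholder, not contest data):

# visit the points sorted by x ascending, breaking ties by y ascending
points = [(3, 1), (0, 2), (0, 0), (2, 3), (1, 1)]   # illustrative input

order = sorted(range(len(points)), key=lambda i: points[i])
print order                        # indices in visiting order
print [points[i] for i in order]   # the route itself, left to right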

Connecting the Dots (heh)

To solve the problem, there is one crucial observation to be made:

A circuit can be formed by partitioning the set of points into two subsets. One subset represents the points you take while moving right, the other represents those you take while moving left. The 'pivot' point, at which you stop going right and begin going left, can be part of either subset. However, you'd need to make sure that the connections of one subset don't intersect with the connections of the other. A simple way to ensure this is to ensure their convex hulls don't intersect (as per the lemma above) -- in other words, partition them into two sets with disjoint convex hulls.

This is easily accomplished by simply drawing a line through the plane, and dividing the points into subsets based on what side of the line they're on. The line must be placed so that you can get from the rightmost point in the 'going right' subset to the rightmost point in the 'going left' subset, and also the leftmost point in the 'going left' subset to the leftmost point in the 'going right' subset, without intersecting any other connections.

Failing to fulfil the latter requirement above. No way to get from 7 to 12, or 1 to 6.
This latter requirement is interesting. As the diagram above demonstrates, we need to avoid situations where connections in the 'going right' subset (subset containing the point 1) cannot access the rightmost point in the 'going left' subset (in this case, point 12 by our sort).

The fundamental insight, at least for me, came when I considered what happens when we first use the all-connected-without-circuit algorithm outlined above, then we try to connect the rightmost point to the leftmost point. This final line will usually intersect with many of the previous segments drawn in by the algorithm. However, each of the line segments that intersect with it connects two points, and these points must be on opposing sides of the line. If we were able to reroute these connections so they only connected pairs of points above the line or pairs of points below the line, then no connections would intersect our rightmost-leftmost line.

Now, the inductive proof above tells us that it is possible to connect all the points above the line from left to right, and similarly for the points below the line.

Instead of acting as a connection in itself, the leftmost-rightmost line is acting as a guide. Having connected all the points on top of the line, and also all the points below the line, all we have to do now is connect the top set of points, the bottom set of points, and the leftmost and rightmost points, without creating any more intersections.

The leftmost-rightmost line ensures the convex hulls of the two sets are disjoint -- thus, we don't have to worry about intersections between the top and bottom sets. Consider the potential connection we wish to forge between the set of points above the line, and the rightmost point (that lies on the line). The rightmost element in the set fulfils the condition that all other points in its set are connected, and lie directly below or to the left of it. The rightmost point on the line, being the rightmost point overall, also lies to the right of our rightmost set point. So we can definitely establish a connection between the rightmost point of our above-line set and the rightmost line point without intersecting any segments occurring below the leftmost-rightmost line or any of the segments to the left or directly below our current point in this set; this covers all existing connections. Hence, we can connect our above-line set to the rightmost point on the line without creating any intersections.

An identical argument can be made for the leftmost and rightmost points for the above-line and below-line sets. None of them interfere with each other, due to the disjointness of their convex hulls. This creates a full circuit, thus solving the general case of the problem.

Here's the 100%-scoring code I wrote for it in the competition:

#include <cstdio>
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;
 
typedef long long ll;
 
typedef pair<int, int> point;
#define x first
#define y second
#define SIZE(z) ((int)z.size())
 
int N;
pair<point, int> points[100005];
 

int pos(point p1, point p2, point p3) {
    double a = p2.y - p1.y;
    double b = -(p2.x - p1.x);
    double c = ((ll)(p2.x - p1.x)*(ll)p1.y - (ll)(p2.y-p1.y) * (ll)p1.x);
    
    if (a*p3.x + b*p3.y + c > 0) return -1;
    if (a*p3.x + b*p3.y + c < 0) return 1;
    return 0; 
}
 

int marked[100005];
int main() {
    scanf("%d", &N);
    for (int i=0 ; i<N ; i++) {
        scanf("%d %d", &points[i].first.x, &points[i].first.y);
        points[i].second = i+1;
    }
 
    sort(points, points+N);
 
    point first_point = points[0].first;
    point last_point = points[N-1].first;
    vector<int> line_points, above_points, below_points;

    bool all_on_line = true;
    for (int i=0 ; i<N ; i++) {

        // Get relative position (above, below, or on line)
        int position = pos(first_point, last_point, points[i].first);
        
        if (position == 0) {
            // Point lies on the line
            above_points.push_back(i);
        } else if (position == 1) {
            // Point lies above the line
            above_points.push_back(i);
            all_on_line = false;
        } else {
            // Point lies below the line
            below_points.push_back(i);
            all_on_line = false;
        }
    }
    if (all_on_line) {
        // All points are collinear. Impossible.
        printf("0\n");
        return 0;
    }
 
    for (int i=0 ; i<SIZE(above_points) ; i++) {
        printf("%d\n", points[above_points[i]].second);
    }
    for (int i=SIZE(below_points)-1 ; i>=0 ; i--) {
        printf("%d\n", points[below_points[i]].second);
    }
    
    return 0;
}

Thursday, May 10, 2012

Quad-tree Range Queries are O(sqrt(S)) Time

I was told my whole informatics career that quad-tree operations were all logarithmic in the size of the tree's range. This is true for insertion and point queries, but it is not true for range queries. In the worst case, a range query will recurse down to 1x1 blocks around the edge of your query rectangle -- and there are O(sqrt(S)) of these, where S is the area the tree covers.
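
Here's a rough way to convince yourself of that (a counting sketch I put together; the power-of-two grid size n and the one-row query strip are just illustrative assumptions). It recursively "queries" an n-by-n quad-tree, where S = n*n, and counts how many nodes get visited:

def count_visited(x0, y0, size, qx0, qy0, qx1, qy1):
    # node covers the square [x0, x0+size) x [y0, y0+size)
    if qx1 <= x0 or qx0 >= x0 + size or qy1 <= y0 or qy0 >= y0 + size:
        return 1   # disjoint from the query: visited, then pruned
    if qx0 <= x0 and x0 + size <= qx1 and qy0 <= y0 and y0 + size <= qy1:
        return 1   # fully contained in the query: answered at this node
    if size == 1:
        return 1
    half = size // 2
    return 1 + sum(count_visited(x0 + dx, y0 + dy, half, qx0, qy0, qx1, qy1)
                   for dx in (0, half) for dy in (0, half))

for n in (64, 128, 256, 512):
    # a query strip one row tall spanning the whole width
    print n, count_visited(0, 0, n, 0, 0, n, 1)

The visited-node counts should roughly double every time n doubles -- growth proportional to n = sqrt(S), not to log S.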

Wednesday, April 18, 2012

Digital Extraction and Rotation

This will be a short one. In the Google Code Jam qualification round, Problem C involved rotating the digits of a number to the left, e.g., 12405 -> 24051 -> 40512 -> 5124 (the leading zero of 05124 is dropped when it's treated as an integer).

Python lets us do this with easy string <-> int conversions, but these aren't particularly fast. Instead, we can do it mathematically by extracting digits with % and /. This requires knowing the number of digits in the number beforehand, which can be calculated with the log10 function (which is relatively slow, because computers store numbers in binary). In the Code Jam problem the number of digits didn't vary, so I could just calculate the length once and then pass it in to my shifting function.

>>> def rot(n, length):
...     return n % (10**(length-1))*10 + n / (10**(length - 1))

 
>>> rot(12405, 5)
24051
>>> 
>>> rot(24051, 5)
40512
>>> rot(23829239, 8)
38292392

The % / trick is really great, especially in languages such as C where string <-> int conversions are less straight-forward. It works on two principles: integer division with / by a power of 10 allows us to strip off digits from the right. You can visualise this as moving the decimal point to the left, then deleting everything that follows it. For example, 14231 / 10^2 gives 142.31, which with int division is 142. Similarly, applying the modulo operator % with a power of 10 keeps only the rightmost digits, stripping everything to their left. 14231 % 10^2 is the remainder when 14231 is divided by 100, which as we saw before is 31.
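
In the interpreter (Python 2, where / on integers is floor division):

>>> 14231 / 10**2
142
>>> 14231 % 10**2
31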

Can someone please confirm that mathematical shifting is faster than converting it to a string with str() and manipulating that? And also, that log10 is relatively slow?
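
In the meantime, here's the sort of quick timeit harness one could run to check (entirely my own sketch; the example number is arbitrary and the results will depend on your machine and Python version):

import timeit

setup = """
def rot_math(n, length):
    return n % (10**(length-1)) * 10 + n / (10**(length-1))

def rot_str(n):
    s = str(n)
    return int(s[1:] + s[0])
"""

# rotation via arithmetic vs. via string slicing
print timeit.timeit('rot_math(1234567890123, 13)', setup=setup, number=100000)
print timeit.timeit('rot_str(1234567890123)', setup=setup, number=100000)

# digit counting via len(str(...)) vs. via log10
print timeit.timeit('len(str(1234567890123))', number=100000)
print timeit.timeit('int(math.log10(1234567890123)) + 1', setup='import math', number=100000)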

Hacky but Effective Gradient Descent

Here's an interesting situation I encountered recently:

Given a list L of N (1 ≤ N ≤ 100000) integers, where L0..k is sorted in descending order and Lk..N is sorted in ascending order for some secret k, determine the minimum value in the list. Additionally, this list has the property that:
  • In L0..k, if a < b < c then La - Lb ≥ Lb - Lc.
  • In Lk..N, if a < b < c then Lc - Lb ≥ Lb - La.
You may only look at roughly 1000 elements in the list.

The vague bound on permitted number of lookups is due to the way the original problem was phrased. In the original problem, there was a function with discrete domain and range instead of a list, and the function was slow to evaluate. As long as my program ran in under 3 seconds for some test case, it was marked correct for that test case; knowing the speed of the judging computer, this allowed for roughly 1000 function calls.

To rephrase the problem, the list's values form a sort of discrete parabola (though there are no guarantees on symmetry), and we have to find its minimum.

EDIT: I later found out that ternary search was a correct solution. However, one of my friends used the parabola-like property to do a standard binary search over the difference between successive points: i.e., its 'derivative' (if such a term is defined for this situation), which is a monotonic function. Kudos to Charlie for that beautiful solution.
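
For completeness, here's roughly what a working ternary search looks like here (a sketch written for this post, assuming the values strictly decrease and then strictly increase; plateaus of equal values would need extra care):

def ternary_min(L):
    # L strictly decreases then strictly increases; return (index, value) of the minimum
    lo, hi = 0, len(L) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if L[m1] < L[m2]:
            hi = m2 - 1   # the minimum cannot be at or to the right of m2
        else:
            lo = m1 + 1   # the minimum cannot be at or to the left of m1
    best = min(xrange(lo, hi + 1), key=lambda i: L[i])
    return best, L[best]

# each iteration costs 2 lookups and shrinks the range to roughly 2/3 of its size,
# so ~1000 permitted lookups is far more than enough for N = 100000
print ternary_min([9, 7, 4, 3, 5, 8, 20])   # (3, 3)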

After trying to code a couple of inevitably buggy ternary searches, I tried a different approach, based on the notion of gradient descent. The idea here is that we have a function whose minimum we want to find, so we pick some starting point on the slope of the function and sort of 'slide' down the slopes (though really, this 'slide' is more of a 'hop' to another point on the function's slope), stopping once we reach the minimum. This is also related to the notion of simulated annealing. When implementing gradient descent with a continuous function, there are two ways of getting the slope at our current point: either differentiate the function at the point (thus giving an exact result) or just approximate it by picking two close points.

I used the latter approach for this problem. We can get the 'tangent' of some point on the list by comparing two adjacent points. For example, if our current position in the list holds the value 1000 and the subsequent position holds the value 500, we should jump forward by a large amount; but if the subsequent position held the value 990, we're probably quite close to the minimum, so we should jump forward by a small amount.

It turns out that 1000 list lookups is a hell of a lot, so even dodgy heuristics like these work most of the time -- it's a number of lookups vs. speed of convergence trade-off, and since our permitted number of lookups is very generous, we can be confident we'll converge to the correct solution.

Pretty much, all I had to do was implement something like this. Note that this specific code is largely untested.

# this list is small, but the solution works on very large lists as well
L = [123912, 12000, 11000, 9000, 5000, 4600, 4300, 4250, 4230, 4228, 4231, 4235, 4250, 5400, 6900, 10000, 234432, 23423432, 8645632423436]

min_point, min_val = -1, float('inf')

# start at list item 0, but really we could start anywhere
cur = 0

# initialise the jump size to some value that's slightly smaller than the size of L
jump = int((len(L)-1 - cur) / 1.5)

for i in xrange(500):
    # this could theoretically go over the length of the list, but that won't happen
    # due to the pattern of jumps. i don't think you can ever normally get to the
    # rightmost element, but this should really be proved
    adj = cur + 1

    # perform two lookups: one for the current point, one for the adjacent point
    val_cur = L[cur]
    val_adj = L[adj]

    # have we found a new best?
    if val_cur < min_val:
        min_point, min_val = cur, val_cur

    if val_adj < val_cur:
        # the adjacent point is smaller than the current point...
        # meaning we are on the left of the min, so jump right
        cur += jump
    else:
        # otherwise, jump left
        cur -= jump

    # perform a vague reduction of the jump size that will allow it to reach 1 within
    # 500 steps (this should be dependent on list size, but this is all a huge hack).
    # just make sure it doesn't get to 0 prematurely, we always want to allow adjacent jumps!
    jump = max(1, int(jump * 0.8))

# and, magic!
print min_point, min_val

Pretty cool, huh? This is a very crude version of gradient descent, but hey -- it works, and it's surprisingly effective. It found the precise minimum value in all ~30 cases, including ones that were specifically designed to break heuristic programs like this. Sometimes you've just gotta hack something up.

EDIT 2: Here's an implementation of Charlie's algorithm:

# this list is small, but the solution works on very large lists as well
L = [123912, 12000, 11000, 9000, 5000, 4600, 4300, 4250, 4230, 4228, 4227, 4231, 4235, 4250, 5400, 6900, 10000, 234432, 23423432, 8645632423436]
 
min_point, min_val = -1, float('inf')
 
# binary search over the whole index range
start, end = 0, len(L) - 1
while start <= end:
    mid = (start + end) / 2
    print start, end, mid  # debug trace
    mid_val = L[mid]
    # guard the right edge: if the minimum is the very last element, mid+1 would run off the list
    next_val = L[mid + 1] if mid + 1 < len(L) else float('inf')

    if mid_val < min_val:
        min_point, min_val = mid, mid_val

    if next_val > mid_val:
        # we're on the pos grad. slope
        end = mid - 1
    else:
        start = mid + 1

print min_point, min_val

Friday, February 3, 2012

We Need To Talk About Binary Search.

Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky.
-- Donald Knuth

Why is binary search so damn hard to get right? Why is it that 90% of programmers are unable to code up a binary search on the spot, even though it's easily the most intuitive of the standard algorithms?

  • Firstly, binary search has a lot of potential for off-by-one errors. Do you do inclusive bounds or exclusive bounds? What's your break condition: lo=hi+1, lo=hi, or lo=hi-1? Is the midpoint (lo+hi)/2 or (lo+hi)/2 - 1 or (lo+hi)/2 + 1? And what about the comparison, < or ≤? Certain combinations of these work, but it's easy to pick one that doesn't.
  • Secondly, there are actually two variants of binary search: a lower-bound search and an upper-bound search. Bugs are often caused by a careless programmer accidentally applying a lower-bound search when an upper-bound search was required, or vice versa.
  • Finally, binary search is very easy to underestimate and very hard to debug. You'll get it working on one case, but when you increase the array size by 1 it'll stop working; you'll then fix it for this case, but now it won't work in the original case!

I want to generalise and nail down the binary search, with the goal of introducing a shift in the way you perceive it. By the end of this post you should be able to code any variant of binary search without hesitation and with complete confidence. But first, back to the start: here is the binary search you were probably taught...

Input: sorted list of elements, query term
Output: the index of the first appearance of the query in the list, or an ERROR value otherwise

I propose an alternative definition: a binary search takes as input a (monotonic) function f(x) and a boolean predicate function p(v), and searches over the finite domain of the function for arguments where the predicate is true for the function's value -- i.e., values x such that p(f(x)) is true.

Based on this definition, here are the definitions of the variants:

Upper-bound: find the maximum argument x such that p(f(x)) is true
Lower-bound: find the minimum argument x such that p(f(x)) is true

The traditional binary search described above is a special case of the general lower-bound search, where f(x) = array[x], the domain of f(x) is the set of integers {0, 1, ..., N-1} (N being the length of array) and p(v) = v ≥ query.

In other words, you're searching for the minimum argument x such that array[x] ≥ query. Hey, this is just what we had before!

Let's face it: this description of binary search isn't very helpful. For example, why use this predicate thing when all we need is a simple array[mid] < query in our binary search?

The advantage of this somewhat convoluted definition comes when either the query is not in the array or it's in the array many times. Say you're searching for the first instance of the number 6 in the following array, using the traditional method:

[1, 1, 2, 4, 5, 5, 5, 6, 6, 6, 6, 8, 10, 10, 11]

It should output 7. Let's try this out...

lo, hi = 0, len(arr) - 1
while lo < hi:
    mid = (lo + hi) / 2
    if arr[mid] >= 6:  hi = mid - 1
    else:              lo = mid + 1
print mid  # should print 7

This looks about right. If mid is greater or equal to 6, it'll keep searching to the left, trying to find smaller values of 6. Otherwise, it'll search to the right of mid. Will mid always be the correct index by the end? I run it... apparently not, it outputs 5. Oh, I know! It's because I used ≥ instead of >! Right now it'll keep searching left after it encounters the first 6. OK, change that to a > and run it again. Now I'm getting 9... oh, it must be because I'm accessing arr[mid] at the end instead of arr[lo]. On the final iteration, arr[mid] would have become too large, but arr[lo] will be just right -- it should be at exactly 7, which is what we want. Hit F5; wtf, 10? Undo that ≥/> change from before but keep the other changes to see if it makes a difference -- nope, now 6? And I haven't even considered the issue of inclusive or exclusive bounds...

No joke, I just messed around with the +'s and -'s and bounds and ≥s and >s and ≤s and <s and los and mids and his for about 10 minutes and I can't find a single combination which gets me an answer of 7. This is worse than any other bug because you'll undoubtedly end up in a loop of case-bashing: getting it to work for this case, then finding it fails for another, then fixing it for that case and finding it now fails the original case. This style of debugging never works. Instead, turn off your monitor, grab a pen and paper and plan this out.

OK, are we doing an upper-bound or lower-bound search? We're finding the minimum index with the value 6; lower-bound then. What's the predicate? It's a lower-bound search, so we want the predicate to be true at our desired index and everywhere to its right, and false everywhere to its left. Easy! p(v) = (v >= 6).

So let's do this right. I now write my own p(v) function, even though the logic is ridiculously simple, and I translate the predicate-based binary search definition into code. Most importantly, I introduce a new variable, the best_so_far variable.

def p(v): return v >= 6

lo, hi = 0, len(arr) - 1
best_so_far = None

while lo <= hi:
    x = (lo + hi) / 2
    if p(arr[x]):
        # we found a potential minimum x, but we should still check to see if any smaller ones work
        best_so_far  = x
        hi = x - 1
    else:
        # the predicate is false, so we need to go right to find true values
        lo = x + 1

print best_so_far 

And it works first time. *whistles*

But the problem definition changes! Your boss tells you that now, you must search for the last occurrence of 6!

But hey, that's cool. Re-evaluate the problem: it's now an upper-bound search. Our predicate must return true for values smaller than or equal to 6, and start returning false once we reach values of 7 or more, so the maximum x for which p(f(x)) is true is the index of the last 6.

def p(v): return v <= 6

lo, hi = 0, len(arr) - 1
best_so_far = None

while lo <= hi:
    x = (lo + hi) / 2
    if p(arr[x]):
        # we found a potential maximum x, but we should still check to see if any larger ones work
        best_so_far = x
        lo = x + 1
    else:
        # the predicate is false, so we need to go left to find true values
        hi = x - 1

print best_so_far 

Again, it works first go. Notice that I only changed two things: the predicate function (reversed the sign) and the direction we head when we find a predicate=true (i.e. I swapped the lines lo = x + 1 and hi = x - 1). I did not have to mess with any +'s or -'s. No off-by-ones were introduced during the making of this function.

Notice also that I use lo = x + 1 and not lo = x. Similarly, hi = x - 1 and not hi = x. This is a foolproof way to avoid the nasty infinite loop binary search bug caused by integer division -- it ensures that you're never considering a value of x more than once, so you're always narrowing down your search space by at least 1 each time, hence ensuring termination. The use of best_so_far gives us complete control over how we're approaching the solution, meaning that we don't need to mess around trying to work out whether it's lo, hi or mid that contain the return value at the conclusion of the algorithm. I personally find that inclusive bounds work best with this form of binary search, but your mileage may vary. If you use exclusive bounds, I make no guarantees on this strategy's correctness.
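
If you'd rather not re-type that loop every time, the whole pattern can be packaged into one reusable helper -- this is just a sketch of how I'd factor it, not something you strictly need:

# f maps an integer index to a value, p is the predicate;
# lower=True finds the minimum x with p(f(x)) true, lower=False finds the maximum such x
def binary_search(lo, hi, f, p, lower=True):
    best_so_far = None
    while lo <= hi:
        x = (lo + hi) / 2
        if p(f(x)):
            best_so_far = x
            if lower:
                hi = x - 1   # look for a smaller x that still satisfies p
            else:
                lo = x + 1   # look for a larger x that still satisfies p
        else:
            if lower:
                lo = x + 1
            else:
                hi = x - 1
    return best_so_far

arr = [1, 1, 2, 4, 5, 5, 5, 6, 6, 6, 6, 8, 10, 10, 11]
print binary_search(0, len(arr) - 1, lambda i: arr[i], lambda v: v >= 6)               # 7
print binary_search(0, len(arr) - 1, lambda i: arr[i], lambda v: v <= 6, lower=False)  # 10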

Yet again, the requirements change. The list is now in descending order, and you need to find the index of the first item less than 5.

[11, 10, 10, 8, 6, 6, 6, 6, 5, 5, 5, 4, 2, 1, 1]

As usual, you only need to make two decisions. Upper- or lower-bound? It's clearly lower-bound: you're finding the first item that satisfies the predicate. What's the predicate? p(v) = (v < 5). Expected output is 11.

def p(v): return v < 5

lo, hi = 0, len(arr) - 1
best_so_far  = None

while lo <= hi:
    x = (lo + hi) / 2
    if p(arr[x]):
        # we found a potential minimum x, but we should still check to see if any smaller ones work
        best_so_far  = x
        hi = x - 1
    else:
        # the predicate is false, so we need to go right to find true values
        lo = x + 1

print best_so_far 

It prints 11.

Do I expect every programmer to write out a trivial p(v) function for every binary search they write? Of course not. It might help you think about the problem, but it's not required. If you take one thing away from this post, let it be this: in any binary search you ever write, whether it be over a list of strings, or a multi-dimensional space, or over the domain of a function that uses the inclusion-exclusion principle on O(1) cumulative sums of rectangular regions of a ternary predicate mapped over the integer values in a 2D grid (this has happened before), you just need to worry about whether it is upper- or lower-bound and what your predicate is.

BAM. No more bugs in a binary search, ever. You can thank me later.

Tuesday, January 3, 2012

Shortest Superstring Problem

If I give you a list of words, can you find the shortest string that contains all the words?

I can!

It turns out that this problem is essentially TSP in disguise, which means no polynomial-time algorithm is known for it. However, we can tackle it the same way as we tackle TSP: with the bitmask dynamic programming solution, which, although exponential, is correct and optimal. There's a trick to it though, which caught me out the first time: if one word is a substring of another and does not appear at the beginning or end of it (as in the case ['abc', 'b']), the overlap-based edge costs can never absorb the smaller word for free, so the usual edge-traversal reasoning breaks down. In cases like these, the smaller word must first be removed from the list. You can see this below with ['germanic', 'german', 'germ']. EDIT: And if two words are the same, they'll both be substrings of each other, but we don't want to remove both of them. Thanks to Werner Lemberg for picking this up in the comments.

For once, some neat code. What's going on???

all_words = ['ginger', 'german', 'minutes', 'testing', 'tingling', 'minor', 'testicle', 'manage', 'guilt', 'germanic', 'normal', 'malt', 'german', 'germ']
all_N = len(all_words)

words = []

# remove strings which are substrings of others
for i in xrange(all_N):
    for j in xrange(all_N):
        if i != j and all_words[i] != all_words[j] and all_words[i] in all_words[j]:
            break
    else:
        words.append(all_words[i])

N = len(words)
        
 
# determine the numerical overlap between two strings a,b
def overlap(a, b):
    best = 0
    for i in xrange(1, min(len(a), len(b))+1):
        if b.startswith(a[-i:]):
            best = i
    return best
 
 
cost = [[None] * N for _ in xrange(N)]
# Precompute edge costs
# for every pair of words with indices u,v
for u in xrange(N):
    for v in xrange(N):
        # work out the best compressed concatenation you can make with u then v
        cost[u][v] = len(words[u]) + len(words[v]) - overlap(words[u], words[v])
 
                 
cache = {}
backtrace = {}
                 
# top-down DP
def solve(used, last):
    if (used,last) not in cache:
     
        bestCost = 0
        bestOption = None
 
        # the base case is when used == 0. using no words, the optimal solution is of length 0
        if used != 0:
         
            bestCost = float('inf')
         
            # for each word we can use
            for i in xrange(N):
                 
                # if the word is as yet unused
                if (1 << i) & used:
                 
                    # calc the cost of using it
                    newCost = cost[i][last] + solve(used & ~(1<<i), i)
                     
                    # if we've reached a new best solution, update stuff
                    if newCost < bestCost:
                        bestCost = newCost
                        bestOption = i
         
        # cache stuff
        cache[(used, last)] = bestCost
        backtrace[(used, last)] = bestOption
     
    return cache[(used, last)]
 
 
# run it for all possible starting cases
bestCost = float('inf')
bestOption = None
for i in xrange(N):
    cur = solve(((1<<N)-1) & ~(1<<i), i)
    if cur < bestCost:
        bestCost = cur
        bestOption = i
 
 
# reconstruct the words that were used
soln = []
used = (1<<N) - 1
last = bestOption
while last is not None:
    soln.append(words[last])
    used &= ~(1<<last)
    last = backtrace[(used, last)]
soln.reverse()
 
 
# now compress the words of the solution into the final string
cur = soln[0]
for i in xrange(1, N):
    cur += soln[i][overlap(cur, soln[i]):]
print cur

The output is minutestinglingingermanicminormaltmanageguiltesticle. Fascinating stuff, I know.