Saturday, December 28, 2013

Combinatorial Knapsack Family

This cute little set of algorithmic triplets solves something like 60% of all dynamic programming problems.

(created by me. correctness not guaranteed)
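
As a taste of the genre, here's a hedged sketch (mine, not one of the original triplets) of the classic 0/1 knapsack DP, the archetype these problems riff on.

def knapsack(items, capacity):
    # items: list of (weight, value); dp[c] = best value achievable within capacity c
    dp = [0] * (capacity + 1)
    for weight, value in items:
        # iterate capacities downwards so each item is taken at most once
        for c in range(capacity, weight - 1, -1):
            dp[c] = max(dp[c], dp[c - weight] + value)
    return dp[capacity]

print(knapsack([(3, 4), (4, 5), (2, 3)], 6))  # 8: take the 5-value and 3-value items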

Sunday, October 20, 2013

Tutorial on MaxScore (Information Retrieval)

In the Information Retrieval & Web Search course I'm taking, we look at an algorithm for document-at-a-time scoring called MaxScore (also written as max_score, maxscore or max score in the literature). Now MaxScore is a pretty cool algorithm, but unfortunately you wouldn't know that, because to learn MaxScore you have to piece together the ideas behind the technique from half a dozen hand-waved explanations in various academic papers.

This is a great loss to the recreational programming community, so I've written (to my knowledge) the first thorough and in-depth explanation of MaxScore. It includes a general outline, examples, and implementation details, as well as a detailed explanation and proof of why it works. Hopefully it will be useful to others.


Monday, September 30, 2013

Proving a Bound on Quad-tree Queries

A while ago, I made the claim on this blog that "Quad-tree Range Queries are O(sqrt(S)) Time". Today, I thought I'd prove that this is true. Technically, I'll prove a much more formal statement, and then hand-wave a bit to give the above. We will prove that:
The tightest attainable upper-bound on the size of the minimum set of quad-tree nodes whose union is exactly the span of a given rectangle in an $N\times N$ space is $N$.

To prove this, we will consider the general statement "the tightest attainable upper-bound on the number of quad-tree nodes whose union spans a rectangle in an $N\times N$ space is $K$". Then we will show that $N \leq K$ and furthermore that $K \leq N$, allowing us to conclude that $N=K$.

First, we show that $N \leq K$. To do this, it is sufficient to show that there is at least one rectangle that requires a minimum set of nodes of size at least $N$: any attainable upper bound must then be at least $N$. We can easily do this by choosing any rectangle of height 1 and width $N$. Since each quad-tree node spans a square, we will need each node's corresponding square to have a height of exactly 1 (so it doesn't exceed the height of our rectangle). But since they're squares, their width must also be 1 -- so we'd need $N$ of these nodes to describe the rectangle. Thus our minimum set is of size $N$, so $N \leq K$.

Now we will show that $K \leq N$. Intuitively this seems more difficult because we have to prove that for all rectangles, the minimum spanning set of nodes has size at most $N$. We will show this result by following the steps of the standard recursive algorithm for querying a rectangle.

Recall that the algorithm to operate on a quad-tree range looks like:

FUNCTION RangeOperate(node, query_rect)
  IF node.rect is completely contained within query_rect:
    DO OPERATION ON node // This node has been 'selected' by the query
  ELSE IF node.rect is completely disjoint from query_rect:
    RETURN // Nothing more to do
  ELSE:
    RangeOperate(node.quadrant_1, query_rect)
    RangeOperate(node.quadrant_2, query_rect)
    RangeOperate(node.quadrant_3, query_rect)
    RangeOperate(node.quadrant_4, query_rect)
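
For concreteness, here's a minimal runnable sketch of this routine (my Python translation, not the original post's), using an implicit quad-tree over a power-of-two grid and integer-aligned queries:

def range_operate(x, y, size, qx1, qy1, qx2, qy2, selected):
    # the node spans the half-open square [x, x+size) x [y, y+size);
    # the query is the half-open rectangle [qx1, qx2) x [qy1, qy2)
    if qx1 <= x and x + size <= qx2 and qy1 <= y and y + size <= qy2:
        selected.append((x, y, size))   # completely contained: 'operate' on it
        return
    if x + size <= qx1 or qx2 <= x or y + size <= qy1 or qy2 <= y:
        return                          # completely disjoint: nothing to do
    # with integer-aligned queries, a 1x1 node is always contained or
    # disjoint, so the recursion bottoms out before size reaches 0
    half = size // 2
    for dx in (0, half):
        for dy in (0, half):
            range_operate(x + dx, y + dy, half, qx1, qy1, qx2, qy2, selected)

# the height-1, width-N strip from the proof selects exactly N leaf nodes
N = 16
selected = []
range_operate(0, 0, N, 0, 0, N, 1, selected)
print(len(selected))  # 16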

We seek an upper bound on the number of nodes that will be operated upon. Each time RangeOperate recurses, it shifts its attention from an $n\times n$ node to an $\frac{n}{2} \times \frac{n}{2}$ node. Because the function is initially called on the root node, of size $N\times N$, and since it can't go any deeper than $1\times 1$ nodes, the function can recurse to a maximum depth of $\log_2 N$. Keep this in mind; we'll need it later.

We make at most 4 recursive sub-calls per call of RangeOperate. Let's list all the possible types of rectangles that query_rect could be, and for each one, consider what these sub-calls will do.

  • If query_rect completely contains or is completely disjoint from node.rect, we don't recurse at all. Call this the termination condition.
  • If query_rect lies within only one of the four quadrants of node, then all but one of the four sub-calls will encounter the termination condition. The other will continue recursing.
  • If query_rect lies within two of the four quadrants of node, then two of the four sub-calls will encounter the termination condition. The other two will continue recursing.
  • If query_rect lies within all four quadrants of node, none of the four sub-calls will terminate. However, the query_rect for each of these sub-calls will overlap the sub-call's node.rect at one of its corners. Let's write subnode for a sub-call's node; query_rect still refers to the same rectangle. Without loss of generality, let's assume that query_rect overlaps the top-right corner of subnode.rect (the three other cases are rotations of this case).
    • If query_rect only overlaps with quadrant 1 of subnode, then only one sub-call of this sub-call (the sub-sub-call if you will) will not meet the termination condition.
    • If query_rect overlaps with quadrants 1 and 2 of subnode or quadrants 1 and 3 of subnode, then only these two sub-sub-calls will not meet their termination condition.
    • If query_rect overlaps with all four quadrants of subnode, then quadrant 1 of subnode must be completely within the query_rect; so this sub-sub-call meets the termination condition. The sub-sub-calls on quadrants 2 and 3 have query_rect overlapping two corners of subsubnode.rect, and thinking about this for a moment will convince you that at least two of the sub-sub-sub-calls will then meet their termination condition.

By visualising the cases, you'll quickly convince yourself that it is impossible for query_rect to lie within exactly three of node's quadrants, so this list is exhaustive. What was the point of all this? Well, we established that after recursing down three levels from any call of RangeOperate, at most 8 sub-sub-sub-calls won't have met their termination conditions. Averaging over the three levels, that's an effective branching factor of $8^{1/3} = 2$ per level -- and since our recursion tree has depth $\log_2 N$ and branching factor 2, the total number of nodes operated on is at most $2^{\log_2 N} = N$.

Of course, there are other interesting questions raised by this exercise:
  • Do these results generalise to higher dimensions? Oct-trees and such?
  • What can we say about quad-trees that span non-square spaces? (Hint: if the space spanned is $W\times H$, we can adapt what we've done already to show $K=\max(W, H)$.)
  • We've shown that the upper-bound on the minimum set size is $N$, and we have (implicitly) shown that the standard quad-tree querying algorithm never queries a set of nodes larger than $N$ by proving the $K \leq N$ result using the steps of that same algorithm. But what guarantee do we have that this quad-tree querying algorithm will find a minimum set for any rectangle? For instance, say we had two algorithms, the standard query algorithm $S$ and another query algorithm $Q$. We have established that the number of nodes queried by each of these algorithms is at most $N$, but suppose that $S$ queries on average $N/2$ nodes and $Q$ queries on average $N/3$ nodes. Can we prove that $S$ does in fact select a minimum set for any rectangle query?

Thursday, August 15, 2013

Simple In-Place Swap

Beginner programmers are often given the task of swapping the values of two variables. First, they will try to assign to each variable the value of the other.

x := y
y := x

Of course, this doesn't behave as expected -- the first assignment overwrites the original value of $x$. The standard solution to this issue is to use a temporary variable to duplicate the value of one of the input variables.

temp := x
x := y
y := temp

For the more mathematically-inclined student, there's an extension to this puzzle: swap the variables' values without using any extra storage.

The trick is to notice that you can hold the original values of $x$ and $y$ implicitly as simple arithmetic results. For example, let $x := x + y$. We haven't lost the original value of $x$ -- we still have it 'stored' as $x - y$. Once this observation is discovered, some trivial arithmetic on a few small examples is sufficient to find the solution.

x := x + y
y := x - y
x := x - y

To verify that this algorithm actually works, we can let $y'$ be the final value of $y$ and $x'$ be the final value of $x$. By substituting previous values into the above equations, we see that $y' = (x + y) - y = x$ and $x' = (x + y) - ((x + y) - y) = y$, confirming that the values have indeed been swapped.
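
As a quick sanity check, here's the swap run on concrete values (my example, not the author's). Note that Python integers never overflow; in a fixed-width language the intermediate $x + y$ can wrap, although with unsigned modular arithmetic the swap still comes out correct.

x, y = 7, 42
x = x + y   # x is now 49; the old x is still recoverable as x - y
y = x - y   # y = 49 - 42 = 7, the old x
x = x - y   # x = 49 - 7 = 42, the old y
print(x, y)  # 42 7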

This is a particularly good task to give to children who are learning to program. It introduces them to the idea that programming isn't just about menial bit-twiddling, and reinforces the importance of analytical problem solving in computer science. It's also very approachable yet still conceptually interesting.

Monday, June 17, 2013

Capacity Scaling for Fast Max Flow

In my previous post, I presented an implementation of the Ford-Fulkerson method (termed 'method' and not 'algorithm' because it encompasses a number of different algorithms) for finding the maximum flow through a network.

The Ford-Fulkerson method involves repeatedly finding so-called augmenting paths in the residual network, then saturating these paths, terminating only when there are no augmenting paths left. The order in which these paths are found depends on the specific algorithm employed. The Edmonds-Karp algorithm, for instance, uses a breadth-first search to find the shortest augmenting path. Another algorithm employs a priority queue to find the maximum-bottleneck augmenting path.

There is another approach, however, which accomplishes the same goal but attacks the problem from a different direction: the capacity scaling algorithm. Capacity scaling for max flow has woefully little literature on it. I found mentions of it in CS lecture slides from several universities, and also in a TopCoder algorithms tutorial. Apparently it's presented in Network Flows: Theory, Algorithms, and Applications, but I'd never read about it before. In practice, capacity scaling is just as simple to conceptualise, just as easy to implement, and quite a bit faster than traditional Ford-Fulkerson.

Ford-Fulkerson first finds an augmenting path, then saturates that path. Capacity scaling turns this principle on its head: start by assuming that an augmenting path exists with residual capacity at least $r$, then find such a path and push $r$ units of flow along it. (This only works for integral edge weights, by the way!)

In particular, we will begin by choosing $r$ to be the largest power of 2 that does not exceed the maximum edge weight in the graph. We will then push $r$ units of flow out of the source node, and hope that all $r$ units reach the sink. If they do, we try to push another $r$ units out. Otherwise, set $r := r/2$ and repeat. Iterate while $r \geq 1$.

max_flow = 0
r = largest power of 2 not exceeding max_edge_weight
while r >= 1:
  while we can push r units from source to sink:
    max_flow += r
  r /= 2
output max_flow
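
To make this concrete, here's a minimal runnable sketch of capacity scaling (my code, not the implementation promised at the end of this post): an adjacency-matrix residual graph, with one BFS per push attempt that only crosses edges of residual capacity at least $r$.

from collections import deque

def max_flow_scaling(cap, source, sink):
    n = len(cap)
    residual = [row[:] for row in cap]   # the residual network starts as a copy
    max_edge = max(max(row) for row in cap)
    r = 1
    while r * 2 <= max_edge:             # largest power of 2 <= max edge weight
        r *= 2
    flow = 0
    while r >= 1:
        while True:
            # BFS restricted to edges with residual capacity at least r
            parent = [-1] * n
            parent[source] = source
            queue = deque([source])
            while queue and parent[sink] == -1:
                u = queue.popleft()
                for v in range(n):
                    if parent[v] == -1 and residual[u][v] >= r:
                        parent[v] = u
                        queue.append(v)
            if parent[sink] == -1:
                break                    # no r-augmenting path left; halve r
            v = sink
            while v != source:           # push r units along the path we found
                u = parent[v]
                residual[u][v] -= r
                residual[v][u] += r
                v = u
            flow += r
        r //= 2
    return flow

Each push attempt is one $O(V^2)$ BFS on the matrix; an adjacency-list version brings that down to $O(E)$ per search, matching the analysis below.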

The algorithm should be fairly intuitive. It's clear that it finds the maximum flow, because it terminates only when it cannot push a single unit of flow, i.e. there are no augmenting paths left. Furthermore, it pushes flow down an edge at most $\left \lfloor \lg U \right \rfloor + 1$ times, where $U$ is the maximum edge weight in the graph: because each pass considers a different power of 2, one can interpret the algorithm as effectively 'setting each bit' of the flow amount for each edge.

Now to derive the time complexity. First, I want to make a claim that will help us out. I try to avoid proofs, but this one is important so I'll sketch it out.

EDIT: I'm no longer convinced of this proof's correctness. The problem is that by pushing negative flow, you can effectively undo work you did on a previous iteration. I'll leave it here anyway for readers to pick holes in.

Claim: For a graph $G(V,E)$ the algorithm retains a single value of $r$ for at most $|E|$ iterations of the flow-pushing routine.

Proof: Consider the initial case $r=2^{ \left \lfloor \lg U \right \rfloor}, U=\max \; \{ W(e) \mid e \in E \} $, where by design, no edge has residual flow more than twice $r$. Each time we successfully push flow, we decrease the residual flow of at least one edge $e$ by $2^{ \left \lfloor \lg U \right \rfloor}$. Now $W(e)-2^{ \left \lfloor \lg U \right \rfloor} \leq U - 2^{ \left \lfloor \lg U \right \rfloor}$ (since $U$ is chosen to be the greatest of all $W(e)$'s), and furthermore $U < 2^{ \left \lfloor \lg U \right \rfloor + 1}$, so $U - 2^{ \left \lfloor \lg U \right \rfloor} < 2^{ \left \lfloor \lg U \right \rfloor}$. So $e$ now has residual flow less than $2^{ \left \lfloor \lg U \right \rfloor} = r$, and we cannot push $r$ more units of flow down $e$. This saturation can only occur once for each edge, so the value of $r$ is retained for at most $|E|$ iterations.

Following this, we let $r:=r/2$ and classify edges as follows: those that we augmented on the previous iteration (which now have residual flow not more than twice $r$), those that we could not traverse on the previous iteration (because their values were not more than twice $r$), and those which were unreachable on the previous iteration (either because they're not in the connected component or because they can only be reached by first traversing an edge of residual flow not more than twice $r$ -- and hence can be thought of as having the residual flow of this earlier edge). Observe that this scenario is identical to our initial state, in which all reachable edges have $W(e) \leq 2r$, so by induction the algorithm retains each value of $r$ for at most $|E|$ iterations, down to the base case $r < 1$.

Since there are $\left \lfloor \lg U \right \rfloor + 1$ values of $r$ to trial, at most $|E|$ repetitions of the push-routine for each value of $r$, and each push-routine conducts an $O(E)$ search (BFS works well), we arrive at a final complexity of $O(E^2 \log U)$. This outperforms Edmonds-Karp's $O(VE^2)$ whenever $\log U < V$ -- which is nearly always -- and it outperforms the traditional iterated DFS in practice as well.

Code will follow shortly.


Tuesday, May 28, 2013

More like "hacks-imum flow"! Hahaha!

So I have a hunch that the Python maximum flow implementation on Wikipedia is broken. Their find_path method is as follows:

    def find_path(self, source, sink, path):
        if source == sink:
            return path
        for edge in self.get_edges(source):
            residual = edge.capacity - self.flow[edge]
            if residual > 0 and not (edge,residual) in path:
                result = self.find_path( edge.sink, sink, path + [(edge,residual)] ) 
                if result != None:
                    return result
The following condition prevents the depth-first search from getting itself stuck in cycles, as it searches over the residual network:

            if residual > 0 and not (edge,residual) in path:
That looks fine. However, consider the case where your residual network is a dense directed acyclic graph with all edge weights positive. Furthermore, the source node is the 'root' of this DAG (it's an ancestor of every other node), and there is exactly one path from the source to the sink. The number of cycle-free paths out of the source can grow exponentially with the number of nodes!

The following graphic illustrates such a case. The source is A and the sink is B. All black edges have residual flow 1, and the red edge has residual flow 0. As the number of horizontal and vertical 'layers' in the graph increases, the number of cycle-free paths from A to X increases exponentially, and none of these paths lead to the sink. So the algorithm could potentially keep trying these faulty paths before ever stumbling upon a path that leads to B.
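
Without reproducing the figure, here's a small construction of my own in the same spirit: a chain of $k$ 'diamonds' has $2^k$ distinct cycle-free paths into a dead end, and a path-enumerating DFS with no visited set (like find_path above) walks essentially all of them.

def diamond_graph(k):
    # chain of k 'diamonds': the entry of diamond i is node 3i, which forks
    # to nodes 3i+1 and 3i+2, both rejoining at node 3(i+1); node 3k is the
    # dead end (the 'X' region)
    adj = {3 * k: []}
    for i in range(k):
        adj[3 * i] = [3 * i + 1, 3 * i + 2]
        adj[3 * i + 1] = [3 * (i + 1)]
        adj[3 * i + 2] = [3 * (i + 1)]
    return adj

def count_calls(adj, u):
    # a path-enumerating DFS with no visited set, like find_path above
    calls = 1
    for v in adj[u]:
        calls += count_calls(adj, v)
    return calls

for k in (4, 8, 16):
    print(k, count_calls(diamond_graph(k), 0))   # grows like 4 * 2^k

And the real find_path pays an extra linear scan for its not (edge,residual) in path test at every step, which only makes matters worse.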


Of course, I've almost certainly overlooked something. After all, Wikipedia doesn't have any mistakes, right?
As a plea for your forgiveness, I'll post my max-flow code. It doesn't have any of that silly 'object-oriented' stuff. The network example I use is the one found here.
import copy

N = 7
adj = [[0]*N for _ in range(N)]
adj[0][1] = 3
adj[0][2] = 1
adj[1][3] = 3
adj[2][3] = 5
adj[2][4] = 4
adj[3][6] = 2
adj[4][5] = 2
adj[5][6] = 3
source = 0
sink = 6

# the residual network begins as a copy of the original network
residual = copy.deepcopy(adj)

def get_augpath():
    seen = [False] * N

    def dfs(cur):
        if seen[cur]: return []  # already explored from here; no path found
        seen[cur] = True

        for next in xrange(N):
            if residual[cur][next] > 0:
                if next == sink:
                    # cur has an edge directly to the sink
                    return [cur, next]
                else:
                    # suffix is a path from next to sink, or [] if none exists
                    suffix = dfs(next)
                    if suffix:
                        return [cur] + suffix
        # no paths found
        return []
    return dfs(source)

def max_flow():
    mf  = 0

    path = get_augpath()
    # keep iterating while a path exists
    while path:
        # find the bottleneck: the minimum residual capacity along our augmenting path
        flow_to_push = min(residual[path[i]][path[i+1]] for i in xrange(len(path)-1))
        mf += flow_to_push
        for i in xrange(len(path) - 1):
            residual[path[i]][path[i+1]] -= flow_to_push
            residual[path[i+1]][path[i]] += flow_to_push
        print 'pushed', flow_to_push, 'along', path

        # try find another augmenting path
        path = get_augpath()
    return mf

print max_flow()

Monday, May 20, 2013

Turtle Tower

Another nice question I encountered recently!
You want to build a tower of turtles. Each turtle has a weight and a strength. A turtle tower is stable if each turtle in the tower can support all the turtles above it; that is, if each turtle's strength is at least as large as the total weight on top of it. How many turtles are in the tallest stable tower you can make?
Dynamic programming problems jump out at you like crazed carnivorous turtles, and this is no exception. It looks like the longest increasing subsequence problem, with a hint of knapsack. The first step would be to establish an ordering of the turtles such that any stable tower can be assumed to stack its turtles consistently with that ordering; then we'd use some simple dynamic programming to select a maximal subsequence!

But hang on. What's the ordering? If the turtles are $(s=5, w=5)$ and $(s=10,w=10)$ then the heavier turtle must go below. If the turtles are $(s=2,w=10)$ and $(s=15,w=5)$ then the lighter turtle must go below! If the turtles are $(s=5,w=5)$ and $(s=10,w=2)$ then the stronger turtle must go below; but if the turtles are $(s=10,w=2)$ and $(s=5,w=15)$ then the weaker turtle must go below! Hmm...

It turns out that you have to sort turtles by the sum of their weight and strength. I don't know the intuition behind why this works, but it does; and proving it is an interesting and surprisingly involved exercise.

Denote the strength and weight of turtle $T$ by $s_T$ and $w_T$, respectively.

Lemma: If a stable tower can be built of height $h$, then a stable tower can be built of height $h$ such that for any pair of turtles $A,B$ where $B$ is below $A$, we have $s_B+w_B \geq s_A+w_A$.

Proof: The proof will consist of an exchange argument. We will show that for any adjacent pair of turtles in the tower, we can always arrange them so that the lower turtle's strength plus weight is at least the upper turtle's, while still maintaining the tower's stability. If this is possible, then by a 'bubble sort'-type argument, we can repeatedly swap out-of-order adjacent pairs until the tower is sorted by each turtle $t$'s $s_t+w_t$.

Consider a tower consisting only of a pair of turtles $A,B$. Our task is to prove that if $B$ is placed below $A$, then $s_B+w_B \geq s_A+w_A$.

  • Case 1: assume $B$ supports $A$ but $A$ does not support $B$; that is, $s_B\geq w_A$ and $w_B > s_A$. Adding these inequalities, we obtain $s_B+w_B>s_A+w_A$ as required. Easy peasy!
  • Case 2: assume $B$ supports $A$ and vice versa. Regardless of their ordering, the total weight of their tower will obviously be $w_A+w_B$. Furthermore, if $B$ is placed below $A$, the extra weight the tower can support above it is given by $\min(s_A,s_B-w_A)$, and if $A$ is placed below $B$, it is given by $\min(s_B, s_A-w_B)$. We wish to maximise the weight the tower can hold above it, and we can show that when $s_B+w_B\geq s_A+w_A$, placing $B$ below $A$ accomplishes this; that is, $\min(s_A,s_B-w_A)\geq \min(s_B,s_A-w_B)$. We will work backwards from the statement we wish to prove until we reach a true statement, with every step reversible (not the most elegant proof but whatever).
    \[\begin{aligned} \min(s_A,s_B-w_A) &\geq \min(s_B,s_A-w_B)\\ \min(s_A,s_B-w_A) + w_A + w_B &\geq \min(s_B,s_A-w_B) + w_A + w_B\\ \min(s_A+w_A+w_B,s_B+w_B) &\geq \min(s_B+w_B+w_A,s_A+w_A)\quad \text{(by distributivity of min)}\\ \min(s_A+w_A+w_B,s_B+w_B) &\geq s_A+w_A\quad \text{(by our assumption } s_B+w_B\geq s_A+w_A\text{)} \end{aligned}\] Clearly, both $s_A+w_A+w_B$ and $s_B+w_B$ are greater than or equal to $s_A+w_A$, so the final line is true. Since every step was reversible, the statement we started with is true as well -- so $B$ should indeed be placed below $A$!
These two cases are exhaustive for all situations where $B$ is placed below $A$, so our proof is complete!
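
If you don't trust the algebra, here's a throwaway brute-force check of the case-2 inequality (my code) over small integer strengths and weights:

from itertools import product

# whenever B and A support each other and s_B + w_B >= s_A + w_A, placing
# B below A supports at least as much extra weight as the other order
for sA, wA, sB, wB in product(range(8), repeat=4):
    if sB >= wA and sA >= wB and sB + wB >= sA + wA:
        assert min(sA, sB - wA) >= min(sB, sA - wB)
print('all cases check out')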

The full solution is left as an exercise to the reader :) it can be solved in $O(n^2)$ time and $O(n)$ space -- if you get stuck, there's a sketch at the end of this post.

EDIT: after years, I've finally seen a satisfying explanation of the sort order. Suppose turtle $i$ supports turtle $j$, which in turn supports some stack of turtles. The total weight of the whole stack (turtle $i$ included) is no more than $s_i + w_i$. If $s_j+w_j > s_i+w_i$ then $s_j > (s_i + w_i) - w_j$, i.e. turtle $j$ can support the total weight of the stack excluding its own weight. Hence we can place turtle $j$ at the bottom of the stack.
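
And, as promised, a hedged sketch of one possible $O(n^2)$-time, $O(n)$-space solution (mine, so treat it sceptically): sort by strength plus weight as the lemma suggests, then run an LIS-style DP where best[j] is the minimum total weight of a stable tower of j turtles, trying each new turtle at the bottom.

def tallest_tower(turtles):
    # turtles: list of (strength, weight) pairs
    INF = float('inf')
    n = len(turtles)
    best = [INF] * (n + 1)   # best[j]: min total weight of a stable j-turtle tower
    best[0] = 0
    for s, w in sorted(turtles, key=lambda t: t[0] + t[1]):
        # try this turtle at the bottom of each tower built so far;
        # iterate j downwards so the turtle is used at most once
        for j in range(n - 1, -1, -1):
            if best[j] <= s:   # strong enough to support everything above
                best[j + 1] = min(best[j + 1], best[j] + w)
    return max(j for j in range(n + 1) if best[j] < INF)

print(tallest_tower([(5, 5), (10, 10), (2, 10), (15, 5)]))  # 3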

Sunday, May 5, 2013

N Coloured Blobs

N coloured blobs wander around in groups on an open plain. Each group has its own distinct colour, and all coloured blobs in a group share the same colour. Occasionally, two groups will bump into each other: each of the blobs in the smaller group will take on the colour of the larger group, and the two groups will merge. If two groups of equal size bump into each other, the 'mergee' is chosen at random.

A researcher monitors a coloured blob population. Initially, each blob is on its own (i.e. in a group of size 1) and has its own distinct colour. After a sufficient amount of time has passed, all the blobs have become part of one huge group of the same colour. Let K be the total number of times that a blob changed its colour.

A lower bound on K is N-1, if every blob is merged individually into the huge group, hence only changing colour once: from its initial colour to the colour of the huge group. Determine an upper bound on K that is as tight as possible.

This is a sweet little problem with a not terribly difficult solution, but it involves some interesting complexity analysis and allows us to derive a beautiful new implementation for the disjoint-set data structure.

Let's follow a single blob for its lifecycle and consider the number of times that it changes colour. Assume the worst case: at every merge, its blob-set meets a blob-set of the same size and takes on the other blob-set's colour.

After the first merge, its blob-set is size 2. After the second merge, its blob-set is size 4. We extrapolate to see that after the $k$th merge, its blob-set is size $2^k$; so if there were $N$ blobs initially, then a single blob can experience at most $\log_2 N$ merges and thus at most $\log_2 N$ colour changes. To give our final upper-bound, we assume that each blob experiences this number of colour changes, meaning that on the order of $N \log N$ colour changes occur overall, i.e. $K = O(N \log N)$. (In fact, we can reduce the upper bound to $\frac{N \log_2 N}{2}$, but being a computer scientist, I'm not concerned about constant factors. seanieg89 from the boredofstudies.org forum proved this upper bound with some fancy maths, and that proof is available here.)
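
To convince yourself of that constant, you can simulate the worst case directly (my sketch, assuming $N$ is a power of two): always merge equal-sized groups and recolour one side.

def worst_case_changes(n):
    # n blobs, all starting in singleton groups; n must be a power of two
    changes, size, groups = 0, 1, n
    while groups > 1:
        changes += (groups // 2) * size   # each merge recolours one whole group
        groups //= 2
        size *= 2
    return changes

for k in range(1, 7):
    n = 2 ** k
    print(n, worst_case_changes(n), k * n // 2)   # the two counts coincide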

This demonstrates an interesting implementation of the disjoint-set data structure that I hadn't ever considered before. Give each set a label and keep track of the size of each set. When it comes time to merge two sets, overwrite the labels of the smaller set with those of the larger set. This gives worst-case total time $O(n \log n)$ over all merges, as opposed to the union-find implementation, which is marginally faster: $O(n \cdot \alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function. However, in many of the places where we'd normally use disjoint sets, like Kruskal's algorithm, we need to sort $O(n)$ items anyway, so the final complexity of the overall algorithm ends up $O(n \log n)$ regardless.
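
Here's a hedged sketch of that structure (my code): every element stores its set label directly, and a union relabels each member of the smaller set.

class BlobSets:
    def __init__(self, n):
        self.label = list(range(n))                # every blob starts alone
        self.members = {i: [i] for i in range(n)}  # label -> list of its blobs

    def find(self, x):
        return self.label[x]   # O(1) lookup; no trees, no path compression

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a        # always relabel ('recolour') the smaller set
        for blob in self.members[b]:
            self.label[blob] = a
        self.members[a].extend(self.members.pop(b))

A blob is only relabelled when its set at least doubles in size, so each blob is relabelled at most $\log_2 n$ times -- exactly the blob argument above, giving $O(n \log n)$ total work.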

So our blob-merging operation, which has virtually no overhead and so is blisteringly fast, can actually replace the standard union-find algorithm in many situations! I've seen plenty of olympiad questions which require fast disjoint sets, but have additional complications which mean that the union-find algorithm doesn't work very nicely. In these situations, it might be preferable to merge some blobs as above.

Monday, April 1, 2013

Pirate Gold

Been super busy, haven't updated in ages! I feel very guilty. Here's a classic interview problem, plus my solution. (my regards to my algorithms lecturer, Aleks Ignjatovic, for the write-up)
There are five pirates who have to split 100 bars of gold. They all line up and proceed as follows:
  1. The first pirate in line gets to propose a way to split up the gold (for example: everyone gets 20 bars)
  2. The pirates, including the one who proposed, vote on whether to accept the proposal. If the proposal is rejected, the pirate who made the proposal is killed.
  3. The next pirate in line then makes his proposal, and the 4 pirates vote again. If the vote is tied (2 vs 2) then the proposing pirate is still killed; only a majority can accept a proposal. The process continues until a proposal is accepted or there is only one pirate left.
Assume that every pirate:
  • above all wants to live;
  • given that he will be alive, wants to get as much gold as possible;
  • given the maximal possible amount of gold, wants to see any other pirate killed, just for fun;
  • knows his exact position in line;
  • is an excellent puzzle solver.
Like many other problems of the sort, it's helpful to start by considering a very small case -- for example, when $n=2$.

Call the five pirates $A,B,C,D,E$. The leftmost pirate in the line goes first.

2 Pirates: $D,E$.

$D$ proposes. $D$ votes for himself.
$E$ can always vote against $D$, forcing a 1-1 tie: $D$ is killed, and $E$ takes all the money.
$D$: dead
$E$: 100

3 Pirates: $C,D,E$.

$C$ proposes. $C$ votes for himself.
$D$ knows that if $C$ dies, then $D$ will also die in the following round (as in the 2 pirates case). So $D$ will avoid death by voting for $C$ regardless of $C$'s proposition.
$C$ wins with $2/3$ votes. 
$C$: 100 
$D$: 0 
$E$: 0

4 Pirates: $B,C,D,E$.

$B$ proposes. $B$ votes for himself.
$C$ can get 100 bars by killing $B$, and gains nothing if $B$ lives, so $C$ will vote against $B$.
$D$ will get nothing if $B$ dies, and as pirates always want to kill pirates for fun, he will kill $B$ unless $B$ gives him some incentive to vote for $B$'s proposition. So if $B$ gives $D$ 1 bar, $D$ will vote for $B$.
Similarly for $E$ -- so $B$ gives $E$ 1 bar in exchange for $E$'s vote. $B$ wins with 3/4 votes.
$B$: 98
$C$: 0
$D$: 1
$E$: 1


5 Pirates: $A,B,C,D,E$.

$A$ proposes. $A$ votes for himself.
$B$ can get 98 bars by killing $A$, and gains nothing if $A$ lives, so $B$ will vote against $A$.
$C$ will get nothing if $A$ dies, and as pirates always want to kill pirates for fun, he will kill $A$ unless $A$ gives him some incentive to vote for $A$'s proposition. So if $A$ gives $C$ 1 bar, $C$ will vote for $A$.
$D$ will get 1 bar if $A$ dies, and as pirates always want to kill pirates for fun, he will kill $A$ unless $A$ gives him some incentive to vote for $A$'s proposition. So if $A$ gives $D$ 2 bars, $D$ will vote for $A$. ($E$, who would also get 1 bar, could equally have been bribed with 2 bars instead.)
$A$ wins with 3/5 votes.
$A$: 97
$B$: 0
$C$: 1
$D$: 2
$E$: 0
Therefore $A$ should propose that $A$ gets 97 gold bars, $C$ gets 1 gold bar and $D$ gets 2 gold bars.

General Case

This is a simple recursive procedure -- with $n$ pirates, the proposer has to bribe just enough other pirates to win the vote, while also maximising his own money. This can be done by finding the $\lfloor n/2 \rfloor$ pirates who would get the least gold in the case of $n-1$ pirates, then bribing each of them with one gold bar more than they would otherwise have gotten. Here is an $O(n^2 \log n)$ Python implementation (each of the $n$ levels of recursion does an $O(n \log n)$ sort), and its output.
def best_strategy(n):
    if n == 2:
        # a 1-1 tie kills the proposer, so D dies; -1 marks a dead pirate
        return [-1, 100]
    else:
        # fallback allocations if the proposer dies; the sentinel 100 stops
        # the proposer from ever bribing himself
        moneys = [100] + best_strategy(n-1)
        ranked = sorted(zip(moneys, range(n)))
        alloc = [100] + [0] * (n-1)
        # a strict majority needs n // 2 votes besides the proposer's own:
        # bribe the pirates with the worst fallback, one bar above it
        for i in xrange(n // 2):
            their_money, person = ranked[i]
            alloc[person] = their_money + 1
            alloc[0] -= alloc[person]
        return alloc

for i in xrange(2, 15):
    print i, best_strategy(i)

Output

2 [-1, 100]
3 [100, 0, 0]
4 [98, 0, 1, 1]
5 [97, 0, 1, 2, 0]
6 [96, 0, 1, 2, 0, 1]
7 [96, 0, 1, 2, 0, 1, 0]
8 [95, 0, 1, 2, 0, 1, 0, 1]
9 [95, 0, 1, 2, 0, 1, 0, 1, 0]
10 [94, 0, 1, 2, 0, 1, 0, 1, 0, 1]
11 [94, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0]
12 [93, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 1]
13 [93, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 1, 0]
14 [92, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]