Tuesday, May 27, 2014

Counting Integers with a Particular Binary Property

The Hamming weight or popcount of $n$ is the number of 1s in the binary representation of $n$.

Question: how many positive integers less than $N$ have a popcount divisible by 3?

The naive $O(N \log N)$ solution is to loop through each integer $n < N$ and count the number of bits that are set. This is slow.

Don't fret! There is an $O(\log^2 N)$ solution.

Suppose that $N$ is a power of 2. Then its binary representation is 1 followed by $\lg(N)$ 0s. We can count the number of positive $n\leq N$ with $\mathrm{popcount}(n)$ by simply doing $C(\lg(N), 3) + C(\lg(N), 6) + ... + C(\lg(N), 3k)$ while $3k \leq \lg(N)$. For instance, when $N=16$, $C(4,3) = 4$ which is correct.

Next, suppose that $\mathrm{popcount}(N) = 2$, (i.e. $N$ has exactly two bits set, i.e. N$ $is the sum of a pair of distinct powers of 2). E.g. say $N = 20$ (binary 10100). The solution that worked for $N=16$ will still work. Now in a sense we've "considered" that power of 2. Then consider the next power of 2 that makes up the number, 4. 4 has 2 0's following, so we use the same principle as before! However, we remember that the bit representing 16 has already been set, so we're no longer looking for popcounts that are multiples of 3; instead, we're looking for popcounts which give a remainder of 2 modulo 3. Mathematically, we do $C(\lg(4), 2) + ... + C(\lg(4), 3k-1)$ while $3k-1 \leq 4$. In this case, we have only $C(2,2) = 1$. This corresponds to $n = 19$ (binary 10011). So for $N=20$, the answer is $4+1=5$.

OK now we generalise the pattern.

from euler import popcount, binom
from numpy import log2

def solve(N):
    ans = 0
    num_considered = 0
    p = 2**60
    oldN = N
    while p >= 1:
        if N & p:
            if (num_considered + 1) % 3 == 0:
                ans += 1
            mult = 3 - (num_considered % 3)
            while mult <= log2(p):
                ans += binom(log2(p), mult)
                mult += 3
            num_considered += 1
        p //= 2
    return ans

I'm not even sure that this is $O(\log^2 N)$... there are two nested loops that each run $O(\log n)$ times, and inside the nested loop we call a binomial function which in theory is $O(nk)$, and we call it with $n,k \leq \log N$ producing an extra $O(\log^2 N)$ multiplier. But due to the ranges of the arguments we use for the binomial function, I'm pretty sure each call is amortized constant time. Maybe.

euler is my Project Euler helper functions module which is available on my Github.

Thursday, May 8, 2014

P vs NP for Dummies

I seem to type up a variation of this post every few months. I'm saving this one for the permanent record.


P vs NP for Dummies


There are some yes/no problems which are easy. If I give you a list on numbers and ask you to tell me if the number 5 is in that list, you can solve it easily by looking at each number in the list. In the worst case, when the number 5 is at the very end of the list, this takes about n steps/instructions.
There are yes/no problems which are slightly harder -- for example, I might ask you to tell me if 12*14=168. When you multiply two n-digit numbers using the method you learnt in school, you need about n2 steps.
Both of these problems require some number of steps that's bounded by a polynomial of the input size; i.e. for each of these problems there is a fixed k so that the number of steps required to solve an instance of that problem of size n it is always bounded by nk . This class of problems is called P which is a terrible pseudo-abbreviation for 'polynomial time on a deterministic Turing machine' (a deterministic Turing machine is a standard computer, more or less).
Here's another yes/no problem: given a map of a country with n cities, determine whether there's a route that can be taken that visits every city exactly once, and whose total distance is less that a given distance d. If we are told that the answer is yes and we are given the corresponding route, we can easily check its total distance by simply following the route and adding up all the distances between cities. This verification process takes about n steps since you're adding n-1 numbers together. The set of yes/no problems whose solutions can be verified in polynomial time (given some extra information, in this case the route itself) is called NP. This specific problem is a special case called an NP-complete problem, meaning that it is in a sense 'harder' than all other problems in NP.
One might conjecture that, because you can verify a solution to this problem in a polynomial number of steps, then perhaps you can also solve it from scratch in polynomial time? Obviously all problems in P are also in NP, since if we can verify a solution to any problem in P in polynomial time by simply solving that problem and checking it against the given solution. Since this problem is 'harder' than all the other NP problems, if we can solve this one in polynomial time, then we can solve all the other NP problems in polynomial time too! This would mean that all NP problems are also P problems, in other words P = NP.
So you set out to try to find a polynomial time method to produce a solution to this problem. But you can't. No one can. We don't have a polynomial time method to solve for this problem. In fact we don't have a polynomial time method for any of the dozens of known NP-complete problems. These problems range from finding routes across maps, to packing items efficiently into a backpack, to optimally playing candy crush. On the flip side though, no one's managed to prove that a polynomial time method doesn't exist. Such a proof would demonstrate that P =/= NP.
So there you are. The problem of P vs NP is the problem of proving either that P=NP (this can be done simply by finding a polynomial time method to solve problem above) or that P =/= NP (this intuitively seems harder to prove).
For the route-finding problem described above (it's called the travelling salesman problem), the best currently known method is a small optimisation on "brute force every permutation of cities to visit". This method bound the number of steps by about n2 * 2n. If you can get rid of the exponential term in that expression, you get a million dollar prize and immediately render most of the field of cryptography obsolete.