Friday, December 13, 2019

Bounded Counting Knapsack

Counting knapsack problems look like this: given $n$ items with respective integer weights $w_1,\dots,w_n$, how many ways can we form combinations of these items that have total weight $s$?

There are three basic variations of counting knapsack.

  1. In the “unbounded” variation we allow combinations to contain repeated items. For instance with 3 items $a,b,c$ with weights 1,3,5 we can form a combination of $3a, 2b, 1c$ with total weight 14. The partition function $p(s)$ in number theory is given by solving this problem for $n=s$ and $w_i=i$ for all $i$.
  2. In the “binary” or “0/1” variation, we can only use each item once. The weights need not be distinct, so if the $w_i$ are 1,1,3,5 respectively then we can form a combination 1,1,5 that sums to 7, but we cannot form a combination 1,3,3.
  3. In the "bounded" variation, we are provided with integers $c_1, \dots, c_n$ and we are allowed to use up to $c_i$ instances of item $i$ in each combination. This is a generalisation of the two variations above, where in the unbounded variation we have $c_i = \infty$ for all $i$ and in the 0/1 variation we have $c_i = 1$ for all $i$.

The unbounded and 0/1 variations are classic combinatorics examples that can be solved in $O(ns)$ time and $O(s)$ space using basic dynamic programming. Let $[i,k]$ denote the number of combinations of items $1, \dots, i$ with total weight $k$, so that the solution to the problem is given by $[n,s]$. Then the unbounded variation has recurrence $$[i,k] = [i-1,k] + [i,k-w_i]$$ This recurrence has the interpretation "to form a combination of weight $k$ using items $1, \dots, i$ we can either add the $i$th item to a combination of weight $k-w_i$ formed by items $1, \dots, i$, or we can just take any existing combination of weight $k$ formed by items $1, \dots, i-1$."

The 0/1 variation has the very similar recurrence $$[i,k] = [i-1,k] + [i-1,k-w_i]$$ with an analogous interpretation; the only difference is that we are prevented from using item $i$ more than once in a combination. The $O(s)$ space bound comes from observing that in both cases we can evaluate $[i, k]$ by referring only to $[j,k]$ for $j \geq i-1$, so any $[j,k]$ with $j<i-1$ can be discarded and we only ever hold two rows of $s+1$ values. In all variations we have base cases $[0,0] = 1$, $[0,k] = 0$ for $k \neq 0$, and $[i,k] = 0$ whenever $k < 0$.
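To make the two $O(ns)$ recurrences concrete, here is a minimal Python sketch of my own (the names count_unbounded and count_01 are mine, not from any particular library). Each variation keeps a single array indexed by weight: the unbounded variation sweeps $k$ upwards so that item $i$ can be reused, while the 0/1 variation sweeps $k$ downwards so that it cannot.

    def count_unbounded(weights, s):
        """Number of combinations with repetition allowed that have total weight s."""
        dp = [0] * (s + 1)
        dp[0] = 1                       # base case: the empty combination
        for w in weights:               # [i,k] = [i-1,k] + [i,k-w]: sweep k upwards
            for k in range(w, s + 1):
                dp[k] += dp[k - w]
        return dp[s]

    def count_01(weights, s):
        """Number of combinations using each item at most once with total weight s."""
        dp = [0] * (s + 1)
        dp[0] = 1                       # base case: the empty combination
        for w in weights:               # [i,k] = [i-1,k] + [i-1,k-w]: sweep k downwards
            for k in range(s, w - 1, -1):
                dp[k] += dp[k - w]
        return dp[s]

    # With weights 1, 2, 3 the 0/1 combinations of weight 3 are {3} and {1,2}.
    assert count_01([1, 2, 3], 3) == 2
    # With weights 1..5 and repetition allowed, weight 5 can be formed in p(5) = 7 ways.
    assert count_unbounded([1, 2, 3, 4, 5], 5) == 7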

For the bounded variation things get more interesting. We might expect a solution to look like the one for the unbounded variation, except that the unbounded solution has no "memory" of how many times it has used an item, so it has no inbuilt way of preventing us from exceeding our allowance for an item. The 0/1 variation, on the other hand, generalises naturally to the bounded case. We just replace the binary decision in its recurrence – "do we use item $i$ or not?" – by a more general decision "how many instances of item $i$ do we use?". The resulting recurrence is

$$[i,k] = \sum_{q=0}^{c_i} [i-1, k-qw_i]$$

When $c_i = 1$, this is exactly equivalent to the 0/1 variation recurrence. However, when $c_i = \infty$, this looks quite different to the recurrence for the unbounded case. Why? It is instructive to think hard about the answer to this question.

What is the time complexity of this solution? It is not $O(ns)$, because the recurrence is no longer $O(1)$ to evaluate. For each item $i$ we sum over roughly $c_i$ terms for each of the $s+1$ values of $k$, so we could say it is $O(s\sum_i c_i)$. In the worst case our items have small $w_i$ and $c_i \approx s$, so $\sum_i c_i$ starts to look like $ns$ and we're looking at $O(s^2n)$.
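A direct implementation of this summation recurrence might look like the following Python sketch (the name count_bounded_naive and the (weight, allowance) pair representation are my own choices; only two rows are kept, for $O(s)$ space):

    def count_bounded_naive(items, s):
        """items: list of (weight, allowance) pairs; counts combinations of total weight s."""
        prev = [0] * (s + 1)                   # row i-1 of the DP table
        prev[0] = 1                            # base case: the empty combination
        for w, c in items:
            cur = [0] * (s + 1)                # row i
            for k in range(s + 1):
                # [i,k] = sum of [i-1, k - q*w] over q = 0 .. c; terms with q*w > k vanish
                q = 0
                while q <= c and q * w <= k:
                    cur[k] += prev[k - q * w]
                    q += 1
            prev = cur
        return prev[s]

    # Up to three copies of weight 2 and two copies of weight 3: 6 = 2+2+2 = 3+3, so 2 ways.
    assert count_bounded_naive([(2, 3), (3, 2)], 6) == 2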

At this point we observe that we may assume all $w_i$ are distinct: if we had $w_i = w_j$ for $i \neq j$, we could merge items $i,j$ into a single item with that shared weight and allowance $c_{i+j} = c_i + c_j$, since a combination only records how many copies of each weight it uses. Each term $[i-1,k-qw_i]$ vanishes whenever $qw_i > k$, so for any fixed $w_i$ we are summing over at most $k/w_i + 1$ nonzero terms. Using the distinctness of the $w_i$ we have

$$\sum_{i=1}^n \frac{k}{w_i} \leq \sum_{i=1}^n \frac{k}{i} \leq k(1 + \ln n) = O(k \log n)$$

What are we saying here? If we fix a value $k$ we can evaluate $[i,k]$ across all $i$ in $O(k \log n)$. $k$ is bounded by $s$ and we must evaluate this for each $k$, so the algorithm in fact runs in $O(s^2 \log n)$. Of course, if the $c_i$ are known a priori to be small, this is a very loose bound and it's more useful to think of the algorithm as running in $O(s \sum_i c_i)$. Either way it's a far cry from the $O(ns)$ solutions for the other variations.

Generally we speed up an algorithm by noticing and getting rid of redundant computation, and this case is no different. The critical observation is best conveyed through an example. Fix $w_i = 3, c_i=4$. We have

$$\begin{aligned} [i,k] &= [i-1,k] + [i-1,k-3] + [i-1,k-6] + [i-1,k-9] + [i-1,k-12]\\ [i,k+1] &= [i-1, k+1] + [i-1, k-2] + [i-1, k-5] + [i-1, k-8] + [i-1, k-11] \\ [i,k+2] &= [i-1, k+2] + [i-1, k-1] + [i-1, k-4] + [i-1, k-7] + [i-1, k-10] \end{aligned}$$

No repeated work yet. But see what happens for $[i,k+3]$:

$$[i,k+3] = [i-1, k+3] + [i-1, k] + [i-1, k-3] + [i-1, k-6] + [i-1, k-9]$$

All except the first term of this sum have already appeared in the expansion of $[i,k]$! In fact we can write

$$[i,k+3] = [i-1,k+3] + [i,k] - [i-1,k-12]$$

What is happening here is that to solve $[i,k]$ we are using $[i,k-w_i]$ as a cumulative array that sums values across the row $[i-1, \dots]$. Unfixing $w_i, c_i$ we arrive at the $O(1)$ recurrence

$$[i,k] = [i-1, k] + [i,k-w_i] - [i-1, k-(c_i+1)w_i]$$

To me this is a remarkable result. In some informal sense the recurrence appears to "lie between" the recurrences for the 0/1 and unbounded variations, in that it starts with $[i-1, k] + [i, k-w_i]$ like the unbounded variation, but then it corrects itself using $[i-1,k-(c_i+1)w_i]$ to avoid using item $i$ too many times. It makes sense if I squint, but I don't think I could have come up with that correction procedure through combinatorial reasoning. The recurrence seems to say "we make a combination of items up to $i$ with weight $k$ either by just taking a combination of weight $k$ using items up to $i-1$, or by adding item $i$ to a combination of weight $k-w_i$ of items up to $i$; and if it's possible that we used too many copies of item $i$, we uncount the combinations that contain too many of them." This gives us an $O(ns)$ time, $O(s)$ space solution to the bounded counting knapsack problem.
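Here is a Python sketch of the resulting algorithm in the same style as the earlier snippets (again with my hypothetical (weight, allowance) pair representation), together with a brute-force check that enumerates every tuple $(q_1, \dots, q_n)$ directly:

    from itertools import product

    def count_bounded(items, s):
        """O(n*s) time, O(s) space bounded counting knapsack."""
        prev = [0] * (s + 1)                       # row i-1
        prev[0] = 1                                # base case: the empty combination
        for w, c in items:
            cur = [0] * (s + 1)                    # row i
            for k in range(s + 1):
                cur[k] = prev[k]                               # [i-1, k]
                if k >= w:
                    cur[k] += cur[k - w]                       # [i, k - w_i]
                if k >= (c + 1) * w:
                    cur[k] -= prev[k - (c + 1) * w]            # uncount overuse of item i
            prev = cur
        return prev[s]

    def count_bounded_brute(items, s):
        """Count tuples (q_1, ..., q_n) with 0 <= q_i <= c_i and sum q_i * w_i == s."""
        return sum(1 for qs in product(*(range(c + 1) for _, c in items))
                   if sum(q * w for q, (w, _) in zip(qs, items)) == s)

    items = [(1, 2), (3, 1), (4, 2)]
    assert all(count_bounded(items, s) == count_bounded_brute(items, s) for s in range(15))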

In the next entry we will consider another counting knapsack variation which generalises the unbounded counting knapsack problem into another "dimension", with beautiful results.

Saturday, November 2, 2019

Some reasons for studying mathematics

A different sort of post today.

I've never had much interest in doing maths. I'm fundamentally incurious about mathematical objects, I don't feel joy or excitement when I prove a theorem, and I rarely enjoy the process of solving mathematical problems. The typical questions that I encounter in my course of mathematical study don't appeal to my intellectual hunger in the same way that, say, a computer science problem can take over a corner of my brain for weeks. This is how I know that mathematics is not for me. So why did I choose to study it?

When I studied algorithms in high school I learned a bunch of graph search techniques and dynamic programming and computational geometry etc, but more importantly I developed a sort of mental toolbox for reasoning about systems and processes. Items in the toolbox could also be described as "problem-solving techniques" or "approaches" or "paradigms" or "meta-algorithms"; they can feel abstract and fuzzy and difficult to describe, but certainly seem to exist as distinct concepts in my brain so that I can manifest them at will. They're obviously useful in the context of algorithmic problem solving, but they're also applicable to problems in totally unrelated fields. There are hundreds of tools in my toolbox of various levels of abstraction and usefulness. I'll have a go at identifying some of the tools in my algorithms toolbox:

  • Thinking in terms of growth rates
  • The notion of "greediness"
  • "Sliding window" methods: we can solve a sequence of overlapping problems by solving the first problem and efficiently transforming that solution into a solution for the subsequent problem, and so on.
  • Divide and conquer
  • Laziness vs eagerness
  • Invariants
  • "X reduces to Y", meaning in a precise sense "X is at least as hard as Y, and we can't solve X without solving Y"

This mental toolbox represents the greatest benefit I received from studying algorithmic problem solving.

The study of mathematics has given me a whole new set of tools to add to my toolbox. As in the examples above, these tools are used constantly in the study of maths, but they have little to do with mathematical problems themselves. These mathematical tools are often far more insightful and valuable to me than the problems that they help to solve, and I can apply them to problems outside of mathematics. On its own, this is a good enough reason for me to study maths (it's not the only reason – see below). In this entry I will list and briefly describe some of the tools I've added to my toolbox in the past few years of maths education. Most of these tools can be expressed as statements about systems or structures. I may update this list over time as new tools occur to me. All tools are stated informally.

  • Action to structure: we can study actions on a structure by treating the actions as first-class objects themselves. (for instance studying symmetries via the symmetry group. Reverse of "Structure to action".)
  • Basis: in some cases, the behaviour of a system is wholly described by its behaviour on a small set of representative inputs.
  • Continuity: the property of being able to induce arbitrarily small changes in the output of a system via correspondingly small changes in the input.
  • Contradiction (proof by): one can assert a proposition by supposing that it doesn't hold and subsequently taking a logical sequence of steps to arrive at an impossibility.
  • Convexity: the property of "not having dents"; not necessarily in a geometric setting.
  • Density arguments: if a property holds for a subset $A$ of a set $X$ and any element of $X$ is the limit of elements in $A$, then we may be able to extend the property to all elements of $X$. (This is useful when $A$ consists of "simple" examples where the property is easily explored. The obvious example is the treatment of simple functions in integration theory.)
  • Diagonalisation: if we have a countable number of instances from some class, we can generate a new instance of that class simply by ensuring that it differs from each existing instance in at least one point (say, from the $n$th instance at the $n$th point).
  • Differentiation: we can learn about a system by observing its instantaneous change.
  • Duality: Two seemingly disparate domains sometimes pair up conveniently, so that we can study items in one domain by looking at the corresponding item in the other domain.
  • Eigen-: we can learn about a system by studying the inputs on which it is particularly well-behaved.
  • Exchange arguments: we can show that a state is optimal by taking any other state and showing that it doesn't get any worse if we bring it a step closer to the proposed optimal state.
  • Graphs: complex systems and structures often have natural representations as networks.
  • Integration: we can learn about a system by observing its cumulative effect.
  • Isomorphism: rather than speaking about the equality of two objects, it's often more meaningful to consider equivalence with respect to some structural property.
  • Lifts: methods that transform solutions for a problem in one domain to solutions in a broader domain, hence "lifting" them.
  • Limits: properties that hold as we get close to something may or may not hold once we get there, depending on the attributes of the surrounding space.
  • Locality vs globality: properties that hold at a micro level often don't hold at the macro level and vice versa.
  • Orthogonality: a stronger or more precise form of independence. If A, B are orthogonal then we can change A in any way without affecting B and vice versa.
  • Pigeonhole principle: if you're placing balls into boxes with more balls than boxes, at least one box will have more than one ball. (And in reverse: if there are fewer balls than boxes, at least one box will be empty.)
  • Probabilistic method: informally, if a property has non-zero chance of holding for a randomly-selected element of a set, then the property holds for at least one element of the set. If the mean of the values in a set is $x$ then at least one of the values is at least $x$ and at least one of the values is at most $x$.
  • Projection: we can learn about a complex structure by studying what it looks like from a fixed perspective. I'm stunned at how often the notion of projection appears in mathematics and how abstract it can become. We can project in the literal geometric sense – as in projective geometry, or projecting a vector onto another – but we can also project the integers onto the set {0,1} via the equivalence relation of parity, or we can project a vector space to another vector space via a linear map, or we can project a sigma-algebra to a sub-sigma-algebra via conditional expectation. Projection is closely linked to the notion of homomorphism – so much so that I wish our study of homomorphisms had begun with an explicit acknowledgement that homomorphisms are precisely the maps that behave "projectively".
  • Representatives: we can sometimes study the interactions of complex structures (often equivalence classes) by instead considering the interactions of individual representatives from those complex structures. (e.g. dealing with cosets via coset representatives.)
  • Structure to action: we can learn about a structure by constructing an action of that structure on some class of objects and studying the action instead. (for instance studying a group via its group action on a set. The reverse of "Action to structure".)
  • Substructure: we can learn about a structure by studying the smaller structures embedded within it. (see subgroup, subspace, submodule, subgraph, etc.)
  • Symmetry: we can learn about a structure by studying the ways in which we can transform the structure without changing the way it looks.
  • Two-player game: some notions can be understood in terms of a two-player oppositional game. For instance, epsilon-delta proofs can be interpreted as a game between an epsilon-picker and a delta-picker.

Why else do I study maths if I fundamentally don't care for it? I find maths to be an exceptional form of brain training. Most of this training happens via two mechanisms:

  1. Structuring and summarising information: as part of my maths education I'm expected to be able to regurgitate in exam conditions something like fifty theorems and proofs per course. Each proof can consist of pages of dense equation manipulation or tricky argumentation, which is far too much to learn by rote. Instead, I need to memorise the core ideas of each theorem and proof in summarised form, but in a way where I can elaborate on those summaries as much as required until I arrive at their full unsummarised forms. I also need to develop a rich mental map of the relationships between the objects of study. Both of these processes develop my reasoning abilities.
  2. Coping with abstraction: the objects of study in pure mathematics become increasingly abstract throughout a course of study, but we still need to reason about them. So we develop skills and techniques for coping with abstraction. For instance, we can fall back to geometric intuition (vector space = "space with arrows that can be added to each other and scaled"; orbit-stabiliser theorem = "sending a fixed vertex of a polyhedron to all other vertices then counting symmetries which fix that vertex in place"), as long as we keep in mind the limitations of that geometric intuition. Or we can comprehend abstraction in terms of sufficiently distinct representative examples whose generalisation "spans" the full abstraction (for instance it's difficult to understand abstract measure theory in full generality, so we develop most of our intuition by comparing and contrasting canonical examples: Lebesgue measure, counting measure, Hausdorff measure, Dirac measure, ...). If we think of an abstract concept as a space of instances ("measure theory" as the study of the set of measures), this technique corresponds to choosing a set of examples at different points on the boundary of the space that help us to reason about any instance inside the boundary. I suspect that this skill is exercised more in mathematical study than in any other area of study.

Studying maths for me feels a lot like going to the gym. My interest in number theory or complex analysis is about as deep as my interest in push ups or squats. To me their value is as means of self-improvement rather than as ends in themselves. I deeply respect my lecturers and mathematically-minded peers, but as my study has progressed and the mathematical objects in consideration have become more contrived, I increasingly struggle to understand the motivation behind a career in research mathematics. But of course many academic mathematicians say the exact same thing about careers in tech or finance. Different strokes for different folks...


Sunday, October 13, 2019

Developing intuition for Noether's isomorphism theorems

In abstract algebra, Noether's isomorphism theorems are extremely general statements about algebraic objects and the relationships between them. I've studied the theorems in separate courses on groups, rings, and modules over commutative rings (including vector spaces), so I've spent plenty of time trying to grasp their essence. In this post I share some of the intuition that I've developed. If nothing else, this intuition helps to remember the statements of the theorems. I will present the theorems in the context of abelian groups, where the statements are especially straightforward, in order to avoid clouding the intuition with details about normal subgroups and so on. I will number the theorems 1, 2, 3, 4.

Theorem 1. If $f$ is a homomorphism between abelian groups $A, B$ then $$\frac{A}{\text{Ker } f} \cong \text{Im } f$$

This is the core theorem from which theorems 2 and 3 easily follow. By taking the quotient of $A$ with the kernel of $f$, we are forming a group that looks like $A$ except that we've combined the elements which map to the same element of the codomain. I think of this as forcing $f$ to become injective by taking all the sets of elements where it fails to be injective and combining each such set into a single element. Since our version of $f$ is now injective, and it maps onto its image (by definition of image), it is a bijection onto its image. Moreover, the structure-preserving nature of the homomorphism means that structure is preserved through the bijection – i.e. there is an isomorphism between the LHS and RHS.

To me, the statement becomes almost tautological under this interpretation. It seems to say "if we take a homomorphism and disregard all the points where it fails to be injective, then it becomes injective".
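To pin this down, here is a small example of my own: take the projection $f : \mathbb Z \times \mathbb Z \to \mathbb Z$ given by $f(x,y) = x$. Its kernel is $\{0\} \times \mathbb Z$ and its image is all of $\mathbb Z$, so the theorem reads $$\frac{\mathbb Z \times \mathbb Z}{\{0\} \times \mathbb Z} \cong \mathbb Z$$ Quotienting by the kernel combines all the pairs that share a first coordinate (exactly the points where $f$ fails to be injective), and the resulting version of $f$ is a bijection onto the image.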

Theorem 2. If $A, B$ are subgroups of an abelian group $G$, then $$\frac{A+B}{A} \cong \frac{B}{A \cap B}$$

I think of this statement as a broad generalisation of a basic result from number theory: if $a, b$ are positive integers then $a/\gcd(a,b) = \mathrm{lcm}(a,b)/b$ (this rearranges into the more familiar form $ab/\gcd(a,b) = \mathrm{lcm}(a,b)$). If we draw the diamond-shaped partial order of the "divides" relation with the positive integers $a, b, \gcd(a,b), \mathrm{lcm}(a,b)$ and then we draw the partial order of the "subgroup" relation with groups $A, B, A \cap B, A + B$, we see that the number-theoretic statement on the "divides" partial order looks very similar to what the second isomorphism theorem says about the "subgroup" partial order.

Indeed, $A \cap B$ is the "greatest common subgroup" of $A, B$ and $A + B$ is the "smallest common supergroup" of $A, B$. We know that the quotient operation feels a lot like division, so it seems natural to me to relate the quotient operation with greatest common subgroups and smallest common supergroups to the division operation with greatest common divisors and least common multiples.

The number theory statement can in fact be derived from the second isomorphism theorem applied to $(\mathbb Z, +)$, so we're not saying anything groundbreaking. The benefit of this view of things is that you probably already have a good intuition for the properties of GCDs and LCMs, so it's just a matter of translating that intuition to the world of abstract algebra.

This isn't a complete intuition. In particular it's not clear under this intuition why we get an isomorphism between the LHS and RHS rather than just a bijection. But it's a useful starting point, or at the very least a mnemonic device.
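To spell out the derivation from $(\mathbb Z, +)$ mentioned above: take the subgroups $A = a\mathbb Z$ and $B = b\mathbb Z$. Then $A + B = \gcd(a,b)\mathbb Z$ and $A \cap B = \mathrm{lcm}(a,b)\mathbb Z$, so the second isomorphism theorem says $$\frac{\gcd(a,b)\mathbb Z}{a\mathbb Z} \cong \frac{b\mathbb Z}{\mathrm{lcm}(a,b)\mathbb Z}$$ The left-hand side has $a/\gcd(a,b)$ elements, the right-hand side has $\mathrm{lcm}(a,b)/b$ elements, and equating the two sizes recovers $a/\gcd(a,b) = \mathrm{lcm}(a,b)/b$.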

Theorem 3. If $A \leq B \leq G$ are abelian groups, then $$\frac{G/A}{B/A} \cong \frac{G}{B}$$

This is probably the easiest theorem to remember out of the three. Quotients feel like fractions, and indeed we can "multiply both sides" of the fraction to cancel out quotients of quotients. Unfortunately I don't yet have a good intuition for the preservation of structure.


It's worth keeping in mind that the proofs of the three theorems are all in the "follow your nose" style. For the first theorem we want to establish an isomorphism between the LHS and RHS; the most natural possible mapping from the LHS to the RHS turns out to be this isomorphism. For the remaining theorems we again pick the most natural mappings between the LHS and RHS (it'll be easiest to take the mapping from the RHS to the LHS). Again this mapping will happen to be a homomorphism, and the second and third theorems fall out immediately from applying the first theorem to this homomorphism. This is quite remarkable and it means that even if you haven't developed a fully-formed intuition for the theorems, it's easy to convince yourself rigorously that they're correct.


Theorem 4. Let $P$ be a subgroup of an abelian group $G$. Then there exists a bijective correspondence $$\left\{ \text{subgroups of } \frac{G}{P} \right\} \longleftrightarrow \left\{ \text{subgroups of } G \text{ that contain } P\right\}$$

The right-to-left map looks like this: if $Q \leq G$ is a subgroup satisfying $P \subseteq Q$, then $P$ is also a subgroup of $Q$, so we can form the quotient $Q/P$. Since $Q \leq G$ it would seem natural that $Q/P \leq G/P$. Indeed this is the case, so $Q/P$ is in the LHS. In other words, we map from the right to the left simply by taking the quotient of the subgroup with $P$.

For the left-to-right map, notice that any subgroup of $G/P$ consists of elements of the form $g+P$, i.e. cosets. Let $H$ be the set of all $g \in G$ that represent one of these cosets (equivalently, the union of the cosets). We can show that $H$ is a subgroup of $G$ (it inherits its structure from $G/P$). Moreover, the representatives of the identity element of the subgroup of $G/P$ are the elements of $P$ itself, so the elements of $P$ are contained within $H$. Thus $P \subseteq H \leq G$ so $H$ lands in the RHS. I visualise this map as "unpacking" a subgroup of the quotient into its representatives, and it makes intuitive sense to me that (1) these representatives form a subgroup of the quotient numerator, and (2) the identity of the quotient (i.e. the coset $0+P$) is in the subgroup and unpacks to the set $P$.

Some work has to be done to prove that these maps are inverses of each other, but if we think of the maps above as packing elements into a quotient and then unpacking the quotient back to elements, the fact that the maps are inverses of each other shouldn't come as a surprise.
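A small example of my own to make the packing and unpacking concrete: take $G = \mathbb Z$ and $P = 6\mathbb Z$. The subgroups of $\mathbb Z$ that contain $6\mathbb Z$ are $\mathbb Z, 2\mathbb Z, 3\mathbb Z, 6\mathbb Z$ (one for each divisor of 6), and packing sends $d\mathbb Z$ to the subgroup $d\mathbb Z/6\mathbb Z$ of $\mathbb Z/6\mathbb Z$, which has $6/d$ elements. Unpacking goes the other way: the subgroup $\{0+6\mathbb Z, 3+6\mathbb Z\}$ of $\mathbb Z/6\mathbb Z$, for instance, unpacks to the representative set $3\mathbb Z$, which does indeed contain $6\mathbb Z$.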