Algorithmic Gymnastics: 2048 in Python

Let's write 2048 in Python. Disclaimer: this is 2am bored-on-the-couch code and is not representative of my professional coding style ;)

First let's implement the action of the left arrowkey. This isn't the shortest or most elegant code, but it is easy to reason about. We treat each row individually, ignore the blank elements in it, and compress the remaining elements into the left. For the element-combining logic, we first decide if anything needs to be combined, and if anything does, we consider cases. There are actually very few cases that provoke combination, and these can be enumerated as: AA, AAB, ABB, AABB, AABC, ABBC, ABCC. We can handle AAAA, ABAA, etc. through these cases too.

def left(x):
    news = []
    score = 0
    for row in x:
        c = [v for v in row if v]
        if all(a != b for a, b in zip(c, c[1:])):
            new = c[:]
        elif len(c) == 2:
            new = [c[0]*2]; score = c[0]*2
        elif len(c) == 3:
            if c[0] == c[1]:
                new = [c[0]*2, c[2]]; score = c[0]*2
            elif c[1] == c[2]:
                new = [c[0], c[1]*2]; score = c[1]*2
        else:
            if c[0] == c[1] and c[2] == c[3]:
                new = [c[0]*2, c[1]*2]; score = c[0]*2 + c[1]*2
            elif c[0] == c[1]:
                new = [c[0]*2, c[2], c[3]]; score = c[0]*2
            elif c[1] == c[2]:
                new = [c[0], c[1]*2, c[3]]; score = c[1]*2
            elif c[2] == c[3]:
                new = [c[0], c[1], c[2]*2]; score = c[2]*2
        new += [None] * (4 - len(new))
        news.append(new)
    return news, score

The remaining arrowkey actions can be implemented by their symmetry with the left arrowkey action, via rotation. The do method will carry out the arrowkey actions. We only place a new tile on the board if the action resulted in a state change.

def cw(x):
    return [[x[3-j][i] for j in range(4)] for i in range(4)]

def do(x, d):
    if d == 'l':
        a, s = left(x)
    if d == 'd':
        a, s = left(cw(x))
        a = cw(cw(cw(a)))
    if d == 'r':
        a, s = left(cw(cw(x)))
        a = cw(cw(a))
    if d == 'u':
        a, s = left(cw(cw(cw(x))))
        a = cw(a)
    return (placenew(a) if a != x else a), s

The placenew function puts a new 2 or 4 into an empty square, following the probabilities of the standard web version of the game.

def placenew(x):
    es = [(r,c) for r in range(4) for c in range(4) if x[r][c] is None]
    r, c = random.choice(es)
    y = deepcopy(x)
    y[r][c] = 4 if random.random() < 0.1 else 2
    return y

This function determines if our position can be escaped. If it can't be escaped with any actions, the game is over.

1 2	def escapable(x, dirs='udlr'): return any(do(x,d)[0] != x for d in dirs)

Let's code up a play strategy where we pick actions at random.

x = [
    [None, None, 2, None],
    [None, None, None, None],
    [None, None, None, None],
    [None, 2, None, None],
]

def playrandom(x):
    ts = 0
    while escapable(x):
        x, s = do(x, random.choice('udlr'))
        ts += s
    return ts

How successful is this strategy? Let's play lots of random games and see the average score.

def mean(v):
    return sum(v) / len(v)

def stdev(v):
    u = mean(v)
    return (sum([(k-u)**2 for k in v]) / (len(v)-1))**0.5

scores = []
while True:
    s = playrandom(x)
    scores.append(s)
    if len(scores) > 1:
        print(int(mean(scores)), int(stdev(scores)), max(scores))

This shows that the mean score at endgame with the random strategy is about 810. Can we do better? Anyone who's played the game a couple times will know that ideally you wanna keep your big tiles in one corner, say the bottom left corner. So alternating down and left actions might prove more effective. We can add in right actions if we get stuck with downs and lefts. In a true disaster we can try go up and back down.

def playwell(x):
    ts = 0
    while escapable(x):
        start = ts
        x, s = do(x, 'd'); ts += s
        x, s = do(x, 'l'); ts += s
        if not escapable(x, 'dl'):
            x, s = do(x, 'r'); ts += s
            if not escapable(x, 'dlr'):
                x, s = do(x, 'u'); ts += s
                x, s = do(x, 'd'); ts += s
    return ts

This gives a mean endgame score of about 2400. Nice!

Algorithmic Gymnastics

Monday, December 13, 2021

2048 in Python

No comments:

Post a Comment