[algorithm] What is the optimal algorithm for the game 2048?

I have recently stumbled upon the game 2048. You merge similar tiles by moving them in any of the four directions to make "bigger" tiles. After each move, a new tile appears at random empty position with a value of either 2 or 4. The game terminates when all the boxes are filled and there are no moves that can merge tiles, or you create a tile with a value of 2048.

One, I need to follow a well-defined strategy to reach the goal. So, I thought of writing a program for it.

My current algorithm:

while (!game_over) {
    for each possible move:
        count_no_of_merges_for_2-tiles and 4-tiles
    choose the move with a large number of merges
}

What I am doing is at any point, I will try to merge the tiles with values 2 and 4, that is, I try to have 2 and 4 tiles, as minimum as possible. If I try it this way, all other tiles were automatically getting merged and the strategy seems good.

But, when I actually use this algorithm, I only get around 4000 points before the game terminates. Maximum points AFAIK is slightly more than 20,000 points which is way larger than my current score. Is there a better algorithm than the above?

This question is related to algorithm logic artificial-intelligence 2048

The answer is


This algorithm is not optimal for winning the game, but it is fairly optimal in terms of performance and amount of code needed:

  if(can move neither right, up or down)
    direction = left
  else
  {
    do
    {
      direction = random from (right, down, up)
    }
    while(can not move in "direction")
  }

I wrote a 2048 solver in Haskell, mainly because I'm learning this language right now.

My implementation of the game slightly differs from the actual game, in that a new tile is always a '2' (rather than 90% 2 and 10% 4). And that the new tile is not random, but always the first available one from the top left. This variant is also known as Det 2048.

As a consequence, this solver is deterministic.

I used an exhaustive algorithm that favours empty tiles. It performs pretty quickly for depth 1-4, but on depth 5 it gets rather slow at a around 1 second per move.

Below is the code implementing the solving algorithm. The grid is represented as a 16-length array of Integers. And scoring is done simply by counting the number of empty squares.

bestMove :: Int -> [Int] -> Int
bestMove depth grid = maxTuple [ (gridValue depth (takeTurn x grid), x) | x <- [0..3], takeTurn x grid /= [] ]

gridValue :: Int -> [Int] -> Int
gridValue _ [] = -1
gridValue 0 grid = length $ filter (==0) grid  -- <= SCORING
gridValue depth grid = maxInList [ gridValue (depth-1) (takeTurn x grid) | x <- [0..3] ]

I thinks it's quite successful for its simplicity. The result it reaches when starting with an empty grid and solving at depth 5 is:

Move 4006
[2,64,16,4]
[16,4096,128,512]
[2048,64,1024,16]
[2,4,16,2]

Game Over

Source code can be found here: https://github.com/popovitsj/2048-haskell


Algorithm

while(!game_over)
{
    for each possible move:
        evaluate next state

    choose the maximum evaluation
}

Evaluation

Evaluation =
    128 (Constant)
    + (Number of Spaces x 128)
    + Sum of faces adjacent to a space { (1/face) x 4096 }
    + Sum of other faces { log(face) x 4 }
    + (Number of possible next moves x 256)
    + (Number of aligned values x 2)

Evaluation Details

128 (Constant)

This is a constant, used as a base-line and for other uses like testing.

+ (Number of Spaces x 128)

More spaces makes the state more flexible, we multiply by 128 (which is the median) since a grid filled with 128 faces is an optimal impossible state.

+ Sum of faces adjacent to a space { (1/face) x 4096 }

Here we evaluate faces that have the possibility to getting to merge, by evaluating them backwardly, tile 2 become of value 2048, while tile 2048 is evaluated 2.

+ Sum of other faces { log(face) x 4 }

In here we still need to check for stacked values, but in a lesser way that doesn't interrupt the flexibility parameters, so we have the sum of { x in [4,44] }.

+ (Number of possible next moves x 256)

A state is more flexible if it has more freedom of possible transitions.

+ (Number of aligned values x 2)

This is a simplified check of the possibility of having merges within that state, without making a look-ahead.

Note: The constants can be tweaked..


I am the author of a 2048 controller that scores better than any other program mentioned in this thread. An efficient implementation of the controller is available on github. In a separate repo there is also the code used for training the controller's state evaluation function. The training method is described in the paper.

The controller uses expectimax search with a state evaluation function learned from scratch (without human 2048 expertise) by a variant of temporal difference learning (a reinforcement learning technique). The state-value function uses an n-tuple network, which is basically a weighted linear function of patterns observed on the board. It involved more than 1 billion weights, in total.

Performance

At 1 moves/s: 609104 (100 games average)

At 10 moves/s: 589355 (300 games average)

At 3-ply (ca. 1500 moves/s): 511759 (1000 games average)

The tile statistics for 10 moves/s are as follows:

2048: 100%
4096: 100%
8192: 100%
16384: 97%
32768: 64%
32768,16384,8192,4096: 10%

(The last line means having the given tiles at the same time on the board).

For 3-ply:

2048: 100%
4096: 100%
8192: 100%
16384: 96%
32768: 54%
32768,16384,8192,4096: 8%

However, I have never observed it obtaining the 65536 tile.


EDIT: This is a naive algorithm, modelling human conscious thought process, and gets very weak results compared to AI that search all possibilities since it only looks one tile ahead. It was submitted early in the response timeline.

I have refined the algorithm and beaten the game! It may fail due to simple bad luck close to the end (you are forced to move down, which you should never do, and a tile appears where your highest should be. Just try to keep the top row filled, so moving left does not break the pattern), but basically you end up having a fixed part and a mobile part to play with. This is your objective:

Ready to finish

This is the model I chose by default.

1024 512 256 128
  8   16  32  64
  4   2   x   x
  x   x   x   x

The chosen corner is arbitrary, you basically never press one key (the forbidden move), and if you do, you press the contrary again and try to fix it. For future tiles the model always expects the next random tile to be a 2 and appear on the opposite side to the current model (while the first row is incomplete, on the bottom right corner, once the first row is completed, on the bottom left corner).

Here goes the algorithm. Around 80% wins (it seems it is always possible to win with more "professional" AI techniques, I am not sure about this, though.)

initiateModel();

while(!game_over)
{    
    checkCornerChosen(); // Unimplemented, but it might be an improvement to change the reference point

    for each 3 possible move:
        evaluateResult()
    execute move with best score
    if no move is available, execute forbidden move and undo, recalculateModel()
 }

 evaluateResult() {
     calculatesBestCurrentModel()
     calculates distance to chosen model
     stores result
 }

 calculateBestCurrentModel() {
      (according to the current highest tile acheived and their distribution)
  }

A few pointers on the missing steps. Here: model change

The model has changed due to the luck of being closer to the expected model. The model the AI is trying to achieve is

 512 256 128  x
  X   X   x   x
  X   X   x   x
  x   x   x   x

And the chain to get there has become:

 512 256  64  O
  8   16  32  O
  4   x   x   x
  x   x   x   x

The O represent forbidden spaces...

So it will press right, then right again, then (right or top depending on where the 4 has created) then will proceed to complete the chain until it gets:

Chain completed

So now the model and chain are back to:

 512 256 128  64
  4   8  16   32
  X   X   x   x
  x   x   x   x

Second pointer, it has had bad luck and its main spot has been taken. It is likely that it will fail, but it can still achieve it:

Enter image description here

Here the model and chain is:

  O 1024 512 256
  O   O   O  128
  8  16   32  64
  4   x   x   x

When it manages to reach the 128 it gains a whole row is gained again:

  O 1024 512 256
  x   x  128 128
  x   x   x   x
  x   x   x   x

I'm the author of the AI program that others have mentioned in this thread. You can view the AI in action or read the source.

Currently, the program achieves about a 90% win rate running in javascript in the browser on my laptop given about 100 milliseconds of thinking time per move, so while not perfect (yet!) it performs pretty well.

Since the game is a discrete state space, perfect information, turn-based game like chess and checkers, I used the same methods that have been proven to work on those games, namely minimax search with alpha-beta pruning. Since there is already a lot of info on that algorithm out there, I'll just talk about the two main heuristics that I use in the static evaluation function and which formalize many of the intuitions that other people have expressed here.

Monotonicity

This heuristic tries to ensure that the values of the tiles are all either increasing or decreasing along both the left/right and up/down directions. This heuristic alone captures the intuition that many others have mentioned, that higher valued tiles should be clustered in a corner. It will typically prevent smaller valued tiles from getting orphaned and will keep the board very organized, with smaller tiles cascading in and filling up into the larger tiles.

Here's a screenshot of a perfectly monotonic grid. I obtained this by running the algorithm with the eval function set to disregard the other heuristics and only consider monotonicity.

A perfectly monotonic 2048 board

Smoothness

The above heuristic alone tends to create structures in which adjacent tiles are decreasing in value, but of course in order to merge, adjacent tiles need to be the same value. Therefore, the smoothness heuristic just measures the value difference between neighboring tiles, trying to minimize this count.

A commenter on Hacker News gave an interesting formalization of this idea in terms of graph theory.

Here's a screenshot of a perfectly smooth grid, courtesy of this excellent parody fork.

A perfectly smooth 2048 board

Free Tiles

And finally, there is a penalty for having too few free tiles, since options can quickly run out when the game board gets too cramped.

And that's it! Searching through the game space while optimizing these criteria yields remarkably good performance. One advantage to using a generalized approach like this rather than an explicitly coded move strategy is that the algorithm can often find interesting and unexpected solutions. If you watch it run, it will often make surprising but effective moves, like suddenly switching which wall or corner it's building up against.

Edit:

Here's a demonstration of the power of this approach. I uncapped the tile values (so it kept going after reaching 2048) and here is the best result after eight trials.

4096

Yes, that's a 4096 alongside a 2048. =) That means it achieved the elusive 2048 tile three times on the same board.


My attempt uses expectimax like other solutions above, but without bitboards. Nneonneo's solution can check 10millions of moves which is approximately a depth of 4 with 6 tiles left and 4 moves possible (2*6*4)4. In my case, this depth takes too long to explore, I adjust the depth of expectimax search according to the number of free tiles left:

depth = free > 7 ? 1 : (free > 4 ? 2 : 3)

The scores of the boards are computed with the weighted sum of the square of the number of free tiles and the dot product of the 2D grid with this:

[[10,8,7,6.5],
 [.5,.7,1,3],
 [-.5,-1.5,-1.8,-2],
 [-3.8,-3.7,-3.5,-3]]

which forces to organize tiles descendingly in a sort of snake from the top left tile.

code below or on github:

_x000D_
_x000D_
var n = 4,_x000D_
 M = new MatrixTransform(n);_x000D_
_x000D_
var ai = {weights: [1, 1], depth: 1}; // depth=1 by default, but we adjust it on every prediction according to the number of free tiles_x000D_
_x000D_
var snake= [[10,8,7,6.5],_x000D_
            [.5,.7,1,3],_x000D_
            [-.5,-1.5,-1.8,-2],_x000D_
            [-3.8,-3.7,-3.5,-3]]_x000D_
snake=snake.map(function(a){return a.map(Math.exp)})_x000D_
_x000D_
initialize(ai)_x000D_
_x000D_
function run(ai) {_x000D_
 var p;_x000D_
 while ((p = predict(ai)) != null) {_x000D_
  move(p, ai);_x000D_
 }_x000D_
 //console.log(ai.grid , maxValue(ai.grid))_x000D_
 ai.maxValue = maxValue(ai.grid)_x000D_
 console.log(ai)_x000D_
}_x000D_
_x000D_
function initialize(ai) {_x000D_
 ai.grid = [];_x000D_
 for (var i = 0; i < n; i++) {_x000D_
  ai.grid[i] = []_x000D_
  for (var j = 0; j < n; j++) {_x000D_
   ai.grid[i][j] = 0;_x000D_
  }_x000D_
 }_x000D_
 rand(ai.grid)_x000D_
 rand(ai.grid)_x000D_
 ai.steps = 0;_x000D_
}_x000D_
_x000D_
function move(p, ai) { //0:up, 1:right, 2:down, 3:left_x000D_
 var newgrid = mv(p, ai.grid);_x000D_
 if (!equal(newgrid, ai.grid)) {_x000D_
  //console.log(stats(newgrid, ai.grid))_x000D_
  ai.grid = newgrid;_x000D_
  try {_x000D_
   rand(ai.grid)_x000D_
   ai.steps++;_x000D_
  } catch (e) {_x000D_
   console.log('no room', e)_x000D_
  }_x000D_
 }_x000D_
}_x000D_
_x000D_
function predict(ai) {_x000D_
 var free = freeCells(ai.grid);_x000D_
 ai.depth = free > 7 ? 1 : (free > 4 ? 2 : 3);_x000D_
 var root = {path: [],prob: 1,grid: ai.grid,children: []};_x000D_
 var x = expandMove(root, ai)_x000D_
 //console.log("number of leaves", x)_x000D_
 //console.log("number of leaves2", countLeaves(root))_x000D_
 if (!root.children.length) return null_x000D_
 var values = root.children.map(expectimax);_x000D_
 var mx = max(values);_x000D_
 return root.children[mx[1]].path[0]_x000D_
_x000D_
}_x000D_
_x000D_
function countLeaves(node) {_x000D_
 var x = 0;_x000D_
 if (!node.children.length) return 1;_x000D_
 for (var n of node.children)_x000D_
  x += countLeaves(n);_x000D_
 return x;_x000D_
}_x000D_
_x000D_
function expectimax(node) {_x000D_
 if (!node.children.length) {_x000D_
  return node.score_x000D_
 } else {_x000D_
  var values = node.children.map(expectimax);_x000D_
  if (node.prob) { //we are at a max node_x000D_
   return Math.max.apply(null, values)_x000D_
  } else { // we are at a random node_x000D_
   var avg = 0;_x000D_
   for (var i = 0; i < values.length; i++)_x000D_
    avg += node.children[i].prob * values[i]_x000D_
   return avg / (values.length / 2)_x000D_
  }_x000D_
 }_x000D_
}_x000D_
_x000D_
function expandRandom(node, ai) {_x000D_
 var x = 0;_x000D_
 for (var i = 0; i < node.grid.length; i++)_x000D_
  for (var j = 0; j < node.grid.length; j++)_x000D_
   if (!node.grid[i][j]) {_x000D_
    var grid2 = M.copy(node.grid),_x000D_
     grid4 = M.copy(node.grid);_x000D_
    grid2[i][j] = 2;_x000D_
    grid4[i][j] = 4;_x000D_
    var child2 = {grid: grid2,prob: .9,path: node.path,children: []};_x000D_
    var child4 = {grid: grid4,prob: .1,path: node.path,children: []}_x000D_
    node.children.push(child2)_x000D_
    node.children.push(child4)_x000D_
    x += expandMove(child2, ai)_x000D_
    x += expandMove(child4, ai)_x000D_
   }_x000D_
 return x;_x000D_
}_x000D_
_x000D_
function expandMove(node, ai) { // node={grid,path,score}_x000D_
 var isLeaf = true,_x000D_
  x = 0;_x000D_
 if (node.path.length < ai.depth) {_x000D_
  for (var move of[0, 1, 2, 3]) {_x000D_
   var grid = mv(move, node.grid);_x000D_
   if (!equal(grid, node.grid)) {_x000D_
    isLeaf = false;_x000D_
    var child = {grid: grid,path: node.path.concat([move]),children: []}_x000D_
    node.children.push(child)_x000D_
    x += expandRandom(child, ai)_x000D_
   }_x000D_
  }_x000D_
 }_x000D_
 if (isLeaf) node.score = dot(ai.weights, stats(node.grid))_x000D_
 return isLeaf ? 1 : x;_x000D_
}_x000D_
_x000D_
_x000D_
_x000D_
var cells = []_x000D_
var table = document.querySelector("table");_x000D_
for (var i = 0; i < n; i++) {_x000D_
 var tr = document.createElement("tr");_x000D_
 cells[i] = [];_x000D_
 for (var j = 0; j < n; j++) {_x000D_
  cells[i][j] = document.createElement("td");_x000D_
  tr.appendChild(cells[i][j])_x000D_
 }_x000D_
 table.appendChild(tr);_x000D_
}_x000D_
_x000D_
function updateUI(ai) {_x000D_
 cells.forEach(function(a, i) {_x000D_
  a.forEach(function(el, j) {_x000D_
   el.innerHTML = ai.grid[i][j] || ''_x000D_
  })_x000D_
 });_x000D_
}_x000D_
_x000D_
_x000D_
updateUI(ai);_x000D_
updateHint(predict(ai));_x000D_
_x000D_
function runAI() {_x000D_
 var p = predict(ai);_x000D_
 if (p != null && ai.running) {_x000D_
  move(p, ai);_x000D_
  updateUI(ai);_x000D_
  updateHint(p);_x000D_
  requestAnimationFrame(runAI);_x000D_
 }_x000D_
}_x000D_
runai.onclick = function() {_x000D_
 if (!ai.running) {_x000D_
  this.innerHTML = 'stop AI';_x000D_
  ai.running = true;_x000D_
  runAI();_x000D_
 } else {_x000D_
  this.innerHTML = 'run AI';_x000D_
  ai.running = false;_x000D_
  updateHint(predict(ai));_x000D_
 }_x000D_
}_x000D_
_x000D_
_x000D_
function updateHint(dir) {_x000D_
 hintvalue.innerHTML = ['?', '?', '?', '?'][dir] || '';_x000D_
}_x000D_
_x000D_
document.addEventListener("keydown", function(event) {_x000D_
 if (!event.target.matches('.r *')) return;_x000D_
 event.preventDefault(); // avoid scrolling_x000D_
 if (event.which in map) {_x000D_
  move(map[event.which], ai)_x000D_
  console.log(stats(ai.grid))_x000D_
  updateUI(ai);_x000D_
  updateHint(predict(ai));_x000D_
 }_x000D_
})_x000D_
var map = {_x000D_
 38: 0, // Up_x000D_
 39: 1, // Right_x000D_
 40: 2, // Down_x000D_
 37: 3, // Left_x000D_
};_x000D_
init.onclick = function() {_x000D_
 initialize(ai);_x000D_
 updateUI(ai);_x000D_
 updateHint(predict(ai));_x000D_
}_x000D_
_x000D_
_x000D_
function stats(grid, previousGrid) {_x000D_
_x000D_
 var free = freeCells(grid);_x000D_
_x000D_
 var c = dot2(grid, snake);_x000D_
_x000D_
 return [c, free * free];_x000D_
}_x000D_
_x000D_
function dist2(a, b) { //squared 2D distance_x000D_
 return Math.pow(a[0] - b[0], 2) + Math.pow(a[1] - b[1], 2)_x000D_
}_x000D_
_x000D_
function dot(a, b) {_x000D_
 var r = 0;_x000D_
 for (var i = 0; i < a.length; i++)_x000D_
  r += a[i] * b[i];_x000D_
 return r_x000D_
}_x000D_
_x000D_
function dot2(a, b) {_x000D_
 var r = 0;_x000D_
 for (var i = 0; i < a.length; i++)_x000D_
  for (var j = 0; j < a[0].length; j++)_x000D_
   r += a[i][j] * b[i][j]_x000D_
 return r;_x000D_
}_x000D_
_x000D_
function product(a) {_x000D_
 return a.reduce(function(v, x) {_x000D_
  return v * x_x000D_
 }, 1)_x000D_
}_x000D_
_x000D_
function maxValue(grid) {_x000D_
 return Math.max.apply(null, grid.map(function(a) {_x000D_
  return Math.max.apply(null, a)_x000D_
 }));_x000D_
}_x000D_
_x000D_
function freeCells(grid) {_x000D_
 return grid.reduce(function(v, a) {_x000D_
  return v + a.reduce(function(t, x) {_x000D_
   return t + (x == 0)_x000D_
  }, 0)_x000D_
 }, 0)_x000D_
}_x000D_
_x000D_
function max(arr) { // return [value, index] of the max_x000D_
 var m = [-Infinity, null];_x000D_
 for (var i = 0; i < arr.length; i++) {_x000D_
  if (arr[i] > m[0]) m = [arr[i], i];_x000D_
 }_x000D_
 return m_x000D_
}_x000D_
_x000D_
function min(arr) { // return [value, index] of the min_x000D_
 var m = [Infinity, null];_x000D_
 for (var i = 0; i < arr.length; i++) {_x000D_
  if (arr[i] < m[0]) m = [arr[i], i];_x000D_
 }_x000D_
 return m_x000D_
}_x000D_
_x000D_
function maxScore(nodes) {_x000D_
 var min = {_x000D_
  score: -Infinity,_x000D_
  path: []_x000D_
 };_x000D_
 for (var node of nodes) {_x000D_
  if (node.score > min.score) min = node;_x000D_
 }_x000D_
 return min;_x000D_
}_x000D_
_x000D_
_x000D_
function mv(k, grid) {_x000D_
 var tgrid = M.itransform(k, grid);_x000D_
 for (var i = 0; i < tgrid.length; i++) {_x000D_
  var a = tgrid[i];_x000D_
  for (var j = 0, jj = 0; j < a.length; j++)_x000D_
   if (a[j]) a[jj++] = (j < a.length - 1 && a[j] == a[j + 1]) ? 2 * a[j++] : a[j]_x000D_
  for (; jj < a.length; jj++)_x000D_
   a[jj] = 0;_x000D_
 }_x000D_
 return M.transform(k, tgrid);_x000D_
}_x000D_
_x000D_
function rand(grid) {_x000D_
 var r = Math.floor(Math.random() * freeCells(grid)),_x000D_
  _r = 0;_x000D_
 for (var i = 0; i < grid.length; i++) {_x000D_
  for (var j = 0; j < grid.length; j++) {_x000D_
   if (!grid[i][j]) {_x000D_
    if (_r == r) {_x000D_
     grid[i][j] = Math.random() < .9 ? 2 : 4_x000D_
    }_x000D_
    _r++;_x000D_
   }_x000D_
  }_x000D_
 }_x000D_
}_x000D_
_x000D_
function equal(grid1, grid2) {_x000D_
 for (var i = 0; i < grid1.length; i++)_x000D_
  for (var j = 0; j < grid1.length; j++)_x000D_
   if (grid1[i][j] != grid2[i][j]) return false;_x000D_
 return true;_x000D_
}_x000D_
_x000D_
function conv44valid(a, b) {_x000D_
 var r = 0;_x000D_
 for (var i = 0; i < 4; i++)_x000D_
  for (var j = 0; j < 4; j++)_x000D_
   r += a[i][j] * b[3 - i][3 - j]_x000D_
 return r_x000D_
}_x000D_
_x000D_
function MatrixTransform(n) {_x000D_
 var g = [],_x000D_
  ig = [];_x000D_
 for (var i = 0; i < n; i++) {_x000D_
  g[i] = [];_x000D_
  ig[i] = [];_x000D_
  for (var j = 0; j < n; j++) {_x000D_
   g[i][j] = [[j, i],[i, n-1-j],[j, n-1-i],[i, j]]; // transformation matrix in the 4 directions g[i][j] = [up, right, down, left]_x000D_
   ig[i][j] = [[j, i],[i, n-1-j],[n-1-j, i],[i, j]]; // the inverse tranformations_x000D_
  }_x000D_
 }_x000D_
 this.transform = function(k, grid) {_x000D_
  return this.transformer(k, grid, g)_x000D_
 }_x000D_
 this.itransform = function(k, grid) { // inverse transform_x000D_
  return this.transformer(k, grid, ig)_x000D_
 }_x000D_
 this.transformer = function(k, grid, mat) {_x000D_
  var newgrid = [];_x000D_
  for (var i = 0; i < grid.length; i++) {_x000D_
   newgrid[i] = [];_x000D_
   for (var j = 0; j < grid.length; j++)_x000D_
    newgrid[i][j] = grid[mat[i][j][k][0]][mat[i][j][k][1]];_x000D_
  }_x000D_
  return newgrid;_x000D_
 }_x000D_
 this.copy = function(grid) {_x000D_
  return this.transform(3, grid)_x000D_
 }_x000D_
}
_x000D_
body {_x000D_
 font-family: Arial;_x000D_
}_x000D_
table, th, td {_x000D_
 border: 1px solid black;_x000D_
 margin: 0 auto;_x000D_
 border-collapse: collapse;_x000D_
}_x000D_
td {_x000D_
 width: 35px;_x000D_
 height: 35px;_x000D_
 text-align: center;_x000D_
}_x000D_
button {_x000D_
 margin: 2px;_x000D_
 padding: 3px 15px;_x000D_
 color: rgba(0,0,0,.9);_x000D_
}_x000D_
.r {_x000D_
 display: flex;_x000D_
 align-items: center;_x000D_
 justify-content: center;_x000D_
 margin: .2em;_x000D_
 position: relative;_x000D_
}_x000D_
#hintvalue {_x000D_
 font-size: 1.4em;_x000D_
 padding: 2px 8px;_x000D_
 display: inline-flex;_x000D_
 justify-content: center;_x000D_
 width: 30px;_x000D_
}
_x000D_
<table title="press arrow keys"></table>_x000D_
<div class="r">_x000D_
    <button id=init>init</button>_x000D_
    <button id=runai>run AI</button>_x000D_
    <span id="hintvalue" title="Best predicted move to do, use your arrow keys" tabindex="-1"></span>_x000D_
</div>
_x000D_
_x000D_
_x000D_


This is not a direct answer to OP's question, this is more of the stuffs (experiments) I tried so far to solve the same problem and obtained some results and have some observations that I want to share, I am curious if we can have some further insights from this.

I just tried my minimax implementation with alpha-beta pruning with search-tree depth cutoff at 3 and 5. I was trying to solve the same problem for a 4x4 grid as a project assignment for the edX course ColumbiaX: CSMM.101x Artificial Intelligence (AI).

I applied convex combination (tried different heuristic weights) of couple of heuristic evaluation functions, mainly from intuition and from the ones discussed above:

  1. Monotonicity
  2. Free Space Available

In my case, the computer player is completely random, but still i assumed adversarial settings and implemented the AI player agent as the max player.

I have 4x4 grid for playing the game.

Observation:

If I assign too much weights to the first heuristic function or the second heuristic function, both the cases the scores the AI player gets are low. I played with many possible weight assignments to the heuristic functions and take a convex combination, but very rarely the AI player is able to score 2048. Most of the times it either stops at 1024 or 512.

I also tried the corner heuristic, but for some reason it makes the results worse, any intuition why?

Also, I tried to increase the search depth cut-off from 3 to 5 (I can't increase it more since searching that space exceeds allowed time even with pruning) and added one more heuristic that looks at the values of adjacent tiles and gives more points if they are merge-able, but still I am not able to get 2048.

I think it will be better to use Expectimax instead of minimax, but still I want to solve this problem with minimax only and obtain high scores such as 2048 or 4096. I am not sure whether I am missing anything.

Below animation shows the last few steps of the game played by the AI agent with the computer player:

enter image description here

Any insights will be really very helpful, thanks in advance. (This is the link of my blog post for the article: https://sandipanweb.wordpress.com/2017/03/06/using-minimax-with-alpha-beta-pruning-and-heuristic-evaluation-to-solve-2048-game-with-computer/ and the youtube video: https://www.youtube.com/watch?v=VnVFilfZ0r4)

The following animation shows the last few steps of the game played where the AI player agent could get 2048 scores, this time adding the absolute value heuristic too:

enter image description here

The following figures show the game tree explored by the player AI agent assuming the computer as adversary for just a single step:

enter image description here enter image description here enter image description here enter image description here enter image description here enter image description here


I copy here the content of a post on my blog


The solution I propose is very simple and easy to implement. Although, it has reached the score of 131040. Several benchmarks of the algorithm performances are presented.

Score

Algorithm

Heuristic scoring algorithm

The assumption on which my algorithm is based is rather simple: if you want to achieve higher score, the board must be kept as tidy as possible. In particular, the optimal setup is given by a linear and monotonic decreasing order of the tile values. This intuition will give you also the upper bound for a tile value: s where n is the number of tile on the board.

(There's a possibility to reach the 131072 tile if the 4-tile is randomly generated instead of the 2-tile when needed)

Two possible ways of organizing the board are shown in the following images:

enter image description here

To enforce the ordination of the tiles in a monotonic decreasing order, the score si computed as the sum of the linearized values on the board multiplied by the values of a geometric sequence with common ratio r<1 .

s

s

Several linear path could be evaluated at once, the final score will be the maximum score of any path.

Decision rule

The decision rule implemented is not quite smart, the code in Python is presented here:

@staticmethod
def nextMove(board,recursion_depth=3):
    m,s = AI.nextMoveRecur(board,recursion_depth,recursion_depth)
    return m

@staticmethod
def nextMoveRecur(board,depth,maxDepth,base=0.9):
    bestScore = -1.
    bestMove = 0
    for m in range(1,5):
        if(board.validMove(m)):
            newBoard = copy.deepcopy(board)
            newBoard.move(m,add_tile=True)

            score = AI.evaluate(newBoard)
            if depth != 0:
                my_m,my_s = AI.nextMoveRecur(newBoard,depth-1,maxDepth)
                score += my_s*pow(base,maxDepth-depth+1)

            if(score > bestScore):
                bestMove = m
                bestScore = score
    return (bestMove,bestScore);

An implementation of the minmax or the Expectiminimax will surely improve the algorithm. Obviously a more sophisticated decision rule will slow down the algorithm and it will require some time to be implemented.I will try a minimax implementation in the near future. (stay tuned)

Benchmark

  • T1 - 121 tests - 8 different paths - r=0.125
  • T2 - 122 tests - 8-different paths - r=0.25
  • T3 - 132 tests - 8-different paths - r=0.5
  • T4 - 211 tests - 2-different paths - r=0.125
  • T5 - 274 tests - 2-different paths - r=0.25
  • T6 - 211 tests - 2-different paths - r=0.5

enter image description here enter image description here enter image description here enter image description here

In case of T2, four tests in ten generate the 4096 tile with an average score of s 42000

Code

The code can be found on GiHub at the following link: https://github.com/Nicola17/term2048-AI It is based on term2048 and it's written in Python. I will implement a more efficient version in C++ as soon as possible.


There is already an AI implementation for this game here. Excerpt from README:

The algorithm is iterative deepening depth first alpha-beta search. The evaluation function tries to keep the rows and columns monotonic (either all decreasing or increasing) while minimizing the number of tiles on the grid.

There is also a discussion on Hacker News about this algorithm that you may find useful.


Many of the other answers use AI with computationally expensive searching of possible futures, heuristics, learning and the such. These are impressive and probably the correct way forward, but I wish to contribute another idea.

Model the sort of strategy that good players of the game use.

For example:

13 14 15 16
12 11 10  9
 5  6  7  8
 4  3  2  1

Read the squares in the order shown above until the next squares value is greater than the current one. This presents the problem of trying to merge another tile of the same value into this square.

To resolve this problem, their are 2 ways to move that aren't left or worse up and examining both possibilities may immediately reveal more problems, this forms a list of dependancies, each problem requiring another problem to be solved first. I think I have this chain or in some cases tree of dependancies internally when deciding my next move, particularly when stuck.


Tile needs merging with neighbour but is too small: Merge another neighbour with this one.

Larger tile in the way: Increase the value of a smaller surrounding tile.

etc...


The whole approach will likely be more complicated than this but not much more complicated. It could be this mechanical in feel lacking scores, weights, neurones and deep searches of possibilities. The tree of possibilities rairly even needs to be big enough to need any branching at all.


I think I found an algorithm which works quite well, as I often reach scores over 10000, my personal best being around 16000. My solution does not aim at keeping biggest numbers in a corner, but to keep it in the top row.

Please see the code below:

while( !game_over ) {
    move_direction=up;
    if( !move_is_possible(up) ) {
        if( move_is_possible(right) && move_is_possible(left) ){
            if( number_of_empty_cells_after_moves(left,up) > number_of_empty_cells_after_moves(right,up) ) 
                move_direction = left;
            else
                move_direction = right;
        } else if ( move_is_possible(left) ){
            move_direction = left;
        } else if ( move_is_possible(right) ){
            move_direction = right;
        } else {
            move_direction = down;
        }
    }
    do_move(move_direction);
}

I became interested in the idea of an AI for this game containing no hard-coded intelligence (i.e no heuristics, scoring functions etc). The AI should "know" only the game rules, and "figure out" the game play. This is in contrast to most AIs (like the ones in this thread) where the game play is essentially brute force steered by a scoring function representing human understanding of the game.

AI Algorithm

I found a simple yet surprisingly good playing algorithm: To determine the next move for a given board, the AI plays the game in memory using random moves until the game is over. This is done several times while keeping track of the end game score. Then the average end score per starting move is calculated. The starting move with the highest average end score is chosen as the next move.

With just 100 runs (i.e in memory games) per move, the AI achieves the 2048 tile 80% of the times and the 4096 tile 50% of the times. Using 10000 runs gets the 2048 tile 100%, 70% for 4096 tile, and about 1% for the 8192 tile.

See it in action

The best achieved score is shown here:

best score

An interesting fact about this algorithm is that while the random-play games are unsurprisingly quite bad, choosing the best (or least bad) move leads to very good game play: A typical AI game can reach 70000 points and last 3000 moves, yet the in-memory random play games from any given position yield an average of 340 additional points in about 40 extra moves before dying. (You can see this for yourself by running the AI and opening the debug console.)

This graph illustrates this point: The blue line shows the board score after each move. The red line shows the algorithm's best random-run end game score from that position. In essence, the red values are "pulling" the blue values upwards towards them, as they are the algorithm's best guess. It's interesting to see the red line is just a tiny bit above the blue line at each point, yet the blue line continues to increase more and more.

scoring graph

I find it quite surprising that the algorithm doesn't need to actually foresee good game play in order to chose the moves that produce it.

Searching later I found this algorithm might be classified as a Pure Monte Carlo Tree Search algorithm.

Implementation and Links

First I created a JavaScript version which can be seen in action here. This version can run 100's of runs in decent time. Open the console for extra info. (source)

Later, in order to play around some more I used @nneonneo highly optimized infrastructure and implemented my version in C++. This version allows for up to 100000 runs per move and even 1000000 if you have the patience. Building instructions provided. It runs in the console and also has a remote-control to play the web version. (source)

Results

Surprisingly, increasing the number of runs does not drastically improve the game play. There seems to be a limit to this strategy at around 80000 points with the 4096 tile and all the smaller ones, very close to the achieving the 8192 tile. Increasing the number of runs from 100 to 100000 increases the odds of getting to this score limit (from 5% to 40%) but not breaking through it.

Running 10000 runs with a temporary increase to 1000000 near critical positions managed to break this barrier less than 1% of the times achieving a max score of 129892 and the 8192 tile.

Improvements

After implementing this algorithm I tried many improvements including using the min or max scores, or a combination of min,max,and avg. I also tried using depth: Instead of trying K runs per move, I tried K moves per move list of a given length ("up,up,left" for example) and selecting the first move of the best scoring move list.

Later I implemented a scoring tree that took into account the conditional probability of being able to play a move after a given move list.

However, none of these ideas showed any real advantage over the simple first idea. I left the code for these ideas commented out in the C++ code.

I did add a "Deep Search" mechanism that increased the run number temporarily to 1000000 when any of the runs managed to accidentally reach the next highest tile. This offered a time improvement.

I'd be interested to hear if anyone has other improvement ideas that maintain the domain-independence of the AI.

2048 Variants and Clones

Just for fun, I've also implemented the AI as a bookmarklet, hooking into the game's controls. This allows the AI to work with the original game and many of its variants.

This is possible due to domain-independent nature of the AI. Some of the variants are quite distinct, such as the Hexagonal clone.


Examples related to algorithm

How can I tell if an algorithm is efficient? Find the smallest positive integer that does not occur in a given sequence Efficiently getting all divisors of a given number Peak signal detection in realtime timeseries data What is the optimal algorithm for the game 2048? How can I sort a std::map first by value, then by key? Finding square root without using sqrt function? Fastest way to flatten / un-flatten nested JSON objects Mergesort with Python Find common substring between two strings

Examples related to logic

How to do perspective fixing? What is the optimal algorithm for the game 2048? ReferenceError: Invalid left-hand side in assignment Prolog "or" operator, query Write code to convert given number into words (eg 1234 as input should output one thousand two hundred and thirty four) JQuery .hasClass for multiple values in an if statement AND/OR in Python? Simple 'if' or logic statement in Python 1 = false and 0 = true? Reference — What does this symbol mean in PHP?

Examples related to artificial-intelligence

How to get Tensorflow tensor dimensions (shape) as int values? How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn? What is the optimal algorithm for the game 2048? Epoch vs Iteration when training neural networks What's is the difference between train, validation and test set, in neural networks? What is the role of the bias in neural networks? What is the difference between supervised learning and unsupervised learning? What are good examples of genetic algorithms/genetic programming solutions? source of historical stock data What algorithm for a tic-tac-toe game can I use to determine the "best move" for the AI?

Examples related to 2048

What is the optimal algorithm for the game 2048?