Many of the other answers use AI with computationally expensive searching of possible futures, heuristics, learning and the such. These are impressive and probably the correct way forward, but I wish to contribute another idea.
Model the sort of strategy that good players of the game use.
For example:
13 14 15 16
12 11 10 9
5 6 7 8
4 3 2 1
Read the squares in the order shown above until the next squares value is greater than the current one. This presents the problem of trying to merge another tile of the same value into this square.
To resolve this problem, their are 2 ways to move that aren't left or worse up and examining both possibilities may immediately reveal more problems, this forms a list of dependancies, each problem requiring another problem to be solved first. I think I have this chain or in some cases tree of dependancies internally when deciding my next move, particularly when stuck.
Tile needs merging with neighbour but is too small: Merge another neighbour with this one.
Larger tile in the way: Increase the value of a smaller surrounding tile.
etc...
The whole approach will likely be more complicated than this but not much more complicated. It could be this mechanical in feel lacking scores, weights, neurones and deep searches of possibilities. The tree of possibilities rairly even needs to be big enough to need any branching at all.