Chess on a GPGPU: Transposition table and hashing issues

The transposition table is an essential part of an effifient chess engine. It stores results of subtrees that have been searched into a hash table. Then, if the search encounters the root of the subtree by a different move order, it can use the stored results directly instead of performing the search. It is easy, however, to introduce nasty bugs with its implementation. There are two kinds of errors that arise with the transposition table code: (1) hash collisions, and (2) invalid use of hashed search results. In addition, the transposition table must, most probably, reside in the global memory. That is not only slow, but allows a third class of errors, i.e. corruption of the hash entries by concurrent actions by multiple threads. Let us examine the possible errors one by one and then sketch a plan for the implemetation.

Hash collisions occur when to positions are either given the same hash key or mapped into the same position in the hash table. The latter case is much more common that the former one if the hash keys are long enough. They are also easier to avoid. Simply storing the hash keys in the transposition table and comparing them is enough the avoid these errors. The case where the hash keys are the same, is more difficult. It can be avoided to some extent by producing the hash keys properly, e.g. by Zobrist hashing (see http://chessprogramming.wikispaces.com/Zobrist+Hashing). However, even with good hashing, the probability of two positions given the same key is non-zero. With 32-bit keys, there are only 4 billion possible keys. Then, if we have, 4 million positions in the transposition table, the probability of a position given the same key as already given to antoher position is 1:1000. This might sound like a small number, but if you repeat this a million time, you get a thousand errors or so. That is enough to completely ruin the search. So an easy solution is to use longer keys. With 64-bit keys these kind of errors occur much more seldom, but they still do occur, given a large enough transposition table and a search that is efficient enough to visit billions of nodes during the game. So one must be careful to design the transposition table so that the probability of these errors is minimized.

The hashed results can be used for wrong purposes. It is, of course, necessary to check that the stored entry is still valid and deep enough. Then, it is also necessary to check the search bounds. Especially, zero window searches do not give accurate results, but only fail high or low. Then it is necessary to compare the search window used in the transposition entry to the current one. Sometimes, it is possible to use the transposition table entry for move ordering although the stored score cannot be used in the current search.

So where in the search tree to use the transposition table? Its purpose is to save computation time. Two factors affect the potential saving: the number of nodes under the subtree corresponding to the hash entry, and the probability of the search probing the entry. The number of nodes under the subtree is the greatest near the root. However, the probability of hitting the same node is zero for less than 3 plies. After that the probability increases gradually as long as the transposition table is large enough. Imagine a infinitely large transposition table with perfect hashing. Then the probability of finding a position approaches 100 % when the depth of the search increases. Why? Eventually the search will visit all the positions that are required to prove the absolute result, i.e. win for either side or draw. This is only a theoretical consideration, because in practice, the transposition tables are of finite size and the hashing is less than perfect. The probability of probing a node decreases with a small transposition table when the number of stored nodes becomes proportional to the size of the transposition table. This happens because we run out of places in the hash table and there are collisions which lead to discarding data (either the just searched position or the transposition table entry result). This is another reason why saving leaf nodes in the transposition table does not sound good (the other reason being that the subtree is only one node plus possible quiescence search nodes).

I have made simple tests with a transposition table on the CPU. I have written a version of perft ()-function (see previous posts for explanation) that uses a transposition table. The transposition table entries consist of four unsigned ints which take 16 bytes of memory in total. Two of the entries are for the hash keys. The first one is the primary key which is used for hashing and the second one is for checking the validity of the hash entry in the case of a collision. I use the Zobrist hashing scheme. The other two values are the depth and the score. These are included just for testing because perft does not return a score. In addition, the 128-bit hash entry size corresponds to the memory alignment of the global memory. I run the tests with 256 Mb and 1 Gb transposition tables with 16 M and 64 M entries, respectively. These correspond to possible global memory sizes on modern GPUs. I think that we can use as much memory as possible because the global memory access is slow regardless of the table size. The newest GPUs have caches for the global memory, but due to the random access pattern of the hash probes it is of little help. However, the results look like this (initial position, 1 Gb table):

perft (7) = 3195075629 at 39.93 Mnps
depth 1 hash table: 42.477849 % hits, 18.217728 % collisions
depth 2 hash table: 58.034033 % hits, 4.928729 % collisions
depth 3 hash table: 34.097889 % hits, 1.510128 % collisions
depth 4 hash table: 33.115211 % hits, 0.386183 % collisions
depth 5 hash table: 0.000000 % hits, 0.237530 % collisions
depth 6 hash table: 0.000000 % hits, 0.000000 % collisions
depth 7 hash table: 0.000000 % hits, 0.000000 % collisions

The depth refers to the remaining depth in the tree traversial. Leaf nodes were not stored in the transposition table. The node count is wrong (should be 3,195,901,860) because of hash collisions that went undetected. It is clear that 32-bit hash keys are not sufficient for avoiding errors. Another thing that can be seen is that the utility of the transposition table is at best when the number of nodes is still much less than the table size. There are about 2 million nodes at depth 2. For the 64 million entry transposition table this results in the best hit rate. At depth 1, there is already about 24 million nodes, which results in lots of collions reducing the hit rate.

It should be noted that the hit rates are different for different search algorithms. The numbers above are not representative of typical transpositions table hit rates with alpha-beta search algorithms, because perft ()-function visits all the nodes in the search tree. Hit rates are smaller when the visited nodes are selected based on the alpha and beta bounds.

1 comment:

Unknown9/06/2013 10:47 PM
RE: "With 32-bit keys, there are only 4 billion possible keys. Then, if we have, 4 million positions in the transposition table, the probability of a position given the same key as already given to another position is 1:1000"

The probability of a hash collision with 4 million positions out of 4 billion possible keys is much greater than 1:1000

Odds are 50:50 that the first hash collision occurs when the 65,537th position is hashed.

See
http://en.wikipedia.org/wiki/Birthday_problem

2/18/2013

Transposition table and hashing issues

1 comment: