Thinking differently

Most chess engines seem to rely on a move generator that produces a full list of moves, which can then be sorted so that the best candidates are searched first. However, move lists take memory space, and that is a problem for my GPU move generator. A thread group has 16 kB of shared memory; with 256 threads per group, each thread gets 64 bytes. Of those, 32 bytes are reserved for the current positions, leaving only 32 bytes for moves. That is not enough for a move list. One has to think differently.

As mentioned in my previous posts, I implemented a traditional bitboard-based move generator on the GPU that produces one move at a time. Unfortunately, the performance is not so great. For comparison, I had a move generator that outputs a full move list at once; the former was four times slower than the latter. There seem to be two reasons for that. First, the one-at-a-time generator is more complex, because it has to take the previous move as input, regenerate the generator's state from it, and only then produce the next move. Second, it does unnecessary work: one move needs only one destination square, yet a bitboard representing all of them is generated for each move. I think it is time to revisit the design I had for the original CUDA chess move generator.

The basic idea is to have a few generic numbers represent the state of the move generator. For example: one for the source square (1-64), one for the direction (1-8), and one for the number of steps in that direction (1-7). The numbers are then incremented as moves are produced. A similar idea was used in the early dedicated chess computers. And of course, one can use bit-twiddling tricks to pack all the numbers into one integer. Tests are required to see how to implement the move generator most efficiently.


  1. have you given up on getting a chess program to run on GPU?

  2. I haven't given up on the idea. There are two reasons why this blog has been inactive: I have been busy, and I don't yet have a good idea for the search algorithm. Professionally, I work with GPUs every day, and I also play chess every day, so as soon as I get a good idea, I might continue.

  3. Samuel,

    I'm a mathematician and programmer. I have experience with machine learning, and I've just started learning CUDA. I'm not interested in chess as such; for me it's just a well-defined challenge. I am interested in AlphaZero and have been watching (and contributing to) LeelaChessZero's progress. There's a question in my mind: are large neural nets really the best approach to chess, or are they just the first, and perhaps the easiest, way of exploiting GPUs? My hunch is that there's nothing special about neural nets. Over the last few months I've been thinking about how to do a tree search on a GPU, and I have devised what I hope is a good algorithm...

    I'll say more when I know you're reading this and are interested in my ideas. Or you can find me via www.indriid.com.


  4. Graham,

    Sorry for the delay. I think deep neural networks are a good fit for GPUs; it's relatively easy to exploit the massive parallelism of GPUs with them. But I think AlphaZero has many other interesting techniques that contribute to its success. I am going to write a new blog post about it soon.

    And I am very interested in hearing about your algorithm.


  5. Samuel,

    I've put some notes about my approach and progress here.