...the world would be perfect. Unfortunately, it is not. Specifically, the rules of chess could be elegant, but there are some special moves that make things annoyingly difficult. Those are the double pawn move, promotion, en passant, and castling. In addition, making sure the side to move is not in check after the move can cause headaches. I think my move generator is only half as fast as it could be without the exceptions.
I am trying to find ways to handle the special moves like normal moves. This is not easy. Some moves require extra work. In castling, two pieces move. In promotions, the piece type changes during the move. In en passant, the captured piece is in a different square than the destination square of the piece. I do not want to make make() and unmake()-functions much more complicated. In a typical chess engine, there are ifs for testing for the special cases. I want to avoid ifs even if the thread divergence is not an issue.
I tried to fit the move data in 32 bits. This means all the data required for restoring the position after undoing the move. I could do it easily without the special moves. But with them, I think that I need more. A general move might look like this:
|captured pc. 2||piece 2||to sq. 2||from sq. 2||captured pc. 1||piece 1||to sq. 1||from sq. 1|
|4 bits||4 bits||6 bits||6 bits||4 bits||4 bits||6 bits||6 bits|
That is 40 bits. So is it enough? Let's see. We have to- and from-squares, plus moving piece and captured piece. Two times. Obviously, regular moves (captures or not) require only one set of the fields. The other set can then be set to point to a dummy square and piece. Then, special moves require more fields. Castling consists of one king move and one rook move without captured pieces. En passant consists of a pawn move which captures the en passant square and of another move where an empty square captures the opponents pawn (sounds funny, but it works). Promotion consists of a pawn move (can be a capture or not) and of another move where the promoted piece captures the pawn on the promotion square (again, sounds odd but it works).
Then, some extra fields are required for restoring the position. These include castling rights, en passant square, and move counter for the 50-moves rule. Finally, in my implementation, I need to store the piece list indices for captured pieces, which is extra 4 bits per captured piece. So the eventual move structure could be:
|move counter||EP sq.||Castling rights||captured pc. 2||piece 2||to sq. 2||from sq. 2||captured pc. 1||piece 1||to sq. 1||from sq. 1|
|8 bits||4 bits||4 bits||4+4 bits||4 bits||6 bits||6 bits||4+4 bits||4 bits||6 bits||6 bits|
This is not the thightest packing possible, but it takes 64 bits, which is convinient for alignment reasons. For the GPU, it is better to pack things like this, although a few extra instructions are required to unpack them, because memory reads take time, and going over the 1 kB per warp limit reduces the multiprocessor occupancy rate.