SSE Optimized Sudoku solver

Just programming a sudoku solver is boring, so a friend of mine (Christoph Simon) and I decided to do an SSE optimized implementation. There are faster plain C implementations out there, but it was an interesting project.

Get it

You can get it using git, or browser the source:

git clone https://git.sipsolutions.net/sudoku.git

Implementation

The important thing about the implementation is the choice of the storage format. We decided to store a '1' bit for a number that can still be put into a field, and a '0' if not. We then split each number into a separate bit field. This means, that we now have 9 fields (one for each number in the sudoku) with each 81 bits that needs to be stored.

To make calculations easier, we put these into the lower 3 words of the 4 word vector. Where each word stores three blocks of the sudoku. So we got the following format (w.b, where 'w' is the word and 'b' is the bit inside the word):

 0.00  0.01  0.02 | 0.03  0.04  0.05 | 0.06  0.07  0.08
 0.09  0.10  0.11 | 0.12  0.13  0.14 | 0.15  0.16  0.17
 0.18  0.19  0.20 | 0.21  0.22  0.23 | 0.24  0.25  0.26
--------------------------------------------------------
 1.00  1.01  1.02 | 1.03  1.04  1.05 | 1.06  1.07  1.08
 1.09  1.10  1.11 | 1.12  1.13  1.14 | 1.15  1.16  1.17
 1.18  1.19  1.20 | 1.21  1.22  1.23 | 1.24  1.25  1.26
--------------------------------------------------------
 2.00  2.01  2.02 | 2.03  2.04  2.05 | 2.06  2.07  2.08
 2.09  2.10  2.11 | 2.12  2.13  2.14 | 2.15  2.16  2.17
 2.18  2.19  2.20 | 2.21  2.22  2.23 | 2.24  2.25  2.26

So, lets say we have the following fields where a 6 is allowed to be placed:

 X . . | . . X | . X .
 . X . | . . . | X . .
 . . . | X X X | . X .
-----------------------
 X . . | . . X | . X .
 X . . | . . X | . X .
 X . . | . . X | . X .
-----------------------
 X . . | . . X | . X .
 X . . | . . X | . X .
 X . . | . . X | . X .

We get the following values in the 32bit words:

0x0429083a
0x042a150a
0x042a150a

which means the stored vector is:

0x00000000042a150a042a150a0429083a

On this we perform the following simplifications:

  1. If there is only one number possible in one field, it may not be set in the same block/line/column.
  2. Every number needs to be set at least once in a block/line/column. So if it is only possible to place a number into a specific field, no other number can be placed there.

After this check whether everything is still valid (no field with no possible number left, other error checks).

Once a sudoku is simplified, we check whether the it has been solved, and if not set a number and do a recursion.

Speed

I have not done many measurements. But it seems that the inner simplification loop requires about 1000 processor cycles. So we can generate more than a million solutions per second, but this is of course only valid in the optimal case, other measurements suggests more like 25000 per second.