Tuesday, July 31, 2012

Hacking the Sudoku solver from The Ruby Programming Language

On only page 18 of Flanagan and Matsumoto's excellent "The Ruby Programming Language" a relatively complex example is presented: a Sudoku puzzle solver. I've had a decades-long interest in maze solvers, and this tied in nicely with that. I resolved to give the code a close look when I'd finished the book.

And surprising myself, I soon did. I read it cover-to-cover, frequently even re-reading sections I'd already covered when a lack of mastery of specific material became evident later. The language is a nice alternative to Perl, which had been my scripting language of choice since 1998. Ruby has a lot of cool characteristics, even if it sometimes falls weakness to Perl's desire to appease too much diversity in coding styles. I personally think being able to do things effectively two different ways is inferior to being able to do things as effectively only one, as the latter makes multi-author code easier to follow and coherently maintain. But as usual, I digress.

When I'd felt I'd given enough attention to the rest of the text, I returned to the Sudoku solver. I couldn't resist making some on-the-fly changes immediately. Ffor example, certain look-up tables are hard-coded when they can be trivially generated algorithmically, for example a table to the "box number" associated with a given cell. But beyond that, I didn't like that the code was hard-coded for a 9×9 puzzle. Sudokus must be a square array of squares, so the length must equal the width each of which is the square of an integer. This integer I call the "size" of the puzzle. So a 9×9 would be a size 3.

9×9 is convenient because squares can be filled with the symbols 1, 2, 3, 4, 5, 6, 7, 8, and 9. I say "symbols" rather than "numbers" because there's nothing numerical about Sudoku. The only test of the elements is the test of uniqueness. Any set of characteristics which could be tested for uniqueness could do. You can use letters, pictograms, colors, anything which can be compared for match or no match. So it's commonly stated that Sudoku is a mathematical puzzle, but the only mathematics involved are the mathematics of sets. The elements of the set are arbitrary.

Given this, I was surprised that the example puzzle begins by converting the symbols in the puzzle into integers using an obscure string of numbers stores in a string. Mappingf character sequences to integer representation is an efficiency already taken care of by the symbol object in Ruby. This approach seems more appropriate for C code than Ruby. Instead of storing cell contents with numbers and converting to strings, I converted the code to use symbols, which I store in an array. This is much simpler. The only question is what symbols to use. I picked digits 1-9 followed by letters (A-Z, then a-z) then on to circled numbers available in Unicode. This allowed me to solve puzzles up to at least 100×100 (size 10). An alternate would be to allow users of the puzzle "class" to specify the symbols to be used.

Then there was the issue of how the class was organized. A Sudoku module was defined with a puzzle "class", then a module method was defined to solve the puzzle. This seemed artificial to me, since the class had specific functionality required to support the solution algorithm, why act as if the solution was some sort of client of a more general class? I instead incorporated the solution method directly into the class, formalizing the justification for the class methods specific to the solution algorithm.

So with the code now supporting arbitrary puzzle sizes, I gave it a try. 9×9: no problem. It blows through those, even for a starting puzzle which is a totally blank board. Then 16×16 was also no problem. But get up to 25×25 and it started to develop hic-cups... and 36×36 was a real challenge.

These large puzzles revealed some more fundamental weaknesses of the original code. Here's a general outline of the algorithm:

1. If there's any squares with no options, the puzzle is impossible, so indicate that.
2. If there's a square which has only one option, choose that option, and return to the previous step (making sure the puzzle isn't impossible).
3. At this step, every blank square has at least two options. Pick one of those with the least number of options. Think of this as the square most likely to yield a correct "guess". Pick one of the available options for this square (original code picked the lowest-value option; in my code I randomize the selection order).
4. Try to solve the puzzle with the "guess" in this square. If it fails, try the next guess. If there's no more guesses available for this square, if all have been tried and each yielded an impossible puzzle, then mark the puzzle as impossible.

This algorithm naturally lends itself to recursion, so that's what Flanagan and Matsumoto do. Everytime a guess is made at a cell value, the solution method calls itself with a full copy of the puzzle with the guess filled in. Unfortunately, a large puzzle can involve a LOT of guesses. Making a full copy of the puzzle for a recursion call every time is inefficient.

So I changed this to a hierarchical data structure. Every time a guess was made only the new cell's value was stored, in a hash. If a value wasn't found in the hash, it was checked for in a parent hash from the solution call which recursively called the present one. Now memory wasn't a problem, but cell value lookups really slowed things down.

The approach I took was to have the code cache these value look-ups. Furthermore, instead of evaluating from scratch what values were unused in each row, column, and box, I stored these values in separate hashes. With this combination, memory use was substantially reduced while decent speed was restored.

A problem remained however, and that was stack size. LISP is a language in which recursion is a preferred technique: the language is designed with what's called "tail recursion" so that recursion can be done efficiently. Ruby doesn't have this -- it's optimized for loops, not recursion. I was hitting fundamental stack limits with the larger puzzles for which there seemed to be no solution.

So I decided to bite the bullet and get rid of the recursion. Recursion is super-convenient for book-keeping. Eliminating the recursion required that I keep track of where I was, where I was, and how to get back to previous states if the present solution direction yielded a dead end. Eventually I got it working, but only with considerable effort.

Now once again I had no problem with the smaller puzzles. But I really wanted to solve 36×36 and beyond. In these puzzles, the code would get hung up for many hours, seemingly stuck in a loop. But from the algorithm, a loop should be impossible... to check for this, I installed code which kept a record of a "hash value" for a puzzle state, then checking to make sure no solution loops were found. I'd check for loops up to 10 thousand steps long, and never found any. Yet the solution would grind on for 10 million total guesses or more...

I retreated to smaller puzzles. Generally the code could solve these, but every often it would get hung up. It would grind on and on, modifying a small group of squares in an attempt to break through to a solution path. Once the path was found, the rest of the solution might go quickly.

The issue is Sudoku puzzles are like mazes, but in solving mazes, the number of options available at a given point is at most 2×N, where N is the number of dimensions. With Sudoku there may be as many as L options available, where L is the side length of the puzzle. It's not surprising it would be easy to get lost in a 32-dimensional maze...

So I've concluded solving big Sudoku's is simply difficult. Without researching it, I wonder if it's possible to apply more intelligence to the situation than simply guessing, in order, the options of the square with the least number of options. But I'll leave it at that: my mission of exercising Ruby skills was accomplished. (note: I just noticed there's a forum dedicated to the topic!)

Finally, here's a sample of my test case. I start with a minimal puzzle, one with a single row and column randomly filled it with no box violations:

``` .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  C  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  6  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  N  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  3  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  5  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  9  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  7  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  F  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  E  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  D  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  4  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  K  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  P  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  I  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  M  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  8  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  O  .  .  .  .  .  .  .  .  .  .
3  C  P  4  9  G  B  K  1  F  7  L  5  N  A  D  O  I  6  H  E  M  8  2  J
.  .  .  .  .  .  .  .  .  .  .  .  .  .  G  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  L  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  H  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  J  .  .  .  .  .  .  .  .  .  .
.  .  .  .  .  .  .  .  .  .  .  .  .  .  B  .  .  .  .  .  .  .  .  .  .
```

Then I solve. Progress is shown here, with a logarithmic axis on iteration number, where the y-axis is number of points solved (not every point is shown). You can see how the progress is in short bursts, then the solver becomes bogged down, then there's another short burst. These portions of no visible progress can last for millions of iterations if I'm unlucky.

Then here's the solution (non-unique in this case). Nodes which received at least as many guesses as total cells in the puzzle (in this case 625) are encased in [brackets]:

``` G  L  4  B  1  K  P  H  J  9  M  D  I  O  2  3  N  7  C  8  6  A  5  E  F
F  K  6  A  M  D  N  C  O  8  G  5  L  B  1  P  H  E  2  4  9  3  J  7  I
8  O  D  2  E  B  L  1 [5][I] H  3  7  4  C [A][6][J][9] F  M  G  N  K  P
I  N  C  3  J  E  F  G  2  7  P  8  9  A  6  M  5  L  D  K  1  H  B  4  O
7  5  9  P  H  4  A  M  3  6  K  E  F  J  N  B  I  G  O  1  8  2  D  C  L
K  H  G  7  C  9  I  B  D  P  E  4  J  M  3  1  A  2  F  L  N  O  6  8  5
2  8  E  D  A  L  3  J  F  C  I  P  O  1  5  G  4  6  H  N  K  B  9  M  7
5  F  N  J  P  A  K  O  G  M  L  2  B  6  9  8  E  D  7  I  C  4  H  3  1
M  6  O  9  4  8  1  5  E  2  N  A  H  G  7  C  K  3  B  P  L  J  F  I  D
1  3  L  I  B  N  H  7  6  4  8  C  D  K  F  O  M  9  J  5  A  E  G  P  2
4  M  K  1  8  F  J  I  L  B  5  7  N  3  E  6  G  P  A  D  2  C  O  9  H
B  J  7  F  O  1  E  4  M  K  2  9  A  L  D  H  3  8  N  C  5  I  P  6  G
A  G  2  6  L  H  7  P  C  N  O  F  M  I  4  K  J  5  E  9  B  1  3  D  8
P  E  H  C  I  O  D  3  9  5  J  6  G  8  K  4  7  B  1  2  F  L  M  N  A
D  9  5  N  3  6  G  2  8  A  B  H  1  C  P  L  F  M  I  O  4  7  K  J  E
N  1  M  L  2  J  9  8  P  H  3  G  E  D  I  5  C  F  4  7  O  6  A  B  K
6  7  B  8  F  2  O  N  4  D  9  1  K  P  M  J  L  A  G  E  3  5  I  H [C]
O  D  J  5  G  I  M  6  A  E  C  B  4  H  8  2  1  K  P  3  7  N  L  F  9
H  I  A  E  K  C  5  L  7  3  F  J  6  2  O  9  B  N  8  M [D][P][1][G] 4
3  C  P  4  9  G  B  K  1  F  7  L  5  N  A  D  O  I  6  H  E  M  8  2  J
E  4  1  M  6  5  C  D  K  L  A  N  8  9  G  I  2  H  3  J  P  F  7  O  B
9  B  I  G  N  P  2  F  H  O  1  M  3  E  L  7  D  C  K  A  J  8  4  5  6
J  2  3  K  D  M  8  A  B  1  4  I  C  7  H  F  P  O  5  6  G  9  E  L  N
C  A  F  H  7  3  6  9  N  G  D  O  P  5  J  E  8  4  L  B  I  K  2  1  M
L  P  8  O  5  7  4  E  I  J  6  K  2  F  B  N  9  1  M  G  H  D  C  A  3
```

2 comments:

Miki said...

Hello.
Could you tell me what program do you use to make elevation profiles and other graphic?
Regards,
Michal

djconnel said...

[url=http://plasma-gate.weizmann.ac.il/Grace/]xmgrace[/url]: not so easy to use but produces good-quality plots. I wish I knew gnuplot... need to invest time in learning that at some point. It has has better default behavior and is easier to generate automatically.