Saturday, April 15, 2006

Scrabble is harder!

In addition to the crossword solver tool, the  wirdz site now includes a look up engine for Scrabble® and similar word games - the wirdz Scrabble Solver.

Making this run fast was a tougher problem than solving anagrams or doing wild card looks up against a dictionary when the length of the target word is known (as was the case for the crossword solver).

One "self imposed" design constraint was that the word list would be held in MySQL and that the lookups would achieved through standard SQL and would be fast. (There are published word generators based on C scripts but ... these would break the design objectives).

The word list used is YAWL. This has over 250K entries.

The "naive" approach to solving the problem would be to generate the list of combinations first and then check them one by one against the word list. Only a good approach if high server load and long responses times are the objectives as the number of potential combinations rises dramatically and the list of letters gets longer. (Ignoring duplicate letters in the list, the number of combinations follows Pascal's Triangle).

An alternative approach is to blast through the table (or a suitable subset - for example there is no point looking for 10 letter words if the target list only has 8 letters). This then needs to be optimised for efficient read access to the data (a table scan rather than indirect access through an index) and an efficient method for the query to identify valid words.

The wirdz engine uses this approach.

Identification of potential candidates is based on a bit map field with a bit being set if a the word includes a particular letter. The basic test logic is as follows:

select *
from WORDLIST
where conv('01000111100000000000001001',2,10) & letter_map = letter_map

In this case the test set of letters is "SATURDAY".

This provides a list of candidate words rapidly which are then filtered taking into account the number of occurrences of each letter in the set of letters and the candidate word.

And the result - MySQL took 0.016 seconds to count the 163 words in the list which can be made from the above letters and 0.015 seconds to return the initial list of 15 words (actual performance of course depends on overall server load etc). Not too bad though but slow enough compared with the normal AJAX based wirdz dictionary looks ups, which auto update on keyup in the text area, for this feature to be disabled and replaced with a button.

Got a better/faster/slicker method - please add a comment.

0 Comments:

Post a Comment

<< Home