Saturday, April 15, 2006

Scrabble is harder!

In addition to the crossword solver tool, the  wirdz site now includes a look up engine for Scrabble® and similar word games - the wirdz Scrabble Solver.

Making this run fast was a tougher problem than solving anagrams or doing wild card looks up against a dictionary when the length of the target word is known (as was the case for the crossword solver).

One "self imposed" design constraint was that the word list would be held in MySQL and that the lookups would achieved through standard SQL and would be fast. (There are published word generators based on C scripts but ... these would break the design objectives).

The word list used is YAWL. This has over 250K entries.

The "naive" approach to solving the problem would be to generate the list of combinations first and then check them one by one against the word list. Only a good approach if high server load and long responses times are the objectives as the number of potential combinations rises dramatically and the list of letters gets longer. (Ignoring duplicate letters in the list, the number of combinations follows Pascal's Triangle).

An alternative approach is to blast through the table (or a suitable subset - for example there is no point looking for 10 letter words if the target list only has 8 letters). This then needs to be optimised for efficient read access to the data (a table scan rather than indirect access through an index) and an efficient method for the query to identify valid words.

The wirdz engine uses this approach.

Identification of potential candidates is based on a bit map field with a bit being set if a the word includes a particular letter. The basic test logic is as follows:

select *
from WORDLIST
where conv('01000111100000000000001001',2,10) & letter_map = letter_map

In this case the test set of letters is "SATURDAY".

This provides a list of candidate words rapidly which are then filtered taking into account the number of occurrences of each letter in the set of letters and the candidate word.

And the result - MySQL took 0.016 seconds to count the 163 words in the list which can be made from the above letters and 0.015 seconds to return the initial list of 15 words (actual performance of course depends on overall server load etc). Not too bad though but slow enough compared with the normal AJAX based wirdz dictionary looks ups, which auto update on keyup in the text area, for this feature to be disabled and replaced with a button.

Got a better/faster/slicker method - please add a comment.

Monday, April 10, 2006

Anagrams are easy!

The latest addition to the wirdz dictionary set is a crossword word search.

This is based on the same AJAX framework as the main dictionary engine, giving "instant" response as text is entered. It handles both anagrams and wild card searches. The rule is simple: if the term contains a wild card, it's a wild card search - otherwise look for an anagram. (Anagrams with wild cards? - fuggedaboudit!)

Anagrams are handled by including a field in the database with the letters of the word/term sorted in alphabetical order. This makes look-ups extremely fast.

For wild card searches the word-list table is indexed by term length and contains a number of indexed fields based on the original term to make look ups faster. The result: super fast display updates as new letters or wild cards are typed. AJAX and MySQL is a very powerful combo.

Since we don't have telnet or ssh access to the host server, loading the worldlist table had to be carried out using remote MySQL access. If you haven't come across it before, check out the MySQL LOAD DATA INFILE command. Using this with a local comma separated file loaded nearly 250000 records in short order and much faster than running a local insert script or using remote ODBC or JDBC connections.