Thursday, May 07, 2020

A number of interesting old dictionaries have been digitised in a form that can be processed relatively easily and uploaded to populate the MySQL database table structure used by the wirdz™ dictionary engine.

The latest addition to wirdz.com is the Sailor's Word Book, a 19th-century dictionary of nautical terms created by one admiral and edited by another.

As always, the text was never intended to be easy for automated text processing, which in this case would have come more than 100 years after the book was produced.  The trick is to identify where the main body of the dictionary starts, where it ends, where each new term starts, whether a part of speech is included and where the definition starts.  The Victorian typesetters and the folks who undertake the digitisation are, in general, disciplined people with strong attention to detail and consistency.  So, despite there always being a few edge cases, the processing, which in the case of content for wirdz.com uses variations on a core tool built in Python, can be relatively straightforward.
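To give a flavour of the steps described above, here is a minimal sketch in Python. It is not the actual wirdz.com tool. The start and end markers, the entry layout (a capitalised headword, an optional part-of-speech abbreviation, then the definition) and the names `parse_entries`, `START_MARKER`, `END_MARKER` and `ENTRY_RE` are all assumptions made for illustration, and a real digitised text would need its own patterns.

```python
import re

# Hypothetical markers for the main body; a real text will have its own.
START_MARKER = "*** START"
END_MARKER = "*** END"

# Assumed entry layout: HEADWORD in capitals (two or more characters),
# followed by "." or ",", an optional part of speech such as "n." or
# "adv.", and then the start of the definition.
ENTRY_RE = re.compile(
    r"^(?P<term>[A-Z][A-Z' -]*[A-Z])[.,]\s+"
    r"(?:(?P<pos>n|v|adj|adv)\.\s+)?"
    r"(?P<definition>.+)$"
)


def parse_entries(text):
    """Return a list of dicts with 'term', 'pos' and 'definition' keys."""
    entries = []
    in_body = False
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not in_body:
            # Skip everything until the main body of the dictionary starts.
            if line.startswith(START_MARKER):
                in_body = True
            continue
        if line.startswith(END_MARKER):
            break  # Stop at the end of the main body.
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            # A new term starts here.
            current = {
                "term": match.group("term"),
                "pos": match.group("pos"),  # None when no part of speech
                "definition": match.group("definition"),
            }
            entries.append(current)
        elif current is not None:
            # Otherwise treat the line as a continuation of the definition.
            current["definition"] += " " + line
    return entries
```

Each dictionary returned could then be mapped onto a database row. The edge cases mentioned above, such as a continuation line that happens to begin with a capitalised word and a full stop, would need extra rules on top of this.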
