Escape from Atamyrat

(The wirdz online dictionary can be found at wirdz.com.)

With the wirdz dictionary engine, as you change dictionaries, the term you are looking for (or its initial letters) is transferred through to the next dictionary via the URL, using a bit of JavaScript intervention. All well and good if the term is in plain ASCII. But what about Unicode text?

JavaScript provides two functions, "escape" and "unescape", to convert from Unicode to URL-safe escape codes and back. Here's an example based on "pinyin", which is used to represent Chinese in a roman font, with "tone" marks to express the pronunciation:

Pinyin:

yīngguó diànzǐ jiésuàn xìtǒng

Escaped pinyin:

y%u012Bnggu%F3%20di%E0nz%u01D0%20ji
%E9su%E0n%20x%ECt%u01D2ng
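
As a rough check, the round trip can be reproduced in any browser console or in Node.js. This is only a sketch: the variable names are made up for illustration, and the pinyin is written with \u escapes so that the source file's own encoding does not matter.

```javascript
// escape() and unescape() are legacy global functions, still available
// in browsers and in Node.js.
var pinyin = "y\u012Bnggu\u00F3 di\u00E0nz\u01D0 ji\u00E9su\u00E0n x\u00ECt\u01D2ng";

var escaped = escape(pinyin);      // URL-safe form, mixing %XX and %uXXXX
var restored = unescape(escaped);  // back to the original string

console.log(escaped);
console.log(restored === pinyin);
```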

The escape function uses Unicode escapes (%u + a four-digit hex value) for characters with codes above 255, but uses standard two-digit escapes (% + a two-digit hex value) for characters in the range 128-255, as well as for ASCII characters that are not URL safe, such as the space (%20). No doubt for good backwards compatibility reasons.
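
The switch between the two forms can be seen by escaping the characters on either side of code 255. A small sketch, with variable names chosen only for illustration:

```javascript
// Character code 255 (U+00FF) still gets the two-digit form...
var twoDigit = escape("\u00FF");
// ...while character code 256 (U+0100) gets the %u form.
var fourDigit = escape("\u0100");

console.log(twoDigit);   // %FF
console.log(fourDigit);  // %u0100
```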

The dictionary engine needs to convert back to Unicode on the server. (More on the way in which PHP handles Unicode escapes in URLs in a later entry.)

It is cleaner to ensure that all escaped characters use the Unicode (%u) form. To achieve this, the following JavaScript function can be used:


function unicodeEscape (pstrString) {
  if (pstrString == "") {
    return "";
  }
  var iPos = 0;
  var strOut = "";
  var strChar;
  var strNextChar;  // declared here so it does not leak as a global
  var strString = escape(pstrString);
  while (iPos < strString.length) {
    strChar = strString.substr(iPos, 1);
    if (strChar == "%") {
      strNextChar = strString.substr(iPos + 1, 1);
      if (strNextChar == "u") {
        // Already in %uXXXX form: copy all six characters unchanged.
        strOut += strString.substr(iPos, 6);
        iPos += 6;
      }
      else {
        // Two-digit %XX form: widen it to %u00XX.
        strOut += "%u00" +
                  strString.substr(iPos + 1, 2);
        iPos += 3;
      }
    }
    else {
      // Unescaped character: copy it as is.
      strOut += strChar;
      iPos++;
    }
  }
  return strOut;
}

Using this, the above pinyin becomes:

Unicode escaped pinyin:

y%u012Bnggu%u00F3%u0020di%u00E0nz%u01D0
%u0020ji%u00E9su%u00E0n%u0020x%u00ECt%u01D2ng
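
One reason the uniform form is cleaner is that converting back needs only a single pattern. The sketch below shows this in JavaScript purely for illustration. It is not the engine's server code (that is PHP), and decodeUnicodeEscapes is a hypothetical helper name, not something from the engine.

```javascript
// Hypothetical helper: decodes a string in which every escaped
// character uses the %uXXXX form.
function decodeUnicodeEscapes(str) {
  return str.replace(/%u([0-9A-Fa-f]{4})/g, function (match, hex) {
    // Turn the four hex digits back into the character they stand for.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var encoded = "y%u012Bnggu%u00F3%u0020di%u00E0nz%u01D0" +
              "%u0020ji%u00E9su%u00E0n%u0020x%u00ECt%u01D2ng";
var decoded = decodeUnicodeEscapes(encoded);

console.log(decoded);
```

The built-in unescape function handles the same input, so in the browser there is no need for a helper like this. The point is only that a decoder on the other end has one case to deal with instead of two.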

Tuesday, March 07, 2006

Escape and Unescape - JavaScript
