Escape and Unescape - JavaScript
(The wirdz online dictionary can be found at wirdz.com.)
With the wirdz dictionary engine, as you change dictionaries, the term you are looking for (or the initial letters of the term) is transferred through to the next dictionary via the URL using a bit of JavaScript intervention. All well and good if the term is in plain ASCII. But what about Unicode charsets?
JavaScript provides two functions, "escape" and "unescape", to convert from Unicode to URL-safe escape codes and back. Here's an example based on "pinyin", which is used to represent Chinese in a roman font with "tone" marks to express the pronunciation:
Pinyin:
yīngguó diànzǐ jiésuàn xìtǒng
Escaped pinyin:
y%u012Bnggu%F3%20di%E0nz%u01D0%20ji%E9su%E0n%20x%ECt%u01D2ng
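The conversion above can be reproduced with a short script. This is a minimal sketch that assumes an environment where the legacy global functions escape and unescape are still available (current browsers and Node.js); the variable names are only illustrative.

```javascript
// escape() and unescape() are legacy globals; they are assumed to be available here.
var pinyin = "yīngguó diànzǐ jiésuàn xìtǒng";

// Convert to URL-safe escape codes.
var escaped = escape(pinyin);
console.log(escaped);
// prints: y%u012Bnggu%F3%20di%E0nz%u01D0%20ji%E9su%E0n%20x%ECt%u01D2ng

// Convert back to the original Unicode string.
var restored = unescape(escaped);
console.log(restored === pinyin); // prints: true
```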
The escape function uses Unicode escapes (%u + a four-digit hex value) for characters above 255, which do not fit in a single byte, but uses standard two-digit escapes (% + a two-digit hex value) for characters in the range 128-255, just as it does for ASCII characters such as the space. No doubt for good backwards compatibility reasons.
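To see the two escape forms side by side, the sketch below prints the character code of a few characters from the pinyin example next to what escape returns for each. The helper describeEscape is a hypothetical name used only for this illustration.

```javascript
// Hypothetical helper: shows a character's numeric code and its escape() form.
function describeEscape(ch) {
  return ch.charCodeAt(0) + " -> " + escape(ch);
}

// "y" (plain ASCII letter), space, "ó" (U+00F3) and "ī" (U+012B).
var samples = ["y", " ", "\u00F3", "\u012B"];
for (var i = 0; i < samples.length; i++) {
  console.log(describeEscape(samples[i]));
}
// prints:
// 121 -> y
// 32 -> %20
// 243 -> %F3
// 299 -> %u012B
```

Only the last character, whose code is above 255, gets the %u form.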
The dictionary engine needs to convert back to Unicode on the server. (More on the way in which PHP handles Unicode escapes in URLs in a later entry).
It is cleaner to ensure that all escaped characters are Unicode encoded. To achieve this the following JavaScript function can be used:

    function unicodeEscape(pstrString) {
        if (pstrString == "") {
            return "";
        }
        var iPos = 0;
        var strOut = "";
        var strChar;
        var strNextChar;
        var strString = escape(pstrString);
        while (iPos < strString.length) {
            strChar = strString.substr(iPos, 1);
            if (strChar == "%") {
                strNextChar = strString.substr(iPos + 1, 1);
                if (strNextChar == "u") {
                    // Already a %uXXXX escape: copy all six characters unchanged.
                    strOut += strString.substr(iPos, 6);
                    iPos += 6;
                } else {
                    // Two-digit %XX escape: widen it to %u00XX.
                    strOut += "%u00" + strString.substr(iPos + 1, 2);
                    iPos += 3;
                }
            } else {
                // Character that escape() left as-is.
                strOut += strChar;
                iPos++;
            }
        }
        return strOut;
    }
Using this, the above pinyin becomes:
Unicode escaped pinyin:
y%u012Bnggu%u00F3%u0020di%u00E0nz%u01D0%u0020ji%u00E9su%u00E0n%u0020x%u00ECt%u01D2ng
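The result above can be checked, and reversed, in a few lines. The sketch below is not the function from this entry but a compact alternative that widens every two-digit escape with a regular expression; the name unicodeEscapeCompact is hypothetical, and it assumes escape emits upper-case hex digits, as the examples here show.

```javascript
// Hypothetical compact variant: rewrite each %XX produced by escape() as %u00XX.
// Existing %uXXXX escapes are untouched because "u" is not a hex digit.
function unicodeEscapeCompact(text) {
  return escape(text).replace(/%([0-9A-F]{2})/g, "%u00$1");
}

var term = "yīngguó diànzǐ jiésuàn xìtǒng";
var encoded = unicodeEscapeCompact(term);
console.log(encoded);
// prints: y%u012Bnggu%u00F3%u0020di%u00E0nz%u01D0%u0020ji%u00E9su%u00E0n%u0020x%u00ECt%u01D2ng

// unescape() understands both forms, so the all-%u string decodes back to the original.
var decoded = unescape(encoded);
console.log(decoded === term); // prints: true
```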