Wednesday, May 06, 2020

Chinese characters and escapes with AJAX and PHP

Most of the time, UTF-8 combined with PHP's multi-byte settings does the trick under the hood, but there is always a corner case.  Supporting Chinese character searches against the English-Chinese Dictionary, or Kanji/Kana searches against the Japanese-English Dictionary, means passing Chinese and Japanese characters through the AJAX call to the PHP back-end process.  When this was tested, the characters reached the server process in an escaped form that none of the standard PHP functions readily converted back to UTF-8.

For example, the Chinese character string 介質訪問控制層 arrived as:

%u4ECB%u8CEA%u8A2A%u554F%u63A7%u5236%u5C64

when processed through the AJAX "get" method.
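
As one illustration of why the usual decoders do not help, the short check below runs PHP's urldecode and rawurldecode on the first two characters of the example. Both functions only decode a % followed by two hexadecimal digits, so a %uXXXX sequence passes through untouched. This is a minimal sketch of one such test, not a record of every function that was tried.

```php
<?php
// First two characters of the example (介質) in %uXXXX form.
$received = '%u4ECB%u8CEA';

// urldecode() and rawurldecode() expect % followed by two hex digits.
// The "u" after % is not a hex digit, so nothing gets decoded.
$viaUrldecode    = urldecode($received);
$viaRawurldecode = rawurldecode($received);

var_dump($viaUrldecode === $received);    // bool(true)
var_dump($viaRawurldecode === $received); // bool(true)
```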

After ruling out all of the standard functions, the following steps finally did the trick:

1. Replacing the % in each %uXXXX sequence with \ so that the input takes the "standard" \uXXXX escaped form that JSON understands.

2. Wrapping the string so that it represents a single-element JSON array, i.e. ["my utf8 string"].  The double quotes matter: json_decode rejects single-quoted strings.

3. Using the json_decode function on the string.

4. Extracting the converted text string from the returned array.
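
The four steps above can be sketched roughly as follows. The function name decodePercentU is hypothetical, and the regular expression is one assumed way of doing the replacement in step 1. The sketch relies on the input having already been filtered, so it makes no attempt to escape quotes or backslashes before handing the text to json_decode; it returns null if decoding fails.

```php
<?php
// Hypothetical helper (name not from the original post): converts
// %uXXXX sequences back to UTF-8 text by way of json_decode.
function decodePercentU(string $input): ?string
{
    // Step 1: rewrite each %uXXXX as the JSON escape \uXXXX.
    // '\\\\' in the replacement yields a single literal backslash.
    $escaped = preg_replace('/%u([0-9A-Fa-f]{4})/', '\\\\u$1', $input);

    // Step 2: wrap the text as a JSON array holding one double-quoted string.
    $json = '["' . $escaped . '"]';

    // Step 3: json_decode expands the \uXXXX escapes into UTF-8.
    $decoded = json_decode($json);

    // Step 4: extract the string; null signals that the JSON was invalid.
    if (is_array($decoded) && isset($decoded[0]) && is_string($decoded[0])) {
        return $decoded[0];
    }
    return null;
}

echo decodePercentU('%u4ECB%u8CEA'), PHP_EOL; // prints 介質
```

Anchoring the pattern on %u followed by exactly four hex digits, rather than replacing every % in the string, leaves ordinary percent signs alone.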

Not the most obvious way to handle the problem, but simple in the end!  Because the string passed to the server by the AJAX call is already subject to character filtering, there is little risk of stray input being misinterpreted by json_decode.
