Pinyin tone numbers to tone marks in MySQL
The following SQL will convert from Chinese pinyin tone number format to pinyin with tone marks.
The query is based on The Fool's Workshop Pinyin to Unicode Converter. That converter is a PHP script which takes tone-number pinyin as input and outputs the HTML entities for the tone-mark characters.
For the wirdz English-Chinese dictionary, it is better to do all the processing in advance and hold the pinyin in both forms directly in the dictionary table. This eliminates the need for on-the-fly conversion and, more importantly, allows the dictionary to be full-text searched directly using tone-mark pinyin.
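As a rough sketch of what holding both forms in the table can look like, the statements below define a table with both pinyin columns and a full-text index on the tone-mark column. The table name `dictionary`, the `id` and `english` columns, and the index name are assumptions made for illustration; only `pinyin_numbers` and `pinyin_tones` appear in the script. The sketch also assumes MySQL 5.6 or later, where InnoDB supports FULLTEXT indexes; on older servers a MyISAM table would be needed instead.

```sql
-- Hypothetical table layout: only pinyin_numbers and pinyin_tones
-- are column names taken from the conversion script.
create table dictionary (
  id             int unsigned not null auto_increment primary key,
  english        varchar(255) not null,
  pinyin_numbers varchar(255) not null,             -- e.g. 'zhong1 guo2'
  pinyin_tones   varchar(255) not null default '',  -- filled in by the conversion script
  fulltext key ft_pinyin_tones (pinyin_tones)
) engine=InnoDB default charset=utf8;

-- Full-text search directly on tone-mark pinyin.
select english, pinyin_tones
from dictionary
where match(pinyin_tones) against ('zhōng' in natural language mode);
```

One thing to watch: full-text indexes skip short words by default (`innodb_ft_min_token_size` defaults to 3 and MyISAM's `ft_min_word_len` to 4), so two-letter syllables such as "nǐ" are not searchable unless those settings are lowered and the index is rebuilt.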
Here's the script:
/* Pinyin tone numbers to pinyin tone marks */
set names 'utf8';

/* Pass 1: convert tone numbers to an intermediate representation.
   Replace TABLE with the name of your dictionary table.
   The innermost replace() is applied first, so the rules run in the order listed:
   a final such as 'ang1' is handled before the bare vowel it contains. */
update TABLE set pinyin_tones =
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
lower(pinyin_numbers),
'ang1','//aq//ng'), 'ang2','//aw//ng'), 'ang3','//ae//ng'), 'ang4','//ar//ng'),
'eng1','//eq//ng'), 'eng2','//ew//ng'), 'eng3','//ee//ng'), 'eng4','//er//ng'),
'ing1','//iq//ng'), 'ing2','//iw//ng'), 'ing3','//ie//ng'), 'ing4','//ir//ng'),
'ong1','//oq//ng'), 'ong2','//ow//ng'), 'ong3','//oe//ng'), 'ong4','//or//ng'),
'an1','//aq//n'), 'an2','//aw//n'), 'an3','//ae//n'), 'an4','//ar//n'),
'en1','//eq//n'), 'en2','//ew//n'), 'en3','//ee//n'), 'en4','//er//n'),
'in1','//iq//n'), 'in2','//iw//n'), 'in3','//ie//n'), 'in4','//ir//n'),
'un1','//uq//n'), 'un2','//uw//n'), 'un3','//ue//n'), 'un4','//ur//n'),
'ao1','//aq//o'), 'ao2','//aw//o'), 'ao3','//ae//o'), 'ao4','//ar//o'),
'ou1','//oq//u'), 'ou2','//ow//u'), 'ou3','//oe//u'), 'ou4','//or//u'),
'ai1','//aq//i'), 'ai2','//aw//i'), 'ai3','//ae//i'), 'ai4','//ar//i'),
'ei1','//eq//i'), 'ei2','//ew//i'), 'ei3','//ee//i'), 'ei4','//er//i'),
'a1','//aq//'), 'a2','//aw//'), 'a3','//ae//'), 'a4','//ar//'),
'er2','//ew//r'), 'er3','//ee//r'), 'er4','//er//r'),
'lyue','l//v//e'), 'nyue','n//v//e'),
'e1','//eq//'), 'e2','//ew//'), 'e3','//ee//'), 'e4','//er//'),
'o1','//oq//'), 'o2','//ow//'), 'o3','//oe//'), 'o4','//or//'),
'i1','//iq//'), 'i2','//iw//'), 'i3','//ie//'), 'i4','//ir//'),
'nyu3','n//ve//'), 'lyu','l//v//'),
'v1','//vq//'), 'v2','//vw//'), 'v3','//ve//'), 'v4','//vr//'), 'v0','//vs//'),
'u1','//uq//'), 'u2','//uw//'), 'u3','//ue//'), 'u4','//ur//');

/* Pass 2: convert the intermediate representation to tone marks.
   In each token, q, w, e and r stand for tones 1, 2, 3 and 4. */
update TABLE set pinyin_tones =
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(
replace(
pinyin_tones,
'//aq//','ā'), '//aw//','á'), '//ae//','ǎ'), '//ar//','à'),
'//eq//','ē'), '//ew//','é'), '//ee//','ě'), '//er//','è'),
'//iq//','ī'), '//iw//','í'), '//ie//','ǐ'), '//ir//','ì'),
'//oq//','ō'), '//ow//','ó'), '//oe//','ǒ'), '//or//','ò'),
'//uq//','ū'), '//uw//','ú'), '//ue//','ǔ'), '//ur//','ù'),
'//vq//','ǖ'), '//vw//','ǘ'), '//ve//','ǚ'), '//vr//','ǜ'), '//vs//','ü'),
/* bare ü token produced by the lyu, lyue and nyue rules in pass 1 */
'//v//','ü'),
/* upper-case tokens: pass 1 lower-cases its input, so these only match if that lower() is removed */
'//aaq//','Ā'), '//aaw//','Á'), '//aae//','Ǎ'), '//aar//','À'),
'//eeq//','Ē'), '//eew//','É'), '//eer//','È');
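To see how the two passes fit together, here is a small sketch that applies just the rules needed for one literal, followed by a check for rows the update did not fully convert. The sample string and the table name `dictionary` are assumptions for illustration; substitute your own table name.

```sql
set names 'utf8';

-- Two-pass conversion of one literal, using only the rules it needs.
select
  replace(replace(                       -- pass 2: tokens to tone marks
    replace(replace(lower('Ni3 hao3'),   -- pass 1: tone numbers to tokens
      'ao3', '//ae//o'),
      'i3',  '//ie//'),
    '//ae//', 'ǎ'),
    '//ie//', 'ǐ') as pinyin_tones;      -- nǐ hǎo

-- List rows that still contain a digit or a leftover token after the update.
-- "dictionary" is a hypothetical table name.
select pinyin_numbers, pinyin_tones
from dictionary
where pinyin_tones regexp '[0-9]'
   or pinyin_tones like '%//%';
```

The second query is a cheap way to find syllables the rule list does not cover, such as a tone digit the script has no rule for.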