MySQL string splitter
      The latest dictionary to be added to the wirdz/AMCD site is an English-Japanese dictionary based on the EDICT Japanese-English dictionary.  This dictionary is made available in the form of a delimiter separated text file with a primary delimiter character of "/".
Because the original file is Japanese to English, it may have several English equivalents, separated as above, for a single Japanese terms (up to 11).
To restructure the data as English-Japanese sorted alphabetically, it was necessary to create separate entries corresponding to each English term. This was done on the database side (much faster once the initial data load to the remote host server was completed - this took some time as the original dictionary has >100000 entries).
MySQL doesn't have a direct equivalent to a "split" function/method for strings but it does have the substring_index function which can be used to achieve the same effect as shown below.
Using:
the returned values are: 'a' 'b' 'c' and an empty string as required.
The above formula doesn't need the first item to be handled as a special case and returns empty strings correctly when the item count is less than the position requested.
    Because the original file is Japanese to English, it may have several English equivalents, separated as above, for a single Japanese terms (up to 11).
To restructure the data as English-Japanese sorted alphabetically, it was necessary to create separate entries corresponding to each English term. This was done on the database side (much faster once the initial data load to the remote host server was completed - this took some time as the original dictionary has >100000 entries).
MySQL doesn't have a direct equivalent to a "split" function/method for strings but it does have the substring_index function which can be used to achieve the same effect as shown below.
Using:
select replace(substring(substring_index('a,b,c', ',', 1),
                         length(substring_index('a,b,c', 
                                                ',', 
                                                1 - 1)) + 1),
               ',', '') ITEM1,
       replace(substring(substring_index('a,b,c', ',', 2),
                         length(substring_index('a,b,c', 
                                                ',', 
                                                2 - 1)) + 1),
               ',', '') ITEM2,
       replace(substring(substring_index('a,b,c', ',', 3),
                         length(substring_index('a,b,c', 
                                                ',', 
                                                3 - 1)) + 1),
               ',', '') ITEM3,
       replace(substring(substring_index('a,b,c', ',', 4),
                        length(substring_index('a,b,c', 
                        ',', 4 - 1)) + 1),
               ',', '') ITEM4
the returned values are: 'a' 'b' 'c' and an empty string as required.
The above formula doesn't need the first item to be handled as a special case and returns empty strings correctly when the item count is less than the position requested.



1 Comments:
How could you count the tokens of a string like, e.g., the java StringTokenizer does?
Post a Comment
<< Home