If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Reverse Operation for GROUP_CONCAT()?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-30-10, 11:30
Ikviens Ikviens is offline
Registered User
 
Join Date: Mar 2006
Posts: 55
Reverse Operation for GROUP_CONCAT()?

Hi,

How, if at all possible, can I do the opposite of what GROUP_CONCAT() function does? In other words, I would like to break a string field into rows. Breakpoints should be supplied in the field data.

A concrete example is breaking a word into syllables. I want to have data of a word as follows (the column word is better replaced by word_id, which in another table linked to the word but I chose to simplify here):
Code:
+--------+-------------+----------+
| word   | syllable_id | syllable |
+--------+-------------+----------+
| awaken |           1 | ə       | 
| awaken |           2 | ˈweɪ   | 
| awaken |           3 | kən     | 
+--------+-------------+----------+
The results table would be created or selected from dictionary table like this:
word_id: 1
spelling: awaken
pronunciation: əweɪ.kən

For pronunciation, the International Phonetic Alphabets separator for syllables is the full stop. It is by convention omitted before the accent marks (ˈ and ˌ). In order to assure that there exists only one separator for MySQL to process, dictionary entries like this (note the difference from the underscored part above) would be pre-processed so that 1. ˈ and ˌ are replaced by .ˈ and .ˌ and that 2. a full stop at the beginning of the word is deleted. With this measure we can get ˌed.jʊ.ˈkeɪ.ʃən for "education" and ˈed.ɪt for "edit."

Now, suppose I have dictionary populated with words whose pronunciations are set off into syllables by full stop. How would I go from here to enter each syllable in different rows (of still the same column; preferably numbered)?

Thank you all for you help.
Reply With Quote
  #2 (permalink)  
Old 03-30-10, 12:18
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
i will outline the strategy i would pursue, and leave it to you to implement and test it

first, you will need a table of numbers, running from 1 to whatever maximum number of separators you have

i have a general purpose table that i use for various things, and i would use it here --
Code:
CREATE TABLE numbers (n INTEGER NOT NULL PRIMARY KEY);
INSERT INTO numbers (n) VALUES 
 (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(12),(13),(14),...
next, i would cross join these numbers to my concatenated strings, and use the SUBSTRING_INDEX function to split the strings at the various separators

SUBSTRING_INDEX(word,'.',1) yields the 1st syllable
SUBSTRING_INDEX(SUBSTRING_INDEX(word,'.',2),'.',-1) yields the 2nd syllable
SUBSTRING_INDEX(SUBSTRING_INDEX(word,'.',3),'.',-1) yields the 3rd syllable
SUBSTRING_INDEX(SUBSTRING_INDEX(word,'.',4),'.',-1) yields the 4th syllable

and you can see the pattern emerging, using n from the numbers table

is this enough of a start?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 04-03-10, 01:34
Ikviens Ikviens is offline
Registered User
 
Join Date: Mar 2006
Posts: 55
Thanks, Rudy, for showing how the embedded STRING_INDEX function works!

Luckily, we can use the same formula for the first syllable as for the rest. So, for the dictionary table consisting of two columns (word such as 'awaken' and pronunciation such as 'ə.ˈweɪ.kən'; note the accent symbol is different from the single quote), the following is my implementation.

Code:
CREATE TABLE digits (digit int(1));
INSERT INTO digits values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE sequence (seq int(3));
INSERT INTO sequence (
SELECT D1.digit + D2.digit*10 + D3.digit*100
  FROM digits D1 CROSS JOIN
       digits D2 CROSS JOIN
       digits D3);

SELECT word, seq as syllable_id, SUBSTRING_INDEX(SUBSTRING_INDEX(pronunciation,'.',seq),'.',-1) as syllable
  FROM sequence CROSS JOIN dictionary
 WHERE seq BETWEEN 1 and (SELECT 1 + LENGTH(pronunciation) - LENGTH(REPLACE(pronunciation,'.','')))
   AND word = 'awaken';
English words usually have no more than 8 syllables, so my sequence table is unreasonably large. I created it nonetheless because large sequential integers comes in handy for many uses.


Code:
+--------+-------------+----------+
| word   | syllable_id | syllable |
+--------+-------------+----------+
| awaken |           1 | ə       | 
| awaken |           2 | ˈweɪ   | 
| awaken |           3 | kən     | 
+--------+-------------+----------+
3 rows in set (0.00 sec)
YEY!
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On