Results 1 to 2 of 2
  1. #1
    Join Date
    Aug 2002
    Posts
    2

    Words and Syllables

    Perhaps this is a simple question, but I've thought about it and haven't really come up with any good ideas. I'm working with MySQL (and probably PHP, if that matters any).

    What I'm trying to do is create a database that has a big list of words. But I also want to keep track of the syllables that make up each word. So, take the word "database" for example. I want to have some way of knowing that this word splits into syllables like "da-ta-base."

    I've thought about having one table that stores syllables, so "database" will be stored as "da", "ta", and "base", but then I would have to have some way of tying them together to make a word. I would need to have a field that is something like "ordinal number" so "da" would be first, "ta" would be second, and "base" would be third, so I wouldn't get "tabaseda" or something. The disadvantage here is that there's a possibility of having two different syllables marked "first" for the same word. Also, to get a whole word, I would have to first query the syllable table and then patch all the syllables together, which seems pretty time-consuming. However, an advantage would be that I would only have to store the syllable "ta" once, and I could reuse it in words like "potato" and "table" and "magenta."

    Another idea would be to store the entire word in one table and then have a many-to-many relationship with syllables in the other table. For example, "database" is stored in one table, and it's linked to the syllables "da," "ta," and "base" in another. But then I've got redundant data. And I think I would still need some way of indicating that "da" is actually the first syllable, "ta" the second, and "base" the third. Otherwise, trying to figure out syllable breaks in words like "academia" from the syllables "a," "a," "ac," "dem," and "i" would be a programming nightmare. You try it.

    A third idea (and a really horrible one) would be to have fields in one table like "first syllable," "second syllable," etc. But this is so bad I'm not even considering this one.

    Number four: I could have a "first syllable" table, a "second syllable" table, and so on, where words are matched to nth-syllables in a one-to-one relationship. This seems only slightly better than Idea Number Three.

    Finally, I could make some sort of a table that just indicates syllable breaks, in a one-to-many relationship. So, for the word "database," it would have the numbers 2 and 4 (syllable breaks after the second and fourth letters). This seems to be OK, but it also has the disadvantage of requiring some extra work to extract the syllables, and it doesn't have the advantage that Idea One has of being able to reuse syllables. So I couldn't run a quick query to find out which words have "ta" as a syllable. Also, there's the possibility of illegal values (like 100) that would require some safety checks in my calling code.

    There's also the problem that some words have more than one way to break them up into syllables. I suppose that wouldn't be too hard to implement with any given idea; it would just require another table.

    So anyway, I would really appreciate any suggestions that anyone has. Maybe I have already found the best idea, in one of my five that I've presented. But I have this gut feeling that there's some way that the pros do it, and I'm missing that method completely.

    Thanks a bunch.

    Brian

  2. #2
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697

    Re: Words and Syllables

    Originally posted by bkell
    Perhaps this is a simple question, but I've thought about it and haven't really come up with any good ideas. I'm working with MySQL (and probably PHP, if that matters any).

    What I'm trying to do is create a database that has a big list of words. But I also want to keep track of the syllables that make up each word. So, take the word "database" for example. I want to have some way of knowing that this word splits into syllables like "da-ta-base."
    Forget about whether this would perform well for a moment:

    I think the simplest solution would be to store the word and the offset in a relation variable. Just one. No many to whatever.

    The relation variable would be called Syllables, and its predicate would be "There is a syllable in the word W at position P."

    Using SQL to express this, again logically:

    CREATE TABLE SyllablesBase ( word WORD, position POS, PRIMARY KEY (word, position) )

    That's all you need. Length can be derived as such:

    CREATE VIEW SyllablesLength SELECT so.word, so.position, (SELECT TOP 1 MIN(si.position) FROM (SyllablesBase UNION SELECT so.word AS word, LEN(so.word) AS position) AS si GROUP BY si.word WHERE so.word = si.word AND si.position > so.position) - so.position AS length
    FROM SyllablesBase so

    (where si is Syllables Inner, and so is Syllables Outer)

    That assumes that every syllable is entered: this is *not* required by the predicate I gave, which merely talks about individual syllables. If you can't enforce that guarantee, the SyllablesLength view will fail, so add it in.

    It would also make sense to create a view like this:

    CREATE VIEW Syllables SELECT *, MID(word, position, length) AS Syllable FROM SyllablesLength

    To find the nth syllable:

    CREATE VIEW SyllableOrdinals SELECT *, (SELECT COUNT(*) FROM SyllablesBase si WHERE si.position <= so.position AND si.word = so.word) AS ordinal FROM SyllablesBase so

    Performance-wise:

    If you can get the views I suggest to work, you should go with them because they remove all the possible redundant data. Try them before assuming they'll perform poorly. You can always just have your app precompute them if need be. There's *no* need for multiple tables.

    Where I said "WORD" and "POS" you can substitute, probably VARCHAR(127) and TINYINT.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •