Page 1 of 2 12 LastLast
Results 1 to 15 of 22
  1. #1
    Join Date
    Feb 2009
    Posts
    54

    db design name field

    Hello everybody,
    I am making myself a database for research purposes which will hold data about people and the books they wrote if they did. It is very similar to a bibliographic database, and the ready-to-use examples I have found on the net don't really fit my needs, so I decided to make one. I'm using Postgresql 9.

    Some remarks:
    1. Data will be in UTF-8 as it will be in many scripts (English names in latin script, Chinese names in Chinese characters, Russian names in cyrillic and so on).
    2. A lot of the data is historical, therefore very static, once a name is in it will be forever, therefore I would like to avoid having to type it again for other data entries to avoid mistyping.

    Main problem and main question:
    Name fields. Being culturally (and historically) very diverse data, a "first name", "middle name", "last name" fields solution is clearly not sufficient. I was thinking about a many-to-many relationship, and then using array_to_string to get the rows in one line. Then I opted for using an array as the name field:
    Code:
    name varchar[]
    This way I can write the names in the order they are and access them separately if needed (note: the surname is kept separate in another field, the problem is with given names). Using an array also makes it easy to access names that have spaces.
    What do you think about this solution? Any better advice?

    Related to this is another problem. Being this data historical, some of the names (first, last, whatever) have been spelled differently in different historical sources (especially in manuscripts). To record this, i was thinking about another field for "also_spelled". But then I will evenctually have to point to the historical source that has this spelling. So, either I create another table with the different spelling and source, or use a bi-dimensional array. What do you think?

    While I'm here. I could not find a way to let Postgresql to format names with the initial only. Is there a way, or can I only parse it afterwards?

    Second problem:
    Books with more than one author and/or with translator, editor, curator etc.
    The solution I have come up to is this (simplified for this problem):
    Code:
    create table person (
            person_id int,
            name varchar[]);
    create table opus (
            opus_id int,
            title varchar);
    create table person_opus (
            person_id int,
            opus_id int,
            type int, /* it is the role of the person: author, translator etc., with a lookup table. */ );
    And from person_opus create a temporary table for authors, one for translators etc:

    Code:
    select string_agg(name[1],' '), person_opus.opus_id from person
    into temporary author
    join person_opus on person.person_id = person_opus.person_id
    where person_opus.type = 1
    group by opus_id;
    Code:
    select string_agg(name[1],' '), person_opus.opus_id from person
    into temporary tranlsator
    join person_opus on person.person_id = person_opus.person_id
    where person_opus.type = 2
    group by opus_id;
    Code:
    select author.name, opus.title, translator.name
    from author
    left join opus on author.opus_id = opus.opus_id
    left join translator on author.opus_id = translator.opus_id;
    It seems to me overly complicated, but could not think of any better solution

    Of course, I could make a person_opus table with fields "author_1", "author_2", "author_3", "translator" etc., but wanted to avoid a table like this, which seems to me to be kind or "rigid".


    I think this is enough for now. Thank you in advance for your help.

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Some examples that could not fit into "FirstName", "MiddleName", "LastName" please?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    traditionally far eastern names do not fit the first middle last name stylistic
    however as this is historic data I question whether you need this information at all. an author is an author. its unlikely their name will change over time, especailly if the work in question is say fiction where it takes a long time to develop a persona.

    an author may write many articles / publications / books etc, more than one author may contribute to one book, and author may have multiple contributions to a book (eg as author, as referee, as advisor) and so on.

    there is a tricky case where one person may actually have more than one author name
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Some traditional far eastern name examples, please?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    I would create a table of aliases off of the person table. One person to many aliases (citing source). Rather than "Temporary Tables", most books call what you have shown "Views". Maybe postgres is different, though.

    Blindman;
    How about a western name? Publius Virgilius Maro. Does he have a middle name?
    Last edited by MCrowley; 07-06-11 at 15:01. Reason: Mucked up the name.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by MCrowley View Post
    Blindman;
    How about a western name? Publius Virgilius Maro. Does he have a middle name?
    Why would that not fit into a schema allowing three distinct name elements?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    The problem comes in searching. When you are looking for "Virgil", what does the query look like? Granted, the single name Virgil would be better placed in the aliases table, but if you are going to search a name as a whole, should it be three fields?

    Is the purpose of splitting the name up into several fields intended to provide an easy way of presenting the data in the form LastName, FirstName?

  8. #8
    Join Date
    Feb 2009
    Posts
    54
    Thank you all.
    Let me give you some more detail. In fact, I have a special table for names, with many columns, in order not to have half of the "person" table filled with names. This is because antique Chinese would not have only a name, but also a "zi" (or several "zi") and a "hao" (or several), and a lot more. I have a field for patronimic. I have a field for "clan name". And many others. So, the main question is how to make a model that allows but does not compel the possibility of having many names, last names, nicknames etc. without making a new fileld numbered progressively for each of them.

    Many examples would not fit into "first" "middle" "last" name. For example many people in Italy or Spain have three or four given names.

    The example given by MCrowley is also nice. In that case I might revert into sticking to proper historical naming conventions:
    Roman naming conventions - Wikipedia, the free encyclopedia
    I need this for research, so data must be precise.
    And yes, an alias table is a good idea.

    What I need is a very versatile model, which will be easily adaptable to situations I have not thought of while designing the database. It is fine if it is harder to insert data, as long as it will be easier to arrange it in any way I wish afterwards.

    About the question posed by healdem: the database is not only a bibliographic one, some people in the database will be there only as historical figures, maybe emperors, generals, whoever you like. But I want to have only one table for persons which may have written books, or produced sculptures or whatever. The "person" table will be kind of pivotal to a lot of other tables which will grow around it.

    So, I would like to ask again: according to your experience, is using an array for names in the way I described workable. If not, why?

    About the second problem. Of course I could use views instead of temporary tables. Maybe it would be better. But what about the method in general (of many authors and/or translators etc)?

    Tks

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I usually use five name fields: Honorific, FirstName, MiddleName, LastName, Suffix.
    This is more than sufficient for storing any name, and doing any necessary searches.
    I've asked repeatedly for examples that could not fit into this model and still none have been supplied, so I will drop out of this thread.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Feb 2009
    Posts
    54
    Yes, probably most cases would fit, but not all, especially non-Western. As I said in my answer, in antiquity the Chinese would have at least one "zi" (字), but often more than one. If writers, they would probably have a "biming" (笔名), but often more than one. So I need a model that will easily fit strange cases not easily predictable. In this case, would you make fields like "zi_1" "zi_2" "zi_3"?
    This explain what is a "zi": Chinese style name - Wikipedia, the free encyclopedia

    For example Huang Zongxi (1610--1695):
    zi: Tai Chong, De Bing;
    hao: Nan Lei, Lizhou laoren, Lizhou shanren, Lanshui yuren, Yudeng dongzhu, Shuangbao yuanzhang, Gujishi shichen;


    We can put it in a more general way:
    What if you have more than one value that goes into the same filed of a row? Given that you need to be able to access them separately, would you always use another table, or would you use an array? Why? Or else how would you do it?

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Most western name searches are via FirstName or LastName. MiddleName, Honorific, and suffix are rarely searched.
    I would guess that no matter how long and complex these chinese names are, for practical purposes only isolated elements are targets of searches. I'd place the primarily searched components in consistent columns, and then dump the rest of the name in what westerners would consider the MiddleName column.
    Note that this MiddleName column could still be searched, though not as efficiently as the primary name columns.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    If a person has more than one actual "full" name, then these should be spun off into a table of aliases.

    An alternative would be to store the person's name as a single string and use fulltext searching, but this would make sorting very difficult.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    It's going to come down to how you want to use the data. Do you need to separate all the various names for anything? Do you need to take actions only on certain families?

    The problem (as you have found) is that the requirements are just not solid. The problems that the various name orders present to the front end or report developer are going to be significant.

    As for the example of Huang Zongxi, I would push the zis and the haos to the alias table. This will give you a single place to look for any of them. Don't forget to put the most common name (Huang Zongxi) in the alias table as well. If there is no need to separate any of the names within themselves, then I would be tempted (because of all the various possible orders) to just have a single field for name, and let the various archivists decide how to decode family, clan and given names out of those.

  14. #14
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Blindman raises a good point about sorting. How are these names sorted? If you have the following:

    Singman Rhee
    Kim Il-Sung
    Kim Jong-Il
    Kim Jong-Un

    What should a sorted list look like, and how is it derived, when you add something like

    Harry S. Truman

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Well as a red-blooded American, I would assume that Truman would sort out on top.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •