Results 1 to 10 of 10
  1. #1
    Join Date
    Nov 2002
    Posts
    98

    huge records vs huge amount of records

    it is a designing issue i am curious about, since apparently i will be involved in a situation where i have to make a decision soon.

    we are in the business of making credit card systems. typically we will need to save some logs on credit card transactions.

    a credit card message consists at most 128 defined fields of message blocks, and each message can have various blocks, which is identifed by a bitmap at the beginning of the message:


    the old design was similar to

    create table messages
    message_header
    F1
    F2
    F3.....

    the result is a monstrocity of records that contains 128+ columns with most of them empty. Somehow i feel it isn't effcient at all.


    what i wanted to build is similar to:

    create table message_fields
    Message_id
    Field_name
    Filed_value

    this way there will only be 1 field value per record, and it makes sure the space is used effciently. But the amount of record will obviously increase at least 20~30 times, which might be a problem in optimizing queries.

    What I would like some advices on are how to optimize such queries, whether the 2nd solution is better in this case, and if 2nd solution will really use it's storage space more efficiently.


    TIA

    Mark

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171

    Re: huge records vs huge amount of records

    Originally posted by mchih
    it is a designing issue i am curious about, since apparently i will be involved in a situation where i have to make a decision soon.

    we are in the business of making credit card systems. typically we will need to save some logs on credit card transactions.

    a credit card message consists at most 128 defined fields of message blocks, and each message can have various blocks, which is identifed by a bitmap at the beginning of the message:


    the old design was similar to

    create table messages
    message_header
    F1
    F2
    F3.....

    the result is a monstrocity of records that contains 128+ columns with most of them empty. Somehow i feel it isn't effcient at all.


    what i wanted to build is similar to:

    create table message_fields
    Message_id
    Field_name
    Filed_value

    this way there will only be 1 field value per record, and it makes sure the space is used effciently. But the amount of record will obviously increase at least 20~30 times, which might be a problem in optimizing queries.

    What I would like some advices on are how to optimize such queries, whether the 2nd solution is better in this case, and if 2nd solution will really use it's storage space more efficiently.


    TIA

    Mark
    The 128 column solution will probably be a lot easier to work with, and empty columns don't usually occupy much space. You may even end up requiring more space with your proposed change, since you estimate you will have 20-30 times more records. Not that space saving is usually very important in database design: efficiency of querying is generally more important.

    Of course, it depends also on the data in those 128 columns. If they are 128 different pieces of information (attributes) then 128 columns is correct. On the other hand, if some columns fall into "repeating groups" like (e.g.) tran_date_1, tran_amount_1, ..., tran_date_9, tran_amount_9, then they should be normalised into a separate table to facilitate queries like "get sum of all tran amounts for each message".

    However, it sounds like you are considering an "EAV" (entity, attribute, value) over-generic solution which is generally a bad idea.

  3. #3
    Join Date
    Nov 2002
    Posts
    98
    thanks for the advice

    the columns consist of different values, so each record itself is pretty much normalized.

    what i am wondering is whether the overhead on empty column space (70 to 80% of them will likely be empty) will cost some performance issue and if i need to look into it.

    another problem is that each field(column) COULD have sub-fields, and there might be some custom-made transaction fields as well.

    if that's the case, is it possible to setup a table for each type of transaction, and use a relation table to link the relating transactions (messages usually comes in pairs).

    i guess i will need to tell myself that empty columns are not the end of world :P

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I think you'll lose back a lot of your space when you index that table to make it run efficiently. Generally in a business application, response time is more critical than disk space, and you'll get a faster response time (and simpler, more easily maintained code) if you keep the 128 column schema.

    blindman

  5. #5
    Join Date
    Nov 2002
    Posts
    98
    another thing i notice while i was studying the old schema is that they don't seem to apply foreign keys on relating tables.

    besides having a structural link, do foreign keys serve as indexes as well?

    TIA

    Mark

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Originally posted by mchih
    another thing i notice while i was studying the old schema is that they don't seem to apply foreign keys on relating tables.

    besides having a structural link, do foreign keys serve as indexes as well?

    TIA

    Mark
    No, in fact you should almost always create an index to "support" a foreign key constraint, i.e. to enable constraint checks to be performed efficiently.

    The main function of a foreign key is to ensure data integrity; they may also help the optimizer in determining the best query plan.

    I should say that the above remarks apply to Oracle in particular, other DBMSs may have different characteristics.

  7. #7
    Join Date
    Nov 2002
    Posts
    98
    thanks for the advice.

    At the moment we will likely use Oracle9i for the production db. so it's right in your alley, andrewst :P

    Right now I am looking at some credit card termial management schema.
    it also has about 100~200 parameters per record. what's different from the credit card transactions is that they seem to fit into groups

    eg:

    create table terminals
    id
    visa_parameter1
    visa_parameter2
    .
    .
    .
    debit_parameter1
    debit_parameter2
    .
    .
    .


    In this case, would it be more efficient if i separate different kinds of parameters into different tables?

    eg.

    create table terminals
    id
    visa_parameter_id
    debit_parameter_id
    .
    .
    .

    create table visa_parameters
    id
    parameter_1
    parameter_2
    .
    .
    .


    For what i can gather, it won't increase the record number significantly. In fact, it might even reduce the required space, since most terminals use the same set of parameter, and in this structure one set of parameter can be referenced to different terminals easily.

    What I am wondering is whether this structure would suffer the same pitfall as the credit card transaction schema, namely sacrificing efficiency for space.

    TIA

    Mark

  8. #8
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Originally posted by mchih
    thanks for the advice.

    At the moment we will likely use Oracle9i for the production db. so it's right in your alley, andrewst :P

    Right now I am looking at some credit card termial management schema.
    it also has about 100~200 parameters per record. what's different from the credit card transactions is that they seem to fit into groups

    eg:

    create table terminals
    id
    visa_parameter1
    visa_parameter2
    .
    .
    .
    debit_parameter1
    debit_parameter2
    .
    .
    .


    In this case, would it be more efficient if i separate different kinds of parameters into different tables?

    eg.

    create table terminals
    id
    visa_parameter_id
    debit_parameter_id
    .
    .
    .

    create table visa_parameters
    id
    parameter_1
    parameter_2
    .
    .
    .


    For what i can gather, it won't increase the record number significantly. In fact, it might even reduce the required space, since most terminals use the same set of parameter, and in this structure one set of parameter can be referenced to different terminals easily.

    What I am wondering is whether this structure would suffer the same pitfall as the credit card transaction schema, namely sacrificing efficiency for space.

    TIA

    Mark
    I can't give a definitive answer without knowing the data in a lot more detail than is possible here, but there may be little to gain by splitting this table. Efficiency of querying and using the tables is generally much more important than reducing space. Consider how the data is used and then choose the design that best supports that. Remember, empty columns use up very little space - in fact, no space at all if they are at the end of the record.

  9. #9
    Join Date
    Nov 2002
    Posts
    98
    what i meant by reducing space is more like reducing the number of records. ie trying to find some parameters that have some kind of grouping, and then put them into a separate table and then linking it with a foreign key. so if I were to search for those parameters, I can get a quicker response since there would be less records there.

    for example:

    create table terminals
    (
    parameter1
    parameter2
    ....
    area_id
    )

    create table areas
    (
    area_parameter1
    area_parameter2
    ....
    )

    terminals within the same area would have the same parameter.
    Can you comment on this particular design?

    TIA

    Mark

  10. #10
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Originally posted by mchih
    what i meant by reducing space is more like reducing the number of records. ie trying to find some parameters that have some kind of grouping, and then put them into a separate table and then linking it with a foreign key. so if I were to search for those parameters, I can get a quicker response since there would be less records there.

    for example:

    create table terminals
    (
    parameter1
    parameter2
    ....
    area_id
    )

    create table areas
    (
    area_parameter1
    area_parameter2
    ....
    )

    terminals within the same area would have the same parameter.
    Can you comment on this particular design?

    TIA

    Mark
    Sounds reasonable - in fact, what you are doing is standard normalisation: if some parameters are functionally dependent on the area to which a terminal belongs rather than on the individual terminal, then they should be normalised to a separate area table.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •