| |
Welcome to the dBforums forums.
You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!
If you have any problems with the registration process or your account login, please contact contact support.
If you prefer not to see double-underlined words and corresponding ads, place your cursor here for ContentLink opt out.
|
 |

07-13-03, 22:32
|
|
Registered User
|
|
Join Date: Nov 2002
Posts: 98
|
|
|
huge records vs huge amount of records
|
it is a designing issue i am curious about, since apparently i will be involved in a situation where i have to make a decision soon.
we are in the business of making credit card systems. typically we will need to save some logs on credit card transactions.
a credit card message consists at most 128 defined fields of message blocks, and each message can have various blocks, which is identifed by a bitmap at the beginning of the message:
the old design was similar to
create table messages
message_header
F1
F2
F3.....
the result is a monstrocity of records that contains 128+ columns with most of them empty. Somehow i feel it isn't effcient at all.
what i wanted to build is similar to:
create table message_fields
Message_id
Field_name
Filed_value
this way there will only be 1 field value per record, and it makes sure the space is used effciently. But the amount of record will obviously increase at least 20~30 times, which might be a problem in optimizing queries.
What I would like some advices on are how to optimize such queries, whether the 2nd solution is better in this case, and if 2nd solution will really use it's storage space more efficiently.
TIA
Mark
|
|

07-14-03, 07:25
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 4,874
|
|
|
Re: huge records vs huge amount of records
Quote:
Originally posted by mchih
it is a designing issue i am curious about, since apparently i will be involved in a situation where i have to make a decision soon.
we are in the business of making credit card systems. typically we will need to save some logs on credit card transactions.
a credit card message consists at most 128 defined fields of message blocks, and each message can have various blocks, which is identifed by a bitmap at the beginning of the message:
the old design was similar to
create table messages
message_header
F1
F2
F3.....
the result is a monstrocity of records that contains 128+ columns with most of them empty. Somehow i feel it isn't effcient at all.
what i wanted to build is similar to:
create table message_fields
Message_id
Field_name
Filed_value
this way there will only be 1 field value per record, and it makes sure the space is used effciently. But the amount of record will obviously increase at least 20~30 times, which might be a problem in optimizing queries.
What I would like some advices on are how to optimize such queries, whether the 2nd solution is better in this case, and if 2nd solution will really use it's storage space more efficiently.
TIA
Mark
|
The 128 column solution will probably be a lot easier to work with, and empty columns don't usually occupy much space. You may even end up requiring more space with your proposed change, since you estimate you will have 20-30 times more records. Not that space saving is usually very important in database design: efficiency of querying is generally more important.
Of course, it depends also on the data in those 128 columns. If they are 128 different pieces of information (attributes) then 128 columns is correct. On the other hand, if some columns fall into "repeating groups" like (e.g.) tran_date_1, tran_amount_1, ..., tran_date_9, tran_amount_9, then they should be normalised into a separate table to facilitate queries like "get sum of all tran amounts for each message".
However, it sounds like you are considering an "EAV" (entity, attribute, value) over-generic solution which is generally a bad idea.
|
|

07-14-03, 14:02
|
|
Registered User
|
|
Join Date: Nov 2002
Posts: 98
|
|
|
thanks for the advice
the columns consist of different values, so each record itself is pretty much normalized.
what i am wondering is whether the overhead on empty column space (70 to 80% of them will likely be empty) will cost some performance issue and if i need to look into it.
another problem is that each field(column) COULD have sub-fields, and there might be some custom-made transaction fields as well.
if that's the case, is it possible to setup a table for each type of transaction, and use a relation table to link the relating transactions (messages usually comes in pairs).
i guess i will need to tell myself that empty columns are not the end of world :P
|
|

07-14-03, 18:29
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 9,299
|
|
I think you'll lose back a lot of your space when you index that table to make it run efficiently. Generally in a business application, response time is more critical than disk space, and you'll get a faster response time (and simpler, more easily maintained code) if you keep the 128 column schema.
blindman
|
|

07-14-03, 21:53
|
|
Registered User
|
|
Join Date: Nov 2002
Posts: 98
|
|
another thing i notice while i was studying the old schema is that they don't seem to apply foreign keys on relating tables.
besides having a structural link, do foreign keys serve as indexes as well?
TIA
Mark
|
|

07-15-03, 06:20
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 4,874
|
|
Quote:
Originally posted by mchih
another thing i notice while i was studying the old schema is that they don't seem to apply foreign keys on relating tables.
besides having a structural link, do foreign keys serve as indexes as well?
TIA
Mark
|
No, in fact you should almost always create an index to "support" a foreign key constraint, i.e. to enable constraint checks to be performed efficiently.
The main function of a foreign key is to ensure data integrity; they may also help the optimizer in determining the best query plan.
I should say that the above remarks apply to Oracle in particular, other DBMSs may have different characteristics.
|
|

07-15-03, 23:27
|
|
Registered User
|
|
Join Date: Nov 2002
Posts: 98
|
|
thanks for the advice.
At the moment we will likely use Oracle9i for the production db. so it's right in your alley, andrewst :P
Right now I am looking at some credit card termial management schema.
it also has about 100~200 parameters per record. what's different from the credit card transactions is that they seem to fit into groups
eg:
create table terminals
id
visa_parameter1
visa_parameter2
.
.
.
debit_parameter1
debit_parameter2
.
.
.
In this case, would it be more efficient if i separate different kinds of parameters into different tables?
eg.
create table terminals
id
visa_parameter_id
debit_parameter_id
.
.
.
create table visa_parameters
id
parameter_1
parameter_2
.
.
.
For what i can gather, it won't increase the record number significantly. In fact, it might even reduce the required space, since most terminals use the same set of parameter, and in this structure one set of parameter can be referenced to different terminals easily.
What I am wondering is whether this structure would suffer the same pitfall as the credit card transaction schema, namely sacrificing efficiency for space.
TIA
Mark
|
|

07-16-03, 06:27
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 4,874
|
|
Quote:
Originally posted by mchih
thanks for the advice.
At the moment we will likely use Oracle9i for the production db. so it's right in your alley, andrewst :P
Right now I am looking at some credit card termial management schema.
it also has about 100~200 parameters per record. what's different from the credit card transactions is that they seem to fit into groups
eg:
create table terminals
id
visa_parameter1
visa_parameter2
.
.
.
debit_parameter1
debit_parameter2
.
.
.
In this case, would it be more efficient if i separate different kinds of parameters into different tables?
eg.
create table terminals
id
visa_parameter_id
debit_parameter_id
.
.
.
create table visa_parameters
id
parameter_1
parameter_2
.
.
.
For what i can gather, it won't increase the record number significantly. In fact, it might even reduce the required space, since most terminals use the same set of parameter, and in this structure one set of parameter can be referenced to different terminals easily.
What I am wondering is whether this structure would suffer the same pitfall as the credit card transaction schema, namely sacrificing efficiency for space.
TIA
Mark
|
I can't give a definitive answer without knowing the data in a lot more detail than is possible here, but there may be little to gain by splitting this table. Efficiency of querying and using the tables is generally much more important than reducing space. Consider how the data is used and then choose the design that best supports that. Remember, empty columns use up very little space - in fact, no space at all if they are at the end of the record.
|
|

07-22-03, 23:00
|
|
Registered User
|
|
Join Date: Nov 2002
Posts: 98
|
|
what i meant by reducing space is more like reducing the number of records. ie trying to find some parameters that have some kind of grouping, and then put them into a separate table and then linking it with a foreign key. so if I were to search for those parameters, I can get a quicker response since there would be less records there.
for example:
create table terminals
(
parameter1
parameter2
....
area_id
)
create table areas
(
area_parameter1
area_parameter2
....
)
terminals within the same area would have the same parameter.
Can you comment on this particular design?
TIA
Mark
|
|

07-23-03, 06:07
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 4,874
|
|
Quote:
Originally posted by mchih
what i meant by reducing space is more like reducing the number of records. ie trying to find some parameters that have some kind of grouping, and then put them into a separate table and then linking it with a foreign key. so if I were to search for those parameters, I can get a quicker response since there would be less records there.
for example:
create table terminals
(
parameter1
parameter2
....
area_id
)
create table areas
(
area_parameter1
area_parameter2
....
)
terminals within the same area would have the same parameter.
Can you comment on this particular design?
TIA
Mark
|
Sounds reasonable - in fact, what you are doing is standard normalisation: if some parameters are functionally dependent on the area to which a terminal belongs rather than on the individual terminal, then they should be normalised to a separate area table.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|