If a table contains duplicate data, I assume the database level pkey is a sequence generated unique number. But how is it possible to find a record inspite of having a business pkey (please see the example scenario below).
Also what is best way to check for the existence of a record to determine to update or insert a record.
Ex: If a nurse takes note of the results of the two tests like Blood pressure and pulse taken at a cetain time.
Later if nurse realizes that blood pressure value was entered wrong, she would enter the correct blood pressure value but expects the first value to be in databse with an entered an error flag.
In this case, I assume that there would be obserations table with sequencially generated number and patient id and foriegn key. The business key could be a composite pkey of patient id & observation type and time at which the observationis taken.
Please let me know the best optimal way to check for the existence of the observation record to determine update or insert into the table. I was thinking the select count(*) from observations where (mentioning all the fields values of that table(except the unique number that is auto generated since it is not avialable for the user) would be very inefficent as the where clause has all fields including the the non key values.
Why do you say "where (mentioning all the fields values of that table(except the unique number that is auto generated since it is not avialable for the user)" when all you would need is "where (business key) = (value)"? i.e.
where patient_id = ?
and observation_type = ?
and observation_time = ?
The most efficient way to code the check probably depends on your DBMS. For example, in Oracle you could use the MERGE command designed specially for this kind of update, or you could do this:
set value = l_value
where patient_id = l_patient_id
and observation_type = l_type
and observation_time = l_time;
if sql%rowcount = 0 then
-- Updated 0 rows
insert into observation (...) values (...);
Let's use the more standard terminology:
A Natural Key is what you are calling the "business key": i.e. a set of columns that "naturally" belongs to and uniquely identifies the rows in a table.
A Surrogate Key is what you called "the unique number that is auto generated since it is not avialable for the user".
1) In many cases the primary key can be defined using the natural key, e.g. if orders are identified by the business via an "Order Number", then that Order Number can also serve as the primary key.
To a large extent the decision of whether or not to use a surrogate key is a matter of taste: some people will never use them, some people will always use them, and others will mix and match.
Some cases when surrogate keys may be useful:
When you really can't identify a set of columns that can uniquely identify a row. A common example is People: what uniquely identifies a person? Two different people can have the same name, date of birth and address, at least in theory.
When the natural key is overly cumbersome: for example, even if (surname, forename, middle_name, dob, zip_code, address_line_1) was sure to be unique, would you really want to use it in all your foreign keys, queries and updates?
When the natural key is volatile, if cascading updates of primary keys to child tables is problematic in your DBMS (e.g. Oracle doesn't have cascading updates, you have to write a lot of code yourself to handle them).
When your application building tool of choice is designed to work best (or only) with single-column keys. (Though you could then consider getting a bettrer tool!)
The golden rule is that wherever you use a surrogate as the primary key, you must also declare UNIQUE constraints for the natural key(s). Otherwise you can end up with the same (duplicate) data recorded several times with different surrogate key values.
2) I'm not sure what this question means - can you give an example?
We identified two business scenarios in the application:
1) Record update: If at time T1 a record R1 is persisted in the database, at later point of time, if the user realizes that the values in certain column(s) of R1 was wrong, user marks the record as errorenous record by updating a boolean column called -error record with value 'Y'.
Then recreates another record R2 which is identical to R1 except with corrected values in certain column(s). This scenario was labled as Record update.
Also in this case, I wonder if a surrogate or a mix of surrogate and natural pkey are needed ???
2) Column(s) update: At time T1 a record is created and some of the columns are left blank as some info. at that point of time is not available in the business flow. At time T2, the value(s) of the empty column(s) are available, then the same record is updated with the values. This scenario was
labeld as Column update.
Also in this case, I assume that natural pkey is suffice???
1) Why would you keep erroneous copies of data in the same table? Why not move it out to a separate (audit trail) table if you need to keep it at all - making querying on valid data a lot simpler and more efficient. By keeping it all in one table you force the need for a surrogate key (your table will contain duplicates by any other business key), and your business key can't be enforced by a simple primary/unique constraint (instead you will have to ensure that there is only 1 row with error='N' per business key). Not nice.
I guess nothing beats the knowledge from experience...Thanks for the prompt replies...very informative.
I have worked mostly on the business layer but currentely working on database tier too and thus you might find my database questions very basic;
I have one more question:
The question is about the strategies to check for the record existence to determine insert or update.
I have seen two scenarios each in two tier and 3 tier enviornment respectively:
In 2 tier env., the check made to decide whether to insert or upate was based on select count(*) from x_table. If count is zero insert else update.
In 3 tier envoirnment, the check was boolean flag based.
A boolean flag 'isPersisted' is initially set by the system as false and is an attribute of the data transfer java object(DTO). The DTO is created by the client but the flag is not touched by the client. The database layer always returns the inserted/updated data in the form of DTO to handle the optmistic locking; During updates, the database layer team sets the flag 'isPersisted' to true. Thus in the persitence logic at the middle tier, if the flag is false, insert logic of the database layer is evoked else the update logic is evoked.
So my question is, are the above strategies one of the many strategies and are there any reasons to choose one over the above.
Are there any web sites/books related to pratical approaches/strategies related to database design/decision making patterns.