| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

03-24-10, 07:03
|
|
Registered User
|
|
Join Date: May 2008
Posts: 33
|
|
|
Storing Actual history and database history
|
|
Hello,
I'm creating a table to track real changes of an object and also our knowledge of an object against time. For example, it could be some statistics about a town.
The important thing is that our version of real history of the object at any point in time is just as valuable as a version of history at any other point in time. So for example the current version of history is just as important as what we thought the history of the object was a year ago.
The kind of design I've come up with so far is:
Code:
--These will be constants that will never change since the town is created in the database
CREATE TABLE TownSeeds(
Id int Identity(1,1) CONSTRAINT PK_TownSeeds Primary Key,
EntryDate DateTime NOT NULL DEFAULT Current_TimeStamp
)
--stores changing attributes of the town
CREATE TABLE Towns(
Id int CONSTRAINT FK_TownsChanges_TownSeeds FOREIGN KEY REFERENCES dbo.TownSeeds(Id),
TownName varchar(255) NOT NULL,
Population float,
--other attributes...
--actual history of the town
ChangeDate DateTime NOT NULL,
NextChangeDate DateTime NOT NULL, --created to make queries easier
--stores our history of knowledge of the town
UpdateDate datetime NOT NULL,
NextUpdateDate datetime NOT NULL,
CONSTRAINT PK_Towns Primary Key(Id,UpdateDate)
)
Here is some example data being inserted:
Code:
--Create town with two points in history where population changes
INSERT INTO TownSeeds VALUES(DEFAULT,DEFAULT)
DECLARE @Town1Id int
SET @Town1Id=Scope_Identity()
DECLARE @EntryDate datetime
SET @EntryDate=(SELECT EntryDate FROM TownSeeds WHERE Id=@Town1Id)
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5000,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@EntryDate,
NextUpdateDate='99991231'
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=7000,
ChangeDate='19100101',
NextChangeDate='99991231',
UpdateDate=@EntryDate,
NextUpdateDate='99991231'
--Better research shows that the actual population was 5500 in 1905
DECLARE @UpdateDate datetime
SET @UpdateDate=CURRENT_TIMESTAMP
UPDATE Towns
SET NextUpdateDate=@UpdateDate
WHERE Id=@Town1Id AND UpdateDate=@EntryDate
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5500,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@UpdateDate,
NextUpdateDate='99991231'
I'm a little concerned it might be hard to apply constraints to this data with this added complexity.
I'd really appreciate some feedback on this design. I'm guessing this is a fairly common requirement so I'm hoping someone here will have some helpful feedback from their experience implementing something similar.
I've actually asked this question a while back here, but would appreciate a little more discussion.
Thanks,
Michael
|
|

03-25-10, 05:14
|
|
Registered User
|
|
Join Date: May 2008
Posts: 33
|
|
Please let me know if I haven't been clear on the question or you'd like more information. I actually asked the question in a different forum before so was hoping you guys might have more to say. I'd appreciate any response!
Michael
|
|

03-25-10, 06:00
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
|
|
There were some errors in your code - corrected.
Code:
IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.towns'))
BEGIN
DROP TABLE dbo.towns
END
IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.TownSeeds'))
BEGIN
DROP TABLE dbo.TownSeeds
END
--These will be constants that will never change since the town is created in the database
CREATE TABLE TownSeeds(
Id int Identity(1,1) CONSTRAINT PK_TownSeeds Primary Key,
EntryDate DateTime NOT NULL DEFAULT Current_TimeStamp
)
--stores changing attributes of the town
CREATE TABLE Towns(
Id int CONSTRAINT FK_TownsChanges_TownSeeds FOREIGN KEY REFERENCES dbo.TownSeeds(Id),
TownName varchar(255) NOT NULL,
Population float,
--other attributes...
--actual history of the town
ChangeDate DateTime NOT NULL,
NextChangeDate DateTime NOT NULL, --created to make queries easier
--stores our history of knowledge of the town
UpdateDate datetime NOT NULL,
NextUpdateDate datetime NOT NULL,
CONSTRAINT PK_Towns Primary Key(Id,UpdateDate)
)
DECLARE @Town1Id INT
--Create town with two points in history where population changes
INSERT INTO TownSeeds VALUES(DEFAULT)
SET @Town1Id=Scope_Identity()
DECLARE @EntryDate datetime
SET @EntryDate=(SELECT EntryDate FROM TownSeeds WHERE Id=@Town1Id)
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5000,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@EntryDate,
NextUpdateDate='99991231'
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=7000,
ChangeDate='19100101',
NextChangeDate='99991231',
UpdateDate=DATEADD(d, 1, @EntryDate),
NextUpdateDate='99991231'
--Better research shows that the actual population was 5500 in 1905
DECLARE @UpdateDate datetime
SET @UpdateDate=CURRENT_TIMESTAMP
UPDATE Towns
SET NextUpdateDate=@UpdateDate
WHERE Id=@Town1Id AND UpdateDate=@EntryDate
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5500,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@UpdateDate,
NextUpdateDate='99991231'
|
|

03-25-10, 06:05
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
Actually, I only corrected the compile errors - I think there are logical errors in your data.
|
|

03-25-10, 06:44
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
I don't see any reason to store the surrogate key (TownSeed) in a separate table. Also, I doubt Next...Date will help to speed up queries, while having this columns definitely introduces maintenance problems. Otherwise, it looks like a normal slowly changing dimension to me. Google for "slowly changing dimension" and you'll find plenty of information.
|
|

03-25-10, 13:30
|
|
Registered User
|
|
Join Date: May 2008
Posts: 270
|
|
Your design is pretty much there, but I'd suggest changing your table names to clarify some of the concepts.
Your 'TownSeeds' table is your time-invariant data, and is the "anchor" for your time-variant attributes (it's also what other tables will probably hold a foreign key to if they need a reference to a "town"). In fact, this table represents your Town, and I'd rename the table as such:
Town(town id, founded date, coordinates, ...)
Now we need a new name for time-variant table. I don't know exactly what you're recording here, but since it's census time here in the states, I'll take some inspiration from that:
TownCensus(town id, row valid date, row superceded date, from date, to date, town name, population, ...)
Your natural key for this table is the town and the time period during which the row is valid. You'll most likely need some sort of trigger to enforce it -- a simple primary key will be able to enforce unique dates, but can't prevent overlapping time periods, which you most likely don't want (you should only have one valid row at any one time). You'll probably also need to enforce that there's no overlap in your 'from' and 'to' dates for currently-valid rows, and possibly no gaps either.
One potential drawback to this design is that everytime you enter a change, you must provide values for all columns. If you don't expect changes to occur in tandem (only one attribute changes out of, say, a dozen), then you're going to be double-entering a lot of data to record the one change. Although this might still be desirable if you're entering in some sort of official record -- like census results -- in which case you may be interested in recording that there's been no change.
The solution to this, as proposed in the other forum, is to move each attribute into its own table, each with their own timestamp columns. That way you can record a population change without having to re-enter everything else that didn't change.
You can mix-and-match as well. Since I wouldn't expect town names to change very much (if at all), I might do something like:
Town(town_id, ...)
TownName(town_id, from_date, to_date, town_name, ...)
TownCensus(town_id, from_date, to_date, population, ...)
Unfortunately, there's no way around implementing a bunch of complex triggers to enforce all your various time constraints. Your joins also have the potential to become very complex. You may want to follow up in the appropriate DB-specific forum if you have implementation problems. There's also a very good book (though slightly outdated) that you may want to look at called 'Developing Time-Oriented Database Applications in SQL' by Richard Snodgrass. Handily enough, it's available for free download.
|
|

03-26-10, 07:44
|
|
Registered User
|
|
Join Date: May 2008
Posts: 33
|
|
Oh dear, I can't believe I hadn't checked this code. That's probably why I didn't receive as much discussion when I posted this issue last in the other forum. Here is the corrected version:
Code:
--These will be constants that will never change since the town is created in the database
CREATE TABLE TownSeeds(
Id int Identity(1,1) CONSTRAINT PK_TownSeeds Primary Key,
EntryDate DateTime NOT NULL DEFAULT Current_TimeStamp
)
--stores changing attributes of the town
CREATE TABLE Towns(
Id int CONSTRAINT FK_TownsChanges_TownSeeds FOREIGN KEY REFERENCES dbo.TownSeeds(Id),
TownName varchar(255) NOT NULL,
Population float,
--other attributes...
--actual history of the town
ChangeDate DateTime NOT NULL,
NextChangeDate DateTime NOT NULL, --created to make queries easier
--stores our history of knowledge of the town
UpdateDate datetime NOT NULL,
NextUpdateDate datetime NOT NULL,
CONSTRAINT PK_Towns Primary Key(Id,UpdateDate,ChangeDate)
)
--Create town with two points in history where population changes inserted a year ago
INSERT INTO TownSeeds VALUES(dateadd(year,-1,CURRENT_TIMESTAMP))
DECLARE @Town1Id int
SET @Town1Id=Scope_Identity()
DECLARE @EntryDate datetime
SET @EntryDate=(SELECT EntryDate FROM TownSeeds WHERE Id=@Town1Id)
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5000,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@EntryDate,
NextUpdateDate='99991231'
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=7000,
ChangeDate='19100101',
NextChangeDate='99991231',
UpdateDate=@EntryDate,
NextUpdateDate='99991231'
--Better research now shows that the actual population was 5500 in 1905
DECLARE @UpdateDate datetime
SET @UpdateDate=CURRENT_TIMESTAMP
UPDATE Towns
SET NextUpdateDate=@UpdateDate
WHERE Id=@Town1Id AND UpdateDate=@EntryDate
INSERT INTO Towns
SELECT
Id=@Town1Id,
TownName='test town',
Population=5500,
ChangeDate='19050101',
NextChangeDate='19100101',
UpdateDate=@UpdateDate,
NextUpdateDate='99991231'
The nextupdate and nextchangedate will help speed up queries as a common requirement is to pull up 1 row per town showing it's attributes on a particular date as we believed they were on particular certain date. For instance, we might want the population and the name of all towns for 10 years ago as we had them recorded a year ago. Having the Next... columns makes this easy:
Code:
DECLARE @RealHistoryDate datetime
SET @RealHistoryDate=DATEADD(Year,10,CURRENT_TIMESTAMP)
DECLARE @UpdateHistoryDate datetime
SET @UpdateHistoryDate=DATEADD(Year,-1,CURRENT_TIMESTAMP)
SELECT *
FROM Towns T
WHERE
T.ChangeDate<=@RealHistoryDate
AND
T.NextChangeDate>@RealHistoryDate
AND
T.UpdateDate<=@UpdateHistoryDate
AND
T.NextUpdateDate>@UpdateHistoryDate
Without the next... columns it would be difficult to pull up only one row for each town like in the example.
The "Seed" table is used for foreign key reference and for new id generation. I know it's not ideal but I thought this would be the least exotic method of doing both these things and also a place to store non-changing attributes. I haven't stored the founding date there as this might of been researched incorrectly and revised at a later date. Other things that might be considered storing in the "Seeds" table are columns that state a relationship that defines an object, altough I can't think of any examples at the moment.
Almost everything will be a changing attribute, and there are a lot more than the example shows. This is why I have chosen to call the history table "towns", as the seeds table will rarely be used in queries. This is also why it would be inpratical to have a table for each changing attribute, as this would means a table for each one, which will typically be around 20 different attributes and there are other objects that have relationships to towns, such as counties, that also have changing attributes.
Thanks for the link on the book, I've just downloaded it and will have a good look through.
Another question; the primary key in this changing attribute table is the Id,UpdateDate and ChangeDate. This is also a clustered index. Any column that relates to another object type with a high number of objects I.e. counties, has a non-clustered index on it. Is this a good way to index this table?
A few other notes; I am not actually storing information about towns, rather objects in the offshore oil and gas industry, but wanted to use an example that everyone is familiar with. Also, I am using SQL Server 2008, although this shouldn't really be an issue, as I expect the design should be the same on any database engine.
Thanks very much for your replies!
|
|

03-26-10, 07:55
|
|
Registered User
|
|
Join Date: May 2008
Posts: 33
|
|
And on the issue of checking data integrity I had thought of triggers, although they really aren't my favourite things. They just don't seem quite flexible enough and don't always produce expected results.
Another solution I had thought of is running a scheduled stored procedure that flags data not following certain rules. This means that transaction speed isn't as compromised and rules can be more complex with the scheduled sproc run at times of low database activity. I guess the issue is here that "flagged" data will have to be dealt with in some way. Perhaps the original researcher can be asked to correct it before it is marked as usable.
Let me know if you think this is a good approach.
Thank you
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|