  1. #1
    Join Date
    Jan 2008
    Posts
    11

    Conditional Foreign Keys

    We have a model that is maintained on multiple database systems. In each of these systems we intend to enforce the standard foreign key methodology for the local data. However, we also replicate the data to other systems for reporting and lookup purposes.

    The data is identified by a source system identifier, so we always know which system "owns" the data. So if the source system identifier is "Jacksonville", that data can be modified on the Jacksonville system. If the source system identifier is "Atlanta", that data can be modified in Atlanta. The "Jacksonville" data cannot be altered in any way in Atlanta, and the "Atlanta" data cannot be modified in Jacksonville. So we want our Foreign Keys to constrain only the local data.

    While we would like the Foreign Keys to enforce all the data, there are sometimes issues with the data replication that cause data to be lost or out of sync. We are willing to accept that the quality of non-native data may be less than accurate, but we cannot allow the failures to cripple the entire data movement process.

    So we have proposed using a Conditional Foreign Key model. While the database doesn't support this, we can engineer it through Check Constraints. Has anyone seen this done? What pitfalls and/or gotchas are there, obviously aside from the inaccurate non-native data? :-)

  2. #2
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    Well you already have several (source) messes, plus you have replicated the messes; you are already suffering in quality ... but you want to add another order of messiness to it. The whole thing is a mess (now) and if you add another layer of mess on top of it, you will get mess squared. You may need more machines as well. I speak from several assignments of untangling such messes.

    I have seen it done with Check Constraints and triggers and procs. None of them work well, and they all have a heavy admin burden.

    Why don't you take the mess you have now as being unacceptable (it is!), and fix that once and for all. Get a single real server that can handle all your sources; implement one single database that has enough bandwidth to be accessed from the remote (now source) sites; support OLTP and DSS all from the one place. One fact in one place. Not one fact copied to five places, and lost sometimes. Zero replication except for DR.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    It does sound like a mess mate - are there any mitigating reasons for having multiple databases each holding a share of the data? Do you know where the data errors are occurring in your system?

  4. #4
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    HogHunter,
    It seems like you want to use constraints as a data security measure rather than for data integrity. If you can't change the architecture then I suggest you use the security features of your DBMS rather than rely on CHECK constraints.

    Most DBMSs will let you create views over some subsets of the data and then grant permissions only on those views rather than on tables.
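
    For example, something along these lines would do it in SQL Server - all the table, column and role names below are made up, just to show the shape of it:

    Code:
    -- Hypothetical sketch: a view restricted to the rows the Atlanta site owns.
    CREATE VIEW dbo.CUSTOMER_ATLANTA
    AS
    SELECT CUSTOMER_ID, CUSTOMER_NAME, SOURCE_SYSTEM
    FROM dbo.CUSTOMER
    WHERE SOURCE_SYSTEM = 'Atlanta';
    GO

    -- Readers see everything; writers only get DML rights through the view.
    GRANT SELECT ON dbo.CUSTOMER TO atlanta_writers;
    GRANT INSERT, UPDATE, DELETE ON dbo.CUSTOMER_ATLANTA TO atlanta_writers;
    GO

    Because the view and the table have the same owner, the writers never need any write permission on the base table itself.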

  5. #5
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by Derek Asirvadem View Post
    Well you already have several (source) messes, plus you have replicated the messes; you are already suffering in quality ... but you want to add another order of messiness to it. The whole thing is a mess (now) and if you add another layer of mess on top of it, you will get mess squared. You may need more machines as well. I speak from several assignments of untangling such messes.

    I have seen it done with Check Constraints and triggers and procs. None of them work well, and they all have a heavy admin burden.

    Why don't you take the mess you have now as being unacceptable (it is!), and fix that once and for all. Get a single real server that can handle all your sources; implement one single database that has enough bandwidth to be accessed from the remote (now source) sites; support OLTP and DSS all from the one place. One fact in one place. Not one fact copied to five places, and lost sometimes. Zero replication except for DR.
    Thanks for the feedback. Could you tell me what problems you have seen with constraints and procs? I know about issues with triggers but not with constraints or procs.


    In fact, we do not have any mess. Our data integrity is solid. We do not have any existing bad data. We currently manage between 12,000 and 15,000 transactions a day that incorporate roughly 200,000-300,000 rows of data. None of it is bad.

    The problem lies with the constraints put on us by our existing infrastructure and the business problem presented. A major component of our infrastructure is a data movement tool that handles real-time data movement. However, since the tool is moving the data over long distances and within a heterogeneous environment, it is unable to move the data as an atomic transaction. That being said, if we are assured that the local database system manages integrity over the data within its system of record, then we know that the source data is good.

    The problem becomes managing data movement, dealing with individual failures, and ensuring they are properly handled. The data movement tools are very robust: they deal with data movement very well and handle failures and recovery quite well. The problem is they do not have the ability to process the data in the exact order it was written locally, often due to a hiccup in communications. Once a parent record is not written, the children will also fail until the recovery occurs. Since the recovery may take a few minutes, data will queue up too rapidly to be acceptable. These parameters are not negotiable; ergo, I am forced to engineer a solution within these bounds.

    A single database is not possible due to a variety of business problems, including but not limited to distance and alternate domains. Essentially, part of the purpose of the system is to address a legacy issue. To address the issue, a front end was built to collect the data in a more timely and maintainable manner. However, once the data is captured, timeliness is no longer an issue. The issue then becomes integration into the legacy system and interactive maintenance for a brief period until the legacy system takes full control over the data.

    The option of a homogeneous system is not in any way available.

    Based on these business constraints, and assuming that the local systems will be bound by check constraints that ensure data that originates locally is referentially clean, I am asking if this makes sense or if there is a better way to handle this. Re-engineering the architecture is out of the question.
    Last edited by HogHunter; 11-10-09 at 16:03.

  6. #6
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    I don't see how it's possible to do what you want using CHECK constraints. Except for trivial cases it's not really possible to use a CHECK constraint to prevent data being changed because they only validate the intended state of the data. It's not possible to write a CHECK constraint that says "this value used to be X therefore you can't change it to Y". That's called a transition constraint and although some (not all) systems do support transition constraints they are not the same as CHECK constraints.

    You could use views and the WITH CHECK OPTION feature if your DBMS supports it. But if you are going to use views anyway then why not just grant or deny permission on those views to allow only the updates you want. That's what views and permissions are designed for.
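
    To illustrate (same made-up names as my earlier example), WITH CHECK OPTION rejects any insert or update through the view that would fall outside the view's WHERE clause:

    Code:
    -- Hypothetical sketch: DML through this view must keep SOURCE_SYSTEM = 'Atlanta',
    -- otherwise WITH CHECK OPTION raises an error.
    CREATE VIEW dbo.CUSTOMER_ATLANTA
    AS
    SELECT CUSTOMER_ID, CUSTOMER_NAME, SOURCE_SYSTEM
    FROM dbo.CUSTOMER
    WHERE SOURCE_SYSTEM = 'Atlanta'
    WITH CHECK OPTION;
    GO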

  7. #7
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by mike_bike_kite View Post
    It does sound like a mess mate - are there any mitigating reasons for having multiple databases each holding a share of the data? Do you know where the data errors are occurring in your system?

    My response to Derek details the major issues, but I wanted to thank you for posting the question and to reply to it directly. The issue we face is that, occasionally, a hiccup occurs during data movement. When that happens, a parent record may not have been written. If we try to write the children while a FK exists, the children will fail as well. The data movement tool can identify and recover from errors, so the bad data will eventually be repaired. However, if the FKs cause additional failures, the cascading result can become overwhelming. Also, since the data can only be edited at the place it was created, there is no concern that the bad data could represent a problem for the target system from an editing point of view. Of course it does pose the problem that attempts to view the data may be problematic. That risk is acceptable for two reasons: first, the failures are rare, and second, most of the data is inconsequential on the target side. On the flip side, when a FK exists and the data gets out of sync, recovering and resyncing requires a monumental effort that would be dramatically simplified if the non-native data could be bulk moved without regard for referential integrity, especially if we know that at the end of the bulk process everything will be back in line.

    I hope that has cleared up the problem rather than muddied it. :-)

  8. #8
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by dportas View Post
    I don't see how it's possible to do what you want using CHECK constraints. Except for trivial cases it's not really possible to use a CHECK constraint to prevent data being changed because they only validate the intended state of the data. It's not possible to write a CHECK constraint that says "this value used to be X therefore you can't change it to Y". That's called a transition constraint and although some (not all) systems do support transition constraints they are not the same as CHECK constraints.

    You could use views and the WITH CHECK OPTION feature if your DBMS supports it. But if you are going to use views anyway then why not just grant or deny permission on those views to allow only the updates you want. That's what views and permissions are designed for.
    Hmm, my understanding was that a check constraint could be written to validate data and if the data is invalid, it would cause the update/insert to fail. Am I wrong in my understanding? My platforms are MS SQL and DB2.
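
    For instance, I was picturing something as simple as this (throwaway example, nothing to do with our real schema):

    Code:
    -- The INSERT below fails because the CHECK rejects the value.
    -- Table and column names are made up for the example.
    CREATE TABLE dbo.DEMO
    (
        ID            int         NOT NULL PRIMARY KEY,
        SOURCE_SYSTEM varchar(20) NOT NULL,
        CONSTRAINT ck_DEMO_SOURCE_SYSTEM
            CHECK (SOURCE_SYSTEM IN ('Jacksonville', 'Atlanta'))
    );
    GO

    INSERT INTO dbo.DEMO (ID, SOURCE_SYSTEM) VALUES (1, 'Chicago');  -- rejected by the CHECK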

  9. #9
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by HogHunter View Post
    Hmm, my understanding was that a check constraint could be written to validate data and if the data is invalid, it would cause the update/insert to fail. Am I wrong in my understanding? My platforms are MS SQL and DB2.
    Yes but that won't stop Atlanta modifying data that belongs to Jacksonville because if "Jacksonville" is a permitted value then Atlanta can insert that value too. And if "Jacksonville" is NOT a permitted value then Jacksonville data couldn't be included in Atlanta's database at all (in which case just leave the data out and don't bother with the constraint).

    Maybe if you give a small example with some sample data we could understand what you mean.

  10. #10
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by HogHunter View Post
    If we try to write the children while a FK exists, the children will fail as well. The data movement tool can identify and recover from errors, so the bad data will eventually be repaired. However, if the FKs cause additional failures, the cascading result can become overwhelming. Also, since the data can only be edited at the place it was created, there is no concern that the bad data could represent a problem for the target system from an editing point of view. Of course it does pose the problem that attempts to view the data may be problematic. That risk is acceptable for two reasons: first, the failures are rare, and second, most of the data is inconsequential on the target side. On the flip side, when a FK exists and the data gets out of sync, recovering and resyncing requires a monumental effort that would be dramatically simplified if the non-native data could be bulk moved without regard for referential integrity, especially if we know that at the end of the bulk process everything will be back in line.
    Could you run a stored procedure to examine the transferred data (at each site) and add an artificial parent, with a time stamp, if none exists? I assume that when the parent data does finally get sent, your existing processes would update the artificial parent data with the real data. If any artificial parents exist for longer than an hour (let's say) then you can flag an alert.
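
    Something vaguely like this is what I had in mind - every table and column name below is invented:

    Code:
    -- Rough sketch only: create a placeholder parent for any child whose real
    -- parent hasn't arrived yet, stamped so it can be reconciled later.
    INSERT INTO dbo.PARENT (PARENT_ID, IS_PLACEHOLDER, CREATED_AT)
    SELECT DISTINCT c.PARENT_ID, 1, GETDATE()
    FROM dbo.CHILD AS c
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.PARENT AS p
                      WHERE p.PARENT_ID = c.PARENT_ID);

    -- Flag any placeholder that has been waiting for more than an hour.
    SELECT PARENT_ID, CREATED_AT
    FROM dbo.PARENT
    WHERE IS_PLACEHOLDER = 1
      AND CREATED_AT < DATEADD(HOUR, -1, GETDATE());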

    I'll also ask what would happen if you just got rid of the FKs and kept your existing processes unchanged.

  11. #11
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by mike_bike_kite View Post
    Could you run a stored procedure to examine the transferred data (at each site) and add an artificial parent, with a time stamp, if none exists? I assume that when the parent data does finally get sent, your existing processes would update the artificial parent data with the real data. If any artificial parents exist for longer than an hour (let's say) then you can flag an alert.

    I'll also ask what would happen if you just got rid of the FKs and kept your existing processes unchanged.
    Mike,

    Thanks for the suggestion. In fact, that is one of the proposed alternative solutions; both alternatives rely on a scheduled task to validate the data and remove the FKs. We are considering that, but the issue is we then lose the database-level integrity constraint locally. With check constraints, we ensure data integrity at the point where it is created and edited, and we trust the data movement to maintain that integrity as a function of its operation. The model is kind of like a data warehouse model: in a data warehouse, since you know you only have one source for data manipulation, it is common to eliminate FKs. In our situation, we are treating the local data set as transactional, thus requiring constraint/database-level integrity management, and the non-local data as having a single source for data manipulation.

    There is a strong likelihood that the team that wants no constraints plus a scheduled management proc will win the day. That is more performant as well; it just exposes us to serious data issues.
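
    For what it's worth, the scheduled validation they propose is essentially an orphan report, something along these lines (sketch only, the names are made up):

    Code:
    -- Sketch: report child rows whose parent never arrived, per source system.
    SELECT c.SOURCE_SYSTEM_ID, c.PARENT_ID, COUNT(*) AS orphan_rows
    FROM dbo.CHILD AS c
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.PARENT AS p
                      WHERE p.PARENT_ID = c.PARENT_ID)
    GROUP BY c.SOURCE_SYSTEM_ID, c.PARENT_ID;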

  12. #12
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by dportas View Post
    Yes but that won't stop Atlanta modifying data that belongs to Jacksonville because if "Jacksonville" is a permitted value then Atlanta can insert that value too. And if "Jacksonville" is NOT a permitted value then Jacksonville data couldn't be included in Atlanta's database at all (in which case just leave the data out and don't bother with the constraint).

    Maybe if you give a small example with some sample data we could understand what you mean.
    Ah, I failed to mention that a business rule defines that constraint. Only data that is local may be edited by local systems. All applications are required to have this functionality, and our QA team tests for it in every baseline test. That is how we ensure a "system of record". In any mixed-system environment you must establish baseline rules for who owns the data; without that you have chaos. Sorry for omitting that critical piece of understanding. It is so fundamental to our existence that I forgot to mention it. It is a rule that is taken so seriously that violations can result in serious discipline, including termination.

  13. #13
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    So if I understand it, you already have the logic that evaluates whether an update can happen or not based on who they are. That's not really logic that belongs in a CHECK constraint though because it's not a deterministic constraint at all, right?

    CHECK constraints are only checked on INSERT or UPDATE, not on DELETE. So you can prevent users updating other people's data but you can't stop them deleting it.
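
    A quick throwaway demo of the DELETE point (all names made up; the constraint is added WITH NOCHECK only so the replicated row can already be sitting in the table):

    Code:
    CREATE TABLE dbo.DEMO2
    (
        ID            int         NOT NULL PRIMARY KEY,
        SOURCE_SYSTEM varchar(20) NOT NULL
    );
    GO

    -- A replicated, non-local row is already present.
    INSERT INTO dbo.DEMO2 (ID, SOURCE_SYSTEM) VALUES (1, 'Jacksonville');
    GO

    -- From here on, only locally-owned rows may be written.
    ALTER TABLE dbo.DEMO2 WITH NOCHECK
        ADD CONSTRAINT ck_DEMO2_local CHECK (SOURCE_SYSTEM = 'Atlanta');
    GO

    UPDATE dbo.DEMO2 SET SOURCE_SYSTEM = 'Jacksonville' WHERE ID = 1;  -- blocked by the CHECK
    DELETE FROM dbo.DEMO2 WHERE ID = 1;                                -- goes straight through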

  14. #14
    Join Date
    Jan 2008
    Posts
    11
    Quote Originally Posted by dportas View Post
    So if I understand it, you already have the logic that evaluates whether an update can happen or not based on who they are. That's not really logic that belongs in a CHECK constraint though because it's not a deterministic constraint at all, right?

    CHECK constraints are only checked on INSERT or UPDATE, not on DELETE. So you can prevent users updating other people's data but you can't stop them deleting it.

    The control over "who" can update/add/delete which data is not managed at the check constraint level. It is managed at the business rules layer. The purpose of the suggested check constraints is purely to replace the referential integrity properties of Foreign Key constraints, the difference being that they apply the referential integrity check only to the data that belongs to the set of data managed within this system. In other words, we are establishing a FK for the "system of record", not the entire table/data set. Foreign Keys can only operate across the entire table, and we want to work on subsets.

    Every table has a column called a source system identifier (SSID). It is set to a value determined by the location the data was created from. So data created in Jacksonville will have an SSID of "JAX" and data created in Atlanta will have an SSID of "ATL". Programs running in Jacksonville can display any data, but they may only alter data that has an SSID of "JAX". Conversely, programs running in Atlanta can display all data but can only modify data with an SSID of "ATL". The controls for this are maintained separately from the database, at a layer we call the DAL, or Data Access Layer. This layer is built in either C# or Java depending on the program environment. We have a standard that does not allow business rules to be embedded in the database. That design decision is one that is beyond the scope of this problem.

    To restate the problem/question: I have an environment where normal foreign keys cannot be reliably used due to issues that arise solely out of communicating data via a replication model that is not able to maintain transactional integrity. However, I can be assured that the data will eventually be synchronized accurately. What I would like is a way to establish referential integrity over a subset of the data, based on a predetermined Source System Identifier. If the data comes from an external source we will assume the external system is managing referential integrity and trust the incoming data implicitly. However, we want to establish controls at the local level similar to foreign keys.

    We cannot split the data into separate databases and we cannot eliminate the multiple per-location instances. The data must be merged from heterogeneous systems. My view was to establish a two-way constraint such as the following:

    Code:
    -- Calls a udf that looks for the Code in use by a child row
    -- (prevents orphans). Note the udf must be schema-qualified inside a constraint.

    ALTER TABLE VALID_CODES WITH CHECK
        ADD CONSTRAINT con_FK_VALID_CODES_in_use
        CHECK ((dbo.udf_FK_VALID_CODES_in_use(VALID_CD) = 'T'))
    GO

    ALTER TABLE VALID_CODES
    CHECK CONSTRAINT con_FK_VALID_CODES_in_use
    GO

    -- Calls a udf to see if the Code being used in the child table exists in the parent
    -- (prevents bad data)
    ALTER TABLE PEOPLE WITH CHECK
        ADD CONSTRAINT con_FK_VALID_CODES_apply_to_PEOPLE
        CHECK ((dbo.udf_FK_VALID_CODES_apply_to_PEOPLE(VALID_CD) = 'T'))
    GO

    ALTER TABLE PEOPLE
    CHECK CONSTRAINT con_FK_VALID_CODES_apply_to_PEOPLE
    GO
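
    The UDFs aren't shown above. Stripped down, the second one is essentially just an existence check, roughly like the sketch below; the real version would also have to take the row's SSID into account so that only locally-owned rows are enforced, and the data type is illustrative:

    Code:
    -- Simplified sketch of the parent-exists check called by the PEOPLE constraint.
    -- The SSID handling described above is omitted here.
    CREATE FUNCTION dbo.udf_FK_VALID_CODES_apply_to_PEOPLE (@valid_cd varchar(10))
    RETURNS char(1)
    AS
    BEGIN
        RETURN CASE WHEN EXISTS (SELECT 1
                                 FROM dbo.VALID_CODES
                                 WHERE VALID_CD = @valid_cd)
                    THEN 'T' ELSE 'F' END;
    END
    GO

    udf_FK_VALID_CODES_in_use is meant to be the mirror image, checking PEOPLE for rows that still reference the code.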

  15. #15
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    I don't think the check to prevent orphan rows will work, for three reasons:

    - CHECK constraints don't get validated when you DELETE a row.
    - When you UPDATE, only the new value gets checked, not the existing one, so I don't see how you can check whether the existing value is in use.
    - In Microsoft SQL Server, CHECK constraints are evaluated on a per-row basis, which breaks atomicity. If the UPDATE changes more than one row, you could fail the update on the first row even though the effect of the whole statement doesn't violate the constraint. This is a major flaw in the way SQL Server constraints work.

    You could use an ASSERTION, except that Microsoft SQL Server doesn't support them. I think you'll have to use a proc or trigger instead.
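
    A trigger version might look roughly like this - untested sketch, where 'ATL' stands in for however you identify the local site and the table names are taken from your example:

    Code:
    -- Rough sketch: enforce the parent check set-based, and only for rows the
    -- local site owns ('ATL' is assumed to be the local SSID here).
    CREATE TRIGGER dbo.trg_PEOPLE_valid_cd
    ON dbo.PEOPLE
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        IF EXISTS (SELECT 1
                   FROM inserted AS i
                   WHERE i.SSID = 'ATL'              -- only enforce locally-owned rows
                     AND NOT EXISTS (SELECT 1
                                     FROM dbo.VALID_CODES AS vc
                                     WHERE vc.VALID_CD = i.VALID_CD))
        BEGIN
            RAISERROR ('VALID_CD does not exist in VALID_CODES for a locally-owned row.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END
    GO

    You would need a matching AFTER DELETE trigger on VALID_CODES to stop a code being deleted while locally-owned PEOPLE rows still reference it - that's the case a CHECK constraint can never cover.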
