Page 1 of 2 12 LastLast
Results 1 to 15 of 23

Thread: Archiving data

  1. #1
    Join Date
    Nov 2009
    Posts
    2

    Unanswered: Archiving data

    I am creating a table that is going to have a lot of data in the future. If I have a flag column "Archived" in the table to differentiate between active data and archived, would that be enough to prevent any query performance issue or do I need to actually create a separate table and move old data to that table?

  2. #2
    Join Date
    Aug 2009
    Posts
    262
    "lot of data"

    then why creating a separate table ? .

    boolian condition (yes/no) or flag 1/0 is the best method for faster scan .

  3. #3
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    The size of your table will stay the same. So the RDBMS has the same amount of data to search trough. No speed gain.

    Declaring an index on the Archive column is quite useless: an index performs best when the column(s) it is defined on has many different values. Declaring an index on eg. gender ('male'/'female') is therefore useless, same here.
    ... do I need to actually create a separate table and move old data to that table?
    Yes. You may even consider moving it to a separate Archive database.

    If you have a data warehouse, it will have to combine both tables.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  4. #4
    Join Date
    Aug 2009
    Posts
    262
    it appears the boolean column is not what you have experienced . even in oracle , having no specific datatype for boolean .. we do it through varchar2 . and when i say we it means all the dba's around the globe including V.P or oracle corp , Tom.


    just to have a check weather a data is being archived or not. a simple column bearing the value of yes or no , 1/0 , Y/N tends up a quick and simple condition to resolve in a stored proceedure .

    inseat of passing a whole dataset into table or temp tables .
    a simple condition such as

    if flag != 'y'
    then do
    else if flag!='n'
    do this


    provided it is not documented that exist clause works faster .

    if any one here have been gone throu8gh the pain of tuning as a dba , you will agree flags are a better choice when we only need to classify a data set apart from another

    .


    waiting for mr. flump to prove me wrong .

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    George
    Home | Blog

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by mishaalsy View Post
    waiting for mr. flump to prove me wrong .
    You won't need to wait long. I have no idea why you keep answering SQL Server questions referring to what goes on in Oracle. I avoid DB2 and Oracle forums because I have no idea how to get the best out of them.

    Assuming you mean that the first query is quicker than the second:
    Code:
    SELECT this, that
    FROM thisTable
    WHERE booleanColumn = 1
    Code:
    SELECT this, that
    FROM thisTable
    then no, this is no quicker.

    The only exceptions would be if booleanColumn is indexed:
    1) There is a clustered index lead by the booleanColumn
    2) There is a covering index lead by the booleanColumn
    3) There is a non-covering index lead by the booleanColumn and there are so few rows where booleanColumn = 1 (depending on the data between 1 row to ~5% of the table) that it is more efficient to perform lookups than scan the clustered index\ heap

    If a significant proportion of the rows of the table have booleanColumn = 0 then you will save on network traffic, but I doubt that is what you meant.

    It isn't true that indexing boolean columns is useless or should never ever be done, but you'd be hard pressed to describe a situation where it would be worth while.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Hey there Mr. Flump,
    I have one! A table that is used to keep track of work, when the work is completed a flag is switched to off, when we need to see what work is yet to be accomplished we run
    Code:
    select cols... from work_table where work_flag = 'y'
    There is an index on the work_flag column and is used by the query. The thing of it is, is 99.8% of the data at any given time does not = Y. The same query looking for N is a scan of the entire table and useless for finding anything.
    Other than items of that nature, I completely agree with you that a flag or boolean condition is useless for speeding things up. Can it be used to filter data? Sure. The only way to speed things up in this case would be to mark almost all of the data as archived or remove the archived data from the table as you had already suggested. And, I work primarily on DB2, but this should be the same in any type of database repository, whether Oracle, Informix or paper in a file cabinet, spread out on desk or however you are holding your data.
    Dave Nance

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by dav1mo View Post
    Hey there Mr. Flump,
    I have one!
    Dave, If I was pressed then that would be exactly the sort of application I would use it for (I am assuming it falls in to the third category I mentioned, which is where the boolean is highly selective for one of its values). It illustrates that an index on a boolean is not "useless".

    Since this is a feature of B-Trees then your experience of DB2 is relevant since you are talking about something cross applicable to all enterprise RDBMSs, not some specific Oracle behaviour.

    In short, I'm not certain whether or not milshaay has requested you join this thread. What you post I agree with and is not in contradiction to what I posted.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Nov 2009
    Posts
    2
    dav1mo's scenario is exactly what I am facing. Can I use a flag column? If so should I create an index on the flag column?
    Last edited by awu25; 11-10-09 at 14:05.

  10. #10
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Using a flag column to designate deleted records rather than archiving the data to a separate table can have significant performance impact on large tables. You'll be doing a table scan every time you search for data.
    Archiving is my preference, though you can mitigate some of the performance impact of bit columns by creating indexed views of only active data.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  11. #11
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    And the real answer is......

    Create 2 Tables that are identical

    Keep your bit column if you like for your logical delete

    Create a trigger that will fire on an update

    Then move all rows from the current table to history if the bit flad is set to 1

    You can also create a view that is the union of both tables if you want to see everything

    MOO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  12. #12
    Join Date
    Aug 2009
    Posts
    262
    no mr.flump i have not invited any one . please get real

    I am currently working in a massive data handler organization . Oracle and sqlserver are running in parallel. oracle OLT db / warehouse . and sqlserver as the better half ( like a wife of oracle) .

    mr.flump indexing is not a greater practice that hey if we have data lets index it . NO.
    If you have worked as a DBA , tuning and performance is not about indexing every thing up. it comes after wards . and if this is hard for you to understand , contact your dba.

    I agree with mr. dav1mo , When archiving the primary issue is filtering.
    and when the data crosses the limit of large scale. it will eventually needed to be filtered. Creating tables separate for this purpose is DATA Replicating , an overhead in db which will not be approved by many dba.

    When i say a simple flag column , it is said for filtering purpose , to classify archived (dead) data apart from the archiving data( dying data) and the ALIVE data . (nice huh )
    And the real answer is......

    Create 2 Tables that are identical

    Keep your bit column if you like for your logical delete

    Create a trigger that will fire on an update

    Then move all rows from the current table to history if the bit flad is set to 1

    You can also create a view that is the union of both tables if you want to see everything

    MOO
    the basic of a developer , regardless of database software is the calculation of bits/byts . creating two table identical, yes it looks good at home pc , or stand alone database , but not in a large scale organization .

    create a trigger to register the flag , yes but this comes after wards when you decide you are going to flag or not .
    views are the overload , i donot avoid them but they are simply an overhead.

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by mishaalsy View Post
    no mr.flump i have not invited any one . please get real
    Ok.

    Quote Originally Posted by mishaalsy View Post
    mr.flump indexing is not a greater practice that hey if we have data lets index it . NO.
    Do you really think that is a revelation?

    Quote Originally Posted by mishaalsy View Post
    If you have worked as a DBA , tuning and performance is not about indexing every thing up. it comes after wards . and if this is hard for you to understand , contact your dba.
    You made me LOL.

    Quote Originally Posted by mishaalsy View Post
    Creating tables separate for this purpose is DATA Replicating
    Wrong
    Quote Originally Posted by mishaalsy View Post
    an overhead in db which will not be approved by many dba.
    Wrong

    Quote Originally Posted by mishaalsy View Post
    When i say a simple flag column , it is said for filtering purpose , to classify archived (dead) data apart from the archiving data( dying data) and the ALIVE data . (nice huh )
    No - not nice, unless it is indexed (which as I said is a rare requirement on a boolean). What you propose means that to identify your non-archived data you must scan the entire table (both "archived" and non-archived data). if this is hard for you to understand , contact your dba.

    Quote Originally Posted by mishaalsy View Post
    the basic of a developer , regardless of database software is the calculation of bits/byts
    Up till this point most of what you said at least made sense, which is a departure from your usual posts
    Quote Originally Posted by mishaalsy View Post
    creating two table identical, yes it looks good at home pc , or stand alone database , but not in a large scale organization .
    Wrong

    Quote Originally Posted by mishaalsy View Post
    views are [snip] simply an overhead.
    How do you come up with this rubbish?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by awu25 View Post
    dav1mo's scenario is exactly what I am facing. Can I use a flag column? If so should I create an index on the flag column?
    To address this rather than the squabbling, yes you can. If you do use a flag, then you must index it. However, whether or not this is an efficient course would require more information.

    The suggestion by blindman and Brett (having two tables) is the more typical approach to this and will work well almost whatever the specifics of your circumstances.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  15. #15
    Join Date
    Aug 2009
    Posts
    262
    thank you . as always you have proven me wrong . for which i am greatly thankful .

    i do would like to know are you a working proff. and in 0which kind of an organization you are.
    versions of your softwares and operations .

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •