Results 1 to 7 of 7
  1. #1
    Join Date
    Jun 2005
    Posts
    319

    Unanswered: Is there an easy way to ignore data on inserts that violate a rule?

    I am currently importing tick data for a stock.

    let's say my table structure is like this:

    CREATE TABLE tick
    (
    tickId bigint identity(1,1) primary key
    , tickTime datetime
    , price money
    )

    If the stream of data I get resembles:

    '4/17/12 2:00:00.000', 10.00
    '4/17/12 2:00:02.000', 10.02
    '4/17/12 2:00:01.000', 10.01
    '4/17/12 2:00:03.000', 10.03

    I want my table to look like this:

    1, '4/17/12 2:00:00.000', 10.00
    2, '4/17/12 2:00:02.000', 10.02
    3, '4/17/12 2:00:03.000', 10.03

    Essentially ignoring the out of place '4/17/12 2:00:01.000' record. What is the least expensive way to accomplish this?

  2. #2
    Join Date
    Jun 2005
    Posts
    319
    I can put a clustered index on the date field, but I am counting on tickId to be sequential so I don't think that will help.

  3. #3
    Join Date
    Jun 2005
    Posts
    319
    I think I got it, I am bulk inserting the data from file so I can...

    1) BULK insert raw_table
    2) run:
    Code:
    DECLARE @maxTickTime datetime
    SELECT @maxTickTime = MAX(tickTime) FROM tick
    
    INSERT tick
    SELECT TOP 100 PERCENT tickTime, price
    FROM raw_table R
    WHERE R.tickTime > @maxTickTime
    ORDER BY tickTime
    Last edited by Gagnon; 04-18-12 at 11:27.

  4. #4
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    I don't see how that code could work.
    First, the highest tickTime is searched.
    Second, only records with a tickTime later than the highest possible TickTime are inserted.

    No records will match, by definition: Who is the tallest guy in class? Now will all guys taller than the tallest guy raise their hands? Not even the tallest guy may raise his hand, because he is not taller than himself.

    Shouldn't your code be rather like this:
    Code:
    INSERT tick
    SELECT R.tickTime, R.price
    FROM raw_table R
    	INNER JOIN raw_table L ON
    		R.id = L.Id - 1
    WHERE R.tickTime < L.TickTime
    
    -- extra code to transfer the last record
    INSERT tick
    SELECT TOP 1 L.tickTime, L.price
    FROM raw_table L
    	INNER JOIN raw_table R ON
    		L.id = R.Id + 1
    WHERE L.tickTime > R.TickTime
    ORDER BY L.ID DESC
    The Raw_table also needs an IDENTITY column Id.

    Essentially ignoring the out of place '4/17/12 2:00:01.000' record.
    I interpret "ignoring" as "not inserting it".
    Last edited by Wim; 04-18-12 at 11:55.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Don't load your data directly into a production table.
    Load it into a staging table, and then you can muck about with it to your heart's content before pushing it into production.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Jun 2005
    Posts
    319
    Quote Originally Posted by Wim View Post
    I don't see how that code could work.
    First, the highest tickTime is searched.
    Second, only records with a tickTime later than the highest possible TickTime are inserted.

    No records will match, by definition: Who is the tallest guy in class? Now will all guys taller than the tallest guy raise their hands? Not even the tallest guy may raise his hand, because he is not taller than himself.

    Shouldn't your code be rather like this:
    Code:
    INSERT tick
    SELECT R.tickTime, R.price
    FROM raw_table R
    	INNER JOIN raw_table L ON
    		R.id = L.Id - 1
    WHERE R.tickTime < L.TickTime
    
    -- extra code to transfer the last record
    INSERT tick
    SELECT TOP 1 L.tickTime, L.price
    FROM raw_table L
    	INNER JOIN raw_table R ON
    		L.id = R.Id + 1
    WHERE L.tickTime > R.TickTime
    ORDER BY L.ID DESC
    The Raw_table also needs an IDENTITY column Id.

    I interpret "ignoring" as "not inserting it".
    Maybe my OP didn't provide enough detail. I have a .Net application that inserts 10 records at a time into a file then calls a stored proc to bulk insert the data. Your height comparison doesn't really fit since as new tick data is made available it is newer in date then the previous records. In a perfect world we would only be receiving data with newer date values.

    Originally I was thinking of putting a table constraint on the table and then just ignoring any new rows that failed the constraint automatically. The solution I came up with seems to work for now.

  7. #7
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    My mistake. Your code is correct.

    SELECT @maxTickTime = MAX(tickTime) FROM tick

    FROM raw_table L
    ...
    WHERE L.tickTime > R.TickTime

    @maxTickTime is extracted from another table that rw_table. I missed that.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •