  1. #1
    Join Date
    Jan 2012
    Posts
    2

    Unanswered: Help Sybase insert millions of records

    Hello,

    I have a comma-separated file of approx 60MB which has 4.4 million records in it. The following is how I load the data into the final table, and the performance is very slow: it takes almost 15 minutes to load the 4.4 million records.

    1) BCP the file into the pm_volcurves_histdata working table, which does not have any index on it. The BCP of 4.4 million rows into the working pm_volcurves_histdata table takes approx 3 minutes.

    2) Then I call a stored procedure which loads the data from the pm_volcurves_histdata working table into the actual pm_volcurves_data table. The indexes on the actual table are dropped prior to executing the stored procedure. The following is the code for the stored procedure; it basically processes 40,000 rows at a time.

    Code:
    select @cnt = count(1) from pm_volcurves_histdata

    while ( @cnt > 0 )
    begin
        begin tran

        /* Limit the following select and delete to a batch of 40,000 rows */
        set rowcount 40000

        /* Copy a batch of 40,000 rows from pm_volcurves_histdata into #temp_pm_volcurves_data */
        select curve_id, commodity, currency, rundate, term, strike, volatility
        into #temp_pm_volcurves_data
        from pm_volcurves_histdata

        create index #temp_pm_volcurves_data_idx on #temp_pm_volcurves_data (rundate, curve_id)

        /* Delete the same batch of 40,000 rows from pm_volcurves_histdata */
        delete from pm_volcurves_histdata

        set rowcount 0
        commit tran

        begin tran

        /* Insert the curve if it is not present in the pm_volcurves table */
        insert into pm_volcurves (curve_id, commodity, currency)
        select distinct curve_id, commodity, currency
        from #temp_pm_volcurves_data a
        where not exists (select b.curve_id from pm_volcurves b where b.curve_id = a.curve_id)

        /* Insert the batch into the pm_volcurves_data table */
        insert into pm_volcurves_data (sequence_no, rundate, term, strike, volatility)
        select b.sequence_no, a.rundate, a.term, a.strike, a.volatility
        from #temp_pm_volcurves_data a, pm_volcurves b
        where a.curve_id = b.curve_id

        commit tran

        drop table #temp_pm_volcurves_data
        select @cnt = count(1) from pm_volcurves_histdata
    end
    I printed out the time for each iteration of the loop; each iteration takes approx 3 seconds and processes 40,000 rows.
    I would really appreciate your help. There is a requirement to load another such data set, which has 65 million records.
    Last edited by srattani; 01-16-12 at 23:04.

  2. #2
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    You don't seem to use the index on the temp table:
    idx on #temp_pm_volcurves_data(rundate, curve_id)
    Your joins only use curve_id, so try an index on curve_id only.
    I don't know what your other indexes look like, so I can't comment on those.
    Try bigger batches.

    Or, depending on your data, maybe try using batches of curve_id and rundate to eliminate the temp table and the delete step.
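
    Something like this, for example (just a rough sketch, not tested; column names are taken from your proc, it assumes rundate is a datetime, and it loops one rundate at a time straight out of the staging table):

    Code:
    declare @rundate datetime
    select @rundate = min(rundate) from pm_volcurves_histdata

    while ( @rundate is not null )
    begin
        begin tran

        /* add any curves for this rundate that are not in pm_volcurves yet */
        insert into pm_volcurves (curve_id, commodity, currency)
        select distinct curve_id, commodity, currency
        from pm_volcurves_histdata a
        where a.rundate = @rundate
          and not exists (select 1 from pm_volcurves b where b.curve_id = a.curve_id)

        /* load this rundate straight from the staging table: no temp table, no delete */
        insert into pm_volcurves_data (sequence_no, rundate, term, strike, volatility)
        select b.sequence_no, a.rundate, a.term, a.strike, a.volatility
        from pm_volcurves_histdata a, pm_volcurves b
        where a.rundate = @rundate
          and a.curve_id = b.curve_id

        commit tran

        /* move on to the next rundate */
        select @rundate = min(rundate) from pm_volcurves_histdata where rundate > @rundate
    end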

  3. #3
    Join Date
    Jan 2012
    Posts
    2
    Hello pdreyer,

    Thanks for your reply.

    I did take rundate out of the index, but it did not shave off any significant time; overall it saved roughly 10 seconds.

    How big of a batch size should I use?

    There are 4 unique curve ids and approx 1,000 different rundates. The problem is that, since I BCP the data from the file into the pm_volcurves_histdata table with no index on it (to make the BCP faster), any select I do on that table with a criteria such as where curve_id = this or rundate = that will do a full table scan over 4.4 million rows, which will make it even slower.

    Any other advice? I would really appreciate any help.

  4. #4
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    So create the index after the bcp in?
    The batch size depends on your log size.
    60MB for 4.4M rows doesn't seem that big.
    Watch your log space growth with the 40,000-row batches;
    maybe try batches of 300 thousand+.
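
    Roughly like this (untested; the index name and column choice are just a guess, pick columns to match whatever your batch selects filter on):

    Code:
    /* after the bcp in finishes, index the staging table so the batch selects don't scan 4.4M rows */
    create index pm_volcurves_histdata_idx on pm_volcurves_histdata (rundate, curve_id)
    update statistics pm_volcurves_histdata

    /* and in the proc, bump the batch size, e.g. */
    set rowcount 300000   /* instead of 40000; watch the log */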
