If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Sybase > Help Sybase insert millions of records

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-16-12, 16:46
srattani srattani is offline
Registered User
 
Join Date: Jan 2012
Posts: 2
Help Sybase insert millions of records

Hello,

I have a comma separated file approx 60MB which has 4.4 million records in it. Following is how I load the data into concrete table, the performance is very slow. It takes almost 15 minutes to load the 4.4 million records.

1) BCP file into a pm_volcurves_histdata working table which does not have any index on it. The BCP for 4.4 million rows into working pm_volcurves_histdata table takes approx 3 minutes

2) Then I call a stored procedure which loads data from pm_volcurves_histdata working table to actual pm_volcurves_data table. The indexes on the actual table are dropped prior to executing the stored procedure. Following is the code for stored procedure. Basically processing 40,000 at a time

Code:
select @cnt = count(1) from pm_volcurves_histdata

while ( @cnt>0 )
begin
  begin
        begin tran
        set rowcount 40000
			
      /* Get batch of 40000 records from pm_volcurves_histdata into  #temp_pm_volcurves_data */
	select curve_id, commodity, currency, rundate, term, strike, volatility
	into #temp_pm_volcurves_data
	from pm_volcurves_histdata
				
	create index #temp_pm_volcurves_data_idx on #temp_pm_volcurves_data(rundate, curve_id)
			
		
	/* Delete the batch of 40000 records that we got into #temp_pm_volcurves_data from pm_volcurves_histdata */
	delete from pm_volcurves_histdata
			
	set rowcount 0
        commit tran
  end

begin tran
		
  /* Insert the curve if it is not present in pm_volcurves table */
      insert into pm_volcurves (curve_id,commodity, currency)
      select distinct curve_id, commodity, currency  from #temp_pm_volcurves_data a	
      where not exists (select b.curve_id from pm_volcurves b where b.curve_id=a.curve_id)

  /* Insert the data into pm_volcurves_data table */
      insert into pm_volcurves_data(sequence_no, rundate, term,strike, volatility )
      select b.sequence_no, a****ndndate, a.term, a.strike, a.volatility 
      from #temp_pm_volcurves_data a, pm_volcurves b 
      where a.curve_id=b.curve_id
			
   commit tran

drop table #temp_pm_volcurves_data
select @cnt=count(1) from pm_volcurves_histdata
end
I did print out the time for each iteration of loop and each iteration of loop is taking approx 3 seconds and it is processing 40,000 at a time.
I would really appreciate your help. There is requirement to load another such data which is 65 million records.

Last edited by srattani; 01-16-12 at 22:04.
Reply With Quote
  #2 (permalink)  
Old 01-17-12, 16:38
pdreyer pdreyer is offline
Registered User
 
Join Date: May 2005
Location: South Africa
Posts: 1,268
You don't seem to use the index on the temp table
idx on #temp_pm_volcurves_data(rundate, curve_id)
Try index on only curve_id
Don't know what your other indexes look like so can't comment.
Try bigger batches

Or depending on your data, maybe try using batches of curve_id and rundate to eliminate the temp table and delete step
Reply With Quote
  #3 (permalink)  
Old 01-23-12, 14:41
srattani srattani is offline
Registered User
 
Join Date: Jan 2012
Posts: 2
Hello pdreyer,

Thanks for your reply.

I did take away rundate from the index but it did not shave off any significant time overall roughly 10 seconds.

How big of a batch size should I use?

There are 4 unique curve ids and approx 1000 different rundates, but problem is since I BCP data from file to pm_volcurves_histdata table, this pm_volcurves_histdata table has no index on it (to make BCP faster) so any select if I do on this table with a criteria i.e. where curve_id= this or rundate=that will do a full table scan on 4.4 million rows which will make it even more slow.

Any other advise? I would really appreciate any help.
Reply With Quote
  #4 (permalink)  
Old 01-24-12, 14:34
pdreyer pdreyer is offline
Registered User
 
Join Date: May 2005
Location: South Africa
Posts: 1,268
So create the index after bcp in?
The batch size is dependent on your log size
60MB for 4.4M rows doesn't seem that big
Watch your log space growth on the 40000
maybe batches of 300 thousand+
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On