Go Back  dBforums > Database Server Software > Microsoft SQL Server > 1 billion row table (Loan Performance)

  #1 (permalink)
Registered User
 
Join Date: Jun 2004
Location: Long Island
Posts: 696
1 billion row table (Loan Performance)

I will be receiving data from a company called Loan Performance that has one file/table that will hold 1 billion rows. They send data by period, and I plan to load the data via BCP from NT/DOS scripts. The 1 billion rows represent data for 200+ periods.

Are the following design plans feasible?

1. Partition the table by period value. I'm not sure of the maximum number of partitions per table in 2005, but I think we have period data back to 1992 and a new one gets created every month, so the possibility of having > 1000 partitions exists. I plan on just pre-creating partitions for future data, instead of creating them dynamically when a new period is sent.

2. Load data via BCP in DOS shell scripts that will drop indexes (by partition), BCP in the data, and then re-create the indexes by partition. Is this possible? And will I see a performance increase as opposed to one huge table? (I'm pretty much sure I will.) There is usually one period's data present per day, but sometimes the vendor resends all data (which would get loaded on a weekend).
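For reference, the partition-by-period plan in point 1 might look something like this in 2005 DDL. This is only a sketch; every object and column name here is made up. SQL Server 2005 allows at most 1,000 partitions per table, so 200+ monthly periods plus pre-created future ones fit within the limit:

```sql
-- RANGE RIGHT puts each boundary value in the partition to its right,
-- which is the natural choice for period keys such as 199201, 199202, ...
CREATE PARTITION FUNCTION pf_Period (int)
AS RANGE RIGHT FOR VALUES (199201, 199202, 199203 /* ... one boundary per period,
                                                     pre-created ahead of the data */);

CREATE PARTITION SCHEME ps_Period
AS PARTITION pf_Period ALL TO ([PRIMARY]);   -- or map periods to separate filegroups

CREATE TABLE dbo.LoanPerformance
(
    period        int    NOT NULL,
    collateral_id int    NOT NULL,
    loan_id       bigint NOT NULL
    -- ... remaining vendor columns ...
) ON ps_Period (period);
```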

I'm a bit unsure of where to start, since I have never worked with this amount of data. I worked with partitioning in Oracle a long time ago.

I plan on having a 2x quad-core 2.66 GHz server with 32 GB of RAM and SQL 2005 EE 64-bit, connected to 1 TB of SAN disk.

Thanks all,
PMA

Last edited by PMASchmed; 03-28-08 at 10:02.
  #2 (permalink)
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,910
Hi

I don't see any point in dropping the indexes; if you plan to do this every time you get new data, you will surely suffer performance degradation. Does the clustered index match the order of the data in the file? BULK INSERT has an option to indicate this, which optimises insert speed.
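For example, assuming a clustered index on (period, collateral_id, loan_id) and a file already sorted the same way (worth confirming with the vendor; all names and paths here are hypothetical), the hint looks like this:

```sql
BULK INSERT dbo.LoanPerformance
FROM '\\fileserver\feeds\loanperf_200803.dat'
WITH
(
    FORMATFILE = 'C:\fmt\loanperf.fmt',
    ORDER (period, collateral_id, loan_id),  -- tells the engine the file is pre-sorted
                                             -- to match the clustered index
    TABLOCK,                                 -- coarse lock; helps enable minimal logging
    BATCHSIZE = 1000000                      -- commit in chunks rather than one giant batch
);
```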
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
  #3 (permalink)
Registered User
 
Join Date: Jun 2004
Location: Long Island
Posts: 696
PF, the tables may have between 5 and 15 indexes (as recommended by the vendor). I currently have this load process set up on SQL 2000, and when I have indexes on the table (4), the BCP in slows down at least 10-fold. Is this BCP option available in SQL 2000? My reason for wanting to use partitioning is the index drop and create (re-creating indexes on 1 billion rows would take a day or so). I have not yet created a clustered index on the table (in SQL 2K). Also, the files are fixed-format data with a .fmt file defined.

Last edited by PMASchmed; 03-28-08 at 10:20.
  #4 (permalink)
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,910
AFAIK you need a CI for table partitioning. I'm not certain, though. Although I've used tables of > 500,000 rows, I've never bothered with partitioning. I would love to test the differences. Presumably the benefit is really only for scans, right? (And for building non-clustered indexes, according to your post.) We bulk insert into these 500k+ tables regularly; loading the entire db (80 GB of data per load, mostly updates to existing rows, some new rows) takes about 3-4 hours.

Also: it's BULK INSERT, not BCP, that has the CI/file-order flag.
  #5 (permalink)
Registered User
 
Join Date: Jun 2004
Location: Long Island
Posts: 696
From what I've read, the majority of the benefits of partitioning are seen in data loading, which is my biggest issue, not really in querying. I will need to look at the files and/or ask the vendor how the data is ordered, and create a CI based on that. I have two birds to kill here: unfamiliarity with this amount of data, as well as with the data itself.

Is there anyone out there who has worked with this amount of data on SQL 2005?

PF, thanks for your help. I will look into BULK INSERT as opposed to BCP; I actually did write scripts to do so (REM'd out in the script), I just was not aware of this option.
  #6 (permalink)
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,910
Our dbs are on SQL 2k5; I cannot recall how they performed in 2k.

Remember that a monotonically increasing clustered index gives very quick inserts. Since you are partitioning on period, and your client supplies data each period, this will be monotonically increasing, right? So all but the latest partition will not actually change at all.

I think the penny has just dropped for me. It is really the partitioning of the NCIs that you expect to reap dividends from, rather than the table partitioning per se.
  #7 (permalink)
Registered User
 
Join Date: Jun 2004
Location: Long Island
Posts: 696
All past periods will never grow in size, but data may be resent for past periods due to re-classification of loans within that period.

Data is typically deleted/replaced for a period/collateral type (which represents a file for a respective period); it is never an append. The entire file is sent on resends, not just the delta, at the file level.

The CI will probably be a composite of period, collateral and loan_id. I have contacted the vendor to verify this is how the data is ordered.

One period has many collaterals, which have many loan_ids. (A file is one collateral for a specific period and contains many loan_ids.)
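A sketch of that composite CI, assuming the table is partitioned on a scheme named ps_Period over the period column (all object names here are hypothetical):

```sql
-- Building the clustered index on the partition scheme keeps the index
-- aligned with the table, so each partition carries its own slice of it.
-- For a non-unique CI, SQL Server adds the partitioning column automatically.
CREATE CLUSTERED INDEX cix_LoanPerformance
ON dbo.LoanPerformance (period, collateral_id, loan_id)
ON ps_Period (period);
```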

Maybe the upgrade to SQL 2K5 EE 64-bit will increase bulk copy speed on its own; I will test with multiple indexes on the table before I partition.

Thanks much,
PMA

Last edited by PMASchmed; 03-28-08 at 11:06.
  #8 (permalink)
Registered User
 
Join Date: Jun 2004
Location: Long Island
Posts: 696
Need help on this, guys.

I see articles about loading a staging table and then using ALTER TABLE ... SWITCH (it will always be a complete load per period (partition) when data is resent).
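That staging-table-plus-switch pattern might be sketched like this. All names are hypothetical, and it assumes the table is partitioned by a function pf_Period on the period column:

```sql
-- The staging table must match the target's columns and indexes, sit on the
-- same filegroup as the target partition, and carry a CHECK constraint
-- restricting it to exactly one period.
CREATE TABLE dbo.LoanPerformance_Stage
(
    period        int    NOT NULL CHECK (period = 200803),
    collateral_id int    NOT NULL,
    loan_id       bigint NOT NULL
    -- ... remaining columns, identical to dbo.LoanPerformance ...
);

-- 1. BCP/BULK INSERT the resent file into the empty staging table.
-- 2. Build the same indexes on the staging table (cheap: one period's rows).
-- 3. Swap the data; both SWITCH statements are metadata-only operations,
--    so they are near-instant regardless of row counts.
ALTER TABLE dbo.LoanPerformance
    SWITCH PARTITION $PARTITION.pf_Period(200803)
    TO dbo.LoanPerformance_Stage_Old;      -- move the old period out (target must be empty)

ALTER TABLE dbo.LoanPerformance_Stage
    SWITCH TO dbo.LoanPerformance PARTITION $PARTITION.pf_Period(200803);  -- move new data in
```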