I will be receiving data from a company called Loan Performance; one file/table will hold 1 billion rows. They send data by period, and I plan to load it via BCP driven by NT/DOS scripts. The 1 billion rows represent data for 200+ periods.
Are the following design plans feasible?
1. Partition the table by period value. I'm not sure of the max number of partitions per table in 2005, but I think we have period data back to 1992 and a new period gets created every month, so the possibility of having > 1000 partitions exists. I plan on pre-creating partitions for future data, instead of creating them dynamically when a new period is sent.
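As a rough sketch of what period partitioning could look like in 2005 (the names `pfPeriod`/`psPeriod`, the YYYYMM integer period key, and the table layout are my assumptions, not the vendor's spec):

```sql
-- Assumes the period is stored as an int in YYYYMM form.
-- One boundary value per period; RANGE RIGHT puts each boundary
-- value into the partition to its right.
CREATE PARTITION FUNCTION pfPeriod (int)
AS RANGE RIGHT FOR VALUES (199201, 199202, 199203 /* ...one per period... */);

-- Map every partition to one filegroup for simplicity; in practice
-- you would likely spread these across the SAN.
CREATE PARTITION SCHEME psPeriod
AS PARTITION pfPeriod ALL TO ([PRIMARY]);

-- The table is then created on the scheme, partitioned by Period.
CREATE TABLE dbo.LoanPerformance
(
    Period int NOT NULL,
    LoanId bigint NOT NULL
    -- ...vendor columns...
) ON psPeriod (Period);

-- Pre-creating a future partition later is a split; a NEXT USED
-- filegroup must be designated first.
ALTER PARTITION SCHEME psPeriod NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfPeriod () SPLIT RANGE (200801);
```

Note that SQL Server 2005 caps a table at 1,000 partitions, so pre-creating a very long runway of future periods will eventually run into that limit.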
2. Load data via BCP in DOS shell scripts that drop the indexes (by partition), BCP in the data, then re-create the indexes by partition. Is this possible? And will I see a performance increase as opposed to one huge table (I'm fairly sure I will)? There is usually one period's data present per day, but sometimes the vendor resends all the data (which would get loaded on a weekend).
I'm a bit unsure of where to start, since I've never worked with this amount of data. I worked with partitioning in Oracle a long time ago.
I plan on having a 2x quad-core 2.66 GHz server with 32 GB of RAM and SQL 2005 EE 64-bit, connected to 1 terabyte of SAN disk.
I don't see any point in dropping the index - if you plan to do this every time you get new data, you will surely suffer performance degradation. Does the clustered index match the order of the data in the file? BULK INSERT has a flag to indicate this that optimises the insert speed.
PF, the tables may have between 5-15 indexes (as recommended by the vendor). I currently have this load process set up on SQL2000, and when I have indexes on the table (4), the BCP in slows down at least 10-fold. Is this BCP option available in SQL2000? My reason for wanting to use partitioning is the index drop and create (re-creating indexes on 1 billion rows would take a day or so). I have not yet created a clustered index on the table (in SQL 2K). Also, the files are fixed-format data with a .fmt file defined.
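For what it's worth, in 2005 DROP INDEX / CREATE INDEX always operates on the whole (partitioned) index, but ALTER INDEX can rebuild a single partition, which is much cheaper than rebuilding across 1 billion rows. A sketch, with hypothetical index and partition-function names:

```sql
-- Find the partition number for the period just loaded
-- (pfPeriod is a hypothetical partition function on the period key).
DECLARE @p int;
SET @p = $PARTITION.pfPeriod(200701);

-- Rebuild just that partition of a non-clustered index,
-- rather than dropping and re-creating the whole index.
ALTER INDEX IX_LoanPerformance_LoanId
ON dbo.LoanPerformance
REBUILD PARTITION = @p;
```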
AFAIK you need a clustered index (CI) for table partitioning - I'm not certain though. Although I've used tables of > 500,000 rows, I've never bothered with partitioning. I would love to test the differences. Presumably the benefit is really only for scans, right? (And for building non-clustered indexes, according to your post.) We bulk insert into these 500k+ row tables regularly - loading the entire db (80GB of data per load, mostly updates to existing rows, some new rows) takes about 3-4 hours.
Also - it is BULK INSERT, not BCP, that has the CI / file-order flag.
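That flag is the ORDER hint on BULK INSERT. A minimal sketch, assuming a hypothetical table with a clustered index on (Period, LoanId) and a data file already sorted the same way:

```sql
-- Assumes D:\loads\200701.dat is pre-sorted by (Period, LoanId),
-- matching the clustered index on dbo.LoanPerformance.
BULK INSERT dbo.LoanPerformance
FROM 'D:\loads\200701.dat'
WITH (
    FORMATFILE = 'D:\loads\LoanPerformance.fmt',
    ORDER (Period, LoanId),  -- tells the engine the file matches CI order
    TABLOCK,                 -- table lock; helps qualify for bulk optimisations
    BATCHSIZE = 500000
);
```

TABLOCK plus a bulk-logged or simple recovery model is also what lets the load be minimally logged, which matters a lot at this volume.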
From what I've read, the majority of the benefits of partitioning are seen in data loading, which is my biggest issue, not really querying. I will need to look at the files and/or ask the vendor how the data is ordered, and create a CI based on that. I have two birds to kill here: unfamiliarity with this amount of data, as well as with the data itself.
Is there anyone out there who has worked with this amount of data on SQL2005?
PF, thanks for your help. I will look into BULK INSERT as opposed to BCP - I actually did write scripts to do so (REM'd out in the script), I just was not aware of this option.
Our dbs are in SQL2k5 - I cannot recall how they performed in 2k....
Remember that a monotonically increasing clustered index is very quick for insert speed. Since you are partitioning on period, and your client is supplying data each period, this will be monotonically increasing, right? So all but the latest partition will not actually change at all.
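Given that only the newest period changes, the usual pattern is to load each period into an empty staging table, index it there, and then switch it into the partitioned table as a metadata-only operation. A sketch with hypothetical names (`pfPeriod`, the staging table, the period value):

```sql
-- Staging table must match the partitioned table exactly: same
-- columns, same clustered index, and it must sit on the same
-- filegroup as the target partition.
CREATE TABLE dbo.LoanPerformance_Stage
(
    Period int NOT NULL,
    LoanId bigint NOT NULL
    -- ...same columns as dbo.LoanPerformance...
) ON [PRIMARY];

-- A CHECK constraint proving every row belongs to the target
-- partition is required before the switch.
ALTER TABLE dbo.LoanPerformance_Stage
ADD CONSTRAINT CK_Stage_Period CHECK (Period = 200701);

-- ...BULK INSERT into the stage table, then build the CI and NCIs
-- on this one period's worth of rows, not 1 billion...

-- Swap the loaded, indexed data into the matching partition.
-- This is a metadata change, so it is effectively instant.
ALTER TABLE dbo.LoanPerformance_Stage
SWITCH TO dbo.LoanPerformance
PARTITION $PARTITION.pfPeriod(200701);
```

This also handles the vendor's full resends: truncate/reload a stage table per period and switch each one back in, without ever rebuilding indexes over the whole table.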
I think the penny has just dropped for me. It is really the partitioning of the NCIs that you expect to reap dividends from, rather than the table partitioning per se.