  1. #1
    Join Date: Jun 2004 | Location: Long Island | Posts: 696

    1 billion row table (Loan Performance)

    I will be receiving data from a company called Loan Performance that has one file/table which will hold 1 billion rows. They send data by period, and I plan to load it via BCP from NT/DOS batch scripts. The 1 billion rows represent data for 200+ periods.

    Are the following design plans feasible?

    1. Partition the table by period value. I'm not sure of the maximum number of partitions per table in 2005, but I think we have period data back to 1992 and a new period is created every month, so the possibility of having > 1000 partitions exists. I plan on pre-creating partitions for future data instead of creating them dynamically when a new period is sent (see the sketch after item 2).

    2. Load data via BCP in DOS shell scripts that will drop the indexes (by partition), BCP in the data, and then re-create the indexes by partition. Is this possible? And will I see a performance increase as opposed to one huge table (I'm pretty much sure I will)? There is usually one period's data present per day, but sometimes the vendor resends all data (which would get loaded on a weekend).
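
    A rough sketch of what the period partitioning in item 1 might look like in 2005. The table, column names and boundary values here are made-up placeholders rather than the vendor's actual layout; for reference, SQL Server 2005 allows at most 1,000 partitions per table, so monthly periods back to 1992 (roughly 200 so far) leave plenty of headroom.

        -- Hypothetical partition function/scheme on an INT period key (e.g. 199201 .. 200812).
        -- Boundary values and filegroup placement are placeholders.
        CREATE PARTITION FUNCTION pfPeriod (int)
        AS RANGE RIGHT FOR VALUES (199201, 199202, 199203 /* ... one boundary per month, pre-created ahead for future periods ... */);

        CREATE PARTITION SCHEME psPeriod
        AS PARTITION pfPeriod ALL TO ([PRIMARY]);   -- or spread across filegroups on the SAN

        CREATE TABLE dbo.LoanPerformance
        (
            period_id     int     NOT NULL,
            collateral_id int     NOT NULL,
            loan_id       bigint  NOT NULL
            -- ... remaining vendor columns ...
        ) ON psPeriod (period_id);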

    I'm a bit unsure of where to start since I've never worked with this amount of data. I worked with partitioning in Oracle a long time ago.

    I plan on having a 2 x quad-core 2.66 GHz server with 32 GB of RAM, running SQL 2005 EE 64-bit, connected to 1 terabyte of SAN disk.

    Thanks all,
    PMA
    Last edited by PMASchmed; 03-28-08 at 10:02.

  2. #2
    Join Date: Feb 2004 | Location: One Flump in One Place | Posts: 14,910
    Hi

    I don't see any point in dropping the indexes - if you plan to do this every time you get new data you will surely suffer performance degradation. Does the clustered index match the order of the data in the file? BULK INSERT has a hint to indicate this, which optimises the insert speed.
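
    If the file order does match the clustered index, the hint being referred to looks roughly like this (table name, file paths and key columns are hypothetical placeholders):

        -- Hypothetical example: tell BULK INSERT the file is already sorted on the clustered key
        -- so the engine can skip the sort step. TABLOCK takes a table lock, which is one of the
        -- requirements for a minimally logged bulk load.
        BULK INSERT dbo.LoanPerformance
        FROM 'D:\feeds\loanperf_200801.dat'
        WITH
        (
            FORMATFILE = 'D:\feeds\loanperf.fmt',
            ORDER (period_id, collateral_id, loan_id),
            TABLOCK
        );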

  3. #3
    Join Date: Jun 2004 | Location: Long Island | Posts: 696
    PF, the tables may have between 5 and 15 indexes (as recommended by the vendor). I currently have this load process set up on SQL 2000, and when I have indexes on the table (4 of them), the BCP in slows down at least 10-fold. Is this BCP option available in SQL 2000? My reason for wanting to use partitioning is for the index drop and create (re-creating indexes on 1 billion rows would take a day or so). I have not yet created a clustered index on the table (in SQL 2K). Also, the files are fixed-format data with a .fmt file defined.
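
    If the per-load index drop and re-create does turn out to be needed, one alternative worth testing in 2005 is disabling the nonclustered indexes before the load and rebuilding them afterwards, which keeps the index definitions in place. A sketch with hypothetical index/table names (only nonclustered indexes should be disabled - disabling the clustered index takes the whole table offline):

        -- Hypothetical names: disable the NCI, load the period's file, then rebuild.
        ALTER INDEX ix_LoanPerformance_collateral ON dbo.LoanPerformance DISABLE;

        -- ... BULK INSERT / BCP the period's file here ...

        ALTER INDEX ix_LoanPerformance_collateral ON dbo.LoanPerformance REBUILD;
        -- A single partition of a partitioned index can also be rebuilt on its own:
        -- ALTER INDEX ix_LoanPerformance_collateral ON dbo.LoanPerformance
        --     REBUILD PARTITION = 193;   -- partition number for the period just loaded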
    Last edited by PMASchmed; 03-28-08 at 10:20.

  4. #4
    Join Date: Feb 2004 | Location: One Flump in One Place | Posts: 14,910
    AFAIK you need a CI for table partitioning - I'm not certain though. Although I've used tables of > 500,000 rows I've never bothered with partitioning. I would love to test the differences. Presumably the benefit is only really for scans, right? (And for building nonclustered indexes, according to your post.) We bulk insert into these 500k+ row tables regularly - loading the entire db (80 GB of data per load, mostly updates to existing rows, some new rows) takes about 3-4 hours.

    Also - it's BULK INSERT, not BCP, that has the CI / file-order flag.

  5. #5
    Join Date: Jun 2004 | Location: Long Island | Posts: 696
    From what I've read, the majority of the benefits of partitioning are seen in data loading, which is my biggest issue, not really querying. I will need to look at the files and/or ask the vendor how the data is ordered, and create a CI based on that. I have two birds to kill here: unfamiliarity with this amount of data, as well as with the data itself.

    Is there anyone out there who has worked with this amount of data on SQL 2005?

    PF, thanks for your help. I will look into BULK INSERT as opposed to BCP - I actually did write scripts to do so (REMed out in the script), I just was not aware of this option.

  6. #6
    Join Date: Feb 2004 | Location: One Flump in One Place | Posts: 14,910
    Our dbs are in SQL2k5 - I cannot recall how they performed in 2k....

    Remember that a monotonically increasing clustered index gives very fast inserts. Since you are partitioning on period, and your vendor supplies data each period, this will be monotonically increasing, right? So all but the latest partition will not actually change at all.

    I think the penny has just dropped for me. It is really the partitioning of the NCIs that you expect to reap dividends from, rather than the table partitioning per se.

  7. #7
    Join Date: Jun 2004 | Location: Long Island | Posts: 696
    Past periods never increase in size, but data may be resent for a past period due to re-classification of loans within that period.

    Data is typically deleted and replaced for a period/collateral type (which represents one file for the respective period) - never appended. On resends the entire file is sent, not just the delta.

    The CI will probably be a composite of period, collateral and loan_id (see the sketch below). I have contacted the vendor to verify this is how the data is ordered.

    One period has many collaterals, which have many loan_ids (a file is one collateral for a specific period and contains many loan_ids).
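
    Assuming the vendor confirms that ordering, the clustered index could be created on the partition scheme so it stays aligned with the table. A sketch using the hypothetical names from the earlier partitioning example:

        -- Hypothetical composite clustered key matching the expected file order,
        -- created on the partition scheme so each period's partition has its own index tree.
        CREATE CLUSTERED INDEX cix_LoanPerformance
            ON dbo.LoanPerformance (period_id, collateral_id, loan_id)
            ON psPeriod (period_id);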

    Maybe the upgrade to SQL 2K5 EE 64-bit will increase bulk copy speed on its own; I will test with multiple indexes on the table before I partition.

    Thanks much,
    PMA
    Last edited by PMASchmed; 03-28-08 at 11:06.

  8. #8
    Join Date: Jun 2004 | Location: Long Island | Posts: 696
    Need help on this, guys.

    I see articles suggesting loading a staging table and then doing an ALTER TABLE ... SWITCH (it will always be a complete load for a period (partition) if data is resent).
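
    A sketch of that staging/switch pattern, again with the hypothetical names used above. The staging table must match the main table's structure and indexes, sit on the same filegroup as the target partition, and carry a CHECK constraint pinning it to the one period being reloaded:

        -- Hypothetical reload of period 200801: build the period in a staging table, then swap it in.
        CREATE TABLE dbo.LoanPerformance_Stage
        (
            period_id     int     NOT NULL CHECK (period_id = 200801),
            collateral_id int     NOT NULL,
            loan_id       bigint  NOT NULL
            -- ... remaining vendor columns ...
        );

        -- 1. BULK INSERT the resent file(s) into dbo.LoanPerformance_Stage and build the same
        --    clustered / nonclustered indexes the main table has.

        -- 2. Switch the stale partition out to an empty, identically structured archive table,
        --    then switch the freshly loaded staging table in; both operations are metadata-only.
        ALTER TABLE dbo.LoanPerformance
            SWITCH PARTITION $PARTITION.pfPeriod(200801) TO dbo.LoanPerformance_Old;

        ALTER TABLE dbo.LoanPerformance_Stage
            SWITCH TO dbo.LoanPerformance PARTITION $PARTITION.pfPeriod(200801);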
