Results 1 to 6 of 6
  1. #1
    Join Date
    Dec 2003
    Posts
    44

    Unanswered: SQL 2005 SSIS - Rowcounts...

    All,

    Got a question about SSIS performance and rowcounts. I have an 8 gig flat file that I'm reading/transforming in SSIS and it's got 3 conditional splits. That bit alone took 3 minutes for SSIS to process.

    I added rowcount transforms after each of the three splits so I could verify my rowcount tallies.

    Now it takes 20 mins for SSIS - just to get to the row counts. Anyone know why? SSIS knew what my rowcounts were prior to my adding the rowcount xform because it reports that to me in my flow diagram for each conditional split. Why does it take nearly 7 times longer now - and what can I do about that.

    Any insights would be appreciated.

    Isaac

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Do not use SSIS, DTS, or any other package to transform data.
    Use these tools only for loading data into staging tables, and then run sprocs on your staging tables to transform, cleanse, verify, and distribute the data.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I've done ETL both ways, using packages designed to automate much of the process (like DTS, SSIS, and many others) and also by using custom written applications (in several languages).

    I've found that the 80/20 rule definitely applies to what ETL I've been asked to do... About eighty percent of the ETL I've been asked to do is best handled with tools built to automate ETL processes. About twenty percent of the ETL I've been asked to do is best handled via custom written applications.

    Your milage will vary. Easy ETL tasks (which are what everyone starts with) are far easier and more efficiently handled by tools designed to do the bulk of the work for you. More complex ETL tasks will probably require custom coded solutions.

    -PatP

  4. #4
    Join Date
    Dec 2003
    Posts
    44
    While philosophical discussions on the value of having diverse approaches to ETL are important, in this case, I'd like some direct feedback on why a rowcount transform appears to add so much additional overhead to an SSIS flow. Why isn't the row count just a property of the conditional split that I can interrogate later/elsewhere in the flow?

    Isaac

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Philosophical discussions aside, because SSIS is an inefficient tool for set-based transformations and most of its functionality is only there for h@ck0rz who don't know SQL.
    That is why a rowcount transform appears to add so much additional overhead to an SSIS flow and the row count isn't just a property of the conditional split that you can interrogate later/elsewhere in the flow.
    You are using a crappy development environment, so expect crappy performance and needlessly complicated coding.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by ieruiz
    While philosophical discussions on the value of having diverse approaches to ETL are important, in this case, I'd like some direct feedback on why a rowcount transform appears to add so much additional overhead to an SSIS flow. Why isn't the row count just a property of the conditional split that I can interrogate later/elsewhere in the flow?

    Isaac
    I beleive this is because RowCount of a file is retrieved as a completely separate step, almost like a parsing of the file. It is not calculated during workflow tasks.

    Because you have this conditional split, which I'm assuming you're trying to retrieve the counts from each split... It's parsing the file the same number of times as there are splits, performing the conditions in memory before returning the result.

    Please note that this is based off previous observations, not anything I've read elsewhere.
    George
    Home | Blog

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •