  1. #1

    Automating validation of the structure and data of CSV files

    Hi,

    We have the following scenario: we receive CSV files every month, and SSIS packages were built to process the data. The following problems occur from time to time:

    1. The structure of the CSV file changed (e.g. column added or removed)
    2. The files used to have no footers, but footers have started to appear
    3. Date format changed (e.g. used to be mm/dd/yyyy, but became mm.dd.yyyy)
    4. Number format changed (e.g. from 2000 to 2,000)

    Currently a person manually opens each file and checks it against our "validation document" to make sure none of these or similar problems has occurred. We would like to move away from this manual process if possible and are looking for suggestions.

    I understand that items 3 and 4 could be caught by loading the data into a staging table with VARCHAR columns and validating it there before moving it any further.
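    As a rough illustration of the checks that would run against those VARCHAR staging columns, here is a minimal sketch in Python rather than T-SQL. It assumes the expected formats are the ones named in the examples above (mm/dd/yyyy dates and whole numbers with no thousands separator); `is_valid_date` and `is_valid_number` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed target formats, taken from the examples above:
# dates as mm/dd/yyyy and whole numbers without thousands separators.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")
NUMBER_PATTERN = re.compile(r"-?\d+")


def is_valid_date(value):
    """Return True if value is a real calendar date written as mm/dd/yyyy."""
    if not DATE_PATTERN.fullmatch(value):
        return False  # catches drift such as 01.31.2024
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False  # e.g. 02/30/2024 matches the pattern but is not a date
    return True


def is_valid_number(value):
    """Return True if value is a plain integer such as 2000 (not 2,000)."""
    return NUMBER_PATTERN.fullmatch(value) is not None
```

    In the staging-table approach the same rules would be expressed as queries over the VARCHAR columns; the point is that every value is checked as text before any type conversion is attempted.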

    Item 2 is less predictable: depending on the size of the footer, the SSIS load may or may not fail.
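    One possible heuristic for catching footers is sketched below, under the assumption that a footer shows up as one or more trailing rows whose field count differs from the header's. A footer that happens to have the same number of fields would slip through, so treat this as a sketch only; `find_footer_rows` is a hypothetical name.

```python
import csv
import io


def find_footer_rows(text):
    """Return 1-based record numbers of trailing rows whose field count
    differs from the header row (a heuristic for footer detection).
    Record numbers equal line numbers unless a quoted field spans lines."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    expected = len(rows[0])
    footer = []
    # Walk backwards from the end; stop at the first row that looks like data.
    for index in range(len(rows) - 1, 0, -1):
        if len(rows[index]) == expected:
            break
        footer.append(index + 1)
    return sorted(footer)
```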

    Item 1, however, is guaranteed to make an SSIS package that loads the data directly into a table fail.
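    Because a structural change is the most damaging case, a header check that runs before the package starts may be worth having on its own. The sketch below assumes the file has a header row; the column names in `EXPECTED_COLUMNS` and the function `check_header` are hypothetical placeholders.

```python
import csv
import io

# Hypothetical column layout; substitute the layout your SSIS package expects.
EXPECTED_COLUMNS = ("id", "amount", "date")


def check_header(text, expected=EXPECTED_COLUMNS):
    """Compare the file's first row with the expected columns and return a
    list of human-readable problems (an empty list means the header matches)."""
    header = [name.strip() for name in next(csv.reader(io.StringIO(text)), [])]
    expected = list(expected)
    missing = [name for name in expected if name not in header]
    added = [name for name in header if name not in expected]
    problems = []
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if added:
        problems.append("unexpected columns: " + ", ".join(added))
    if not problems and header != expected:
        problems.append("columns are present but in a different order")
    return problems
```

    Reporting missing, added, and reordered columns separately tells whoever handles the failure whether the package mapping or the supplier needs attention.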

    Thus I feel the two possible options are:

    1. Create a custom script that runs through the file row by row, applies all the necessary validations, and either reports an error or continues if everything checks out

    2. Use some 3rd party tool to validate the files (semi-manually) before kicking off the SSIS processing.
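    Option 1 could look roughly like the minimal Python sketch below, which combines a header check, a per-row field count (to catch footers), and per-column format checks. The column layout in `LAYOUT` and the names `validate` and `main` are hypothetical, and the format rules simply mirror the examples in the list above.

```python
import csv
import io
import re
import sys
from datetime import datetime


def is_date(value):
    """True if value parses as mm/dd/yyyy (the assumed expected format)."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


# Hypothetical layout: column name -> check applied to that column's raw text.
LAYOUT = {
    "id": str.isdigit,
    "amount": lambda v: re.fullmatch(r"-?\d+(\.\d+)?", v) is not None,
    "date": is_date,
}


def validate(text):
    """Check header, field counts and per-column formats; return error strings."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header != list(LAYOUT):
        return [f"row 1: expected header {list(LAYOUT)}, got {header}"]
    errors = []
    for row_no, row in enumerate(rows, start=2):
        if len(row) != len(header):
            errors.append(f"row {row_no}: expected {len(header)} fields, "
                          f"got {len(row)} (footer?)")
            continue
        for name, value in zip(header, row):
            if not LAYOUT[name](value):
                errors.append(f"row {row_no}: bad {name} value {value!r}")
    return errors


def main(path):
    """Validate one file, print any problems, and return a process exit code."""
    with open(path, newline="") as handle:
        problems = validate(handle.read())
    for problem in problems:
        print(problem, file=sys.stderr)
    return 0 if not problems else 1
```

    A job step that runs `sys.exit(main(path))` before the SSIS package could use the non-zero exit code to stop the job; how that is wired up depends on the scheduler in use.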

    My questions are:

    1. If you have encountered a similar problem, how did you resolve it? If you built a custom script, could you share it, or do you know of a framework that could be used more or less plug-and-play?

    2. Does anyone know of good 3rd party tool(s) to assist in this process?

    Thanks in advance!

  2. #2
    My clients use a mish-mash of techniques to deal with this kind of problem.

    The easiest approach (by far) is to simply modify the SSIS package to send input that fails to conform to an exception file. This lets rows that pass trivial validation move on through the SSIS package, and reduces the data a human has to examine to just the processing failures.
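    Inside SSIS this split is usually done by setting a component's error output to redirect failing rows. Purely to illustrate the good/exception split outside SSIS, here is a small Python sketch; `split_rows` and the `is_valid` predicate are hypothetical names, and the predicate is whatever "trivial validation" means for your files.

```python
import csv
import io


def split_rows(text, is_valid):
    """Split CSV text into (good_csv, exceptions_csv). Rows for which
    is_valid(row) is False go to the exceptions output; the header is
    copied to both so each output is a loadable CSV on its own."""
    reader = csv.reader(io.StringIO(text))
    good, bad = io.StringIO(), io.StringIO()
    good_writer, bad_writer = csv.writer(good), csv.writer(bad)
    header = next(reader, None)
    if header is not None:
        good_writer.writerow(header)
        bad_writer.writerow(header)
    for row in reader:
        (good_writer if is_valid(row) else bad_writer).writerow(row)
    return good.getvalue(), bad.getvalue()
```

    Only the exceptions output then needs a human to look at it.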

    A more efficient way is to use a validation script. PowerShell works well, Perl is another good choice, and AWK is idiot-simple and blazingly fast for this kind of task. All of these can detect problems, and you can add code to the script to automagically "fix" most kinds of errors too.
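    The "fix" idea might look like the sketch below for the two drift cases from the first post. It is written in Python rather than PowerShell, Perl, or AWK, and the target formats (mm/dd/yyyy, no thousands separator) and the name `normalize_value` are assumptions.

```python
import re

# Assumed drift patterns, taken from the examples in the first post.
DOTTED_DATE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")
GROUPED_NUMBER = re.compile(r"-?\d{1,3}(,\d{3})+(\.\d+)?")


def normalize_value(value):
    """Return value rewritten into the expected format when a known
    drift pattern is recognized; otherwise return it unchanged."""
    stripped = value.strip()
    match = DOTTED_DATE.fullmatch(stripped)
    if match:
        month, day, year = match.groups()
        return f"{month}/{day}/{year}"  # mm.dd.yyyy -> mm/dd/yyyy
    if GROUPED_NUMBER.fullmatch(stripped):
        return stripped.replace(",", "")  # 2,000 -> 2000
    return value
```

    Automatic fixes are only safe for patterns you have actually seen: a value like 03.04.2024 cannot tell you on its own whether it is mm.dd.yyyy or dd.mm.yyyy.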

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Quote Originally Posted by sql_er
    Hi,

    We have the following scenario: We receive CSV files every month for which SSIS packages were built to process the data. The following problems occur from time to time:

    I reject the file and have the originator resubmit to our formatted guidelines. No exceptions.

  4. #4
    Quote Originally Posted by corncrowe
    I reject the file and have the originator resubmit to our formatted guidelines. No exceptions.

    I like simple rules like that for dealing with hard problems! So when a vendor upgrades their switch software, do you send the file back to the offending switch or to the vendor? How does the CFO or the VP in charge of billing react?

    -PatP
