Page 1 of 2 12 LastLast
Results 1 to 15 of 22
  1. #1
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912

    Unanswered: Import CSV in to SQL Server

    Hi

    I do this regularly using my own generated files. However, I am now importing CSV that I don't generate myself and can't control. It is, however, properly formatted CSV.

    What do you guys use?

    Summary:
    BCP - can't handle properly formatted CSV and requires post import processing
    BULK INSERT - wrapper for BCP so same issue
    SSIS - can't handle properly formatted CSV and requires additional code to fix
    LogParser - can't insert into VARCHAR(MAX) column
    DTS - works great but deprecated
    OPENROWSET using Text Driver - I can't get this to work and looing on the webs I am not alone in this

    Options yet to pursue:
    Power Shell - would prefer not to if I can avoid it - this is not an admin task
    .NET -

    A Fast CSV Reader - CodeProject

    Is there anything you guys can recommend?

    Ta
    Dan

  2. #2
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    I am not sure what you mean by "properly formatted", thus don't understand why BCP can't handle "properly formatted" CSV. If it is a true comma-delimited, and you know the structure in advance, you can still use BCP (or BULK INSERT for that matter) with a format file. If it is really properly formatted, meaning each character field is enclosed in double-quotes and separated by commas, - you can even use a linked server to Text File.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I've never had a problem getting either DTS or SSIS to handle a properly formatted CSV file.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by blindman View Post
    I've never had a problem getting either DTS or SSIS to handle a properly formatted CSV file.
    DTS agreed. SSIS?
    https://connect.microsoft.com/SQLSer...?wa=wsignin1.0
    https://connect.microsoft.com/SQLSer...s-in-csv-files
    https://connect.microsoft.com/SQLSer...e-import-fails

    https://connect.microsoft.com/SQLSer...-load-all-data
    Four Connect issues on this alone.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by rdjabarov View Post
    I am not sure what you mean by "properly formatted", thus don't understand why BCP can't handle "properly formatted" CSV. If it is a true comma-delimited, and you know the structure in advance, you can still use BCP (or BULK INSERT for that matter) with a format file. If it is really properly formatted, meaning each character field is enclosed in double-quotes and separated by commas, - you can even use a linked server to Text File.
    Pull my plonker if I have ever been able to get anything but the simplest BCP format file working.

    If I posted a CSV file and table would you be prepared to help me sort out one Robert?

  6. #6
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    You bet! Post it.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Thanks Robert. I tried to construct as complicated a scenario as I could think of so as to cover off everything. However, I forgot that there is an easy solution to BCPing into a table selected columns out of order (by using a view) so I can change this if required. I have included a text column full of all sorts of valid but complicated text using different combinations of delimiters. Any help gratefully received.

    Code:
    /*
    SUMMARY - deomonstration of how to import properly formatted CSV in to table using BCP & format files.
    */
    
    /*
    Destination table for BCP import of properly formatted CSV.
    */
    IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.bcp_test')) 
    BEGIN
        DROP TABLE dbo.bcp_test
    END
    
    CREATE TABLE dbo.bcp_test
        (
            id              INT IDENTITY(1, 1)                                  NOT NULL
          , col_excluded    INT                                                     NULL
          , col_1           INT                                                     NULL
          , col_2           INT                                                     NULL
          , col_3           INT                                                     NULL
          , vchar_col       VARCHAR(MAX)        COLLATE Latin1_General_CI_AS        NULL 
          , CONSTRAINT pk_bcp_test PRIMARY KEY CLUSTERED (id) WITH (FILLFACTOR = 100)
        ) ON [PRIMARY]
    GO
    
    /*
    CSV file contents. NOTE:
        * Column Delimiter:     ,
        * Text Delimiter:       "
        * Row Delimiter:        \n
        * Text is properly formatted and may contain any of the other delimiters.
        * Text is only delimited if it contains one of the three delimiters.
        * No header row for text.
        * There are four column in the CSV file. These respectively map to: col_2, col_3, col_1, vchar_col
    
    ============================================================================
    12,1001,1,poots
    10,1001,3,
    9,1001,4,NULL
    8,1001,5,"robert"
    7,1001,6,"blindman,pat"
    6,1001,7,"Wage Drone - 
    ,,MCrowley"
    3,1002,10,"more ""quoted, stuff"""
    2,1002,12,"""stuff"" quoted"
    1,1002,13,"no quotes to appear"
    
    
    BCPing the above should be equivalent to this:
    INSERT INTO dbo.bcp_test (col_2, col_3, col_1, vchar_col)
                SELECT    12, 1001, 1,  'poots'
    UNION ALL   SELECT    10, 1001, 3,  NULL
    UNION ALL   SELECT    9,  1001, 4,  'NULL'
    UNION ALL   SELECT    8,  1001, 5,  'robert'
    UNION ALL   SELECT    7,  1001, 6,  'blindman,pat'
    UNION ALL   SELECT    6,  1001, 7,  'Wage Drone -
    ,,MCrowley'
    UNION ALL   SELECT    3,  1002, 10, 'more "quoted, stuff"'
    UNION ALL   SELECT    2,  1002, 12, '"stuff" quoted'
    UNION ALL   SELECT    1,  1002, 13, 'no quotes to appear'
    
    So the once the BCP has completed the result of SELECT * should be: 
    SELECT  *
    FROM    dbo.bcp_test
    ==============================================================================
    */
    Attached Files Attached Files
    Last edited by pootle flump; 07-07-10 at 16:41. Reason: Removed line not conforming to rfc4180
    Testimonial:
    pootle flump
    ur codings are working excelent.

  8. #8
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Here's what I got as a source for the test:
    Code:
    12,1001,1,poots
    10,1001,3,
    9,1001,4,NULL
    8,1001,5,robert
    7,1001,6,blindman,pat
    6,1001,7,Wage Drone -
    ,,MCrowley
    4,1002,9,quoted "stuff" in here
    3,1002,10,more "quoted, stuff"
    2,1002,12,"stuff" quoted
    1,1002,13,no quotes to appear
    Is that what needs to be used for the test?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Poots:

    Lines 2-4 don't meet RFC4180 (specifically point #5). Very few apps will successfully import those lines unless the app has been written to deal with arbitrary formatting/quoting.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  10. #10
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Do you at least have a guarantee that col_2 is always present?

    EDIT: Actually, I am not sure it really matters. If you change line 7 from ",,MCrowley" to "1,1002,13,no quotes to appear", you can not distinguish whether it is a continuation of the above record, or a brand new record.
    Last edited by MCrowley; 07-07-10 at 14:49.

  11. #11
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Very crude, somewhat error prone, based on several assumptions, and done mainly for my own curiosity:
    Code:
    use strict;
    
    my $col_2;
    my $col_3;
    my $col_4;
    my $prev_col_2;
    my $prev_col_3;
    my $prev_col_4;
    my $vchar_col;
    my $line;
    
    open (POOTS, "C:\\poots.csv") || die "Can not open pootle's file, because of this error: $!\n";
    
    
    $line = <POOTS>;
    do 
     {
      if ($line !~ /^(\d+),(\d+),(\d+),(.*)/)
       { #  Looks like a continuation line.  Really!  Who does this?...
        chomp($line);
        $vchar_col = $vchar_col . "\n" . $line;
        if ($prev_col_2 eq "")  #  Preserve old values...
         {$prev_col_2 = $col_2;
          $prev_col_3 = $col_3;
          $prev_col_4 = $col_4;
         }
       }  
      else
       {  #insert logic goes here
        if ($col_2 ne "")
         {print "insert into bcp_test (col_2, col_3, col_4, vchar_col) values ($col_2, $col_3, $col_4, '$vchar_col') \n\n";
         }
         $prev_col_2 = "";
         $prev_col_3 = "";
         $prev_col_4 = "";
    
        #  Begin new line
        ($col_2, $col_3, $col_4, $vchar_col) = $line =~ /^(\d+),(\d+),(\d+),(.*)/;
       }
      $line = <POOTS>;
     } while ($line ne "");

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by MCrowley View Post
    Do you at least have a guarantee that col_2 is always present?
    Yes.

    Quote Originally Posted by MCrowley View Post
    EDIT: Actually, I am not sure it really matters. If you change line 7 from ",,MCrowley" to "1,1002,13,no quotes to appear", you can not distinguish whether it is a continuation of the above record, or a brand new record.
    Code:
    6,1001,7,"Wage Drone - 
    ,,MCrowley"
    This is a single row - there is a \n in the text which is recognised because the text is delimited.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by rdjabarov View Post
    Here's what I got as a source for the test:</snip>Is that what needs to be used for the test?
    That is what the final contents of the table should be.

    So for example, this is in the CSV:
    Code:
    8,1001,5,"robert"
    This is what will be saved to the table:
    Code:
    8,1001,5,robert
    EDIT - I have uploaded a file called pootle.txt (it is csv but DBF won't let me upload .csv extension) for your source. It is in post #7
    Last edited by pootle flump; 07-07-10 at 16:42.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Pat Phelan View Post
    Poots:

    Lines 2-4 don't meet RFC4180 (specifically point #5). Very few apps will successfully import those lines unless the app has been written to deal with arbitrary formatting/quoting.
    I tried two different tools and they worked fine with what I had. I don't actually understand your precise point - I reviewed everything and they certainly meet point #5 however one row violates #6:
    Code:
    4,1002,9,quoted "stuff" in here
    I have now removed this from the input and expected output.

    If that's not what you meant could you be more specific please?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Just to illustrate the challenge a bit I just stumbled on to this:
    https://connect.microsoft.com/SQLSer...?wa=wsignin1.0

    You'll need to be able to crack a nut Erland Sommarskog couldn't crack to do this with BCP.
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •