  1. #1
    Join Date
    Mar 2009
    Location
    Australia
    Posts
    113

    Unanswered: Tricky data import - suggestions

    I have a project to import data from an archaic system, which I've completed, but because of the nature of the data that comes out of said system and the method I've used to import it, it takes an hour or more to import a few hundred thousand records, which as far as I'm concerned is too slow.

    Basically the problem is that this old system exports the data one field per line in the text file export. So if you have a table with 20 fields, you get 20 lines of one field each for record 1, then 20 lines for record 2, and so on.

    What I have done, basically, is import it into a temporary table, then assign a field number and record number to each data record like this:
    (Note: Record_Number is an identity value starting at 1 that is assigned to each line of the import file. @v_ColumnCount is a variable storing the number of columns for the file being processed; a variable is used as there are 20-30 files to be imported this way.)

    Code:
    Update	##v_TempData
    Set	Field_Number =
    		-- ordinal position of the line within its record (1 to @v_ColumnCount)
    		Case
    			When Record_Number%@v_ColumnCount = 0 Then @v_ColumnCount
    			Else Record_Number%@v_ColumnCount
    		End,
    	-- which destination record this line belongs to (ceiling division)
    	Data_Record_Number = Ceiling(Cast(Record_Number as decimal(17,2))/Cast(@v_ColumnCount as decimal(17,2)))
    Field_Number stores the ordinal position of the field in the destination table, i.e. 1 to 20; Data_Record_Number stores the record number in the destination table. For example, with @v_ColumnCount = 20, line 41 of the file gets Field_Number = 1 and Data_Record_Number = 3.

    What I then do is insert just field 1 into the destination table, selecting all records where Field_Number = 1.
    After that I loop through the rest of the fields, updating with something that looks a little like this:
    Code:
    -- one dynamic UPDATE per destination column, joining the numbered
    -- temp data back to the destination row on record number
    Set @v_UpdateString  = 'UPDATE	Import_'+@v_TableName+CHAR(13)
    Set @v_UpdateString += 'SET	['+@v_ColumnName+'] = TD.Record_Text'+CHAR(13)
    Set @v_UpdateString += 'FROM	Import_'+@v_TableName+' CI'+CHAR(13)
    Set @v_UpdateString += 'JOIN	##v_TempData TD' +CHAR(13)
    Set @v_UpdateString += '	ON	TD.Data_Record_Number = CI.Record_Key' +CHAR(13)
    Set @v_UpdateString += '	AND	TD.Field_Number = '+CAST(@v_LoopCounter as nvarchar)+CHAR(13)
    exec sp_executesql @v_UpdateString
    It's messy, I know... and it gets particularly bogged down by one of the files, which has 184 fields and about 120,000 records - that one alone takes 25+ minutes to pivot the records around this way. We're not talking about huge files here, so I'd expect to be able to do it a lot quicker than this.

    I would really love to pull this time down as much as possible, but I'm struggling for ideas on another way of handling the import process.

    Any suggestions would be most welcome - if you want some sample files let me know and I'll upload a full example with some scrambling of the sensitive data.

  2. #2
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Perhaps you could convert those files into CSV files using a computer language you're fluent in (C#, Java, Python, ...), and then use BCP to import those CSV files into your database.
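    For the import half, a plain character-mode bcp call along these lines should then do the job (a sketch only; the database, table, and file names are placeholders):
    Code:
    bcp dbname..Import_MyTable in data.csv /Sserver\instance /c /t"," /r"\n" /T
    /c is character mode, /t and /r set the field and row terminators, and /T uses a trusted connection.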
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  3. #3
    Join Date
    Mar 2009
    Location
    Australia
    Posts
    113
    I was hoping to come up with a solution that would keep the process inside SQL Server, but yes, I have thought of the need to develop an intermediate conversion program, which I will do if I can't come up with anything else.

  4. #4
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    What is the row terminator? You might be able to import this with BCP yet. You just need to find a value (or perhaps create a format file - ick) and specify the field terminator as "\n".

    edit: I just tried this out with a small table from one of my own databases:
    Code:
    bcp dbname..tablename in data.txt /Sserver\instance /c /r"\n|\n" /t"\n" /T
    Names of the servers and objects have been changed to protect the innocent.....Not that he really deserved the protection, but hey.
    Last edited by MCrowley; 03-28-11 at 22:01.

  5. #5
    Join Date
    Mar 2009
    Location
    Australia
    Posts
    113
    Both the row and field terminators in the file are CR/LF, so that won't really work.

    Example of some data; there are 2 records in the data below - 14 fields in each record, thus 14 lines for each record and 28 lines in total for these 2 sample records:
    +-+-+-+-+-+-+-+-+-+-Start Data+-+-+-+-+-+-+-+-+-+-
    105
    40321935
    30/04/2011
    04/03/2011

    16.28
    0
    0
    0
    0
    0

    1.48

    476
    0481674
    30/04/2011
    22/03/2011
    28535
    57.04
    0
    0
    0
    0
    0

    5.19

    +-+-+-+-+-+-+-+-+-+-End Data+-+-+-+-+-+-+-+-+-+-

    Repeat for a total of 146,838 records in this particular file, and that takes a while to pivot the data around in the manner I'm doing it.

  6. #6
    Join Date
    Feb 2004
    Location
    New Zealand
    Posts
    1,418
    Provided Answers: 7
    Been thinking about this for a couple of days -
    well, it's been bugging me. LOL

    My plan would be to do it in 2 bits:
    write a VBScript.

    Use FSO to open the text file in one big chunk, then
    use VB's Split to break the data by line feeds into an array,
    then take the first 14 array elements and join them with ",",
    appending that to a CSV text file,
    then do the next 14, join them with "," and append that to the .CSV file.

    As you have so many files to do,
    you can get the FSO to loop through a folder, reading each file and doing the above.

    FSO = File System Object

    It would look something like:

    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set MyFile = fso.OpenTextFile(FileName, ForReading)
    strText = MyFile.ReadAll
    MyFile.Close

    splitit = Split(strText, vbNewLine)

    'splitit array should have the data you want
    'now let's join it

    GoodData = ""
    For aa = 0 To 13 'only because we started at 0
        GoodData = GoodData & splitit(aa) & ","
    Next
    'take off the last ,
    GoodData = Mid(GoodData, 1, Len(GoodData) - 1)

    'now append the line to the CSV file

    Set DoneFile = fso.OpenTextFile(NewFileName, ForAppending, True)
    DoneFile.WriteLine GoodData
    DoneFile.Close
    .......
    .......
    and so on

    Once done, import the CSV into SQL Server.

    http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx

    and here

    http://www.4guysfromrolla.com/webtech/090599-1.shtml

    As debugging VBScript is hard:

    open Excel, and Alt+F11 puts you into Visual Basic; add a module

    and start coding.

    See you on Friday as I'm taking a kid on camp.
    Last edited by myle; 03-29-11 at 04:10. Reason: spelling putting link in
    Hope this helps.

    See, clear as mud.


    StePhan McKillen
    the aim is store once, not store multiple times
    Remember... Optimize 'til you die!
    Programming environment:
    Access based on my own environment: DAO3.6/A97/A2000/A2003/A2007/A2010
    VB based on my own environment: vb6 sp5
    ASP based on my own environment: 5.6
    VB-NET based on my own environment started 2007
    SQL-2005 based on my own environment started 2008
    MYLE
    YOUR PASSWORD IS JUST LIKE YOUR TOOTHBRUSH DON'T SHARE IT.

  7. #7
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    Looks like you may need to deal with format files. Some people have great luck with them, but I am not one of them. Start by exporting from the table you want to load, using a command like this (note the lack of /c, /n, or /f switches):
    Code:
    bcp dbname..tablename out data.txt /Sserver\instance /T
    BCP will then prompt you for a few bits of information. It will make guesses, but if you go with those guesses, the data will be exported in native format. Here is an example of the responses for an integer field:
    Code:
    Enter the file storage type of field EnvironmentID [int]: char
    Enter prefix-length of field EnvironmentID [1]: 0
    Enter length of field EnvironmentID [12]:
    Enter field terminator [none]: \n
    Leaving a field blank gives you the default. All prefixes should be 0, and all field storage types (that means storage in the data file, not the DB) should be char. The rows are then separated by a count of fields, rather than by any delimiter.
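    At the end of those prompts, BCP also offers to save your answers as a format file, which you can then reuse on the import side with the /f switch. As a rough sketch only (the lengths and column names here are hypothetical), a non-XML format file for a three-column table, with every field (including the last) terminated by CR/LF to match the data above, would look something like:
    Code:
    9.0
    3
    1   SQLCHAR   0   12   "\r\n"   1   Col1   ""
    2   SQLCHAR   0   24   "\r\n"   2   Col2   ""
    3   SQLCHAR   0   24   "\r\n"   3   Col3   ""
    The columns are: field position in the file, storage type, prefix length, maximum data length, terminator, destination column position, destination column name, and collation. The import would then be along the lines of:
    Code:
    bcp dbname..tablename in data.txt /Sserver\instance /f data.fmt /T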

  8. #8
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Possibly.

    For the import, create a dynamic table:
    -- count the rows
    -- using a cursor, create the columns (col1, col2, etc.) for a physical table which you drop later
    -- next, insert into the new table using a cursor (slow)

    What you don't know until reading the file is how many columns there are. Either way the data has to be in column order to use, and so the rows need to pivot (see the sketch below). Yep, tricky. VB script is probably the best method. I have a sample script that you can toy with, and maybe it will work?
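    To illustrate what that pivot could look like in set-based T-SQL, here is a minimal sketch against the numbered temp table from post #1 (the destination table and column names are hypothetical, and the column list would still need to be generated dynamically per file):
    Code:
    -- one pass over the temp data instead of one UPDATE per column:
    -- each group of lines becomes one destination row, each Field_Number one column
    INSERT INTO Import_MyTable (Col1, Col2, Col3)
    SELECT
    	MAX(CASE WHEN Field_Number = 1 THEN Record_Text END),
    	MAX(CASE WHEN Field_Number = 2 THEN Record_Text END),
    	MAX(CASE WHEN Field_Number = 3 THEN Record_Text END)
    FROM	##v_TempData
    GROUP BY Data_Record_Number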

  9. #9
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Here's a sample VB script to read a file and insert into a destination table. The code snippet was part of a larger project that read an XML source and imported it into data warehouse table(s).

    As I said, you will need to modify it for your own use. I think you can try a for/next loop or some other means to parse the import file into the table.

    Code:
    '**********************************************************************
    '  Visual Basic ActiveX Script
    '************************************************************************
    
    Function Main()
    
      Dim objScrADORS
      Dim objDestADORS
      Dim objScrADOCnn
      Dim objDestADOCnn
    
      Dim objScrFields
      Dim objScrField
    
      Dim objDestFields
      Dim objDestField
    
      Dim filepath
      filepath = DTSGlobalVariables("@Path").Value
    
      Dim strAddressId
      Dim strSql
    
      ' add new records when table is empty otherwise these values will be assigned later
      rscnt = 0
      lastParsedRecord = 1
    
      Const adOpenForwardOnly = 0
      Const adLockBatchOptimistic = 4
      Const adCmdFile = 256
      Const adAddNew = 16778240
      
      Const adOpenKeyset = 1
      Const adLockPessimistic = 3
      Const adLockOptimistic = 2
    
            Set objScrADOCnn = CreateObject("ADODB.Connection")
            Set objScrADORS = CreateObject("ADODB.Recordset")
    
            Set objDestADOCnn = CreateObject("ADODB.Connection")
            Set objDestADORS = CreateObject("ADODB.Recordset")
    
    
            'The XML File Recordset (Source)
            objScrADOCnn.Open "Provider=MSDAOSP; Data Source=MSXML2.DSOControl.4.0;"
            objScrADORS.Open filepath & "INSERT FILE NAME HERE", objScrADOCnn, adOpenKeyset, adLockOptimistic
    
            On Error Resume Next
    
        ' Check for values in the table, because if there are none we want to skip the delete/insert and just insert records -- initial load process
        objDestADOCnn.Open "provider=SQLOLEDB;data source=" & DTSGlobalVariables("@datasource").Value & ";database=BOB;uid=" & DTSGlobalVariables("@Userid").Value & ";pwd=" & DTSGlobalVariables("@Password").Value
            objDestADORS.Open "SELECT TOP 1 * FROM DESTINATION_TABLE_HERE", objDestADOCnn, adOpenKeyset, adLockOptimistic
    
            ' Make sure we aren't starting with an empty table
            If Not objDestADORS.EOF Then
    
                ' build a comma-separated list of source values for the criteria
                ' ("Col" is a placeholder field name from the XML source)
                While Not objScrADORS.EOF
                        strSql = strSql & "'" & objScrADORS.Fields("Col").Value & "',"
                        objScrADORS.MoveNext
                Wend
            
                    ' now strip off the trailing comma (if exist)
                    strSql = Left(strSql, Len(strSql) - 1)
            
                    ' go back to first record for source
                    objScrADORS.movefirst
    
                    objDestADORS.Close
                    objDestADOCnn.Close
    
                'The SQL Destination
                ' (NOTE: strSql must be a complete SELECT statement by this point)
                objDestADOCnn.Open "provider=SQLOLEDB;data source=" & DTSGlobalVariables("@datasource").Value & ";database=BOB;uid=" & DTSGlobalVariables("@Userid").Value & ";pwd=" & DTSGlobalVariables("@Password").Value
                objDestADORS.Open strSql, objDestADOCnn, adOpenKeyset, adLockOptimistic
    
            End If
            
            ' check to see if we have more records to insert
            If objDestADORS.EOF Then
    
                    objDestADORS.Close
                    objDestADOCnn.Close
            
                    
                objDestADOCnn.Open "provider=SQLOLEDB;data source=" & DTSGlobalVariables("@datasource").Value & ";database=BOB;uid=" & DTSGlobalVariables("@Userid").Value & ";pwd=" & DTSGlobalVariables("@Password").Value
                    objDestADORS.Open "SELECT * FROM DESTINATION_TABLE_HERE WHERE 1=2", objDestADOCnn, adOpenKeyset, adLockOptimistic
    
    
            'Add New Records
            While Not objScrADORS.EOF
                    With objDestADORS
    
                        .AddNew
                        ' ("Col" is a placeholder column name)
                        .Fields("Col").Value = objScrADORS.Fields("Col").Value
                            .Update
            
                            objScrADORS.movenext
                    End With
            Wend
            End If
    
              objScrADORS.Close
              objDestADORS.Close
              objScrADOCnn.Close
              objDestADOCnn.Close
    
            Main = DTSTaskExecResult_Success
    
    End Function
