  1. #1
    Join Date
    Aug 2006
    Posts
    56

    Unanswered: DBLOAD for mass import?

    Hi

    I have a file that is comma-delimited and I want to import the rows into Sybase. I found some information on DBLOAD, but I cannot find an example of how it works.

    Also, the fields in the file do not map one-for-one onto the columns of the database table I want to import into. I need some way to map the values, and I am not sure DBLOAD can do that.

    Any suggestions?

  2. #2
    Join Date
    Aug 2006
    Posts
    1
    Hi,

    you might want to try bcp, a client-side tool for bulk loading data into the server. Create a temp table formatted just like your file, bcp the file in, and then distribute the rows to the final tables. Check the reference manuals for syntax.
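    As a rough sketch of that approach (the table, file, login and server names below are placeholders, not your actual schema):
    Code:
    $ ## Staging table shaped exactly like the comma-delimited file
    1> create table stage_load
    2> (code  char(5)
    3> ,val   char(12)
    4> ,addr  varchar(60)
    5> )
    6> go
    $ ## Character-mode bcp, comma as the field terminator
    $ bcp my_db..stage_load in addr.csv -c -t "," -Uuser -Pxxxxx -Ssrv1
    $ ## Distribute/re-map into the final table with plain SQL
    1> insert final_table (code, val, addr)
    2> select code, val, addr from stage_load
    3> go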

    Good luck!

  3. #3
    Join Date
    Sep 2002
    Location
    Sydney, Australia
    Posts
    255
    Alternatively, you can use bcp-out to create a format file (specifying the columns you have and ignoring the columns you do not have), then use that format file for the bcp-in.

    If you have large volumes of data, get the data file onto the host system and run bcp there; it will load much faster.

    If you drop the indices on the table for the duration of the bcp-in, it will be much faster (and the tran log load will be substantially smaller).
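    Roughly like this (the index, table and file names are placeholders):
    Code:
    $ ## Drop the nonclustered index, load, then rebuild it
    1> drop index target_tbl.ix_target_addr
    2> go
    $ bcp my_db..target_tbl in data.txt -c -Uuser -Pxxxxx -Ssrv1
    1> create index ix_target_addr on target_tbl (addr)
    2> go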
    Last edited by DerekA; 08-07-06 at 03:58.
    Derek Asirvadem
    Senior Sybase DBA/Information Architect derekATsoftwaregemsDOTcomDOTau
    Anything worth doing is worth doing Right The First Time
    Spend your money on standards-compliant development or spend 10 times more fixing it

  4. #4
    Join Date
    Aug 2006
    Posts
    56
    Thanks, I will look up bcp in the Sybase manual.

    Also, is DBLOAD not capable of choosing individual fields?
    Last edited by mburke; 08-07-06 at 11:19.

  5. #5
    Join Date
    Aug 2006
    Posts
    56
    I am having a problem with the format. One of my data files uses a space to delimit each field, but the last field contains spaces because it is an address field. Example...

    00111 506 MAPLE DRIVE SCHUAMBURG
    00727 22 CALLE C # A

    So after the second field, it becomes a free-flowing string. I can't change this file format because it's from a vendor. Can bcp handle a situation like this?

  6. #6
    Join Date
    Sep 2002
    Location
    Sydney, Australia
    Posts
    255
    Evidently your file was created by vending idiots, and not for the purpose of being read into another computer program. Two choices. Either write your own program to create the file, from whatever vended "database" it came from, so that it uses STANDARD delimiters for fields, encloses fields containing spaces in quotation marks, escapes special characters, and otherwise behaves in a sociable manner.

    OR read the awk or perl manuals and use one of them to edit/transpose the fields. perl is beautiful here and does it in-stream; plus it has a client connection to Sybase (a db handle) courtesy of Monsieur Peppler, so you can do the data cleansing and loading in one very fast pass.
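    For example, an awk one-liner along these lines (file names are placeholders) would turn the vendor file into something with unambiguous delimiters, keeping the free-text address intact as the last field:
    Code:
    $ ## Rewrite "code value free-text-address..." as pipe-delimited
    $ awk '{
          out = $1 "|" $2 "|"                # first two space-delimited fields
          for (i = 3; i <= NF; i++)          # the rest is the address
              out = out $i (i < NF ? " " : "")
          print out
      }' vendor.txt > clean.txt
    $ head -2 clean.txt
    00111|506|MAPLE DRIVE SCHUAMBURG
    00727|22|CALLE C # A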
    Derek Asirvadem
    Senior Sybase DBA/Information Architect derekATsoftwaregemsDOTcomDOTau
    Anything worth doing is worth doing Right The First Time
    Spend your money on standards-compliant development or spend 10 times more fixing it

  7. #7
    Join Date
    Sep 2002
    Location
    Sydney, Australia
    Posts
    255
    Also, is DBLOAD not capable of choosing individual fields?
    No idea about DBLOAD; however, if you want to load data into Sybase (as posted) you can use bcp, and bcp is quite capable of choosing columns. In the default modes (native or char) bcp assumes all columns, but you can select specific columns by specifying a FORMAT file to bcp (RTFM). The easiest way to do this is to get bcp to produce the format file, by doing a (dummy) bcp out; then use the format file for the bcp in.
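    In outline, that sequence looks roughly like this (the database, table, server and file names are placeholders):
    Code:
    $ ## 1. Interactive bcp out (no -c or -n): accept the prompts for each column,
    $ ##    then answer Y when asked whether to save the format information (e.g. as target.fmt)
    $ bcp my_db..target_tbl out dummy.dat -Uuser -Pxxxxx -Ssrv1
    $ ## 2. Edit target.fmt: keep only the fields your file actually has, set the
    $ ##    terminators, and point each field at the right server column number/name
    $ ## 3. Load the real file using the edited format file
    $ bcp my_db..target_tbl in real_file.txt -f target.fmt -Uuser -Pxxxxx -Ssrv1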
    Derek Asirvadem
    Senior Sybase DBA/Information Architect derekATsoftwaregemsDOTcomDOTau
    Anything worth doing is worth doing Right The First Time
    Spend your money on standards-compliant development or spend 10 times more fixing it

  8. #8
    Join Date
    Aug 2006
    Posts
    56
    Thanks Derek. Yes, the file is coming from an unruly source. Another problem is that the target table has an index column not reflected in the source data file. So that mucks things up too.

    I would like to just write a simple program to do the migration. I will see what I can talk the architects into.

  9. #9
    Join Date
    Sep 2002
    Location
    Sydney, Australia
    Posts
    255
    Quote Originally Posted by mburke
    Another problem is that the target table has an index column not reflected in the source data file. So that mucks things up too.
    Can you expand ...
    There are straightforward options for handling indexes on the target with regard to speed, logging, RI, etc. You just have to be clear about what you want.
    Derek Asirvadem
    Senior Sybase DBA/Information Architect derekATsoftwaregemsDOTcomDOTau
    Anything worth doing is worth doing Right The First Time
    Spend your money on standards-compliant development or spend 10 times more fixing it

  10. #10
    Join Date
    Aug 2006
    Posts
    56
    Well, the table has an indexed value in the first column. It also has a 'last update date' on the tail end. These columns are not represented in the data file. So I am not sure how to get bcp to handle it.

  11. #11
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    1st:
    Quote Originally Posted by mburke
    I have a file that is comma-delimited
    Later:
    Quote Originally Posted by mburke
    I am having a problem with the format. One of my data files uses a space to delimit each field, but the last field contains spaces because it is an address field. Example...

    00111 506 MAPLE DRIVE SCHUAMBURG
    00727 22 CALLE C # A

    So after the second field, it becomes a free-flowing string. I can't change this file format because it's from a vendor. Can bcp handle a situation like this?
    Space- or comma-delimited files are both problematic,
    e.g. Address: Johannesburg, South Africa

    Fixed positions would be better, e.g. if you can always expect to find the address in columns 50-100.

    But as long as it is only the last field, bcp can handle it, since the newline character delimits the last field. And indexes will be no problem unless you try to insert duplicate data into a unique index.

    e.g. the format files below tell bcp to insert file columns 1, 2 and 3 into table columns 1, 2 and 4 respectively.
    Code:
    1> create table t1 
    2> (c1 int
    3> ,c2 int
    4> ,c3 char(2) null
    5> ,addr char(30)
    6> ,upddate datetime null
    7> )
    8> go
    1> create unique index ix1 on t1 (c1)
    2> go
    $ cat f1.txt
    00111 506 MAPLE DRIVE SCHUAMBURG 
    00727 22 CALLE C # A 
    1 1 some dummy data
    111111111 111111111 more dummy data to insert
    $ cat f1.fmt
    10.0
    3
    1       SYBCHAR 0       12      " "     1       code1
    2       SYBCHAR 0       12      " "     2       value2 
    3       SYBCHAR 0       30      "\n"    4       residential address
    $ bcp tempdb..t1 in f1.txt -Usa -Pxxxxx -Ssrv1 -f f1.fmt
    
    Starting copy...
    
    4 rows copied.
    Clock Time (ms.): total = 1  Avg = 0 (4000.00 rows per sec.)
    $ ##### And if comma delimited #####
    $ cat f2.txt
    2,1,Sydney, Australia
    999999999,999999999,Johannesbyrg, South Africa
    $ cat f2.fmt
    10.0
    3
    1       SYBCHAR 0       12      ","     1       key1
    2       SYBCHAR 0       12      ","     2       val1
    3       SYBCHAR 0       30      "\n"    4       addr1
    $ bcp tempdb..t1 in f2.txt -Usa -Pxxxxxx -Ssrv1 -f f2.fmt
    
    Starting copy...
    
    2 rows copied.
    Clock Time (ms.): total = 1  Avg = 0 (2000.00 rows per sec.)
    
    1> select * from t1
    2> go
     c1          c2          c3   addr                           upddate                    
     ----------- ----------- ---- ------------------------------ -------------------------- 
             111         506 NULL MAPLE DRIVE SCHUAMBURG                               NULL 
             727          22 NULL CALLE C # A                                          NULL 
               1           1 NULL some dummy data                                      NULL 
       111111111   111111111 NULL more dummy data to insert                            NULL 
               2           1 NULL Sydney, Australia                                    NULL 
       999999999   999999999 NULL Johannesbyrg, South Africa                           NULL 
    
    (6 rows affected)

  12. #12
    Join Date
    Aug 2006
    Posts
    56
    That makes sense to me, pd. But what happens if the index value (c1) is not in the data file? That is one of my problems.

    Also, my ninth column in the database is the fifth field in the data file.

    Lastly, I do need to remove the data that exists in the table before import.
    Last edited by mburke; 08-11-06 at 11:29.

  13. #13
    Join Date
    Sep 2002
    Location
    Sydney, Australia
    Posts
    255

    Oh Dear ...

    Quote Originally Posted by mburke
    Well, the table has an indexed value in the first column. These columns are not represented in the data file. So I am not sure how to get bcp to handle it.
    So let me get this right: you want to insert bulk rows into a table which has an indexed column (key) WITHOUT specifying the key value. This is not a job for bcp (bulk copy utility); neither bcp nor Sybase ASE has a crystal ball that tells it what the key value should be. You have three options.

    1 You have a relational database with meaningful natural keys. In that case, you have to control the keys yourself.
    - Create a temporary (work) table for the bcp-in, identical to the target table minus the IndexedColumn
    - execute the bcp-in to the worktable
    - write simple SQL to trawl through the worktable with rowcount set to 1
    - for each row in the worktable
    --- figure out what the key should be (eg from other columns, relationships, etc) [a]
    --- check if the row already exists in the target table
    --- if not, insert the row (plus the constructed key column) from the worktable into the real table

    2 You have one of those filing systems that happen to reside in a "database"; the keys are meaningless and are only there to satisfy the uniqueness demand. Same as (1), except that at step [a]:
    newkey = select max (key)+1 from realtable. (A rough sketch of this option follows after these options.)

    3 The IndexedColumn in the target table in your filing system is an identity column. Easy for you, but messy for the poor guys who have to administer it in future.
    Use the -N option on the bcp-in.
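    A rough sketch of option 2, using placeholder table and column names (worktable, realtable, keycol, code, val, addr) rather than your actual schema:
    Code:
    -- Process the worktable one row at a time, constructing the next key as we go
    declare @code char(5), @val char(12), @addr varchar(60), @newkey int
    set rowcount 1
    while exists (select 1 from worktable)
    begin
        select @code = code, @val = val, @addr = addr from worktable
        select @newkey = isnull(max(keycol), 0) + 1 from realtable
        if not exists (select 1 from realtable
                       where code = @code and val = @val and addr = @addr)
            insert realtable (keycol, code, val, addr)
            values (@newkey, @code, @val, @addr)
        delete worktable                  -- remove the processed row (rowcount is 1)
            where code = @code and val = @val and addr = @addr
    end
    set rowcount 0
    go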

    You will not get a suggestion to use an identity column from me. If I did provide that, then I would have to be responsible and give you directions for all the messy manual labour to fix it up and administer it, so no thanks.

    To avoid [meaningless] next-sequential-number problems, think about indexing one or more of the existing columns (from the source file) in the target table or worktable, to create a meaningful unique key (the address, in your example).

    Quote Originally Posted by mburke
    It also has a 'last update date' on the tail end. These columns are not represented in the data file. So I am not sure how to get bcp to handle it.
    Easy. Set a default of getdate() on the column and exclude the column from the bcp-in format file.
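    Something along these lines should do it (target_tbl and upddate are placeholder names; check the alter table syntax in the manual):
    Code:
    1> alter table target_tbl
    2> replace upddate default getdate()
    3> go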

    Quote Originally Posted by mburke
    Also, my ninth column in the database is the fifth field in the data file.
    No problem if you use the bcp format file. Just re-order the column definitions. You can also exclude columns.

    Quote Originally Posted by mburke
    Lastly, I do need to remove the data that exists in the table before import.
    Be nice and RTFM. See the truncate table SQL command.

    If you need to delete masses of data without truncating the table, you have to limit your transaction size. Otherwise you will blow out the transaction log and your DBA will send you to Coventry.
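    Roughly, either of these (target_tbl and the where clause are placeholders):
    Code:
    -- Throw everything away, minimally logged
    truncate table target_tbl
    go
    -- Or delete in batches so each transaction stays small
    set rowcount 10000
    while 1 = 1
    begin
        begin tran
        delete target_tbl where load_batch = 'old'   -- placeholder condition
        if @@rowcount = 0
        begin
            commit tran
            break
        end
        commit tran
    end
    set rowcount 0
    go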

    Alternative
    Further, the file (as per the whole thread) is raw data coming in from an unruly source, and needs to be washed and dried before you can insert it into a real table anyway. bcp-in to the worktable and cleanse it, perhaps one column at a time in several passes, using SQL (not row processing).

    You should consider doing the whole job in one pass. Look into perl (plus the Sybase DB handle, sybperl). It is brilliant at doing this sort of thing, easy to learn and use, and blindingly fast.
    Last edited by DerekA; 08-12-06 at 05:32.
    Derek Asirvadem
    Senior Sybase DBA/Information Architect derekATsoftwaregemsDOTcomDOTau
    Anything worth doing is worth doing Right The First Time
    Spend your money on standards-compliant development or spend 10 times more fixing it

  14. #14
    Join Date
    Aug 2006
    Posts
    56
    Thank you, Derek. This is all very helpful. The identity column is meaningless to the data. It is just used to index the rows. I can truncate and recreate the table.

    I will be inserting one-million+ rows into the table. So I assume I should have the transaction log shut off.

    Also, I have been using the format file bcp generates. What are the possible parameters?


    This is an example of what I have so far...
    1 SYBCHAR 0 4 "," 1 id_ex_list


    I will look into that Perl option.

  15. #15
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    Quote Originally Posted by mburke
    ...bcp... What are the possible parameters?
    See the Sybase Utility Guide
    Chapter 3: Using bcp to Transfer Data to and from Adaptive Server
    Using format files
    Elements of the bcp format file
    Chapter 8: Utility Commands Reference
    bcp
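    For quick reference, the fields in each column line of a (version 10.0) format file are roughly as follows, using your own example line:
    Code:
    1   SYBCHAR   0   4   ","   1   id_ex_list
    |   |         |   |   |     |   |
    |   |         |   |   |     |   +-- server (table) column name
    |   |         |   |   |     +-- server column order in the table
    |   |         |   |   +-- field terminator in the host file
    |   |         |   +-- host file data length
    |   |         +-- prefix length
    |   +-- host file datatype
    +-- host file column order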
