Results 1 to 14 of 14
  1. #1
    Join Date
    Jan 2004
    Posts
    7

    Question Unanswered: Formatting text file data to import into database

    I have a text file and I need to do some things that would take too long to do by hand. Can I make scripts to do the following operations?

    I need to add an break between any number and letter when they are directly next to each other. Example = "bob20", break between b and 2.

    I also need to Move first "10 number" sets with "-" charaters bet first two sets of three numbers to end of entry, first number listed first, second listed second.

    In addition, lets say hypotheticaly we have a database where there are 10 fields, each seperated by a comma, what should be entered when there is no data for that field yet? Just blank space? Example ,,,bob,,,,20,,,,
    Last edited by healdem; 05-08-12 at 13:08. Reason: Photo removed at request of OP

  2. #2
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Hi,
    please give an example from your file.
    I think the output you want, is a line like CSV format from Excel.
    Tell me which part of the original line is to be converted to which array in you database file, for example.

    This part I understand :
    I need to add an break between any number and letter when they are directly next to each other. Example = "bob20", break between b and 2.

    This part I dont understand :
    I also need to Move first "10 number" sets with "-" charaters bet first two sets of three numbers to end of entry, first number listed first, second listed second.
    Greetings from germany
    Peter F.

  3. #3
    Join Date
    Jan 2004
    Posts
    7

    Smile Examples

    Yes CSV would be good as its pretty universal (MS Access, MySQL, Winfax, Outlook etc....) can all use it.

    This is the format I want to end up with


    Format=
    Company name,st address,city,state,zip,po P O Box,city,state,zip,phone1,phone2,fax1,fax2,contact 1 (name and email)contact2,contact3,contact4,contact5,contact6 ,contact7,contact8,contact9,ccontact10,notes

    therefore this;
    Boaen Mech. Contr. Inc.Ph. 912-233-3208Everette Boaen
    1223 W. 52nd St. (31405)Fax 912-233-3458Eddie Shuman
    PO Box 22054Douglas Smith
    Savannah GA 31403


    would end up like this;
    Boaen Mech. Contr. Inc.,1223 W. 52nd St.,Savannah, GA,31405,PO Box 22054,Savannah, GA, 31403,912-233-3208,,912-233-3458 ,,Everette Boaen,Eddie Shuman,Douglas Smith,,,,,,,,,

    That I could import into the many programs I have that can use this. In adition I want to import it into MySQL and post it to the company website to make it easier for employees to access that kind of data. Eventually Id like to link the address book with the accounting database and the purchase order database then you could have a complete dossier of each company that would be easy to access and easy to analyze. Hopefully, eventually, with a way to create a graph of which areas we do most business in, who we have the most orders with, what time of year we sell the most, etc....

    The original file was in word and was arranged in three columns. One for the address info, one for contact numbers, and the third was for the contact names. I took the columns out, deleted the space holders (used in the hardcopy for people to pen in changes) and got it arranged like you see above.

    Now I just need to do the first thing you understand (bob20 to bob then 20 on the next line), and also move the phone and fax numbers to the end. the rest I guess will have to be by hand because each entry is variable. But I thought this could cut alot of time out and Id learn something too


    Since the post I investigated writing a batch file to do the job, but could find a way to do it.

    Thanks for your interest, I hope you can help,
    Jeff Ernest


    PS; LOL sorry about that pic, didnt know it was sooo big, hehe

  4. #4
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Hi,
    I can not help you out by this problem, because the input are not structured in any way.
    Sorry
    Greetings from germany
    Peter F.

  5. #5
    Join Date
    Jan 2004
    Posts
    7

    bob20

    Is there at least a way to seperate the letter/number problem bob20=bob (next line) 20???

  6. #6
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Hi,
    try this : every time when the characters change from numeric to a-z or A-Z or in the other way, a new word will be created.

    old_string="bob20peter30Germany4711Dortmund3333"
    new_string=`echo $old_string | awk ' BEGIN { FS="@"}
    {
    dummy=0
    word=""
    char1=substr($1,1,1)
    if (char1 >= "0" )
    {
    num=1
    }
    else
    {
    num=0
    }
    lang=length($1)
    while ( lang != i )
    {
    i=i+1
    if ( substr($1,i,1) >= "0" )
    {
    if ( num==1 )
    {
    word=word""substr($1,i,1)
    }
    else
    {
    print word
    word=substr($1,i,1)
    num=1
    }
    }
    else
    {
    if ( num==0 )
    {
    word=word""substr($1,i,1)
    }
    else
    {
    print word
    word=substr($1,i,1)
    num=0
    }
    }
    }
    print word
    }'`
    for i in $new_string
    do
    echo $i
    done


    And this is the output :

    bob
    20
    peter
    30
    Germany
    4711
    Dortmund
    3333
    Last edited by fla5do; 01-27-04 at 14:59.
    Greetings from germany
    Peter F.

  7. #7
    Join Date
    Jan 2004
    Posts
    7

    Hmmm

    OK I know this is a real n00b question, but is that awk or what and how do I run it?

  8. #8
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    I think the way that you have extracted your data from Word has made this exercise pretty futile. I'm not saying that it can't be done but without seeing all your data, there is little chance that we could pick up all the natties that you have introduced.

    I've just had a play now, and notice that 52nd Street would become 52,nd Street.

    It would be simpler if you could retain the concept of Word's columns somehow. I haven't used it in a while but if you save as txt file, won't the columns be TAB separated? That would be a lot easier to deal with.

    PS I hope you are a manager and not an analyst 'cos that spec sucks! ;-)

  9. #9
    Join Date
    Jan 2004
    Posts
    7

    Wink Yes it sucks

    Well,
    Its for a company migrating from (*looks down in embarresment and mutters*) HARDCOPIES, *pauses for shock value*, so yes the whole operation sucks. This is the first in a series of changes to convert the company. The problem is that the head honcho had it setup in word without columns just manually spacing over, so it sucks big time.

    He had "_____" (underscores) as placemarkers to manually write in changes post-facto. That kinda helped as I could just "replace" the underscore with a carriage return, thereby deleting the useless underscores and formating a bit better all-in-one move.

    You can see the originals here> http://www.airionassociates.com/~addresses

    That is a page I made after exporting the docs to html and making that direcotry page. It works ok, but its a bitch to update when I think of how easy a db would be in relation to what there is now.


    So, anyway, how do I run that script above in *nix . IS it an awk command or what??

    Thanks
    Jeff

    file contents in format:-

    name,st address,city,state,zip,po box,city,state,zip,ph1,ph2,fax1,fax2,ct1 (name and email)ct2,ct3,ct4,ct5,ct6,ct7,ct8,ct9,ct10,notes
    Last edited by healdem; 05-08-12 at 16:50. Reason: file attachment removed at request of OP

  10. #10
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    Okay... as you indicated, the data is too variable to perform all of what you are asking in one go. I suggest you use vi for most of this (it's a really good, interactive editor if your not familiar with it).

    To start you off, run this at the command line to split your fields onto individual lines:

    Code:
    sed 's/\([0-9]\)\([^0-9)-]\{1,\}\)/\1\
    \2/g' ab1.txt > ab1.new
    It will create a new file called 'ab1.new' which you can then set to work on with vi.

    Damian

  11. #11
    Join Date
    Jan 2004
    Posts
    7

    sed code issue

    I tried the command in OSX and it worked however I had some problems.

    1) it added a "^m" or "\2" to the end of each line
    2) added "\xa0^m" in the blank lines
    3) words in contact with numbers were deleted, not sent to a new line

    thus;

    Erickson Assoc. Inc.
    912-527-9500
    Eric Erickson
    #1 Erickson Drive (31405)
    912-527-9510
    Hogan Pullin
    PO Box 23529
    Clate RalstonBill Morgan
    Savannah GA 31403Brad Harris
    Roger Ervin


    became;

    Erickson Assoc. Inc.^M
    912-527-9500\2
    Eric Erickson^M
    #1\231405)^M
    912-527-9510\2
    Hogan Pullin^M
    PO Box 23529\2
    Clate RalstonBill Morgan^M
    Savannah GA 31403\2
    Roger Ervin^M


    Is it the version of sed? the OS? or the code?
    All your help is greatly appreciated.

    Thanks
    Jeff

  12. #12
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    Did you copy the command exactly as I gave it (it is *supposed* to be over 2 lines) ?

    The \2 at the end of the lines would indicate a syntax error. The sed command I gave you uses \1 and \2 to represent the portions of the regular expression in parentheses \([0-9]\) is \1 and \([^0-9)-]\{1,\}\) is \2.

    It looks like you've entered:

    sed 's/\([0-9]\)\([^0-9)-]\{1,\}\)/\1\\2/g' ab1.txt > ab1.new

    This would mean that \2 is now \\2. If this were the case, the special meaning of \2 would have been escaped and I guess that is why you now see the literal '\2' rather than the buffered value that it should represent.

    As for the ^M characters, they look like a port from DOS. Try...

    flip -u yourFile

    or, if that doesn't work...

    flip -ub yourFile

    'flip' just removes different operating systems' control characters.

    HTH

  13. #13
    Join Date
    Jan 2004
    Posts
    7

    Thumbs up sucess

    I switched to bash (was using tcsh), and it all worked like a charm, thanks for saving me hours worth of editing by hand !!!!!!!!!!!!!!

    Next thing I plan on doing is learning sed!!!!!! LOL

    IM not too bad with VI but dont know about awk and sed and others.

    Again thanks

  14. #14
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Moderators: See the discussion in the moderators forum before making any further changes to this thread.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •