| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

01-23-04, 14:00
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
Formatting text file data to import into database
|
|
I have a text file and I need to do some things that would take too long to do by hand. Can I make scripts to do the following operations?
I need to add an break between any number and letter when they are directly next to each other. Example = "bob20", break between b and 2.
I also need to Move first "10 number" sets with "-" charaters bet first two sets of three numbers to end of entry, first number listed first, second listed second.
In addition, lets say hypotheticaly we have a database where there are 10 fields, each seperated by a comma, what should be entered when there is no data for that field yet? Just blank space? Example ,,,bob,,,,20,,,,
|
Last edited by healdem; 05-08-12 at 12:08.
Reason: Photo removed at request of OP
|

01-25-04, 09:37
|
|
Registered User
|
|
Join Date: Oct 2003
Location: Germany
Posts: 138
|
|
Hi,
please give an example from your file.
I think the output you want, is a line like CSV format from Excel.
Tell me which part of the original line is to be converted to which array in you database file, for example.
This part I understand :
I need to add an break between any number and letter when they are directly next to each other. Example = "bob20", break between b and 2.
This part I dont understand :
I also need to Move first "10 number" sets with "-" charaters bet first two sets of three numbers to end of entry, first number listed first, second listed second.
__________________
Greetings from germany
Peter F.
|
|

01-25-04, 10:00
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
Examples
|
|
Yes CSV would be good as its pretty universal (MS Access, MySQL, Winfax, Outlook etc....) can all use it.
This is the format I want to end up with
Format=
Company name,st address,city,state,zip,po P O Box,city,state,zip,phone1,phone2,fax1,fax2,contact 1 (name and email)contact2,contact3,contact4,contact5,contact6 ,contact7,contact8,contact9,ccontact10,notes
therefore this;
Boaen Mech. Contr. Inc.Ph. 912-233-3208Everette Boaen
1223 W. 52nd St. (31405)Fax 912-233-3458Eddie Shuman
PO Box 22054Douglas Smith
Savannah GA 31403
would end up like this;
Boaen Mech. Contr. Inc.,1223 W. 52nd St.,Savannah, GA,31405,PO Box 22054,Savannah, GA, 31403,912-233-3208,,912-233-3458 ,,Everette Boaen,Eddie Shuman,Douglas Smith,,,,,,,,,
That I could import into the many programs I have that can use this. In adition I want to import it into MySQL and post it to the company website to make it easier for employees to access that kind of data. Eventually Id like to link the address book with the accounting database and the purchase order database then you could have a complete dossier of each company that would be easy to access and easy to analyze. Hopefully, eventually, with a way to create a graph of which areas we do most business in, who we have the most orders with, what time of year we sell the most, etc....
The original file was in word and was arranged in three columns. One for the address info, one for contact numbers, and the third was for the contact names. I took the columns out, deleted the space holders (used in the hardcopy for people to pen in changes) and got it arranged like you see above.
Now I just need to do the first thing you understand (bob20 to bob then 20 on the next line), and also move the phone and fax numbers to the end. the rest I guess will have to be by hand because each entry is variable. But I thought this could cut alot of time out and Id learn something too
Since the post I investigated writing a batch file to do the job, but could find a way to do it.
Thanks for your interest, I hope you can help,
Jeff Ernest
PS; LOL sorry about that pic, didnt know it was sooo big, hehe
|
|

01-26-04, 17:30
|
|
Registered User
|
|
Join Date: Oct 2003
Location: Germany
Posts: 138
|
|
Hi,
I can not help you out by this problem, because the input are not structured in any way.
Sorry
__________________
Greetings from germany
Peter F.
|
|

01-26-04, 17:33
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
|
bob20
Is there at least a way to seperate the letter/number problem bob20=bob (next line) 20???
|
|

01-27-04, 13:33
|
|
Registered User
|
|
Join Date: Oct 2003
Location: Germany
Posts: 138
|
|
Hi,
try this : every time when the characters change from numeric to a-z or A-Z or in the other way, a new word will be created.
old_string="bob20peter30Germany4711Dortmund3333"
new_string=`echo $old_string | awk ' BEGIN { FS="@"}
{
dummy=0
word=""
char1=substr($1,1,1)
if (char1 >= "0" )
{
num=1
}
else
{
num=0
}
lang=length($1)
while ( lang != i )
{
i=i+1
if ( substr($1,i,1) >= "0" )
{
if ( num==1 )
{
word=word""substr($1,i,1)
}
else
{
print word
word=substr($1,i,1)
num=1
}
}
else
{
if ( num==0 )
{
word=word""substr($1,i,1)
}
else
{
print word
word=substr($1,i,1)
num=0
}
}
}
print word
}'`
for i in $new_string
do
echo $i
done
And this is the output :
bob
20
peter
30
Germany
4711
Dortmund
3333
__________________
Greetings from germany
Peter F.
|
Last edited by fla5do; 01-27-04 at 13:59.
|

01-29-04, 09:07
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
|
Hmmm
OK I know this is a real n00b question, but is that awk or what and how do I run it?
|
|

01-29-04, 09:50
|
|
Padawan
|
|
Join Date: Jun 2002
Location: UK
Posts: 525
|
|
I think the way that you have extracted your data from Word has made this exercise pretty futile. I'm not saying that it can't be done but without seeing all your data, there is little chance that we could pick up all the natties that you have introduced.
I've just had a play now, and notice that 52nd Street would become 52,nd Street.
It would be simpler if you could retain the concept of Word's columns somehow. I haven't used it in a while but if you save as txt file, won't the columns be TAB separated? That would be a lot easier to deal with.
PS I hope you are a manager and not an analyst 'cos that spec sucks! ;-)
|
|

01-29-04, 10:37
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
Yes it sucks
Well,
Its for a company migrating from (*looks down in embarresment and mutters*) HARDCOPIES, *pauses for shock value*, so yes the whole operation sucks. This is the first in a series of changes to convert the company. The problem is that the head honcho had it setup in word without columns just manually spacing over, so it sucks big time.
He had "_____" (underscores) as placemarkers to manually write in changes post-facto. That kinda helped as I could just "replace" the underscore with a carriage return, thereby deleting the useless underscores and formating a bit better all-in-one move.
You can see the originals here> http://www.airionassociates.com/~addresses
That is a page I made after exporting the docs to html and making that direcotry page. It works ok, but its a bitch to update when I think of how easy a db would be in relation to what there is now.
So, anyway, how do I run that script above in *nix . IS it an awk command or what??
Thanks
Jeff
file contents in format:-
name,st address,city,state,zip,po box,city,state,zip,ph1,ph2,fax1,fax2,ct1 (name and email)ct2,ct3,ct4,ct5,ct6,ct7,ct8,ct9,ct10,notes
|
Last edited by healdem; 05-08-12 at 15:50.
Reason: file attachment removed at request of OP
|

01-29-04, 16:55
|
|
Padawan
|
|
Join Date: Jun 2002
Location: UK
Posts: 525
|
|
Okay... as you indicated, the data is too variable to perform all of what you are asking in one go. I suggest you use vi for most of this (it's a really good, interactive editor if your not familiar with it).
To start you off, run this at the command line to split your fields onto individual lines:
Code:
sed 's/\([0-9]\)\([^0-9)-]\{1,\}\)/\1\
\2/g' ab1.txt > ab1.new
It will create a new file called 'ab1.new' which you can then set to work on with vi.
Damian
|
|

01-29-04, 19:59
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
|
sed code issue
I tried the command in OSX and it worked however I had some problems.
1) it added a "^m" or "\2" to the end of each line
2) added "\xa0^m" in the blank lines
3) words in contact with numbers were deleted, not sent to a new line
thus;
Erickson Assoc. Inc.
912-527-9500
Eric Erickson
#1 Erickson Drive (31405)
912-527-9510
Hogan Pullin
PO Box 23529
Clate RalstonBill Morgan
Savannah GA 31403Brad Harris
Roger Ervin
became;
Erickson Assoc. Inc.^M
912-527-9500\2
Eric Erickson^M
#1\231405)^M
912-527-9510\2
Hogan Pullin^M
PO Box 23529\2
Clate RalstonBill Morgan^M
Savannah GA 31403\2
Roger Ervin^M
Is it the version of sed? the OS? or the code?
All your help is greatly appreciated.
Thanks
Jeff
|
|

01-29-04, 20:36
|
|
Padawan
|
|
Join Date: Jun 2002
Location: UK
Posts: 525
|
|
Did you copy the command exactly as I gave it (it is *supposed* to be over 2 lines) ?
The \2 at the end of the lines would indicate a syntax error. The sed command I gave you uses \1 and \2 to represent the portions of the regular expression in parentheses \([0-9]\) is \1 and \([^0-9)-]\{1,\}\) is \2.
It looks like you've entered:
sed 's/\([0-9]\)\([^0-9)-]\{1,\}\)/\1\\2/g' ab1.txt > ab1.new
This would mean that \2 is now \\2. If this were the case, the special meaning of \2 would have been escaped and I guess that is why you now see the literal '\2' rather than the buffered value that it should represent.
As for the ^M characters, they look like a port from DOS. Try...
flip -u yourFile
or, if that doesn't work...
flip -ub yourFile
'flip' just removes different operating systems' control characters.
HTH
|
|

01-29-04, 22:53
|
|
Registered User
|
|
Join Date: Jan 2004
Posts: 7
|
|
sucess
I switched to bash (was using tcsh), and it all worked like a charm, thanks for saving me hours worth of editing by hand !!!!!!!!!!!!!!
Next thing I plan on doing is learning sed!!!!!! LOL
IM not too bad with VI but dont know about awk and sed and others.
Again thanks
|
|

05-09-12, 23:17
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,614
|
|
Moderators: See the discussion in the moderators forum before making any further changes to this thread.
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|