#1
02-08-05, 15:46
mvillan
Registered User
Join Date: Dec 2003
Location: Ogden Utah
Posts: 34
Removing commas within quotation marks

Hello:

I am bringing this up again because we have gotten a little further since my previous comments, and I am looking for some urgent input.

I have a CSV file in which some strings are enclosed in quotation marks. We found that the following sed operation removes the comma within the quotation marks (if one exists), which is what we want, but only when there is a single set of quotation marks on the line. When a line has more than one set of quotation marks, the sed operation does not work on that line.

sed 's/\(".*\)[,]\(.*"\)/\1\2/g'

xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,",ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,,,,,,,,,,,,,,,, ,,,,

Therefore, does anyone know how to make the sed operation handle a second, third, or any number of quoted strings on one line, so that it removes the commas inside each of them? That is what we want to accomplish.
__________________
mvilla
#2
02-08-05, 16:12
vgersh99
Registered User
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
How about something like this:

sed -e 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g' file
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+
#3
02-08-05, 16:32
mvillan
Thanks

Great posting! Wow, works great. Can you explain your thought process here so I can learn?
__________________
mvilla
#4
02-08-05, 16:57
vgersh99
As 'ericbrunson' mentioned in another forum, sed's regexes are "greedy": the ".*" was eating up too much, so the match has to be limited to stop at the first ','.

How would you do that?
Well, if we have "something followed by a comma", it means "anything BUT a comma, followed by a comma":
[^,]*
anything BUT a comma repeated 0 or more times
[,]
followed by a comma.
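
To see the difference concretely, here is a small sketch that runs both expressions from this thread over one made-up line. The sample line and the variable names are only for illustration; they are not from the original data.

```shell
#!/bin/sh
# Hypothetical sample line with two quoted fields, one comma in each.
line='a,"b,c",d,"e,f"'

# Greedy version: ".*" runs on to the last quote on the line, so the whole
# span from the first quote to the last is one match and only the final
# quoted comma is removed.
greedy_out=$(printf '%s\n' "$line" | sed 's/\(".*\)[,]\(.*"\)/\1\2/g')

# Bounded version: [^,]* stops at the first comma after the opening quote,
# so each quoted field is matched on its own.
bounded_out=$(printf '%s\n' "$line" | sed -e 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g')

printf 'greedy:  %s\n' "$greedy_out"    # greedy:  a,"b,c",d,"ef"
printf 'bounded: %s\n' "$bounded_out"   # bounded: a,"bc",d,"ef"
```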

You still have a problem if you have MULTIPLE quoted commas as in:

Quote:
xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,"foo,bar,baz",, ,,,,,,,,,,,,, ,,,,
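
One possible way around this, sketched under the assumption that the quotes are balanced on every line and that no quoted field spans more than one line: split each line on the quote character with awk, so that the even-numbered pieces are the text inside quotes, and delete the commas only in those pieces. Note that this swaps sed for awk; the function name strip_quoted_commas and the sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove every comma that sits inside double quotes.
# Assumes balanced quotes on each line and no quoted field spanning lines.
strip_quoted_commas() {
    awk 'BEGIN { FS = OFS = "\"" }
         {
             # Splitting on the quote makes fields 2, 4, 6, ... the quoted text.
             for (i = 2; i <= NF; i += 2)
                 gsub(/,/, "", $i)
             print
         }' "$@"
}

# Example with one and with several commas inside quotes.
printf '%s\n' 'xxxxxxx,"Russellville,",AR,"foo,bar,baz",5.96' | strip_quoted_commas
# prints: xxxxxxx,"Russellville",AR,"foobarbaz",5.96
```

To run it on a file instead of standard input, pass the file name as an argument, e.g. strip_quoted_commas file > file.new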
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+