
02-08-05, 15:46
Registered User
Join Date: Dec 2003
Location: Ogden Utah
Posts: 34

Removing commas within quotation marks
Hello:
I am bringing this up again since we have gotten a little further since my previous comments, and I am looking for some urgent input.
I have a CSV file where some strings are enclosed in quotation marks. We found that the following sed operation
sed 's/\(".*\)[,]\(.*"\)/\1\2/g'
removes the commas within the quotation marks (if any exist), which is what we want, but only when there is a single set of quotation marks on the line. When a line has more than one set of quotation marks, the sed operation does not work on that line. For example:
xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,",ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,,,,,,,,,,,,,,,, ,,,,
Does anyone know how to pass a second, third, or further expression to sed so that, when it processes a line with multiple sets of quotes, it still removes the commas inside each of them? That is what we want to accomplish.
__________________
mvilla

02-08-05, 16:12
Registered User
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
How about something like this:
sed -e 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g' file
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+

02-08-05, 16:32
Registered User
Join Date: Dec 2003
Location: Ogden Utah
Posts: 34

Thanks

Great posting! Wow, it works great. Can you explain your thought process here so I can learn?
__________________
mvilla

02-08-05, 16:57
Registered User
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
As 'ericbrunson' mentioned [in another forum], sed's regexes are "greedy": the ".*" was eating up too much, so you have to limit how far it can match and make it stop at the first ','.
How would you do that?
Well, if we want "something followed by a comma", we can write it as "anything BUT a comma, followed by a comma":
[^,]*
anything BUT a comma repeated 0 or more times
[,]
followed by a comma.
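To see the difference on a short line, here is a minimal sketch. The sample line and the function names `greedy` and `bounded` are made up for illustration; the two sed expressions are the ones quoted earlier in this thread.

```shell
# Hypothetical sample line with two quoted fields, each holding one comma.
line='a,"b,c",d,"e,f",g'

# Original expression: ".*" is greedy, so the match runs from the first
# quote on the line to the last one, and only one comma gets removed.
greedy() { sed 's/\(".*\)[,]\(.*"\)/\1\2/g'; }

# Suggested expression: [^,]* stops at the first comma after a quote and
# [^"]* stops at the next quote, so each quoted pair is matched on its own.
bounded() { sed -e 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g'; }

printf '%s\n' "$line" | greedy
printf '%s\n' "$line" | bounded
```

On this sample the first command prints `a,"b,c",d,"ef",g` (only the last quoted comma is gone) and the second prints `a,"bc",d,"ef",g`.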
You still have a problem if you have MULTIPLE commas inside one quoted string, as in:
Quote:
xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,"foo,bar,baz",, ,,,,,,,,,,,,, ,,,,
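One possible way around this is to switch from sed to awk. This is a different technique from the sed expression above, not something proposed in the thread, and it is only a sketch: it assumes the quotes are balanced on every line and that no field contains an escaped or doubled quote. The function name `strip_quoted_commas` and the shortened sample line are hypothetical.

```shell
# Split each line on the double-quote character. With balanced quotes,
# the even-numbered pieces are the text inside quotation marks.
strip_quoted_commas() {
  awk 'BEGIN { FS = OFS = "\"" }
       { for (i = 2; i <= NF; i += 2) gsub(/,/, "", $i) }  # delete commas inside quotes only
       { print }'
}

printf '%s\n' 'x,"Russellville,",AR,"foo,bar,baz",y' | strip_quoted_commas
```

For this sample the output is `x,"Russellville",AR,"foobarbaz",y`. Because only the pieces between quotes are touched, the commas that separate fields are left alone, even when a quoted field contains no comma at all.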
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+