  1. #1
    Join Date
    Dec 2003
    Location
    Ogden Utah
    Posts
    34

    Unanswered: Removing commas within quotation marks

    Hello:

    I am bringing this up again since we have gotten a little further since my previous comments, and I am looking for some urgent input.

    I have a CSV file in which some strings are enclosed in quotation marks. We found that the sed operation sed 's/\(".*\)[,]\(.*"\)/\1\2/g'
    removes the commas within the quotation marks, if any exist (this is what we want), but only when a line contains a single set of quotation marks. When a line contains more than one set of quotation marks, the sed operation does not work on that line.

    sed 's/\(".*\)[,]\(.*"\)/\1\2/g'

    xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,",ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,,,,,,,,,,,,,,,, ,,,,

    Therefore, does anyone know how to pass a second, third, or further expression to the sed operation so that, when it processes a line with multiple sets of quotes, it removes the commas inside each of them? That is what we want to accomplish.
    mvilla

  2. #2
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    how about something like this:

    sed -e 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g' file
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+

  3. #3

    Thank

    great posting!
    Wow, works great! Can you explain your thought process here so I can learn?
    mvilla

  4. #4
    As 'ericbrunson' [from another forum] has mentioned, sed's regexes are "greedy": specifying ".*" was eating up too much, and you had to limit it so that it stops at a ','.

    How would you do that?
    Well, "something followed by a comma" can be written as "anything BUT a comma, followed by a comma":
    [^,]*
    anything BUT a comma repeated 0 or more times
    [,]
    followed by a comma.
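
A small sketch of the difference described above, run on a made-up line. The `line` value and the variable names `greedy` and `limited` are illustrative assumptions, not taken from the real file. It feeds the same input to the original pattern and to the limited one so the two results can be compared.

```shell
# Hypothetical sample line with two quoted strings (not from the real data).
line='a,"Russellville,",AR,"Arlington,",IL'

# Original pattern: ".*" stretches from the first quote on the line to the
# last one, so the single match spans both quoted strings and only one
# comma (the last one before the final quote) is dropped.
greedy=$(printf '%s\n' "$line" | sed 's/\(".*\)[,]\(.*"\)/\1\2/g')

# Limited pattern: [^,]* stops at the first comma after a quote and [^"]*
# stops at the next quote, so each quoted string is matched on its own.
limited=$(printf '%s\n' "$line" | sed 's/\("[^,]*\)[,]\([^"]*"\)/\1\2/g')

printf 'greedy:  %s\nlimited: %s\n' "$greedy" "$limited"
```

On this input the greedy pattern should leave "Russellville," untouched and strip only the comma in the last quoted string, while the limited pattern strips the comma in both.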

    You still have a problem if you have MULTIPLE commas inside one quoted string, as in:

    xxxxxxx,xxxxxxxx,"Russellville,",AR,xxxxxx,xxxxxxx ,xxxxxxxx,ARLINGTON HEIGHTS,IL,60005,US,5.96,0.98,,,,,,"foo,bar,baz",, ,,,,,,,,,,,,, ,,,,
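
One possible way to handle that case with sed alone is to remove one quoted comma per pass and loop until none are left. This is only a sketch: the function name `strip_quoted_commas` is hypothetical, and it assumes every line has balanced double quotes and no escaped quotes inside a field. A stray, unmatched quote would make every later comma on that line look quoted. The pattern counts quotes from the start of the line, so it cannot mistake a closing quote for an opening one.

```shell
# Hypothetical helper: delete every comma that sits between a pair of
# double quotes. Assumes balanced quotes and no escaped quotes in a field.
strip_quoted_commas() {
  # :a                    label marking the top of the loop
  # \([^"]*"[^"]*"\)*     skip any number of complete quoted strings
  # [^"]*"[^",]*          then step past an opening quote into its text
  # ,                     the next comma is therefore inside quotes
  # ta                    if a comma was removed, go back and try again
  sed -e ':a' \
      -e 's/^\(\([^"]*"[^"]*"\)*[^"]*"[^",]*\),/\1/' \
      -e 'ta'
}

# Made-up input with several commas in one quoted string and a second quoted string.
printf '%s\n' 'x,"foo,bar,baz",y,"Russellville,",AR' | strip_quoted_commas
```

For the made-up line above this should print x,"foobarbaz",y,"Russellville",AR, with the commas between fields left alone.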
    vlad
