  1. #1
    Join Date
    Jan 2004
    Posts
    5

    Unanswered: Another Search & Replace Question

    Hello!

    I have a text file with the following line :

    this is a <defns:field1>test file</defns:field1>created for simulation



    In this line, I need to delete everything from <defns:field1> through </defns:field1>, including the tags themselves.



    Is there any script out there that would help me accomplish this? I tried
    sed 's/<defns:field1>.</defns:field1>//g' and it doesn't work.

    Thanks for any help.
    hb

  2. #2
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Try this:
    myvar="this is a <defns:field1>test file</defns:field1>created for simulation"
    echo "$myvar"
    newvar=`echo "$myvar" | sed 's/<defns:field1>//g'`
    newvar=`echo "$newvar" | sed 's/<\/defns:field1>//g'`
    echo "$newvar"

    You have to escape the slash in "</" with a backslash, because "/" is also the sed delimiter, and then it will work. Leave "<" and ">" unescaped: GNU sed reads \< and \> as word boundaries.
    Greetings from Germany
    Peter F.

  3. #3
    Join Date
    Jul 2003
    Location
    Edinburgh
    Posts
    35

    Re: Another Search & Replace Question

    Depends whether you want to delete or retain the tags. Respectively:

    sed 's/<defns:field1>.*<\/defns:field1>//g' < infile > outfile

    sed 's/\(<defns:field1>\).*\(<\/defns:field1>\)/\1 \2/g' < infile > outfile

    You will run into problems if there are two or more complete tag pairs on one line: the wildcard is greedy, so .* matches through to the last closing tag on the line.

    I don't see where the replacement comes in ...
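
    For anyone following along, here is a minimal sketch of the two commands above, run against the sample line from the first post. The function names delete_field and empty_field are made up for illustration; the sed expressions are the ones from this post.

```shell
# Sample line from the first post.
line='this is a <defns:field1>test file</defns:field1>created for simulation'

# Hypothetical helper: delete the tags and everything between them.
delete_field() {
    sed 's/<defns:field1>.*<\/defns:field1>//g'
}

# Hypothetical helper: keep the tags but replace their content with one space.
empty_field() {
    sed 's/\(<defns:field1>\).*\(<\/defns:field1>\)/\1 \2/g'
}

printf '%s\n' "$line" | delete_field
# prints: this is a created for simulation

printf '%s\n' "$line" | empty_field
# prints: this is a <defns:field1> </defns:field1>created for simulation
```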

  4. #4
    Join Date
    Jan 2004
    Posts
    5
    Thanks to both of you. Chillies, your suggestion worked. Now, you are right about two complete tags on one line. I still can't find a workaround for that.
    I have something like :
    this is a <defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>

    When I try to remove the above string, it just doesn't do anything.


    Thanks!

  5. #5
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    How about if you say that there can't be a tag opening character between your tags, i.e. [^<]*

    sed 's/\(<ns1:a>\)[^<]*\(<\/ns1:a>\)/\1\2/g'
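
    A small sketch of what this does on the line from the previous post, and of where it stops working. The function names are hypothetical. The second function applies the same idea to the outer tag, which is my own variation and not something suggested above.

```shell
# Line from the previous post.
line='this is a <defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>'

# Hypothetical helper: empty each <ns1:a> element. [^<]* cannot cross a "<",
# so each opening tag is paired with the closing tag right after it.
empty_inner() {
    sed 's/\(<ns1:a>\)[^<]*\(<\/ns1:a>\)/\1\2/g'
}

# Hypothetical helper: the same pattern on the outer tag. It finds no match
# on this line, because the content of <defns:field1> itself contains "<".
empty_outer() {
    sed 's/\(<defns:field1>\)[^<]*\(<\/defns:field1>\)/\1\2/g'
}

printf '%s\n' "$line" | empty_inner
# prints: this is a <defns:field1><ns1:a></ns1:a><ns1:a></ns1:a></defns:field1>

printf '%s\n' "$line" | empty_outer
# prints the line unchanged
```

    So [^<]* is a good fit for leaf elements that hold only text, and no help for an element that wraps other tags.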

  6. #6
    Join Date
    Jan 2004
    Posts
    5
    I think I did not explain this properly.
    I am looking to replace the entire string
    "<defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>"

    Thanks, folks.

  7. #7
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    Now you've confused me ;-)

    Can you post an example of the input and the expected output?

    Thanks, Damian

  8. #8
    Join Date
    Jan 2004
    Posts
    5
    Thanks for your help Damian.
    This is the input :

    <root><defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1></root>


    Output :

    <root>REPLACED</root>

  9. #9
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    But that shouldn't cause you any problems. Chillies' post covered this...

    sed 's/\(<root>\).*\(<\/root>\)/\1REPLACED\2/'

    What did you try?

    The problem that Chillies pointed out was that because the wildcard matching in sed is greedy, if you had...

    <root>remove</root><root>remove</root>

    and you wanted to end up with...

    <root></root><root></root>

    but you tried...

    sed 's/\(<root>\).*\(<\/root>\)/\1\2/g'

    you would end up with...

    <root></root>

    which is why I suggested...

    sed 's/\(<root>\)[^<]*\(<\/root>\)/\1\2/g'

    because this would give you...

    <root></root><root></root>

    Does this make sense?

    Damian
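
    The three cases in this post can be reproduced with a short script. This is only a sketch: the function names are invented here, and the inputs are the ones quoted in the thread.

```shell
# Hypothetical helper: replace everything between <root> and </root>.
replace_root_content() {
    sed 's/\(<root>\).*\(<\/root>\)/\1REPLACED\2/'
}

# Hypothetical helper: greedy version, .* runs to the last </root> on the line.
empty_root_greedy() {
    sed 's/\(<root>\).*\(<\/root>\)/\1\2/g'
}

# Hypothetical helper: bounded version, [^<]* stops at the next "<".
empty_root_bounded() {
    sed 's/\(<root>\)[^<]*\(<\/root>\)/\1\2/g'
}

nested='<root><defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1></root>'
twice='<root>remove</root><root>remove</root>'

printf '%s\n' "$nested" | replace_root_content   # prints: <root>REPLACED</root>
printf '%s\n' "$twice"  | empty_root_greedy      # prints: <root></root>
printf '%s\n' "$twice"  | empty_root_bounded     # prints: <root></root><root></root>
```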

  10. #10
    Join Date
    Jul 2003
    Location
    Edinburgh
    Posts
    35
    My original sed command should have worked on nested tags, but it won't work if there are two or more <tag>stuff</tag> constructs on the same line (see the explanation by Damian).

    If you're going to be working with many nested tags, you should look at one of the XML parsers out there. They'll have all the tricky cases worked out, though you'll have to spend more time setting up the script.
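
    Short of a real parser, one stopgap is to do the matching non-greedily by hand. The sketch below uses awk instead of sed, which is my substitution and not something proposed in this thread. It pairs each <defns:field1> with the nearest following </defns:field1> on the same line. It assumes that a group never spans lines and that <defns:field1> is never nested inside itself. The name replace_groups and the variables otag, ctag and repl are hypothetical.

```shell
# Hypothetical helper: replace every <defns:field1>...</defns:field1> group
# on each input line with the word REPLACED, matching the nearest closing tag.
replace_groups() {
    awk -v otag='<defns:field1>' -v ctag='</defns:field1>' -v repl='REPLACED' '
    {
        out = ""
        rest = $0
        # Walk the line left to right, one group at a time.
        while ((i = index(rest, otag)) > 0) {
            tail = substr(rest, i + length(otag))
            j = index(tail, ctag)
            if (j == 0)
                break                      # unmatched opening tag: leave the rest alone
            out = out substr(rest, 1, i - 1) repl
            rest = substr(tail, j + length(ctag))
        }
        print out rest
    }'
}

printf '%s\n' '<root><defns:field1>a</defns:field1>mid<defns:field1><ns1:a>b</ns1:a></defns:field1></root>' | replace_groups
# prints: <root>REPLACEDmidREPLACED</root>
```

    Because index() does plain string search, none of the tag characters need escaping. It is still text munging, so the XML parser advice stands for anything with deeper nesting.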

  11. #11
    Join Date
    Jan 2004
    Posts
    5
    Damian & chillies,
    Thanks for the help.
    As you mentioned, I am now thinking of going the XSLT way. This is the problem:

    I have this huge (700M), ill-formed XML batch file. Besides being badly formed, these XMLs also have a variety of embedded namespaces. I am trying to delete a few nodes and recreate the batch. Unfortunately I do have nested tags, as Chillies mentioned.

    Once my XSLT solution works, I'll post it. Thanks for all the help.

  12. #12
    Join Date
    Jul 2003
    Location
    Edinburgh
    Posts
    35
    If you're using hand-crafted code to fix broken XML, my final piece of advice is to state any assumptions made when building the script, then test that those assumptions hold. It is computer science after all ...

    In the above example, the script will fail if there are two or more tag groups on the same line, so test for it:

    grep '</tag>.*</tag>'
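
    One way to wire that test into the script itself, as a rough sketch. It assumes the tag in question is defns:field1 (the grep above uses a generic tag), and both function names are invented for illustration.

```shell
# Hypothetical helper: succeed only if no line of FILE holds two or more
# closing tags, i.e. the "one group per line" assumption holds.
# Usage: one_group_per_line FILE
one_group_per_line() {
    if grep -q '</defns:field1>.*</defns:field1>' "$1"; then
        return 1
    fi
    return 0
}

# Hypothetical wrapper: run the greedy sed only when the assumption holds,
# otherwise complain on stderr and return a non-zero status.
# Usage: safe_delete_field FILE
safe_delete_field() {
    if one_group_per_line "$1"; then
        sed 's/<defns:field1>.*<\/defns:field1>//g' "$1"
    else
        echo "assumption violated: several groups on one line in $1" >&2
        return 1
    fi
}
```

    Calling safe_delete_field infile > outfile then either writes the cleaned file or fails with an error message, so a line with two groups is not silently mangled.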

    Good luck!
