  #1 (permalink)  
Old 01-05-04, 16:30
hvenkatr hvenkatr is offline
Registered User
 
Join Date: Jan 2004
Posts: 5
Another Search & Replace Question

Hello!

I have a text file with the following line :

this is a <defns:field1>test file</defns:field1>created for simulation



In this line, I need to delete everything between <defns:field1> and </defns:field1>, including the tags themselves.



Is there any script out there that would help me accomplish this? I tried
sed 's/<defns:field1>.</defns:field1>//g' and it doesn't work.

Thanks for any help.
hb
  #2 (permalink)  
Old 01-05-04, 17:44
fla5do fla5do is offline
Registered User
 
Join Date: Oct 2003
Location: Germany
Posts: 138
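try this:
myvar="this is a <defns:field1>test file</defns:field1>created for simulation"
echo "$myvar"
newvar=`echo "$myvar" | sed 's/<defns:field1>//g'`
newvar=`echo "$newvar" | sed 's/<\/defns:field1>//g'`
echo "$newvar"

Only the "/" in the closing tag has to be escaped with a backslash, because it is also the delimiter of the s command. "<" and ">" are ordinary characters and should stay unescaped: in GNU sed, "\<" and "\>" match word boundaries instead of the literal characters. Note that this strips only the tags and keeps the text between them.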
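
If the backslashes get confusing, one option is to pick a different delimiter for the s command. The sketch below assumes "|" does not occur in the pattern; the variable names line and tags_removed are only for illustration. Like the commands above, it strips the tags and keeps the text between them.

```shell
# Sample line from the first post.
line='this is a <defns:field1>test file</defns:field1>created for simulation'

# With "|" as the s command delimiter, the "/" in the closing tag
# needs no backslash. "<" and ">" are ordinary characters here.
tags_removed=$(printf '%s\n' "$line" |
  sed -e 's|<defns:field1>||g' -e 's|</defns:field1>||g')

printf '%s\n' "$tags_removed"
```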
__________________
Greetings from germany
Peter F.
  #3 (permalink)  
Old 01-06-04, 08:30
chillies chillies is offline
Registered User
 
Join Date: Jul 2003
Location: Edinburgh
Posts: 35
Re: Another Search & Replace Question

Depends whether you want to delete or retain the tags. Respectively:

sed 's/<defns:field1>.*<\/defns:field1>//g' < infile > outfile

sed 's/\(<defns:field1>\).*\(<\/defns:field1>\)/\1 \2/g' < infile > outfile

You will run into problems if there are two or more complete tags on one line - the wildcard is greedy.
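
To make the greedy behaviour concrete, here is a small sketch. The input line is made up for illustration (it is not from the original file) and has two complete tag pairs with ordinary text between them.

```shell
# Hypothetical line with two complete tag pairs.
line='a <defns:field1>one</defns:field1> b <defns:field1>two</defns:field1> c'

# The greedy .* runs from the first opening tag to the LAST closing tag,
# so the " b " between the two pairs is deleted along with both pairs.
greedy=$(printf '%s\n' "$line" | sed 's/<defns:field1>.*<\/defns:field1>//g')

printf '%s\n' "$greedy"
```

Only "a" and "c" survive, which is probably not what you want if the text between the two pairs matters.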

I don't see where the replacement comes in ...
  #4 (permalink)  
Old 01-06-04, 10:44
hvenkatr hvenkatr is offline
Registered User
 
Join Date: Jan 2004
Posts: 5
Thanks to both of you. Chillies, your suggestion worked. You are right about two complete tags on one line; I still can't find a workaround for that.
I have something like :
this is a <defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>

When I try to remove the above string, it just doesn't do anything.


Thanks!
  #5 (permalink)  
Old 01-06-04, 11:41
Damian Ibbotson Damian Ibbotson is offline
Padawan
 
Join Date: Jun 2002
Location: UK
Posts: 525
How about if you say that there can't be a tag opening character between your tags, i.e. [^<]*

sed 's/\(<ns1:a>\)[^<]*\(<\/ns1:a>\)/\1\2/g'
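
As a rough check, this is how that command behaves on the line from the previous post. The variable names are only for illustration.

```shell
# Line quoted in the previous post.
line='this is a <defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>'

# [^<]* cannot cross a "<", so each <ns1:a>...</ns1:a> pair is matched
# on its own and only the text inside each pair is removed.
emptied=$(printf '%s\n' "$line" | sed 's/\(<ns1:a>\)[^<]*\(<\/ns1:a>\)/\1\2/g')

printf '%s\n' "$emptied"
```

Both inner pairs are kept and emptied. The same pattern would not match an element that contains other tags, because [^<]* stops at the first "<".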
  #6 (permalink)  
Old 01-06-04, 12:29
hvenkatr hvenkatr is offline
Registered User
 
Join Date: Jan 2004
Posts: 5
I think I did not explain this properly.
I am looking to replace the entire string
"<defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1>"

Thanks, folks.
  #7 (permalink)  
Old 01-06-04, 12:41
Damian Ibbotson Damian Ibbotson is offline
Padawan
 
Join Date: Jun 2002
Location: UK
Posts: 525
Now you've confused me ;-)

Can you post an example of the input and the expected output?

Thanks, Damian
  #8 (permalink)  
Old 01-06-04, 13:04
hvenkatr hvenkatr is offline
Registered User
 
Join Date: Jan 2004
Posts: 5
Thanks for your help Damian.
This is the input :

<root><defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1></root>


Output :

<root>REPLACED</root>
  #9 (permalink)  
Old 01-06-04, 13:26
Damian Ibbotson Damian Ibbotson is offline
Padawan
 
Join Date: Jun 2002
Location: UK
Posts: 525
But that shouldn't cause you any problems. Chillies' post covered this...

sed 's/\(<root>\).*\(<\/root>\)/\1REPLACED\2/'
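
A variant that targets the defns:field1 element directly rather than everything inside <root> might look like the sketch below. It assumes there is only one defns:field1 pair on the line and uses "|" as the delimiter so the closing tag needs no escaping.

```shell
# Input line from the previous post.
line='<root><defns:field1><ns1:a>xxx</ns1:a><ns1:a>xxx</ns1:a></defns:field1></root>'

# Replace the whole defns:field1 element, nested tags included.
# Assumption: a single defns:field1 pair per line, so greedy .* is safe.
replaced=$(printf '%s\n' "$line" | sed 's|<defns:field1>.*</defns:field1>|REPLACED|')

printf '%s\n' "$replaced"
```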

What did you try?

The problem that Chillies pointed out was that because the wildcard matching in sed is greedy, if you had...

<root>remove</root><root>remove</root>

and you wanted to end up with...

<root></root><root></root>

but you tried...

sed 's/\(<root>\).*\(<\/root>\)/\1\2/g'

you would end up with...

<root></root>

which is why I suggested...

sed 's/\(<root>\)[^<]*\(<\/root>\)/\1\2/g'

because this would give you...

<root></root><root></root>

Does this make sense?

Damian
  #10 (permalink)  
Old 01-06-04, 17:13
chillies chillies is offline
Registered User
 
Join Date: Jul 2003
Location: Edinburgh
Posts: 35
My original sed command should have worked on nested tags, but it won't work if there are two or more <tag>stuff</tag> constructs on the same line (see the explanation by Damian).

If you're going to be working with many nested tags, you should look at one of the XML parsers out there. They'll have all the tricky cases worked out, though you'll have to spend more time setting up the script.
  #11 (permalink)  
Old 01-07-04, 12:30
hvenkatr hvenkatr is offline
Registered User
 
Join Date: Jan 2004
Posts: 5
Damian & chillies,
Thanks for the help.
As you mentioned, I am now thinking of going the XSLT way. This is the problem:

I have a huge (700M), ill-formed XML batch file. Besides being badly formed, these XMLs also have a variety of embedded namespaces. I am trying to delete a few nodes and recreate the batch. Unfortunately I do have nested tags, as Chillies mentioned.

Once my XSLT solution works, I'll post it. Thanks for all the help.
  #12 (permalink)  
Old 01-07-04, 13:09
chillies chillies is offline
Registered User
 
Join Date: Jul 2003
Location: Edinburgh
Posts: 35
If you're using hand-crafted code to fix broken XML, my final piece of advice is to state any assumptions made when building the script, then test that those assumptions hold. It is computer science after all ...

In the above example, the script will fail if there are two or more tag groups on the same line, so test for it:

grep '</tag>.*</tag>'
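
One way to wire that check into a script is sketched below. The sample file and the names infile and violations are made up for illustration; in practice you would point grep at the real batch file.

```shell
# Hypothetical sample file, created here so the sketch is self-contained.
infile=$(mktemp)
printf '%s\n' \
  '<root><tag>one</tag></root>' \
  '<root><tag>one</tag><tag>two</tag></root>' > "$infile"

# grep -c prints the number of lines holding two or more closing tags.
violations=$(grep -c '</tag>.*</tag>' "$infile")

if [ "$violations" -gt 0 ]; then
  echo "assumption broken on $violations line(s); greedy sed is unsafe here"
fi

rm -f "$infile"
```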

Good luck!