02-05-04, 15:19
Registered User
Join Date: Feb 2004
Posts: 3

Remove text up to first blank line in text file

I need a command-line one-liner that will remove all the text at the beginning of a text file, up to the first blank line, and save the result.
In fact, I need to do this to multiple files in a folder.
I have tried the following (and more); none of them work.
NOTE: I don't care whether the result gets a new suffix (e.g. $file.out) or not.
EXAMPLES OF MY BAD NEWBIE CRAP:
cd ./txt/ ; for file in *txt ; do sed '1,/^$/d' $file ; done
cd ./txt/ ; for file in *txt ; do sed '1,/^$/d' 's/w\ \./tmp.tex' ; cp ./tmp.tex ./$file ; done
cd ./txt/ ; for file in *txt ; do sed '1,/^$/d' ; cp ./$file ./$file.out ; done
cd ./txt/ ; for file in *txt ; do sed '/^$/,/^$/D' "$file" ; done
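For reference, the usual sed idiom for this is the range '1,/^$/d' with its output redirected to a file; the first attempt above already uses that range but only prints to the terminal. A minimal sketch, assuming the files sit in the current directory and each header block ends with an empty line; strip_header and strip_all are hypothetical names used only here.

```shell
#!/bin/sh
# strip_header (hypothetical name): print FILE without its header block,
# i.e. delete every line from line 1 up to and including the first empty line.
# Caveat: sed only starts testing /^$/ on line 2, so if line 1 itself is
# empty, the range runs on to the next empty line (or to end of file).
strip_header() {
    sed '1,/^$/d' "$1"
}

# strip_all (hypothetical name): apply strip_header to every *txt file in
# the current directory, writing the result to "$file.out" and leaving the
# original untouched.
strip_all() {
    for file in ./*txt; do
        [ -f "$file" ] || continue   # nothing matched: the glob stays literal
        strip_header "$file" > "$file.out"
    done
}
```

Run it from inside the folder (e.g. cd ./txt && strip_all). With GNU sed, '0,/^$/d' can also end the range on line 1 itself, but that address form is a GNU extension.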

02-06-04, 04:50
Padawan
Join Date: Jun 2002
Location: UK
Posts: 525

This should do it
awk '/^$/ && ! textFound {next}{textFound=1; print}' file > newFile
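The same awk program, spread over several lines with one comment per rule, is easier to follow. Note that it removes only the empty lines at the top of the file; a header that starts on line 1 is left in place. The wrapper name drop_leading_blanks is hypothetical; the awk rules are the ones from the post.

```shell
#!/bin/sh
# drop_leading_blanks (hypothetical name): the awk one-liner above,
# with one comment per rule.
drop_leading_blanks() {
    awk '
        # Until the first non-empty line has been seen, skip empty lines.
        /^$/ && !textFound { next }
        # Any other line: note that text has started, then print it.
        { textFound = 1; print }
    ' "$1"
}
```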

02-06-04, 04:54
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

Try this :
Code:
cd ./txt/
for file in *txt
do
sed '/./,$!d' $file > $file.tmp
mv $file.tmp $file
done
__________________
Jean-Pierre.

02-06-04, 05:17
Padawan
Join Date: Jun 2002
Location: UK
Posts: 525

Quote:
Originally posted by aigles
Try this :
Code:
cd ./txt/
for file in *txt
do
sed '/./,$!d' $file > $file.tmp
mv $file.tmp $file
done
|
Hello again Jean Pierre. I know that the command above is correct but I have no idea why!
If I wanted to retain the lines from the first non-blank line to the last non-blank line, I would do something like...
sed '/./,/./!d'
However, this would mean that blank lines at the end would be stripped out.
I have found that if I wanted to match to the end of the file, I would use...
sed '/./,/\$/!d'
I can't understand why I've had to escape the $ and now your example has totally confused me. Could you help clear it up for me?
Thanks, Damian

02-06-04, 06:07
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

- Some explanations
sed '/./,$!d'
/./      <= Non-empty line (contains at least one character)
$        <= End of file (last line)
/./,$    <= Select lines from the first non-empty line to the end of file
!        <= Negate the selection
/./,$!   <= Select the empty lines at the top of the file
d        <= Delete the selected lines
/./,$!d  <= Delete the empty lines at the top of the file
In a regular expression (/ . . . /)
$ = end of line
\$ = character $
In address range (outside of RE)
$ = last line
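To see the address range at work, the sketch below wraps the command in a function (drop_leading_empty is a hypothetical name). Because /./ matches any line with at least one character, a line containing only spaces counts as non-empty and ends the deletion.

```shell
#!/bin/sh
# drop_leading_empty (hypothetical name): delete the empty lines at the top of FILE.
#   /./,$  selects from the first non-empty line through the last line;
#   !d     deletes every line outside that range, i.e. the leading empty lines.
# Empty lines later in the file are inside the range and are kept.
drop_leading_empty() {
    sed '/./,$!d' "$1"
}
```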
- To remove empty lines at end of file :
Code:
sed -e :a -e '/^\n*$/N;/\n$/ba' $file > $file.tmp
- To remove empty lines at top and end of file
Code:
sed '/./,$!d' "$file" | sed -e :a -e '/^\n*$/N;/\n$/ba' > "$file.tmp"
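The end-of-file commands above depend on what sed's N command does when there is no next line, and that differs between sed implementations. A variant that sidesteps the question deletes the pattern space with $d on the last line, before N is ever run there. A sketch; both function names are hypothetical.

```shell
#!/bin/sh
# drop_trailing_empty (hypothetical name): delete the empty lines at the end of FILE.
#   :a           label to branch back to
#   /^\n*$/{     the pattern space holds nothing but empty line(s):
#     $d         on the last line they are trailing, so delete them;
#     N;ba       otherwise append the next line and test again.
#   }            When a non-empty line gets appended, the test fails and
#                the collected lines are printed together, unchanged.
drop_trailing_empty() {
    sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' "$1"
}

# drop_outer_empty (hypothetical name): empty lines removed at both ends.
drop_outer_empty() {
    sed '/./,$!d' "$1" | sed -e :a -e '/^\n*$/{$d;N;ba' -e '}'
}
```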
- Links
do-it-with-sed
Sed FAQ
__________________
Jean-Pierre.

02-06-04, 06:36
Padawan
Join Date: Jun 2002
Location: UK
Posts: 525

Quote:
In a regular expression (/ . . . /)
$ = end of line
\$ = character $
In address range (outside of RE)
$ = last line
|
Of course! It's an address, not an RE. I still don't understand why I have to escape the $ in the following...
sed '/./,/\$/!d'
As you can see, this is a regex, and like you, I believe that it ought to represent the character $. It doesn't, btw.

02-06-04, 06:48
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

If your file doesn't contain the character '$', the address /\$/ never matches, so the range simply runs to the end of the file, as it does with $.
In that case /./,$ and /./,/\$/ are equivalent.
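A small experiment shows what happens once the input does contain a '$': the /\$/ range closes on that line, and sed then waits for the next non-empty line to open a new range, so empty lines in between are deleted as well. The sample text and function names are made up for illustration.

```shell
#!/bin/sh
# Sample input (made up): an empty line, "A", a line containing '$',
# an empty line, then "B".
sample() {
    printf '\nA\nprice $5\n\nB\n'
}

# /./,$ : one range from the first non-empty line to the last line,
# so the empty line after "price $5" is kept (4 lines out).
to_eof() { sample | sed '/./,$!d'; }

# /./,/\$/ : the range closes on "price $5"; the following empty line
# is outside any range and is deleted, then "B" opens a new range (3 lines out).
to_dollar() { sample | sed '/./,/\$/!d'; }
```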
__________________
Jean-Pierre.

02-06-04, 06:58
Padawan
Join Date: Jun 2002
Location: UK
Posts: 525

That seems to be correct. Is this behaviour documented anywhere? My man page has the following to say about addresses...
Certain commands called addressed commands allow you to specify one line or a
range of lines to which the command should be applied. The following rules apply
to addressed commands:
o A command line without an address selects every line.
o A command line with one address, expressed in context form, selects each
line that matches the address.
o A command line with two addresses separated by commas selects the entire
range from the first line that matches the first address through the next
line that matches the second. (If the second address is a number less than
or equal to the line number first selected, only one line is selected.)
Thereafter, the process is repeated, looking again for the first address.

02-06-04, 08:16
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

My man page says the same thing.
The document do-it-with-sed specifies:
Quote:
- commands may take 0, 1 or 2 addresses
- if no address is given, a command is applied to all pattern spaces
- if 1 address is given, then it is applied to all pattern spaces
that match that address
- if 2 addresses are given, then it is applied to all formed pattern spaces
between the pattern space that matched the first address, and the next
pattern space matched by the second address.
If pattern spaces are all the time single lines, this can be said
like, if 2 addrs are given, then the command will be executed on
all lines between first addr and second (inclusive)
If the second address is an RE, then the search starts only on
the next line. That's why things like this work:
/foo/,/foo/<cmd>
|
This last point may explain this undocumented behavior.
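The rule quoted above, that an RE second address is only tested from the line after the range starts, can be checked directly: with '/foo/,/foo/d' the opening "foo" line cannot also close the range, so the range runs on to the next "foo". The input and function name are invented for illustration.

```shell
#!/bin/sh
# The range opens on the first "foo" line; the closing /foo/ is only
# tested from the following line, so lines 1 to 3 are deleted and
# only "y" is printed.
demo_same_regex_range() {
    printf 'foo\nx\nfoo\ny\n' | sed '/foo/,/foo/d'
}
```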
Extract from the Sed FAQ:
Quote:
Address ranges are:
(1) Inclusive. The range "/From here/,/eternity/" matches all the lines containing "From here" up to and including the line containing "eternity". It will not stop on the line just prior to "eternity". (If you don't like this, see section 4.24.)
(2) Plenary. They always match full lines, not just parts of lines. In other words, a command to change or delete an address range will change or delete whole lines; it won't stop in the middle of a line.
(3) Multi-linear. Address ranges normally match 2 lines or more. The second address will never match the same line the first address did; therefore a valid address range always spans at least two lines, with these exceptions which match only one line:
if the first address matches the last line of the file
if using the syntax "/RE/,3" and /RE/ occurs only once in the file at line 3 or below
if using HHsed v1.5. See section 3.4.
(4) Minimalist. In address ranges with /regex/ as <address2>, the range "/foo/,/bar/" will stop at the first "bar" it finds, provided that "bar" occurs on a line below "foo". If the word "bar" occurs on several lines below the word "foo", the range will match all the lines from the first "foo" up to the first "bar". It will not continue hopping ahead to find more "bar"s. In other words, address ranges are not "greedy," like regular expressions.
(5) Repeating. An address range will try to match more than one block of lines in a file. However, the blocks cannot nest. In addition, a second match will not "take" the last line of the previous block. For example, given the following text,
start
stop start
stop
the sed command '/start/,/stop/d' will only delete the first two lines. It will not delete all 3 lines.
(6) Relentless. If the address range finds a "start" match but doesn't find a "stop", it will match every line from "start" to the end of the file. Thus, beware of the following behaviors:
/RE1/,/RE2/ # If /RE2/ is not found, matches from /RE1/ to the
# end-of-file.
20,/RE/ # If /RE/ is not found, matches from line 20 to the
# end-of-file.
/RE/,30 # If /RE/ occurs any time after line 30, each
# occurrence will be matched in sed15+, sedmod, and
# GNU sed v3.02+. GNU sed v2.05 and 1.18 will match
# from the 2nd occurrence of /RE/ to the end-of-file.
If these behaviors seem strange, remember that they occur because sed does not look "ahead" in
|
__________________
Jean-Pierre.
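Rules (5) and (6) of the FAQ extract can be reproduced in a couple of lines each. The first function uses the FAQ's own three input lines; the second input is made up, and both function names are hypothetical.

```shell
#!/bin/sh
# Rule (5), "Repeating": the range opened on line 1 closes on line 2,
# and line 2's "start" does not reopen it, so only line 3 survives.
demo_repeating_range() {
    printf 'start\nstop start\nstop\n' | sed '/start/,/stop/d'
}

# Rule (6), "Relentless": no line matches /stop/ after "start", so the
# range runs to the end of the input and only the first line survives.
demo_relentless_range() {
    printf 'a\nstart\nb\nc\n' | sed '/start/,/stop/d'
}
```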

02-06-04, 08:48
Padawan
Join Date: Jun 2002
Location: UK
Posts: 525

02-06-04, 11:43
Registered User
Join Date: Oct 2003
Posts: 706

Geek alert! Geek alert!
And that, gentlebeings, is why I personally don't use sed.
Oh, it's damn powerful, as you can plainly see, but it's incomprehensible. At least in my experience, when I come back to a sed line that I myself have written, even one day later, I have forgotten what it means and I spend a long time puzzling it out.
Witness the fact that one line of 'chicken scratches' was followed by about four explanatory posts describing what it means. To me, that's a maintenance issue.
I'm not saying that awk is that much better at this than sed, but at least you have the opportunity to put some comments into it. You can also write more than one rule, write procedures and so forth, which makes the whole process much easier to understand when you encounter your own code a second time.
'Chicken scratching' certainly tends to give Unix shell programming a bad reputation.
... P.S.: Nothing personal intended here! Nothing at all. Just another point of view.
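As an illustration of the comments point, the original task (drop everything up to and including the first empty line) can be written as a short awk program with one comment per rule. A sketch only; strip_header_awk is a hypothetical name. One difference from sed '1,/^$/d': if the very first line is empty, awk treats that line as the end of the header.

```shell
#!/bin/sh
# strip_header_awk (hypothetical name): print FILE without its header block,
# i.e. without everything up to and including the first empty line.
strip_header_awk() {
    awk '
        # Once the header is over, print every remaining line as it is.
        in_body { print; next }
        # The first empty line ends the header; it is dropped as well.
        /^$/ { in_body = 1 }
        # Non-empty lines before that point reach no print, so they are skipped.
    ' "$1"
}
```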

02-06-04, 12:04
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

Hi Guru,
I agree with you.
I use sed mainly for substitutions; for everything else I use awk.
__________________
Jean-Pierre.

02-06-04, 13:11
Registered User
Join Date: Feb 2004
Posts: 3

Quote:
Originally posted by aigles
Try this :
Code:
cd ./txt/
for file in *txt
do
sed '/./,$!d' $file > $file.tmp
mv $file.tmp $file
done
|
Did this...
cd ./txt/ ; for file in *txt ; do sed '/./,$!d' $file > $file.tmp ; mv $file.tmp $file ; done
Got this...
bash: syntax error near unexpected token `do'
I don't see anything wrong here... except maybe it needs quotes around the file names, etc. So...
I did this...
cd /home/Wolfe/Mail/IRCUNDERGROUND/txt/ ; for file in *txt ; do sed '/./,$!d' "$file" > "$file.tmp" ; mv $file.tmp $file ; done
No errors, but... the headers remain. 
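The headers remain because '/./,$!d' removes only the empty lines before the first non-empty line; a mail file that starts directly with a header line has none, so it passes through unchanged. Deleting up to the first empty line needs a range that starts at line 1, such as '1,/^$/d'. A quick comparison on a made-up message (the function names are hypothetical):

```shell
#!/bin/sh
# A made-up message: two header lines, an empty separator line, a body.
sample_message() {
    printf 'From: someone\nSubject: test\n\nHello\n'
}
# Removes only leading empty lines: there are none, so nothing changes.
leading_empty_removed() { sample_message | sed '/./,$!d'; }
# Removes line 1 through the first empty line: the header block goes.
header_removed() { sample_message | sed '1,/^$/d'; }
```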

02-06-04, 14:20
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319

- Put quotes around all file names
Code:
cd /home/Wolfe/Mail/IRCUNDERGROUND/txt/
for file in *txt
do
sed '/./,$!d' "$file" > "$file.tmp"
mv "$file.tmp" "$file"
done
- Execute 'set -x' before the for loop to trace each command as it runs.
- If your directory doesn't contain any file matching '*txt', the variable 'file' gets the literal value '*txt'.
To avoid that, you can use the 'find' command:
Code:
cd /home/Wolfe/Mail/IRCUNDERGROUND/txt/
# find also descends into subdirectories; IFS= and -r stop read from
# trimming blanks or eating backslashes in file names
find . -type f -name '*txt' | \
while IFS= read -r file
do
sed '/./,$!d' "$file" > "$file.tmp"
mv "$file.tmp" "$file"
done
__________________
Jean-Pierre.

02-06-04, 16:17
Registered User
Join Date: Feb 2004
Posts: 3

Quote:
Originally posted by aigles
- Put quotes around all file names
|
Well... I did that, and the verdict is...
No errors... and no results.
The files remain as they were, as if I had done nothing...
Some other things that I have tried:
cd /home/Wolfe/Mail/IRCUNDERGROUND/txt/ ; for file in *txt ; do perl -e '$i=0; while(<>){if($i|^[A-Za-z]:|/^\b*$/){print $_};$i++}' < $file > $file.tmp ; done
All this does is crash... Symbols within the text file tend to screw things up: "<", for example, will cause an error, and so will pipes "|", "++>", etc.
My thinking was that any line in the text that begins with "<text>:" is most likely a header, and therefore the line could be deleted. This would be better than just deleting up to the first blank line, because if someone forwards an email as text to the system, the headers of the original message would show up in the post. The problem, however, is that not all the lines in the header start with a string and a ":"; some start with a "<", etc., and therefore screw things up and don't get deleted. I pre-deleted most of the symbols causing this, but it is a pointless task.
The thing that is really ticking me off is that I did have a working sed one-liner that did the job. It deleted everything up to the first blank line and wrote the contents to a new file ($file.out), but somehow it got deleted, even from my backups. Oh well, back to the drawing board, as they say.
Sorry for the long rant.
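The header-line idea above can be sketched in awk rather than perl. This is a hypothetical helper (strip_mail_header) built on assumptions about the files, not on anything confirmed in the thread: a header field starts with a name made of letters, digits or '-' followed by ':', a folded (continued) field line starts with a space or tab, and the header ends at the first empty line. Like '1,/^$/d', it only removes the first header block, so headers of a forwarded message further down are left alone.

```shell
#!/bin/sh
# strip_mail_header (hypothetical name): drop the mail header block at the
# top of FILE. Assumed layout: "Name: value" fields, folded continuation
# lines starting with a space or tab, then an empty line before the body.
strip_mail_header() {
    awk '
        # Body reached: print everything from here on.
        in_body                 { print; next }
        # A header field such as "Subject: hello": skip it.
        /^[A-Za-z0-9-]+:/       { seen = 1; next }
        # A folded continuation of the previous field: skip it.
        seen && /^[ \t]/        { next }
        # The empty line after the header: skip it; the body follows.
        seen && /^$/            { in_body = 1; next }
        # Anything else means there is no (more) header: print it.
                                { in_body = 1; print }
    ' "$1"
}
```

A file that does not start with a header field is printed unchanged, which is the main practical difference from '1,/^$/d'.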