Parsing HTML files?


shuchi
02-03-03, 23:39
Hi all,
I have a set of HTML files that share the same format but contain different data, and I need a Perl script that can parse each page and store the relevant info under my headings in an external text file.
Many people have suggested cut and paste, but there is too much data for that, so I need something automated. So basically: how do I parse/extract data from HTML using Perl?
To sum up, I have HTML code for HTML tables (probably generated from some database table), and I need code that reads the headings, extracts all the data under them, and saves it to a text file.
Any tutorial reference, example code, or syntax would be a great help!
:) Thanks

-shuchi

Bernd Dulfer
02-04-03, 03:50
Go to CPAN and get the appropriate module:

http://search.cpan.org/author/GAAS/HTML-Parser-3.27/
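
As a rough illustration of what that distribution offers, here is a minimal sketch using HTML::TokeParser (which ships with HTML-Parser) to collect the text of every <td> cell, row by row. The sample HTML and the helper name extract_rows are made up for this example, and it assumes every row has an explicit closing </tr> tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; a real script would read the HTML from a file.
my $html = <<'HTML';
<table>
<tr><th>Name</th><th>City</th></tr>
<tr><td>Alice</td><td>Paris</td></tr>
<tr><td>Bob</td><td>Oslo</td></tr>
</table>
HTML

# extract_rows is an illustrative helper, not part of the module.
# It returns one array ref per table row that contains <td> cells;
# header rows made only of <th> cells are skipped.
sub extract_rows {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot parse document";
    my @rows;
    my $current;
    while (my $token = $p->get_token) {
        my ($type, $tag) = @$token;
        if ($type eq 'S' && $tag eq 'tr') {
            $current = [];                      # start collecting a new row
        }
        elsif ($type eq 'S' && $tag eq 'td') {
            # Read the cell text up to the matching </td>.
            push @$current, $p->get_trimmed_text('/td') if $current;
        }
        elsif ($type eq 'E' && $tag eq 'tr') {
            push @rows, $current if $current && @$current;
            undef $current;
        }
    }
    return @rows;
}

# Print each data row as one tab-separated line.
print join("\t", @$_), "\n" for extract_rows($html);
```

Note that the values come from method calls such as get_token and get_trimmed_text; the parser object itself holds no printable text.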

shuchi
02-04-03, 21:15
Thanks for the help.
I tried using the HTML::Parser, HTML::TokeParser, and HTML::TableExtract methods, but I think I'm doing something wrong, as all my parsing/get_text/get_title etc. outputs just come out in the form:

HTML::TableExtract=HASH(0x1ab5178)

HTML::TokeParser=HASH(0x1db1c8c)

...etc...
Can someone please tell me the syntax to use? I just have a few HTML tables of the same format in one HTML page, and I need to extract all the data from all of them. There is more than one table on the page, but they all have the same format, so I want to somehow check the column names and, for all the tables, extract everything under those columns into one file. My external text file will then have ALL the data of the HTML page, leaving out only the table and column names.

Please help ASAP! Examples and syntax would be a great help. :)

-shuchi
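
Output like HTML::TableExtract=HASH(0x1ab5178) is what Perl prints when the object reference itself is printed instead of the result of a method call on it. The thread does not say how the problem was eventually solved, so the following is only a sketch of one possible approach with HTML::TableExtract: select tables by their column headings and write the rows beneath them to a text file. The sample page, the helper name table_lines, the headings Name and City, and the output file tables.txt are all assumptions made for this example.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page with two tables sharing the same column headings.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Name</th><th>City</th></tr>
<tr><td>Alice</td><td>Paris</td></tr>
</table>
<table>
<tr><th>Name</th><th>City</th></tr>
<tr><td>Bob</td><td>Oslo</td></tr>
</table>
</body></html>
HTML

# table_lines is an illustrative helper, not part of the module.
# It returns one tab-separated line per data row, taken from every
# table whose header row matches the given headings.
sub table_lines {
    my ($doc, @headers) = @_;
    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($doc);
    my @lines;
    for my $table ($te->tables) {       # each matching table
        for my $row ($table->rows) {    # each row is an array ref of cells
            # Empty cells may be undef, so replace them with ''.
            push @lines, join "\t", map { defined $_ ? $_ : '' } @$row;
        }
    }
    return @lines;
}

my @lines = table_lines($html, 'Name', 'City');

# Save the extracted rows to an external text file (assumed name).
open my $out, '>', 'tables.txt' or die "Cannot open tables.txt: $!";
print {$out} "$_\n" for @lines;
close $out or die "Cannot close tables.txt: $!";
```

By default the module leaves the matched header row out of the returned rows, so the file should end up holding only the data under the headings, which is what the question above asks for.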

shuchi
02-05-03, 04:59
Okay, never mind, everyone. Problem solved. :)
Thanks for all the effort and help!

-shuchi

sanketr
02-28-04, 13:28
I have a database that contains whole files of different types, mainly HTML, PDF, TXT, and Office documents. I need to parse these documents and find certain text in them. Can you suggest something?
It would be a great help to me, since you guys are the pros in this field and I really don't know much about it. Any small tip from you would be very helpful.
Please let me know ASAP.
Thanks,
sanket