  1. #1
    Join Date
    Jan 2003
    Posts
    55

    Unanswered: parsing html files?

    Hi all,
    I have a set of HTML files that share the same format but contain different data, and I need a Perl script that can parse each page and store the relevant information under my headings in an external text file.
    Many people have suggested cutting and pasting, but there is too much data, so I need something automated. Basically, how do I parse/extract data from HTML using Perl?
    To sum up: I have the HTML code for HTML tables (probably generated from some database table), and I need code that reads the headings, extracts all the data under them, and saves it to a text file.
    Any tutorial reference, example code, or syntax would be a great help!
    Thanks

    -shuchi
    Last edited by shuchi; 02-03-03 at 23:49.
    You try and try again..but then give up, there's no sense in being a complete fool about it!!!

  2. #2
    Join Date
    Sep 2002
    Location
    Germany, near Aachen
    Posts
    120
    Go to CPAN and get the appropriate module:

    http://search.cpan.org/author/GAAS/HTML-Parser-3.27/
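
    As a rough illustration of what that distribution offers, here is a minimal sketch using HTML::TokeParser, which ships with HTML-Parser. The sample markup and the helper name extract_cells are made up for the example; real pages with nested tables or missing tags would need more care.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input; a real script would read the page from a file.
my $sample = <<'HTML';
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
HTML

# Collect the text of every <td> cell, grouped by the <tr> it belongs to.
sub extract_cells {
    my ($html) = @_;
    # A scalar reference tells TokeParser to parse the string itself.
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse document";
    my @rows;
    while (my $tag = $p->get_tag('tr', 'td')) {
        if ($tag->[0] eq 'tr') {
            push @rows, [];                    # start a new row
        } else {
            push @rows, [] unless @rows;       # tolerate a <td> before any <tr>
            push @{ $rows[-1] }, $p->get_trimmed_text('/td');
        }
    }
    return grep { @$_ } @rows;                 # drop rows with no <td> (header rows)
}

print join("\t", @$_), "\n" for extract_cells($sample);
```

    The calls that do the work are get_tag and get_trimmed_text: each is a method on the parser object and returns data, rather than the object itself.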

  3. #3
    Join Date
    Jan 2003
    Posts
    55

    Hmm... I feel like a dummy. More help, please!

    Thanks for the help.
    I tried using HTML::Parser, HTML::TokeParser, and HTML::TableExtract, but I think I'm doing something wrong, because all of my parsing/get_text/get_title etc. output just comes out in this form:

    HTML::TableExtract=HASH(0x1ab5178)

    HTML::TokeParser=HASH(0x1db1c8c)

    ...etc.
    Can someone please tell me the syntax to use? I just have a few HTML tables of the same format in one HTML page, and I need to extract all of the data from all of them. There is more than one table on the page, but they all have the same format, so I want to somehow check the column names and, for all the tables, extract everything under those columns into one file. My external text file would then hold ALL the data from the HTML page, leaving out only the table and column names.

    Please help ASAP! Examples and syntax would be a great help.

    -shuchi
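
    For anyone hitting the same output: a string like HTML::TableExtract=HASH(0x...) is what Perl prints when the object reference itself is printed, so the likely cause is printing the parser object instead of the rows it returns. The sketch below shows one way the extraction described above might look with HTML::TableExtract. It assumes a reasonably recent version of the module (one that provides the tables method); the sample markup, the column names Name and Qty, and the helper rows_under_headers are invented for the example.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical page with two tables that share the same column headings.
my $page = <<'HTML';
<html><body>
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>kiwi</td><td>7</td></tr>
</table>
</body></html>
HTML

# Return the data rows of every table whose header row matches @headers.
sub rows_under_headers {
    my ($html, @headers) = @_;
    my $te = HTML::TableExtract->new(headers => \@headers);
    $te->parse($html);
    # print $te;   # would print HTML::TableExtract=HASH(0x...), not the data
    my @rows;
    for my $table ($te->tables) {
        for my $row ($table->rows) {           # header row is not included
            my @cells = map { defined $_ ? $_ : '' } @$row;
            s/^\s+|\s+$//g for @cells;         # trim surrounding whitespace
            push @rows, \@cells;
        }
    }
    return @rows;
}

# One tab-separated line per data row, all tables combined.
print join("\t", @$_), "\n" for rows_under_headers($page, 'Name', 'Qty');
```

    Redirecting the script's output to a file, or printing to a filehandle opened for writing, gives the single text file of data without the table and column names.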

  4. #4
    Join Date
    Jan 2003
    Posts
    55
    OK, never mind, everyone. Problem solved.
    Thanks for all the effort and help!

    -shuchi

  5. #5
    Join Date
    Feb 2004
    Location
    New York
    Posts
    1

    question

    I have a database that contains whole files of different types, mainly HTML, PDF, TXT, and Office documents. I need to parse these documents and find certain text in them. Can you suggest something?
    It would be a great help to me, since you guys are the pros in this field and I really don't know much about it. Any small tip from you could be very helpful.
    Please let me know ASAP.
    Thanks,
    sanket
