  1. #1
    Join Date
    Mar 2004
    Location
    Slovenia
    Posts
    56

    Capture data from web site

    I would like to capture data from a web site and import it into a database. Is there any software that can do this? Capturing is enough; I know how to import data into a DB.

    Any ideas?
    Back to the basics...

  2. #2
    Join Date
    Mar 2003
    Location
    The Bottom of The Barrel
    Posts
    6,102
    What kind of data are we talking here? Is this a site you are authorized to "capture"?

    Those two questions will dictate your options.
    oh yeah... documentation... I have heard of that.

    *** What Do You Want In The MS Access Forum? ***

  3. #3
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by mp218
    I would like to capture data from a web site and import it into a database. Is there any software that can do this? Capturing is enough; I know how to import data into a DB.

    Any ideas?
    There are several strategies:

    A command-line utility such as 'curl' or 'wget'. Both are Unix-based, but both have been ported to Windows.
    -- advantages: super quick, can be scheduled, does many things, robust error handling, no programming
    -- disadvantages: some subtle differences from a browser, inflexible

    An http library such as libwww for Perl.
    -- advantages: total flexibility, can be scheduled
    -- disadvantages: programming, roll your own error handling, major differences from a browser

    Scripting a browser, such as IE.
    -- advantages: much flexibility, almost identical to an actual user
    -- disadvantages: additional complexity of the DOM, etc.; programming can be hard; usually must be in the foreground to run
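
    As a rough illustration of the HTTP library strategy, here is a minimal sketch in Python using the standard urllib.request module (Python is used only for illustration; the same idea applies to libwww for Perl). The function names, the User-Agent string and the timeout are assumptions made for the example. It shows the "roll your own error handling" part: you decide what happens on a failed request and how the bytes are decoded.

```python
import urllib.request


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, else default."""
    for part in (content_type or "").split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip("\"'")
    return default


def fetch(url, timeout=10.0):
    """Return the decoded body of url, or raise RuntimeError if the request fails."""
    # The User-Agent value is a made-up placeholder for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
            content_type = response.headers.get("Content-Type")
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        raise RuntimeError(f"could not fetch {url}: {exc}") from exc
    # Undecodable bytes are replaced rather than raising; tighten this if needed.
    return raw.decode(charset_from_content_type(content_type), errors="replace")
```

    Calling fetch() with the address of a page you are authorized to capture returns its text; scheduling is then a matter of running the script from cron or the Windows task scheduler.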

  4. #4
    Join Date
    Oct 2004
    Posts
    5
    It seems that you can import data from the web into Excel.
    Am I right?
    However, I don't know much about the right way to import more complicated data.

  5. #5
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by ZoneFire
    It seems that you can import data from the web into Excel.
    Am I right?
    However, I don't know much about the right way to import more complicated data.
    Yar, that's true. Some apps like Excel, Word and many databases can do web scraping. (That's the name for it, since it's the same concept as screen scraping.)

    That usually means that the app will parse an HTML table.
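
    To make "parse an HTML table" concrete, here is a minimal sketch in Python using only the standard html.parser module. It is not how Excel does it internally; the class name, the function name and the sample table are all made up for illustration. It collects the text of each cell, grouped by row.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table rows found in html as a list of lists of strings."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows


# Made-up sample data, only to show the shape of the result.
SAMPLE = """
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>A</td><td>12</td></tr>
  <tr><td>B</td><td>7</td></tr>
</table>
"""
print(parse_table(SAMPLE))
```

    Nested tables, colspan/rowspan and missing closing tags are not handled here, which is exactly where real pages tend to get awkward.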

    Now, it's been a while, but I did some very extensive work getting data into Excel in this manner. Do some tests and see what you can get it to do before planning your work around it.

    One caveat that I should mention about web scraping in general: it's *very* fragile. When the owner of the site makes changes it can crash your system or, worse, introduce subtle bugs. Moreover, if you have to deal with anything that can use non-ASCII characters you must test diligently to ensure that text is going through correctly.
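
    One way to guard against that fragility is to fail loudly the moment the scraped data stops looking the way you planned for. The following Python sketch assumes the scraped table is already a list of rows (each a list of strings); the function names and the sample values are hypothetical.

```python
def check_table(rows, expected_header):
    """Return the data rows, or raise ValueError if the layout has changed."""
    if not rows:
        raise ValueError("no rows scraped; the page layout may have changed")
    if rows[0] != expected_header:
        raise ValueError(f"unexpected header {rows[0]!r}, wanted {expected_header!r}")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_header):
            raise ValueError(
                f"row {number} has {len(row)} cells, wanted {len(expected_header)}"
            )
    return rows[1:]


def decode_strict(raw, charset):
    """Decode bytes, raising UnicodeDecodeError instead of silently mangling text."""
    return raw.decode(charset, errors="strict")


# Made-up example: the header matches, so only the data rows come back.
rows = [["Player", "Points"], ["A", "12"]]
print(check_table(rows, ["Player", "Points"]))

# Non-ASCII text survives only if the charset is right.
raw = "čaj".encode("utf-8")
print(decode_strict(raw, "utf-8"))
```

    Decoding the same bytes as "ascii" would raise an error here, which is preferable to storing garbled text and finding out later.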

    IMHO, you should start small and work your way up. Also, don't make promises, because it's very easy to get 99% of the way there just to find that there's an issue on their side that you can't do anything about.

  6. #6
    Join Date
    Mar 2004
    Location
    Slovenia
    Posts
    56

    Web scraping

    ...thx 4 all the answers.

    I just tried to scrape statistics from NBA.com for my fellow countryman Beno Udrih, using Excel's web query. The results are quite good for an ad-hoc query, for getting the data and using it for further analysis. But for gathering the data, I should probably use some other software...

    Best Regards!
    Back to the basics...

  7. #7
    Join Date
    Feb 2004
    Posts
    4

    Scraping data

    I recommend the following O'Reilly books:

    "Spidering Hacks" mostly discusses Perl; however, there is also a PHP project that demonstrates some nifty tricks.

    "Perl and LWP" covers the LWP Perl module in some detail.

    I sometimes use Teleport Pro to capture specific pages into a single directory on my PC, then use Perl's opendir function to sequentially process each of the files.

    You can download a free evaluation at http://www.tenmax.com/teleport/pro/home.htm
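
    The "save pages to a directory, then process each file in turn" approach can be sketched as follows. This is Python rather than Perl, purely for illustration; the function name, the file extensions and the handle callback are assumptions, not part of Teleport Pro or any library.

```python
from pathlib import Path


def process_saved_pages(directory, handle):
    """Call handle(path, text) for each .htm/.html file in directory, in name order.

    Returns the list of file names that were processed.
    """
    processed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            # Assumes UTF-8; undecodable bytes are replaced rather than raising.
            text = path.read_text(encoding="utf-8", errors="replace")
            handle(path, text)
            processed.append(path.name)
    return processed
```

    For example, process_saved_pages("captured", lambda path, text: print(path.name, len(text))) would print the name and size of every saved page in a hypothetical "captured" directory.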
