I would like to capture data from a website and import it into a database. Is there any software that can do this? Capturing is enough; I know how to import data into a DB.
There are several strategies:
A command-line utility such as 'curl' or 'wget'. Both are Unix-based, but both have been ported to Windows.
-- advantages: super quick, can be scheduled, does many things, robust error handling, no programming
-- disadvantages: some subtle differences from a browser, inflexible
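For illustration, here are concrete invocations of both tools. They can just as well be typed into a shell or put in a cron job (or the Windows Task Scheduler). The Python wrapper below is only a sketch that spells out the flags and turns a non-zero exit status into an exception. It assumes the tool is installed and on the PATH, and the function names are hypothetical.

```python
import shutil
import subprocess


def build_fetch_command(url, output_path, tool="curl"):
    """Return an argument list that downloads `url` into `output_path`.

    curl: -f  fail on HTTP errors (exit code 22 instead of saving the error page)
          -s -S  no progress meter, but still print errors
          -L  follow redirects
          -o  write the body to a file
    wget: -q  quiet
          -O  write the body to a file
    """
    if tool == "curl":
        return ["curl", "-f", "-s", "-S", "-L", "-o", output_path, url]
    if tool == "wget":
        return ["wget", "-q", "-O", output_path, url]
    raise ValueError(f"unsupported tool: {tool!r}")


def fetch_with_cli(url, output_path, tool="curl"):
    """Run the tool; raise if it is missing or exits with a non-zero status."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(build_fetch_command(url, output_path, tool), check=True)
```

With a real URL in place of a placeholder, `fetch_with_cli("http://example.com/stats.html", "stats.html")` saves the page to `stats.html`, ready for parsing.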
An HTTP library such as libwww-perl (LWP) for Perl.
-- advantages: total flexibility, can be scheduled
-- disadvantages: programming, roll your own error handling, major differences from a browser
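Python's standard library has a comparable HTTP client in `urllib.request`. It is swapped in here in place of Perl's libwww so that all examples share one language. The sketch shows what "roll your own error handling" looks like in practice: a browser-like User-Agent, a timeout, and HTTP and network errors funnelled into one exception type. The function name and the User-Agent string are assumptions, not something any particular site requires.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10.0,
               user_agent="Mozilla/5.0 (compatible; capture-sketch)"):
    """Fetch `url` and return (status, body_bytes, content_type)."""
    # Some sites serve different content (or refuse) without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            return response.status, body, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses; HTTPError must be caught before its base class URLError.
        raise RuntimeError(f"server returned HTTP {exc.code} for {url}") from exc
    except urllib.error.URLError as exc:
        # DNS failures, refused connections, connect timeouts.
        raise RuntimeError(f"could not reach {url}: {exc.reason}") from exc
    except OSError as exc:
        # For example a timeout or connection reset while reading the body.
        raise RuntimeError(f"error while reading {url}: {exc}") from exc
```

Catching everything as `RuntimeError` keeps the calling code simple: a scheduled job can log the message and retry later instead of crashing halfway through a run.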
Scripting a browser, such as IE.
-- advantages: much flexibility, almost identical to an actual user
-- disadvantages: additional complexity of the DOM, etc.; programming can be hard; usually must run in the foreground
It seems that you can import data from the web into Excel.
Am I right?
However, I don't know much about the right way to import more complicated data.
Yeah, that's true. Some apps like Excel, Word and many databases can do web scraping. (That's the name for it, since it's the same concept as screen scraping.)
That usually means that the app will parse an HTML table.
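Parsing an HTML table yourself is not much code. The following sketch uses Python's built-in `html.parser` to collect every `<table>` into lists of rows and cells, tolerating omitted `</td>` and `</tr>` tags. It does not handle nested tables or `colspan`/`rowspan`, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect cell text from every <table>: tables -> rows -> cell strings.

    <td> and <th> are treated alike. Nested tables are not handled specially.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _finish_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace (newlines, indentation) into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._finish_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._finish_row()  # closes a previous row whose </tr> was omitted
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # closes a previous cell whose </td> was omitted
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

`parse_tables(html)[0]` then gives the first table's rows as lists of strings, which can be written out with the `csv` module for a database import.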
Now, it's been a while, but I did some very extensive work getting data into Excel in this manner. Do some tests and see what you can get it to do before planning your work around it.
One caveat that I should mention about web scraping in general: it's *very* fragile. When the owner of the site makes changes, it can crash your system or, worse, introduce subtle bugs. Moreover, if you have to deal with anything that can use non-ASCII characters, you must test diligently to ensure that text is going through correctly.
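Two cheap guards against those problems, sketched in Python under the assumption that the page's charset is declared in the HTTP `Content-Type` header (pages that declare it only in a `<meta>` tag would need an extra step): decode strictly using the declared charset, and check that a table's header row still matches what your import expects before loading anything. Strict decoding only catches some mistakes: a wrong guess of UTF-8 usually raises `UnicodeDecodeError`, but single-byte encodings such as ISO-8859-1 accept any bytes, so still inspect the output. Both helper names are hypothetical.

```python
from email.message import Message


def decode_body(raw, content_type, fallback="utf-8"):
    """Decode response bytes with the charset declared in the Content-Type header.

    Uses `fallback` when no charset is declared. Decoding is strict, so bytes
    that are invalid in the chosen charset raise UnicodeDecodeError.
    """
    charset = fallback
    if content_type:
        # email.message.Message parses MIME header parameters such as charset=...
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset() or fallback
    return raw.decode(charset)  # errors="strict" is the default


def check_headers(header_row, expected):
    """Raise ValueError if the table's header row differs from what the import expects."""
    actual = [cell.strip() for cell in header_row]
    if actual != list(expected):
        raise ValueError(f"page layout changed: expected {list(expected)}, got {actual}")
```

Failing loudly on a changed header row is the point: it turns a silent, subtle bug into an obvious error that tells you the site has been redesigned.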
IMHO, you should start small and work your way up. Also, don't make promises, because it's very easy to get 99% of the way there just to find that there's an issue on their side that you can't do anything about.
I just tried to scrape statistics from NBA.com for my fellow countryman Beno Udrih, using Excel's web query. The results are quite good for an ad-hoc query, i.e. for getting the data and analysing it further. But for gathering the data, I should probably use some other software...