I was asked to gather data on a specific law firm, Skadden Arps to be exact (skadden.com). Rather than manually copying, pasting and formatting, is there an easier way to compile a database from the internet? I mean, is there a way to get a website to dump its database to me? Any thoughts would be appreciated.
There are open source web scraping platforms, companies that provide scraping as a service, and companies that sell relatively inexpensive web scraping software. You can also roll your own.
Note that you are skirting around a grey (or at least off-white) area of the law when you get into web scraping, especially depending on how you use the data. Since you have posted their URL in a public forum, you may even find that a representative of theirs pops into the thread for a chat.
As I said, it is murky. There have been (so far unsuccessful) lawsuits. Also, some companies (such as Amazon) state in their T&Cs that you may not use... well, let's just post a snippet:
You may not systematically extract and/or re-utilise parts of the contents of the website without Amazon.co.uk's express written consent. In particular, you may not utilise any data mining, robots, or similar data gathering and extraction tools to extract (whether once or many times) for re-utilisation of any substantial parts of this website, without Amazon.co.uk's express written consent. You also may not create and/or publish your own database that features substantial (eg our prices and product listings) parts of this website without Amazon.co.uk's express written consent.
Anyway, we are getting away from your question. Yes, it is possible. If you are starting from scratch, you face a fairly steep learning curve, which is hard to justify for a one-off. If you will repeat this many times then it is worth immersing yourself in some software; otherwise I would subcontract it to one of the rent-a-coder sites.
I have read that. I already knew that you cannot copy copyrighted material. I guess the only question, then, is whether or not this is "trespass to chattels". I am assuming not, because we are not looking to damage the law firm's property in any way. But I will definitely read more into that.
You can usually throttle software to minimise its footprint if you are concerned.
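If you end up writing your own scraper, throttling can be as simple as enforcing a minimum gap between requests. Here is a rough sketch in Python using only the standard library; the `Throttle` class, the `fetch_politely` helper and the two-second default are my own illustrative choices, not settings from any of the commercial tools.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum gap between successive requests."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None  # monotonic time of the previous request, if any

    def wait(self):
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_politely(urls, min_interval_seconds=2.0):
    """Download each URL in turn, pausing between requests (hypothetical helper)."""
    throttle = Throttle(min_interval_seconds)
    pages = {}
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url) as response:
            pages[url] = response.read()
    return pages
```

A longer interval means a slower run but a smaller footprint on the target server, so pick the largest delay you can live with.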
I happen to have discussed this in a great deal of depth with two of the market leaders, and I know that (at the time we were negotiating with them) they had had no reports of any users of their software running into problems.
The reason is that I work for an organisation that provides such services, so I know how to use these tools and what it takes to become proficient with them. I assume you want to record all the attorneys from that site?
The cheapest option is to roll your own, of course, but that requires some programming skill.
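To give a feel for what rolling your own involves, here is a minimal sketch in Python that pulls the link text and URL out of every anchor on a page, using only the standard library. The sample markup and the `/professionals/...` paths are made up for illustration; I have not looked at how the real site is structured, so a real scraper would need selectors tailored to the actual pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical markup standing in for a directory listing page.
sample = """
<ul>
  <li><a href="/professionals/a-example">Alice  Example</a></li>
  <li><a href="/professionals/b-sample">Bob Sample</a></li>
</ul>
"""
rows = extract_links(sample)
```

From there it is a short step to writing `rows` out with the `csv` module, but fetching the pages, following pagination and coping with markup changes is where most of the programming effort goes.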
The best two I found (when I reviewed this a year or so ago) were Mozenda and Kapow.
Kapow have two products: a heavy-duty, client-installed enterprise product that is incredibly expensive but very powerful, and a pay-as-you-go online version that launched a little after I gave my recommendation (prior to that there was a free online version that was far too flaky to use).
We use Mozenda and are extremely happy with it - support is excellent and it is constantly improving.