dBforums > General > Chit Chat > Pulling Databases from the internet

#1
05-13-10, 10:09
ericgcollyer
Registered User
 
Join Date: May 2010
Posts: 3
Pulling Databases from the internet

I was asked to gather data on a specific law firm, Skadden Arps (skadden.com) to be exact. Rather than manually copying, pasting and formatting, is there an easier way to compile a database from the internet? I mean, is there a way to get a website to dump its database to me? Any thoughts would be appreciated.
#2
05-13-10, 10:24
pootle flump
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
There are open source web scraping platforms, companies that provide scraping as a service, and companies that sell relatively inexpensive web scraping software. You can also roll your own.
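
For anyone wondering what "roll your own" might look like, here is a minimal sketch in Python using only the standard library. It is not any particular product's approach. The markup it expects (links carrying a hypothetical class="profile"), the SAMPLE page, and the names ProfileParser, extract_profiles and to_csv are all assumptions made up for illustration; a real site will need its own selectors.

```python
import csv
import io
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collect (link text, href) for every <a> tag whose class includes "profile".

    The "profile" class is a hypothetical convention, not taken from any real site.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "profile" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            name = " ".join("".join(self._text).split())
            self.rows.append((name, self._href))
            self._href = None


def extract_profiles(html):
    """Return a list of (name, url) tuples found in the given HTML string."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "url"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample page standing in for a downloaded listing.
SAMPLE = """
<ul>
  <li><a class="profile" href="/people/jane-doe">Jane  Doe</a></li>
  <li><a class="profile" href="/people/john-roe">John Roe</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

if __name__ == "__main__":
    print(to_csv(extract_profiles(SAMPLE)), end="")
```

In practice the HTML string would come from a download step (for example urllib.request.urlopen) rather than a hard-coded sample, and the CSV text would be written to a file or loaded into a database.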

Note that you are skirting round a grey (or at least off-white) area of the law when you get into web scraping, especially depending on your use of the data. Since you have used their URL and posted it in a public forum, you may even find a representative of theirs pops into the thread for a chat.
#3
05-13-10, 10:37
ericgcollyer
Registered User
 
Join Date: May 2010
Posts: 3
I don't see how using information that is public can get me in trouble with the law. But I may be mistaken.
#4
05-13-10, 10:58
gvee
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Here's just one little link. Good old Wikipedia: Web scraping - Wikipedia, the free encyclopedia
__________________
George
#5
05-13-10, 11:01
pootle flump
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
As I said, it is murky. There have been (so far unsuccessful) lawsuits. Also, some companies (such as Amazon) state in their T&Cs that you may not use... well, let's just post a snippet:
Quote:
You may not systematically extract and/or re-utilise parts of the contents of the website without Amazon.co.uk's express written consent. In particular, you may not utilise any data mining, robots, or similar data gathering and extraction tools to extract (whether once or many times) for re-utilisation of any substantial parts of this website, without Amazon.co.uk's express written consent. You also may not create and/or publish your own database that features substantial (eg our prices and product listings) parts of this website without Amazon.co.uk's express written consent.
Anyway, we are getting away from your question. Yes, it is possible. If you are going from a standing start, you face a steep learning curve for what may be a one-off. If you will repeat this many times, it is worth immersing yourself in some software; otherwise I would subcontract this to one of the rent-a-coder sites.
#6
05-13-10, 11:02
pootle flump
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Actually, by posting an extract of Amazon's T&Cs I suppose I have violated their T&Cs.

Naughty pootle
#7
05-13-10, 11:04
ericgcollyer
Registered User
 
Join Date: May 2010
Posts: 3
I have read that. I already knew that you cannot copy copyrighted materials. I guess the only question then is whether or not this is "trespass to chattels". I am assuming not, because we are not looking to damage the law firm's property in any way. But I will definitely read more into that.
#8
05-13-10, 11:11
pootle flump
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
You can usually throttle software to minimise its footprint if you are concerned.
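
As a rough illustration of what throttling means when you write the scraper yourself, here is a minimal Python sketch that enforces a minimum gap between requests. The Throttle class and its parameters are hypothetical names invented for this example, not the API of any scraping product mentioned in this thread.

```python
import time


class Throttle:
    """Enforce a minimum delay (in seconds) between successive calls to wait().

    clock and sleep are injectable so the behaviour can be checked without
    actually pausing; by default they are time.monotonic and time.sleep.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The idea is to call wait() immediately before each page download, so a Throttle(2.0) would never hit the site more often than once every two seconds. The right interval is a judgment call that depends on the site.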

I happen to have spoken in a great deal of depth about this with two of the market leaders and I know that (at the time we were negotiating with them) they had had no reports of anyone using their software having any problems.

The reason for this is that I work for an organisation that provides such services, so I know how to use these tools and what it takes to acquire the proficiency to use them. I assume you want to record all the attorneys from that site?
#9
08-12-10, 03:30
larrys
Registered User
 
Join Date: Aug 2010
Posts: 1
Would any of you be willing to suggest some software that can be purchased, works well, and doesn't cost a lot?
#10
08-12-10, 03:57
pootle flump
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Cheapest is to roll your own, of course, but that requires some programming skill.
The best two I found (when I reviewed this a year or so ago) were Mozenda and Kapow.
Kapow have two products: a heavy-duty, client-installed enterprise product that is incredibly expensive but very powerful, and a pay-as-you-go online version that launched a little after I gave my recommendation (prior to that there was a free online version that was far too flaky to use).
We use Mozenda and are extremely happy with it - support is excellent and it is constantly improving.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.