  1. #1
    Join Date
    May 2010
    Posts
    3

    Pulling Databases from the internet

    I was asked to gather data on a specific law firm, Skadden Arps to be exact (skadden.com). Rather than manually copying, pasting and formatting, is there an easier way to compile a database from the internet? I mean, is there a way to get a website to dump its database to me? Any thoughts would be appreciated.

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    There are open source web scraping platforms, there are companies that provide the services and there are companies that provide relatively inexpensive web scraping software. You can also roll your own.

    Note that you are skirting a grey (or at least off-white) area of the law when you get into web scraping, especially depending on your use of the data. Since you have used their URL and posted it in a public forum, you may even find a representative of theirs pops into the thread for a chat.
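
    If you do roll your own, one courtesy a scraper can observe is the site's robots.txt file. Below is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the "my-scraper" user agent and the example.com URLs are made up for illustration, and passing this check says nothing about the legal position or the site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.com/people/", "https://example.com/private/x"):
        print(url, is_allowed(SAMPLE_ROBOTS, "my-scraper", url))
```

    Against a live site you would point the parser at the real file with set_url() and read() instead of passing the text in by hand.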

  3. #3
    Join Date
    May 2010
    Posts
    3
    I don't see how using information that is public can get me in trouble with the law. But I may be mistaken.

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Here's just one little link. Good old Wikipedia: Web scraping - Wikipedia, the free encyclopedia
    George

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    As I said, it is murky. There have been (so far unsuccessful) lawsuits. Also, some companies (such as Amazon) state in their T&Cs that you may not use... well, let's just post a snippet:
    You may not systematically extract and/or re-utilise parts of the contents of the website without Amazon.co.uk's express written consent. In particular, you may not utilise any data mining, robots, or similar data gathering and extraction tools to extract (whether once or many times) for re-utilisation of any substantial parts of this website, without Amazon.co.uk's express written consent. You also may not create and/or publish your own database that features substantial (eg our prices and product listings) parts of this website without Amazon.co.uk's express written consent.
    Anyway, we are getting away from your question. Yes, it is possible. If you are going from a standing start, you face a fairly steep learning curve, which is hard to justify for a one-off. If you will repeat this many times, it is worth immersing yourself in some software; otherwise I would subcontract this to one of the rent-a-coder sites.

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Actually, by posting an extract of Amazon's T&Cs I suppose I have violated their T&Cs.

    Naughty pootle

  7. #7
    Join Date
    May 2010
    Posts
    3
    I have read that. I already knew that you cannot copy copyrighted materials. I guess the only question, then, is whether or not this is "trespass to chattels". I am assuming not, because we are not looking to damage the law firm's property in any way. But I will definitely read more into that.

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    You can usually throttle software to minimise its footprint if you are concerned.

    I happen to have spoken in a great deal of depth about this with two of the market leaders and I know that (at the time we were negotiating with them) they had had no reports of anyone using their software having any problems.

    The reason for this is I work for an organisation that provides such services so I know how to use these tools and what it takes to acquire the proficiency to use them. I assume you want to record all the attorneys from that site?
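
    For anyone writing their own tool, the throttling mentioned above could look something like the sketch below: it enforces a minimum gap between the start of one request and the next. The throttled_fetch name, the 2-second default and the injectable fetch/sleep/clock arguments are my own choices for illustration, not taken from any particular product.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def throttled_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, result) pairs, starting requests at least min_interval seconds apart."""
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the previous request began
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield url, fetch(url)


if __name__ == "__main__":
    # Stand-in fetch function so the demo makes no network calls
    def fake_fetch(url: str) -> str:
        return f"<html>page for {url}</html>"

    pages = ["https://example.com/a", "https://example.com/b"]
    for url, body in throttled_fetch(pages, fake_fetch, min_interval=0.5):
        print(url, len(body))
```

    In real use, fetch would be whatever function actually downloads a page; keeping it as a parameter means the pacing logic can be tried out without touching anyone's server.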

  9. #9
    Join Date
    Aug 2010
    Posts
    1
    Would any of you be willing to suggest the names of some software that can be purchased, works well, and doesn't cost a lot?

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Cheapest is to roll your own, of course, but that requires some programming skill.
    The best two I found (when I reviewed this a year or so ago) were Mozenda and Kapow.
    Kapow have two products: a heavy-duty, client-installed enterprise product that is incredibly expensive but very powerful, and a pay-as-you-go online version that launched a little after I gave my recommendation (prior to that there was a free online version that was far too flaky to use).
    We use Mozenda and are extremely happy with it - support is excellent and it is constantly improving.
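
    To give a flavour of the roll-your-own route, here is a rough sketch using only Python's standard library: it pulls the text out of every element carrying a given CSS class and writes the results as CSV. The markup, the "name" class and the sample people are invented for illustration; a real page will have its own structure, and you would need to inspect it first.

```python
import csv
import io
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list includes target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.found = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1   # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html: str, target_class: str = "name") -> list:
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.found


def to_csv(values: list, header: str = "name") -> str:
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([header])
    writer.writerows([v] for v in values)
    return out.getvalue()


# Hypothetical markup, for illustration only
SAMPLE_HTML = """
<ul>
  <li><span class="name">Jane <b>Doe</b></span> <span class="title">Partner</span></li>
  <li><span class="name">John Roe</span></li>
</ul>
"""

if __name__ == "__main__":
    print(to_csv(extract_by_class(SAMPLE_HTML)))
```

    This only covers the parsing step. Downloading the pages, following pagination and coping with layout changes are where most of the effort goes, which is a large part of what the commercial tools are selling.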
    Testimonial:
    pootle flump
    ur codings are working excelent.
