Results 1 to 3 of 3
  1. #1
    Join Date
    Jun 2010
    Posts
    1

    Unanswered: What are my options for manipulating this much data?

    Hi all,

    There is a site that outputs 1,000,000 single cell rankings on a daily basis (.csv). These rankings change on a daily basis. This csv file only has two things: the numbered ranking, and the name of the item.

    I want an automated way to crunch this data in the following ways:

    1. Find new additions to the list.
    2. Biggest gainers and losers in terms of rankings. (so this process will involve yesterday's list)

    I have servers available to crunch data (mysql) and would prefer this process be done there rather than a macro/script that crunches this on my desktop.

    I would like the results to be outputted into excel format.

    Please move if this is the wrong forum. Thanks

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    The MINUS operator is perfect for the first question but mySQL does not have one so you'll need to use EXISTS or a LEFT OUTER JOIN. I also don't have a clue how to get from mySQL to Excel without some third party tools.

    I am inclined to move this to mySQL forum. Please could someone with more experience of that product let me know if that is the best destination forum.

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    When you only have a hammer, every problem looks like a nail. The OP (Original Poster) has MySQL, so that's the best place to move this post. MySQL is also a good tool for solving this kind of problem.

    If MySQL supported the FULL OUTER JOIN, that would be the best solution. A UNION between two SELECT statements will get you close enough to get the FULL OUTER JOIN functionality.

    A lot of the design decisions depend on how much history the OP needs to keep for research purposes. If the output of the comparison is all that is needed, then a simple (two table) design would probably be best. This is rarely the case, history often needs to be online for at least 15 trading days so some analysis needs to be done up front to ferret out the requirements.

    The short answer boils down to "Yes, I think that MySQL can do what the OP needs relatively well."

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •