Results 1 to 2 of 2
  1. #1
    Join Date
    Jan 2011
    Posts
    4

    Unanswered: Multiple data properties as columns or rows

    I'm wondering what others might do in this scenario. I've tried it both ways and both seem to work, though the back-end code is quite different. Here's the scenario:

    A single object (a read) might have multiple levels of classification associated with it (say, kingdom, phylum, class, family, etc: the ranks and taxa names associated with each rank: Bacillus, Staphylococcus, etc.) that all need to be queryable (e.g., give me all reads with phylum = x, and taxa = y). I see two ways to do this:

    1) rank is a column with rows associated with each read
    2) each rank is a row, tied to the read (multiple rows/read)

    I'm wondering what might be the most efficient. For 1, I'd need to index each column, and for 2) I'd only need 2 indexes (rank and taxa).

    Might look like this:

    1)
    Code:
    CREATE TABLE classification1
    (
      read_id integer NOT NULL,
      kingdom text,
      phylum text,
      ...
      CONSTRAINT classification1_pkey PRIMARY KEY (read_id),
      CONSTRAINT classification1_read_id_fkey FOREIGN KEY (read_id)
          REFERENCES "read" (read_id) MATCH SIMPLE
          ON UPDATE NO ACTION ON DELETE NO ACTION
      ...
    )
    2)
    Code:
    CREATE TABLE classification2
    (
      read_id integer NOT NULL,
      rank text NOT NULL,
      taxa text,
      CONSTRAINT classification2_pkey PRIMARY KEY (read_id, rank),
      CONSTRAINT classification2_read_id_fkey FOREIGN KEY (read_id)
          REFERENCES classification1 (read_id) MATCH SIMPLE
          ON UPDATE NO ACTION ON DELETE NO ACTION
      ...
    )
    I'm operating on millions of reads (~40M) with each read having ~6-8 classifications (either rows/cols, depending on design).

    I'm leaning toward option 1 to reduce the number of rows in favor of increasing the number of indexes, and taking the hit on insert speed.

    Thoughts?

    Thanks,
    Chris

  2. #2
    Join Date
    Aug 2009
    Location
    Olympia, WA
    Posts
    337
    Approach #2 is called an Entity Attribute Value (EAV) table. They can be useful when you are not working with a fixed set of attributes and give you a little more flexibility. But they are generally frowned upon as they are harder to maintain.

    Given that you have a fixed set of attributes (kingdom, phylum, class, family) that isn't likely to change I would recommend approach #1.

    You can read more about EAV modelling here:
    Entity-attribute-value model - Wikipedia, the free encyclopedia

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •