Results 1 to 6 of 6
  1. #1
    Join Date
    Apr 2002
    Posts
    84

    Wink Database design for a decision tree

    We need to store a decision tree in a database
    table. The decision nodes could be either string
    or numeric value. Could anyone kindly advise
    a database table that could efficiently fit this
    requirement, taking account of the construction
    of SQL statments on this table and update of the
    decsion criteria in the future?

    Thanks.

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Table: Decision_Nodes
    Node_ID autonumber,
    Parent_Node_ID integer,
    Node_Value text

    Node_ID is the primary key.
    Index Parent_Node_ID and create a self-referential relationship to Node_ID with cascading updates and deletes.
    Leave Node_Value as text, and you can convert the contents to numeric values when necessary.

    I have designed a lot of Access and MSSQL databases with tree structures like this, and I'll warn you that the coding gets very complex.

    blindman

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    149
    I would agree with Blindman with the first 2 columns of the table. But I would not create the value column as a text. Data manipulation with text will be a real pain (I am talking SQL Server). I would rather create

    tblNode
    =====
    NodeId
    ParentNodeId
    NodeDataType char(1) ( I for Float S for String)

    tblNumericValue
    ==========
    ValueId INT
    NodeId INT (FK tblNode)
    DateTime (if needed)
    Value (INT/FLOAT depending on what you mean by numeric data)

    tblStringValue
    ==========
    ValueId INT
    NodeId INT (FK tblNode)
    DateTime (if needed)
    Value CHAR

    Though it looks complex to have parallel tables for query, it is not bad at all. You could case statements in the Select to query the right table for the data. Let me know if you want to go with this idea and I can help you with the SQL.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    149
    I would agree with Blindman with the first 2 columns of the table. But I would not create the value column as a text. Data manipulation with text will be a real pain (I am talking SQL Server). I would rather create

    tblNode
    =====
    NodeId
    ParentNodeId
    NodeDataType char(1) ( I for Float S for String)

    tblNumericValue
    ==========
    ValueId INT
    NodeId INT (FK tblNode)
    DateTime (if needed)
    Value (INT/FLOAT depending on what you mean by numeric data)

    tblStringValue
    ==========
    ValueId INT
    NodeId INT (FK tblNode)
    DateTime (if needed)
    Value CHAR

    Though it looks complex to have parallel tables for query, it is not bad at all. You could case statements in the Select to query the right table for the data. Let me know if you want to go with this idea and I can help you with the SQL.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Whether or not to store all values as varchars or to store different values in different tables depends on whether you will need to perform mathematical operations on the dataset. If you are going to need to say, find the average numeric value for cases where the value is numeric and excluding non-numeric data then it may be worth the effort to spin these values off into separate tables. Otherwise, if you are merely going to report these values, your coding is going to get complex enough with recursive procedure calls for searching through your tree structure. Let the interface handle the conversion, and you won't have to modify your structure or code if you decide you also want other data types such as date values or uniqueidentifiers.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    149
    Yes, that makes sense to me. I agree with you on that point. I was assuming that there will be some calculations that happen on the value column if it is numeric. If not, char datatype might be ok. But ofcourse, depending on how big the table is, you will save quite a bit of disk space by defining a numeric as numeric instead of char. We have a similar situation where both the points mentioned above hold good. The table has a few million records and we do need to perform some math operations on the data. That is were I was coming from. Thanks for bringing up this point though.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •