I have to put together a database as part of an assessment and am a bit stuck. I wonder if anyone can help. The table I am working on needs a column to describe the kind of property (house, apartment etc). I would lean towards using the enum type, but think this would be frowned on as it is not used in other DBMSs. Is there a better way to represent a column which can take one of many values, but should not allow spelling mistakes or abbreviations etc?
I also need to have columns showing a yes, or a no. Again I would have used enum before, but now maybe tinyint is the way to go. Not sure at all
enums are ok, but they are not a robust or recommended solution.
For the sort of requirement you have it would be best in my opinion to have a foreign key to say a parent table defining types of property
ID - autonumber
Desc - varchar
TypeID - FK to table PropertyTypes.ID
.....other columns as required
when you need to display the property type your do a lookup in the property type table of a join. That way round if the agency deciodes its going upmarket and wants to re-classify flats as apartments, then changing the PropertType.Desc to 'Apartment' gets round any spelling mistakes.... If youa re unclear on the principles normailisation Rudy's (R937) site has a great piece form Paul Litwin, in my book well worth a read.
if you are using a yes/no then a more suitable datatype I'd contend is a boolean which has a true / false value.
enums have a very limited application, usually where you already know every possible state.. examples used are often Gender where the enum is Male, Female or Unknown, or perhaps in these modern times Male, Female, Unknown or indeterminate, or even depending on what day is it today.
if there is ever a risk that you may have to extend or change the enum values then in my books enum isn't appropriate.. even a simple thing like male or female may be an issue....... if you go for enums for gender you are stuck with M or F.. not really a major issue you may think.. but what happens if you need to re-write the software for another language say Spanish or Japenese.. or whatever.. it requires a change to the db.. adds to complexity..... whereas if your values are pulled form another table who cares what language the customer wants if they want hombre & senora, then they can set it up in their tables without requiring you to do additional unit testing
Healdem, thank you. That is very helpful. I will use a lookup table for the property types. I have also read the Paul Litwin article on r937's site, which again is very helpful to me. I have several large volumes about database design and it is very easy to get lost between entities and tables, normalisation and data modelling! It is good to have a concise look at relational principles.
The fact that the article mentions rows, tables and columns is helpful. I still get stuck when working out the border between data modelling and the actual database and its tables. I have been writing about why relational principles are important for successful database design and get a little mixed up about things such as when to use the term entity and when to use table when talking about relational theory.
Anyway it is very interesting, albeit a steep learning curve. I work with MySQL daily in my job and now I am studying basic relational principles I am starting to realise the theory behind lots of the problems I am having. Things such as duplicate records for example, because primary keys have not been chosen properly in the past. Other big problems are the lack of referential integrity constraints, which mean that I have to manually check for orphaned records quite often. Seeing the problems in day-to-day life helps with the course though.
praise from Rudy, no less..
I fear its going to my head.....
Originally Posted by r937
i would have suggested the same lookup table, except without using an auto_increment
just out of curiosity why would you not use an autonumber column in this instance. Years ago I seem to remember its was drilled into developers not to use codes which had intrinsic meaning so an Autonumber would seem to make sense.. I could for example envisage an agency coding property types based on other criteria.. eg 10..19 are say flats (101: 1 bed, 102 2 Bed etc......) 20..29 are detached houses (say 202 s bed etc...). The aim of the type key being autonumber would be transparent.. well in this case invisible as the user will never see the key just the text.
I wouldn't have thought the autonumber would add significantly overhead.. so I'm curious why you wouldn't use an auto_increment
using a surrogate key (an auto_increment) is not wrong, i just feel that sometimes it is unnecessary and actually counter-productive
any meaning embedded in a property type number (e.g. 10-19 flats, 20-29 houses) is a very bad idea
sometimes the natural key (in this case the property type name) is just as useful (if somewhat longer, yes, it takes up a bit more space, but this isn't the 20th century any more and disk space is relatively cheap)
the nice part about using the name as a foreign key is that you have the same assurance of data integrity (i.e. you cannot add a property using a property type that doesn't exist), and yet you do not have to join to the property type table to "translate" the numerical code into the name
sometimes, not having all these names to look up can make a complex query a lot simpler
Thanks both, interesting discussion. I hadn't thought about using the property type as the key.
Healdem's comment above about the 1 bedroom, 2 bedroom and other versions of property is especially interesting. I have to decide how to handle different numbers of bedrooms, but had thought to put them as an integer in the main property table. It seemed to me that if I started putting them in to the property type table that would give a lot of different types of property that might be a bit complicated to remember and write queries for. Also that would not really work well with r937's suggestion of property types as the actual key.
a flat is a flat, irrespective of the number of bedrooms.. so I certainly wouldn't try to subtype flats in the property type.. the number of bedrooms is a function of the property itself, not the property type. So don't use 10..19 as flats with 1,2,3..9? bedrooms.. please for your own sanity...
if you store the number of bedrooms in the property then you can search on that by specifying property type = flat and bedrooms = 3 or bedrooms in (3,4)
Its an interesting perspective using an alpha as the fk to property type..... OK it was some 16..20 years ago but I still have the scars from being told never ever to use alpha value as a PK, especially if the alpha value had the remotest possibility of being changed.
I am now in the process of mapping the ER Diagram to the actual tables, I am going to use an auto-incrementing value for this exercise. Regarding the bedrooms, am going to store those in Property as suggested.
Just as an aside, I am interested to know how other people prefer to name attributes, especially attributes that occur in more than one table. I am tempted to use PropertyAddress1, rather than just Address1, as it might make it easier to see what is going on in subsequent queries. (even though if table name is included).
I am still trying to understand the theory behind splitting the columns for property type off into a lookup table.
Obviously I can see that leaving them in the table will give repeating values, which could cause update and delete anomalies. I have tried to see how leaving them in violates relational principles. I have looked at normalization to the third form, but cannot see how property type depends even partly on another non key column.
Say you have your table of properties for sale
and you try to classify each property. some attributes are clearly a function of the property itself (eg No Bathrooms, No Bedrooms, No Public rooms downstairs, No Garage Sapces, No Parking spaces etc.....)
some attributes are common or shared.. so example the type of property may be a house, detached house, semi detached, a flat, a houseboat a whatever.
if you use a property type identifier in the properrty enforcing a foreign relationship to a table called property types then you know that all properties of a specific code are the same type.
if you elect to store the property type in the poroperty record itself you have to be very very carefull about what data you allow... although to a human flat, FLAT & Flat are the same... to a computer the may not be, let alone allowing for typos.. ok its hard to see how someone could screw up flat, but say they decided to use apartment for some porpoerties.....
that means when you do a cusotmer search you have to look for flats or apartments. If some decides that by changing the name of the property type its makes the agency look hipper, or inflates property proces. or what ever.
you would have to go in and change multiple pieces of information.
if the description of the property type is held in one place then you change it in the one place.
the whole pointy or relational integrity is to reduce the risk of data errors and anomolies.
The point of the foreign key definition is that it transfers validation to the database engine.. saying in effect you can only put a value in this column which matches a value in the other table, where this value must already exist. Straight away you have removed the risk of typos and misspellings. you will never get away from the problem of misclassification because the user selected the wrong value(s)
Thank you very much Healdem. I have been writing this all weekend. Drawing ER diagrams, mapping entities to tables and attributes to rows, plus explaining it all. It is a good way to learn.
I have implemented the lookup table as you suggested. I would like to make this database as efficient as possible. I think it is normalised now to 3NF. The Paul Litwin article has been very useful. As I mentioned, I have large volumes on this, but his overview is very accessible and understandable.