I am currently designing my db tables and I would like to ask if making the email of members the primary key, with a limit of 100 chars, is a good idea?
I would use this as a foreign key in 3 or 4 other tables. (like larger sites use emails nowadays for login)
I understand that emails might change down the line, but probably not that often and emails identifies uniquely the member. The change would ripple through to the foreign keys as well if a member would make such a request.
Could you tell me if this is good design, I am planning for several thousands of members (who wouldn't ), and lots of selects, and some inserts. I would be happy to test it, but not sure how to, as this is just theory at the moment.
I have seen the arguments for surrogate keys and autonumbers but in a sense that is an extra layer. Also, the index for an email column would not be that big if I am allowing only 100 characters. Does anybody have concrete factual experience with this? I have seen enough theory...
let's put it this way -- it's not a bad idea, and it will work just fine, but in your situation i would probably go ahead and use an auto_increment surrogate along with a unique constraint (unique index) on the email
so yes, the surrogate adds an additional layer (two indexes instead of one), but i believe the joins will be slightly faster using integer foreign keys, and of course the foreign key indexes will be smaller
plus, there is no ripple effect when a user changes her email
it's a close call, but i would go with the auto_increment -- and i am one of the very few who frequently argue for using natural keys!
I agree with r937. E-mail addresses do have the quality of being unique, which is good. In "Database Design for Mere Mortals" there is mention of a principle that primary key values shouldn't be changed very often (the ripple effect), and e-mails would be subject to pretty frequent changes. If you do use the e-mail as primary key, your RDBMS program will need to be able to cascade the change - I don't think all are capable of this.
If you do use the e-mail as primary key, your RDBMS program will need to be able to cascade the change - I don't think all are capable of this.
There are those of us from the Relational Algebra camp that will assert that even when a database engine can cascade updates and deletions, it should not do so. From that perspective, this is the same as deleting all of the rows with the old key, and inserting new rows with the new key, all done inside of a transaction. While you might be able to do it, this is just plain wrong for so many reasons.
You also have to be aware that email addresses can have multiple equivalent values which could wreck your query results. For example, the leading node of a gmail email address has optional periods. or, the www in the domain name can be optional or, .com and .net are interchangable.
i would guess that for all the instances where there exist two domains in the .com and .net top level domain spaces with identical second level names, the number of times where they are interchangeable (i.e. point to the same server) are rare
i think the adjective you want to use is OCCASIONALLY, not FREQUENTLY
Frequently does not imply a specific frequency. Or a frequency greater than a certain value. Its meaning is subjective. Try accessing the alternate domain to several popular websites. A lot of domain name holders will purchase both .com and .net and alias the .net to the .com. I am not saying that it will work with all domains, but if there are multiple exceptions, then this needs to be taken into consideration in the design.
Frequently does not imply a specific frequency. Or a frequency greater than a certain value. Its meaning is subjective.
quite true, but i assure you that its meaning is different from rarely, about which one can also say "does not imply a specific frequency. Or a frequency less than a certain value. Its meaning is subjective."
yes, i do realize that some organizations purchase both the .com and .net names
but in the context of a table of email addresses, i think it is wrong to assume that all the .com and .net addresses with the same second level name are the same
That's my point. If it's a primary key and there are possible ambiguous alternate values for whatever you are considering as your primary key, you better take that into consideration in your design. And wouldn't it be wrong to assume that none of the email addresses would have possible alternate valid values?
that the same person might have two emails is not what is being modelled, as far as i know
Hi r937 and urquel,
r937 your understanding is correct, a different email, be it .net or .com would be handled as a separate account to keep things simple! If a user would want to change emails that would be permitted but that would change their login name as well.