If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > More columns <-> more joins

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-27-09, 03:11
verellen verellen is offline
Registered User
 
Join Date: Nov 2009
Posts: 1
More columns <-> more joins

Hey all,

I have the following question:
When designing a database, qua performance, which is the best senario:

Having a table with lots of columns, so you do not have to use many joins.
Or storing your data in seperated tables, which is more clarifying and flexible, but when you need to gather the data, you'll have to use more joins.

When does having a lot of joins have affect on the performance?

Last edited by verellen; 11-27-09 at 03:16.
Reply With Quote
  #2 (permalink)  
Old 11-27-09, 08:17
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
__________________
George
Twitter | Blog
Reply With Quote
  #3 (permalink)  
Old 11-29-09, 01:30
sco08y sco08y is offline
Registered User
 
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
Quote:
Originally Posted by verellen View Post
Hey all,

I have the following question:
When designing a database, qua performance, which is the best senario:

Having a table with lots of columns, so you do not have to use many joins.
Or storing your data in seperated tables, which is more clarifying and flexible, but when you need to gather the data, you'll have to use more joins.

When does having a lot of joins have affect on the performance?
I'll run over the pros and cons very quickly:

Pros
  • For some queries, performance is better.
  • The schema is simpler.
Cons
  • Some queries may perform worse.
  • Updating is more expensive, and concurrency of updates may suffer.
  • Integrity constraints may not be expressible as simple foreign key constraints.
  • Your business logic will move out of the schema into the application.
And the "simpler schema" often lures people in: taking all those messy tables and glomming them into one is very appealing to, shall we say, the PowerPoint mentality. So when you're proposing an initial design, you've probably only done slides on the schema, and so some jackalope will say, "Keep it simple, stupid" because he thinks that deleting complexity from the schema is going to make it magically disappear. The trouble is that when the slideshow is over and you start working on the application, that complexity comes right back. Since the schema is settled on, the complexity has to be managed by the application! You just got promoted from DBA / app designer to DBMS-functionality-reinventer.

Now, there are some specialized cases where join performance is critical over everything else. But it's hard to know ahead of time whether that's your case and whether it can only be achieved by denormalizing. To know for certain, you're going to want to have profiling data and query plans to prove that joins are the actual chokepoint for your performance. You're going to want to prove to yourself that you can't mitigate the performance problem with optimizer hints, clustered tables or materialized views. All of these techniques can potentially solve the problem, but only if you have an actual system with real data to test them on.

Which brings me to the standard warning about premature optimization, which applies doubly when you're designing a schema. It's generally much harder to go from a denormalized schema to a normalized schema than the reverse. It sounds easy enough in theory, but in practice, denormalization allows redundancy to creep in. (When you denormalize, you're not using the DBMS's native ability to prevent redundancy.) Redundancy, in turn, allows inconsistent data to creep in. Real data, unlike toy examples you see in books, represents actual people and actual things and actual money. If you have to fix that data, you're going to have to be the bearer of bad news to a whole lot of people.

Redundancy will also appear in the application. If you have a check in the application and another coder does the classic "copy and paste", that check has now been replicated. If you have to change it, you have to somehow verify that you've changed it in every place in your application. You could, in theory, put all those checks neatly in an application library. But rather than rely on programmers to always do the right thing, it makes more sense to centralize your logic and manage it through the DBMS. That's what it's there for, and the way to make that work is by normalizing.
Reply With Quote
  #4 (permalink)  
Old 11-29-09, 11:01
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
In the vast majority of cases, normalized schemas will produce faster and more efficient applications than denormalized schemas.

There will be a few exceptions, on the order of three in ten thousand cases where a denormalized schema may produce one specific opeartion faster. These are typically in Data Warehouse schemas, often used for building analytical cubes.

If you can afford the ongoing manpower costs for dealing with the problems induced by denormaliztion, there is a chance that it might produce a faster application. If maintenance and storage costs are negligable compared to query cost, you could have the first application I have heard of where denormalization might be cost effective!

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On