If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > The customers they are a'changing

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 05-14-10, 07:05
woptsufp woptsufp is offline
Registered User
 
Join Date: May 2010
Posts: 3
The customers they are a'changing

Hi there.

Working as an analyst I often find myself doing adhoc queries on quite large amounts of data. The goal is often to find out some pattern to movements in our customer base or to track other changes over time.

I'm not looking for some definitive answers (not sure there are any) but if you guys can point me in some direction or offer some insights about other ways to go about this it would be greatly appreciated.

I work in Telecommunications and my main areas of interests are calls, products/services and revenues and how they relate to our customers and how they change over time.

Example

For a given month I have aggregated call data (millions of rows) and customer ids. A hypothetical table definition could look like:

Code:
CREATE TABLE calls (First_day_of_month (Date)
					,CustomerID (Int)
					,Call_Seconds (Int))
The combination of First_day_of_month and CustomerID is unique.

A typical question I would like to answer is, for each month how many customers called more than they did last month? For the last two years.

What I end up doing is something like (pseudocodish):
Code:
SELECT calls1.First_day_of_month
	,COUNT(calls1.CustomerID) AS Number_of_customers_calling_more
FROM calls calls1
	INNER JOIN calls calls2
		ON calls1.First_day_of_month = DATEADD(calls2.First_day_of_month, MONTH, 1)
			AND calls1.CustomerID = calls2.CustomerID
WHERE calls1.Call_Seconds > calls2.Call_Seconds
AND calls1.First_day_of_month >= CAST('2008-01-01' AS DATETIME)
What I'm doing is joining the calls table on itself on customer ids and month but changing the specific month on one of the tables to reflect the time period I'm interested in tracking changes over.

This, as you can imagine, puts the database through it's paces and depending on the size of the datasets can be quite slow to work with. Slow in this case means hours for query completion and I find that puts a defenitive damper on my enthusiasm for digging through the data. It works, but it is a serious pain to work with.

I've tried a lot of different indexing options and it definitely helps up to a point but I'm thinking that this problem can't be unique to me and somewhere there is a more elegant solution.

Any thoughts? Pointers? Links?

I'm willing to rip the data apart in any way that will help me get better performance.

Thanks
Reply With Quote
  #2 (permalink)  
Old 05-14-10, 07:11
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Is this SQL Server? If so, what version?
You say the first day of month and customer id are unique - please can you post the exact definition of that constraint i.e. script it out?
Are you aware that using functions on indexed columns renders them unsargable and therefore unusable (at least in seeks)?

if all your queries are ad-hoc then it is really quite difficult to plan for them. On the other hand, if this is a regular requirement then a column to store the previous month's calls would be a simple and very performant solution.
Reply With Quote
  #3 (permalink)  
Old 05-14-10, 07:31
woptsufp woptsufp is offline
Registered User
 
Join Date: May 2010
Posts: 3
Thanks for the reply.

It's all kinds of servers I do most of my work on Sql Server 2008 but I'm not a MSSQL fanatic and play around with other systems as well depending on the situation.

Regarding the uniqueness of CustomerID vs Month that's not something I have actually specified constraints for. The integrity of the data is ensured by other means and is not a priority in this environment. I've been told that constraints are not really helping with performance but I could be wrong. That said, each Customer ID should only show up once every month. And so far it does.

I dig the data out of our enterprise data store and dump it into Sql Server 2008 and I'm not working in the production environment. The output is used for reports or if I find something interesting it's used as a prototype for requirements to go into production at a later stage.

I was not aware that funcions on indexed columns had such an impact, thanks for the information.
Reply With Quote
  #4 (permalink)  
Old 05-14-10, 07:45
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Unique and Primary Key constraints are enforced using B-Tree indexes in most enterprise RDBMSs, so whoever told you that constraints don't help with performance was quite wrong. Certain other constraints can be used by some RDBMS engines to optimise their querying, for example NOT NULL and FOREIGN KEY constraints.

Indexes pretty well always either help or make no difference to select queries - they have no cost in this regard. Their cost is upkeep during INSERT, DELETE and UPDATE operations. If this is offline and you don't care how long it takes to load, just how long it takes to query, then I would index liberally and defragment the indexes after every load.

You would do well to study execution plan analysis in order to identify what indexes are being used and where indexes are required. The DTA in SQL Server can help you with this.
Reply With Quote
  #5 (permalink)  
Old 05-14-10, 08:12
woptsufp woptsufp is offline
Registered User
 
Join Date: May 2010
Posts: 3
Thanks again.

I'll do some further testing and see what I can come up with based on your input. I was aware that primary keys helped with performance but not that unique constraints could help further. Have a nice weekend

(Anyone else with further input is very welcome to contribute)
Reply With Quote
  #6 (permalink)  
Old 05-19-10, 17:20
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
Quote:
Originally Posted by woptsufp
... I was aware that primary keys helped with performance but not that unique constraints could help further...
Well, a primary key is just a unique key with a further constraint the data be non-null.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 05-20-10 at 14:27.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On