If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Database index design

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-21-11, 10:05
kur1j kur1j is offline
Registered User
 
Join Date: Jan 2011
Posts: 4
Database index design

We are having some trouble with a table being able to return a few thousand records from a table with 80million (this is our largest to date and will keep growing) rows.

The table is structured as follows

CREATE TABLE "utilization" (
"id" integer NOT NULL DEFAULT autoincrement UNIQUE,
"system_id" integer NOT NULL,
"unit_id" unsigned int NOT NULL,
"unit_date" date NULL,
"utilization" integer NULL,
"interval" varchar(1) NULL,
"unit_hour" integer NULL,
"unit_actual" varchar(1) NULL,
PRIMARY KEY ( "id" ASC )
);

id = PK
system_id = there can be multiple systems hosted on the same database but this value is usually something like a zip code or some kinda identifier for our customers
unit_id = this is a unique identification for each unit in the system (there can be multiple rows per unit)
unit_date = this is just the date the unit reported the information
utilization = the amount of utilization the unit reported
interval = over what time period the information was taken ('H'our or 'D'ay)
unit_hour = used to identify for which 'H' the utilization was taken
unit_actual = was this an estimated value or actual


Based on this th rows will look something like this...

1, 01254, 2314875, '2000-01-01', 5, 'H', 0, 'Y'
2, 01254, 2314875, '2000-01-01', 1, 'H', 1, 'Y'
3, 01254, 2314875, '2000-01-01', 9, 'H', 2, 'Y'
..
4, 01254, 2314875, '2000-01-01', 2, 'H', 23, 'Y'
5, 01254, 2314875, '2000-01-01', 17, 'D', NULL, 'Y'


If everything works you should have 25 records per unit every day. (1 for each hour and 1 for the whole day). Can have anywhere between 500 units and 50k+ units (lots of rows as you can tell).

Currently we have composite index's (index(unit_id,unit_date) and index(unit_date,unit_id).

When we are querying on the table we are usually wanting all the records in a particular date range for a particular unit. So when we do this we are getting back 4-5k records on avg (more if the system has been active longer). We do some math on the records we get back and spit out the data. These queries usually take between 20-45 seconds. We are trying to speed the queries up but not sure what else we can really do (index wise). The indexes we currently have are being used for the queries we are running. However we just can't seem to speed the queries up.

As far as adding more indexes and as you can tell the issue is the only decent uniqueness in the table is the date and unit_id which we have indexes on. So I don't think adding more indexes would really help the situation.

Would adding a clustered index or covering index be beneficial? I am just throwing ideas out there and would just like some ideas on something to try.
Reply With Quote
  #2 (permalink)  
Old 01-21-11, 11:48
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
can we see the query, and the EXPLAIN on the query?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 01-21-11, 13:26
rdjabarov rdjabarov is offline
Registered User
 
Join Date: Jul 2003
Location: San Antonio, TX
Posts: 3,611
The combination of UNIT_ID and UNIT_DATE is not unique, based on your description and sample data. Also, I don't see the point of having a "daily" record stored together with the rest of the data. This is a COBOL methodology of data design, - multiple record types stored in 1 data file.
__________________
"The data in a record depends on the Key to the record, the Whole Key, and
nothing but the Key, so help me Codd."
Reply With Quote
  #4 (permalink)  
Old 01-21-11, 16:18
kur1j kur1j is offline
Registered User
 
Join Date: Jan 2011
Posts: 4
Quote:
Originally Posted by r937 View Post
can we see the query, and the EXPLAIN on the query?
SQLAnywhere doesn't seem to have an EXPLAIN function except for the graphical one which kinda makes the data look nasty. However I posted it anyways. If you let me know what you are looking for maybe I can get it easier.


SELECT unit_date,utilization*0.1 as utilization FROM utilization where unit_id= 110188590 and interval = 'D' and system_id = 1803


Node Statistics

*
Estimates
*
Description
*
RowsReturned

266.76

Number of rows returned
PercentTotalCost

0

Run time as a percent of total query time
RunTime

0

Time to compute the results
CPUTime

0

Time required by CPU
DiskReadTime

0

Time to perform reads from disk
DiskWriteTime

0

Time to perform writes to disk

Subtree Statistics

*
Estimates
*
Description
*
RowsReturned

266.76

Number of rows returned
PercentTotalCost

100

Run time as a percent of total query time
RunTime

84.487

Time to compute the results
CPUTime

0.016888

Time required by CPU
DiskReadTime

84.47

Time to perform reads from disk
DiskWriteTime

0

Time to perform writes to disk
DiskRead

5262.9

Disk reads
DiskWrite

0

Disk writes

Optimizer statistics

*
Value
*
Description
*
Estimated cache pages

113868

Estimated cache pages available for this statement
CurrentCacheSize

1840592

Current cache size in kilobytes
ProductVersion

10.0.1.4075

Product version
user_estimates

Override-magic

Controls whether to respect user estimates
ansi_substring

On

Controls behavior of substring function with negative start or length parameter
conversion_error

On

Controls datatype conversion errors
float_as_double

Off

Controls the interpretation of the FLOAT type
ansinull

On

Controls interpretation of NULL values
ansi_integer_overflow

On

Controls whether integer overflow causes an error
ansi_blanks

Off

Controls truncation errors
string_rtruncation

On

Controls truncation errors on INSERT or UPDATE
divide_by_zero_error

On

Controls divide-by-zero errors
sort_collation

Internal

Controls implicit use of SORTKEY on ORDER BY of characters
precision

30

Maximum number of digits in decimal arithmetic
scale

6

Minimum number of digits after decimal point
uuid_has_hyphens

On

Controls format for UUID values
first_day_of_week

7

Sets the numbering of the days of the week
date_order

YMD

Controls order of date components
nearest_century

50

Controls interpretation of two-digit years
date_format

YYYY-MM-DD

Controls format for DATE values
timestamp_format

YYYY-MM-DD HH:NNS.SSS

Controls format for TIMESTAMP values
time_format

HH:NNS.SSS

Controls format for TIME values
truncate_timestamp_values

Off

Controls precision of TIMESTAMP values
default_timestamp_increment

1

Number of microseconds to add to TIMESTAMP for next value
return_date_time_as_string

On

Controls how DATE, TIME and TIMESTAMP values are fetched
isolation_level

0

Controls the locking isolation level
optimization_goal

All-rows

Optimize queries for first row or all rows
optimization_level

9

Controls amount of effort made by the query optimizer to find an access plan
optimization_workload

Mixed

Controls whether optimizing for OLAP or mixed queries
max_query_tasks

0

Maximum number of tasks that may be used by a parallel execution plan for a single query


Index Scan Scan consumption using index utilization_unit_date

Table reference
Creator name

dba
Table name

consumption
Estimated rows

38196500
Estimated pages

136162
Estimated pages in cache

18937
Estimated row size (bytes)

35
Buffer fetch

no
Relax cursor stability

no
Lock

Isolation level 0

Scan predicates
system_id= 1803 : 100% Statistics
unit_id = 110188590 : 0.01746% Statistics
interval IS NOT NULL : 100% Statistics | Bounded
interval = 'D' : 4% Statistics

Index
Index name

utilization_unit_date
Clustered index

no
Depth

3
Estimated leaf pages

60342
Sequential Transitions

822
Random Transitions

34951063
Key Values

1527860

Scan
Selectivity

0.017459957% Index
Direction

forward
consumption_miu_id = 110188590 : 0.017459957% Statistics
consumption_date * ASC

Node Statistics

*
Estimates
*
Description
*
RowsReturned

266.76

Number of rows returned
PercentTotalCost

100

Run time as a percent of total query time
RunTime

84.487

Time to compute the results
CPUTime

0.016888

Time required by CPU
DiskReadTime

84.47

Time to perform reads from disk
DiskWriteTime

0

Time to perform writes to disk
DiskRead

5262.9

Disk reads
DiskWrite

0

Disk writes

Index Statistics

*
Estimates
*
Description
*
RowsReturned

6669.1

Number of rows returned
Reply With Quote
  #5 (permalink)  
Old 01-21-11, 16:19
kur1j kur1j is offline
Registered User
 
Join Date: Jan 2011
Posts: 4
Quote:
Originally Posted by rdjabarov View Post
The combination of UNIT_ID and UNIT_DATE is not unique, based on your description and sample data. Also, I don't see the point of having a "daily" record stored together with the rest of the data. This is a COBOL methodology of data design, - multiple record types stored in 1 data file.
You are correct. There should only be 25 records per date (day) per unit.
Reply With Quote
  #6 (permalink)  
Old 01-23-11, 21:30
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
in your query you are not requesting a date range. If you were, I would suggest putting an index on that column. Otherwise, you have to love with what you got
Dave
Reply With Quote
  #7 (permalink)  
Old 01-23-11, 21:42
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by dav1mo View Post
Otherwise, you have to love with what you got
that's what i told all my girlfriends, eh

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 01-24-11, 11:11
kur1j kur1j is offline
Registered User
 
Join Date: Jan 2011
Posts: 4
Quote:
Originally Posted by dav1mo View Post
in your query you are not requesting a date range. If you were, I would suggest putting an index on that column. Otherwise, you have to love with what you got
Dave
That was just one of the queries we were trying to run (probably the most simple). There are others that specify a date range.

I am curious if having the composite index on the unit_date and the unit_id could be exchanged for just and index on the unit_date (in replace of your suggestion).
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On