Results 1 to 8 of 8
  1. #1
    Join Date
    Jan 2011
    Posts
    4

    Database index design

    We are having some trouble with a table being able to return a few thousand records from a table with 80million (this is our largest to date and will keep growing) rows.

    The table is structured as follows

    CREATE TABLE "utilization" (
    "id" integer NOT NULL DEFAULT autoincrement UNIQUE,
    "system_id" integer NOT NULL,
    "unit_id" unsigned int NOT NULL,
    "unit_date" date NULL,
    "utilization" integer NULL,
    "interval" varchar(1) NULL,
    "unit_hour" integer NULL,
    "unit_actual" varchar(1) NULL,
    PRIMARY KEY ( "id" ASC )
    );

    id = PK
    system_id = there can be multiple systems hosted on the same database but this value is usually something like a zip code or some kinda identifier for our customers
    unit_id = this is a unique identification for each unit in the system (there can be multiple rows per unit)
    unit_date = this is just the date the unit reported the information
    utilization = the amount of utilization the unit reported
    interval = over what time period the information was taken ('H'our or 'D'ay)
    unit_hour = used to identify for which 'H' the utilization was taken
    unit_actual = was this an estimated value or actual


    Based on this th rows will look something like this...

    1, 01254, 2314875, '2000-01-01', 5, 'H', 0, 'Y'
    2, 01254, 2314875, '2000-01-01', 1, 'H', 1, 'Y'
    3, 01254, 2314875, '2000-01-01', 9, 'H', 2, 'Y'
    ..
    4, 01254, 2314875, '2000-01-01', 2, 'H', 23, 'Y'
    5, 01254, 2314875, '2000-01-01', 17, 'D', NULL, 'Y'


    If everything works you should have 25 records per unit every day. (1 for each hour and 1 for the whole day). Can have anywhere between 500 units and 50k+ units (lots of rows as you can tell).

    Currently we have composite index's (index(unit_id,unit_date) and index(unit_date,unit_id).

    When we are querying on the table we are usually wanting all the records in a particular date range for a particular unit. So when we do this we are getting back 4-5k records on avg (more if the system has been active longer). We do some math on the records we get back and spit out the data. These queries usually take between 20-45 seconds. We are trying to speed the queries up but not sure what else we can really do (index wise). The indexes we currently have are being used for the queries we are running. However we just can't seem to speed the queries up.

    As far as adding more indexes and as you can tell the issue is the only decent uniqueness in the table is the date and unit_id which we have indexes on. So I don't think adding more indexes would really help the situation.

    Would adding a clustered index or covering index be beneficial? I am just throwing ideas out there and would just like some ideas on something to try.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    can we see the query, and the EXPLAIN on the query?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    The combination of UNIT_ID and UNIT_DATE is not unique, based on your description and sample data. Also, I don't see the point of having a "daily" record stored together with the rest of the data. This is a COBOL methodology of data design, - multiple record types stored in 1 data file.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  4. #4
    Join Date
    Jan 2011
    Posts
    4
    Quote Originally Posted by r937 View Post
    can we see the query, and the EXPLAIN on the query?
    SQLAnywhere doesn't seem to have an EXPLAIN function except for the graphical one which kinda makes the data look nasty. However I posted it anyways. If you let me know what you are looking for maybe I can get it easier.


    SELECT unit_date,utilization*0.1 as utilization FROM utilization where unit_id= 110188590 and interval = 'D' and system_id = 1803


    Node Statistics

    *
    Estimates
    *
    Description
    *
    RowsReturned

    266.76

    Number of rows returned
    PercentTotalCost

    0

    Run time as a percent of total query time
    RunTime

    0

    Time to compute the results
    CPUTime

    0

    Time required by CPU
    DiskReadTime

    0

    Time to perform reads from disk
    DiskWriteTime

    0

    Time to perform writes to disk

    Subtree Statistics

    *
    Estimates
    *
    Description
    *
    RowsReturned

    266.76

    Number of rows returned
    PercentTotalCost

    100

    Run time as a percent of total query time
    RunTime

    84.487

    Time to compute the results
    CPUTime

    0.016888

    Time required by CPU
    DiskReadTime

    84.47

    Time to perform reads from disk
    DiskWriteTime

    0

    Time to perform writes to disk
    DiskRead

    5262.9

    Disk reads
    DiskWrite

    0

    Disk writes

    Optimizer statistics

    *
    Value
    *
    Description
    *
    Estimated cache pages

    113868

    Estimated cache pages available for this statement
    CurrentCacheSize

    1840592

    Current cache size in kilobytes
    ProductVersion

    10.0.1.4075

    Product version
    user_estimates

    Override-magic

    Controls whether to respect user estimates
    ansi_substring

    On

    Controls behavior of substring function with negative start or length parameter
    conversion_error

    On

    Controls datatype conversion errors
    float_as_double

    Off

    Controls the interpretation of the FLOAT type
    ansinull

    On

    Controls interpretation of NULL values
    ansi_integer_overflow

    On

    Controls whether integer overflow causes an error
    ansi_blanks

    Off

    Controls truncation errors
    string_rtruncation

    On

    Controls truncation errors on INSERT or UPDATE
    divide_by_zero_error

    On

    Controls divide-by-zero errors
    sort_collation

    Internal

    Controls implicit use of SORTKEY on ORDER BY of characters
    precision

    30

    Maximum number of digits in decimal arithmetic
    scale

    6

    Minimum number of digits after decimal point
    uuid_has_hyphens

    On

    Controls format for UUID values
    first_day_of_week

    7

    Sets the numbering of the days of the week
    date_order

    YMD

    Controls order of date components
    nearest_century

    50

    Controls interpretation of two-digit years
    date_format

    YYYY-MM-DD

    Controls format for DATE values
    timestamp_format

    YYYY-MM-DD HH:NNS.SSS

    Controls format for TIMESTAMP values
    time_format

    HH:NNS.SSS

    Controls format for TIME values
    truncate_timestamp_values

    Off

    Controls precision of TIMESTAMP values
    default_timestamp_increment

    1

    Number of microseconds to add to TIMESTAMP for next value
    return_date_time_as_string

    On

    Controls how DATE, TIME and TIMESTAMP values are fetched
    isolation_level

    0

    Controls the locking isolation level
    optimization_goal

    All-rows

    Optimize queries for first row or all rows
    optimization_level

    9

    Controls amount of effort made by the query optimizer to find an access plan
    optimization_workload

    Mixed

    Controls whether optimizing for OLAP or mixed queries
    max_query_tasks

    0

    Maximum number of tasks that may be used by a parallel execution plan for a single query


    Index Scan Scan consumption using index utilization_unit_date

    Table reference
    Creator name

    dba
    Table name

    consumption
    Estimated rows

    38196500
    Estimated pages

    136162
    Estimated pages in cache

    18937
    Estimated row size (bytes)

    35
    Buffer fetch

    no
    Relax cursor stability

    no
    Lock

    Isolation level 0

    Scan predicates
    system_id= 1803 : 100% Statistics
    unit_id = 110188590 : 0.01746% Statistics
    interval IS NOT NULL : 100% Statistics | Bounded
    interval = 'D' : 4% Statistics

    Index
    Index name

    utilization_unit_date
    Clustered index

    no
    Depth

    3
    Estimated leaf pages

    60342
    Sequential Transitions

    822
    Random Transitions

    34951063
    Key Values

    1527860

    Scan
    Selectivity

    0.017459957% Index
    Direction

    forward
    consumption_miu_id = 110188590 : 0.017459957% Statistics
    consumption_date * ASC

    Node Statistics

    *
    Estimates
    *
    Description
    *
    RowsReturned

    266.76

    Number of rows returned
    PercentTotalCost

    100

    Run time as a percent of total query time
    RunTime

    84.487

    Time to compute the results
    CPUTime

    0.016888

    Time required by CPU
    DiskReadTime

    84.47

    Time to perform reads from disk
    DiskWriteTime

    0

    Time to perform writes to disk
    DiskRead

    5262.9

    Disk reads
    DiskWrite

    0

    Disk writes

    Index Statistics

    *
    Estimates
    *
    Description
    *
    RowsReturned

    6669.1

    Number of rows returned

  5. #5
    Join Date
    Jan 2011
    Posts
    4
    Quote Originally Posted by rdjabarov View Post
    The combination of UNIT_ID and UNIT_DATE is not unique, based on your description and sample data. Also, I don't see the point of having a "daily" record stored together with the rest of the data. This is a COBOL methodology of data design, - multiple record types stored in 1 data file.
    You are correct. There should only be 25 records per date (day) per unit.

  6. #6
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    in your query you are not requesting a date range. If you were, I would suggest putting an index on that column. Otherwise, you have to love with what you got
    Dave

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dav1mo View Post
    Otherwise, you have to love with what you got
    that's what i told all my girlfriends, eh

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jan 2011
    Posts
    4
    Quote Originally Posted by dav1mo View Post
    in your query you are not requesting a date range. If you were, I would suggest putting an index on that column. Otherwise, you have to love with what you got
    Dave
    That was just one of the queries we were trying to run (probably the most simple). There are others that specify a date range.

    I am curious if having the composite index on the unit_date and the unit_id could be exchanged for just and index on the unit_date (in replace of your suggestion).

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •