  1. #1
    Join Date
    Jun 2005
    Posts
    319

    Unanswered: Am I a SQL snob?

    I think the answer is yes.

    We're talking about moving from Microsoft SQL Server to MySQL and my initial reaction was meh, but I really don't know enough about MySQL. Can anyone give me a quick comparison between the two in terms of tradeoffs? I already know MySQL is cheaper. We are doing this to use table partitioning so we don't have to shell out for Enterprise Edition.

  2. #2
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  3. #3
    Join Date
    Sep 2005
    Posts
    161
    I would have to ask about your needs for table partitioning. Sometimes performance tuning alone will solve your issues.

  4. #4
    Join Date
    Jun 2005
    Posts
    319
    We have some large (and wide) tables, ranging from hundreds of millions of rows up to about a billion. The bottleneck in our data flow is inserting into these tables, which have a clustered PK and a few indexes.

    The nice thing about SQL Server is we can fill an empty table with 8-10 million rows (one day's worth of data for one event) and then just switch that table into a larger table as a new partition (BUT it requires Enterprise Edition = $$$$).

    After doing some research on MySQL, it looks like its table partitioning is pretty immature but might suffice. However, it does not support SWITCH functionality the way SQL Server does.
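
    For reference, the SQL Server side of that workflow looks roughly like the sketch below (the table names and partition number are made up, and the staging table needs a check constraint matching the partition boundary):

    Code:
    -- Load one day's worth of rows into an empty staging table, then switch it
    -- into the big partitioned table. The switch itself is metadata-only,
    -- but partitioning requires Enterprise Edition.
    ALTER TABLE dbo.EventStaging
        SWITCH TO dbo.EventFact PARTITION 42;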

  5. #5
    Join Date
    Sep 2005
    Posts
    161
    A couple more things. These may seem obvious...
    Is the clustered primary key an identity column? Or, is it designed so that each new PK is always greater than the previous?

    What is the fill factor on the indexes? With the number of inserts you are performing, page splits will happen often.

    One more thing comes to mind. How is your transaction log configured? If your transaction log can't keep up, that will be a bottleneck no matter what you do with your data files.
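
    If you want to see what your indexes are currently set to, something like this works (the table name is just a placeholder):

    Code:
    -- fill_factor of 0 means the default, i.e. pages are packed as full as possible
    SELECT i.name, i.type_desc, i.fill_factor
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('dbo.YourBigTable');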
    Last edited by cascred; 12-27-10 at 15:47.

  6. #6
    Join Date
    Jun 2005
    Posts
    319
    Yep, clustered PK, identity, bigint.
    Good call on the fill factor; looks like they are currently at the default (0).

    What would be good to change it to, like 70 maybe? I assume we need to drop and create the indexes for this to take effect, or is it a simple non-intrusive change?

    DB is set to Simple recovery so the tran log shouldn't be a problem.

  7. #7
    Join Date
    Sep 2005
    Posts
    161
    With the amount of data you are inserting, I would suggest something even lower, maybe 50%. You will use more hard drive space, but page splits can be expensive.

    Even in Simple recovery, there is a transaction log. There always has to be a way to roll back a transaction. If your transaction log is on the same drive or array as your data files, it competes for resources. If it is on a slower drive, it will affect performance.
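
    If you're not sure where the data and log files actually sit, here's a quick check (the database name is a placeholder):

    Code:
    -- Shows the physical location and size of every data (ROWS) and LOG file
    SELECT type_desc, name, physical_name, size * 8 / 1024 AS size_mb
    FROM sys.master_files
    WHERE database_id = DB_ID('YourDb');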

  8. #8
    Join Date
    Jun 2005
    Posts
    319
    Gotcha, all log files are on separate disks - we're in the cloud and can't control it beyond that.

    I guess I will look into (and test) what is involved in changing our fill factor, to see if that helps with inserts at all.

    dbcc showcontig for a random table (10 million rows)

    Code:
    TABLE level scan performed.
    - Pages Scanned................................: 351202
    - Extents Scanned..............................: 44117
    - Extent Switches..............................: 44116
    - Avg. Pages per Extent........................: 8.0
    - Scan Density [Best Count:Actual Count].......: 99.51% [43901:44117]
    - Logical Scan Fragmentation ..................: 0.48%
    - Extent Scan Fragmentation ...................: 22.15%
    - Avg. Bytes Free per Page.....................: 131.8
    - Avg. Page Density (full).....................: 98.37%
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    Last edited by Gagnon; 12-27-10 at 17:22.

  9. #9
    Join Date
    Sep 2005
    Posts
    161
    22% extent fragmentation is high. This would have a significant impact on your insert performance as your extents filled up and had to be split. The 99.51% scan density is to be expected when using a default fill factor of 0. Full extents lead to extent splits. A high scan density is great for SELECTs, but not for INSERTs.

    Do you have plenty of RAM? When performing inserts, your indexes are scanned to find the correct page to insert new data into your index pages. It is possible that when you begin dumping data into your tables there is enough RAM to cache your indexes, but after you have added several million rows your server can no longer cache all of the indexes. Just a thought. You mentioned your server is in the cloud, so I don't know how much control you have over hardware.
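
    One way to get a rough idea of how much of each index is actually sitting in the buffer pool is a query along these lines (table and index names come back from your own schema):

    Code:
    -- Approximate buffer pool usage per index in the current database (pages are 8 KB)
    SELECT OBJECT_NAME(p.object_id) AS table_name,
           i.name                   AS index_name,
           COUNT(*) * 8 / 1024      AS buffer_pool_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
      ON au.allocation_unit_id = bd.allocation_unit_id
     AND au.type IN (1, 3)          -- in-row and row-overflow data
    JOIN sys.partitions AS p
      ON p.hobt_id = au.container_id
    JOIN sys.indexes AS i
      ON i.object_id = p.object_id AND i.index_id = p.index_id
    WHERE bd.database_id = DB_ID()
    GROUP BY p.object_id, i.name
    ORDER BY buffer_pool_mb DESC;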

  10. #10
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    cascred, where are you getting your conclusions from????

    22% extent fragmentation is high? OK, I may buy this one. But how is it bad for INSERTs, if this number reflects the factor of continuity of extents for leaf pages in the index? At best it will negatively affect ordered index scans, but not the INSERTs!

    Extent splits? What is that? Is it when a 9th page needs to be inserted, we "split" the extent???? What are you talking about here?

    A high scan density is good for SELECTs but not for INSERTs??? Wow, you're a genius! No wonder I felt I was in the dark while reading some of your posts...

    You left me speechless here. I'll go back and read more on SHOWCONTIG, because obviously I missed this one all the way, starting from 6.0.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  11. #11
    Join Date
    Jun 2005
    Posts
    319
    from: Fragmentation 101 - SQLServerCentral

    Note: The value for Extent Scan Fragmentation is not really important. This is because the algorithm that calculates it does not work for indexes that span over multiple files. According to some Microsoft sources, in the next version of SQL Server, DBCC SHOWCONTIG will not even include this metric.

  12. #12
    Join Date
    Jun 2005
    Posts
    319
    Just to extend this topic a little, from what I understand:

    high Scan Density = great for reads
    low-medium Scan Density = great for inserts (i.e. setting an explicit fill factor instead of the default 0)

    So at least on our tracker nodes, which collect data that is never read except when it is aggregated across nodes, it sounds like we would benefit the most from an explicit fill factor, maybe 90%.

    As the data is aggregated onto fewer boxes, I think we still want an explicit fill factor, maybe 50% as previously suggested. This data will still need to be read, but we will have to gauge the performance and play with the fill factor.

    I assume that in order to change the fill factor we need to either:

    A) Rebuild the index
    B) Drop / re-create the index

    Either of which (particularly B) will be a costly (time-wise) action. Something like the rebuild sketched below.
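
    A minimal sketch of option A, assuming a made-up table name (the new fill factor only takes effect when the index is rebuilt or re-created):

    Code:
    -- Rebuild every index on the table, leaving 10% free space per page
    ALTER INDEX ALL ON dbo.TrackerEvents
        REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);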

  13. #13
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    First of all, since you're using at the very least SQL2K5, you need to use DMVs, not DBCC, to find out what your page- and extent-level statistics are. sys.dm_db_index_physical_stats will be your first and last bet.

    Second, you're taking chances following advice that is not only unsubstantiated by any demonstrated evidence of described "issues", but clearly indicates lack of any practical knowledge of the matter.

    Thirdly, "playing with a fill factor" is like playing with matches, except you may not get your butt on fire with the latter. Also, rebuilding an index is dropping and creating it. So in your case, don't play with matches, and you won't have to worry about what's more time-consuming
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  14. #14
    Join Date
    Jun 2005
    Posts
    319
    Thanks for the vote of confidence dude, you are a real charmer.

  15. #15
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    No problem, any time
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."
