Results 1 to 5 of 5
  1. #1
    Join Date
    Jan 2007
    Posts
    9

    Unanswered: Plagued by corruption

    Since we have migrated our 2000 databases to SQL 2005 on new hardware (described below), we have experienced re-occurring corruption on the same tables and indexes even after the corruption had been resolved by dropping/re-creating the indexes manually. Typical events include:

    1. Clients receiving errors, such as 8646
    2. We see the errors in the event logs, run DBCC's to confirm the errors which then creates a dump file
    3. We resolve by dropping/creating the index(es) affected
    4. Re-run the DBCC checks and they come up clean.

    A day or two later, the entire cycle will repeat against the same object/index. This problem exists on multiple different databases on multiple different servers, all with the same structure and hardware. Our SAN engineers say there are no issues they can see with the disks.

    I would be very appreciative for any suggestions to better troubleshoot the issue, or, of course, a resolution would be magnificent!!


    Here is as much information about our setup and errors experienced as I can think of to provide. Of course, if I am missing something just shout and I will include it. TIA.


    We are running SQL 2005 enterprise x64 sp1 on Windows 2003 enterprise x64 sp1 as a 2-node active/passive cluster (native windows clustering). The hardware is HP Proliant DL585, 4 dual-core processors @ 2.8 GHz, 48 GB of RAM.

    When clustering the servers, we pay special attention to ensure the NIC settings are as specified by published MS white papers both at the NIC level and as cluster configurations. I can provide those settings if necessary.

    Other pertinent settings:
    - We set the SQL Server's min/max setting to both be 40960 MB (40GB)
    - MAXDOP to 2
    - System DB's share two drives, one for data, one for log (all drives are RAID or 10)
    - TempDB has 1 drive for data, one for log, and we modify TempDB to have as many data files as processors
    - User DBs share multiple pairs of drives, each pair has a dedicated data drive and dedicated log drive
    - Backups (litespeed) go to a dedicated drive
    - Full-text catalogs go to a dedicated drive
    - Maintenance jobs run every night and include: Reindex, update stats, backups and other miscellaneous jobs specific to our environment
    - We set physical database files to grow by 200MB increments when over 2GB in size and we proactively grow each file overnight if the file's free space is < 10%

  2. #2
    Join Date
    Jan 2007
    Posts
    9
    Additional error and dump file info...


    The typical app log entry for error 8646:

    Unable to find index entry in index ID 1, of table 894626230, in database 'dbname'. The indicated index is corrupt or there is a problem with the current update plan. Run DBCC CHECKDB or DBCC CHECKTABLE. If the problem persists, contact product support.

    Contents of the dump file:

    This file is generated by Microsoft SQL Server
    version 9.00.3042.00
    upon detection of fatal unexpected error. Please return this file,
    the query or program that produced the bugcheck, the database and
    the error log, and any other pertinent information with a Service Request.


    Computer type is AT/AT COMPATIBLE.
    Current time is 07:32:39 11/07/07
    8 Unknown CPU 9., 2813 Mhz processor (s).
    Windows NT 5.2 Build 3790 CSD Service Pack 1.

    Memory
    MemoryLoad = 92%
    Total Physical = 49149 MB
    Available Physical = 3913 MB
    Total Page File = 60149 MB
    Available Page File = 15932 MB
    Total Virtual = 8388607 MB
    Available Virtual = 8338218 MB


    DBCC RESULTS
    --------------------
    <DbccResults>
    <Dbcc ID="0" Error="2511" Severity="16" State="1">Table error: Object ID 894626230, index ID 1, partition ID 7205
    7596565782528, alloc unit ID 72057596808986624 (type In-row data). Keys out of order on page (1:487717), slots 3
    and 4.</Dbcc>
    <Dbcc ID="1" Error="2511" Severity="16" State="1">Table error: Object ID 894626230, index ID 1, partition ID 7205
    7596565782528, alloc unit ID 72057596808986624 (type In-row data). Keys out of order on page (1:487717), slots 79
    and 80.</Dbcc>


    ...and so on...

  3. #3
    Join Date
    Jan 2007
    Posts
    9
    and last but not least...


    * Short Stack Dump

    PSS @0x000000045399FA80
    -----------------------



    CSession @0x000000045399E410
    ----------------------------
    m_spid = 54 m_cRef = 14 m_rgcRefType[0] = 1
    m_rgcRefType[1] = 1 m_rgcRefType[2] = 11 m_rgcRefType[3] = 1
    m_rgcRefType[4] = 0 m_rgcRefType[5] = 0 m_pmo = 0x000000045399E080
    m_pstackBhfPool = 0x0000000000000000 m_dwLoginFlags = 0x83e0 m_fBackground = 0
    m_fClientRequestConnReset = 0 m_fUserProc = -1 m_fConnReset = 0
    m_fIsConnReset = 0 m_fInLogin = 0 m_fAuditLoginSent = 1
    m_fAuditLoginFailedSent = 0 m_fReplRelease = 0 m_fKill = 0
    m_ulLoginStamp = 54267395 m_eclClient = 5 m_protType = 5
    m_hHttpToken = FFFFFFFFFFFFFFFF

    m_pV7LoginRec
    ---------------------
    0000000000000000: d60b0000 02000972 00100000 00000007 08190000 .......r............

    0000000000000014: 00000000 e0830000 2c010000 09040000 5e000c00 ........,.......^...

    0000000000000028: 00000000 00000000 76000700 84000000 00000000 ........v...........

    000000000000003C: 84000400 8c000000 8c000000 00163582 4ea48c00 ..............5.N...

    0000000000000050: 4a0b8c00 00000000 00000000 0000J.............




    CPhysicalConnection @0x000000045399E2E0
    ---------------------------------------
    m_pPhyConn->m_pmo = 0x000000045399E080
    m_pPhyConn->m_pNetConn = 0x000000045399EB00
    m_pPhyConn->m_pConnList = 0x000000045399E3E0
    m_pPhyConn->m_pSess = 0x000000045399E410 m_pPhyConn->m_fTracked = -1
    m_pPhyConn->m_cbPacketsize = 4096 m_pPhyConn->m_fMars = 0 m_pPhyConn->m_fKill = 0


    CBatch @0x000000045399EFD0
    --------------------------
    m_pSess = 0x000000045399E410 m_pConn = 0x000000045399EEC0 m_cRef = 3
    m_rgcRefType[0] = 1 m_rgcRefType[1] = 1 m_rgcRefType[2] = 1
    m_rgcRefType[3] = 0 m_rgcRefType[4] = 0 m_pTask = 0x00000000038085C8


    EXCEPT (null) @0x00000000507BA138
    ---------------------------------
    exc_number = 0 exc_severity = 0 exc_func = 0x0000000001873F50



    Task @0x00000000038085C8
    ------------------------
    CPU Ticks used (ms) = 6134 Task State = 2
    WAITINFO_INTERNAL: WaitResource = 0x0000000000000000 WAITINFO_INTERNAL: WaitType = 0x0
    WAITINFO_INTERNAL: WaitSpinlock = 0x0000000000000000 SchedulerId = 0x2
    ThreadId = 0x1cc0 m_state = 0 m_eAbortSev = 0



    EC @0x000000045399FA90
    ----------------------
    spid = 54 ecid = 0 ec_stat = 0x0
    ec_stat2 = 0x0 ec_atomic = 0x0 __fSubProc = 1
    ec_dbccContext = 0x00000000507BEF50 __pSETLS = 0x000000045399EF30 __pSEParams = 0x000000045399F350
    __pDbLocks = 0x000000045399FEE0

    SEInternalTLS @0x000000045399EF30
    ---------------------------------
    m_flags = 0 m_TLSstatus = 3 m_owningTask = 0x00000000038085C8
    m_activeHeapDatasetList = 0x000000045399EF30
    m_activeIndexDatasetList = 0x000000045399EF40

    SEParams @0x000000045399F350
    ----------------------------
    m_lockTimeout = -1 m_isoLevel = 4096 m_logDontReplicate = 0
    m_neverReplicate = 0 m_XactWorkspace = 0x0000000BA3DDB9D0
    m_pSessionLocks = 0x0000000780BE6F10 m_pDbLocks = 0x000000045399FEE0
    m_execStats = 0x0000000082256E40 m_pAllocFileLimit = 0x0000000000000000

  4. #4
    Join Date
    Dec 2006
    Posts
    30
    Do the machines have ECC RAM?

  5. #5
    Join Date
    Jan 2007
    Posts
    9
    Yes, all our servers have ECC RAM, which I believe is a req. for the HP servers.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •