  1. #1
    Join Date
    Nov 2003
    Posts
    11

    SQL Server Bouncing Up/Down

    Database: SQL Server 2000 SP3
    OS: Windows 2000 SP4

    The SQL Server has been up and down five times in the last two days. We have run all hardware diagnostics against the machine and it is not hardware related.

    Note: I have run all the DBCC checks (CHECKDB, CHECKCATALOG, CHECKALLOC) and found no errors in any DB.

    Messages in the error logs range as follows:

    Log 1
    - LogWriter: Operating system error 6(The handle is invalid.) encountered.
    - Write error during log flush. Shutting down server
    - Error: 9001, Severity: 21, State: 4.
    - Error: 823, Severity: 24, State: 2.
    - Database 'msdb' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.
    - The log for database 'master' is not available.

    Log 2 - Hours later
    - SQL Server Assertion: File: <p:\sql\ums\inc\umslist.h>, line=317
    Failed Assertion = 'el->m_next == 0'.
    - PSS NULL for assert - raising EXCEPTION_ILLEGAL_INSTRUCTION
    - SqlDumpExceptionHandler: Process 1628 generated fatal exception c000001d EXCEPTION_ILLEGAL_INSTRUCTION. SQL Server is terminating this process.
    - SQL Server is aborting. Fatal exception c000001d caught.

    Log 3 - 26 hrs later
    - Error: 17883, Severity: 1, State: 0
    - The Scheduler 1 appears to be hung. SPID 0, ECID 0, UMS Context 0x03903AB0.

    In the last couple of hours, at different times, the server froze and had to be rebooted, with no errors in the SQL logs.

    Any suggestions? Rebuild MSDB?

  2. #2
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    The last error is very familiar. It is usually caused by an I/O-intensive operation that completes very quickly, if you have a very fast hard drive or SAN. However, based on the previously mentioned errors, I'd say there is something wrong with your I/O subsystem. When you said that all DBCC checks returned no errors, did you mean that those checks were run against the MSDB database as well? Also, what happened before you noticed this bouncing for the first time? Did you try to move devices around? Did you change hard drives? Did anybody else do anything? Did you have a good breakfast that morning?

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    My first suggestion would be to open a call with MS-PSS. It isn't cheap (around $700 if I remember correctly), but it gives you the definitive answer to this kind of problem.

    If that isn't an option, detach all of your databases (so you don't lose the data), then remove and reinstall SQL Server. This is a lot of work... Seriously consider opening an incident before you do this!

    -PatP

  4. #4
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    And what would you accomplish by opening an incident? They'll be presented with the same information as we were, which would lead to the same set of questions I've already asked. Don't you think? If you have $700 to spare, I'll fix your problem; no need for M$ to get even richer.

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I've had really good luck with the MS-PSS folks. They've almost always been very calm, cool, collected, and focused on solving my problem. Even in the cases where I've been way over their heads, they quickly find someone able to address my questions.

    Particularly when I have a "server down" situation, they manage to get me the right folks as fast as humanly possible.

    Then again, my last few TAMs were actually based in Texas, so the SQL techs had good reason to be certain that I got what I needed. It also helps that I'm relatively well known there (people stop and say "Hi" in the hall), and that they remember I sometimes send really good ice cream.

    I never pretend that they are cheap compared to posting a question on a newsgroup or a forum like this. When you have a couple of thousand people unable to work because a server is toast, then they really are cheap!

    -PatP

  6. #6
    Join Date
    Nov 2003
    Posts
    11
    In response:
    - No changes to the DB (except for data) or to the hardware on this machine have occurred for over a year.
    - The DBCC checks were run against all DBs, system and user.
    - The server has been stable for over a year.

    "Did you have a good breakfast that morning?"

    And to top it off: no breakfast or lunch... preparing for a Lawson/Oracle/AIX upgrade. (Oh boy.)

  7. #7
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Check through your System event log, and see if you have a few error numbers in the 50 range (I think I am looking for error 55, but I am not sure). These errors in the system log usually point to a disk problem of some kind. Good luck.
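The check suggested above can be scripted once the System log has been saved from Event Viewer as a CSV file. This is only a rough sketch: the column names (`Date`, `Time`, `Source`, `Type`, `Event`), the header row, and the sample rows are all assumptions made up for illustration, and "the 50 range" is read here as Event IDs 50 through 59.

```python
import csv
import io


def find_disk_suspects(csv_text, low=50, high=59):
    """Return the rows whose Event ID falls between low and high inclusive.

    Assumes the export has a header row with an 'Event' column holding
    the numeric Event ID (a hypothetical layout, adjust to your export).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        try:
            event_id = int(row["Event"])
        except (KeyError, ValueError, TypeError):
            # Skip rows with a missing or non-numeric Event ID.
            continue
        if low <= event_id <= high:
            hits.append(row)
    return hits


# Made-up sample export, not taken from the server in this thread.
SAMPLE_EXPORT = """Date,Time,Source,Type,Event
3/9/2004,11:59:58 PM,Disk,Warning,51
3/10/2004,2:15:07 AM,EventLog,Error,6008
3/10/2004,2:16:30 AM,Ntfs,Error,55
"""

suspects = find_disk_suspects(SAMPLE_EXPORT)
for row in suspects:
    print(row["Date"], row["Time"], row["Source"], row["Event"])
```

If the list comes back empty, that only says the log holds no entries in that ID range; it does not by itself clear the disks.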

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Ok, the no breakfast/no lunch thing is your first mistake. For me, that would be fatal (and I mean that literally).

    It sounds to me like you've got hardware problems, but it could still be software (especially if it were some form of malware).

    You have to make a judgement call here. Is the cost of an incident (let's assume $700 for the sake of discussion) worth more or less than the time needed to solve this problem with some help from us? Since the SQL Server software is what is showing the symptoms, you could open an incident and they'll work with you until it is fixed, 24 by 7 if necessary, and they can bring some considerable pressure to bear on vendors that are slow to react... It is one thing to leave some poor beggar trying to fix a lone server hanging, it is something quite different to have a mid-level manager at Microsoft calling to ask their CEO why they don't provide acceptable levels of support.

    I'm fine with either solution... If you need an answer NOW then I'd suggest using MS-PSS. If you need to fix it soon, but it isn't really urgent, then you can save a couple hundred dollars by working through it here with us.

    -PatP

  9. #9
    Join Date
    Nov 2003
    Posts
    11

    More and More info

    The server runs a non-clinical app that can wait a couple of days to be fixed. So if I can get help from this site, that would be my choice at this time. I will now overload you with more information...

    String of errors for one incident:
    Event log errors:
    1)
    Event ID 17055
    Description:
    17066
    SQL Server Assertion: File <p:\sql\ums\inc\umslist.h>, line=317
    Failed Assertion='el->m_next==0'

    2)
    Event ID 17055
    Description:
    17310
    SqlDumpExceptionHandler: Process 1628 generated fatal exception
    c000001d EXCEPTION_ILLEGAL_INSTRUCTION. SQL Server is terminating this process.

    3)
    Event ID 17055
    Description:
    17311
    SQL Server is aborting. Fatal Exception c000001d caught.

    4)
    Event ID 17052
    Description:
    The MSSQLSERVER service terminated unexpectedly.

    String of errors for second incident:
    Event log errors:
    1)
    Event ID 17055
    Description:
    17053
    LogWriter: Operating system error 6(The handle is invalid) encountered.

    2)
    Event ID 17052
    Error 9001, Severity 21, State: 4
    The log for database 'msdb' is not available

    3)
    Event ID 17055
    Description:
    17053
    LogWriter: Operating system error 6(The handle is invalid) encountered.

    4)
    Event ID 17052
    Error 9001, Severity 21, State: 4
    The log for database 'msdb' is not available

    5)
    Event ID 17052
    Error 3449, Severity 21, State: 1
    An error occurred that requires SQL Server to shut down so that recovery can be performed.

    String of errors for third incident:
    Event log errors:
    1)
    Event ID 17055
    Description:
    17053
    LogWriter: Operating system error 6(The handle is invalid) encountered.

    2)
    Event ID 17055
    Description:
    18052
    Error 9001, Severity 21, State: 4

    3)
    Event ID 17055
    Description:
    18052
    Error 823, Severity 24, State: 2

    "System froze, had to reboot"

    All started with the first error:

    SQL Server Agent could not be started
    Unable to connect.

    (Note: I have reviewed security, and my network admin has sworn that no permissions for the server have been changed. Also note: when the server is running, the Agent is active and running all jobs, transactions, etc.)

  10. #10
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Look into the System errorlog. I bet you have either a controller or a disk failure at the times when your server was bouncing. And if that's the case, I'll send you my bill later, with a $200 discount.

  11. #11
    Join Date
    Nov 2003
    Posts
    11

    No Corresponding Errors in System Log

    There are no corresponding entries in the System log, except the usual message that the system shutdown was unexpected...

  12. #12
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    ...I clicked too fast...

    Are your log devices residing on a different drive than your data devices? Is it the same for system databases?

    It looks like (just from what I see now) the drive where your log devices sit needs a little help... to be popped out and replaced, unless it's the controller (if there is a dedicated controller for that drive) or a SCSI channel on the controller that croaked. Check your System errorlog.

  13. #13
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Errorlog entries may not be sorted by date and time, so I usually do Save As to a text file and dump it into a table (on my Personal Edition). Then I combine the date and time columns into one and sort descending. There has to be something there; there are usually no mysteries.
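The same "combine date and time, then sort descending" idea can be tried without a table. The sketch below is not the poster's procedure, just an equivalent in Python. It assumes each entry starts with a `YYYY-MM-DD HH:MM:SS.ff` stamp followed by a source column, and the timestamps in the sample are invented; only the message texts come from this thread.

```python
from datetime import datetime


def parse_entry(line):
    """Split one errorlog line into (timestamp, source, message).

    Assumes the layout 'YYYY-MM-DD HH:MM:SS.ff source message'.
    Returns None for lines that do not start with a timestamp,
    such as continuation lines.
    """
    parts = line.rstrip("\n").split(None, 3)
    if len(parts) < 4:
        return None
    date_part, time_part, source, message = parts
    try:
        # Combine the date and time columns into a single sortable value.
        stamp = datetime.strptime(f"{date_part} {time_part}",
                                  "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None
    return stamp, source, message


def sort_entries(lines, newest_first=True):
    """Parse the lines, drop unparseable ones, and sort by timestamp."""
    entries = [e for e in map(parse_entry, lines) if e is not None]
    return sorted(entries, key=lambda e: e[0], reverse=newest_first)


# Invented timestamps; message texts are the ones quoted in this thread.
sample = [
    "2004-03-10 02:15:07.31 spid2     LogWriter: Operating system error 6(The handle is invalid.) encountered.",
    "2004-03-09 23:59:58.10 spid2     Error: 9001, Severity: 21, State: 4.",
    "    continuation line without a timestamp",
    "2004-03-10 02:15:07.90 spid2     Error: 823, Severity: 24, State: 2.",
]

result = sort_entries(sample)
for stamp, source, message in result:
    print(stamp.isoformat(sep=" "), source, message)
```

Lines without a leading timestamp are simply dropped here; if multi-line messages matter, they would need to be attached to the preceding entry instead.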

  14. #14
    Join Date
    Nov 2003
    Posts
    11
    Note:

    I checked the system log and there are no errors corresponding to the ones I previously posted from the application log.

    There are no indications in the system log that point to a hardware error.
    No warnings, info entries, etc.

    Last night we ran the complete IBM diagnostics against this box, which gave the hardware a clean bill of health.

    This server runs a small financial application, so the mdf and ldf files reside on the same disk, and there is only one controller for the system.

  15. #15
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Well, then you're all set...Can you believe it yourself?
