  1. #1
    Join Date
    Jul 2006
    Posts
    13

    Unanswered: Oracle 10.1.0.3 issue in alert log

    I am getting the following lines in my alert log file:


    skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3


    This line was repeated 2645 times continuously over 17 minutes, and now there are no such errors.
    The database version is 10.1.0.3 and the operating system is Solaris 5.9.
    The database is working and there are no complaints from the application team.
    Can anybody tell me something about this error?
    Our database is not using shared servers. Also, the swap space is nearly 95% free.

    Gurpartap Singh

  2. #2
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    A search on Metalink revealed that numerous types of problems are related to this exact line. But in any case, if there's nothing wrong with the DB, it _shouldn't_ be a problem.

  3. #3
    Join Date
    Jul 2006
    Posts
    13
    Yes, you are right. But it doesn't tell the exact cause of the problem. It is documented in detail for AIX and HP-UX on Google and Metalink, but not for the Solaris operating system and not for Oracle 10.1.0.3.

    Thanks for the response....

    Gurpartap Singh

  4. #4
    Join Date
    Jul 2006
    Posts
    13

    Update

    So we got the solution:

    The next day we got nearly 10,000 lines like:

    skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3

    in the alert log file.

    We also had nearly 10,000 defunct processes:
    $ ps -ef | grep defunct

    and those were all children of:
    'ora_qmnc_prdncms'
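To see which parent owns the defunct processes without counting by eye, the `ps -ef` output can be grouped by PPID. This is only a minimal sketch: the function name and the sample text are hypothetical (not taken from the system in this thread), and it assumes the usual `UID PID PPID ...` column order of `ps -ef`.

```python
from collections import Counter


def count_defunct_by_parent(ps_output: str) -> Counter:
    """Count <defunct> entries per parent PID in `ps -ef` style output."""
    counts = Counter()
    for line in ps_output.splitlines():
        if "<defunct>" not in line:
            continue
        fields = line.split()
        # Assumed column order: UID PID PPID ...; PPID is the third field.
        if len(fields) >= 3 and fields[2].isdigit():
            counts[int(fields[2])] += 1
    return counts


# Hypothetical sample for illustration only.
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
  oracle  2101     1  0 09:00:00 ?        0:05 ora_qmnc_prdncms
  oracle  3001  2101  0                   0:00 <defunct>
  oracle  3002  2101  0                   0:00 <defunct>
  oracle  4100     1  0 09:00:01 ?        0:02 ora_pmon_prdncms
"""

if __name__ == "__main__":
    for ppid, n in count_defunct_by_parent(SAMPLE).most_common():
        print(f"parent {ppid}: {n} defunct children")
```

On a live system the text would come from running `ps -ef` and passing its output to the function; the PID with the highest count can then be looked up in the same listing to find the parent's name.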

    The RMAN backup failed that night, and the next day it failed again
    with ORA-04030.
    That is, the process was not able to get memory, so it failed.



    So we looked at the memory usage and found we had 1.5GB of physical memory.

    But in swap we had only 90MB available, although the total swap on the system is 16GB:
    $ swap -s

    This is not normal.
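The `swap -s` summary line can be split into its four figures to make the check repeatable. A minimal sketch, assuming the usual Solaris one-line format (`total: Nk bytes allocated + Nk reserved = Nk used, Nk available`); the function names and the sample line are hypothetical, with the sample chosen to roughly match the 90MB of 16GB described above.

```python
import re

SWAP_RE = re.compile(
    r"total:\s*(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available"
)


def parse_swap_s(line: str) -> dict:
    """Parse the one-line summary printed by `swap -s` into KB figures."""
    m = SWAP_RE.search(line)
    if m is None:
        raise ValueError("unrecognised swap -s output: %r" % line)
    allocated, reserved, used, available = (int(g) for g in m.groups())
    return {
        "allocated_kb": allocated,
        "reserved_kb": reserved,
        "used_kb": used,
        "available_kb": available,
    }


def available_fraction(stats: dict) -> float:
    """Fraction of virtual swap (used + available) that is still available."""
    return stats["available_kb"] / (stats["used_kb"] + stats["available_kb"])


# Hypothetical line for illustration only.
SAMPLE = "total: 8000000k bytes allocated + 8685056k reserved = 16685056k used, 92160k available"

if __name__ == "__main__":
    stats = parse_swap_s(SAMPLE)
    print(stats)
    print("available: %.2f%%" % (100 * available_fraction(stats)))
```

The "reserved" figure is the one to watch here: it is space that has been claimed but not yet written to, so swap can run out while very little of it is actually in use.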

    So we came up with a theory about the behaviour of Oracle on Solaris: Oracle also reserves space in swap for every
    process it starts. For example, if a process starts and needs 4MB of memory, Oracle also reserves 4MB in
    swap. If it cannot reserve that 4MB in swap, the process fails to start.

    So we checked which process was using the most memory:
    $ prstat -s size (sorts the processes according to memory usage)

    and found that 'ocssd.bin' was using 8GB of swap space.

    We had also opened a service request with Oracle and told them about it. They told us that it is safe to
    kill this process, as it is used only with ASM or RAC and we are not using either.

    So we killed this process and instantly got that 8GB freed.


    Then, when we bounced the database at night, we opened a separate window and ran:
    $ vmstat 2

    This command shows physical and swap memory usage every 2 seconds.
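Rather than reading the changes off the screen, two `vmstat` samples can be compared directly. A minimal sketch, assuming the Solaris column layout where the fourth and fifth columns are `swap` and `free` in KB; the function names are hypothetical and the sample rows are invented for illustration, not captured from the system in this thread.

```python
def parse_vmstat_samples(text: str) -> list:
    """Return (swap_kb, free_kb) tuples from Solaris-style `vmstat` output."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are all-numeric; the two header rows are not.
        if len(fields) < 5 or not all(f.isdigit() for f in fields):
            continue
        # Assumed layout: r b w swap free ...
        samples.append((int(fields[3]), int(fields[4])))
    return samples


def delta_mb(samples: list) -> tuple:
    """Change in (free swap, free physical memory) in MB, first to last sample."""
    (swap0, free0), (swap1, free1) = samples[0], samples[-1]
    return (swap1 - swap0) / 1024, (free1 - free0) / 1024


# Hypothetical rows for illustration only.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 8480768 1572864 12 40 0 0 0 0 0 0 0 0 0 410 900 380 2 1 97
 0 0 0 9095168 1880064 10 35 0 0 0 0 0 0 0 0 0 395 870 360 1 1 98
"""

if __name__ == "__main__":
    swap_mb, free_mb = delta_mb(parse_vmstat_samples(SAMPLE))
    print("free swap changed by %.0f MB, free memory by %.0f MB" % (swap_mb, free_mb))
```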

    As we suspected, while the database shutdown was in progress we saw a 300MB increase in free physical
    memory and a 600MB increase in free swap space. This corresponds almost exactly to the amount of memory
    being used and reserved by the SGA; any difference would be the memory freed by the connected sessions. We ran the
    following 2 queries and found that the SGA is around 500MB in size and is using about 300MB.

    SQL>select round(sum(BYTES/1024/1024),2) as "Total SGA in Mb" from v$sgastat;

    Total SGA in Mb
    ---------------
    489,74

    SQL>select round(sum(BYTES/1024/1024),2) as "Used SGA in Mb" from v$sgastat where NAME not like '%free%';

    Used SGA in Mb
    --------------
    295,83
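The two query results above use a decimal comma, so they need converting before any arithmetic. A small sketch of the free-versus-used calculation; the helper name is hypothetical and the only inputs are the two figures already shown.

```python
def parse_decimal_comma(text: str) -> float:
    """Convert a number printed with a decimal comma (e.g. '489,74') to float."""
    return float(text.strip().replace(",", "."))


total_mb = parse_decimal_comma("489,74")  # "Total SGA in Mb" from the first query
used_mb = parse_decimal_comma("295,83")   # "Used SGA in Mb" from the second query

free_mb = round(total_mb - used_mb, 2)
used_pct = round(100 * used_mb / total_mb, 1)

if __name__ == "__main__":
    print("free SGA: %.2f MB, used: %.1f%%" % (free_mb, used_pct))
```

This gives roughly 194MB of the SGA marked as free, which is the part that would be reserved without being actively used.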

    So, as we suspected, Oracle on Solaris reserves swap space for the entire amount of memory it believes a
    process will need. In this example the database will need 500MB for its SGA at its maximum size, so it reserves
    500MB in swap even though there is plenty of physical memory to hold the SGA.

    But when I tried the same on Linux, it didn't behave this way. On Linux, Oracle doesn't reserve swap space for all
    its processes.



    The defunct processes in the database were due to Advanced Queuing (AQ). Oracle 10g is new in using
    Advanced Queuing by default; it is used by the Oracle Scheduler. So every time
    a job scheduled in the Scheduler tried to use AQ and start its process, it failed
    because it too could not reserve space in swap, and so we got errors in the alert log like:

    skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
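The fields in that alert log line can be pulled apart with a short script. This is a sketch under one assumption the thread does not confirm: that `depinfo` carries the operating system errno. If it does, 12 is ENOMEM (not enough memory) on both Solaris and Linux, which would fit a fork that could not reserve swap. The function name is hypothetical.

```python
import errno
import re

LINE_RE = re.compile(
    r"skgpspawn failed:\s*category = (\d+), depinfo = (\d+), op = (\w+), loc = (\w+)"
)


def parse_skgpspawn(line: str):
    """Split an skgpspawn alert log line into its fields, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    category, depinfo, op, loc = m.groups()
    return {
        "category": int(category),
        "depinfo": int(depinfo),
        "op": op,
        "loc": loc,
        # Assumption: depinfo is an OS errno; looked up in the local errno table.
        "errno_name": errno.errorcode.get(int(depinfo), "unknown"),
    }


LINE = "skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3"

if __name__ == "__main__":
    print(parse_skgpspawn(LINE))
```

Note that the errno lookup uses the table of the machine running the script, so it is only meaningful where the numbering matches the database host.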


    Now, after bouncing the database, there are no such errors, and our backup job and other jobs are running fine.


    Thanks

    Gurpartap Singh

  5. #5
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    Great that you solved your problem. However, I can't seem to replicate your findings about reserved memory in swap. I am on Solaris 9 running 10gR2 RAC. Also, it is possible that the OS is reserving the swap space rather than the DB. What version of Solaris are you on?

  6. #6
    Join Date
    Jul 2006
    Posts
    13
    We are using the following:
    5.9 Generic_118558-03 sun4u sparc SUNW,Sun-Fire-V240

    Maybe you are right, as we have not received confirmation from Oracle about this theory. We have put the question to them but haven't got a reply yet.
    Maybe this is the behaviour of Oracle 10g R1 on Solaris.

    Could someone else also run this test in their environment and post the results?

    I think this is space that Oracle reserves in swap, as 'ocssd.bin' is the Oracle process that had reserved 8GB of memory, and when we shut down the database we saw the results described above.

    Thanks for the reply.......

    Gurpartap Singh

  7. #7
    Join Date
    Jul 2006
    Posts
    13

    Update

    Oracle has agreed with us, but I don't know what to say in your case. Also, we cannot test this until we get a similar problem again, as this is a production database.

    Thanks for your response....
