Tuesday 26 July 2011

Troubleshooting when WebLogic Server is slow


 Server performing slow

     There are many reasons for a server performing slowly.
     The first step is to take thread dumps and see what the threads are doing. If there is nothing wrong with the threads, the following are other reasons why a server performs slowly:

    Process runs OutOfMemory:

    If the Java heap is full, the server process appears to be hung and does not accept any requests, because each request needs heap space for allocating objects.

     So if the heap is full, none of the requests get served; all the requests fail with java.lang.OutOfMemoryError.

•         OutOfMemory Analysis:
    OutOfMemory can occur because of a real memory crunch or a memory leak causing the heap to fill with orphaned objects.
    The first step is to enable verbose GC logging and run the server again
    (-verbose:gc -XX:+PrintGCDetails).
    The stdout file will then show the garbage collection details.
     If the error is because of a memory leak, then we would need to use profilers like Introscope or OptimizeIt to figure out the source of the leak.
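    As a minimal sketch, assuming the server startup script passes JVM arguments through the JAVA_OPTIONS variable (as the stock WebLogic startup scripts typically do), the GC logging flags could be added like this:

    # Illustrative only: turn on verbose GC logging with timestamps
    JAVA_OPTIONS="${JAVA_OPTIONS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
    export JAVA_OPTIONS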

Process size = java heap + native memory + memory occupied by the executables and libraries.
    On 32-bit operating systems, the virtual address space of a process can go up to 4 GB. This is an addressing limitation (2^32 bytes).

     Out of this 4 GB, the OS kernel reserves some part for itself (typically 1 to 2 GB).
    This is not a limitation on 64-bit machines like Solaris (SPARC) or Windows running on Itanium (64-bit).
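    As a rough worked example (all the numbers here are purely illustrative):

    process size ~= 1.5 GB java heap + 0.3 GB native memory + 0.2 GB executables/libraries ~= 2 GB

    which is already close to the address space left to the process after the kernel reservation on a 32-bit OS.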

WebLogic Troubleshooting – Fragmentation

OOM can occur due to fragmentation. In this situation, we can see free memory available but still get OutOfMemory errors.

    Before we go into fragmentation, we need to know the following fact:
    Heap allocation can only be contiguous (as per the JVM spec). If a request needs 2 MB of memory, then the JVM has to provide a 2 MB contiguous memory chunk.

    Over a period of time, memory allocation becomes scattered and there might not be enough contiguous memory available.

     A full GC might not be able to reclaim the contiguous space.
     This is called fragmentation.

For example, the verbose:gc output might look like the following if there was fragmentation of the heap. There is free memory available, but the JVM still throws an OOM error.
    (Most of the fragmentation bugs are resolved in Sun JDK 1.4.2_xx)

    [GC 4673K->3017K(32576K), 0.0050632 secs]
    [GC 5047K->3186K(32576K), 0.0028928 secs]
    [GC 5232K->3296K(32576K), 0.0019779 secs]
    [GC 5309K->3210K(32576K), 0.0004447 secs]
    java.lang.OutOfMemoryError

Fragmentation-related issues are caused by bugs in the JVM.

    The best approach is to try the latest minor version of the JVM, and if that does not work out, we need to work with the vendor to get it fixed.
•         The following commands on Solaris will provide good information:
    vmstat:

    The vmstat command reports statistics about kernel threads, virtual memory, disks, traps and CPU activity.

    sar:
    An OS utility known as the system activity reporter.
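    As an illustration, typical invocations might look like the following (the interval and count values are arbitrary):

    vmstat 5 10     # report memory, paging and CPU statistics every 5 seconds, 10 times
    sar -u 5 10     # report CPU utilization every 5 seconds, 10 times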

•         If the application uses SSL, then the server performs slowly compared to non-SSL.
SSL reduces the capacity of the server by about 33 to 50 percent, depending upon the strength of encryption used in the SSL connections.

•         Process running out of file descriptors. The server cannot accept further requests because sockets cannot be created. (Each socket created consumes a file descriptor.)
     The following exception is thrown in such cases:

     java.net.SocketException: Too many open files

     OR

     java.io.IOException: Too many open files

    In the above case, the lsof utility would help. lsof shows the list of all open file descriptors. From the list of open files, we (the application owner) can easily figure out whether it is a bug or expected behavior. If it is expected behavior, then the number of FDs needs to be increased. (The default number is 1024.)
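    As a sketch, the open descriptors of the server process can be listed and the per-process limit raised in the shell that starts the server (the PID and the new limit are illustrative values):

    lsof -p <PID> | wc -l    # count the file descriptors currently open by the server process
    ulimit -n                # show the current per-process file descriptor limit
    ulimit -n 4096           # raise the limit before starting the server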

•         GC taking a long time (more than 20 seconds).

       This appears like a hang to end users.
       In the above case, we need to tune the GC parameters.
       In these scenarios, we should try the other GC options available. In some cases (GC taking very long times), incremental GC has been useful (-Xincgc).
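       As an illustrative sketch only (the heap sizes and collector choice are assumptions that depend entirely on the application), such options could be passed through the server's memory arguments, for example via the MEM_ARGS variable used by the stock startup scripts:

       MEM_ARGS="-Xms512m -Xmx512m -Xincgc"
       export MEM_ARGS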
   
       Before going into high CPU analysis, it would be helpful to know about the Solaris threading model.

•         Process consuming high CPU

    A process using high CPU is not always bad; this is a common misconception. In fact, we
    want our application to use the CPU efficiently.
    What we do not want is a single thread or a couple of threads consuming all the CPU
    forever (in an infinite loop) and not allowing other threads to get their share of the CPU (timeslice).


High CPU analysis:
  1) Run prstat to find the process that is consuming the highest CPU.

  2) Run prstat -L -p <PID>, where PID is the process ID that is consuming the highest CPU.
    From the above command you would see the LWPs that are consuming the high CPU.

   3) Run pstack <PID>
   From the pstack output, we can see the LWP mapped to its native thread.
   For example:
     -----------------  lwp# 11 / thread# 121  --------------------
    ff31dba8 _poll    (3e8, ff33f2a8, 0, ff33f2a8, ff33f2a8, 0) + 8
   ff37e6a8 select   (0, 0, 0, 0, acb7f880, acb7f880) + 6c
   fed07c04 __1cIos_sleep6Fxi_i_ (0, 3e8, 1, 1, 0, c6e16e80) + 1c8

4) Convert the thread number to hexadecimal (see the sketch after this list).

5) Take a thread dump of the process:
     kill -3 <PID>
From the thread dump, look for the thread with the nid obtained from step 4. The thread we see is the one that is consuming the CPU.
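For example, using thread number 121 from the pstack sample above (the dump-file name below is illustrative; the thread dump actually appears in the server's stdout):

    printf "%x\n" 121                  # prints 79, the thread number in hexadecimal
    kill -3 <PID>                      # request a thread dump from the JVM
    grep "nid=0x79" server_stdout.log  # locate the matching thread entry in the dump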


•         OutOfMemory during deployment

If the application is huge (contains more than 100 JSPs), we might encounter this problem with the default JVM settings.
The reason for this is the permanent generation (MaxPermSize) space getting filled up.
This space is used by the JVM to store its internal data structures as well
as class definitions. JSP-generated class definitions are also stored here.
The permanent generation is outside the Java heap and cannot grow beyond MaxPermSize.
So the fix is to increase it by passing the argument in the startup script of the server: -XX:MaxPermSize=128m (the default is 64m).
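As a minimal sketch, assuming the domain startup scripts read memory settings from the MEM_ARGS variable (as the stock WebLogic scripts typically do), the flag could be appended there:

    MEM_ARGS="${MEM_ARGS} -XX:MaxPermSize=128m"
    export MEM_ARGS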

 
