/ MMAP File OutOfMemoryError and pmap ~ Java EE Support Patterns

8.12.2011

MMAP File OutOfMemoryError and pmap

This case study describes the complete root cause analysis and resolution of a Java ZipFile OutOfMemoryError problem triggered during the deployment of an Oracle Weblogic Integration 9.2 application.

Environment Specs

·         Java EE server: Oracle Weblogic Integration 9.2 MP2
·         OS: Sun Solaris 5.10
·          JDK: Sun Java VM 1.5.0_10-b03 - 32-bit
·         Java VM arguments: -server –Xms2560m –Xmx2560m -XX:PermSize=256m -XX:MaxPermSize=512m
·         Platform type: BPM


Monitoring and troubleshooting tools
·         Solaris pmap command
·         Java verbose GC (for Java Heap and PermGen memory monitoring)
 

Problem overview

·         Problem type: java.lang.OutOfMemoryError at java.util.zip.ZipFile.open(Native Method)

An OutOfMemoryError problem was observed during the deployment process of one of our Weblogic Integration 9.2 application. 

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering of technical and non technical facts so we can either derived other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·         What is the client impact? HIGH (full outage of our application)
·         Recent change of the affected platform? Yes, a minor update to the application was done along with an increase of minimum and maximum size the Java Heap from 2 GIG (2048m) to 2.5 GIG (2560m) in order to reduce the frequency of full garbage collections
·         Any recent traffic increase to the affected platform? No
·         Since how long this problem has been observed?  Since the increase of the Java Heap to 2.5 GIG
·          Is the OutOfMemoryError happening on start-up or under load? The OOM is triggered at deployment time only with no traffic to the environment
·          What was the utilization of the Java Heap at that time? Really low at 20% utilization only (no traffic)
·         What was the utilization of the PermGen space at that time? Healthy at ~ 70% utilization and not leaking
·         Did a restart of the Weblogic server resolve the problem? No


-          Conclusion #1: The problem trigger seems to be related to the 500 MB increase of the Java Heap to 2.5 GIG. ** This problem initially puzzled the troubleshooting team until more deep dive analysis was done as per below **

Weblogic log file analysis

A first analysis of the problem was done by reviewing the OOM error in the Weblogic managed server log.

java.lang.OutOfMemoryError
       at java.util.zip.ZipFile.open(Native Method)
       at java.util.zip.ZipFile.<init>(ZipFile.java:203)
       at java.util.jar.JarFile.<init>(JarFile.java:132)
       at java.util.jar.JarFile.<init>(JarFile.java:97)
       at weblogic.utils.jars.JarFileDelegate.<init>(JarFileDelegate.java:32)
       at weblogic.utils.jars.VirtualJarFactory.createVirtualJar(VirtualJarFactory.java:24)
       at weblogic.application.ApplicationFileManagerImpl.getVirtualJarFile(ApplicationFileManagerImpl.java:194)
       at weblogic.application.internal.EarDeploymentFactory.findOrCreateComponentMBeans(EarDeploymentFactory.java:162)
       at weblogic.application.internal.MBeanFactoryImpl.findOrCreateComponentMBeans(MBeanFactoryImpl.java:48)
       at weblogic.application.internal.MBeanFactoryImpl.createComponentMBeans(MBeanFactoryImpl.java:110)
       at weblogic.application.internal.MBeanFactoryImpl.initializeMBeans(MBeanFactoryImpl.java:76)
       at weblogic.management.deploy.internal.MBeanConverter.createApplicationMBean(MBeanConverter.java:88)
       at weblogic.management.deploy.internal.MBeanConverter.createApplicationForAppDeployment(MBeanConverter.java:66)
       at weblogic.management.deploy.internal.MBeanConverter.setupNew81MBean(MBeanConverter.java:314)
……………………………………………………………………………………………………………………………………………………………………………………………………………

As you can see, the OutOfMemoryError is thrown during the loading of our application (EAR file). The Weblogic server relies on the Java JDK ZipFile class to load any application EAR / jar files.

-          Conclusion #2: The OOM error is triggered during a native call (ZipFile.open(Native Method)) from the Java JDK ZipFile to load our application EAR file. This native JVM operation requires proper native memory and virtual address space available in order to execute its loading operation. The conclusion at this point was that our Java VM 1.5 was running out of native memory / virtual address space at deployment time.

Sun Java VM native memory and MMAP files

Before we go any further in the root cause analysis, you way want to revisit the internal structure of the Sun Java HotSpot VM for JDK 1.5 and JDK 1.6. Proper understanding of the internal Sun Java VM is quite important, especially when dealing with native memory problems.

When using JDK 1.4 / 1.5, any JAR / ZIP file loaded by the Java VM get mapped entirely into an address space. This means that the more EAR / JAR files you are loading to a single JVM, the higher is the native memory footprint of your Java process.

This also means that the higher is your Java Heap and PermGen space; the lower memory is left for your native memory spaces such as C-Heap and MMAP Files which can definitely be a problem if you are deploying too many separate applications (EAR files) to a single 32-bit Java process.

Please note that Sun came up with improvements in JDK 1.6 (Mustang) and changed the behaviour so that the JAR file's central directory is still mapped, but the entries themselves are read separately; reducing the native memory requirement.

I suggest that you review the Sun Bug Id link below for more detail on such JDK 1.4 / 1.5 limitation.

Native memory footprint: Solaris pmap to the rescue!

The Solaris pmap command allows Java EE application developers to analyse how an application uses memory by providing a breakdown of all allocated address spaces of the Java process.

Now back to our OOM problem, pmap snapshots were generated following the OutOfMemoryError and analysis of the data below did reveal that we were getting very close to the upper 4 GIG limit of a 32-bit process on Solaris following our Java Heap increase to 2.5 GIG.

Now find below the reduced raw data along with snapshot with explanation of the findings.

20361:  ../bea922/jdk150_10/bin/java -Dweblogic.Name=server1 -Dbea

Address  Kbytes     RSS    Anon  Locked Mode   *Mapped File*

00010000      64      64       -       - r-x--  java
0002E000      16      16       8       - rwx--  java
00032000      56      56      56       - rwx--    [ heap ]
00040000    3840    3840    3840       - rwx--    [ heap ]
00400000  229376  229376  229376       - rwx--    [ heap ]
25400000    6528     176       -       - r--s-  dev:256,65555 ino:315953
279E8000      32      32      32       - rwx-R    [ stack tid=17152 ]
27A00000     752      32       -       - r--s-  dev:256,65545 ino:21843
.........................................................................
31100000    2704    2696       -       - r--s-  dev:256,65555 ino:466131
313E6000      40      40      40       - rwx-R    [ stack tid=70 ]
31400000    9344    9328       -       - r--s-  dev:256,65555 ino:373905
31D30000      32      32       -       - r--s-  dev:256,65555 ino:317152
31D40000      24      24       -       - r--s-  dev:256,65555 ino:471460
31D94000      48      48      48       - rwx-R    [ stack tid=69 ]
31DE8000      32      32      32       - rwx-R    [ stack tid=67 ]
31E00000    1952    1936       -       - r--s-  dev:256,65555 ino:466129
32000000   10864   10840       -       - r--s-  dev:256,65555 ino:362036
32AA0000     128     128     128       - rwx--    [ anon ]
32B06000      40      40      40       - rwx-R    [ stack tid=66 ]
32B20000       8       8       -       - r-x--  librmi.so
32B30000       8       8       8       - rwx--  librmi.so
32C00000    8616    8576       -       - r--s-  dev:256,65555 ino:319156
33480000       8       8       -       - r--s-  dev:256,65555 ino:89436
.........................................................................
4C400000  102400   90336   90336       - rwx--    [ anon ]
52810000     160     144       -       - r--s-  dev:256,65545 ino:21876
52A00000     792     352       -       - r--s-  dev:256,65545 ino:35669
52AD0000       8       8       -       - r--s-  dev:256,65545 ino:20564
52AE0000     112     112       -       - r--s-  dev:256,65545 ino:21850
52B76000       8       8       8       - rwx-R    [ anon ]
52B7A000      16      16       8       - rwx-R    [ anon ]
52B7E000       8       8       8       - rwx-R    [ stack tid=16 ]
52B90000      48      48       -       - r--s-  dev:256,65545 ino:23136
52BA0000     104     104       -       - r--s-  dev:256,65545 ino:21890
52BC0000      96      96       -       - r--s-  dev:256,65545 ino:21870
52BE0000      72      72       -       - r--s-  dev:256,65545 ino:8995
52C00000   81920   57472   57472       - rwx--    [ anon ]
57C10000     216     216       -       - r--s-  dev:256,65545 ino:21856
57C50000     120     104       -       - r--s-  dev:256,65545 ino:9002
57C80000     440     344       -       - r--s-  dev:256,65545 ino:8954
57D76000      16      16       8       - rwx-R    [ anon ]
57D7A000      24      24      24       - rwx-R    [ stack tid=15 ]
57D90000       8       8       -       - r--s-  dev:256,65545 ino:8879
57DA0000     192     192       -       - r--s-  dev:256,65545 ino:8881
57DE0000      88      88       -       - r--s-  dev:256,65545 ino:8972
57E7E000       8       8       8       - rwx-R    [ stack tid=14 ]
57E90000      40      40       -       - r--s-  dev:256,65545 ino:8899
57EA0000     192     184       -       - r--s-  dev:256,65545 ino:8985
57EE0000     112     112       -       - r--s-  dev:256,65545 ino:8866
57F7C000      16      16      16       - rwx-R    [ stack tid=10 ]
57F90000     400     280       -       - r--s-  dev:256,65545 ino:13398
58000000     512     512     512       - rwx--    [ anon ]
58080000     288     288     288       - rwx--    [ anon ]
580C8000      40      40      40       - rwx--    [ anon ]
58100000    3072    3072    3072       - rwx--    [ anon ]
58400000    1024    1024    1024       - rwx--    [ anon ]
58500000       8       8       8       - rwx--    [ anon ]
..........................................................................
58800000  262144  262144  262144       - rwx--    [ anon ]
68800000  147456  135168  135168       - rwx--    [ anon ]
71800000   20480       -       -       - rwx--    [ anon ]
78800000 1572864 1572864 1572864       - rwx--    [ anon ]
D8800000  924288  524288  524288       - rwx--    [ anon ]
F8810000      16      16       -       - r--s-  dev:256,65545 ino:8941
F8820000      16      16       -       - r--s-  dev:256,65545 ino:8935
F897E000       8       8       8       - rwx-R    [ stack tid=7 ]
F8990000     112     112       -       - r--s-  dev:256,65545 ino:8986
F89B0000     136     136       -       - r--s-  dev:256,65545 ino:1339
F89E0000      64      64       -       - r--s-  dev:256,65545 ino:8970
F8A7E000       8       8       8       - rwx-R    [ stack tid=6 ]
F8A90000     392     392       -       - r--s-  dev:256,65545 ino:8961
F8B7E000       8       8       8       - rwx-R    [ stack tid=5 ]
F8B90000       8       8       -       - r--s-  dev:256,65545 ino:8944
F8BA0000      24      24       -       - r--s-  dev:256,65545 ino:8997
F8BB0000     256     248       -       - r--s-  dev:256,65545 ino:8867
F8C00000    4096    4096    4096       - rwx--    [ anon ]
FA800000    4096    4096    4096       - rwx--    [ anon ]
FAC10000     456     456       -       - r--s-  dev:256,65545 ino:9003
FAC90000     392     392       -       - r--s-  dev:256,65545 ino:9001
FB000000    8640    4256       -       - r--s-  dev:256,65545 ino:36123
FB880000    3072    3072    3072       - rwx--    [ anon ]
FBB90000      16      16       -       - r--s-  dev:256,65545 ino:84387
FBBA0000      16      16       -       - r--s-  dev:256,65545 ino:84392
FBBEE000       8       8       8       - rwx-R    [ stack tid=13 ]
FBC00000   38976   36904       -       - r--s-  dev:256,65545 ino:35671
FE220000      16      16       -       - r--s-  dev:256,65545 ino:84614
.......................................................................
FE5A0000      64      64       -       - r-x--  libzip.so
FE5B0000       8       8       8       - rwx--  libzip.so
FE5C0000     144     144       -       - r-x--  libjava.so
FE5F4000       8       8       8       - rwx--  libjava.so
FE600000       8       8       8       - rwx--    [ anon ]
FE610000      56      48       -       - r-x--  libverify.so
FE62E000       8       8       8       - rwx--  libverify.so
FE640000      64      64      64       - rwx--    [ anon ]
FE654000       8       8       -       - rwxs-    [ anon ]
FE660000      32      32       -       - rw-s-  dev:321,8 ino:1319830649
FE670000      24      24       -       - r-x--  nss_files.so.1
FE686000       8       8       8       - rwx--  nss_files.so.1
FE690000       8       8       -       - r--s-  dev:256,65545 ino:218017
FE6A0000      64      64      64       - rwx--    [ anon ]
FE6C0000      32      24       -       - r-x--  libhpi.so
FE6D8000       8       8       8       - rwx--  libhpi.so
FE6DA000       8       -       -       - rwx--  libhpi.so
FE6E0000      16      16       -       - r-x--  libmp.so.2
FE6F4000       8       8       8       - rwx--  libmp.so.2
FE700000     680     640       -       - r-x--  libm.so.2
FE7B8000      32      32      24       - rwx--  libm.so.2
FE7D0000       8       8       -       - r--s-  dev:256,65545 ino:35666
FE7E0000       8       8       -       - r-x--  libmd5.so.1
FE7F2000       8       8       8       - rwx--  libmd5.so.1
FE800000    7936    6592       -       - r-x--  libjvm.so
FEFC0000     384     384     384       - rwx--  libjvm.so
FF020000      16      16      16       - rwx--  dev:256,65545 ino:35653
FF024000      64      56      56       - rwx--  libjvm.so
.............................................................
FF3B0000     208     208       -       - r-x--  ld.so.1
FF3EC000      16      16       -       - r-x--  libthread.so.1
FF3F4000       8       8       8       - rwx--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBBE000      24       -       -       - -----    [ anon ]
FFBE0000     128     128     128       - rwx--    [ stack ]
-------- ------- ------- ------- -------
total Kb 3922736 3411512 2942456   
  





Root cause: Java Heap increase!

As you can see in the above pmap analysis, our 32-bit Java process size was getting very close to the 4 GIG limit; leaving no room for additional EAR file deployment.

We can see a direct correlation between the Java Heap increase and the appearance of this OutOfMemoryError. Since the Java 1.5 VM is mapping the entire EAR file to a native memory address space; proper native memory and address space must be available to fulfil such ZipFile.open() native operation.

The solution was to simply revert back our Java Heap increase which did allow the deployment of our EAR file.

Other long term solutions will be discussed shortly such as vertical scaling of the Weblogic Integration (adding more JVM’s / managed servers to existing physical servers), switching to the 64-bit JVM and / or upgrade to the Sun Java VM 1.6.

Conclusion and recommendations

-          - When facing an OutOfMemoryError with the Sun Java VM, always do proper analysis and your due diligence to determine which memory space is the problem (Java Heap, PermGen, native / C-Heap)
-           
          - When facing an OutOfMemoryError due to native memory exhaustion, always generate Solaris pmap snapshots of your Java process and do your analysis. Do not increase the Java Heap (Xms / Xmx) as this will make the problem even worse
-         
           - Be very careful before attempting to increase your Java Heap or PermGen space. Always ensure that you understand your total Java VM memory footprint and that you have enough native memory available for the non Java Heap memory allocations such as MMAP Files for your application EAR files

1 comments:

fantastic solution mate, I love your detailed analysis and approach. this is a tricky error which takes some time to solve if you have memory leak by the way Tomcat also has history of going down with OutOfMemroyError , I have blogged my experience as java.lang.OutOfMemroyError:PermGen space in Tomcat you may find useful.

Post a Comment