Bug 1331856

Summary: PKCS11 native memory leak
Product: Red Hat Enterprise Linux 6
Component: java-1.7.0-openjdk
Version: 6.7
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Reporter: Mike Millson <mmillson>
Assignee: Andrew Haley <aph>
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
CC: aph, cww, dereed, jack.ottofaro, jvanek, lkuprova
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clones: 1332246
Bug Blocks: 1332246
Last Closed: 2016-08-09 18:52:59 UTC
Type: Bug

Description Mike Millson 2016-04-29 20:10:21 UTC
A Java application using the PKCS11 provider leaks native memory from Java_sun_security_pkcs11_wrapper_PKCS11_C_1DeriveKey.

The following bugs are relevant:
- https://bugs.openjdk.java.net/browse/JDK-6887619
- https://bugs.openjdk.java.net/browse/JDK-6913047
- http://bugs.java.com/view_bug.do?bug_id=6913047

Because phantom references are used to manage native resources, the PKCS11 keys are not freed until several reference processing cycles have occurred. This can take a long time with the throughput collectors.

There is no FIPS-compliant workaround: the workaround of switching to the JCE provider is not FIPS compliant.

Also affects OpenJDK 1.8.

Comment 2 Mike Millson 2016-04-29 21:02:16 UTC
The workaround suggested by the OpenJDK developers, forcing multiple full collections using jconsole, was tested; it did not reduce the size of the Java process.

Comment 3 Mike Millson 2016-04-29 21:03:14 UTC
Where the issue has been observed, the GC logging shows only young parallel scavenge collections, with no full GCs.

Comment 4 dereed 2016-04-29 21:33:21 UTC
> Suggested workaround from OpenJDK developers of forcing multiple full 
> collections using jconsole was tested, and it did not reduce the size of the Java process.

Was the test valid?

I'm assuming the native allocations use malloc() or similar, which buffers freed memory much as the Java heap does, so free() does not show up in the process size.  Just looking at the process size is therefore no more useful a test than it would be for checking Java heap usage.

A better test would be running a continual load test with periodic GC calls and checking if the usage plateaus instead of continually growing.
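A minimal sketch of such a check (the sample count, interval, and reading RSS from /proc/<pid>/status are assumptions; the periodic GC calls themselves would still come from jconsole or similar):

```shell
#!/bin/sh
# Sample the resident set size (RSS) of a process at a fixed interval,
# so that a plateau (or continual growth) is visible in the output.
#
# Usage: sh ./rss_watch.sh <PID> [SAMPLES] [INTERVAL_SECONDS]

# Print the RSS of a pid in kB, read from /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

PID=${1:-$$}        # default to this script's own pid for a quick smoke test
SAMPLES=${2:-3}
INTERVAL=${3:-1}

i=0
while [ "$i" -lt "$SAMPLES" ]
do
    echo "$(date '+%H:%M:%S') RSS=$(rss_kb "$PID")kB"
    sleep "$INTERVAL"
    i=$((i + 1))
done
```

Run against the Java pid during the load test and check whether the logged values level off.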

Comment 5 Andrew Haley 2016-05-03 08:45:25 UTC
(In reply to dereed from comment #4)
> > Suggested workaround from OpenJDK developers of forcing multiple full 
> > collections using jconsole was tested, and it did not reduce the size of the Java process.

> 
> Was the test valid?
> I'm assuming the native allocations are using malloc() or similar,
> which buffer the native memory similar to the Java heap, so free()
> does not show up in the process size.

That's true indeed.  The question is whether full collections stop the
process from growing any more.
 
> Therefore just looking at the process size would not be a useful
> test any more than it would be for checking the Java heap usage.
> 
> A better test would be running a continual load test with periodic
> GC calls and checking if the usage plateaus instead of continually
> growing.

It would be better, yes.

Andrew.

Comment 6 Mike Millson 2016-05-03 19:47:32 UTC
To reproduce with Tomcat on RHEL:
1. Install the Tomcat sample app:
https://tomcat.apache.org/tomcat-7.0-doc/appdev/sample/

2. Run Tomcat in FIPS-compliant mode:
https://access.redhat.com/solutions/2194601
https://access.redhat.com/solutions/42301

3. Use the ab tool to make repeated requests for the sample app:
while true; do ab -c 10 -n 9999999 -f TLS1 https://localhost:8443/sample/; done


Using top, you will see the Java process size increase over time.

Eventually the JVM crashed with the following in the fatal error log:

...
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 715653120 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2627), pid=19342, tid=139994009659136
#
...

Comment 7 Mike Millson 2016-05-03 19:54:56 UTC
Workaround:

1. Add the following JVM option:
-XX:+PrintClassHistogram 

2. Run the following script to take a thread dump every minute:

#!/bin/sh
#
# Takes the JBoss PID as an argument. 
#
# Generates a thread dump every specified interval. Thread dumps will be in the
# file where stdout is redirected or in console output.
#
# Usage: sh ./threaddump_linux-continuous.sh <JBOSS_PID> 
#

# Interval in seconds between data points.
INTERVAL=60

while true
do
    kill -3 $1
    sleep $INTERVAL
done


This will print information about the objects in the heap whenever a thread dump is requested. In order for the JVM to compute this information, it first performs a full collection, so this is a roundabout way to force a full collection at a regular interval.

3. Observing the process size with top shows that these regular full collections prevent the process size from ballooning. It does increase some, but very little.

4. This is more a proof of concept than a proposed workaround.
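As a variation on the same idea, a sketch that forces the periodic full collection with jmap instead of thread dumps (assuming jmap from the same JDK is on the PATH; -histo:live performs a full GC before computing the histogram):

```shell
# Force a full collection every <interval> seconds by requesting a live
# object histogram: jmap -histo:live performs a full GC first.  The loop
# stops once the target process exits.
force_gc_loop() {
    pid=$1
    interval=${2:-60}
    while kill -0 "$pid" 2>/dev/null
    do
        jmap -histo:live "$pid" > /dev/null
        sleep "$interval"
    done
}
```

For example: force_gc_loop <JAVA_PID> 60 &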

Comment 13 Andrew Haley 2016-05-10 16:02:13 UTC
Please try running this test with -XX:SoftRefLRUPolicyMSPerMB=1.

Comment 14 Mike Millson 2016-05-10 17:10:33 UTC
I tested -XX:SoftRefLRUPolicyMSPerMB=1 on the latest OpenJDK (java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64), and it looks the same. The process size does not reach a steady state but continues to grow over time.

Comment 15 Andrew Haley 2016-05-10 17:14:37 UTC
(In reply to Mike Millson from comment #14)
> I tested -XX:SoftRefLRUPolicyMSPerMB=1 on the latest OpenJDK
> (java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64), and it looks the same. The
> process size does not reach a steady state but continues to grow over time.

What exactly are you looking at?  As far as I can see this solves the problem.  The process size should grow for a while, as I would expect, but should not run to exhaustion.  If this doesn't fix the problem for you then I need to know what you're doing that's different.  This is 32-bit JDK8.

Comment 16 Mike Millson 2016-05-10 17:28:17 UTC
(In reply to Andrew Haley from comment #15)
> (In reply to Mike Millson from comment #14)
> > I tested -XX:SoftRefLRUPolicyMSPerMB=1 on the latest OpenJDK
> > (java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64), and it looks the same. The
> > process size does not reach a steady state but continues to grow over time.
> 
> What exactly are you looking at?  As far as I can see this solves the
> problem.  The process size should grow for a while, as I would expect, but
> should not run to exhaustion.  If this doesn't fix the problem for you then
> I need to know what you're doing that's different.  This is 32-bit JDK8.

I followed the reproducer steps, and am looking at the RSS value in top. It steadily increases. It went from <1g at startup to >3g after ~10 minutes.

I am not forcing full gc collections. Is that necessary w/ this workaround?

Comment 17 Mike Millson 2016-05-11 20:40:11 UTC
(In reply to Mike Millson from comment #16)
> (In reply to Andrew Haley from comment #15)
> > (In reply to Mike Millson from comment #14)
> > > I tested -XX:SoftRefLRUPolicyMSPerMB=1 on the latest OpenJDK
> > > (java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64), and it looks the same. The
> > > process size does not reach a steady state but continues to grow over time.
> > 
> > What exactly are you looking at?  As far as I can see this solves the
> > problem.  The process size should grow for a while, as I would expect, but
> > should not run to exhaustion.  If this doesn't fix the problem for you then
> > I need to know what you're doing that's different.  This is 32-bit JDK8.
> 
> I followed the reproducer steps, and am looking at the RSS value in top. It
> steadily increases. It went from <1g at startup to >3g after ~10 minutes.
> 
> I am not forcing full gc collections. Is that necessary w/ this workaround?

I see now. My testing shows it does resolve the leak when used with the G1 collector:
-XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=1
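For anyone applying this, the two flags above can be appended to whatever options the server startup script passes to the JVM; JAVA_OPTS is the conventional variable for Tomcat/JBoss, but the exact hook depends on the installation:

```shell
# Add the settings that resolved the leak (G1 plus aggressive
# soft-reference clearing) to the options passed to the JVM at startup.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=1"
export JAVA_OPTS
```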