Created attachment 1248665
Description of problem: After updating to java-1.8.0-openjdk-220.127.116.11, I've started seeing java.io.InvalidClassException: Not a proxy occasionally.
Version-Release number of selected component (if applicable): java-1.8.0-openjdk-18.104.22.168 and java-1.8.0-openjdk-22.214.171.124
How reproducible: Intermittent; with the reproducer it is just a matter of time.
Steps to Reproduce:
1. Run the attached reproducer, following the steps in README.md
Actual results: InvalidClassException
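For readers without the attachment, the following is a minimal sketch of the code path involved. It is not the attached reproducer. The names `ProxyRoundTrip`, `Greeter` and `Handler` are hypothetical and exist only for this illustration. It serializes and deserializes a dynamic proxy from several threads and counts any failure, including an InvalidClassException. The 'Not a proxy' message appears to come from the `Proxy.isProxyClass` check made while a proxy class descriptor is being read back. On an unaffected JDK the count should stay at zero, and even on an affected one this simple loop may well not hit the race.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the reproducer attached to this bug.
final class ProxyRoundTrip {

    // Hypothetical interface implemented by the dynamic proxy.
    interface Greeter {
        String greet(String name);
    }

    // The handler must be Serializable so the proxy instance can be written out.
    static final class Handler implements InvocationHandler, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            if (method.getName().equals("greet")) {
                return "hello " + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        }
    }

    static Object newProxy() {
        return Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                new Handler());
    }

    // Writes the object to a byte array and reads it back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Runs the round trip concurrently and returns the number of failures observed.
    static int stress(int threads, int iterations) {
        AtomicInteger failures = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at once
                    for (int i = 0; i < iterations; i++) {
                        Object copy = roundTrip(newProxy());
                        if (!Proxy.isProxyClass(copy.getClass())) {
                            failures.incrementAndGet();
                        }
                    }
                } catch (Exception e) { // includes java.io.InvalidClassException
                    failures.incrementAndGet();
                }
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + stress(4, 1000));
    }
}
```

Every iteration here reuses the same interface and class loader, so the proxy class is generated only once. A test that reliably hits a race during proxy class creation would presumably need to keep creating fresh proxy classes, which this sketch does not attempt.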
Additional info: This issue has already been reported upstream and with Oracle:
I believe that the test program is manifesting a race condition in the WeakCache code. The problem arises when using both the jdk8u release mentioned by the customer and the latest jdk8u code.
I have a simple (one-line) fix for this issue which I believe to be correct. I have applied the patch to the latest jdk8u tree, rebuilt and rerun the test. This stops the 'Not a proxy' InvalidClassException from manifesting. However, removing that problem causes the test program to fail with other unexpected and unaccountable errors.
I am seeing a NullPointerException coming out of various method calls in the proxy and deserialization code. The stack backtraces for these exceptions put them at odd or, in some cases, impossible locations in the Java source code. I am still investigating this problem just to be sure that it does not relate to my fix (I think it almost certainly cannot be related but I am not yet certain of that).
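The patch itself is not shown in this report, so the following is only an illustration of the general kind of race being described. It is not the jdk8u WeakCache source. It assumes, purely for the sake of the example, a cache that keeps a forward map plus a separate reverse index used to answer "was this value produced by the cache?". The class `TwoMapCacheSketch` and its methods are hypothetical. The `observer` callback stands in for a second thread that happens to run between the two writes.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical two-map cache; an illustration, not the JDK implementation.
final class TwoMapCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<V, Boolean> reverse = new ConcurrentHashMap<>();

    V get(K key) {
        return values.get(key);
    }

    boolean containsValue(V value) {
        return reverse.containsKey(value);
    }

    // Racy order: the value is visible through get() before containsValue() knows it.
    void publishValueFirst(K key, V value, Runnable observer) {
        values.put(key, value);
        observer.run(); // simulates another thread running inside the window
        reverse.put(value, Boolean.TRUE);
    }

    // Safe order: the reverse index is updated before the value can be obtained.
    void publishReverseFirst(K key, V value, Runnable observer) {
        reverse.put(value, Boolean.TRUE);
        observer.run(); // same window, but nothing inconsistent is visible
        values.put(key, value);
    }

    // Returns true if the observer obtained a value that containsValue() denied.
    static boolean inconsistentWindow(boolean valueFirst) {
        TwoMapCacheSketch<String, Object> cache = new TwoMapCacheSketch<>();
        Object value = new Object();
        boolean[] inconsistent = { false };
        Runnable observer = () -> {
            Object seen = cache.get("key");
            inconsistent[0] = seen != null && !cache.containsValue(seen);
        };
        if (valueFirst) {
            cache.publishValueFirst("key", value, observer);
        } else {
            cache.publishReverseFirst("key", value, observer);
        }
        return inconsistent[0];
    }

    public static void main(String[] args) {
        System.out.println("value first:   " + inconsistentWindow(true));
        System.out.println("reverse first: " + inconsistentWindow(false));
    }
}
```

In the value-first order a reader can obtain the cached object and then be told that the cache never produced it. That is the shape of failure that would make a freshly created proxy class look like 'Not a proxy'. Swapping the two writes closes the window, but whether the actual one-line patch does this is an assumption of the sketch.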
Assigning to Andrew. Andrew, once there is a patch, please ping Jiri to get it into the rpms.
I have checked this with Peter Levart, who wrote the WeakCache code, and
i) he agrees that there is a race condition
ii) he agrees that my patch fixes it
iii) he produced a simpler and more reliable reproducer which manifests the problem before the patch and works ok after the patch
iv) his reproducer does not encounter the NullPointerException problems that I saw with the original reproducer
I raised https://bugs.openjdk.java.net/browse/JDK-8174729 to cover this problem and the fix. Peter has offered to post the patch to jdk8u. I am expecting him to include his reproducer as a regression test. Once that is patched upstream we can pull the fix into our release.
So far as the client is concerned this means that the race condition problem they have identified will be resolved but they will probably still continue to see problems in their serialization code along the lines of the ones I have been observing.
I have experimented to see whether those problems relate to the diagnosis provided in the original JIRA and bugs database issue, and I am not convinced they do. I modified the client's reproducer to ensure that references to proxy instances and classes were retained, yet I still saw the NPEs. Also, even if the problem were to do with the client code not maintaining reachability for proxy instances or classes, I believe it has to be incorrect for the JVM to be generating NPE traces originating at lines where there is no object dereference and, hence, no potential for a NullPointerException to arise. I believe there is something else going wrong in the JDK/JVM here and that it may well relate to reference processing and GC. I will pursue this further to try to understand what is happening.
The reproducer provided by Peter Levart has been attached to the OpenJDK issue:
A fix for JDK-8174729 has been submitted to upstream jdk8u and approved for inclusion.
The fix for JDK-8174729 has now been committed to upstream jdk8u-dev
n.b. the fix version listed for the bug is 8u152
This was fixed in the recent 8u131 security update.
See java-1.8.0-openjdk-126.96.36.199-0.b11.el6_9 for RHEL 6.9.
Closing based on comment #11