Bug 1336088

Summary: [GSS](6.4.z) JGroups TP.registerProbeHandler not thread safe
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: dereed
Component: Clustering
Assignee: dereed
Status: CLOSED CURRENTRELEASE
QA Contact: Michal Vinkler <mvinkler>
Severity: unspecified
Priority: unspecified
Version: 6.4.6
CC: bmaxwell, jbilek, jtruhlar, msochure, paul.ferraro, rnetuka, sappleto
Target Milestone: CR1
Target Release: EAP 6.4.9
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2017-01-17 13:00:09 UTC
Type: Bug
Bug Blocks: 1313472, 1324262, 1336089
Attachments:
Description         Flags
test.ear            none
standalone-ha.xml   none

Description dereed 2016-05-14 03:35:56 UTC
TP.registerProbeHandler is not thread safe because it modifies preregistered_probe_handlers without any synchronization.

If a thread calls this method while another thread is inside startDiagnostics (which can happen easily with a shared transport), it can cause a NullPointerException when startDiagnostics is looping through preregistered_probe_handlers.

Access to preregistered_probe_handlers should be synchronized.
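
A minimal sketch of the kind of synchronization suggested above. This is not the JGroups source or the actual upstream patch: the class ProbeHandlerRegistry, its fields, and the demo helper are hypothetical stand-ins, and handlers are modelled as plain strings. It only illustrates guarding every access to the pre-registration list with one lock, so that a registration can never interleave with the loop in startDiagnostics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the transport; NOT the JGroups TP class.
class ProbeHandlerRegistry {
    // Plays the role of preregistered_probe_handlers: handlers queued before diagnostics start.
    private final List<String> preregistered = new ArrayList<>();
    // Handlers that have been handed over to the (imaginary) running diagnostics handler.
    private final List<String> active = new ArrayList<>();
    private boolean started;

    void registerProbeHandler(String handler) {
        // Same lock as startDiagnostics, so the list is never modified mid-iteration.
        synchronized (preregistered) {
            if (started) {
                active.add(handler);
            } else {
                preregistered.add(handler);
            }
        }
    }

    void startDiagnostics() {
        synchronized (preregistered) {
            started = true;
            for (String handler : preregistered) {
                active.add(handler);
            }
            preregistered.clear();
        }
    }

    int activeCount() {
        synchronized (preregistered) {
            return active.size();
        }
    }

    // Registers handlers from several threads while diagnostics start concurrently,
    // then returns how many handlers ended up active.
    static int demo(int threads, int perThread) {
        ProbeHandlerRegistry registry = new ProbeHandlerRegistry();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.registerProbeHandler("h-" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        registry.startDiagnostics();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.activeCount();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

With the lock in place, every handler is either queued before the start and moved over by the loop, or added directly afterwards, so none are lost regardless of timing. Removing the synchronized blocks from this sketch reintroduces the kind of unsynchronized list access described in this report.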

Comment 1 dereed 2016-05-14 03:37:57 UTC
This bug is inside JGroups.

Comment 2 dereed 2016-05-14 03:38:55 UTC
Already fixed in upstream/EAP 7.

Comment 3 dereed 2016-05-14 05:36:40 UTC
Backporting JGRP-1869 also included https://issues.jboss.org/browse/JGRP-1834.

Comment 4 dereed 2016-05-16 06:15:55 UTC
Testing details:

To trigger the bug, diagnostics must be enabled:
- add a new socket-binding
    <socket-binding name="diag" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="12345"/>
- add a diagnostics-socket-binding attribute to the transport, pointing at that new socket binding
    <transport type="UDP" socket-binding="jgroups-udp" diagnostics-socket-binding="diag"/>

And multiple JGroups channels (with the same shared transport) must be started.
For example, deploy both a <distributable/> war and a @Clustered EJB.

After that, it comes down to timing, since this is a race condition.
I have not yet managed to force it with Byteman, but it has occasionally triggered when simply starting EAP with the above configuration.

Comment 5 Jiří Bílek 2016-06-30 15:11:26 UTC
Hello Dennis,
I cannot reproduce the issue. Could you attach the appropriate standalone.xml and deployment, please?

Comment 6 dereed 2016-07-01 00:04:40 UTC
Created attachment 1174754 [details]
test.ear

Comment 7 dereed 2016-07-01 00:05:21 UTC
Created attachment 1174755 [details]
standalone-ha.xml

Comment 8 dereed 2016-07-01 00:06:54 UTC
Attached an example deployment to trigger the issue (an ear with a <distributable/> war and a @Clustered EJB),
and standalone-ha.xml from EAP 6.4.6 (the version I had easily available) with the two changes detailed in comment 4 to enable the diagnostics socket.

Comment 9 dereed 2016-07-01 00:08:37 UTC
And as mentioned above, it's a race condition and I was not able to get a test case to consistently trigger it.  With this simple application the errors will trigger occasionally on startup of EAP.

Comment 10 Jiří Bílek 2016-07-01 07:02:00 UTC
Thank you, Dennis.
The error occurred in EAP 6.4.6 in 7 out of 10 starts.
The error did not occur in EAP 6.4.9 in 30 starts.

Verified with EAP 6.4.9.CP.CR2

Comment 11 Petr Penicka 2017-01-17 13:00:09 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.