Bug 1336088 - [GSS](6.4.z) JGroups TP.registerProbeHandler not thread safe
Summary: [GSS](6.4.z) JGroups TP.registerProbeHandler not thread safe
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering
Version: 6.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: CR1
Target Release: EAP 6.4.9
Assignee: dereed
QA Contact: Michal Vinkler
URL:
Whiteboard:
Depends On:
Blocks: 1313472 eap649-payload 1336089
Reported: 2016-05-14 03:35 UTC by dereed
Modified: 2019-12-16 05:47 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-01-17 13:00:09 UTC
Type: Bug
Embargoed:


Attachments
test.ear (1.89 KB, application/octet-stream)
2016-07-01 00:04 UTC, dereed
standalone-ha.xml (20.63 KB, text/plain)
2016-07-01 00:05 UTC, dereed


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | JGRP-1869 | 0 | Minor | Resolved | TP.registerProbeHandler not thread safe | 2017-10-31 18:09:22 UTC
Red Hat Knowledge Base (Solution) | 2371781 | 0 | None | None | None | 2016-07-06 14:09:42 UTC

Description dereed 2016-05-14 03:35:56 UTC
TP.registerProbeHandler is not thread safe because it modifies preregistered_probe_handlers outside of any synchronization.

If a thread calls this method while another thread is inside startDiagnostics (which can happen easily with a shared transport), it can cause a NullPointerException when startDiagnostics is looping through preregistered_probe_handlers.

Access to preregistered_probe_handlers should be synchronized.
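
A minimal sketch of the kind of synchronization meant here. This is not the JGroups code or the actual patch: ProbeRegistry, register, start and activeHandlers are hypothetical names, and handlers are plain strings to keep the example self-contained. The point is only that the "register before start" path and the "drain on start" path take the same lock.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a transport that buffers handlers until diagnostics start.
class ProbeRegistry {
    private final Object lock = new Object();
    // Handlers registered before start(); guarded by lock.
    private final Set<String> preregistered = new HashSet<>();
    // Handlers in effect once start() has run; guarded by lock.
    private final Set<String> active = new HashSet<>();
    // True once start() has drained the preregistered set; guarded by lock.
    private boolean started;

    // Safe to call from any thread, before or after start().
    void register(String handler) {
        synchronized (lock) {
            if (started) {
                active.add(handler);
            } else {
                preregistered.add(handler);
            }
        }
    }

    // Moves every buffered handler into the active set under the same lock,
    // so a concurrent register() cannot modify the set during the copy.
    void start() {
        synchronized (lock) {
            if (started) {
                return;
            }
            active.addAll(preregistered);
            preregistered.clear();
            started = true;
        }
    }

    // Returns a snapshot so callers never iterate the guarded set directly.
    Set<String> activeHandlers() {
        synchronized (lock) {
            return new HashSet<>(active);
        }
    }
}
```

With this shape, a handler registered while start() is running either lands in the buffered set before the copy or goes straight to the active set afterwards; it is never added while the set is being iterated.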

Comment 1 dereed 2016-05-14 03:37:57 UTC
This bug is inside JGroups.

Comment 2 dereed 2016-05-14 03:38:55 UTC
Already fixed in upstream/EAP 7.

Comment 3 dereed 2016-05-14 05:36:40 UTC
Backporting JGRP-1869 also included https://issues.jboss.org/browse/JGRP-1834.

Comment 4 dereed 2016-05-16 06:15:55 UTC
Testing details:

In order to trigger, diagnostics must be enabled:
- add a new socket-binding
    <socket-binding name="diag" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="12345"/>
- add diagnostics-socket-binding for that new socket binding
    <transport type="UDP" socket-binding="jgroups-udp" diagnostics-socket-binding="diag"/>

And multiple JGroups channels (with the same shared transport) must be started.
For example, deploy both a <distributable/> war and a @Clustered EJB.

After that, whether it triggers comes down to a timing race.
I have not yet managed to force it with Byteman, but it has occasionally triggered when simply starting EAP with the above configuration.
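
For exercising this kind of startup race outside of EAP, a small harness that releases several threads at the same moment can help. The following is only a sketch under assumptions: StartupRaceHarness and runTogether are hypothetical names, nothing here calls JGroups, and the tasks passed in would have to wrap whatever registration and start calls are under test. A clean run does not prove the absence of a race, it only fails to find one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: starts all tasks together and collects anything they throw.
class StartupRaceHarness {
    static List<Throwable> runTogether(List<Runnable> tasks) {
        CountDownLatch ready = new CountDownLatch(tasks.size());
        CountDownLatch go = new CountDownLatch(1);
        List<Throwable> failures = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(() -> {
                ready.countDown();          // signal that this worker is running
                try {
                    go.await();             // block until every worker is ready
                    task.run();
                } catch (Throwable e) {
                    failures.add(e);        // e.g. a NullPointerException from the race
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            ready.await();
            go.countDown();                 // release all workers at once
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            go.countDown();                 // release workers so none stay blocked
            Thread.currentThread().interrupt();
            failures.add(e);
        }
        return failures;
    }
}
```

Running the harness in a loop, with one task calling the start path and the others calling the registration path, raises the odds of hitting the window, though it still cannot guarantee it.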

Comment 5 Jiří Bílek 2016-06-30 15:11:26 UTC
Hello Dennis,
I cannot reproduce the issue. Could you attach the appropriate standalone.xml and deployment, please?

Comment 6 dereed 2016-07-01 00:04:40 UTC
Created attachment 1174754
test.ear

Comment 7 dereed 2016-07-01 00:05:21 UTC
Created attachment 1174755
standalone-ha.xml

Comment 8 dereed 2016-07-01 00:06:54 UTC
Attached an example deployment to trigger the issue (an ear with a <distributable/> war and a @Clustered EJB),
and standalone-ha.xml from EAP 6.4.6 (the version I had easily available) with the two changes as detailed in #4 to enable the diagnostics socket.

Comment 9 dereed 2016-07-01 00:08:37 UTC
And as mentioned above, it's a race condition and I was not able to get a test case to consistently trigger it.  With this simple application the errors will trigger occasionally on startup of EAP.

Comment 10 Jiří Bílek 2016-07-01 07:02:00 UTC
Thank you Dennis,
The error occurred in EAP 6.4.6 in 7 out of 10 starts.
The error did not occur in EAP 6.4.9 in 30 starts.

Verified with EAP 6.4.9.CP.CR2

Comment 11 Petr Penicka 2017-01-17 13:00:09 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

