1336088 – [GSS](6.4.z) JGroups TP.registerProbeHandler not thread safe

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	JBoss Enterprise Application Platform 6
Classification:	JBoss
Component:	Clustering
Sub Component:
Version:	6.4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	CR1
Target Release:	EAP 6.4.9
Assignee:	dereed
QA Contact:	Michal Vinkler
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1313472 eap649-payload 1336089

Reported:	2016-05-14 03:35 UTC by dereed
Modified:	2019-12-16 05:47 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-01-17 13:00:09 UTC
Type:	Bug
Embargoed:

Attachments
test.ear (1.89 KB, application/octet-stream) 2016-07-01 00:04 UTC, dereed
standalone-ha.xml (20.63 KB, text/plain) 2016-07-01 00:05 UTC, dereed

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	JGRP-1869	0	Minor	Resolved	TP.registerProbeHandler not thread safe	2017-10-31 18:09:22 UTC
Red Hat Knowledge Base (Solution)	2371781	0	None	None	None	2016-07-06 14:09:42 UTC

Description dereed 2016-05-14 03:35:56 UTC

TP.registerProbeHandler is not thread safe because it modifies preregistered_probe_handlers without any synchronization.

If a thread calls this method while another thread is inside startDiagnostics (which can happen easily with a shared transport), it can cause a NullPointerException when startDiagnostics is looping through preregistered_probe_handlers.

Access to preregistered_probe_handlers should be synchronized.
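
For illustration only, here is a minimal sketch of the kind of synchronization suggested above. It is not the JGroups code or the actual JGRP-1869 patch. The class ProbeRegistry, its fields, and the demo method are hypothetical names that only model the pattern: handlers registered before diagnostics start are parked in a list, and startDiagnostics later drains that list. Guarding both paths with the same lock means a registration can never interleave with the loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pattern described in this bug; NOT the JGroups TP class.
class ProbeRegistry {
    interface ProbeHandler { String handle(String query); }

    private final Object lock = new Object();
    // Handlers registered before diagnostics are running (stand-in for preregistered_probe_handlers).
    private final List<ProbeHandler> preregistered = new ArrayList<>();
    // Handlers owned by the running diagnostics service; null until startDiagnostics() is called.
    private List<ProbeHandler> active;

    void registerProbeHandler(ProbeHandler handler) {
        synchronized (lock) {
            if (active != null) {
                active.add(handler);        // diagnostics already running
            } else {
                preregistered.add(handler); // park until diagnostics start
            }
        }
    }

    void startDiagnostics() {
        synchronized (lock) {
            if (active == null) {
                active = new ArrayList<>();
            }
            // Safe to iterate: no other thread can modify the list while we hold the lock.
            for (ProbeHandler handler : preregistered) {
                active.add(handler);
            }
            preregistered.clear();
        }
    }

    int activeCount() {
        synchronized (lock) {
            return active == null ? 0 : active.size();
        }
    }

    // Registers handlers from several threads while another thread starts diagnostics,
    // then returns how many handlers ended up active.
    static int demo(int threads, int perThread) {
        ProbeRegistry registry = new ProbeRegistry();
        List<Thread> all = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            all.add(new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    registry.registerProbeHandler(query -> "ok");
                }
            }));
        }
        all.add(new Thread(registry::startDiagnostics));
        for (Thread t : all) {
            t.start();
        }
        for (Thread t : all) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.activeCount();
    }
}
```

In this model, every handler ends up in the active list regardless of whether it was registered before or after startDiagnostics ran, because both methods take the same lock. Without the synchronized blocks, the loop could observe the list while another thread is modifying it.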

Comment 1 dereed 2016-05-14 03:37:57 UTC

This bug is inside JGroups.

Comment 2 dereed 2016-05-14 03:38:55 UTC

Already fixed in upstream/EAP 7.

Comment 3 dereed 2016-05-14 05:36:40 UTC

Backporting JGRP-1869 also included https://issues.jboss.org/browse/JGRP-1834.

Comment 4 dereed 2016-05-16 06:15:55 UTC

Testing details:

To trigger the bug, diagnostics must be enabled:
- add a new socket-binding
    <socket-binding name="diag" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="12345"/>
- add diagnostics-socket-binding for that new socket binding
    <transport type="UDP" socket-binding="jgroups-udp" diagnostics-socket-binding="diag"/>

And multiple JGroups channels (with the same shared transport) must be started.
For example, deploy both a <distributable/> war and a @Clustered EJB.

After that, it comes down to a timing race condition.
I have not yet succeeded in forcing it to trigger with Byteman, but it has triggered occasionally when simply starting EAP with the above configuration.

Comment 5 Jiří Bílek 2016-06-30 15:11:26 UTC

Hello Dennis,
I cannot reproduce the issue. Could you attach the appropriate standalone.xml and deployment, please?

Comment 6 dereed 2016-07-01 00:04:40 UTC

Created attachment 1174754
test.ear

Comment 7 dereed 2016-07-01 00:05:21 UTC

Created attachment 1174755
standalone-ha.xml

Comment 8 dereed 2016-07-01 00:06:54 UTC

Attached an example deployment that triggers the issue (an ear with a <distributable/> war and a @Clustered EJB),
and standalone-ha.xml from EAP 6.4.6 (the version I had easily available) with the two changes detailed in comment 4 to enable the diagnostics socket.

Comment 9 dereed 2016-07-01 00:08:37 UTC

As mentioned above, it's a race condition and I was not able to get a test case to trigger it consistently. With this simple application, the errors trigger occasionally on startup of EAP.

Comment 10 Jiří Bílek 2016-07-01 07:02:00 UTC

Thank you, Dennis.
The error occurred in EAP 6.4.6 in 7 out of 10 starts.
The error did not occur in EAP 6.4.9 in 30 starts.

Verified with EAP 6.4.9.CP.CR2

Comment 11 Petr Penicka 2017-01-17 13:00:09 UTC

Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.
