Bug 1336088

Summary:

[GSS](6.4.z) JGroups TP.registerProbeHandler not thread safe

Product:

[JBoss] JBoss Enterprise Application Platform 6

Reporter:

dereed

Component:

Clustering

Assignee:

dereed

Status:

CLOSED CURRENTRELEASE

QA Contact:

Michal Vinkler <mvinkler>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

6.4.6

CC:

bmaxwell, jbilek, jtruhlar, msochure, paul.ferraro, rnetuka, sappleto

Target Milestone:

CR1

Target Release:

EAP 6.4.9

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-01-17 13:00:09 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1313472, 1324262, 1336089

Attachments:

Description	Flags
test.ear	none
standalone-ha.xml	none

Description dereed 2016-05-14 03:35:56 UTC

TP.registerProbeHandler is not thread safe since it modifies preregistered_probe_handlers outside of any synchronization.

If a thread calls this method while another thread is inside startDiagnostics (which can happen easily with a shared transport), it can cause a NullPointerException when startDiagnostics is looping through preregistered_probe_handlers.

Access to preregistered_probe_handlers should be synchronized.
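
For illustration only, here is a minimal sketch of the kind of synchronization suggested above. It is not the JGroups code and not the actual JGRP-1869 patch: ProbeHandlerRegistry, its ProbeHandler interface, and the activeCount/pendingCount helpers are hypothetical stand-ins for TP, DiagnosticsHandler.ProbeHandler, and preregistered_probe_handlers. The assumption is that one lock guarding both the registration path and the loop in startDiagnostics is enough to stop the list from changing mid-iteration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the transport; NOT the real JGroups TP class. */
class ProbeHandlerRegistry {

    /** Hypothetical stand-in for DiagnosticsHandler.ProbeHandler. */
    interface ProbeHandler {
        String name();
    }

    // Handlers registered before diagnostics were started. Guarded by "this".
    private final List<ProbeHandler> preregistered = new ArrayList<>();

    // Handlers known to the running diagnostics handler; null until started.
    // Guarded by "this".
    private List<ProbeHandler> active;

    /** Registers directly if diagnostics are running, otherwise queues the handler. */
    public synchronized void registerProbeHandler(ProbeHandler handler) {
        if (active != null) {
            active.add(handler);
        } else {
            preregistered.add(handler);
        }
    }

    /** Starts diagnostics (idempotent) and drains the queued handlers under the same lock. */
    public synchronized void startDiagnostics() {
        if (active == null) {
            active = new ArrayList<>();
        }
        // No other thread can modify "preregistered" while this loop runs,
        // because registerProbeHandler takes the same monitor.
        for (ProbeHandler handler : preregistered) {
            active.add(handler);
        }
        preregistered.clear();
    }

    public synchronized int activeCount() {
        return active == null ? 0 : active.size();
    }

    public synchronized int pendingCount() {
        return preregistered.size();
    }
}
```

With a shared transport, several channels may call registerProbeHandler while another is inside startDiagnostics; in this sketch each handler ends up in exactly one of the two lists at any moment, regardless of interleaving.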

Comment 1 dereed 2016-05-14 03:37:57 UTC

This bug is inside JGroups.

Comment 2 dereed 2016-05-14 03:38:55 UTC

Already fixed in upstream/EAP 7.

Comment 3 dereed 2016-05-14 05:36:40 UTC

Backporting JGRP-1869 also included https://issues.jboss.org/browse/JGRP-1834.

Comment 4 dereed 2016-05-16 06:15:55 UTC

Testing details:

To trigger the bug, diagnostics must be enabled:
- add a new socket-binding
    <socket-binding name="diag" port="0" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="12345"/>
- add diagnostics-socket-binding for that new socket binding
    <transport type="UDP" socket-binding="jgroups-udp" diagnostics-socket-binding="diag"/>

And multiple JGroups channels (with the same shared transport) must be started.
For example, deploy both a <distributable/> war and a @Clustered EJB.

After that, it comes down to a timing race condition.
I have not yet managed to force it with Byteman, but it has triggered occasionally when simply starting EAP with the above configuration.

Comment 5 Jiří Bílek 2016-06-30 15:11:26 UTC

Hello Dennis,
I cannot reproduce the issue. Could you attach the appropriate standalone.xml and deployment, please?

Comment 6 dereed 2016-07-01 00:04:40 UTC

Created attachment 1174754 [details]
test.ear

Comment 7 dereed 2016-07-01 00:05:21 UTC

Created attachment 1174755 [details]
standalone-ha.xml

Comment 8 dereed 2016-07-01 00:06:54 UTC

Attached an example deployment to trigger the issue (an ear with a <distributable/> war and a @Clustered EJB),
and standalone-ha.xml from EAP 6.4.6 (the version I had easily available) with the two changes as detailed in #4 to enable the diagnostics socket.

Comment 9 dereed 2016-07-01 00:08:37 UTC

And as mentioned above, it's a race condition and I was not able to get a test case to consistently trigger it.  With this simple application the errors will trigger occasionally on startup of EAP.

Comment 10 Jiří Bílek 2016-07-01 07:02:00 UTC

Thank you Dennis,
the error occurred in EAP 6.4.6 in 7 out of 10 starts,
the error did not occur in EAP 6.4.9 in 30 starts.

Verified with EAP 6.4.9.CP.CR2

Comment 11 Petr Penicka 2017-01-17 13:00:09 UTC

Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.