Red Hat Bugzilla – Bug 1266112
(6.4.z) ConcurrentModificationException in ClusterContext.getConnectedAndDeployedNodes
Last modified: 2017-01-17 06:38:04 EST
Description of problem:
This test starts a cluster of 2 servers, deploys a simple application, and calls a remote EJB from a standalone client (actually 10 clients in 10 threads). During the test, both servers are repeatedly killed and then brought back to life, but only one at a time. The expectation is that some exceptions can occur around the time a server is killed (IOException "channel has been closed" is typical here).
However, in a "quiet" period where both servers are up, no exceptions are expected. Yet the test shows this exception occurring:
Searching through Bugzilla, I found bug 1204055, which brings back sad memories :-) Anyway, 6.4.0 has EJB Client 1.0.30, while 6.4.4.CR3 has 1.0.31. The client was actually updated in 6.4.1, and that update contains a highly suspicious commit: https://github.com/jbossas/jboss-ejb-client/commit/18c52b4141d5de2d1164e2f1277336e956f26e55 This commit indeed adds the method `getConnectedAndDeployedNodes`, which iterates over a synchronized `Set` without holding a lock on that set (it holds a lock on `this` instead).
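To illustrate the suspected pattern, here is a minimal sketch, not the actual EJB Client code. The class `NodeRegistrySketch`, its field, and its method names are hypothetical. It assumes the set is created with `Collections.synchronizedSet`, whose individual operations lock on the wrapper itself, so an iteration guarded only by `this` is not protected against concurrent `add`/`remove` calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a class that tracks connected cluster nodes.
class NodeRegistrySketch {
    // Each add/remove call synchronizes on the wrapper set itself.
    private final Set<String> connectedNodes =
            Collections.synchronizedSet(new HashSet<String>());

    void addNode(String node) {
        connectedNodes.add(node);
    }

    void removeNode(String node) {
        connectedNodes.remove(node);
    }

    // Suspected pattern: the method locks on "this", but writers lock on
    // the set, so another thread can modify the set during the loop and
    // the fail-fast iterator may throw ConcurrentModificationException.
    synchronized List<String> unsafeSnapshot() {
        List<String> result = new ArrayList<String>();
        for (String node : connectedNodes) {
            result.add(node);
        }
        return result;
    }

    // Iteration guarded by the same monitor the wrapper uses internally,
    // as the Collections.synchronizedSet documentation requires.
    List<String> safeSnapshot() {
        List<String> result = new ArrayList<String>();
        synchronized (connectedNodes) {
            for (String node : connectedNodes) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NodeRegistrySketch registry = new NodeRegistrySketch();
        registry.addNode("node1");
        registry.addNode("node2");
        registry.removeNode("node1");
        System.out.println(registry.safeSnapshot());
    }
}
```

With a single thread both variants behave the same; the difference only shows up when another thread adds or removes nodes during the loop, which would match the intermittent nature of the failure.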
Version-Release number of selected component (if applicable):
EAP 6.4.4.CP.CR3, JBoss EJB Client 1.0.31.Final-redhat-1
How reproducible:
Sometimes, not sure how often. Definitely not a 100% reliable reproducer.
Steps to Reproduce:
See a private comment below.
Expected results:
ConcurrentModificationException doesn't occur.
Iterating over a set while performing operations like adding or removing items without synchronizing those operations seems to be the problem.
The iteration and the modifications are not synchronized on the same lock, e.g.:
BZ 1.0.x: https://github.com/jbossas/jboss-ejb-client/pull/130
Verified with EAP 6.4.5.CP.CR1.
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.