Bug 1266112

Summary: (6.4.z) ConcurrentModificationException in ClusterContext.getConnectedAndDeployedNodes
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Ladislav Thon <lthon>
Component: EJB, Clustering
Assignee: Enrique Gonzalez Martinez <egonzale>
Status: CLOSED CURRENTRELEASE QA Contact: Ladislav Thon <lthon>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.4.4
CC: bmaxwell, cdewolf, david.lloyd, egonzale, rachmato
Target Milestone: CR1   
Target Release: EAP 6.4.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-17 11:38:04 UTC Type: Bug
Bug Blocks: 1235745, 1261014    

Description Ladislav Thon 2015-09-24 13:57:44 UTC
Description of problem:

This test starts a cluster of 2 servers, deploys a simple application and calls a remote EJB from a standalone client (actually 10 clients in 10 threads). During the test, both servers are repeatedly killed and then brought back to life, but only one at a time. The expectation is that around the time a server is killed, some exceptions can occur (IOException "channel has been closed" is typical here).

However, in a "quiet" period where both servers are up, no exceptions are expected. Yet the test shows this exception occurring:

java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
        at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
        at org.jboss.ejb.client.ClusterContext.getConnectedAndDeployedNodes(ClusterContext.java:265)
        at org.jboss.ejb.client.ClusterContext.getEJBReceiverContext(ClusterContext.java:125)
        at org.jboss.ejb.client.ClusterContext.getEJBReceiverContext(ClusterContext.java:96)
        ...

Searching through Bugzilla, I found bug 1204055, which brings back sad memories :-) Anyway, 6.4.0 has EJB Client 1.0.30, while 6.4.4.CR3 has 1.0.31. The client was actually updated in 6.4.1, and the update contains a highly suspicious commit: https://github.com/jbossas/jboss-ejb-client/commit/18c52b4141d5de2d1164e2f1277336e956f26e55. This commit indeed adds the method `getConnectedAndDeployedNodes`, which iterates over a synchronized `Set` without holding a lock on it (instead, it holds a lock on `this`), so threads that add or remove nodes through the set are not blocked while the iteration is running.
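
To illustrate the suspected pattern, here is a minimal sketch. It is not the actual `ClusterContext` code: the class `NodeRegistry`, its field and its methods are hypothetical names made up for this example. It assumes the set is created with `Collections.synchronizedSet`, whose individual operations lock the wrapper set itself, and whose documentation requires callers to synchronize on the returned set while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NodeRegistry {
    // Hypothetical stand-in for the set of connected nodes (not the real field).
    private final Set<String> connectedNodes =
            Collections.synchronizedSet(new HashSet<String>());

    // add/remove are synchronized internally on the wrapper set, not on `this`.
    public void addNode(String name) {
        connectedNodes.add(name);
    }

    public void removeNode(String name) {
        connectedNodes.remove(name);
    }

    // Racy: holds the lock on `this`, which addNode/removeNode never take.
    // A concurrent add or remove can therefore change the underlying HashSet
    // mid-iteration and trigger ConcurrentModificationException.
    public synchronized List<String> snapshotUnsafe() {
        List<String> result = new ArrayList<String>();
        for (String node : connectedNodes) {
            result.add(node);
        }
        return result;
    }

    // Safe: holds the lock on the set itself, the same monitor that
    // add/remove use, so writers wait until the iteration has finished.
    public List<String> snapshotSafe() {
        List<String> result = new ArrayList<String>();
        synchronized (connectedNodes) {
            for (String node : connectedNodes) {
                result.add(node);
            }
        }
        return result;
    }
}
```

With `snapshotUnsafe`, the failure only shows up when a writer happens to run during the iteration, which would be consistent with the exception appearing only sometimes.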

Version-Release number of selected component (if applicable):

EAP 6.4.4.CP.CR3, JBoss EJB Client 1.0.31.Final-redhat-1

How reproducible:

Sometimes, not sure how often. Definitely not a 100% reliable reproducer.

Steps to Reproduce:

See a private comment below.

Actual results:

ConcurrentModificationException occurs.

Expected results:

ConcurrentModificationException doesn't occur.

Comment 2 Enrique Gonzalez Martinez 2015-09-24 14:28:24 UTC
Iterating over a set while performing operations like adding or removing items without synchronizing those operations seems to be the problem.

The two are not synchronized on the same lock, e.g.:

https://github.com/jbossas/jboss-ejb-client/blob/18c52b4141d5de2d1164e2f1277336e956f26e55/src/main/java/org/jboss/ejb/client/ClusterContext.java#L122

https://github.com/jbossas/jboss-ejb-client/blob/18c52b4141d5de2d1164e2f1277336e956f26e55/src/main/java/org/jboss/ejb/client/ClusterContext.java#L265
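
For comparison, one general way to avoid needing a shared lock at all is a set backed by `ConcurrentHashMap`, whose iterators are weakly consistent and never throw `ConcurrentModificationException`. The sketch below only illustrates that alternative and is not necessarily what the fix for this bug does; `ConcurrentNodeSet` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentNodeSet {
    // Hypothetical node set backed by ConcurrentHashMap instead of HashMap.
    private final Set<String> nodes =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public void add(String node) {
        nodes.add(node);
    }

    public int size() {
        return nodes.size();
    }

    // Removes elements through the set (not through the iterator) while
    // iterating. On a HashSet this pattern typically fails with
    // ConcurrentModificationException on the next call to next(); here the
    // iterator tolerates the modification.
    public int visitAndRemoveEach() {
        int visited = 0;
        for (String node : nodes) {
            nodes.remove(node);
            visited++;
        }
        return visited;
    }
}
```

The trade-off is that a weakly consistent iterator may or may not reflect changes made after it was created, so a caller can see a slightly stale view of the nodes.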

Comment 6 Enrique Gonzalez Martinez 2015-09-28 08:26:39 UTC
BZ 1.0.x: https://github.com/jbossas/jboss-ejb-client/pull/130

Comment 12 Ladislav Thon 2015-11-04 12:35:04 UTC
Verified with EAP 6.4.5.CP.CR1.

Comment 13 Petr Penicka 2017-01-17 11:38:04 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.