Bug 1340342

Summary: [GSS](6.4.z) Race condition in ServiceProviderRegistryService
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: dereed
Component: Clustering Assignee: Fedor Gavrilov <fgavrilo>
Status: CLOSED CURRENTRELEASE QA Contact: Michal Vinkler <mvinkler>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4.7 CC: bmaxwell, dosoudil, egonzale, fgavrilo, jbilek, jtruhlar, paul.ferraro, wfink
Target Milestone: CR1   
Target Release: EAP 6.4.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-17 12:56:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1350064, 1339868    
Attachments:
Description Flags
singleton.btm none

Description dereed 2016-05-27 06:06:30 UTC
org.jboss.as.clustering.service.ServiceProviderRegistryService is a multi-threaded class, but does not have any thread synchronization.

In particular, there is a race condition between local calls to "register" and remotely triggered calls to "modified".

This can result in the following order:
- ThreadA: "register" reads the current cache keySet
- ThreadB: "modified" call arrives for a new node started at the same time
- ThreadB: "modified" calls "notifyListeners" with the new (correct) list
- ThreadA: "register" calls "notifyListeners" with the old (wrong) list

Result: the listener receives the stale list last and ends up with the wrong data.
For example, for singletons this can result in different election results on 
different nodes, resulting in multiple singletons or no singletons.
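
The interleaving above can be reproduced deterministically with a small standalone sketch. It is not the EAP source: only the names "register" and "modified" come from this report, and the class RegistryRaceSketch, the pause hook and the useLock flag are hypothetical. The lock is shown as one possible remedy and is not necessarily the fix that was shipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class RegistryRaceSketch {
    // Stand-in for the replicated cache key set (hypothetical, not the real cache).
    private final Set<String> nodes = new ConcurrentSkipListSet<>();
    // Every list handed to the listener, in delivery order.
    private final List<List<String>> delivered =
            Collections.synchronizedList(new ArrayList<>());
    private final Object lock = new Object();
    private final boolean useLock;

    RegistryRaceSketch(boolean useLock) {
        this.useLock = useLock;
    }

    // Local call: the pause hook opens a window between the read and the notify.
    void register(String node, Runnable pause) {
        update(node, pause);
    }

    // Remotely triggered call: same read-then-notify sequence, without a pause.
    void modified(String node) {
        update(node, () -> { });
    }

    private void update(String node, Runnable pause) {
        if (useLock) {
            // Assumed remedy: read and notify form one atomic step.
            synchronized (lock) {
                readThenNotify(node, pause);
            }
        } else {
            readThenNotify(node, pause);
        }
    }

    private void readThenNotify(String node, Runnable pause) {
        nodes.add(node);
        List<String> snapshot = new ArrayList<>(nodes); // read the key set
        pause.run();                                    // another thread may run here
        delivered.add(snapshot);                        // stands in for notifyListeners
    }

    // Runs the scenario from the report and returns the list delivered last.
    static List<String> run(boolean useLock) {
        RegistryRaceSketch registry = new RegistryRaceSketch(useLock);
        Thread threadB = new Thread(() -> registry.modified("jboss2/singleton"));
        registry.register("jboss1/singleton", () -> {
            threadB.start();
            // Wait until ThreadB has finished (no lock) or is stuck on the lock.
            while (threadB.isAlive() && threadB.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
        });
        try {
            threadB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<List<String>> all = registry.delivered;
        return all.get(all.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [jboss1/singleton]
        System.out.println(run(true));  // [jboss1/singleton, jboss2/singleton]
    }
}
```

Without the lock, the stale snapshot taken by ThreadA is delivered last. With the lock, ThreadB cannot start its own read until ThreadA has notified, so the last delivery contains both nodes.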

Comment 1 dereed 2016-05-27 06:09:56 UTC
Reproduction steps:
Install attached singleton.btm in node1 of a two-node cluster.
Start node1.
(the byteman script pauses the register() call for 20 seconds)
A few seconds later start node2.

Result: the calls are in the wrong order, with the older data last
INFO  [stdout] (notification-thread-0) XXX SingletonService.election candidates [jboss1/singleton, jboss2/singleton]
...
INFO  [stdout] (ServerService Thread Pool -- 55) XXX SingletonService.election candidates [jboss1/singleton]

Expected result: the calls are in the correct order
INFO  [stdout] (ServerService Thread Pool -- 55) XXX SingletonService.election candidates [jboss1/singleton]
...
INFO  [stdout] (notification-thread-0) XXX SingletonService.election candidates [jboss1/singleton, jboss2/singleton]

Comment 2 dereed 2016-05-27 06:10:33 UTC
Created attachment 1162371 [details]
singleton.btm

Comment 3 dereed 2016-05-27 06:21:45 UTC
[continuation of previous comment]

An alternative expected result would be that both calls have the same key list.
The important part is just that the last call has the correct entries.

Comment 11 Jiří Bílek 2016-08-09 13:42:37 UTC
Verified with EAP 6.4.10.CP.CR1

Comment 12 Petr Penicka 2017-01-17 12:56:40 UTC
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.