
Bug 1117948

Summary: Members can miss the rebalance cancellation on coordinator change
Product: [JBoss] JBoss Data Grid 6 Reporter: Dan Berindei <dberinde>
Component: Infinispan Assignee: Tristan Tarrant <ttarrant>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Gencur <mgencur>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3.0 CC: afield, jdg-bugs
Target Milestone: CR3   
Target Release: 6.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-01-26 14:06:03 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1104045    

Description Dan Berindei 2014-07-09 16:55:44 UTC
The new coordinator first sends a CH_UPDATE command to cancel the existing rebalance, and then a REBALANCE_START command to start a new one. But the CH_UPDATE command is sent asynchronously, so some members can receive it after the REBALANCE_START command.

If that happens, the affected node will assume that it is still going to receive the segments it requested for the previous rebalance. But with the fix for bug 1116969/ISPN-4484, the provider node cancels its outbound transfer tasks when it receives a CH_UPDATE without a pendingCH, so the state requestor will never receive its segments.
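
The ordering problem above can be illustrated with a small, self-contained simulation. This is only a sketch, not Infinispan code: the Provider and Requestor classes, their method names, the topology ids, and the rule that a CH_UPDATE with an older topology id is ignored are all assumptions made for the example.

```java
import java.util.Set;
import java.util.TreeSet;

public class RebalanceCancelRaceSketch {

    // Hypothetical provider: tracks the segments it is currently sending.
    static class Provider {
        final Set<Integer> outbound = new TreeSet<>();

        // A CH_UPDATE without a pendingCH cancels all outbound transfers.
        void onChUpdateWithoutPendingCh() {
            outbound.clear();
        }

        void onSegmentRequest(int segment) {
            outbound.add(segment);
        }
    }

    // Hypothetical requestor: tracks the segments it is still waiting for.
    static class Requestor {
        final Set<Integer> awaiting = new TreeSet<>();
        int topologyId;

        Requestor(int topologyId, Set<Integer> awaiting) {
            this.topologyId = topologyId;
            this.awaiting.addAll(awaiting);
        }

        void onChUpdate(int id) {
            if (id <= topologyId) {
                return; // assumption: a stale update is ignored
            }
            topologyId = id;
            awaiting.clear(); // the old rebalance is cancelled
        }

        void onRebalanceStart(int id, Set<Integer> needed, Provider provider) {
            topologyId = id;
            for (int segment : needed) {
                // Only request segments that are not already believed to be in flight.
                if (awaiting.add(segment)) {
                    provider.onSegmentRequest(segment);
                }
            }
        }
    }

    // Returns the segments the requestor waits for but the provider is not sending.
    static Set<Integer> stuckSegments(boolean cancelArrivesFirst) {
        Set<Integer> segments = Set.of(0, 1);
        Provider provider = new Provider();
        provider.outbound.addAll(segments);          // old rebalance in progress
        Requestor requestor = new Requestor(1, segments);

        provider.onChUpdateWithoutPendingCh();       // provider sees the cancellation in order
        if (cancelArrivesFirst) {
            requestor.onChUpdate(2);
            requestor.onRebalanceStart(3, segments, provider);
        } else {
            requestor.onRebalanceStart(3, segments, provider);
            requestor.onChUpdate(2);                 // arrives late
        }

        Set<Integer> stuck = new TreeSet<>(requestor.awaiting);
        stuck.removeAll(provider.outbound);
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("in order:     stuck = " + stuckSegments(true));
        System.out.println("out of order: stuck = " + stuckSegments(false));
    }
}
```

Under these assumptions, in-order delivery leaves no stuck segments, while the late CH_UPDATE leaves the requestor waiting for both segments that the provider has already cancelled.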

Even without the fix for bug 1116969/ISPN-4484 this is a problem, although a less obvious one. Between receiving the CH_UPDATE and the REBALANCE_START commands, the provider node won't have the requestor in its write CH, so the requestor can miss transactions.
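
The write CH gap can be sketched in the same spirit. Again this is a hypothetical model rather than Infinispan code: the Provider class, the node names, and the assumption that the write CH is the current owners plus the pending owners while a rebalance is running are introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WriteChGapSketch {

    // Hypothetical provider that forwards writes to the owners in its write CH.
    static class Provider {
        final Set<String> writeOwners = new LinkedHashSet<>();
        final List<String> sentToRequestor = new ArrayList<>();

        // CH_UPDATE without a pendingCH: only the current owners remain write owners.
        void onChUpdate(Set<String> currentOwners) {
            writeOwners.clear();
            writeOwners.addAll(currentOwners);
        }

        // REBALANCE_START: pending owners are added to the write CH (assumption).
        void onRebalanceStart(Set<String> pendingOwners) {
            writeOwners.addAll(pendingOwners);
        }

        void write(String key) {
            if (writeOwners.contains("requestor")) {
                sentToRequestor.add(key);
            }
        }
    }

    // Returns the keys the provider forwarded to the requestor.
    static List<String> keysSeenByRequestor() {
        Provider provider = new Provider();

        // Old rebalance in progress: the requestor is a pending owner.
        provider.onChUpdate(Set.of("provider"));
        provider.onRebalanceStart(Set.of("provider", "requestor"));
        provider.write("k1");

        // New coordinator cancels the old rebalance.
        provider.onChUpdate(Set.of("provider"));
        provider.write("k2"); // falls into the gap: requestor is not in the write CH

        // New rebalance starts.
        provider.onRebalanceStart(Set.of("provider", "requestor"));
        provider.write("k3");

        return provider.sentToRequestor;
    }

    public static void main(String[] args) {
        System.out.println("keys forwarded to requestor: " + keysSeenByRequestor());
    }
}
```

In this model the write to k2 never reaches the requestor. A requestor that processed REBALANCE_START before the late CH_UPDATE would not know that it was briefly dropped from the provider's write CH.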

Comment 2 Alan Field 2014-07-15 11:37:32 UTC
Ran the elasticity test 3 times and the resilience test 5 times in Hyperion with JDG 6.3.0 CR4, all without a failure. VERIFIED