Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1116969

Summary: Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command
Product: [JBoss] JBoss Data Grid 6 Reporter: Dan Berindei <dberinde>
Component: InfinispanAssignee: Tristan Tarrant <ttarrant>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Gencur <mgencur>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3.0CC: afield, jdg-bugs
Target Milestone: CR2   
Target Release: 6.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-01-26 14:05:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1104045    

Description Dan Berindei 2014-07-07 18:09:28 UTC
This appeared during the 32-nodes elasticity test in the Hyperion environment.

Just as apex947 left, it started a rebalance, which apex948 dutifully cancelled as it became the new coordinator. apex949 had already requested segments from apex959, so it sent a StateRequestCommand(CANCEL_STATE_TRANSFER) asynchronously to apex959. Then apex948 started a new rebalance, and apex949 asked apex959 for the same segments. When apex959 finally received the cancel request, it didn't check the topology id and it incorrectly cancelled the outbound transfer to apex949.

The solution would be to verify the topology id in the CANCEL_STATE_TRANSFER command before cancelling the transfer. I also think we can avoid sending the cancel command completely in this case, and only send it as we are about to stop.

Comment 2 Alan Field 2014-07-15 11:38:32 UTC
Executed the elasticity test in Hyperion 3 times without a failure, and the resilience test 5 times without a failure with JDG 6.3.0 CR4. VERIFIED