Bug 1155611

Summary: The topology id for the merged cache topology is not always bigger than all the partition topology ids
Product: [JBoss] JBoss Data Grid 6 Reporter: Dan Berindei <dberinde>
Component: InfinispanAssignee: Dan Berindei <dberinde>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Gencur <mgencur>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4.0CC: dstahl, jdg-bugs, slaskawi, vjuranek
Target Milestone: ER3   
Target Release: 6.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Berindei 2014-10-22 13:12:33 UTC
With the bug 1147885/ISPN-4574 fix, I changed the merge algorithm to pick the partition with the most members (both in the stable topology and in the current topology) instead of the partition with the highest topology id.

However, the biggest topology is not necessarily the partition with the highest topology id, so it's possible that some nodes will ignore the merged topology because they already have a higher topology installed. This happened once in ClusterTopologyManagerTest.testClusterRecoveryAfterThreeWaySplit:

00:24:59,286 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Recovered 3 partition(s) for cache cache: [CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeL-25322: 60+0]}, pendingCH=null, unionCH=null}, CacheTopology{id=6, rebalanceId=3, currentCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+10, NodeN-6727: 30+10]}, pendingCH=DefaultConsistentHash{ns = 60, owners = (2)[, NodeL-25322: 30+30, NodeN-6727: 30+30]}, unionCH=null}, CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}]
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterCacheStatus] Updating topologies after merge for cache cache, current topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, stable topology = CacheTopology{id=4, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (3)[, NodeL-25322: 20+20, NodeM-12972: 20+20, NodeN-6727: 20+20]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,287 DEBUG (transport-thread-NodeL-p33097-t6:) [ClusterTopologyManagerImpl] Updating cluster-wide current topology for cache cache, topology = CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}, availability mode = null
00:24:59,288 TRACE (transport-thread-NodeL-p33097-t3:) [LocalTopologyManagerImpl] Ignoring consistent hash update for cache cache, current topology is 8: CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns = 60, owners = (1)[, NodeM-12972: 60+0]}, pendingCH=null, unionCH=null}

Comment 2 Sebastian Ɓaskawiec 2014-10-24 11:51:07 UTC
PR: https://github.com/infinispan/jdg/pull/302