Bug 1152934

Summary: TopologyAwareConsistentHashFactory is slow for large cluster
Product: [JBoss] JBoss Data Grid 6
Reporter: Takayoshi Kimura <tkimura>
Component: Infinispan
Assignee: Tristan Tarrant <ttarrant>
Status: CLOSED EOL
QA Contact: Martin Gencur <mgencur>
Severity: unspecified
Priority: unspecified
Version: 6.3.1
CC: jdg-bugs
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2024-05-01 00:21:08 UTC
Type: Bug

Description Takayoshi Kimura 2014-10-15 08:35:14 UTC
Observed sustained 100% CPU usage on the coordinator node when booting 500 nodes with 500 caches defined.

It looks like TopologyAwareConsistentHashFactory runs in O(n^2) time: it has a nested loop over all machines. Computing a rebalance for 1 cache on 500 nodes takes about 50 sec. The calculation is repeated for every cache, so with 500 nodes and 500 caches it consumes roughly 25000 sec of CPU time.
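The estimate above can be reproduced with a small sketch. This is not the Infinispan code: the class and method names are hypothetical, and the nested loop is only a stand-in for a computation that rescans all machines for each machine. It shows how the step count grows quadratically with the machine count, and how the reported 50 sec per cache multiplies out across 500 caches.

```java
// Illustrative sketch only; names are hypothetical and not part of Infinispan.
class RebalanceCostSketch {

    // Counts the steps of a nested loop over all machines (quadratic growth).
    static long pairwiseSteps(int machines) {
        long steps = 0;
        for (int i = 0; i < machines; i++) {
            for (int j = 0; j < machines; j++) {
                steps++;
            }
        }
        return steps;
    }

    // Total CPU time when the same per-cache computation runs once per cache.
    static long totalCpuSeconds(long secondsPerCache, int caches) {
        return secondsPerCache * caches;
    }

    public static void main(String[] args) {
        // 500 machines -> 500 * 500 steps
        System.out.println("steps for 500 machines: " + pairwiseSteps(500));
        // 50 sec per cache (figure from this report) * 500 caches
        System.out.println("total CPU seconds: " + totalCpuSeconds(50, 500));
    }
}
```

With the figures from this report, 500 machines give 250000 inner-loop steps per cache, and 50 sec x 500 caches gives 25000 sec, which is just under seven hours of CPU time on the coordinator.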

The hprof output shows that 90% of the time is spent in TopologyInfo.computeMaxSegmentsForMachine().

Comment 2 Dan Berindei 2014-10-15 09:48:57 UTC
Takayoshi, have you seen the perf problems only with TopologyAwareConsistentHashFactory? Have you also tested with TopologyAwareSyncConsistentHashFactory?

Comment 3 Takayoshi Kimura 2014-10-15 13:23:58 UTC
Yes, only TopologyAwareConsistentHashFactory is affected; it took 55 sec with 500 nodes. TopologyAwareSyncConsistentHashFactory took only 2 sec, so it is not a problem so far.