Bug 1152934

Summary:	TopologyAwareConsistentHashFactory is slow for large cluster
Product:	[JBoss] JBoss Data Grid 6	Reporter:	Takayoshi Kimura <tkimura>
Component:	Infinispan	Assignee:	Tristan Tarrant <ttarrant>
Status:	CLOSED EOL	QA Contact:	Martin Gencur <mgencur>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.3.1	CC:	jdg-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2024-05-01 00:21:08 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Takayoshi Kimura 2014-10-15 08:35:14 UTC

Observed 100% CPU usage for a long time on the coordinator node when booting 500 nodes with 500 caches defined.

It looks like TopologyAwareConsistentHashFactory is O(n^2): it has a double loop over all machines. Computing a rebalance takes 50 sec for 1 cache with 500 nodes. This calculation is repeated for every cache, so with 500 nodes and 500 caches it consumes 25000 sec of CPU time (50 sec × 500 caches).

The hprof profile shows that 90% of the time is spent in TopologyInfo.computeMaxSegmentsForMachine().
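To illustrate why a double loop over machines scales badly, here is a hypothetical sketch. It does not reproduce Infinispan's TopologyInfo.computeMaxSegmentsForMachine(); the class MachineLoopSketch, its two methods, and the simple proportional-share formula are assumptions made purely for illustration. The quadratic version rescans every machine to recompute a cluster-wide total for each machine, while the linear version computes that total once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; NOT Infinispan's TopologyInfo implementation.
public class MachineLoopSketch {

    // Counts inner-loop iterations so the cost difference is observable.
    static long iterations = 0;

    // O(n^2): for each machine, rescan all machines to recompute the total node count.
    static Map<String, Double> segmentSharesQuadratic(Map<String, Integer> nodesPerMachine, int numSegments) {
        Map<String, Double> shares = new HashMap<>();
        for (Map.Entry<String, Integer> machine : nodesPerMachine.entrySet()) {
            int totalNodes = 0;
            for (int count : nodesPerMachine.values()) { // inner loop over all machines
                iterations++;
                totalNodes += count;
            }
            shares.put(machine.getKey(), (double) numSegments * machine.getValue() / totalNodes);
        }
        return shares;
    }

    // O(n): compute the total once, then derive each machine's share in a single pass.
    static Map<String, Double> segmentSharesLinear(Map<String, Integer> nodesPerMachine, int numSegments) {
        int totalNodes = 0;
        for (int count : nodesPerMachine.values()) {
            iterations++;
            totalNodes += count;
        }
        Map<String, Double> shares = new HashMap<>();
        for (Map.Entry<String, Integer> machine : nodesPerMachine.entrySet()) {
            iterations++;
            shares.put(machine.getKey(), (double) numSegments * machine.getValue() / totalNodes);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Assumption: 500 machines with one node each, 256 segments.
        Map<String, Integer> machines = new HashMap<>();
        for (int i = 0; i < 500; i++) {
            machines.put("machine-" + i, 1);
        }

        iterations = 0;
        Map<String, Double> quadratic = segmentSharesQuadratic(machines, 256);
        long quadraticIterations = iterations;

        iterations = 0;
        Map<String, Double> linear = segmentSharesLinear(machines, 256);
        long linearIterations = iterations;

        System.out.println("quadratic iterations: " + quadraticIterations); // 250000
        System.out.println("linear iterations: " + linearIterations);       // 1000
        System.out.println("same result: " + quadratic.equals(linear));     // true
    }
}
```

With 500 single-node machines, the quadratic version performs 250,000 inner iterations per call versus 1,000 for the linear one. Because the factory repeats the computation for every cache, any per-call quadratic cost is multiplied again by the cache count, which matches the 50 sec × 500 caches = 25000 sec total reported above.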

Comment 2 Dan Berindei 2014-10-15 09:48:57 UTC

Takayoshi, have you seen the perf problems only with TopologyAwareConsistentHashFactory? Have you also tested with TopologyAwareSyncConsistentHashFactory?

Comment 3 Takayoshi Kimura 2014-10-15 13:23:58 UTC

Yes, it's only TopologyAwareConsistentHashFactory; it took 55 sec with 500 nodes. The Sync one (TopologyAwareSyncConsistentHashFactory) took only 2 sec, so it's not a problem so far.