Observed 100% CPU usage for a long time on the coordinator node when booting 500 nodes with 500 caches defined. It looks like TopologyAwareConsistentHashFactory runs in O(n^2) time in the number of machines, since it has a double loop over all machines. It takes 50 sec to compute the rebalance for 1 cache with 500 nodes. This calculation is performed for every cache, so it consumes about 25000 sec of CPU time with 500 nodes and 500 caches. The hprof output shows that 90% of the time is spent in TopologyInfo.computeMaxSegmentsForMachine().
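The arithmetic behind these numbers can be sketched as follows. This is only an illustration, not Infinispan code: the class `RebalanceCostSketch` and its methods are hypothetical names, and the double loop is a stand-in that assumes one machine per node and a full scan of all machines for each machine.

```java
// Hypothetical sketch; none of these names exist in Infinispan.
class RebalanceCostSketch {

    // Counts the inner-loop iterations of a double loop over all machines.
    // Assumption: each machine triggers a full scan of every machine.
    static long quadraticSteps(int machines) {
        long steps = 0;
        for (int i = 0; i < machines; i++) {
            for (int j = 0; j < machines; j++) {
                steps++;
            }
        }
        return steps;
    }

    // Total CPU time if the same per-cache computation is repeated for every cache.
    static long totalCpuSeconds(long secondsPerCache, int caches) {
        return secondsPerCache * caches;
    }

    public static void main(String[] args) {
        // 500 machines -> 500 * 500 inner iterations per rebalance computation
        System.out.println(quadraticSteps(500));      // 250000
        // 50 sec per cache, repeated for 500 caches
        System.out.println(totalCpuSeconds(50, 500)); // 25000
    }
}
```

Under these assumptions, doubling the number of machines roughly quadruples the per-cache work, while the total still grows linearly with the number of caches.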
Takayoshi, have you seen the perf problems only with TopologyAwareConsistentHashFactory? Have you also tested with TopologyAwareSyncConsistentHashFactory?
Yes, only with TopologyAwareConsistentHashFactory; it took 55 sec with 500 nodes. TopologyAwareSyncConsistentHashFactory took only 2 sec, so it has not been a problem so far.