I've made a simple test with one DIST cache, 10 nodes, and no data at all. Initial state transfer completes in 3 seconds with numSegments == 500, but takes more than 50 seconds with numSegments == 5000. Item #2 seems to have the most impact: switching to a simple HashSet in InboundTransferTask drops the time significantly, about 5x. Unfortunately, the trivial solution of just replacing CopyOnWriteArraySet with HashSet breaks some concurrency-related concerns, so it's not immediately applicable. Instead, I will try to see whether using synchronization gets us better performance than using concurrent collections. Item #1 does not seem to give us much improvement, but it is indeed a potential optimization; it should be solved by extracting the wCh.getSegmentsForOwner(..) call outside the loop. I'm still working on #2; not sure I can have a quick solution today.
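A minimal sketch of the two candidate optimizations, under stated assumptions: all class and method names below are hypothetical stand-ins, not the actual Infinispan API. It shows (item #2) a plain HashSet guarded by synchronized methods instead of CopyOnWriteArraySet, which copies its backing array on every mutation, and (item #1) the per-owner segment lookup hoisted out of the inner loop so it runs once per owner rather than once per iteration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy sketch only; names are hypothetical, not the real Infinispan classes.
public class StateTransferSketch {

    // Item #2: CopyOnWriteArraySet pays O(n) per add/remove because it copies
    // the whole backing array. A plain HashSet guarded by synchronized methods
    // keeps add/remove at O(1) while preserving thread safety.
    static class SegmentSet {
        private final Set<Integer> segments = new HashSet<>();

        synchronized boolean add(int segment) { return segments.add(segment); }
        synchronized boolean remove(int segment) { return segments.remove(segment); }
        synchronized boolean isEmpty() { return segments.isEmpty(); }
        // Hand out a defensive copy so callers never iterate the live set.
        synchronized Set<Integer> snapshot() { return new HashSet<>(segments); }
    }

    // Item #1: call the segment lookup once per owner, outside the loop body,
    // instead of re-evaluating it on every iteration.
    static Map<String, Set<Integer>> segmentsByOwner(List<String> owners, int numSegments) {
        Map<String, Set<Integer>> result = new HashMap<>();
        for (String owner : owners) {
            Set<Integer> owned = getSegmentsForOwner(owner, numSegments); // hoisted lookup
            result.put(owner, owned);
        }
        return result;
    }

    // Stand-in for wCh.getSegmentsForOwner(..): deterministically assigns
    // every third segment to an owner, just to make the sketch runnable.
    static Set<Integer> getSegmentsForOwner(String owner, int numSegments) {
        Set<Integer> owned = new HashSet<>();
        for (int s = Math.floorMod(owner.hashCode(), 3); s < numSegments; s += 3) {
            owned.add(s);
        }
        return owned;
    }

    public static void main(String[] args) {
        SegmentSet set = new SegmentSet();
        set.add(1);
        set.add(2);
        set.remove(1);
        System.out.println(set.snapshot()); // prints [2]
        System.out.println(segmentsByOwner(List.of("a", "b"), 10).size()); // prints 2
    }
}
```

Whether synchronized blocks actually beat the concurrent collection here depends on the contention profile; the point of the sketch is only that the mutation cost stops scaling with the number of segments.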
PR here: https://github.com/infinispan/jdg/pull/123
I believe the functional side of this PR can be verified with the usual elasticity/resilience tests (in fact, the changes are simple enough that the Infinispan testsuite should catch any bug as well). For a performance check, I could set up a cluster with many segments and see from the logs whether the startup time has improved; I don't think any automation would be beneficial here.
Please run the usual elasticity/resilience tests. Thanks.
The fix has been verified. It does NOT introduce any regression. We did not measure the speed-up, though.
After additional review in the community, we found a few more small, rather cosmetic, fixes: https://github.com/infinispan/jdg/pull/146