Bug 828504 - State transfer taking too long on node join
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Priority: medium
Severity: medium
Assigned To: Tristan Tarrant
Michal Linhard
Reported: 2012-06-04 15:56 EDT by Michal Linhard
Modified: 2014-03-17 00:02 EDT
Doc Type: Bug Fix
Last Closed: 2012-06-06 09:12:02 EDT
Type: Bug

Description Michal Linhard 2012-06-04 15:56:56 EDT
The symptoms are basically the same as in 

but I didn't want to mix with the old comments in case the cause here is different.

This happened in the 32-node elasticity test.

We start with 16 nodes and add nodes up to 31. When we add the 32nd node, the state transfer takes 580 sec. Previously, all state transfers took under 2 min.
Comment 1 Tristan Tarrant 2012-06-05 02:37:37 EDT
Remember the threading changes in the default ER11 configuration. What configuration are you using here?
Comment 2 Dan Berindei 2012-06-05 09:57:37 EDT
Indeed, it looks very similar: the cache entries seem to take a very long time to get to the 32nd node.

Michal, could you schedule another run with stateTransfer.chunkSize="1000"? (The default is 10000.)

If possible, also check out branch `t_uuperf_30` from `git@github.com:danberindei/JGroups.git` and run the UUPerf test on 32 nodes with the jgroups-udp.xml configuration from our 5.1.x branch (which should be the same as the JDG default).

Assuming you have copied jgroups-udp.xml into the JGroups directory and have already set the IP_ADDR and NODE_NAME environment variables:

JG=. bin/jgroups.sh -Djgroups.bind_addr=${IP_ADDR} org.jgroups.tests.perf.UUPerf -props jgroups-udp.xml -name ${NODE_NAME}
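
A minimal sketch of a pre-flight check for the command above, in case it is run by hand on many nodes. It only verifies that IP_ADDR and NODE_NAME are set and then prints the command line for the current node. The function name `uuperf_cmd` is hypothetical and not part of JGroups; nothing here was used in the actual test runs.

```shell
# Hypothetical helper: print the UUPerf command line for this node.
uuperf_cmd() {
  # ${VAR:?msg} makes a non-interactive shell exit with an error
  # if the variable is unset or empty.
  : "${IP_ADDR:?IP_ADDR must be set to the bind address of this node}"
  : "${NODE_NAME:?NODE_NAME must be set to the name of this node}"
  # Same command as above, with the two variables substituted.
  printf 'JG=. bin/jgroups.sh -Djgroups.bind_addr=%s org.jgroups.tests.perf.UUPerf -props jgroups-udp.xml -name %s\n' \
    "$IP_ADDR" "$NODE_NAME"
}
```

Calling `uuperf_cmd` from the JGroups directory prints the command so it can be checked before it is started on each node.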
Comment 3 Michal Linhard 2012-06-05 12:12:22 EDT
I'll put it in my test queue in hyperion.
Comment 4 Michal Linhard 2012-06-05 16:27:46 EDT
I just tried again:

and couldn't replicate it.
I'm going to run two more tests, plus the tests proposed by Dan.

I'd lower the priority of this, though. It doesn't seem to happen often, and the only consequence is reduced performance during a view change.
Comment 5 Michal Linhard 2012-06-06 04:39:34 EDT
More runs where I couldn't reproduce this:

partial elasticity test 30->32

full elasticity test 16->32->16

Dan, I'll move the further tests to the back of the hyperion test queue.
Comment 6 mark yarborough 2012-06-06 09:12:02 EDT
Prabhat concludes: Can't reproduce in hyperion lab.
