Bug 828504 - State transfer taking too long on node join
Summary: State transfer taking too long on node join
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tristan Tarrant
QA Contact: Michal Linhard
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-06-04 19:56 UTC by Michal Linhard
Modified: 2014-03-17 04:02 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-06-06 13:12:02 UTC
Type: Bug
Embargoed:




Links
System ID: Red Hat Issue Tracker ISPN-2093 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: Never

Description Michal Linhard 2012-06-04 19:56:56 UTC
The symptoms are basically the same as in 
https://bugzilla.redhat.com/show_bug.cgi?id=786202

but I didn't want to mix this with the old comments in case the cause here is different.

It happened in the 32-node elasticity test:
http://www.qa.jboss.com/~mlinhard/hyperion/run148-elas-dist-32-ER11/logs/analysis/views.html

We start with 16 nodes and add nodes up to 32. Up to 31 nodes, all state transfers took under 2 min, but when we added the 32nd node the state transfer took 580 sec.

Comment 1 Tristan Tarrant 2012-06-05 06:37:37 UTC
Keep in mind the threading changes in the default ER11 configuration. What configuration are you using here?

Comment 2 Dan Berindei 2012-06-05 13:57:37 UTC
Indeed, it looks very similar: the cache entries seem to take a very long time to get to the 32nd node.

Michal, could you schedule another run with stateTransfer.chunkSize="1000"? (The default is 10000.)
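
For a rough sense of what the smaller value changes: assuming chunkSize caps the number of cache entries sent per state transfer batch (an assumption about the setting, not something confirmed in this report), the same data is split into more, smaller batches. A minimal Python sketch of that arithmetic follows; the entry count is a made-up figure, not taken from the test run, and `chunks_needed` is a hypothetical helper.

```python
def chunks_needed(entries: int, chunk_size: int) -> int:
    """Batches needed to send `entries` cache entries when each batch
    carries at most `chunk_size` entries (assumed batching model)."""
    if entries < 0 or chunk_size <= 0:
        raise ValueError("entries must be >= 0 and chunk_size must be > 0")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-entries // chunk_size)


# Hypothetical number of entries the joining node has to receive.
ENTRIES_TO_TRANSFER = 250_000

for chunk_size in (10_000, 1_000):  # default vs. the value proposed above
    batches = chunks_needed(ENTRIES_TO_TRANSFER, chunk_size)
    print(f"chunkSize={chunk_size}: {batches} batches")
```

Under that assumption, going from 10000 to 1000 means ten times as many batches, each a tenth of the size, which would show whether large batches are what stalls on the way to the joining node.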

If possible, also check out branch `t_uuperf_30` from `git:danberindei/JGroups.git` and run the UUPerf test on 32 nodes with the jgroups-udp.xml configuration from our 5.1.x branch (which should be the same as the JDG default).

Assuming you have copied jgroups-udp.xml into the JGroups directory and you've already set up the IP_ADDR and NODE_NAME environment variables:

JG=. bin/jgroups.sh -Djgroups.bind_addr=${IP_ADDR} org.jgroups.tests.perf.UUPerf -props jgroups-udp.xml -name ${NODE_NAME}

Comment 3 Michal Linhard 2012-06-05 16:12:22 UTC
I'll add it to my test queue in hyperion.

Comment 4 Michal Linhard 2012-06-05 20:27:46 UTC
I just tried again:
http://www.qa.jboss.com/~mlinhard/hyperion/run160-elas-dist-32-ER11-partial/logs/analysis/views.html

and couldn't replicate it.
I'm going to run two more tests, plus the tests proposed by Dan.

I'd lower the priority of this, though. It doesn't seem to happen often, and the only consequence is reduced performance during a view change.

Comment 5 Michal Linhard 2012-06-06 08:39:34 UTC
Two more runs where I couldn't reproduce this:

partial elasticity test 30->32
http://www.qa.jboss.com/~mlinhard/hyperion/run161-elas-dist-32-ER11-partial/logs/analysis/views.html

full elasticity test 16->32->16
http://www.qa.jboss.com/~mlinhard/hyperion/run162-elas-dist-32-ER11/logs/analysis/views.html

Dan, I'll move the further tests to the back of the hyperion test queue.

Comment 6 mark yarborough 2012-06-06 13:12:02 UTC
Prabhat concludes: Can't reproduce in hyperion lab.

