Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 828504

Summary: State transfer taking too long on node join
Product: [JBoss] JBoss Data Grid 6
Reporter: Michal Linhard <mlinhard>
Component: Infinispan
Assignee: Tristan Tarrant <ttarrant>
Status: CLOSED WORKSFORME
QA Contact: Michal Linhard <mlinhard>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.0.0
CC: dberinde, jdg-bugs, myarboro, nobody
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-06 13:12:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michal Linhard 2012-06-04 19:56:56 UTC
The symptoms are basically the same as in 
https://bugzilla.redhat.com/show_bug.cgi?id=786202

but I didn't want to mix this with the old comments, in case the cause here is different.

This happened in the 32-node elasticity test:
http://www.qa.jboss.com/~mlinhard/hyperion/run148-elas-dist-32-ER11/logs/analysis/views.html

We start with 16 nodes and add nodes up to 31. When we add the 32nd node, the state transfer takes 580 sec; all previous state transfers took under 2 min.

Comment 1 Tristan Tarrant 2012-06-05 06:37:37 UTC
Remember the threading changes in the default ER11 configuration. What configuration are you using here?

Comment 2 Dan Berindei 2012-06-05 13:57:37 UTC
Indeed, it looks very similar: the cache entries seem to take a very long time to get to the 32nd node.

Michal, could you schedule another run with stateTransfer.chunkSize="1000"? (The default is 10000.)

If possible, also check out branch `t_uuperf_30` from `git:danberindei/JGroups.git` and run the UUPerf test on 32 nodes with the jgroups-udp.xml configuration from our 5.1.x branch (which should be the same as the JDG default).

Assuming you have copied jgroups-udp.xml into the JGroups directory and have already set the IP_ADDR and NODE_NAME environment variables:

JG=. bin/jgroups.sh -Djgroups.bind_addr=${IP_ADDR} org.jgroups.tests.perf.UUPerf -props jgroups-udp.xml -name ${NODE_NAME}
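
Since the command above has to be started on each of the 32 nodes with a different address and name, a dry-run helper along the following lines may be useful for checking the substituted values first. This is only a sketch: the function name `uuperf_cmd` and the fallback values `127.0.0.1` and `node01` are hypothetical placeholders, not part of JGroups or of the test setup. It prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run helper: prints the UUPerf launch command from this comment
# instead of executing it, so the values can be checked per node.
# uuperf_cmd and the fallback values below are hypothetical placeholders.
uuperf_cmd() {
    ip_addr="$1"    # address the node should bind to (IP_ADDR)
    node_name="$2"  # logical name of the node (NODE_NAME)
    printf 'JG=. bin/jgroups.sh -Djgroups.bind_addr=%s org.jgroups.tests.perf.UUPerf -props jgroups-udp.xml -name %s\n' \
        "$ip_addr" "$node_name"
}

# Use the environment variables if set, otherwise the placeholder values.
uuperf_cmd "${IP_ADDR:-127.0.0.1}" "${NODE_NAME:-node01}"
```

Once the printed line looks right, it can be run from the JGroups directory as is.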

Comment 3 Michal Linhard 2012-06-05 16:12:22 UTC
I'll add it to my test queue in hyperion.

Comment 4 Michal Linhard 2012-06-05 20:27:46 UTC
I just tried again:
http://www.qa.jboss.com/~mlinhard/hyperion/run160-elas-dist-32-ER11-partial/logs/analysis/views.html

and couldn't replicate it.
I'm going to run two more tests, plus the tests proposed by Dan.

I'd lower the priority of this, though. It doesn't seem to happen often, and the only consequence is reduced performance during a view change.

Comment 5 Michal Linhard 2012-06-06 08:39:34 UTC
More runs where I couldn't reproduce this:

partial elasticity test 30->32
http://www.qa.jboss.com/~mlinhard/hyperion/run161-elas-dist-32-ER11-partial/logs/analysis/views.html

full elasticity test 16->32->16
http://www.qa.jboss.com/~mlinhard/hyperion/run162-elas-dist-32-ER11/logs/analysis/views.html

Dan, I'll move the further tests to the back of the hyperion test queue.

Comment 6 mark yarborough 2012-06-06 13:12:02 UTC
Prabhat concludes: Can't reproduce in hyperion lab.