Bug 745910 (EDG-18)

Summary: poor performance when using virtual nodes
Product: [JBoss] JBoss Data Grid 6    Reporter: Michal Linhard <mlinhard>
Component: unspecified    Assignee: Default User <jbpapp-maint>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: unspecified    CC: bela, galder.zamarreno, jdg-bugs, mlinhard, nobody, prabhat.jha
Target Milestone: ---   
Target Release: 6.0.0   
Hardware: Unspecified   
OS: Unspecified   
URL: http://jira.jboss.org/jira/browse/EDG-18
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-09-01 08:24:21 UTC
Type: Feature Request

Description Michal Linhard 2011-07-27 19:53:50 UTC
project_key: EDG

a run with numVirtualNodes=1 achieves a max throughput of approximately 180K ops/sec
(see https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/66/artifact/report/log.txt)
a run with numVirtualNodes=500 stabilizes at approximately 45K ops/sec
(see https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/68/artifact/report/log.txt)

Comment 1 Michal Linhard 2011-07-27 21:01:57 UTC
JProfiler snapshot, four nodes, perf17 profiled, 1000 clients, 1 iteration
https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/70/artifact/report/jprofiler-snapshot.jps

Comment 2 Michal Linhard 2011-07-27 21:25:01 UTC
there's something very suspicious here:
almost all the traffic comes via the JGroups replication channel:
19.7% - 811 s - 139,656 inv. org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle
and almost none via the Netty server interface:
0.3% - 13,367 ms - 484 inv. org.infinispan.server.core.AbstractProtocolDecoder.messageReceived

my first suspect is the Hot Rod client not routing properly - perf17 isn't getting any requests.

Comment 3 Michal Linhard 2011-07-28 07:59:20 UTC
on my laptop I ran 4 instances bound to test1-test4, and the Hot Rod client routes to only two of them: test1 and test4
digging further ...

Comment 4 Galder Zamarreño 2011-07-28 08:02:43 UTC
I haven't checked the profiler data or anything, but as an FYI: as noted in https://issues.jboss.org/browse/ISPN-1090?focusedCommentId=12612485&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12612485, the virtual topologies sent back in this version of the protocol are not the most efficient, and we have some improvements coming up in version 2.

Comment 5 Michal Linhard 2011-07-28 08:16:49 UTC
sending the topologies shouldn't be a performance issue; they're sent only once per topology change, right?

Comment 6 Galder Zamarreño 2011-07-28 08:25:01 UTC
Indeed. That's why I noted it as an FYI, since I don't think it's particularly relevant here. If the topology were being sent too often, you'd be able to spot it by too many calls to the writeHashTopologyHeader() method in HotRodEncoder.

Comment 7 Michal Linhard 2011-07-28 09:03:02 UTC
Link: Added: This issue depends on ISPN-1273


Comment 8 Michal Linhard 2011-07-28 09:22:24 UTC
the performance effect of ISPN-1273 is clear to me now:
the Hot Rod client gets only the last hash id for each server - this means the hash ids are likely to end up very close to each other on the hash wheel - so picking the first two owners lands on the same servers regardless of the server count.

this means the Hot Rod client contacts only two of the four servers -> ending up with very inefficient, unnecessary network hops.
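A toy simulation makes the effect described above concrete. This is a hedged sketch, not Infinispan's actual implementation: md5 mod 2^10 stands in for the real hash function, the server positions are invented to mimic the two cases (bunched-up hash ids from the buggy topology vs. evenly spread ones), and `owners` is a simplified consistent-hash owner pick:

```python
import hashlib
from collections import Counter

SPACE = 2**10  # size of the hash wheel (illustrative only)

def wheel_pos(key: str) -> int:
    # Deterministic position on the wheel; md5 stands in for the real hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % SPACE

def owners(key: str, server_pos: dict, num_owners: int = 2) -> tuple:
    # Walk clockwise from the key's position, take the first num_owners servers.
    k = wheel_pos(key)
    ring = sorted(server_pos.items(), key=lambda sp: sp[1])
    ordered = [s for s, p in ring if p >= k] + [s for s, p in ring if p < k]
    return tuple(ordered[:num_owners])

# Buggy view (ISPN-1273): only the last virtual-node hash id per server is
# reported, so all four servers' positions end up bunched together.
clustered = {"test1": 100, "test2": 101, "test3": 102, "test4": 103}
# Fixed view: positions spread roughly evenly around the wheel.
spread = {"test1": 0, "test2": 256, "test3": 512, "test4": 768}

keys = [f"key-{i}" for i in range(1000)]
for name, positions in [("clustered", clustered), ("spread", spread)]:
    counts = Counter(owners(k, positions) for k in keys)
    pair, n = counts.most_common(1)[0]
    print(f"{name}: {n / len(keys):.0%} of keys go to {pair}")
```

With the clustered positions, nearly every key's position falls in the long empty arc of the wheel, so the clockwise walk almost always hits the same two servers first; with spread positions the owner pairs rotate evenly. That matches the observation that only test1 and test4 (here, two of the four servers) receive requests.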

Comment 9 Galder Zamarreño 2011-08-02 07:01:46 UTC
ISPN-1273 is resolved now, so this should be closed?

Comment 10 Michal Linhard 2011-08-02 08:03:38 UTC
I'm gonna try a run with the snapshot and then I'll close it...

Comment 11 Anne-Louise Tangring 2011-09-26 19:41:26 UTC
Docs QE Status: Removed: NEW