Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 758759

Summary:

Performance degradation in comparison with EDG 6.0.0.ALPHA

Product:

[JBoss] JBoss Data Grid 6

Reporter:

Michal Linhard <mlinhard>

Component:

unspecified

Assignee:

Tristan Tarrant <ttarrant>

Status:

CLOSED UPSTREAM

QA Contact:

Nobody <nobody>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

6.0.0

CC:

jdg-bugs, nobody, sanne, ttarrant

Target Milestone:

---

Target Release:

6.0.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2025-02-10 03:14:28 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Alternate infinispan-core without ISPN-1881	none
Performance results - Hot rod client stress tests	none
Performance results - Hot rod client stress tests + ER6 results	none

Description Michal Linhard 2011-11-30 16:36:13 UTC

In the hotrod client stress test with 4 EDG nodes we discovered a dramatic performance decrease.

EDG 6.0.0.Beta   (Infinispan 5.1.0.BETA5)
run:             https://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-hotrod/117
config:          https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/configs/stress.xml (r21396)
build:           http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-edg-from-source/52/artifact/edg-srcbuild.zip
max throughput:  12181.04 ops/sec
max clients:     2200

EDG 6.0.0.Alpha  (Infinispan 5.0.0.FINAL)
run:             http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard/81/
config:          https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/configs/stress-6.0.0.ALPHA1.xml (r21402)
build:           ftp://partners.redhat.com/fb2459b463c16599d101324b5af22d44/JBEDG-6.0.0-alpha/jboss-edg-6.0.0.alpha.zip
max throughput:  51734.306 ops/sec
max clients:     8000


degradation in max throughput:     -76 %
degradation in max clients:        -73 %
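
For reference, the two percentages above are the relative change from the Alpha figures (baseline) to the Beta figures. A minimal sketch of that arithmetic, using only the numbers reported in this comment; the function name `relative_change_pct` is an illustrative choice and not part of the test framework:

```python
def relative_change_pct(baseline: float, current: float) -> float:
    """Return the change from baseline to current as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Figures reported above: Alpha is the baseline, Beta is the current build.
throughput_change = relative_change_pct(51734.306, 12181.04)
clients_change = relative_change_pct(8000, 2200)

print(f"degradation in max throughput: {throughput_change:.1f} %")
print(f"degradation in max clients:    {clients_change:.1f} %")
```

A negative result means the current build is slower than the baseline; the values in the report are these results rounded to whole percent.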

Comment 1 Michal Linhard 2011-12-02 08:26:38 UTC

Performance degradation in 1 node scenario (same config)

EDG 6.0.0 Pre-Beta-SNAPSHOT
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-hotrod/133/artifact/report/merged-throughput.png

EDG 6.0.0.Alpha
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard/84/artifact/report/merged-throughput.png

Comment 2 Michal Linhard 2011-12-02 08:29:50 UTC

jprofiler snapshots on the way

4node scenario snapshot blocked by 
https://bugzilla.redhat.com/show_bug.cgi?id=758218
https://issues.jboss.org/browse/ISPN-1508
(to try new EDG build with latest snapshot as commented on ISPN-1508)

1node scenario snapshot on the way...

Comment 3 Michal Linhard 2011-12-08 17:14:04 UTC

New runs were done with EDG with Infinispan 5.1.0.CR1 (based on ttarrant's branches of edg and as7)
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-edg-from-source/68/artifact/edg-srcbuild.zip

This is the JProfiler snapshot of node01 in a 4 node run:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-hotrod/158/artifact/report/run1/jprofiler_snapshots/node01.jps

The config was modified: I've set <transaction mode="NONE"/>, see
https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/configs/stress.xml

With this, the new 4node test performance results are as follows:

max throughput: 20723.166 ops/sec
max clients:    3200

so the degradation changed to:

degradation in max throughput:     -60 %
degradation in max clients:        -60 %

Comment 4 Michal Linhard 2011-12-09 10:18:23 UTC

JProfiler snapshot of run with new non-transactional settings:
https://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-hotrod/161/artifact/report/size4/jprofiler_snapshots/node01.jps

Comment 5 Michal Linhard 2011-12-15 08:46:15 UTC

Until we find the component responsible for this degradation, I'll leave this assigned to you ;-)

Comment 6 Sanne Grinovero 2011-12-19 16:37:34 UTC

Some notes:

- the new configuration has DEBUG enabled on console logging, and enabled DEBUG for org.infinispan.util.concurrent.locks.LockManagerImpl

- Why did the virtual nodes setting change? OK if you need it, but to compare performance we'd need both runs configured as closely as possible.

- From the JProfiler dump:
 -- Is it expected to use only ~40% of CPU time?
 -- It seems it's heavily network IO bound.. did network tuning change on the machines? Bela suggested recently that they were reconfigured, not sure if it's the same lab.
 -- there are lots of jgroups threads being unused; JGroups doesn't seem to be used at all.. any explanation for that? Could we reduce number of threads?
 -- MemCachedWorker threads are saturated with io work.. might need some more, assuming the machine can handle more network traffic (?).

Could you run the profiler using lightweight sampling instead of instrumentation? I'm looking at the CPU metrics but they are quite unrealistic; I suspect instrumentation distorted the performance metrics, as it usually does. (For example, spending 12% of total CPU to start the 378 threads is not something I'd expect to see.)

Comment 7 Michal Linhard 2011-12-21 10:58:48 UTC

Sanne, thanks for the comments.
I'll try to compare runs with modified configuration according to your notes.

Virtual nodes:
- in EDG 6.0.0.ALPHA (5.0.0.FINAL), with 512 virtual nodes enabled there was a deadlock during startup, due to quite a lot of data being transferred among the hotrod topology caches. (https://issues.jboss.org/browse/EDG-8)
- in the current build I expect virtual nodes to behave better, but it's worth trying one run where the settings are synced...

JGroups:
- in ALPHA runs setting number of jgroups threads to 200 dramatically increased performance
- but I'm gonna try one lowered config too

I'm assigning this back to myself, because I don't think I have enough data yet to support the decrease claim. We still must create some comparison runs to eliminate the possibility that the changed performance is due to
- changed hardware lab
- changed test framework

Comment 8 Sanne Grinovero 2011-12-21 12:10:33 UTC

Thanks Michal,
yes it seems a good idea to re-run both in the current environment.

Of course it might not be possible to use the same exact configuration, but try to be close, especially with virtual nodes and logging options.

Don't bother changing the number of JGroups threads; the goal is not to optimize it but to compare performance, right? I just noticed that weird amount in your profiler dump. Maybe after this it's worth running it again with a tuned configuration so we actually find sweet spots to recommend.

keep us updated, this is very interesting!

Comment 9 Michal Linhard 2012-03-08 09:25:04 UTC

Round 10 of performance tests for JDG 6.0.0.ER2 shows a comparison of client stress tests with EDG 6.0.0.ALPHA

https://docspace.corp.redhat.com/docs/DOC-94439

The size4 scenario isn't that catastrophic anymore, the difference is less than 4% in both max throughput and max clients reached.

What's still a problem is the size2 scenario.
So far the comparison was done only for the hot rod client, but more is on the way...

Comment 10 Michal Linhard 2012-03-09 14:00:21 UTC

I added memcached results to DOC-94439. They don't look very good either, but the results for ER2 are a bit questionable, because the runs didn't end properly.

Comment 11 Michal Linhard 2012-03-10 09:30:27 UTC

New results in DOC-94439:
- memcached size2/4 after rerun not catastrophic anymore
- obtained REST results: catastrophic, needs rerun

Comment 12 Michal Linhard 2012-03-12 10:38:52 UTC

The catastrophic REST results are confirmed. I'm gonna take a look at that.

Comment 13 Tristan Tarrant 2012-03-16 09:49:50 UTC

Created attachment 570548 [details]
Alternate infinispan-core without ISPN-1881

The attached JAR should be used to repeat the tests and see whether it solves the two-node scenario. Just overwrite the modules/org/infinispan/main/infinispan-core-5.1.3.ER4-redhat-1.jar with this jar
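
A minimal sketch of that overwrite step, assuming Python is available on the test machine. Only the relative JAR path comes from the comment above; the function name `replace_core_jar`, the `.orig` backup suffix, and the paths in the usage note are illustrative assumptions:

```python
import shutil
from pathlib import Path

# Relative path of the module JAR, as given in the comment above.
CORE_JAR = Path("modules/org/infinispan/main/infinispan-core-5.1.3.ER4-redhat-1.jar")


def replace_core_jar(server_home: Path, replacement: Path) -> Path:
    """Back up the shipped infinispan-core JAR and overwrite it with the replacement.

    Returns the path of the backup copy.
    """
    target = server_home / CORE_JAR
    if not target.is_file():
        raise FileNotFoundError(f"expected JAR not found: {target}")
    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        # Keep the first backup so repeated runs never overwrite the shipped JAR copy.
        shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    return backup
```

It could be called as `replace_core_jar(Path("/opt/jdg"), Path("infinispan-core-alt.jar"))`, where both paths are hypothetical. Keeping a backup makes it easy to restore the shipped JAR before a comparison run.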

Comment 14 Michal Linhard 2012-03-19 17:02:45 UTC

Results of the client stress test here:
https://docspace.corp.redhat.com/docs/DOC-95151

Comment 15 Michal Linhard 2012-03-28 13:54:20 UTC

Performance comparison with ALPHA is still being evaluated and we still need to decide whether this can be closed for ER5.

Comment 16 Michal Linhard 2012-03-29 07:27:37 UTC

Created attachment 573554 [details]
Performance results - Hot rod client stress tests

Comment 17 Michal Linhard 2012-03-29 10:32:53 UTC

4 node JProfiler run, 100 hotrod clients, 3 minutes, JDG 6.0.0.ER5, ER5 config:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/80/artifact/report/size2/jprofiler_snapshots/

same with ER4 config:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/81/artifact/report/size2/jprofiler_snapshots/

Comment 18 mark yarborough 2012-03-29 15:46:24 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
CFR here please

Comment 19 Michal Linhard 2012-03-29 16:28:44 UTC

2node JProfiler snapshots:

ER4 config:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/84/artifact/report/size2/jprofiler_snapshots/

ER5 config:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/83/artifact/report/size2/jprofiler_snapshots/

Comment 20 Dan Berindei 2012-03-29 18:01:17 UTC

Michal, I looked at the throughput for the last snapshots you posted and it looks like ER5 is actually faster than ER4 (654 vs 606 ops/s). I have a hunch it's because the number of clients is much lower (100 vs 10000). We have to increase it and see if we can get the bad results with ER5 again.

Also, JProfiler's instrumentation mode distorts the results when there are lots of method calls (as the profiler overhead becomes greater than the cost of the call itself). Please set up JProfiler to run in sampling mode instead.

Comment 21 Michal Linhard 2012-04-04 12:40:24 UTC

Created attachment 575110 [details]
Performance results - Hot rod client stress tests + ER6 results

Adding perf numbers for ER6. Size 2 tests are bad with ER6 config but with ER4 config things are ok compared to ALPHA.

Comment 22 Michal Linhard 2012-04-05 09:50:21 UTC

Deleted Technical Notes Contents.

Old Contents:
CFR here please

Comment 23 Michal Linhard 2012-04-05 10:11:12 UTC

The ER4 vs ER5/ER6 issue will be followed here:
https://bugzilla.redhat.com/show_bug.cgi?id=810155

Comment 24 Misha H. Ali 2012-06-04 01:14:03 UTC

Tristan/Mark, BZ#810155 is included for release notes. Is that sufficient documentation and cause to exclude this bug?

Comment 31 Red Hat Bugzilla 2025-02-10 03:14:28 UTC

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.