After running soak tests with the HotRod client with durations of 8h and 48h, we found that throughput gradually decreases during the test. The following runs show (in their artifacts) various statistics, one of which is throughput (operations/second): https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/8/ https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/ This is a potential memory leak.
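For reference, a minimal sketch of how the downward trend could be quantified from data_csv.txt with a least-squares slope over the throughput column (the column index and comma-separated layout are assumptions, not the actual format of the file):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class ThroughputTrend {
        public static void main(String[] args) throws IOException {
            // args[0] = path to the CSV, args[1] = index of the throughput column (assumed layout)
            List<String> lines = Files.readAllLines(Paths.get(args[0]));
            int col = Integer.parseInt(args[1]);
            int n = lines.size() - 1; // skip the header row
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 1; i <= n; i++) {
                double y = Double.parseDouble(lines.get(i).split(",")[col].trim());
                sumX += i; sumY += y; sumXY += i * y; sumXX += (double) i * i;
            }
            // Least-squares slope; a clearly negative value means throughput drifts down over the run
            double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            System.out.println("throughput slope per iteration: " + slope);
        }
    }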
In which diagram/artifact did you see that decrease exactly? The diagram in https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/chart-cluster-throughput.png doesn't show much. If you suspect a memory leak, I'd suggest getting a heap dump when the test finishes so that we can inspect it.
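On the heap dump: apart from jmap, one way to grab it programmatically right when the test harness finishes is via the HotSpotDiagnostic MXBean. A minimal sketch (the output path is just an example):

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class HeapDumper {
        public static void main(String[] args) throws Exception {
            // Proxy to the HotSpotDiagnostic MXBean of the current JVM
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // Write a binary heap dump containing only live objects
            bean.dumpHeap("/tmp/edg-soak-heap.hprof", true);
        }
    }

The command-line equivalent is jmap -dump:live,format=b,file=<path> <pid> against the server process.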
I saw it in the files https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/8/artifact/report/data_csv.txt and https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/data_csv.txt. I also created graphs from these files, which will be included in the performance testing report in a few days. The graphs are at https://svn.devel.redhat.com/repos/jboss-qa/edg/release-testing/EDG6.0.0.ALPHA1/charts/hotrod_soak_8h_duration_vs_throughput.png and https://svn.devel.redhat.com/repos/jboss-qa/edg/release-testing/EDG6.0.0.ALPHA1/charts/hotrod_soak_48h_duration_vs_throughput.png. I have not had time to take a heap dump yet; I will try to do it today or tomorrow.
Looking at the memory charts for that run, I see nothing that would indicate a memory leak (see https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/chart-cluster-heap.png). Network traffic decreases (but that is expected with decreasing throughput). CPU usage decreases as well, so it's not as if the server is being stressed. Is there anything we can do to see whether too much time is spent in lock contention/acquisition?
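One low-cost way to answer the lock contention question would be to enable thread contention monitoring over JMX and dump per-thread blocked times. A sketch, assuming it runs inside (or attached to) the server JVM and that a one-minute sampling window is enough:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ContentionProbe {
        public static void main(String[] args) throws Exception {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (threads.isThreadContentionMonitoringSupported()) {
                threads.setThreadContentionMonitoringEnabled(true);
            }
            Thread.sleep(60_000); // let the workload run for a while
            // Report how long each thread spent blocked waiting on monitors
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                System.out.printf("%s blocked %d ms in %d contentions%n",
                        info.getThreadName(), info.getBlockedTime(), info.getBlockedCount());
            }
        }
    }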
I ran this test with the same configuration yesterday. The build was successful, but the tests did not finish: they failed in iteration 32 (i.e. after 32 minutes instead of 8 hours) because the mean response time exceeded the limit (7 sec): https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/13/
I configured SmartFrog to also collect heap statistics on the driver nodes (the HotRod clients run on them) and ran a 2-hour soak test with 11000 clients. The result is here: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/23/artifact/report/chart-driver-heap.png . So it seems there really is a memory leak in the HotRod clients, but I will need to run an 8-hour soak test to confirm this. I'll do that overnight because our perf lab is occupied during the day.
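For context, the driver-side heap collection essentially boils down to periodically sampling the MemoryMXBean. A rough sketch of the equivalent (not the actual SmartFrog configuration; the sampling period and output format are arbitrary):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapSampler {
        public static void main(String[] args) throws Exception {
            // Sample heap usage once per minute and print it as CSV: timestamp, used, committed
            while (true) {
                MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                System.out.printf("%d,%d,%d%n",
                        System.currentTimeMillis(), heap.getUsed(), heap.getCommitted());
                Thread.sleep(60_000);
            }
        }
    }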
ISPN-1383 could be having an effect. Netty has some inbound buffers on the decoder side that are never pruned: their capacity is only ever increased, never decreased. So, if the size of the stored data varies over time, this capacity growth could result in higher memory consumption.
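To illustrate the pattern (this is not Netty's actual decoder code, just a sketch of grow-only buffering): an accumulator whose capacity only ever grows keeps its memory footprint at the largest payload ever seen, not at the current payload size, so varying value sizes push consumption up over time:

    import java.nio.ByteBuffer;

    public class GrowOnlyAccumulator {
        private ByteBuffer buf = ByteBuffer.allocate(1024);

        // Capacity is only ever increased to fit the largest message seen so far;
        // it is never shrunk back, so memory stays at the historical peak.
        public void append(byte[] chunk) {
            if (buf.remaining() < chunk.length) {
                ByteBuffer bigger = ByteBuffer.allocate(
                        Math.max(buf.capacity() * 2, buf.position() + chunk.length));
                buf.flip();
                bigger.put(buf);
                buf = bigger;
            }
            buf.put(chunk);
        }

        public int capacity() {
            return buf.capacity();
        }
    }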
Following up on my last comment: the memory leak no longer appears to be present, neither in the HotRod server nor in the HotRod clients. Here's the graph for the servers: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/24/artifact/report/chart-cluster-heap.png. And here's the graph for the HotRod clients: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/24/artifact/report/chart-driver-heap.png. The throughput is more or less constant over the 8-hour run. This was not true with EDG ALPHA1, where the throughput was decreasing.