| Summary: | Elasticity tests in REPL mode don't finish | | |
|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Michal Linhard <mlinhard> |
| Component: | Infinispan | Assignee: | Tristan Tarrant <ttarrant> |
| Status: | CLOSED NOTABUG | QA Contact: | Michal Linhard <mlinhard> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0.0 | CC: | dberinde, jdg-bugs, nobody |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-03-26 10:55:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Description
Michal Linhard
2012-03-20 17:25:20 UTC
Of course this is not easy to replicate with a small load... Again, this will be tough to TRACE.

Run with 10 clients, 10K dataload: http://www.qa.jboss.com/~mlinhard/hyperion/run34-elasticity4-repl/report/stats-throughput.png

Run with 500 clients, 5% dataload, TRACE log: http://www.qa.jboss.com/~mlinhard/hyperion/run35-elasticity4-repl-trace/
This generated 7.8G of logs and didn't replicate the issue. Times to install views are around 40 secs: http://www.qa.jboss.com/~mlinhard/hyperion/run35-elasticity4-repl-trace/table.html

Another TRACE run that doesn't reproduce the problem: http://www.qa.jboss.com/~mlinhard/hyperion/run37-elasticity4-repl-trace/report/stats-throughput.png

Another non-trace run that doesn't reproduce the problem:
http://www.qa.jboss.com/~mlinhard/hyperion/run38-elasticity4-repl/report/stats-throughput.png
http://www.qa.jboss.com/~mlinhard/hyperion/run38-elasticity4-repl/table.html

Dan Berindei <dberinde> made a comment on jira ISPN-1933

Michal, it looks like your test is using standalone.xml from the master branch, which is outdated, and not the latest config in the prod-6.0.0 branch.

I updated the config we're using for the REPL tests: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/configs/comparison/stress-repl-sync.xml
In hyperion I ran the 4 node elasticity test 3 times and couldn't reproduce this anymore.

In edg lab this problem is still there: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-elasticity-repl-basic/4/
and still only with the REPL case; DIST works with the same JGroups config: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-elasticity-dist-basic/46/

Second run in edg lab didn't even get to a 3 node cluster: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-elasticity-repl-basic/5/artifact/report/stats-throughput.png
I'll try to run with trace logging.

Tradaaaaa!
Reproduced with TRACE log: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-elasticity-repl-basic/6/artifact/report/stats-throughput.png

This might be a framework problem. The test might have ended sooner because it didn't ignore some expected exceptions during join/leave. Consider this a false alarm until further notice.

Michal, I don't think it's a false alarm, but it's a different problem. I'm seeing these errors in your TRACE log:

08:10:29,004 ERROR [org.infinispan.statetransfer.StateTransferLockImpl] Trying to release state transfer shared lock without acquiring it first: java.lang.Exception
08:10:29,005 ERROR [org.infinispan.statetransfer.StateTransferLockImpl] Trying to release state transfer shared lock without acquiring it first: java.lang.Exception

They are certainly not expected, so I'm looking into it.

Oh, it's a new one, you're right. I thought it was https://issues.jboss.org/browse/ISPN-1754 and therefore didn't report it, but the stack trace is different there. Did you create a JIRA for that? It occurs only after I start stopping the servers (or killing them). In the log you can see this event marked by "Test will now stop the server", which is written to the log output file right before I send the kill signal to JBoss.

I think I meant this one: https://issues.jboss.org/browse/ISPN-1704
So now I don't know why I didn't reopen it :-) For some reason I thought it was something expected/insignificant; it's good that you spotted it.

It looks very much like ISPN-1704, but that one happened on the surviving nodes, while this one seems to happen on the nodes that you're killing. So I think it might be worth a separate bug after all.

After fixing a problem in the test framework, tests run till the end, and all state transfers complete in under 5 sec.
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-elasticity-repl-basic/12/artifact/report/stats-throughput.png

Created attachment 572732
view installation times after framework fix

adding new view installation times after framework fix

Michal Linhard <mlinhard> updated the status of jira ISPN-1933 to Closed

Michal Linhard <mlinhard> made a comment on jira ISPN-1933

This was a problem in the test itself. The state transfer didn't really last more than 10 min.
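
For anyone checking a TRACE log for the same pattern, here is a minimal sketch of how the question "do the StateTransferLockImpl errors appear only after the server is stopped?" could be answered from a log file. It is not part of the test framework. The marker string and the error text are the ones quoted in the comments above; the line layout (timestamp, level, logger in brackets, message) is assumed from the two quoted ERROR lines, and the function name `split_lock_errors` is made up for illustration.

```python
import re
import sys

# Marker the test writes right before sending the kill signal (quoted in the thread).
STOP_MARKER = "Test will now stop the server"
# Error text quoted from the TRACE log in the thread.
LOCK_ERROR = "Trying to release state transfer shared lock without acquiring it first"
# Assumed line layout: HH:MM:SS,mmm LEVEL [logger] message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[([^\]]+)\]\s+(.*)$")


def split_lock_errors(lines):
    """Return (before, after): timestamps of the lock errors that were
    logged before and after the first stop marker, in log order."""
    before, after = [], []
    stopped = False
    for line in lines:
        if STOP_MARKER in line:
            stopped = True
            continue
        m = LINE_RE.match(line)
        if m and m.group(2) == "ERROR" and LOCK_ERROR in m.group(4):
            (after if stopped else before).append(m.group(1))
    return before, after


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lock_errors.py server.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        before, after = split_lock_errors(fh)
    print(f"lock errors before stop marker: {len(before)}")
    print(f"lock errors after stop marker:  {len(after)}")
```

If every hit lands in the "after" list on the node being killed, that matches the observation above that the error shows up on the stopping nodes rather than on the survivors.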