| Summary: | stuck hotrod operations | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 5 | Reporter: | Michal Linhard <mlinhard> | ||||
| Component: | Infinispan | Assignee: | Default User <jbpapp-maint> | ||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | EAP 5.1.0 EDG TP | CC: | galder.zamarreno, mlinhard, nobody, rachmato, rhusar | ||||
| Target Milestone: | --- | ||||||
| Target Release: | EAP 5.1.0 EDG TP | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| URL: | http://jira.jboss.org/jira/browse/EDG-68 | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-04-12 07:02:08 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
Michal Linhard
2011-03-24 13:07:12 UTC
Attached jstack outputs for all relevant machines.

Attachment added: jstack.zip

Galder, can you please take a brief look at this?

The ends of the server logs http://hudson.qa.jboss.com/hudson/job/edg-51x-stress-client-size6-hotrod/4/console-perf18/consoleText ... http://hudson.qa.jboss.com/hudson/job/edg-51x-stress-client-size6-hotrod/4/console-perf22/consoleText might be misleading. Some of the behaviour at the end was caused by me trying to undeploy datagrid.sar from EDG to kill the hotrod server and break the connections to the clients, thus freeing the clients from waiting. It was an attempt to end the test properly so that we could get some proper graphs out of the SmartFrog components.

> they didn't timeout because hotrod client doesn't support timeout.

Actually, the version we used does support it, but we didn't configure it, so it used the default, which I reckon means disabled. JBPAPP-6048.

In the client stress tests we used 4.2.1.CR4 on the client side, so the code for the timeout wasn't there. In jobs where we use the snapshot version, the property infinispan.client.hotrod.socket_timeout defaults to 60 secs.

Should we not get dev to create a CR5 to incorporate this (critical) fix, rather than use snapshots for further testing?

@Richard I agree very much, it has been causing a lot of pain. Galder, what do you think?

What is the relative state of 4.2.1.CR4 vs 4.2.1-SNAPSHOT? Would it be possible to cut a CR5 to give us some stability in testing?

We released 4.2.1.FINAL last Friday; that should contain configurable hotrod client timeouts.

Yeah, we are already using it (and were also using the SNAPSHOT with the timeout feature before that to solve this), so it's not blocking us anymore. I created this JIRA only out of interest in what happened on the server side, because I wasn't able to figure it out from the jstack reports.

Michal, do you still need me to have a look at the jstack files? Or can this be closed?

It would be nice to see what happened there, because we solved the problem by introducing hotrod client timeouts but didn't get to the core of the "deadlock" that happened in the system on the server side. But it's a very low priority thing, and it's not blocking the tests anymore. I wouldn't object if we closed this for now.

Michal, I had a quick look at the stacks and don't see anything peculiar. Let's close this and reopen at a later stage if necessary.
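
For anyone hitting the same hang, here is a minimal sketch of setting the client socket timeout explicitly instead of relying on the default. The property name infinispan.client.hotrod.socket_timeout comes from this thread. The class and method names, the server addresses (perf18/perf19 on port 11222), the server_list property usage, and the assumption that the value is given in milliseconds are illustrative and should be checked against the client version in use. The sketch only builds the java.util.Properties object; passing it to the hotrod client is left as a comment.

```java
import java.util.Properties;

// Hypothetical helper, not part of Infinispan: builds client properties
// with an explicit socket timeout so operations cannot block forever.
final class HotRodTimeoutSketch {
    // Property name quoted in this thread.
    static final String SOCKET_TIMEOUT = "infinispan.client.hotrod.socket_timeout";
    // Assumed property name for the list of servers.
    static final String SERVER_LIST = "infinispan.client.hotrod.server_list";

    static Properties clientProperties(String serverList, long socketTimeoutMillis) {
        // Design choice of this sketch: refuse non-positive values, since an
        // unbounded wait is exactly what the stress test ran into.
        if (socketTimeoutMillis <= 0) {
            throw new IllegalArgumentException("socket timeout must be positive");
        }
        Properties props = new Properties();
        props.setProperty(SERVER_LIST, serverList);
        // Assumption: the value is interpreted as milliseconds.
        props.setProperty(SOCKET_TIMEOUT, Long.toString(socketTimeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative addresses; 60000 mirrors the 60 secs mentioned above.
        Properties props = clientProperties("perf18:11222;perf19:11222", 60000L);
        // With the client library on the classpath, these properties would
        // typically be handed to the client, e.g. new RemoteCacheManager(props).
        System.out.println(SOCKET_TIMEOUT + "=" + props.getProperty(SOCKET_TIMEOUT));
    }
}
```

Setting the value explicitly keeps the test behaviour independent of whichever default a given client version ships with.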