Bug 1262114
Summary: | [GSS] (6.4.z) Deadlock when connection is closing while we are writing
---|---
Product: | [JBoss] JBoss Enterprise Application Platform 6
Component: | Remoting
Version: | 6.4.3
Status: | CLOSED CURRENTRELEASE
Severity: | unspecified
Priority: | unspecified
Target Milestone: | CR1
Target Release: | EAP 6.4.5
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Aaron Ogburn <aogburn>
Assignee: | Aaron Ogburn <aogburn>
QA Contact: | Jitka Kozana <jkudrnac>
CC: | bmaxwell, cdewolf, david.lloyd, egonzale, rjanik
Doc Type: | Bug Fix
Type: | Bug
Bug Blocks: | 1235745, 1253482, 1262449, 1264927, 1265008, 1266518, 1278889
Description
Aaron Ogburn
2015-09-10 20:14:08 UTC
Hi Aaron:

The problem is that when an endpoint receives a CLOSE signal, it tries to send another CLOSE to the other end, resulting in this behaviour. IMO the problem is here (sending something is async, and closing the channel is also async; this causes the race condition).

This is the entry point when the endpoint receives the CLOSE signal:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L112

This is the entry point when the endpoint sends the CLOSE signal:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L409

IMO the offender is:
https://github.com/jboss-remoting/jboss-remoting/blob/3.3.5.Final/src/main/java/org/jboss/remoting3/remote/RemoteConnectionHandler.java#L113
It should be executed only when the endpoint is closed (closeAction) and not when it is receiving the CLOSE signal (handling close).

Thinking on Enrique's suggestion, I'm not sure it explains it, because sendCloseRequest would only be called after the connection is flagged as closing in RemoteReadListener.handleEvent. So anything sent as a result of sendCloseRequest should not actually be sent, because of the closing check added to RemoteConnection$RemoteWriteListener.send.

But I've noticed that the possibility for this deadlock is introduced by the changes for the RejectedExecutionException bug:
https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4

We removed the executor altogether that used to handle the message cancels during channel close. Thus the RemoteConnectionHandler.closeAllChannels call now blocks and can deadlock in OutboundMessage.cancel on remoting 3.3.5.Final. This deadlock is therefore avoidable on EAP 6.4.1 and earlier for now. So to avoid the deadlock, we'll need to revert the changes that removed the executor.
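The hazard described above can be illustrated with a minimal, self-contained sketch. It is not the jboss-remoting code: the two locks (`connectionLock`, `messageLock`), the class name and the method `closeWhileWriting` are hypothetical stand-ins for whatever monitors the real writer and `OutboundMessage.cancel` contend on. The sketch only assumes the general shape of the problem: a closing thread that runs the cancel inline while still holding connection state, versus one that hands the cancel to an executor. Timeouts are used so the stall can be reported; the real deadlock has no timeout and simply hangs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class CloseDeadlockSketch {

    /** Returns true if the cancel completed, false if it stalled behind the writer. */
    static boolean closeWhileWriting(boolean useExecutor) throws Exception {
        ReentrantLock connectionLock = new ReentrantLock(); // hypothetical: guards connection state
        ReentrantLock messageLock = new ReentrantLock();    // hypothetical: guards one outbound message
        CountDownLatch writerHoldsMessage = new CountDownLatch(1);
        CountDownLatch closerHoldsConnection = new CountDownLatch(1);
        ExecutorService threads = Executors.newFixedThreadPool(3);
        try {
            // Writer: holds the message, then needs the connection to flush it.
            Future<Boolean> writer = threads.submit(() -> {
                messageLock.lock();
                try {
                    writerHoldsMessage.countDown();
                    closerHoldsConnection.await();
                    if (!connectionLock.tryLock(5, TimeUnit.SECONDS)) return false;
                    connectionLock.unlock();
                    return true;
                } finally {
                    messageLock.unlock();
                }
            });

            // Cancel: needs the message. The short timeout only makes the stall observable.
            Callable<Boolean> cancel = () -> {
                if (!messageLock.tryLock(500, TimeUnit.MILLISECONDS)) return false;
                messageLock.unlock();
                return true;
            };

            // Closer: takes the connection, then cancels the in-flight message.
            Future<Boolean> closer = threads.submit(() -> {
                writerHoldsMessage.await();
                Future<Boolean> pending;
                connectionLock.lock();
                try {
                    closerHoldsConnection.countDown();
                    if (!useExecutor) {
                        // Inline: waits for the message while still holding the connection.
                        return cancel.call();
                    }
                    // Hand-off: queue the cancel and leave the lock before waiting on anything.
                    pending = threads.submit(cancel);
                } finally {
                    connectionLock.unlock();
                }
                return pending.get();
            });

            boolean cancelled = closer.get();
            writer.get(); // let the writer finish before tearing down the pool
            return cancelled;
        } finally {
            threads.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inline cancel completed:   " + closeWhileWriting(false));
        System.out.println("executor cancel completed: " + closeWhileWriting(true));
    }
}
```

In the inline case the closer waits for the message while the writer waits for the connection, so neither can progress until a timeout fires. With the hand-off, the closer releases the connection first, the writer finishes, and the queued cancel then gets the message.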
David suggested reverting:

#1 https://github.com/jboss-remoting/jboss-remoting/commit/97bddc0d16deb421f1dea0baa61aeaeaa7c504b4
#2 https://github.com/jboss-remoting/jboss-remoting/commit/055f0c4b133a3cb361d8d7c9ba7825e000ee1386
#3 https://github.com/jboss-remoting/jboss-remoting/commit/eff6cb787f567d2a917844c189bb4fa565e0257b

And then implement a new fix to avoid https://bugzilla.redhat.com/show_bug.cgi?id=1238420. Perhaps:
https://github.com/jboss-remoting/jboss-remoting/compare/3.3...dmlloyd:3.3-tracking-exec

3.3 PR to revert prior changes:
https://github.com/jboss-remoting/jboss-remoting/pull/46

Commit to avoid RejectedExecutionExceptions as well, without removing the executor and so introducing these deadlocks:
https://github.com/jboss-remoting/jboss-remoting/commit/61c32c01c7b9f893a50842f08d3ebe9d3ef81797

Hi Aaron: if your fix for this issue is reverting things, they are already in:
PR 3.3: https://github.com/jboss-remoting/jboss-remoting/pull/46
Upstream: not required.
Your comment https://bugzilla.redhat.com/show_bug.cgi?id=1262114#c8 is related to another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1238420).

Verified. For the record, I used the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1264927 and tried it with the reproducer.btm present there and also with a reproducer.btm containing the following rule:

RULE trigger deadlock two
CLASS org.jboss.remoting3.remote.OutboundMessage
METHOD cancel
AT ENTRY
IF TRUE
DO Thread.sleep(3000)
ENDRULE

Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.