Bug 625541

Summary: C++ New API: Connection::close() hangs on suspended broker
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: Development
Reporter: Ted Ross <tross>
Assignee: Gordon Sim <gsim>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
CC: gsim, jross
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Target Milestone: 1.3
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-12-11 19:09:29 UTC
Attachments:
Description: testcase; Flags: none
Description: Suggested fix; Flags: tross: review+, tross: review+

Description Ted Ross 2010-08-19 18:58:47 UTC
Description of problem:

If an application using the C++ API (new style) is connected to a broker and the broker is stopped/suspended, the client application cannot close the connection.  Connection::close() hangs indefinitely.

Version-Release number of selected component (if applicable): MRG1.3 beta5

How reproducible:  100% (I've reproduced this on Fedora 11)


Steps to Reproduce:
1. Write a C++ client app that connects to a broker, delays, closes the connection, then exits.
2. Start the broker, run the app, and suspend the broker (^Z or SIGSTOP) during the delay.
3. Observe the app hanging and not exiting.
4. Resume the broker (fg) and see the app immediately exit.
  
Actual results:  Hanging app

Expected results:  App exits normally

Comment 1 Ted Ross 2010-08-19 19:00:42 UTC
Impact:  This is likely to be a problem in deployments where the broker is running on a virtualized guest.  If the guest is force-stopped or suspended, client applications that clean up connections on exit will not be able to exit.

Comment 2 Gordon Sim 2010-08-19 19:07:05 UTC
Not a bug in my view; enable heartbeats, which are there to allow this sort of condition to be detected.

Comment 3 Gordon Sim 2010-08-19 19:08:34 UTC
I guess we could also support a form of close that was not clean; i.e. abort or detach without attempting to do a clean handshake with the broker. That would not be a 1.3 change however.

Comment 4 Gordon Sim 2010-08-19 19:12:16 UTC
Further to my comment above, note that that would be only a limited solution. If the application also e.g. cancelled a subscription as part of shutdown then that too would hang. Heartbeats are intended to solve exactly this problem, so that is really my recommendation.

Comment 5 Ted Ross 2010-08-19 21:47:01 UTC
More info...  This can be solved by turning on heartbeats, turning on reconnect, and setting a reconnect-limit.  The blockage will clear after the reconnect limit is reached (typically a long time).

There is still a problem if a) reconnect is not desired, or b) reconnect-limit is not desired or is very long.

In these cases, there is no clean way to shut down an application/daemon connected to a stopped/suspended broker.

A related effect: If reconnect is in use and the broker is shut down (cleanly, no need to suspend), the client application/daemon cannot close the session/connection without waiting for the reconnect-limit (if present) to expire.

Comment 6 Gordon Sim 2010-08-20 08:32:31 UTC
Turning on reconnect is not required; that is orthogonal to the problem. 

However my initial assessment was incorrect. Heartbeats *don't* solve this problem. The heartbeat timer is turned off just before the close attempt.

Comment 7 Gordon Sim 2010-08-20 08:54:34 UTC
Created attachment 439888
testcase

Comment 8 Gordon Sim 2010-08-20 09:21:40 UTC
Created attachment 439895
Suggested fix

The attached patch fixes the issue by waiting only for the length of the heartbeat interval (if specified) for the broker to respond to the close request.
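
The bounded wait described above can be illustrated with a minimal, standard-library-only sketch. This is not the attached patch and does not use the qpid API: the class SketchConnection, its onCloseOk() callback, and the millisecond heartbeat parameter are all hypothetical names invented for illustration. It assumes that a heartbeat of zero means "no heartbeat configured", in which case close() waits without a bound, as in the reported hang.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a client connection; NOT the qpid API.
class SketchConnection {
public:
    // Assumption: a heartbeat of zero means "no heartbeat configured".
    explicit SketchConnection(std::chrono::milliseconds heartbeat)
        : heartbeat_(heartbeat), closeAcked_(false) {}

    // Would be invoked by an I/O thread when the broker confirms the close.
    void onCloseOk() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closeAcked_ = true;
        }
        cond_.notify_all();
    }

    // Returns true if the broker confirmed the close, false if the wait
    // timed out and the caller should abandon the connection locally.
    bool close() {
        std::unique_lock<std::mutex> lock(mutex_);
        // A real client would send the close request to the broker here.
        if (heartbeat_ == std::chrono::milliseconds::zero()) {
            // No heartbeat: unbounded wait, i.e. the hang reported in this bug.
            cond_.wait(lock, [this] { return closeAcked_; });
            return true;
        }
        // Bounded wait: give the broker one heartbeat interval to respond.
        return cond_.wait_for(lock, heartbeat_, [this] { return closeAcked_; });
    }

private:
    const std::chrono::milliseconds heartbeat_;
    bool closeAcked_;
    std::mutex mutex_;
    std::condition_variable cond_;
};
```

With a suspended broker, onCloseOk() is never called, so close() on a connection built with a 50 ms heartbeat returns false after roughly that interval instead of blocking. If the confirmation has already arrived, close() returns true immediately.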