Description of problem:
If an application using the C++ API (new style) is connected to a broker and the broker is stopped/suspended, the client application cannot close the connection. Connection::close() hangs indefinitely.
Version-Release number of selected component (if applicable): MRG1.3 beta5
How reproducible: 100% (I've reproduced this on Fedora 11)
Steps to Reproduce:
1. Write a C++ client app that connects to a broker, delays, closes the connection, then exits.
2. Start the broker, run the app, and suspend the broker (^Z or SIGSTOP) during the delay.
3. Observe that the app hangs and does not exit.
4. Resume the broker (fg) and see the app exit immediately.
Actual results: Hanging app
Expected results: App exits normally
Impact: This is likely to be a problem in deployments where the broker is running on a virtualized guest. If the guest is force-stopped or suspended, client applications that clean up connections on exit will not be able to exit.
Not a bug in my view; enable heartbeats, which exist to allow this sort of condition to be detected.
I guess we could also support a form of close that was not clean; i.e. abort or detach without attempting to do a clean handshake with the broker. That would not be a 1.3 change however.
Further to my comment above, note that an unclean close would be only a limited solution. If the application also, for example, cancelled a subscription as part of shutdown, then that too would hang. Heartbeats are intended to solve exactly this problem, so that is really my recommendation.
More info... This can be solved by turning on heartbeats, turning on reconnect, and setting a reconnect-limit. The blockage will clear after the reconnect limit is reached (typically a long time).
There is still a problem if a) reconnect is not desired, or b) reconnect-limit is not desired or is very long.
In these cases, there is no clean way to shut down an application/daemon connected to a stopped/suspended broker.
A related effect: If reconnect is in use and the broker is shut down (cleanly, no need to suspend), the client application/daemon cannot close the session/connection without waiting for the reconnect-limit (if present) to expire.
Turning on reconnect is not required; that is orthogonal to the problem.
However, my initial assessment was incorrect. Heartbeats *don't* solve this problem, because the heartbeat timer is turned off just before the close attempt.
Created attachment 439888
Created attachment 439895
The attached patch fixes the issue by waiting for at most one heartbeat interval (if specified) for the broker to respond to the close request.
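For readers without access to the attachment, the idea of a bounded wait on close can be sketched roughly as follows. This is not the patch itself and uses no Qpid code: `CloseWaiter`, `acknowledge()` and `waitForClose()` are hypothetical names, and treating a zero interval as "no heartbeat configured, wait indefinitely" is an assumption made for illustration.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state shared between the thread that
// calls close() and the I/O thread that sees the broker's reply.
class CloseWaiter {
public:
    // Called (in this sketch) by the I/O thread when the broker
    // acknowledges the close request.
    void acknowledge() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            acknowledged_ = true;
        }
        cond_.notify_all();
    }

    // Wait for the acknowledgement, but no longer than the heartbeat
    // interval. Assumption: a zero interval means heartbeats are not
    // configured, so the wait is unbounded as before.
    // Returns true if the broker acknowledged, false on timeout.
    bool waitForClose(std::chrono::milliseconds heartbeatInterval) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (heartbeatInterval == std::chrono::milliseconds::zero()) {
            cond_.wait(lock, [this] { return acknowledged_; });
            return true;
        }
        return cond_.wait_for(lock, heartbeatInterval,
                              [this] { return acknowledged_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool acknowledged_ = false;
};
```

With a suspended broker, `acknowledge()` is never called, so `waitForClose()` returns false once the interval elapses and the caller can drop the connection locally instead of hanging.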
Fixed by http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=c0f84a2cb76c105c2d4a736cbdf629b47f2a96d2