Upstream patch: https://svn.apache.org/viewvc?view=revision&revision=1656505
It turns out this patch does not work with Proton 0.7. The return code from pn_transport_input/output has changed since 0.7. The following update adds support for Proton 0.7: https://svn.apache.org/viewvc?view=revision&revision=1657964
This change enables idle timeout support on the broker.
The new operational behavior: when an AMQP 1.0 client advertises a connection idle timeout value, the broker uses 2x that value as its idle timeout threshold for that connection. This is consistent with the existing 0-10 behavior. As a result, the broker terminates any client that has been idle for 2x the client's advertised idle timeout.
To be clear: 'advertised' means the value sent as the idle timeout in the open frame. Most clients advertise 1/2 their configured timeout, but a client may advertise 1x the configured timeout.
To test, create a queue consumer that waits forever for messages on an empty queue. Use the client's 'heartbeat' connection option (assuming a qpid::messaging client speaking 1.0). Once the connection to the broker is established, send a SIGSTOP to the client. This prevents the client from generating idle frames. After 2x the client's advertised interval, the broker should terminate the connection to the client.
If debug logging is enabled, the broker logs the actual values it uses for the idle timeout on a connection.
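The threshold rule described above can be sketched in a few lines of C++. This is only an illustration, not the broker's actual code: the names brokerIdleThreshold and isIdleExpired are hypothetical, the advertised value is taken in milliseconds (the unit of the idle-time-out field in the AMQP 1.0 open frame), and treating a zero value as "no timeout" is an assumption.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical helper: the idle threshold the broker applies to a connection,
// i.e. 2x the value the client advertised in its open frame.
milliseconds brokerIdleThreshold(milliseconds advertised) {
    return advertised * 2;
}

// Hypothetical check: true once the connection has been silent for at least
// the threshold. Assumption: an advertised value of zero (or less) means the
// client requested no idle timeout, so the connection never expires.
bool isIdleExpired(milliseconds advertised, milliseconds silentFor) {
    assert(silentFor >= milliseconds::zero());
    if (advertised <= milliseconds::zero()) {
        return false;
    }
    return silentFor >= brokerIdleThreshold(advertised);
}
```

For a client that advertises 5000 ms, the sketch reports the connection as expired once it has been silent for 10000 ms, which matches the SIGSTOP test described above.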
To be sure that I understand correctly: Is the client free to choose the idle timeout value? If so, how is it done in code? Does the broker have any leverage to enforce a min/max value? Thank you for the valuable input.
(In reply to Zdenek Kraus from comment #5)
> Is the client free to choose the idle timeout value?
Yes.
> If so, how is it done in code?
By passing the 'heartbeat' option to the qpid::messaging::Connection constructor. The value is expressed in seconds. You can pass it via the --connection-options command-line argument to qpid-receive:
qpid-receive --connection-options "{protocol: amqp1.0, heartbeat: 10}" ....
> Does the broker have any leverage to enforce a min/max value?
No, the broker simply uses what the client sent. Since the client expresses the interval in seconds, the smallest timeout would be 1 second. The max value is unsigned 32bit - I think - which would never time out in our lifetimes :)
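As a rough illustration of the answer above, the option map can be assembled as a plain string before the connection is created. The helper makeConnectionOptions is hypothetical and not part of qpid::messaging; it only builds text in the same form as the qpid-receive example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of qpid::messaging: builds an options string
// such as "{protocol: amqp1.0, heartbeat: 10}".
std::string makeConnectionOptions(unsigned heartbeatSeconds) {
    // The interval is given in whole seconds, so 1 is the smallest useful value.
    assert(heartbeatSeconds >= 1);
    return "{protocol: amqp1.0, heartbeat: " + std::to_string(heartbeatSeconds) + "}";
}
```

The resulting string would be passed as the options argument of the constructor, for example qpid::messaging::Connection c(url, makeConnectionOptions(10)); where url is the broker address.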
(In reply to Ken Giusti from comment #6)
I've done my usual checks for keywords in this ticket within the MICG and MPR and have discovered that qpid-receive is not mentioned in either guide. Should it be? Where is the best place to view the docs for this?
In the MPR, qpid::messaging::Connection is mentioned in the http://docbuilder.usersys.redhat.com/19948/#Cluster_Failover_in_C2 section. This is the only place the heartbeat option is mentioned. The text there reads:
"Heartbeats are disabled by default. You can enable them by specifying a heartbeat interval (in seconds) for the connection via the heartbeat option. For example: qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");"
Is that info buried a bit too deep in this guide, and should it be made more generic? Is it applicable to other Connection settings?
(In reply to Jared MORGAN from comment #7)
> I've done my usual checks for keywords in this ticket within the MICG and MPR and have discovered that qpid-receive is not mentioned in either guide. Should it be? Where is the best place to view the docs for this?
No, qpid-receive isn't shipped as part of the MRG packages. It is a tool available from upstream - I was using it as an example to help Zdenek test this feature.
> Is that info buried a bit too deep in this guide, and should it be made more generic? Is it applicable to other Connection settings?
IMHO, I'd document the heartbeat option where the other connection options are documented. I tried following the link to Connection that appears lower in this section: http://docbuilder.usersys.redhat.com/19948/#Hello_World_Walkthrough but that link fails. I'd assume the heartbeat documentation would be included wherever the documentation for the Connection constructor is. I'd also link from the Cluster Failover documentation back to the Connection documentation. But, again, this is just IMHO.
There is a bug when the system time on the broker machine is changed to the future: the AMQP 1.0 connection is dropped, while the 0-10 connection persists.
Reproducer:
1. Start the broker on machine (A).
2. From a different machine (B), connect a client to the broker (A) via AMQP 1.0 with heartbeats enabled.
$ ./qc2_drain -b admin/admin.144.11:5672 -f --connection-options "{heartbeat:3, protocol:amqp1.0}" "testq2;{create:always}"
3. On the broker machine (A), change the date to the future and observe the AMQP 1.0 client.
$ date -s @$(($(date '+%s') + 300))
4. Client (B) disconnects from the broker.
Reproducible: always.
Note: Perform step 2 twice before changing the time in step 3: connect the first client using amqp1.0 and the second using amqp0-10. Observe that the client using amqp0-10 survives the time change.
See bz1080165 and the attached log files for more details.
Tested on RHEL 7 and 6, both archs: qpid-cpp-server-0.34-1, qpid-cpp-client-0.34-1
Created attachment 1045861: qpid cpp trace log
Created attachment 1045863: Qpid broker trace log
(In reply to Ken Giusti from comment #8)
> IMHO, I'd document the heartbeat option where the other connection options are documented.
I've had a close look at the docs, and I did find heartbeat documented in http://docbuilder.usersys.redhat.com/19948/#Connection_Options_Reference so I think we're covered there.
One thing I noticed is this statement in the table: "By default, TCP retransmission time is around 15 minutes on Linux and 12 seconds on Windows." 15 *minutes*? Is that right?
That doesn't seem correct to me. The TCP retransmission timer is variable: it is computed from the measured round-trip time and can vary over time due to network dynamics. So the application programmer can't really know what the retransmission timer is at any given moment.
The TCP *timeout* - note the difference: the point where TCP has retried as much as it can and essentially gives up and fails the connection - is anywhere between 13 and 30 *minutes*. See http://man7.org/linux/man-pages/man7/tcp.7.html - the part about tcp_retries2.
I think the whole point of setting a messaging connection timeout is to override the TCP timeout - to fail faster than TCP would. Essentially, the application programmer is saying: I don't care if you can recover in 13-30 minutes; if my message doesn't get through in (say) 30 seconds, I'm going to consider the connection "out of spec" and fail. Does that make sense?
So no - I don't know where that recommendation came from. Perhaps this is a question better asked on the messaging mailing list? Someone might be able to shed some light on why this recommendation is present.
In summary - the documentation for connection heartbeats is not correct. We should drop the recommendation to set heartbeats to 1/2 the TCP retransmit interval. Applications, not TCP, should be free to determine how long a link may remain unresponsive.
The semantics column should simply say something like this:
Requests that heartbeats be sent every N seconds. If two successive heartbeats are missed, the connection is considered lost and will fail or start the reconnect process if configured to do so.
(In reply to Ken Giusti from comment #16)
This recommendation is now included in the docs. Thanks for providing this clarifying text, Ken. http://docbuilder.usersys.redhat.com/19948/#Connection_Options_Reference
This is not a documentation bugzilla. Please see the relevant doc bug 1249942. A bug has been found; please see comment 11. Moving back to assigned.
Pushed a fix for the time change bug upstream: https://svn.apache.org/viewvc?view=revision&revision=1696415
Here is the commit info for the change on the MRG qpid git repository:
[kgiusti@localhost qpid (trunk)]$ git log -1
commit c2e8abd958add91b7e99163193d2fe5b65ab9b6e
Author: Ken Giusti <kgiusti>
Date: Tue Aug 18 13:29:56 2015 +0000
    QPID-6698: use the monotonic clock for AMQP 1.0 idle timeout
    git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1696415 13f79535-47bb-0310-9956-ffa450edef68
[kgiusti@localhost qpid (trunk)]$ git remote -v
origin ssh://git.app.eng.bos.redhat.com/srv/git/rh-qpid.git (fetch)
origin ssh://git.app.eng.bos.redhat.com/srv/git/rh-qpid.git (push)
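The idea named in the commit subject, measuring idleness against a monotonic clock rather than wall-clock time, can be sketched as follows. This is not the QPID-6698 patch itself: IdleTracker is a hypothetical class, and std::chrono::steady_clock stands in for whatever monotonic time source the broker actually uses.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical idle tracker. steady_clock is monotonic: it does not jump when
// the system date is changed, so setting the date into the future cannot make
// a live connection look idle.
class IdleTracker {
public:
    using Clock = std::chrono::steady_clock;

    IdleTracker(Clock::duration threshold, Clock::time_point now)
        : threshold_(threshold), lastActivity_(now) {
        assert(threshold > Clock::duration::zero());
    }

    // Call whenever a frame (including an empty idle frame) arrives from the peer.
    void onActivity(Clock::time_point now) { lastActivity_ = now; }

    // True once no frame has arrived for at least the threshold.
    bool expired(Clock::time_point now) const {
        return now - lastActivity_ >= threshold_;
    }

private:
    Clock::duration threshold_;
    Clock::time_point lastActivity_;
};
```

Time points are passed in explicitly so the behavior is easy to exercise; in real use the caller would supply IdleTracker::Clock::now(). With a wall-clock source, the 300-second jump from the reproducer would exceed a threshold derived from heartbeat:3 immediately, whereas elapsed steady_clock time is unaffected by the date change.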
Marking as verified on RHEL 6 (32/64-bit) and RHEL 7 using qpid-cpp-server-0.34-3.el6/7. The issue with setting the time to the future has been resolved.
The doc text looks good, but I found one bit unclear: the use of "enforce" in the following sentence: "Note that the broker simply uses what the client sent. It is not able to enforce a value." The broker does enforce a value - that value being 2x the idle-timeout advertised by the client. Rather, I think we're trying to explain that the broker doesn't support the _configuration_ of a default idle-timeout on the broker side. The idle timeout value for the connection is entirely determined by the configuration of the connection made by the client. There's no way to override this value on the broker side via management.
Scott, could you please revisit the doc text with respect to Ken's comment 27?
+1 to the latest doc text - well done.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-1879.html