Bug 1269586 - Client heartbeat interval differs between protocols
Summary: Client heartbeat interval differs between protocols
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: Messaging_Programming_Reference
Version: Development
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: Nobody
QA Contact: Messaging QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-10-07 15:49 UTC by Petr Matousek
Modified: 2023-06-23 14:12 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Petr Matousek 2015-10-07 15:49:01 UTC
Description of problem:

Please add a note to at least one of the guides that the amqp1.0 heartbeat interval is twice shorter that with amqp0-10.

Please see Additional info for details. 

Version-Release number of selected component (if applicable):
n/a

How reproducible:
always

Steps to Reproduce:
n/a

Actual results:
Client heartbeat interval differs between protocols

Expected results:
There is a note in the documentation that the interval for heartbeat is N seconds for amqp0-10 and N/2 secondes for amqp1.0.


Additional info:

AMQP1.0:

# export QPID_LOG_ENABLE="trace+:Protocol"
# ./drain --connection-options "{heartbeat:10, protocol:amqp1.0}" amq.direct -t 20

2015-10-07 17:16:48 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 -> @flow(19) [next-incoming-id=0, incoming-window=2147483647, next-outgoing-id=0, outgoing-window=2147483647, handle=0, delivery-count=0, link-credit=1, drain=false]
2015-10-07 17:16:53 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 -> (EMPTY FRAME)
2015-10-07 17:16:53 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 <- (EMPTY FRAME)
2015-10-07 17:16:58 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 -> (EMPTY FRAME)
2015-10-07 17:16:58 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 <- (EMPTY FRAME)
2015-10-07 17:17:03 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 -> (EMPTY FRAME)
2015-10-07 17:17:03 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 <- (EMPTY FRAME)
2015-10-07 17:17:08 [Protocol] trace [450dbdd3-a093-4330-9890-c207acbb3a79]: 0 -> (EMPTY FRAME)

AMQP0-10:

# export QPID_LOG_ENABLE="trace+:Network"
# ./drain --connection-options "{heartbeat:10, protocol:amqp0-10}" amq.direct -t 20

2015-10-07 17:19:28 [Network] trace SENT [[127.0.0.1:52323-127.0.0.1:5672]]: Frame[BEbe; channel=0; {ConnectionTuneOkBody: channel-max=32767; max-frame-size=65535; heartbeat=10; }]
2015-10-07 17:19:38 [Network] trace RECV [[127.0.0.1:52323-127.0.0.1:5672]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-07 17:19:38 [Network] trace SENT [[127.0.0.1:52323-127.0.0.1:5672]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-07 17:19:48 [Network] trace RECV [[127.0.0.1:52323-127.0.0.1:5672]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-07 17:19:48 [Network] trace SENT [[127.0.0.1:52323-127.0.0.1:5672]]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]


same logged on broker side.

Comment 1 Petr Matousek 2015-10-08 14:41:18 UTC
Moreover, while testing with Java client I found another relates anomalies

AMQP0-10 (qpid-java):

# java  -cp $CLASSPATH org.apache.qpid.example.qc3_connector  --broker guest/guest.0.1:5672 --connection-options "{ protocol : 'amqp1.0', heartbeat : '10' }" --duration 20 -a amq.topic

qpidd log:
2015-10-08 16:07:26 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 1 <- @flow(19) [next-incoming-id=0, incoming-window=2047, next-outgoing-id=1, outgoing-window=2147483647, handle=1, delivery-count=0, link-credit=1000]
2015-10-08 16:07:29 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:31 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:31 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:34 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:34 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:36 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:36 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:39 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:39 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:41 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:41 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:44 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:44 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)
2015-10-08 16:07:46 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 <- (EMPTY FRAME)
2015-10-08 16:07:46 [Protocol] trace [qpid.127.0.0.1:5672-127.0.0.1:58702]: 0 -> (EMPTY FRAME)

AMQP1.0 (qpid-jms)
# java  -cp -Dlog4j.configuration=log4j.properties  org.apache.qpid.example.qc2_connector  --broker guest/guest.0.1:5672 --connection-options "{  sasl_mechanisms : 'PLAIN', protocol : 'amqp0-10', heartbeat : '10' }" --duration 20 -a 'amq.topic'

qpidd log:
2015-10-08 16:09:27 [Protocol] trace RECV [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionTuneOkBody: channel-max=65534; max-frame-size=65535; heartbeat=10; }]
2015-10-08 16:09:37 [Protocol] trace SENT [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-08 16:09:37 [Protocol] trace RECV [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-08 16:09:37 [Protocol] trace RECV [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-08 16:09:47 [Protocol] trace SENT [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-08 16:09:47 [Protocol] trace RECV [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]
2015-10-08 16:09:47 [Protocol] trace RECV [qpid.127.0.0.1:5672-127.0.0.1:58703]: Frame[BEbe; channel=0; {ConnectionHeartbeatBody: }]

Comment 2 Petr Matousek 2015-10-08 14:49:16 UTC
Unfortunately, I messed the description in the comment 1 above:
  The first one is AMQP1.0 (qpid-jms)
  The second one is AMQP0-10 (qpid-java)

Comment 3 Petr Matousek 2015-10-08 14:52:05 UTC
Summarizing:

for given heartbeat=N to the client:

AMQP0-10:
 * c++ (qpid-cpp-client) and python (python-qpid):
   The heartbeat is sent each N seconds (expected according to doc).
 * java (qpid-java):
   The heartbeat is sent each N seconds (expected),
   but there is two RECV entries logged in each interval (not expected).
   Are the heartbeats event really sent twice by the client in each interval?

AMQP1.0:  
  * c++ (qpid-cpp-client), python (python-qpid):
    The heartbeat is sent each N/2 seconds (not really expected).
  * java (qpid-java):
    The heartbeat is sent each N/4 seconds (not really expected).

So I'm a bit confused by the summary above.
Gordon, could you possibly have a look at the results and share your opinion if you feel that some engineering fix is needed? I'm not pushing for any fix here, but I would like the anomalies have at least documented. Thanks in advance.

Comment 4 Gordon Sim 2015-10-09 10:13:48 UTC
(In reply to Petr Matousek from comment #3)
> AMQP1.0:  
>   * c++ (qpid-cpp-client), python (python-qpid):
>     The heartbeat is sent each N/2 seconds (not really expected).

This is largely a result of confusion around proton-c and its interpretation the idle-time-out. There was some discussion about changing this behaviour but as yet no concrete action has been taken.

The current behaviour is that the value specified for the heartbeat option is used to set the clients stated 'idle-time-out' which is the maximum interval allowed between frames, but the client would timeout only after going for twice that time without seeing anything from its peer. However proton-c then chooses to send heartbeats at twice this rate - ostensibly to avoid spurious timeouts where other implementations do not also allow for some extra time above the advertised period before giving up.

>   * java (qpid-java):
>     The heartbeat is sent each N/4 seconds (not really expected).

Not sure about this. Robbie?

Comment 5 Gordon Sim 2015-10-09 10:14:44 UTC
(In reply to Petr Matousek from comment #3)
> Summarizing:
> 
> for given heartbeat=N to the client:
> 
> AMQP0-10:
>  * c++ (qpid-cpp-client) and python (python-qpid):
>    The heartbeat is sent each N seconds (expected according to doc).
>  * java (qpid-java):
>    The heartbeat is sent each N seconds (expected),
>    but there is two RECV entries logged in each interval (not expected).
>    Are the heartbeats event really sent twice by the client in each interval?

Rajith, does this sound familiar?

Comment 6 Petr Matousek 2015-10-09 10:24:50 UTC
again, one small correction here.. the mentioned amqp1.0 java client is of course qpid-jms (not qpid-java as stated above in the summary), my apologize for the confusion.

Comment 7 Robbie Gemmell 2015-10-09 11:29:39 UTC
I think we need to be very clear about what a particular option means, and how it is used within the particular client, and the particular protocol impl.

The qpid-jms (AMQP 1.0 JMS) client for example, does not have a setting for 'heartbeat interval'. It has an idleTimeout related option (defaults to 60sec) that controls the time after which it considers a remote peer dead. The interval of heartbeats the remote peer sends to keep the client from severing the connection is essentially independent of that value, the only requirement is that they send the client something at least every 60 seconds, be that actual useful data, or empty heartbeat frames. Reading the AMQP 1.0 spec would lead one to expect that we request they send something at least every 30 seconds, by advertising a 30sec idle-timeout on the Open frame.

Proton (C and Java) advertises half the timeout (the time after what the peer is considered dead) that the transport is configured with, so if you set Proton to 60sec, it sends 30sec. This is in keeping with what the AMQP 1.0 spec says they SHOULD do, advertise half of their 'actual timeout', though the proton config name is confusing called 'idle timeout', rahter than perhaps 'actual timeout'. Since the spec only says peers SHOULD half their advertised value, at some point Proton (C and Java) was adjusted to pessimistically assume that the peer DID NOT advertise half their actual timeout (assuming they advertised any value of course), and so proton halves the received value as well to ensure that it definitely sends frequently enough to avoid timeouts, even against those peers not advertising half their actual timeout. As a result, if Proton is used on both sides, and both advertise an idle-timeout value (which need not be the same, as each direction is independent) on the Open frame then heartbeats will flow at 4x the 'actual timeout N' interval rather than just 2x as might be expected.

Comment 8 Robbie Gemmell 2015-10-09 11:38:53 UTC
(In reply to Gordon Sim from comment #5)
> (In reply to Petr Matousek from comment #3)
> > Summarizing:
> > 
> > for given heartbeat=N to the client:
> > 
> > AMQP0-10:
> >  * c++ (qpid-cpp-client) and python (python-qpid):
> >    The heartbeat is sent each N seconds (expected according to doc).
> >  * java (qpid-java):
> >    The heartbeat is sent each N seconds (expected),
> >    but there is two RECV entries logged in each interval (not expected).
> >    Are the heartbeats event really sent twice by the client in each interval?
> 
> Rajith, does this sound familiar?

I meant to add...something like that was also reported upstream recently:
https://issues.apache.org/jira/browse/QPID-6753

Comment 9 Petr Matousek 2015-10-13 11:40:39 UTC
Hi Justin,
  I'm not really sure how to deal with this bug. Could you possibly have a look on the summary in comment 3, read through the comments above and share your opinion if some engineering work is needed here?

The the AMQP0-10 java client issue (qpid-java, please see comment 8) is already reported by the mentioned jira QPID-6753, I may also create a downstream bz if there is a need, is it?

For the rest: if there is some engineering work needed, please change the component accordingly, if no engineering work is needed we shall at least document the current behaviour (leave the component as it is).

Thanks in advance for your help.

Comment 10 Justin Ross 2015-10-13 17:24:10 UTC
Hi, Petr.  At root it's a spec problem.  There's no immediate engineering work.  We should simply document it for now.
 
(In reply to Petr Matousek from comment #9)
> Hi Justin,
>   I'm not really sure how to deal with this bug. Could you possibly have a
> look on the summary in comment 3, read through the comments above and share
> your opinion if some engineering work is needed here?
> 
> The the AMQP0-10 java client issue (qpid-java, please see comment 8) is
> already reported by the mentioned jira QPID-6753, I may also create a
> downstream bz if there is a need, is it?
> 
> For the rest: if there is some engineering work needed, please change the
> component accordingly, if no engineering work is needed we shall at least
> document the current behaviour (leave the component as it is).
> 
> Thanks in advance for your help.


Note You need to log in before you can comment on or make changes to this bug.