Bug 682860 - Producer flow control - qpid-send hangs after all messages are sent to broker
Summary: Producer flow control - qpid-send hangs after all messages are sent to broker
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ken Giusti
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-03-07 19:51 UTC by ppecka
Modified: 2011-08-12 16:22 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-03-10 11:17:24 UTC
Target Upstream Version:


Attachments
reproducer (1.14 KB, application/x-shellscript)
2011-03-07 19:51 UTC, ppecka
no flags

Description ppecka 2011-03-07 19:51:04 UTC
Created attachment 482775 [details]
reproducer

Description of problem:
We have one queue with flow-control counts set (stop at 20 messages, resume at 10). A client (qpid-send) with "sender's capacity" = 1000 then sends 100 messages. This means that after the first 20 messages the queue's flowStopped flag is set to "True", but our client can still send up to 1000 messages because of its capacity, so it transfers the remaining 80 messages to the queue, where they sit ready for a consumer. After that, qpid-send hangs until the messages are consumed with qpid-receive (see "Actual results:" and the attached reproducer, which gets stuck on qpid-send). qpid-receive therefore needs to be run from a separate shell to allow qpid-send to exit.

Version-Release number of selected component (if applicable):
#rpm -qa | grep qpid | sort -u
python-qpid-0.9.1073306-1.el5
qpid-cpp-client-0.9.1073306-1.el5
qpid-cpp-client-devel-0.9.1073306-1.el5
qpid-cpp-client-devel-docs-0.9.1073306-1.el5
qpid-cpp-client-ssl-0.9.1073306-1.el5
qpid-cpp-server-0.9.1073306-1.el5
qpid-cpp-server-cluster-0.9.1073306-1.el5
qpid-cpp-server-devel-0.9.1073306-1.el5
qpid-cpp-server-ssl-0.9.1073306-1.el5
qpid-cpp-server-store-0.9.1073306-1.el5
qpid-cpp-server-xml-0.9.1073306-1.el5
qpid-java-client-0.9.1073306-1.el5
qpid-java-common-0.9.1073306-1.el5
qpid-java-example-0.9.1073306-1.el5
qpid-tools-0.9.1073306-1.el5
rh-qpid-cpp-tests-0.9.1073306-1.el5


How reproducible:
100%

Steps to Reproduce:
see attachment
  
Actual results:
pstack $(pidof qpid-send)
Thread 2 (Thread 0xb7f80b90 (LWP 11498)):
#0  0x00859402 in __kernel_vsyscall ()
#1  0x009eb726 in epoll_wait () from /lib/libc.so.6
#2  0x0023744a in qpid::sys::Poller::wait(qpid::sys::Duration) () from /usr/lib/libqpidcommon.so.4
#3  0x00238073 in qpid::sys::Poller::run() () from /usr/lib/libqpidcommon.so.4
#4  0x0022d521 in ?? () from /usr/lib/libqpidcommon.so.4
#5  0x00a80832 in start_thread () from /lib/libpthread.so.0
#6  0x009eb0ae in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb7f81720 (LWP 11497)):
#0  0x00859402 in __kernel_vsyscall ()
#1  0x00a84bc5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x009f777d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0x00c46b82 in qpid::client::SessionImpl::waitForCompletionImpl(qpid::framing::SequenceNumber const&) () from /usr/lib/libqpidclient.so.4
#4  0x00c46d37 in qpid::client::SessionImpl::waitForCompletion(qpid::framing::SequenceNumber const&) () from /usr/lib/libqpidclient.so.4
#5  0x00c2bc6f in qpid::client::Future::wait(qpid::client::SessionImpl&) () from /usr/lib/libqpidclient.so.4
#6  0x00c3bbfa in qpid::client::SessionBase_0_10::sync() () from /usr/lib/libqpidclient.so.4
#7  0x00dcdaab in qpid::client::amqp0_10::SessionImpl::syncImpl(bool) () from /usr/lib/libqpidmessaging.so.3
#8  0x00dd4af2 in bool qpid::client::amqp0_10::SessionImpl::execute<qpid::client::amqp0_10::SessionImpl::Sync>(qpid::client::amqp0_10::SessionImpl::Sync&) () from /usr/lib/libqpidmessaging.so.3
#9  0x00dcdc6f in qpid::client::amqp0_10::SessionImpl::sync(bool) () from /usr/lib/libqpidmessaging.so.3
#10 0x00d9cc5b in qpid::messaging::Session::sync(bool) () from /usr/lib/libqpidmessaging.so.3
#11 0x08053d6f in qpid::messaging::Receiver::operator=(qpid::messaging::Receiver const&) ()
#12 0x0092ee9c in __libc_start_main () from /lib/libc.so.6
#13 0x080514b1 in qpid::messaging::Receiver::operator=(qpid::messaging::Receiver const&) ()


Expected results:
qpid-send will exit after all messages are delivered according to flow and capacity settings.

Additional info:

Comment 1 Ken Giusti 2011-03-09 20:20:28 UTC
A trace on the broker shows that an Execution.Sync command is being sent by the client after the first 100 messages are sent:

<SNIP>
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Bbe; channel=1; {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 100: {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[be; channel=1; header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(98),ts:F8:int64(1299700705589619811)}; }{DeliveryProperties: routing-key=145819; }}]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 100: header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(98),ts:F8:int64(1299700705589619811)}; }{DeliveryProperties: routing-key=145819; }}
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Ebe; channel=1; content (10 bytes) XXXXXXXXXX]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 100: content (10 bytes) XXXXXXXXXX
2011-03-09 14:58:25 trace Queue "145819": setting flow control for msg pos=98
2011-03-09 14:58:25 debug Message 0x7f54240276a0 enqueued on 145819
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Bbe; channel=1; {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 101: {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[be; channel=1; header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(99),ts:F8:int64(1299700705589647329)}; }{DeliveryProperties: routing-key=145819; }}]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 101: header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(99),ts:F8:int64(1299700705589647329)}; }{DeliveryProperties: routing-key=145819; }}
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Ebe; channel=1; content (10 bytes) XXXXXXXXXX]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 101: content (10 bytes) XXXXXXXXXX
2011-03-09 14:58:25 trace Queue "145819": setting flow control for msg pos=99
2011-03-09 14:58:25 debug Message 0x7f5424028460 enqueued on 145819
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Bbe; channel=1; {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 102: {MessageTransferBody: destination=; accept-mode=1; acquire-mode=0; }
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[be; channel=1; header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(100),ts:F8:int64(1299700705589670376)}; }{DeliveryProperties: routing-key=145819; }}]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 102: header (64 bytes); properties={{MessageProperties: content-length=10; correlation-id=; content-type=; user-id=; application-headers={sn:F4:uint32(100),ts:F8:int64(1299700705589670376)}; }{DeliveryProperties: routing-key=145819; }}
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[Ebe; channel=1; content (10 bytes) XXXXXXXXXX]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 102: content (10 bytes) XXXXXXXXXX
2011-03-09 14:58:25 trace Queue "145819": setting flow control for msg pos=100
2011-03-09 14:58:25 debug Message 0x7f5424028c50 enqueued on 145819
2011-03-09 14:58:25 trace RECV [127.0.0.1:5672-127.0.0.1:56988]: Frame[BEbe; channel=1; {ExecutionSyncBody: }]
2011-03-09 14:58:25 trace anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: recv cmd 103: {ExecutionSyncBody: }
2011-03-09 14:58:25 debug anonymous.73bfef7a-d0e0-42c4-92fe-e03f5a17c080: delaying completion of execution.sync 103
</SNIP>

The execution.sync will not complete until the previous 100 msgs complete - that is, until they are consumed or the queue drops back out of flow control.  My guess is that the client is blocked waiting for the completion of that execution.sync, which is causing the hang.

Unclear why the execution.sync is being sent by the client, as it doesn't appear to call sender.sync() in the send loop.  Investigation continues.

Comment 2 Ken Giusti 2011-03-09 20:57:17 UTC
Ah! Sorry, I should've looked closer at what was going on here - this is NOT-A-BUG.


If you look at the qpid-send.cpp code, when it has finished sending all its messages, it does a session.sync().  This causes the qpid-send command to wait until all messages that it has sent have been acknowledged by the broker.

However, since flow control is active, the broker will not acknowledge all the messages until enough are consumed to drop the queue's contents down to the resume limit (which will happen when there are 9 msgs on the queue, given the configuration).

So the test sends all 100 messages - which it can do because its capacity is > 100.  But before it shuts down the session, it waits for all messages to be ack'ed (in practice, a very good thing to do).  Since there are no consumers, it will wait forever.

As you've mentioned, the only way to "unhang" the sender is to run a consumer in a different thread.

