Bug 755022

Summary: Consumer blocked after PFC is activated
Product: Red Hat Enterprise MRG
Reporter: Petr Matousek <pematous>
Component: qpid-cpp
Assignee: Ken Giusti <kgiusti>
Status: NEW
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium
Priority: medium
Version: 2.0
CC: gsim, jross
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Attachments: test reproducer (flags: none)

Description Petr Matousek 2011-11-18 15:14:33 UTC
Description of problem:

I have the broker running with the default producer flow control (PFC) settings (80% threshold), and a simple script with one sender and one consumer.

The sender creates a queue with qpid.max_count set to 10 and then tries to send 10 messages to that queue; after that, the consumer tries to consume the messages from the queue.

With the default settings, PFC is activated at message number 8 (80% of the queue limit of 10) and the sender is blocked until some messages are consumed. That is correct.
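Where the trigger point comes from can be worked out directly. A minimal sketch, assuming the flow-stop threshold is the stated 80% of qpid.max_count rounded down (the broker's exact rounding rule is not given in this report); `flow_stop_count` and `pfc_active` are hypothetical helpers, not qpid APIs:

```python
def flow_stop_count(max_count, stop_percent=80):
    """Queue depth at which producer flow control engages (hypothetical helper).

    Assumes the threshold is stop_percent of qpid.max_count, rounded down.
    """
    return max_count * stop_percent // 100


def pfc_active(depth, max_count, stop_percent=80):
    """True once the queue depth has reached the flow-stop threshold."""
    return depth >= flow_stop_count(max_count, stop_percent)


# The reproducer's queue: qpid.max_count = 10, default 80% threshold.
# With nothing consumed yet, message n brings the queue depth to n.
for n in range(1, 11):
    print(n, "PFC active" if pfc_active(n, 10) else "accepted")
```

With these assumptions messages 1 to 7 are accepted and message 8 is the first to find flow control engaged, which matches the behavior reported above.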

When sync=False is passed to the sender's send() method, all the messages are sent (not all of them are really sent, but the sender is not blocked). That is correct as well,
but the receiver then blocks in its fetch() method.

The receiver should not be blocked.

Please see additional info for details.

This issue also reproduces with the latest packages (0.12-6).

Version-Release number of selected component (if applicable):
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-0.10-9.el5
qpid-cpp-client-devel-0.10-9.el5
qpid-cpp-client-devel-docs-0.10-9.el5
qpid-cpp-client-ssl-0.10-9.el5
qpid-cpp-server-0.10-9.el5
qpid-cpp-server-cluster-0.10-9.el5
qpid-cpp-server-devel-0.10-9.el5
qpid-cpp-server-ssl-0.10-9.el5
qpid-cpp-server-store-0.10-9.el5
qpid-cpp-server-xml-0.10-9.el5
qpid-java-client-0.10-9.el5
qpid-java-common-0.10-9.el5
qpid-java-example-0.10-9.el5
qpid-qmf-0.10-10.el5
qpid-qmf-devel-0.10-10.el5
qpid-tools-0.10-6.el5

How reproducible:
100%

Steps to Reproduce:
1. Start qpidd with the default PFC settings.
2. Run the attached test reproducer.
3. Observe that the consumer is not able to consume the messages until PFC is deactivated.
  
Actual results:
the consumer is blocked after PFC activation

Expected results:
the consumer is not blocked after PFC activation

Additional info:

With the following modification of the reproducer code:
- receiver.fetch(0)
+ receiver.fetch(None)

the first message is consumed, after which the consumer blocks again while fetching the second message.

Moreover, the issue affects the whole session, not only the queue: a consumer on the same session that is waiting for queueThresholdExceeded events is also unable to consume messages. It seems that all receivers on the session are blocked until PFC is deactivated.

This can be reproduced by uncommenting lines 26 and 27 in the reproducer code.

A consumer on another session is able to consume messages; this can be reproduced by uncommenting lines 12, 16, and 29-33 in the reproducer code.

Comment 1 Petr Matousek 2011-11-18 15:17:28 UTC
Created attachment 534411
test reproducer

Comment 2 Ken Giusti 2011-11-21 20:57:48 UTC
Producer flow control blocks the sender by essentially holding off message transfers at the session level, not per sender. AMQP 0-10 doesn't really have a per-sender context, so we have to block at the session.

The reproducer script shares one session between both the sender and the receiver.  Since the sender triggers the flow control state on the session, the receiver is likewise blocked because it shares the same session state.

I'd expect the same thing to happen with multiple senders sharing a session: one sender hitting flow control on a particular queue will cause the other senders sharing that session (but sending to different, likely unblocked, queues) to block as well.
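The session-level blocking described in this comment can be illustrated with a toy model. The classes below are hypothetical, ignore flow resumption, and are not taken from the qpid code base; they only encode the rule that flow-control state lives on the session rather than on the sender. Two senders share one session, one of them fills its queue, and the other is then held as well, while a sender on a separate session is unaffected:

```python
class ToyQueue:
    """A queue with a qpid.max_count-style limit and an 80% flow-stop threshold."""

    def __init__(self, max_count, stop_percent=80):
        self.depth = 0
        self.stop_at = max_count * stop_percent // 100


class ToySession:
    """Flow-control state lives here, shared by every sender on the session."""

    def __init__(self):
        self.flow_stopped = False  # never reset: this toy does not model resumption


class ToySender:
    def __init__(self, session, queue):
        self.session = session
        self.queue = queue

    def send(self):
        """Return True if a synchronous send completes, False if it would block."""
        if self.session.flow_stopped:
            # Held even if this sender's own queue is below its threshold.
            return False
        self.queue.depth += 1
        if self.queue.depth >= self.queue.stop_at:
            # Hold at the session level: 0-10 has no per-sender context.
            self.session.flow_stopped = True
            return False
        return True


shared = ToySession()
full_queue, idle_queue = ToyQueue(10), ToyQueue(10)
a = ToySender(shared, full_queue)
b = ToySender(shared, idle_queue)
print([a.send() for _ in range(8)])  # seven True, then False: the 8th send triggers PFC
print(b.send())  # False: blocked too, although idle_queue is empty
c = ToySender(ToySession(), idle_queue)
print(c.send())  # True: a sender on its own session still completes
```

The same rule explains the reporter's observation that a consumer on another session can still consume messages.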

Since AMQP 0-10 only allows us to block at the session level, I don't think we can address this (until perhaps AMQP 1.0).

Gordon, you understand 1.0 and the client side better than I do: is my understanding correct?

Comment 3 Ken Giusti 2011-11-21 20:58:38 UTC
Missed setting the needinfo field

Comment 4 Gordon Sim 2011-11-22 09:44:37 UTC
The example is in python, should the component be python-qpid?

The issue does indeed look to be a result of the way sync and completions currently work. On a given session, the commands are all ordered. When a sync is issued, the broker is supposed to respond once it has completed all the outstanding commands. A fetch on a receiver with no credit will flush the broker, which involves a sync. However, if the completion stream is already delayed by flow control, the broker won't respond to it.

I believe in theory this could be addressed using command level sync flags instead of the sync command, but quite honestly I wonder whether it is worth it. In AMQP 1.0 the situation will be greatly improved.
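The mechanism in this comment, and the alternative of command-level sync flags, can be sketched with a toy model of completion on a single 0-10 session. This is an illustration under stated assumptions, not qpid code: the class and method names are hypothetical, each command is just an integer id, and PFC is modeled as the broker withholding completion of transfers 8 to 10 from the reproducer.

```python
class ToySession:
    """Toy model of command completion on one AMQP 0-10 session (not qpid code).

    Rules modeled: the broker may withhold completion of individual commands
    (as PFC does for message transfers), and a session-wide sync is answered
    only when every earlier command on the session has completed.
    """

    def __init__(self):
        self.next_id = 0
        self.held = set()  # ids of commands whose completion is being withheld

    def issue(self, held=False):
        """Issue a command and return its id; held=True mimics a PFC-held transfer."""
        cmd = self.next_id
        self.next_id += 1
        if held:
            self.held.add(cmd)
        return cmd

    def sync_completes(self):
        """A sync covers all earlier commands, held transfers included."""
        return not self.held

    def is_complete(self, cmd):
        """Completion of one command, as a per-command sync flag would report it."""
        return cmd < self.next_id and cmd not in self.held

    def release(self):
        """PFC lifted: the broker completes every held command."""
        self.held.clear()


session = ToySession()
for n in range(1, 11):
    session.issue(held=n >= 8)  # sync=False sends; PFC holds transfers 8 to 10
flush = session.issue()  # the receiver's flush, issued by fetch()
print(session.sync_completes())  # False: stuck behind the held transfers
print(session.is_complete(flush))  # True: the flush itself is not held
session.release()
print(session.sync_completes())  # True
```

In this model, waiting on the flush's own completion succeeds while a session-wide sync does not, which is the distinction the comment draws between the sync command and command-level sync flags.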

Comment 5 Justin Ross 2011-11-22 15:24:08 UTC
Moving this out to "next version".  I suspect we'll ultimately choose not to address this in the current PFC implementation.  It may, however, be worth documenting somewhere (isolate senders in their own sessions to avoid the problematic interaction described here).  Does that make sense?  Where would be appropriate?
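The workaround suggested here (isolating senders in their own sessions) amounts to opening a separate session per endpoint. A minimal sketch; the helper name is hypothetical, and it only assumes that `connection` offers the `session()`, `sender()` and `receiver()` calls of the python-qpid `qpid.messaging` API:

```python
def open_isolated_endpoints(connection, address):
    """Open a sender and a receiver for `address` on two separate sessions.

    Hypothetical helper. `connection` is assumed to behave like an open
    qpid.messaging.Connection. Because producer flow control holds the
    sender's whole session, giving the receiver its own session keeps its
    fetch() calls from waiting behind flow-controlled transfers.
    """
    sender_session = connection.session()
    receiver_session = connection.session()
    sender = sender_session.sender(address)
    receiver = receiver_session.receiver(address)
    return sender, receiver
```

With python-qpid, `connection` would be a `qpid.messaging.Connection` created with a broker address (for example `Connection("localhost:5672")`, an assumed address) and opened with `open()`. A blocked `send()` on the sender's session then no longer holds up the receiver's `fetch()`, consistent with the reporter's observation that a consumer on another session can still consume messages.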

Comment 6 Ken Giusti 2011-11-22 15:36:02 UTC
Justin,

There is an extensive description of flow control in the messaging user's guide; we should add a note there, at least.

We don't currently reference flow control in the programmer's guide.  There is a section that discusses configuring sender capacity - perhaps it should contain a reference to the discussion of flow control in the user's guide?