Bug 1502855
Summary: | sending durable messages concurrently from 0-10 and 1.0 producers causes high latency on ACKing the 1.0 ones | |
---|---|---|---
Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec>
Component: | qpid-cpp | Assignee: | Kim van der Riet <kim.vdriet>
Status: | CLOSED ERRATA | QA Contact: | Petr Matousek <pematous>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 3.2 | CC: | esammons, gsim, jross, kim.vdriet, mcressma, pematous
Target Milestone: | 3.2.7 | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | qpid-cpp-1.36.0-12 | Doc Type: | Bug Fix
Doc Text: | **Cause:** When durable messages are sent to the same queue using both AMQP 1.0 and 0-10 concurrently, a condition can be triggered in which the store no longer flushes the 1.0 messages. **Consequence:** When this condition is triggered, the 1.0 messages are flushed only by 0-10 activity on the queue; if there is none, the messages are never flushed. **Fix:** The code was changed so that any combination of 0-10 and 1.0 messages is processed successfully without causing messages to be delayed. **Result:** | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2017-11-20 16:52:55 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Pavel Moravec
2017-10-16 20:34:28 UTC
Durable queues are mandatory: no durable queues or messages, no reproducer.

Smaller reproducer using just one queue:

1) In one terminal, run the commands below in a script:

```shell
rm -rf /var/lib/qpidd/*
killall qpid-send qpid-receive
service qpidd restart

# create durable queue
qpid-receive -a "DurablePerf; {create:always, node:{type:queue, durable:true}}"

# have a 1.0 consumer running forever
qpid-receive -a DurablePerf -m 1000000 --print-content=no --print-headers=no \
    --connection-options "{protocol:'amqp1.0'}" --capacity=1 -f &

# repeatedly send a 1.0 message and print the real time spent on each send
while true; do
    (time qpid-send -a DurablePerf --capacity=1 --durable=yes -m 1 \
        --content-size=$((RANDOM%500+1000)) \
        --connection-options "{protocol:'amqp1.0'}") 2>&1 | grep real
    sleep 1
done &
```

2) In another terminal, send some 0-10 messages to the same queue from time to time:

```shell
while true; do
    qpid-send -a DurablePerf --capacity=1 --durable=yes -m 10 --content-size=$((RANDOM%1500))
    sleep 1
    ps aux | grep -v grep | grep qpid-send
    sleep 1
    ps aux | grep -v grep | grep qpid-send
    date
done
```

3) Check the output of the first terminal: the "real" time (how long each qpid-send takes to send its message and terminate) will be below 0.1s. But once a race condition is hit (concurrent sending of a 0-10 and a 1.0 message?), the times stay above 1s forever.
Since that time, the second terminal shows:

```
Tue Oct 17 12:17:30 CEST 2017
root 21615 0.0 0.0 202836 5636 pts/1 Sl 12:17 0:00 qpid-send -a DurablePerf --capacity=1 --durable=yes -m 1 --content-size=1206 --connection-options {protocol:'amqp1.0'}
root 21615 0.0 0.0 202836 5636 pts/1 Sl 12:17 0:00 qpid-send -a DurablePerf --capacity=1 --durable=yes -m 1 --content-size=1206 --connection-options {protocol:'amqp1.0'}
Tue Oct 17 12:17:33 CEST 2017
root 21632 0.0 0.0 202836 5640 pts/1 Sl 12:17 0:00 qpid-send -a DurablePerf --capacity=1 --durable=yes -m 1 --content-size=1327 --connection-options {protocol:'amqp1.0'}
root 21632 0.0 0.0 202836 5640 pts/1 Sl 12:17 0:00 qpid-send -a DurablePerf --capacity=1 --durable=yes -m 1 --content-size=1327 --connection-options {protocol:'amqp1.0'}
Tue Oct 17 12:17:35 CEST 2017
```

i.e. the same qpid-send (compare the PIDs) is still running after the 1-second sleep.

Interesting observation: if I stop the 0-10 sender, the 1.0 sender gets stuck. Forever. Well, until I send another 0-10 message, which triggers a disposition frame from qpidd to the 1.0 qpid-send, and that sender can finally terminate.

So, why does qpidd wait to send the 1.0 disposition performative until _another_ 0-10 message arrives?

> So, why does qpidd wait to send the 1.0 disposition performative until _another_ 0-10 message arrives?
In 1.0, there are no explicit syncs defined in the protocol, and so at present there are no flushes on the journal. With 0-10 there are. I suspect this is another bug in the journal, where it fails to flush automatically in some cases. For durable AMQP 1.0 messages, the disposition is sent back only after the journal calls back to indicate successful completion of the enqueue operation.
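The completion flow described above can be sketched as follows. This is a minimal illustration only; the class and function names are assumptions, not qpid-cpp's actual linearstore API:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sketch: for a durable AMQP 1.0 message, the broker can send the
// disposition only from the journal's completion callback, which runs
// when the journal flushes. If nothing ever flushes the journal, the
// callback never runs and the 1.0 sender waits indefinitely.
class JournalSketch {
    std::vector<std::function<void()>> pending;  // completions awaiting a flush
public:
    // Record an enqueue; onComplete fires once the write is durable.
    void enqueue(std::function<void()> onComplete) {
        pending.push_back(std::move(onComplete));  // buffered, not yet flushed
    }

    // Flush the journal: in the broker this is driven either by an
    // inactivity timer or by an explicit sync from the 0-10 codepath.
    void flush() {
        for (auto& cb : pending)
            cb();          // data is durable now; complete the enqueues
        pending.clear();
    }
};
```

In this model, if the automatic (timer-driven) flush stops happening, `flush()` is only ever reached via 0-10 activity, which matches the observed behaviour: the 1.0 disposition arrives only after another 0-10 message.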
(In reply to Gordon Sim from comment #2)

> > So, why does qpidd wait to send the 1.0 disposition performative until _another_ 0-10 message arrives?
>
> In 1.0, there are no explicit syncs defined in the protocol, and so at
> present there are no flushes on the journal. With 0-10 there are. I suspect
> this is another bug in the journal, where it fails to flush automatically in
> some cases. For durable AMQP 1.0 messages, the disposition is sent back only
> after the journal calls back to indicate successful completion of the enqueue
> operation.

That makes sense. If the 1.0 disposition is sent back only after a trigger from linearstore, then linearstore must be the guilty party. But how can it be buggy if it doesn't care about, or even know, the protocol version of a received message, while concurrent sending of both versions is the key factor in the reproducer?

Idea:
- a 1.0 message is received
- a 0-10 message is received
- the 1.0 message is enqueued to the queue and stored in the journal
- the 0-10 message as well
- the journal flush trigger fires now, but is "processed"(?) by the 0-10 sender, not by the 1.0 one

I am not sure about the last point, but I am quite confident that if a 0-10 message is enqueued after a 1.0 one and before the journal flush is triggered for the 1.0 message (or, for an observer, before the disposition frame is sent to the 1.0 sender), then this bug is hit. Since:

> Interesting observation: if I stop the 0-10 sender, the 1.0 sender gets stuck.
> Forever. Well, until I send another 0-10 message that triggers disposition
> frame is sent from qpidd to the 1.0 qpid-send and that sender can finally terminate.

the same "disposition frame stuck" state is triggered again. Every time. (i.e. once we hit this BZ, the "workaround" of sending a 0-10 message is also its trigger.)

The 1.0 codepath *never* flushes the queue explicitly. There is no explicit protocol 'hint' for it to do so. It would need to infer that flushing made sense from the absence of any more data incoming on that connection (which at present it has no knowledge of either).
So my guess is that something is happening whereby the automatic flushing of the journal by itself stops, so completion relies on explicit flushes, which come only from the 0-10 codepath.

Gordon's analysis is correct. The problem seems to be an error in the state machine of the InactivityFireEvent TimerTask. In cases where an external flush (which is provided by the presence of AMQP 0-10 messages) arrives just before the timer fires, the state machine moves from state RUNNING to state FLUSHED. A pathway for the state to move from FLUSHED to FIRED once the timer actually fires was missing, and this error put the timer into a state from which it could never be reset. By adding this pathway to the state machine, the problem has been solved, and the reproducer in comment #1 now works.

I can confirm that the qpid-cpp-server-1.36.0-12.el6.x86_64 package fixes this bug. Ran a few other tests (i.e. the "previous" BZ reproducers) and found no issue. Thanks for the prompt patch.

I've been running the reproducer from comment #1 for more than 20 hours. The issue did not appear on any of the supported platforms, so the issue is considered fixed. An internal test case has been linked.

packages under test:
* qpid-cpp-1.36.0-13 -> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3253
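The state-machine error described in the fix analysis above can be sketched as follows. The states, transitions, and names here are assumptions reconstructed from the comment, not the actual qpid-cpp linearstore source:

```cpp
#include <cassert>

// Sketch of the InactivityFireEvent TimerTask state machine.
// The bug: an external flush moved the state from RUNNING to FLUSHED,
// but when the timer then fired there was no FLUSHED -> FIRED
// transition, so the timer could never be reset (rearmed) again.
enum class State { RUNNING, FLUSHED, FIRED };

class InactivityFireEventSketch {
    State state = State::RUNNING;
public:
    // An external flush (driven by 0-10 traffic) arrives before the timer fires.
    void externalFlush() {
        if (state == State::RUNNING)
            state = State::FLUSHED;
    }

    // The inactivity timer fires.
    void fire() {
        if (state == State::RUNNING) {
            state = State::FIRED;   // normal path: the timer performs the flush
        } else if (state == State::FLUSHED) {
            state = State::FIRED;   // the pathway that was missing before the fix;
                                    // without it, the state stays FLUSHED forever
                                    // and reset() below can never rearm the timer
        }
    }

    // Rearm the timer for the next inactivity period; only legal once FIRED.
    bool reset() {
        if (state != State::FIRED)
            return false;
        state = State::RUNNING;
        return true;
    }

    State current() const { return state; }
};
```

With the FLUSHED -> FIRED branch in place, a 0-10 flush racing the timer no longer wedges the task, so the automatic flush keeps running and 1.0 dispositions are no longer dependent on further 0-10 traffic.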