Bug 506769 - Broker redelivering dequeued messages (?)
Status: NEW
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Ken Giusti
QA Contact: MRG Quality Engineering
Reported: 2009-06-18 11:53 EDT by Gordon Sim
Modified: 2013-02-24 20:30 EST
Doc Type: Bug Fix

External Trackers
Tracker      ID         Priority  Status  Summary  Last Updated
Apache JIRA  QPID-3079  None      None    None     Never

Description Gordon Sim 2009-06-18 11:53:58 EDT
Description of problem:

See https://issues.apache.org/jira/browse/QPID-1917

Version-Release number of selected component (if applicable):

Qpid built from trunk on 18th June 2009.
Comment 1 Carl Trieloff 2009-06-18 16:25:41 EDT

The bug is that we never wait for the async IO (AIO) to complete on dequeue. In a txn we block on the AIO on the queue.

On enqueue we don't ack until the AIO has completed. On dequeue, however, we don't wait for isDequeueComplete() before sending the ack for the dequeue.
Comment 2 Kim van der Riet 2009-06-25 13:47:51 EDT
Initial test overview:
* Start a broker with a store:
./qpidd --load-module path/to/msgstore.so --auth no --data-dir path/to/datadir --log-enable info+ --port 0

* Load the broker (and store) with 24 persistent messages (contained in file messages.txt):
./sender -b localhost -p ${PORT} --exchange TEST_EXCHANGE --routing-key TEST_QUEUE --durable yes < messages.txt

* Consume 8 messages, 5 at a time, then kill the broker:
./receiver -b localhost -p ${PORT} --messages 8 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE --trace; kill -9 ${BROKER_PID}

* Restart the broker and consume the remaining messages, making sure that there are no duplicates. This usually fails and all of the 8 messages consumed in the previous step are resent.
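
One way to automate the duplicate check in the last step is sketched below. It assumes that the output of each receiver run has been redirected to a file with one message body per line, and that the bodies in messages.txt are unique. The find_dups helper and the file names are hypothetical and are not part of the qpid tools.

```shell
# find_dups FIRST SECOND
# Print every line that appears in both files (hypothetical helper).
# Assumes one message body per line and unique message bodies.
find_dups() {
    first_sorted=$(mktemp) || return 2
    second_sorted=$(mktemp) || return 2
    # comm requires sorted input; -u also drops repeats within one file
    sort -u "$1" > "$first_sorted"
    sort -u "$2" > "$second_sorted"
    # -12 suppresses lines unique to either file, leaving only common lines
    comm -12 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}
```

With the first run captured in first.txt and the second in second.txt, an empty result from "find_dups first.txt second.txt" would indicate that no message was redelivered.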

Because the broker is killed immediately after the receiver completes, the dequeue records still held in the store's write cache are lost: the store's flush timer has not yet fired. However, if the broker is stopped rather than killed (using -TERM), then the store is flushed and the records are written before the broker terminates, and the test passes (no dups). The Java test referred to in the Apache JIRA above was thought to be stopping (rather than killing) the broker, but appears to first stop and then immediately kill the broker.

Even so, this kill test, which should not fail, exposes a missing piece in the broker. The receiver should not exit until the dequeues have hit the disk. Currently, the broker does not flush the dequeues at the end of the receive portion of the test, even though it reports the MessageAcceptBodys as complete.
Comment 3 Kim van der Riet 2009-06-26 08:56:25 EDT
More details on the test above:

1. Create a message file messages.txt containing 24 messages, one message per line.

2. In one window, start a broker after removing the previous store:
rm -rf ~/.qpidd /tmp/lock /tmp/systemId /tmp/rhm
./qpidd --load-module /path/to/msgstore.so --auth no --data-dir /tmp --log-enable info+ --port 0
pgrep qpidd

Note both the port number printed by the broker and the PID returned by pgrep; these are needed for steps 3 and 4.

3. In another window in the test dir, prepare the broker and store by doing the following (the port number is that observed above):
export PORT=12345
qpid-config -a localhost:${PORT} add exchange direct TEST_EXCHANGE
qpid-config -a localhost:${PORT} add queue TEST_QUEUE --durable
qpid-config -a localhost:${PORT} bind TEST_EXCHANGE TEST_QUEUE TEST_QUEUE
./sender -b localhost -p ${PORT} --exchange TEST_EXCHANGE --routing-key TEST_QUEUE --durable yes < messages.txt

4. Now extract a small number of messages and immediately kill the broker (the PID used in the kill is that observed in step 2 above):
./receiver -b localhost -p ${PORT} --messages 8 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE; kill -9 4321

5. Restart the broker and rerun receiver to receive the remaining messages (with --port 0 the restarted broker may report a different port, so update PORT accordingly):
./qpidd --load-module /path/to/msgstore.so --auth no --data-dir /tmp --log-enable info+ --port 0
./receiver -b localhost -p ${PORT} --messages 16 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE

Alternatively, look at the store and check for the presence of the 8 dequeue records. If the store source is checked out, then the jhexdump script will help:
./tests/jrnl/jhexdump /tmp/rhm/jrnl/000d/TEST_QUEUE
This creates files j0.txt through j7.txt. Look at the j0.txt file to see the enqueue and dequeue records.
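
For step 1, the message file could be generated with something like the sketch below. The make_messages helper and the message-NNN body format are hypothetical; any 24 distinct lines would serve the same purpose.

```shell
# make_messages COUNT FILE
# Write COUNT distinct lines to FILE, one message per line (hypothetical helper).
make_messages() {
    count=$1
    out=$2
    : > "$out"                                   # create or truncate the file
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'message-%03d\n' "$i" >> "$out"   # zero-padded so bodies sort in order
        i=$((i + 1))
    done
}
```

Calling "make_messages 24 messages.txt" would produce the file used by the sender in step 3. Keeping the bodies distinct makes any redelivered message easy to spot in the receiver output.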
Comment 5 Gordon Sim 2011-05-17 09:06:12 EDT
See also https://issues.apache.org/jira/browse/QPID-3079
