Description of problem:
See https://issues.apache.org/jira/browse/QPID-1917

Version-Release number of selected component (if applicable):
Qpid built from trunk on 18th June 2009.
The bug is that we never wait for the async IO (AIO) to complete on dequeue. In a transaction we block on the AIO on the queue. On enqueue we do not ack until the AIO has completed. On dequeue, we do not wait before sending the ack: isDequeueComplete() is not checked before the ack for the dequeue is sent.
Initial test overview:

* Start a broker with a store:
    ./qpidd --load-module path/to/msgstore.so --auth no --data-dir path/to/datadir --log-enable info+ --port 0
* Load the broker (and store) with 24 persistent messages (contained in file messages.txt):
    ./sender -b localhost -p ${PORT} --exchange TEST_EXCHANGE --routing-key TEST_QUEUE --durable yes < messages.txt
* Consume 8 messages, 5 at a time, then kill the broker:
    ./receiver -b localhost -p ${PORT} --messages 8 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE --trace; kill -9 ${BROKER_PID}
* Restart the broker and consume the remaining messages, making sure that there are no duplicates.

This usually fails: all 8 of the messages consumed in the previous step are resent. Because the broker is killed immediately after the receive completes, the dequeue records still held in the store write cache are lost, as the store's flush timer has not fired yet. However, if the broker is stopped rather than killed (using -TERM), then the store is flushed and the records are written before the broker terminates, and the test passes (no duplicates).

The Java test referred to in the Apache Jira above was thought to be stopping (rather than killing) the broker, but it appears to stop the broker first and then immediately kill it. In spite of this, there is still a missing piece in the broker which is exposed by this kill test, and the kill test should not fail. The receiver should not exit until the dequeues have hit the disk. Currently, the dequeues are not flushed by the broker at the end of the receive portion of the test, even though the broker affirms that the MessageAcceptBodys are complete.
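The last step of the overview calls for checking that no message is delivered twice. A minimal sketch of such a check follows, assuming the output of each receiver run has been redirected to a file with one message body per line. The file names first_run.txt and second_run.txt and the helper find_dups are hypothetical, and the sample contents below are made up purely to show the check working.

```shell
# find_dups FILE1 FILE2: print every line that occurs more than once
# across the two files (i.e. a message that was delivered twice).
find_dups() {
    sort "$1" "$2" | uniq -d
}

# Illustrative data only: message_02 appears in both runs.
printf 'message_01\nmessage_02\n' > first_run.txt
printf 'message_02\nmessage_03\n' > second_run.txt

# Prints the duplicated message bodies; empty output means no duplicates.
find_dups first_run.txt second_run.txt
```

With real captures from the two receiver runs, any output from find_dups would indicate that a dequeue was lost and a message was resent.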
More details on the test above:

1. Create a message file messages.txt containing 24 messages, one message per line:
    message_01
    message_02
    ...
    message_24

2. In one window, start a broker after removing the previous store:
    rm -rf ~/.qpidd /tmp/lock /tmp/systemId /tmp/rhm
    ./qpidd --load-module /path/to/msgstore.so --auth no --data-dir /tmp --log-enable info+ --port 0
    pgrep qpidd
   Note both the port number printed by the broker and the pid returned by pgrep; these are needed for steps 3 and 4.

3. In another window in the test dir, prepare the broker and store by doing the following (the port number is that observed above):
    export PORT=12345
    qpid-config -a localhost:${PORT} add exchange direct TEST_EXCHANGE
    qpid-config -a localhost:${PORT} add queue TEST_QUEUE --durable
    qpid-config -a localhost:${PORT} bind TEST_EXCHANGE TEST_QUEUE TEST_QUEUE
    ./sender -b localhost -p ${PORT} --exchange TEST_EXCHANGE --routing-key TEST_QUEUE --durable yes < messages.txt

4. Now extract a small number of messages and immediately kill the broker (the PID used in the kill is that observed in step 2 above):
    ./receiver -b localhost -p ${PORT} --messages 8 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE; kill -9 4321

5. Restart the broker and rerun receiver to receive the remaining messages:
    ./qpidd --load-module /path/to/msgstore.so --auth no --data-dir /tmp --log-enable info+ --port 0
    ./receiver -b localhost -p ${PORT} --messages 16 --ack-frequency 5 --credit-window 5 --queue TEST_QUEUE

Alternatively, look at the store and check for the presence of the 8 dequeue records. If the store source is checked out, then the jhexdump script will help:
    ./tests/jrnl/jhexdump /tmp/rhm/jrnl/000d/TEST_QUEUE
will create files j0.txt through j7.txt. Look at the j0.txt file to see the enqueue and dequeue records.
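The messages.txt file from step 1 can be generated rather than typed by hand. This is one possible way to do it, assuming a POSIX shell with seq available; it is not part of the original test.

```shell
# Write message_01 .. message_24 to messages.txt, one message per line.
for i in $(seq 1 24); do
    printf 'message_%02d\n' "$i"
done > messages.txt
```

The zero-padded numbering matches the message names listed in step 1.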
See also https://issues.apache.org/jira/browse/QPID-3079