Bug 508144 - A broker stopped and restarted does not remember 'redelivered' status correctly
Summary: A broker stopped and restarted does not remember 'redelivered' status correctly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
urgent
medium
Target Milestone: 1.3
: ---
Assignee: Kim van der Riet
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On: 548141
Blocks: 640618
TreeView+ depends on / blocked
 
Reported: 2009-06-25 19:41 UTC by Mike Cressman
Modified: 2018-10-27 12:50 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
After a broker restart, when a durable message was recovered and sent again, it was not marked as "redelivered" if it had been previously sent but not acknowledged. As a result, the client was not notified of this fact, and the message may have been duplicated. With this update, all recovered messages are now marked as "redelivered", thus making message delivery more reliable.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:00:26 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Test code to reproduce the problem (2.36 KB, application/x-gzip)
2009-06-25 19:50 UTC, Mike Cressman
no flags Details
The log from java failover message redelivered testing (16.92 KB, text/x-log)
2010-10-06 08:55 UTC, Frantisek Reznicek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0773 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 2010-10-14 15:56:44 UTC

Description Mike Cressman 2009-06-25 19:41:39 UTC
Description of problem:

If a standalone, non-clustered qpidd is killed while in the middle of servicing a client, it subsequently fails to set the redelivered status for those messages that were already delivered to the client but not acknowledged.

Version-Release number of selected component (if applicable):
1.1.1

How reproducible:
100%

Steps to Reproduce:
(general -- see below for more detail)
1. put durable messages on a durable queue
2. while reading the messages with a client app, stop the broker ('/etc/init.d/qpid stop') 
3. start the broker, and print out the delivery_properties of the first message received after the broker starts up again
  
Actual results:
'redelivered' is not set as one of the delivery_properties

Expected results:
'redelivered=True' is one of the delivery_properties

Additional info:

Sample python client code included.

1. gunzip into a directory.  Create two terminal windows.  Use one to start and stop the broker and use the other to run the python clients.
2. With the broker running, run 'python newq.py' to create the queue.
3. Run 'python putmsg.py' to load the messages into the queue.
3. Run 'python getmsg.py' for a few seconds then Ctrl-C.
4. Run 'python getmsg.py' again to see that the first message has the redelivered flag in the deliver_properties.  Let getmsg finish.
5. Rerun 'python putmsg.py' to reload the messages.
6. Run 'python getmsg.py'.
7. After a few seconds, stop the broker 'sudo /etc/init.d/qpidd stop'.
8. Restart the broker 'sudo /etc/init.d/qpidd start'.
9. Note that the deliver_properties of the first message after the restart do not include the redelivered flag even though you can tell by the message numbering that it's been received before.

Comment 1 Mike Cressman 2009-06-25 19:50:17 UTC
Created attachment 349449 [details]
Test code to reproduce the problem

Comment 2 Gordon Sim 2009-06-26 09:18:29 UTC
A simple solution is to mark all recovered messages as potentially redelivered. That avoids any false negatives but obviously may result in a lot of false positives.

To improve on this, the intention to deliver a particular message (or up to a particular mesage) would need to be recorded in the journal (and flushed to disk) before the delivery was attempted. Avoiding hefty performance penalties for this would be important (and/or making it an optional feature).

Comment 4 Kim van der Riet 2009-12-16 18:55:46 UTC
Checked in a change which sets the Message::redelivered flag true on recovery of all messages.

qpid r.891362
store r.3747

There are no tests for this yet, waiting for an update to the python API which will allow examination of the redelivered flag on a retrieved message (see bug 548141).

Comment 5 Kim van der Riet 2009-12-18 18:10:38 UTC
Test for redelivered flag being set on recovery added in r.3751.

Comment 6 Mike Cressman 2010-01-13 20:42:50 UTC
Customer reports (IT 384493) that JMSRedelivered flag is not set during recovery

  Issue is noted with

  1. Standalone broker [at restart]
  2. Clustered broker  [at failover]

When this is tested, Java client should be verified as well.

Comment 8 Frantisek Reznicek 2010-10-05 12:50:11 UTC
The python part (both cases, interruption of the client|restart of the broker) of the issue was verified and is functional.
Final feedback on java part soon.

Comment 9 Kim van der Riet 2010-10-05 14:56:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Recovery of Store (by restarting broker with durable messages) never sets the redelivered flag when sending the recovered messages which have been previously sent but not acknowledged.
Consequence: The Client is not notified that this message has been previously sent, and could potentially have already processed it, possibly resulting in a duplicate.
Fix: Because the store does not save the send state, it now marks _all_ recovered messages as redelivered.
Results: All recovered messages are marked as redelivered. This is in effect switching false positives for false negatives, which is better from a reliability point of veiw.

Comment 10 Jaromir Hradilek 2010-10-05 21:01:19 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-Cause: Recovery of Store (by restarting broker with durable messages) never sets the redelivered flag when sending the recovered messages which have been previously sent but not acknowledged.
+After a broker restart, when a durable message was recovered and sent again, it was not marked as "redelivered" if it had been previously sent but not acknowledged. As a result, the client was not notified of this fact, and the message may have been duplicated. With this update, all recovered messages are now marked as "redelivered", making the message delivery more reliable.-Consequence: The Client is not notified that this message has been previously sent, and could potentially have already processed it, possibly resulting in a duplicate.
-Fix: Because the store does not save the send state, it now marks _all_ recovered messages as redelivered.
-Results: All recovered messages are marked as redelivered. This is in effect switching false positives for false negatives, which is better from a reliability point of veiw.

Comment 11 Frantisek Reznicek 2010-10-06 08:54:11 UTC
The testing status:

The python case:
a] re-running client after client stopped by signal
  All messages are marked 'Redelivered' when client is restarted - OK
b] rerunning client after client disconnected at broker's shutdown
  All messages are marked 'Redelivered' when client is restarted - OK


The java case:
a] re-running client after client stopped by signal
  There are just few tens of messages marked 'Redelivered' after client is restarted - This behavior is in contrary of comment's 2 definition.

b] rerunning client after client disconnected at broker's shutdown
  All messages are marked 'Redelivered' when client is restarted - OK

c] client run after failover event
  There are just few tens of messages marked 'Redelivered' after client fails over to another clustered broker - This behavior is in contrary of comment's 2 definition.


Raising needinfo for crosscheck and keeping the state.

Comment 12 Frantisek Reznicek 2010-10-06 08:55:39 UTC
Created attachment 451839 [details]
The log from java failover message redelivered testing

Comment 13 Kim van der Riet 2010-10-06 11:15:53 UTC
The action of setting the redelivered flag on *all* recovered messages applies to the shutdown and restart of the broker only. If a client is shut down, then only the unacknowledged messages being resent will have the redelivered flags set, as expected.

Python:
a) Not correct. The broker has not been restarted, and thus only the messages which were actually sent but not acknowledged should have the redelivered flag set. This case should behave the same as Java case a). Make sure acknowledgements are enabled. If this is indeed the case with acks enabled, then raise a separate bug. You may want to check with Rafi on this.
b) Correct, since the broker has been restarted and the messages in the store recovered.

Java:
a) Correct, since the broker has not been restarted and there has been no store recovery. Only messages actually sent but not acknowledged will be marked redelivered.

b) Correct, since the broker has been restarted.

c) I am uncertain of how clustering behaves in this case, but it sounds correct. Similar to case a), the broker has not been restarted. Check with Alan.

Comment 14 Frantisek Reznicek 2010-10-06 13:46:35 UTC
The behavior was discussed with following resolution:

Python case:
a] incorrect, addressed as separate issue: bug 640618 (more specific problem which can be excluded from this defect definition - confirmed)
b] confirmed as correct

Java case:
a] confirmed as correct
b] confirmed as correct
c] confirmed as correct


Conclusion:
Qpidd broker sets correctly message's redelivered flag for case when broker is restarted and also when failover happened.


The issue has been fixed, tested on RHEL 4.8 / 5.5 i386 / x86_64 on packages:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-0.7.946106-17.el5
qmf-devel-0.7.946106-17.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-rdma-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-17.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-rdma-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
ruby-qmf-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5


-> VERIFIED

Comment 15 Douglas Silas 2010-10-11 14:01:32 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-After a broker restart, when a durable message was recovered and sent again, it was not marked as "redelivered" if it had been previously sent but not acknowledged. As a result, the client was not notified of this fact, and the message may have been duplicated. With this update, all recovered messages are now marked as "redelivered", making the message delivery more reliable.+After a broker restart, when a durable message was recovered and sent again, it was not marked as "redelivered" if it had been previously sent but not acknowledged. As a result, the client was not notified of this fact, and the message may have been duplicated. With this update, all recovered messages are now marked as "redelivered", thus making message delivery more reliable.

Comment 17 errata-xmlrpc 2010-10-14 16:00:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html


Note You need to log in before you can comment on or make changes to this bug.