Bug 505274

Summary: Large durable messages that 'flow to disk' are not recovered correctly
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cpp Assignee: Kim van der Riet <kim.vdriet>
Status: CLOSED ERRATA QA Contact: Frantisek Reznicek <freznice>
Severity: high Docs Contact:
Priority: high    
Version: 1.0 CC: esammons, freznice, iboverma, kim.vdriet, lbrindle
Target Milestone: 1.2   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Messaging bug fix
C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
C: They would not be recovered correctly.
F: A problem related to partially stored messages was corrected
R: All 'flow-to-disk' messages are now recovered reliably.
When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-12-03 09:17:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 527551    
Attachments:
  issue reproducer (flags: none)

Description Gordon Sim 2009-06-11 10:06:51 UTC
Description of problem:

When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are not recovered reliably.

Version-Release number of selected component (if applicable):

1.1.1 and earlier

How reproducible:

Readily

Steps to Reproduce:
1. create durable queue with a defined size limit (e.g. qpid-config add queue --durable --max-queue-size 5)
2. send more durable messages than that limit to trigger flow to disk behaviour (e.g. for i in `seq 1 10`; do echo m_$i; done | sender --durable)
3. now send a large durable message (>64k) (e.g. sender --durable --send-eos 1 < data)
4. consume the messages (receiver)
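
For convenience, the steps above consolidated as a minimal shell sketch (the queue name test-queue is illustrative, 'sender'/'receiver' are the example clients referenced above, and the data file is built as described under 'Additional info' below):

  # 1. durable queue with a small size limit
  qpid-config add queue test-queue --durable --max-queue-size 5
  # 2. overflow the limit with small durable messages to trigger flow to disk
  for i in `seq 1 10`; do echo m_$i; done | sender --durable
  # 3. one large durable message (>64k)
  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > data
  echo '' >> data
  sender --durable --send-eos 1 < data
  # 4. consume the messages back
  receiver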
  
Actual results:

Sometimes errors are reported; sometimes the broker goes into an infinite loop loading data.

Expected results:

Receive all the sent messages

Additional info:

Running the same steps with non-durable messages on a durable queue (on a clean store and broker) worked.

I created my data file for step 3 above by:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > data
  echo '' >> data # add new line to end of single line

Comment 1 Kim van der Riet 2009-06-11 19:04:22 UTC
Found that the offset of a partially loaded message was being ignored by the loadContent() function. Added tests for 10k messages to the python flow-to-disk tests; soak testing ought to use even larger sizes.

Fixed in r.3451

Easily reproduced through the above test.

Comment 2 Kim van der Riet 2009-06-11 19:05:37 UTC
The message size threshold for this bug to manifest is the max frame size, i.e. > 4084 bytes.
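
To confirm that boundary, a hedged variation of the data-file approach from the description, with sizes just below and just above 4084 bytes (the file names are placeholders, the queue is assumed to already be over its limit so messages flow to disk, and 'sender' is the same example client as above):

  # ~4000 bytes: below the frame-size threshold, recovers fine
  for i in `seq 1 4000`; do echo x; done | tr -d '\n' > data_small
  sender --durable --send-eos 1 < data_small
  # ~5000 bytes: above the threshold (> 4084 bytes), exhibits the bug before the fix
  for i in `seq 1 5000`; do echo x; done | tr -d '\n' > data_large
  sender --durable --send-eos 1 < data_large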

Comment 3 Frantisek Reznicek 2009-06-12 13:32:16 UTC
Created attachment 347567 [details]
issue reproducer

Comment 4 Frantisek Reznicek 2009-06-12 13:33:49 UTC
The issue has been fixed in package rhm-0.5.3206-3.el5 (RHEL 5 only, i386 / x86_64). Waiting for the RHEL 4 package...

Comment 6 Kim van der Riet 2009-10-08 13:41:13 UTC
Changing target milestone to 1.2, as this was included in the store backport for 1.2.

Comment 7 Frantisek Reznicek 2009-10-19 15:24:34 UTC
The issue is fixed in qpid*-0.5.752581-26.el5, but buggy again (regression: bz529069) in qpid*-0.5.752581-28.el5; waiting for a new package set... (checked on RHEL 5.4 i386/x86_64)

Comment 8 Frantisek Reznicek 2009-10-21 08:49:43 UTC
I'm actually not sure the behavior I see on -28 and -29 is correct; I believe I am seeing a regression of the bug.

The reproducer attached in comment #3 passes on -26, but fails on both -28 and -29.

I can see that flow-to-disk is not triggered when I follow the steps from the bug description.
To be more specific, a 'resource-limit-exceeded: Policy exceeded...' exception is thrown by the broker's QueuePolicy to the client just before flow-to-disk would be triggered.

So the question is: can someone review the reproducer and tell me whether I am doing something silly or this is a regression, please?



See the transcript of the reproducer on -29:

[root@mrg-qe-02 bz505274]# ./run.sh
Client[s] compile
.ready
starting standalone broker:.done
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      30550/qpidd
broker[s] running (pids:30550 )
A:Create durable queue with limit ...PASSED (ecode:0)
B:Filling the queue over the limit ...2009-oct-21 04:35:05 warning Exception received from broker: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83) [caused by 1 \x00:\x00]
Failed: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83)
FAILED (ecode:1)
C:Send on top of that one more huge message ... .1
D:Receiving data ... .............................................................abort (ecode:)
E:Data results diff ...FAILED (ecode:1)
test broker[s]:30550 done
stopping brokers...
OK
ERROR: Client[s] failed! (ecodes:1110)
Broker status:
0
Test Summary: TEST FAILED !!! (4 failures) dur: 103 secs

Comment 9 Kim van der Riet 2009-10-21 17:30:11 UTC
The default queue policy has been changed in this fix from flow-to-disk to reject. If you want to test flow-to-disk on a queue created using qpid-config, make sure to use the "--limit-policy flow-to-disk" option along with the other parameters, i.e. from the description above, use:

qpid-config add queue test-queue --durable --max-queue-size 5 --limit-policy flow-to-disk
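
As a sanity check of the new default, a minimal sketch of re-running steps 1-2 from the description with the explicit policy (same illustrative queue name and example sender client as before):

  qpid-config add queue test-queue --durable --max-queue-size 5 --limit-policy flow-to-disk
  # filling past the limit should now flow to disk instead of raising the
  # 'resource-limit-exceeded' exception seen in the -29 transcript above
  for i in `seq 1 10`; do echo m_$i; done | sender --durable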

Comment 10 Frantisek Reznicek 2009-10-22 07:36:43 UTC
The change of the queue limit policy is understood.
The issue has been fixed; validated on RHEL 5.4 / 4.8 i386 / x86_64 with the following packages:
[root@mrg-qe-02 bz505274]# rpm -qa | egrep '(qpid|rhm)' | sort
python-qpid-0.5.752581-4.el5
qpidc-0.5.752581-29.el5
qpidc-debuginfo-0.5.752581-29.el5
qpidc-devel-0.5.752581-29.el5
qpidc-perftest-0.5.752581-29.el5
qpidc-rdma-0.5.752581-29.el5
qpidc-ssl-0.5.752581-29.el5
qpidd-0.5.752581-29.el5
qpidd-acl-0.5.752581-29.el5
qpidd-cluster-0.5.752581-29.el5
qpidd-devel-0.5.752581-29.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-29.el5
qpidd-ssl-0.5.752581-29.el5
qpidd-xml-0.5.752581-29.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5
rhm-0.5.3206-16.el5
rhm-debuginfo-0.5.3206-16.el5
rhm-docs-0.5.756148-1.el5
rh-qpid-tests-0.5.752581-29.el5


-> VERIFIED

Comment 11 Irina Boverman 2009-10-29 14:20:47 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)

Comment 12 Lana Brindley 2009-11-23 20:40:50 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)+Messaging bug fix
+
+C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
+C: They would not be recovered correctly.
+F: A problem related to partially stored messages was corrected
+R: All 'flow-to-disk' messages are now recovered reliably.
+
+When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.

Comment 14 errata-xmlrpc 2009-12-03 09:17:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html