Bug 505274 - Large durable messages that 'flow to disk', are not recovered correctly
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.2
Assigned To: Kim van der Riet
QA Contact: Frantisek Reznicek
Depends On:
Blocks: 527551
Reported: 2009-06-11 06:06 EDT by Gordon Sim
Modified: 2015-11-15 19:07 EST
CC List: 5 users

Doc Type: Bug Fix
Doc Text:
Messaging bug fix
C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
C: They would not be recovered correctly.
F: A problem related to partially stored messages was corrected.
R: All 'flow-to-disk' messages are now recovered reliably.

When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.
Last Closed: 2009-12-03 04:17:52 EST


Attachments
issue reproducer (9.41 KB, application/x-tbz)
2009-06-12 09:32 EDT, Frantisek Reznicek

Description Gordon Sim 2009-06-11 06:06:51 EDT
Description of problem:

When 'flow to disk' messages are stored in the journal (as opposed to the bdb store, which is used for non-persistent messages), they are not recovered reliably.

Version-Release number of selected component (if applicable):

1.1.1 and earlier

How reproducible:

Readily

Steps to Reproduce:
1. Create a durable queue with a defined size limit (e.g. qpid-config add queue --durable --max-queue-size 5)
2. Send more durable messages than that limit to trigger flow-to-disk behaviour (e.g. for i in `seq 1 10`; do echo m_$i; done | sender --durable)
3. Now send a large durable message (>64k) (e.g. sender --durable --send-eos 1 < data)
4. Consume the messages (receiver); see the consolidated sketch below
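A minimal consolidated sketch of the steps above, assuming a broker running locally, 'test-queue' as a queue name (the original step 1 command omits one; comment #9 uses this name), and the same example 'sender'/'receiver' clients referenced in this report:

  qpid-config add queue test-queue --durable --max-queue-size 5     # step 1: durable queue with small size limit
  for i in `seq 1 10`; do echo m_$i; done | sender --durable        # step 2: exceed the limit to trigger flow to disk
  sender --durable --send-eos 1 < data                              # step 3: one large (>64k) durable message
  receiver                                                          # step 4: consume all messages back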
  
Actual results:

Sometimes get errors, sometimes broker goes into infinite loop loading data.

Expected results:

Receive all the sent messages

Additional info:

Running the same steps with non-durable messages and a durable queue (on a clean store and broker) worked.

I created my data file for step 3. above by:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > data
  echo '' >> data # add new line to end of single line
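For reference, an equivalent single-pipeline way to build the same file of one million 'x' characters (an alternative sketch, not from the original report):

  head -c 1000000 /dev/zero | tr '\0' x > data   # 1,000,000 NUL bytes translated to 'x'
  echo '' >> data                                # add new line to end of single line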
Comment 1 Kim van der Riet 2009-06-11 15:04:22 EDT
Found that the offset of a partially loaded message was being ignored by the loadContent() function. Added tests for 10k messages to the python flow-to-disk tests; soak testing ought to use even larger sizes.

Fixed in r.3451

Easily reproduced through the above test.
Comment 2 Kim van der Riet 2009-06-11 15:05:37 EDT
The message size threshold above which this bug manifests is the max frame size, i.e. > 4084 bytes.
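Given that threshold, a payload far smaller than the ~1MB file above should also trigger the bug. A sketch under that assumption ('data_small' is a hypothetical file name):

  head -c 5000 /dev/zero | tr '\0' x > data_small   # ~5KB, just above the 4084-byte threshold
  echo '' >> data_small
  sender --durable --send-eos 1 < data_small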
Comment 3 Frantisek Reznicek 2009-06-12 09:32:16 EDT
Created attachment 347567 [details]
issue reproducer
Comment 4 Frantisek Reznicek 2009-06-12 09:33:49 EDT
The issue has been fixed in package rhm-0.5.3206-3.el5 (RHEL 5 only, i386 / x86_64). Waiting for the RHEL 4 package...
Comment 6 Kim van der Riet 2009-10-08 09:41:13 EDT
Changing target milestone to 1.2, as this was included in store backport for 1.2.
Comment 7 Frantisek Reznicek 2009-10-19 11:24:34 EDT
The issue is fixed in qpid*-0.5.752581-26.el5, but is buggy again (regression: bz529069) in qpid*-0.5.752581-28.el5; waiting for a new package set... (checked on RHEL 5.4 i386/x86_64)
Comment 8 Frantisek Reznicek 2009-10-21 04:49:43 EDT
I'm actually not sure the behavior I see on -28 and -29 is correct; I believe I am seeing the bug regress.

The reproducer attached as comment #3 passes on -26, but fails on both -28, -29.

I can see that flow-to-disk is not triggered when I follow the steps from the bug description.
More specifically, a 'resource-limit-exceeded: Policy exceeded...' exception is thrown by the broker's QueuePolicy to the client just before flow-to-disk would be triggered.

So the question is: can someone review the reproducer and tell me whether I am doing something wrong or whether this is a regression, please?



See transcript of the reproducer on -29:

[root@mrg-qe-02 bz505274]# ./run.sh
Client[s] compile
.ready
starting standalone broker:.done
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      30550/qpidd
broker[s] running (pids:30550 )
A:Create durable queue with limit ...PASSED (ecode:0)
B:Filling the queue over the limit ...2009-oct-21 04:35:05 warning Exception received from broker: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83) [caused by 1 \x00:\x00]
Failed: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83)
FAILED (ecode:1)
C:Send on top of that one more huge message ... .1
D:Receiving data ... .............................................................abort (ecode:)
E:Data results diff ...FAILED (ecode:1)
test broker[s]:30550 done
stopping brokers...
OK
ERROR: Client[s] failed! (ecodes:1110)
Broker status:
0
Test Summary: TEST FAILED !!! (4 failures) dur: 103 secs
Comment 9 Kim van der Riet 2009-10-21 13:30:11 EDT
The default queue policy has been changed in this fix from flow-to-disk to reject. If you want to test flow-to-disk on a queue created using qpid-config, make sure to use the "--limit-policy flow-to-disk" option along with the other params, i.e. from the description above, use:

qpid-config add queue test-queue --durable --max-queue-size 5 --limit-policy flow-to-disk
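To confirm the queue was created as intended, listing the broker's queues is one way to check (a sketch; the exact 'qpid-config queues' output, and whether policy arguments are displayed by default, varies by version):

  qpid-config queues | grep test-queue   # check that test-queue exists; policy details may need version-specific options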
Comment 10 Frantisek Reznicek 2009-10-22 03:36:43 EDT
The change of the queue limit policy understood.
The issue has been fixed, validated on RHEL 5.4 / 4.8 i386 / x86_64 on packages:
[root@mrg-qe-02 bz505274]# rpm -qa | egrep '(qpid|rhm)' | sort
python-qpid-0.5.752581-4.el5
qpidc-0.5.752581-29.el5
qpidc-debuginfo-0.5.752581-29.el5
qpidc-devel-0.5.752581-29.el5
qpidc-perftest-0.5.752581-29.el5
qpidc-rdma-0.5.752581-29.el5
qpidc-ssl-0.5.752581-29.el5
qpidd-0.5.752581-29.el5
qpidd-acl-0.5.752581-29.el5
qpidd-cluster-0.5.752581-29.el5
qpidd-devel-0.5.752581-29.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-29.el5
qpidd-ssl-0.5.752581-29.el5
qpidd-xml-0.5.752581-29.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5
rhm-0.5.3206-16.el5
rhm-debuginfo-0.5.3206-16.el5
rhm-docs-0.5.756148-1.el5
rh-qpid-tests-0.5.752581-29.el5


-> VERIFIED
Comment 11 Irina Boverman 2009-10-29 10:20:47 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)
Comment 12 Lana Brindley 2009-11-23 15:40:50 EST
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)+Messaging bug fix
+
+C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
+C: They would not be recovered correctly.
+F: A problem related to partially stored messages was corrected
+R: All 'flow-to-disk' messages are now recovered reliably.
+
+When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.
Comment 14 errata-xmlrpc 2009-12-03 04:17:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html
