Bug 505274 - Large durable messages that 'flow to disk', are not recovered correctly
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.2
Assigned To: Kim van der Riet
QA Contact: Frantisek Reznicek
Depends On:
Blocks: 527551
Reported: 2009-06-11 06:06 EDT by Gordon Sim
Modified: 2015-11-15 19:07 EST
CC List: 5 users

Doc Type: Bug Fix
Doc Text:
Messaging bug fix
C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
C: They would not be recovered correctly.
F: A problem related to partially stored messages was corrected.
R: All 'flow-to-disk' messages are now recovered reliably.

When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.
Last Closed: 2009-12-03 04:17:52 EST


Attachments
issue reproducer (9.41 KB, application/x-tbz)
2009-06-12 09:32 EDT, Frantisek Reznicek

Description Gordon Sim 2009-06-11 06:06:51 EDT
Description of problem:

When 'flow to disk' messages are stored in the journal (as opposed to the bdb store, which is used for non-persistent messages), they are not recovered reliably.

Version-Release number of selected component (if applicable):

1.1.1 and earlier

How reproducible:

Readily

Steps to Reproduce:
1. Create a durable queue with a defined size limit (e.g. qpid-config add queue --durable --max-queue-size 5)
2. Send more durable messages than that limit to trigger flow-to-disk behaviour (e.g. for i in `seq 1 10`; do echo m_$i; done | sender --durable)
3. Now send a large durable message (>64k) (e.g. sender --durable --send-eos 1 < data)
4. Consume the messages (receiver); see the consolidated sketch below
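A minimal consolidated sketch of the steps above, assuming a broker running locally, 'test-queue' as a queue name (the original step 1 command omits one; comment #9 uses this name), and the same example 'sender'/'receiver' clients referenced in this report:

  qpid-config add queue test-queue --durable --max-queue-size 5     # step 1: durable queue with small size limit
  for i in `seq 1 10`; do echo m_$i; done | sender --durable        # step 2: exceed the limit to trigger flow to disk
  sender --durable --send-eos 1 < data                              # step 3: one large (>64k) durable message
  receiver                                                          # step 4: consume all messages back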
  
Actual results:

Sometimes get errors, sometimes broker goes into infinite loop loading data.

Expected results:

Receive all the sent messages

Additional info:

Running the same steps with non-durable messages and a durable queue (on a clean store and broker) worked.

I created my data file for step 3. above by:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > data
  echo '' >> data # add new line to end of single line
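For reference, an equivalent single-pipeline way to build the same file of one million 'x' characters (an alternative sketch, not from the original report):

  head -c 1000000 /dev/zero | tr '\0' x > data   # 1,000,000 NUL bytes translated to 'x'
  echo '' >> data                                # add new line to end of single line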
Comment 1 Kim van der Riet 2009-06-11 15:04:22 EDT
Found that the offset of a partially loaded message was being ignored by the loadContent() function. Added tests for 10k messages to the python flow-to-disk tests; soak testing ought to use even larger sizes.

Fixed in r.3451

Easily reproduced through the above test.
Comment 2 Kim van der Riet 2009-06-11 15:05:37 EDT
The message size threshold above which this bug manifests is the max frame size, i.e. > 4084 bytes.
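Given that threshold, a payload far smaller than the ~1MB file above should also trigger the bug. A sketch under that assumption ('data_small' is a hypothetical file name):

  head -c 5000 /dev/zero | tr '\0' x > data_small   # ~5KB, just above the 4084-byte threshold
  echo '' >> data_small
  sender --durable --send-eos 1 < data_small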
Comment 3 Frantisek Reznicek 2009-06-12 09:32:16 EDT
Created attachment 347567 [details]
issue reproducer
Comment 4 Frantisek Reznicek 2009-06-12 09:33:49 EDT
The issue has been fixed in package rhm-0.5.3206-3.el5 (RHEL 5 only, i386 / x86_64). Waiting for the RHEL 4 package...
Comment 6 Kim van der Riet 2009-10-08 09:41:13 EDT
Changing target milestone to 1.2, as this was included in store backport for 1.2.
Comment 7 Frantisek Reznicek 2009-10-19 11:24:34 EDT
The issue is fixed in qpid*-0.5.752581-26.el5, but is buggy again (regression: bz529069) in qpid*-0.5.752581-28.el5; waiting for a new package set... (checked on RHEL 5.4 i386/x86_64)
Comment 8 Frantisek Reznicek 2009-10-21 04:49:43 EDT
I'm actually not sure the behavior I see on -28 and -29 is correct; I believe I am seeing the bug regress.

The reproducer attached as comment #3 passes on -26, but fails on both -28, -29.

I can see that flow-to-disk is not triggered when I follow the steps from the bug description.
More specifically, a 'resource-limit-exceeded: Policy exceeded...' exception is thrown by the broker's QueuePolicy to the client just before flow-to-disk would be triggered.

So the question is: can someone review the reproducer and tell me whether I am doing something wrong or whether this is a regression, please?



See transcript of the reproducer on -29:

[root@mrg-qe-02 bz505274]# ./run.sh
Client[s] compile
.ready
starting standalone broker:.done
tcp        0      0 0.0.0.0:5672                0.0.0.0:*                   LISTEN      30550/qpidd
broker[s] running (pids:30550 )
A:Create durable queue with limit ...PASSED (ecode:0)
B:Filling the queue over the limit ...2009-oct-21 04:35:05 warning Exception received from broker: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83) [caused by 1 \x00:\x00]
Failed: resource-limit-exceeded: resource-limit-exceeded: Policy exceeded on test-queue, policy: size: max=5, current=3; count: unlimited; type=reject (qpid/broker/QueuePolicy.cpp:83)
FAILED (ecode:1)
C:Send on top of that one more huge message ... .1
D:Receiving data ... .............................................................abort (ecode:)
E:Data results diff ...FAILED (ecode:1)
test broker[s]:30550 done
stopping brokers...
OK
ERROR: Client[s] failed! (ecodes:1110)
Broker status:
0
Test Summary: TEST FAILED !!! (4 failures) dur: 103 secs
Comment 9 Kim van der Riet 2009-10-21 13:30:11 EDT
The default queue policy has been changed in this fix from flow-to-disk to reject. If you want to test flow-to-disk on a queue created using qpid-config, make sure to use the "--limit-policy flow-to-disk" option along with the other params, i.e. from the description above, use:

qpid-config add queue test-queue --durable --max-queue-size 5 --limit-policy flow-to-disk
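To confirm the queue was created as intended, listing the broker's queues is one way to check (a sketch; the exact 'qpid-config queues' output, and whether policy arguments are displayed by default, varies by version):

  qpid-config queues | grep test-queue   # check that test-queue exists; policy details may need version-specific options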
Comment 10 Frantisek Reznicek 2009-10-22 03:36:43 EDT
The change of the queue limit policy understood.
The issue has been fixed, validated on RHEL 5.4 / 4.8 i386 / x86_64 on packages:
[root@mrg-qe-02 bz505274]# rpm -qa | egrep '(qpid|rhm)' | sort
python-qpid-0.5.752581-4.el5
qpidc-0.5.752581-29.el5
qpidc-debuginfo-0.5.752581-29.el5
qpidc-devel-0.5.752581-29.el5
qpidc-perftest-0.5.752581-29.el5
qpidc-rdma-0.5.752581-29.el5
qpidc-ssl-0.5.752581-29.el5
qpidd-0.5.752581-29.el5
qpidd-acl-0.5.752581-29.el5
qpidd-cluster-0.5.752581-29.el5
qpidd-devel-0.5.752581-29.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-29.el5
qpidd-ssl-0.5.752581-29.el5
qpidd-xml-0.5.752581-29.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5
rhm-0.5.3206-16.el5
rhm-debuginfo-0.5.3206-16.el5
rhm-docs-0.5.756148-1.el5
rh-qpid-tests-0.5.752581-29.el5


-> VERIFIED
Comment 11 Irina Boverman 2009-10-29 10:20:47 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)
Comment 12 Lana Brindley 2009-11-23 15:40:50 EST
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-When 'flow to disk' messages are stored in the journal (as opposed to the bdb store as is the case for non-persistent messages), they are now being recovered reliably (505274)+Messaging bug fix
+
+C: Large 'flow-to-disk' messages are stored in the journal, and then recovered.
+C: They would not be recovered correctly.
+F: A problem related to partially stored messages was corrected
+R: All 'flow-to-disk' messages are now recovered reliably.
+
+When large 'flow-to-disk' messages were stored in the journal, and then recovered, they could not be recovered correctly. A problem related to partially stored messages was corrected and all 'flow-to-disk' messages are now recovered reliably.
Comment 14 errata-xmlrpc 2009-12-03 04:17:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html
