Bug 525813

Summary:	Move Flow to disk from BDB to journal
Product:	Red Hat Enterprise MRG	Reporter:	Carl Trieloff <cctrieloff>
Component:	qpid-cpp	Assignee:	Kim van der Riet <kim.vdriet>
Status:	CLOSED ERRATA	QA Contact:	Frantisek Reznicek <freznice>
Severity:	medium	Docs Contact:
Priority:	low
Version:	1.1.6	CC:	esammons, freznice, iboverma, lbrindle, tross
Target Milestone:	1.2
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Messaging bug fix C: When multiple threads are enqueueing, occasionally a thread with an assigned rid will be stopped and another thread with a later assigned rid will enqueue its record first. C: The broker will close the client session with an error. F: A list is now kept of out-of-order rids that are greater than the current rid being read and which could be encountered while reading. If the following read has its rid in this list, then the read pipeline is invalidated. R: The crash no longer occurs. When multiple threads were enqueueing, occasionally a thread with an assigned rid would stop and another thread with a later assigned rid would enqueue its record first. This caused the broker to close the client session with an error. A list is now kept of out-of-order rids that are greater than the current rid being read and which could be encountered. If the next read has a rid in the list, then the read pipeline is invalidated. This prevents the crash from occurring.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-12-03 09:19:24 UTC	Type:	---
Bug Blocks:	527551

Description Carl Trieloff 2009-09-25 20:52:10 UTC

Comment 1 Kim van der Riet 2009-09-28 15:48:55 UTC

Fixed in r.819600.

Comment 2 Kim van der Riet 2009-09-28 16:53:22 UTC

When running tsxtest to test flow-to-disk, the broker will occasionally close the client session with RHM_IORES_EMPTY - which indicates that the read pipeline has nothing more to read when flow-to-disk content is reloaded from the store.

On an 8- or 16-core box, start a broker:
./qpidd --tcp-nodelay --worker-threads 4 --auth no --load-module ../../../store/lib/.libs/msgstore.so --data-dir /tmp --log-enable info+

In another window, start the client:
./tsxtest --flow 1 --cut_depth 1 --durable_msg yes --rate 1000 --messages 40000

If the fault occurs, the RHM_IORES_EMPTY message can be seen in the broker window; the client has either closed or crashed.

This fault has been difficult to reproduce, with one or two occurrences per day.

Comment 3 Kim van der Riet 2009-09-28 17:35:55 UTC

The problem is out-of-order enqueueing. While the rids are assigned in order, the operation of assigning and enqueueing is not atomic. When multiple threads are enqueueing, occasionally a thread with an assigned rid will be stopped and another thread with a later assigned rid will enqueue its record first.
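
The interleaving described above can be replayed deterministically. This is a minimal sketch and not the store's code: ToyJournal, assignRid, enqueue and replayInterleaving are hypothetical names, and the two "threads" are simulated by calling the two steps in a fixed order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a journal; not the real store API.
struct ToyJournal {
    std::uint64_t nextRid = 1;           // next record id (rid) to hand out
    std::vector<std::uint64_t> onDisk;   // rids in the order they were written

    // Step 1: assign a rid. Rids are always handed out in increasing order.
    std::uint64_t assignRid() { return nextRid++; }

    // Step 2: write the record. This is a separate step from assignRid(),
    // so another writer can slip in between the two.
    void enqueue(std::uint64_t rid) { onDisk.push_back(rid); }
};

// Replays the interleaving: writer A gets its rid and is stopped, then
// writer B assigns and enqueues before A resumes.
inline std::vector<std::uint64_t> replayInterleaving() {
    ToyJournal j;
    std::uint64_t a = j.assignRid();  // writer A: rid 1, then stalls
    std::uint64_t b = j.assignRid();  // writer B: rid 2
    j.enqueue(b);                     // writer B writes first
    j.enqueue(a);                     // writer A resumes and writes
    return j.onDisk;                  // on-disk order is 2, 1
}
```

Although the rids were assigned as 1 then 2, the on-disk order comes out as 2 then 1.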

The logic used in the JournalImpl::loadContent() method does not take this possibility into account, and will run to the end of all the known records and return RHM_IORES_EMPTY if it encounters an out-of-order rid on disk.
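
To illustrate the failure mode, here is a minimal sketch of a forward-only reader. It is an assumption-laden toy model, not the actual JournalImpl::loadContent() code: NaiveReader and ReadResult are hypothetical names, and ReadResult::Empty merely stands in for RHM_IORES_EMPTY.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class ReadResult { Ok, Empty };  // Empty stands in for RHM_IORES_EMPTY

// Hypothetical forward-only reader over the rids as they sit on disk.
class NaiveReader {
public:
    explicit NaiveReader(std::vector<std::uint64_t> onDisk)
        : onDisk_(std::move(onDisk)) {}

    // Scan forward from the current position until `rid` is found.
    // Records that are passed over are never revisited.
    ReadResult load(std::uint64_t rid) {
        while (pos_ < onDisk_.size()) {
            if (onDisk_[pos_++] == rid) return ReadResult::Ok;
        }
        return ReadResult::Empty;  // ran off the end of the known records
    }

private:
    std::vector<std::uint64_t> onDisk_;
    std::size_t pos_ = 0;
};
```

With rids on disk in the order 1, 3, 2, loading rid 1 and then rid 2 succeeds, but the scan for rid 2 steps over rid 3. A subsequent load of rid 3 then runs to the end of the records and reports Empty.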

This is now fixed by keeping a list of out-of-order rids that are greater than the current rid being read which may be encountered while reading. If the following read has its rid in this list, then the read pipeline is invalidated.

Fixed in store svn r.3650; this is synced with qpid r.819600.
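
The out-of-order list described in this comment can be sketched roughly as follows. This is a toy model under stated assumptions, not the committed change: OooAwareReader, ReadResult, invalidate and invalidations are hypothetical names, and "invalidating the read pipeline" is modelled here simply as rewinding to the first record.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class ReadResult { Ok, Empty };  // Empty stands in for RHM_IORES_EMPTY

// Hypothetical reader that remembers out-of-order rids seen while scanning.
class OooAwareReader {
public:
    explicit OooAwareReader(std::vector<std::uint64_t> onDisk)
        : onDisk_(std::move(onDisk)) {}

    ReadResult load(std::uint64_t rid) {
        // If an earlier scan already stepped over this rid, the read
        // position is past it: invalidate and rescan (assumed behaviour).
        if (std::find(oooRids_.begin(), oooRids_.end(), rid) != oooRids_.end()) {
            invalidate();
        }
        while (pos_ < onDisk_.size()) {
            std::uint64_t seen = onDisk_[pos_++];
            if (seen == rid) return ReadResult::Ok;
            // A later rid sits ahead of the one being read: remember it.
            if (seen > rid) oooRids_.push_back(seen);
        }
        return ReadResult::Empty;
    }

    std::size_t invalidations() const { return invalidations_; }

private:
    // Toy version of invalidating the read pipeline: rewind to the start.
    void invalidate() {
        pos_ = 0;
        oooRids_.clear();
        ++invalidations_;
    }

    std::vector<std::uint64_t> onDisk_;
    std::vector<std::uint64_t> oooRids_;  // rids > target seen while scanning
    std::size_t pos_ = 0;
    std::size_t invalidations_ = 0;
};
```

With rids on disk in the order 1, 3, 2, the scan for rid 2 records rid 3 as out of order. The following load of rid 3 finds it in the list, invalidates once, and succeeds instead of reporting Empty. Rewinding to the first record is the simplest choice for a sketch; it trades extra rescanning for correctness.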

Comment 4 Kim van der Riet 2009-10-01 20:51:25 UTC

git commit on branch mrg_1.1.x: http://git.et.redhat.com/git/qpid.git/?p=qpid.git;a=commitdiff;h=f6518f49336ff36826e3780fa72973e5da0b6891

Comment 5 Kim van der Riet 2009-10-08 13:48:09 UTC

Included in store backport for 1.2.

Comment 7 Irina Boverman 2009-10-22 17:48:29 UTC

Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Resolved problem with out-of-order enqueueing causing the broker to terminate the client session incorrectly and the client to close or crash
(525813)

Comment 8 Frantisek Reznicek 2009-10-23 07:58:22 UTC

The issue has been fixed; no occurrence was found after extensive stress testing
using the above scenario on RHEL 4.8 / 5.4 i386 / x86_64 with the following packages:
[root@mrg-qe-02 ~]# rpm -qa | egrep '(rhm|qpid)' | sort -u
python-qpid-0.5.752581-4.el5
qpidc-0.5.752581-29.el5
qpidc-debuginfo-0.5.752581-29.el5
qpidc-devel-0.5.752581-29.el5
qpidc-perftest-0.5.752581-29.el5
qpidc-rdma-0.5.752581-29.el5
qpidc-ssl-0.5.752581-29.el5
qpidd-0.5.752581-29.el5
qpidd-acl-0.5.752581-29.el5
qpidd-cluster-0.5.752581-29.el5
qpidd-devel-0.5.752581-29.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-29.el5
qpidd-ssl-0.5.752581-29.el5
qpidd-xml-0.5.752581-29.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5
rhm-0.5.3206-16.el5
rhm-debuginfo-0.5.3206-16.el5
rhm-docs-0.5.756148-1.el5
rh-qpid-tests-0.5.752581-29.el5

-> VERIFIED

Comment 9 Lana Brindley 2009-11-26 21:33:48 UTC

Release note updated. If any revisions are required, please set the
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1,9 @@
-Resolved problem with out-of-order enqueueing causing the broker to terminate the client session incorrectly and the client to close or crash
+Messaging bug fix
-(525813)
+
+C: When multiple threads are enqueueing, occasionally a thread with an assigned rid will be stopped and another thread with a later assigned rid will enqueue its record first.
+C: The broker will close the client session with an error.
+F: A list is now kept of out-of-order rids that are greater than the current rid being read and which could be encountered while reading. If the
+following read has its rid in this list, then the read pipeline is invalidated.
+R: The crash no longer occurs.
+
+When multiple threads were enqueueing, occasionally a thread with an assigned rid would stop and another thread with a later assigned rid would enqueue its record first. This caused the  broker to close the client session with an error. A list is now kept of out-of-order rids that are greater than the current rid being read and which could be encountered. If the next read has a rid in the list, then the read pipeline is invalidated. This prevents the crash from occurring.

Comment 11 errata-xmlrpc 2009-12-03 09:19:24 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html