Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 451011

Summary: durable perftest in fanout mode fails with sync store
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: Kim van der Riet <kim.vdriet>
Status: CLOSED WONTFIX QA Contact: Kim van der Riet <kim.vdriet>
Severity: high Docs Contact:
Priority: urgent    
Version: betaCC: aconway
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-06-13 11:26:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Fix option 1
none
Option 2: change to qpid
none
Option 2: change to rhm store
none
remove sync store as option none

Description Gordon Sim 2008-06-12 11:41:37 UTC
perftest using durable messages in fanout mode fais on mrg5

Comment 1 Gordon Sim 2008-06-12 12:08:04 UTC
correction to title: only fails on sync store

Comment 2 Alan Conway 2008-06-12 15:53:04 UTC
I've seen two modes of failure - only with sync store. Client exits with:

Processing 2 messages from pub_done .SubscribeThread exception: Closed: 

I've also seen the test hang.

There is a second issue: the client is not displaying the correct error message.
The client is receiving the error: 

2008-jun-12 07:52:07 warning Broker closed connection: 541, Error dequeing
message, persistence id not set (BdbMessageStore.cpp:1208)

but it is getting overwritten with the generic Closed: message.

Comment 3 Gordon Sim 2008-06-12 16:50:02 UTC
The sync store assumes incorrectly that a message will be enqueued on all
matching queues before any dequeues can occur. This is no longer correct and
concurrent enqueues and dequeues on the sync store are unsafe as the code stands.

Comment 4 Gordon Sim 2008-06-12 17:59:37 UTC
Created attachment 309111 [details]
Fix option 1

One possible fix. This has the advantage of being only two lines, but the
disadvantage of affecting all codepaths, not just sync stored msgs.

Comment 5 Gordon Sim 2008-06-12 18:02:42 UTC
Created attachment 309115 [details]
Option 2: change to qpid

This is another option that minimises the impact on transient and async store
codepaths, but requires changes to both qpid and rhm store.

Comment 6 Gordon Sim 2008-06-12 18:03:53 UTC
Created attachment 309116 [details]
Option 2: change to rhm store

This is the corresponding change to the store for option 2.

Comment 7 Gordon Sim 2008-06-12 18:05:33 UTC
The third option would be locking within the store. However, as well as being
less efficient, that would be quite involved especially to avoid impacting the
standard async path, so I feel its not worth pursuing.

Comment 8 Alan Conway 2008-06-12 19:20:21 UTC
Committed fix to client error message problem revision 667205

Comment 9 Alan Conway 2008-06-12 19:37:57 UTC
We really need to perf test the impact of these fixes. 

Fix 1 looks risky to me, it puts two explicit mallocs and a new variable-size
data structure on the critical path.

Fix 2 has no impact for transient messages but I don't know what impact it might
have on async persistence, we would need to test that on Shaks rig before going
forward.

I'd strongly lean towards disabling sync store immediately. If there are still
situations where async store is broken then we need to fix it pronto. If we
support sync store for GA we may be dogged by it for months/years to come.

Comment 10 Gordon Sim 2008-06-12 19:51:27 UTC
Created attachment 309139 [details]
remove sync store as option

Comment 11 Gordon Sim 2008-06-12 19:54:08 UTC
I agree removing store is best option (though I don't see where the mallocs are
added), patch attached above to remove the option.

Comment 12 Gordon Sim 2008-06-13 11:26:17 UTC
Agreed solution was to disable the option to runthe store in sync mode.