Bug 679911

Summary:

store unit tests are hanging when running the cluster unit test

Product:

Red Hat Enterprise MRG

Reporter:

Ken Giusti <kgiusti>

Component:

qpid-cpp

Assignee:

Ken Giusti <kgiusti>

Status:

CLOSED CURRENTRELEASE

QA Contact:

MRG Quality Engineering <mrgqe-bugs>

Severity:

urgent

Docs Contact:

Priority:

unspecified

Version:

Development

CC:

iboverma, jross

Target Milestone:

2.0

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

qpid-cpp-server-0.9.1079953-1, qpid-cpp-server-cluster-0.9.1079953-1

Doc Type:

Bug Fix

Doc Text:

Cause: A clustered broker did not correctly detect when a durable message had finished being saved to the durable store. Consequence: The cluster could hang when the store was enabled and a new broker was added to the cluster. Fix: Clustered brokers now monitor for completion of the message storage event. Result: The newly added broker is correctly notified when message storage completes, and no longer hangs.

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-02-25 11:15:45 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

Attachments:

Description	Flags
Log from failing test.	none
Log from passing test - test passes if run without store enabled.	none

Description Ken Giusti 2011-02-23 20:27:45 UTC

Description of problem:

Running the store unit tests will fail on the cluster tests:

make  check-TESTS
make[3]: Entering directory `/var/lib/ptolemy/sources/qpid-cpp-store/tests/cluster'
Running C++ cluster tests...
Running 33 test cases...

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*************fork2: 2011-02-23 12:12:48 critical cluster(20.0.10.2:2423 UPDATEE) catch-up connection closed prematurely 10.16.43.8:40675-10.16.43.8:36231(20.0.10.2:2423-1 local,catchup)
make[3]: *** wait: No child processes.  Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: *** wait: No child processes.  Stop.
+++ ps -fu ptolemy -Hww
Error 2



Version-Release number of selected component (if applicable):
Development (pre-MRG2.0)

How reproducible:
100%


Steps to Reproduce:
1. Check out the latest store svn repo.
2. Check out and build the latest qpid svn repo.
3. Configure the store build to reference your qpid svn checkout.
4. Run "make check" in the store svn repo.
  
Actual results:
The tests will hang in the "testMessageTimeToLive" cluster unit test.

Expected results:
Test should not hang, and should succeed.

Additional info:
Problem is related to the async command completion functionality introduced by producer flow control work - https://bugzilla.redhat.com/show_bug.cgi?id=660291

Comment 1 Ken Giusti 2011-02-23 20:29:25 UTC

Created attachment 480569
Log from failing test.

Comment 2 Ken Giusti 2011-02-23 20:30:08 UTC

Created attachment 480570
Log from passing test - test passes if run without store enabled.

Comment 3 Ken Giusti 2011-02-23 21:18:39 UTC

Analysis:

The test creates a cluster of two brokers.  It then declares some queues, and sends some durable messages to the cluster.  All that appears fine.

Then the test attempts to add another broker to the cluster.  At this point the test hangs.

From what I can see, the new broker is asynchronously completing messages to the store.  The store completes the messages in its thread, and calls requestIOProcessing() to schedule the completion of the message's transfer command.

The requested scheduled completion never runs.

This appears to be happening during the cluster update process (the new broker is being updated at the time).  Does the cluster update process hold off callbacks scheduled via requestIOProcessing()?
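
To make the suspected failure mode concrete, here is a minimal sketch of it. This is not qpid code: IoCallbackQueue, setHeld(), poll(), TransferCommand and onStoreComplete() are hypothetical names invented for illustration, and only the name requestIOProcessing() is borrowed from the analysis above. The sketch assumes that something (here a simple "held" flag standing in for the cluster update) prevents scheduled callbacks from running; whether the update process really does this is the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a connection's I/O callback queue.
// It is NOT the real qpid class; it only models "schedule now, run later".
class IoCallbackQueue {
public:
    // Modeled loosely on requestIOProcessing(): defer work to the I/O thread.
    void requestIOProcessing(std::function<void()> cb) {
        pending_.push_back(std::move(cb));
    }

    // Assumption: while "held" (e.g. during a cluster update), callbacks are not run.
    void setHeld(bool held) { held_ = held; }

    // Run all pending callbacks unless held; returns how many were run.
    std::size_t poll() {
        if (held_) return 0;
        std::size_t ran = 0;
        while (!pending_.empty()) {
            std::function<void()> cb = std::move(pending_.front());
            pending_.pop_front();
            cb();
            ++ran;
        }
        return ran;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<std::function<void()>> pending_;
    bool held_ = false;
};

// Hypothetical message transfer command awaiting asynchronous completion.
struct TransferCommand {
    bool completed = false;
};

// Called when the store has finished writing the message: the completion
// is scheduled on the I/O queue rather than performed directly.
inline void onStoreComplete(IoCallbackQueue& io, TransferCommand& cmd) {
    io.requestIOProcessing([&cmd] { cmd.completed = true; });
}
```

In this model, if setHeld(true) is in effect when onStoreComplete() is called, poll() runs nothing and the command stays incomplete until the hold is released. Anything waiting on that completion would appear to hang, which matches the symptom seen in the test.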

Comment 4 Ken Giusti 2011-02-24 22:39:45 UTC

Alan has created a fix for this.

Patched upstream:

http://svn.apache.org/viewvc?view=revision&revision=1074332

Jira:
https://issues.apache.org/jira/browse/QPID-3084

Comment 5 Ken Giusti 2011-03-07 16:39:55 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
    A clustered broker did not correctly detect when a durable message had finished being saved to the durable store.
Consequence:
    The cluster could hang when the store was enabled and a new broker was added to the cluster.
Fix:
    Clustered brokers now monitor for completion of the message storage event.
Result:
    The newly added broker is correctly notified when message storage completes, and no longer hangs.