Bug 679911 - store unit tests are hanging when running the cluster unit test
Summary: store unit tests are hanging when running the cluster unit test
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
unspecified
urgent
Target Milestone: 2.0
: ---
Assignee: Ken Giusti
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-02-23 20:27 UTC by Ken Giusti
Modified: 2013-02-25 11:15 UTC (History)

Fixed In Version: qpid-cpp-server-0.9.1079953-1, qpid-cpp-server-cluster-0.9.1079953-1
Doc Type: Bug Fix
Doc Text:
Cause: A clustered broker did not correctly detect when a durable message had finished being saved to the durable store. Consequence: The cluster can hang when the store is enabled and a new broker is added to the cluster. Fix: The clustered brokers now monitor for the completion of the message storage event. Result: The newly added broker is correctly notified of the completion of the message storage process, and will not hang.
Clone Of:
Environment:
Last Closed: 2013-02-25 11:15:45 UTC
Target Upstream Version:


Attachments
Log from failing test. (219.26 KB, text/plain)
2011-02-23 20:29 UTC, Ken Giusti
Log from passing test - test passes if run without store enabled. (301.04 KB, text/plain)
2011-02-23 20:30 UTC, Ken Giusti


Links
System ID Private Priority Status Summary Last Updated
Apache JIRA QPID-3084 0 None None None Never

Description Ken Giusti 2011-02-23 20:27:45 UTC
Description of problem:

Running the store unit tests fails in the cluster tests:

make  check-TESTS
make[3]: Entering directory `/var/lib/ptolemy/sources/qpid-cpp-store/tests/cluster'
Running C++ cluster tests...
Running 33 test cases...

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*************fork2: 2011-02-23 12:12:48 critical cluster(20.0.10.2:2423 UPDATEE) catch-up connection closed prematurely 10.16.43.8:40675-10.16.43.8:36231(20.0.10.2:2423-1 local,catchup)
make[3]: *** wait: No child processes.  Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: *** wait: No child processes.  Stop.
+++ ps -fu ptolemy -Hww
Error 2



Version-Release number of selected component (if applicable):
Development (pre-MRG2.0)

How reproducible:
100%


Steps to Reproduce:
1. Check out the latest store svn repo.
2. Check out and build the latest qpid svn repo.
3. Configure the store to reference your qpid svn repo.
4. Run "make check" in the store svn repo.
  
Actual results:
The tests will hang in the "testMessageTimeToLive" cluster unit test.

Expected results:
Test should not hang, and should succeed.

Additional info:
The problem is related to the async command completion functionality introduced by the producer flow control work: https://bugzilla.redhat.com/show_bug.cgi?id=660291

Comment 1 Ken Giusti 2011-02-23 20:29:25 UTC
Created attachment 480569
Log from failing test.

Comment 2 Ken Giusti 2011-02-23 20:30:08 UTC
Created attachment 480570
Log from passing test - test passes if run without store enabled.

Comment 3 Ken Giusti 2011-02-23 21:18:39 UTC
Analysis:

The test creates a cluster of two brokers.  It then declares some queues, and sends some durable messages to the cluster.  All that appears fine.

Then the test attempts to add another broker to the cluster.  At this point the test hangs.

From what I can see, the new broker is asynchronously completing messages to the store.  The store completes the messages in its thread, and calls requestIOProcessing() to schedule the completion of the message's transfer command.

The requested scheduled completion never runs.

This appears to be happening during the cluster update process (the new broker is being updated at the time).  Does the cluster update process hold off callbacks scheduled via requestIOProcessing()?
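
The suspected failure mode can be sketched with a small model. This is only an illustration of the question asked above, not the actual qpid code: ModelConnection, doIoCallbacks(), the heldOff flag and transferCompletes() are hypothetical names, and only requestIOProcessing() borrows its name from the broker. The sketch assumes that a connection in catch-up mode does not drain its deferred callbacks, which is the behaviour being asked about, not a confirmed fact.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a broker connection; NOT the real qpid class.
class ModelConnection {
public:
    explicit ModelConnection(bool heldOff) : heldOff_(heldOff) {}

    // Defers work to the connection's I/O thread. In the broker this
    // would be called from the store's thread once the write finishes.
    void requestIOProcessing(std::function<void()> task) {
        assert(task);  // an empty callback would be a programming error
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(task));
    }

    // Called by the I/O thread. While the connection is held off
    // (modelling the assumed catch-up behaviour), nothing is drained.
    void doIoCallbacks() {
        if (heldOff_) return;
        std::deque<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready.swap(pending_);
        }
        for (auto& task : ready) task();
    }

private:
    bool heldOff_;
    std::mutex mutex_;
    std::deque<std::function<void()>> pending_;
};

// Returns true if the deferred completion of the transfer command ran.
bool transferCompletes(bool heldOff) {
    ModelConnection connection(heldOff);
    bool completed = false;
    // The store finishes the durable write and schedules the completion.
    connection.requestIOProcessing([&completed] { completed = true; });
    // The I/O thread gets its turn.
    connection.doIoCallbacks();
    return completed;
}
```

In this model, transferCompletes(false) returns true, while transferCompletes(true) returns false: the completion stays queued and the sender would wait forever, which matches the observed hang.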

Comment 4 Ken Giusti 2011-02-24 22:39:45 UTC
Alan has created a fix for this.

Patched upstream:

http://svn.apache.org/viewvc?view=revision&revision=1074332

Jira:
https://issues.apache.org/jira/browse/QPID-3084

Comment 5 Ken Giusti 2011-03-07 16:39:55 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
    A clustered broker did not correctly detect when a durable message had finished being saved to the durable store.
Consequence:
    The cluster can hang when the store is enabled and a new broker is added to the cluster.
Fix:
    The clustered brokers now monitor for the completion of the message storage event.
Result:
    The newly added broker is correctly notified of the completion of the message storage process, and will not hang.

