Description of problem:
Running the store unit tests will fail on the cluster tests:

make check-TESTS
make[3]: Entering directory `/var/lib/ptolemy/sources/qpid-cpp-store/tests/cluster'
Running C++ cluster tests...
Running 33 test cases...
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*************fork2: 2011-02-23 12:12:48 critical cluster(20.0.10.2:2423 UPDATEE) catch-up connection closed prematurely 10.16.43.8:40675-10.16.43.8:36231(20.0.10.2:2423-1 local,catchup)
make[3]: *** wait: No child processes. Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: *** wait: No child processes. Stop.
+++ ps -fu ptolemy -Hww Error 2

Version-Release number of selected component (if applicable):
Development (pre-MRG2.0)

How reproducible:
100%

Steps to Reproduce:
1. checkout latest store svn repo
2. checkout & build latest qpid svn repo
3. configure store to reference your qpid svn repo
4. Run "make check" in store svn repo

Actual results:
The tests will hang in the "testMessageTimeToLive" cluster unit test.

Expected results:
Test should not hang, and should succeed.

Additional info:
Problem is related to the async command completion functionality introduced by producer flow control work - https://bugzilla.redhat.com/show_bug.cgi?id=660291
Created attachment 480569 [details] Log from failing test.
Created attachment 480570 [details] Log from passing test - test passes if run without store enabled.
Analysis: The test creates a cluster of two brokers. It then declares some queues, and sends some durable messages to the cluster. All that appears fine. Then the test attempts to add another broker to the cluster. At this point the test hangs. From what I can see, the new broker is asynchronously completing messages to the store. The store completes the messages in its thread, and calls requestIOProcessing() to schedule the completion of the message's transfer command. The requested scheduled completion never runs. This appears to be happening during the cluster update process (the new broker is being updated at the time). Does the cluster update process hold off callbacks scheduled via requestIOProcessing()?
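The suspected stall in the analysis above can be illustrated with a toy model. This is only a sketch of the hypothesis, not qpid code: ToyConnection, io_held, run_io and store_finished are hypothetical names, and request_io_processing is only loosely modelled on requestIOProcessing(). It assumes that something (here a simple flag standing in for the cluster update) prevents queued callbacks from being drained.

```python
from collections import deque


class ToyConnection:
    """Hypothetical stand-in for a broker connection; not the qpid API."""

    def __init__(self):
        self.pending = deque()   # callbacks queued for the I/O thread
        self.io_held = False     # assumption: set while a cluster update is in progress
        self.completed = []      # ids of messages whose transfer command completed

    def request_io_processing(self, callback):
        # Queue work to be run later on the I/O thread.
        self.pending.append(callback)

    def run_io(self):
        # Drain queued callbacks unless processing is being held off.
        if self.io_held:
            return 0
        ran = 0
        while self.pending:
            self.pending.popleft()()
            ran += 1
        return ran


def store_finished(conn, message_id):
    # Called by the "store thread" once the message has been persisted.
    conn.request_io_processing(lambda: conn.completed.append(message_id))


conn = ToyConnection()
conn.io_held = True              # the new broker is being updated
store_finished(conn, "msg-1")    # the store completes the message
conn.run_io()                    # the scheduled completion does not run
stalled = list(conn.completed)

conn.io_held = False             # once callbacks are allowed again
conn.run_io()
resumed = list(conn.completed)
```

In this model the completion stays queued for as long as io_held is set, which matches the observed symptom: the store has finished its work, but the transfer command is never completed.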
Alan has created a fix for this.
Patched upstream: http://svn.apache.org/viewvc?view=revision&revision=1074332
Jira: https://issues.apache.org/jira/browse/QPID-3084
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: A clustered broker did not correctly detect when a durable message had finished being saved to the durable store.
Consequence: The cluster could hang when the store was enabled and a new broker was added to the cluster.
Fix: The clustered brokers now monitor for the completion of the message storage event.
Result: The newly added broker is correctly notified when the message storage process completes, and no longer hangs.
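One common way to structure the kind of completion monitoring described in the technical note is a counter of outstanding operations that fires a callback when it reaches zero. The sketch below illustrates that general pattern only. PendingCompletion, begin and end are hypothetical names, and the actual change in upstream revision 1074332 may be structured differently.

```python
import threading


class PendingCompletion:
    """Hypothetical completion tracker; illustrative only, not the qpid API."""

    def __init__(self, on_complete):
        self._lock = threading.Lock()
        self._outstanding = 1          # held by the receiving thread until it is done
        self._on_complete = on_complete

    def begin(self):
        # Register one more asynchronous operation, e.g. a store write.
        with self._lock:
            self._outstanding += 1

    def end(self):
        # Mark one operation finished; fire the callback when none remain.
        with self._lock:
            self._outstanding -= 1
            done = self._outstanding == 0
        if done:
            self._on_complete()


events = []
completion = PendingCompletion(lambda: events.append("transfer completed"))

completion.begin()                                 # store write started
store = threading.Thread(target=completion.end)   # store thread finishes the write later
completion.end()                                   # receiving thread is done; store still pending
before = list(events)

store.start()
store.join()
after = list(events)
```

The callback runs exactly once, on whichever thread finishes last, so the transfer is not reported complete until the store has confirmed the write.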