Description of problem: Running perftest in a loop as described below crashes all members of a cluster except the directly connected one with: invalid-argument: anonymous.a96d061f-206e-42d5-9dfa-455a5fd904e0: confirmed < (1+0) but only sent < (0+0) Version-Release number of selected component (if applicable): How reproducible: easy Steps to Reproduce: 1. run a qpidd cluster 2. while true; do src/tests/perftest & sleep 2 ; done Watch for the other cluster members to die Actual results: All but directly connected node exit with error. Expected results: no failure Additional info: The error is always confirmed 1 but sent 0. Some of the perftests will exit with Controller exception: Bad report: Sequence error: expected n==81970 but got 81985 This is expected as we're running multiple perftests using the same queues. I suspect that the broker error is related to the sub-done queue, and the fact that the controller is throwing an exception which shuts down the client as soon as it gets the bad sub-done message.
Created attachment 347417 [details] issue reproducer
The issue is still present in -15 and also -16 as far as I can see (transcript of the attached reproducer from above). [root@hp-dl360g5-01 bz505210]# ./run.sh aisexec (pid 1900) is running... Stopping OpenAIS daemon (aisexec): [ OK ] Starting OpenAIS daemon (aisexec): [ OK ] client (perftest_mod) compile, ecode:0 client[s] ready Client[s] compile .ready starting brokers in the cluster:...done tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 2217/qpidd tcp 0 0 0.0.0.0:10001 0.0.0.0:* LISTEN 2233/qpidd tcp 0 0 0.0.0.0:10002 0.0.0.0:* LISTEN 2251/qpidd broker[s] running (pids:2217 2233 2251 ) perftest_mod loop started: 1/2(4):LLLL|.abort-cluster tcp 0 0 0.0.0.0:5672 0.0.0.0:* LISTEN 2217/qpidd test broker[s]:2217 2233 2251 done stopping brokers... FFFFOK ERROR: Client[s] failed! (ecodes:) qpidd brokers failed with searched message, cnt:4 | qpidd_1.log:2009-jun-11 10:40:14 debug Exception constructed: anonymous.8a43df1e-6590-4e85-bcc4-9b12f77eeb5d: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163) | qpidd_1.log:2009-jun-11 10:40:14 error Execution exception: invalid-argument: anonymous.8a43df1e-6590-4e85-bcc4-9b12f77eeb5d: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163) | qpidd_2.log:2009-jun-11 10:40:14 debug Exception constructed: anonymous.8a43df1e-6590-4e85-bcc4-9b12f77eeb5d: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163) | qpidd_2.log:2009-jun-11 10:40:14 error Execution exception: invalid-argument: anonymous.8a43df1e-6590-4e85-bcc4-9b12f77eeb5d: confirmed < (1+0) but only sent < (0+0) (qpid/SessionState.cpp:163) Broker status: qpidd_0.ecode:0 qpidd_1.ecode:0 qpidd_2.ecode:0 qpidd_0.ecode: 2009-06-11 10:40:20.000000000 -0400 qpidd_1.ecode: 2009-06-11 10:40:14.000000000 -0400 qpidd_2.ecode: 2009-06-11 10:40:14.000000000 -0400 qpidd-0.5.752581-16.el5 qpidd-cluster-0.5.752581-16.el5 openais-0.80.3-22.el5_3.7 Test Summary: TEST FAILED !!! (4 failures) dur: 36 secs
The failing data stored on mrg3.lab.bos.redhat.com:/root/bz505210_fails_rhel53i_090611.tar.bz2
This was a regression since qpidd-0.5.752581-10.el5 and is fixed in qpidd-0.5.752581-17.el5 Error was due to indeterminacy in handling multiple consumers on a queue. It was also observable by starting several receiver programs then running sender in a loop. E.g. qpid-config add queue test-queue ./src/tests/receiver > /dev/null & ./src/tests/receiver > /dev/null & ./src/tests/receiver > /dev/null & ./src/tests/receiver > /dev/null & while ./src/tests/sender < inputs ; do true; done This would eventually cause nodes other than 5672 to shut themselves down.
The issue has been fixed on RHEL 5.3 i386 / x86_64 on packages qpid*-0.5.752581-17.el5. -> VERIFIED
Created attachment 347662 [details] Patch to fix the issue patch also comitted to qpid trunk r784263
Fixed and verfified; closing.