Description of problem:
By modifying cluster_tests.py (patch attached), the test below fails consistently with an invalid-argument error. The patch causes the test to do more broker kills and restarts. This might be the same issue as bug 654872.

Version-Release number of selected component (if applicable):
trunk r1036871

How reproducible:
easy

Steps to Reproduce:
make check TESTS=run_cluster_tests CLUSTER_TESTS="*test_management -DDURATION=4"

Actual results:
broker exits with invalid-argument error

Expected results:
no error

Additional info:
Created attachment 461562
Patch to cluster_tests.py to reproduce the problem.
Fixed on trunk by the following 3 revisions:

------------------------------------------------------------------------
r1041181 | aconway | 2010-12-01 16:33:12 -0500 (Wed, 01 Dec 2010) | 8 lines

Modified cluster_tests causes broker shut down with invalid-argument error.

Described in https://bugzilla.redhat.com/show_bug.cgi?id=655078.

The management agent's deleted-object list was not being replicated to
new members joining the cluster, so management generated fewer deleted
object notifications on the newer member, causing it to fail with an
invalid-argument error. The list is now being replicated correctly.
------------------------------------------------------------------------
r1041180 | aconway | 2010-12-01 16:32:52 -0500 (Wed, 01 Dec 2010) | 6 lines

Add missing call to Message::setTimestamp in ManagementAgent::sendBufferLH.

Without this, messages generated here will not be expired consistently
in a cluster which may cause a broker to become inconsistent and exit
with an invalid-argument error.
------------------------------------------------------------------------
r1041179 | aconway | 2010-12-01 16:32:43 -0500 (Wed, 01 Dec 2010) | 7 lines

Enable cluster-safe assertions on transition to CATCHUP

Delaying until READY was causing multiple clientConnect management
events to be raised, because broker::Connection::setUserId relies on
sys::isCluster to avoid producing duplicate events with
cluster::Connection::announce
------------------------------------------------------------------------
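For readers unfamiliar with the failure mode in r1041181, here is a toy model of it. This is not Qpid code: `ToyAgent`, `snapshot`, `periodic_flush` and `joined_member_agrees` are hypothetical names, and the model assumes each agent emits one delete notification per entry in its deleted-object list on its next periodic flush.

```python
class ToyAgent:
    """Hypothetical stand-in for a broker's management agent (not the Qpid API)."""

    def __init__(self, objects=(), deleted=()):
        self.objects = set(objects)   # live managed objects
        self.deleted = list(deleted)  # deleted objects not yet reported

    def delete(self, name):
        """Remove a managed object and remember it for the next flush."""
        self.objects.discard(name)
        self.deleted.append(name)

    def snapshot(self, include_deleted):
        """State handed to a member joining the cluster."""
        deleted = self.deleted if include_deleted else ()
        return ToyAgent(self.objects, deleted)

    def periodic_flush(self):
        """Emit one delete notification per pending entry, then clear the list."""
        sent = [("delete", name) for name in self.deleted]
        self.deleted.clear()
        return sent


def joined_member_agrees(include_deleted):
    """Return True if old and new members emit identical notifications."""
    old = ToyAgent(objects={"queue-a", "queue-b"})
    old.delete("queue-a")                # deleted, but not yet reported
    new = old.snapshot(include_deleted)  # a new member joins at this point
    return old.periodic_flush() == new.periodic_flush()


if __name__ == "__main__":
    print(joined_member_agrees(include_deleted=False))  # list not replicated
    print(joined_member_agrees(include_deleted=True))   # list replicated
```

In this sketch the members only agree when the pending deleted-object list is copied to the joining member, which corresponds to the behaviour the commit message describes.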
A complete fix for this also requires:

------------------------------------------------------------------------
r1043621 | aconway | 2010-12-08 14:21:05 -0500 (Wed, 08 Dec 2010) | 9 lines

Defer update of management agent to end of update process.

Move updating of the management agent to the very end of the update
process, after all objects used by the update process itself have been
deleted.

Before the fix, deletions from the update process itself (deleting the
qpid.cluster-update queue and its binding to the default exchange)
were sporadically appearing as extra delete messages on the updatee's
management agent and causing inconsistency.
------------------------------------------------------------------------
And:

------------------------------------------------------------------------
r1041582 | kgiusti | 2010-12-02 16:03:42 -0500 (Thu, 02 Dec 2010) | 2 lines

bugfix in deleted obj import/export api
------------------------------------------------------------------------
VERIFIED on RHEL 5.6 i386 / x86_64.

Packages used:
qpid-cpp-mrg-0.7.946106-27.el5.src.rpm
qpid-tools-0.7.946106-12.el5.src.rpm
openais-0.80.6-28.el5

--> VERIFIED