Description of problem: Management updates are triggered by a timer. Their timing is not predictable for the cluster, so they can cause cluster shut-downs and inconsistent message delivery. We have a hack in place that suppresses exceptions when a session receives completions for transfers not yet sent (the usual manifestation of the unpredictability). In essence, we have disabled consistency checking for management sessions. This solved the immediate problems, but it would quickly stop working if sessions/connections could be used for both management and other things (which will be more likely with QMFv2, where using management becomes quite straightforward).
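To make the suppressed check concrete, here is a minimal C++ sketch of the behaviour described above. It is a toy model only: `SessionModel`, `sent` and `completed` are hypothetical names for illustration, not the broker's session code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical toy model, NOT the qpid session implementation.
class SessionModel {
public:
    explicit SessionModel(bool managementSession)
        : management_(managementSession) {}

    // Record that a transfer with this id has been sent.
    void sent(std::uint64_t id) { sent_.insert(id); }

    // Handle a completion. Returns true if it matched a sent transfer.
    bool completed(std::uint64_t id) {
        if (sent_.erase(id) == 1) return true;
        // Completion for a transfer that was never sent.
        if (management_) return false;  // the hack: silently ignore it
        throw std::logic_error("completion for transfer not yet sent");
    }

private:
    bool management_;
    std::set<std::uint64_t> sent_;
};
```

In this model the check only stays meaningful while management traffic has sessions of its own. Once a single session carries both management and ordinary transfers, the flag would hide real inconsistencies as well.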
This is fixed by the following revisions:
- 903826 Fix cluster elder calculation to ensure unique elder.
- 903869 QPID_2634 Management updates in timer create inconsistencies in a cluster.
- 903868 Test for management + cluster: run management tools in parallel with regular clients.
- 903867 Cluster implementation of PeriodicTimer.
- 903866 Added PeriodicTimer interface for periodic tasks that need cluster synchronization.
- 903864 In clustered broker: move construction of broker::Connections to the cluster dispatch thread.

It's been pointed out that the current solution is not well integrated with the existing Timer class. A follow-up rename/refactor will be done to:
- define a single abstract Timer interface with named tasks.
- provide implementations LocalTimer and ClusterTimer.
- add two broker accessors returning Timer:
  - getLocalTimer always returns a LocalTimer instance.
  - getClusterTimer returns a ClusterTimer in a cluster, else the same as getLocalTimer.
- rework management timer initialization to happen after plugin init, and drop DelegatedTimer.
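A rough C++ sketch of how the planned refactor could look is given below. Only the names Timer, LocalTimer, ClusterTimer, getLocalTimer and getClusterTimer come from the list above. Everything else is an assumption made for illustration and is not the actual qpid code: the `add`/`expire`/`deliver` methods, the multicast callback, `setClusterTimer`, and the idea that a cluster timer runs a task only once its event has been delivered through the cluster.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch of the planned interface, NOT the qpid API.
class Timer {
public:
    using Task = std::function<void()>;
    virtual ~Timer() = default;
    // Register a task under a name.
    virtual void add(const std::string& name, Task task) = 0;
    // Called when the local clock says the named task is due.
    // Returns false if no task is registered under that name.
    virtual bool expire(const std::string& name) = 0;
};

// Shared storage for named tasks (assumed helper).
class NamedTasks : public Timer {
public:
    void add(const std::string& name, Task task) override {
        assert(task);  // an empty task would fail when invoked
        tasks_[name] = std::move(task);
    }
protected:
    bool has(const std::string& name) const { return tasks_.count(name) != 0; }
    bool run(const std::string& name) {
        auto it = tasks_.find(name);
        if (it == tasks_.end()) return false;
        it->second();
        return true;
    }
private:
    std::map<std::string, Task> tasks_;
};

// Runs the task as soon as it is due.
class LocalTimer : public NamedTasks {
public:
    bool expire(const std::string& name) override { return run(name); }
};

// Assumption: when a task is due, only its name is multicast; the task
// itself runs when that event is delivered in cluster order.
class ClusterTimer : public NamedTasks {
public:
    using Multicast = std::function<void(const std::string&)>;
    explicit ClusterTimer(Multicast m) : multicast_(std::move(m)) {}
    bool expire(const std::string& name) override {
        if (!has(name)) return false;
        multicast_(name);
        return true;
    }
    // Called from the cluster dispatch path on delivery of the event.
    bool deliver(const std::string& name) { return run(name); }
private:
    Multicast multicast_;
};

class Broker {
public:
    Timer& getLocalTimer() { return local_; }
    // Falls back to the local timer when the broker is not clustered.
    Timer& getClusterTimer() {
        if (cluster_) return *cluster_;
        return local_;
    }
    // Hypothetical hook for the cluster plugin to install its timer.
    void setClusterTimer(std::unique_ptr<Timer> t) { cluster_ = std::move(t); }
private:
    LocalTimer local_;
    std::unique_ptr<Timer> cluster_;
};
```

With this shape, management code could always register its periodic update through getClusterTimer and get plain local behaviour on a standalone broker, which is the fallback the accessor description above calls for.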
Hello Alan, could you please specify the recommended way to test this? It seems difficult. Putting NEEDINFO.
Can't easily reproduce this on 1.2 because the hack mentioned above suppresses the error. You can reproduce on revision 903717 (before the fixes):
- start a cluster broker (one is enough)
- run qpid-queue-stats -a host:port
- wait for the management interval to pass.

The broker will exit with:
- critical Modified cluster state outside of cluster context
Reproduced on manually-built qpid-903717
Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 i386
Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 x86_64