Bug 557138 - Management updates not predictable for cluster.
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Target Release: ---
Assigned To: Alan Conway
QA Contact: Jan Sarenik
Depends On:
Blocks: 501015
Reported: 2010-01-20 09:48 EST by Alan Conway
Modified: 2010-10-20 07:29 EDT
Doc Type: Bug Fix
Last Closed: 2010-10-20 07:29:19 EDT
Description Alan Conway 2010-01-20 09:48:29 EST
Description of problem:

Management updates are triggered by a timer, so the point at which they happen relative to other cluster events is not predictable for the cluster. This can cause cluster shut-downs and inconsistent message delivery.

We have a hack in place that suppresses exceptions when the session receives
completions for transfers not yet sent (which is the usual manifestation of the
unpredictability). I.e. we have in essence disabled consistency checking for
management sessions. This solved immediate problems but would quickly stop
working if sessions/connections could be used for management and other things
(as will be more likely with QMFv2 where using management becomes quite
straightforward).
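
To illustrate the unpredictability described above, here is a toy C++ model. It is not qpid-cpp code: `Replica`, `replay` and the event names are hypothetical, and the model assumes only that each outgoing transfer on a session is numbered in order. It shows how the same client events produce different transfer numbering depending on where the management update lands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a session: each outgoing transfer gets the next id.
// Hypothetical, not the qpid-cpp session class.
struct Replica {
    int nextId = 0;
    std::vector<std::string> sent;
    void send(const std::string& what) {
        sent.push_back(std::to_string(nextId++) + ":" + what);
    }
};

// Replay the same ordered client events on one replica. The management
// update is sent just before the event at index `updateAt`, which models
// the point at which the update happens relative to cluster events.
std::vector<std::string> replay(const std::vector<std::string>& events,
                                std::size_t updateAt) {
    Replica r;
    for (std::size_t i = 0; i < events.size(); ++i) {
        if (i == updateAt) r.send("mgmt-update");
        r.send(events[i]);
    }
    return r.sent;
}
```

In this model, two brokers whose local timers fire at different points correspond to two calls with different `updateAt` values, and their transfer ids no longer agree, so a completion for a given id may refer to a transfer one member has not sent yet. If the update point is instead the same for every member, the results are identical.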
Comment 1 Alan Conway 2010-01-27 17:32:18 EST
This is fixed by the following revisions:

903826 Fix cluster elder calculation to ensure unique elder.
903869 QPID_2634 Management updates in timer create inconsistencies in a cluster.
903868 Test for management + cluster: run management tools in parallel with regular clients.
903867 Cluster implementation of PeriodicTimer.
903866 Added PeriodicTimer interface for periodic tasks that need cluster synchronization.
903864 In clustered broker: move construction of broker::Connections to the cluster dispatch thread.

It's been pointed out that the current solution is not well integrated with the existing Timer class.
A follow-up rename/refactor will be done to:
 - define a single abstract Timer interface with named tasks.
 - provide two implementations, LocalTimer and ClusterTimer.
 - add two broker accessors returning Timer:
   - getLocalTimer always returns a LocalTimer instance
   - getClusterTimer returns a ClusterTimer in a cluster, else the same as getLocalTimer
 - rework management timer initialization to happen after plugin init, and drop DelegatedTimer.
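
The planned refactor could look roughly like the following sketch. Only the names Timer, LocalTimer, ClusterTimer, getLocalTimer and getClusterTimer come from the list above. The method signatures, the `Broker` constructor argument and the `fire` entry point are assumptions made for illustration and are not the actual qpid-cpp API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Abstract interface with named tasks (signatures are hypothetical).
class Timer {
public:
    virtual ~Timer() = default;
    virtual void add(const std::string& name, std::function<void()> task) = 0;
    virtual void fire(const std::string& name) = 0;
};

// Fires tasks from the local clock; fire() stands in for a timer expiry.
class LocalTimer : public Timer {
    std::map<std::string, std::function<void()>> tasks_;
public:
    void add(const std::string& name, std::function<void()> task) override {
        tasks_[name] = std::move(task);
    }
    void fire(const std::string& name) override {
        auto it = tasks_.find(name);
        if (it != tasks_.end()) it->second();
    }
};

// Assumption: in a cluster, fire() would be called when a tick for the
// named task is delivered in cluster order, so every member runs the
// task at the same point in the event stream.
class ClusterTimer : public Timer {
    std::map<std::string, std::function<void()>> tasks_;
public:
    void add(const std::string& name, std::function<void()> task) override {
        tasks_[name] = std::move(task);
    }
    void fire(const std::string& name) override {
        auto it = tasks_.find(name);
        if (it != tasks_.end()) it->second();
    }
};

// Hypothetical broker exposing the two accessors from the list above.
class Broker {
    std::shared_ptr<Timer> local_;
    std::shared_ptr<Timer> cluster_;
public:
    explicit Broker(bool clustered) : local_(std::make_shared<LocalTimer>()) {
        if (clustered) cluster_ = std::make_shared<ClusterTimer>();
        else cluster_ = local_;  // standalone: same object as getLocalTimer
    }
    Timer& getLocalTimer() { return *local_; }
    Timer& getClusterTimer() { return *cluster_; }
};
```

With this shape, code that needs cluster-synchronized periodic work (such as management updates) would always ask for getClusterTimer and would not need to know whether the broker is clustered.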
Comment 2 Frantisek Reznicek 2010-02-10 05:06:55 EST
Hello Alan, could you possibly specify the recommended way we should test it, please? It seems difficult.

Putting NEEDINFO.
Comment 3 Alan Conway 2010-02-22 11:35:34 EST
Can't easily reproduce this on 1.2 because the hack mentioned above suppresses the error.

You can reproduce on revision 903717 (before the fixes):
 - start a cluster broker (one is enough)
 - run qpid-queue-stats -a host:port
 - wait for management interval to pass.

The broker will exit with:
 - critical Modified cluster state outside of cluster context
Comment 4 Jan Sarenik 2010-03-30 06:18:21 EDT
Reproduced on manually-built qpid-903717
Comment 5 Jan Sarenik 2010-03-30 06:52:31 EDT
Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 i386
Comment 6 Jan Sarenik 2010-03-30 10:00:14 EDT
Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 x86_64
