557138 – Management updates not predictable for cluster.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	Development
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	1.3
Target Release:	---
Assignee:	Alan Conway
QA Contact:	Jan Sarenik
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	501015

Reported:	2010-01-20 14:48 UTC by Alan Conway
Modified:	2010-10-20 11:29 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-10-20 11:29:19 UTC
Target Upstream Version:
Embargoed:

Description Alan Conway 2010-01-20 14:48:29 UTC

Description of problem:

Management updates are triggered by a timer. They are not predictable for the cluster and so can cause cluster shut-downs and inconsistent message delivery.

We have a hack in place that suppresses exceptions when the session receives
completions for transfers not yet sent (which is the usual manifestation of the
unpredictability). I.e. we have in essence disabled consistency checking for
management sessions. This solved immediate problems but would quickly stop
working if sessions/connections could be used for management and other things
(as will be more likely with QMFv2 where using management becomes quite
straightforward).
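
To illustrate why a timer-driven update is unpredictable for a cluster, here is a toy model (not qpid code; every name in it is hypothetical). Each replica processes the same ordered events, but a local timer can fire at a different position in that stream on each replica, so the resulting histories diverge. If the tick is instead delivered as an event in the shared stream, every replica sees it at the same position. Whether the actual fix works exactly this way is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replica: its "state" is simply the ordered log of what it processed.
struct Replica {
    std::vector<std::string> log;
    void deliver(const std::string& event) { log.push_back(event); }
};

// Local timer: the management update lands wherever this replica's own clock
// happens to fire, modelled here as "before the event at index firesAt".
std::vector<std::string> runWithLocalTimer(const std::vector<std::string>& events,
                                           std::size_t firesAt) {
    assert(firesAt <= events.size());
    Replica r;
    for (std::size_t i = 0; i <= events.size(); ++i) {
        if (i == firesAt) r.deliver("mgmt-update");
        if (i < events.size()) r.deliver(events[i]);
    }
    return r.log;
}

// Cluster-synchronized timer (assumed design): the tick is itself an entry in
// the ordered stream, so every replica handles it at the same position.
std::vector<std::string> runWithClusterTimer(const std::vector<std::string>& orderedStream) {
    Replica r;
    for (const std::string& e : orderedStream) {
        r.deliver(e == "tick" ? "mgmt-update" : e);
    }
    return r.log;
}
```

With events `m1, m2, m3`, two replicas whose local timers fire at positions 1 and 2 end up with different logs, while two replicas fed `m1, tick, m2, m3` end up with identical logs.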

Comment 1 Alan Conway 2010-01-27 22:32:18 UTC

This is fixed by the following revisions:

903826 Fix cluster elder calculation to ensure unique elder.
903869 QPID-2634 Management updates in timer create inconsistencies in a cluster.
903868 Test for management + cluster: run management tools in parallel with regular clients.
903867 Cluster implementation of PeriodicTimer.
903866 Added PeriodicTimer interface for periodic tasks that need cluster synchronization.
903864 In clustered broker: move construction of broker::Connections to the cluster dispatch thread.

It's been pointed out that the current solution is not well integrated with the existing Timer class.
A follow-up rename/refactor will be done to:
 - define a single abstract Timer interface with named tasks.
 - provide implementations LocalTimer and ClusterTimer.
 - add two broker accessors returning Timer:
   - getLocalTimer always returns a LocalTimer instance.
   - getClusterTimer returns a ClusterTimer in a cluster, else the same as getLocalTimer.
 - rework management timer initialization to happen after plugin init, and drop DelegatedTimer.
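
The proposed refactor could look roughly like the sketch below. This is only an illustration of the shape described in the list above, not the code that was committed: the method names `add` and `fire`, the `enableCluster` hook, and the use of `std::shared_ptr` are all assumptions, and the ClusterTimer here does no real cluster synchronization.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract interface: tasks are registered and fired by name.
class Timer {
public:
    using Task = std::function<void()>;
    virtual ~Timer() = default;
    virtual void add(const std::string& name, Task task) = 0;
    virtual void fire(const std::string& name) = 0;
};

// Runs tasks on the local broker only.
class LocalTimer : public Timer {
    std::map<std::string, Task> tasks_;
public:
    void add(const std::string& name, Task task) override {
        assert(task);  // an empty task would throw when fired
        tasks_[name] = std::move(task);
    }
    void fire(const std::string& name) override {
        auto it = tasks_.find(name);
        if (it != tasks_.end()) it->second();
    }
};

// Placeholder for the cluster-aware variant. In a real cluster, fire() would
// be driven by an ordered cluster event rather than a local clock (assumption).
class ClusterTimer : public Timer {
    std::map<std::string, Task> tasks_;
public:
    void add(const std::string& name, Task task) override {
        assert(task);
        tasks_[name] = std::move(task);
    }
    void fire(const std::string& name) override {
        auto it = tasks_.find(name);
        if (it != tasks_.end()) it->second();
    }
};

// Hypothetical broker exposing the two accessors from the list above.
class Broker {
    std::shared_ptr<Timer> local_ = std::make_shared<LocalTimer>();
    std::shared_ptr<Timer> cluster_;  // stays null unless clustering is enabled
public:
    void enableCluster() { cluster_ = std::make_shared<ClusterTimer>(); }
    std::shared_ptr<Timer> getLocalTimer() const { return local_; }
    std::shared_ptr<Timer> getClusterTimer() const {
        return cluster_ ? cluster_ : local_;  // fall back to the local timer
    }
};
```

Management code would then always register its periodic task through `getClusterTimer()`, and a standalone broker would transparently get the local timer.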

Comment 2 Frantisek Reznicek 2010-02-10 10:06:55 UTC

Hello Alan, could you possibly specify the recommended way we should test it, please? It seems difficult.

Putting NEEDINFO.

Comment 3 Alan Conway 2010-02-22 16:35:34 UTC

Can't easily reproduce this on 1.2 because the hack mentioned above suppresses the error.

You can reproduce on revision 903717 (before the fixes):
 - start a cluster broker (one is enough)
 - run qpid-queue-stats -a host:port
 - wait for management interval to pass.

The broker will exit with:
 - critical Modified cluster state outside of cluster context

Comment 4 Jan Sarenik 2010-03-30 10:18:21 UTC

Reproduced on manually-built qpid-903717

Comment 5 Jan Sarenik 2010-03-30 10:52:31 UTC

Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 i386

Comment 6 Jan Sarenik 2010-03-30 14:00:14 UTC

Verified on qpid-cpp-server-cluster-0.7.916826-2.el5, RHEL5 x86_64