Red Hat Bugzilla – Bug 223200
Persistence of disabled cluster services
Last modified: 2010-10-22 03:56:33 EDT
Description of problem:
We have a RHEL4 U3/U4 cluster with GFS 6.1. For maintenance reasons we want to
disable all cluster services so we can do some admin tasks (upgrades, tests,
etc.) and also a test boot. All disabled cluster services should stay down, even
across a reboot, until we re-enable them. With clusvcadm -d this does not work!
How can we temporarily disable cluster services so that they stay disabled across a reboot?
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Disable a cluster service: clusvcadm -d "Servicename"
2. Reboot the cluster.
Actual results:
The service does not stay disabled.
Expected results:
The service should stay disabled.
--> see Red Hat Service Request: 1204560
There's an autostart flag for this purpose; set it in the configuration:
<service name="foo" autostart="0"/>
With autostart="0", the service will not be started automatically. Dynamic states
(e.g. disabled, started, stopped) are not persistent in RHEL4 or RHEL5.
Side note: it might make sense for Conga or system-config-cluster to change this
flag on the fly during a 'disable' operation, if desired.
Given that there is no coherent shared state persistence like there was on RHCS3
(e.g. using a shared disk), making persistent states "work" cannot be done
correctly across cluster outages without introducing new requirements into the
system and/or breaking backward compatibility (i.e. rolling upgrades):
1. 3 nodes, 1 service running on node 2.
2. Stop node 1 (node 1 thinks the service is enabled).
3. Disable the service (nodes 2 and 3 think it's disabled).
4. Stop nodes 2 and 3.
5. Start nodes 1 and 2.
Result: node 1 thinks the service is enabled; node 2 thinks it is disabled.
Unless the clocks are synchronized (which is not currently a requirement for
RHCS), the correct state of the service cannot be determined. At some point in
the future, service states will likely be stored in AIS checkpoints (which are
also not persistent across cluster outages).
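A toy Python model of the three-node scenario above makes the disagreement concrete. It is purely illustrative, not rgmanager code: each node keeps only its own local view of the service state, there is no shared store, and a node that is down simply misses any change.

```python
def run_scenario():
    """Replay the five steps and return the views of the nodes that come back up."""
    # Each node's local view of the service; it starts out enabled (running on node 2).
    views = {1: "enabled", 2: "enabled", 3: "enabled"}
    up = {1, 2, 3}

    up.discard(1)            # stop node 1; its last view is still "enabled"
    for node in up:          # disable the service; only running nodes see the change
        views[node] = "disabled"
    up -= {2, 3}             # stop nodes 2 and 3 (full cluster outage)
    up |= {1, 2}             # start nodes 1 and 2

    return {node: views[node] for node in sorted(up)}


result = run_scenario()
print(result)                         # {1: 'enabled', 2: 'disabled'}
print(len(set(result.values())) > 1)  # True: the restarted nodes disagree
```

Nothing in either node's local view says which of the two is newer, which is why some extra source of ordering (such as synchronized clocks) is needed to pick the right state.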
It is possible to implement, with the caveat that the state will sometimes be
wrong after cluster transitions.
OK, got it, but
we are quite used to persistent states of cluster services. We run a lot of
SAP HA clusters on TruCluster and are in the process of migrating them to Linux,
so this feature is quite important for us. Do you think there is a way to
re-enable the persistence, or some kind of workaround?
BTW: for operational processes and maintenance it is also very important to be
able to disable a service and be sure it won't come back, even after a cluster restart.
It's possible to make it work, but there's no way to guarantee the correctness
of the persistent "disabled" state across cluster transitions without
introducing additional requirements (notably, time synchronization and use of a
shared partition on the SAN to store the states, which I think is what
TruCluster does, but I could be mistaken).
Even so, it is not difficult to make it work in most cases: all nodes can simply
record which services are disabled and mark them disabled on startup. However,
there will be cases where the nodes' recorded states disagree, and the service
then has a 50/50 chance of ending up in the wrong state.
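The "record and mark disabled on startup" approach can be sketched in Python as below. Everything here is hypothetical (the Record type and both merge functions are illustrations, not rgmanager code). It compares the simple rule "disabled if any starting node recorded it disabled" with a last-writer-wins rule that only works if node clocks are synchronized, on the three-node scenario above and on its mirror image, where node 1 misses a re-enable instead of a disable.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One node's locally recorded view of a service (hypothetical format)."""
    disabled: bool
    changed_at: float  # wall-clock time of the last change this node saw


def merge_any_disabled(records):
    """Simple rule: keep the service disabled if any starting node says so."""
    return any(r.disabled for r in records)


def merge_latest(records):
    """Last-writer-wins: trust the most recent change.

    Only meaningful if node clocks are synchronized (e.g. via NTP),
    which RHCS does not currently require.
    """
    return max(records, key=lambda r: r.changed_at).disabled


# Scenario from the thread: node 1 missed the disable.
missed_disable = [Record(disabled=False, changed_at=0.0),   # node 1
                  Record(disabled=True, changed_at=20.0)]   # node 2

# Mirror case: node 1 missed a re-enable.
missed_enable = [Record(disabled=True, changed_at=10.0),    # node 1
                 Record(disabled=False, changed_at=20.0)]   # node 2

print(merge_any_disabled(missed_disable), merge_latest(missed_disable))  # True True
print(merge_any_disabled(missed_enable), merge_latest(missed_enable))    # True False
```

The simple rule handles the scenario from the thread but wrongly keeps the service disabled in the mirror case; the timestamp rule gets both right only as long as the clocks really agree, which is exactly the additional requirement mentioned above.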
OK, that sounds great.
I think we already have most of that in place: NTP is running, and we also have
a "shared root" cluster (like TruCluster). That means that if you use a location
in the root filesystem to record the state, all nodes have access to it.
Do we need some additional software (patches or new software packages), or is
this only a configuration change?
This will require a configuration change.
I am now implementing this feature.
One possible design:
Use callbacks in vf_init() *if and only if* cluster.conf has
* On init, read the state prior to calling rg_init.
* If state == failed or disabled, switch from stopped to the correct state in
init_rg (?) or somewhere nearby
* All other states should be cleared (unlink file) on init.
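The init steps listed above can be sketched as follows. This is a minimal Python illustration, not the actual vf_init()/rg_init code: the state directory, the one-file-per-service layout, the service names, and the load_initial_states() helper are all assumptions made for the example.

```python
import os
import tempfile

# States that survive a restart in this sketch; everything else is cleared.
PERSISTENT_STATES = {"failed", "disabled"}


def load_initial_states(state_dir, services):
    """Hypothetical init hook, run before resource groups are initialized.

    Assumes each service's last known state is stored in a file named after
    the service inside state_dir. Returns the state each service starts in.
    """
    initial = {}
    for name in services:
        path = os.path.join(state_dir, name)
        try:
            with open(path) as f:
                saved = f.read().strip()
        except FileNotFoundError:
            saved = None
        if saved in PERSISTENT_STATES:
            initial[name] = saved       # switch from "stopped" to the saved state
        else:
            initial[name] = "stopped"   # normal starting state
            if saved is not None:
                os.unlink(path)         # clear any other saved state
    return initial


with tempfile.TemporaryDirectory() as d:
    for svc, state in {"sap_db": "disabled", "sap_ci": "started"}.items():
        with open(os.path.join(d, svc), "w") as f:
            f.write(state + "\n")
    print(load_initial_states(d, ["sap_db", "sap_ci", "nfs"]))
    # {'sap_db': 'disabled', 'sap_ci': 'stopped', 'nfs': 'stopped'}
    print(sorted(os.listdir(d)))  # ['sap_db']
```

Only the initial state is decided here; whether a "stopped" service is then started would still depend on the normal autostart logic.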
This partly overlaps with another feature I implemented a while ago but never
applied, which mirrors resource tree states on disk.
Both this feature and the one noted in comment #8 could be rather destabilizing.
Also, because the states are not guaranteed to be consistent across cluster
transitions, it might be better to make the UI and/or clusvcadm handle this.
For example, it would be trivial to add 'reconfig' support to the 'service'
resource agent, thereby eliminating the need to restart the service as a result
of a reconfiguration.
This would then allow a user to flip the 'autostart' flag as part of a more
robust 'disable' operation without affecting the service state.
Reconfiguration support is going out with 5.1; you can now change the autostart
flag without the service bouncing as a result.
So, in order to have a state persisted across transitions, one can:
- set autostart to 0
- disable the service
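The configuration half of the two steps above can be scripted. This is a minimal Python sketch using the standard library's xml.etree.ElementTree; it assumes the usual cluster.conf layout (a <cluster> root carrying config_version, with <service> elements under <rm>) and that an updated configuration must carry a higher config_version. The set_autostart() helper and the cluster name are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_autostart(xml_text, service_name, autostart):
    """Return cluster.conf text with autostart set on one service and
    config_version incremented (assumes the <cluster>/<rm>/<service> layout)."""
    root = ET.fromstring(xml_text)
    svc = root.find(f"./rm/service[@name='{service_name}']")
    if svc is None:
        raise KeyError(f"service {service_name!r} not found under <rm>")
    svc.set("autostart", "1" if autostart else "0")
    # Assumption: the cluster expects every updated configuration to have a
    # higher config_version than the one currently in use.
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))
    return ET.tostring(root, encoding="unicode")


conf = """<cluster name="sapclu" config_version="7">
  <rm>
    <service name="foo" autostart="1"/>
  </rm>
</cluster>"""

new_root = ET.fromstring(set_autostart(conf, "foo", False))
print(new_root.get("config_version"))                               # 8
print(new_root.find("./rm/service[@name='foo']").get("autostart"))  # 0
```

ElementTree discards XML comments when parsing, so keep a backup of the original file. After writing the new file, propagate it using your release's normal configuration-update procedure and then run clusvcadm -d foo. With the 5.1 reconfiguration support mentioned above, changing only the autostart flag should not restart the service.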
As things currently stand, we are unlikely to put the feature requested above
into RHEL4. Porting the reconfiguration support to RHEL4 may be possible, however.