Bug 223200 - persistence of disabled Clusterservices
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-01-18 12:25 UTC by Michael Hagmann
Modified: 2018-10-19 19:52 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-31 18:07:26 UTC
Embargoed:



Description Michael Hagmann 2007-01-18 12:25:51 UTC
Description of problem:

We have a RHEL4 U3/U4 cluster with GFS 6.1. For maintenance reasons we want to
disable all cluster services so we can do some admin tasks (upgrades, tests,
etc.) and also a test boot. All disabled cluster services should stay down,
even across a reboot, until we re-enable them. With clusvcadm -d this does not
work!

How can we temporarily disable cluster services so that they stay disabled
across a reboot?


Version-Release number of selected component (if applicable):

rgmanager-1.9.54-1

How reproducible:

Disable a cluster service and reboot the cluster. The service will not stay disabled.

clusvcadm -d "Servicename"

Actual results:


Expected results:

The service should stay disabled.

Additional info:


--> see Red Hat Service Request: 1204560

Comment 1 Lon Hohberger 2007-01-18 14:28:40 UTC
There's an autostart flag for this purpose; set it in the configuration:

<service name="foo" autostart="0"/>

The service will not autostart.  Dynamic states (e.g. disabled, started,
stopped, etc.) are not persistent in RHEL4 or RHEL5.
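For context, a minimal cluster.conf fragment showing where the flag lives; the
service name and the child ip resource below are purely illustrative:

```xml
<!-- Hedged sketch: only the autostart attribute matters here; -->
<!-- the service name and ip resource are made-up examples.    -->
<rm>
  <service name="foo" autostart="0">
    <ip address="10.0.0.50"/>
  </service>
</rm>
```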

Comment 2 Lon Hohberger 2007-01-18 15:28:30 UTC
Side note, it might be fine to have Conga or system-config-cluster change this
on the fly if desired during a 'disable' operation.

More explanation:

Given that there is no coherent shared state persistence like there was on RHCS3
(e.g. using a shared disk), making persistent states "work" is not possible to
do correctly across cluster outages without introducing new requirements into
the system and/or breaking backward compatibility (i.e. rolling upgrade):

Ex:

3 nodes, 1 service running on node 2
Stop node 1 (node 1 thinks it's enabled)
Disable service (nodes 2, 3 think it's disabled)
Stop nodes 2, 3
Start nodes 1, 2

Node 1 thinks the service is enabled.
Node 2 thinks the service is disabled.

Unless the clocks are synchronized (which is not currently a requirement for
RHCS), the correct state of the service cannot be determined.  Sometime in the
future, the states of services will likely be stored in AIS checkpoints (also
not persistent across cluster outages).

With the caveat that the state will be wrong (sometimes) after cluster
transitions, it is possible to implement.

Comment 3 Michael Hagmann 2007-01-18 16:15:31 UTC
Ok, got it, but:

We are quite used to persistent states for cluster services. As we run a lot of
SAP HA clusters on TruCluster and are in the process of migrating them to
Linux, this feature is quite important for us. Do you think there is a way to
re-enable the persistence, or some kind of workaround?
BTW: For operational processes and maintenance it is also very important to be
able to disable a service and be sure it won't come back, even after a cluster
restart.

Regards, Mike.



Comment 4 Lon Hohberger 2007-01-18 17:04:40 UTC
It's possible to make it work, but there's no way to guarantee the correctness
of the persistent "disabled" state across cluster transitions without
introducing additional requirements (notably, time synchronization and use of a
shared partition on the SAN to store the states, which is what I think
TruClusters does, but I could be mistaken).

Even so, it is not difficult to make it work in most cases; all nodes can just
record which services are disabled and mark them disabled on startup. However,
there will be cases where a disagreement between states will have a 50/50 chance
of putting the service in the wrong state.

Comment 5 Michael Hagmann 2007-01-18 19:18:18 UTC
Ok, that sounds great.

I think we have most of that in place already: NTP is running, and we also have
a "shared root" cluster (like TruCluster). That means that if you use a place
in the root filesystem to record the state, all nodes have access to it.

Do we need additional software (patches or new packages), or is this only a
configuration change?

thx mike

Comment 6 Lon Hohberger 2007-05-29 20:45:48 UTC
This will require a configuration change.

I am now implementing this feature.

Comment 7 Lon Hohberger 2007-05-29 21:40:43 UTC
One way to design this:

Use callbacks in vf_init() *if and only if* cluster.conf has
/cluster/rm/@state_path set

* On init, read the state prior to calling rg_init.
* If state == failed or disabled, switch from stopped to the correct state in
init_rg (?) or somewhere nearby
* All other states should be cleared (unlink file) on init.


Comment 8 Lon Hohberger 2007-05-29 21:46:28 UTC
This sort of coincides with another feature I implemented a while ago but never
applied, which mirrors resource tree states on disk.

Comment 9 Lon Hohberger 2007-06-28 16:29:54 UTC
Both this feature and the one noted in comment #8 could be rather destabilizing.
Also, because the states are not guaranteed to be consistent across cluster
transitions, it might be better to make the UI and/or clusvcadm handle this.

For example, it would be trivial to add 'reconfig' support to the 'service'
resource agent - thereby eliminating the need to restart the service as a result
of a reconfiguration.

This would then allow a user to flip the 'autostart' flag as part of a more
robust 'disable' operation without affecting the service state.

Comment 10 Lon Hohberger 2007-07-31 18:07:26 UTC
Reconfiguration support is going out with 5.1; you can now change the autostart
flag without the service bouncing as a result.

So, in order to have a state persisted across transitions, one can:
  - set autostart to 0
  - disable the service
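From the command line, the two steps above could look roughly like this (the
service name "foo" is illustrative, and the exact tooling for propagating the
configuration varies by release; this is a sketch, not a verified procedure):

```shell
# Hedged sketch: persist a "down" state for service "foo".
# 1. In /etc/cluster/cluster.conf, set autostart="0" on the service
#    (with the 5.1 reconfiguration support this no longer bounces the
#    service), bump config_version, and propagate the new configuration,
#    e.g. with:
ccs_tool update /etc/cluster/cluster.conf
# 2. Disable the running instance:
clusvcadm -d foo
```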

As this currently stands, we are unlikely to put the above requested feature
into RHEL4.  Porting the reconfiguration flags to RHEL4 may be possible, however.

