Bug 728041 - ccs --start/--stop should not change chkconfig services
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ricci
Version: 6.1
Hardware: Unspecified   OS: Linux
Priority: low   Severity: medium
Target Milestone: rc
Assigned To: Chris Feist
QA Contact: Cluster QE
Reported: 2011-08-03 21:32 EDT by Etsuji Nakai
Modified: 2011-09-13 11:37 EDT

Doc Type: Bug Fix
Last Closed: 2011-08-08 17:56:58 EDT

Description Etsuji Nakai 2011-08-03 21:32:04 EDT
Description of problem:
When the cluster is started or stopped with ccs --start/--stop, the corresponding chkconfig services are also turned on/off. It would be better not to change the chkconfig status; the commands should only start/stop the services.
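
For example (a sketch of the reported behavior only, assuming a node named node1 and the cman service), starting a node through ccs also flips its chkconfig state:

  ccs -h node1 --start       # starts the cluster services on node1
  chkconfig --list cman      # per this report, cman now shows as "on" for the default runlevels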

The problem concerns the following scenario.

In the customer's cluster:

- They start/stop the cluster by starting/stopping the services directly (not using ccs at the moment).
- They set chkconfig off for the cluster services (cman, rgmanager, etc.).
- They force-reboot the failed node with the fence device.

In this setting, when a node is force-rebooted because of a problem such as a kernel panic, it does not automatically rejoin the cluster. The customer then logs in to the node and investigates the problem. Once they are sure the problem is resolved, they start the cluster services on that node again.
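
As a rough sketch of this manual mode of operation (using the cman and rgmanager services mentioned above), the node is kept out of the boot sequence and is only brought back by hand:

  # keep the cluster services from starting at boot
  chkconfig cman off
  chkconfig rgmanager off

  # after investigating a fenced node, rejoin the cluster manually
  service cman start
  service rgmanager start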

Now, the problem is that this customer cannot adopt the new ccs tool for cluster operation. Under ccs operation, when the failed node is force-rebooted, it automatically tries to rejoin the cluster because chkconfig is on, even though the underlying problem has not yet been investigated and resolved by the customer. It is not desirable for the failed node to join the cluster automatically.
Comment 3 Chris Feist 2011-08-08 17:56:58 EDT
Unfortunately, this works as designed (and I believe this is exactly how luci does it as well).  When you stop a node, you're permanently stopping it until you want to manually bring it back into service.

When you manually stop a node, you need to manually start it back up again.

There may be room for some documentation improvement. I'll take a look at it and see whether I need to file a bug to specify that --start & --stop enable/disable cluster nodes.
Comment 4 Etsuji Nakai 2011-08-08 18:14:08 EDT
No, the problem is not that I need to manually start the node.

The problem is that once I have started the node with --start, it automatically starts again when it is force-rebooted by the fence device. Don't you think it is bad design that a failed (force-rebooted) node automatically rejoins the cluster?
Comment 5 Chris Feist 2011-08-08 18:58:57 EDT
I think the issue is that --start/--stop should really be called --enable&start / --disable&stop. ccs just calls ricci/clustermon, and ricci/clustermon executes the command. Currently there is no command in ricci/clustermon for only starting or stopping a service without also turning it on/off.

This will definitely work differently on RHEL7, but right now this behavior is consistent with how things worked in RHEL4/5.
Comment 6 Etsuji Nakai 2011-08-08 19:50:55 EDT
Yes, that makes sense. Logically, there could be the following combinations (see the sketch after this list).

--enable&start / --disable&stop => service start/stop & chkconfig on/off
--enable / --disable => chkconfig on/off
--start / --stop => service start/stop
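
For illustration only (a sketch in terms of plain node-level commands on the cman service, not existing ccs options), these three combinations would roughly correspond to:

  # --enable&start / --disable&stop
  chkconfig cman on  && service cman start
  service cman stop  && chkconfig cman off

  # --enable / --disable
  chkconfig cman on
  chkconfig cman off

  # --start / --stop
  service cman start
  service cman stop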

I hope RHEL7 will allow customers to choose their preferred operation. Thanks.
