Bug 606368 - Missing /etc/sysconfig/cman template and documentation about changed cman quorum waiting
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Fabio Massimo Di Nitto
QA Contact: Cluster QE
Keywords: Reopened
Depends On:
Blocks:
Reported: 2010-06-21 09:41 EDT by Milan Broz
Modified: 2016-04-26 10:28 EDT

See Also:
Fixed In Version: cluster-3.0.12-11.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 14:59:25 EST


Description Milan Broz 2010-06-21 09:41:31 EDT
Description of problem:
When a node starts and the cman initscript hits the quorum timeout,
it should not bail out but should still start the other services.

Currently it fails in the nok() function, leaving the cluster in a non-functioning state.
E.g. clvmd cannot be started because DLM is not running, and a manual restart of cman is required.

The cluster remains inoperable even if quorum is regained later.

Version-Release number of selected component (if applicable):
cman-3.0.12-6.el6

How reproducible:
Run a cluster without quorum and with clvmd enabled. dlm_controld is not running after boot.

I think the nok() function should just print the error and not exit the initscript;
this fixed the problem for me:

--- /etc/init.d/cman.old        2010-06-21 15:39:17.934274371 +0200
+++ /etc/init.d/cman    2010-06-21 15:39:23.926337177 +0200
@@ -180,7 +180,7 @@
        echo -e "$errmsg"
        failure
        echo
-       exit 1
+       return 1
 }
 
 none()
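
For readers unfamiliar with the distinction the patch relies on, here is a minimal sketch of how `exit` and `return` differ inside a shell function. The functions nok_exit and nok_return are hypothetical stand-ins written for this illustration; they are not the initscript's actual nok(), and the echoed messages are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-ins for the initscript's nok(); names and messages are
# illustrative only and are not taken from /etc/init.d/cman.

nok_exit() {
    echo "error: $1"
    exit 1      # terminates the whole shell that called the function
}

nok_return() {
    echo "error: $1"
    return 1    # leaves only the function; the caller keeps running
}

# Run each variant in a subshell so the demo itself is never terminated.
out_exit=$( (nok_exit "quorum timeout"; echo "starting other daemons") ) || true
out_return=$( (nok_return "quorum timeout"; echo "starting other daemons") ) || true

printf 'with exit:\n%s\n\n' "$out_exit"
printf 'with return:\n%s\n' "$out_return"
```

With `exit`, the line after the failed call is never reached; with `return`, the caller continues, which is the behaviour the proposed change asks for.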
Comment 1 Fabio Massimo Di Nitto 2010-06-21 09:52:50 EDT
The quorum timeout behaviour is configurable.

By default a node that does not have quorum should not start other daemons (this was discussed a long time ago in the team).

In order to disable the timeout check, simply set:

CMAN_QUORUM_TIMEOUT=-1

that will skip the wait for quorum and continue as desired.

Fabio
Comment 2 Milan Broz 2010-06-21 10:20:40 EDT
Sorry, but this can cause serious problems, including support calls because the behaviour changed.

The clvmd initscript can now fail, because it expects DLM to be configured, and it isn't when the cman script fails on the quorum timeout. You have to manually restart all services later.

Also, in my cluster it causes several nodes to behave randomly, depending on when they are restarted. Some of them have dlm_controld running, some do not.

>The quorum timeout behaviour is configurable.

1) where is it configurable? where is this documented?

> CMAN_QUORUM_TIMEOUT=-1
Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be ignored."

2) this is different behaviour than in RHEL5; is at least a release note needed?

if we support both modes, there should be a template in /etc/sysconfig/cman with a short description of the CMAN_QUORUM_TIMEOUT parameter and other options, and it should be added to the cluster documentation.
Comment 3 RHEL Product and Program Management 2010-06-21 10:23:23 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 4 Fabio Massimo Di Nitto 2010-06-21 10:29:17 EDT
(In reply to comment #2)
> Sorry, but this can cause serious problems, including support calls because
> the behaviour changed.

Can you then please describe the scenario and the problems you see, and how to reproduce them, in another BZ? Please include config files etc.

> >The quorum timeout behaviour is configurable.
> 
> 1) where is it configurable? where is this documented?
> 
> > CMAN_QUORUM_TIMEOUT=-1
> Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> ignored."
> 

Sorry, 0 and the man page are correct; ignore the -1, my mistake in this BZ.

> 2) this is different behaviour than in RHEL5; is at least a release note needed?
> 
> if we support both modes, there should be a template in /etc/sysconfig/cman with
> a short description of the CMAN_QUORUM_TIMEOUT parameter and other options, and
> it should be added to the cluster documentation.

The idea of the template is OK. All options are already well documented in the init script, so it's a matter of copy and paste.
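
To make the semantics discussed in this comment concrete, here is a rough sketch of a quorum wait that treats a timeout of zero as "skip the wait". Only the variable name CMAN_QUORUM_TIMEOUT and the path /etc/sysconfig/cman come from this report; the helper wait_for_quorum, its arguments, and the stand-in check commands are assumptions made for illustration and do not reproduce the real initscript.

```shell
#!/bin/sh
# Sketch only; this is not the cman initscript.
# An /etc/sysconfig/cman entry for the setting discussed above could read:
#   CMAN_QUORUM_TIMEOUT=0    # zero: quorum is ignored, startup continues

# wait_for_quorum TIMEOUT CHECK_CMD (hypothetical helper)
#   TIMEOUT    seconds to wait; 0 skips the wait entirely
#   CHECK_CMD  command that exits 0 once the node is quorate (placeholder)
wait_for_quorum() {
    timeout=$1
    check_cmd=$2
    if [ "$timeout" -eq 0 ]; then
        return 0
    fi
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if $check_cmd; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Demo with stand-in checks: "false" never reports quorum, "true" always does.
for args in "0 false" "1 false" "1 true"; do
    if wait_for_quorum $args; then
        echo "timeout/check '$args': continue startup"
    else
        echo "timeout/check '$args': quorum timeout"
    fi
done
```

In this sketch a zero timeout returns success immediately even though the node never becomes quorate, which matches the "quorum will be ignored" wording quoted above.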
Comment 5 Milan Broz 2010-06-21 10:36:38 EDT
> > > CMAN_QUORUM_TIMEOUT=-1
> > Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> > ignored."
> 
> Sorry, 0 and the man page are correct; ignore the -1, my mistake in this BZ.

FYI, I did not find it in the man page, just in the initscript itself. The man page should probably contain descriptions of these variables too.
Comment 8 Lon Hohberger 2010-09-13 11:36:36 EDT
I verified that the contents of /etc/sysconfig/cman are there and documented.
Comment 9 releng-rhel@redhat.com 2010-11-10 14:59:25 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
