Bug 606368 - Missing /etc/sysconfig/cman template and documentation about changed cman quorum waiting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Fabio Massimo Di Nitto
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-06-21 13:41 UTC by Milan Broz
Modified: 2016-04-26 14:28 UTC

Fixed In Version: cluster-3.0.12-11.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-10 19:59:25 UTC
Target Upstream Version:
Embargoed:



Description Milan Broz 2010-06-21 13:41:31 UTC
Description of problem:
When a node starts and the cman initscript hits the quorum timeout,
it should not bail out but should still start the other services.

Currently it fails in the nok() function, leaving the cluster in a non-functioning state.
For example, clvmd cannot be started because DLM is not running, and a manual restart of cman is required.

The cluster remains inoperable even if quorum is regained later.

Version-Release number of selected component (if applicable):
cman-3.0.12-6.el6

How reproducible:
Run a cluster without quorum and with clvmd enabled. dlm_controld is not running after boot.

I think the nok() function should just print the error and not exit the initscript;
this fixed the problem for me:

--- /etc/init.d/cman.old        2010-06-21 15:39:17.934274371 +0200
+++ /etc/init.d/cman    2010-06-21 15:39:23.926337177 +0200
@@ -180,7 +180,7 @@
        echo -e "$errmsg"
        failure
        echo
-       exit 1
+       return 1
 }
 
 none()
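
To illustrate why the one-line change matters, here is a minimal sketch of the difference between `exit` and `return` inside a shell function. The functions `nok_exit` and `nok_return` are hypothetical stand-ins written for this example and are not the real cman initscript code; each variant runs in a subshell so that the demonstration itself keeps going.

```shell
#!/bin/sh
# Hypothetical stand-ins for the initscript's nok() helper (not the real cman code).

nok_exit() {
    echo "error: $1"
    exit 1      # terminates the whole shell that is running the script
}

nok_return() {
    echo "error: $1"
    return 1    # only leaves the function; the caller carries on
}

# Run each variant in its own subshell and capture what gets printed.
out_exit=$( (nok_exit "quorum timeout"; echo "starting other services") )
out_return=$( (nok_return "quorum timeout"; echo "starting other services") )

printf '%s\n' "with exit:" "$out_exit" "with return:" "$out_return"
```

With `exit`, the line after the call is never reached, which matches the report that later start-up steps are skipped; with `return`, the caller continues and can decide what to do with the failure status.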

Comment 1 Fabio Massimo Di Nitto 2010-06-21 13:52:50 UTC
The quorum timeout behaviour is configurable.

By default a node that does not have quorum should not start other daemons (this was discussed a long time ago in the team).

In order to disable the timeout check, simply set:

CMAN_QUORUM_TIMEOUT=-1

that will skip the wait for quorum and continue as desired.

Fabio

Comment 2 Milan Broz 2010-06-21 14:20:40 UTC
Sorry, but this can cause serious problems, including support calls, because the behaviour changed.

The clvmd initscript can now fail, because it expects DLM to be configured, and it isn't when the cman script fails on quorum timeout. You have to manually restart all services later.

Also, in my cluster it causes several nodes to behave randomly, depending on when they are restarted. Some of them have dlm_controld running, some do not.

>The quorum timeout behaviour is configurable.

1) where is it configurable? where is this documented?

> CMAN_QUORUM_TIMEOUT=-1
Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be ignored."

2) This is different behaviour than in RHEL5; is at least a release note needed?

If we support both modes, there should be a template in /etc/sysconfig/cman with a short description of the CMAN_QUORUM_TIMEOUT parameter and the other options, and it should be added to the cluster documentation.

Comment 3 RHEL Program Management 2010-06-21 14:23:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Fabio Massimo Di Nitto 2010-06-21 14:29:17 UTC
(In reply to comment #2)
> Sorry, but this can cause serious problems, including support calls  because
> the behaviour changed.

Can you then please describe, in another BZ, the scenario, the problems you see, and how to reproduce them, including config files etc.?

> >The quorum timeout behaviour is configurable.
> 
> 1) where is it configurable? where is this documented?
> 
> > CMAN_QUORUM_TIMEOUT=-1
> Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> ignored."
> 

Sorry, 0 and the man page are correct; ignore the -1, my bad in this BZ.

> 2) this is different behaviour than in RHEL5, at least release note needed?
> 
> if we support both modes, there should be template in /etc/sysconfig/cman with
> short description of that CMAN_QUORUM_TIMEOUT parameter and other options and
> add it to cluster documentation.    

The idea of the template is OK. All options are already well documented in the init script; it's a matter of copy and paste.
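
As a rough sketch of what such a template and its use might look like: the file contents, the fallback value of 45, and the `quorum_mode` function below are assumptions made for illustration and are not copied from the cman package. The only behaviour taken from this report is that a CMAN_QUORUM_TIMEOUT of zero means quorum is ignored.

```shell
#!/bin/sh
# Sketch only: mimics how an initscript could read CMAN_QUORUM_TIMEOUT from a
# sysconfig-style file. A temporary file stands in for /etc/sysconfig/cman.

conf=$(mktemp)
cat > "$conf" <<'EOF'
# CMAN_QUORUM_TIMEOUT -- how long to wait for quorum at startup.
# If CMAN_QUORUM_TIMEOUT is zero, quorum will be ignored.
CMAN_QUORUM_TIMEOUT=0
EOF

# quorum_mode FILE: hypothetical helper that reports what startup would do.
quorum_mode() {
    (
        unset CMAN_QUORUM_TIMEOUT
        # Source the overrides only if the file exists and is readable.
        [ -r "$1" ] && . "$1"
        # Assumed fallback, chosen for this example only.
        : "${CMAN_QUORUM_TIMEOUT:=45}"
        if [ "$CMAN_QUORUM_TIMEOUT" -eq 0 ]; then
            echo "skip quorum wait"
        else
            echo "wait for quorum (timeout $CMAN_QUORUM_TIMEOUT)"
        fi
    )
}

mode_configured=$(quorum_mode "$conf")
mode_default=$(quorum_mode /nonexistent/cman)
rm -f "$conf"

echo "$mode_configured"
echo "$mode_default"
```

Keeping the description next to the variable in the template means an administrator can see the zero-means-ignore rule without reading the initscript.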

Comment 5 Milan Broz 2010-06-21 14:36:38 UTC
> > > CMAN_QUORUM_TIMEOUT=-1
> > Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> > ignored."
> 
> Sorry 0 and man page are correct, ignore the -1 my bad in this BZ.

FYI, I did not find it in the man page, just in the initscript itself. The man page should probably contain descriptions of these variables too.

Comment 8 Lon Hohberger 2010-09-13 15:36:36 UTC
I verified that the contents of /etc/sysconfig/cman are there and documented.

Comment 9 releng-rhel@redhat.com 2010-11-10 19:59:25 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

