Bug 606368
| Summary: | Missing /etc/sysconfig/cman template and documentation about changed cman quorum waiting | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Milan Broz <mbroz> |
| Component: | cluster | Assignee: | Fabio Massimo Di Nitto <fdinitto> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.0 | CC: | agk, ccaulfie, cluster-maint, djansa, lhh, pvrabec, rpeterso, syeghiay, teigland |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | cluster-3.0.12-11.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-11-10 19:59:25 UTC | Type: | --- |
The quorum timeout behaviour is configurable. By default, a node that does not have quorum should not start other daemons (this was discussed a long time ago in the team). To disable the timeout check, simply set:

CMAN_QUORUM_TIMEOUT=-1

That will skip the wait for quorum and continue as desired.

Fabio

Sorry, but this can cause serious problems, including support calls, because the behaviour changed.

The clvmd initscript can now fail, because it expects DLM to be configured, and it isn't when the cman script fails on the quorum timeout. You have to manually restart all services later.

Also, in my cluster it causes several nodes to behave randomly, depending on when they are restarted. Some of them have dlm_controld running, some do not.

> The quorum timeout behaviour is configurable.

1) Where is it configurable? Where is this documented?

> CMAN_QUORUM_TIMEOUT=-1

Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be ignored."

2) This is different behaviour than in RHEL5; is at least a release note needed?

If we support both modes, there should be a template in /etc/sysconfig/cman with a short description of the CMAN_QUORUM_TIMEOUT parameter and the other options, and it should be added to the cluster documentation.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

(In reply to comment #2)
> Sorry, but this can cause serious problems, including support calls, because
> the behaviour changed.

Can you then please describe the scenario and the problems you see, and how to reproduce them, in another BZ? And the config files etc.?

> > The quorum timeout behaviour is configurable.
>
> 1) Where is it configurable? Where is this documented?
>
> > CMAN_QUORUM_TIMEOUT=-1
> Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> ignored."

Sorry, 0 and the man page are correct; ignore the -1, my bad in this BZ.

> 2) This is different behaviour than in RHEL5; is at least a release note needed?
>
> If we support both modes, there should be a template in /etc/sysconfig/cman with
> a short description of the CMAN_QUORUM_TIMEOUT parameter and the other options,
> and it should be added to the cluster documentation.

The idea of the template is OK. All options are already well documented in the init script; it's a matter of copy and paste.

> > > CMAN_QUORUM_TIMEOUT=-1
> > Is it 0 or -1? cman says "If CMAN_QUORUM_TIMEOUT is zero, quorum will be
> > ignored."
>
> Sorry, 0 and the man page are correct; ignore the -1, my bad in this BZ.

FYI, I did not find it in the man page, only in the initscript itself. The man page should probably contain descriptions of these variables too.
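
For readers following the thread, the setting that was eventually agreed on (0, not -1) might look like the fragment below when placed in /etc/sysconfig/cman. This is only a sketch: the comment wording is illustrative and is not the text of the shipped template, and it assumes the file is read as plain shell variable assignments by the cman init script.

```shell
# /etc/sysconfig/cman (illustrative fragment, not the shipped template)
#
# CMAN_QUORUM_TIMEOUT: how long the init script waits for the cluster to
# become quorate at startup. According to the init script comment quoted
# in this bug, a value of zero means quorum is ignored and startup continues.
CMAN_QUORUM_TIMEOUT=0
```

Leaving the variable unset keeps the default behaviour described at the top of the thread, where a node without quorum does not go on to start other daemons.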
I verified that the contents of /etc/sysconfig/cman are there and documented.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Description of problem:
When a node starts and the cman initscript hits the quorum timeout, it should not bail out but should still start the other services. Currently it fails in the nok() function, leaving the cluster in a non-functioning state: for example, clvmd cannot be started because DLM is not running, and a manual restart of cman is required. The cluster stays inoperable even if quorum is regained later.

Version-Release number of selected component (if applicable):
cman-3.0.12-6.el6

How reproducible:
Run a cluster without quorum with clvmd enabled. dlm_controld is not running after boot.

I think the nok() function should just print the error and not exit the initscript. This fixed the problem for me:

    --- /etc/init.d/cman.old 2010-06-21 15:39:17.934274371 +0200
    +++ /etc/init.d/cman 2010-06-21 15:39:23.926337177 +0200
    @@ -180,7 +180,7 @@
         echo -e "$errmsg"
         failure
         echo
    -    exit 1
    +    return 1
     }
     
     none()
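
The difference between the two lines in the patch can be shown with a small standalone sketch. The function names nok_exit, nok_return and start_with are hypothetical stand-ins invented for this illustration and are not taken from the real init script. The sketch only demonstrates the shell semantics of exit versus return inside a function, not the full cman start sequence.

```shell
#!/bin/sh
# Hypothetical stand-ins for the init script's error helper.

nok_exit() {
    echo "quorum timeout" >&2
    exit 1        # ends the whole (sub)shell, so nothing after the call runs
}

nok_return() {
    echo "quorum timeout" >&2
    return 1      # only leaves the function; the caller keeps going
}

# Simulated start sequence: a failing quorum wait followed by a later step.
# The subshell stands in for the init script process.
start_with() {
    (
        "$1" || true   # status ignored; the point is whether the next line is reached
        echo "starting later daemons"
    )
}

out_exit=$(start_with nok_exit 2>/dev/null) || true
out_return=$(start_with nok_return 2>/dev/null) || true

echo "exit variant:   '$out_exit'"
echo "return variant: '$out_return'"
```

With the exit variant the later step never runs and the captured output is empty. With the return variant the later step runs. Whether continuing without quorum is desirable is the policy question discussed in the comments, where the default of stopping is described as intentional.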