Bug 595725 - cman init script is not consistent in checking daemons startup and introduces possible race conditions
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: All Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Fabio Massimo Di Nitto
QA Contact: Cluster QE
Reported: 2010-05-25 09:17 EDT by Fabio Massimo Di Nitto
Modified: 2011-05-19 09:03 EDT
Fixed In Version: cluster-3.0.12-36.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 09:03:41 EDT
Description Fabio Massimo Di Nitto 2010-05-25 09:17:22 EDT
When executed via init scripts:

cluster starts OK
clvmd starts so quickly that some of the cluster's internal state is not set up yet, so clvmd fails to start.

There are 2 approaches to fix this issue:

1) clvmd init script could verify that cman is actually running and that dlm has completed its setup before invoking clvmd daemon.

2) clvmd daemon will need to do the same as #1 but internally.
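
Approach 1 could be sketched in an init script roughly as follows. This is only an illustration, not the test patch mentioned in this report: `wait_until` is a hypothetical helper name, and which probe command counts as "ready" is an assumption left to the caller.

```shell
#!/bin/sh
# wait_until TIMEOUT COMMAND [ARGS...]
# Hypothetical helper (not part of the cluster packages): re-run COMMAND
# about once per second until it succeeds. Gives up after TIMEOUT failed
# attempts. Returns 0 if COMMAND succeeded, 1 on timeout.
# Note: plain sh has no "local", so "timeout" and "elapsed" are globals.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

Before starting the daemon, a script might call something like `wait_until 30 cman_tool status || exit 1`. Whether a successful `cman_tool status` is a sufficient readiness signal, and what the matching probe for dlm would be, are assumptions not settled by this report.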

A way to reproduce this issue (note that it doesn't trigger often):

node1: start cman && sleep 10 && start clvmd <- stop here.
node2: start cman && start clvmd && sleep 5 && stop clvmd && stop cman <- loop forever.

Repeat the script on node2 until clvmd fails to start. There is no exact number of loops before it happens, but it does eventually happen.
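
The node2 loop above could be scripted along these lines. This is a sketch under assumptions: it uses the RHEL 6 `service` command to drive the init scripts, bounds the loop with a maximum iteration count instead of running forever, and the names `repro_loop`, `SERVICE` and `PAUSE` are made up for illustration.

```shell
#!/bin/sh
# SERVICE is the command used to drive init scripts; "service" matches
# RHEL 6. PAUSE is the delay (seconds) between starting and stopping clvmd.
# Both can be overridden, e.g. with a stub, to dry-run the loop.
SERVICE=${SERVICE:-service}
PAUSE=${PAUSE:-5}

# repro_loop MAX_ITERATIONS
# Returns 0 if clvmd failed to start (issue reproduced),
#         1 if MAX_ITERATIONS completed without a failure,
#         2 if cman itself failed to start.
repro_loop() {
    max=$1
    i=1
    while [ "$i" -le "$max" ]; do
        if ! $SERVICE cman start; then
            echo "cman failed to start on iteration $i"
            return 2
        fi
        if ! $SERVICE clvmd start; then
            echo "clvmd failed to start on iteration $i"
            return 0
        fi
        sleep "$PAUSE"
        $SERVICE clvmd stop
        $SERVICE cman stop
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 1
}
```

With cman and clvmd already started on node1, running `repro_loop 1000` on node2 would stop at the first clvmd start failure and leave the node in the failed state for inspection.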

I have a test patch right now to address the issue via #1.
Comment 1 RHEL Product and Program Management 2010-05-25 09:36:59 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 2 RHEL Product and Program Management 2010-07-15 10:04:40 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 4 Alasdair Kergon 2011-02-16 15:46:16 EST
For consistency:  either the scripts wait, or the daemons wait.

If the scripts wait, then the cman startup script shouldn't exit until it's ready.
(It's not the job of the clvmd script to include waiting loops for things it depends on: it shouldn't get started until they are ready for it!)

If the daemons do the waiting, then it's fine for the scripts to exit before everything is ready.  (That's probably more consistent with the future systemd approach too.)
Comment 5 Fabio Massimo Di Nitto 2011-02-21 12:28:23 EST
(In reply to comment #4)
> For consistency:  either the scripts wait, or the daemons wait.
> 
> If the scripts wait, then the cman startup script shouldn't exit until it's
> ready.
> (It's not the job of the clvmd script to include waiting loops for things it
> depends on: it shouldn't get started until they are ready for it!)
> 
> If the daemons do the waiting, then it's fine for the scripts to exit before
> everything is ready.  (That's probably more consistent with the future systemd
> approach too.)

I'll fix this one in the cman init script.

As for the daemon solution, I don't think it's worth doing right now, because cman is going away and most of the daemons will change the way they start or are started.
Comment 6 Fabio Massimo Di Nitto 2011-02-21 13:42:07 EST
As extra information, I am not able to reproduce the original problem anymore. It was probably fixed as a side effect of rhbz#639018.

Patches to fix cman init and dlm_controld are being tested right now.
Comment 7 Fabio Massimo Di Nitto 2011-02-21 13:43:46 EST
Moving to 6.2.
Comment 12 errata-xmlrpc 2011-05-19 09:03:41 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html
