Bug 595725

Summary: cman init script is not consistent in checking daemons startup and introduces possible race conditions
Product: Red Hat Enterprise Linux 6 Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: cluster Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: agk, ccaulfie, cluster-maint, dwysocha, heinzm, jbrassow, jkortus, joe.thornber, lhh, mbroz, prockai, rpeterso, syeghiay, teigland, toarney
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: cluster-3.0.12-36.el6 Doc Type: Bug Fix
Last Closed: 2011-05-19 13:03:41 UTC Type: ---

Description Fabio Massimo Di Nitto 2010-05-25 13:17:22 UTC
When executed via init scripts:

the cluster (cman) starts OK;
clvmd starts so quickly that some of the cluster's internal components are not set up yet, so clvmd fails to start.

There are two approaches to fix this issue:

1) The clvmd init script could verify that cman is actually running and that the DLM has completed its setup before invoking the clvmd daemon.

2) The clvmd daemon performs the same checks as #1, but internally.
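A minimal sketch of approach #1, written in Python rather than the shell used by the real init scripts. It assumes each readiness condition ("cman is running", "the DLM has completed its setup") can be expressed as a callable returning a boolean; `wait_until_ready`, `start_clvmd_guarded`, and the check callables are hypothetical names for illustration, and how the real checks query cman or the DLM is not specified in this report.

```python
import time
from typing import Callable, Sequence


def wait_until_ready(
    checks: Sequence[Callable[[], bool]],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the readiness checks until all pass or the timeout expires.

    Returns True if every check passed, False on timeout.
    The clock/sleep parameters exist only so the loop can be tested.
    """
    deadline = clock() + timeout
    while True:
        # all() short-circuits: later checks (e.g. DLM setup) are only
        # evaluated once earlier ones (e.g. cman running) already pass.
        if all(check() for check in checks):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start_clvmd_guarded(
    start_daemon: Callable[[], None],
    cman_is_running: Callable[[], bool],  # hypothetical check
    dlm_is_ready: Callable[[], bool],     # hypothetical check
    timeout: float = 30.0,
) -> None:
    """Only invoke the daemon once its dependencies report ready."""
    if not wait_until_ready([cman_is_running, dlm_is_ready], timeout=timeout):
        raise RuntimeError("cman/DLM not ready; not starting clvmd")
    start_daemon()
```

The bounded timeout is a design choice: an init script that waits forever on a dependency would hang the boot, whereas failing after a deadline reports the problem instead of racing ahead.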

A way to reproduce this issue (note that it doesn't trigger often):

node1: start cman && sleep 10 && start clvmd <- stop here.
node2: start cman && start clvmd && sleep 5 && stop clvmd && stop cman <- loop forever.

Repeat the script on node2 until clvmd fails to start. There is no fixed number of iterations before it happens, but it does eventually happen.

I have a test patch right now to address the issue via #1.

Comment 1 RHEL Program Management 2010-05-25 13:36:59 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 2 RHEL Program Management 2010-07-15 14:04:40 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 4 Alasdair Kergon 2011-02-16 20:46:16 UTC
For consistency:  either the scripts wait, or the daemons wait.

If the scripts wait, then the cman startup script shouldn't exit until it's ready.
(It's not the job of the clvmd script to include waiting loops for things it depends on: it shouldn't get started until they are ready for it!)

If the daemons do the waiting, then it's fine for the scripts to exit before everything is ready.  (That's probably more consistent with the future systemd approach too.)

Comment 5 Fabio Massimo Di Nitto 2011-02-21 17:28:23 UTC
(In reply to comment #4)
> For consistency:  either the scripts wait, or the daemons wait.
> 
> If the scripts wait, then the cman startup script shouldn't exit until it's
> ready.
> (It's not the job of the clvmd script to include waiting loops for things it
> depends on: it shouldn't get started until they are ready for it!)
> 
> If the daemons do the waiting, then it's fine for the scripts to exit before
> everything is ready.  (That's probably more consistent with the future systemd
> approach too.)

I'll fix this one in the cman init script.

As for the daemon solution, I don't think it's worth doing right now, because cman is going away and most of the daemons will change the way they start or are started.

Comment 6 Fabio Massimo Di Nitto 2011-02-21 18:42:07 UTC
As extra information, I am not able to reproduce the original problem anymore. Probably fixed as side effect of: rhbz#639018.

Patches to fix cman init and dlm_controld are being tested right now.

Comment 7 Fabio Massimo Di Nitto 2011-02-21 18:43:46 UTC
Moving to 6.2.

Comment 12 errata-xmlrpc 2011-05-19 13:03:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html