Bug 614104 - Starting or stopping corosync blocks cman from starting or stopping - corosync part
Summary: Starting or stopping corosync blocks cman from starting or stopping - corosync part
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.0
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 613870
Blocks: 617234
Reported: 2010-07-13 17:04 UTC by Steven Dake
Modified: 2016-04-26 13:39 UTC
9 users

Fixed In Version: corosync-1.2.3-23.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 613870
: 617234
Environment:
Last Closed: 2011-05-19 14:24:03 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch for first part of problem (2.62 KB, patch)
2010-07-22 15:45 UTC, Jan Friesse
no flags
Proposed patch for first part - take 2 (5.42 KB, patch)
2010-07-28 14:10 UTC, Jan Friesse
no flags
Proposed patch for second problem (978 bytes, patch)
2010-07-28 14:12 UTC, Jan Friesse
no flags


Links
System: Red Hat Product Errata | ID: RHBA-2011:0764 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: corosync bug fix update | Last Updated: 2011-05-18 18:08:44 UTC

Description Steven Dake 2010-07-13 17:04:00 UTC
+++ This bug was initially created as a clone of Bug #613870 +++

Description of problem:

Two variants on the same issue:

First:

If you start corosync manually and then try to start cman, it fails with the error "Starting cman... corosync died: Error, reason code is 1 [FAILED]". If you then stop corosync and again try to start cman, it starts properly.

Second:

Perhaps most seriously: if cman is already started and you try to restart or stop corosync, corosync will sit there endlessly "Waiting for corosync services to unload:...". Hitting Ctrl+C *appears* to abort the corosync restart. However, any time thereafter, trying to stop or restart cman will fail with "Stopping cman... Timed-out waiting for cluster [FAILED]".

Running 'ps aux | grep corosync' shows "root 4262 0.4 1.9 440156 34728 ? SLsl 22:57 0:01 corosync -f". This process can only be killed with 'kill -9'. Once it is dead, though, cman restarts successfully.


Version-Release number of selected component (if applicable):

- cman-3.0.12-2.fc13.x86_64
- corosync-1.2.3-1.fc13.x86_64

How reproducible:

Appears to be 100%.

Steps to Reproduce:
1. Start corosync, then start cman
2. Start cman, stop|restart corosync, stop|restart cman
  
Actual results:

- cman won't stop/start when corosync is running or restarted.
- corosync won't stop/restart when cman is running and then blocks cman from starting/stopping.

Expected results:

- cman should detect when corosync is already running and give more useful feedback, or even stop corosync itself.
- corosync should detect when cman is in use and refuse to start, with an error telling the user to use cman instead.
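The expected results ask cman to notice a corosync that is already running. Assuming corosync holds an exclusive flock() on a lock file (the approach sdake proposes further down this thread), another program could probe for it roughly as below. This is a sketch under that assumption: `lock_is_held` is a hypothetical helper, not a cman or corosync function, and the path is whatever lock file corosync actually uses.

```c
#define _DEFAULT_SOURCE /* expose flock() under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: reports whether another open file description
 * currently holds an exclusive flock() on `path`.
 * Returns 1 if the lock is held, 0 if it is free or the file does not
 * exist, and -1 on any other error.
 */
static int lock_is_held(const char *path)
{
        int fd = open(path, O_RDONLY);
        if (fd == -1)
                return errno == ENOENT ? 0 : -1;

        int held;
        if (flock(fd, LOCK_SH | LOCK_NB) == 0) {
                /* Nobody holds LOCK_EX; drop our probe lock again. */
                flock(fd, LOCK_UN);
                held = 0;
        } else {
                /* EWOULDBLOCK: an exclusive lock is held elsewhere. */
                held = (errno == EWOULDBLOCK) ? 1 : -1;
        }
        close(fd);
        return held;
}
```

Because the probe itself takes a shared lock for a moment, a corosync instance starting at exactly that instant could see EWOULDBLOCK, and the answer is only a snapshot; callers still have to tolerate that race.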

Additional info:

I've got a disposable test cluster. I can run any tests the developers would like me to try.

--- Additional comment from sdake on 2010-07-13 13:03:31 EDT ---

Thanks for the bug report.

The common UNIX solution (missing from current corosync) is to have corosync create a file in LOCALSTATEDIR/lock/corosync and then lock it with the flock(2) call, i.e.:
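#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>

int fd = open(LOCALSTATEDIR "/lock/corosync", O_RDWR | O_CREAT, 0640);
if (fd == -1) {
        /* print error that the lock file couldn't be opened and exit */
}
retry_flock:
if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        switch (errno) {
        case EINTR:
                goto retry_flock;
        case EWOULDBLOCK:
                /* print error that corosync is already active and exit */
                break;
        default:
                /* print error that flock couldn't be obtained and exit */
                break;
        }
}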

The lock is released automatically when the process exits (the kernel closes its file descriptors), allowing a new instance of corosync to grab the lock.
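A self-contained sketch of the same single-instance lock, with the EINTR retry written as a loop instead of a goto. The helper name `acquire_instance_lock` and the file mode 0640 are assumptions for illustration; the patches attached to this bug are the authoritative change and may differ.

```c
#define _DEFAULT_SOURCE /* expose flock() under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to become the only running instance.
 * On success returns a descriptor that must stay open for the life of
 * the process; closing it, or exiting, releases the lock.
 * On failure returns -1 with errno set; EWOULDBLOCK means another
 * instance already holds the lock.
 */
static int acquire_instance_lock(const char *path)
{
        int fd = open(path, O_RDWR | O_CREAT, 0640);
        if (fd == -1)
                return -1;

        while (flock(fd, LOCK_EX | LOCK_NB) == -1) {
                if (errno == EINTR)
                        continue;       /* interrupted by a signal: retry */
                int saved = errno;      /* EWOULDBLOCK or a real failure */
                close(fd);
                errno = saved;
                return -1;
        }
        return fd;
}
```

A caller would pass LOCALSTATEDIR "/lock/corosync" and print the "already active" message when errno is EWOULDBLOCK. Holding the lock on an open descriptor, rather than relying on the file merely existing, is what makes a crashed instance harmless: the file left behind is no longer locked, so it does not block the next start.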

Comment 3 Jan Friesse 2010-07-22 14:50:42 UTC
This bug will track the corosync part.

Comment 4 Jan Friesse 2010-07-22 15:45:41 UTC
Created attachment 433738
Proposed patch for first part of problem

Uses the solution described by Steve.

Comment 5 Jan Friesse 2010-07-28 14:10:34 UTC
Created attachment 435023
Proposed patch for first part - take 2

A better version of the patch. It also changes the initscript so that it no longer creates the pid file (corosync itself now does).
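Since corosync now writes its own pid file, here is one common way a daemon does that: truncate the file and write its PID into a descriptor it already holds. This is a sketch under assumptions: whether corosync's patch writes the PID into the lock file itself or into a separate pid file is not stated in this bug, and `write_pid` is a hypothetical name.

```c
#define _DEFAULT_SOURCE /* expose ftruncate()/pwrite() under strict -std= modes */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of `fd` with the current
 * PID followed by a newline. Returns 0 on success, -1 on error.
 */
static int write_pid(int fd)
{
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
        if (len < 0 || (size_t)len >= sizeof(buf))
                return -1;

        /* Drop any stale PID left by a previous instance. */
        if (ftruncate(fd, 0) == -1)
                return -1;
        if (pwrite(fd, buf, (size_t)len, 0) != (ssize_t)len)
                return -1;
        return fsync(fd);
}
```

Writing the PID only after the lock has been taken means the file never names a process that lost the race to start.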

Comment 6 Jan Friesse 2010-07-28 14:12:08 UTC
Created attachment 435026
Proposed patch for second problem

This patch fixes the second problem in the initscript: if corosync was started by cman, the initscript refuses to stop it.

Comment 11 errata-xmlrpc 2011-05-19 14:24:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0764.html

