Bug 617234 - Starting or stopping corosync blocks cman from starting or stopping - cman part
Summary: Starting or stopping corosync blocks cman from starting or stopping - cman part
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Fabio Massimo Di Nitto
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 613870 614104
Blocks:
Reported: 2010-07-22 14:49 UTC by Jan Friesse
Modified: 2016-04-26 15:43 UTC (History)

Fixed In Version: cluster-3.0.12-27.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 614104
Environment:
Last Closed: 2011-05-19 13:03:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch for first part. Meaningful error code is displayed (674 bytes, patch)
2010-07-22 15:40 UTC, Jan Friesse
Proposed patch for test that corosync is not already running (663 bytes, patch)
2010-07-28 14:17 UTC, Jan Friesse
Proposed patch for second problem (1.23 KB, patch)
2010-09-27 13:41 UTC, Jan Friesse


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2011:0537 | 0 | normal | SHIPPED_LIVE | cluster and gfs2-utils bug fix update | 2011-05-18 17:57:40 UTC

Comment 1 Jan Friesse 2010-07-22 14:50:23 UTC
This bug is for the cman part.

Comment 3 RHEL Program Management 2010-07-22 15:18:21 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 4 Jan Friesse 2010-07-22 15:40:57 UTC
Created attachment 433736
Proposed patch for first part. Meaningful error code is displayed

Use this patch in conjunction with the patch from https://bugzilla.redhat.com/show_bug.cgi?id=614104.

Comment 5 Jan Friesse 2010-07-22 15:57:21 UTC
The second issue is much more serious and needs to be solved in the cman code.

Even though this part of the code has not changed much since RHEL 5, the main problem is in do_cmd_try_shutdown, and it is caused by patch fc7201e51687b6f357aa6b4dad0f37de1f5d5272.

The corosync signal handler (SIGINT and SIGTERM) is replaced by the cman one, which sets quit_threads to 1.

My proposed solution is to ignore the INT and TERM signals completely, but I'm not sure what side effects this could have in other parts of the code.

Chrissie, any opinion on this?

Comment 6 Steven Dake 2010-07-22 17:44:52 UTC
Honza,

Speaking with Fabio, what we need is similar flock code, but in LOCALSTATEDIR/run/corosync.pid file instead.  Inside the corosync.pid file should be the process id for the process so it may be killed by looking at that pid value.  The PID should be that of the child after the fork.

Sorry for not knowing the details earlier.
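
The pid-file scheme described above can be sketched roughly as follows. This is a minimal Python illustration, not the corosync C code: the function names `acquire_pidfile` and `read_pid` are hypothetical, the caller is assumed to pass the pid-file path (for corosync that would be LOCALSTATEDIR/run/corosync.pid), and `acquire_pidfile` is assumed to be called in the child after the fork so that the recorded PID is the daemon's.

```python
import fcntl
import os


def acquire_pidfile(path):
    """Lock `path` exclusively and record this process's PID in it.

    Returns the open file descriptor (keep it open to hold the lock),
    or None if the lock is already held through another descriptor.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: fails at once if it is already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Replace any stale content with the current PID.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def read_pid(path):
    """Return the PID stored in the pid file, e.g. for use with os.kill()."""
    with open(path) as f:
        return int(f.read().strip())
```

Keeping the returned descriptor open for the lifetime of the daemon is what holds the lock. A second instance gets None and can exit, while an init script can read the PID from the file to signal the running one.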

Comment 7 Jan Friesse 2010-07-26 12:14:08 UTC
(In reply to comment #6)
> Honza,
> 
> Speaking with Fabio, what we need is similar flock code, but in
> LOCALSTATEDIR/run/corosync.pid file instead.  Inside the corosync.pid file
> should be the process id for the process so it may be killed by looking at that
> pid value.  The PID should be that of the child after the fork.
> 
> Sorry for not knowing the details earlier.    

Steve,
are you sure that comment is in the right bug? This is the cman part, not corosync.

Comment 8 Jan Friesse 2010-07-28 14:17:13 UTC
Created attachment 435029
Proposed patch for test that corosync is not already running

The patch fixes the init script so that, before cman is started, it tests whether corosync is already running. If so, the init script refuses to start.
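
The check described above could look roughly like this. It is a Python sketch of the idea only, since the actual patch modifies the shell init script and may detect corosync differently. The helper names `is_running` and `start_cman` are hypothetical, and scanning the `comm` files under /proc is an assumption that only holds on Linux.

```python
import os


def is_running(name, proc_root="/proc"):
    """Return True if some process under proc_root has command name `name`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            continue  # the process exited while we were scanning
    return False


def start_cman(proc_root="/proc"):
    """Refuse to start when corosync is already running; return an exit status."""
    if is_running("corosync", proc_root):
        print("corosync is already running, refusing to start cman")
        return 1
    # A real init script would go on to start cman here.
    return 0
```

The `proc_root` parameter exists only so the check can be exercised against a fake directory tree instead of the live /proc.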

Comment 9 Jan Friesse 2010-09-27 13:41:03 UTC
Created attachment 449893
Proposed patch for second problem

cman: Handle INT and TERM signals correctly

The corosync signal handler (SIGINT and SIGTERM) is replaced by the cman
one, and this was setting quit_threads to 1. The regular cman shutdown
sequence (cman_tool leave) tests whether quit_threads is set. If so, it
refuses to continue, so it was not possible to leave the cluster cleanly.

Now SIGINT and SIGTERM are ignored, and an (un)intentional kill of corosync
is no longer a problem.

(Chrissie and I discussed this solution a little over a month ago, which is why I am clearing the needinfo flag.)
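
The behaviour change in this commit message can be modelled with a small sketch. This is a toy Python model under stated assumptions, not the cman C code: the class `CmanModel` and the functions `install_old_handlers` and `install_fixed_handlers` are hypothetical names, and only the `quit_threads` flag and the refusal to leave come from the description above.

```python
import signal


class CmanModel:
    """Toy model of the quit_threads check described in the commit message."""

    def __init__(self):
        self.quit_threads = 0

    def old_handler(self, signum, frame):
        # Pre-patch behaviour: the handler only sets the flag.
        self.quit_threads = 1

    def try_leave(self):
        # The leave request is refused once quit_threads is set.
        return self.quit_threads == 0


def install_old_handlers(model):
    """Install the flag-setting handler for SIGINT and SIGTERM."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, model.old_handler)


def install_fixed_handlers():
    """Post-patch behaviour: SIGINT and SIGTERM are ignored outright."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, signal.SIG_IGN)
```

With the old handlers installed, a single SIGTERM leaves `try_leave()` returning False for the rest of the process lifetime. With the signals ignored, the flag is never set and a clean leave stays possible.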

Comment 10 David Mair 2010-09-27 20:03:36 UTC
This problem sounds pretty ugly and likely to drive support calls.  I'm flagging this for 6.0.z so we can hopefully release an errata shortly after 6.0 ships especially when it appears we have patches for the issues reported here.

Comment 12 Fabio Massimo Di Nitto 2010-09-28 04:11:00 UTC
(In reply to comment #10)
> This problem sounds pretty ugly and likely to drive support calls.  I'm
> flagging this for 6.0.z so we can hopefully release an errata shortly after 6.0
> ships especially when it appears we have patches for the issues reported here.

Actually, it's not as bad as it looks, since the documentation clearly states how to set up a cluster. It was decided not to push the fix for 6.0 right away.

Comment 13 Steven Dake 2010-09-28 16:08:02 UTC
David,

I agree with Fabio - we are covered by the docs very well in this case.  But for those who don't read the documentation...

I am always happy to fix problems GSS believes could be problematic regarding support.

Before we can mark this in the done column, we need feedback from Chrissie re comment #5.

Regards
-steve

Comment 14 Christine Caulfield 2010-09-29 07:36:47 UTC
I thought I'd discussed this on IRC some time ago. quit_threads in the cman code doesn't actually do anything in RHEL6 apart from get in the way. Removing it, and anything that sets it should have no impact on operations.

All shutdown checking should be done by corosync so the signal handlers in daemon.c can go too.

Comment 15 Jan Friesse 2010-09-29 10:29:25 UTC
As noted in comment #9, I removed the needinfo flag because Chrissie and I had discussed the problem before my vacation.

Anyway, all cman patches are currently in the STABLE3 git tree as e88da89f1a5cdb8eb5e1924514401dfb91c0363c, c09852206f21ed04806211e49ca9423e10fea1f9 and de0a199f499bec83774ad88765c5e7df487913e9, so I am moving this to POST.

Comment 16 Fabio Massimo Di Nitto 2010-09-30 13:17:37 UTC
I am dropping 6.0.z flag after discussing with other engineers.

The problem is not as bad as it looks, it's well documented, and the dependency chain is not straightforward (it requires a rebuild of corosync and cman with several patches).

Comment 17 Jeremy West 2010-09-30 19:41:42 UTC
Fabio,

I'm ok with dropping 6.0.z if you can provide a link here in the BZ to the documentation that explains how to resolve this.  From a GSS perspective, we need to be able to quickly point any customers calling in about this in the right direction.

--jwest

Comment 22 errata-xmlrpc 2011-05-19 13:03:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html

