Bug 613870
Summary: | Starting or stopping corosync blocks cman from starting or stopping | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Madison Kelly <mkelly> | |
Component: | corosync | Assignee: | Jan Friesse <jfriesse> | |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 13 | CC: | agk, cluster-maint, edamato, fdinitto, nstraz, sdake | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 614104 | Environment: | |
Last Closed: | 2010-09-29 10:27:43 UTC | Type: | --- | |
Bug Blocks: | 614104, 617234 | |||
Attachments: |
Description
Madison Kelly
2010-07-13 03:12:19 UTC
Thanks for the bug report.

The common POSIX solution (missing from current corosync) is to have corosync create a file in LOCALSTATEDIR/lock/corosync and then use the flock(2) call, i.e.:

    fd = open (LOCALSTATEDIR"/lock/corosync");
retry_flock:
    res = flock (fd, LOCK_EX|LOCK_NB);
    if (res == -1) {
        switch (errno) {
        case EINTR:
            goto retry_flock;
            break;
        case EWOULDBLOCK:
            print error that corosync is already active and exit
            break;
        default:
            print error that flock couldn't be obtained and exit
            break;
        }
    }

The flock is GCed on process exit by POSIX, allowing a new start of corosync to grab the lock.

Thanks Steven,

Will this be added in the next release? If so, I guess this ticket can be closed?

Created attachment 433737
Proposed patch for first part of problem
Uses solution described by Steve
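For readers following along, the flock(2) approach described above could be sketched in C roughly as follows. This is only an illustration, not the attached patch: the function name acquire_instance_lock, the INSTANCE_LOCK_* return codes, the open flags and the file mode are all assumptions made for the sketch, and the lock path is passed in by the caller instead of being built from LOCALSTATEDIR. Note also that flock(2) is a BSD-derived call available on Linux rather than something specified by POSIX itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical return codes for this sketch (not from corosync). */
enum {
    INSTANCE_LOCK_BUSY = -1, /* another process already holds the lock */
    INSTANCE_LOCK_FAIL = -2  /* open() or flock() failed for another reason */
};

/*
 * Try to take an exclusive, non-blocking lock on the file at `path`.
 * On success, returns the open file descriptor; the caller keeps it open
 * for the lifetime of the process, and the kernel drops the lock when the
 * process exits. On failure, returns one of the INSTANCE_LOCK_* codes.
 */
int acquire_instance_lock(const char *path)
{
    /* Flags and mode are assumptions; the original pseudocode gave none. */
    int fd = open(path, O_WRONLY | O_CREAT, 0640);
    if (fd == -1)
        return INSTANCE_LOCK_FAIL;

    for (;;) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return fd;              /* lock obtained */

        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */

        int saved_errno = errno;
        close(fd);
        errno = saved_errno;

        /* EWOULDBLOCK means some other open file holds the lock. */
        return saved_errno == EWOULDBLOCK ? INSTANCE_LOCK_BUSY
                                          : INSTANCE_LOCK_FAIL;
    }
}
```

A daemon would call this once at startup, print an "already active" error and exit on INSTANCE_LOCK_BUSY, and otherwise hold on to the returned descriptor until it exits.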
Created attachment 435027
Proposed patch for first part - take 2
Better version of the patch. It also changes the initscript to NOT create the
pid file (corosync itself now does).
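As an illustration of a daemon writing its own pid file instead of leaving that to the initscript, a minimal C sketch might look like this. It is not taken from the attached patch: the function name write_pid_file and the idea of passing in an already opened (and possibly already locked) descriptor are assumptions for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write the calling process's PID, followed by a newline, into the file
 * referred to by `fd`, replacing any previous content.
 * Returns 0 on success and -1 on failure.
 * (Hypothetical helper for this sketch, not corosync's actual code.)
 */
int write_pid_file(int fd)
{
    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1;

    /* Drop stale content left behind by an earlier instance. */
    if (ftruncate(fd, 0) == -1)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == -1)
        return -1;

    if (write(fd, buf, (size_t)len) != (ssize_t)len)
        return -1;
    return 0;
}
```

Writing the PID from inside the daemon means the file always names the process that actually owns it, which an initscript cannot guarantee when something else (such as cman) started the daemon.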
Created attachment 435028
Proposed patch for second problem
This patch fixes the second problem in the initscript. If corosync was started
by cman, the initscript refuses to kill corosync and exits.
Created attachment 450432
Cman: Handle corosync exit codes
Created attachment 450433
Cman: Handle "another instance running" error code
Created attachment 450434
Cman: test that corosync is not already running
The patch fixes the init file so that, before cman is started, it tests whether
corosync is running. If so, the init script refuses to start.
Created attachment 450435
Cman: Handle INT and TERM signals correctly
The corosync signal handler (SIGINT and SIGTERM) is replaced by the cman one,
which was setting quit_threads to 1. The regular cman shutdown sequence
(cman_tool leave) tests whether quit_threads is set. If it is, it refuses to
continue, so it was not possible to leave the cluster cleanly.
Now SIGINT and SIGTERM are ignored, and an (un)intentional kill of corosync
is no longer a problem.
Together, the patches should give a complete solution, with no remaining way to reproduce the bugs. The patches are also already included in the cluster STABLE3 tree and/or corosync trunk, so closing as upstream.