Bug 497450 - rgmanager shutdown hangs if it hasn't formed a quorum
Product: Fedora
Classification: Fedora
Component: rgmanager
Hardware: i686 Linux
Priority: low, Severity: high
Assigned To: Lon Hohberger
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-04-23 19:00 EDT by P Jones
Modified: 2009-07-22 16:38 EDT
Doc Type: Bug Fix
Last Closed: 2009-07-22 16:38:28 EDT
Description P Jones 2009-04-23 19:00:20 EDT
Description of problem:
rgmanager shutdown hangs and never completes. 

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Define a cluster.
2. Start one machine in the non-quorate cluster.
3. Run "service rgmanager stop", "reboot", or "halt".

Actual results:
It sits there until interrupted.

Expected results:
The system should stop clurgmgrd, either politely or via kill -9 after a few hours.
I see two things that should change:
 1. clurgmgrd should terminate cleanly when requested.
 2. The initscript should time out and kill it if it fails to exit properly.

Additional info:
Looking at the /etc/init.d/rgmanager script, I see an inner while loop that will just wait forever, which is what it does when clurgmgrd doesn't exit on -TERM. Also, clurgmgrd ignores -TERM signals.

# Bring down the cluster on a node.
        kill -TERM `pidof $RGMGRD`

        while [ 0 ]; do

                if [ -n "`pidof $RGMGRD`" ]; then
                        echo -n $"Waiting for services to stop: "
                        while [ -n "`pidof $RGMGRD`" ]; do
                                sleep 1
                        done
                        echo $"Services are stopped."
                fi

                # Ensure all NFS rmtab daemons are dead.
                killall $RMTABD &> /dev/null
                rm -f /var/run/$RGMGRD.pid

                return 0
        done

Looking deeper, in cluster-2.99.12/rgmanager/src/daemons/main.c, I see that on a TERM signal it exits event_loop(...) (lines 776-778), but that falls back to a while() loop in main() (lines 1095-1106); nothing sets any flags, such as setting running to 0 or shutdown_pending to 1.

There is only one place where shutdown_pending is set: flag_shutdown() in main.c (lines 790-794), which is bound to the -TERM and -INT handlers.

shutdown_pending is only checked once the main loop is reached.
If the signal is received before the cluster is quorate, nothing checks the shutdown_pending flag. clu_initialize() should be modified like this:

clu_initialize(cman_handle_t *ch)
	if (!ch)

	*ch = cman_init(NULL);
	if (!(*ch)) {
		log_printf(LOG_NOTICE, "Waiting for CMAN to start\n");

-		while (!(*ch = cman_init(NULL))) {
+		while (!(*ch = cman_init(NULL)) && shutdown_pending == 0 ) {

+        if (shutdown_pending > 0 ) { 
+            return;
+        }
        if (!cman_is_quorate(*ch)) {
		/*
		   There are two ways to do this; this happens to be the simpler
		   of the two.  The other method is to join with a NULL group 
		   and log in -- this will cause the plugin to not select any
		   node group (if any exist).
		 */
		log_printf(LOG_NOTICE, "Waiting for quorum to form\n");

-		while (cman_is_quorate(*ch) == 0 ) {
+		while (cman_is_quorate(*ch) == 0 && shutdown_pending == 0 ) {
+                if (shutdown_pending > 0 ) { 
+                    return;
+                }
		log_printf(LOG_NOTICE, "Quorum formed\n");
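
The idea behind the change above, reduced to a standalone sketch: a signal handler sets a flag, and every wait loop that runs before the main loop checks that flag. shutdown_pending and flag_shutdown() are the names used in main.c; install_shutdown_handlers(), wait_until_ready() and never_quorate() are hypothetical stand-ins, and the real code would poll cman_init() or cman_is_quorate() rather than a callback.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Written from the signal handler, so it must be volatile sig_atomic_t. */
static volatile sig_atomic_t shutdown_pending = 0;

static void flag_shutdown(int sig)
{
	(void)sig;
	shutdown_pending = 1;
}

/* Hypothetical helper: bind flag_shutdown to SIGTERM and SIGINT. */
static int install_shutdown_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = flag_shutdown;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: sleep() returns early on a signal */

	if (sigaction(SIGTERM, &sa, NULL) < 0)
		return -1;
	return sigaction(SIGINT, &sa, NULL);
}

/*
 * Hypothetical helper: poll ready(arg) once per second until it returns
 * nonzero or a shutdown has been requested.
 * Returns 0 when ready, -1 when shutdown_pending was set first.
 */
static int wait_until_ready(int (*ready)(void *), void *arg)
{
	while (!ready(arg)) {
		if (shutdown_pending)
			return -1;
		sleep(1);
	}
	return 0;
}

/* Stand-in for a cluster that never becomes quorate. */
static int never_quorate(void *arg)
{
	(void)arg;
	return 0;
}
```

With this shape, the caller has to check the return value and bail out, which is what the extra shutdown_pending tests after each loop in the patch are doing.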

Comment 1 P Jones 2009-04-24 00:03:21 EDT
Oh, and this in main():

+        if (shutdown_pending > 0 ) { 
+	    log_printf(LOG_NOTICE, "shutdown during clu_initialize\n");
+            return(-1);
+        }

	if (cman_init_subsys(clu) < 0) {
		return -1;
Comment 2 Lon Hohberger 2009-04-28 14:05:01 EDT
This was fixed in STABLE3 several weeks ago and will appear in rawhide after the next spin:

