Bug 149059 - Nagios Service hangs because clusvcmgrd blocks SIGALRM
Status: CLOSED DUPLICATE of bug 143867
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: clumanager
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2005-02-18 08:21 EST by Pietro Dania
Modified: 2009-04-16 16:16 EDT

Doc Type: Bug Fix
Last Closed: 2006-02-21 14:08:12 EST

Description Pietro Dania 2005-02-18 08:21:05 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) KHTML/3.3.92 (like Gecko)

Description of problem:
The Nagios monitoring tool hangs when run as a clumanager service on a two-node cluster. It does not hang when started manually (service nagios start).
After investigation, the problem appears to be not in Nagios itself but in clusvcmgrd, which launches the service process with SIGALRM (and many other signals) blocked. A SIGALRM handler installed by the service therefore never runs: the signal stays pending instead of being delivered.


Version-Release number of selected component (if applicable):
clumanager-1.2.16-1

How reproducible:
Always

Steps to Reproduce:
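1. Compile the following program:
/* test.c: a program that hangs when run as a clumanager service */
#include <signal.h>
#include <unistd.h>

static void sigHandler(int s)
{
    (void)s;
    _exit(0);   /* async-signal-safe way to terminate from a handler */
}

int main(void)
{
    int timeout = 10;

    signal(SIGALRM, sigHandler);
    alarm(timeout);   /* SIGALRM should arrive after 10 s and end the process */

    while (1) {
        sleep(100);   /* loops forever if SIGALRM is blocked */
    }
}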

2. Write an init script for it (e.g. /etc/init.d/test) and configure it as a clumanager service.

3. Enable the service: clusvcadm -e test


Actual Results:  the program hangs

Expected Results:  the program terminates after 10 seconds (and gets restarted if the cluster is configured to do so)

Additional info:

# strace -p <PID_OF_TEST_PROCESS>
Process 32156 attached - interrupt to quit
rt_sigprocmask(SIG_BLOCK, [CHLD], ~[HUP INT QUIT ILL TRAP ABRT BUS FPE KILL SEGV TERM CHLD STOP RTMIN], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE KILL SEGV TERM CHLD STOP RTMIN], NULL, 8) = 0
nanosleep({100, 0},

# cat /proc/<PID_OF_TEST_PROCESS>/status
Name:   test
State:  S (sleeping)
Tgid:   32156
Pid:    32156
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups:
VmSize:     1368 kB
VmLck:         0 kB
VmRSS:       244 kB
VmData:       12 kB
VmStk:        20 kB
VmExe:         4 kB
VmLib:      1308 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000002000
SigBlk: ffffffff7ffaba00
SigIgn: 0000000000000006
SigCgt: 0000000000002000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
Comment 1 Lon Hohberger 2005-02-18 09:50:46 EST

*** This bug has been marked as a duplicate of 143867 ***
Comment 2 Red Hat Bugzilla 2006-02-21 14:08:12 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
