Description of problem:
Occasionally, ypserv dies with the message

ypserv[686]: svc_run: - poll failed: No child processes

logged to syslog. However, it doesn't happen very often. Nevertheless, I have set the severity to high, since when it *does* happen, the entire system (including all YP clients) becomes more or less unusable.

Version-Release number of selected component (if applicable):
ypserv-2.8-0.9E

How reproducible:
Not very... I installed a server with RedHat 9 and updates at the end of July, and during August ypserv died perhaps ten times. Then it ran fine for almost a full month, and today (September 26th) it died again.

Steps to Reproduce:
1. Run ypserv on a busy ypserver
2. Wait for hours, days or months...

Actual results:
ypserv dies with the above message logged to syslog

Expected results:
ypserv should not die. :-)

Additional info:
That particular message only exists in one place in the ypserv source, in the function ypserv_svc_run() in ypserv.c, the -1 branch of the switch statement. The probable cause is that the process gets a SIGCHLD signal after poll() returned -1, but before it gets the chance to actually look at the value of errno. The signal handler for SIGCHLD, sig_child(), calls waitpid(), which clobbers the value of errno.
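To illustrate the suspected mechanism, here is a minimal sketch (not taken from ypserv.c) of how a waitpid() call can overwrite errno. The helper name errno_after_childless_waitpid() is hypothetical, and the sketch assumes the calling process has no children at that moment, in which case waitpid() fails with ECHILD, the error whose message is "No child processes" on glibc.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, for illustration only: returns the errno value
 * left behind by waitpid() when the caller has no child processes. */
int errno_after_childless_waitpid(void)
{
    int st;
    pid_t pid;

    /* Pretend poll() has just failed with EINTR. */
    errno = EINTR;

    /* This is the kind of call a SIGCHLD handler typically makes. */
    pid = waitpid(-1, &st, WNOHANG);

    /* With no children left, waitpid() returns -1 and sets errno to
     * ECHILD, so the EINTR stored above is lost. */
    return (pid == -1) ? errno : 0;
}
```

If a handler does this between poll() failing and the main loop reading errno, the main loop would see ECHILD instead of EINTR, which matches the logged message.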
One way to work around this problem is for sig_child() to save the value of errno at entry and restore it when it returns:

==============================================================================
--- ypserv.c.orig	2003-09-26 14:31:18.000000000 +0200
+++ ypserv.c	2003-09-26 14:30:58.000000000 +0200
@@ -354,10 +354,11 @@
 static void
 sig_child (int sig)
 {
   int st;
   pid_t pid;
+  int save = errno;
 
   if (debug_flag)
     log_msg ("sig_child: got signal %i", sig);
@@ -371,10 +372,12 @@
   if (children < 0)
     log_msg ("children is lower 0 (%i)!", children);
   else if (debug_flag)
     log_msg ("children = %i", children);
+
+  errno = save;
 }
 
 static void
 Usage (int exitcode)
==============================================================================

However, that only helps for this particular signal. Other signal handlers can wreak havoc too, and not just with errno. sig_hup() does a lot of things that I believe are not safe to do in a signal handler; calling fopen() and malloc(), for instance.

A better way would be to make the signal handlers only set a flag (of the type 'volatile sig_atomic_t'), and then let the main loop detect that the flag has been set.
I'm running RedHat 7.3 with all current patches and I just had the same problem. We've been running 7.3 for a year now without ever having problems. I rebooted about 2 weeks ago and this weekend ypserv died with this same error.

ypserv[834]: svc_run: - poll failed: No child processes
ypserv[834]: svc_run returned
*** This bug has been marked as a duplicate of 98531 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.