Bug 105661
| Summary: | ypserv dies with "poll failed: No child processes" | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Thomas Bellman <bellman> |
| Component: | ypserv | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED DUPLICATE | QA Contact: | David Lawrence <dkl> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 9 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i686 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2006-02-21 18:58:46 UTC | Type: | --- |
I'm running RedHat 7.3 with all current patches and I just had the same problem. We've been running 7.3 for a year now without ever having problems. I rebooted about 2 weeks ago and this weekend ypserv died with this same error.

ypserv[834]: svc_run: - poll failed: No child processes
ypserv[834]: svc_run returned

*** This bug has been marked as a duplicate of 98531 ***

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

Description of problem:
Occasionally, ypserv dies with the message

ypserv[686]: svc_run: - poll failed: No child processes

logged to syslog. However, it doesn't happen very often. Nevertheless, I have set the severity to high, since when it *does* happen, the entire system (including all YP clients) becomes more or less unusable.

Version-Release number of selected component (if applicable):
ypserv-2.8-0.9E

How reproducible:
Not very... I installed a server with RedHat 9 and updates at the end of July, and during August ypserv died perhaps ten times. Then it ran fine for almost a full month, and today (September 26th) it died again.

Steps to Reproduce:
1. Run ypserv on a busy ypserver
2. Wait for hours, days or months...

Actual results:
ypserv dies with the above message logged to syslog

Expected results:
ypserv should not die. :-)

Additional info:
That particular message only exists in one place in the ypserv source, in the function ypserv_svc_run() in ypserv.c, the -1 branch of the switch statement. The probable cause is that the process gets a SIGCHLD signal after poll() returned -1, but before it gets the chance to actually look at the value of errno. The signal handler for SIGCHLD, sig_child(), calls waitpid(), which clobbers the value of errno.

One way to work around this problem is for sig_child() to save the value of errno at entry and restore it when it returns:

==============================================================================
--- ypserv.c.orig	2003-09-26 14:31:18.000000000 +0200
+++ ypserv.c	2003-09-26 14:30:58.000000000 +0200
@@ -354,10 +354,11 @@
 static void
 sig_child (int sig)
 {
   int st;
   pid_t pid;
+  int save = errno;
 
   if (debug_flag)
     log_msg ("sig_child: got signal %i", sig);
@@ -371,10 +372,12 @@
   if (children < 0)
     log_msg ("children is lower 0 (%i)!", children);
   else if (debug_flag)
     log_msg ("children = %i", children);
+
+  errno = save;
 }
 
 static void
 Usage (int exitcode)
==============================================================================

However, that only helps for this particular signal. Other signal handlers can wreak havoc too, and not just with errno. sig_hup() does a lot of things that I believe are not safe to do in a signal handler; calling fopen() and malloc(), for instance.

A better way would be to make the signal handlers only set a flag (of the type 'volatile sig_atomic_t'), and then let the main loop detect that the flag has been set.