105661 – ypserv dies with "poll failed: No child processes"

Summary: ypserv dies with "poll failed: No child processes"

Keywords:
Status:	CLOSED DUPLICATE of bug 98531
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	ypserv
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Steve Dickson
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-09-26 12:51 UTC by Thomas Bellman
Modified:	2007-04-18 16:57 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-02-21 18:58:46 UTC
Embargoed:

Description Thomas Bellman 2003-09-26 12:51:10 UTC

Description of problem:

Occasionally, ypserv dies with the message

    ypserv[686]: svc_run: - poll failed: No child processes

logged to syslog.  It doesn't happen very often, but I have nevertheless set
the severity to high, since when it *does* happen, the entire system
(including all YP clients) becomes more or less unusable.



Version-Release number of selected component (if applicable):

ypserv-2.8-0.9E



How reproducible:

Not very...  I installed a server with RedHat 9 and updates at the end of July,
and during August ypserv died perhaps ten times.  Then it ran fine for almost a
full month, and today (September 26th) it died again.



Steps to Reproduce:

1. Run ypserv on a busy ypserver
2. Wait for hours, days or months...


    
Actual results:

ypserv dies with the above message logged to syslog



Expected results:

ypserv should not die. :-)



Additional info:

That particular message only exists in one place in the ypserv source, in the
function ypserv_svc_run() in ypserv.c, the -1 branch of the switch statement.

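The probable cause is that the process receives a SIGCHLD signal after poll()
has returned -1, but before ypserv gets the chance to look at the value of
errno.  The signal handler for SIGCHLD, sig_child(), calls waitpid(), which
clobbers the value of errno: when waitpid() finds no children left to reap, it
fails with ECHILD ("No child processes"), which is exactly the text in the
logged message.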

One way to work around this problem is for sig_child() to save the value of
errno on entry and restore it before returning:

==============================================================================
--- ypserv.c.orig       2003-09-26 14:31:18.000000000 +0200
+++ ypserv.c    2003-09-26 14:30:58.000000000 +0200
@@ -354,10 +354,11 @@
 static void
 sig_child (int sig)
 {
   int st;
   pid_t pid;
+  int save = errno;
 
   if (debug_flag)
     log_msg ("sig_child: got signal %i", sig);
 
 
@@ -371,10 +372,12 @@
 
   if (children < 0)
     log_msg ("children is lower 0 (%i)!", children);
   else if (debug_flag)
     log_msg ("children = %i", children);
+
+  errno = save;
 }
 
 
 static void
 Usage (int exitcode)
==============================================================================

However, that only helps for this particular signal.  Other signal handlers can
wreak havoc too, and not just with errno.  sig_hup() does a lot of things that I
believe are not safe to do in a signal handler, such as calling fopen() and
malloc(), neither of which is async-signal-safe.  A better way would be to make
the signal handlers only set a flag (of type 'volatile sig_atomic_t'), and let
the main loop detect that the flag has been set and do the actual work there.

Comment 1 fisher 2003-09-29 14:20:56 UTC

I'm running RedHat 7.3 with all current patches, and I just had the same problem.
We've been running 7.3 for a year now without ever having problems.  I rebooted
about 2 weeks ago and this weekend ypserv died with this same error.

ypserv[834]: svc_run: - poll failed: No child processes
ypserv[834]: svc_run returned

Comment 2 Steve Dickson 2003-10-02 18:57:22 UTC


*** This bug has been marked as a duplicate of 98531 ***

Comment 3 Red Hat Bugzilla 2006-02-21 18:58:46 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
