177783 – netdump-server halts when running external script


Summary: netdump-server halts when running external script

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	netdump
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Thomas Graf
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	179457 181409

Reported:	2006-01-13 22:31 UTC by Joshua Jensen
Modified:	2014-06-18 08:28 UTC
CC List:	3 users
Fixed In Version:	RHBA-2006-0492
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-08-10 21:26:55 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0492	0	normal	SHIPPED_LIVE	netdump bug fix update	2006-08-10 04:00:00 UTC

Description Joshua Jensen 2006-01-13 22:31:34 UTC

Description of problem:

In /var/crash/scripts/netdump-reboot, I have this logic:

sleep 10
cd "$2"
nice gzip -9 vmcore

This works as I would expect it to... but while gzip is running, netdump-server
does NOTHING else. It doesn't even accept incoming packets from other ongoing
netdumps.  So on my netdump server, which services hundreds of clients, I get
failed netdumps simply because one that didn't fail is gzip'ing, and the
other netdump clients time out after a while.  I suspect that gzip isn't
required... "sleep 1h" should do the same thing.

Is the netdump-server not capable of both running one of its scripts AND
continuing to process other currently active netdump sessions?


Version-Release number of selected component (if applicable):

netdump-server-0.7.7-3.i386.rpm in RHEL4 U2

This should be filed under netdump-server, but I couldn't find that component
in the list.

Comment 1 Dave Anderson 2006-01-13 22:38:09 UTC

Can you do the gzip command in the background?  The netdump-server just does
a "system()" of the script.

Comment 2 Dave Anderson 2006-01-13 22:41:18 UTC

i.e., 

cd "$2"
(sleep 10; nice gzip -9 vmcore) &
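
A slightly more defensive version of the suggestion above might look like the following minimal sketch. It assumes, as in the original report, that "$2" is the directory holding vmcore; the function name compress_in_background and the DELAY variable are hypothetical and not part of netdump. Because the subshell is backgrounded, the script, and therefore the server's system() call, returns right away.

```shell
#!/bin/sh
# Hypothetical helper for /var/crash/scripts/netdump-reboot.
# Assumption (from the report): "$2" is the directory that holds vmcore.

compress_in_background() {
    dump_dir=$1
    # Stop early if the directory is missing, so gzip never runs in
    # whatever directory the caller happened to be in.
    cd "$dump_dir" || return 1
    # Background the slow work and detach its output; the caller gets
    # control back immediately while gzip keeps running.
    (sleep "${DELAY:-10}"; nice gzip -9 vmcore) >/dev/null 2>&1 &
}

# In the real script this would be the only call:
#   compress_in_background "$2"
```

The early return on a failed cd keeps gzip from running in an unexpected directory.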

Comment 3 Joshua Jensen 2006-01-13 22:50:49 UTC

Can't netdump-server do the right thing here?  Wouldn't an error in the script
effectively halt all netdump activity on the server until it was restarted?

Surely the netdump-server can fork off a process for the script.

Comment 4 Dave Anderson 2006-01-13 22:54:36 UTC

If the script failed, the system() function would just return, and the
netdump-server would continue.

Surely it could fork off a process, but it doesn't...    ;-)

(Sorry -- I didn't write the damn thing.)

Comment 5 Joshua Jensen 2006-01-13 23:18:32 UTC

If the script does something that never returns, bad bad things would happen. 
The netdump-server can provide more robust behavior by forking off the script,
and not leaving itself so open to killing off all other netdump sessions.
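
For work that has to stay in the foreground, one way to limit the damage from a script step that never returns is a deadline wrapper inside the script itself. The following is a sketch in plain sh, not something netdump-server provides; run_with_deadline is a hypothetical name. The server still blocks for up to the deadline, and only the direct child is killed, not any processes it spawned.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, but kill it after a deadline.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]

run_with_deadline() {
    deadline=$1
    shift
    # Start the real command in the background so it can be supervised.
    "$@" &
    job=$!
    # Watchdog: after $deadline seconds, kill the command if it still runs.
    ( sleep "$deadline"; kill "$job" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    # Wait for the command and remember how it ended.
    status=0
    wait "$job" || status=$?
    # The command is done (or was killed), so the watchdog can go.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}

# Example: give a foreground step at most 60 seconds.
#   run_with_deadline 60 some_foreground_step
```

A non-zero return covers both a failed command and one that was killed by the watchdog.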

The code isn't set in stone?  i.e., if we come up with a better way for it to
work, can't it be changed?  Do you not have permission to change the code?  I
understand that you didn't write it... but if only the original authors could
change the code... well... we would still all be running Solaris :-)

Comment 6 Dave Anderson 2006-01-13 23:26:36 UTC

No, but even if (1) I fixed it today, and (2) it was deemed a requirement
for a future netdump package errata -- and getting that permission from
the powers-that-be is no small task -- you wouldn't see it in a released
errata for many months.  The next netdump errata is already in the
pipeline, and even that won't hit the streets for quite some time.

I feel your pain...

Comment 8 Dave Anderson 2006-01-18 14:13:35 UTC


Upon further review (I've been watching too much football lately), please
consider the following.  There are four optional netdump-server scripts: 

netdump-start: 
  Run when a new client does a "service netdump start".

netdump-crash:
  Run when a client initiates a handshake after a crash.
  It cannot be run in the background because its success/failure
  determines whether the netdump-server starts a netdump operation
  or requests the client to just reboot.

netdump-nospace:
  Run when a netdump operation has been accepted, but there is not enough
  space to hold the vmcore. 
  It cannot be run in the background because its intended use is to
  allow the netdump-server to clear out space if possible; if the script
  succeeds, the server retries the space check.

netdump-reboot:
  Run after a vmcore has been created, and just before requesting that
  the client reboot.

So the question is whether to make the netdump-reboot, and perhaps the
netdump-start, scripts run in the background, making their operation
inconsistent with the other two.

If a user decides to have their script file perform a time-consuming task,
then the user need only add an ampersand to avoid blocking the netdump-server.
So there really is no problem/bug here.

Furthermore, it can be argued that, depending upon what the user wants
their script to accomplish, the script *should* be run in the
foreground, i.e., the netdump-server should wait before continuing to
process any more activity.  How do we know that some other customer
doesn't have something being accomplished by their netdump-reboot script
that *should* be waited for?  Changing the code now would break that
behavior.

In my opinion, the behavior should remain as it is, so that it stays as
flexible as possible.  The user should have complete control as to
whether to run the scripts in the foreground or background.
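
As an illustration of a script that should stay in the foreground because its exit status is the whole point, here is a minimal sketch of a check that a netdump-crash script could perform. The function name has_free_kb, the threshold, and the policy itself are hypothetical; it also assumes that a non-zero exit status is what the server treats as failure.

```shell
#!/bin/sh
# Hypothetical foreground check, e.g. for a netdump-crash script.
# Returns 0 when the filesystem holding DIR has at least MIN_KB available.
# Usage: has_free_kb DIR MIN_KB

has_free_kb() {
    dir=$1
    min_kb=$2
    # df -Pk prints a header plus one POSIX-format line per filesystem;
    # column 4 of the data line is the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # Fail if df produced nothing, or if there is too little space.
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# In the real script the result would be handed back to the server
# (the 4 GB threshold here is only an example value):
#   has_free_kb /var/crash 4194304 || exit 1
```

Backgrounding this check would throw away the exit status, which is why such a script needs to run in the foreground and finish quickly.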

Comment 10 Joshua Jensen 2006-01-18 18:51:06 UTC

Well said. I mostly buy this line of reasoning.  Remember though that a script
that isn't backgrounded halts *all* netdumps, not just the one for which the
script is running.  Think enterprise:  you are pointing 100s or 1000s of machines
at a single netdump-server.  It isn't uncommon to see multiple netdumps at the
same time.  One mistake in a script, or just a script that takes a while, will
cause *all* other netdumps to fail.

If you don't think that is reason enough to change netdump-server's behavior,
perhaps we should change the documentation or man page to point this important
fact out.

Comment 11 Dave Anderson 2006-01-18 20:18:47 UTC

I can certainly agree that a documentation change is reasonable,
to both the package README and the netdump-server(8) man page.

Comment 21 Red Hat Bugzilla 2006-08-10 21:26:55 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0492.html
