Red Hat Bugzilla – Bug 177783
netdump-server halts when running external script
Last modified: 2014-06-18 04:28:43 EDT
Description of problem:
In /var/crash/scripts/netdump-reboot, I have this logic:
nice gzip -9 vmcore
This works as I would expect it to... but while gzip is running, netdump-server
does NOTHING else. It doesn't even accept incoming packets from other ongoing
netdumps. So on my netdump server, which services hundreds of clients, I get
failed netdumps simply because one that didn't fail is gzip'ing, and the
other netdump clients time out after a while. I suspect that gzip isn't
required... "sleep 1h" should do the same thing.
Is the netdump-server not capable of both running one of its scripts AND continuing
to process other currently active netdump sessions?
Version-Release number of selected component (if applicable):
netdump-server-0.7.7-3.i386.rpm in RHEL4 U2
This should be filed under netdump-server, but I couldn't find that component
in the list.
Can you do the gzip command in the background? The netdump-server just does
a "system()" of the script.
(sleep 10; nice gzip -9 vmcore) &
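For illustration, here is a slightly fuller sketch of the backgrounding approach, written as a shell function. The function name compress_in_background and the idea of passing the vmcore path as an argument are assumptions made for the example, not something netdump-server defines:

```shell
#!/bin/sh
# Sketch only: compress a vmcore without making the caller wait.
# compress_in_background is a hypothetical helper name; the path to the
# vmcore is passed in as $1 (an assumption for this example).
compress_in_background() {
    vmcore=$1
    # Run the slow work in a subshell, detached from stdin/stdout/stderr,
    # and put it in the background so this function returns immediately.
    (
        nice gzip -9 "$vmcore"
    ) </dev/null >/dev/null 2>&1 &
}

# A netdump-reboot script could then end with something like:
#   compress_in_background vmcore
```

Since the script exits as soon as the subshell has been started, the system() call in netdump-server should return right away instead of waiting for gzip to finish.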
Can't netdump-server do the right thing here? Wouldn't an error in the script
effectively halt all netdump activity on the server until it was restarted?
Surely the netdump-server can fork off a process for the script.
If the script failed, the system() function would just return, and the
netdump-server would continue.
Surely it could fork off a process, but it doesn't... ;-)
(Sorry -- I didn't write the damn thing.)
If the script does something that never returns, bad bad things would happen.
The netdump-server can provide more robust behavior by forking off the script,
and not leaving itself so open to killing off all other netdump sessions.
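Until the server itself changes, one defensive measure on the script side is to cap how long any foreground step may run. A minimal sketch, assuming the GNU coreutils timeout command is installed (older releases may not ship it); run_with_limit is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: run a command, but give up after a time limit so a hung
# step cannot block the caller forever.
# Assumes GNU coreutils "timeout"; run_with_limit is a hypothetical name.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    # timeout exits with status 124 when the command ran out of time.
    if [ "$status" -eq 124 ]; then
        echo "gave up on '$*' after $limit" >&2
    fi
    return "$status"
}

# Example use inside a script:
#   run_with_limit 30 some-quick-step
```

This only limits the damage from a hung script; while the step runs, the server is still blocked for up to the chosen limit.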
The code isn't set in stone? I.e., if we come up with a better way for it to
work, can't it be changed? Do you not have permission to change the code? I
understand that you didn't write it... but if only the original authors could
change the code... well... we would still all be running Solaris :-)
No, but even if (1) I fixed it today, and (2) it was deemed a requirement
for a future netdump package errata (and getting that permission from
the powers-that-be is no small task), you wouldn't see it in a release
errata for many months. The next netdump errata is already in the
pipeline, and even that won't hit the streets for quite some time.
I feel your pain...
Upon further review (I've been watching too much football lately), please
consider the following. There are four optional netdump-server scripts:
netdump-start:
  Run when a new client does a "service netdump start".
netdump-crash:
  Run when a client initiates a handshake after a crash.
  It cannot be run in the background because its success/failure
  determines whether the netdump-server starts a netdump operation
  or requests the client to just reboot.
netdump-nospace:
  Run when a netdump operation has been accepted, but there is not enough
  space to hold the vmcore.
  It cannot be run in the background because its intended use is to
  allow the netdump-server to clear out space if possible, and if it
  is successful, the server retries the space check.
netdump-reboot:
  Run after a vmcore has been created, and just before requesting that
  the client reboot.
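As an illustration of why the out-of-space script described above has to run in the foreground, here is a rough sketch of what such a script might do. Everything about the layout is an assumption for the example: free_oldest_dump is a hypothetical helper, and it assumes each dump lives in its own subdirectory (with no whitespace in its name) containing a file called vmcore. The exit status is the part the server would act on:

```shell
#!/bin/sh
# Sketch only: free space by deleting the oldest stored dump.
# free_oldest_dump and the directory layout are assumptions: each dump is
# taken to be a subdirectory of $1 that contains a file named "vmcore".
free_oldest_dump() {
    crash_dir=$1
    # "ls -1tr" lists entries oldest first (by modification time).
    for name in $(ls -1tr "$crash_dir"); do
        # Only touch directories that actually hold a vmcore, so helper
        # directories such as "scripts" are left alone.
        if [ -f "$crash_dir/$name/vmcore" ]; then
            rm -rf -- "$crash_dir/$name"
            return 0    # success: space was freed
        fi
    done
    return 1            # nothing to delete
}

# The script would end by passing the result back as its exit status:
#   free_oldest_dump /var/crash
```

If this ran in the background, the script's exit status would no longer say whether any space had actually been freed, so a retry of the space check would be meaningless.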
So the question is whether to make the netdump-reboot, and perhaps the
netdump-start, scripts run in the background, making their operation
inconsistent with the other two.
If a user decides to have their script file perform a time-consuming task,
then the user need only add an ampersand to avoid blocking the netdump-server.
So there really is no problem/bug here.
Furthermore, it can be argued that, depending upon what the user wants
their script to accomplish, the script *should* be run in the
foreground, i.e., the netdump-server should wait before continuing to
process any more activity. How do we know that some other customer
doesn't have something being accomplished by their netdump-reboot script
that *should* be waited for? Changing the code now would break that.
In my opinion, the process should remain as it is, so that the process
can remain as flexible as possible. The user should have complete
control as to whether to run the scripts in the foreground or background.
Well said. I mostly buy this line of reasoning. Remember, though, that a script
that isn't backgrounded halts *all* netdumps, not just the one for which the
script is running. Think enterprise: you are pointing 100s or 1000s of machines
at a single netdump-server. It isn't uncommon to see multiple netdumps at the
same time. One mistake in a script, or just a script that takes a while, will
cause *all* other netdumps to fail.
If you don't think that is reason enough to change netdump-server's behavior,
perhaps we should change the documentation or man page to point out this
important caveat.
I can certainly agree that a documentation change is reasonable,
to both the package README and the netdump-server(8) man page.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.