Bug 179457

Summary: netdump-server halts when running external script
Product: Red Hat Enterprise Linux 3 Reporter: Brad Hinson <bhinson>
Component: netdump Assignee: Thomas Graf <tgraf>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: high    
Version: 3.0 CC: lwang, rkhan, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2006-0462 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-07-20 14:31:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 177783    
Bug Blocks: 181405    

Description Brad Hinson 2006-01-31 15:47:35 UTC
+++ This bug was initially created as a clone of Bug #177783 +++

Description of problem:

In /var/crash/scripts/netdump-reboot, I have this logic:

sleep 10
cd "$2"
nice gzip -9 vmcore

This works as I would expect it to... but when gzip is running, netdump-server
does NOTHING else. It doesn't even accept incoming packets from other ongoing
netdumps.  So on my netdump server, which services hundreds of clients, I get
failed netdumps simply because one that didn't fail is gzip'ing, and the
other netdump clients time out after a while.  I suspect that gzip isn't
required.... "sleep 1h" should do the same thing.

Is the netdump-server not capable of both running one of the scripts AND continuing
to process other currently active netdump sessions?
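The blocking described above can be reproduced outside netdump. The sketch below assumes only that the caller runs a command through a shell and waits for that shell to exit (a later comment notes that netdump-server uses "system()"). The helper name blocked_seconds and the 2-second sleep are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Illustration only: measure how long a caller is blocked when it runs a
# command through "sh -c" and waits for that shell to exit.
# blocked_seconds is a hypothetical helper, not part of netdump.
blocked_seconds() {
    start=$(date +%s)
    sh -c "$1"
    end=$(date +%s)
    echo $((end - start))
}

# Foreground: the caller waits for the whole sleep.
fg_secs=$(blocked_seconds 'sleep 2')

# Background: the shell exits at once; output is detached so nothing
# keeps the caller's pipe open.
bg_secs=$(blocked_seconds 'sleep 2 >/dev/null 2>&1 &')

echo "foreground blocked ${fg_secs}s, background blocked ${bg_secs}s"
```

The foreground case should report at least 2 seconds and the background case close to 0. A single-threaded server that waits this way cannot service other clients during the foreground interval.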


Version-Release number of selected component (if applicable):

netdump-server-0.7.7-3.i386.rpm in RHEL4 U2

This should be filed under netdump-server, but I couldn't find that component
in the list.

-- Additional comment from anderson on 2006-01-13 17:38 EST --

Can you do the gzip command in the background?  The netdump-server just does
a "system()" of the script.

-- Additional comment from anderson on 2006-01-13 17:41 EST --

i.e., 

cd $2
(sleep 10; nice gzip -9 vmcore) & 
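A slightly fuller sketch of this workaround follows, assuming (as in the reporter's script) that the dump directory arrives as the second argument. The function name compress_in_background and the COMPRESS_DELAY variable are hypothetical and not part of netdump. Output is redirected so that the background job holds no descriptors its caller might wait on.

```shell
#!/bin/sh
# Hypothetical netdump-reboot sketch: compress the vmcore in the background
# so the script itself returns to its caller right away.
compress_in_background() {
    dump_dir=$1
    (
        cd "$dump_dir" || exit 1
        # Delay taken from the original script; COMPRESS_DELAY is an
        # assumed override so the delay can be shortened.
        sleep "${COMPRESS_DELAY:-10}"
        nice gzip -9 vmcore
    ) >/dev/null 2>&1 &
}

# Assumption from the report: "$2" is the directory holding the vmcore.
# The guard makes the file a no-op when run without arguments.
if [ -n "${2:-}" ]; then
    compress_in_background "$2"
fi
```

With this shape the script exits immediately and gzip continues on its own. The trade-off is that the caller never learns whether the compression succeeded.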

-- Additional comment from joshua on 2006-01-13 17:50 EST --
Can't netdump-server do the right thing here?  Wouldn't an error in the script
effectively halt all netdump activity on the server until it was restarted?

Surely the netdump-server can fork off a process for the script.

-- Additional comment from anderson on 2006-01-13 17:54 EST --

If the script failed, the system() function would just return, and the
netdump-server would continue.

Surely it could fork off a process, but it doesn't...    ;-)

(Sorry -- I didn't write the damn thing.)

-- Additional comment from joshua on 2006-01-13 18:18 EST --
If the script does something that never returns, bad bad things would happen. 
The netdump-server can provide more robust behavior by forking off the script,
and not leaving itself so open to killing off all other netdump sessions.

The code isn't set in stone?  ie, if we come up with a better way for it to
work, can't it be changed?  Do you not have permission to change the code?  I
understand that you didn't write it... but if only the original authors could
change the code... well... we would still all be running Solaris :-)

-- Additional comment from anderson on 2006-01-13 18:26 EST --

No, but even if (1) I fixed it today, and (2) it was deemed a requirement
for a future netdump package errata -- and getting that permission
from the powers-that-be is no small task -- you wouldn't see
it in a release errata for many months.  The next netdump
errata is already in the pipeline, and even that won't hit the streets
for quite some time.  

I feel your pain...


-- Additional comment from anderson on 2006-01-18 09:13 EST --


Upon further review (I've been watching too much football lately), please
consider the following.  There are four optional netdump-server scripts: 

netdump-start: 
  Run when a new client does a "service netdump start".

netdump-crash:
  Run when a client initiates a handshake after a crash.
  It cannot be run in the background because its success/failure
  determines whether the netdump-server starts a netdump operation
  or requests the client to just reboot.

netdump-nospace:
  Run when a netdump operation has been accepted, but there is not enough
  space to hold the vmcore. 
  It cannot be run in the background because its intended use is to 
  allow the netdump-server to clear out space if possible; if it
  is successful, the server retries the space check.

netdump-reboot:
  Run after a vmcore has been created, and just before requesting that
  the client reboot.

So the question is whether to make the netdump-reboot, and perhaps the
netdump-start, scripts run in the background, making their operation
inconsistent with the other two.
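The reason netdump-crash and netdump-nospace cannot be backgrounded can be shown with a small shell model. This is a sketch under stated assumptions, not the server's actual logic: the function names and the "start-netdump" / "reboot-only" labels are hypothetical, and exit status 0 is assumed to mean success.

```shell
#!/bin/sh
# Hypothetical model of a caller that branches on a hook script's result.
# The labels "start-netdump" and "reboot-only" are illustrative only.

# Foreground: the hook's own exit status drives the decision.
decide_foreground() {
    if "$@"; then
        echo "start-netdump"
    else
        echo "reboot-only"
    fi
}

# Background: "&" reports status 0 as soon as the job is launched, so the
# hook's real result never reaches the caller.
decide_background() {
    "$@" >/dev/null 2>&1 &
    if [ $? -eq 0 ]; then
        echo "start-netdump"
    else
        echo "reboot-only"
    fi
}

decide_foreground true
decide_foreground false
decide_background false
```

The last call prints "start-netdump" even though the hook failed. A backgrounded script gives up its ability to influence the caller's decision, which does not matter for netdump-reboot but does for the two scripts whose result is consulted.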

If a user decides to have their script file perform a time-consuming task,
then the user need only add an ampersand to avoid blocking the netdump-server.
So there really is no problem/bug here.

Furthermore, it can be argued that, depending upon what the user wants
their script to accomplish, the script *should* be run in the
foreground, i.e., the netdump-server should wait before continuing to 
process any more activity.  How do we know that some other customer
has something being accomplished by their netdump-reboot script that
*should* be waited for?  Changing the code now would break that
behavior.  

In my opinion, the process should remain as it is, so that the process
can remain as flexible as possible.  The user should have complete
control as to whether to run the scripts in the foreground or background.
  


-- Additional comment from joshua on 2006-01-18 13:51 EST --
Well said. I mostly buy this line of reasoning.  Remember, though, that a script
that isn't backgrounded halts *all* netdumps, not just the one for which the
script is running.  Think enterprise:  you are pointing 100s or 1000s of machines
at a single netdump-server.  It isn't uncommon to see multiple netdumps at the
same time.  One mistake in a script, or just a script that takes a while, will
cause *all* other netdumps to fail.

If you don't think that is reason enough to change netdump-server's behavior,
perhaps we should change the documentation or man page to point this important
fact out.

-- Additional comment from anderson on 2006-01-18 15:18 EST --

I can certainly agree that a documentation change is reasonable,
to both the package README and the netdump-server(8) man page.

Comment 5 Dave Anderson 2006-05-02 14:25:36 UTC
Fix checked in to RHEL-3 CVS, version 0.7.16-1.1

Comment 11 Red Hat Bugzilla 2006-07-20 14:31:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0462.html