Description of problem:
In /var/crash/scripts/netdump-reboot, I have this logic:

    sleep 10
    cd "$2"
    nice gzip -9 vmcore

This works as I would expect it to... but while gzip is running, netdump-server does NOTHING else. It doesn't even accept incoming packets from other ongoing netdumps. So on my netdump server, which services hundreds of clients, I get failed netdumps simply because one that didn't fail is gzip'ing, and the other netdump clients time out after a while.

I suspect that gzip isn't required... "sleep 1h" should do the same thing.

Is the netdump-server not capable of both running one of its scripts AND continuing to process other currently active netdump sessions?

Version-Release number of selected component (if applicable):
netdump-server-0.7.7-3.i386.rpm in RHEL4 U2

This should be filed under netdump-server, but I couldn't find that component in the list.
Can you do the gzip command in the background? The netdump-server just does a "system()" of the script.
i.e.,

    cd "$2"
    (sleep 10; nice gzip -9 vmcore) &
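A slightly fuller sketch of that workaround, for illustration only. The function name compress_vmcore_in_background and the optional delay argument are hypothetical; the only thing taken from the report is that "$2" is the directory holding the vmcore. Redirecting the subshell's standard streams is an extra precaution, not something the thread says netdump-server requires.

```shell
#!/bin/sh
# Sketch of a non-blocking netdump-reboot body.
# Assumption: as in the reporter's script, "$2" is the dump directory.

# Hypothetical helper: compress DIR/vmcore in a detached subshell so the
# script (and therefore the caller's system() call) returns immediately.
compress_vmcore_in_background() {
    dump_dir=$1
    delay=${2:-10}          # seconds to wait before compressing
    (
        cd "$dump_dir" || exit 1
        sleep "$delay"
        nice gzip -9 vmcore
    ) </dev/null >/dev/null 2>&1 &
}

# Only act when a dump directory was actually passed in.
if [ -n "${2:-}" ]; then
    compress_vmcore_in_background "$2"
fi
```

Because the subshell is started with "&", the script exits as soon as the job is launched, and the compression carries on independently.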
Can't netdump-server do the right thing here? Wouldn't an error in the script effectively halt all netdump activity on the server until it was restarted? Surely the netdump-server can fork off a process for the script.
If the script failed, the system() function would just return, and the netdump-server would continue. Surely it could fork off a process, but it doesn't... ;-) (Sorry -- I didn't write the damn thing.)
If the script does something that never returns, bad bad things would happen. The netdump-server can provide more robust behavior by forking off the script, rather than leaving itself so open to killing off all other netdump sessions. The code isn't set in stone? i.e., if we come up with a better way for it to work, can't it be changed? Do you not have permission to change the code? I understand that you didn't write it... but if only the original authors could change the code... well... we would all still be running Solaris :-)
No, but if (1) I fixed it today, and (2) it was deemed a requirement for a future netdump package errata -- and getting that permission from the powers-that-be is no small task -- you wouldn't see it in a released errata for many months. The next netdump errata is already in the pipeline, and even that won't hit the streets for quite some time. I feel your pain...
Upon further review (I've been watching too much football lately), please consider the following. There are four optional netdump-server scripts:

  netdump-start: Run when a new client does a "service netdump start".

  netdump-crash: Run when a client initiates a handshake after a crash. It cannot be run in the background because its success/failure determines whether the netdump-server starts a netdump operation or requests the client to just reboot.

  netdump-nospace: Run when a netdump operation has been accepted, but there is not enough space to hold the vmcore. It cannot be run in the background because its intended use is to allow the netdump-server to clear out space if possible; if it succeeds, the server retries the space check.

  netdump-reboot: Run after a vmcore has been created, and just before requesting that the client reboot.

So the question is whether to make the netdump-reboot, and perhaps the netdump-start, scripts run in the background, making their operation inconsistent with the other two.

If a user decides to have their script file perform a time-consuming task, then the user need only add an ampersand to avoid blocking the netdump-server. So there really is no problem/bug here.

Furthermore, it can be argued that, depending upon what the user wants their script to accomplish, the script *should* be run in the foreground, i.e., the netdump-server should wait before continuing to process any more activity. How do we know that some other customer doesn't have something being accomplished by their netdump-reboot script that *should* be waited for? Changing the code now would break that behavior.

In my opinion, the behavior should remain as it is, so that it stays as flexible as possible. The user should have complete control over whether to run the scripts in the foreground or background.
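The point about netdump-crash and netdump-nospace needing to run in the foreground comes down to exit status: a caller can only see a command's result if it waits for it. A minimal shell illustration follows. The function names are made up for the demonstration and have nothing to do with the actual netdump-server code.

```shell
#!/bin/sh
# A stand-in "script" that fails with exit status 3.
run_foreground() { sh -c 'exit 3'; }

# The same command, started in the background with an ampersand.
run_background() { sh -c 'exit 3' & }

fg_status=0
run_foreground || fg_status=$?   # caller waits and sees the real status

run_background
bg_status=$?                     # status of launching the job, not of the job
wait                             # reap the background job before exiting

echo "foreground=$fg_status background=$bg_status"
# prints: foreground=3 background=0
```

A backgrounded script always looks successful to whatever launched it, which is why backgrounding is only an option for scripts whose result the server does not act on.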
Well said. I mostly buy this line of reasoning. Remember, though, that a script that isn't backgrounded halts *all* netdumps, not just the one for which the script is running. Think enterprise: you are pointing 100s or 1000s of machines at a single netdump-server. It isn't uncommon to see multiple netdumps at the same time. One mistake in a script, or just a script that takes a while, will cause *all* other netdumps to fail. If you don't think that is reason enough to change netdump-server's behavior, perhaps we should change the documentation or man page to point this important fact out.
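For scripts that must stay in the foreground, one way a script author could limit the damage from a slow or hung step is to put an upper bound on it. The sketch below assumes the GNU coreutils "timeout" command is installed, which may not be the case on older releases such as the RHEL4 system in this report; run_with_limit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after LIMIT seconds.
# Relies on GNU coreutils "timeout", which exits with status 124 when
# the time limit is reached.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

status=0
run_with_limit 1 sleep 5 || status=$?   # sleep 5 stands in for a slow step

if [ "$status" -eq 124 ]; then
    echo "step timed out; returning control to the caller"
fi
```

This bounds how long the server can be blocked by that step, but it does not remove the blocking itself.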
I can certainly agree that a documentation change is reasonable, to both the package README and the netdump-server(8) man page.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0492.html