Bug 177783
Summary: | netdump-server halts when running external script | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Joshua Jensen <joshua> |
Component: | netdump | Assignee: | Thomas Graf <tgraf> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | low | Docs Contact: | |
Priority: | medium | | |
Version: | 4.0 | CC: | lwang, rkhan, tao |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | RHBA-2006-0492 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-08-10 21:26:55 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 179457, 181409 | | |
Description
Joshua Jensen
2006-01-13 22:31:34 UTC
Can you do the gzip command in the background? The netdump-server just does a "system()" of the script, i.e.:

    cd $2
    (sleep 10; nice gzip -9 vmcore) &

Can't netdump-server do the right thing here? Wouldn't an error in the script effectively halt all netdump activity on the server until it was restarted? Surely the netdump-server can fork off a process for the script.

If the script failed, the system() function would just return, and the netdump-server would continue. Surely it could fork off a process, but it doesn't... ;-) (Sorry -- I didn't write the damn thing.)

If the script does something that never returns, bad bad things would happen. The netdump-server can provide more robust behavior by forking off the script, and not leaving itself so open to killing off all other netdump sessions. The code isn't set in stone? I.e., if we come up with a better way for it to work, can't it be changed? Do you not have permission to change the code? I understand that you didn't write it... but if only the original authors could change the code... well... we would still all be running Solaris :-)

No, but even if (1) I fixed it today, and (2) it was deemed a requirement for a future netdump package errata -- and getting that permission from the powers-that-be is no small task -- you wouldn't see it in a released errata for many months. The next netdump errata is already in the pipeline, and even that won't hit the streets for quite some time.

I feel your pain...

Upon further review (I've been watching too much football lately), please consider the following. There are four optional netdump-server scripts:

netdump-start: Run when a new client does a "service netdump start".

netdump-crash: Run when a client initiates a handshake after a crash. It cannot be run in the background because its success/failure determines whether the netdump-server starts a netdump operation or requests the client to just reboot.

netdump-nospace: Run when a netdump operation has been accepted, but there is not enough space to hold the vmcore. It cannot be run in the background because its intended use is to allow the netdump-server to clear out space if possible; if the script succeeds, the netdump-server retries the space check.

netdump-reboot: Run after a vmcore has been created, and just before requesting that the client reboot.

So the question is whether to make the netdump-reboot, and perhaps the netdump-start, scripts run in the background, making their operation inconsistent with the other two. If a user decides to have their script file perform a time-consuming task, then the user need only add an ampersand to avoid blocking the netdump-server. So there really is no problem/bug here.

Furthermore, it can be argued that, depending upon what the user wants their script to accomplish, the script *should* be run in the foreground, i.e., the netdump-server should wait before continuing to process any more activity. How do we know that some other customer doesn't have something being accomplished by their netdump-reboot script that *should* be waited for? Changing the code now would break that behavior. In my opinion, the process should remain as it is, so that it can remain as flexible as possible. The user should have complete control as to whether to run the scripts in the foreground or background.

Well said. I mostly buy this line of reasoning. Remember, though, that a script that isn't backgrounded halts *all* netdumps, not just the one for which the script is running. Think enterprise: you are pointing 100s or 1000s of machines at a single netdump-server. It isn't uncommon to see multiple netdumps at the same time. One mistake in a script, or just a script that takes a while, will cause *all* other netdumps to fail. If you don't think that is reason enough to change netdump-server's behavior, perhaps we should change the documentation or man page to point this important fact out.
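For illustration, the ampersand workaround discussed above could be written as a netdump-reboot script along the following lines. This is only a sketch: it assumes, as the "cd $2" in the snippet at the top of this report suggests, that the second argument is the directory holding the vmcore. The function name compress_vmcore_in_background and its optional delay argument are hypothetical and not part of the netdump package.

```shell
#!/bin/sh
# Sketch of a netdump-reboot script that does not block the netdump-server.
# Assumption: $2 is the directory containing the vmcore (not confirmed here).

# Hypothetical helper: compress the vmcore in DIR after DELAY seconds.
compress_vmcore_in_background() {
    dump_dir=$1
    delay=${2:-10}
    # The subshell is backgrounded with "&", so this function (and the
    # script that calls it) returns at once while gzip keeps running.
    # Output is discarded so the job does not hold the caller's stdout/stderr.
    (
        cd "$dump_dir" || exit 1
        sleep "$delay"
        nice gzip -9 vmcore
    ) >/dev/null 2>&1 &
}

# Act only when invoked with the arguments the sketch expects.
if [ "$#" -ge 2 ]; then
    compress_vmcore_in_background "$2"
fi
```

Only the slow step is pushed into the background. Anything placed before the backgrounded subshell still runs in the foreground, so a hang there would still stall the netdump-server, which is the risk raised in the comment above.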
I can certainly agree that a documentation change is reasonable, to both the package README and the netdump-server(8) man page.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0492.html