Description of Problem:
This would be a very simple addition to the 'service' command and would be extremely useful as services manipulated with '/sbin/service' would be guaranteed never to end up with CWD=NFS-mounted-partition, which would be a good thing and prevent several possible nasty situations.

Version-Release number of selected component (if applicable):
All versions

How Reproducible:
Every time

Steps to Reproduce:
Artificially constructed example. Can easily happen in other situations (consider 'sudo' + NFS mounted homedirs)
1. mount remote:/export/misc /mnt/nfs
2. cd /mnt/nfs
3. /sbin/service sshd stop
4. /sbin/service sshd start
5. ssh -l root remote '/sbin/shutdown -h now'
6. Watch the fun :-)

Actual Results:
In that artificial example, access by sshd goes away as sshd's attempts to access its CWD hang on the hard NFS mount to the dead machine.

Expected Results:
Actually, this is exactly what I expect sshd to do. Corrective action to ensure all services get a CWD of / unless their own init scripts specify otherwise is in order.

Additional Information:
patch:

--- service.old Fri Aug 24 17:26:34 2001
+++ service Thu Sep 6 11:23:10 2001
@@ -57,7 +57,7 @@
 done
 if [ -x "${SERVICEDIR}/${SERVICE}" ]; then
-   "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
+   cd / && "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
 else
    echo $"${SERVICE}: unrecognized service" >&2
    exit 1
Doesn't make much sense, methinks. Why modify "service"? Consider this:
1. service sshd restart
2. mount remote:/export/misc /mnt/nfs
3. cd /mnt/nfs
4. ssh -l root remote '/sbin/shutdown -h now'
5. Watch the fun :-)
The latter example results in a hung shell, does it not, unable to find its CWD because the NFS server has gone away. A single shell locked in D state is not fatal: you can always ssh in again and figure out what went wrong.

The former example (my original) results in sshd getting stuck in D state, which locks you out of the machine, period, if ssh is your only way in (as it probably should be). Not a good thing in the case of a remote box. Even if it's not remote, arbitrarily rebooting is not an option for a production system.

My original example is intended to simulate a potential sequence of events resulting from a bleary-eyed 3am service call in a system comprising many servers working together, rather than something someone would do deliberately. This small change breaks nothing, fixes nothing *directly*, but prevents much potential unpleasantness in the real world.
Aha, so you mean:
4. /sbin/service sshd start ; ssh -l root remote '/sbin/shutdown -h now'
instead of two separate steps 4 and 5. Or more clearly:
4. ssh -l root remote '/sbin/shutdown -h now'
5. sleep $LONG_ENOUGH
6. /sbin/service sshd start
Wouldn't it be better then to go a step further and check whether CWD is accessible prior to starting a service, rather than changing to "/" always?
Err.. Your first 'Step 4' is 2 commands executed in sequence. It is semantically no different from 2 steps. Your second there is something completely different, and I fear you have misunderstood the nature of the calamity. In your case here, sshd will never get started because your shell will hang (given that you've just shut down the NFS server which serves your current CWD). That is not the issue of concern. My first example given is *exactly* what I meant - start sshd with an NFS mounted CWD and then shut down the NFS server that serves it. In it, CWD is happily accessible when sshd starts, so checking that will serve no purpose. The problem is that CWD is an NFS-mounted partition, and the server then goes away *after* sshd starts. This will lock sshd in D state at some later time, rendering the machine inaccessible. It's also annoying if you accidentally '/sbin/service foo start' from a temporarily-mounted location (e.g. cdrom) and then have to stop the service before you can unmount and eject your media. Now, certainly / is not 100% *guaranteed* not to be NFS-mounted, but I don't see it happening on a server (and if it does you presumably have gigabit networking and the planet's most reliable clustered fileservers, right?).
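To illustrate the inheritance point above, here is a minimal sketch that needs neither NFS nor root. It assumes a throwaway temporary directory stands in for the NFS mount point, and a plain `sh -c` child stands in for the daemon; neither is part of the real 'service' script.

```shell
#!/bin/sh
# Sketch only: a temp dir plays the role of /mnt/nfs (assumption),
# and "sh -c 'pwd -P'" plays the role of the started service.
start_dir=$(mktemp -d "${TMPDIR:-/tmp}/cwd-demo.XXXXXX")
start_dir=$(cd "$start_dir" && pwd -P)   # resolve symlinks for a fair comparison

# Started as-is: the child inherits the directory it was launched from.
inherited=$(cd "$start_dir" && sh -c 'pwd -P')

# Started the way the proposed patch does it: cd / first, then launch.
pinned=$(cd "$start_dir" && cd / && sh -c 'pwd -P')

echo "inherited CWD: $inherited"
echo "pinned CWD:    $pinned"

rmdir "$start_dir"
```

On Linux, the CWD of an already-running daemon can be inspected with `ls -l /proc/PID/cwd`, which is a quick way to see whether a service is sitting on a mount you are about to lose.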
I've tried to demonstrate that you could still lose sshd (which you said could be the only connection to a remote host, and often it is!) if you ran "service sshd restart" at a different point in time. Your suggested fix may solve one really ugly problem, but it cannot serve as a universal workaround for administrators' mistakes. Getting "service" to change into the root dir without going back to CWD after a service has been started looks rather unfortunate given the many other ways you can lose sshd. Also, users who run "service" in automated scripts (e.g. in ip-[up|down].local as I do) would have to save/restore CWD themselves or else they would find themselves in the root dir. Maybe the sshd initscript can be modified to start from an accessible directory and then restore CWD no matter whether that would hang the rest of the "service" script or not.
[tsmith@hades tsmith]$ pwd
/home/tsmith
bash$ pwd
/home/tsmith
bash$ bash
bash$ cd /
bash$ pwd
/
bash$ exit
exit
bash$ pwd
/home/tsmith

'nuff said.
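The same point can be sketched for the scripted case (e.g. calling "service" from ip-up.local). This assumes a hypothetical stand-in script, `fake_service`, that does nothing but `cd /`; it is not the real /sbin/service, only an illustration that a `cd` in a child process leaves the caller where it was.

```shell
#!/bin/sh
# Sketch only: fake_service is a hypothetical stand-in for /sbin/service.
work=$(mktemp -d "${TMPDIR:-/tmp}/svc-demo.XXXXXX")

cat > "$work/fake_service" <<'EOF'
#!/bin/sh
# Change directory inside the child, then report where the child ended up.
cd / && pwd
EOF

cd "$work"
before=$(pwd)
child_cwd=$(sh "$work/fake_service")   # runs as a separate process
after=$(pwd)

echo "child ran in:  $child_cwd"
echo "caller before: $before"
echo "caller after:  $after"

# Leave the temp dir before removing it.
cd / && rm -rf "$work"
```

So a calling script would only need to save and restore its CWD if it sourced the script with `.` instead of executing it.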
Yep. :)
[a last thought -- promised]

> Now, certainly / is not 100% *guaranteed* not to be NFS-mounted,
> but I don't see it happening on a server

Then choose a directory which must exist all the time (because if it doesn't you would be in big trouble already):

--- service.orig Tue Sep 18 12:49:51 2001
+++ service Tue Sep 18 12:57:59 2001
@@ -57,7 +57,7 @@
 done
 if [ -x "${SERVICEDIR}/${SERVICE}" ]; then
-   "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
+   cd "${SERVICEDIR}" && "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
 else
    echo $"${SERVICE}: unrecognized service" >&2
    exit 1
Which is why '/' is a good choice. If '/' is NFS, having $SERVICEDIR local won't help you much. I might want to unmount $SERVICEDIR while the machine is running, without stopping services, which I cannot do if the services have it as CWD. (And why should they have it as CWD? They depend on nothing in there at *runtime*.) If individual services themselves choose to chdir to their own subtrees, that is their business and affects no other service. The only path it is safe to have *all* services depend on is '/', since you cannot unmount '/' on a running machine anyway.
"service sshd --full-restart" ought to be fixed then, too, in line 43.
*** Bug 55535 has been marked as a duplicate of this bug. ***
cd / added in 6.62-1.
Why has "service sshd --full-restart" not been fixed, too? It still does "cd ${SERVICEDIR}".