If a (remote) NFS server (mounted with the default options) goes down, this can cause the (local) load average to get very high, which can cause problems such as sendmail refusing connections. This happens as follows:

- If a process is blocked waiting for a read from a file (e.g. a file on a remote NFS server), it is put in state "D" (disc wait), corresponding to TASK_UNINTERRUPTIBLE in include/linux/sched.h.
- When the load average is calculated in kernel/sched.c, in the function count_active_tasks(), processes in the state TASK_UNINTERRUPTIBLE are counted as "running", and thus contribute to the calculated load average.
- If the remote NFS server is down, the process in question will remain in the TASK_UNINTERRUPTIBLE state indefinitely, and thus the load average will rise.

In particular, the "slocate" program run by default from "cron" in the standard RHL setup seems to attempt to contact the NFS server and hang (this appears at first sight to be a bug in "slocate" since it is configured not to scan NFS mounts - I'll investigate this further later); thus the load average rises by at least 1 every day.

This does not cause too many problems at first, because the system is in fact not loaded (it is just the _reported_ load that is high), so nothing slows down. Eventually, however, this causes sendmail to refuse connections, blocking incoming email.

I don't know too much about kernel-hacking, so I'm not sure what the best fix for this would be - the easiest way to fix the symptom would be not to include TASK_UNINTERRUPTIBLE in the processes counted by count_active_tasks(); however I don't know what other consequences this would have.
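For anyone who wants to check this on an affected machine, here is a rough user-space sketch of the counting rule described above. It reads the state letter from each /proc/<pid>/stat file and counts "R" and "D" entries as active. The function names (parse_state, count_active, read_states) are made up for illustration, and the script only looks at one entry per process, so it approximates what the kernel counts rather than reproducing it.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ")", so split after the LAST ")" instead of on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_active(states):
    """Count states that contribute to the load average: R and D."""
    return sum(1 for s in states if s in ("R", "D"))


def read_states(proc_root="/proc"):
    """Collect the state letter of every process listed under proc_root."""
    try:
        names = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    states = []
    for name in names:
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                states.append(parse_state(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading it
    return states


if __name__ == "__main__":
    states = read_states()
    print("R:", states.count("R"),
          "D:", states.count("D"),
          "counted as active:", count_active(states))
```

If the "D" count stays above zero on an otherwise idle machine while the NFS server is unreachable, that would be consistent with the explanation above.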
This is the way unix load averages get defined... strange but true.
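To illustrate why each stuck process adds about 1 to the reported figure, here is a rough sketch of the averaging. It assumes the 1-minute load is an exponentially damped average of the active-task count, sampled every 5 seconds; the kernel does this in fixed-point arithmetic, and floating point is used here only for readability. The name simulate_load and its parameters are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)
WINDOW = 60.0          # time constant of the 1-minute average


def simulate_load(active_tasks, seconds, load=0.0):
    """Return the damped average after `seconds` with a constant task count."""
    decay = math.exp(-SAMPLE_INTERVAL / WINDOW)
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        # Old value fades by `decay`; the current count fills the remainder.
        load = load * decay + active_tasks * (1.0 - decay)
    return load


if __name__ == "__main__":
    # One process stuck in state D, machine otherwise idle.
    for minutes in (1, 5, 15):
        print(minutes, "min:", round(simulate_load(1, minutes * 60), 3))
```

Under these assumptions the value climbs towards the number of stuck processes and stays there, since nothing ever takes them out of the count.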
There is still a bug here: when any mounted NFS server goes down, sendmail stops responding to connections because of the apparently high load average. This needs to be fixed - by changing the load average, or by making sure "slocate" etc. don't get hung up on stale mounts, or by using a different method to calculate the system load in sendmail - but whatever the best solution, it's still a bug :-)