Red Hat Bugzilla – Bug 140631
Mounting nfs partition causes modern machines to hang
Last modified: 2007-11-30 17:10:55 EST
Description of problem:
Having any partitions mounted via NFS causes the machine to hang within 24
hours (inability to log in; if you can log in, inability to view the
mounted partition).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. On a modern machine (Asus P4P800E-Deluxe or Abit IS7-E motherboard),
mount a partition from a different machine over NFS
2. Wait 24 hours
3. Try to log in to the machine
Actual results:
Usually, the login hangs after accepting the name and password.
Sometimes, you can log in successfully, but then trying to view the
mounted partition (ls or df) results in your command hanging (and the
load of the machine steadily rises into the hundreds).
Expected results:
You should be able to log in every time and you should be able to view
the mounted partition.
Additional info:
This problem is even worse if you are using autofs as well. Turning
on nscd does not fix the problem.
1) Is autofs in the picture?
2) Could you post an AltSysRq-t system trace? Run
"echo t > /proc/sysrq-trigger" and then use dmesg to
capture the trace (i.e. dmesg > /tmp/systrace).
1) Autofs is out of the picture.
2) There was no way to generate an AltSysRq-t system trace, because
when the system hangs it is impossible to log in. I tried leaving open
sessions on the console and via ssh, but was unable to use them after
the system hung. I did note that when I rebooted via
CTRL-ALT-DEL, the message for unmounting NFS said "failed" twice in a row.
I did test one system with no crontab writing to the NFS partition and
an identical system with a crontab that wrote to it every 5 minutes.
The one with no crontab did not crash, but the other one did.
Here is the relevant entry from /etc/fstab:
galton:/var/spool/mail /var/spool/mail nfs rw,bg,actimeo=0 0 0
And here is the relevant entry from the crontab:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/w >
Before the system hangs, turn on AltSysRq processing
with "echo 1 > /proc/sys/kernel/sysrq". Then, when the
system hangs, type AltSysRq-t on the console keyboard
and a trace should appear.
Reboot and the trace *should be* in /var/log/messages.
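The capture steps suggested so far can be collected into a few small shell functions. This is only a sketch: the function names are made up for illustration, the default paths are the ones quoted in this thread, and the path arguments exist purely so the functions can be tried against ordinary files without root.

```shell
# Sketch of the SysRq capture steps from this thread.
# Function names are hypothetical; default paths are those quoted above.

enable_sysrq() {
    # Writing 1 turns on SysRq handling (needs root on the real path).
    echo 1 > "${1:-/proc/sys/kernel/sysrq}"
}

request_task_trace() {
    # Writing "t" asks the kernel to dump the state of every task
    # into its log buffer, the same as AltSysRq-t on the console.
    echo t > "${1:-/proc/sysrq-trigger}"
}

save_kernel_log() {
    # Copy the kernel log buffer to a file while a shell still works.
    dmesg > "${1:-/tmp/systrace}"
}
```

Run enable_sysrq right after boot, well before the hang. If no shell is usable once the machine hangs, the AltSysRq-t key combination on the console remains the fallback, with the trace expected in /var/log/messages after the reboot.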
Created attachment 108129 [details]
System trace created when machine hung
I did further tests to refine the problem.
1. Cron is not associated with this problem (I used a shell script in
place of cron to write every 5 minutes and it still got hung up).
2. Writing to local disks does not have this problem (I used a cron
job to write every 5 minutes to a file on a local partition).
3. Reading every 5 minutes from a file on an NFS-mounted partition also
causes the machine to hang within 24 hours (I substituted a read for
the write on my cron job accessing the NFS mounted partition).
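For anyone trying to reproduce this, the "shell script in place of cron" test can be approximated as below. This is a sketch under assumptions, not the script actually used: the function name, the use of date as the payload, and the example path are all hypothetical. The optional third argument only limits the number of writes so the loop can be tried quickly.

```shell
# Sketch: append a line to a file at a fixed interval, without cron.
# periodic_write TARGET [INTERVAL_SECONDS] [COUNT]
# COUNT of 0 (the default) means keep going until interrupted.
periodic_write() {
    target="$1"
    interval="${2:-300}"   # 300 seconds = the 5 minutes used in the tests
    count="${3:-0}"
    i=0
    while :; do
        # Stop with an error as soon as a write fails.
        date >> "$target" || return 1
        i=$((i + 1))
        if [ "$count" -gt 0 ] && [ "$i" -ge "$count" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

Pointing it at a file on the NFS mount, for example periodic_write /var/spool/mail/nfs-test (a hypothetical file name), corresponds to the failing case; pointing it at a local file corresponds to test 2, which did not hang.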
Looking at the system trace, it appears there are
quite a few shells hung in getting permission bits
from the server (in nfs3_proc_access() to be exact).
If you remove the actimeo=0 mount option, does the
hang still happen?
I removed the actimeo=0 mount option and the hang still happens, just
OK... I'm trying to reproduce this here, but looking at the
system trace you posted, it appears the top half is missing.
I'm trying to find the first sh process that hung, since the rest
of the sh processes are just stuck behind that one.
/var/log/messages should have the complete trace.
Also, could you please post an AltSysRq-m trace and the output of
"cat /proc/slabinfo", just to see how you're doing on memory?
I'll try to generate another trace and send you the complete
/var/log/messages file when I do.
As an experiment, I tried writing to the nfs mounted partition every
10 minutes, instead of every 5 minutes and it never hung. Is it
possible that there is a problem when a disk write is sent at the same
instant that the computer is flushing its cache to an nfs mounted
disk? If you want, I can try other times to see which intervals cause
the machine to hang.
I'm not sure what's going on. Over the weekend I was
not able to reproduce this.
What OS is running on the server side? Linux, Solaris, NetApp?
The server is running Fedora 1. There is nothing fancy going on
there, and the patches should be current.
Created attachment 114082 [details]
This is my netdump output.
I am having the same problems with RedHat 3.0 connected to a NetApp filer.
Any command that has any association with the NFS mount point will hang.
Wow - I've finally discovered how to make NFS mounts stable. One of my users
observed that older versions of RedHat and Fedora used UDP for NFS
mounts, but the newer Fedora versions use TCP. Since I added the
flags "notcp,udp" to my mount options, everything has been working perfectly.
So basically you're saying the NFS server on your FC1 machine does work with NFS
mounts using TCP?
Sorry for not making the fix clearer. Basically, the server works with both
TCP and UDP, but TCP is the only one that occasionally hangs. No server
settings need to change; on the client machines, add the flags "notcp,udp" to
the NFS mount options in /etc/fstab and /etc/auto.master. This prohibits TCP
mounting and forces UDP mounting. With these options in place, there have been
no more crashes or hangs for a week now, even when I run programs that used to
always cause the machine to hang within 24 hours.
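To check which transport a given set of mount options asks for, something like the following could be used. It is a rough sketch: the function name is hypothetical, it only recognizes the "tcp", "udp" and "notcp" flags mentioned in this thread plus the "proto=" spelling, and letting the last matching option win is an assumption of the sketch rather than documented mount behavior. The example option string is the fstab entry quoted earlier with the two flags appended, which is an illustration and not a line anyone posted.

```shell
# Sketch: report which transport an NFS option string requests.
# Prints "tcp" or "udp", or nothing if no transport option is present.
nfs_transport() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r opt; do
        case "$opt" in
            proto=tcp|tcp) echo tcp ;;
            proto=udp|udp|notcp) echo udp ;;
        esac
    done | tail -n 1
}

# Illustrative option string (assumed, see above).
nfs_transport "rw,bg,actimeo=0,notcp,udp"   # prints: udp
```

The same function can be fed the options column of a mounted filesystem, for instance from /proc/mounts, to see what the client actually negotiated, though the exact option spelling shown there varies by kernel version.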
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.
Closing per lack of response to previous request for information.
This bug was originally filed against a much earlier version of Fedora
Core, and significant changes have taken place since the last version
for which this bug is confirmed.
Note that FC3 and FC4 are supported by Fedora Legacy for security
fixes only. Please install a still supported version and retest. If
it still occurs on FC5 or FC6, please reopen and assign to the correct
version. Otherwise, if this is a security issue, please change the
product to Fedora Legacy. Thanks, and we are sorry that we did not
get to this bug earlier.