Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Mounting nfs partition causes modern machines to hang|
|Product:||[Fedora] Fedora||Reporter:||Ed Friedman <ed>|
|Component:||nfs-utils||Assignee:||Steve Dickson <steved>|
|Status:||CLOSED CANTFIX||QA Contact:||Ben Levenson <benl>|
|Version:||3||CC:||jeremy, mattdm, mtonn|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2006-10-31 10:53:36 EST||Type:||---|
Description Ed Friedman 2004-11-23 16:57:51 EST
Description of problem:
Having any partition mounted via NFS causes the machine to hang within 24 hours (inability to log in or, if you can log in, inability to view the mounted partition).
Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-39
How reproducible:
Almost always.
Steps to Reproduce:
1. On a modern machine (Asus P4P800E-Deluxe or Abit IS7-E motherboard), mount a partition from a different machine using NFS.
2. Wait 24 hours.
3. Try to log in to the machine.
Actual results:
Usually, the login hangs after accepting the name and password. Sometimes you can log in successfully, but then trying to view the mounted partition (ls or df) results in the command hanging (and the load on the machine steadily rises into the hundreds).
Expected results:
You should be able to log in every time and you should be able to view the mounted partition.
Additional info:
This problem is even worse if you are using autofs as well. Turning on nscd does not fix the problem.
Comment 1 Steve Dickson 2004-11-30 09:23:38 EST
Two things: 1) Is autofs in the picture? 2) Could you post an AltSysRq-t system trace by running "echo t > /proc/sysrq-trigger" and then using dmesg to capture the trace (i.e. dmesg > /tmp/systrace)?
Comment 2 Ed Friedman 2004-12-06 16:39:04 EST
1) Autofs is out of the picture.
2) There was no way to generate an AltSysRq-t system trace, because when the system hangs it is impossible to log in. I tried leaving open sessions on the console and via ssh, but was unable to use them after the system hung. I did note that when I rebooted via CTRL-ALT-DEL, the message for unmounting NFS said "failed" twice in a row.
I did test one system with no crontab writing to the NFS partition and an identical system with a crontab that wrote to it every 5 minutes. The one with no crontab did not crash, but the other one did.
Here is the relevant entry from /etc/fstab:
    galton:/var/spool/mail /var/spool/mail nfs rw,bg,actimeo=0 0 0
And here is the relevant entry from the crontab:
    0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/w > /var/spool/mail/rwh/raj
Comment 3 Steve Dickson 2004-12-06 18:12:21 EST
Before the system hangs, turn on AltSysRq processing with "echo 1 > /proc/sys/kernel/sysrq". Then, when the system hangs, type AltSysRq-t on the console keyboard and a trace should appear.... Reboot and the trace *should be* in /var/log/messages....
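The two requests above (enable SysRq ahead of time, then dump a task trace) can be sketched as a pair of shell functions. This is only an illustration: the function names are made up here, the paths are parameters so the functions can be dry-run against ordinary files, and the real /proc defaults require root. If the machine is already hung, only the keyboard combination on the console will help.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the bug report).

# Enable SysRq handling before the hang occurs.
# Equivalent to: echo 1 > /proc/sys/kernel/sysrq
enable_sysrq() {
    echo 1 > "${1:-/proc/sys/kernel/sysrq}"
}

# While a shell is still usable, ask the kernel to log a task trace
# (same effect as AltSysRq-t) and save the kernel ring buffer to a file.
capture_task_trace() {
    out="${1:-/tmp/systrace}"
    trigger="${2:-/proc/sysrq-trigger}"
    echo t > "$trigger" && dmesg > "$out"
}

# Example, as root, before the hang is expected:
#   enable_sysrq
#   capture_task_trace /tmp/systrace
```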
Comment 4 Ed Friedman 2004-12-08 12:37:17 EST
Created attachment 108129: System trace created when machine hung
Comment 5 Ed Friedman 2004-12-28 14:48:21 EST
I did further tests to refine the problem.
1. Cron is not associated with this problem (I used a shell script in place of cron to write every 5 minutes and the machine still hung).
2. Writing to local disks does not have this problem (I used a cron job to write every 5 minutes to a file on a local partition).
3. Reading every 5 minutes from a file on an NFS-mounted partition also causes the machine to hang within 24 hours (I substituted a read for the write in my cron job accessing the NFS-mounted partition).
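The script used in these tests was not posted, so the following is only a guess at the general shape of such a probe. It reads or writes a file at a fixed interval and wraps each access in the coreutils `timeout` command (assumed to be installed) so a stalled mount gets logged. The function names and the example path are hypothetical. Note that on a hard NFS mount a process blocked in uninterruptible sleep may not respond to the signal, so the time limit is best-effort.

```shell
#!/bin/sh
# Hypothetical periodic access probe; not the reporter's actual script.

# probe_once MODE FILE [LIMIT_SECONDS]
# MODE is "write" or "read". Returns 0 on success, non-zero on failure
# or when the time limit expires, and 2 for an unknown mode.
probe_once() {
    mode="$1"; file="$2"; limit="${3:-30}"
    case "$mode" in
        # The redirect happens inside the child so the open() is timed too.
        write) timeout "$limit" sh -c 'date > "$1"' sh "$file" ;;
        read)  timeout "$limit" cat "$file" > /dev/null ;;
        *)     echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# probe_loop MODE FILE INTERVAL_SECONDS
# Repeats the probe forever and logs a timestamped line on each failure.
probe_loop() {
    while :; do
        probe_once "$1" "$2" || echo "$(date): $1 probe of $2 failed or timed out" >&2
        sleep "$3"
    done
}

# Example (hypothetical file name), every 5 minutes:
#   probe_loop write /var/spool/mail/probe.txt 300
```

Running one copy against the NFS mount and another against a local path would mirror the local-versus-NFS comparison in items 2 and 3.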
Comment 6 Steve Dickson 2005-01-03 11:22:18 EST
Looking at the system trace, it appears there are quite a few shells hung in getting permission bits from the server (in nfs3_proc_access() to be exact). If you remove the actimeo=0 mount option, does the hang still happen?
Comment 7 Ed Friedman 2005-01-05 13:21:32 EST
I removed the actimeo=0 mount option and the hang still happens, just as before.
Comment 8 Steve Dickson 2005-01-07 06:42:13 EST
Ok... I'm trying to reproduce this here, but looking at the system trace you posted, it appears the top half is missing. I'm trying to find the first sh process that hung, since the rest of the sh processes are just stuck behind that one.... /var/log/messages should have the complete trace. Also, could you please post an AltSysRq-m and a "cat /proc/slabinfo".... just to see how you're doing on memory consumption.
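One way to collect the memory information requested here is sketched below. It is a hedged illustration, not a procedure taken from this thread: the function name and output layout are invented, the paths are parameters so it can be exercised on ordinary files, and the real /proc defaults generally need root.

```shell
#!/bin/sh
# Hypothetical helper for saving a memory snapshot.

# snapshot_memory OUTDIR [SLABINFO_PATH] [SYSRQ_TRIGGER_PATH]
snapshot_memory() {
    outdir="$1"
    slab="${2:-/proc/slabinfo}"
    trigger="${3:-/proc/sysrq-trigger}"
    mkdir -p "$outdir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    # Save the slab allocator statistics under a timestamped name.
    cat "$slab" > "$outdir/slabinfo.$stamp" || return 1
    # 'm' asks the kernel to log memory usage (same effect as AltSysRq-m);
    # the output goes to the kernel log, not to this script.
    echo m > "$trigger"
}

# Example, as root: snapshot_memory /tmp/memsnap
```

Taking a snapshot at intervals before the hang would show whether any slab cache grows steadily.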
Comment 9 Ed Friedman 2005-01-10 14:25:31 EST
I'll try to generate another trace and send you the complete /var/log/messages file when I do. As an experiment, I tried writing to the nfs mounted partition every 10 minutes, instead of every 5 minutes and it never hung. Is it possible that there is a problem when a disk write is sent at the same instant that the computer is flushing its cache to an nfs mounted disk? If you want, I can try other times to see which intervals cause the machine to hang.
Comment 10 Steve Dickson 2005-01-11 07:16:22 EST
I'm not sure what's going on.... Over the weekend I was not able to reproduce this.... What OS is running on the server side? Linux, Solaris, NetApp?
Comment 11 Ed Friedman 2005-01-11 12:42:54 EST
The server is running Fedora 1. There is nothing fancy going on there, and the patches should be current.
Comment 12 Michael Tonn 2005-05-06 10:37:37 EDT
Created attachment 114082: netdump output
This is my netdump output.
Comment 13 Michael Tonn 2005-05-06 10:38:45 EDT
I am having the same problems with RedHat 3.0 connected to a NetApp filer. Any command that has any association with the NFS mount point will hang.
Comment 14 Ed Friedman 2005-08-12 14:17:38 EDT
Wow - I've finally discovered how to make NFS mounts stable. One of my users observed that older versions of RedHat and Fedora were using udp when doing NFS mounts, but the newer Fedora versions are using tcp. Since I have added the flags "notcp, udp" to my mount options, everything has been working perfectly.
Comment 15 Steve Dickson 2005-08-16 23:39:26 EDT
So basically you're saying the NFS server in your FC1 does not work with NFS mounts using TCP?
Comment 16 Ed Friedman 2005-08-18 14:45:04 EDT
Sorry for not making the fix more clear. Basically, the server works with both TCP and UDP, but TCP is the only one that occasionally hangs. I don't change any server settings, but on the client machines, add the flags "notcp,udp" to the NFS mount options in /etc/fstab and /etc/auto.master. This prohibits TCP mounting and forces UDP mounting. With these options in place, there have been no more crashes or hangups for a week now, even when I run programs that used to always cause the machine to hang within 24 hours.
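After adding the "notcp,udp" options and remounting, it can help to confirm which transport each NFS mount actually uses. The sketch below reads a mounts-format file (by default /proc/mounts) and reports the transport per NFS mount point. The function name is hypothetical, and the way the transport is spelled in the options field ("udp" versus "proto=udp") is assumed to vary by kernel, so both forms are matched.

```shell
#!/bin/sh
# Hypothetical helper: report the transport of each NFS mount.

# nfs_transport [MOUNTS_FILE]
# Prints "<mountpoint> <tcp|udp|unknown>" for each nfs/nfs4 entry.
nfs_transport() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "udp" || opts[i] == "proto=udp") proto = "udp"
            if (opts[i] == "tcp" || opts[i] == "proto=tcp") proto = "tcp"
        }
        print $2, proto
    }' "${1:-/proc/mounts}"
}

# Example: nfs_transport
```

A mount that still reports tcp after the change most likely was not remounted or is being mounted from a different configuration file.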
Comment 17 Matthew Miller 2006-07-10 17:19:46 EDT
Fedora Core 3 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thank you!
Comment 18 John Thacker 2006-10-31 10:53:36 EST
Closing per lack of response to previous request for information. This bug was originally filed against a much earlier version of Fedora Core, and significant changes have taken place since the last version for which this bug is confirmed. Note that FC3 and FC4 are supported by Fedora Legacy for security fixes only. Please install a still supported version and retest. If it still occurs on FC5 or FC6, please reopen and assign to the correct version. Otherwise, if this is a security issue, please change the product to Fedora Legacy. Thanks, and we are sorry that we did not get to this bug earlier.