From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc2) Gecko/20020520 Debian/1.0rc2-3

Description of problem:
Using an NFS filesystem on Red Hat 7.3 with kernel 2.4.18-3smp or 2.4.18-4 hangs after a short while. The filesystem is exported from Solaris on SPARC.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
The fastest way to trigger the bug seems to be untarring a large file on the mounted NFS filesystem.

Actual Results:
All attempts to access the NFS filesystem hang completely. The following is logged in /var/log/messages:

kernel: nfs: task nnnn can't get a request slot

I don't get any "not responding, still trying" messages as far as I can see, however.

Expected Results:
The tar file should have been extracted.

Additional info:
Compiling kernel-source-2.4.18-4 with CONFIG_NFS_V3 and CONFIG_NFSD_V3 disabled (these are enabled by default in Red Hat 7.3) solved the problem:

CONFIG_NFS_FS=m
# CONFIG_NFS_V3 is not set
CONFIG_NFSD=m
# CONFIG_NFSD_V3 is not set
CONFIG_NCPFS_NFS_NS=y
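For anyone trying to confirm they are hitting the same hang, here is a minimal sketch that scans syslog lines for the message quoted in the report. The function name `find_slot_errors` is made up for illustration, and the pattern assumes the message appears exactly as quoted above with a numeric task ID; other kernels may word it differently.

```python
import re

# Pattern based on the message quoted in the report:
#   kernel: nfs: task nnnn can't get a request slot
# Assumption: "nnnn" is always a decimal task number.
SLOT_RE = re.compile(r"kernel: nfs: task (\d+) can't get a request slot")


def find_slot_errors(lines):
    """Return the task numbers from every matching log line, in order."""
    tasks = []
    for line in lines:
        match = SLOT_RE.search(line)
        if match:
            tasks.append(int(match.group(1)))
    return tasks
```

To use it, pass any iterable of log lines, for example an open file handle on /var/log/messages. An empty result only means the message was not logged; it does not prove the client is healthy.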
I have seen similar errors on a Red Hat 7.2 box when mounting directories from a Solaris 2.6 server. Note that the Solaris server has "nfssrv:nfs_portmon=1" set in /etc/system, which disallows NFS client connections from non-privileged ports (1024 and above). Apparently, the Red Hat 7.2 NFS client doesn't play by the rules when using NFSv3 over TCP, and this results in many requests being denied on the server. The problem does not occur with NFSv3 over UDP. I mention this because you might be experiencing a similar problem. It would be nice if this client behaviour were fixed in RH 7.2 too. -akop
I would like to add to this bug report. Here at Brookhaven National Laboratory, we are experiencing exactly the same problem. The only difference is that our NFS server runs Solaris 2.8. We are currently trying to upgrade on the order of 40 systems to Red Hat 7.3, but we will stick with 7.2 until this bug has been resolved. We do not want to build a special kernel for the 40 systems as suggested by noring.
I see the same problem on a RedHat 7.3 machine, mounting from EMC Celerra NFS appliances. This one hurts!
Red Hat 7.2 using NFSv3 to a Solaris 2.8 machine causes the issue for us. Using cd and ls on the network filesystem from the command line does NOT cause it, but quite a few applications do. For example, as soon as we use Nautilus to browse the filesystem, it hangs solid. Restarting autofs frees up the filesystem, but the original application (Nautilus in this case) stays locked solid until reboot! This is on multiple Dell systems, with Intel and 3Com network adapters, tested against Solaris 2.8 servers (but also seen with Solaris 2.6).
I, too, am experiencing this problem from a RH 7.3 server (2.4.18-18.7.xsmp) and two different RH 8.0 clients (2.4.18-18.8.0). This problem seems to have only shown up recently when I applied one of the RHN kernel updates to my 7.3 server (two updates ago). The problem seems to only show up after the client has been up for a while.
I'm experiencing the exact same conditions mentioned in this bug with a RH 8.0 NFS client talking to a SPARC Solaris 9 server. It seems to hang when doing a flush operation on the NFS client side. This is always the last thing I see on the Solaris side when it hangs (yoho=client, alyssa=server):

2462 0.00018 yoho -> alyssa UDP IP fragment ID=60482 Offset=0 MF=1 TOS=0x0 TTL=64
2463 0.00155 alyssa -> yoho RPC R XID=980343051 Success
2464 0.00038 yoho -> alyssa NFS C COMMIT3 FH=6FAE at 29458432 for 0
2465 0.01849 alyssa -> yoho NFS R COMMIT3 OK

On the client side, with sunrpc.nfs_debug set to 1 via sysctl, I see this in the log file:

Dec 18 14:44:25 yoho kernel: NFS: refresh_inode(b/4 ct=2 info=0x7)
Dec 18 14:44:26 yoho last message repeated 87 times
Dec 18 14:44:33 yoho kernel: nfs: write(//testfile(4), 8192@29368320)
Dec 18 14:44:33 yoho kernel: nfs: flush(b/4)

And this is where it hangs. I can reduce the hang to a simple I/O error for the application by mounting with soft,intr, but this only helps to the point that I don't need to reboot the client. The file operation still fails.

I found this bug (or a similar incident, anyway) in SunSolve as bugid 4764852. It mentions Red Hat incident 38313 and bugzilla 16232. However, I am unsure how to find these documents. The SunSolve bug also suggests that it may be a problem with the NIC driver. I completely disagree with this notion. There is nothing to indicate that anything is wrong with the driver for my NIC (3Com 3C905), and people with other NICs have complained about the same problem.

Lastly, the problem most definitely is not fixed in Red Hat Linux 8.0, since that is what I am using. I currently have a custom 2.4.19 kernel loaded and am still experiencing this. I downloaded this kernel to see whether I got different results from the 2.4.18-17.8.0 kernel I originally had the problem with. This bug is a real show-stopper.
The Component of this bug should be set to kernel, not autofs. The problem is with the NFS driver in the kernel. Is someone ever going to look at this?
I can confirm this on the latest kernels for RHAS 2.1AS and RH7.3. We had this happen occasionally (once a month or so), but since upgrading to the latest kernel, it is a showstopper. Configuration: the server I am talking to is a Sun8 box. When this happens, the network pipe fills 100% with retransmissions from the server to the client. Any resolution coming?
WORKAROUND: Add nfsvers=2 to the mount options. I want to point out this is not a real resolution and someone @ RedHat should look at this.
I haven't looked at this in a while. I don't think anyone's fixed this yet. However, from what I remember, adding nfsvers=2 to the mount options wasn't an effective workaround. I still saw this error occur with NFSv2. The workaround that I implemented was to add tcp to the mount options, forcing the client to use TCP instead of the default UDP. I haven't seen this problem come up with NFSv3 in the past year since I started using the tcp mount option. This seems to indicate that using UDP as the transport, for both NFSv2 and NFSv3, is the issue.
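To find which mounts would still go over UDP, something like the following sketch could be run against the contents of /etc/fstab. It assumes, as described in this thread, that the client defaults to UDP when no transport option is given (newer clients may default differently), and that the transport is selected with the `tcp`, `udp`, or `proto=` mount options. The function names are made up for illustration.

```python
def nfs_transport(options, default="udp"):
    """Return the transport selected by a comma-separated NFS option string.

    Assumption: with no explicit option the client falls back to `default`,
    which is UDP for the kernels discussed in this thread.
    """
    transport = default
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("tcp", "udp"):
            transport = opt
        elif opt.startswith("proto="):
            transport = opt.split("=", 1)[1]
    return transport


def udp_nfs_mounts(fstab_text):
    """Yield (device, mount_point) for fstab NFS entries that would use UDP."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "nfs":
            continue  # not an NFS entry
        if nfs_transport(fields[3]) == "udp":
            yield fields[0], fields[1]
```

Any entry this reports is a candidate for adding `tcp` to its options, which is the workaround described above rather than a fix for the UDP path.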
We had this problem for a long time, and we lost a lot of time and money trying to find a fix. We finally did fix it, and the fix is quite surprising: we removed the HP ProCurve 4000M switches and hooked everything up to Extreme Networks switches. Apparently, the HPs would lose packets from time to time, and NFS over UDP is not equipped to deal with that. TCP has a built-in mechanism for dealing with lost packets (which also makes it slower). NFSv2 did help a bit, but did not resolve it 100%. We experimented with all the other options as well (window sizes, etc.), but were not able to get a 100% fix until we changed the switches.
The TCP code has vastly improved in later kernels, so I'm going to assume we do better there. But if the network is dropping packets, there is only so much NFS can do.
This problem was never with TCP. The problem is UDP. As a matter of fact, the workaround for this problem is to force the client to use TCP as the transport. AFAIK, the UDP transport for NFS has not been fixed. Also, this bug has nothing to do with the network dropping packets. I encountered this problem on a private network with 3 systems on it. Please reopen this bug. I doubt it is fixed.
Ok... I did misunderstand this... sorry about that... Although there were also quite a few congestion control fixes that went into the 2.4.20ish kernel (which are in the FC1 kernel). I'll reopen this and put it into the NEEDINFO state... because I'm just not seeing this with later kernels...
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
Well, great way to "resolve" bug reports. Let me just say that I have confirmed this on RH AS & ES 2.1 as well as RH ES 3.0. If anybody had actually read the messages posted, he/she could have seen as much. You seemed eager to take money for support for your software, but I have yet to see a valid reason why I should renew my 10 RHES servers and not replace them with something else. You closing this case without ever resolving it makes me really, really pissed. Microsoft at least resolves its issues.
This baffles me as well. This bug was opened almost 4 years ago, and nothing was ever done about it. It's because of this bug that I would never consider using Linux as an NFS client, even if there's a workaround. And by the way, it *is* a workaround, not a fix. There's no reason why NFS over UDP shouldn't work in Linux. This is pretty fundamental stuff here. If this problem existed in Solaris, it would have been fixed in a matter of days. At the risk of sounding like a shameless plug for the company that I work for, my advice to Nedik is to use Solaris 10 x86 (the OS itself is free) on your NFS clients if you can. If not, you're probably better off running Windows with some add-on NFS client.