From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc1) Gecko/20020417

Description of problem:
I have found that, starting with Red Hat 7.1, I had to modify the autofs startup script to add nfsvers=2 to the list of default options.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Mount a disk using NFS 3
2. Do lots of reading
3. It will hang

Actual Results:  Mountpoint hangs

Expected Results:  Mountpoints should not hang unless the server is indefinitely broken

Additional info:
Using nfsvers=2
I don't know about reads, but I can vouch for NFS v3 having problems. I can do large reads without problems, but any large writes over a network with any kind of real latency and the NFS client goes berserk. It seems to get into a state where it does endless retries, and the delay between retries decreases as time progresses, sending a flood of NFS packets over the net. I was going across two local routers and managed to take down six or seven of our production subnets. One client was enough to flood a 100M pipe. Strangely enough, trying to reproduce the event across a local switch (with no latency) didn't work. Version 2 works fine. This problem did not exist in 7.2, so something got broken.

Here's the output from nfsstat -c for my last test (I had to stop any further testing; the network guys threatened bodily harm):

Client rpc stats:
calls      retrans    authrefrsh
11483      55336      0

Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 256    24% 0       0% 0       0% 254    24% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 539    51% 1       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 1509   14% 1       0% 1842   17% 6202   59% 0       0%
read       write      create     mkdir      symlink    mknod
0       0% 32      0% 0       0% 0       0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
0       0% 0       0% 0       0% 0       0% 809     7% 0       0%
fsstat     fsinfo     pathconf   commit
3       0% 2       0% 0       0% 2       0%

This after less than thirty seconds trying to copy a 16M file.
We recommend that you use UDP for now. NFS/TCP was just recently enabled at all in the upstream kernel source tree, and it is functional for some uses but clearly not for yours.
Is TCP the default? The tcpdump output that we were looking at showed UDP.
It does appear to be using UDP, not TCP, on my network as well, but the NFS version did seem to be 3. If NFS 3 support is still not mature, I'm not sure why it would have been made the new default. What's worse, the change was not documented, as far as I can tell; the nfs(5) man page still says...

    nfsvers=n      Use an alternate RPC version number to contact the
                   NFS daemon on the remote host. This option is useful
                   for hosts that can run multiple NFS servers. The
                   default value is version 2.
The default was version three in 7.2 also. The problem did not exist in 7.2 so something has changed. nfsvers=n is not what you think. I think it is for running multiple instances of an nfs server on the same box, but I'm not sure. The option that specifies the protocol version is not in the man page (or I missed it). It's just vers=n.
Well, nfsvers=2 and vers=2 appear to accomplish exactly the same thing, as far as the kernel is concerned. In either case, the option simply shows up as "v2" when you look at /proc/mounts, so I'd just as soon stick to the documented option (even if the docs are out of date). In any case, setting localoptions='nfsvers=2' in /etc/init.d/autofs did fix the problem for me. (Before, I was able to make things fail consistently by forcing a large core dump onto an NFS-mounted file system, and things now work as they should.) I think the nfs(5) man page is misleading with its multiple-instances suggestion. That would apply to mountprog=n and nfsprog=n, but mountvers=n and nfsvers=n are meant for specifying protocol versions. The mountd and nfsd daemons support multiple protocol versions, for backward compatibility, regardless of how many program instances you may be running.
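For anyone applying the same workaround, it amounts to a one-line change in the autofs initscript (a sketch assuming the stock Red Hat 7.3 script, where the localoptions variable is already defined; the exact path may vary on other releases):

```shell
# /etc/init.d/autofs -- force NFS version 2 on all automounted
# filesystems; 'localoptions' is the hook the stock initscript
# provides for extra mount options
localoptions='nfsvers=2'
```

Then restart the automounter (e.g. 'service autofs restart') so the new options take effect.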
I'll chime in that I'm seeing this problem as well, on an NIS homedir. If I don't change autofs to use nfsv2, the box gets in sorry shape very quickly. Lots of "nfs: task XXXXX can't get a request slot" errors, and X session trying to use the NFS/NIS homedir locks up hard.
Hello,

Please note that I confirm the behavior reported in this bug. However, I offer a different suggestion than working around the problem by setting nfsvers=2 in /etc/rc.d/init.d/autofs: instead, I set 'tcp,nfsvers=3'. This fixes/works around the problem for me. I am a bit confused, because I thought that NFS 3 was TCP only. Like a previous poster, I experienced UDP traffic from a malfunctioning Red Hat 7.3 client. In summary, NFS 3 works well for me as long as I specify tcp in my mount entries.

Regards, Joe
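For comparison, the equivalent static mount entry would look something like this (a sketch; the server name, export path, and mount point are placeholders):

```shell
# /etc/fstab -- NFSv3 over TCP instead of the default UDP transport;
# 'server:/export' and '/mnt/nfs' are placeholder names
server:/export  /mnt/nfs  nfs  tcp,nfsvers=3  0 0
```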
*** Bug 64984 has been marked as a duplicate of this bug. ***
*** Bug 65069 has been marked as a duplicate of this bug. ***
Several fixes to the NFS client are now added to the kernel and will show up in subsequent errata releases. Please reopen this bug if the errata kernels newer than 2.4.18-4 exhibit the same problem.
I get the same thing as above, only with the dmfe ethernet driver and a Solaris 7 NFS server. It really becomes evident when starting mozilla for the first time, if it has to convert the Netscape 4.x profile over. I tried the nfsvers=2 option; it didn't fix it. Setup: a fully updated Solaris 7 NFS server, and a freshly installed and updated RH 7.3 box with kernel-2.4.18-4 and automounted home dirs. Email me for more info if you need it. Thanks, Mitchell
I have very slow NFS writes with 2.4.18-5 (the one with the NFS fixes). Client is 7.3, server is Solaris. No problem with 7.2 clients. I tried putting localoptions='nfsvers=2', localoptions='rsize=8192,wsize=8192', and localoptions='rsize=8192,wsize=8192,vers=2' in /etc/init.d/autofs, with no success.
I, too, have experienced very slow writes (~50k/sec) with the new kernel. See my bug report #67199
After deciding that nfs was still a problem in 2.4.18-5, I went back to 2.4.18-4 and set rsize=8192,wsize=8192. This seems better. Times for an 8MB write:

    2.4.9-31    1.6
    2.4.18-4    2.5   (with rsize=8192,wsize=8192)
    2.4.18-5   25     (with rsize=8192,wsize=8192)

... the price of progress. Any magic options for 2.4.18-5?
Try mounting 'sync' vs 'async'. Do you see a difference? For me, async is great, sync = ~50k writes. Ick.
The mount options rsize=8192,wsize=8192,async do give reasonable write speeds with the 2.4.18-5 nfs client. (I put localoptions='rsize=8192,wsize=8192,async' in /etc/init.d/autofs). Thanks!
Solaris 7 NFS server and stock Red Hat GNU/Linux 7.3 NFS mount: copying files 200KB or larger on the NFS mount causes the copy process to slow to a virtual stop. The kernel is 2.4.18-3. Adding nfsvers=2 to /etc/fstab fixed the problem, as suggested by dlr in the initial post. Unless you want to use NFS version 3, nothing more needs to be done to fix this problem.
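The fix described above is a single addition to the options field of the fstab entry (a sketch; the host and paths are placeholders):

```shell
# /etc/fstab -- pin the mount to NFS version 2;
# 'sunhost:/export/home' and '/home/sun' are placeholder names
sunhost:/export/home  /home/sun  nfs  nfsvers=2  0 0
```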
Some of us _do_ need NFSv3 to work, because we need to use files >2GB. I have tried a succession of kernels and options in a quest for decent performance and still haven't arrived at a satisfactory solution. Stability is also critical; I'd rather they be a little slow than panic every so often. Has anyone tried a 2.4.19 kernel? Any relevant fixes in there? How about downgrading to 2.4.7 or so (7.2's release)?
This problem still exists in RHL 7.3 with kernel 2.4.18-10. The NFS server is RHL 7.3 with kernel 2.4.18-10, and the clients are also RHL 7.3 with kernel 2.4.18-10. When trying to write to the NFS-mounted directory, it hangs. But it works fine with RHL 7.2 clients. I have tried the following settings on the client side:

    -fstype=nfs,hard,intr,nodev,nosuid,quota,rsize=8192,wsize=8192 servername:/home/&
    -fstype=nfs,hard,intr,nodev,nosuid,quota,nfsvers=2,rsize=8192,wsize=8192 servername:/home/&

but it did not solve the problem. Any advice on how to resolve this? Thanks, Venkat
I also am seeing this problem.
Sorry I hit the button too fast on my last comment. I wanted to say that I've been working with Venkat (venkat) and just wanted to attach my email address to this bug so I could track progress that way.
I updated to a stock 2.4.{18|19} kernel and things are fine now. The problems I was seeing from mozilla were a combo of both nfs problems and a mozilla bug.
I traced my NFS writes being 10 times slower on my RH 7.3 machines than on my RH 7.1 machines down to using the option 'timeo=300'. I use this along with 'soft', even though I know all the docs say not to use 'soft'. As soon as I removed 'timeo=300' (but kept 'soft'), my performance was normal again. I don't understand how timeo should be making a difference, since according to the man page it only applies when the server is not responding.
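One contextual note, hedged against the nfs(5) man page: timeo is given in tenths of a second, so timeo=300 sets a 30-second initial retransmit timeout. The two variants compared above would look like this in fstab (a sketch; server and mount point are placeholder names):

```shell
# /etc/fstab -- 'server:/export' and '/mnt' are placeholder names
#
# ~10x slower writes on the 7.3 client:
#   server:/export  /mnt  nfs  soft,timeo=300  0 0
# normal write speed once timeo=300 was dropped:
server:/export  /mnt  nfs  soft  0 0
```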