Bug 38313

Summary: nfs client craps out if nfs read size is 16k or larger, double read requests for 16k & >
Product: [Retired] Red Hat Linux
Reporter: mwc
Component: nfs-utils
Assignee: Pete Zaitcev <zaitcev>
Status: CLOSED WONTFIX
QA Contact: David Lawrence <dkl>
Severity: high
Priority: medium
Version: 7.1
CC: edoutreleau, mwc
Hardware: i586
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-04-19 19:30:39 UTC

Description mwc 2001-04-29 18:17:49 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i586)


Just updated my Red Hat 7.0 system to 7.1.  Now I can't use the NFS client
because whenever I copy a file from my server (happens to be Solaris), only
16k of the file is copied, and the process doing the copying hangs (it no
longer responds to signals).  No NFS traffic is seen between the client and
server. The system never recovers. From /var/log/messages:

Apr 23 18:30:39 merrimac-dsl automount[661]: attempting to mount entry /jurassic/home7
Apr 23 18:31:54 merrimac-dsl automount[1641]: expired /jurassic/home7
Apr 23 18:34:24 merrimac-dsl automount[661]: attempting to mount entry /jurassic/home7
Apr 23 18:37:49 merrimac-dsl kernel: nfs: server jurassic.eng not responding, still trying
Apr 23 18:46:35 merrimac-dsl su(pam_unix)[1437]: session closed for user kroot
Apr 23 18:47:21 merrimac-dsl su(pam_unix)[1756]: session opened for user kroot by mwc(uid=4868)
Apr 23 18:49:17 merrimac-dsl su(pam_unix)[1772]: session opened for user kroot by mwc(uid=4868)
Apr 23 18:50:20 merrimac-dsl kernel: nfs: task 354 can't get a request slot

Reproducible: Always
Steps to Reproduce:
1. Mount a Solaris filesystem via NFS.
2. Try to copy a file; the process hangs.
3. Telnet into the server and observe that the amount copied is 16k in size.

Actual Results:  See description. The NFS client is no longer usable (to
any host or fs).


Expected Results:  NFS client to work of course. I saw similar behavior
with one version of 6.x; but since I loaded 7.0, it worked great.  After
updating to 7.1, the problem is back, only worse (at least in 6.x the
client would recover).

Once this

Comment 1 Bob Matthews 2001-05-01 21:34:26 UTC
> nfs: task 354 can't get a request slot

This message is usually indicative of either network congestion or a buggy
network card.  Since it's reproducible, I suspect the latter?

Can you tell us which network card is installed in your machine?

Comment 2 mwc 2001-05-03 01:32:55 UTC
I've looked into the bug more closely. It appears to be a problem with NFS
reads of 16k or greater. I thought it might be an IP fragment reassembly
problem, but both tcpdump on the Linux side and snoop on the Solaris side show
the complete set of fragments for each reply:

linux:

17:44:45.230822 > merrimac-dsl.1469094410 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
17:44:45.930822 > merrimac-dsl.1469094410 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
17:44:46.290822 < arachnid.nfs > merrimac-dsl.1469094410: reply ok 1472 (frag 31654:1480@0+)
17:44:46.290822 < arachnid > merrimac-dsl: (frag 31654:1480@1480+)
17:44:46.290822 < arachnid > merrimac-dsl: (frag 31654:1480@2960+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@4440+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@5920+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@7400+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@8880+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@10360+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@11840+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@13320+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@14800+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:240@16280)
17:44:46.310822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
17:44:47.010822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
17:44:47.230822 < arachnid.nfs > merrimac-dsl.1469094410: reply ok 1472 (frag 31655:1480@0+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@1480+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@2960+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@4440+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@5920+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@7400+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@8880+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@10360+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@11840+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@13320+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@14800+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:240@16280)
17:44:48.410822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)

Solaris side:

 80   0.04848  dsl-195-133 -> arachnid     NFS C READ3 FH=7299 at 0 for 16384
 81   0.01725     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=0 MF=1
 82   0.00005     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=1480 MF=1
 83   0.00005     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=2960 MF=1
 84   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=4440 MF=1
 85   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=5920 MF=1
 86   0.00005     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=7400 MF=1
 87   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=8880 MF=1
 88   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=10360 MF=1
 89   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=11840 MF=1
 90   0.00004     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=13320 MF=1
 91   0.00028     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=14800 MF=1
 92   0.00022     arachnid -> dsl-195-133  UDP IP fragment ID=39999 Offset=16280 MF=0
 97   0.06899  dsl-195-133 -> arachnid     NFS C READ3 FH=7299 at 0 for 16384 (retransmit)
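
The fragment offsets in both captures can be cross-checked with a little arithmetic. The following is only a sketch, assuming a 1500-byte Ethernet MTU and a 20-byte IP header with no options (both consistent with the 1480-byte fragments in the traces, but not stated anywhere in this report). The function name `ip_fragments` is made up for illustration. The datagram size is taken from the last fragment seen (240 bytes at offset 16280), which gives 16520 bytes: the 16384 bytes requested plus 136 bytes of UDP and RPC/NFS reply headers.

```python
def ip_fragments(payload_len, mtu=1500, ip_header=20):
    """Return (offset, length, more_fragments) tuples for one IP datagram.

    Assumes an IP header without options; mtu and ip_header are guesses
    that happen to match the 1480-byte fragments in the traces.
    """
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag = (mtu - ip_header) // 8 * 8
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(per_frag, payload_len - offset)
        more = offset + length < payload_len
        frags.append((offset, length, more))
        offset += length
    return frags


# Size of the reply datagram, derived from the last fragment in the traces.
total = 16280 + 240
for offset, length, more in ip_fragments(total):
    # Same notation as tcpdump: length@offset, with "+" for "more fragments".
    print(f"{length}@{offset}{'+' if more else ''}")
```

If the assumptions hold, the printed list should line up with the twelve fragments that both tcpdump and snoop report, which supports the reading that the whole reply reaches the client's interface.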

Reducing the NFS read size is a workaround. There is some strangeness with NFS,
though: if the NFS read size is 16k or over, the NFS client sends *two* read
requests to the server. That ain't good. ;^)

Here's 8k reads:

18:29:34.000822 > merrimac-dsl.2879035914 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
18:29:34.260822 < arachnid.nfs > merrimac-dsl.2879035914: reply ok 1472 read (frag 41690:1480@0+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@1480+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@2960+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@4440+)
18:29:34.270822 < arachnid > merrimac-dsl: (frag 41690:1480@5920+)
18:29:34.270822 < arachnid > merrimac-dsl: (frag 41690:928@7400)
18:29:34.270822 > merrimac-dsl.2895813130 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
18:29:34.740822 < arachnid.nfs > merrimac-dsl.2879035914: reply ok 1472 read (frag 41691:1480@0+)

16k reads:

18:31:42.930822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
18:31:43.630822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)
18:31:43.860822 < arachnid.nfs > merrimac-dsl.3432684042: reply ok 1472 read (frag 41741:1480@0+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@1480+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@2960+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@4440+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@5920+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@7400+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@8880+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@10360+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@11840+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:1480@13320+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:1480@14800+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:240@16280)
18:31:44.790822 < arachnid.nfs > merrimac-dsl.3449461258: reply ok 1472 read (frag 41742:1480@0+)
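
One way to look at the doubled requests: in the 16k trace the two outgoing read lines carry the same number after the client hostname, which tcpdump prints as the RPC transaction id, and the snoop capture earlier in this comment labels the second READ3 as a retransmit. So the "two read requests" may be a resend of one request after a timeout rather than two separate reads. The sketch below is an illustration only, not part of any NFS tool: it groups outgoing read lines by that id and reports the gap between repeats. The function name `retransmit_gaps` and the regular expression are made up here and only cover the tcpdump line format shown in this report.

```python
import re
from collections import defaultdict

# Matches outgoing request lines such as:
#   18:31:42.930822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh ...
# (format assumed from the traces in this report only)
REQUEST = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) > (\S+)\.(\d+) > (\S+): \d+ read"
)


def retransmit_gaps(lines):
    """Map each id that was sent more than once to the gaps (seconds) between sends."""
    sent = defaultdict(list)
    for line in lines:
        m = REQUEST.match(line)
        if m is None:
            continue  # reply, fragment, or wrapped continuation line
        hours, minutes, seconds, _src, xid, _dst = m.groups()
        stamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        sent[int(xid)].append(stamp)
    return {
        xid: [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
        for xid, stamps in sent.items()
        if len(stamps) > 1
    }


# First lines of the 16k trace above.
trace = [
    "18:31:42.930822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)",
    "18:31:43.630822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/655360 [|nfs] (DF)",
    "18:31:43.860822 < arachnid.nfs > merrimac-dsl.3432684042: reply ok 1472 read (frag 41741:1480@0+)",
]
print(retransmit_gaps(trace))
```

Run against the 8k trace, the same function should report nothing, since each request id there appears only once.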





Comment 3 Pete Zaitcev 2002-10-09 00:19:32 UTC
I suggest updating to 7.3 or 8.0, at least they got a semi-decent
NFS client by default. If on 7.3, immediately update the kernel
to 2.4.18-10.

I do not think anyone in the world tested r/w sizes more than 8K
(I assume we are talking about mount options here, not application
I/O size).


Comment 4 Michael Carney 2002-10-09 02:30:06 UTC
Problem still exists in 7.2.

The read size I see the problem with *is* 8k (the default, and the max for NFS
over UDP). The 16k value I mention in one case refers to the number of bytes
successfully transferred before the NFS client goes AWOL. The client only
works reliably when I set rsize and wsize to 2k.

Setting the rsize or wsize to 16k on an NFS/UDP mount should be rejected by
mount_nfs as illegal. The fact that it isn't is a bug.

I'll have the sun linux people see if they can reproduce the problem.
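
For illustration, the kind of check asked for above could look something like the sketch below. It is not the code of mount_nfs or any real mount helper. The 8192-byte limit is simply the figure given in this comment, taken as an assumption and not verified against any implementation, and the function name `oversized_io_options` is hypothetical.

```python
def oversized_io_options(options, limit=8192):
    """Return the rsize/wsize settings in a mount option string that exceed limit.

    limit defaults to the 8k NFS-over-UDP ceiling claimed in this bug
    report; that value is an assumption, not a verified constant.
    """
    bad = []
    for opt in options.split(","):
        name, _, value = opt.strip().partition("=")
        if name in ("rsize", "wsize") and value.isdigit() and int(value) > limit:
            bad.append(f"{name}={value}")
    return bad


# A 16k read size combined with the 2k write size used as the workaround.
print(oversized_io_options("rw,udp,rsize=16384,wsize=2048"))
```

A mount helper built along these lines would refuse the mount, or clamp the value, whenever the returned list is non-empty.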