Bug 38313
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | nfs client craps out if nfs read size is 16k or larger, double read requests for 16k & > | | |
| Product: | [Retired] Red Hat Linux | Reporter: | mwc |
| Component: | nfs-utils | Assignee: | Pete Zaitcev <zaitcev> |
| Status: | CLOSED WONTFIX | QA Contact: | David Lawrence <dkl> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | edoutreleau, mwc |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i586 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2004-04-19 19:30:39 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
mwc 2001-04-29 18:17:49 UTC

> nfs: task 354 can't get a request slot

---

This message is usually indicative of either network congestion or a buggy network card. Since it's reproducible, I suspect the latter. Can you tell us which network card is installed in your machine?
---

I've looked into the bug closer. It appears to be a problem with NFS reads of 16k or greater. I thought it might be an IP fragment reassembly problem, but both tcpdump on the Linux side and snoop on the Solaris side show that every fragment of the reply arrives:

Linux side:

```
17:44:45.230822 > merrimac-dsl.1469094410 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
17:44:45.930822 > merrimac-dsl.1469094410 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
17:44:46.290822 < arachnid.nfs > merrimac-dsl.1469094410: reply ok 1472 (frag 31654:1480@0+)
17:44:46.290822 < arachnid > merrimac-dsl: (frag 31654:1480@1480+)
17:44:46.290822 < arachnid > merrimac-dsl: (frag 31654:1480@2960+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@4440+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@5920+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@7400+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@8880+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@10360+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@11840+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@13320+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:1480@14800+)
17:44:46.300822 < arachnid > merrimac-dsl: (frag 31654:240@16280)
17:44:46.310822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
17:44:47.010822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
17:44:47.230822 < arachnid.nfs > merrimac-dsl.1469094410: reply ok 1472 (frag 31655:1480@0+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@1480+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@2960+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@4440+)
17:44:47.230822 < arachnid > merrimac-dsl: (frag 31655:1480@5920+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@7400+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@8880+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@10360+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@11840+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@13320+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:1480@14800+)
17:44:47.240822 < arachnid > merrimac-dsl: (frag 31655:240@16280)
17:44:48.410822 > merrimac-dsl.1485871626 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
```

Solaris side:

```
80 0.04848 dsl-195-133 -> arachnid NFS C READ3 FH=7299 at 0 for 16384
81 0.01725 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=0 MF=1
82 0.00005 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=1480 MF=1
83 0.00005 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=2960 MF=1
84 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=4440 MF=1
85 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=5920 MF=1
86 0.00005 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=7400 MF=1
87 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=8880 MF=1
88 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=10360 MF=1
89 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=11840 MF=1
90 0.00004 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=13320 MF=1
91 0.00028 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=14800 MF=1
92 0.00022 arachnid -> dsl-195-133 UDP IP fragment ID=39999 Offset=16280 MF=0
97 0.06899 dsl-195-133 -> arachnid NFS C READ3 FH=7299 at 0 for 16384 (retransmit)
```

Reducing the NFS read size is a workaround. There is some strangeness with NFS though: if the NFS read size is 16k or over, the NFS client sends *two* read requests to the server. That ain't good.
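The fragment offsets in the traces follow directly from a 1500-byte Ethernet MTU: each fragment carries 1480 bytes of IP payload, so a 16384-byte READ reply plus roughly 136 bytes of UDP/RPC/NFS headers (the overhead is inferred from the trace arithmetic, 16520 - 16384, not measured) splits into exactly the twelve fragments seen above. A minimal sketch of that arithmetic:

```python
def fragment_offsets(datagram_len, mtu=1500, ip_header=20):
    """Split an IP datagram's payload into (offset, length) fragments.

    Non-final fragment payloads must be a multiple of 8 bytes; 1480
    bytes is what fits in a 1500-byte MTU after the 20-byte IP header.
    """
    per_frag = (mtu - ip_header) // 8 * 8  # 1480 on Ethernet
    frags = []
    off = 0
    while off < datagram_len:
        length = min(per_frag, datagram_len - off)
        frags.append((off, length))
        off += length
    return frags

# 16384-byte NFS READ reply + ~136 bytes of headers = 16520 bytes:
# twelve fragments, the last one 240 bytes at offset 16280, as traced.
frags_16k = fragment_offsets(16520)
```

The same arithmetic reproduces the 8k trace below: 8192 + 136 = 8328 bytes yields six fragments ending with 928 bytes at offset 7400.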
;^) Here's 8k reads:

```
18:29:34.000822 > merrimac-dsl.2879035914 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
18:29:34.260822 < arachnid.nfs > merrimac-dsl.2879035914: reply ok 1472 read (frag 41690:1480@0+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@1480+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@2960+)
18:29:34.260822 < arachnid > merrimac-dsl: (frag 41690:1480@4440+)
18:29:34.270822 < arachnid > merrimac-dsl: (frag 41690:1480@5920+)
18:29:34.270822 < arachnid > merrimac-dsl: (frag 41690:928@7400)
18:29:34.270822 > merrimac-dsl.2895813130 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
18:29:34.740822 < arachnid.nfs > merrimac-dsl.2879035914: reply ok 1472 read (frag 41691:1480@0+)
```

16k reads:

```
18:31:42.930822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
18:31:43.630822 > merrimac-dsl.3449461258 > arachnid.nfs: 124 read fh 0,32/65536 0 [|nfs] (DF)
18:31:43.860822 < arachnid.nfs > merrimac-dsl.3432684042: reply ok 1472 read (frag 41741:1480@0+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@1480+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@2960+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@4440+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@5920+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@7400+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@8880+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@10360+)
18:31:43.860822 < arachnid > merrimac-dsl: (frag 41741:1480@11840+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:1480@13320+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:1480@14800+)
18:31:43.870822 < arachnid > merrimac-dsl: (frag 41741:240@16280)
18:31:44.790822 < arachnid.nfs > merrimac-dsl.3449461258: reply ok 1472 read (frag 41742:1480@0+)
```

---

I suggest updating to 7.3 or 8.0; at least they have a semi-decent NFS client by default.
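The workaround the reporter describes, reducing the transfer size, is set at mount time with the `rsize`/`wsize` options (a sketch; the export path and mount point here are hypothetical, and NFS over UDP was the default on these kernels):

```shell
# Cap the NFS read/write transfer size below the failing threshold.
# 2048 is the value the reporter found reliable; 8192 was the default.
mount -t nfs -o rsize=2048,wsize=2048 arachnid:/export /mnt/nfs
```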
---

If on 7.3, immediately update the kernel to 2.4.18-10. I do not think anyone in the world tested r/w sizes of more than 8K (I assume we are talking about mount options here, not application I/O size).

---

Problem still exists in 7.2. The read size I see the problem with *is* 8k (the default, and the max for NFS over UDP). The 16k value I mention in one case refers to the number of bytes successfully transferred before the NFS client goes AWOL. The client only works reliably when I set rsize and wsize to 2k.

Setting the rsize or wsize to 16k on an NFS/UDP mount should be rejected by mount_nfs as illegal. The fact that it doesn't is a bug. I'll have the Sun Linux people see if they can reproduce the problem.
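The validation the last comment asks for could look like the sketch below. The 8192-byte ceiling is the NFS-over-UDP maximum the reporter cites; `validate_transfer_size` is a hypothetical helper for illustration, not actual mount_nfs code.

```python
# Maximum transfer size for NFS over UDP, per the commenter's claim
# that 8k is the ceiling (and 16k should be rejected as illegal).
NFS_UDP_MAX_XFER = 8192

def validate_transfer_size(size, proto="udp"):
    """Validate an rsize/wsize mount option, rejecting illegal values
    instead of silently accepting them as the old mount code did."""
    if size <= 0:
        raise ValueError("rsize/wsize must be positive")
    if proto == "udp" and size > NFS_UDP_MAX_XFER:
        raise ValueError(
            f"rsize/wsize {size} exceeds {NFS_UDP_MAX_XFER} for NFS over UDP")
    return size
```

With this check in place, `rsize=16384` on a UDP mount fails loudly at mount time rather than producing the duplicate-read behavior seen in the traces.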