Bug 65069
| Summary: | Copy 300k data crashed NetApp | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | hjl | ||||
| Component: | kernel | Assignee: | Ben LaHaise <bcrl> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 7.3 | CC: | gedetil, gerry.morong, jenson, jortega, karlamrhein, kresa, quentin.fennessy, sasha, sysadmin | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | i386 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Last Closed: | 2002-05-28 21:47:37 UTC | Type: | --- | ||||
Description
hjl
2002-05-16 22:36:17 UTC
Created attachment 57648 (Kernel NFS debug message)
I would like to know what version of Data ONTAP your NetApp was running when it crashed (`sudo rsh NETAPP sysconfig`). Thanks, Quentin

Red Hat 7.3, NFS version 3, kernel 2.4.18-3 and/or 2.4.18-4, e100 driver. I am experiencing similar problems, except that our NetApps just come to their knees. Whenever we do any significant I/O via NFS to a NetApp or Solaris 2.6 system, the servers slow to a crawl because the 7.3 box just keeps hammering them. If I force the mount to NFS version 2, the problem goes away. Also note that the same client hardware running Red Hat 7.2 has no problems doing NFS version 3 to either the NetApp or Solaris 2.6.

We have the same problem when writing from any RH 7.3 client to any non-Linux NFS server (SPARC Solaris 8, NetApp F760 6.1.1R2, NetApp F840 6.2R1). The write process hangs, and after a while the NFS server starts to fail. There is no way to kill the write process on the client because it is in the "D" state, but after about 10 minutes it dies. The client starts writing again if I start tcpdump on it; if I stop tcpdump, the write process hangs again within a couple of seconds. When I restart tcpdump, writes resume and I can kill the process. This happens only on RH 7.3; RH 7.2, 7.1, 6.2 and Mandrake 8.x work without any problems.

`mount -o wsize=8192` fixed the problem. By default wsize=32768, as `grep nfs /proc/mounts` shows:

`vega:/vol/v0/home/user /home/user nfs rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=nfsserver 0 0`

We have had the same results here at Stanford. One only needs to force NFS v2 or drop rsize/wsize to 8K to solve it. By default, however, the client sets the buffers to 32K, which is too high a default. This throttles the NetApp in some as yet unknown way: the NetApp is unresponsive to all other requests, and the client doing the sustained write goes up to 100% utilization of its network interface.

More information to back up jlittle.edu: this obviously doesn't affect only NetApps.
All NFS servers I have tested this against (Solaris, IRIX) seem to be affected to some extent. We have put out a notice to our Stanford users warning of this issue, as it looks just like a DoS attack.

*** This bug has been marked as a duplicate of 64921 ***

This sounds remarkably like the problem of going from GigE to Fast Ethernet through a switch. The large 32K rsize and wsize packets get dropped for lack of buffer space when moving from Gigabit to Fast Ethernet, while the smaller rsize and wsize can squeeze through with only slightly decreased performance. This is a real pain, but could the root cause be that GigE is not using flow control properly?
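A back-of-the-envelope sketch of why the 8K workaround could help on a `udp` mount like the one quoted from `/proc/mounts`. It assumes (this is not established in the report) that each NFS WRITE travels as a single UDP datagram that is IP-fragmented at a 1500-byte MTU, so losing any one fragment, for example to a full switch buffer on a GigE to Fast Ethernet hop, discards the whole write. The helper names `mount_options`, `fragments_per_datagram` and `delivery_probability`, and the 1% loss rate, are hypothetical and chosen only for illustration; RPC/NFS header overhead is ignored.

```python
import math

# Illustrative assumptions, not measurements from this bug report:
#   - 1500-byte Ethernet MTU and a 20-byte IPv4 header with no options
#   - the 8-byte UDP header counts toward the fragmented data
#   - RPC/NFS WRITE header overhead is ignored, so counts are lower bounds
MTU = 1500
IPV4_HEADER = 20
UDP_HEADER = 8


def mount_options(proc_mounts_line: str) -> dict:
    """Split the options field (4th column) of a /proc/mounts line into a dict."""
    fields = proc_mounts_line.split()
    opts = {}
    for item in fields[3].split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True  # bare flags such as "udp" map to True
    return opts


def fragments_per_datagram(payload: int, mtu: int = MTU) -> int:
    """IPv4 fragments needed for one UDP datagram carrying `payload` bytes."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    data_per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((payload + UDP_HEADER) / data_per_fragment)


def delivery_probability(fragments: int, loss_rate: float) -> float:
    """Chance that every fragment arrives, assuming independent per-fragment loss.

    Losing any single fragment means the receiver discards the whole
    datagram, so the entire WRITE has to be retransmitted.
    """
    return (1.0 - loss_rate) ** fragments


if __name__ == "__main__":
    line = ("vega:/vol/v0/home/user /home/user nfs "
            "rw,v3,rsize=32768,wsize=32768,hard,udp,lock,addr=nfsserver 0 0")
    opts = mount_options(line)
    for wsize in (int(opts["wsize"]), 8192):
        n = fragments_per_datagram(wsize)
        p = delivery_probability(n, loss_rate=0.01)  # 1% loss: an arbitrary example
        print(f"wsize={wsize}: {n} fragments, {p:.1%} chance a WRITE arrives intact")
```

Under these assumptions a 32K write needs 23 fragments against 6 for an 8K write, so at 1% fragment loss roughly 79% versus 94% of writes arrive intact, and every failed 32K write costs 23 retransmitted fragments. That is consistent with the "hammering" and saturated client interface described above, but it is only a model; the actual root cause is tracked in bug 64921.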