Description of problem:
We continue to experience NFS issues on our Linux clients against both Solaris and NetApp servers, primarily over Gigabit Ethernet, though also at 100 Mbps.

Version-Release number of selected component (if applicable):
RHEL3 U3 WS

Additional info:
For reference, we have tried:
- updating the driver/kernel as suggested (now running 2.4.21-20), with some improvement but without total success
- replacing the onboard NIC with a PCI-X version
- hard-coding switch and client to 1000 Full duplex, with flow control enabled
- tuning NFS mount options
- downgrading to 100 Mbps, which reduced but did not eliminate the problem
- having the network team (STO) validate that the network was sound
From Leigh....

All,

Problems still persist with our Linux clients and NFS. Here is a tcpdump taken during an occurrence of our network issue. Note that we have the latest kernel applied, have tried a separate NIC, and have had the network thoroughly tested. Any suggestions welcome.

Regards
Leigh

> > From: Glenney, Susan SM SITI-ITISERPA
> > Subject: RE: Network / Linux issues
> >
> > Here is the dump file; there was one rpc timeout during the
> > dump. NFS server was brclib.shell.com (138.54.36.127)
> >
> > <<tcphoubtc97.gz>>
> > Thanks,
> > Susan
> >
Jay,

We are currently testing RHEL3. We are having major issues with RPC timeouts for NFS mounts between Linux clients and Solaris NFS servers (and, to a smaller extent, NetApp filers). Are you aware of any interoperability issues with NFS?

Regards
Leigh
Per Tuesday's (9/08) call:

> RPC Issues
> ------------------
> Larry: Check with Intel (e1000) driver maintainer (RH) on status of
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version -
> tomorrow.

Have asked Engineer/maintainer JG for comments.

>
> Larry: Provide list of updates to Xander/Leigh from the RHEL3 U3
> release for download that affect rpc/nfs/network (e.g. nfs-utils) -
> today.

nfs-utils went from 1.0.6-8EL in Update2 to 1.0.6-31EL in Update3. redhat-config-nfs-1.0.13-1 is unchanged (and should have nothing to do with your failure).

>
> Larry/Jay/Chris/Keith/Alan: Examine traces supplied yesterday and
> feedback - tomorrow.

See below...

>
> Leigh: Provide RH with sysreport - today.

Done

>
> Keith: Provide download link for latest 6105 BIOS. Check if this
> defaults to enable XTPR. U3 kernel should have corresponding
> enablement.

Working this.

>
> Chris: Advise of additional diagnostics/debug/data collection
> required to lead to resolving this problem should current
> suggestions fail.

Working this.
From Red Hat:

Leigh,

Can you get some output from the following:

nfsstat -rc #this prints only client side RPC statistics

Also, the sysreport contained 2 days' worth of /var/log/messages but had no RPC or NFS related information. Is this typical for your problem, or has this box not seen the problem since syslog was restarted? Can you send me the messages.1 file, i.e. the file prior to Sept. 5?

Thanks,
Chris W.

----------------------------------------------

Additionally, it would probably be worthwhile to drop a different NIC into the 8200 that uses a tg3 driver. That would go a long way in determining if this problem is e1000 related.

If you have any further tcpdumps that would be helpful, please send them to us. One of our engineers looked over the tcpdump provided and did not find any problems in it.

Thanks,
Chris
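For anyone collecting the requested nfsstat -rc output over several runs, the retransmission rate is usually the figure of interest. The sketch below is not from Red Hat; it assumes the common layout in which a header line beginning with "calls" is followed by one line of counters, and the numbers in SAMPLE are invented for illustration only.

```python
def parse_client_rpc_stats(text):
    """Parse 'nfsstat -rc' style text into a dict of counters.

    Assumes a header line starting with 'calls' followed by a line of
    integer values in the same column order (an assumption about the
    output layout, not a guaranteed format).
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines, lines[1:]):
        if header[0] == "calls":
            return dict(zip(header, map(int, values)))
    raise ValueError("no 'calls' header line found")


def retrans_ratio(stats):
    """Fraction of client RPC calls that were retransmitted."""
    calls = stats["calls"]
    return stats["retrans"] / calls if calls else 0.0


# Hypothetical sample text; these counters are made up.
SAMPLE = """\
Client rpc stats:
calls      retrans    authrefrsh
200000     5000       0
"""

if __name__ == "__main__":
    stats = parse_client_rpc_stats(SAMPLE)
    print(stats)
    print("retransmission ratio: %.3f" % retrans_ratio(stats))
```

Comparing the ratio before and after a change (driver, NIC, mount options) gives a rough way to tell whether the change helped.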
From JG in RH Engineering

> Larry: Check with Intel (e1000) driver maintainer (RH) on status of
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version
>

They are free to try the 5.3.x level; that's the latest upstream version, and it will be appearing in RHEL3 U4.

e1000 changelog attached below (from JG)
Created attachment 103694 [details] tcp dump
Created attachment 103695 [details] sysreport
Created attachment 103696 [details] e1000 driver changelog
From Jay:

Comments regarding the tcp dump:

OK, this thing is really, really weird. Towards the end of the dump, there are 154 NFS packets in a row (actually more, as explained shortly). The really odd thing about these packets is that all 154 carry the same timestamp of 573.989570; that is, all 154 packets occurred in the same 1-millionth of a second. That seems a bit strange to me. They are followed by another 30 NFS packets which again occur in the same 1-millionth of a second. So that's 184 NFS packets in a row, and 180 of those are GETATTR packets.

Anyway, I haven't really dug much further than that, but someone definitely appears to be going nuts with that machine.
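The kind of check Jay describes, looking for long runs of consecutive packets that share one timestamp, can be sketched roughly as follows. This is only an illustration, not the method actually used on the attached dump: it assumes the capture has already been reduced to (timestamp string, summary) pairs, and the SAMPLE data is synthetic. The first timestamp and the counts 154 and 30 come from the comment above; the other timestamps and the LOOKUP record are hypothetical.

```python
from itertools import groupby


def timestamp_bursts(records, min_size=2):
    """Return (timestamp, count) for each run of consecutive records
    that share the same timestamp string and has at least min_size
    members. Timestamps are compared as strings to avoid float
    rounding issues."""
    bursts = []
    for ts, group in groupby(records, key=lambda rec: rec[0]):
        count = sum(1 for _ in group)
        if count >= min_size:
            bursts.append((ts, count))
    return bursts


# Synthetic records for illustration; only 573.989570 and the run
# lengths 154 and 30 are taken from the comment in this report.
SAMPLE = (
    [("573.100000", "NFS LOOKUP")]
    + [("573.989570", "NFS GETATTR")] * 154
    + [("573.989600", "NFS GETATTR")] * 30
)

if __name__ == "__main__":
    for ts, count in timestamp_bursts(SAMPLE, min_size=10):
        print(ts, count)
```

Raising min_size filters out the small coincidental runs that are normal on a busy link and leaves only bursts of the size reported here.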
Seems to be running better since upgrading to latest nfs-utils package and U3 kernel. Waiting for feedback from customer.
Last comment from 3 years ago states that this was better after nfs-utils was updated. I'm going to close this as CURRENTRELEASE. Please reopen if that's not the case.