Bug 132292

Summary: Linux rpc issues
Product: Red Hat Enterprise Linux 3 Reporter: Larry Troan <ltroan>
Component: nfs-utils Assignee: Jeff Layton <jlayton>
Status: CLOSED CURRENTRELEASE QA Contact: Ben Levenson <benl>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: ichute, staubach, steved, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: nfs-utils-1.0.6-33EL Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-07-17 11:07:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
tcp dump                 none
sysreport                none
e1000 driver changelog   none

Description Larry Troan 2004-09-10 17:44:41 UTC
Description of problem:
We continue to experience nfs issues on our Linux clients when talking
to both Solaris and NetApp servers, primarily with Gigabit ethernet
though also with 100Mbps.


Version-Release number of selected component (if applicable):
RHEL3 U3 WS 

Additional info:

For reference: we have tried
- updating the driver/kernel as suggested (now running with 2.4.21-20)
with some improvement but without total success.
- replacing the onboard NIC with a pci-x version.
- hard coding switch and client to 1000 Full duplex, have flow control
enabled.
- tuning nfs mount options
- downgrading to 100Mbps (reduced but did not eliminate the problem)
- had network team (STO) validate network was sound.

Comment 1 Larry Troan 2004-09-10 17:46:38 UTC
From Leigh....
All,
Problems still persist with our Linux clients and NFS. Here is a
tcpdump taken during an occurrence of our network issue.
Note that we have the latest kernel applied, tried a separate NIC,
and also had the network thoroughly tested. Any suggestions welcome.

Regards
Leigh
> > From:       Glenney, Susan SM SITI-ITISERPA  
> > Subject:    RE: Network / Linux issues
> >
> > Here is the dump file; there was one rpc timeout during the
> > dump.  NFS server was brclib.shell.com (138.54.36.127)
> >
> > <<tcphoubtc97.gz>>
> > Thanks,
> > Susan
> >

Comment 2 Larry Troan 2004-09-10 17:48:52 UTC
Jay,
We are currently testing RHEL3. We are having major issues with rpc
timeouts for nfs mounts between Linux clients and Solaris nfs servers
(and to a smaller extent NetApp filers).

Are you aware of any interoperability issues with nfs?

Regards
Leigh

Comment 3 Larry Troan 2004-09-10 17:54:25 UTC
Per Tuesday's (9/08) call:

> RPC Issues
> ------------------
> Larry: Check with Intel (e1000) driver maintainer (RH) on status of   
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version -
> tomorrow.
Have asked Engineer/maintainer JG for comments.  
>
> Larry: Provide list of updates to Xander/Leigh from the RHEL3 U3 
> release for download that affect rpc/nfs/network (e.g. nfs-utils) -
> today.
nfs-utils went from 1.0.6-8EL in Update2 to 1.0.6-31EL in Update3.
redhat-config-nfs-1.0.13-1 is unchanged (and should have nothing to
do with your failure).
>
> Larry/Jay/Chris/Keith/Alan: Examine traces supplied yesterday and 
> feedback - tomorrow.
See below...
>
> Leigh: Provide RH with sysreport - today.
Done
>
> Keith: Provide download link for latest 6105 BIOS. Check if this 
> defaults to enable XTPR. U3 kernel should have corresponding 
> enablement.
Working this.
>
> Chris:  Advise of additional diagnostics/debug/data collection 
> required to lead to resolving this problem should current 
> suggestions fail.
Working this.

Comment 4 Larry Troan 2004-09-10 17:56:09 UTC
From Red Hat:
Leigh,

Can you get some output from the following:
nfsstat -rc    #this prints only client side RPC statistics

Also, the sysreport contained 2 days' worth of /var/log/messages but had
no RPC or NFS related information. Is this typical for your problem or
has this box not seen the problem since syslog was restarted? Can you
send me the messages.1 file, i.e. the file prior to Sept. 5?

Thanks,

Chris W.

----------------------------------------------
Additionally, it would probably be worthwhile to drop a different NIC
into the 8200 that uses a tg3 driver. That would go a long way in
determining if this problem is e1000 related.

If you have any further tcpdumps that would be helpful, please send
them to us. One of our engineers looked over the tcpdump provided and
did not find any problems in it.

Thanks,

Chris
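
The client-side RPC counters requested above can also be read
programmatically. What follows is only a minimal sketch, assuming a
current Python 3 interpreter (not the one shipped with RHEL3) and
assuming the input has the layout of /proc/net/rpc/nfs, where the line
starting with "rpc" carries the calls, retransmissions and auth-refresh
counters in that order. The function names are hypothetical and not part
of nfs-utils.

```python
def parse_client_rpc_stats(text):
    """Return (calls, retrans, authrefresh) from /proc/net/rpc/nfs-style text.

    Assumption: the counters sit on a line of the form
    "rpc <calls> <retrans> <authrefresh>".
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "rpc":
            calls, retrans, authrefresh = (int(f) for f in fields[1:4])
            return calls, retrans, authrefresh
    raise ValueError("no 'rpc' line found in input")


def retrans_ratio(calls, retrans):
    """Fraction of RPC calls that had to be retransmitted (0.0 if no calls)."""
    return retrans / calls if calls else 0.0
```

Sampling the counters before and after a problem period and comparing
the two ratios may show whether retransmissions climb while the rpc
timeouts are being reported.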

Comment 5 Larry Troan 2004-09-10 17:57:45 UTC
From JG in RH Engineering

> Larry: Check with Intel (e1000) driver maintainer (RH) on status of 
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version
>
They are free to try 5.3.x level, that's the latest upstream version,
and will be appearing in RHEL3 U4.

e1000 changelog attached below (from JG)

Comment 6 Larry Troan 2004-09-10 18:00:24 UTC
Created attachment 103694 [details]
tcp dump

Comment 7 Larry Troan 2004-09-10 18:01:01 UTC
Created attachment 103695 [details]
sysreport

Comment 8 Larry Troan 2004-09-10 18:02:04 UTC
Created attachment 103696 [details]
e1000 driver changelog

Comment 9 Larry Troan 2004-09-10 18:03:20 UTC
From Jay:
 
Comments regarding the tcp dump:

OK, this thing is really, really weird.

Towards the end of the dump, there are 154 nfs packets in a row
(actually more, but I'll explain that shortly).  The really odd
thing about all of these packets is that all 154 carry the same
timestamp of 573.989570; that is, all 154 packets occurred in the
same 1-millionth of a second. That seems a bit strange to me.  They
are followed by another 30 nfs packets which again occur in the same
1-millionth of a second.  So that's 184 nfs packets in a row, and 180
of those are GETATTR packets.

Anyway, haven't really dug much more than that, but someone definitely
appears to be going nuts with that machine.
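
The check described above (runs of packets sharing one timestamp) can be
repeated on further dumps with a short script. This is a rough sketch
only, assuming the capture has been exported as text with one packet per
line and the timestamp as the first whitespace-separated field; that
layout and the function name timestamp_runs are assumptions, not
something taken from the attached dump.

```python
from itertools import groupby


def timestamp_runs(lines, min_run=2):
    """Yield (timestamp, count) for consecutive packets with identical timestamps.

    Assumption: each non-blank line is one packet and its first field is
    the timestamp, compared here as a plain string.
    """
    stamps = (line.split()[0] for line in lines if line.strip())
    for stamp, group in groupby(stamps):
        count = sum(1 for _ in group)
        if count >= min_run:
            yield stamp, count
```

Calling it with a larger min_run (for example 100) would report only the
long bursts, such as the 154-packet run noted in this comment.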

Comment 11 Larry Troan 2004-09-10 18:21:54 UTC
Seems to be running better since upgrading to latest nfs-utils package
and U3 kernel. Waiting for feedback from customer.

Comment 12 Jeff Layton 2007-07-17 11:07:41 UTC
Last comment from 3 years ago states that this was better after nfs-utils was
updated. I'm going to close this as CURRENTRELEASE. Please reopen if that's not
the case.