Bug 132292 - Linux rpc issues
Summary: Linux rpc issues
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: nfs-utils
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-09-10 17:44 UTC by Larry Troan
Modified: 2016-04-18 09:45 UTC (History)

Fixed In Version: nfs-utils-1.0.6-33EL
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-17 11:07:41 UTC
Target Upstream Version:
Embargoed:


Attachments
tcp dump (635.02 KB, application/octet-stream)
2004-09-10 18:00 UTC, Larry Troan
sysreport (297.79 KB, application/octet-stream)
2004-09-10 18:01 UTC, Larry Troan
e1000 driver changelog (12.20 KB, text/plain)
2004-09-10 18:02 UTC, Larry Troan

Description Larry Troan 2004-09-10 17:44:41 UTC
Description of problem:
We continue to experience NFS issues on our Linux clients when mounting
from both Solaris and NetApp servers, primarily over Gigabit Ethernet
though also at 100 Mbps.


Version-Release number of selected component (if applicable):
RHEL3 U3 WS 

Additional info:

For reference, we have tried:
- updating the driver/kernel as suggested (now running 2.4.21-20),
  with some improvement but without total success
- replacing the onboard NIC with a PCI-X version
- hard-coding switch and client to 1000 Mbps full duplex, with flow
  control enabled
- tuning NFS mount options
- downgrading to 100 Mbps, which reduced but did not eliminate the
  problem
- having the network team (STO) validate that the network was sound

Comment 1 Larry Troan 2004-09-10 17:46:38 UTC
From Leigh....
All,
Problems still persist with our Linux clients and NFS. Here is a
tcpdump taken during an occurrence of our network issue.
Note that we have the latest kernel applied, tried a separate NIC,
and also had the network thoroughly tested. Any suggestions are
welcome.

Regards
Leigh
> > From:       Glenney, Susan SM SITI-ITISERPA  
> > Subject:    RE: Network / Linux issues
> >
> > Here is the dump file; there was one RPC timeout during the
> > dump.  The NFS server was brclib.shell.com (138.54.36.127)
> >
> > <<tcphoubtc97.gz>>
> > Thanks,
> > Susan
> >

Comment 2 Larry Troan 2004-09-10 17:48:52 UTC
Jay,
We are currently testing RHEL3. We are having major issues with RPC
timeouts for NFS mounts between Linux clients and Solaris NFS servers
(and to a lesser extent NetApp filers).

Are you aware of any interoperability issues with NFS?

Regards
Leigh

Comment 3 Larry Troan 2004-09-10 17:54:25 UTC
Per Tuesday's (9/08) call:

> RPC Issues
> ------------------
> Larry: Check with Intel (e1000) driver maintainer (RH) on status of   
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version -
> tomorrow.
Have asked Engineer/maintainer JG for comments.  
>
> Larry: Provide list of updates to Xander/Leigh from the RHEL3 U3 
> release for download that affect rpc/nfs/network (e.g. nfs-utils) - 
> today.
nfs-utils went from 1.0.6-8EL in Update2 to 1.0.6-31EL in Update3.
redhat-config-nfs-1.0.13-1 is unchanged (and should have nothing to
do with your failure).
>
> Larry/Jay/Chris/Keith/Alan: Examine traces supplied yesterday and 
> provide feedback - tomorrow.
See below...
>
> Leigh: Provide RH with sysreport - today.
Done
>
> Keith: Provide download link for latest 6105 BIOS. Check if this 
> defaults to enable XTPR. U3 kernel should have corresponding 
> enablement.
Working this.
>
> Chris:  Advise of additional diagnostics/debug/data collection 
> required to lead to resolving this problem should current 
> suggestions fail.
Working this.

Comment 4 Larry Troan 2004-09-10 17:56:09 UTC
From Red Hat:
Leigh,

Can you get some output from the following:
nfsstat -rc    #this prints only client side RPC statistics

Also, the sysreport contained 2 days' worth of /var/log/messages but had
no RPC or NFS related information. Is this typical for your problem, or
has this box not seen the problem since syslog was restarted? Can you
send me the messages.1 file (the file from before Sept. 5)?

Thanks,

Chris W.

----------------------------------------------
Additionally, it would probably be worthwhile to drop a different NIC
that uses the tg3 driver into the 8200. That would go a long way toward
determining whether this problem is e1000 related.

If you have any further tcpdumps that would be helpful, please send
them to us. One of our engineers looked over the tcpdump provided and
did not find any problems in it.

Thanks,

Chris
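
The client-side RPC counters requested earlier in this comment can be reduced to a single retransmission ratio, which makes it easier to compare a client before and after a driver or nfs-utils change. The following is a minimal sketch only. It assumes `nfsstat -rc` prints a header row beginning with `calls` followed by a row of integer counters; the helper names `parse_client_rpc` and `retrans_ratio` and the sample numbers are hypothetical and do not come from this bug.

```python
def parse_client_rpc(text):
    """Parse the client RPC counters from `nfsstat -rc` style output.

    Assumes a header row starting with 'calls' is immediately followed
    by a row of integer values (an assumption about the output layout).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        if header[0] == "calls":
            return dict(zip(header, (int(v) for v in values)))
    raise ValueError("no client rpc counters found")


def retrans_ratio(stats):
    """Fraction of RPC calls that had to be retransmitted."""
    if stats["calls"] == 0:
        return 0.0
    return stats["retrans"] / stats["calls"]


# Made-up sample output, for illustration only.
sample_output = """\
Client rpc stats:
calls      retrans    authrefrsh
20000      500        0
"""

stats = parse_client_rpc(sample_output)
ratio = retrans_ratio(stats)
print(f"calls={stats['calls']} retrans={stats['retrans']} ratio={ratio:.3%}")
```

A ratio that stays high after the network has been validated would point toward the client or server rather than the wire, but what counts as "high" depends on the site and is not stated in this report.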

Comment 5 Larry Troan 2004-09-10 17:57:45 UTC
From JG in RH Engineering

> Larry: Check with Intel (e1000) driver maintainer (RH) on status of 
> latest driver at Intel site (5.3.19).
> Whether there are any issues with updates since the 5.2.52 version
>
They are free to try 5.3.x level, that's the latest upstream version,
and will be appearing in RHEL3 U4.

e1000 changelog attached below (from JG)

Comment 6 Larry Troan 2004-09-10 18:00:24 UTC
Created attachment 103694 [details]
tcp dump

Comment 7 Larry Troan 2004-09-10 18:01:01 UTC
Created attachment 103695 [details]
sysreport

Comment 8 Larry Troan 2004-09-10 18:02:04 UTC
Created attachment 103696 [details]
e1000 driver changelog

Comment 9 Larry Troan 2004-09-10 18:03:20 UTC
From Jay:
 
Comments regarding the tcp dump:

OK, this thing is really, really weird.

Towards the end of the dump, there are 154 NFS packets in a row
(actually more, as explained shortly).  The really odd thing about
these packets is that all 154 carry the same timestamp of
573.989570; that is, all 154 packets occurred in the same millionth
of a second. That seems a bit strange to me.  They are followed by
another 30 NFS packets, which again occur in the same millionth of a
second.  So that's 184 NFS packets in a row, and 180 of those are
GETATTR packets.

Anyway, haven't really dug much more than that, but someone definitely
appears to be going nuts with that machine.
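
Bursts like the one described in this comment can be found mechanically by looking for runs of consecutive packets that share an identical timestamp. The sketch below is only an illustration: it assumes the per-packet timestamps have already been extracted from the capture as strings, the function name `timestamp_runs` is hypothetical, and the sample list is synthetic. Only the value 573.989570 and the counts 154 and 30 come from the report; the other timestamps are invented.

```python
from itertools import groupby


def timestamp_runs(timestamps, min_run=2):
    """Return (timestamp, count) for each run of consecutive identical timestamps.

    Timestamps are compared as strings so that no floating-point
    rounding can merge or split neighbouring values.
    """
    runs = []
    for ts, group in groupby(timestamps):
        count = sum(1 for _ in group)
        if count >= min_run:
            runs.append((ts, count))
    return runs


# Synthetic data shaped like the burst described above. The timestamp of
# the second run (573.989600) and the leading packet are made up.
sample = ["573.900000"] + ["573.989570"] * 154 + ["573.989600"] * 30

runs = timestamp_runs(sample)
for ts, count in runs:
    print(f"{count} packets share timestamp {ts}")
print("total packets in bursts:", sum(count for _, count in runs))
```

Identical capture timestamps on many packets can also come from how the capture host batches and stamps packets, so a run found this way is a pointer for further inspection rather than proof of a client fault.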

Comment 11 Larry Troan 2004-09-10 18:21:54 UTC
Seems to be running better since upgrading to latest nfs-utils package
and U3 kernel. Waiting for feedback from customer.

Comment 12 Jeff Layton 2007-07-17 11:07:41 UTC
Last comment from 3 years ago states that this was better after nfs-utils was
updated. I'm going to close this as CURRENTRELEASE. Please reopen if that's not
the case.


