
Bug 461497

Summary: Systems are printing 'RPC call_verify: retry failed, exit EIO' endlessly
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.5
Hardware: All
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: high
Reporter: Flavio Leitner <fleitner>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Martin Jenner <mjenner>
CC: dmair, tao
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2009-03-19 12:21:49 UTC
Attachments:
Description | Flags
systemtap script | none
zcarh0xk.pcap.bz2 | none
post.rpm.zcarh0xk.pcap.bz2 | none
screenshot of patched wireshark to parse ClearCase V4 | none

Comment 1 Flavio Leitner 2008-09-08 15:57:35 UTC
Created attachment 316094 [details]
systemtap script

Comment 5 Flavio Leitner 2008-10-03 17:28:04 UTC
Created attachment 319394 [details]
zcarh0xk.pcap.bz2

Comment 6 Flavio Leitner 2008-10-03 17:28:55 UTC
Created attachment 319395 [details]
post.rpm.zcarh0xk.pcap.bz2

Comment 7 Flavio Leitner 2008-10-03 17:50:28 UTC
Copying from IT regarding the two newly captured files.
----
Here are two more uploads for you:
1. zcarh0xk.pcap.bz2
2. post.rpm.zcarh0xk.pcap.bz2

I was able to log in as root on the console of node zcarh0xk and install wireshark. During the installation I noticed the storm stop. I decided to capture anyway, which gave me zcarh0xk.pcap.bz2.

A short while later I noticed the storm start up again on this node, so I logged in again and, because wireshark was already installed, simply took another capture called post.rpm.zcarh0xk.pcap.bz2.

rpm and up2date seem to stat all file systems, and I think that when they do, the NFS file systems are touched and the storm stops. Luckily, on this node it started again.

This continued trend of touching NFS somehow causing the storm to stop supports my belief that something is awry with NFS.
----

I have asked for lsof output to correlate UDP ports 58217 and 55096 at 47.129.232.199
and ports 50279 and 50102 at 47.129.231.140 with processes, but the output was taken
after the storm had stopped and didn't show those ports. Perhaps during the next event.
(The requested command is 'lsof -i4 -i6'.)
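
The port-to-process correlation described above could be scripted roughly as follows. This is a minimal sketch and not part of the original report: it assumes that 'lsof -i4 -i6' prints one socket per line with the protocol column (UDP) immediately before the host:port name column, and that ports are shown numerically. The helper name udp_port_owners and the sample output (procA, procB, PIDs) are hypothetical.

```python
import re

def udp_port_owners(lsof_output, ports):
    """Map each UDP port of interest to the (command, pid) pairs holding it."""
    owners = {}
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header and every line that is not a UDP socket.
        if "UDP" not in fields:
            continue
        idx = fields.index("UDP")
        if idx + 1 >= len(fields):
            continue
        # For connected sockets the name is "local->remote"; keep the local end.
        local = fields[idx + 1].split("->")[0]
        match = re.search(r":(\d+)$", local)
        if match is None:
            continue  # port shown as a service name, not a number
        port = int(match.group(1))
        if port in ports:
            owners.setdefault(port, []).append((fields[0], int(fields[1])))
    return owners

# Hypothetical lsof output, for illustration only.
sample = """COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
procA 2345 root 5u IPv4 8123 UDP *:58217
procB 2400 root 7u IPv4 8200 UDP 47.129.232.199:55096
"""
print(udp_port_owners(sample, {58217, 55096}))
```

Running lsof with -P and -n in addition keeps ports and hosts numeric, which is what the regular expression above relies on.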

This new traffic dump shows several UDP packets that wireshark does not recognize.
A ClearCase packet has the bytes 00 05 f5 70 and these packets have 00 05 f5 71,
so I suspect that these are ClearCase packets too.
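
The byte comparison above can be reproduced on raw UDP payloads. The following is a minimal sketch under one stated assumption: that the quoted bytes are the program field of an ONC RPC call header, which sits at offset 12 of the payload (after xid, message type and RPC version). The function name rpc_call_program and the constant names are hypothetical.

```python
import struct

# Assumption: these four bytes are the RPC program number field.
PROG_DECODED = 0x0005F570    # bytes 00 05 f5 70, decoded by wireshark as ClearCase
PROG_UNDECODED = 0x0005F571  # bytes 00 05 f5 71, seen in the unrecognized packets

def rpc_call_program(payload):
    """Return (program, version, procedure) for an ONC RPC call, else None."""
    if len(payload) < 24:
        return None
    # Six big-endian 32-bit words: xid, msg_type, rpcvers, prog, vers, proc.
    xid, msg_type, rpc_vers, prog, vers, proc = struct.unpack("!6I", payload[:24])
    # msg_type 0 is CALL; the RPC protocol version must be 2.
    if msg_type != 0 or rpc_vers != 2:
        return None
    return prog, vers, proc

def looks_like_unrecognized_clearcase(payload):
    """True when the payload is an RPC call to the 00 05 f5 71 program."""
    header = rpc_call_program(payload)
    return header is not None and header[0] == PROG_UNDECODED
```

Replies do not carry the program number, so a check like this only classifies the call direction; replies would have to be matched to calls by xid.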

I think these UDP packets contain an RPC packet that causes the
error. Perhaps it is related to some expiration event, which doing a stat()
on a filesystem somehow clears.

Flavio

Comment 8 Flavio Leitner 2008-10-03 19:54:21 UTC
Created attachment 319411 [details]
screenshot of patched wireshark to parse ClearCase V4.

I patched wireshark, changing 0x70 to 0x71, and could then decode the packets
as ClearCase V4 (wireshark only supports up to V3).

Unfortunately I couldn't find any procedure with a reply state other than 0,
so perhaps this review missed other types of UDP packets that
contain the error.
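
A scan for non-zero reply states could be automated along these lines. This is a minimal sketch that assumes the standard ONC RPC reply layout; since it is not clear whether "reply state" above means the reply_stat field or the accept_stat field, the sketch reports both. The function names are hypothetical.

```python
import struct

def rpc_reply_status(payload):
    """Return (reply_stat, detail) for an ONC RPC reply, or None if not a reply.

    detail is accept_stat for accepted replies and reject_stat for denied
    ones; it is None when the payload is too short to contain it.
    """
    if len(payload) < 12:
        return None
    xid, msg_type, reply_stat = struct.unpack("!3I", payload[:12])
    if msg_type != 1:  # 1 is REPLY
        return None
    if reply_stat == 0:  # MSG_ACCEPTED: verifier first, then accept_stat
        if len(payload) < 20:
            return (0, None)
        flavor, verf_len = struct.unpack("!2I", payload[12:20])
        offset = 20 + ((verf_len + 3) & ~3)  # verifier body is padded to 4 bytes
        if len(payload) < offset + 4:
            return (0, None)
        (accept_stat,) = struct.unpack("!I", payload[offset:offset + 4])
        return (0, accept_stat)
    # MSG_DENIED: the next word is reject_stat.
    if len(payload) < 16:
        return (reply_stat, None)
    (reject_stat,) = struct.unpack("!I", payload[12:16])
    return (reply_stat, reject_stat)

def is_error_reply(payload):
    """True for any reply that is not accepted with accept_stat 0 (SUCCESS)."""
    status = rpc_reply_status(payload)
    return status is not None and status != (0, 0)
```

Truncated replies are flagged as errors as well, which errs on the side of reviewing too many packets rather than too few.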

I recommend getting the lsof output during the storm and correlating the
traffic dump with processes. If they turn out to be ClearCase processes, then I can't
see what else we can do. Perhaps IBM support can help, because the error
is coming from the server and we can't determine the reason on the client side.

Attaching a screenshot of patched wireshark showing one packet as an example.

Flavio

Comment 12 RHEL Program Management 2009-03-12 18:58:06 UTC
Since RHEL 4.8 External Beta has begun, and this bugzilla remains 
unresolved, it has been rejected as it is not proposed as exception or 
blocker.