Bug 461497
| Summary: | Systems are printing 'RPC call_verify: retry failed, exit EIO' endlessly | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Flavio Leitner <fleitner> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.5 | CC: | dmair, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-03-19 12:21:49 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 319394 [details] zcarh0xk.pcap.bz2
Created attachment 319395 [details] post.rpm.zcarh0xk.pcap.bz2
Copying from IT regarding the 2 new files captured.
----
Here are two more uploads for you:
1. zcarh0xk.pcap.bz2
2. post.rpm.zcarh0xk.pcap.bz2
I was able to log in on the node zcarh0xk at the console as root and install wireshark. During the installation I noticed the storm stop. I decided to collect anyhow, and so I got zcarh0xk.pcap.bz2. A short while later I noticed the storm start up again on this node, so I logged in again and, because wireshark was already installed, I simply got another collection called post.rpm.zcarh0xk.pcap.bz2.
rpm and up2date seem to do a stat on all file systems, and I think that when they do that, NFS file systems are touched and the storm stops. Luckily with this node it started again. This continued trend of touching NFS somehow causing the storm to stop supports my belief that something is awry with NFS.
----
I have asked for an lsof to correlate UDP ports 58217, 55096 at 47.129.232.199 and ports 50279, 50102 at IP 47.129.231.140 with processes, but after the storm stopped the output didn't show those ports. Perhaps during the next event. (The requested command is 'lsof -i4 -i6'.)
This new traffic dump shows several UDP packets not recognized by wireshark, but a ClearCase one has 00 05 f5 70 and these packets have 00 05 f5 71, so I suspect that these are ClearCase packets too. I think these UDP packets contain an RPC packet inside that causes the error. Perhaps it is something related to an expiration event, which doing a stat() on a filesystem somehow fixes.
Flavio
Created attachment 319411 [details]
screenshot of patched wireshark to parse ClearCase V4.
I patched wireshark, changing 0x70 to 0x71, and could then decode the packets
as ClearCase V4 (wireshark only supports up to V3).
Unfortunately I couldn't see any procedure with a reply state different from 0,
so perhaps there are other types of UDP packets missed in this review that
contain the error.
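For reference, a minimal sketch of how the fixed part of an ONC RPC message carried in a UDP payload can be decoded by hand, following the header layout in RFC 5531. It assumes the bytes 00 05 f5 70 / 00 05 f5 71 seen in the dump sit in the RPC program number field. The function name `parse_rpc_header` and the sample payloads are hypothetical, built for illustration only, and are not taken from the captures.

```python
import struct

# Constants from RFC 5531 (ONC RPC version 2).
CALL, REPLY = 0, 1
MSG_ACCEPTED, MSG_DENIED = 0, 1


def parse_rpc_header(payload: bytes) -> dict:
    """Decode the fixed header fields of an ONC RPC message (hypothetical helper)."""
    # Every RPC message starts with a 32-bit XID and a 32-bit message type,
    # both big-endian (XDR encoding).
    xid, msg_type = struct.unpack_from(">II", payload, 0)
    if msg_type == CALL:
        # Call body: RPC version, program, program version, procedure.
        rpcvers, prog, vers, proc = struct.unpack_from(">IIII", payload, 8)
        return {"xid": xid, "type": "call", "rpcvers": rpcvers,
                "prog": prog, "vers": vers, "proc": proc}
    if msg_type == REPLY:
        (reply_stat,) = struct.unpack_from(">I", payload, 8)
        info = {"xid": xid, "type": "reply", "reply_stat": reply_stat}
        if reply_stat == MSG_ACCEPTED:
            # Accepted reply: verifier (flavor, length, padded body), then accept_stat.
            _flavor, verf_len = struct.unpack_from(">II", payload, 12)
            offset = 20 + ((verf_len + 3) & ~3)  # opaque data is padded to 4 bytes
            (info["accept_stat"],) = struct.unpack_from(">I", payload, offset)
        return info
    raise ValueError(f"not an RPC call or reply (msg_type={msg_type})")


# Illustrative payloads only (not from the attached pcap files).
sample_call = struct.pack(">IIIIII", 0x1234, CALL, 2, 0x0005F571, 4, 1)
sample_reply = struct.pack(">IIIIII", 0x1234, REPLY, MSG_ACCEPTED, 0, 0, 0)

call_info = parse_rpc_header(sample_call)
reply_info = parse_rpc_header(sample_reply)
print(hex(call_info["prog"]), reply_info["reply_stat"], reply_info["accept_stat"])
```

Running something like this over each UDP payload in the dump would flag replies whose `reply_stat` or `accept_stat` is non-zero, which is the kind of reply that was not found in the packets reviewed so far.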
I would recommend getting the output of lsof during the storm and correlating
the traffic dump with processes. If it turns out to be ClearCase processes then
I can't see what else we can do; perhaps IBM support can help with this, because
the error is coming from the server and we don't know the reason on the client side.
Attaching a screenshot of patched wireshark showing one packet as an example.
Flavio
Since RHEL 4.8 External Beta has begun, and this bugzilla remains unresolved, it has been rejected as it is not proposed as exception or blocker.
Created attachment 316094 [details] systemtap script