Bug 170313 - rpc.statd timing out, nsm_mon_unmon: rpc failed
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: nfs-utils
Version: 3.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Ben Levenson
Reported: 2005-10-10 13:47 EDT by Johan van den Dorpe
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2007-10-19 14:53:24 EDT

Description Johan van den Dorpe 2005-10-10 13:47:39 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050909 Fedora/1.0.6-1.2.fc3 Firefox/1.0.6

Description of problem:
Hi

I've been having problems very similar to bug 140385 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140385

Occasionally, we're seeing these messages in /var/log/messages:

Oct 10 17:52:28 hs-wells kernel: statd: server localhost not responding, timed out
Oct 10 17:52:28 hs-wells kernel: lockd: cannot unmonitor 172.17.110.75
Oct 10 17:53:52 hs-wells rpc.statd[760]: Received erroneous SM_UNMON request from hs-wells for 172.17.110.75
Oct 10 17:53:52 hs-wells last message repeated 4 times
Oct 10 17:53:53 hs-wells rpc.statd[760]: Received erroneous SM_UNMON request from hs-wells for 172.18.110.19
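
To compare how often this happens over time, messages like the ones above can be counted per incident. The following is a minimal sketch, not an official tool: the function name `summarize` and the regular expressions are assumptions derived only from the log lines quoted in this report, and it assumes a "last message repeated N times" line refers to the line directly before it.

```python
import re
from collections import Counter

# Sample lines copied from the report above.
SAMPLE = """\
Oct 10 17:52:28 hs-wells kernel: statd: server localhost not responding, timed out
Oct 10 17:52:28 hs-wells kernel: lockd: cannot unmonitor 172.17.110.75
Oct 10 17:53:52 hs-wells rpc.statd[760]: Received erroneous SM_UNMON request from hs-wells for 172.17.110.75
Oct 10 17:53:52 hs-wells last message repeated 4 times
Oct 10 17:53:53 hs-wells rpc.statd[760]: Received erroneous SM_UNMON request from hs-wells for 172.18.110.19
"""

# Patterns are guesses based on the quoted messages only.
TIMEOUT_RE = re.compile(r"statd: server \S+ not responding, timed out")
UNMON_RE = re.compile(r"Received erroneous SM_UNMON request from \S+ for (\S+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def summarize(lines):
    """Return (statd timeout count, Counter of SM_UNMON errors per peer address)."""
    timeouts = 0
    unmon = Counter()
    last_peer = None  # peer of the previous line, if it was an SM_UNMON error
    for line in lines:
        match = UNMON_RE.search(line)
        if match:
            last_peer = match.group(1)
            unmon[last_peer] += 1
            continue
        match = REPEAT_RE.search(line)
        if match and last_peer is not None:
            # syslog collapses identical consecutive messages into this line
            unmon[last_peer] += int(match.group(1))
            continue
        if TIMEOUT_RE.search(line):
            timeouts += 1
        last_peer = None
    return timeouts, unmon


if __name__ == "__main__":
    timeouts, unmon = summarize(SAMPLE.splitlines())
    print("statd timeouts:", timeouts)
    for peer, count in sorted(unmon.items()):
        print(peer, count)
```

On a live server the same function could be fed the lines of /var/log/messages instead of the sample text.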

And at the same time, these errors in dmesg:

statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
lockd: cannot unmonitor 172.17.110.75
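
The `status=-5` value looks like a negated errno, which is the usual Linux kernel convention; errno 5 is EIO. That reading is an assumption, since nothing in this report confirms how the kernel derives the value. A small sketch for decoding such values (the helper name `decode_kernel_status` is made up for illustration):

```python
import errno
import os


def decode_kernel_status(status):
    """Map a negative kernel-style status to (errno name, message).

    Assumes the status is a negated errno value, as is conventional
    for Linux kernel return codes.
    """
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = decode_kernel_status(-5)
    print(name, "-", message)
```

If the assumption holds, the lockd-to-statd RPC call failed with a generic I/O error, which would fit the "not responding, timed out" message logged just before it.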

The last time I saw this happen, I also noticed a high load average, roughly equal to the number of running nfs threads (256).
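
A load average close to the thread count would be consistent with, though it does not prove, every nfs thread being blocked at once. The sketch below shows one way to check for that condition. It assumes the thread count appears as the second field of the `th` line in /proc/net/rpc/nfsd; the sample file contents, the function names, and the 0.9 threshold are all hypothetical and not taken from the reporter's machine.

```python
def load_average(loadavg_text):
    """Return the 1-minute load average from /proc/loadavg contents."""
    return float(loadavg_text.split()[0])


def nfsd_thread_count(nfsd_stats_text):
    """Return the thread count from the 'th' line, or None if it is absent.

    Assumes the /proc/net/rpc/nfsd layout 'th <threads> ...'.
    """
    for line in nfsd_stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "th":
            return int(fields[1])
    return None


def threads_look_saturated(load, threads, ratio=0.9):
    """True when the load is at least `ratio` times the thread count.

    The 0.9 default is an arbitrary, hypothetical threshold.
    """
    return threads is not None and threads > 0 and load >= ratio * threads


# Hypothetical sample contents for illustration only.
sample_loadavg = "255.80 201.33 120.07 3/412 9876"
sample_nfsd = "rc 0 0 0\nth 256 0 0.000 0.000\n"

if __name__ == "__main__":
    load = load_average(sample_loadavg)
    threads = nfsd_thread_count(sample_nfsd)
    print(load, threads, threads_look_saturated(load, threads))
```

On a live server the two strings would come from reading /proc/loadavg and /proc/net/rpc/nfsd while the errors are being logged.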

While these errors are occurring, NFS service from this server is unavailable for 2-10 minutes. In the last instance it recovered by itself, and no processes showed signs of having died.

I'm running kernel 2.4.21-32.0.1.EL and today, after reading about the problems with secure-statd, updated to nfs-utils-1.0.6-42EL.

We've only been experiencing the problems since today. Practically all the clients are Fedora Core 1, but there are a handful of Fedora Core 3 machines.



Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-42EL

How reproducible:
Didn't try

Steps to Reproduce:
1. I don't know what triggers the problem, so I can't give steps to reproduce it; it's just something that happens.

Additional info:
Comment 1 Suzanne Hillman 2005-10-11 15:20:16 EDT
You said you're only having the problems today; is this after the update to
nfs-utils, or entirely unrelated to the timing of that?
Comment 2 Johan van den Dorpe 2005-10-11 16:07:24 EDT
We were running the vanilla nfs-utils 1.0.6 from kernel.org, but after
experiencing the problem we updated to nfs-utils-1.0.6-42EL.

We've experienced the problem once after updating nfs-utils (the logs above are
from this instance).
Comment 3 Suzanne Hillman 2005-10-11 17:31:03 EDT
Does there seem to be a difference in frequency of this happening when comparing
before and after you updated the package? 

I note that the bug you linked to reports that the update didn't make the
problem go away completely, but only reduced its frequency by quite a lot.
Comment 4 Johan van den Dorpe 2005-10-12 06:34:03 EDT
We experienced the problem about 5 times in a morning before updating nfs-utils.
In the 48 hours since then we've only had the one instance, about 5 hours after
the update.

So yes, there is a difference in frequency, but our observation period is much
shorter than the one in the other bug.
Comment 8 RHEL Product and Program Management 2007-10-19 14:53:24 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
