Bug 858793 - When started too early, rpc.statd emits "nsm_parse_reply: can't decode RPC reply" messages every few seconds
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Steve Dickson
QA Contact: Filesystem QE
Depends On:
Blocks:
Reported: 2012-09-19 13:16 EDT by Orion Poplawski
Modified: 2014-03-07 13:05 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-13 13:52:42 EDT
Type: Bug


Attachments: None
Description Orion Poplawski 2012-09-19 13:16:09 EDT
Description of problem:

Apparently, if rpc.statd starts too early, it can get into a bad state:

Sep 11 16:01:28 hawk rpc.statd[1303]: Version 1.2.3 starting
Sep 11 16:01:28 hawk sm-notify[1304]: Version 1.2.3 starting
Sep 11 16:01:28 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:28 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:29 hawk kernel: RPC: Registered named UNIX socket transport module.
Sep 11 16:01:29 hawk kernel: RPC: Registered udp transport module.
Sep 11 16:01:29 hawk kernel: RPC: Registered tcp transport module.
Sep 11 16:01:29 hawk kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 11 16:01:41 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:41 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:45 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:45 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:57 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:57 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:58 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
Sep 11 16:01:58 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
....

Restarting rpc.statd fixes things.  Judging by the kernel module messages in the log, rpc.statd appears to start before RPC is fully set up.  I'm not sure of the proper way to make it wait.  Perhaps the rpcbind init script should not return until rpcbind is fully initialized?

This machine is an old dual processor PIII, so perhaps the combination allows the race to occur.
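
One way to picture the "wait until rpcbind is ready" idea is a small polling helper that a startup wrapper could call before launching rpc.statd. This is only a sketch under stated assumptions: the name `wait_for_port` and its parameters are hypothetical, it is not part of nfs-utils or rpcbind, and a successful TCP connection to rpcbind's port (111) only approximates readiness. It says nothing about whether the kernel RPC transport modules have been registered.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connection.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper could call `wait_for_port("127.0.0.1", 111)` and start rpc.statd only when it returns True. Whether that is enough to avoid the state seen here is not established in this report.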

Version-Release number of selected component (if applicable):
nfs-utils-1.2.3-26.el6.i686
rpcbind-0.2.0-9.el6.i686

How reproducible:
Not very, it seems.  I have only seen it once, but unless you're checking the logs you won't notice it.
Comment 2 Steve Dickson 2012-10-08 16:31:47 EDT
(In reply to comment #0)
> Description of problem:
> 
> Apparently, if rpc.statd starts too early, it can get into a bad state:
> 
> Sep 11 16:01:28 hawk rpc.statd[1303]: Version 1.2.3 starting
> Sep 11 16:01:28 hawk sm-notify[1304]: Version 1.2.3 starting
> Sep 11 16:01:28 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:28 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:29 hawk kernel: RPC: Registered named UNIX socket transport module.
> Sep 11 16:01:29 hawk kernel: RPC: Registered udp transport module.
> Sep 11 16:01:29 hawk kernel: RPC: Registered tcp transport module.
> Sep 11 16:01:29 hawk kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
> Sep 11 16:01:41 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:41 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:45 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:45 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:57 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:57 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:58 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
> Sep 11 16:01:58 hawk rpc.statd[1303]: nsm_parse_reply: can't decode RPC reply
I have to wonder if this is some type of network issue... This error
happens because xdr_replymsg() is failing. The only reason xdr_replymsg()
can fail there is some corruption in the RPC header...
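
For illustration, here is roughly what decoding an RPC reply header involves and how a truncated or malformed header makes it fail. This is a sketch that follows the ONC RPC reply layout in RFC 5531. It is not the nfs-utils or libtirpc code, and the function name `parse_rpc_reply_header` is hypothetical.

```python
import struct

MSG_REPLY = 1          # msg_type value for a reply (RFC 5531)
MSG_ACCEPTED = 0       # reply_stat: call was accepted
MSG_DENIED = 1         # reply_stat: call was rejected
MAX_AUTH_BYTES = 400   # upper bound on an opaque_auth body


def parse_rpc_reply_header(data):
    """Decode the start of an ONC RPC reply; raise ValueError if malformed."""
    if len(data) < 12:
        raise ValueError("truncated RPC header")
    # All XDR integers are 4 bytes, big-endian.
    xid, msg_type, reply_stat = struct.unpack_from(">III", data, 0)
    if msg_type != MSG_REPLY:
        raise ValueError("not a reply message")
    if reply_stat == MSG_DENIED:
        return {"xid": xid, "accepted": False}
    if reply_stat != MSG_ACCEPTED:
        raise ValueError("invalid reply_stat")
    # Accepted reply: verifier (flavor, length, padded body), then accept_stat.
    if len(data) < 20:
        raise ValueError("truncated verifier")
    _flavor, length = struct.unpack_from(">II", data, 12)
    if length > MAX_AUTH_BYTES:
        raise ValueError("verifier too long")
    offset = 20 + ((length + 3) & ~3)   # XDR pads opaque data to 4 bytes
    if len(data) < offset + 4:
        raise ValueError("truncated accept_stat")
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    if accept_stat > 5:                 # 0..5 are the defined accept_stat values
        raise ValueError("invalid accept_stat")
    return {"xid": xid, "accepted": True, "accept_stat": accept_stat}
```

Any datagram that is too short, is not marked as a reply, or carries out-of-range fields is rejected here, which is the kind of condition a "can't decode RPC reply" message would correspond to.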
Comment 3 Orion Poplawski 2012-10-08 16:36:42 EDT
I think it's triggered by rpc.statd starting before portmap and/or the kernel RPC modules are configured.
Comment 4 Steve Dickson 2012-10-12 09:15:55 EDT
I'm having a hard time reproducing this... any suggestions?
Comment 5 Orion Poplawski 2012-10-12 13:19:11 EDT
Hmm, not really.  If it happens again I'll attach a debugger to try to get more info.
Comment 10 Steve Dickson 2013-08-13 13:52:42 EDT
Since we have not seen this for a while I'm going to close it....
