Bug 692702

Summary: rpc.idmapd sometimes fails to start due to sunrpc
Product: Red Hat Enterprise Linux 6 Reporter: Andrew Schultz <ajschult784>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: yanfu,wang <yanwang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.0CC: dcbw, syeghiay
Target Milestone: rcKeywords: OtherQA
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: nfs-utils-1.2.3-8.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 18:53:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
script to demonstrate conditions #2 and #3 none

Description Andrew Schultz 2011-03-31 21:17:13 UTC
Description of problem:
the rpcidmapd script sometimes fails to start rpc.idmapd because /var/lib/nfs/rpc_pipefs is not mounted, even though it explicitly loads sunrpc.  The reason is that sunrpc is not finished loading even though modprobe returns control to the script.  I just added additional bits to sleep and recheck for rpc_pipefs when it can't find rpc_pipefs the first time.  This allows the script to successfully load rpc.idmapd.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.2-7.el6

Steps to Reproduce:
1. configure a box as an NFS client (and not a server)
2. boot
  
Actual results:
1. see messages in boot.log about missing RPC MTAB
2. all NFS files owned by user 2<<32

Expected results:
1. rpc.idmapd starts happily
2. NFS files owned by normal users

Comment 2 Steve Dickson 2011-04-01 14:05:26 UTC
(In reply to comment #0)
> Description of problem:
> the rpcidmapd script sometimes fails to start rpc.idmapd because
> /var/lib/nfs/rpc_pipefs is not mounted, even though it explicitly loads sunrpc.
>  The reason is that sunrpc is not finished loading even though modprobe returns
> control to the script.  I just added additional bits to sleep and recheck for
> rpc_pipefs when it can't find rpc_pipefs the first time.  This allows the
> script to successfully load rpc.idmapd.
> 
> Version-Release number of selected component (if applicable):
> nfs-utils-1.2.2-7.el6
> 
> Steps to Reproduce:
> 1. configure a box as an NFS client (and not a server)
> 2. boot
> 
> Actual results:
> 1. see messages in boot.log about missing RPC MTAB
Could you please post what these messages are?

Comment 3 Andrew Schultz 2011-04-01 15:55:06 UTC
boot.log would contain:

Starting RPC idmapd: FATAL: Module sunrpc already in kernel.
FATAL: Error running install command for sunrpc
Error: RPC MTAB does not exist.

the more explicit complaints about sunrpc sound worse, but don't cause real problems.  Further investigation reveals that something earlier during bootup triggers loading sunrpc (actually... sunrpc is in initramfs).  It apparently just takes its time finishing the loading process, especially on the first bootup.

Comment 4 Steve Dickson 2011-04-01 20:41:56 UTC
Hey,

(In reply to comment #3)
> boot.log would contain:
> 
> Starting RPC idmapd: FATAL: Module sunrpc already in kernel.
> FATAL: Error running install command for sunrpc
> Error: RPC MTAB does not exist.


The fact the script does not find /var/lib/nfs/rpc_pipefs in 
/proc/mounts and then the 'modprobe sunrpc' fails because 
the module already exists basically means whomever is
loading sunrpc is not doing it correctly. 

Does 'grep sunrpc /etc/modprobe.d/dist.conf' show two 
rules on sunrpc?

Comment 5 Andrew Schultz 2011-04-01 21:26:51 UTC
andrew@george#grep sunrpc dist.conf 
install sunrpc /sbin/modprobe --first-time --ignore-install sunrpc && { /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; }
remove sunrpc { /bin/umount /var/lib/nfs/rpc_pipefs > /dev/null 2>&1 || :; } ; /sbin/modprobe -r --ignore-remove sunrpc
alias rpc_pipefs sunrpc
alias rpc_svc_gss_pipefs sunrpc

Comment 6 RHEL Program Management 2011-04-04 02:08:00 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 7 Steve Dickson 2011-04-05 19:10:57 UTC
I'm unable to reproduce this locally... so when this happens
can you verify that 
   1) sunrpc is loaded (lsmod | grep sunrpc)
   2) /var/lib/nfs/rpc_pipefs is not mounted (grep sunrpc /proc/mounts)

I would add those to commands to the nfs initscript.

Comment 8 Andrew Schultz 2011-04-06 13:49:49 UTC
It's only a problem for a moment (which is why "sleep 1" fixes it).  The various conditions for this to happen really only happen during first (or second) bootup, but I can induce them synthetically.

1. sunrpc will be loaded on its own during bootup (before rpcidmapd).  At the top of the NetworkManager startup script, I added a script that checks for sunrpc and "sleep 1" until it is loaded.  It only needs to sleep once.

2. There is a finite amount of time after sunrpc starts to load before it shows up in lsmod.  trying to "modprobe sunrpc" during this time results in the first two error messages in comment 3.

3. There is a finite amount of time after sunrpc shows up in lsmod that rpc_pipefs will show up in /proc/mounts.  This is what causes rpcidmapd to fail.

The amounts of time for #2 and #3 are longer when the system doing more disk access (apparently during first boot).  I'll attach a script that I used to test this.

Comment 9 Andrew Schultz 2011-04-06 13:58:06 UTC
Created attachment 490288 [details]
script to demonstrate conditions #2 and #3

I ran this after booting into single user mode and starting a few of the services that would be loaded before rpcidmapd (network, portreserve, rpcbind).

The output would look like:

RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
[messages about unregistering]

"0" indicates sunrpc doesn't show up in lsmod and "1" indicates that it does, but that rpc_pipefs doesn't show up in /proc/mounts.

Comment 10 Steve Dickson 2011-04-07 17:56:20 UTC
(In reply to comment #8)
> It's only a problem for a moment (which is why "sleep 1" fixes it).  The
> various conditions for this to happen really only happen during first (or
> second) bootup, but I can induce them synthetically.
> 
> 1. sunrpc will be loaded on its own during bootup (before rpcidmapd). 
Who and where is it getting loaded? Because when its loaded the
rpc_pipefs should also happen... immediately after the insmod.

Comment 11 Andrew Schultz 2011-04-10 18:54:02 UTC
(In reply to comment #10)
> > 1. sunrpc will be loaded on its own during bootup (before rpcidmapd). 
> Who and where is it getting loaded? Because when its loaded the
> rpc_pipefs should also happen... immediately after the insmod.

... for some definition of "immediately".  See #2 and #3 from comment 8.

It gets loaded during the NetworkManager script.  I sprinkled checks within the script and the daemon itself (/usr/sbin/NetworkManager) seems to be what triggers sunrpc.  How/why/when/where that happens... I don't know.  I'm not going to trace through the NetworkManager source code to track it down.  Seems like a bit of a silly exercise anyway.

What's important here is that the rpcidmapd makes invalid assumptions behavior about module loading.

Comment 12 Steve Dickson 2011-04-11 14:25:37 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > > 1. sunrpc will be loaded on its own during bootup (before rpcidmapd). 
> > Who and where is it getting loaded? Because when its loaded the
> > rpc_pipefs should also happen... immediately after the insmod.
> 
> ... for some definition of "immediately".  See #2 and #3 from comment 8.
> 
> It gets loaded during the NetworkManager script.  I sprinkled checks within the
> script and the daemon itself (/usr/sbin/NetworkManager) seems to be what
> triggers sunrpc. 
Cool... thanks for this effort.. 
 
> How/why/when/where that happens... I don't know.  I'm not
> going to trace through the NetworkManager source code to track it down.  .
Of course not... Its just I'm not seeing this problem in my world so 
I'm just trying to figure what is going on... 

> 
> What's important here is that the rpcidmapd makes invalid assumptions behavior
> about module loading.
Not really since the /var/lib/nfs/rpc_pipefs is user to kernel interface
the daemons, like rpc.idmapd, use. So it wrong to load the sunrpc
module without enabling its interface (i.e. mounting /var/lib/nfs/rpc_pipefs)

BTW, what version of NetworkManager are you running?

Again, thank you for your time...

Comment 13 Andrew Schultz 2011-04-11 15:53:53 UTC
> So it wrong to load the sunrpc module without enabling its interface (i.e.
> mounting /var/lib/nfs/rpc_pipefs)

/var/lib/nfs/rpc_pipefs is mounted even when sunrpc loading is triggered by NetworkManager.  It's just not immediate.

> BTW, what version of NetworkManager are you running?

0.8.1-5.el6_0.1.x86_64

Comment 16 yanfu,wang 2011-09-09 03:01:46 UTC
I try to test using comment #9:
# cat bangfile 
#!/bin/bash
dd if=/dev/zero of=test bs=1M count=7000

# cat test.sh 
#!/bin/sh

# creates a 7GB file and reads it in a grossly inefficient way
./bangfile &
sleep 5

modprobe sunrpc &
while [ 1 ]; do
  lsmod | grep sunrpc > /dev/null
  rv=$?
  if [ $rv -eq 0 ]; then
    grep pipefs /proc/mounts > /dev/null
    rv=$?
    if [ $rv -eq 0 ]; then
      echo 2
      break
    else
      echo -n "1 "
    fi
  else
    echo -n "0 "
  fi
done

umount /var/lib/nfs/rpc_pipefs
rmmod sunrpc
killall bangfile

Then boot into single user mode and run test.sh script, and got below test result against RHEL6.1 and RHEL6.2 package.
reproduced on RHEL6.1:
[root@hp-dl385g7-01 /]# ./test.sh 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RPC: Registered udp transport module.
RPC: Registered tcp transport module.
1 RPC: Registered tcp NFSv4.1 backchannel transport module.
1 1 1 1 1 1 1 1 1 1 1 2
RPC: Unregistered udp transport module.
RPC: Unregistered tcp transport module.
RPC: Unregistered tcp NFSv4.1 backchannel transport module.
./test.sh: line 29:  1679 Terminated              ./bangfile

^C

verified on rhel6.2:
[root@tyan-gt24-05 /]# ./test.sh 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
1 1 1 1 1 2
RPC: Unregistered udp transport module.
RPC: Unregistered tcp transport module.
RPC: Unregistered tcp NFSv4.1 backchannel transport module.
./tesh.sh: line 28:  1213 Terminated              ./bangfile


So I'm not sure what's the meaning of the script shown to me.

I check rpcidmapd.init of nfs-utils-1.2.3-8.el6 src rpm and could found below specification:
        # Make sure the rpc_pipefs filesystem is available
        /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null 2>&1

Verified sunrpc loading is ok on RHEL6.2 alpha distro:
# lsmod | grep sunrpc
sunrpc                197959  31 nfsd,nfs,lockd,nfs_acl,rpcsec_gss_krb5,auth_rpcgss
# grep sunrpc /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0

So is the test result is ok for you?

Comment 17 Andrew Schultz 2011-09-09 03:49:40 UTC
> So I'm not sure what's the meaning of the script shown to me.

The script simply demonstrates that the condition described in comment 0 is possible.  Fixing the bug won't prevent the condition from occurring or alter the attached script's behavior, but would simply change the init script to handle the condition.

In terms of reproducing the bug, I only hit the bug on fresh installs.  Conceivably, the bug could be reproduced with nfs-utils-1.2.3-7 by replacing the while loop in the attached script with

/etc/init.d/rpcidmapd start

but I think my own attempts to encounter the bug that way were unsuccessful.

Comment 18 yanfu,wang 2011-09-13 06:31:15 UTC
(In reply to comment #17)
> > So I'm not sure what's the meaning of the script shown to me.
> 
> The script simply demonstrates that the condition described in comment 0 is
> possible.  Fixing the bug won't prevent the condition from occurring or alter
> the attached script's behavior, but would simply change the init script to
> handle the condition.
> 
> In terms of reproducing the bug, I only hit the bug on fresh installs. 
> Conceivably, the bug could be reproduced with nfs-utils-1.2.3-7 by replacing
> the while loop in the attached script with
> 
> /etc/init.d/rpcidmapd start
> 
> but I think my own attempts to encounter the bug that way were unsuccessful.

I can't reproduce it also, so do code review and the below result is ok, if any problem pls let me know.
Checking rpcidmapd.init of nfs-utils-1.2.3-8.el6 src rpm and could found below
specification:
        # Make sure the rpc_pipefs filesystem is available
        /bin/mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs > /dev/null
2>&1

Verified sunrpc loading is ok after install RHEL6.2 alpha distro:
# lsmod | grep sunrpc
sunrpc                197959  31
nfsd,nfs,lockd,nfs_acl,rpcsec_gss_krb5,auth_rpcgss
# grep sunrpc /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0

Comment 19 errata-xmlrpc 2011-12-06 18:53:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1534.html