Do this on the 2.6.18-92.1.6.el5debug kernel after a fresh reboot:

1) mount a TCP NFSv3 filesystem
2) unmount it
3) service nfs start

...nfsd will fail to start because lockd_up fails. From dmesg:

FS-Cache: Loaded
FS-Cache: netfs 'nfs' registered for caching
SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts
Installing knfsd (copyright (C) 1996 okir.de).
SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
lockd_up: makesock failed, error=-98
lockd_down: no lockd running.
nfsd: last server has exited
nfsd: unexporting all filesystems

...then if you do a "service nfs restart":

lockd_up: no pid, 2 users??
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
nfsd: last server has exited
nfsd: unexporting all filesystems

...so I think we have a couple of bugs here. Something is causing makesock to fail, and when that occurs, lockd_up isn't handling the error condition appropriately and is throwing off the nlmsvc_users counter. I suspect this is a regression from 5.1, but I need to confirm it.
Actually, this doesn't appear to be a regression. When I do the same test on -8.el5, I get these messages:

lockd_up: makesock failed, error=-98
lockd_up: no pid, 2 users??
lockd_up: no pid, 3 users??
lockd_up: no pid, 4 users??
lockd_up: no pid, 5 users??
lockd_up: no pid, 6 users??
lockd_up: no pid, 7 users??
lockd_up: no pid, 8 users??

...and lockd isn't started. Since no one has complained about this, I'll put this on 5.4 proposed for now. If the fix turns out to be simple, I may move it to 5.3...
This problem has strangely "fixed itself". Yesterday I could reliably reproduce it; today I can't make it happen. The host where I saw this was a RHEL5 FV xen guest. It looks like the power blinked at the office and the xen dom0 rebooted. I brought my RHEL5 image back up, and now this isn't happening anymore. It seems unlikely, but maybe this has something to do with being a guest on a long-running dom0? I'll leave this open for now in case it happens again...
Closing this out. I've not seen this problem since, though it still worries me that I saw it at all. I'll reopen it if it returns.
I am not sure if this is important, but you will get this error if the portmapper is not running. Start the portmapper (/etc/init.d/portmapper start) and try the mount again; it will work properly.