Bug 835241 - rpcidmapd caching causes problems in an NFSv4 and NIS environment
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ypbind
Version: 6.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Matej Mužila
QA Contact: qe-baseos-daemons
Depends On:
Blocks: 961026 1075802 1159933
Reported: 2012-06-25 15:57 EDT by Andy Feldt
Modified: 2015-05-05 20:30 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On a network using NIS and NFSv4, rpcidmapd caches uid/gid mappings. This causes a problem if /etc/fstab mounts an NFSv4 file system whose remote owner is a uid/gid that must be resolved through NIS.
Consequence: Files are shown as owned by user 'nobody'.
Fix: rpcidmapd resolves user names to uids, and in a NIS environment that lookup goes through ypbind. Changing the priorities of the ypbind and ypserv services as follows should therefore work around the problem:
  #> echo "# chkconfig: - 23 77" >/etc/chkconfig.d/ypbind
  #> echo "# chkconfig: - 22 78" >/etc/chkconfig.d/ypserv
  #> chkconfig ypbind resetpriorities
  #> chkconfig ypserv resetpriorities
Please note that each use case needs to be examined individually to decide which priorities the ypbind and ypserv services should be given when rpcidmapd and ypbind are used together.
Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-05-05 11:51:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
jiyin: needinfo-


Description Andy Feldt 2012-06-25 15:57:56 EDT
Description of problem:
For a network using NIS and NFSv4, the change which has rpcidmapd caching uid/gid mappings causes a problem if /etc/fstab mounts an NFSv4 file system remotely owned by a uid/gid which will be resolved by NIS.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.3-26.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a filesystem on system A owned by a user, X, whose passwd entry exists in the NIS passwd map.
2. Add an entry to /etc/fstab on system B which mounts this filesystem via NFSv4.
3. Reboot system B.
4. Run ls -l on system B for the remotely mounted filesystem.
  
Actual results:
Files are shown as owned by user 'nobody'

Expected results:
Files should be shown as owned by user X

Additional info:
Note that this is because the netfs script is run before the ypbind script.  It may be that the solution is to actually change the ypbind package to clear the id mapping cache when ypbind starts.
Comment 1 Andy Feldt 2012-06-25 16:01:41 EDT
I should correct my wording - replace 'filesystem' with 'directory' in my original post in the steps to reproduce
Comment 4 Geoff Kingsmill 2012-09-02 22:59:06 EDT
I have the exact same problem. nfsv4 mounted file systems show group nobody for groups defined within NIS. 

RHEL 6.3 64bit
nfs-utils-1.2.3-26.el6
nfs-utils-lib-1.1.5-4.el6

nfsv4 mounted file systems show some files mapped to group nobody, even though a "getent group" clearly shows a mapping between GID and group name.

Domain is correctly defined in /etc/idmapd.conf on both NFSv4 server and client. The rpc.idmapd daemon is running.

If I temporarily add a NIS group to /etc/group then files with the locally mapped group show the correct mapping.

Normally the order for starting the relevant daemons is rpcidmapd, ypbind and then nfs. With this configuration, NFSv4-mounted files whose group is served by NIS show the group as nobody.

/etc/rc3.d/S24rpcidmapd
/etc/rc3.d/S27ypbind          <-- fails when ypbind started after rpcidmapd
/etc/rc3.d/S28autofs
/etc/rc3.d/S30nfs

If I change the order and start NIS before rpcidmapd then everything works nicely. Files with NIS-served groups show the correct group.

/etc/rc3.d/S23ypbind          <-- works when ypbind started before rpcidmapd
/etc/rc3.d/S24rpcidmapd
/etc/rc3.d/S28autofs
/etc/rc3.d/S30nfs

Is this a known problem?
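
For anyone wanting to check this ordering on their own machine, here is a minimal sketch, assuming SysV-style `SNNname` links like the ones listed above. The helper names `start_priority` and `starts_before` are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Hypothetical helpers; they only inspect the link names in an rc directory.

# start_priority RCDIR SERVICE
# Prints the NN part of RCDIR/SNNSERVICE, or fails if there is no such link.
start_priority() {
    rcdir=$1
    service=$2
    for link in "$rcdir"/S[0-9][0-9]"$service"; do
        # An unmatched glob stays literal, so check that the entry exists.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}            # e.g. S24rpcidmapd
        prio=${name#S}              # e.g. 24rpcidmapd
        prio=${prio%"$service"}     # e.g. 24
        echo "$prio"
        return 0
    done
    return 1
}

# starts_before RCDIR FIRST SECOND
# Succeeds only if FIRST has a strictly lower start priority than SECOND.
# Equal priorities count as "not before" here; returns 2 if a link is missing.
starts_before() {
    a=$(start_priority "$1" "$2") || return 2
    b=$(start_priority "$1" "$3") || return 2
    [ "$a" -lt "$b" ]
}
```

A possible use would be `starts_before /etc/rc3.d ypbind rpcidmapd || echo "ypbind does not start before rpcidmapd"`.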
Comment 6 RHEL Product and Program Management 2012-12-14 03:04:27 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 7 jorge.gonzalez 2013-02-13 06:42:51 EST
We're having a similar problem when the user map is on LDAP and we access it via SSSD. SSSD is also started after rpcidmapd, and sometimes we have seen that a NFS mounted directory showed as owned by nobody.

We have reproduced the problem by deactivating sssd on boot. So rpcidmapd starts and every owner is nobody on the NFS mount (/home).

We have then started sssd and rpcidmapd still shows nobody as owner (it's caching results) BUT after 10 minutes (600 seconds default cache expiry in rpcidmapd), everything goes back to normal. All users are visible and all files on NFS show the correct LDAP owner.

It looks like the same problem as above, but with a different network database mechanism (NIS vs. LDAP).
Comment 14 Jestin Paul 2013-06-30 10:26:47 EDT
We also have the same issue, and because of it we are unable to benefit from NFSv4 performance.
We would like to know when, and in which release, this will be fixed. Also, is there a proven workaround?
Comment 15 jorge.gonzalez 2013-08-20 04:18:04 EDT
Our workaround has been to include the line:

service rpcidmapd restart

in /etc/rc.d/rc.local. This way, rpcidmapd gets (re)started after SSSD is available and can get all user info from it.

The definitive solution would be to adjust the start order of daemons, so that SSSD starts the earliest and later on, the rest of daemons which make use of it (ypbind, rpcidmapd, etc.).
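
A slightly more defensive variant of this rc.local workaround might wait until a name actually resolves before restarting rpcidmapd. This is only a sketch: `wait_until` is a hypothetical helper, and `someuser` stands for any account that exists only in the network database (SSSD, NIS).

```shell
#!/bin/sh
# wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or TRIES attempts
# have been made, sleeping DELAY_SECONDS between attempts.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use in /etc/rc.d/rc.local ("someuser" is a placeholder):
#   wait_until 30 2 getent passwd someuser && service rpcidmapd restart
```

The restart only happens once the lookup works, so a slow SSSD or ypbind start does not leave rpcidmapd with a cache full of 'nobody' entries.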
Comment 22 Steve Dickson 2014-04-23 11:51:11 EDT
(In reply to jorge.gonzalez from comment #15)
> Our workaround has been to include the line:
> 
> service rpcidmapd restart
> 
> in /etc/rc.d/rc.local. This way, rpcidmapd gets (re)started after SSSD is
> available and can get all user info from it.
> 
> The definitive solution would be to adjust the start order of daemons, so
> that SSSD starts the earliest and later on, the rest of daemons which make
> use of it (ypbind, rpcidmapd, etc.).
Since I can't do this I'm reassigning this bz to the group 
that can make it happen
Comment 23 Jakub Hrozek 2014-06-18 10:57:17 EDT
Steve, I'm fine with making the change per se, but what about cases where the system is configured so that some partitions the SSSD uses (/var maybe) are mounted with NFS? Wouldn't changing the startup order break those?
Comment 24 Steve Dickson 2014-06-19 07:47:38 EDT
(In reply to Jakub Hrozek from comment #23)
> Steve, I'm fine with making the change per se, but what about cases where
> the system is configured so that some partitions the SSSD uses (/var maybe)
> are mounted with NFS? Wouldn't changing the startup order break those?
Yes... This is an interesting problem... Kind of a chicken-and-egg problem....
So maybe changing the order does not make sense...

Note, with later Linux NFS servers, this problem will not exist
since they now give uid strings (e.g. "3606"), which don't need
to be looked up, instead of user@domain strings (e.g. "steved@redhat.com"),
which do need to be looked up.

So I'm thinking the only way to make this work is to
use a server that gives out uid strings or to add the
needed uid/gid entries to the client's /etc/passwd.
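
On kernels that have it, the numeric-uid behaviour appears to be controlled by the `nfs4_disable_idmapping` parameter of the `nfs` (client) and `nfsd` (server) modules. Whether a given RHEL 6 kernel exposes it is an assumption to verify locally. A minimal sketch for reading the setting follows; `idmapping_disabled` is a hypothetical helper name.

```shell
#!/bin/sh
# idmapping_disabled [PARAM_FILE]
# Hypothetical helper. Returns 0 if the parameter file reads as enabled
# (Y, y or 1), 1 if it reads as anything else, 2 if the file is unreadable.
# The default path is the server-side (nfsd) parameter; it is an assumption
# that the running kernel provides it.
idmapping_disabled() {
    param=${1:-/sys/module/nfsd/parameters/nfs4_disable_idmapping}
    [ -r "$param" ] || return 2
    case $(cat "$param") in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example for the client side:
#   idmapping_disabled /sys/module/nfs/parameters/nfs4_disable_idmapping
```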
Comment 25 jorge.gonzalez 2014-06-19 14:37:28 EDT
I'm somehow confused: SSSD is supposed to be a system daemon, and started very early on the boot process, so that other processes can  make use of it.

I believe we should think about the common case: given the case of local /var and NFS mounted /var, my feelings are that local /var systems are much, much more common than NFS mounted /var systems. So the solution should be to solve the problem for local /var systems, and then adapt the solution for the other ones.

Regarding the NFS /var problem: in a system with NFS /var (and given that modern NFS seems to depend on SSSD), it looks that mounting /var would require some kind of early userspace SSSD.

A liberal analogy: if your partitioning does not use LVM, you don't need to include it in an initrd, and even more, you could probably boot without an initrd with the correct boot parameters. But if you need LVM to boot, you will build an initrd with LVM support, so that you can access root fs and boot off of it.

A similar case to /var: if /var is NFS mounted, you will need to make an initrd which fully supports doing so. And if NFS depends on SSSD, then SSSD _will_ have to go on the initrd early user space...

Am I wrong?

Regards
J.
Comment 26 Steve Dickson 2014-06-23 07:38:10 EDT
(In reply to jorge.gonzalez from comment #25)
> I'm somehow confused: SSSD is supposed to be a system daemon, and started
> very early on the boot process, so that other processes can  make use of it.
> 
> I believe we should think about the common case: given the case of local
> /var and NFS mounted /var, my feelings are that local /var systems are much,
> much more common than NFS mounted /var systems. So the solution should be to
> solve the problem for local /var systems, and then adapt the solution for
> the other ones.
> 
> Regarding the NFS /var problem: in a system with NFS /var (and given that
> modern NFS seems to depend on SSSD), it looks that mounting /var would
> require some kind of early userspace SSSD.
> 
> A liberal analogy: if your partitioning does not use LVM, you don't need to
> include it in an initrd, and even more, you could probably boot without an
> initrd with the correct boot parameters. But if you need LVM to boot, you
> will build an initrd with LVM support, so that you can access root fs and
> boot off of it.
> 
> A similar case to /var: if /var is NFS mounted, you will need to make an
> initrd which fully supports doing so. And if NFS depends on SSSD, then SSSD
> _will_ have to go on the initrd early user space...
> 
> Am I wrong?
I don't think so... Adding this to the initrd might be the answer...
Comment 27 Jakub Hrozek 2014-07-02 13:31:17 EDT
I'm sorry this bugzilla stalled a bit.

Steve, with your NFS experience, did you encounter similar problems with NFS-mounted /var with other daemons?

When you say "Adding this to the initrd", were you thinking a distro-wide change or a custom change the admin of such setup would do?
Comment 28 Steve Dickson 2014-07-08 14:03:38 EDT
(In reply to Jakub Hrozek from comment #27)
> I'm sorry this bugzilla stalled a bit.
> 
> Steve, with your NFS experience, did you encounter similar problems with
> NFS-mounted /var with other daemons?
No I have not... Generally when people want a system filesystem NFS
mounted they mount the root file system over NFS then have /var
part of the mounted root.
 
> 
> When you say "Adding this to the initrd", were you thinking a distro-wide
> change or a custom change the admin of such setup would do?
This would probably be a custom change, since I don't maintain initrd.
Comment 29 Jakub Hrozek 2014-07-30 09:36:15 EDT
This is the startup order I see on a 6.5 clean install:

# ll /etc/rc3.d/ | grep -E "(sssd|rpcidmapd|ypbind)"
lrwxrwxrwx. 1 root root 14 Jul 30 15:18 S13sssd -> ../init.d/sssd
lrwxrwxrwx. 1 root root 19 Jul 30 15:18 S18rpcidmapd -> ../init.d/rpcidmapd
lrwxrwxrwx. 1 root root 16 Jul 30 15:22 S24ypbind -> ../init.d/ypbind

So it seems we're already good?

Can you confirm what startup order you're seeing with your machines?
Comment 30 Andy Feldt 2014-07-30 17:26:27 EDT
Jakub,

This is the startup order, but this still leads to the problem alluded to in my original bug filing.  When rpcidmapd starts before ypbind, it means that automounted directories get mounted without the information from NIS about ownership and this persists due to the caching done by rpcidmapd.  In my case, all basic user home directories are automounted and users then log in without ownership of their own files if I use NFSv4 instead of v3.  My solution has still been to run only NFSv3 because of this.  But, I would prefer to be able to run NFSv4.

Andy
Comment 31 Jakub Hrozek 2014-07-31 05:25:43 EDT
(In reply to Andy Feldt from comment #30)
> Jakub,
> 
> This is the startup order, but this still leads to the problem alluded to in my
> original bug filing.  When rpcidmapd starts before ypbind,

But that means we should revert the ypbind <-> rpcidmapd order and not sssd <--> rpcidmapd, right?

I think with respect to NFS and SSSD startup order, we're good since we fixed bug #805431

Any objections to re-assigning this bug to ypbind ?

> it means that
> automounted directories get mounted without the information from NIS about
> ownership and this persists due to the caching done by rpcidmapd.  In my
> case, all basic user home directories are automounted and users then log in
> without ownership of their own files if I use NFSv4 instead of v3.  My
> solution has still been to run only NFSv3 because of this.  But, I would
> prefer to be able to run NFSv4.
> 

Thank you for the explanation!
Comment 32 Andy Feldt 2014-07-31 12:35:00 EDT
It is fine with me to assign to ypbind - the process does not matter to me as long as the result lets me finally use NFSv4!
Comment 33 Honza Horak 2014-08-15 06:22:53 EDT
(In reply to Jakub Hrozek from comment #29)
> This is the startup order I see on a 6.5 clean install:
> 
> # ll /etc/rc3.d/ | grep -E "(sssd|rpcidmapd|ypbind)"
> lrwxrwxrwx. 1 root root 14 Jul 30 15:18 S13sssd -> ../init.d/sssd
> lrwxrwxrwx. 1 root root 19 Jul 30 15:18 S18rpcidmapd -> ../init.d/rpcidmapd
> lrwxrwxrwx. 1 root root 16 Jul 30 15:22 S24ypbind -> ../init.d/ypbind

Well, now rpcidmapd is mapped to 24 on my rhel-6 box, even though it is set to 18 in the LSB header, so I'm a bit confused (resetpriorities does not change it either). CC'ing Lukas to ask for explanation -- how is it possible, Lukas?

# ll /etc/rc3.d/ | grep -E "(sssd|rpcidmapd|ypbind)"
lrwxrwxrwx. 1 root root 19 Jul 23 15:18 S24rpcidmapd -> ../init.d/rpcidmapd
lrwxrwxrwx. 1 root root 16 Aug 15 10:11 S24ypbind -> ../init.d/ypbind

Anyway, the last time we changed the service order in ypbind, it turned out to be too fragile and created regression #1011507. Since ypbind is such a crucial component in the starting chain and many applications depend on it, I'd rather suggest changing the start priority of rpcidmapd to 25. Would that be possible? (changing component to nfs-utils, so it is brought to attention)
Comment 34 Honza Horak 2014-08-15 06:34:38 EDT
(In reply to Andy Feldt from comment #32)
> It is fine with me to assign to ypbind - the process does not matter to me
> as long as the result lets me finally use NFSv4!

Anyway, Andy, could you, please, test if changing rpcidmapd's priority to 25 would fix your issue? It can be easily done by:
 #> touch /etc/chkconfig.d/rpcidmapd
 #> echo '# chkconfig: 345 25 75' >>/etc/chkconfig.d/rpcidmapd
 #> chkconfig rpcidmapd resetpriorities
Thanks!
Comment 35 Lukáš Nykrýn 2014-08-15 07:46:31 EDT
So the problem is here:

[kamarad@hhorak-rhel-6-server init.d]$ grep Provide /etc/rc3.d/S23NetworkManager 
# Provides: network_manager $network
[kamarad@hhorak-rhel-6-server init.d]$ grep Require /etc/rc3.d/S24rpcidmapd 
# Required-Start: $network $syslog
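
To repeat this check for other init scripts, a small sketch like the following could print the value of an LSB header tag. `lsb_tag` is a hypothetical helper name. The output above suggests that rpcidmapd ends up at 24 because it requires `$network`, which NetworkManager provides at priority 23.

```shell
#!/bin/sh
# lsb_tag FILE TAG
# Hypothetical helper: prints the value of every "# TAG: value" header line
# in an init script, e.g. TAG=Provides or TAG=Required-Start.
lsb_tag() {
    sed -n "s/^#[[:space:]]*$2:[[:space:]]*//p" "$1"
}

# Example:
#   lsb_tag /etc/init.d/rpcidmapd Required-Start
```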
Comment 36 Honza Horak 2014-08-15 07:54:07 EDT
Restoring still valid needinfo, as per comment #34.
Comment 37 Andy Feldt 2014-08-15 11:11:51 EDT
I will not be able to try this until Monday as I need to coordinate with users and user jobs to be able to set up a test environment where I turn on NFSv4.  I will let you know once I have been able to do this.
Comment 38 Andy Feldt 2014-08-20 10:37:44 EDT
Well, I was optimistic about when I could get to this! But, I have now tested this.  Surprisingly, I was able to use NFSv4 in conjunction with NIS and autofs after this even though it does not change the ordering of rpcidmapd and ypbind.  So, as another test, I tried with rpcidmapd at its original priority of 18 and that worked also!  So, something else has changed which has taken care of this issue (at least in my limited testing).  I will try a more extensive roll out of this over the next several weeks and see if I run into any problems.  But, whether rpcidmapd is at 18 or 25 seems to matter not.
Comment 39 Honza Horak 2014-08-20 10:47:22 EDT
Actually, the priority defined in the init script is not always used as-is; the initscripts do some other magic to satisfy dependencies, but I do not know the details. Just FYI.
Comment 40 Andy Feldt 2014-08-20 11:09:42 EDT
Note: I just realized that on my systems, I had not updated the priority for ypbind to 24 (hence my note that the change in priority for rpcidmapd did not affect its ordering with ypbind).  This, of course, does not change the fact that I now find that there is no problem even when rpcidmapd starts before ypbind!
Comment 41 Steve Dickson 2014-09-03 13:25:22 EDT
(In reply to Andy Feldt from comment #40)
> Note: I just realized that on my systems, I had not updated the priority for
> ypbind to 24 (hence my note that the change in priority for rpcidmapd did
> not affect its ordering with ypbind).  This, of course, does not change the
> fact that I now find that there is no problem even when rpcidmapd starts
> before ypbind!

Does that mean we can close this bug?
Comment 42 Andy Feldt 2014-09-03 15:53:20 EDT
From my perspective, it can be closed (tho' I am curious as to what actually changed to make this viable without a reordering of the init scripts, but as long as it works...)
Comment 43 Jestin Paul 2014-10-16 06:23:24 EDT
So, what is the conclusion?
How to solve this problem?
Comment 46 Roland Mainz 2014-11-10 05:41:06 EST
(In reply to jorge.gonzalez from comment #7)
[snip]
> We have then started sssd and rpcidmapd still shows nobody as owner (it's
> caching results) BUT after 10 minutes (600 seconds default cache expiry in
> rpcidmapd), everything goes back to normal. All users are visible and all
> files on NFS show the correct LDAP owner.

Steve: Does Linux idmapd have a way to clear the cache (all cache entries, regardless whether these are local, LDAP, NIS/YP or NIS+ entries) via nfsidmap(5) ?
Comment 47 Steve Dickson 2014-12-03 13:00:11 EST
(In reply to Roland Mainz from comment #46)
> (In reply to jorge.gonzalez from comment #7)
> [snip]
> > We have then started sssd and rpcidmapd still shows nobody as owner (it's
> > caching results) BUT after 10 minutes (600 seconds default cache expiry in
> > rpcidmapd), everything goes back to normal. All users are visible and all
> > files on NFS show the correct LDAP owner.
> 
> Steve: Does Linux idmapd have a way to clear the cache (all cache entries,
> regardless whether these are local, LDAP, NIS/YP or NIS+ entries) via
> nfsidmap(5) ?

Yes, nfsidmap -c will clear the keyring of all keys.
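
A minimal sketch of how this could be wrapped for use after ypbind or sssd has come up, assuming the keyring-based mapper is in use and the script runs as root. `clear_idmap_keyring` is a hypothetical helper name.

```shell
#!/bin/sh
# clear_idmap_keyring [COMMAND]
# Hypothetical helper: runs "COMMAND -c" (COMMAND defaults to nfsidmap)
# if the command can be found, otherwise reports the problem and returns 127.
clear_idmap_keyring() {
    cmd=${1:-nfsidmap}
    if command -v "$cmd" >/dev/null 2>&1; then
        "$cmd" -c
    else
        echo "$cmd not found; nothing cleared" >&2
        return 127
    fi
}

# Example, e.g. from /etc/rc.d/rc.local once name lookups work:
#   clear_idmap_keyring
```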
Comment 53 Steve Dickson 2015-03-16 10:58:35 EDT
per Comment 50, Comment 51 and Comment 52 changing component to ypbind
Comment 55 Honza Horak 2015-03-17 12:46:39 EDT
However, changing the ypbind startup order also requires updating the ypserv priority to 22, no later than the ypbind change is made, to avoid something like BZ#953555.
Comment 56 Honza Horak 2015-03-17 12:48:27 EDT
I'm also wondering whether the priority of a service can somehow be overridden permanently, by an admin for example. Lukas, can you advise?
Comment 57 Steve Dickson 2015-03-17 13:15:56 EDT
(In reply to Honza Horak from comment #55)
> However, changing the ypbind startup order also requires updating the ypserv
> priority to 22, no later than the ypbind change is made, to avoid something
> like BZ#953555.

How come I don't have access to this bz???
Comment 59 Lukáš Nykrýn 2015-03-18 07:32:51 EDT
(In reply to Honza Horak from comment #56)
> I'm also wondering whether the priority of a service can somehow be overridden
> permanently, by an admin for example. Lukas, can you advise?

CHKCONFIG(8) 
OVERRIDE FILES
       Files in /etc/chkconfig.d/servicename are parsed using the same comments that chkconfig notices in init service scripts, and override values in the init service scripts themselves.
Comment 61 Honza Horak 2015-03-18 17:16:31 EDT
(In reply to Lukáš Nykrýn from comment #59)
> (In reply to Honza Horak from comment #56)
> > I'm also wondering whether the priority of a service can somehow be overridden
> > permanently, by an admin for example. Lukas, can you advise?
> 
> CHKCONFIG(8) 
> OVERRIDE FILES
>        Files in /etc/chkconfig.d/servicename are parsed using the same
> comments that chkconfig notices in init service scripts, and override values
> in the init service scripts themselves.

Thanks, Lukas.

I've checked that and it works nicely, so the following should fix the issue as well:
  #> echo "# chkconfig: - 23 77" >/etc/chkconfig.d/ypbind
  #> echo "# chkconfig: - 22 78" >/etc/chkconfig.d/ypserv
  #> chkconfig ypbind resetpriorities
  #> chkconfig ypserv resetpriorities

So, considering the possibility of a regression like the one we saw the last time we changed the priority of these services, wouldn't this be a solution? Wouldn't customers be fine with redefining the priorities for their specific use case? We can also document this as the solution for cases where the user needs to use rpcidmapd and ypbind together.
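
To double-check what an override file will request before running resetpriorities, a sketch like this could extract the start priority from its `# chkconfig:` line. `chkconfig_start_priority` is a hypothetical helper name, and the sketch only parses the file; it does not consult chkconfig itself.

```shell
#!/bin/sh
# chkconfig_start_priority FILE
# Hypothetical helper: prints the start priority from the first
# "# chkconfig: <levels> <start> <stop>" line found in FILE.
chkconfig_start_priority() {
    sed -n 's/^#[[:space:]]*chkconfig:[[:space:]]*[^[:space:]]*[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example: confirm that ypserv (22) is requested before ypbind (23).
#   s=$(chkconfig_start_priority /etc/chkconfig.d/ypserv)
#   b=$(chkconfig_start_priority /etc/chkconfig.d/ypbind)
#   [ "$s" -lt "$b" ] && echo "ypserv is requested before ypbind"
```

As noted in comment 39, the priority that is finally assigned can still differ from the requested one once dependencies are resolved, so checking the links in /etc/rc3.d afterwards remains worthwhile.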
Comment 62 Honza Horak 2015-03-19 07:17:34 EDT
Removing devel_ack, based on comment #55, since unless we update also ypserv, we cannot just update only ypbind.
Comment 69 Honza Horak 2015-05-05 11:51:08 EDT
Closing as per comment #61. Please, consider using the solution described in that comment.
