1033708 – Updating to nfs-utils-1.2.3-39.el6 causes rpcidmapd to be chkconfig deleted

Summary: Updating to nfs-utils-1.2.3-39.el6 causes rpcidmapd to be chkconfig deleted

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	nfs-utils
Sub Component:
Version:	6.5
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Steve Dickson
QA Contact:	JianHong Yin
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1044514 1122375
Depends On:
Blocks:	994246 1075802 1079871 1172231

Reported:	2013-11-22 16:47 UTC by John T. Rose
Modified:	2019-03-22 07:16 UTC
CC List:	30 users
Fixed In Version:	nfs-utils-1.2.3-46
Doc Type:	Bug Fix
Doc Text:	The nfs-utils packages had been changed to use an in-kernel key ring to store the ID mappings needed for NFSv4. However, the kernel key ring is too small for large enterprise environments. With this update, the nfsidmap command, used by the kernel to do ID mapping, has been changed to use multiple key rings.
Clone Of:
Environment:
Last Closed:	2014-10-14 04:32:09 UTC
Target Upstream Version:
Embargoed:

Attachments
Use multiple keyrings for nfsidmap to work around keyring limits (6.42 KB, patch) 2014-03-21 21:23 UTC, bcodding	no flags
[PATCH 1/2] nfsidmap: Match names with kernel default keyring (1.02 KB, patch) 2014-03-25 14:04 UTC, bcodding	no flags
[PATCH 2/2] nfsidmap: Create id_resolver child keyrings (5.92 KB, patch) 2014-03-25 14:05 UTC, bcodding	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	975993	0	None	None	RHEL 6.5 Kernel key requests keep incrementing due to kerberos	2019-03-22 07:16:00 UTC
Red Hat Product Errata	RHBA-2014:1407	0	normal	SHIPPED_LIVE	nfs-utils bug fix and enhancement update	2014-10-14 00:54:56 UTC

Description John T. Rose 2013-11-22 16:47:38 UTC

Description of problem:

After upgrading from RHEL 6.4 to 6.5 and rebooting we noticed that all of our machines that were running rpcidmapd were no longer running it.

Version-Release number of selected component (if applicable):

Upgrading nfs-utils.x86_64 1:1.2.3-36.el6 to nfs-utils.x86_64 1:1.2.3-39.el6.

How reproducible:

Always.

Steps to Reproduce:
1. Install nfs-utils.x86_64 1:1.2.3-36.el6
2. Start rpcidmapd
3. Upgrade nfs-utils to nfs-utils.x86_64 1:1.2.3-39.el6
4. Reboot
5. Figure out why rpcidmapd is no longer running as it was before the upgrade

Actual results:

rpcidmapd is deleted from chkconfig.

Expected results:

Leave my configured running services alone during upgrades.

Additional info:

This appears to be intentional as this trigger was added in nfs-utils.x86_64 1:1.2.3-39.el6.

triggerun scriptlet (using /bin/sh) -- nfs-utils < 1:1.2.3-38
if [ "$1" -eq 2 ]; then
	/sbin/chkconfig --del rpcidmapd
fi

This broke lots of machines so I decided to open this bz even though it is probably going to be treated as water under the bridge at this point.
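For anyone auditing a fleet: the trigger above is keyed on the nfs-utils version being removed (`nfs-utils < 1:1.2.3-38`). A minimal sketch of that range check, assuming epoch 1 on both sides and using GNU `sort -V` as an approximation of rpm's own version comparison (it agrees for release strings like these, but it is not rpmvercmp); the helper name is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if VERSION-RELEASE sorts strictly
# below the trigger threshold 1.2.3-38 (epoch 1 assumed on both sides).
# GNU `sort -V` approximates, but does not exactly reproduce, rpm's rules.
in_trigger_range() {
    local installed="$1" threshold="1.2.3-38" lowest
    [ "$installed" = "$threshold" ] && return 1
    lowest=$(printf '%s\n%s\n' "$installed" "$threshold" | sort -V | head -n 1)
    [ "$lowest" = "$installed" ]
}

# Feed it what the host reports, e.g.:
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' nfs-utils
for v in 1.2.3-36.el6 1.2.3-39.el6; do
    if in_trigger_range "$v"; then
        echo "$v: upgrading away from this version fires the chkconfig --del trigger"
    else
        echo "$v: outside the trigger range"
    fi
done
```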

Comment 2 Steve Dickson 2013-11-22 21:10:31 UTC

(In reply to John T. Rose from comment #0)
> 
> This broke lots of machines so I decided to open this bz even though it is
> probably going to be treated as water under the bridge at this point.

The reason for this is that the client now does an upcall to the
nfsidmap command to do the ID mapping... Only the server
now uses the rpc.idmapd daemon... See nfsidmap(5) for details.

How did this break your machines?

Comment 3 John T. Rose 2013-11-22 21:22:16 UTC

Steve, sorry about the hyperbole. What happened was that we monitor the processes and saw all these machines missing it after reboot. Since we believed it was a necessary process we "fixed" them before receiving any problem reports so it may well be that our fixing them was wasted effort.

I'm curious: how might we have been made aware of this change before it happened?

Thanks for the clarification.

Comment 5 RHEL Program Management 2013-11-25 22:02:16 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 6 Nicolas Mitsis 2013-11-26 10:15:52 UTC

With rpcidmapd off I've got the following issue using automount and nfs with kerberos/IPA:

* with rpcidmapd off, the user can log in and create/modify/etc. his own files, but when he does ls -l it shows:

# ls -ld /home/user
drwx-----x. 5 4294967294 4294967294 4.0K 2013-10-29 16:23 /home/user

The number 4294967294 (nobody?) is not his ID, which resolves correctly:

# id user
uid=1755404884(user) gid=1755404884(user) groups=1755404884(user)

* with rpcidmapd on (and autofs restart) ls -l resolves correctly:

# ls -ld /home/user
drwx-----x. 5 user user 4.0K 2013-10-29 16:23 /home/user

I've only noticed this because ssh complained about incorrect owner/permissions on ~/.ssh/ files.
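Comment 21 later attributes this owner to a failed mapping: -2 stored as a 32-bit id and printed as unsigned. A quick arithmetic check of that reading, with helper names that are ours, for illustration only:

```shell
# Convert a 32-bit id between its unsigned and signed forms.
# Shell arithmetic is 64-bit, so mask to the low 32 bits explicitly.
to_uint32() {
    echo $(( $1 & 0xFFFFFFFF ))
}
to_int32() {
    local v=$(( $1 & 0xFFFFFFFF ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

to_uint32 -2          # 4294967294, the owner shown by ls -ld
to_int32 4294967294   # -2
```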

Comment 7 Steve Dickson 2013-11-26 11:37:28 UTC

With rpc.idmapd *not* running, set Verbosity = 2 in /etc/idmapd.conf,
then redo the mounts. There should be some debug logs in
/var/log/messages. Please post those debug logs.

Comment 8 Nicolas Mitsis 2013-11-26 20:45:53 UTC

I get no logs with rpc.idmapd stopped.

However, with rpc.idmapd running I get:

Nov 26 21:59:42 host rpc.idmapd[28813]: Client aa: (user) name "user@domain" -> id "1755401773"
Nov 26 21:59:42 host rpc.idmapd[28813]: Client aa: (group) name "user@domain" -> id "1755401773"

I've run a test with nfsv3 and user IDs work OK; the problem exists only with nfsv4:

# ls -ld /home/user
drwx--x--x. 15 4294967294 4294967294 4.0K Oct 19 10:39 /home/user

# grep user /etc/mtab 
nfs.domain:/user /home/user nfs4 rw,sec=krb5p,soft,rsize=8192,wsize=8192,sloppy,addr=10.1.1.1,clientaddr=10.1.1.2 0 0

With nfsv3:

# ls -ld /mnt/user
drwx--x--x. 15 user user 4.0K Oct 19 10:39 /mnt/user

# grep mnt /etc/mtab 
nfs:/home /mnt nfs rw,nolock,addr=10.1.1.2 0 0
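The NFSv3 mount shows the right owner while the NFSv4 one does not, which fits the protocol difference: NFSv3 carries numeric uid/gid on the wire, while NFSv4 normally carries user@domain owner strings that the client must map (through rpc.idmapd or nfsidmap). A small sketch for listing which mounts on a client depend on that mapping, assuming the mounts-file layout seen in the /etc/mtab lines above; the helper name is ours.

```shell
# List mount points whose filesystem type is nfs4 in a mounts-format
# file (device, mount point, type, options, dump, pass).
# Defaults to /proc/mounts; pass /etc/mtab or a saved copy instead.
nfs4_mountpoints() {
    awk '$3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Usage:
#   nfs4_mountpoints              # live mounts
#   nfs4_mountpoints /etc/mtab
```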

*** Config files follow ***

# grep -v ^# /etc/idmapd.conf | uniq
[General]
Verbosity = 2

[Mapping]

Nobody-User = nobody
Nobody-Group = nobody

[Translation]

Method = nsswitch

# grep -v ^# /etc/sysconfig/nfs | uniq 
SECURE_NFS="yes"

# grep -v ^# /etc/request-key.d/id_resolver.conf | uniq 
create    id_resolver    *         *    /usr/sbin/nfsidmap %k %d

# cat /etc/autofs_ldap_auth.conf 
<?xml version="1.0" ?>
<!--
This files contains a single entry with multiple attributes tied to it.
See autofs_ldap_auth.conf(5) for more information.
-->

<autofs_ldap_sasl_conf
	usetls="no"
	tlsrequired="no"
	authrequired="yes"
	authtype="GSSAPI"
	clientprinc="host/ipa.domain@/DOMAIN"
/>

# grep -v ^# /etc/nsswitch.conf  | uniq

passwd:     files sss
shadow:     files sss
group:      files sss

hosts:      files dns

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files sss

netgroup:   files sss

publickey:  nisplus

automount:  files sss
aliases:    files nisplus

sudoers:    files ldap

# grep autofs /etc/sssd/sssd.conf 
services = nss, pam, autofs, ssh, sudo
[autofs]

Comment 9 John T. Rose 2013-11-26 21:04:24 UTC

(In reply to Nicolas Mitsis from comment #6)
> With rpcidmapd off I've got the following issue using automount and nfs with
> kerberos/IPA:
> 
> * with rpcidmapd off, the user can log in and create/modify/etc. his own
> files, but when he does ls -l it shows:
> 
> # ls -ld /home/user
> drwx-----x. 5 4294967294 4294967294 4.0K 2013-10-29 16:23 /home/user

FWIW I'm also using automount and nfs4 with kerberos (in this case AD) and with rpcidmapd off I'm not seeing this issue. Everything appears to be working normally here. If I can be of any help debugging let me know and I'll try to help.

Comment 10 Steve Dickson 2013-12-02 13:46:18 UTC

(In reply to Nicolas Mitsis from comment #6)
> With rpcidmapd off I've got the following issue using automount and nfs with
> kerberos/IPA:
> 
> * with rpcidmapd off, the user can log in and create/modify/etc. his own
> files, but when he does ls -l it shows:
> 
> # ls -ld /home/user
> drwx-----x. 5 4294967294 4294967294 4.0K 2013-10-29 16:23 /home/user
> 
> The number 4294967294 (nobody?) is not his ID, which resolves correctly:
Hmm... That was how nfsnobody was being set... What version of 
nfs-utils are you using?

Comment 11 Rob Henderson 2013-12-03 14:36:31 UTC

Just another data point.  We are also seeing this problem but not repeatably.  We are using nfsv4+automount+kerberos/AD and recently upgraded a number of systems to 6.5.  Shortly after, we got a couple complaints that people were seeing all their files owned by uid 4294967294 (although the gid was correct).  I was able to su to a couple different users and see the incorrect uid and the problem disappeared as soon as I started rpcidmapd on the client.  I assumed that the problem was that rpcidmapd was not running so didn't do a lot of debugging other than just starting that service.  Later I did try to stop the service on one system and then restart nslcd and nscd but the problem did *not* return.

Since then, I have looked more closely at a number of other systems that were also upgraded to 6.5 at the same time.  In every other case that I have checked, the UID mapping is working correctly without rpcidmapd running.  So, at this point, I definitely confirmed a couple instances of the problem but am not able to reproduce it now.

All the client systems in question were recently upgraded to rhel 6.5 with nfs-utils-1.2.3-39 and the file servers are running nfs-utils-1.2.3-36.

Comment 12 Nicolas Mitsis 2013-12-03 16:43:57 UTC

Sorry for the long delay. Since this was temporarily fixed by starting rpcidmapd, it got a low priority and I didn't do much debugging.

After rebooting the servers today (did maintenance on the host servers) the problem went away. That is, uid/gid resolves correctly without rpcidmapd running on the client. I do not know why there was a problem at first, and I haven't changed any configuration since then. Note that the servers were rebooted after upgrading to 6.5; this was a second reboot. Also, I'm running neither nslcd nor nscd, just sssd.

If anyone requires more info I'm happy to help.

Comment 13 Steve Dickson 2013-12-04 11:43:20 UTC

(In reply to Nicolas Mitsis from comment #12)
> Sorry for the long delay. Since this was temporarily fixed by starting
> rpcidmapd, it got a low priority and I didn't do much debugging.
>
> After rebooting the servers today (did maintenance on the host servers) the
> problem went away. That is, uid/gid resolves correctly without rpcidmapd
> running on the client. I do not know why there was a problem at first, and I
> haven't changed any configuration since then. Note that the servers were
> rebooted after upgrading to 6.5; this was a second reboot. Also, I'm
> running neither nslcd nor nscd, just sssd.
> 
> If anyone requires more info I'm happy to help.

hmm... that is odd... I'm glad things got straightened out but I have
to wonder if there is something going on during the upgrade
that is causing this problem...

Comment 14 kerickso 2013-12-10 05:06:05 UTC

We are experiencing this, too.  With LDAP + autofs + krb5 + sssd, all automounted nfs4 directories have every uid:gid as 4294967294:4294967294.  This is under 6.5 Server.  Will try starting rpcidmapd on the client.

Comment 15 kerickso 2013-12-10 05:08:44 UTC

# service rpcidmapd start
Starting RPC idmapd:                                       [  OK  ]


Problem fixed.  This is a really serious issue, in that 1) user action is required, and 2) a deprecated daemon (client side idmapd) is required to reinstate previously working functionality.  What is also very unhelpful is that the issue only occurs after the existing caches expire (such as on a reboot), so it is very hard to track down the cause.

Please fix.

Comment 16 Steve Dickson 2013-12-10 21:34:28 UTC

(In reply to kerickso from comment #15)
> # service rpcidmapd start
> Starting RPC idmapd:                                       [  OK  ]
> 
> 
> Problem fixed.  This is a really serious issue, in that 1) user action is
> required, and 2) a deprecated daemon (client side idmapd) is required to
> reinstate previously working functionality.  What is also very unhelpful is
> that the issue only occurs after the existing caches expire (such as on a
> reboot), so it is very hard to track down the cause.
To debug this, Verbose=2 needs to be set in /etc/idmapd.conf.
Stop the rpc.idmapd daemon and allow the nfsidmap command, which
is called by the kernel, to do the ID mapping.

The configuration for this new kernel keyring based ID 
mapping lives in /etc/request-key.d/id_resolver.conf. 
Adding a "-v" to the nfsidmap command line in that file
will also enable debugging.

Once things are reset back to having the nfsidmap command, with 
debugging on, please look in /var/log/messages for any messages
from the nfsidmap command or kernel keyring code and post
them... 
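The id_resolver.conf part of this debugging change can be scripted. A minimal sketch, assuming the stock line quoted in Comment 8 (`create id_resolver * * /usr/sbin/nfsidmap %k %d`); the function name and the `.orig` backup convention are hypothetical, not part of nfs-utils.

```shell
# Hypothetical helper: add -v to the nfsidmap call in a request-key
# config such as /etc/request-key.d/id_resolver.conf. Comment lines and
# lines that already pass -v (or -vv) are left alone; the first pristine
# copy is kept as <file>.orig so the change is easy to revert.
enable_nfsidmap_debug() {
    local conf="$1"
    [ -e "$conf.orig" ] || cp -p "$conf" "$conf.orig" || return 1
    sed '/^[[:space:]]*#/!{
/nfsidmap -v/!s#\(/usr/sbin/nfsidmap\)\([[:space:]]\)#\1 -v\2#
}' "$conf" > "$conf.tmp" || return 1
    # Rewrite in place (rather than mv) to keep the file's owner and mode.
    cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}

# Usage (as root):
#   enable_nfsidmap_debug /etc/request-key.d/id_resolver.conf
```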

> 
> Please fix.
Understood... working on it...

Comment 17 kerickso 2013-12-10 23:12:07 UTC

I'm afraid this first round of testing is not going to be too helpful.

First, I stopped the daemon.  Then I put Verbose=2 (with a capital V and no spaces, as you wrote) in idmapd.conf, and I added the -v to id_resolver.conf.  Then I did ls -la /dir/to/autoNfsMount, and saw the uid/gid of 4 billion again.  I tailed /var/log/messages, and saw nothing.  I checked the last modification time, and it was a while back (the last message in there is a DHCP renewal).

I rebooted to try a fresh start.

I ran service rpcidmapd status to ensure that it was stopped, repeated the ls, and repeated the tail.  Here's the messages log:

Dec 10 18:08:13 myhostname kernel: RPC: Registered udp transport module.
Dec 10 18:08:13 myhostname kernel: RPC: Registered tcp transport module.
Dec 10 18:08:13 myhostname kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Dec 10 18:08:13 myhostname kernel: FS-Cache: Netfs 'nfs' registered for caching


Not much help, I'm afraid.

Comment 18 Steve Dickson 2013-12-12 19:26:34 UTC

First of all, thank you for making this effort!

(In reply to kerickso from comment #17)
> I'm afraid this first round of testing is not going to be too helpful.
> 
> First, I stopped the daemon.  Then I put Verbose=2 (with a capital V and no
> spaces, as you wrote) in idmapd.conf, 
Well, I mis-typed... You want to set Verbosity = 2 in idmapd.conf.
The variable is actually commented out.
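Uncommenting the setting is a one-line edit. A minimal sketch, assuming the shipped idmapd.conf carries a commented `#Verbosity = 0` line under `[General]` (if there is no Verbosity line at all, this helper changes nothing); the helper name is hypothetical.

```shell
# Hypothetical helper: set (and uncomment) "Verbosity = N" in an
# idmapd.conf-style file, keeping the first pristine copy as <file>.orig.
set_idmapd_verbosity() {
    local conf="$1" level="${2:-2}"
    [ -e "$conf.orig" ] || cp -p "$conf" "$conf.orig" || return 1
    sed "s/^[[:space:]]*#*[[:space:]]*Verbosity[[:space:]]*=.*/Verbosity = $level/" \
        "$conf" > "$conf.tmp" || return 1
    cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}

# Usage (as root):
#   set_idmapd_verbosity /etc/idmapd.conf 2
```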

> and I added the -v to id_resolver.conf.  
hmm... this is odd... so the line in id_resolver.conf should look like:
create    id_resolver    *         *    /usr/sbin/nfsidmap -vv %k %d

> Then I did ls -la /dir/to/autoNfsMount, and saw the
> uid/gid of 4 billion again.  I tailed /var/log/messages, and saw nothing.  I
> checked the last modification time, and it was a while back (the last
> message in there is a DHCP renewal).
During the mount, the messages should look something like:
Dec 10 09:50:46 rhel6 nfsidmap[7247]: key: 0x345872c6 type: uid value: root@DNS_Domain timeout 600
Dec 10 09:50:46 rhel6 nfsidmap[7249]: key: 0x138edd0d type: gid value: root@DNS_Domain timeout 600
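When looking for these lines after a mount, it helps to filter them out of the rest of the log. A small sketch, assuming syslog lines in exactly the layout shown above (`nfsidmap[pid]: key: ... type: ... value: ... timeout ...`); other nfs-utils versions may word their messages differently, and the helper name is ours.

```shell
# Hypothetical helper: print "type value timeout" for each nfsidmap
# mapping line in a syslog file (default /var/log/messages).
nfsidmap_mappings() {
    awk '$5 ~ /^nfsidmap\[/ {
        t = ""; v = ""; to = ""
        for (i = 6; i < NF; i++) {
            if ($i == "type:")   t  = $(i + 1)
            if ($i == "value:")  v  = $(i + 1)
            if ($i == "timeout") to = $(i + 1)
        }
        print t, v, to
    }' "${1:-/var/log/messages}"
}

# Usage:
#   nfsidmap_mappings | sort | uniq -c
```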

> 
> I rebooted to try a fresh start.
> 
> I ran service rpcidmapd status to ensure that it was stopped, repeated the
> ls, and repeated the tail.  Here's the messages log:
> 
> Dec 10 18:08:13 myhostname kernel: RPC: Registered udp transport module.
> Dec 10 18:08:13 myhostname kernel: RPC: Registered tcp transport module.
> Dec 10 18:08:13 myhostname kernel: RPC: Registered tcp NFSv4.1 backchannel
> transport module.
> Dec 10 18:08:13 myhostname kernel: FS-Cache: Netfs 'nfs' registered for
> caching
This is normal noise, which happens when the kernel modules get loaded.

> 
> 
> Not much help, I'm afraid.
It's a beginning!!!


A couple questions:

1) what server is everyone using?

2) A DNS domain name is or is not set?

Comment 19 kerickso 2013-12-13 13:41:19 UTC

(In reply to Steve Dickson from comment #18)
> First of all, thank you for making this effort!
np


> Well, I mis-typed... You want to set Verbosity = 2 in idmapd.conf.
> The variable is actually commented out.

Done

> > and I added the -v to id_resolver.conf.  
> hmm... this is odd... so the line in id_resolver.conf should look like:
> create    id_resolver    *         *    /usr/sbin/nfsidmap -vv %k %d

So do you want one v or two?  You asked for one, but your thing above has two.  I'll switch to two until I hear back.


With all of these changes and stopping rpcidmapd, once again /var/log/messages has nothing.  I'll reboot.

After rebooting, I get the exact same contents in /var/log/messages:
Dec 13 08:39:42 myhostname kernel: RPC: Registered udp transport module.
Dec 13 08:39:42 myhostname kernel: RPC: Registered tcp transport module.
Dec 13 08:39:42 myhostname kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Dec 13 08:39:42 myhostname kernel: FS-Cache: Netfs 'nfs' registered for caching

> A couple questions:
> 
> 1) what server is everyone using?
See Comment #14: RHEL6.5 Server
 
> 2) A DNS domain name is or is not set?
It is set via DHCP

Comment 20 Steve Dickson 2013-12-13 19:52:50 UTC

(In reply to kerickso from comment #19)
> (In reply to Steve Dickson from comment #18)
> > First of all, thank you for making this effort!
> np
> 
> 
> > Well, I mis-typed... You want to set Verbosity = 2 in idmapd.conf.
> > The variable is actually commented out.
> 
> Done
> 
> > > and I added the -v to id_resolver.conf.  
> > hmm... this is odd... so the line in id_resolver.conf should look like:
> > create    id_resolver    *         *    /usr/sbin/nfsidmap -vv %k %d
> 
> So do you want one v or two?  You asked for one, but your thing above has
> two.  I'll switch to two until I hear back.
Two is fine... It just enables more debugging...

Actually, you don't need both... Verbosity = 2 and
-vv basically do the same thing...

> 
> 
> With all of these changes and stopping rpcidmapd, once again
> /var/log/messages has nothing.  I'll reboot.
This is bizarre.... When you start the rpc.idmapd daemon with
Verbosity = 2 are there any debugging messages logged?

> > A couple questions:
> > 
> > 1) what server is everyone using?
> See Comment #14: RHEL6.5 Server
Sorry I missed that... 

>  
> > 2) A DNS domain name is or is not set?
> It is set via DHCP
Good... 

Just to be clear... With Verbosity = 2 set, you do the mount
and the ls and nothing is logged to /var/log/messages with 
or without the rpc.idmapd daemon running?

Comment 21 Sami Kähkönen 2013-12-18 15:28:06 UTC

(In reply to kerickso from comment #14)
> We are experiencing this, too.  With LDAP + autofs + krb5 + sssd, all
> automounted nfs4 directories have every uid:gid as 4294967294:4294967294. 
> This is under 6.5 Server.  Will try starting rpcidmapd on the client.

We had this too, but only for some users at first. The reason, AFAIK, was that nfsidmap uses the kernel keyring and root's key quota (I suppose this is the real bug here). For the first 200 users and groups, the default kernel.keys.root_maxkeys = 200 is enough.

After the quota is full (see /proc/key-users), nfsidmap failure manifests as uid=4294967294 (this is int -2 interpreted as uint) for users who try to log in later on, if the rpc.idmapd service is not running.
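To see how close root is to that quota, root's line in /proc/key-users is enough. A minimal sketch, assuming the field layout described in the kernel's key-management documentation (`uid: usage nkeys/nikeys qnkeys/maxkeys qnbytes/maxbytes`); the helper name is hypothetical.

```shell
# Hypothetical helper: summarize root's key quota from a
# /proc/key-users style file (default /proc/key-users).
root_key_quota() {
    awk '$1 == "0:" {
        split($4, k, "/")   # quota keys used / maxkeys
        split($5, b, "/")   # quota bytes used / maxbytes
        pct = (k[2] > 0) ? int(100 * k[1] / k[2]) : 0
        printf "keys %d/%d (%d%%) bytes %d/%d\n", k[1], k[2], pct, b[1], b[2]
    }' "${1:-/proc/key-users}"
}

# Usage (as root, compare against kernel.keys.root_maxkeys/root_maxbytes):
#   root_key_quota
```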

We have (/etc/sysctl.conf):
 kernel.keys.root_maxbytes = 500000
 kernel.keys.root_maxkeys = 25000
but after two weeks uptime even this is not enough.
Only option was to (re)enable rpc.idmapd.

This problem has already been seen in Fedora 19:
  https://bugzilla.redhat.com/show_bug.cgi?id=876705
but not really solved.
The real bug is that kernel-keyring-based ID mapping uses root's quota. Also, it should give some kind of warning on failure when the quota is full.
Keyring allocations seem to be permanent (see /proc/keys), but I cannot say if this is the required behaviour or not.

Our setup is also RHEL6.5, nfs4 automounted home directories with ldap, krb5+sssd.

Actually, we first noticed this because sssd also uses the kernel keyring, and we had these messages in /var/log/messages after restarting the sssd service:

sssd: Could not create private keyring session. If you store password there they may be easily accessible to the root user. (122, Disk quota exceeded)

sssd: Could not set permissions on private keyring. If you store password there they may be easily accessible to the root user. (13, Permission denied)

Comment 22 kerickso 2013-12-18 19:38:45 UTC

(In reply to Steve Dickson from comment #20)
> > With all of these changes and stopping rpcidmapd, once again
> > /var/log/messages has nothing.  I'll reboot.
> This is bizarre.... When you start the rpc.idmapd daemon with
> Verbosity = 2 are there any debugging messages logged?

# service rpcidmapd start
Starting RPC idmapd: rpc.idmapd: libnfsidmap: using (default) domain: mydomain
rpc.idmapd: libnfsidmap: Realms list: 'MYREALM' 
rpc.idmapd: libnfsidmap: processing 'Method' list
rpc.idmapd: libnfsidmap: loaded plugin /usr/lib64/libnfsidmap/nsswitch.so for method nsswitch

> Just to be clear... With Verbosity = 2 set, you do the mount
> and the ls and nothing is logged to /var/log/messages with 
> or without the rpc.idmapd daemon running?

The mounts are automounted, so I'm not mounting them directly.  I am, however, ls'ing a directory that's currently not there so that automount kicks in and automounts it.

Comment 23 Steve Dickson 2013-12-24 14:29:15 UTC

(In reply to Sami Kähkönen from comment #21)
> (In reply to kerickso from comment #14)
> > We are experiencing this, too.  With LDAP + autofs + krb5 + sssd, all
> > automounted nfs4 directories have every uid:gid as 4294967294:4294967294. 
> > This is under 6.5 Server.  Will try starting rpcidmapd on the client.
> 
> We had this too, but only for some users at first. The reason, AFAIK, was
> that nfsidmap uses the kernel keyring and root's key quota (I suppose this
> is the real bug here). For the first 200 users and groups, the default
> kernel.keys.root_maxkeys = 200 is enough.
> 
> After the quota is full (see /proc/key-users), nfsidmap failure manifests
> as uid=4294967294 (this is int -2 interpreted as uint) for users who try
> to log in later on, if the rpc.idmapd service is not running.
> 
> We have (/etc/sysctl.conf):
>  kernel.keys.root_maxbytes = 500000
>  kernel.keys.root_maxkeys = 25000
> but after two weeks uptime even this is not enough.
> Only option was to (re)enable rpc.idmapd
Thank you for doing this debugging... 

> 
> This problem has already been seen in Fedora 19:
>   https://bugzilla.redhat.com/show_bug.cgi?id=876705
I'll look to get these patches into RHEL 6.6 and the Z-streams...

> but not really solved.
> The real bug is that kernel-keyring-based ID mapping uses root's quota. Also,
> it should give some kind of warning on failure when the quota is full.
> Keyring allocations seem to be permanent (see /proc/keys), but I cannot say
> if this is the required behaviour or not.
I'll talk to upstream to see about this...

> 
> Our setup is also RHEL6.5, nfs4 automounted home directories with ldap,
> krb5+sssd.
> 
> Actually, we first noticed this because sssd also uses the kernel keyring,
> and we had these messages in /var/log/messages after restarting the sssd
> service:
The nfsidmap command should definitely log some type of error/warning..

Thanks again!
 
> 
> sssd: Could not create private keyring session. If you store password there
> they may be easily accessible to the root user. (122, Disk quota exceeded)
> 
> sssd: Could not set permissions on private keyring. If you store password
> there they may be easily accessible to the root user. (13, Permission denied)

Comment 24 chruitad 2014-01-08 21:49:02 UTC

I just found this error too and I am not using autofs.  The problem I am experiencing is likely linked to the fact that rpc.idmapd is no longer a service:

[root@host etc]# service --status-all | grep rpc
rpc.svcgssd is stopped
rpc.mountd is stopped
rpc.rquotad is stopped
rpc.statd (pid  5156) is running...
rpcbind (pid  1855) is running...
rpc.gssd is stopped
rpc.idmapd is stopped
rpc.svcgssd is stopped

[root@host etc]# service rpc.idmapd start
rpc.idmapd: unrecognized service

Yet, the /usr/sbin/rpc.idmapd binary exists

I agree that regression testing should have caught this problem before updates were released. 

Please fix this ASAP or provide a temporary work around.  Servers with the most recent updates are effectively useless while NFS Client functionality is broken.

Comment 25 chruitad 2014-01-08 22:09:24 UTC

I should add that this build of nfs-utils breaks the ability to use the NFS client.  When I try to mount, I get the following error:

mount.nfs: mounting host:/share failed, reason given by server: No such file or directory

However, when I do a "showmount -e server", I can see the list of exports.  This share is still available on machines that did not get a recent "update"

Please advise!

Let me know how I can debug this further if you need help fixing this side effect.

Comment 26 kerickso 2014-01-12 04:28:17 UTC

What other information do you need?  We really need a fix on this.  You completely broke RHEL6.5 for a very common use case.  Do I have to go through different RH support channels?

Comment 27 Steve Dickson 2014-01-13 15:36:14 UTC

(In reply to kerickso from comment #26)
> What other information do you need?  We really need a fix on this.  You
> completely broke RHEL6.5 for a very common use case.  Do I have to go
> through different RH support channels?

For now, the workaround is to re-enable rpc.idmapd, since the
final fix will be a kernel patch (one that increases the keyring size).

Comment 28 Hugh MacMullan IV 2014-01-17 17:31:11 UTC

(In reply to chruit from comment #24)

> rpc.svcgssd is stopped
> 
> [root@host etc]# service rpc.idmapd start
> rpc.idmapd: unrecognized service
> 
> Yet, the /usr/sbin/rpc.idmapd binary exists

Hi! In case you didn't figure it out, or if it's useful to others, the service is 'rpcidmapd', no '.'. On all clients we did:

chkconfig --add rpcidmapd
service rpcidmapd start
(or reboot)

And all's well for now.
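With many clients updating automatically (the situation in Comment 29), it may help to detect the deleted registration before repairing it. A minimal sketch, assuming the usual `chkconfig --list` table layout (`rpcidmapd  0:off 1:off 2:on 3:on ...`); the helper name is hypothetical, and the repair commands are exactly the ones from this comment.

```shell
# Hypothetical check: read `chkconfig --list rpcidmapd` output on stdin
# and succeed only if rpcidmapd is "on" in the given runlevel (default 3).
# A service that was removed with `chkconfig --del` no longer prints a
# runlevel table, so it fails the check.
rpcidmapd_on_in_runlevel() {
    local rl="${1:-3}"
    awk -v rl="$rl" '$1 == "rpcidmapd" {
        for (i = 2; i <= NF; i++) if ($i == rl ":on") found = 1
    } END { exit !found }'
}

# Usage on a RHEL 6 client (as root):
#   chkconfig --list rpcidmapd 2>&1 | rpcidmapd_on_in_runlevel 3 ||
#       { chkconfig --add rpcidmapd && service rpcidmapd start; }
```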

Comment 29 kerickso 2014-01-20 02:36:28 UTC

(In reply to Steve Dickson from comment #27)
> (In reply to kerickso from comment #26)
> > What other information do you need?  We really need a fix on this.  You
> > completely broke RHEL6.5 for a very common use case.  Do I have to go
> > through different RH support channels?
> 
> For now, the workaround is to re-enable rpc.idmapd, since the
> final fix will be a kernel patch (one that increases the keyring size).

The "workaround" is really not very useful when you have hundreds of machines automatically updating and thus needing manual repair.  Further, that kernel fix is not guaranteed to fix all of the problems.  As I said in a post that is mysteriously gone, in our situation the problem does not arise after X users connect as stated in Comment 21.  It is present when even a single user connects.  Therefore, the problem needs further investigation.

I've opened a ticket with RH using our support contract, citing this one, and requesting management escalation.  Hopefully something gets done soon, as I don't think the urgency of this fix is appropriately understood by the RH staff.

Comment 32 Steve Dickson 2014-02-10 13:03:55 UTC

The problem there is defined in
   https://bugzilla.redhat.com/show_bug.cgi?id=1033708

The kernel fix has recently gone upstream with this bz
   https://bugzilla.redhat.com/show_bug.cgi?id=876705

and that bz also talks about manually increasing the keyring
size. I'm hoping that backport will make it into 6.6.
I'm cc-ing David, who is working on it, so maybe he
can shed some light on that...

but for now the two workarounds are 
   1) enable the rpc.idmapd as it says in Comment 28
   2) Increase the keyring size like it says in bz876705

Comment 33 kerickso 2014-02-10 14:27:58 UTC

(In reply to Steve Dickson from comment #32)
> The problem there is defined in
>    https://bugzilla.redhat.com/show_bug.cgi?id=1033708
> 
> The kernel fix has recently gone upstream with this bz
>    https://bugzilla.redhat.com/show_bug.cgi?id=876705
> 
> and that bz also talks about manually increasing the keyring
> size. I'm hoping that backport will make it into 6.6.
> I'm cc-ing David, who is working on it, so maybe he
> can shed some light on that...
> 
> but for now the two workarounds are 
>    1) enable the rpc.idmapd as it says in Comment 28
>    2) Increase the keyring size like it says in bz876705

The keyring bug describes a problem where the 201st uid is screwed up.  I am describing a problem where every user is screwed up.  Logging in with just one person mounting a directory with just one owner results in broken ownership.

As for the kernel change, this needs to make it into 6.5, not 6.6.  Unless you intend to get 6.6 out the door yesterday.

Comment 34 bcodding 2014-03-20 17:48:01 UTC

There are two issues at play here.  The first is that the kernel's id_resolver keyring is limited to 509 entries on x86_64 and 1020 on a 32 bit system, so if you have more distinct user/groups than that, the idmapper is going to get an error returned in request_key and fall back to doing the upcall to rpc.idmapd.
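The two limits quoted here are consistent with one page of key pointers per keyring. A back-of-the-envelope sketch, ASSUMING a 4 KiB page and the older (pre-3.13) keyring layout in which a small header (an rcu_head plus three 16-bit counters, padded to 24 bytes on 64-bit and 16 bytes on 32-bit) precedes an array of key pointers; these sizes are our reading of older kernels, not something stated in this bug.

```shell
# Sketch: key pointers that fit in one page after a fixed header.
# All three inputs are assumptions about the old keyring layout.
keyring_capacity() {
    local page="$1" header="$2" ptr="$3"
    echo $(( (page - header) / ptr ))
}

keyring_capacity 4096 24 8   # 64-bit: 509
keyring_capacity 4096 16 4   # 32-bit: 1020
```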

The fallback results in a leaked key that counts against root's quota and cannot be cleaned up with `nfsidmap -c`.  Even if you disable nfsidmap in request-key's configuration, each lookup results in a leaked key.  Even worse, each lookup results in the kernel forking request-key attempting to instantiate the key, then a fallback to the upcall mech, which performs much worse than just performing the upcall in the first place.

Enable rpc.idmapd if you have more users/groups than 502.  You can get a /little/ bit of caching by purging the id_resolv keyring with `nfsidmap -c` when it gets close to full, but if you overrun it, you're leaking keys against root's key quota, and if the quota fills, you're stuck in the exec-request-key, fail, try-the-upcall path.
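A sketch of the "purge before it fills" idea, assuming root can read /proc/keys and that the key type is its eighth column (that column may be truncated, e.g. to `id_resolv`, so the match is on that prefix). The helper name and the threshold are hypothetical placeholders to tune below the per-keyring limit, and, as noted above, this only helps before the keyring overflows.

```shell
# Hypothetical helper: count id_resolver keys in a /proc/keys style
# listing and succeed if the count has reached the given threshold.
id_resolver_purge_due() {
    local threshold="$1" file="${2:-/proc/keys}" n
    n=$(awk '$8 ~ /^id_resolv/ { c++ } END { print c + 0 }' "$file") || return 2
    echo "$n id_resolver keys"
    [ "$n" -ge "$threshold" ]
}

# e.g. from root's cron, with a placeholder threshold:
#   id_resolver_purge_due 450 && nfsidmap -c
```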

Comment 36 bcodding 2014-03-21 16:21:37 UTC

Steve, can you share any details about the fix?  I couldn't find it on the lists.  I've been working on getting nfsidmap to hang keys off subrings of the id_resolv keyring.

Comment 37 Steve Dickson 2014-03-21 18:26:49 UTC

Very recently there were upstream patches that increased
the keyring size. It would be very difficult to back port
those patches to RHEL6. So I've decided to do two things:

1) re-enable rpc.idmapd on the NFS client
2) remove the nfsidmap command since it is no longer needed.

These bits are available in nfs-utils-1.2.3-40.el6

Comments welcome on this approach!

Comment 38 bcodding 2014-03-21 19:36:06 UTC

Great to hear we'll have a fix soon!

I've got an approach where nfsidmap takes a command-line parameter specifying the number of child keyrings to hang off the .id_resolver root keyring.  Then, it instantiates the new keys on the least-filled child keyring.

Any interest in that before you rip out nfsidmap?  That would make the nfsidmap method functional until we can get caught up to the keyring size increase.

With a little extra logic, it could probably detect when the child keyrings are getting full and add a new one each time.
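The least-filled placement and the "add a new child when they fill up" extension can be modeled as a toy in Python. This is not the actual nfsidmap patch (which is C against the kernel keyring API); `ChildKeyring`, `IdResolverCache` and the per-keyring limit are hypothetical stand-ins, and only the `.id_resolver_child_N` naming is borrowed from the output in Comment 70.

```python
from dataclasses import dataclass, field


@dataclass
class ChildKeyring:
    """Stand-in for one child keyring hung off the root (hypothetical)."""
    name: str
    keys: list = field(default_factory=list)


class IdResolverCache:
    """Toy model: spread keys across child keyrings, least-filled first."""

    def __init__(self, per_keyring_limit=509, initial_children=1):
        # 509 is the x86_64 per-keyring figure quoted earlier in this bug.
        self.limit = per_keyring_limit
        self.children = []
        for _ in range(initial_children):
            self._add_child()

    def _add_child(self):
        # The root keyring is itself limited, so it can only hold so many children.
        if len(self.children) >= self.limit:
            raise RuntimeError("root keyring has no room for another child")
        child = ChildKeyring(f".id_resolver_child_{len(self.children) + 1}")
        self.children.append(child)
        return child

    def instantiate(self, key):
        """Store key in the least-filled child, adding a child if all are full."""
        target = min(self.children, key=lambda c: len(c.keys), default=None)
        if target is None or len(target.keys) >= self.limit:
            target = self._add_child()
        target.keys.append(key)
        return target.name
```

With a limit of 3 and two initial children, seven keys alternate between `child_1` and `child_2` until both hold 3, and the seventh key triggers creation of `child_3`.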

Comment 39 bcodding 2014-03-21 21:23:28 UTC

Created attachment 877507 [details]
Use multiple keyrings for nfsidmap to work around keyring limits

Here's a patch that keeps nfsidmap and works around the keyring limitations by using a specified number of keyrings instead of a single keyring.

Comment 40 Steve Dickson 2014-03-22 10:50:25 UTC

(In reply to bcodding from comment #38)
> Great to hear we'll have a fix soon!
> 
> I've got an approach where nfsidmap takes a command-line parameter
> specifying the number of child keyrings to hang off the .id_resolver root
> keyring.  Then, it instantiates the new keys on the least-filled child
> keyring.
> 
> Any interest in that before you rip out nfsidmap?  That would make the
> nfsidmap method functional until we can get caught up to the keyring size
> increase.
To be quite frank, I'm not sure it's legal to just rip a command out of 
a RHEL release... so the RHEL police might be knocking on my door... ;-)
 
> 
> With a little extra logic, it could probably detect when the child keyrings
> are getting full and add a new one each time.
With this patch, how many id/gid keys are possible?

Also the patch needs to update the nfsidmap(5) man page and would 
you mind posting the patch to linux-nfs.org....

Comment 41 bcodding 2014-03-22 21:32:30 UTC

(In reply to Steve Dickson from comment #40)
> With this patch, how many id/gid keys are possible?

In theory, about 508^2 on x86_64, which fixes things for us since we have fewer than 100K distinct names.  This approach could be extended to create more parent/child relationships, which would continue to scale.
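The estimate works out as follows, assuming the root .id_resolver keyring holds roughly 508 child keyrings and each child holds roughly 508 keys; each further level of nesting multiplies the capacity by another 508.

```latex
\begin{aligned}
N_{\text{one level of children}}  &\approx 508 \times 508 = 508^{2} = 258{,}064 \\
N_{\text{two levels of children}} &\approx 508^{3} = 131{,}096{,}512
\end{aligned}
```

One level of children is already well above the fewer than 100K distinct names mentioned here.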

> Also the patch needs to update the nfsidmap(5) man page and would 
> you mind posting the patch to linux-nfs.org....

Sure, I'll do that and put it out to the list.

Comment 43 bcodding 2014-03-25 14:03:07 UTC

If rpc.idmapd is re-enabled, there are going to be big performance problems for each lookup.  The kernel is going to try to exec request-key before falling back to rpc.idmapd, which is much worse than just doing the upcall to rpc.idmapd in the first place.  This is a problem that can't be fixed in nfs-utils.
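A toy model of the lookup order described here may make the cost clearer. It is an illustration only, not kernel code, and the step and outcome labels are hypothetical: whenever the key cannot be instantiated (for example, the keyring is full), every lookup pays for the request-key exec *and* the rpc.idmapd upcall.

```python
def resolve_id(keyring_has_room, nfsidmap_ok, idmapd_running):
    """Toy model of the NFS client ID lookup order described above.

    Returns (steps_taken, outcome). Labels are illustrative only.
    """
    # The kernel always tries request-key first, which execs nfsidmap.
    steps = ["exec request-key -> nfsidmap"]
    if keyring_has_room and nfsidmap_ok:
        return steps, "mapped via keyring"
    # The key could not be instantiated: fall back to the rpc.idmapd upcall.
    steps.append("upcall to rpc.idmapd")
    if idmapd_running:
        return steps, "mapped via rpc.idmapd"
    return steps, "unmapped (e.g. nobody)"
```

With a full keyring and rpc.idmapd running, `resolve_id(False, True, True)` takes both steps on every call, which is the double-lookup overhead described in this comment.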

However, I think we should try to get nfsidmap to use multiple keyrings, which could increase the id_resolver cache enough to avoid this problem.  I've re-worked the multiple keyring approach in nfsidmap so that no additional command-line params are required - instead nfsidmap just adds additional keyrings as they fill up.  Would you take a look?

Comment 44 bcodding 2014-03-25 14:04:35 UTC

Created attachment 878472 [details]
[PATCH 1/2] nfsidmap: Match names with kernel default keyring

Comment 45 bcodding 2014-03-25 14:05:12 UTC

Created attachment 878473 [details]
[PATCH 2/2] nfsidmap: Create id_resolver child keyrings

Comment 46 kerickso 2014-03-25 18:33:14 UTC

(In reply to bcodding from comment #43)
> If rpc.idmapd is re-enabled, there's going to be big performance problems
> for each lookup.  The kernel is going to try to exec request-key before
> falling back to rpc.idmapd, which is much worse than just doing the upcall
> to rpc.idmapd in the first place.  This is a problem that can't get fixed in
> nfs-utils.

Then where can it get fixed?

More specifically, how can we restore RHEL to the state before Steve bollocksed it up?  It really baffles me that RH is letting this farce go on for so long.

Comment 47 Steve Dickson 2014-03-25 19:22:15 UTC

(In reply to bcodding from comment #43)
> If rpc.idmapd is re-enabled, there's going to be big performance problems
> for each lookup.  The kernel is going to try to exec request-key before
> falling back to rpc.idmapd, which is much worse than just doing the upcall
> to rpc.idmapd in the first place.  This is a problem that can't get fixed in
> nfs-utils.
I did take a look and, sure enough, the exec of request-key is done
and then the upcall to rpc.idmapd, and it's all hard-coded, meaning
there is no kernel config option or #ifdef surrounding those calls.

> 
> However, I think we should try to get nfsidmap to use multiple keyrings,
> which could increase the id_resolver cache enough to avoid this problem. 
> I've re-worked the multiple keyring approach in nfsidmap so that no
> additional command-line params are required - instead nfsidmap just adds
> additional keyrings as they fill up.  Would you take a look?
In http://people.redhat.com/steved/.bz1033708/ there are two 
nfs-utils rpms (i686 & x86_64) that have the re-enable reverted
and the two patches in Comment 44 & Comment 45. I can also
make just the patched-up nfsidmap binary available as well...

I would like to know if this dual keyring approach works. It seems
like it should, but I don't have an environment big enough to
test this out...  Can anybody help out?

Comment 48 Steve Dickson 2014-03-25 19:27:46 UTC

(In reply to kerickso from comment #46)
> (In reply to bcodding from comment #43)
> > If rpc.idmapd is re-enabled, there's going to be big performance problems
> > for each lookup.  The kernel is going to try to exec request-key before
> > falling back to rpc.idmapd, which is much worse than just doing the upcall
> > to rpc.idmapd in the first place.  This is a problem that can't get fixed in
> > nfs-utils.
> 
> Then where can it get fixed?
Please test these http://people.redhat.com/steved/.bz1033708/ 

> 
> More specifically, how can we restore RHEL to the state before Steve
> bollocksed it up?  It really baffles me that RH is letting this farce go on
> for so long.
It's been one release cycle... It will be fixed, and the fix will be backported to the one broken release... 

My apologies... but it turns out we were making double upcalls the whole time, so 
in the end hopefully we can come up with a better solution...

Comment 49 bcodding 2014-03-25 19:39:11 UTC

(In reply to Steve Dickson from comment #47)
> I would like to know if this duel keyring approach works. It seems
> like it should but I don't have an environment big enough to
> test this out...  Can anybody help out?

I've been running this in production for 16 hours or so, averaging between 300 and 1200 active id_resolver keys.  No key leaks!  Seems to work!

Of course, I'm the guy that wrote it, so maybe you could have someone else test..

SteveD, I think you could create a few thousand entries in your /etc/passwd file on the NFS server, then create a bunch of files owned by each of those users, then stat them all on the client..  You should quickly be able to fill up the keyrings.

Comment 50 kerickso 2014-03-26 16:41:41 UTC

After a yum update to the latest kernel (but still nfs-utils 39, since that is what is released), not even the "service rpcidmapd start" workaround works any longer.  It seems you have broken it worse.

Comment 51 kerickso 2014-03-26 17:13:12 UTC

I have tried your custom version 41.  It now takes forever to stat directories, and I get error messages on a bunch of group IDs.

Why are you resisting just reverting everything you've ever done in 6.5?  6.4 worked.  Go back to that, and stop trying to polish a turd.  You are screwing around with untested changes from upstream code in a RHEL release.  We pay you to specifically NOT do that.

Comment 52 bcodding 2014-03-26 17:34:32 UTC

(In reply to kerickso from comment #51)
> I have tried your custom version 41.  It now takes forever to stat
> directories, and I get error messages on a bunch of group IDs.
> 
> Why are you resisting just reverting everything you've ever done in 6.5? 
> 6.4 worked.  Go back to that, and stop trying to polish a turd.  You are
> screwing around with untested changes from upstream code in a RHEL release. 
> We pay you to specifically NOT do that.

If you add -vvv to nfsidmap in /etc/request-key.d/id_resolver.conf like this:

create    id_resolver    *         *    /usr/sbin/nfsidmap -vvv %k %d

you should be getting some log messages from nfsidmap in /var/log/messages.  Can you provide those?
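To see how the `-vvv` in that rule reaches nfsidmap, here is a small sketch that expands the rule into the argument vector request-key would run, assuming the request-key.conf(5) field order `op type description callout-info program args...`. Only the `%k` (key ID) and `%d` (key description) macros used in this rule are handled, and only as whole arguments; the function name and the sample key ID/description are hypothetical.

```python
import shlex


def nfsidmap_argv(conf_line, key_id, description):
    """Expand an id_resolver request-key rule into the command it would run.

    Sketch only: handles %k and %d as whole arguments; request-key.conf(5)
    defines more macros than this.
    """
    fields = shlex.split(conf_line)
    op, key_type, _desc_pattern, _callout_pattern, *program = fields
    if op != "create" or key_type != "id_resolver":
        raise ValueError("not an id_resolver create rule")
    macros = {"%k": str(key_id), "%d": description}
    return [macros.get(arg, arg) for arg in program]
```

For the rule above, a key with ID 123 and description `uid:alice@example.com` would expand to `/usr/sbin/nfsidmap -vvv 123 uid:alice@example.com`.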

Comment 53 kerickso 2014-03-26 17:44:32 UTC

There's a lot of site-identifying information in there.  But basically:

nfsidmap[5004]: key: 0x1eaef126 type: uid value: <email address> timeout 600
nfsidmap[5004]: nss_getpwnam: name '<user>' not found in domain '<domain>'

That would be for a <user> that maps to nobody.  Other users do map.  It seems to be random.

Comment 54 bcodding 2014-03-26 17:48:43 UTC

For this <user> that can't be found, does that user actually exist in your name services (LDAP, NIS, whatever)?

You may be having an intermittent NSS issue, which ends up getting cached as a negative entry - so after that first attempt there's a period where the lookup is not re-attempted.

Comment 55 bcodding 2014-03-26 17:58:44 UTC

Ah, add more 'v's to your nfsidmap command, there's a bunch of logging in libnfsidmap we are missing.  The problem is in libnfsidmap.  Try with 4 'v's: -vvvv, and send us some logs.

Comment 56 procaccia 2014-03-28 11:14:25 UTC

I have the same problem: NFS server on RHEL 6.5, NFSv4, with Fedora 19 64-bit clients, SSSD, and LDAP automounts.

There is an update waiting for my NFS server, which is now running 2.6.32-431.5.1.el6.x86_64; the update is 2.6.32-431.11.2.el6.
But how can I know whether this update includes a correction for this NFS/idmap/key problem?
The advisory at http://rhn.redhat.com/errata/RHSA-2014-0328.html doesn't mention it.

Restarting the main NFS server has a severe impact on our production, so I cannot blindly upgrade without knowing that this update would get me out of this problem.

Thanks.

PS: the NFS server has nfs-utils-1.2.3-39.el6.x86_64, and no update is available.

Comment 57 bcodding 2014-03-28 15:16:46 UTC

(In reply to procaccia from comment #56)
> but how can I know that this update includes a coorection for that
> nfs/idmap/key... problem !?
> reading http://rhn.redhat.com/errata/RHSA-2014-0328.html doesn't talk about
> it .

The fixes described in this thread are all NFS client fixes, and none of them are in the kernel package.  This specific issue can't be fixed by updating the kernel of your NFS server.

Comment 58 Brent Jones 2014-04-11 22:01:44 UTC

I'm able to reproduce this issue reliably since upgrading to 6.5

Running nfs-utils:
nfs-utils.x86_64      1:1.2.3-39
nfs-utils-lib.x86_64  1.1.5-6

And kernel:
kernel.x86_64         2.6.32-431
kernel.x86_64         2.6.32-431.11.2

I use SSSD with LDAP/Kerberos, with an AD backend. rpcidmapd is set to start on both the NFS servers and the clients.

On a fresh boot, ID mapping shows the right UIDs/GIDs, but I am unable to change the owner or mode of files; the error is "Invalid Argument".
If I restart rpcidmapd on the NFS server, the error goes away for a while, but it eventually comes back after some hours. Also, if the NFS server is rebooted again, I have to restart rpcidmapd manually yet again.

This was all working fine on 6.4.

One oddity I have also noticed: I can stop rpcidmapd on the NFS clients and ID mapping still seems to work consistently, so the behavior for me seems to be NFS-server related.

Comment 61 Steve Dickson 2014-04-29 15:35:10 UTC

*** Bug 1079871 has been marked as a duplicate of this bug. ***

Comment 66 M.T 2014-05-15 08:15:40 UTC

After adding the following values to /etc/sysctl.conf and starting the rpcidmapd service:

kernel.keys.root_maxbytes = 500000
kernel.keys.root_maxkeys = 25000

we noticed that although we have more than 1000 user IDs, there are always only 514 entries in /proc/keys.

We have also noticed that running ls -l /mail/<userdirectories> can take up to 2 minutes. Running the same command again immediately on the same directory takes only a few milliseconds.

We have OpenLDAP, NFS servers/clients, the automount service, and SSSD.

Comment 68 jas 2014-05-20 15:45:36 UTC

Steve - Nothing has been said about this bug for a bit, though I've noticed the recent "fix" change from your custom nfs-utils-1.2.3-43 to nfs-utils-1.2.3-46.  Can you provide any comment on what the new version will do to solve the problem?  Is it still re-enabling rpc.idmapd and deleting nfsidmap?  Any idea when this will become an official update? 

It just so happens that this bug bit me while upgrading a bunch of clients from NFSv3 to NFSv4.  We have 2300 users and 166 groups, so the default kernel key ring method isn't going to work.  Re-enabling rpc.idmapd on the client seems to be the solution.  On the surface, I don't see any performance issues, though I can see that the root keyring still ends up reaching its quota eventually (which I can make happen really quickly with an ls -alR).  From one of the comments here, it looks like that behaviour can't change without a kernel adjustment.  However, it's not clear whether reaching the quota is actually a problem, as long as rpc.idmapd is running.  Any feedback would be helpful.

Comment 69 Steve Dickson 2014-05-20 18:25:12 UTC

(In reply to jas from comment #68)
> Steve - Nothing has been said about this bug for a bit, though I've noticed
> the recent "fix" change from your custom nfs-utils-1.2.3-43 to
> nfs-utils-1.2.3-46.  Can you provide any comment on what the new version
> will do to solve the problem? 
I've decided to go with the attached patches. They will enable you
to have up to 250,000 users. Any more than that would have broken
rpc.idmapd anyway. It has been tested in a very large environment
for quite a while now, with no problems. 
 
> Is it still re-enabling rpc.idmapd and deleting nfsidmap? 

No, but if a particular site does want to re-enable it,
/etc/init.d/nfs will realize rpc.idmapd is running and do
the right thing. By the right thing I mean that instead of
blindly trying to start rpc.idmapd, which fails because
rpc.idmapd is already running, the initscript will signal
rpc.idmapd, allowing it to set up its communication with
the kernel.

> Any idea when this will become an official update?
I think it's in a couple months, but I really don't know.

In the meantime, I've put a pre-release version of nfs-utils 
under http://people.redhat.com/steved/.bz1033708/. It's only an
x86_64 version; let me know if you need a different arch.

Please feel free to take it out for a test run and report back on how 
well it works...

> 
> It just so happens that this bug bit me while upgrading a bunch of clients
> from NFSv3 to NFSv4.  We have 2300 users and 166 groups, so the default
> kernel key ring method isn't going to work.  Re-enabling rpc.idmapd on the
> client seems to be the solution.  On the surface, I don't see any
> performance issues, though I can see that the root keyring still ends up
> meeting quota eventually (which I can make happen really quickly with an ls
> -alR).  From one of the comments here, it looks like that behaviour can't
> change without a kernel adjustment.  However, it's not clear if that meeting
> quota is actually a problem, as long as rpc.idmapd is running.  Any feedback
> would be helpful.
The kernel does indeed try the upcall to nfsidmap and then to rpc.idmapd
when the first upcall fails. So I thought it was crazy not to try and fix
the problem, versus allowing the double calls to happen all the time.

I guess I'm not totally against re-enabling rpc.idmapd (it's just one
line in the spec file; all the infrastructure is still there), but
the daemon is not needed! I do get a lot of pushback from other
customers about needless daemons running, which is one of the reasons
I made the change.

But... when rpc.idmapd is enabled, it does give the idmapping 
a bit of redundancy, because if nfsidmap does fail for some reason,
the kernel then calls up to the running rpc.idmapd to do the mapping.
I have tested this scenario and it does work...

So for now I'm going to leave rpc.idmapd off by default, hoping 
most sites don't need more than 250,000 users. If they 
do, they can re-enable rpc.idmapd, which will handle the
rest of the users.

Comments welcomed!

Comment 70 jas 2014-05-20 19:15:09 UTC

Hi Steve.

Thanks for your response..

I tried the patched nfs-utils.  I guess it goes without saying that the patch still requires manually increasing kernel.keys.root_maxkeys and kernel.keys.root_maxbytes, because without that, the root keys are still filled very quickly.  It's too bad there isn't a better way to increase these values automatically. 

For my test, I went with root_maxkeys of 20,000 and root_maxbytes of 100,000.  As far as I'm aware, I don't need to increase the non-root maxkeys/maxbytes.

I did an ls in a directory containing 1577 user directories.  The results of cat /proc/key-users afterwards are:

    0:  1592 1591/1591 1588/20000 54902/100000

... which looks good.

nfsidmap -c gives:

nfsidmap: clearing '39a68b62 I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_1: empty'
nfsidmap: clearing '114ed483 I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_2: empty'
nfsidmap: clearing '317148c9 I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_3: empty'
nfsidmap: clearing '3716347e I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_4: 427/428'
nfsidmap: clearing '0748947a I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_5: 501/504'
nfsidmap: clearing '21a1110a I--Q--     1 perm 3f3f0000     0     0 keyring   .id_resolver_child_6: 367/368'
nfsidmap: clearing '22c92b97 I-----     1 perm 1f030000     0     0 keyring   .id_resolver: 23/24'

(I guess these are the child keyrings created by the patch.)

now /proc/key-users:

    0:     6 5/5 2/20000 70/100000

ls again in the directory, and I'm back to ..

    0:  1587 1586/1586 1583/20000 54765/100000

... and since the numbers are slightly different from the original numbers, I tried one more time..

cleared with nfsidmap -c

# cat /proc/key-users
    0:     6 5/5 2/20000 70/100000

(as above)

and ls again..

# cat /proc/key-users
    0:  1587 1586/1586 1583/20000 54765/100000

(as above)

Looks good!

Comment 71 M.T 2014-05-21 12:18:21 UTC

Jas,

Did you notice any difference in the performance of the ls command before and after applying the patch?
We do have performance issue, see also https://bugzilla.redhat.com/show_bug.cgi?id=1098147

To be honest, I am confused about what the cause of the poor performance is: SSSD, or the small key limits shown in /proc/keys?

Comment 72 jas 2014-05-21 13:47:46 UTC

With the patch applied, and after a reboot, doing an ls in a directory containing 1574 directories (an ext4 filesystem served via nfsv4), time resulted in:

0.007u 0.125s 0:03.65 3.2%	0+0k 0+0io 0pf+0w
0.003u 0.003s 0:00.02 0.0%	0+0k 0+0io 0pf+0w
0.005u 0.001s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
0.006u 0.000s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
0.006u 0.000s 0:00.01 0.0%	0+0k 0+0io 0pf+0w

On a machine without the patch, also freshly booted..

0.006u 0.053s 0:01.16 4.3%	0+0k 0+0io 0pf+0w
0.006u 0.002s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
0.007u 0.001s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
0.006u 0.002s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
0.008u 0.000s 0:00.01 0.0%	0+0k 0+0io 0pf+0w

... but, without the patch, most of the uid lookups are failing.

Comment 73 bcodding 2014-06-26 15:23:35 UTC

Even with the patched nfsidmap, you will still need to increase root's keyring quotas.  We're running

kernel.keys.root_maxkeys = 1000000
kernel.keys.root_maxbytes = 25000000

Add these lines to your /etc/sysctl.conf and run sysctl -p.  Verify the settings with `head -1 /proc/key-users`: in each of the last two columns, the second number should match root_maxkeys and root_maxbytes respectively.
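The same check can be scripted. This sketch assumes the /proc/key-users line format `<uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>`, which matches the root line shown in Comment 70; `parse_key_user` and `quotas_applied` are hypothetical helper names.

```python
from typing import NamedTuple


class KeyUser(NamedTuple):
    uid: int
    usage: int
    nkeys: int
    nikeys: int
    qnkeys: int
    maxkeys: int
    qnbytes: int
    maxbytes: int


def parse_key_user(line):
    """Parse one /proc/key-users line into its eight numeric fields."""
    uid_part, rest = line.split(":", 1)
    usage, keys, quota_keys, quota_bytes = rest.split()
    nums = [int(usage)]
    for pair in (keys, quota_keys, quota_bytes):
        used, limit = pair.split("/")
        nums += [int(used), int(limit)]
    return KeyUser(int(uid_part), *nums)


def quotas_applied(line, root_maxkeys, root_maxbytes):
    """True if the quota limits on this line match the sysctl values."""
    ku = parse_key_user(line)
    return ku.maxkeys == root_maxkeys and ku.maxbytes == root_maxbytes
```

For example, the root line from Comment 70, `0:  1592 1591/1591 1588/20000 54902/100000`, reports maxkeys 20000 and maxbytes 100000, so it would pass for those settings but not for the 1000000/25000000 values above until `sysctl -p` has been run with them.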

We've been running this patch and these keyring settings for the last 3 months or so with no problems.

Comment 76 Benjamin Coddington 2014-08-13 14:33:15 UTC

*** Bug 1044514 has been marked as a duplicate of this bug. ***

Comment 78 errata-xmlrpc 2014-10-14 04:32:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1407.html

Comment 79 Benjamin Coddington 2015-08-03 18:01:54 UTC

*** Bug 1122375 has been marked as a duplicate of this bug. ***

