Bug 1292238

Summary: extreme memory usage in libnfsidmap sss.so plug-in when resolving groups with many members
Product: Red Hat Enterprise Linux 7
Reporter: James Ralston <ralston>
Component: sssd
Assignee: SSSD Maintainers <sssd-maint>
Status: CLOSED ERRATA
QA Contact: Steeve Goveas <sgoveas>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.2
CC: grajaiya, jhrozek, lslebodn, mkosek, mniranja, mzidek, pbrezina, ralston, sbose
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: sssd-1.14.0-0.1.alpha.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 07:14:07 UTC
Type: Bug
Attachments:
* call trace of rpc.idmapd when OOM killer killed it (flags: none)
* evidence of a "domain users" group lookup malloc bomb (flags: none)
* nfs idmap: fix infinite loop (flags: none)

Description James Ralston 2015-12-16 19:39:12 UTC
Description of problem:

We are testing a RHEL7-based NFSv4 server and NFSv4 client in our infrastructure, in order to provide NFS-based /home directories on the client that are auto-mounted from the server on-demand. Both the server and the client share these common features:

* The server and client are VMware virtual machines, with 2GiB of memory each.

* They are joined to our Microsoft Active Directory domain (via "net ads join").

* They use the AD domain controllers as their Kerberos KDCs (in /etc/krb5.conf).

* They run sssd, and sssd provides the (nss, pac, pam) services.

* All users and groups are provided using the sss nsswitch plug-in.

* We use the sss.so plug-in as the translation method in our idmapd.conf.

The client mounts the server with the nfsvers=4.2 and sec=krb5p options.
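For reference, an equivalent manual mount looks roughly like this (the server name and export path here are placeholders, not our real ones; in practice autofs issues the mount on demand):

    mount -t nfs4 -o nfsvers=4.2,sec=krb5p nfs-server.example.com:/export/home /home/username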

We discovered a few bugs during the initial setup (see bug 1283341) that suggest we are "pushing the envelope" in terms of our direct integration with Active Directory.

Specifically, I suspect not many sites are using the libnfsidmap sss.so plug-in (at least, not yet).

On our first day of testing with our development team, the NFS server has invoked the OOM killer 4 times, and the NFS client has invoked the OOM killer once.

For the server, each time the OOM killer was invoked, it was triggered by the rpc.idmapd process. During normal operation, rpc.idmapd has about 32MiB of total memory usage, and about 1MiB of resident memory:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND     
27766 root      20   0   33708   1036    812 S  0.0  0.1   0:00.00 rpc.idmapd  

But when the OOM killer is invoked, the memory usage of rpc.idmapd has jumped to almost 1GiB of total memory, with almost 512MiB resident. Here's the line from the OOM killer report:

[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 1279]     0  1279   918202   423856    1794   485969             0 rpc.idmapd

Because of the extreme memory usage of rpc.idmapd, the OOM killer selects it for termination, and memory usage recovers.

On the client, we've only seen the OOM killer invoked once so far, but the process that triggered it was nfsidmap. Here's the line from the OOM killer report:

[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[16323]     0 16323   916107   424365    1792   485464             0 nfsidmap

As with rpc.idmapd on the server, because of the extreme memory usage of nfsidmap, the OOM killer selects it for termination, and memory usage recovers.

The common element between both rpc.idmapd and nfsidmap is that they load the sss.so plug-in. Therefore, we highly suspect that this extreme memory usage occurs in the sss.so plug-in.

Version-Release number of selected component (if applicable):

0:gssproxy-0.4.1-7.el7.x86_64
0:libnfsidmap-0.25-12.el7.x86_64
0:libsss_idmap-1.13.0-40.el7.x86_64
0:libsss_nss_idmap-1.13.0-40.el7.x86_64
0:python-sssdconfig-1.13.0-40.el7.noarch
0:sssd-1.13.0-40.el7.x86_64
0:sssd-ad-1.13.0-40.el7.x86_64
0:sssd-client-1.13.0-40.el7.x86_64
0:sssd-common-1.13.0-40.el7.x86_64
0:sssd-common-pac-1.13.0-40.el7.x86_64
0:sssd-ipa-1.13.0-40.el7.x86_64
0:sssd-krb5-1.13.0-40.el7.x86_64
0:sssd-krb5-common-1.13.0-40.el7.x86_64
0:sssd-ldap-1.13.0-40.el7.x86_64
0:sssd-proxy-1.13.0-40.el7.x86_64

The client is using kernel 3.10.0-327.el7.x86_64; the server is using 3.10.0-327.el7.local.2.x86_64.  (The only change the .local.2 kernel adds is that it contains the kernel gss patch in bug 1283341.)

How reproducible:

I do not know the specific circumstances that trigger the extreme memory usage in rpc.idmapd / nfsidmap, but we seem to be able to trigger it fairly easily on the server.

Additional info:

I don't know whether the extreme memory usage of rpc.idmapd/nfsidmap would recover. Meaning, if our VMs had 8GiB of memory instead of 2GiB, so that rpc.idmapd/nfsidmap could consume more than ~1GiB of memory before triggering the OOM killer, would they recover on their own after allocating, say, 2GiB of memory?

(I suspect the answer is "no".  Given that normal operation consumes ~1MiB of resident memory, I suspect that whatever memory consumption is being triggered is essentially an infinite loop, and that the process will consume memory until memory exhaustion is reached. And even if the processes would recover at some memory usage point, suddenly jumping from ~1MiB resident to ~512MiB resident is unacceptable behavior.)

We do have a temporary work-around for this behavior: since both the server and the client are identical, and obtain the same passwd/group entries (via sssd), we can use the nsswitch.so plugin on both the client and the server, and still have name/id translation work. (We are testing that now, and at least so far, we haven't seen the OOM killer invoked on either the server or the client.)
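Concretely, the work-around is a one-line change to the [Translation] section of /etc/idmapd.conf on both hosts (a sketch; the rest of the file is unchanged):

    [Translation]
    Method = nsswitch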

BUT: we have Linux NFSv4 clients that will need to use NFSv4/krb5/AD servers that are not Linux-based. Those clients *must* use the sss plug-in for idmapd in order for NFSv4 name/id translation to work. So if the sss plug-in for idmapd is what is causing the extreme memory usage in rpc.idmapd and nfsidmap (and the data so far strongly suggest that is the case), we absolutely need a way to prevent that from happening.

Comment 2 James Ralston 2015-12-16 20:09:17 UTC
Created attachment 1106513 [details]
call trace of rpc.idmapd when OOM killer killed it

On the NFS server, one of the invocations of the OOM killer captured the call trace of the rpc.idmapd process at the time it was killed.

Unfortunately, there doesn't seem to be much to go on here.

I'm open to suggestions on how to debug this further.

We could revert to using the sss plug-in for idmapd, and attach to it with gdb, but I'm not sure how to capture any useful information before the OOM killer is invoked and performs a kill -9 on it.

I'm not sure if reverting to the sss plug-in and running valgrind rpc.idmapd -f -vvv until the malloc bomb happened would produce any useful information. Thoughts?

I'm a Fedora developer, so I'd be happy to re-roll local sssd RPMs with extra debugging patches applied.

Comment 3 James Ralston 2015-12-17 22:35:03 UTC
Created attachment 1106892 [details]
evidence of a "domain users" group lookup malloc bomb

I believe I've figured out the problem.

This attachment is the output of "ltrace -f rpc.idmapd -f" interleaved with full rpc.idmapd debugging from /var/log/messages.

When libnfsidmap attempts to resolve gr_gid = 1651947284 to the group name, it malloc bombs the host.

In our domain, gr_gid = 1651947284 is the "domain users" group:

$ getent group 1651947284 | cut -d: -f1
domain users

This group contains every single user in our domain:

$ getent group "domain users" | tr , '\012' | wc -l
925

I strongly suspect that the extreme memory usage is dependent on the (non-trivial) number of members of the group.

Resolving the "domain users" group doesn't malloc bomb the host every time. However, I can most easily reproduce the malloc bomb when the host has been quiescent for some time. Therefore, I suspect the extreme memory usage only occurs when the "domain users" group has expired from sssd's cache.

We appreciate the fact that sssd provides the libnfsidmap plugin. It finally allows us to use NFSv4+krb5+AD, which wasn't possible before.

But this is a SHOWSTOPPER-class problem. It is completely unreasonable for the libnfsidmap plugin to consume gigabytes of memory—however briefly—in the course of normal operations. It needs to be able to function in a reasonably small memory footprint.

Comment 4 Sumit Bose 2015-12-18 11:09:17 UTC
(In reply to James Ralston from comment #3)
> Resolving the "domain users" group doesn't malloc bomb the host every time.
> However, I can most easily reproduce the malloc bomb when the host has been
> quiescent for some time. Therefore, I suspect the extreme memory usage only
> occurs when the "domain users" group has expired from sssd's cache.
> 

Can you check if adding

[sss]
    memcache = False

will expose the malloc bomb all the time? From your description I assume that as long as the group has a valid entry in the memory cache, no memory leak exists. If this is true, it would help to track down the code path where the memory growth happens.
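(For context, and assuming the sss plug-in reads its options from an [sss] section of /etc/idmapd.conf next to the existing [Translation] section, the fragment to test would look like this:)

    [Translation]
    Method = sss

    [sss]
    memcache = False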

Comment 5 Sumit Bose 2015-12-18 12:20:03 UTC
Looks like I was wrong: the issue is in the memcache path, so with the above config option you shouldn't see the memory bomb. You are right, it is an infinite loop, and the attached patch fixes it for me.

I'll try to prepare a test build.

Comment 6 Sumit Bose 2015-12-18 15:38:21 UTC
Please find a test-build at http://koji.fedoraproject.org/koji/taskinfo?taskID=12238159.

Comment 7 Sumit Bose 2015-12-18 15:39:24 UTC
Created attachment 1107202 [details]
nfs idmap: fix infinite loop

Comment 8 James Ralston 2015-12-19 01:57:32 UTC
Good catch! Yes, that was clearly an infinite loop bug / malloc bomb: the size argument was the initial size of the buffer, not the grown size.

The malloc bomb only triggered when the initial buffer size of 4096 bytes was inadequate:

#define BUF_LEN                     (4096)

In a test environment, without many users, the "domain users" group entry wouldn't have been large enough to exceed BUF_LEN and trigger the malloc bomb. But at larger sites with many hundreds or thousands of users, BUF_LEN would be inadequate, triggering the bug. And ~925 users is definitely enough to exceed BUF_LEN:

$ getent group 1651947284 | wc -c
7545
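
For illustration, here is a minimal, self-contained sketch of the buggy retry pattern as I understand it from the patch; the names (fake_lookup, RECORD_LEN) are mine, not the ones in the sssd source. The buffer is reallocated on ERANGE, but the size handed back to the lookup stays at the initial BUF_LEN, so any group record larger than 4096 bytes keeps the loop spinning and allocating until memory is exhausted:

    /* sketch only -- fake_lookup() is a stand-in for the real sss_nss call */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BUF_LEN    (4096)
    #define RECORD_LEN (7545)   /* size of our "domain users" entry */

    /* pretend lookup: fails with ERANGE unless the buffer fits the record */
    static int fake_lookup(char *buf, size_t buf_len)
    {
        if (buf_len < RECORD_LEN) {
            return ERANGE;
        }
        memset(buf, 'x', RECORD_LEN - 1);
        buf[RECORD_LEN - 1] = '\0';
        return 0;
    }

    int main(void)
    {
        size_t buf_len = BUF_LEN;
        char *buf = malloc(buf_len);
        int rc;

        if (buf == NULL) {
            return 1;
        }

        while ((rc = fake_lookup(buf, buf_len)) == ERANGE) {
            /* the buggy code grew the buffer but kept passing the initial
             * BUF_LEN to the lookup, so ERANGE was returned forever and the
             * loop allocated memory without bound */
            buf_len *= 2;   /* the fix: grow the size that is passed down */
            char *tmp = realloc(buf, buf_len);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
        }

        printf("lookup finished: rc=%d, final buffer %zu bytes\n", rc, buf_len);
        free(buf);
        return rc;
    }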

I've rebuilt local versions of sssd with your patch, and I've updated our test NFSv4 server/client to use them.

I'll post an update of the results of our testing (which might take longer than usual, because many people will be on vacation over the next two weeks).

Comment 9 Jakub Hrozek 2016-01-04 09:58:51 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/2909

Comment 10 Lukas Slebodnik 2016-01-04 12:52:17 UTC
(In reply to James Ralston from comment #8)
> Good catch! Yes, that was clearly an infinite loop bug / malloc bomb: the
> size argument was the initial size of the buffer, not the grown size.
> 
> The malloc bomb only triggered when the initial buffer size of 4096 bytes
> was inadequate:
> 
> #define BUF_LEN                     (4096)
> 
> In a test environment, without many users, the "domain users" group entry
> wouldn't have been large enough to exceed BUF_LEN and trigger the malloc
> bomb. But at larger sites with many hundreds or thousands of users, BUF_LEN
> would be inadequate, triggering the bug. And ~925 users is definitely enough
> to exceed BUF_LEN:
> 
> $ getent group 1651947284 | wc -c
> 7545
> 
> I've rebuilt local versions of sssd with your patch, and I've updated our
> test NFSv4 server/client to use them.
> 
> I'll post an update of the results of our testing (which might take longer
> than usual, because many people will be on vacation over the next two weeks).

The patch makes sense. But I would like to ask whether you hit any other issues in your testing.

Comment 11 James Ralston 2016-01-04 18:29:35 UTC
With the patch, we haven't been able to reproduce the OOM problem. We haven't noticed any other issues from using the patch.

However, as I said in comment 8, we had very light usage of the test hosts last week (only one person logged in the entire week), so last week wasn't a good test.

We should be able to test the hosts more thoroughly this week. So I think if we cannot reproduce the OOM problem after this week, and we don't see any other problems, then we can assume the problem is fixed.

Comment 12 Jakub Hrozek 2016-01-06 04:19:12 UTC
master: 2a256e4e4b64891fe846e933589506daa68aa13e 
sssd-1-13: 93e404ea1b3bda5961fb61e6d22d081adfed889a 
sssd-1-12: 097edb7fa766ab50c768c049c5b1e15361785ed6

Comment 13 James Ralston 2016-01-12 21:25:25 UTC
For the record, after another week of testing, we have not been able to reproduce the malloc bomb in the NFS idmap plugin, and we have not noticed any additional problems introduced by the patch.

Comment 14 Jakub Hrozek 2016-01-13 07:41:34 UTC
(In reply to James Ralston from comment #13)
> For the record, after another week of testing, we have not been able to
> reproduce the malloc bomb in the NFS idmap plugin, and we have not noticed
> any additional problems introduced by the patch.

Great, thanks again for testing. FYI I also committed that code to RHEL-6 since 6.8 will enable the NFS plugin as well -- and this bug would make the plugin unusable.

There is a COPR repo with 6.8 preview builds if you're interested in testing.

Comment 15 Mike McCune 2016-03-28 23:37:25 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 18 Sumit Bose 2016-06-20 09:13:30 UTC
To reproduce this you need a group with many users. The actual number depends a bit on the mean name length of the members. Since the issue is about a 4k buffer, with a mean name length of 5 characters you would need more than 820 members in the group; with a length of 10, more than 410 members.

Instead of a full NFS setup, it should be possible to reproduce the issue by calling

    nfsidmap 12345 uid:group_with_many_members@domain

The 12345 is a keyring ID, but since the keyring operations happen after the group lookup, the exact value does not matter. Without the patch, 'nfsidmap' will not return and will consume more and more memory. With the patch, 'nfsidmap' should just return.
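
A quick way to check whether a candidate group is large enough to trigger the loop (the group name here is a placeholder) is to see whether its flattened entry exceeds the 4096-byte initial buffer:

    getent group group_with_many_members | wc -c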

Comment 20 Sumit Bose 2016-09-22 14:54:12 UTC
(In reply to Sumit Bose from comment #18)
> To reproduce this you need a group with many users. The actual number
> depends a bit on the mean name length of the members. Since the issue is
> about a 4k buffer, with a mean name length of 5 characters you would need
> more than 820 members in the group; with a length of 10, more than 410
> members.
> 
> Instead of a full NFS setup, it should be possible to reproduce the issue
> by calling
> 
>     nfsidmap 12345 uid:group_with_many_members@domain
> 
> The 12345 is a keyring ID, but since the keyring operations happen after
> the group lookup, the exact value does not matter. Without the patch,
> 'nfsidmap' will not return and will consume more and more memory. With the
> patch, 'nfsidmap' should just return.

Sorry, the loop is triggered by the lookup by GID; please use:

    nfsidmap 12345 group:987654

where 987654 is the GID of the group with many members.

Comment 21 Niranjan Mallapadi Raghavender 2016-09-22 15:25:25 UTC
1. Reproduce the issue with the older version
=======================================

[root@client2 ~]# rpm -qa  | grep sss
sssd-client-1.13.0-40.el7_2.9.x86_64
sssd-ipa-1.13.0-40.el7_2.9.x86_64
sssd-proxy-1.13.0-40.el7_2.9.x86_64
libsss_idmap-1.13.0-40.el7_2.9.x86_64
libsss_nss_idmap-1.13.0-40.el7_2.9.x86_64
sssd-common-1.13.0-40.el7_2.9.x86_64
sssd-common-pac-1.13.0-40.el7_2.9.x86_64
sssd-ad-1.13.0-40.el7_2.9.x86_64
sssd-ldap-1.13.0-40.el7_2.9.x86_64
python-sssdconfig-1.13.0-40.el7_2.9.noarch
sssd-krb5-common-1.13.0-40.el7_2.9.x86_64
sssd-krb5-1.13.0-40.el7_2.9.x86_64
sssd-1.13.0-40.el7_2.9.x86_64

2. Configure the RHEL 7.3 client to authenticate to AD using sss and join AD using the net ads command

3. Create a group TestGroup1 with 1001 members

[root@client2 ~]# getent group "TestGroup1"     | tr , '\012' | wc -l
1001


4. cat /etc/sssd/sssd.conf
[sssd]
domains = centaur.test
config_file_version = 2
services = nss, pam

[domain/centaur.test]
ad_domain = centaur.test
krb5_realm = CENTAUR.TEST
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad

5. cat /etc/idmapd.conf
[General]
Domain = centaur.test
Verbosity = 10
[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
[Translation]
Method = sss

[Static]
[UMICH_SCHEMA]
LDAP_server = ldap-server.local.domain.edu
LDAP_base = dc=local,dc=domain,dc=edu


6. Restart the nfs-idmapd service
$ systemctl restart nfs-idmapd

7. Run the nfsidmap command

[root@client2 ~]#  nfsidmap 1131314 group:1993601671
[root@client2 ~]# /usr/bin/time -v nfsidmap 1131313 group:1993601671
Command exited with non-zero status 1
        Command being timed: "nfsidmap 1131313 group:1993601671"
        User time (seconds): 1.19
        System time (seconds): 1.08
        Percent of CPU this job got: 33%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 885776
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 25
        Minor (reclaiming a frame) page faults: 727361
        Voluntary context switches: 102
        Involuntary context switches: 162
        Swaps: 0
        File system inputs: 26760
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1



The following messages are seen:

Sep 22 20:44:28 client2.example.test nfsidmap[3075]: name_lookup: keyctl_instantiate failed: Operation not permitted
Sep 22 20:48:50 client2.example.test nfsidmap[3092]: key: 0x114332 type: group value: 1993601671 timeout 600
Sep 22 20:48:50 client2.example.test nfsidmap[3092]: nfs4_gid_to_name: calling sss_nfs->gid_to_name
Sep 22 20:48:57 client2.example.test nfsidmap[3092]: sss_nfs_gid_to_name: rc=0 msg=Success
Sep 22 20:48:57 client2.example.test nfsidmap[3092]: nfs4_gid_to_name: sss_nfs->gid_to_name returned 0
Sep 22 20:48:57 client2.example.test nfsidmap[3092]: nfs4_gid_to_name: final return value is 0
Sep 22 20:48:57 client2.example.test nfsidmap[3092]: name_lookup: keyctl_instantiate failed: Operation not permitted


8. Update sssd to 1.14

sssd-common-1.14.0-41.el7.x86_64
sssd-krb5-1.14.0-41.el7.x86_64
sssd-ipa-1.14.0-41.el7.x86_64
libsss_nss_idmap-1.13.0-40.el7_2.9.x86_64
sssd-client-1.14.0-41.el7.x86_64
sssd-krb5-common-1.14.0-41.el7.x86_64
sssd-ad-1.14.0-41.el7.x86_64
sssd-ldap-1.14.0-41.el7.x86_64
python-sssdconfig-1.14.0-41.el7.noarch
libsss_idmap-1.14.0-41.el7.x86_64
sssd-common-pac-1.14.0-41.el7.x86_64
sssd-proxy-1.14.0-41.el7.x86_64
sssd-1.14.0-41.el7.x86_64

9. Run nfsidmap again:

Command exited with non-zero status 1
        Command being timed: "nfsidmap 1131314 group:1993601671"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 50%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1144
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 354
        Voluntary context switches: 1
        Involuntary context switches: 0
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1


Sep 22 20:52:23 client2.example.test systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 22 20:52:23 client2.example.test nm-dispatcher[3199]: Dispatching action 'dhcp4-change' for eth0
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: key: 0x114332 type: group value: 1993601671 timeout 600
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: nfs4_gid_to_name: calling sss_nfs->gid_to_name
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: found gid 1993601671 in memcache
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: sss_nfs_gid_to_name: rc=0 msg=Success
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: nfs4_gid_to_name: sss_nfs->gid_to_name returned 0
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: nfs4_gid_to_name: final return value is 0
Sep 22 20:52:26 client2.example.test nfsidmap[3212]: name_lookup: keyctl_instantiate failed: Operation not permitted

Comment 23 errata-xmlrpc 2016-11-04 07:14:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2476.html