Bug 1546064

Summary: sssd_nss consumes more memory until restarted or machine swaps [rhel-7.4.z]
Product: Red Hat Enterprise Linux 7 Reporter: Oneata Mircea Teodor <toneata>
Component: sssdAssignee: SSSD Maintainers <sssd-maint>
Status: CLOSED ERRATA QA Contact: sssd-qe <sssd-qe>
Severity: urgent Docs Contact: Lucie Maňásková <lmanasko>
Priority: urgent    
Version: 7.3CC: aheverle, andrewg.gray, atolani, ddas, ekeck, enewland, fidencio, gparente, grajaiya, imntreal, jhrozek, jvilicic, jwooten, knweiss, lmanasko, lslebodn, mkosek, mmilgram, mniranja, mupadhye, mzidek, pbrezina, pdhamdhe, pkulkarn, serge.savard, sgoveas, sssd-maint, striker, tscherf, vmishra
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: sssd-1.15.2-50.el7_4.12 Doc Type: Bug Fix
Doc Text:
Previously, due to a wrong memory hierarchy in the network security service (NSS) responder, the memory that was needed for printing a specific name was allocated to a permanent memory context. As a consequence, the memory usage of the sssd_nss service kept growing. This update allocates the memory to a correct operation context and the memory usage of sssd_nss no longer grows.
Story Points: ---
Clone Of: 1482231 Environment:
Last Closed: 2018-04-17 16:35:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1482231    
Bug Blocks:    
Attachments:
Description Flags
valgrind logs
none
valgrind output from unpatched version
none
valgrind output from patched version none

Description Oneata Mircea Teodor 2018-02-16 09:01:58 UTC
This bug has been copied from bug #1482231 and has been proposed to be backported to 7.4 z-stream (EUS).

Comment 8 Niranjan Mallapadi Raghavender 2018-04-05 08:40:16 UTC
Versions:
=========
sssd-common-1.15.2-50.el7_4.13.x86_64
sssd-ad-1.15.2-50.el7_4.13.x86_64
sssd-1.15.2-50.el7_4.13.x86_64
libsss_nss_idmap-1.15.2-50.el7_4.11.x86_64
sssd-client-1.15.2-50.el7_4.13.x86_64
libsss_sudo-1.15.2-50.el7_4.13.x86_64
libsss_autofs-1.15.2-50.el7_4.13.x86_64
sssd-krb5-common-1.15.2-50.el7_4.13.x86_64
sssd-ipa-1.15.2-50.el7_4.13.x86_64
sssd-ldap-1.15.2-50.el7_4.13.x86_64
sssd-proxy-1.15.2-50.el7_4.13.x86_64
sssd-dbus-1.15.2-50.el7_4.13.x86_64
libsss_certmap-1.15.2-50.el7_4.11.x86_64
libsss_idmap-1.15.2-50.el7_4.13.x86_64
python-sssdconfig-1.15.2-50.el7_4.13.noarch
sssd-common-pac-1.15.2-50.el7_4.13.x86_64
sssd-krb5-1.15.2-50.el7_4.13.x86_64
libsss_simpleifp-1.15.2-50.el7_4.13.x86_64


Configuration:
=============
1. Configure sssd to authenticate against ldap (389-ds)

[sssd]
domains = EXAMPLE.TEST
config_file_version = 2
services = nss, pam, ifp

[domain/EXAMPLE.TEST]
enumerate = false
id_provider = ldap
ldap_uri = ldap://auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
ldap_search_base = dc=example,dc=test
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
auth_provider = ldap
krb5_server = auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
krb5_kpasswd = auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
krb5_realm = EXAMPLE.TEST
debug_level = 9
cache_credentials = True
krb5_store_password_if_offline = True

[nss]
command = valgrind --leak-check=full --trace-children=yes --show-reachable=yes \
--track-origins=yes --read-var-info=yes --tool=memcheck --num-callers=50 -v \
--time-stamp=yes --log-file=/tmp/valgrind.%p.log /usr/libexec/sssd/sssd_nss
debug_level = 9


1. Start sssd with the above options

2. Run the below script for an hour

NAME_PREFIX=`for i in {1..245}; do printf "x"; done`
for i in {1..1000}; do
        for j in {1..500}; do
                getent passwd -s sss "$NAME_PREFIX$j"
        done

        for j in {1..500}; do
                getent passwd -s sss "$NAME_PREFIX$j"
        done

        sleep 15
done

Leak summary from 

==00:01:17:03.719 9618== LEAK SUMMARY:
==00:01:17:03.719 9618==    definitely lost: 0 bytes in 0 blocks
==00:01:17:03.719 9618==    indirectly lost: 0 bytes in 0 blocks
==00:01:17:03.719 9618==      possibly lost: 5,016 bytes in 18 blocks
==00:01:17:03.719 9618==    still reachable: 321,866 bytes in 444 blocks
==00:01:17:03.719 9618==         suppressed: 0 bytes in 0 blocks
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== ERROR SUMMARY: 13 errors from 13 contexts (suppressed: 0 from 0)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 1 of 13:
==00:01:17:03.719 9618== Conditional jump or move depends on uninitialised value(s)
==00:01:17:03.719 9618==    at 0x529C087: become_user (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x52911AC: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 2 of 13:
==00:01:17:03.719 9618== Syscall param chown(group) contains uninitialised byte(s)
==00:01:17:03.719 9618==    at 0x96F1807: chown (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==    by 0x8BDBB82: chown_debug_file (debug.c:335)
==00:01:17:03.719 9618==    by 0x529118D: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 3 of 13:
==00:01:17:03.719 9618== Syscall param chown(owner) contains uninitialised byte(s)
==00:01:17:03.719 9618==    at 0x96F1807: chown (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==    by 0x8BDBB82: chown_debug_file (debug.c:335)
==00:01:17:03.719 9618==    by 0x529118D: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== ERROR SUMMARY: 13 errors from 13 contexts (suppressed: 0 from 0)

Comment 9 Niranjan Mallapadi Raghavender 2018-04-05 08:43:39 UTC
Created attachment 1417565 [details]
valgrind logs

Comment 10 Niranjan Mallapadi Raghavender 2018-04-05 12:35:13 UTC
Versions:
=========
Reproduced the issue on below version:
=====================================

sssd-krb5-1.15.2-50.el7_4.11.x86_64
sssd-debuginfo-1.15.2-50.el7_4.11.x86_64
sssd-common-pac-1.15.2-50.el7_4.11.x86_64
sssd-client-1.15.2-50.el7_4.11.x86_64
sssd-ad-1.15.2-50.el7_4.11.x86_64
sssd-dbus-1.15.2-50.el7_4.11.x86_64
sssd-common-1.15.2-50.el7_4.11.x86_64
sssd-ldap-1.15.2-50.el7_4.11.x86_64
sssd-ipa-1.15.2-50.el7_4.11.x86_64
sssd-krb5-common-1.15.2-50.el7_4.11.x86_64
sssd-1.15.2-50.el7_4.11.x86_64
sssd-proxy-1.15.2-50.el7_4.11.x86_64


Configuration:

Configuration:
=============
1. Configure sssd to authenticate against ldap (389-ds)

[sssd]
domains = EXAMPLE.TEST
config_file_version = 2
services = nss, pam, ifp

[domain/EXAMPLE.TEST]
enumerate = false
id_provider = ldap
ldap_uri = ldap://auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
ldap_search_base = dc=example,dc=test
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
auth_provider = ldap
krb5_server = auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
krb5_kpasswd = auto-hv-01-guest01.idmqe.lab.eng.bos.redhat.com
krb5_realm = EXAMPLE.TEST
debug_level = 9
cache_credentials = True
krb5_store_password_if_offline = True

[nss]
command = valgrind --leak-check=full --trace-children=yes --show-reachable=yes \
--track-origins=yes --read-var-info=yes --tool=memcheck --num-callers=50 -v \
--time-stamp=yes --log-file=/tmp/valgrind.%p.log /usr/libexec/sssd/sssd_nss
debug_level = 9


1. Start sssd with the above options

2. Run the below script for an hour

NAME_PREFIX=`for i in {1..245}; do printf "x"; done`
for i in {1..1000}; do
        for j in {1..500}; do
                getent passwd -s sss "$NAME_PREFIX$j"
        done

        for j in {1..500}; do
                getent passwd -s sss "$NAME_PREFIX$j"
        done

        sleep 15
done

Leak are seen in responder(sized_output_name):

==00:00:03:21.560 16565== 517,176 bytes in 1,500 blocks are definitely lost in loss record 357 of 358
==00:00:03:21.560 16565==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==00:00:03:21.560 16565==    by 0x8FF4EAB: __talloc_with_prefix (talloc.c:698)
==00:00:03:21.560 16565==    by 0x8FF4EAB: __talloc (talloc.c:739)
==00:00:03:21.560 16565==    by 0x8FF4EAB: __talloc_strlendup (talloc.c:2369)
==00:00:03:21.560 16565==    by 0x8FF4EAB: talloc_strdup (talloc.c:2385)
==00:00:03:21.560 16565==    by 0x52934F9: sss_output_name (usertools.c:808)
==00:00:03:21.560 16565==    by 0x5293690: sss_output_fqname (usertools.c:863)
==00:00:03:21.560 16565==    by 0x121249: sized_output_name (responder_common.c:1708)
==00:00:03:21.560 16565==    by 0x113836: memcache_delete_entry (nss_get_object.c:112)
==00:00:03:21.560 16565==    by 0x113C06: nss_get_object_done (nss_get_object.c:245)
==00:00:03:21.560 16565==    by 0x8DE5291: _tevent_req_error (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x1277CE: cache_req_done (cache_req.c:1058)
==00:00:03:21.560 16565==    by 0x8DE5291: _tevent_req_error (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x126B4E: cache_req_search_domains_done (cache_req.c:608)
==00:00:03:21.560 16565==    by 0x8DE5291: _tevent_req_error (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x1218EB: sss_dp_internal_get_done (responder_dp.c:899)
==00:00:03:21.560 16565==    by 0x5B14861: ??? (in /usr/lib64/libdbus-1.so.3.7.4)
==00:00:03:21.560 16565==    by 0x5B17B50: dbus_connection_dispatch (in /usr/lib64/libdbus-1.so.3.7.4)
==00:00:03:21.560 16565==    by 0x5286C51: sbus_dispatch (sssd_dbus_connection.c:93)
==00:00:03:21.560 16565==    by 0x8DE8C96: tevent_common_loop_timer_delay (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x8DE9CA8: ??? (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x8DE82A6: ??? (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x8DE40CC: _tevent_loop_once (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x8DE42FA: tevent_common_loop_wait (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x8DE8246: ??? (in /usr/lib64/libtevent.so.0.9.31)
==00:00:03:21.560 16565==    by 0x5291C72: server_loop (server.c:718)
==00:00:03:21.560 16565==    by 0x11009C: main (nsssrv.c:560)


Updated to below versions

sssd-common-1.15.2-50.el7_4.13.x86_64
sssd-ad-1.15.2-50.el7_4.13.x86_64
sssd-1.15.2-50.el7_4.13.x86_64
sssd-client-1.15.2-50.el7_4.13.x86_64
libsss_sudo-1.15.2-50.el7_4.13.x86_64
libsss_autofs-1.15.2-50.el7_4.13.x86_64
sssd-krb5-common-1.15.2-50.el7_4.13.x86_64
sssd-ipa-1.15.2-50.el7_4.13.x86_64
sssd-ldap-1.15.2-50.el7_4.13.x86_64
sssd-proxy-1.15.2-50.el7_4.13.x86_64
sssd-dbus-1.15.2-50.el7_4.13.x86_64
libsss_idmap-1.15.2-50.el7_4.13.x86_64
python-sssdconfig-1.15.2-50.el7_4.13.noarch
sssd-common-pac-1.15.2-50.el7_4.13.x86_64
sssd-krb5-1.15.2-50.el7_4.13.x86_64
libsss_simpleifp-1.15.2-50.el7_4.13.x86_64


Leak summary from 

==00:01:17:03.719 9618== LEAK SUMMARY:
==00:01:17:03.719 9618==    definitely lost: 0 bytes in 0 blocks
==00:01:17:03.719 9618==    indirectly lost: 0 bytes in 0 blocks
==00:01:17:03.719 9618==      possibly lost: 5,016 bytes in 18 blocks
==00:01:17:03.719 9618==    still reachable: 321,866 bytes in 444 blocks
==00:01:17:03.719 9618==         suppressed: 0 bytes in 0 blocks
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== ERROR SUMMARY: 13 errors from 13 contexts (suppressed: 0 from 0)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 1 of 13:
==00:01:17:03.719 9618== Conditional jump or move depends on uninitialised value(s)
==00:01:17:03.719 9618==    at 0x529C087: become_user (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x52911AC: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 2 of 13:
==00:01:17:03.719 9618== Syscall param chown(group) contains uninitialised byte(s)
==00:01:17:03.719 9618==    at 0x96F1807: chown (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==    by 0x8BDBB82: chown_debug_file (debug.c:335)
==00:01:17:03.719 9618==    by 0x529118D: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== 1 errors in context 3 of 13:
==00:01:17:03.719 9618== Syscall param chown(owner) contains uninitialised byte(s)
==00:01:17:03.719 9618==    at 0x96F1807: chown (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==    by 0x8BDBB82: chown_debug_file (debug.c:335)
==00:01:17:03.719 9618==    by 0x529118D: server_setup (in /usr/lib64/sssd/libsss_util.so)
==00:01:17:03.719 9618==    by 0x11002F: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==    by 0x9628C04: (below main) (in /usr/lib64/libc-2.17.so)
==00:01:17:03.719 9618==  Uninitialised value was created by a stack allocation
==00:01:17:03.719 9618==    at 0x10FB48: ??? (in /usr/libexec/sssd/sssd_nss)
==00:01:17:03.719 9618==
==00:01:17:03.719 9618== ERROR SUMMARY: 13 errors from 13 contexts (suppressed: 0 from 0)


The valgrind output doesn't have any leaks corresponding to sized_output_name function from respond

Comment 11 Niranjan Mallapadi Raghavender 2018-04-05 12:39:49 UTC
Created attachment 1417680 [details]
valgrind output from unpatched version

Comment 12 Niranjan Mallapadi Raghavender 2018-04-05 12:41:03 UTC
Created attachment 1417681 [details]
valgrind output from patched version

Comment 15 errata-xmlrpc 2018-04-17 16:35:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1144