Bug 1254518

Summary:	Fix crash in nss responder
Product:	Red Hat Enterprise Linux 7	Reporter:	Jakub Hrozek <jhrozek>
Component:	sssd	Assignee:	Lukas Slebodnik <lslebodn>
Status:	CLOSED ERRATA	QA Contact:	Kaushik Banerjee <kbanerje>
Severity:	unspecified	Docs Contact:
Priority:	medium
Version:	7.0	CC:	apeetham, grajaiya, jgalipea, jhrozek, lslebodn, mkosek, mzidek, nsoman, pbrezina, preichl
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	sssd-1.13.0-24.el7	Doc Type:	Bug Fix
Doc Text:	Cause: When two domains were configured and initgroups were run on a user from the second domain memory was double freed Consequence: SSSD crashed Fix: Memory deallocation is handled properly Result: SSSD doesn't crash	Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-11-19 11:40:15 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jakub Hrozek 2015-08-18 10:11:48 UTC

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/2749

Crash is part of test cases for https://fedorahosted.org/sssd/wiki/DesignDocs/RestrictDomainsInPAM.

{{{
Backtrace:
[New LWP 20672]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files'.
Program terminated with signal 6, Aborted.
#0  0x00007fed9e7b25d7 in raise () from /lib64/libc.so.6

Thread 1 (Thread 0x7feda2e96840 (LWP 20672)):
#0  0x00007fed9e7b25d7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fed9e7b3cc8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007feda2201f45 in _dbus_abort () from /lib64/libdbus-1.so.3
No symbol table info available.
#3  0x00007feda21f87ce in _dbus_warn_check_failed () from /lib64/libdbus-1.so.3
No symbol table info available.
#4  0x00007feda21eab86 in dbus_message_iter_append_basic () from /lib64/libdbus-1.so.3
No symbol table info available.
#5  0x00007feda21eb4a3 in dbus_message_append_args_valist () from /lib64/libdbus-1.so.3
No symbol table info available.
#6  0x00007feda21eb809 in dbus_message_append_args () from /lib64/libdbus-1.so.3
No symbol table info available.
#7  0x00007feda2f04e17 in sss_dp_get_account_msg (pvt=<optimized out>) at src/responder/common/responder_dp.c:626
        msg = 0x7feda34ad9f0
        dbret = <optimized out>
        info = 0x7feda34c0f90
        be_type = 4099
        attrs = 1
        filter = 0x7feda34b3c50 "name=\336\336\336\336\336\336\336\336q"
        __FUNCTION__ = "sss_dp_get_account_msg"
#8  0x00007feda2f05506 in sss_dp_issue_request (mem_ctx=0x7feda34c0f20, rctx=rctx@entry=0x7feda34a3b70, strkey=strkey@entry=0x7feda34a5640 "3:\336\336\336\336\336\336\336\336q@KRB", dom=dom@entry=0x7feda34a5920, msg_create=msg_create@entry=0x7feda2f04cd0 <sss_dp_get_account_msg>, pvt=0x7feda34c0f90, nreq=nreq@entry=0x7feda34c0d90) at src/responder/common/responder_dp.c:303
        hret = <optimized out>
        value = {type = 2669018022, {ptr = 0x10, i = 16, ui = 16, l = 16, ul = 16, f = 2.24207754e-44, d = 7.9050503334599447e-323}}
        key = 0x7feda34bc130
        sdp_req = <optimized out>
        cb = <optimized out>
        te = <optimized out>
        tv = <optimized out>
        msg = <optimized out>
        tmp_ctx = 0x7feda34a7af0
        ret = <optimized out>
        __FUNCTION__ = "sss_dp_issue_request"
#9  0x00007feda2f05b60 in sss_dp_get_account_send (mem_ctx=mem_ctx@entry=0x7feda34ac7a0, rctx=0x7feda34a3b70, dom=0x7feda34a5920, fast_reply=fast_reply@entry=true, type=type@entry=SSS_DP_INITGROUPS, opt_name=opt_name@entry=0x7feda34a5f00 "\336\336\336\336\336\336\336\336q", opt_id=0, extra=extra@entry=0x0) at src/responder/common/responder_dp.c:499
        ret = <optimized out>
        req = 0x7feda34c0d90
        info = 0x7feda34c0f90
        state = 0x7feda34c0f20
        key = 0x7feda34a5640 "3:\336\336\336\336\336\336\336\336q@KRB"
        __FUNCTION__ = "sss_dp_get_account_send"
#10 0x00007feda2ef219b in check_cache (dctx=dctx@entry=0x7feda34a2cb0, nctx=nctx@entry=0x7feda34b3d00, res=0x7feda34b6c40, req_type=req_type@entry=SSS_DP_INITGROUPS, opt_name=opt_name@entry=0x7feda34a5f00 "\336\336\336\336\336\336\336\336q", opt_id=opt_id@entry=0, extra=0x0, callback=callback@entry=0x7feda2ef65f0 <nss_cmd_getby_dp_callback>, pvt=pvt@entry=0x7feda34a2cb0) at src/responder/nss/nsssrv_cmd.c:705
        ret = <optimized out>
        cmdctx = 0x7feda34b12d0
        cctx = 0x7feda34ac7a0
        req = 0x0
        cb_ctx = 0x0
        cacheExpire = <optimized out>
        __FUNCTION__ = "check_cache"
#11 0x00007feda2ef4b77 in nss_cmd_initgroups_search (dctx=dctx@entry=0x7feda34a2cb0) at src/responder/nss/nsssrv_cmd.c:4314
        cmdctx = <optimized out>
        dom = 0x7feda34a5920
        cctx = <optimized out>
        name = 0x7feda34a5f00 "\336\336\336\336\336\336\336\336q"
        nctx = 0x7feda34b3d00
        ret = <optimized out>
        user_attrs = {0x7feda2f0c0c4 "name", 0x7feda2f0c175 "uidNumber", 0x7feda2f0c189 "gidNumber", 0x7feda2f0c1b5 "gecos", 0x7feda2f0c19d "homeDirectory", 0x7feda2f0c1c5 "loginShell", 0x7feda2f0c449 "lastUpdate", 0x7feda2f0c255 "dataExpireTimestamp", 0x7feda2f0c269 "initgrExpireTimestamp", 0x7feda2f0c03d "objectClass", 0x7feda2f0c059 "origPrimaryGroupGidNumber", 0x7feda2f0c14d "objectSIDString", 0x7feda2f0bf8b "userPrincipalName", 0x7feda2f0c4e8 "overrideDN", 0x7feda2f0c4f3 "overrideObjectDN", 0x7feda2f0bfaa "defaultOverrideName", 0x0}
        msg = 0x7feda34b6c10
        sysdb_name = <optimized out>
        c = <optimized out>
        extra_flag = <optimized out>
        __FUNCTION__ = "nss_cmd_initgroups_search"
#12 0x00007feda2ef6b28 in nss_cmd_getby_dp_callback (err_maj=<optimized out>, err_min=0, err_msg=0x7feda34a68c0 "Success (Success)", ptr=<optimized out>) at src/responder/nss/nsssrv_cmd.c:1208
        dctx = 0x7feda34a2cb0
        cmdctx = 0x7feda34b12d0
        cctx = 0x7feda34ac7a0
        ret = <optimized out>
        check_subdomains = <optimized out>
        nctx = <optimized out>
        __FUNCTION__ = "nss_cmd_getby_dp_callback"
#13 0x00007feda2eed2b5 in nsssrv_dp_send_acct_req_done (req=0x0) at src/responder/nss/nsssrv_cmd.c:759
        cb_ctx = 0x7feda34ad690
        ret = 0
        err_maj = 0
        err_min = 0
        err_msg = 0x7feda34a68c0 "Success (Success)"
        __FUNCTION__ = "nsssrv_dp_send_acct_req_done"
#14 0x00007feda2f04afc in sss_dp_internal_get_done (pending=<optimized out>, ptr=<optimized out>) at src/responder/common/responder_dp.c:802
        ret = 0
        req = 0x7feda34b7030
        sdp_req = 0x7feda34b7240
        cb = 0x7feda34b7460
        state = <optimized out>
        cb_state = 0x7feda34b6c10
        __FUNCTION__ = "sss_dp_internal_get_done"
#15 0x00007feda21dd862 in complete_pending_call_and_unlock () from /lib64/libdbus-1.so.3
No symbol table info available.
#16 0x00007feda21e0b51 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
No symbol table info available.
#17 0x00007feda28881e2 in sbus_dispatch (ev=0x7feda349a4a0, te=<optimized out>, tv=..., data=<optimized out>) at src/sbus/sssd_dbus_connection.c:96
        new_event = <optimized out>
        conn = 0x7feda34a4b80
        dbus_conn = 0x7feda34a7550
        ret = <optimized out>
        __FUNCTION__ = "sbus_dispatch"
#18 0x00007fed9f15e22f in tevent_common_loop_timer_delay (ev=ev@entry=0x7feda349a4a0) at ../tevent_timed.c:341
        current_time = {tv_sec = 1438913911, tv_usec = 784255}
        te = 0x7feda34a8d60
#19 0x00007fed9f15f23a in epoll_event_loop_once (ev=0x7feda349a4a0, location=<optimized out>) at ../tevent_epoll.c:911
        epoll_ev = 0x7feda349a6e0
        tval = {tv_sec = 57, tv_usec = 999997}
        panic_triggered = false
#20 0x00007fed9f15d937 in std_event_loop_once (ev=0x7feda349a4a0, location=0x7feda28aac33 "src/util/server.c:668") at ../tevent_standard.c:114
        glue_ptr = <optimized out>
        glue = 0x7feda349a590
        ret = <optimized out>
#21 0x00007fed9f15a0fd in _tevent_loop_once (ev=ev@entry=0x7feda349a4a0, location=location@entry=0x7feda28aac33 "src/util/server.c:668") at ../tevent.c:533
        ret = <optimized out>
        nesting_stack_ptr = 0x0
#22 0x00007fed9f15a29b in tevent_common_loop_wait (ev=0x7feda349a4a0, location=0x7feda28aac33 "src/util/server.c:668") at ../tevent.c:637
        ret = <optimized out>
#23 0x00007fed9f15d8d7 in std_event_loop_wait (ev=0x7feda349a4a0, location=0x7feda28aac33 "src/util/server.c:668") at ../tevent_standard.c:140
        glue_ptr = <optimized out>
        glue = 0x7feda349a590
        ret = <optimized out>
#24 0x00007feda2891e23 in server_loop (main_ctx=0x7feda349b8f0) at src/util/server.c:668
No locals.
#25 0x00007feda2eebec7 in main (argc=6, argv=<optimized out>) at src/responder/nss/nsssrv.c:626
        opt = <optimized out>
        pc = 0x7feda3499060
        main_ctx = 0x7feda349b8f0
        ret = <optimized out>
        uid = 0
        gid = 0
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7feda2850320 <poptHelpOptions>, val = 0, descrip = 0x7feda2f093b9 "Help options:", argDescrip = 0x0}, {longName = 0x7feda2f093c7 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x7fed9f567120 <debug_level>, val = 0, descrip = 0x7feda2f093d3 "Debug level", argDescrip = 0x0}, {longName = 0x7feda2f093df "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x7fed9f56711c <debug_to_file>, val = 0, descrip = 0x7feda2f09930 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x7feda2f093ee "debug-to-stderr", shortName = 0 '\000', argInfo = 1073741824, arg = 0x7fed9f567118 <debug_to_stderr>, val = 0, descrip = 0x7feda2f09968 "Send the debug output to stderr directly.", argDescrip = 0x0}, {longName = 0x7feda2f093fe "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x7fed9f5670f4 <debug_timestamps>, val = 0, descrip = 0x7feda2f0940f "Add debug times!
 tamps", a
 rgDescrip = 0x0}, {longName = 0x7feda2f09424 "debug-microseconds", shortName = 0 '\000', argInfo = 2, arg = 0x7fed9f5670f0 <debug_microseconds>, val = 0, descrip = 0x7feda2f09998 "Show timestamps with microseconds", argDescrip = 0x0}, {longName = 0x7feda2f09437 "uid", shortName = 0 '\000', argInfo = 2, arg = 0x7fff1689a910, val = 0, descrip = 0x7feda2f099c0 "The user ID to run the server as", argDescrip = 0x0}, {longName = 0x7feda2f0943b "gid", shortName = 0 '\000', argInfo = 2, arg = 0x7fff1689a914, val = 0, descrip = 0x7feda2f099e8 "The group ID to run the server as", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"
}}}

Comment 1 Jakub Hrozek 2015-08-19 22:11:10 UTC

Patch acked upstream, acking for RHEL

Comment 2 Namita Soman 2015-08-20 13:07:17 UTC

Pls add steps to verify

Comment 3 Jakub Hrozek 2015-08-20 14:52:59 UTC

(In reply to Namita Soman from comment #2)
> Pls add steps to verify

Configure two domains and run initgroups on a user from the second domain.

Please note this is a use-after-free error, so it's possible sssd wouldn't crash each time, especially on a fast or idle machine.

Comment 5 Lukas Slebodnik 2015-08-27 06:54:44 UTC

There is a mistake in Doc text.

The statement "memory was double freed" is wrong.
Double free and use after free are two different kind of bugs.
The memory was accessed after releasing it.

Comment 6 Lukas Slebodnik 2015-09-02 12:31:26 UTC

(In reply to Jakub Hrozek from comment #3)
> (In reply to Namita Soman from comment #2)
> > Pls add steps to verify
> 
> Configure two domains and run initgroups on a user from the second domain.
> 
> Please note this is a use-after-free error, so it's possible sssd wouldn't
> crash each time, especially on a fast or idle machine.

It's very likely it might crash in such setup. So it is not a deterministic reproducer. But you might use valgrind.

Put following line to "[nss]" section of sssd
command = valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

# there is a assumption that sssd_nss was spawn by sssd with the same arguments
# It should be the same as output of `pgrep -af sssd_nss`
 
In case of use after free you should be able to see an error "Invalid read of size" in log file /var/log/sssd/valgrind_nss_%p.log
The similar to in commit message https://git.fedorahosted.org/cgit/sssd.git/commit/?id=b9901fe3d6cfe05cd75a2440c0f9c7985aea36c6

Comment 7 Amith 2015-09-16 10:50:10 UTC

I followed the instructions on comment #6 and tried to verify the bug. The valgrind log file shows "Invalid read of size". Does that mean "use after free" error still exists and bug is not fixed? OR is it the other way around ?

Also, encountered other issues like SSSD service fails to start upon adding the following to "[nss]" section:
command = valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

SSSD service error:
--------------------------------------------------
# service sssd status
Redirecting to /bin/systemctl status  sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: failed (Result: exit-code) since Wed 2015-09-16 06:13:58 EDT; 10s ago
  Process: 17987 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=1/FAILURE)
 Main PID: 16897 (code=exited, status=0/SUCCESS)

 sssd[be[sssdad2012r2.com]][17989]: Starting up
 sssd[pam][17992]: Starting up
 sssd[17988]: Exiting the SSSD. Could not restart critical service [nss].
 sssd[pam][17992]: Shutting down
 sssd[be[LDAP]][17990]: Shutting down
 sssd[be[sssdad2012r2.com]][17989]: Shutting down
 systemd[1]: sssd.service: control process exited, code=exited status=1
 systemd[1]: Failed to start System Security Services Daemon.
 systemd[1]: Unit sssd.service entered failed state.
 systemd[1]: sssd.service failed.

When SSSD was initiated from the command line, it works fine.

# sssd -D
# ps -ef | grep sss
root     18028     1  0 06:23 ?        00:00:00 sssd -D
root     18029 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain sssdad2012r2.com --uid 0 --gid 0
root     18030 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0
root     18031 18028  0 06:23 ?        00:00:03 valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root     18032 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0
-------------------------------------------------------

The valgrind log file shows the error "Invalid read of size 4".

# grep Invalid /var/log/sssd/valgrind_nss_18031.log 
==18031== Invalid read of size 4

Contents of SSSD.CONF with multiple domains:
----------------------------------------------------
[sssd]
domains = sssdad2012r2.com, LDAP
config_file_version = 2
services = nss, pam

[nss]
command = valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

[domain/sssdad2012r2.com]
ad_domain = sssdad2012r2.com
krb5_realm = SSSDAD2012R2.COM
realmd_tags = manages-system joined-with-adcli 
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad

[domain/LDAP]
id_provider = ldap
auth_provider = ldap
enumerate = true
debug_level = 0xFFF0
cache_credentials = FALSE
ldap_uri = ldaps://hubcap.lab.eng.pnq.redhat.com
ldap_tls_cacert = /etc/openldap/certs/cacert.asc
ldap_search_base = dc=example,dc=com
----------------------------------------------------

Comment 8 Lukas Slebodnik 2015-09-18 05:44:28 UTC

(In reply to Amith from comment #7)
> I followed the instructions on comment #6 and tried to verify the bug. The
> valgrind log file shows "Invalid read of size". Does that mean "use after
> free"
yes

> error still exists and bug is not fixed? OR is it the other way around
> ?
>
Which version of sssd did you test?
The error should be gone with the latest sssd.
 
> Also, encountered other issues like SSSD service fails to start upon adding
> the following to "[nss]" section:
> command = valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log
> /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
> 
I assume that valgrind was installed on that machine.
So it might have been caused by SELinux policy. 

By default sssd is not allowed to execute code from memory; which is required by valgrind.

#============= sssd_t ==============
allow sssd_t self:process execmem;

> SSSD service error:
> --------------------------------------------------
> # service sssd status
> Redirecting to /bin/systemctl status  sssd.service
> ● sssd.service - System Security Services Daemon
>    Loaded: loaded (/usr/lib/systemd/system/sssd.service; disabled; vendor
> preset: disabled)
>   Drop-In: /etc/systemd/system/sssd.service.d
>            └─journal.conf
>    Active: failed (Result: exit-code) since Wed 2015-09-16 06:13:58 EDT; 10s
> ago
>   Process: 17987 ExecStart=/usr/sbin/sssd -D -f (code=exited,
> status=1/FAILURE)
>  Main PID: 16897 (code=exited, status=0/SUCCESS)
> 
>  sssd[be[sssdad2012r2.com]][17989]: Starting up
>  sssd[pam][17992]: Starting up
>  sssd[17988]: Exiting the SSSD. Could not restart critical service [nss].
>  sssd[pam][17992]: Shutting down
>  sssd[be[LDAP]][17990]: Shutting down
>  sssd[be[sssdad2012r2.com]][17989]: Shutting down
>  systemd[1]: sssd.service: control process exited, code=exited status=1
>  systemd[1]: Failed to start System Security Services Daemon.
>  systemd[1]: Unit sssd.service entered failed state.
>  systemd[1]: sssd.service failed.
> 
> When SSSD was initiated from the command line, it works fine.
>
This might be explained by mislabeled files (in /run/... ).
and errors caused by SELinux. You should either prepare own temporary
SELinux policy or test in permissive mode.

> # sssd -D
> # ps -ef | grep sss
> root     18028     1  0 06:23 ?        00:00:00 sssd -D
> root     18029 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_be
> --domain sssdad2012r2.com --uid 0 --gid 0
> root     18030 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_be
> --domain LDAP --uid 0 --gid 0
> root     18031 18028  0 06:23 ?        00:00:03 valgrind -v
> --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss
> --uid 0 --gid 0 --debug-to-files
> root     18032 18028  0 06:23 ?        00:00:00 /usr/libexec/sssd/sssd_pam
> --uid 0 --gid 0
> -------------------------------------------------------

Comment 9 Amith 2015-09-24 05:30:16 UTC

Verified the bug with SSSD Version: sssd-1.13.0-26.el7.x86_64

Steps followed during verification:

1. Setup SSSD with multiple domains and add the following in nss section:

command = valgrind -v --log-file=/var/log/sssd/valgrind_nss_%p.log /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

2. Restart SSSD and run initgroups against user from first domain.

# id testuser OR id --groups testuser

# id --groups GROUP

3. Verify valgrind log file for the following error:
==29279== Invalid read of size 1
==29279== at 0x4C2CBA2: strlen (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) 
==29279== by 0x89A7AC4: talloc_strdup (in /usr/lib64/libtalloc.so.2.1.2) ==29279== by 0x11668A: nss_cmd_initgroups_search (nsssrv_cmd.c:4191)

4. Above mentioned errors were not discovered in valgrind log file.

Comment 10 errata-xmlrpc 2015-11-19 11:40:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2355.html