Bug 1672584

Summary: [abrt] [faf] sssd: __pthread_rwlock_wrlock(): /usr/libexec/sssd/sssd_be killed by 11
Product: Red Hat Enterprise Linux 8 Reporter: Steeve Goveas <sgoveas>
Component: sssd Assignee: Tomas Halman <thalman>
Status: CLOSED ERRATA QA Contact: sssd-qe <sssd-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 CC: atikhono, dbula, grajaiya, jhrozek, lslebodn, mupadhye, mzidek, pbrezina, sbose, tscherf, wchadwic
Target Milestone: rc   
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/8fa878897513ed0df8703f60876adc2c64eef26b/
Whiteboard: sync-to-jira
Fixed In Version: sssd-2.2.3-2.el8 Doc Type: If docs needed, set a value
Last Closed: 2020-04-28 16:55:59 UTC Type: Bug
Bug Depends On: 1682305    
Bug Blocks:    

Description Steeve Goveas 2019-02-05 11:32:44 UTC
This bug has been created based on an anonymous crash report requested by the package maintainer.

Report URL: http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/8fa878897513ed0df8703f60876adc2c64eef26b/

The crash starts at this test case from the ad_provider/ldap_krb5 test suite:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Enumerate user belonging to multiple groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ 06:10:13 ] :: [  BEGIN   ] :: Running 'getent -s sss group lkgroup011-22741'
lkgroup011-22741:*:1122741:lkuser01-22741
:: [ 06:10:13 ] :: [   PASS   ] :: Command 'getent -s sss group lkgroup011-22741' (Expected 0, got 0)
:: [ 06:10:13 ] :: [  BEGIN   ] :: Running 'id lkuser01-22741 | grep lkgroup01-22741 | grep lkgroup011-22741'
uid=1022741(lkuser01-22741) gid=1022741(lkgroup01-22741) groups=1022741(lkgroup01-22741),1122741(lkgroup011-22741),1222741(lkgroup012-22741)
:: [ 06:10:15 ] :: [   PASS   ] :: Command 'id lkuser01-22741 | grep lkgroup01-22741 | grep lkgroup011-22741' (Expected 0, got 0)
:: [ 06:10:15 ] :: [  BEGIN   ] :: Running 'id -g lkuser01-22741 | grep 1022741'
1022741
:: [ 06:10:15 ] :: [   PASS   ] :: Command 'id -g lkuser01-22741 | grep 1022741' (Expected 0, got 0)
:: [ 06:10:15 ] :: [  BEGIN   ] :: Running 'su_success lkuser01-22741 Secret123'
spawn su --shell /bin/sh nobody -- -c su --shell /bin/true -- "$1" -- lkuser01-22741
Password:
:: [ 06:10:17 ] :: [   PASS   ] :: Command 'su_success lkuser01-22741 Secret123' (Expected 0, got 0)
:: [ 06:10:28 ] :: [  BEGIN   ] :: Running 'id lkuser01-22741 | grep lkgroup01-22741 | grep lkgroup011-22741'
uid=1022741(lkuser01-22741) gid=1022741(lkgroup01-22741) groups=1022741(lkgroup01-22741),1122741(lkgroup011-22741),1222741(lkgroup012-22741)
:: [ 06:10:31 ] :: [   PASS   ] :: Command 'id lkuser01-22741 | grep lkgroup01-22741 | grep lkgroup011-22741' (Expected 0, got 0)
:: [ 06:10:31 ] :: [  BEGIN   ] :: Running 'id -g lkuser01-22741 | grep 1022741'
1022741
:: [ 06:10:31 ] :: [   PASS   ] :: Command 'id -g lkuser01-22741 | grep 1022741' (Expected 0, got 0)
:: [ 06:10:31 ] :: [  BEGIN   ] :: Running 'su_success lkuser01-22741 Secret123'
spawn su --shell /bin/sh nobody -- -c su --shell /bin/true -- "$1" -- lkuser01-22741
Password:
:: [ 06:10:33 ] :: [   PASS   ] :: Command 'su_success lkuser01-22741 Secret123' (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 31s
::   Assertions: 7 good, 0 bad
::   RESULT: PASS
** Enumerate-user-belonging-to-multiple-groups PASS Score:0
use_pty:FALSE /usr/share/restraint/plugins/run_task_plugins


Crash report received in email
time:           Tue 05 Feb 2019 06:10:23 AM EST
package:        sssd-common-2.0.0-38.el8
reason:         sssd_be killed by SIGSEGV
crash_function: orderly_shutdown
cmdline:        /usr/libexec/sssd/sssd_be --domain AD --uid 0 --gid 0 --logger=files
executable:     /usr/libexec/sssd/sssd_be
component:      sssd
uid:            0
username:       root
hostname:       host-8-242-67.host.centralci.eng.rdu2.redhat.com
os_release:     Red Hat Enterprise Linux release 8.0 Beta (Ootpa)
architecture:   x86_64
pwd:            /
kernel:         4.18.0-64.el8.x86_64
abrt_version:   2.10.9

Reports:
uReport: BTHASH=8fa878897513ed0df8703f60876adc2c64eef26b
ABRT Server: URL=http://faf.lab.eng.brq.redhat.com/faf/reports/bthash/8fa878897513ed0df8703f60876adc2c64eef26b
ABRT Server: URL=http://faf.lab.eng.brq.redhat.com/faf/reports/9745/
CI Job: https://platform-stg-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/sssd-rhel-8.0.0-candidate-runtest-ad-provider-ldap_krb5-win2k12r2/40/

Full Backtrace:
[New LWP 15889]
Error while reading shared library symbols for /lib64/libbasicobjects.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libbasicobjects.so.0.1.0-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libref_array.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libref_array.so.1.2.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libcollection.so.4:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libcollection.so.4.1.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libsystemd.so.0:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libsystemd.so.0.23.0-239-11.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libdhash.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libdhash.so.1.1.0-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libdbus-1.so.3:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libdbus-1.so.3.19.7-1.12.8-7.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libaudit.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/lib64/libaudit.so.1.0.0-3.0-0.10.20180831git0047a6c.el8.x86_64.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Error while reading shared library symbols for /lib64/libpath_utils.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libpath_utils.so.1.0.1-0.6.1-39.el8.x86_64.debug
Error while reading shared library symbols for /lib64/liblz4.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/liblz4.so.1.8.1-1.8.1.2-4.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libmount.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libmount.so.1.1.0-2.32.1-8.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libgcc_s.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/lib64/libgcc_s-8-20180905.so.1-8.2.1-3.5.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libblkid.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/lib64/libblkid.so.1.1.0-2.32.1-8.el8.x86_64.debug
Error while reading shared library symbols for /lib64/libuuid.so.1:
could not find '.gnu_debugaltlink' file for /var/cache/abrt-di/usr/lib/debug/usr/

Comment 1 Alexey Tikhonov 2019-02-07 17:36:59 UTC
So, the backtrace boils down to the following:

sssd::util/server.c:orderly_shutdown() ->
libc::exit() -> libc::on_exit() -> ... -> 
libtalloc::talloc_lib_atexit() -> [a bunch of libtalloc::_tc_free_*() leading to] -> 
sssd::providers/ldap/sdap_async.c:sdap_handle_destructor() [d-tor of "sdap_handle"] -> (inlined)sdap_handle_release() ->
(inlined)libldap::ldap_unbind_ext() ->ldap_ld_free() -> ...
libcrypto::RAND_get_rand_method()(*) -> CRYPTO_THREAD_write_lock()


RAND_get_rand_method() calls CRYPTO_THREAD_write_lock(rand_meth_lock)
with "rand_meth_lock" being "static CRYPTO_RWLOCK *rand_meth_lock;"
[ https://github.com/openssl/openssl/blob/master/crypto/rand/rand_lib.c ]

Note that this is the libc::on_exit() path.

Now, https://github.com/openssl/openssl/blob/master/crypto/init.c#L140 :
atexit(OPENSSL_cleanup)

OPENSSL_cleanup() -> rand_cleanup_int() -> CRYPTO_THREAD_lock_free(rand_meth_lock);
[ https://github.com/openssl/openssl/blob/master/crypto/rand/rand_lib.c#L357 ]


So, I am not sure about the order of execution of "on_exit" handlers, but it seems possible that the OpenSSL handler is executed first, so the lock is already freed when sssd's "on_exit" handler tries to acquire it...
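
For illustration, the suspected ordering problem can be reproduced in miniature. The C standard specifies that functions registered with atexit() run in reverse order of registration, and glibc documents the same for on_exit(). The sketch below is not sssd or OpenSSL code: lock_is_valid, library_cleanup(), app_cleanup() and run_exit_order_demo() are hypothetical stand-ins, and it assumes (as the analysis above suspects, but does not prove) that the library's handler was registered after the application's. A flag stands in for the freed lock so that the demo does not crash itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library-owned lock such as rand_meth_lock. */
static int lock_is_valid = 0;
/* Write end of the pipe; only used inside the child process. */
static int report_fd = -1;

static void report(const char *msg)
{
    ssize_t n = write(report_fd, msg, strlen(msg));
    (void)n; /* best effort, this is only a demo trace */
}

/* Plays the role of OPENSSL_cleanup(): the lock is gone afterwards. */
static void library_cleanup(void)
{
    lock_is_valid = 0;
    report("library_cleanup;");
}

/* Plays the role of the exit handler that ends up unbinding LDAP. */
static void app_cleanup(void)
{
    report(lock_is_valid ? "app_cleanup:lock-valid;"
                         : "app_cleanup:lock-freed;");
}

/* Runs the scenario in a child process and copies its trace into buf.
 * Returns 0 on success, -1 on failure. */
int run_exit_order_demo(char *buf, size_t len)
{
    int fds[2];
    if (len == 0 || pipe(fds) != 0)
        return -1;

    fflush(NULL); /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        atexit(app_cleanup);     /* registered first, so it runs last */
        lock_is_valid = 1;       /* library initialised lazily, later on */
        atexit(library_cleanup); /* registered last, so it runs first */
        exit(0);
    }

    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used + 1 < len &&
           (n = read(fds[0], buf + used, len - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    int status;
    return waitpid(pid, &status, 0) < 0 ? -1 : 0;
}
```

Calling run_exit_order_demo() with a buffer of, say, 128 bytes yields a trace in which library_cleanup appears before app_cleanup, and app_cleanup reports the lock as already freed. In the real process the second handler would instead call into the library and lock freed memory.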

Comment 2 Jakub Hrozek 2019-02-07 21:10:14 UTC
Thank you very much for the analysis, Alexey.

Given that this is "just" a crash on exit, I don't think it is worth a blocker flag for 8.0. It can wait for either 8.0.z, if we do a z-stream update for another reason, or even 8.1.

Comment 3 Jakub Hrozek 2019-04-05 13:08:23 UTC
A potential place to look at: Sumit found that sdap_finalize calls orderly_shutdown; perhaps we could unbind there?

Comment 19 Sumit Bose 2019-08-23 15:44:43 UTC
Master:
 - f19f8e6b917e77d5d2bfdedc78e5669b522ea265

Comment 20 Sumit Bose 2019-08-23 19:51:10 UTC
(In reply to Sumit Bose from comment #19)
> Master:
>  - f19f8e6b917e77d5d2bfdedc78e5669b522ea265

Sorry, this commit had issues.

Comment 22 Sumit Bose 2019-08-29 14:36:01 UTC
Master:
 - a9669683de3a1c39dc4e47dd2aca0a9f99b652a9

Comment 25 Alexey Tikhonov 2019-11-15 18:04:20 UTC
*** Bug 1771852 has been marked as a duplicate of this bug. ***

Comment 30 errata-xmlrpc 2020-04-28 16:55:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1863