Bug 1899593

Summary: sssd_be segfaults at be_refresh_get_values_ex() due to NULL ptrs in results of sysdb_search_with_ts_attr() [rhel-7.9.z]
Product: Red Hat Enterprise Linux 7
Reporter: Akshay Sakure <asakure>
Component: sssd
Assignee: Alexey Tikhonov <atikhono>
Status: CLOSED ERRATA
QA Contact: Anuj Borah <aborah>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.9
CC: atikhono, dlavu, grajaiya, hkhot, jhrozek, johannespfau, jreznik, lslebodn, mzidek, pbrezina, sbose, sgoveas, thalman, tscherf
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: sync-to-jira
Fixed In Version: sssd-1.16.5-10.el7_9.6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-15 11:22:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Akshay Sakure 2020-11-19 16:06:46 UTC
-  Description of problem:

sssd crashes with the following error in the kernel log:

kernel: sssd_be[ ]: segfault at 8 ip 00007fa2c9d08a8e sp 00007ffe81a822d0 error 4 in libsss_util.so[7fa2c9ce9000+8d000]


-  Version-Release number of selected component (if applicable):

RHEL 7.9
sssd-1.16.5-10.el7_9.5.x86_64


-  How reproducible:

Randomly but frequently after upgrading to RHEL 7.9


-  Steps to Reproduce:

1. Configure sssd in id_provider = ldap mode. 
2. Upgrade system to RHEL 7.9 (sssd-1.16.5-10.el7_9.5.x86_64)


-  Actual results:

sssd_be crashes randomly


-  Expected results:

sssd_be shouldn't crash

Comment 5 Alexey Tikhonov 2020-11-19 20:36:19 UTC
(gdb) frame
#0  0x00007f8262bc6a8e in sysdb_msg2attrs (mem_ctx=mem_ctx@entry=0x560a69714c30, count=277, msgs=0x560a69744bc0, attrs=attrs@entry=0x7ffd9c9ada80)
    at src/db/sysdb.c:1574
1574	        a[i]->num = msgs[i]->num_elements;
(gdb)  
(gdb) p count
$11 = 277
(gdb) p i
$12 = 152
(gdb) p msgs[i]
$13 = (struct ldb_message *) 0x0


(gdb) frame 1
#1  0x0000560a67f14670 in be_refresh_get_values_ex (_values=0x560a69714f78, search_cache=SYSDB_CACHE_TYPE_TIMESTAMP, value_attr=0x7f82510d430b "name", 
    key_attr=<optimized out>, base_dn=0x560a69601b40, period=4050, domain=0x560a695c44e0, mem_ctx=0x560a69714f40) at src/providers/be_refresh.c:75
75	    ret = sysdb_msg2attrs(tmp_ctx, res->count, res->msgs, &records);
(gdb)  
(gdb) p res->msgs[151]
$26 = (struct ldb_message *) 0x560a69811220
(gdb) p res->msgs[152]
$27 = (struct ldb_message *) 0x0
(gdb) p res->msgs[153]
$28 = (struct ldb_message *) 0x0
(gdb) p res->msgs[154]
$29 = (struct ldb_message *) 0x560a69814190
(gdb) p res->msgs[155]
$30 = (struct ldb_message *) 0x560a69814c40
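
As a side note, the fault address in the kernel message (`segfault at 8`) fits a read of `num_elements` through a NULL `struct ldb_message *`, which is what line 1574 does when `msgs[i]` is NULL. A minimal sketch, assuming a stand-in struct (`mock_ldb_message`, a hypothetical name) that mirrors the field order of `struct ldb_message`, with a pointer-sized `dn` field first:

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for struct ldb_message.
 * Assumption: the real struct has the same field order (dn first).
 */
struct mock_ldb_message {
    void *dn;                  /* pointer: 8 bytes on x86_64 */
    unsigned int num_elements; /* placed right after the pointer */
    void *elements;
};

/* Address that `msg->num_elements` would touch when msg == NULL. */
size_t null_deref_fault_address(void)
{
    return offsetof(struct mock_ldb_message, num_elements);
}
```

On a 64-bit build the function returns the pointer size, which matches the `8` in the kernel log.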


  --  `sysdb_search_with_ts_attr()` returned an `ldb_result` in which some `msgs[]` entries are NULL pointers.


sysdb_search_with_ts_attr(SYSDB_CACHE_TYPE_TIMESTAMP) = sysdb_search_ts_entry() + merge_res_sysdb_attrs(),
   merge_res_sysdb_attrs() = for (c:count) { merge_msg_sysdb_attrs(msgs[c]) }
      merge_msg_sysdb_attrs() = merge_all_ts_attrs(sysdb_cache_search_entry())

But if `sysdb_cache_search_entry()` fails, `merge_res_sysdb_attrs()` only logs the failure and moves on, so `ts_cache_res->msgs[c]` is left zero-initialized:
https://github.com/SSSD/sssd/blob/d93b4fe14b0f72bd8311497d18204f153c104007/src/db/sysdb_search.c#L252
(By the way, I think the code comment there is wrong.)
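
The pattern described above can be modelled in isolation. The following is a simplified sketch, not the sssd code: `mock_msg`, `mock_result`, `merge_leaving_holes()` and `count_null_msgs()` are hypothetical stand-ins, and a NULL input entry models a failed per-entry lookup. It assumes, as the description suggests, that the result array is zero-initialized and its `count` is fixed before the loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real ldb types. */
struct mock_msg {
    unsigned int num_elements;
};

struct mock_result {
    size_t count;
    struct mock_msg **msgs;
};

/*
 * merged[c] == NULL models a failed per-entry merge.
 * count is set up front, so a skipped entry leaves a NULL slot behind.
 */
struct mock_result *merge_leaving_holes(struct mock_msg *const *merged, size_t n)
{
    struct mock_result *res = calloc(1, sizeof(*res));
    if (res == NULL) {
        return NULL;
    }
    res->msgs = calloc(n > 0 ? n : 1, sizeof(*res->msgs));
    if (res->msgs == NULL) {
        free(res);
        return NULL;
    }
    res->count = n;
    for (size_t c = 0; c < n; c++) {
        if (merged[c] == NULL) {
            continue; /* treated as non-fatal; the slot stays zeroed */
        }
        res->msgs[c] = merged[c];
    }
    return res;
}

/* Counts the NULL slots a consumer would trip over. */
size_t count_null_msgs(const struct mock_result *res)
{
    assert(res != NULL);
    size_t holes = 0;
    for (size_t i = 0; i < res->count; i++) {
        if (res->msgs[i] == NULL) {
            holes++;
        }
    }
    return holes;
}

void mock_result_free(struct mock_result *res)
{
    if (res != NULL) {
        free(res->msgs); /* entries are borrowed, not owned */
        free(res);
    }
}
```

Any consumer that iterates `0..count-1` and dereferences each entry without a check, as `sysdb_msg2attrs()` does in frame #0, will crash on the first hole.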

If my guess is correct, we should see "Cannot merge sysdb cache values for %s\n" messages (level SSSDBG_MINOR_FAILURE) in the log.

I didn't find sssd logs in the sos-report.


Akshay, can we get sssd_$domain.log, please?

By the way, if they remove the cache db, the issue will most probably be hidden, so it would be good to get the log before the cache is removed. Or are they able to reproduce the issue reliably?

Comment 10 Pavel Březina 2020-11-23 10:50:43 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5414

* `master`
    * ff24d1538af88f83d0a3cc2817952cf70e7ca580 - SYSDB: merge_res_sysdb_attrs() fixed to avoid NULL ptr in msgs[]
* `sssd-1-16`
    * 9ace3a7899e6b3753ef428088303e0a646db4096 - SYSDB: merge_res_sysdb_attrs() fixed to avoid NULL ptr in msgs[]
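
One way to keep NULL entries out of `msgs[]` is to append only the successfully merged entries and to count them as they are stored. The sketch below illustrates that idea with hypothetical stand-in types (`mock_msg`, `mock_result`) and helper names; it is not taken from the PR, whose actual change may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real ldb types. */
struct mock_msg {
    unsigned int num_elements;
};

struct mock_result {
    size_t count;
    struct mock_msg **msgs;
};

/*
 * merged[c] == NULL models a failed per-entry merge.
 * Successful entries are stored densely and counted as they are stored,
 * so msgs[0..count-1] never contains NULL.
 */
struct mock_result *merge_compacting(struct mock_msg *const *merged, size_t n)
{
    struct mock_result *res = calloc(1, sizeof(*res));
    if (res == NULL) {
        return NULL;
    }
    res->msgs = calloc(n > 0 ? n : 1, sizeof(*res->msgs));
    if (res->msgs == NULL) {
        free(res);
        return NULL;
    }
    for (size_t c = 0; c < n; c++) {
        if (merged[c] == NULL) {
            continue; /* non-fatal: skip the entry without reserving a slot */
        }
        res->msgs[res->count] = merged[c];
        res->count += 1;
    }
    return res;
}

/* A consumer can dereference every entry without a NULL check. */
unsigned int total_elements(const struct mock_result *res)
{
    assert(res != NULL);
    unsigned int total = 0;
    for (size_t i = 0; i < res->count; i++) {
        total += res->msgs[i]->num_elements;
    }
    return total;
}

void mock_result_free(struct mock_result *res)
{
    if (res != NULL) {
        free(res->msgs); /* entries are borrowed, not owned */
        free(res);
    }
}
```

With this approach the result may be shorter than the input, so a caller that needs every entry still has to handle the case where `count` is smaller than expected, or zero.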

Comment 20 Johannes Pfau 2020-12-04 15:20:57 UTC
We could also reproduce this with the AD backend on 7.9.
The linked patch seems to fix the issue (we have sssd running for 20 minutes now, whereas it previously crashed after 1-2 minutes).

Although we had to build packages with the fixes locally anyway, updated official packages would be much appreciated. This issue has the potential to break logins for all AD-integrated machines.

Comment 24 errata-xmlrpc 2020-12-15 11:22:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5459