Bug 1292456

Summary: sssd_be AD segfaults on missing A record
Product: Red Hat Enterprise Linux 7 Reporter: Martin Kosek <mkosek>
Component: sssd Assignee: Pavel Březina <pbrezina>
Status: CLOSED ERRATA QA Contact: Steeve Goveas <sgoveas>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: dlavu, grajaiya, jgalipea, jhrozek, lslebodn, mkosek, mzidek, pbrezina
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.14.0-0.1.alpha.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-04 07:14:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Martin Kosek 2015-12-17 14:09:58 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/2904

When using sssd on CentOS 6.6 with the AD backend against a Samba 4.2 AD domain, sssd does not handle a rare failure condition: the SRV records point at a DC, but the A record for that domain controller is missing. sssd_be periodically crashes; it restarts a couple of times but generally does not recover:

Dec 16 09:44:05 host kernel: sssd_be[107682]: segfault at 0 ip 00007fd12c5e018b sp 00007fffba8db420 error 4 in libsss_ad.so[7fd12c5ca000+20000]
Dec 16 09:44:05 host abrtd: Directory 'ccpp-2015-12-16-09:44:05-107682' creation detected
Dec 16 09:44:05 host abrt[107687]: Saved core dump of pid 107682 (/usr/libexec/sssd/sssd_be) to /var/spool/abrt/ccpp-2015-12-16-09:44:05-107682 (1978368 bytes)

The SIGSEGV appears to happen in sss_ldap_init_send(), src/util/sss_ldap.c:331.
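
For reference, the kernel line above can be decoded mechanically. The following is a minimal Python sketch, not part of SSSD or ABRT; the function name decode_segfault is made up for illustration. It assumes the usual x86 page-fault error bits (bit 0 = page present, bit 1 = write, bit 2 = user mode) and that the mapping shown in brackets starts at file offset 0 of the library.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error ... in obj[base+size]" line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its parts."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

LINE = ("sssd_be[107682]: segfault at 0 ip 00007fd12c5e018b sp 00007fffba8db420 "
        "error 4 in libsss_ad.so[7fd12c5ca000+20000]")

info = decode_segfault(LINE)
print(info["object"], hex(info["offset"]), hex(info["fault_addr"]))
# libsss_ad.so 0x1618b 0x0
```

Read this way, the log shows a user-mode read of address 0 (a NULL pointer dereference) at offset 0x1618b into libsss_ad.so. With matching debuginfo installed, that offset can be given to a tool such as addr2line to cross-check the source location.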

Getting into this condition is rare; it is caused by a Samba bug that I'm working on separately. The situation could probably be replicated by poisoning DNS, though. The behavior I would expect is to give up on this DC, try any other DCs in the Site, then try DCs in other Sites.
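
The fallback order described above could be sketched roughly as follows. This is an illustration only, not SSSD's failover code: the names resolve and pick_dc, the LDAP port 389, and the example host names are all assumptions made for the sketch. A DC whose name does not resolve is skipped rather than passed on to the connection code.

```python
import socket

def resolve(host):
    """Return the addresses for host, or [] if it has no usable A/AAAA record."""
    try:
        infos = socket.getaddrinfo(host, 389, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return [info[4][0] for info in infos]

def pick_dc(site_dcs, other_dcs, resolver=resolve):
    """Hypothetical helper: try DCs in the local Site first, then other Sites.

    Returns (host, address) for the first DC that resolves, or None if
    no candidate has an address.
    """
    for host in list(site_dcs) + list(other_dcs):
        addrs = resolver(host)
        if addrs:
            return host, addrs[0]
        # No address for this DC: give up on it and move to the next one.
    return None

if __name__ == "__main__":
    # Placeholder names; in practice these would come from the SRV lookup.
    print(pick_dc(["dc1.example.test"], ["dc2.example.test"]))
```

The resolver is passed in as a parameter so the ordering logic can be exercised with a fake lookup table instead of live DNS.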

I have ABRT crashes and cores / backtraces from GDB.

Comment 1 Jakub Hrozek 2016-01-19 17:08:46 UTC
 * master: 8bd9ec3a8885b01a34863d22aa784e221fc422fb
 * sssd-1-13: b32ea7b1f91b4194e05a1a965310691075ecba23

Comment 2 Mike McCune 2016-03-28 23:16:15 UTC
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please see mmccune with any questions.

Comment 4 Dan Lavu 2016-09-19 18:38:01 UTC
Verified against sssd-client-1.14.0-42.el7.x86_64, though this is a sanity check *only*, since we are testing against MS AD instead of Samba AD.

Removed the A records for the NS servers in the zone file.

[root@dell-per230-02 ~]# nslookup bsod2.sssdad2012r2.com
Server:		10.12.0.159
Address:	10.12.0.159#53

*** Can't find bsod2.sssdad2012r2.com: No answer

[root@dell-per230-02 ~]# rm -rf /var/lib/sss/db/* 
[root@dell-per230-02 ~]# service sssd restart

[root@dell-per230-02 ~]# id Administrator
uid=1196000500(administrator) gid=1196000513(domain users) groups=1196000513(domain users),1196000519(enterprise admins),1196000518(schema admins),1196000512(domain admins),1196000520(group policy creator owners),1196000572(denied rodc password replication group)


Ran for a few hours and SSSD did not crash.

Comment 6 errata-xmlrpc 2016-11-04 07:14:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2476.html