Bug 1264224

Summary: segfault in ns-slapd due to accessing Slapi_DN freed in pre bind plug-in
Product: Red Hat Enterprise Linux 7 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 CC: abokovoy, nhosoi, nkinder, rmeggins, s.kieske, spoore, sramling, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.4.0-16.el7 Doc Type: Bug Fix
Doc Text:
Cause: In a bind operation, if the bind DN was replaced in one or more pre-bind plug-ins, the server could crash with a NULL dereference. Fix: If the bind DN is replaced, the server now retrieves the new memory, so the crash no longer occurs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 11:44:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Noriko Hosoi 2015-09-17 22:53:51 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48188

Hopefully this is the appropriate place for this.

I'm running FreeIPA on CentOS 7. Each of my client servers does authentication and group lookup via vanilla LDAP. I am getting pretty consistent segfaults in the ns-slapd process, especially when I make mass changes via Ansible. This triggers a flood of LDAP requests that ends up crashing the service. I haven't been able to pinpoint the exact conditions, but it doesn't take long for it to buckle.

version 1.3.3.1 on centos 7

Comment 1 Viktor Ashirov 2015-09-18 16:02:44 UTC
Hi Noriko,

Is that a reliably reproducible crash? If so, what are the steps to reproduce?

Thanks!

Comment 2 Noriko Hosoi 2015-09-18 16:39:32 UTC
(In reply to Viktor Ashirov from comment #1)
> Hi Noriko,
> 
> Is that a reliably reproducible crash? If so, what are the steps to
> reproduce?

Hi Viktor,

It requires FreeIPA and the slapi-nis/compat plug-in.  More precisely, the function backend_bind_cb(Slapi_PBlock *pb) needs to be called under the following condition:
 * 2. If bind target DN exists in LDAP store, its map cache entry
 *    will have original entry DN recorded. Enforcing SLAPI_BIND_TARGET_SDN
 *    to it will force other plugins to handle authentication request against
 *    the original because slapi-nis' map cache entry doesn't have passwords
 *    recorded. To make it working, slapi-nis should be registered with higher
 *    plugin ordering priority than other plugins.
As described in this comment, SLAPI_BIND_TARGET_SDN is reset in the pblock, which is the cause of the crash.
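
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the real slapi API: mock_sdn, mock_pblock, mock_prebind_plugin and mock_do_bind are hypothetical stand-ins for Slapi_DN, Slapi_PBlock, a pre-bind plug-in such as slapi-nis, and the server's bind path. It only shows the general pattern: a pointer cached before the plug-ins run becomes stale once a plug-in frees and replaces the bind target, so the caller has to read it again from the pblock.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for Slapi_DN and Slapi_PBlock (not the real API). */
typedef struct { char *dn; } mock_sdn;
typedef struct { mock_sdn *bind_target; } mock_pblock;

static mock_sdn *mock_sdn_new(const char *dn)
{
    mock_sdn *sdn = malloc(sizeof(*sdn));
    assert(sdn != NULL);
    sdn->dn = malloc(strlen(dn) + 1);
    assert(sdn->dn != NULL);
    strcpy(sdn->dn, dn);
    return sdn;
}

static void mock_sdn_free(mock_sdn *sdn)
{
    if (sdn != NULL) {
        free(sdn->dn);
        free(sdn);
    }
}

/* Pre-bind plug-in: frees the old bind target and stores a new one. */
static void mock_prebind_plugin(mock_pblock *pb, const char *rewritten_dn)
{
    mock_sdn_free(pb->bind_target);
    pb->bind_target = mock_sdn_new(rewritten_dn);
}

/* Bind path: returns the DN that the backend would be asked to look up. */
static const char *mock_do_bind(mock_pblock *pb, const char *rewritten_dn)
{
    mock_sdn *sdn = pb->bind_target;        /* cached before plug-ins run */

    mock_prebind_plugin(pb, rewritten_dn);  /* 'sdn' now points at freed memory */

    /* Using 'sdn' here without refreshing it would be a use-after-free.
     * The pattern of the fix: read the target from the pblock again. */
    sdn = pb->bind_target;
    return sdn->dn;
}
```

A caller would create a mock_pblock whose bind_target holds the compat DN, call mock_do_bind() with the rewritten DN, and release the final target with mock_sdn_free() when done.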

I think we need this setup/scenario to be translated by the slapi-nis expert...

Alexander, could you please help us?  Thanks!

Comment 3 Noriko Hosoi 2015-09-18 16:42:09 UTC
Alexander, could you please help us on the reproducer described in Comment 2?
Thank you!!

Comment 4 Alexander Bokovoy 2015-09-18 17:31:37 UTC
Noriko,

I think this also is similar to what Simo was doing with support for non-DN binds (like AD does with sAMAccountName value), i.e. 389-ds needs indeed to refresh SDN value after the plugins' run.

I think your patch is roughly correct. I wonder, though, what causes this crash because to me any rebinding that slapi-nis does works fine. Perhaps, another thread did remove the user at the same time as somebody tried to bind to compat entry of that user? I.e. the entry did exist at the bind time in slapi-nis map cache but disappeared from the primary source in a separate thread?

Comment 5 Noriko Hosoi 2015-09-19 00:53:03 UTC
(In reply to Alexander Bokovoy from comment #4)
> Noriko,
> 
> I think this also is similar to what Simo was doing with support for non-DN
> binds (like AD does with sAMAccountName value), i.e. 389-ds needs indeed to
> refresh SDN value after the plugins' run.
> 
> I think your patch is roughly correct. I wonder, though, what causes this
> crash because to me any rebinding that slapi-nis does works fine. Perhaps,
> another thread did remove the user at the same time as somebody tried to
> bind to compat entry of that user? I.e. the entry did exist at the bind time
> in slapi-nis map cache but disappeared from the primary source in a separate
> thread?

Thanks, Alexander.  I pushed the fix (enhanced some more by the comments by Rich) to upstream.
https://fedorahosted.org/389/ticket/48188#comment:26

I'm now building the rhel-7.2 candidate with the patch.  The patches prevent the crash caused by a pre-op bind plug-in such as slapi-nis replacing the sdn.  And if the entry is deleted, the subsequent backend bind code (ldbm_back_bind) fails like this:
    /*
     * find the target entry.  find_entry() takes care of referrals
     *   and sending errors if the entry does not exist.
     */
    if (( e = find_entry( pb, be, addr, &txn )) == NULL ) {
        rc = SLAPI_BIND_FAIL;
        goto bail;
    }
Hopefully, it works (bind fails) as expected.

That said, would the reproducer be binding as a user while simultaneously deleting and re-adding that user in a loop?  Also, could you please tell me which option needs to be passed to ipa-server-install to enable slapi-nis?  (Sorry for such a basic question...)  Thanks!!

Comment 7 Alexander Bokovoy 2015-09-19 16:44:46 UTC
Noriko,

please use 'ipa-compat-manage status|enable|disable' after you have installed the IPA master to check the status of the compat plugin, or to enable or disable it. I think it should be enabled by default.

When binding from ldapsearch as compat user you would get something like this in the logs:

[19/Sep/2015:18:40:55 +0200] conn=1950 fd=111 slot=111 SSL connection from 192.168.7.1 to 192.168.7.1
[19/Sep/2015:18:40:56 +0200] conn=1950 TLS1.2 128-bit AES
[19/Sep/2015:18:40:56 +0200] conn=1950 op=0 BIND dn="uid=abokovoy,cn=users,cn=compat,dc=vda,dc=li" method=128 version=3
[19/Sep/2015:18:40:56 +0200] conn=1950 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=abokovoy,cn=users,cn=accounts,dc=vda,dc=li"
[19/Sep/2015:18:40:56 +0200] conn=1950 op=1 SRCH base="dc=vda,dc=li" scope=2 filter="(uid=abokovoy)" attrs=ALL
[19/Sep/2015:18:40:56 +0200] conn=1950 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[19/Sep/2015:18:40:56 +0200] conn=1950 op=2 UNBIND
[19/Sep/2015:18:40:56 +0200] conn=1950 op=2 fd=111 closed - U1

i.e., the result of the bind should show a rewritten DN. I can't reproduce the crash unless I try to delete the user while the machine is under high load, so that the deletion does not propagate quickly to the slapi-nis callback.

If the entry was deleted, the bind has to fail, indeed.

Comment 8 Sankar Ramalingam 2015-10-16 13:53:37 UTC
Installed ipa-server, enabled compat plugin and ran ldapsearch as user test1 from compat entry. No crashes observed.

[root@cloud-qe-21 ~]# ldapsearch -x -p 389 -h localhost -D "uid=test1,cn=users,cn=compat,dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com" -w Secret123 -b "dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com" 

[root@cloud-qe-21 ~]# tail -f /var/log/dirsrv/slapd-IDMQE-LAB-ENG-BOS-REDHAT-COM/errors 
[16/Oct/2015:09:13:33 -0400] NSACLPlugin - The ACL target cn=ad,cn=etc,dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com does not exist
[16/Oct/2015:09:13:33 -0400] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com does not exist
[16/Oct/2015:09:13:33 -0400] NSACLPlugin - The ACL target cn=casigningcert cert-pki-ca,cn=ca_renewal,cn=ipa,cn=etc,dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com does not exist
[16/Oct/2015:09:13:33 -0400] NSACLPlugin - The ACL target cn=automember rebuild membership,cn=tasks,cn=config does not exist
[16/Oct/2015:09:13:33 -0400] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=idmqe,dc=lab,dc=eng,dc=bos,dc=redhat,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[16/Oct/2015:09:13:33 -0400] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[16/Oct/2015:09:13:33 -0400] - Listening on All Interfaces port 636 for LDAPS requests
[16/Oct/2015:09:13:33 -0400] - Listening on /var/run/slapd-IDMQE-LAB-ENG-BOS-REDHAT-COM.socket for LDAPI requests
[16/Oct/2015:09:13:42 -0400] memberof-plugin - Memberof task starts (arg: (objectclass=*)) ...
[16/Oct/2015:09:13:44 -0400] memberof-plugin - Memberof task finished (arg: (objectclass=*)) ...


Packages tested:
[root@cloud-qe-21 ~]# rpm -qa |egrep 'ipa-|389-ds'
389-ds-base-1.3.4.0-19.el7.x86_64
389-ds-base-libs-1.3.4.0-19.el7.x86_64
ipa-server-dns-4.2.0-12.el7.x86_64
389-ds-base-debuginfo-1.3.4.0-19.el7.x86_64
ipa-client-4.2.0-12.el7.x86_64
redhat-access-plugin-ipa-0.9.1-2.el7.noarch
ipa-admintools-4.2.0-12.el7.x86_64
ipa-python-4.2.0-12.el7.x86_64
sssd-ipa-1.13.0-39.el7.x86_64
ipa-server-4.2.0-12.el7.x86_64


Is this enough to mark the bug as Verified?

Comment 9 Noriko Hosoi 2015-10-16 16:21:53 UTC
(In reply to Sankar Ramalingam from comment #8)
> Is this enough to mark the bug as Verified?

The DN of the bind user was replaced in slapi-nis, which triggered the crash.  I'm not sure what data/set-up is needed to make slapi-nis modify the bind DN.

Scott, could you please enlighten us how to verify this bug?  Thanks!!

Comment 10 Scott Poore 2015-10-16 19:01:00 UTC
I'm not sure we can test this with anything but sanity only, right?

From the looks of it from the ticket, it was never reproduced outside of a production environment.  The description from the ticket though also sounded like the crashes occurred during multiple simultaneous lookups.  So, do we need a script to fork multiple ssh for different users at the same time?  or enough maybe to just run multiple searches in parallel?

Comment 11 Noriko Hosoi 2015-10-16 19:58:13 UTC
(In reply to Scott Poore from comment #10)
> I'm not sure we can test this with anything but sanity only, right?
> 
> From the looks of it from the ticket, it was never reproduced outside of a
> production environment.  The description from the ticket though also sounded
> like the crashes occurred during multiple simultaneous lookups.  So, do we
> need a script to fork multiple ssh for different users at the same time?  or
> enough maybe to just run multiple searches in parallel?

Thank you, Scott.

It is quite clear in the code backend_bind_cb (back-sch.c)... :)

/* If user comes from NSSWITCH, it will get authentication handled by PAM. */
This check "if (data->source == backend_entry_source_nsswitch)" is false and it goes to:
  /* Otherwise force rewrite of the SLAPI_BIND_TARGET_SDN
   * and let other plugins to handle it.
   * slapi-nis should have plugin ordering set below standard 50 to succeed */
In this else branch, it replaces the Slapi_DN in the pblock, which used to crash 389-ds-base.

Is this else clause usually executed?  If so, this bug is verified.  (But I doubt it is...  If it were, the crash would have been observed more often.)
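
For anyone trying to work out which path a test actually exercises, the decision described above can be sketched as follows. This is not the slapi-nis source: the enum and function names are hypothetical, and only the shape of the check (an nsswitch-sourced user goes to PAM, anything else gets its bind target rewritten) is taken from the comments quoted from back-sch.c.

```c
#include <assert.h>

/* Hypothetical names; the real code checks data->source against
 * backend_entry_source_nsswitch. */
typedef enum { MOCK_SOURCE_DIT, MOCK_SOURCE_NSSWITCH } mock_entry_source;
typedef enum { MOCK_BIND_VIA_PAM, MOCK_BIND_REWRITE_TARGET } mock_bind_action;

static mock_bind_action mock_choose_bind_action(mock_entry_source source)
{
    assert(source == MOCK_SOURCE_DIT || source == MOCK_SOURCE_NSSWITCH);

    if (source == MOCK_SOURCE_NSSWITCH) {
        /* User comes from nsswitch: authentication is handled by PAM. */
        return MOCK_BIND_VIA_PAM;
    }
    /* Otherwise the bind target SDN is rewritten and other plug-ins
     * handle the bind. This is the path that used to crash. */
    return MOCK_BIND_REWRITE_TARGET;
}
```

Under these assumptions, a verification run only touches the fixed code if the bind user takes the second path.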

Comment 12 Sankar Ramalingam 2015-10-19 10:15:55 UTC
Thanks Noriko and Scott. As per Scott's suggestion in comment #10, I executed multiple searches in parallel on the IPA server setup and I found no crash. Hence, marking the bug as Verified.

[root@cloud-qe-21 ~]# find /var/log/dirsrv/slapd-*/* -name core*
[root@cloud-qe-21 ~]# 

Packages tested in comment #8.

Comment 13 Sankar Ramalingam 2015-10-19 10:32:35 UTC
(In reply to Sankar Ramalingam from comment #12)
> Thanks Noriko and Scott. As per Scott's suggestion in comment #10, I
> executed multiple searches in parallel on the IPA server setup and I found
> no crash. Hence, marking the bug as Verified.
> 
> [root@cloud-qe-21 ~]# find /var/log/dirsrv/slapd-*/* -name core*
> [root@cloud-qe-21 ~]# 
> 
> Packages tested in comment #8.

On the IPA server side, I also enabled the slapi-nis plugin and restarted the Directory Server. This was to make sure that both the IPA compat and NIS plugins were enabled when running the LDAP searches.

Comment 14 Sven Kieske 2015-10-21 15:35:42 UTC
Hi,

when will this package be released, any timeline?


Thanks

Sven

Comment 15 Noriko Hosoi 2015-10-21 16:10:26 UTC
(In reply to Sven Kieske from comment #14)
> Hi,
> 
> when will this package be released, any timeline?
> 
> 
> Thanks
> 
> Sven

It will be included in the next release.  We will let you know when it is ready.
Thanks for your patience.
--noriko

Comment 16 errata-xmlrpc 2015-11-19 11:44:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2351.html