Bug 1465135 - sssd_be dereferences NULL pointer server->common and crashes with assertion error
Summary: sssd_be dereferences NULL pointer server->common and crashes with assertion error
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: sssd-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-26 18:00 UTC by Orion Poplawski
Modified: 2023-08-14 13:27 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-31 10:00:44 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Github SSSD sssd issues 4513 (closed): sssd_be dereferences NULL pointer server->common and crashes with assertion error (last updated 2021-02-17 07:25:03 UTC)
Red Hat Issue Tracker RHELPLAN-151015 (last updated 2023-03-08 09:29:40 UTC)
Red Hat Issue Tracker SSSD-5719 (last updated 2023-03-16 14:18:28 UTC)

Description Orion Poplawski 2017-06-26 18:00:48 UTC
Description of problem:

Core was generated by `/usr/libexec/sssd/sssd_be --domain nwra.com --uid 0 --gid 0 --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0  ldap_unbind_ext (ld=0x1, sctrls=0x0, cctrls=0x0) at unbind.c:46
46              assert( LDAP_VALID( ld ) );
(gdb) up
#1  0x00007f0ad4dd9ba4 in sdap_handle_release (sh=0x7f0adec05e70)
    at src/providers/ldap/sdap_async.c:110
110             ldap_unbind_ext(sh->ldap, NULL, NULL);
(gdb) print sh->ldap
$1 = (LDAP *) 0x1
(gdb) print *sh
$2 = {ldap = 0x1, connected = false, expire_time = 0, page_size = 1000, disable_deref = false,
  sdap_fd_events = 0x0, supported_saslmechs = {num_vals = 0, vals = 0x0}, supported_controls = {
    num_vals = 0, vals = 0x0}, supported_extensions = {num_vals = 0, vals = 0x0}, ops = 0x0,
  destructor_lock = true, release_memory = false}
(gdb) bt
#0  ldap_unbind_ext (ld=0x1, sctrls=0x0, cctrls=0x0) at unbind.c:46
#1  0x00007f0ad4dd9ba4 in sdap_handle_release (sh=0x7f0adec05e70)
    at src/providers/ldap/sdap_async.c:110
#2  sdap_handle_destructor (mem=<optimized out>) at src/providers/ldap/sdap_async.c:79
#3  0x00007f0ada46d447 in _talloc_free_internal () from /lib64/libtalloc.so.2
#4  0x00007f0ada466b63 in _talloc_free () from /lib64/libtalloc.so.2
#5  0x00007f0ada679680 in tevent_req_received (req=0x7f0adec08010) at ../tevent_req.c:247
#6  0x00007f0ada6796b9 in tevent_req_destructor (req=<optimized out>) at ../tevent_req.c:99
#7  0x00007f0ada466e80 in _talloc_free () from /lib64/libtalloc.so.2
#8  0x00007f0ad4dfc235 in sdap_cli_connect_done (subreq=0x7f0adec08010)
    at src/providers/ldap/sdap_async_connection.c:1568
#9  0x00007f0ada679512 in _tevent_req_error (req=req@entry=0x7f0adec08010, error=error@entry=111,
    location=location@entry=0x7f0ad4e2fee0 "src/providers/ldap/sdap_async_connection.c:158")
    at ../tevent_req.c:167
#10 0x00007f0ad4df7eb4 in sdap_sys_connect_done (subreq=0x0)
    at src/providers/ldap/sdap_async_connection.c:158
#11 0x00007f0ada679512 in _tevent_req_error (req=req@entry=0x7f0adebfd160, error=<optimized out>,
    location=location@entry=0x7f0ad4e3a019 "src/util/sss_ldap.c:260") at ../tevent_req.c:167
#12 0x00007f0ad4e0eac6 in sss_ldap_init_sys_connect_done (subreq=0x0) at src/util/sss_ldap.c:260
#13 0x00007f0ada679512 in _tevent_req_error (req=<optimized out>, error=<optimized out>,
    location=<optimized out>) at ../tevent_req.c:167
#14 0x00007f0ada679512 in _tevent_req_error (req=<optimized out>, error=<optimized out>,
    location=<optimized out>) at ../tevent_req.c:167
#15 0x00007f0ada67dd8b in epoll_event_loop (tvalp=0x7ffe04904f50, epoll_ev=0x7f0adeb94e70)
    at ../tevent_epoll.c:728
#16 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926
#17 0x00007f0ada67c257 in std_event_loop_once (ev=0x7f0adeb94c30,
    location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent_standard.c:114
#18 0x00007f0ada67840d in _tevent_loop_once (ev=ev@entry=0x7f0adeb94c30,
    location=location@entry=0x7f0ade0153ff "src/util/server.c:716") at ../tevent.c:533
#19 0x00007f0ada6785ab in tevent_common_loop_wait (ev=0x7f0adeb94c30,
    location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent.c:637
#20 0x00007f0ada67c1f7 in std_event_loop_wait (ev=0x7f0adeb94c30,
    location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent_standard.c:140
#21 0x00007f0addff7f23 in server_loop (main_ctx=0x7f0adeb96080) at src/util/server.c:716
#22 0x00007f0ade874b82 in main (argc=8, argv=<optimized out>)
    at src/providers/data_provider_be.c:587

Version-Release number of selected component (if applicable):
sssd-1.14.0-43.el7_3.14.x86_64

How reproducible:
Only seen once, I think.

(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'europa.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa-boulder2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'ipa-boulder2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa1.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa1.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server  'ipa-boulder2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa-boulder2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'europa.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'europa.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is neutral
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [collapse_srv_lookup] (0x0100): Need to refresh SRV lookup for domain nwra.com
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.nwra.com'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ipa2.nwra.com' in DNS
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa2.nwra.com' as 'name resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [be_resolve_server_process] (0x0200): Found address for server ipa2.nwra.com: [10.0.1.76] TTL 86400
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_get_server_hostent] (0x0020): Bug: Trying to get hostent from a name-less server
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [ipa_resolve_callback] (0x0020): FATAL: No hostent available for server (unknown name)
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [111]: Connection refused.
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sss_ldap_init_sys_connect_done] (0x0020): sssd_async_socket_init request failed: [111]: Connection refused.
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [111]: Connection refused.

Comment 3 Michal Zidek 2017-08-23 09:33:55 UTC
In function ipa_resolve_callback, we have server->common == NULL. This should not be the case and is most likely a bug in SSSD.

But I do not know how this could happen. There is nothing suspicious in the provided part of the logs (with the exception of the debug messages generated as a result of server->common being NULL).

I am not sure if we can fix this without a reproducer.

Comment 4 Jakub Hrozek 2017-08-23 13:40:57 UTC
(In reply to Michal Zidek from comment #3)
> In function ipa_resolve_callback, we have server->common == NULL. This
> should not be the case and is most likely a bug in SSSD.
> 
> But I do not know how this could happen. There is nothing suspicious in the
> provided part of the logs (with the exception of the debug messages
> generated as a result of server->common being NULL).
> 
> I am not sure if we can fix this without a reproducer.

Yes, I'm not sure either, and I think it would be best to just plug the hole by checking that server->common is valid.
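
For concreteness, a minimal sketch of what such a guard could look like. The callback signature, the header paths, and the use of fo_get_server_hostent() as the validity check are assumptions for illustration, not a copy of the actual SSSD code; only the early-return idea is the point.

#include <stddef.h>
#include "providers/fail_over.h"   /* SSSD-internal: struct fo_server, fo_get_server_hostent() */
#include "util/util.h"             /* SSSD-internal: DEBUG(), SSSDBG_CRIT_FAILURE */

/* Hypothetical guard at the top of ipa_resolve_callback(). */
static void ipa_resolve_callback(void *private_data, struct fo_server *server)
{
    (void) private_data;  /* unused in this sketch */

    /* server->common == NULL is the unexpected state seen in this bug;
     * fo_get_server_hostent() returns NULL in that case (see the
     * "Trying to get hostent from a name-less server" message above). */
    if (server == NULL || fo_get_server_hostent(server) == NULL) {
        DEBUG(SSSDBG_CRIT_FAILURE,
              "FATAL: No hostent available for server (unknown name)\n");
        return;
    }

    /* ... continue with the normal handling of the resolved server ... */
}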

Comment 5 Jakub Hrozek 2017-08-23 19:21:31 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/3487

Comment 7 Michal Zidek 2017-08-31 08:48:19 UTC
Hi, maybe there is a misunderstanding here. I do not think we can plug the code with NULL checks to avoid the assertion error.

Let me describe the situation a little bit better.

The NULL pointer that I talked about is the only sign in the logs that indicates an error in the code, but it is not a NULL dereference that caused the crash (the code is guarded there).

What happened is that the destructor that we set to free the LDAP connection calls ldap_unbind_ext, and inside this function is an assert that we fail. I think the root cause is the same as the NULL pointer in server->common, but I do not know how we could plug this. I looked in the public API for a way to perform, before we call ldap_unbind_ext, the same check that the assert performs (we could then skip the call if we know the assert is going to fail), but found no suitable function.

I was thinking about setting some sort of "something_bad_happened" flag when we detect the NULL pointer in server->common and then skipping the call in the destructor (the call to ldap_unbind_ext that fails the assertion), but that is a really ugly hack, and on top of that, we have no way to actually verify that it would work (we do not have a reproducer).
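
Just to make the rejected idea concrete, a rough sketch of the flag approach. The extra flag and the reduced struct are made up for illustration (the real sdap_handle and its destructor in sdap_async.c do more than this); this is not proposed as an actual fix.

#include <stdbool.h>
#include <ldap.h>
#include <talloc.h>

/* Reduced stand-in for struct sdap_handle with a hypothetical extra flag. */
struct sdap_handle {
    LDAP *ldap;
    bool connected;
    bool something_bad_happened;   /* hypothetical: set when server->common
                                    * is found to be NULL */
};

/* Sketch of the destructor skipping the unbind when the flag is set, so that
 * ldap_unbind_ext() never runs into the LDAP_VALID() assert on a bogus handle. */
static int sdap_handle_destructor(void *mem)
{
    struct sdap_handle *sh = talloc_get_type(mem, struct sdap_handle);

    if (sh->ldap != NULL && !sh->something_bad_happened) {
        ldap_unbind_ext(sh->ldap, NULL, NULL);
        sh->ldap = NULL;
    }
    return 0;
}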

So again, I do not think we can do anything with this without a reproducer or more information about what happened.

Comment 8 Jakub Hrozek 2017-08-31 09:54:21 UTC
I checked how we deal with the LDAP handle when unbinding, and at least in sdap_handle_release() we do set the handle to NULL right after unbinding, so I also don't have a good idea how to fix this, sorry :-(
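
For reference, the pattern being described, paraphrased from the backtrace above rather than copied from sdap_async.c (the real sdap_handle_release() does more work around this sequence, and the reduced struct below is only a stand-in):

#include <stddef.h>
#include <stdbool.h>
#include <ldap.h>

/* Reduced stand-in for struct sdap_handle, limited to the members relevant here. */
struct sdap_handle {
    LDAP *ldap;
    bool connected;
};

/* The unbind/NULL sequence referred to above; frame #1 of the backtrace shows
 * the ldap_unbind_ext() call at src/providers/ldap/sdap_async.c:110. */
static void sdap_handle_release(struct sdap_handle *sh)
{
    if (sh->ldap != NULL) {
        ldap_unbind_ext(sh->ldap, NULL, NULL);
        sh->ldap = NULL;       /* handle is NULLed right after unbinding */
    }
    sh->connected = false;
}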

I guess this bugzilla could then be closed as INSUFFICIENT_INFO and we could keep the upstream patch around in "Patches Welcome" just to keep the knowledge about the problem around.

Comment 9 Michal Zidek 2017-08-31 09:59:41 UTC
(In reply to Jakub Hrozek from comment #8)
> I checked how we deal with the LDAP handle when unbinding, and at least in
> sdap_handle_release() we do set the handle to NULL right after unbinding, so
> I also don't have a good idea how to fix this, sorry :-(
> 
> I guess this bugzilla could then be closed as INSUFFICIENT_INFO and we could
> keep the upstream patch around in "Patches Welcome" just to keep the
> knowledge about the problem around.

+1

Going to do it now.

