Bug 1465135
| Summary: | sssd_be dereferences NULL pointer server->common and crashes with assertion error | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Orion Poplawski <orion> |
| Component: | sssd | Assignee: | Sumit Bose <sbose> |
| Status: | NEW --- | QA Contact: | sssd-qe <sssd-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | aboscatt, frenaud, grajaiya, lslebodn, mkosek, mzidek, pbrezina, pkettman, qpham, tscherf |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-31 10:00:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
In function ipa_resolve_callback, we have server->common == NULL. This should not be the case and is most likely a bug in SSSD. But I do not know how this could happen. There is nothing suspicious in the provided part of the logs (with the exception of the debug messages generated as a result of server->common being NULL). I am not sure if we can fix this without a reproducer.

(In reply to Michal Zidek from comment #3)
> In function ipa_resolve_callback, we have server->common == NULL. This
> should not be the case and is most likely a bug in SSSD.
>
> But I do not know how this could happen. There is nothing suspicious in the
> provided part of the logs (with the exception of the debug messages
> generated as a result of server->common being NULL).
>
> I am not sure if we can fix this without a reproducer.

Yes, I'm not sure either, and I think it would be best to just plug the hole by checking server->common for validity.

Upstream ticket: https://pagure.io/SSSD/sssd/issue/3487

Hi, maybe there is a misunderstanding here. I do not think we can plug the code with NULL checks to avoid the assertion error. Let me describe the situation a little better.

The NULL pointer I talked about is the only sign I can follow in the logs that indicates an error in the code, but it is not a NULL dereference that caused the crash (the code is guarded here). What happened is that a destructor we set to free the LDAP connection calls ldap_unbind_ext, and inside this function is an assert that we fail. I think the root cause is the same as the one behind the NULL pointer in server->common, but I do not know how we could plug this.

I looked in the public API for a way to do the same check that the assert makes before we call ldap_unbind_ext, but found no suitable function (we could skip the call if we knew it was going to fail the assert).

I was thinking about creating some sort of "something_bad_happened" flag when we detect the NULL pointer in server->common and then skipping the call in the destructor (the ldap_unbind_ext call that fails the assertion), but that is a really ugly hack, and on top of that we have no way to actually verify that it would work (we do not have a reproducer).

So again, I do not think we can do anything with this without a reproducer or more information about what happened.

I checked how we deal with setting the LDAP handle when unbinding, and at least in sdap_handle_release() we do set the handle to NULL right after unbinding, so I also don't have a good idea how to fix this, sorry :-(

I guess this bugzilla could then be closed as INSUFFICIENT_INFO and we could keep the upstream patch around in "Patches Welcome" just to keep the knowledge about the problem around.

(In reply to Jakub Hrozek from comment #8)
> I checked how we deal with setting the LDAP handle when unbinding, and at
> least in sdap_handle_release() we do set the handle to NULL right after
> unbinding, so I also don't have a good idea how to fix this, sorry :-(
>
> I guess this bugzilla could then be closed as INSUFFICIENT_INFO and we could
> keep the upstream patch around in "Patches Welcome" just to keep the
> knowledge about the problem around.

+1 Going to do it now.
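For reference, the "plug the hole" check proposed above would look roughly like the sketch below. This is a simplified, hypothetical illustration, not SSSD's actual code: the struct definitions and function names are stand-ins for the real failover code (struct fo_server, fo_get_server_hostent(), ipa_resolve_callback()), and, as the discussion notes, such a guard only prevents the dereference; it does not explain how server->common became NULL in the first place.

```c
/* Hypothetical sketch of the defensive check discussed in the thread.
 * The struct layout and function names only loosely mirror SSSD's
 * failover code; they are simplified for illustration. */
#include <stdio.h>
#include <stddef.h>

struct server_common {            /* hypothetical, simplified */
    void *rhostent;               /* stands in for struct resolv_hostent */
    char *name;
};

struct fo_server {                /* hypothetical, simplified */
    struct server_common *common; /* NULL here is what the logs show */
    void *srv_data;
};

/* Return the hostent only if the server actually carries common data,
 * mirroring the "Bug: Trying to get hostent from a name-less server"
 * message visible in the log excerpt. */
static void *get_server_hostent(struct fo_server *server)
{
    if (server == NULL || server->common == NULL) {
        fprintf(stderr, "Bug: Trying to get hostent from a name-less server\n");
        return NULL;
    }
    return server->common->rhostent;
}

static void resolve_callback(struct fo_server *server)
{
    void *hostent = get_server_hostent(server);
    if (hostent == NULL) {
        /* Plug the hole: log and return instead of dereferencing. */
        fprintf(stderr, "FATAL: No hostent available for server\n");
        return;
    }
    /* ... continue building the service URIs from the resolved address ... */
    printf("hostent found, continuing\n");
}

int main(void)
{
    struct fo_server broken = { .common = NULL };
    resolve_callback(&broken);    /* logs the errors, does not crash */
    return 0;
}
```

Run against a server with common == NULL, the sketch only prints the two error messages seen in the log excerpt instead of crashing, which is exactly why the thread concludes the check alone cannot address the root cause.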
Description of problem:

```
Core was generated by `/usr/libexec/sssd/sssd_be --domain nwra.com --uid 0 --gid 0 --debug-to-files'.
Program terminated with signal 11, Segmentation fault.
#0  ldap_unbind_ext (ld=0x1, sctrls=0x0, cctrls=0x0) at unbind.c:46
46          assert( LDAP_VALID( ld ) );
(gdb) up
#1  0x00007f0ad4dd9ba4 in sdap_handle_release (sh=0x7f0adec05e70) at src/providers/ldap/sdap_async.c:110
110         ldap_unbind_ext(sh->ldap, NULL, NULL);
(gdb) print sh->ldap
$1 = (LDAP *) 0x1
(gdb) print *sh
$2 = {ldap = 0x1, connected = false, expire_time = 0, page_size = 1000, disable_deref = false,
  sdap_fd_events = 0x0, supported_saslmechs = {num_vals = 0, vals = 0x0},
  supported_controls = {num_vals = 0, vals = 0x0}, supported_extensions = {num_vals = 0, vals = 0x0},
  ops = 0x0, destructor_lock = true, release_memory = false}
(gdb) bt
#0  ldap_unbind_ext (ld=0x1, sctrls=0x0, cctrls=0x0) at unbind.c:46
#1  0x00007f0ad4dd9ba4 in sdap_handle_release (sh=0x7f0adec05e70) at src/providers/ldap/sdap_async.c:110
#2  sdap_handle_destructor (mem=<optimized out>) at src/providers/ldap/sdap_async.c:79
#3  0x00007f0ada46d447 in _talloc_free_internal () from /lib64/libtalloc.so.2
#4  0x00007f0ada466b63 in _talloc_free () from /lib64/libtalloc.so.2
#5  0x00007f0ada679680 in tevent_req_received (req=0x7f0adec08010) at ../tevent_req.c:247
#6  0x00007f0ada6796b9 in tevent_req_destructor (req=<optimized out>) at ../tevent_req.c:99
#7  0x00007f0ada466e80 in _talloc_free () from /lib64/libtalloc.so.2
#8  0x00007f0ad4dfc235 in sdap_cli_connect_done (subreq=0x7f0adec08010) at src/providers/ldap/sdap_async_connection.c:1568
#9  0x00007f0ada679512 in _tevent_req_error (req=req@entry=0x7f0adec08010, error=error@entry=111, location=location@entry=0x7f0ad4e2fee0 "src/providers/ldap/sdap_async_connection.c:158") at ../tevent_req.c:167
#10 0x00007f0ad4df7eb4 in sdap_sys_connect_done (subreq=0x0) at src/providers/ldap/sdap_async_connection.c:158
#11 0x00007f0ada679512 in _tevent_req_error (req=req@entry=0x7f0adebfd160, error=<optimized out>, location=location@entry=0x7f0ad4e3a019 "src/util/sss_ldap.c:260") at ../tevent_req.c:167
#12 0x00007f0ad4e0eac6 in sss_ldap_init_sys_connect_done (subreq=0x0) at src/util/sss_ldap.c:260
#13 0x00007f0ada679512 in _tevent_req_error (req=<optimized out>, error=<optimized out>, location=<optimized out>) at ../tevent_req.c:167
#14 0x00007f0ada679512 in _tevent_req_error (req=<optimized out>, error=<optimized out>, location=<optimized out>) at ../tevent_req.c:167
#15 0x00007f0ada67dd8b in epoll_event_loop (tvalp=0x7ffe04904f50, epoll_ev=0x7f0adeb94e70) at ../tevent_epoll.c:728
#16 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../tevent_epoll.c:926
#17 0x00007f0ada67c257 in std_event_loop_once (ev=0x7f0adeb94c30, location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent_standard.c:114
#18 0x00007f0ada67840d in _tevent_loop_once (ev=ev@entry=0x7f0adeb94c30, location=location@entry=0x7f0ade0153ff "src/util/server.c:716") at ../tevent.c:533
#19 0x00007f0ada6785ab in tevent_common_loop_wait (ev=0x7f0adeb94c30, location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent.c:637
#20 0x00007f0ada67c1f7 in std_event_loop_wait (ev=0x7f0adeb94c30, location=0x7f0ade0153ff "src/util/server.c:716") at ../tevent_standard.c:140
#21 0x00007f0addff7f23 in server_loop (main_ctx=0x7f0adeb96080) at src/util/server.c:716
#22 0x00007f0ade874b82 in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:587
```

Version-Release number of selected component (if applicable):
sssd-1.14.0-43.el7_3.14.x86_64

How reproducible:
Only seen once I think.

```
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'europa.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa-boulder2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server 'ipa-boulder2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa1.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa1.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa-boulder2.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa-boulder2.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'europa.nwra.com' as 'name not resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'europa.nwra.com' as 'neutral'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is neutral
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [collapse_srv_lookup] (0x0100): Need to refresh SRV lookup for domain nwra.com
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.nwra.com'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ipa2.nwra.com' in DNS
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_server_common_status] (0x0100): Marking server 'ipa2.nwra.com' as 'name resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'IPA' as 'resolved'
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [be_resolve_server_process] (0x0200): Found address for server ipa2.nwra.com: [10.0.1.76] TTL 86400
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [fo_get_server_hostent] (0x0020): Bug: Trying to get hostent from a name-less server
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [ipa_resolve_callback] (0x0020): FATAL: No hostent available for server (unknown name)
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [111]: Connection refused.
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sss_ldap_init_sys_connect_done] (0x0020): sssd_async_socket_init request failed: [111]: Connection refused.
(Sun Jun 25 10:56:32 2017) [sssd[be[nwra.com]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [111]: Connection refused.
```
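The backtrace above shows the other half of the problem: the talloc destructor ends up calling ldap_unbind_ext() on sh->ldap == 0x1, which trips the LDAP_VALID() assert inside libldap. The sketch below illustrates the "something_bad_happened" flag idea floated in the comment thread, purely as a hypothetical, self-contained illustration: struct sdap_handle_sketch, mark_handle_broken(), and handle_destructor() are invented names, and the real sdap_handle_release() calls the actual ldap_unbind_ext(). As noted in the thread, this approach is an ugly hack that could not be verified without a reproducer.

```c
/* Hypothetical sketch of the "something_bad_happened" flag idea from the
 * thread: the handle remembers that its connection setup went wrong and
 * the destructor then skips the unbind call that would trip the
 * LDAP_VALID() assert. Names and layout are invented for illustration;
 * they do not match SSSD's real struct sdap_handle. */
#include <stdbool.h>
#include <stdio.h>

struct sdap_handle_sketch {
    void *ldap;              /* stands in for LDAP *; 0x1 in the core dump */
    bool connected;
    bool connection_broken;  /* the proposed flag */
};

/* Called wherever the inconsistency (e.g. server->common == NULL) is
 * first detected, so later cleanup knows not to trust the handle. */
static void mark_handle_broken(struct sdap_handle_sketch *sh)
{
    sh->connection_broken = true;
}

static void handle_destructor(struct sdap_handle_sketch *sh)
{
    if (sh->ldap != NULL && !sh->connection_broken) {
        /* In SSSD this would be ldap_unbind_ext(sh->ldap, NULL, NULL);
         * here we only print, since a real unbind needs a real handle. */
        printf("unbinding LDAP handle %p\n", sh->ldap);
    } else {
        printf("skipping unbind: handle missing or marked broken\n");
    }
    sh->ldap = NULL;         /* never unbind the same handle twice */
}

int main(void)
{
    struct sdap_handle_sketch sh = { .ldap = (void *)0x1 };
    mark_handle_broken(&sh); /* simulate the detected inconsistency */
    handle_destructor(&sh);  /* skips the unbind instead of asserting */
    return 0;
}
```

Even with such a flag, the real question remains how the handle ended up holding 0x1; the guard would only keep the destructor from asserting on an already-corrupted handle, which is why the bug was closed for lack of a reproducer.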