Bug 1300636
| Summary: | nsupdate crashing due to a race condition when receiving SIGTERM during processing | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nirupama Karandikar <nkarandi> | ||||||||||
| Component: | bind | Assignee: | Petr Menšík <pemensik> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Robin Hack <rhack> | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | medium | ||||||||||||
| Version: | 7.2 | CC: | jhrozek, lslebodn, mkyral, nsoman, pemensik, psklenar, rhack, sgoveas, thozza | ||||||||||
| Target Milestone: | rc | Keywords: | Patch, Reproducer, Triaged | ||||||||||
| Target Release: | --- | Flags: | thozza:
mirror+
|
||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | bind-9.11.4-20.P2.el7 | Doc Type: | No Doc Update | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2020-09-29 19:25:38 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 1380362, 1465887, 1465928, 1534569, 1780577 | ||||||||||||
| Attachments: |
|
||||||||||||
Hello. Please provide steps to reproduce including the overall system setup and the sequence of commands that led to the crash. Also if you have a system on which you can reproduce the issue, it would be really helpful if you could provide me access to it for debugging purposes. Thanks [root@bumblebee ad_parameters]# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = sssdad.com
user =
[nss]
filter_groups = root
filter_users = root
default_shell = /bin/bash
override_homedir = /home/%u
[domain/sssdad.com]
debug_level = 0xFFF0
id_provider = ad
ad_domain = sssdad.com
krb5_realm = SSSDAD
realmd_tags = manages-system joined-with-adcli
cache_credentials = True
ldap_schema = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
use_fully_qualified_names = False
fallback_homedir = /home/%u@%d
access_provider = ad
ldap_id_mapping = false
* Crash happens in this test block when sssd is restarted
:: [ PASS ] :: Command 'service sssd stop; rm -f /var/lib/sss/{mc,db}/*; service sssd start' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'date; id aa0002-8794'
Tue Jul 19 16:29:15 IST 2016
uid=10031(aa0002-8794) gid=10035(tgroup01-8794) groups=10035(tgroup01-8794),100000(domain users)
:: [ PASS ] :: Command 'date; id aa0002-8794' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'service sssd stop; rm -f /var/lib/sss/{mc,db}/*; service sssd start'
Redirecting to /bin/systemctl stop sssd.service
Redirecting to /bin/systemctl start sssd.service
:: [ PASS ] :: Command 'service sssd stop; rm -f /var/lib/sss/{mc,db}/*; service sssd start' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'date; id aa0002-8794'
Tue Jul 19 16:29:31 IST 2016
uid=10031(aa0002-8794) gid=10035(tgroup01-8794) groups=10035(tgroup01-8794),100000(domain users)
* Crash occurred at 16:29:08
[root@bumblebee sssd]# tail -f /var/log/messages
Jul 19 16:29:07 bumblebee systemd: Stopping System Security Services Daemon...
Jul 19 16:29:07 bumblebee sssd[be[sssdad.com]]: Shutting down
Jul 19 16:29:07 bumblebee sssd[nss]: Shutting down
Jul 19 16:29:07 bumblebee sssd[pam]: Shutting down
Jul 19 16:29:08 bumblebee abrt-hook-ccpp: Process 26991 (nsupdate) of user 0 killed by SIGABRT - dumping core
Jul 19 16:29:08 bumblebee systemd: Stopped System Security Services Daemon.
Jul 19 16:29:08 bumblebee abrt-server: Duplicate: core backtrace
Jul 19 16:29:08 bumblebee abrt-server: DUP_OF_DIR: /var/spool/abrt/ccpp-2016-07-19-16:19:44-25069
Jul 19 16:29:08 bumblebee abrt-server: Deleting problem directory ccpp-2016-07-19-16:29:08-26991 (dup of ccpp-2016-07-19-16:19:44-25069)
Jul 19 16:29:08 bumblebee systemd: Starting System Security Services Daemon...
Jul 19 16:29:08 bumblebee dbus[591]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jul 19 16:29:08 bumblebee dbus-daemon: dbus[591]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jul 19 16:29:08 bumblebee dbus[591]: [system] Successfully activated service 'org.freedesktop.problems'
Jul 19 16:29:08 bumblebee dbus-daemon: dbus[591]: [system] Successfully activated service 'org.freedesktop.problems'
Jul 19 16:29:09 bumblebee sssd: Starting up
Jul 19 16:29:09 bumblebee sssd[be[sssdad.com]]: Starting up
Jul 19 16:29:09 bumblebee sssd[pam]: Starting up
Jul 19 16:29:09 bumblebee sssd[nss]: Starting up
Jul 19 16:29:09 bumblebee systemd: Started System Security Services Daemon.
* sssd domain logs around the time of the crash. Full logs will be attached along with the core file
[root@bumblebee sssd]# tail -f sssd_sssdad.com.log
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [child_sig_handler] (0x1000): Waiting for child [26983].
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [child_sig_handler] (0x0100): child [26983] finished successfully.
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [be_nsupdate_done] (0x0200): nsupdate child status: 0
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [nsupdate_msg_create_common] (0x0200): Creating update message for realm [(null)].
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [be_nsupdate_create_ptr_msg] (0x0400): -- Begin nsupdate message --
update add 73.206.65.10.in-addr.arpa. 3600 in PTR bumblebee.sssdad.com.
send
-- End nsupdate message --
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [child_handler_setup] (0x2000): Setting up signal handler up for pid [26991]
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [child_handler_setup] (0x2000): Signal handler set up for pid [26991]
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [write_pipe_handler] (0x0400): All data has been sent!
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [nsupdate_child_stdin_done] (0x1000): Sending nsupdate data complete
(Tue Jul 19 16:29:06 2016) [sssd[be[sssdad.com]]] [be_nsupdate_args] (0x0200): nsupdate auth type: GSS-TSIG
setup_system()
reset_system()
user_interaction()
do_next_command()
start_update()
done_update()
reset_system()
user_interaction()
do_next_command()
evaluate_update()
update_addordelete()
do_next_command()
start_update()
recvsoa()
About to create rcvmsg
show_message()
Reply from SOA query:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50028
;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 2
;; QUESTION SECTION:
;73.206.65.10.in-addr.arpa. IN SOA
;; AUTHORITY SECTION:
10.in-addr.arpa. 3600 IN SOA pluto.sssdad.com. hostmaster.sssdad.com. 1557 900 600 86400 3600
;; ADDITIONAL SECTION:
pluto.sssdad.com. 3600 IN A 10.8.63.34
pluto.sssdad.com. 3600 IN AAAA 2620:52:0:83f:9149:4ef6:d466:9e9f
Found zone name: 10.in-addr.arpa
The master is: pluto.sssdad.com
start_gssrequest
send_gssrequest
show_message()
Out of recvsoa
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_krb5_info_files] (0x0200): Could not remove [/var/lib/sss/pubconf/kpasswdinfo.SSSDAD.COM], [2][No such file or directory]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [be_ptask_destructor] (0x0400): Terminating periodic task [AD machine account password renewal]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [dp_terminate_active_requests] (0x0400): Terminating active data provider requests
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [dp_client_destructor] (0x0400): Removed NSS client
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [dp_client_destructor] (0x0400): Removed PAM client
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [sbus_remove_watch] (0x2000): 0x7f7708377d90/0x7f7708370a30
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_socket_symlink] (0x4000): The symlink points to [/var/lib/sss/pipes/private/sbus-dp_sssdad.com.26958]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_socket_symlink] (0x4000): The path including our pid is [/var/lib/sss/pipes/private/sbus-dp_sssdad.com.26958]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_socket_symlink] (0x4000): Removed the symlink
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [be_ptask_destructor] (0x0400): Terminating periodic task [Subdomains Refresh]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [sdap_handle_release] (0x2000): Trace: sh[0x7f77083be000], connected[1], ops[(nil)], ldap[0x7f7708398070], destructor_lock[0], release_memory[0]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_connection_callback] (0x4000): Successfully removed connection callback.
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [sdap_handle_release] (0x2000): Trace: sh[0x7f770837b560], connected[1], ops[(nil)], ldap[0x7f770837fe90], destructor_lock[0], release_memory[0]
(Tue Jul 19 16:29:07 2016) [sssd[be[sssdad.com]]] [remove_connection_callback] (0x4000): Successfully removed connection callback.
cleanup()
Detaching GSS-TSIG keyring
Shutting down task manager
shutdown_program()
Shutting down request manager
recvgss()
Shutting down request manager
Destroy DST lib
Destroying request manager
Freeing the dispatchers
Shutting down dispatch manager
Destroying event
Shutting down socket manager
Shutting down timer manager
Destroying hash context
Destroying name state
Removing log context
Destroying memory context
mem.c:1102: INSIST(ctx->stats[i].gets == 0U) failed, back trace
#0 0x7f33496d10d7 in ??
#1 0x7f33496d102a in ??
#2 0x7f33496e0aa1 in ??
#3 0x7f33496e0f1c in ??
#4 0x404c07 in ??
#5 0x7f33474f3b35 in ??
#6 0x404da5 in ??
Created attachment 1181649 [details]
sssd domain full logs
Created attachment 1181650 [details]
Coredump 19 July 16
Hi. Please provide the exact NVR of bind-utils RPM used and ideally also the set of commands passes to nsupdate. Thanks The nsupdate message is: -- Begin nsupdate message -- update add 73.206.65.10.in-addr.arpa. 3600 in PTR bumblebee.sssdad.com. send -- End nsupdate message -- We usually invoke the nsupdate with -g, with this high debug level we also add -D and -d. We've seen this crash in rhel-7.2, 6.8 and now 7.3. The attached logs and corefile are from a rhel-7.3 system. [root@bumblebee ~]# rpm -q bind-utils bind-utils-9.9.4-36.el7.x86_64 *** Bug 1438393 has been marked as a duplicate of this bug. *** Hello Steve. Could you please rerun the test, in which nsupdate crashes, with "-M" option? It enables additional memory debugging in nsupdate and it should provide more information where the leak is happening. Thanks! Reported to upstream: https://bugs.isc.org/SelfService/Display.html?id=46044 (not public yet). Tracked as: [ISC-Bugs #46044] Created attachment 1424122 [details]
WIP patch1
Reported to upstream again as https://gitlab.isc.org/isc-projects/bind9/issues/558 Upstream seems to have a Patch, which is in essence similar to my WIP patch. Meaning that it prevents the leak, but my experience was that the race condition can cause also other errors. The proposed upstream patch is https://gitlab.isc.org/isc-projects/bind9/merge_requests/1048/diffs?commit_id=d698f6c32bfbf9da3ebcb167b582792f4c369fa1 Fix was merged by upstream https://gitlab.isc.org/isc-projects/bind9/merge_requests/1048 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (bind bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3871 |
Created attachment 1116891 [details] genrated coredump during crash Description of problem: /usr/bin/nsupdate -g -d -D Version-Release number of selected component (if applicable): bind-utils-9.9.4-29.el7_2.1 How reproducible: Always Actual results: nsupdate crashing during system configuration against AD Expected results: nsupdate should not crash Additional info: Attaching generated core file