Bug 1687698
| Summary: | Restarting dbus crashes certmonger | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Mohammad Rizwan <myusuf> |
| Component: | certmonger | Assignee: | Rob Crittenden <rcritten> |
| Status: | CLOSED ERRATA | QA Contact: | ipa-qe <ipa-qe> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 8.0 | CC: | abokovoy, dking, ksiddiqu, nalin, pcech, pvoborni, rcritten |
| Target Milestone: | rc | Keywords: | Regression, TestBlocker |
| Target Release: | 8.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | certmonger-0.79.7-15.el8 | Doc Type: | If docs needed, set a value |
| Last Closed: | 2020-11-04 02:51:52 UTC | | |
Description
Mohammad Rizwan
2019-03-12 07:34:42 UTC
Can you reproduce this manually? Can you provide a stack trace from the core?

Yes, it reproduces manually. Here is my own stack trace:

    Program received signal SIGTERM, Terminated.
    0x00007f9acd5a17a8 in poll () from /lib64/libc.so.6
    (gdb) where
    #0  0x00007f9acd5a17a8 in poll () from /lib64/libc.so.6
    #1  0x00007f9acfe63916 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>)
        at /usr/include/bits/poll2.h:46
    #2  _dbus_poll (fds=<optimized out>, n_fds=<optimized out>, timeout_milliseconds=<optimized out>)
        at ../../dbus/dbus-sysdeps-unix.c:2837
    #3  0x00007f9acfe5c70f in socket_do_iteration (transport=0x55cbecb0e310, flags=6, timeout_milliseconds=<optimized out>)
        at ../../dbus/dbus-transport-socket.c:1183
    #4  0x00007f9acfe5b491 in _dbus_transport_do_iteration (transport=0x55cbecb0e310, flags=<optimized out>, timeout_milliseconds=<optimized out>)
        at ../../dbus/dbus-transport.c:1016
    #5  0x00007f9acfe432ec in _dbus_connection_do_iteration_unlocked (connection=0x55cbecb0dcf0, pending=<optimized out>, flags=6, timeout_milliseconds=25000)
        at ../../dbus/dbus-connection.c:1227
    #6  0x00007f9acfe43cbd in _dbus_connection_block_pending_call (pending=0x55cbecb0b5a0)
        at ../../dbus/dbus-connection.c:2433
    #7  0x00007f9acfe5550e in dbus_pending_call_block (pending=<optimized out>)
        at ../../dbus/dbus-pending-call.c:767
    #8  0x00007f9acfe44286 in dbus_connection_send_with_reply_and_block (connection=connection@entry=0x55cbecb0dcf0, message=message@entry=0x55cbecb05040, timeout_milliseconds=timeout_milliseconds@entry=-1, error=error@entry=0x0)
        at ../../dbus/dbus-connection.c:3576
    #9  0x00007f9acfe3ff38 in dbus_bus_register (connection=0x55cbecb0dcf0, error=0x0)
        at ../../dbus/dbus-bus.c:695
    #10 0x00007f9acfe401b4 in internal_bus_get (type=DBUS_BUS_SYSTEM, private=0, error=0x0)
        at ../../dbus/dbus-bus.c:483
    #11 0x000055cbeb49feac in cm_tdbus_reconnect (ec=0x55cbecaf5a90, timer=<optimized out>, current_time=..., pvt=0x55cbecafd2f0)
        at tdbus.c:546
    #12 0x00007f9acfc23bd9 in tevent_common_invoke_timer_handler () from /lib64/libtevent.so.0
    #13 0x00007f9acfc23d7e in tevent_common_loop_timer_delay () from /lib64/libtevent.so.0
    #14 0x00007f9acfc24f2b in epoll_event_loop_once () from /lib64/libtevent.so.0
    #15 0x00007f9acfc231bb in std_event_loop_once () from /lib64/libtevent.so.0
    #16 0x00007f9acfc1e395 in _tevent_loop_once () from /lib64/libtevent.so.0
    #17 0x000055cbeb482799 in main (argc=<optimized out>, argv=<optimized out>) at main.c:413

It is blowing up on this line in certmonger:

    tdb->conn = dbus_bus_get(DBUS_BUS_SESSION, NULL);

As near as I can tell, exit_on_disconnect is set to FALSE.

Reproduction does not require an IPA master to be installed. Steps are:

1. `systemctl start certmonger`
2. `systemctl restart messagebus`

Given that it is failing while polling an fd, I'm reassigning to the dbus team for further evaluation.

Much like killing the X server for Xlib clients, standard libdbus behaviour is to terminate clients when the server is terminated. This is explained in detail in upstream Bugzilla (and more briefly in the libdbus documentation): https://bugs.freedesktop.org/show_bug.cgi?id=16338#c1

I think the certmonger dbus code attempts to reconnect when the bus is the system bus. That seems wrong.

It sounds like, for practical purposes, certmonger should always call dbus_connection_set_exit_on_disconnect(conn, TRUE) so that it exits gracefully? certmonger is dbus-activated by systemd, so if it goes away due to a dbus restart, it will be restarted on first use by a client. If that is right, feel free to reassign back to me and I'll make the changes in certmonger.

I am not really the person to ask about writing applications using libdbus. That would be the upstream dbus mailing list (where you would very likely be encouraged to stop using a long-deprecated library for which better alternatives exist).
The documentation for dbus_connection_set_exit_on_disconnect() indicates that when dbus_bus_get() is used, exiting on disconnect is enabled by default: https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga19091beb74f1504b0e862a7ad10e71cd

The behaviour of certmonger going away due to a dbus restart, and being re-activated as necessary, sounds fine to me.

Steps to verify are in the description.

master: 39ce89ec821d02643681795d2149b20198f0fe42

systemd (PID 1) is sending a SIGTERM to certmonger when dbus restarts. There simply isn't anything we can do except ignore SIGTERM, which isn't really a solution.

Rob, should we add a forced restart on loss of the dbus service?

    [Unit]
    After=dbus.service
    BindsTo=dbus.service

See https://bugzilla.redhat.com/show_bug.cgi?id=1654779 for details of how this came about as a solution.

Good idea. I think PartOf may be a better option and will still tie certmonger to dbus stop/restart. Doing so will require patience on the user's part if they restart dbus, as it waits until all dependent services are restarted and certmonger takes a while to get going. I'll work on a patch.

Implemented PartOf in the systemd unit file. Generally speaking, messagebus should never be restarted without a reboot, but the repercussions on a long-running server, where certmonger would never be restarted automatically and certificates could expire, are dire enough to justify this action.

Verified based on the following output. For details, refer to the attached runner.log file.

IPA Version:
    2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] +-----------------------------[RPMs & OS: [RedHat - x86_64]-----------------------------+
    2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] | ipa-client-4.8.7-8.module+el8.3.0+7513+a375844a.x86_64
    2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] | ipa-client-common-4.8.7-8.module+el8.3.0+7513+a375844a.noarch
    2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] | sssd-ipa-2.3.0-7.el8.x86_64
    2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] ------------------------------------------------------------------------------------------

Snip from console output:

    ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
    :: certmonger should not dump dbus related core, bz1103090
    ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
    :: [ 02:09:17 ] :: [ PASS ] :: Starting certmonger service (Expected 0,1, got 0)
    :: [ 02:09:17 ] :: [ PASS ] :: restarting messagebus service (Expected 0, got 0)
    :: [ 02:09:22 ] :: [ PASS ] :: sleep for certmonger to start after messagebugs restart (Expected 0, got 0)
    :: [ 02:09:23 ] :: [ PASS ] :: Checking certmonger service status after messagebus service restart (Expected 0, got 0)
    :: [ 02:09:23 ] :: [ PASS ] :: certmonger does not crashes on dbus service restart
    ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (certmonger bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4671