Bug 1687698

Summary: Restarting dbus crashes certmonger
Product: Red Hat Enterprise Linux 8 Reporter: Mohammad Rizwan <myusuf>
Component: certmonger Assignee: Rob Crittenden <rcritten>
Status: CLOSED ERRATA QA Contact: ipa-qe <ipa-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 CC: abokovoy, dking, ksiddiqu, nalin, pcech, pvoborni, rcritten
Target Milestone: rc Keywords: Regression, TestBlocker
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: certmonger-0.79.7-15.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-04 02:51:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
console.log none

Description Mohammad Rizwan 2019-03-12 07:34:42 UTC
Created attachment 1543123
console.log

Description of problem:
Restarting dbus crashes certmonger

Version-Release number of selected component (if applicable):
ipa-server-4.7.1-11.module+el8+2842+7481110c.x86_64
certmonger-0.79.6-5.el8.x86_64
dbus-1.12.8-7.el8.x86_64

How reproducible:
always

Steps to Reproduce:
1. Install an IPA master
2. Start the certmonger service
   $ systemctl start certmonger
3. Restart the messagebus service
   $ systemctl restart messagebus
4. Check the certmonger service status
   $ systemctl status certmonger

Actual results:
certmonger crashes

Expected results:
certmonger should not crash

Additional info:
A similar bz was closed in the past: https://bugzilla.redhat.com/show_bug.cgi?id=1103090

Comment 2 Rob Crittenden 2019-03-13 14:42:38 UTC
Can you reproduce this manually?

Can you provide a stack trace from the core?

Comment 3 Mohammad Rizwan 2019-03-14 07:04:29 UTC
Yes, it reproduces manually.

Comment 6 Rob Crittenden 2019-03-28 14:05:27 UTC
Providing my own stack trace.

Program received signal SIGTERM, Terminated.
0x00007f9acd5a17a8 in poll () from /lib64/libc.so.6
(gdb) where
#0  0x00007f9acd5a17a8 in poll () from /lib64/libc.so.6
#1  0x00007f9acfe63916 in poll (__timeout=<optimized out>, 
    __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:46
#2  _dbus_poll (fds=<optimized out>, n_fds=<optimized out>, 
    timeout_milliseconds=<optimized out>)
    at ../../dbus/dbus-sysdeps-unix.c:2837
#3  0x00007f9acfe5c70f in socket_do_iteration (transport=0x55cbecb0e310, 
    flags=6, timeout_milliseconds=<optimized out>)
    at ../../dbus/dbus-transport-socket.c:1183
#4  0x00007f9acfe5b491 in _dbus_transport_do_iteration (
    transport=0x55cbecb0e310, flags=<optimized out>, 
    timeout_milliseconds=<optimized out>) at ../../dbus/dbus-transport.c:1016
#5  0x00007f9acfe432ec in _dbus_connection_do_iteration_unlocked (
    connection=0x55cbecb0dcf0, pending=<optimized out>, flags=6, 
    timeout_milliseconds=25000) at ../../dbus/dbus-connection.c:1227
#6  0x00007f9acfe43cbd in _dbus_connection_block_pending_call (
    pending=0x55cbecb0b5a0) at ../../dbus/dbus-connection.c:2433
#7  0x00007f9acfe5550e in dbus_pending_call_block (pending=<optimized out>)
    at ../../dbus/dbus-pending-call.c:767
#8  0x00007f9acfe44286 in dbus_connection_send_with_reply_and_block (
    connection=connection@entry=0x55cbecb0dcf0, 
    message=message@entry=0x55cbecb05040, 
    timeout_milliseconds=timeout_milliseconds@entry=-1, error=error@entry=0x0)
    at ../../dbus/dbus-connection.c:3576
#9  0x00007f9acfe3ff38 in dbus_bus_register (connection=0x55cbecb0dcf0, 
    error=0x0) at ../../dbus/dbus-bus.c:695
#10 0x00007f9acfe401b4 in internal_bus_get (type=DBUS_BUS_SYSTEM, private=0, 
    error=0x0) at ../../dbus/dbus-bus.c:483
#11 0x000055cbeb49feac in cm_tdbus_reconnect (ec=0x55cbecaf5a90, 
    timer=<optimized out>, current_time=..., pvt=0x55cbecafd2f0) at tdbus.c:546
#12 0x00007f9acfc23bd9 in tevent_common_invoke_timer_handler ()
   from /lib64/libtevent.so.0
#13 0x00007f9acfc23d7e in tevent_common_loop_timer_delay ()
   from /lib64/libtevent.so.0
#14 0x00007f9acfc24f2b in epoll_event_loop_once () from /lib64/libtevent.so.0
#15 0x00007f9acfc231bb in std_event_loop_once () from /lib64/libtevent.so.0
#16 0x00007f9acfc1e395 in _tevent_loop_once () from /lib64/libtevent.so.0
#17 0x000055cbeb482799 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:413

It is blowing up on this line in certmonger:

tdb->conn = dbus_bus_get(DBUS_BUS_SESSION, NULL);

Near as I can tell exit_on_disconnect is set to FALSE.

Reproduction does not require an IPA master to be installed.

Steps are:

1. systemctl start certmonger
2. systemctl restart messagebus

Given that it is failing while polling an fd, I'm reassigning to the dbus team for further evaluation.
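
The two reproduction steps in this comment can be scripted. What follows is a minimal shell sketch, not a shipped test: the function name check_certmonger_after_bus_restart and the settle delay are assumptions made for illustration, while the unit names are the ones used in the steps above. It needs to run as root on a host with certmonger installed.

```shell
#!/bin/sh
# Sketch: script the reproduction steps and report whether certmonger is
# still active afterwards. The function name and the settle delay are
# assumptions for this example, not part of any shipped test suite.
check_certmonger_after_bus_restart() {
    settle_seconds="${1:-5}"   # assumed wait before checking the service

    # Step 1: start certmonger. Step 2: restart the message bus.
    systemctl start certmonger || return 2
    systemctl restart messagebus || return 2
    sleep "$settle_seconds"

    # is-active --quiet exits 0 only when the unit is active.
    if systemctl is-active --quiet certmonger; then
        echo "PASS: certmonger is active after messagebus restart"
        return 0
    fi
    echo "FAIL: certmonger is not active after messagebus restart"
    return 1
}
```

Calling check_certmonger_after_bus_restart 5 returns 0 when certmonger is active after the restart, 1 when it is not, and 2 if either systemctl step itself failed.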

Comment 8 David King 2019-09-10 14:18:07 UTC
Much like killing the X server for Xlib clients, standard libdbus behaviour is to terminate clients when the server is terminated. This is explained in detail in upstream Bugzilla (and more briefly in the libdbus documentation): https://bugs.freedesktop.org/show_bug.cgi?id=16338#c1

Comment 9 Rob Crittenden 2019-09-10 15:20:23 UTC
I think the certmonger dbus code attempts to reconnect when the bus is the system bus. Seems like that is wrong.

It sounds like for practical purposes certmonger should always set dbus_connection_set_exit_on_disconnect(conn, TRUE) so it exits gracefully?

certmonger is dbus-activated by systemd so if it goes away due to dbus restart it will be restarted upon first use of the clients.

If that is right then feel free to re-assign back to me and I'll make the changes in certmonger.

Comment 10 David King 2019-09-10 17:05:58 UTC
I am not really the person to ask about writing applications using libdbus. That would be the upstream dbus mailing list (where you would very likely be encouraged to stop using a long-deprecated library, where better alternatives exist).

The documentation for dbus_connection_set_exit_on_disconnect() suggests that if using dbus_bus_get(), exiting will be enabled by default: https://dbus.freedesktop.org/doc/api/html/group__DBusConnection.html#ga19091beb74f1504b0e862a7ad10e71cd

The behaviour of certmonger to go away due to dbus restart, and be re-activated as necessary, sounds fine to me.

Comment 14 Kaleem 2020-05-07 08:38:55 UTC
Steps to verify are in the description.

Comment 15 Rob Crittenden 2020-05-08 15:46:49 UTC
master: 39ce89ec821d02643681795d2149b20198f0fe42

Comment 21 Rob Crittenden 2020-07-21 16:01:57 UTC
systemd (pid 1) is sending a SIGTERM to certmonger when dbus restarts. There simply isn't anything we can do except ignore SIGTERM which isn't really a solution.

Comment 22 Alexander Bokovoy 2020-07-30 10:59:35 UTC
Rob, should we add a forced restart for the loss of the dbus service?

[Unit]
After=dbus.service
BindsTo=dbus.service

See https://bugzilla.redhat.com/show_bug.cgi?id=1654779 for details of how this came as a solution.

Comment 23 Rob Crittenden 2020-07-30 14:04:56 UTC
Good idea. I think PartOf may be a better option and will still tie to stop/restart.

Doing so will require patience on the user's part if they restart dbus as it waits until all dependent services are restarted and certmonger takes a bit to get going.

I'll work on a patch.

Comment 24 Rob Crittenden 2020-08-06 15:39:34 UTC
Implemented PartOf in the systemd unit file. Generally speaking, messagebus should never be restarted without a reboot, but the repercussions on a long-running server where certmonger is never restarted automatically and certificates expire are dire enough to take this action.
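
For systems that have not yet picked up the fixed package, the same PartOf relationship could be approximated with a local drop-in. This is only a sketch under stated assumptions: the function name write_partof_dropin, the drop-in file name 10-partof-dbus.conf, and the use of a drop-in at all are illustrative; the actual fix changed the unit file shipped with certmonger.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that ties certmonger to dbus.service
# with PartOf=. The function name, drop-in file name, and directory
# argument are assumptions for this example; the real fix edited the
# unit file shipped in the certmonger package.
write_partof_dropin() {
    dropin_dir="${1:-/etc/systemd/system/certmonger.service.d}"

    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-partof-dbus.conf" <<'EOF'
[Unit]
# Propagate stop/restart of dbus.service to certmonger.
PartOf=dbus.service
EOF
}
```

After writing the drop-in, systemctl daemon-reload makes systemd re-read the unit. PartOf only propagates stop and restart from dbus.service to certmonger, so restarting certmonger on its own does not touch the bus.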

Comment 25 Kaleem 2020-08-12 06:26:44 UTC
Verified based on the following output. For details, refer to the attached runner.log file.


IPA Version.

2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h]  +-----------------------------[RPMs & OS: [RedHat - x86_64]-----------------------------+
2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] |       ipa-client-4.8.7-8.module+el8.3.0+7513+a375844a.x86_64
2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] |       ipa-client-common-4.8.7-8.module+el8.3.0+7513+a375844a.noarch
2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] |       sssd-ipa-2.3.0-7.el8.x86_64
2020-08-12T06:09:25+0000 [ci-vm-10-0-105-109.h] ------------------------------------------------------------------------------------------

Snip from console output:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   certmonger should not dump dbus related core, bz1103090
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 02:09:17 ] :: [   PASS   ] :: Starting certmonger service (Expected 0,1, got 0)
:: [ 02:09:17 ] :: [   PASS   ] :: restarting messagebus service (Expected 0, got 0)
:: [ 02:09:22 ] :: [   PASS   ] :: sleep for certmonger to start after messagebugs restart (Expected 0, got 0)
:: [ 02:09:23 ] :: [   PASS   ] :: Checking certmonger service status after messagebus service restart (Expected 0, got 0)
:: [ 02:09:23 ] :: [   PASS   ] :: certmonger does not crashes on dbus service restart 
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Comment 29 errata-xmlrpc 2020-11-04 02:51:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (certmonger bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4671