Bug 1798899

Summary: Missing API for in-process dbus users to know when it's safe to shut it down
Product: Red Hat Enterprise Linux 8
Reporter: Jan Pazdziora (Red Hat) <jpazdziora>
Component: dbus
Assignee: David King <dking>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Docs Contact:
Priority: high
Version: 8.1
CC: aaron.mccrocklin, dbodnarc, dmach, jcastran, jpazdziora, kkohli, kwalker, mdomonko, mschwabe, ngompa13, packaging-team-maint, pmatilai, rmetrich, swm-qe, vijsingh
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1783346
Environment:
Last Closed: 2020-02-10 18:21:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1783346
Bug Blocks: 1786127, 1790794

Description Jan Pazdziora (Red Hat) 2020-02-06 09:31:01 UTC
When multiple in-process libraries use dbus and want to cleanly shut the connection down, they can break each other. An API to handle the counting might be needed.

+++ This bug was initially created as a clone of Bug #1783346 +++

Description of problem:
 A yum install or remove results in the following error:

    dbus[PID]: arguments to dbus_connection_close() were incorrect, assertion "connection->generation == _dbus_current_generation" failed in file ../../dbus/dbus-connection.c line 2936.

 The following backtrace then appears in the logs:

    systemd-coredump[PID]: Process PID (dnf) of user 0 dumped core.
                                                         
    Stack trace of thread 4260:
    #0  0x00007f7514bc58df raise (libc.so.6)
    #1  0x00007f7514bafcf5 abort (libc.so.6)
    #2  0x00007f74ffdcb82d _dbus_abort.cold.0 (libdbus-1.so.3)
    #3  0x00007f74ffdedbd0 _dbus_warn_check_failed (libdbus-1.so.3)
    #4  0x00007f750001934f Connection_tp_dealloc (_dbus_bindings.so)
    #5  0x00007f7515a6e22e subtype_dealloc (libpython3.6m.so.1.0)
    #6  0x00007f75159eb9df dict_dealloc (libpython3.6m.so.1.0)
    #7  0x00007f75159d376f PyDict_Clear (libpython3.6m.so.1.0)
    #8  0x00007f75159ecfe3 type_clear (libpython3.6m.so.1.0)
    #9  0x00007f75159f80e2 collect (libpython3.6m.so.1.0)
    #10 0x00007f7515b0587b _PyGC_CollectNoFail (libpython3.6m.so.1.0)
    #11 0x00007f7515ab8a6f PyImport_Cleanup (libpython3.6m.so.1.0)
    #12 0x00007f7515b198e2 Py_FinalizeEx (libpython3.6m.so.1.0)
    #13 0x00007f7515b19a48 Py_Exit (libpython3.6m.so.1.0)
    #14 0x00007f7515b19b33 handle_system_exit (libpython3.6m.so.1.0)
    #15 0x00007f7515b19b96 PyErr_PrintEx (libpython3.6m.so.1.0)
    #16 0x00007f7515b1a005 PyRun_SimpleFileExFlags (libpython3.6m.so.1.0)
    #17 0x00007f7515b1b07b Py_Main (libpython3.6m.so.1.0)
    #18 0x000056150566bc68 main (platform-python3.6)
    #19 0x00007f7514bb1873 __libc_start_main (libc.so.6)
    #20 0x000056150566bdde _start (platform-python3.6)
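
The assertion in the error message compares a generation number stored in the connection with a process-wide counter. The following is a toy model of that check, not libdbus code: it assumes that each connection is stamped with the current generation when it is opened and that a global shutdown advances the counter. All names (toy_connection, toy_open, toy_shutdown, toy_close) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins only; none of these names exist in libdbus. */
static int toy_current_generation = 1;

struct toy_connection {
    int generation;   /* generation in which the connection was opened */
    int open;         /* 1 while the connection is open */
};

/* Open a connection stamped with the current generation. */
static struct toy_connection toy_open(void)
{
    struct toy_connection c = { toy_current_generation, 1 };
    return c;
}

/* Model of a global shutdown: every older connection becomes stale. */
static void toy_shutdown(void)
{
    toy_current_generation += 1;
}

/* Returns 0 on success, -1 if the connection is from an older generation.
 * In the trace above, the equivalent failed check ends in abort(). */
static int toy_close(struct toy_connection *c)
{
    assert(c != NULL);
    if (c->generation != toy_current_generation) {
        fprintf(stderr, "stale connection: generation %d, current %d\n",
                c->generation, toy_current_generation);
        return -1;
    }
    c->open = 0;
    return 0;
}
```

In this model, a connection opened by one user and closed only after another user has called toy_shutdown() fails the check, which mirrors the order of events seen here: the plugin shuts the library down first, and the Python binding closes its own connection later during interpreter cleanup.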


Version-Release number of selected component (if applicable):
 rpm-plugin-systemd-inhibit-4.14.2-25

How reproducible:
 Difficult - Only observed in customer environments

Steps to Reproduce:
1. TBD
2.
3.

Actual results:
 The above dbus connection error is reported and the application subsequently encounters a segfault.

Expected results:
 YUM continues operating with no failures reported

Additional info:
 This looks to be a manifestation of the upstream problem below:

    1750575 – dnfdragora complains that dnf is locked by another process after updates (due to dnfdaemon crashing when attempting dbus operation after dnf.Base instance is closed)
    https://bugzilla.redhat.com/show_bug.cgi?id=1750575

 The fix for that problem requires reverting the fix for bug:

    1714657 – Valgrind reports errors and lost memory when running rpm -Uvh with rpm-plugin-systemd-inhibit
    https://bugzilla.redhat.com/show_bug.cgi?id=1714657

 The specific patch needing a revert:

    Revert "Fully shutdown DBUS on systemd_inhibit cleanup (RhBug:1714657)" by pmatilai · Pull Request #900 · rpm-software-management/rpm
    https://github.com/rpm-software-management/rpm/pull/900/commits/c9863472aa0302fafdc5b7ebca25215867c38503

--- Additional comment from Panu Matilainen on 2019-12-16 10:32:59 CET ---

Oh ugh, so we ended up with this on RHEL too. What is peculiar is that this was thought to happen only with dnfdragora, which is now known to use dbus on its own, so crashing due to dbus shutdown is understandable there. But I don't know what yum would do with dbus, or why this didn't show up in QA at all. What yum/dnf plugins are present when this crashes?

ACK for reverting the dbus-shutdown patch anyway.

--- Additional comment from Michal Domonkos on 2020-01-03 14:21:43 CET ---

(In reply to Panu Matilainen from comment #1)
> Oh ugh, so we ended up with this on RHEL too. What is peculiar is that this
> was thought to happen only with dnfdragora, which is now known to use dbus
> on its own, so crashing due to dbus shutdown is understandable there. But I
> don't know what yum would do with dbus, or why this didn't show up in QA at
> all. What yum/dnf plugins are present when this crashes?

It seems that subscription-manager could be the "offender" here (in quotes, since the offender really is RPM and we're going to do the revert), also see:

https://bugzilla.redhat.com/show_bug.cgi?id=1752965#c1

Comment 1 Jan Pazdziora (Red Hat) 2020-02-06 09:31:47 UTC
Please see also https://bugzilla.redhat.com/show_bug.cgi?id=1798401 for more discussion about the effects of the situation.

Comment 2 Panu Matilainen 2020-02-06 14:26:03 UTC
So the short summary of the situation is that we have a library which is dlopen()'ing its own plugins. One of those plugins uses DBUS to establish a systemd inhibition lock while we do critical stuff, and releases the lock once the critical stuff is done. This is all hidden inside the library; the calling application may or may not be using DBUS for anything.

This all works just fine, except that it appears to leak memory from the DBUS connection. What the plugin does is:

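    /* Open a private (non-shared) connection to the system bus. */
    bus = dbus_bus_get_private(DBUS_BUS_SYSTEM, &err);
    /* ... do the actual work... */

    if (bus) {
        /* A private connection is closed explicitly, then unreferenced. */
        dbus_connection_close(bus);
        dbus_connection_unref(bus);
    }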

This is not enough to free all the resources allocated by the DBUS library; valgrind will show multiple leaks related to this. Calling dbus_shutdown() at the end does clear up those leaks, but it kills DBUS regardless of whether there are other in-process users, and if there are, those will crash and burn. I don't know whether this is by design or not. I would've kind of expected shutdown to honor reference counting, but I can see reasons to the contrary as well.

dbus_shutdown() looks like something that an application would call just before exiting, but when the application itself is not at all aware of the DBUS usage, there's no clear point of exit and no safe place to call it. Even in library destructors you never know if it's actually the last remaining user. Unless I'm missing something, it doesn't seem possible to handle this cleanly with the current DBUS API.
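
For illustration only, the kind of user counting this bug asks for could look roughly like the sketch below. It is a self-contained model and not an existing or proposed libdbus API: bus_user_acquire(), bus_user_release(), and global_shutdown() are hypothetical names, and global_shutdown() merely stands in for a dbus_shutdown()-like call.

```c
#include <assert.h>

/* Hypothetical user-counting wrapper; none of these names exist in libdbus. */
static int bus_users = 0;          /* number of in-process users still active */
static int global_shutdowns = 0;   /* how often the global shutdown has run */

/* Stand-in for a dbus_shutdown()-like call that frees global state. */
static void global_shutdown(void)
{
    global_shutdowns++;
}

/* Each in-process user registers itself before touching the bus. */
static void bus_user_acquire(void)
{
    bus_users++;
}

/* Each user deregisters when done. Only the last one out triggers the
 * global shutdown. Returns 1 if this call performed the shutdown. */
static int bus_user_release(void)
{
    assert(bus_users > 0);
    if (--bus_users == 0) {
        global_shutdown();
        return 1;
    }
    return 0;
}
```

The sketch is not thread-safe; a real version would need a lock or atomic counter. It also only helps if every in-process user takes part in the counting, which is why such a facility would have to live in the shared library itself rather than in any single plugin.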

Comment 3 David King 2020-02-10 18:21:30 UTC
You would be welcome to file a feature request for this upstream: https://gitlab.freedesktop.org/dbus/dbus/issues/ or to ask for help on the dbus mailing list: dbus.org

I do not think it is likely that any API to make this possible will be added - it is the task of the library user to see that dbus_threads_init() is mirrored by calling dbus_shutdown(), and the API reference has exactly this to say: "You have to know that nobody is using libdbus in your application's process before you can call dbus_shutdown(). One implication of this is that calling dbus_shutdown() from a library is almost certainly wrong, since you don't know what the rest of the app is up to."

Comment 4 Panu Matilainen 2020-02-11 08:42:15 UTC
Ack, thanks for confirming.

It is a bit embarrassing that everybody involved on the rpm side has missed the "DONT" from the dbus_shutdown() API documentation :)