Hide Forgot
Description of problem: When attempting an update with dnfdragora, update finishes with a dialog: "DNF is locked by another process. dnfdragora will exit." This happens after the selected packages have finished upgrading. Version-Release number of selected component (if applicable): dnfdragora-1.1.1-3.fc31.noarch python3-dnfdaemon-0.3.19-7.fc31.noarch How reproducible: Everytime Steps to Reproduce: 1. Attempt to update the system using dnfdragora 2. At the end of updates, dialog box is displayed "DNF is locked by another process. dnfdragora will exit." Additional info: coredumpctl gdb 2444 PID: 2444 (dnfdaemon-syste) UID: 0 (root) GID: 0 (root) Signal: 6 (ABRT) Timestamp: Mon 2019-09-09 17:49:06 EDT (4min 13s ago) Command Line: /usr/bin/python3 /usr/share/dnfdaemon/dnfdaemon-system Executable: /usr/bin/python3.7 Control Group: /system.slice/dnfdaemon.service Unit: dnfdaemon.service Slice: system.slice Boot ID: a5cce266acaf451a87c10eec4734b67a Machine ID: 69d3916002784cefaaf940ea46fea120 Hostname: rpi3-1 Storage: /var/lib/systemd/coredump/core.dnfdaemon-syste.0.a5cce266acaf451a87c10eec4734b67a.2444.1568065746000000000000.lz4 Message: Process 2444 (dnfdaemon-syste) of user 0 dumped core. Stack trace of thread 2444: #0 0x00000000b6e8625c __GI_raise (libc.so.6) #1 0x00000000b6e6f8a0 __GI_abort (libc.so.6) #2 0x00000000b274ee3c _dbus_abort (libdbus-1.so.3) #3 0x00000000b2744bb4 _dbus_warn_check_failed (libdbus-1.so.3) #4 0x00000000b2745400 _dbus_warn_return_if_fail (libdbus-1.so.3) #5 0x00000000b27292c4 _dbus_connection_send_preallocated_and_unlock (libdbus-1.so.3) #6 0x00000000b27716bc Connection_send_message (_dbus_bindings.so)
Created attachment 1613351 [details] dnfdaemon backtrace
The updates are applied successfully, but this message is confusing. This also affects all arches. Proposing as a blocker for Fedora 31 Beta. Criteria "The installed system must be able appropriately to install, remove, and update software with the default tool for the relevant software type in all release-blocking desktops (e.g. default graphical package manager)."
Discussed at the Fedora 31 Beta Go/No-Go Meeting #1, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2019-09-12/f31-beta-go_no_go-meeting.2019-09-12-17.00.txt . Rejected as a blocker as this does not seem to impede the actual functionality at all - the updates are installed, and subsequent use of dnfdragora still works. It's just an unfortunate message.
Neal, any news here?
Proposing as a blocker for final, criteria "The installed system must be able appropriately to install, remove, and update software with the default tool for the relevant software type in all release-blocking desktops (e.g. default graphical package manager).". As discussed previously the updates are applied successfully but this error message is confusing and affects many of the Desktop spins. Perhaps the criteria should be extended slightly for final to say "without errors".
(In reply to Adam Williamson from comment #4) > Neal, any news here? I've not yet had a chance to do some digging here. I'm planning to investigate this weekend.
I've spoken to Daniel Mach about this at the DNF community meeting yesterday. The symptoms appear to point to a problem in libdnf that they've already been investigating. As agreed, I'm reassigning this to libdnf.
clearing RejectedBlocker (from Beta decision) so we can reconsider this for Final.
Discussed during the 2019-10-07 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as we did not have enough votes for a clear decision on this in meeting; we will ask for votes in-bug to get a clearer decision. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-10-07/f31-blocker-review.2019-10-07-16.02.txt
The problem is easily reproducible on Fedora 31: * dnf upgrade * dnf downgrade bash * dnfdragora --ncurses * mark [x] bash and run upgrade * error dialogue: "DNF is locked by another process. dnfdragora will exit." Unfortunately I was unable to reproduce it on Fedora 30 with the *same* codebase. I used the latest dnf and libdnf from rpmsoftwaremanagement/dnf-nightly COPR. That makes me think that the problem *might* be elsewhere, but we need to spend more time on discovering the root cause.
If I understand it correctly from comment 0, dnfdragora (in particular dnfdaemon-system) actually crashes at the end? In that case I believe we can apply All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test. https://fedoraproject.org/wiki/Fedora_31_Final_Release_Criteria#Default_application_functionality (even the error message itself can be considered violating the basic functionality) I'm nervous about allowing a system updater to crash, especially when the root cause is unknown and we're not completely sure about the consequences. I'm rather +1 blocker here.
dnfdragora is the GUI app, dnfdaemon is a sort of helper daemon that it uses. The user does not observe the GUI application crash. So I don't think that criterion really applies, unless I miss something.
-1 blocker, +1 FE. The updates do finish. The error is messy and looks bad, but doesn't itself break anything.
I was -1 blocker for beta since it was functional-but-ugly. I'm less okay with that for a final. I'm 0 blocker, +1 FE for now.
-1 Blocker but +1 FE since its doesn't break anything but just ugly.
-1 Blocker, +1 FE
So we have +1/-4 for blocker, +6 (counting me and kparal as +1) for FE. I kinda would like this to come up in a review meeting where someone can make the case for blocker status, though, so I'm gonna take an executive decision to accept as FE but leave blocker status open to the next meeting, with a presumption that we'll reject it unless the vote changes drastically.
(In reply to Adam Williamson from comment #12) > The user does not observe the GUI application crash. So I don't think > that criterion really applies, unless I miss something. The GUI does crash after closing the error window or clicking ok.
Discussed during the 2019-10-14 blocker review meeting: [0] The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion: "All applications that can be launched using the standard graphical mechanism of a release-blocking desktop after a default installation of that desktop must start successfully and withstand a basic functionality test" Note, for dnfdragora, we hold that 'install some updates without printing a confusing error and quitting' is 'basic' functionality for an updater. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-10-14/f31-blocker-review.2019-10-14-16.01.txt
So, Go/No-Go is on Thursday. This is the last blocker for which we really do not have a resolution at present. Daniel, can you please prioritize this? We really need it fixed ASAP, ideally tomorrow. Thanks a lot!
Doing a bit of digging into this myself, I managed to at least hack up dnfdragora to log a traceback showing us what codepath we're in here: 2019-10-14 17:53:29,436 [dnfdragora](ERROR) dnfdaemon client error: g-io-error-quark: GDBus.Error:org.baseurl.DnfSystem.LockedError: dnf is locked by another application (36) Traceback (most recent call last): File "/usr/bin/dnfdragora", line 85, in <module> main_gui.handleevent() File "/usr/lib/python3.7/site-packages/dnfdragora/ui.py", line 1273, in handleevent if not self._searchPackages(filter, True) : File "/usr/lib/python3.7/site-packages/dnfdragora/ui.py", line 949, in _searchPackages packages = self.backend.search(fields, strings, self.match_all, self.newest_only, tags ) File "/usr/lib/python3.7/site-packages/dnfdragora/misc.py", line 135, in newFunc rc = func(*args, **kwargs) File "/usr/lib/python3.7/site-packages/dnfdragora/dnf_backend.py", line 464, in search newest_only, tags) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 508, in Search fields, keys, attrs, match_all, newest_only, tags)) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 293, in _run_dbus_async result = self._get_result(data) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 277, in _get_result self._handle_dbus_error(user_data['error']) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 250, in _handle_dbus_error raise DaemonError(str(err)) dnfdaemon.client.DaemonError: g-io-error-quark: GDBus.Error:org.baseurl.DnfSystem.LockedError: dnf is locked by another application (36) which means that, somehow, shortly after the update operation finishes, we wind up handling a yui.YEvent.WidgetEvent for the 'find button', and that calls self._searchPackages which calls `search()` from dnf_backend.py which is wrapped in the ExceptionHandler decorator, and that's where we're somehow hitting this locking issue.
OK, did a bit more debugging. I think the error message is misleading us slightly here in two ways. For a start, this isn't actually any kind of "dnf" lock, AFAICT. It's a lock state for dnfdaemon itself. All it is is an attribute of a DnfDaemonBase instance called `_lock`; basically when a new DnfDaemonBase is initialized it is set to None. Then if the daemon receives a DBus 'Lock' method call, the attribute is set to the id of whatever process sent the call. If the daemon receives a DBus 'Unlock' method call, the attribute is cleared. There's also a 'check_lock()' function that gets called (via 'working_start()') when most other DBus method calls arrive that checks the attribute is set to the ID of the sender, and errors out if not. That's really the whole implementation - AFAICT it's not actually wired up to any of dnf's *own* locks in any way at all, unless I'm missing something. Secondly, the `check_lock()` function that produces the error "dnf is locked by another application" actually produces that error even if *the lock is not held at all*: def check_lock(self, sender): """ Check that the current sender is owning the dnf lock :param sender: """ if self._lock == sender: return True else: raise LockedError('dnf is locked by another application') note that it doesn't do a boolean check on self._lock - it just raises that error if it's not equal to sender. So I hooked up some more logging around all this in dnfdaemon, and found some interesting stuff: 2019-10-14 18:53:03.397784 XXX In DnfDaemon.init now! 2019-10-14 18:53:03.402804 XXX LOCK: sender is :1.1835 2019-10-14 18:53:03.407524 XXX CHECK LOCK: sender is :1.1835 XXX CHECK LOCK: self._lock is :1.1835 2019-10-14 18:53:07.556499 XXX CHECK LOCK: sender is :1.1835 XXX CHECK LOCK: self._lock is :1.1835 [...snip a bunch more calls to check_lock where the sender is the same...] 2019-10-14 18:53:13.560828 XXX CHECK LOCK: sender is :1.1835 XXX CHECK LOCK: self._lock is :1.1835 2019-10-14 18:53:13.560941 XXX CHECK LOCK: sender is :1.1835 XXX CHECK LOCK: self._lock is :1.1835 2019-10-14 18:53:16.513900 XXX In DnfDaemon.init now! 2019-10-14 18:53:16.516724 XXX CHECK LOCK: sender is :1.1835 XXX CHECK LOCK: self._lock is None here are the main app logs from the same time: 2019-10-14 18:53:03,227 [dnfdragora.ui](INFO) dnfdragora started 2019-10-14 18:53:03,401 [dnfdaemon.client](DEBUG) org.baseurl.DnfSystem daemon loaded - version : 2 2019-10-14 18:53:03,401 [dnfdragora.dnf_backend](DEBUG) dnfdaemon api version (2) 2019-10-14 18:53:03,401 [dnfdragora.base](DEBUG) Lock the DNF root daemon 2019-10-14 18:53:07,555 [dnfdragora.base](INFO) Locked the DNF root daemon 2019-10-14 18:53:07,813 [dnfdragora.dnf_backend](DEBUG) get-packages : updates 2019-10-14 18:53:08,524 [dnfdragora.misc](DEBUG) get_packages took 0.71 sec 2019-10-14 18:53:10,544 [dnfdragora.misc](DEBUG) _make_pkg_object_with_attr took 0.00 sec 2019-10-14 18:53:12,674 [dnfdragora.ui](DEBUG) adding: install rsync,0,3.1.3,10.fc31,x86_64,updates-testing 2019-10-14 18:53:12,680 [dnfdragora.dnf_backend](DEBUG) TransactionEvent : start-build 2019-10-14 18:53:12,735 [dnfdragora.dnf_backend](DEBUG) TransactionEvent : end-build 2019-10-14 18:53:13,563 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 0.0) 2019-10-14 18:53:13,849 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 0.06477154724818276) 2019-10-14 18:53:13,851 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 0.23234683281412252) 2019-10-14 18:53:13,851 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 0.5674974039460021) 2019-10-14 18:53:13,925 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 0.7350726895119418) 2019-10-14 18:53:13,926 [dnfdragora.dnf_backend](DEBUG) on_RepoMetaDataProgress (root): ('', 1.0) 2019-10-14 18:53:13,953 [dnfdragora.dnf_backend](DEBUG) Downloaded : rsync-3.1.3-10.fc31.x86_64.rpm 2019-10-14 18:53:14,711 [dnfdragora.dnf_backend](DEBUG) on_RPMProgress : [rsync,0,3.1.3,10.fc31,x86_64,updates-testing] 2019-10-14 18:53:14,845 [dnfdragora.dnf_backend](DEBUG) on_RPMProgress : [rsync,0,3.1.3,9.fc31,x86_64,@System] 2019-10-14 18:53:15,894 [dnfdragora.dnf_backend](DEBUG) on_RPMProgress : [rsync,0,3.1.3,10.fc31,x86_64,updates-testing] 2019-10-14 18:53:15,894 [dnfdragora.dnf_backend](DEBUG) on_RPMProgress : [rsync,0,3.1.3,9.fc31,x86_64,@System] 2019-10-14 18:53:16,517 [dnfdragora](ERROR) dnfdaemon client error: g-io-error-quark: GDBus.Error:org.baseurl.DnfSystem.LockedError: dnf is locked by another application (36) Traceback (most recent call last): File "/usr/bin/dnfdragora", line 85, in <module> main_gui.handleevent() File "/usr/lib/python3.7/site-packages/dnfdragora/ui.py", line 1273, in handleevent if not self._searchPackages(filter, True) : File "/usr/lib/python3.7/site-packages/dnfdragora/ui.py", line 949, in _searchPackages packages = self.backend.search(fields, strings, self.match_all, self.newest_only, tags ) File "/usr/lib/python3.7/site-packages/dnfdragora/misc.py", line 135, in newFunc rc = func(*args, **kwargs) File "/usr/lib/python3.7/site-packages/dnfdragora/dnf_backend.py", line 464, in search newest_only, tags) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 508, in Search fields, keys, attrs, match_all, newest_only, tags)) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 293, in _run_dbus_async result = self._get_result(data) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 277, in _get_result self._handle_dbus_error(user_data['error']) File "/usr/lib/python3.7/site-packages/dnfdaemon/client/__init__.py", line 250, in _handle_dbus_error raise DaemonError(str(err)) dnfdaemon.client.DaemonError: g-io-error-quark: GDBus.Error:org.baseurl.DnfSystem.LockedError: dnf is locked by another application (36) Also note that there is no call to dnfdaemon's Unlock method - I wired up logging for that too, and it does *not* fire between the last call to check_lock() where self._lock came up as ":1.1835" (at 18:53:13) and the call where self._lock came up as "None" (at 18:53:16). No. As we can see, what happens instead is somehow *we get a whole new DnfDaemon instance* at 18:53:16.513900 and the `search()` runs almost instantly, *without* any call to `Lock()` happening in between...so we get the "dnf is locked by another application" error not because the new daemon instance is locked by another application but because the new daemon instance *isn't locked at all* (self._lock for the new DnfDaemon is None, because Lock has not happened). I don't know *why* this is happening yet, though.
heyyyy, okay, this is making a bit more sense now. What happens is dnfdaemon *dies* at 18:53:15 and gets automatically restarted by systemd: Oct 14 18:53:03 adam.happyassassin.net systemd[1]: Starting Package management dnf daemon... Oct 14 18:53:03 adam.happyassassin.net systemd[1]: Started Package management dnf daemon. Oct 14 18:53:15 adam.happyassassin.net dnfdaemon-system[226042]: dbus[226042]: arguments to dbus_connection_unref() were incorrect, assertion "connection->generation == _dbus_current_generation" failed in file ../../dbus/dbus-connection.c line 2823. Oct 14 18:53:15 adam.happyassassin.net dnfdaemon-system[226042]: This is normally a bug in some application using the D-Bus library. Oct 14 18:53:15 adam.happyassassin.net dnfdaemon-system[226042]: D-Bus not built with -rdynamic so unable to print a backtrace Oct 14 18:53:16 adam.happyassassin.net systemd[1]: dnfdaemon.service: Main process exited, code=dumped, status=6/ABRT Oct 14 18:53:16 adam.happyassassin.net systemd[1]: dnfdaemon.service: Failed with result 'core-dump'. Oct 14 18:53:16 adam.happyassassin.net systemd[1]: dnfdaemon.service: Consumed 7.153s CPU time. Oct 14 18:53:16 adam.happyassassin.net systemd[1]: Starting Package management dnf daemon... Oct 14 18:53:16 adam.happyassassin.net systemd[1]: Started Package management dnf daemon. so dnfdragora winds up sending its Search dbus method call to the *new* dnfdaemon instance, which it has not yet locked, and then we see the error in dnfdragora. But really the problem here is what's causing the daemon to drop dead a second earlier in the first place. That's what we need to find out and fix.
I rebuilt dbus with --enable-asserts (so we get a traceback) and also set dnfdaemon to -v -d, and that gets us this: Oct 14 22:09:13 adam.happyassassin.net dnfdaemon-system[11383]: 22:09:13: setting max_err to : 1 Oct 14 22:09:14 adam.happyassassin.net dnfdaemon-system[11383]: 22:09:14: download status : None - rsync-3.1.3-10.fc31.x86_64.rpm Oct 14 22:09:14 adam.happyassassin.net dnfdaemon-system[11383]: 22:09:14: checking signature for : rsync-3.1.3-10.fc31.x86_64, 0 Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: dbus[11383]: assertion failed "connection->generation == _dbus_current_generation" file "../../dbus/dbus-connection.c" line 1424 function _dbus_connection_ref_unlocked Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(_dbus_print_backtrace+0x36) [0x7fb1a41be8f6] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(_dbus_abort+0xf) [0x7fb1a41c20af] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(_dbus_real_assert+0x1f) [0x7fb1a41afa9f] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(_dbus_connection_ref_unlocked+0x67) [0x7fb1a4182307] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x1c23f) [0x7fb1a418423f] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x1c4d6) [0x7fb1a41844d6] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x1c849) [0x7fb1a4184849] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x1d455) [0x7fb1a4185455] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x1d522) [0x7fb1a4185522] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/_dbus_bindings.so(+0xb17f) [0x7fb1a423317f] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyMethodDef_RawFastCallKeywords+0x25e) [0x7fb1b48b939e] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyMethodDescr_FastCallKeywords+0x4d) [0x7fb1b48eb40d] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x14059b) [0x7fb1b48eb59b] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xbfd) [0x7fb1b4921fbd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x2a2) [0x7fb1b48da122] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1404bf) [0x7fb1b48eb4bf] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xbfd) [0x7fb1b4921fbd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x2a2) [0x7fb1b48da122] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1404bf) [0x7fb1b48eb4bf] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xbfd) [0x7fb1b4921fbd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallDict+0x2ee) [0x7fb1b48da7be] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0x230c) [0x7fb1b49236cc] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallDict+0x2ee) [0x7fb1b48da7be] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0x230c) [0x7fb1b49236cc] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallDict+0x2ee) [0x7fb1b48da7be] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyObject_Call_Prepend+0x66) [0x7fb1b48aeed6] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyObject_FastCallDict+0xbd) [0x7fb1b48aa9fd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x10d07f) [0x7fb1b48b807f] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(PyObject_CallFunctionObjArgs+0x99) [0x7fb1b48b82b9] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/_dbus_bindings.so(+0xa81f) [0x7fb1a423281f] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/_dbus_bindings.so(+0xb45d) [0x7fb1a423345d] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(+0x3808a) [0x7fb1a41a008a] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libdbus-1.so.3(dbus_connection_dispatch+0xa77) [0x7fb1a4188be7] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/_dbus_glib_bindings.so(+0x2f09) [0x7fb1a4211f09] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libglib-2.0.so.0(g_main_context_dispatch+0x160) [0x7fb1a66d34a0] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libglib-2.0.so.0(+0x51830) [0x7fb1a66d3830] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libglib-2.0.so.0(g_main_loop_run+0x83) [0x7fb1a66d3b23] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7fb1a5d4aaa8] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libffi.so.6(ffi_call+0x1d4) [0x7fb1a5d4a2a4] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/gi/_gi.cpython-37m-x86_64-linux-gnu.so(+0x30c35) [0x7fb1a48a7c35] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/gi/_gi.cpython-37m-x86_64-linux-gnu.so(+0x3276c) [0x7fb1a48a976c] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/lib64/python3.7/site-packages/gi/_gi.cpython-37m-x86_64-linux-gnu.so(+0x2330b) [0x7fb1a489a30b] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyObject_FastCallKeywords+0x4bc) [0x7fb1b48ea39c] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x140649) [0x7fb1b48eb649] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0x5592) [0x7fb1b4926952] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x2a2) [0x7fb1b48da122] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1404bf) [0x7fb1b48eb4bf] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xbfd) [0x7fb1b4921fbd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x102) [0x7fb1b48d9f82] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1404bf) [0x7fb1b48eb4bf] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xbfd) [0x7fb1b4921fbd] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyFunction_FastCallKeywords+0x102) [0x7fb1b48d9f82] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1404bf) [0x7fb1b48eb4bf] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalFrameDefault+0xb58) [0x7fb1b4921f18] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_PyEval_EvalCodeWithName+0x2f0) [0x7fb1b48d9120] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(PyEval_EvalCodeEx+0x39) [0x7fb1b48d9e79] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(PyEval_EvalCode+0x1b) [0x7fb1b4968b5b] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x1fea83) [0x7fb1b49a9a83] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(PyRun_FileExFlags+0x97) [0x7fb1b49a9fd7] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(PyRun_SimpleFileExFlags+0x19a) [0x7fb1b49b04ca] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(+0x207251) [0x7fb1b49b2251] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libpython3.7m.so.1.0(_Py_UnixMain+0x3c) [0x7fb1b49b23fc] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb1b4b1d1a3] Oct 14 22:09:16 adam.happyassassin.net dnfdaemon-system[11383]: /usr/bin/python3(_start+0x2e) [0x55734166808e] Still, there's two seconds between the last log message from dnfdaemon-system and when it actually blows up; per the dnfdragora logs I think it's got well past the signature check by that point, so I still don't know exactly what it's doing when it dies :/
Created attachment 1625859 [details] better backtrace (with udev asserts enabled) I got a better backtrace with the new udev build, here it is. Based on this and some debug logging, I believe we crash in this bit of dnfdaemon/server/__init__.py DnfDaemonBase.run_transaction(): self._can_quit = True self._reset_base() self.TransactionEvent('end-run', NONE) <---- HERE result = json.dumps((rc, msgs)) return result we crash exactly during the `self.TransactionEvent('end-run', NONE)` call, I believe.
So, after playing around with this a bit more, I don't yet have a convincing explanation for *why*, but I think what's happening here is that dnfdaemon will crash as soon as it tries to send any dbus signal after DnfDaemonBase._reset_base() is called. If I move the self._reset_base() call in run_transaction() to the end, the `self.TransactionEvent('end-run', NONE)` works OK, and we crash as soon as something *else* tries to send a signal. If I comment that out entirely, we don't crash until after an Unlock event has run - which is another thing that calls `_reset_base()`. `_reset_base()` only does this: def _reset_base(self): """Close the current dnf.Base object.""" if self._base: self._base.close() self._base = None so, now the question is, why is doing that causing problems for subsequent dbus stuff?
I think the next step is gonna have to be bisecting libdnf, but just to add one more nugget, it seems to be the `self._base.close()` that's the problem, not the `self._base = None`. Just commenting that out "fixes the bug", except of course it's wrong. We need to know why closing the dnf.Base object somehow breaks dnfdaemon's subsequent dbus interactions.
Narrowed it down some more: the thing that seems to cause the problem is the call to `self._closeRpmDB()` in dnf `Base._finalize_base()`, which is called by `Base.close()`. If I change dnfdaemon's `_reset_base()` implementation to call `self._base._closeRpmDB()` instead of `self._base.close()`, that's enough to cause the problem to keep happening. If I instead replace the call to `self._base.close()` with, say, `self._base.reset(True, True, True)` (the other half of what `close()` does), or `self._base._store_persistent_data()` (another thing that `_finalize_base()` does), that's not enough to make the problem happen - but just calling `_closeRpmDB()` is enough to make it happen.
OK, so have dnfdaemon's `_reset_base()` just do this: def _reset_base(self): if self._base: self._base._priv_ts.ts = None self._base = None is enough to trigger the problem. That's one of the things that dnf.rpm.transaction.TransactionWrapper.close() does, and that is the thing that ultimately happens when `Base._closeRpmDB()` is called. It also calls `closeDB()` on the ts object and does `self.open = False`, but those aren't needed to trigger the bug - it's the setting of the wrapper's `ts` attribute to `None` that causes the problem, somehow.
OOOOH! Big news. I've found the culprit, I think: rpm-plugin-systemd-inhibit . This is an rpm plugin (rpm has plugins? who knew) which tries to inhibit systemd from shutting down or suspending the system while a transaction is occurring. It does this...via dbus. It is recommended by python3-dnf, so it is installed by default basically always. If I remove that plugin, the bug doesn't happen. Install it again, the bug happens. The code for the plugin has a cleanup function which calls dbus_shutdown(): static void systemd_inhibit_cleanup(rpmPlugin plugin) { dbus_shutdown(); } which, right now, is my suspect. My best theory as of right now: the Base's `_priv_ts.ts` is an rpm.TransactionSet instance. I think when it goes out of scope - when we drop our last/only reference to it, in dnf - rpm's code winds up calling the cleanup function, and the call to dbus_shutdown() happens. dbus_shutdown() is documented thus: https://dbus.freedesktop.org/doc/api/html/group__DBusMemory.html#ga01912903e39428872920d861ef565bac "Frees all memory allocated internally by libdbus and reverses the effects of dbus_threads_init(). libdbus keeps internal global variables, for example caches and thread locks, and it can be useful to free these internal data structures. ... You can't continue to use any D-Bus objects, such as connections, that were allocated prior to dbus_shutdown(). You can, however, start over; call dbus_threads_init() again, create new connections, and so forth." so, it's a pretty big cannon. And remember, we're all in a single process here - the dnfdaemon process. So I think this dbus_shutdown() call in the rpm plugin, which runs in that same process, winds up nerfing dnfdaemon's *own* dbus usage.
*aha*. j'accuse: Panu Matilainen! https://github.com/rpm-software-management/rpm/commit/d5f201345f6d27b6280750e5c6502f4418614fbc https://bugzilla.redhat.com/show_bug.cgi?id=1714657
FEDORA-2019-5caeccb98d has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-5caeccb98d
(In reply to Adam Williamson from comment #30) > > You can't continue to use any D-Bus objects, such as connections, that were > allocated prior to dbus_shutdown(). You can, however, start over; call > dbus_threads_init() again, create new connections, and so forth." > > so, it's a pretty big cannon. And remember, we're all in a single process > here - the dnfdaemon process. So I think this dbus_shutdown() call in the > rpm plugin, which runs in that same process, winds up nerfing dnfdaemon's > *own* dbus usage. Right, I only thought about shutting down the handle *we* have but I admit the thought of another dbus user in the same process never crossed my mind. So either we need to figure a way to only call shutdown if we're the only user, or revert the change. I doubt the former is possible at all.
https://github.com/rpm-software-management/rpm/pull/900
(In reply to Fedora Update System from comment #32) > FEDORA-2019-5caeccb98d has been submitted as an update to Fedora 31. > https://bodhi.fedoraproject.org/updates/FEDORA-2019-5caeccb98d Fixes the problem in dnfdragora.
"Right, I only thought about shutting down the handle *we* have but I admit the thought of another dbus user in the same process never crossed my mind." Sure, it's a tricky problem to foresee - I'm sure if I was reviewing your PR I wouldn't have thought about it either! I hope I didn't come off as suggesting you should've foreseen this, I just like saying "j'accuse" :)
(In reply to Adam Williamson from comment #36) > Sure, it's a tricky problem to foresee - I'm sure if I was reviewing your PR > I wouldn't have thought about it either! I hope I didn't come off as > suggesting you should've foreseen this, I just like saying "j'accuse" :) Oh no, not at all. No worries :)
rpm-4.15.0-2.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-5caeccb98d
we don't need to track this as *both* an FE *and* a blocker. blocker beats FE.
rpm-4.15.0-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.
Is it possible to have a build for FC30, too please? Reasons are bug 1667903 and bug 1751196 and there might be quite some more: https://bugzilla.redhat.com/buglist.cgi?component=dnfdragora&list_id=10596445&product=Fedora I also like to point to my upstream report: https://github.com/manatools/dnfdragora/issues/140
This bug has nothing to do with either of those reports, and does not affect F30. Note the bug here was actually in rpm, not dnfdragora, and we fixed it in rpm. F30's rpm does not have that bug - F30 uses RPM 4.14, the bug was introduced in 4.15.
*** Bug 1761620 has been marked as a duplicate of this bug. ***