Bug 2408378
| Summary: | Offline system update crashes in libdnf::Swdb::setItemDone/libdnf::TransactionItemBase::setState |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | libdnf |
| Version: | 43 |
| Status: | NEW |
| Severity: | unspecified |
| Priority: | unspecified |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Type: | Bug |
| Reporter: | Bapf <bugs.redhat> |
| Assignee: | rpm-software-management |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | ales.astone, awilliam, daniel.mach, gnome-sig, jmracek, jrohel, kde-sig, mblaha, mcatanza, ngompa13, pkratoch, quantum.analyst, rhughes, rpm-software-management |
Problem is PackageKit is crashing. Unfortunately, systemd-coredump is not running in the offline environment, so I have no clue how we might debug the crash. This would probably be easy to solve if we figure out how to enable systemd-coredump (and then reproduce the problem).

I *think* it should be pretty simple to enable coredumpd in the upgrade environment:

mkdir -p /etc/systemd/system/system-update.target.wants
ln -s /usr/lib/systemd/system/systemd-coredump.socket /etc/systemd/system/system-update.target.wants

I *think* that ought to do it. But then we need to be able to reproduce the bug. If Bapf or anybody else who has encountered this problem is able to test that and report back, it would be a big help. I fear it's not terribly likely that the issue will be reproducible given that the upgrade is modifying packages on your system and you're not going to be left in the same state you were when you started. But maybe you'll get really lucky and hit the crash again? Or maybe somebody will encounter this problem when upgrading a VM, where you can roll back to a previous snapshot....

it would be really useful if anyone who encounters this happens to have a list of the packages that were installed at the time they ran the upgrade. I'm not sure if the journal from the boot *before* the failed upgrade (the boot where the upgrade was downloaded/prepared) might have that. In the logs from the upgrade boot we get a partial list of the old packages, but only that, it's clearly incomplete.

The log has a list in a not-so-obvious form, I'm assuming that every line with "package upgrading" was already on the system and anything with "package installing" is new. I took all those lines, dropped the package versions and put those into a list (which I'll attach here). Note that this ends up installing both `wine-core.x86_64` and `wine-core.i686`.
Because our common issues say to uninstall wine but it's still part of the upgrade set, I assume that the original reporter didn't do that but instead followed the advice from https://bugzilla.redhat.com/show_bug.cgi?id=2401666#c10

I can reproduce in a VM in Boxes (on F42 if that matters):

1. Install F42, run all updates
2. Install rpmfusion-{free,nonfree}
3. Install the attached packages (sudo dnf install $(cat packages.txt) --allowerasing) (note you need to allow erasing because ffmpeg is installed)
4. Following the wine-core conflict advice, delete the symlinks in `/usr/{lib,lib64}/wine` (2 in each dir)
5. Enable coredumpd in the upgrade environment with https://bugzilla.redhat.com/show_bug.cgi?id=2408378#c2
6. Perform a system upgrade from Software

Below is the stack trace, though it doesn't look amazing:

PID: 808 (packagekitd)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2025-10-30 20:39:21 EDT (22min ago)
Command Line: /usr/libexec/packagekitd
Executable: /usr/libexec/packagekitd
Control Group: /system.slice/packagekit.service
Unit: packagekit.service
Slice: system.slice
Boot ID: 5c8cf6725ca743e0a64518fc9ab92b95
Machine ID: 6148a20714c544e1a03860ddec190fd3
Hostname: fedora
Storage: /var/lib/systemd/coredump/core.packagekitd.0.5c8cf6725ca743e0a64518fc9ab92b95.808.1761871161000000.zst (present)
Size on Disk: 71.4M
Message: Process 808 (packagekitd) of user 0 dumped core.

Module /usr/libexec/packagekitd (deleted) without build-id.
Module /usr/libexec/packagekitd (deleted)
Module /usr/lib64/libcap-ng.so.0.0.0 (deleted) without build-id.
Module /usr/lib64/libcap-ng.so.0.0.0 (deleted)
Module /usr/lib64/libcrypto.so.3.2.6 from rpm openssl-3.2.6-2.fc42.x86_64
Module /usr/lib64/libssl.so.3.2.6 from rpm openssl-3.2.6-2.fc42.x86_64
Module /usr/lib64/libnghttp2.so.14.28.3 from rpm nghttp2-1.64.0-3.fc42.x86_64
Module /usr/lib64/librpm.so.10.2.1 from rpm rpm-4.20.1-1.fc42.x86_64
Module /usr/lib64/libbrotlidec.so.1.1.0 (deleted) without build-id.
Module /usr/lib64/libbrotlidec.so.1.1.0 (deleted)
Module /usr/lib64/librpmio.so.10.2.1 from rpm rpm-4.20.1-1.fc42.x86_64
Module /usr/lib64/packagekit-backend/libpk_backend_dnf.so (deleted) without build-id.
Module /usr/lib64/packagekit-backend/libpk_backend_dnf.so (deleted)
Module /usr/lib64/libcap.so.2.73 from rpm libcap-2.73-2.fc42.x86_64
Module /usr/lib64/libffi.so.8.1.4 from rpm libffi-3.4.6-5.fc42.x86_64
Module /usr/lib64/libgmodule-2.0.so.0.8400.4 from rpm glib2-2.84.4-1.fc42.x86_64
Module /usr/lib64/libsystemd.so.0.40.0 from rpm systemd-257.10-1.fc42.x86_64
Module /usr/lib64/libgio-2.0.so.0.8400.4 from rpm glib2-2.84.4-1.fc42.x86_64
Module /usr/lib64/libgobject-2.0.so.0.8400.4 from rpm glib2-2.84.4-1.fc42.x86_64
Module /usr/lib64/libglib-2.0.so.0.8400.4 from rpm glib2-2.84.4-1.fc42.x86_64
Module /usr/lib64/libpackagekit-glib2.so.18.1.3 (deleted) without build-id.
Module /usr/lib64/libpackagekit-glib2.so.18.1.3 (deleted)
Module libdbus-1.so.3 from rpm dbus-1.16.0-4.fc43.x86_64
Module libattr.so.1 from rpm attr-2.5.2-6.fc43.x86_64
Module libbrotlicommon.so.1 from rpm brotli-1.1.0-10.fc43.x86_64
Module libevent-2.1.so.7 from rpm libevent-2.1.12-16.fc43.x86_64
Module libkeyutils.so.1 from rpm keyutils-1.6.3-6.fc43.x86_64
Module libcom_err.so.2 from rpm e2fsprogs-1.47.3-2.fc43.x86_64
Module libacl.so.1 from rpm acl-2.3.2-4.fc43.x86_64
Module libbz2.so.1 from rpm bzip2-1.0.8-21.fc43.x86_64
Module libjson-c.so.5 from rpm json-c-0.18-7.fc43.x86_64
Module libfyaml.so.0 from rpm libfyaml-0.8-8.fc43.x86_64

Stack trace of thread 888:
#0 0x00007fe6c442d1ca n/a (/usr/lib64/libdnf.so.2 (deleted) + 0x8b1ca)
#1 0x00007fe6c441aa72 n/a (/usr/lib64/libdnf.so.2 (deleted) + 0x78a72)
#2 0x00007fe6c441b0c9 n/a (/usr/lib64/libdnf.so.2 (deleted) + 0x790c9)
#3 0x00007fe6bee34da7 rpmtsNotify (/usr/lib64/librpm.so.10.2.1 + 0x3bda7)
#4 0x00007fe6bee23eee rpmpsmRun.part.0 (/usr/lib64/librpm.so.10.2.1 + 0x2aeee)
#5 0x00007fe6bee37e3d rpmteProcess (/usr/lib64/librpm.so.10.2.1 + 0x3ee3d)
#6 0x00007fe6bee438bd rpmtsRun (/usr/lib64/librpm.so.10.2.1 + 0x4a8bd)
#7 0x00007fe6c441ecaa n/a (/usr/lib64/libdnf.so.2 (deleted) + 0x7ccaa)
#8 0x00007fe6d34ffb40 n/a (/usr/lib64/packagekit-backend/libpk_backend_dnf.so (deleted) + 0x6b40)
#9 0x00007fe6d35055e4 n/a (/usr/lib64/packagekit-backend/libpk_backend_dnf.so (deleted) + 0xc5e4)
#10 0x000056095ced5236 n/a (/usr/libexec/packagekitd (deleted) + 0x1f236)
#11 0x00007fe6d3f35662 g_thread_proxy (/usr/lib64/libglib-2.0.so.0.8400.4 + 0x74662)
#12 0x00007fe6d3823f54 n/a (/usr/lib64/libc.so.6 (deleted) + 0x71f54)
#13 0x00007fe6d38a732c n/a (/usr/lib64/libc.so.6 (deleted) + 0xf532c)
ELF object binary architecture: AMD x86-64

Created attachment 2111577 [details]
Package list
now we have a coredump on disk, can you run `coredumpctl gdb` on it and get a full backtrace?

Thanks for looking into this.

(In reply to Adam Williamson from comment #7)
> now we have a coredump on disk, can you run `coredumpctl gdb` on it and get
> a full backtrace?

This doesn't actually work, because the files were upgraded in the meantime and gdb just gives me a bunch of ?? symbols. However, I took a snapshot before upgrading, so I mounted that, copied the core dump into a chroot, and ran gdb with the old files. It still had several warnings of "Can't open file /usr/lib64/<something> (deleted) during file-backed mapping note processing" but it was able to produce a backtrace:

#0 libdnf::TransactionItemBase::setState (this=0x0, value=libdnf::TransactionItemState::DONE) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/transaction/../transaction/TransactionItem.hpp:75
#1 libdnf::Swdb::setItemDone (this=0x7fe6a8007a60, nevra=...) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/transaction/Swdb.cpp:257
#2 0x00007fe6c441aa72 in _swdb_transaction_item_progress (swdb=swdb@entry=0x7fe6a8007a60, pkg=<optimized out>) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/dnf-transaction.cpp:538
#3 0x00007fe6c441b0c9 in dnf_transaction_ts_progress_cb (arg=<optimized out>, what=RPMCALLBACK_UNINST_STOP, amount=3, total=3, key=0x0, data=<optimized out>) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/dnf-transaction.cpp:775
#4 0x00007fe6bee34da7 in rpmtsNotify (ts=0x56097f3f6d30, te=0x7fe69cc475a0, what=RPMCALLBACK_UNINST_STOP, amount=3, total=3) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/rpmts.c:1021
#5 0x00007fe6bee23eee in rpmpsmRemove (psm=0x7fe6a0df1ce0) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/psm.c:831
#6 rpmPackageErase (ts=0x56097f3f6d30, psm=0x7fe6a0df1ce0) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/psm.c:984
#7 runGoal (psm=0x7fe6a0df1ce0, goal=<optimized out>) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/psm.c:1080
#8 rpmpsmRun (ts=0x56097f3f6d30, te=0x7fe69cc475a0, goal=<optimized out>) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/psm.c:1123
#9 0x00007fe6bee37e3d in rpmpsmRun (ts=0x56097f3f6d30, te=0x7fe69cc475a0, goal=PKG_ERASE) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/rpmte.c:828
#10 rpmteProcess (te=0x7fe69cc475a0, goal=PKG_ERASE, num=<optimized out>) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/rpmte.c:824
#11 0x00007fe6bee438bd in rpmtsProcess (ts=0x56097f3f6d30) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/transaction.c:1637
#12 rpmtsRun (ts=0x56097f3f6d30, okProbs=<optimized out>, ignoreSet=<optimized out>) at /usr/src/debug/rpm-4.20.1-1.fc42.x86_64/lib/transaction.c:1859
#13 0x00007fe6c441ecaa in dnf_transaction_commit (transaction=0x56097f5890a0, goal=<optimized out>, state=state@entry=0x7fe6a80c2d40, error=error@entry=0x7fe6bdb63be8) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/dnf-transaction.cpp:1489
#14 0x00007fe6d34ffb40 in pk_backend_transaction_download_commit (job=0x56097f3f4f00, state=0x7fe6a80c2d40, error=0x7fe6bdb63be8) at ../backends/dnf/pk-backend-dnf.c:2415
#15 pk_backend_transaction_run (job=job@entry=0x56097f3f4f00, state=0x7fe6a8042060, error=error@entry=0x7fe6bdb63be8) at ../backends/dnf/pk-backend-dnf.c:2608
#16 0x00007fe6d35055e4 in pk_backend_upgrade_system_thread (job=0x56097f3f4f00, params=<optimized out>, user_data=<optimized out>) at ../backends/dnf/pk-backend-dnf.c:3496
#17 0x000056095ced5236 in pk_backend_job_thread_setup (thread_data=0x56097f5e17e0) at ../src/pk-backend-job.c:743
#18 0x00007fe6d3f35662 in g_thread_proxy (data=0x56097f5e0b10) at ../glib/gthread.c:893
#19 0x00007fe6d3823f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
#20 0x00007fe6d38a732c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The full backtrace is also attached. It appears to be some problem processing wine-dxvk-d3d9 in libdnf.

Created attachment 2111587 [details]
gdb bt full output
gdb is supposed to use a remote debuginfo store now so it *should* be able to find the symbols even for libraries you don't have installed. but hey, important thing is you got the trace - thanks! hopefully we can figure something out about it soon.

the fact that "this" seems to be null in frame 0 is an obvious potential starting point, I guess...

Wow, nice job Elliott. This is excellent! That's more than enough evidence to move this to libdnf. Interesting that it always happens

(In reply to Adam Williamson from comment #10)
> the fact that "this" seems to be null in frame 0 is an obvious potential
> starting point, I guess...

Probably, yes. (A couple weeks ago I would have said *definitely* rather than *probably*. But very recently, I've twice now seen this=0x0 in nonsensical places in WebKit crash stack traces, where I strongly suspect the this=0x0 is not actually related to the crash. I'm guessing it's some sort of new compiler optimization, and we can no longer always fully trust the stack trace.)

So setItemDone is pretty simple:
Swdb::setItemDone(const std::string &nevra)
{
    if (!transactionInProgress) {
        throw std::logic_error(_("No transaction in progress"));
    }
    auto item = itemsInProgress[nevra];
    item->setState(TransactionItemState::DONE);
    item->saveState();
}
if we decide to trust the frame 0 NULL at least for now, that would suggest that `auto item = itemsInProgress[nevra]` is giving us a NULL, i.e. whatever nevra we're looking up here isn't actually in itemsInProgress. Not sure if that means we should guard against NULL here and just bail out if we get it, or if nevra always *should* be in itemsInProgress and we need to figure out why it isn't in this case.
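The first of those two options, guarding against a missing entry, could look roughly like the sketch below. It is a self-contained illustration and not the libdnf code: `MiniSwdb`, `Item`, `State`, `addItem` and `isDone` are hypothetical stand-ins, and returning `false` on a miss is only one possible policy (throwing or logging would be others). It also does nothing to explain why the NEVRA would be missing in the first place.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for libdnf's TransactionItemState / TransactionItem.
enum class State { UNKNOWN, DONE };

struct Item {
    State state = State::UNKNOWN;
    void setState(State value) { state = value; }
};

// Hypothetical, heavily reduced model of Swdb's in-progress bookkeeping.
class MiniSwdb {
public:
    void addItem(const std::string &nevra)
    {
        itemsInProgress[nevra] = std::make_shared<Item>();
    }

    // Marks the item DONE and returns true; returns false (instead of
    // dereferencing a null pointer) when the NEVRA is not in the map.
    bool setItemDone(const std::string &nevra)
    {
        auto it = itemsInProgress.find(nevra);
        if (it == itemsInProgress.end() || !it->second) {
            return false;
        }
        it->second->setState(State::DONE);
        return true;
    }

    // Read-only helper so callers can inspect the result.
    bool isDone(const std::string &nevra) const
    {
        auto it = itemsInProgress.find(nevra);
        return it != itemsInProgress.end() && it->second &&
               it->second->state == State::DONE;
    }

private:
    std::map<std::string, std::shared_ptr<Item>> itemsInProgress;
};
```

Using `find()` rather than `operator[]` means a failed lookup leaves the map untouched, so a miss can be reported without side effects.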
wait. is nevra *literally* the string "..." in frame 2? if so, uh, I'd say that's very likely the problem here...

er, frame 1.

No, nevra would be in quotes if that was what it was. It's ellipsis because it's optimized out in that frame. You should look at the bt full output; the nevra is "wine-dxvk-d3d9-2.6.2-1.fc42.x86_64" in frame 2.

I forgot about the fedorapeople space; if you want to look at the core dump, you can find it here: https://qulogic.fedorapeople.org/core.packagekitd.0.5c8cf6725ca743e0a64518fc9ab92b95.808.1761871161000000.zst

duh, yeah, thanks, slow morning. :D so, yeah, looking at the full backtrace, especially the first bit:
#0 libdnf::TransactionItemBase::setState (this=0x0, value=libdnf::TransactionItemState::DONE) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/transaction/../transaction/TransactionItem.hpp:75
No locals.
#1 libdnf::Swdb::setItemDone (this=0x7fe6a8007a60, nevra=...) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/transaction/Swdb.cpp:257
item = std::shared_ptr<libdnf::TransactionItem> (empty) = {get() = 0x0}
#2 0x00007fe6c441aa72 in _swdb_transaction_item_progress (swdb=swdb@entry=0x7fe6a8007a60, pkg=<optimized out>) at /usr/src/debug/libdnf-0.74.0-1.fc42.x86_64/libdnf/dnf-transaction.cpp:538
nevra = 0x7fe6a8322ff0 "wine-dxvk-d3d9-2.6.2-1.fc42.x86_64"
the only thing I think can be happening is that we look up `itemsInProgress["wine-dxvk-d3d9-2.6.2-1.fc42.x86_64"]` but it's not there. itemsInProgress is a C++ map, and per https://en.cppreference.com/w/cpp/container/map.html , "Using operator[] with non-existent key always performs an insert" - so when that's the case, we won't crash or anything, we'll instead add a new entry of the appropriate type (which is TransactionItemPtr) to the map. That seems to match up with "item = std::shared_ptr<libdnf::TransactionItem> (empty) = {get() = 0x0}" in the backtrace, I think.
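To make the `operator[]` behaviour concrete, here is a minimal standalone sketch. `Item`, `ItemPtr`, `ItemMap` and the two lookup helpers are hypothetical names made up for illustration; only the `std::map` semantics are taken from the standard library.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for libdnf::TransactionItem.
struct Item {};
using ItemPtr = std::shared_ptr<Item>;
using ItemMap = std::map<std::string, ItemPtr>;

// operator[] cannot report a miss: for an unknown key it inserts a
// value-initialized (empty) shared_ptr and returns a reference to it.
inline ItemPtr lookupWithSubscript(ItemMap &items, const std::string &nevra)
{
    return items[nevra];
}

// at() leaves the map unchanged and throws std::out_of_range on a miss.
inline ItemPtr lookupWithAt(const ItemMap &items, const std::string &nevra)
{
    return items.at(nevra);
}
```

With a map holding one entry, calling `lookupWithSubscript` on an unknown NEVRA returns an empty pointer and grows the map to two entries, which is consistent with the empty `shared_ptr` seen in frame 1; calling `->` on that pointer is then a null dereference.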
This is the thing that populates itemsInProgress:
// save rpm items to map to resolve RPM callbacks
for (auto item : transactionInProgress->getItems()) {
    auto transItem = item->getItem();
    if (transItem->getItemType() != ItemType::RPM) {
        continue;
    }
    auto rpmItem = std::dynamic_pointer_cast< RPMItem >(transItem);
    itemsInProgress[rpmItem->getNEVRA()] = item;
}
That (and the stuff behind it, transactionInProgress->getItems) all *looks* fairly...sensible, none of it seems to have been changed lately. I can't find anything obvious that ever takes things *out* of itemsInProgress - initTransaction and closeTransaction clear it entirely, but obviously we shouldn't be suddenly doing either of those when we're clearly in the middle of the transaction. So, hmm. This isn't anything obvious at least :/
It seems the reproducer can be reduced to:

1. Stock F42 install
2. Install wine-core.x86_64, wine-core.i686, wine-dxvk-d3d9 (which will pull in both arches and all versions of dxvk as deps)
3. Delete the symlinks
4. Run an upgrade from GNOME Software

i.e. the specific list of installed packages isn't needed, and neither is enabling rpmfusion. Just having the wine packages installed and wiping the symlinks causes the crash. So, first thing, I'm taking that advice out of the common bug. Then I'll try and use this to figure out what's going on...

OK, and I can reduce the reproducer further now to:

1. Use Software (or anything else PK-based) to upgrade from F42 to F43 with the wine-dxvk packages installed

You can test this without removing any symlinks by doing a clean F42 install+update then doing:

rpm -ivh --nodeps https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/i686/wine-dxvk-2.6.2-1.fc42.i686.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/i686/wine-dxvk-d3d10-2.6.2-1.fc42.i686.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/i686/wine-dxvk-d3d8-2.6.2-1.fc42.i686.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/i686/wine-dxvk-d3d9-2.6.2-1.fc42.i686.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/i686/wine-dxvk-dxgi-2.6.2-1.fc42.i686.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/x86_64/wine-dxvk-2.6.2-1.fc42.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/x86_64/wine-dxvk-d3d10-2.6.2-1.fc42.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/x86_64/wine-dxvk-d3d8-2.6.2-1.fc42.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/x86_64/wine-dxvk-d3d9-2.6.2-1.fc42.x86_64.rpm https://kojipkgs.fedoraproject.org//packages/wine-dxvk/2.6.2/1.fc42/x86_64/wine-dxvk-dxgi-2.6.2-1.fc42.x86_64.rpm

i.e. install the packages with --nodeps, so you can get them installed *without* wine-core also being installed. Then you will be able to get an upgrade to start, but it will crash. If you don't use --nodeps you kinda can't get the upgrade to start with the dxvk packages installed unless you remove the symlinks. But this way, we can show that the important factor is "having the dxvk packages installed", not "wiping the symlinks".

Notably, when preparing the upgrade, Software warns you that it will have to remove all the dxvk packages as part of the upgrade:

"
Incompatible Software
Installed software is incompatible with Fedora Linux 43, and will be automatically removed during upgrade.
wine-dxvk
wine-dxvk-d3d10
wine-dxvk-d3d8
wine-dxvk-d3d9
wine-dxvk-dxgi
"

Notably it doesn't list an *arch* even though we have two arches of each of those packages installed. This might be significant, not sure.
Created attachment 2111512 [details]
Output from sudo journalctl -xb from the failed installation boot

Description of problem:
When running the system update, in the offline phase (after the reboot), the update crashed while below 100%, rebooted, and left the system without a running graphical environment and with multiple packages installed twice (in both the fc42 and fc43 variants).

Version-Release number of selected component (if applicable):
Update from FC42 to FC43

How reproducible:
* Starting FC43 Update
* rebooting
* don't know (if it is really reproducible)
* see attached log

Actual results:
Unfinished upgrade

Expected results:
Finished upgrade

Additional info:
Running sudo dnf system-upgrade download --releasever=43 --setopt=protected_packages= (and setting protected_packages= in /etc/dnf/dnf.conf) and afterwards doing a reinstall of all packages with status source=unknown brought the system up again. Got help on the Fedora Matrix channel, where it was also suggested that I file this bug.