Description of problem: I updated to dnf-4.2.7-1.fc30 and libdnf-0.35.1-1.fc30. I logged into Plasma on Wayland from sddm. My network connection was not connecting to the ISP, so I reset the router. PackageKit was apparently checking for updates when the crash occurred. The segmentation fault was as follows using coredumpctl gdb Core was generated by `/usr/libexec/packagekitd'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f5fcc4f4b1b in libdnf::Repo::Impl::detachLibsolvRepo (this=0x5645e466df00) at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347 1347 libsolvRepo->appdata = nullptr; [Current thread is 1 (Thread 0x7f5fdb33ce80 (LWP 1692))] libsolvRepo->appdata pointed to an inaccessible address which I think led to the segmentation fault. (gdb) p libsolvRepo->appdata Cannot access memory at address 0x722c636578656f7e (gdb) p libsolvRepo $1 = (libdnf::LibsolvRepo *) 0x722c636578656f6e The problem is likely in libdnf-0.35.1-1. I hadn't seen this crash before the libdnf-0.35.1-1 update. Version-Release number of selected component: PackageKit-1.1.12-5.fc30 Additional info: reporter: libreport-2.10.1 backtrace_rating: 4 cmdline: /usr/libexec/packagekitd crash_function: libdnf::Repo::Impl::detachLibsolvRepo executable: /usr/libexec/packagekitd journald_cursor: s=44db0dbaae674dc1b5fb807f38e5b4ae;i=5e6ff;b=6f332d81a7de4103815dc9c2dae2f3a7;m=172bd666;t=58cf00fee399f;x=8cc53fc8564fabf7 kernel: 5.1.16-300.fc30.x86_64 rootdir: / runlevel: N 5 type: CCpp uid: 0 Truncated backtrace: Thread no. 1 (10 frames) #0 libdnf::Repo::Impl::detachLibsolvRepo at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347 #1 dnf_sack_finalize at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/dnf-sack.cpp:136 #4 dnf_sack_cache_item_free at pk-backend-dnf.c:123 #5 g_hash_table_remove_all_nodes at ../glib/ghash.c:706 #7 g_hash_table_unref at ../glib/ghash.c:1457 #8 pk_backend_destroy at pk-backend-dnf.c:311 #9 pk_backend_unload at pk-backend.c:601 #10 pk_engine_finalize at pk-engine.c:1874 #13 glib_autoptr_clear_PkEngine at pk-engine.h:56 #14 glib_autoptr_cleanup_PkEngine at pk-engine.h:56
Created attachment 1587739 [details] File: backtrace
Created attachment 1587740 [details] File: cgroup
Created attachment 1587741 [details] File: core_backtrace
Created attachment 1587742 [details] File: cpuinfo
Created attachment 1587743 [details] File: dso_list
Created attachment 1587744 [details] File: environ
Created attachment 1587745 [details] File: exploitable
Created attachment 1587746 [details] File: limits
Created attachment 1587747 [details] File: maps
Created attachment 1587748 [details] File: mountinfo
Created attachment 1587749 [details] File: open_fds
Created attachment 1587750 [details] File: proc_pid_status
Created attachment 1587751 [details] File: var_log_messages
Reassigning to libdnf as this was triggered by a libdnf update. Now it was initially assigned to PK and re-assigned to libdnf, both sets of maintainers are CCed anyhow. We can reassign elsewhere later if it turns out the fix needs to be somewhere else.
I got a similar trace from an openQA job: https://openqa.fedoraproject.org/tests/418910 #0 0x00007f0e5b72bb6b in libdnf::Repo::Impl::detachLibsolvRepo() (this=0x7f0e4c044d80) at /usr/src/debug/libdnf-0.35.1-1.fc31.x86_64/libdnf/repo/Repo.cpp:1347 #1 0x00007f0e5b684c47 in dnf_sack_finalize(GObject*) (object=0x7f0e4c0058e0) at /usr/src/debug/libdnf-0.35.1-1.fc31.x86_64/libdnf/dnf-sack.cpp:136 hrepo = <optimized out> sack = <optimized out> priv = 0x7f0e4c005830 pool = 0x7f0e4c005040 repo = <optimized out> i = <optimized out> #2 0x00007f0e6f442cc0 in g_object_unref () at /lib64/libgobject-2.0.so.0 #3 0x00007f0e60086370 in dnf_sack_cache_item_free (cache_item=0x7f0e4e9998c0) at pk-backend-dnf.c:123 #4 0x00007f0e6f3409b4 in g_hash_table_remove_internal () at /lib64/libglib-2.0.so.0 #5 0x00007f0e60086c97 in dnf_utils_create_sack_for_filters (job=0x55d222ca0e00, filters=<optimized out>, create_flags=<optimized out>, state=0x7f0e4c003480, error=0x7f0e592dcd10) at pk-backend-dnf.c:699 locker = 0x55d222a45508 ret = <optimized out> flags = 13 cache_item = 0x7f0e4e9998c0 state_local = <optimized out> backend = <optimized out> job_data = 0x55d222c02980 priv = 0x55d222a454f0 cache_key = 0x7f0e4e2878a0 "DnfSack::release_ver[31]::filelists|remote|unavailable" install_root = 0x0 solv_dir = 0x0 sack = 0x0 #6 0x00007f0e60088c5f in backend_get_details_thread (job=0x55d222ca0e00, params=<optimized out>, user_data=<optimized out>) at pk-backend-dnf.c:1846 ret = <optimized out> i = <optimized out> state_local = <optimized out> pkg = <optimized out> job_data = 0x55d222c02980 filters = 8 package_ids = 0x55d222b83d00 sack = 0x0 error = 0x0 hash = 0x0 __func__ = "backend_get_details_thread" #7 0x000055d2210ea6ae in pk_backend_job_thread_setup (thread_data=0x55d222ca20e0) at pk-backend-job.c:726 helper = 0x55d222ca20e0 #8 0x00007f0e6f37c962 in g_thread_proxy () at /lib64/libglib-2.0.so.0 #9 0x00007f0e6f21e4e2 in start_thread () at /lib64/libpthread.so.0 #10 0x00007f0e6f14e623 in clone () at /lib64/libc.so.6 It differs a bit from frame #4 onwards, but it looks like probably the same basic thing hit on a different path...
Proposing as an F31 Beta blocker. It looks like this and https://bugzilla.redhat.com/show_bug.cgi?id=1727424 might both happen at any time with this libdnf / PackageKit combination .
Thanks Adam. I've seen two more segmentation faults of packagekitd with the same traces as I reported. The second crash appeared to be a null pointer dereference at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347 since (gdb) p libsolvRepo->appdata Cannot access memory at address 0x10 (gdb) p libsolvRepo $1 = (libdnf::LibsolvRepo *) 0x0 The third crash seemed to be at an inaccessible address which looked too short to be valid. (gdb) p libsolvRepo->appdata Cannot access memory at address 0x656771 (gdb) p libsolvRepo $1 = (libdnf::LibsolvRepo *) 0x656761 Each of the 3 crashes happened while packagekitd was running automatically in the background at the same time as a journal message indicating that packagekitd quit. I didn't interact with the plasma software updates applet in the task bar or plasma-discover when the crashes happened. Jul 05 10:39:29 PackageKit[1692]: daemon quit Jul 05 10:39:29 audit[1692]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:rpm_t:s0 pid=1692 comm="packagekitd" exe="/usr/libexec/packagekitd" sig=11 res=1 Jul 05 10:39:29 kernel: show_signal: 41 callbacks suppressed Jul 05 10:39:29 kernel: traps: packagekitd[1692] general protection fault ip:7f5fcc4f4b1b sp:7fff3ffdfb18 error:0 in libdnf.so.2[7f5fcc40d000+f8000] Jul 05 10:39:29 systemd[1]: Created slice system-systemd\x2dcoredump.slice. Jul 05 10:39:29 systemd[1]: Started Process Core Dump (PID 1951/UID 0). Jul 05 10:39:29 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1951-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jul 05 10:39:29 systemd[1]: packagekit.service: Main process exited, code=dumped, status=11/SEGV Jul 05 10:39:29 systemd[1]: packagekit.service: Failed with result 'core-dump'. Jul 05 10:39:29 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=packagekit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' Jul 05 10:39:30 abrt-dump-journal-oops[1020]: abrt-dump-journal-oops: Found oopses: 1 Jul 05 10:39:30 abrt-dump-journal-oops[1020]: abrt-dump-journal-oops: Creating problem directories Jul 05 10:39:30 systemd-coredump[1952]: Process 1692 (packagekitd) of user 0 dumped core. Stack trace of thread 1692: #0 0x00007f5fcc4f4b1b _ZN6libdnf4Repo4Impl17detachLibsolvRepoEv (libdnf.so.2) #1 0x00007f5fcc44dc27 n/a (libdnf.so.2) #2 0x00007f5fdaf72cf0 g_object_unref (libgobject-2.0.so.0) #3 0x00007f5fcc62b380 n/a (libpk_backend_dnf.so) #4 0x00007f5fdae6ea42 g_hash_table_remove_all_nodes (libglib-2.0.so.0) #5 0x00007f5fdae6fe43 g_hash_table_remove_all_nodes (libglib-2.0.so.0) #6 0x00007f5fcc632d13 pk_backend_destroy (libpk_backend_dnf.so) #7 0x00005645e3d7f285 pk_backend_unload (packagekitd) #8 0x00005645e3d89b05 n/a (packagekitd) #9 0x00007f5fdaf72cf0 g_object_unref (libgobject-2.0.so.0) #10 0x00005645e3d711b2 main (packagekitd) #11 0x00007f5fdab95f33 __libc_start_main (libc.so.6) #12 0x00005645e3d713fe _start (packagekitd) Stack trace of thread 1693: #0 0x00007f5fdac625c7 __GI___poll (libc.so.6) #1 0x00007f5fdae821de g_main_context_poll (libglib-2.0.so.0) #2 0x00007f5fdae82313 g_main_context_iteration (libglib-2.0.so.0) #3 0x00007f5fdae82361 glib_worker_main (libglib-2.0.so.0) #4 0x00007f5fdaeab4e2 g_thread_proxy (libglib-2.0.so.0) #5 0x00007f5fdad425a2 start_thread (libpthread.so.0) #6 0x00007f5fdac6d303 __clone (libc.so.6) Stack trace of thread 1694: #0 0x00007f5fdac625c7 __GI___poll (libc.so.6) #1 0x00007f5fdae821de g_main_context_poll (libglib-2.0.so.0) #2 0x00007f5fdae825a3 g_main_loop_run (libglib-2.0.so.0) #3 0x00007f5fdb0d0d5a gdbus_shared_thread_func (libgio-2.0.so.0) #4 0x00007f5fdaeab4e2 g_thread_proxy (libglib-2.0.so.0) #5 0x00007f5fdad425a2 start_thread (libpthread.so.0) #6 0x00007f5fdac6d303 __clone (libc.so.6) Jul 05 10:39:30 systemd[1]: systemd-coredump: Succeeded. Jul 05 10:39:30 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1951-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jul 05 10:39:31 abrt-server[1953]: Package 'kernel-core' isn't signed with proper key Jul 05 10:39:31 abrt-server[1953]: 'post-create' on '/var/spool/abrt/oops-2019-07-05-10:39:30-1020-0' exited with 1 Jul 05 10:39:31 abrt-server[1953]: Deleting problem directory '/var/spool/abrt/oops-2019-07-05-10:39:30-1020-0' Jul 05 10:39:31 abrtd[969]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2019-06-28-19:07:10.115445-4776' Jul 05 10:39:31 abrt-dump-journal-oops[1020]: Reported 1 kernel oopses to Abrt
How often is this happening? Is it reproducible on all or most packagekit-initiated updates or is it a rare heisenbug? I'm definitely in favor of at least a +1 FE here, but whether it is a blocker for Beta seems like it should depend on how likely it is to hit (and whether all subsequent update continue to fail or if retry-until-it-works is a workaround). I'd probably come down on the side of Final Blocker though, just for a general fit-and-finish.
This and https://bugzilla.redhat.com/show_bug.cgi?id=1727424 have the same basic cause, and every GNOME Software and FreeIPA Cockpit enrolment test crashes on one or the other.
Discussed during the 2019-07-15 blocker review meeting: [1] The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criteria: "The installed system must be able appropriately to install software with the default tool for the relevant software type in all release-blocking desktops" [1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-07-15/f31-blocker-review.2019-07-15-16.05.txt
Stephen, I've seen these packagekit crashes with the same traces 10 out of 120 times that the PackageKit daemon quit message was shown in my journal since I updated to libdnf-0.35.1-1. I wasn't using a program that uses packagekitd directly when the crashes happened. packagekitd was just running in the background. The frequency of the crashes are likely to be higher if programs using packagekitd are being actively used as Adam found. In two of the crashes the pointer libsolvRepo was null. In the other crashes, libsolvRepo was a address shorter than the other non-null addresses, and so libsolvRepo might have been corrupted. Some of my journals in that time period since July 5 were corrupted due to the system having become unresponsive due to a kernel panic or other reasons when I did a hard shutdown. I tried to run valgrind on packagekitd, but I haven't run it enough times to see the crash yet while doing so. I could keep trying to run packagekitd under valgrind until it crashes if it would be informative.
(In reply to Matt Fagnani from comment #21) > Stephen, I've seen these packagekit crashes with the same traces 10 out of > 120 times that the PackageKit daemon quit message was shown in my journal > since I updated to libdnf-0.35.1-1. I wasn't using a program that uses > packagekitd directly when the crashes happened. packagekitd was just running > in the background. The frequency of the crashes are likely to be higher if > programs using packagekitd are being actively used as Adam found. In two of > the crashes the pointer libsolvRepo was null. In the other crashes, > libsolvRepo was a address shorter than the other non-null addresses, and so > libsolvRepo might have been corrupted. Some of my journals in that time > period since July 5 were corrupted due to the system having become > unresponsive due to a kernel panic or other reasons when I did a hard > shutdown. I tried to run valgrind on packagekitd, but I haven't run it > enough times to see the crash yet while doing so. I could keep trying to run > packagekitd under valgrind until it crashes if it would be informative. If you can get us a valgrind output demonstrating the crash, that would probably be ideal. It definitely looks as if the problem is memory-related.
Um. I think the cause is probably just going to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1727424 , which we already bisected? see https://bugzilla.redhat.com/show_bug.cgi?id=1727424#c9 and following discussion.
Hello, Jaroslav Rohel made a patch: https://github.com/rpm-software-management/libdnf/pull/761 And here are scratch builds: rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317514 Fedora 30: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317358 Can you please confirm the issue is fixed there? Thank you.
Pavla, I haven't seen the packagekitd segmentation fault in 23 times that the PackageKit daemon quit message was shown in my journal since I updated to libdnf-0.35.1-2.fc30. I saw the packagekitd crash 11/153 times the PackageKit daemon quit message happened with libdnf-0.35.1-1. I think the crash is likely to be fixed in libdnf-0.35.1-2 as it probably would have happened by now if it wasn't given that it happened about 1/14 of the time with libdnf-0.35.1-1. Thanks.
FEDORA-2019-672a74d688 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688
The fix hit Rawhide already and tests are passing, so dropped the F31 blocker metadata.
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.