Bug 1727343 - [abrt] PackageKit: libdnf::Repo::Impl::detachLibsolvRepo(): packagekitd killed by SIGSEGV
Summary: [abrt] PackageKit: libdnf::Repo::Impl::detachLibsolvRepo(): packagekitd kille...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 30
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jaroslav Rohel
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:6562f7186c10d849b8ebca6f869...
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-07-05 15:25 UTC by Matt Fagnani
Modified: 2019-07-30 01:14 UTC (History)
15 users (show)

Fixed In Version: libdnf-0.35.1-2.fc30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-30 01:14:38 UTC


Attachments (Terms of Use)
File: backtrace (19.19 KB, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: cgroup (308 bytes, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: core_backtrace (7.01 KB, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: cpuinfo (1.40 KB, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: dso_list (7.50 KB, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: environ (190 bytes, text/plain)
2019-07-05 15:25 UTC, Matt Fagnani
no flags Details
File: exploitable (82 bytes, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: limits (1.29 KB, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: maps (47.39 KB, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: mountinfo (3.69 KB, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: open_fds (3.24 KB, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: proc_pid_status (1.28 KB, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details
File: var_log_messages (84 bytes, text/plain)
2019-07-05 15:26 UTC, Matt Fagnani
no flags Details

Description Matt Fagnani 2019-07-05 15:25:51 UTC
Description of problem:
I updated to dnf-4.2.7-1.fc30 and libdnf-0.35.1-1.fc30. I logged into Plasma on Wayland from sddm. My network connection was not connecting to the ISP, so I reset the router. PackageKit was apparently checking for updates when the crash occurred.

The segmentation fault was as follows using coredumpctl gdb

Core was generated by `/usr/libexec/packagekitd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5fcc4f4b1b in libdnf::Repo::Impl::detachLibsolvRepo (this=0x5645e466df00)
    at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347
1347        libsolvRepo->appdata = nullptr;
[Current thread is 1 (Thread 0x7f5fdb33ce80 (LWP 1692))]

libsolvRepo->appdata pointed to an inaccessible address which I think led to the segmentation fault.

(gdb) p libsolvRepo->appdata
Cannot access memory at address 0x722c636578656f7e
(gdb) p libsolvRepo
$1 = (libdnf::LibsolvRepo *) 0x722c636578656f6e

The problem is likely in libdnf-0.35.1-1. I hadn't seen this crash before the libdnf-0.35.1-1 update.


Version-Release number of selected component:
PackageKit-1.1.12-5.fc30

Additional info:
reporter:       libreport-2.10.1
backtrace_rating: 4
cmdline:        /usr/libexec/packagekitd
crash_function: libdnf::Repo::Impl::detachLibsolvRepo
executable:     /usr/libexec/packagekitd
journald_cursor: s=44db0dbaae674dc1b5fb807f38e5b4ae;i=5e6ff;b=6f332d81a7de4103815dc9c2dae2f3a7;m=172bd666;t=58cf00fee399f;x=8cc53fc8564fabf7
kernel:         5.1.16-300.fc30.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 libdnf::Repo::Impl::detachLibsolvRepo at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347
 #1 dnf_sack_finalize at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/dnf-sack.cpp:136
 #4 dnf_sack_cache_item_free at pk-backend-dnf.c:123
 #5 g_hash_table_remove_all_nodes at ../glib/ghash.c:706
 #7 g_hash_table_unref at ../glib/ghash.c:1457
 #8 pk_backend_destroy at pk-backend-dnf.c:311
 #9 pk_backend_unload at pk-backend.c:601
 #10 pk_engine_finalize at pk-engine.c:1874
 #13 glib_autoptr_clear_PkEngine at pk-engine.h:56
 #14 glib_autoptr_cleanup_PkEngine at pk-engine.h:56

Comment 1 Matt Fagnani 2019-07-05 15:25:54 UTC
Created attachment 1587739 [details]
File: backtrace

Comment 2 Matt Fagnani 2019-07-05 15:25:55 UTC
Created attachment 1587740 [details]
File: cgroup

Comment 3 Matt Fagnani 2019-07-05 15:25:56 UTC
Created attachment 1587741 [details]
File: core_backtrace

Comment 4 Matt Fagnani 2019-07-05 15:25:57 UTC
Created attachment 1587742 [details]
File: cpuinfo

Comment 5 Matt Fagnani 2019-07-05 15:25:58 UTC
Created attachment 1587743 [details]
File: dso_list

Comment 6 Matt Fagnani 2019-07-05 15:25:59 UTC
Created attachment 1587744 [details]
File: environ

Comment 7 Matt Fagnani 2019-07-05 15:26:00 UTC
Created attachment 1587745 [details]
File: exploitable

Comment 8 Matt Fagnani 2019-07-05 15:26:01 UTC
Created attachment 1587746 [details]
File: limits

Comment 9 Matt Fagnani 2019-07-05 15:26:02 UTC
Created attachment 1587747 [details]
File: maps

Comment 10 Matt Fagnani 2019-07-05 15:26:03 UTC
Created attachment 1587748 [details]
File: mountinfo

Comment 11 Matt Fagnani 2019-07-05 15:26:04 UTC
Created attachment 1587749 [details]
File: open_fds

Comment 12 Matt Fagnani 2019-07-05 15:26:05 UTC
Created attachment 1587750 [details]
File: proc_pid_status

Comment 13 Matt Fagnani 2019-07-05 15:26:06 UTC
Created attachment 1587751 [details]
File: var_log_messages

Comment 14 Adam Williamson 2019-07-05 18:05:47 UTC
Reassigning to libdnf as this was triggered by a libdnf update. Now it was initially assigned to PK and re-assigned to libdnf, both sets of maintainers are CCed anyhow. We can reassign elsewhere later if it turns out the fix needs to be somewhere else.

Comment 15 Adam Williamson 2019-07-06 04:48:07 UTC
I got a similar trace from an openQA job:

https://openqa.fedoraproject.org/tests/418910

#0  0x00007f0e5b72bb6b in libdnf::Repo::Impl::detachLibsolvRepo() (this=0x7f0e4c044d80) at /usr/src/debug/libdnf-0.35.1-1.fc31.x86_64/libdnf/repo/Repo.cpp:1347
#1  0x00007f0e5b684c47 in dnf_sack_finalize(GObject*) (object=0x7f0e4c0058e0) at /usr/src/debug/libdnf-0.35.1-1.fc31.x86_64/libdnf/dnf-sack.cpp:136
        hrepo = <optimized out>
        sack = <optimized out>
        priv = 0x7f0e4c005830
        pool = 0x7f0e4c005040
        repo = <optimized out>
        i = <optimized out>
#2  0x00007f0e6f442cc0 in g_object_unref () at /lib64/libgobject-2.0.so.0
#3  0x00007f0e60086370 in dnf_sack_cache_item_free (cache_item=0x7f0e4e9998c0) at pk-backend-dnf.c:123
#4  0x00007f0e6f3409b4 in g_hash_table_remove_internal () at /lib64/libglib-2.0.so.0
#5  0x00007f0e60086c97 in dnf_utils_create_sack_for_filters (job=0x55d222ca0e00, filters=<optimized out>, create_flags=<optimized out>, state=0x7f0e4c003480, error=0x7f0e592dcd10) at pk-backend-dnf.c:699
        locker = 0x55d222a45508
        ret = <optimized out>
        flags = 13
        cache_item = 0x7f0e4e9998c0
        state_local = <optimized out>
        backend = <optimized out>
        job_data = 0x55d222c02980
        priv = 0x55d222a454f0
        cache_key = 0x7f0e4e2878a0 "DnfSack::release_ver[31]::filelists|remote|unavailable"
        install_root = 0x0
        solv_dir = 0x0
        sack = 0x0
#6  0x00007f0e60088c5f in backend_get_details_thread (job=0x55d222ca0e00, params=<optimized out>, user_data=<optimized out>) at pk-backend-dnf.c:1846
        ret = <optimized out>
        i = <optimized out>
        state_local = <optimized out>
        pkg = <optimized out>
        job_data = 0x55d222c02980
        filters = 8
        package_ids = 0x55d222b83d00
        sack = 0x0
        error = 0x0
        hash = 0x0
        __func__ = "backend_get_details_thread"
#7  0x000055d2210ea6ae in pk_backend_job_thread_setup (thread_data=0x55d222ca20e0) at pk-backend-job.c:726
        helper = 0x55d222ca20e0
#8  0x00007f0e6f37c962 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#9  0x00007f0e6f21e4e2 in start_thread () at /lib64/libpthread.so.0
#10 0x00007f0e6f14e623 in clone () at /lib64/libc.so.6

It differs a bit from frame #4 onwards, but it looks like probably the same basic thing hit on a different path...

Comment 16 Adam Williamson 2019-07-06 04:54:48 UTC
Proposing as an F31 Beta blocker. It looks like this and https://bugzilla.redhat.com/show_bug.cgi?id=1727424 might both happen at any time with this libdnf / PackageKit combination .

Comment 17 Matt Fagnani 2019-07-07 17:08:04 UTC
Thanks Adam. I've seen two more segmentation faults of packagekitd with the same traces as I reported. The second crash appeared to be a null pointer dereference at /usr/src/debug/libdnf-0.35.1-1.fc30.x86_64/libdnf/repo/Repo.cpp:1347 since
(gdb) p libsolvRepo->appdata
Cannot access memory at address 0x10
(gdb) p libsolvRepo
$1 = (libdnf::LibsolvRepo *) 0x0

The third crash seemed to be at an inaccessible address which looked too short to be valid.
(gdb) p libsolvRepo->appdata
Cannot access memory at address 0x656771
(gdb) p libsolvRepo
$1 = (libdnf::LibsolvRepo *) 0x656761

Each of the 3 crashes happened while packagekitd was running automatically in the background at the same time as a journal message indicating that packagekitd quit. I didn't interact with the plasma software updates applet in the task bar or plasma-discover when the crashes happened.

Jul 05 10:39:29 PackageKit[1692]: daemon quit
Jul 05 10:39:29 audit[1692]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:rpm_t:s0 pid=1692 comm="packagekitd" exe="/usr/libexec/packagekitd" sig=11 res=1
Jul 05 10:39:29 kernel: show_signal: 41 callbacks suppressed
Jul 05 10:39:29 kernel: traps: packagekitd[1692] general protection fault ip:7f5fcc4f4b1b sp:7fff3ffdfb18 error:0 in libdnf.so.2[7f5fcc40d000+f8000]
Jul 05 10:39:29 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Jul 05 10:39:29 systemd[1]: Started Process Core Dump (PID 1951/UID 0).
Jul 05 10:39:29 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1951-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 05 10:39:29 systemd[1]: packagekit.service: Main process exited, code=dumped, status=11/SEGV
Jul 05 10:39:29 systemd[1]: packagekit.service: Failed with result 'core-dump'.
Jul 05 10:39:29 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=packagekit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jul 05 10:39:30 abrt-dump-journal-oops[1020]: abrt-dump-journal-oops: Found oopses: 1
Jul 05 10:39:30 abrt-dump-journal-oops[1020]: abrt-dump-journal-oops: Creating problem directories
Jul 05 10:39:30 systemd-coredump[1952]: Process 1692 (packagekitd) of user 0 dumped core.
                                        
                                        Stack trace of thread 1692:
                                        #0  0x00007f5fcc4f4b1b _ZN6libdnf4Repo4Impl17detachLibsolvRepoEv (libdnf.so.2)
                                        #1  0x00007f5fcc44dc27 n/a (libdnf.so.2)
                                        #2  0x00007f5fdaf72cf0 g_object_unref (libgobject-2.0.so.0)
                                        #3  0x00007f5fcc62b380 n/a (libpk_backend_dnf.so)
                                        #4  0x00007f5fdae6ea42 g_hash_table_remove_all_nodes (libglib-2.0.so.0)
                                        #5  0x00007f5fdae6fe43 g_hash_table_remove_all_nodes (libglib-2.0.so.0)
                                        #6  0x00007f5fcc632d13 pk_backend_destroy (libpk_backend_dnf.so)
                                        #7  0x00005645e3d7f285 pk_backend_unload (packagekitd)
                                        #8  0x00005645e3d89b05 n/a (packagekitd)
                                        #9  0x00007f5fdaf72cf0 g_object_unref (libgobject-2.0.so.0)
                                        #10 0x00005645e3d711b2 main (packagekitd)
                                        #11 0x00007f5fdab95f33 __libc_start_main (libc.so.6)
                                        #12 0x00005645e3d713fe _start (packagekitd)
                                        
                                        Stack trace of thread 1693:
                                        #0  0x00007f5fdac625c7 __GI___poll (libc.so.6)
                                        #1  0x00007f5fdae821de g_main_context_poll (libglib-2.0.so.0)
                                        #2  0x00007f5fdae82313 g_main_context_iteration (libglib-2.0.so.0)
                                        #3  0x00007f5fdae82361 glib_worker_main (libglib-2.0.so.0)
                                        #4  0x00007f5fdaeab4e2 g_thread_proxy (libglib-2.0.so.0)
                                        #5  0x00007f5fdad425a2 start_thread (libpthread.so.0)
                                        #6  0x00007f5fdac6d303 __clone (libc.so.6)
                                        
                                        Stack trace of thread 1694:
                                        #0  0x00007f5fdac625c7 __GI___poll (libc.so.6)
                                        #1  0x00007f5fdae821de g_main_context_poll (libglib-2.0.so.0)
                                        #2  0x00007f5fdae825a3 g_main_loop_run (libglib-2.0.so.0)
                                        #3  0x00007f5fdb0d0d5a gdbus_shared_thread_func (libgio-2.0.so.0)
                                        #4  0x00007f5fdaeab4e2 g_thread_proxy (libglib-2.0.so.0)
                                        #5  0x00007f5fdad425a2 start_thread (libpthread.so.0)
                                        #6  0x00007f5fdac6d303 __clone (libc.so.6)
Jul 05 10:39:30 systemd[1]: systemd-coredump@0-1951-0.service: Succeeded.
Jul 05 10:39:30 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1951-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jul 05 10:39:31 abrt-server[1953]: Package 'kernel-core' isn't signed with proper key
Jul 05 10:39:31 abrt-server[1953]: 'post-create' on '/var/spool/abrt/oops-2019-07-05-10:39:30-1020-0' exited with 1
Jul 05 10:39:31 abrt-server[1953]: Deleting problem directory '/var/spool/abrt/oops-2019-07-05-10:39:30-1020-0'
Jul 05 10:39:31 abrtd[969]: Size of '/var/spool/abrt' >= 5000 MB (MaxCrashReportsSize), deleting old directory 'ccpp-2019-06-28-19:07:10.115445-4776'
Jul 05 10:39:31 abrt-dump-journal-oops[1020]: Reported 1 kernel oopses to Abrt

Comment 18 Stephen Gallagher 2019-07-15 13:33:28 UTC
How often is this happening? Is it reproducible on all or most packagekit-initiated updates or is it a rare heisenbug?

I'm definitely in favor of at least a +1 FE here, but whether it is a blocker for Beta seems like it should depend on how likely it is to hit (and whether all subsequent update continue to fail or if retry-until-it-works is a workaround).

I'd probably come down on the side of Final Blocker though, just for a general fit-and-finish.

Comment 19 Adam Williamson 2019-07-15 16:05:00 UTC
This and https://bugzilla.redhat.com/show_bug.cgi?id=1727424 have the same basic cause, and every GNOME Software and FreeIPA Cockpit enrolment test crashes on one or the other.

Comment 20 Geoffrey Marr 2019-07-15 19:02:31 UTC
Discussed during the 2019-07-15 blocker review meeting: [1]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criteria:

"The installed system must be able appropriately to install software with the default tool for the relevant software type in all release-blocking desktops"

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-07-15/f31-blocker-review.2019-07-15-16.05.txt

Comment 21 Matt Fagnani 2019-07-15 19:20:56 UTC
Stephen, I've seen these packagekit crashes with the same traces 10 out of 120 times that the PackageKit daemon quit message was shown in my journal since I updated to libdnf-0.35.1-1. I wasn't using a program that uses packagekitd directly when the crashes happened. packagekitd was just running in the background. The frequency of the crashes are likely to be higher if programs using packagekitd are being actively used as Adam found. In two of the crashes the pointer libsolvRepo was null. In the other crashes, libsolvRepo was a address shorter than the other non-null addresses, and so libsolvRepo might have been corrupted. Some of my journals in that time period since July 5 were corrupted due to the system having become unresponsive due to a kernel panic or other reasons when I did a hard shutdown. I tried to run valgrind on packagekitd, but I haven't run it enough times to see the crash yet while doing so. I could keep trying to run packagekitd under valgrind until it crashes if it would be informative.

Comment 22 Stephen Gallagher 2019-07-15 20:36:41 UTC
(In reply to Matt Fagnani from comment #21)
> Stephen, I've seen these packagekit crashes with the same traces 10 out of
> 120 times that the PackageKit daemon quit message was shown in my journal
> since I updated to libdnf-0.35.1-1. I wasn't using a program that uses
> packagekitd directly when the crashes happened. packagekitd was just running
> in the background. The frequency of the crashes are likely to be higher if
> programs using packagekitd are being actively used as Adam found. In two of
> the crashes the pointer libsolvRepo was null. In the other crashes,
> libsolvRepo was a address shorter than the other non-null addresses, and so
> libsolvRepo might have been corrupted. Some of my journals in that time
> period since July 5 were corrupted due to the system having become
> unresponsive due to a kernel panic or other reasons when I did a hard
> shutdown. I tried to run valgrind on packagekitd, but I haven't run it
> enough times to see the crash yet while doing so. I could keep trying to run
> packagekitd under valgrind until it crashes if it would be informative.

If you can get us a valgrind output demonstrating the crash, that would probably be ideal. It definitely looks as if the problem is memory-related.

Comment 23 Adam Williamson 2019-07-15 21:59:35 UTC
Um. I think the cause is probably just going to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1727424 , which we already bisected? see https://bugzilla.redhat.com/show_bug.cgi?id=1727424#c9 and following discussion.

Comment 24 Pavla Kratochvilova 2019-07-18 07:40:20 UTC
Hello,

Jaroslav Rohel made a patch: https://github.com/rpm-software-management/libdnf/pull/761
And here are scratch builds:
rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317514
Fedora 30: https://koji.fedoraproject.org/koji/taskinfo?taskID=36317358

Can you please confirm the issue is fixed there? Thank you.

Comment 25 Matt Fagnani 2019-07-22 03:29:42 UTC
Pavla, I haven't seen the packagekitd segmentation fault in 23 times that the PackageKit daemon quit message was shown in my journal since I updated to libdnf-0.35.1-2.fc30. I saw the packagekitd crash 11/153 times the PackageKit daemon quit message happened with libdnf-0.35.1-1. I think the crash is likely to be fixed in libdnf-0.35.1-2 as it probably would have happened by now if it wasn't given that it happened about 1/14 of the time with libdnf-0.35.1-1. Thanks.

Comment 26 Fedora Update System 2019-07-23 07:17:04 UTC
FEDORA-2019-672a74d688 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

Comment 27 Fedora Update System 2019-07-24 01:44:08 UTC
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

Comment 28 Adam Williamson 2019-07-29 17:17:59 UTC
The fix hit Rawhide already and tests are passing, so dropped the F31 blocker metadata.

Comment 29 Fedora Update System 2019-07-30 01:14:38 UTC
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.