Bug 1626851

Summary: Packagekit coredumps
Product: [Fedora] Fedora Reporter: Ludovic Hirlimann [:Paul-muadib] <ludovic>
Component: PackageKit Assignee: Richard Hughes <rhughes>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 29 CC: awilliam, bugzilla, dmach, ignatenko, jonathan, klember, mboddu, rdieter, rhughes, robatino, sgallagh, smparrish, sumitkbhardwaj
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: AcceptedFreezeException
Last Closed: 2018-09-13 03:39:12 UTC Type: Bug
Bug Blocks: 1517011, 1517012    
Attachments:
Description Flags
Journal output at the time of PackageKit crash sumitkbhardwaj: review+

Description Ludovic Hirlimann [:Paul-muadib] 2018-09-09 19:23:25 UTC
After updating to 29, my Wayland/X session takes ages to show up after I've logged in. Looking at the logs, I found a crash of packagekit as follows:
-- The start-up result is RESULT.
Sep 09 20:48:46 saraan audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-117>
Sep 09 20:48:46 saraan systemd[1]: packagekit.service: Main process exited, code=killed, status=11/SEGV
Sep 09 20:48:47 saraan systemd[1]: packagekit.service: Failed with result 'signal'.
Sep 09 20:48:47 saraan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=packagekit comm="system>
Sep 09 20:48:47 saraan systemd-coredump[11783]: Process 10603 (packagekitd) of user 0 dumped core.
                                                
                                                Stack trace of thread 11781:
                                                #0  0x00007f794c42e3c3 dnf_db_ensure_origin_pkg (libdnf.so.2)
                                                #1  0x00007f794c42e4d5 dnf_db_ensure_origin_pkglist (libdnf.so.2)
                                                #2  0x00007f795b81ee43 n/a (libpk_backend_dnf.so)
                                                #3  0x000055985b45005e n/a (packagekitd)
                                                #4  0x00007f795b3bd6ea n/a (libglib-2.0.so.0)
                                                #5  0x00007f795b26e58e start_thread (libpthread.so.0)
                                                #6  0x00007f795b19d513 __clone (libc.so.6)
                                                
                                                Stack trace of thread 10612:
                                                #0  0x00007f795b192301 __poll (libc.so.6)
                                                #1  0x00007f795b3945e6 n/a (libglib-2.0.so.0)
                                                #2  0x00007f795b3949a2 g_main_loop_run (libglib-2.0.so.0)
                                                #3  0x00007f795b59690a n/a (libgio-2.0.so.0)
                                                #4  0x00007f795b3bd6ea n/a (libglib-2.0.so.0)
                                                #5  0x00007f795b26e58e start_thread (libpthread.so.0)
                                                #6  0x00007f795b19d513 __clone (libc.so.6)
                                                
                                                Stack trace of thread 10609:
                                                #0  0x00007f795b192301 __poll (libc.so.6)
                                                #1  0x00007f795b3945e6 n/a (libglib-2.0.so.0)
                                                #2  0x00007f795b394710 g_main_context_iteration (libglib-2.0.so.0)
                                                #3  0x00007f795b394761 n/a (libglib-2.0.so.0)
                                                #4  0x00007f795b3bd6ea n/a (libglib-2.0.so.0)
                                                #5  0x00007f795b26e58e start_thread (libpthread.so.0)
                                                #6  0x00007f795b19d513 __clone (libc.so.6)
                                                
                                                Stack trace of thread 11769:
                                                #0  0x00007f795b197d6d syscall (libc.so.6)
                                                #1  0x00007f795b3dc28e g_cond_wait_until (libglib-2.0.so.0)
                                                #2  0x00007f795b3661f1 n/a (libglib-2.0.so.0)
                                                #3  0x00007f795b3be232 n/a (libglib-2.0.so.0)
                                                #4  0x00007f795b3bd6ea n/a (libglib-2.0.so.0)
                                                #5  0x00007f795b26e58e start_thread (libpthread.so.0)
                                                #6  0x00007f795b19d513 __clone (libc.so.6)
                                                
                                                Stack trace of thread 10603:
                                                #0  0x00007f795b192301 __poll (libc.so.6)
                                                #1  0x00007f795b3945e6 n/a (libglib-2.0.so.0)
                                                #2  0x00007f795b3949a2 g_main_loop_run (libglib-2.0.so.0)
                                                #3  0x000055985b43ab86 main (packagekitd)
                                                #4  0x00007f795b0c4413 __libc_start_main (libc.so.6)
                                                #5  0x000055985b43ae3a _start (packagekitd)
-- Subject: Process 10603 (packagekitd) dumped core
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
-- 
-- Process 10603 (packagekitd) crashed and dumped core.
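
The crashing frame can be pulled out of a journal excerpt like the one above mechanically. The following is a minimal sketch, not an abrt or systemd tool: the function names and the sample text are made up for illustration, and it assumes the first thread listed is the one that crashed, as is the case in this log (abrt reports the same function).

```python
import re

# Frame lines as printed in the journal, e.g.
# "#0  0x00007f794c42e3c3 dnf_db_ensure_origin_pkg (libdnf.so.2)"
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\(([^)]+)\)")
THREAD_RE = re.compile(r"Stack trace of thread (\d+):")

# Shortened copy of the trace from this report, used only as sample input.
SAMPLE = """\
Stack trace of thread 11781:
#0  0x00007f794c42e3c3 dnf_db_ensure_origin_pkg (libdnf.so.2)
#1  0x00007f794c42e4d5 dnf_db_ensure_origin_pkglist (libdnf.so.2)
#2  0x00007f795b81ee43 n/a (libpk_backend_dnf.so)

Stack trace of thread 10612:
#0  0x00007f795b192301 __poll (libc.so.6)
"""


def parse_stack_traces(journal_text):
    """Return {thread_id: [(frame_no, symbol, module), ...]} in log order."""
    threads = {}
    current = None
    for line in journal_text.splitlines():
        thread_match = THREAD_RE.search(line)
        if thread_match:
            current = int(thread_match.group(1))
            threads[current] = []
            continue
        frame_match = FRAME_RE.search(line)
        if frame_match and current is not None:
            number, symbol, module = frame_match.groups()
            threads[current].append((int(number), symbol, module))
    return threads


def crashing_frame(journal_text):
    """Return (thread_id, frame #0) of the first thread in the log.

    Assumption: the first thread listed is the crashing one.
    """
    threads = parse_stack_traces(journal_text)
    first_thread = next(iter(threads))
    return first_thread, threads[first_thread][0]


if __name__ == "__main__":
    print(crashing_frame(SAMPLE))
```

Frames shown as n/a have no symbol in the journal output, so only the module name is usable for them.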

Comment 1 Ludovic Hirlimann [:Paul-muadib] 2018-09-09 19:25:22 UTC
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Sep 09 20:48:47 saraan audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-1178>
Sep 09 20:48:48 saraan abrt-notification[11837]: Process 10603 (packagekitd) crashed in dnf_db_ensure_origin_pkg()
-- Subject: ABRT has detected unexpected termination: packagekitd
-- Defined-By: ABRT
-- Support: https://bugzilla.redhat.com/
-- Documentation: man:abrt(1)
-- 
-- packagekitd killed by SIGSEGV
-- 
-- Use the abrt command-line tool for further analysis or to report
-- the problem to the appropriate support site.


It's probably a duplicate, but I was unable to pinpoint which one.

Comment 2 Sumit Bhardwaj 2018-09-11 19:09:21 UTC
+1

I also get this issue, not on boot but when I open GNOME Software and try to edit software repositories.

I too have upgraded from a fully updated Fedora 28 Workstation to Fedora 29 Workstation via dnf system-upgrade.

Comment 3 Fedora Blocker Bugs Application 2018-09-11 19:10:33 UTC
Proposed as a Blocker for 29-beta by Fedora user krazyabouttechnology using the blocker tracking app because:

GNOME Software is the primary method of installing updates and software from the GUI.

Comment 4 Sumit Bhardwaj 2018-09-11 19:15:09 UTC
Created attachment 1482421
Journal output at the time of PackageKit crash

Attaching the journal output at the time of the crash and proposing as a blocker.

Comment 5 Adam Williamson 2018-09-11 19:17:55 UTC
This looks like it's crashing in libdnf, so adding dmach and ignatenko to CC.

Is anyone seeing this in a clean F29 install *without* updating libdnf from updates-testing?

Can anyone provide the actual full backtrace (by running abrt and reporting from there, or something)?
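
For reference, the stored core dump can also be inspected locally with coredumpctl, independent of abrt. This is a small sketch under assumptions: the helper names are hypothetical, "packagekitd" is used as the match because that is the process name in the journal above, and symbol names for the n/a frames would still require the matching debuginfo to be installed.

```python
import shutil
import subprocess

# coredumpctl verbs used here: "list" and "info" are non-interactive,
# "gdb" opens the selected core dump in the debugger.
ALLOWED_ACTIONS = ("list", "info", "gdb")


def coredumpctl_command(action="info", match="packagekitd"):
    """Build a coredumpctl command line for core dumps matching `match`."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["coredumpctl", action, match]


def run_non_interactive(action="info", match="packagekitd"):
    """Run "list" or "info" and return the output, or None if coredumpctl is missing."""
    if action == "gdb":
        raise ValueError("gdb is interactive; run it from a terminal instead")
    command = coredumpctl_command(action, match)
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(coredumpctl_command("info")))
```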

Comment 6 Sumit Bhardwaj 2018-09-11 19:28:34 UTC
(In reply to Adam Williamson from comment #5)
> This looks like it's crashing in libdnf, so adding dmach and ignatenko to CC.
> 
> Is anyone seeing this in a clean F29 install *without* updating libdnf from
> updates-testing?
> 
> Can anyone provide the actual full backtrace (by running abrt and reporting
> from there, or something)?

I am trying, but I get the message 'fedora-29-x86_64 is not supported by the retrace server'. It then asks me to generate the trace locally, but if I say yes, it starts downloading a lot of data. I am on a limited connection, so I can't do that. Is there any other way around this? And why am I getting the 'not supported' message?

Comment 7 Adam Williamson 2018-09-11 19:35:05 UTC
No other way around until they fix the retrace server config, no :/ I'll try and get that done. Thanks.

Comment 8 Adam Williamson 2018-09-11 19:36:39 UTC
https://github.com/abrt/retrace-server/issues/221

Comment 9 Ludovic Hirlimann [:Paul-muadib] 2018-09-12 11:50:56 UTC
With abrt, I get reporting disabled because the backtrace is unusable.

Comment 10 Kalev Lember 2018-09-12 13:37:22 UTC
Should be fixed by https://github.com/rpm-software-management/libdnf/pull/575

Comment 11 Stephen Gallagher 2018-09-12 14:09:22 UTC
Kalev, what's the practical impact of this? When does a package have no origin? When it's added to the system only by direct RPM commands?

When added by DNF or similar tools, the "repo" should be "commandline", correct?

I'm trying to get a sense of how likely it is for users to encounter this crash. If I'm correct that it can only happen if you have RPMs that were added to the system with `rpm -i`, then I don't think this is a blocker.

I am in favor of a Freeze Exception if we do another compose; the fix is very small, self-contained and looks extremely low-risk.

Comment 12 Sumit Bhardwaj 2018-09-12 14:33:05 UTC
(In reply to Stephen Gallagher from comment #11)
> Kalev, what's the practical impact of this? When does a package have no
> origin? When it's added to the system only by direct RPM commands?
> 
> When added by DNF or similar tools, the "repo" should be "commandline",
> correct?
> 
> I'm trying to get a sense of how likely it is for users to encounter this
> crash. If I'm correct that it can only happen if you have RPMs that were
> added to the system with `rpm -i`, then I don't think this is a blocker.
> 
> I am in favor of a Freeze Exception if we do another compose; the fix is
> very small, self-contained and looks extremely low-risk.

The fix might be small, but on affected systems you cannot use GNOME Software properly. On the homepage, the editor's picks do not show up for me; it always stays in the loading state.

And if I open the Software Repositories screen in settings, it shows the loading animation indefinitely, and in the journalctl output PackageKit crashes and dumps core repeatedly. The crash dump folder fills up because of that, and further abrt reports are also affected because the older ones get deleted.

Since GNOME Software is part of the core Workstation experience and does not work properly on systems upgraded from F28, I proposed this as a blocker. An FE is also fine if it's fixed in time.

Comment 13 Stephen Gallagher 2018-09-12 14:40:32 UTC
If this issue only occurs after an upgrade, then it could conceivably be a "Special Blocker" (one that we don't need to fix in the frozen package set as long as the fix is present in the stable repository on release day).

If, however, it can happen if you start GNOME Software on the Live media without any uncommon action, then that's clearly a blocker. If the situation is that it can happen any time you do `rpm -i` of a package outside of a repository, then this is serious but could be a Common Bugs entry.

So, I need more information. I'm certainly in favor of a Freeze Exception at the minimum, but right now there isn't enough information to indicate whether it's worth blocking on it.

Comment 14 Kalev Lember 2018-09-12 15:00:53 UTC
(In reply to Stephen Gallagher from comment #11)
> Kalev, what's the practical impact of this? When does a package have no
> origin? When it's added to the system only by direct RPM commands?
> 
> When added by DNF or similar tools, the "repo" should be "commandline",
> correct?
> 
> I'm trying to get a sense of how likely it is for users to encounter this
> crash. If I'm correct that it can only happen if you have RPMs that were
> added to the system with `rpm -i`, then I don't think this is a blocker.

I think it's only packages added with 'rpm -i', but not 100% sure. If it doesn't reproduce on the live cd (I haven't tested), then I think it should be fine to fix it with a 0 day update without respinning the images.

Comment 15 Adam Williamson 2018-09-12 16:08:43 UTC
There is, of course, a chicken-and-egg problem if you hit this: technically I think we still maintain the threadbare fiction that you should be able to use Workstation without ever seeing a terminal...so if GNOME Software is affected by this bug, how do you install the update that fixes it? :)

Still, for a Beta I think it'd be OK to ship the fix as an update and document it.

Comment 16 Kalev Lember 2018-09-12 18:07:13 UTC
What I was trying to say above is that I'd rather respin the images with the libdnf fix if gnome-software doesn't work. gnome-software is really prominent in Workstation and we'd look super bad with this crasher.

However, if this only affects distro-upgraded systems (or older F29 installs where people have played with rpm -i or similar), then it should be fine to do a 0 day update -- people are going to get it anyway, and it doesn't need to be on the media.

Comment 17 Adam Williamson 2018-09-12 19:00:59 UTC
It's the "people are going to get it anyway" part I am querying. How exactly are they "going to get it" if the bug they hit is "GNOME Software doesn't work"?

I agree that the number of people likely to be affected is important here, but it seems demonstrably the case that anyone who *is* affected by it, needs to know how to use dnf or an alternative GUI package manager to actually get the fix, if we ship it as an update.

Comment 18 Adam Williamson 2018-09-12 19:01:45 UTC
I suppose if you're considering the case of "install a fresh system, then later <do something> that triggers the bug", this is less likely to happen if the fix is available as a 0-day update indeed, assuming the user updates promptly after installation.

Comment 19 Stephen Gallagher 2018-09-12 19:15:26 UTC
(In reply to Adam Williamson from comment #18)
> I suppose if you're considering the case of "install a fresh system, then
> later <do something> that triggers the bug", this is less likely to happen
> if the fix is available as a 0-day update indeed, assuming the user updates
> promptly after installation.

This is the case I was talking about, yes. As far as I can tell, the only way this bug can be triggered is if there are RPMs on the system that have no "origin" recorded in the RPM database. I'm not aware of any way that this can happen besides "At some point, someone bypassed DNF/PackageKit and installed an RPM directly with /usr/bin/rpm".

So unless someone can point out a different way that this can occur, I'm fine with treating this as a Special Blocker and getting an update in stable for the release date.
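
The failure mode discussed here (a lookup that returns nothing for a package with no recorded origin, followed by an unconditional use of the result) can be sketched as follows. This is an illustration only, not the libdnf code and not the change in the pull request from comment 10, neither of which is shown in this thread; every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical origin records. A package installed directly with rpm
# (here "local-tool") has no entry at all.
RECORDED_ORIGINS = {"bash": "fedora", "firefox": "updates"}


@dataclass
class Package:
    name: str
    origin: Optional[str] = None


def ensure_origin_unguarded(pkg):
    """Assumes a record always exists; fails when the lookup returns None."""
    record = RECORDED_ORIGINS.get(pkg.name)
    pkg.origin = record.strip()  # AttributeError when record is None


def ensure_origin_guarded(pkg):
    """Leaves the origin unset when nothing was recorded for the package."""
    record = RECORDED_ORIGINS.get(pkg.name)
    if record is None:
        return
    pkg.origin = record.strip()


def ensure_origin_pkglist(packages):
    """Apply the guarded lookup to every package in the list."""
    for pkg in packages:
        ensure_origin_guarded(pkg)
    return packages
```

In Python the unguarded variant raises an exception; in C the equivalent mistake is a NULL dereference, which would be consistent with the SIGSEGV in the journal.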

Comment 20 Fedora Update System 2018-09-12 20:38:34 UTC
libdnf-0.17.0-3.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-02feb2485b

Comment 21 Adam Williamson 2018-09-12 20:42:53 UTC
Since we have a fix for this now: I'm -1 blocker, +1 FE at this point. Other votes?

Comment 22 Kalev Lember 2018-09-12 20:44:38 UTC
+1 FE

Comment 23 Chris Murphy 2018-09-12 20:49:24 UTC
+1 beta freeze exception

Comment 24 Mohan Boddu 2018-09-12 21:28:52 UTC
+1 FE for F29 Beta

Comment 25 Adam Williamson 2018-09-12 22:10:52 UTC
That's enough votes for accepted FE status at least.

Comment 26 Fedora Update System 2018-09-13 03:39:12 UTC
libdnf-0.17.0-3.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 27 Sumit Bhardwaj 2018-09-13 09:32:03 UTC
Sorry, but this is still not fixed for me. The build I currently have on my fully updated pre-beta F29 Workstation system is libdnf-0.17.2-1.fc29.

The build in this fix is libdnf-0.17.0-3.fc29. Even if I install it manually from koji, the next dnf upgrade overwrites it immediately. Am I doing something wrong, or is the version of this build wrong?

Comment 28 Chris Murphy 2018-09-14 02:30:23 UTC
I'm confused by comment 12. It says
libdnf-0.17.0-3.fc29
what I have in Fedora 29 since F28->F29 upgrade is
libdnf-0.17.2-1.fc29.x86_64

And a 'dnf update' right now does not show the apparently older 0.17.0-3 listed in comment 26.

Comment 29 Adam Williamson 2018-09-14 02:56:12 UTC
It's a bit messy, as 0.17.2-1 was already in u-t before we did this fix, but we wanted to *just* fix this in the Beta, not pull in all of 0.17.2-1. 0.17.0-3 is a more *recent* build than 0.17.2-1, and has the fix for this where 0.17.2-1 does not.

Kalev, we should probably do a 0.17.2-2 with this fix and edit it into the dnf-3.3.0 / libdnf-0.17.2 update...
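
To illustrate why a dnf upgrade replaced the manually installed 0.17.0-3 with 0.17.2-1 (comment 27): packages are ordered by version first and release second, not by build date. The sketch below is a simplified comparison written for this example. It is not rpm's actual rpmvercmp; it ignores epochs and the special tilde and caret handling, and the function names are made up.

```python
import re
from itertools import zip_longest


def _segments(text):
    """Split a version or release string into runs of digits or letters."""
    return re.findall(r"\d+|[A-Za-z]+", text)


def _compare_part(left, right):
    """Return 1, 0 or -1 for one component (version or release)."""
    for a, b in zip_longest(_segments(left), _segments(right)):
        if a is None:
            return -1  # left ran out of segments first
        if b is None:
            return 1
        if a.isdigit() and b.isdigit():
            a_num, b_num = int(a), int(b)
            if a_num != b_num:
                return 1 if a_num > b_num else -1
        elif a.isdigit() != b.isdigit():
            return 1 if a.isdigit() else -1  # numeric sorts above alphabetic
        elif a != b:
            return 1 if a > b else -1
    return 0


def compare_version_release(left, right):
    """Compare 'version-release' strings; the version decides before the release."""
    left_version, left_release = left.split("-", 1)
    right_version, right_release = right.split("-", 1)
    return _compare_part(left_version, right_version) or _compare_part(
        left_release, right_release
    )


if __name__ == "__main__":
    # 0.17.2 sorts above 0.17.0, so the release numbers (1 vs 3) never matter.
    print(compare_version_release("0.17.2-1.fc29", "0.17.0-3.fc29"))
```

Under this ordering, a 0.17.2-2 build carrying the fix would sort above both builds mentioned in this thread.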

Comment 30 Sumit Bhardwaj 2018-09-14 03:07:36 UTC
Actually, I did a distro-sync yesterday after commenting here, and that downgraded my dnf, libdnf and a couple of other packages to the mentioned version. After that, the issue is fixed. So moving the fix to the 0.17.2 version will probably close this for good.

Comment 31 Adam Williamson 2018-09-14 03:17:13 UTC
That would've worked because we actually took 0.17.2-1 *out* of updates-testing again. But the idea was to push it back there eventually.

I have just edited the update to be an update to dnf-3.5.1 and libdnf-0.19.1, because what the hell, why not. 0.19.1 upstream has the fix for this, so at least this bug should be sorted, once the update makes its way out to u-t: both the version in stable and the version in u-t should have the fix at that point.