Bug 1398085 - Fedora 25: gnome-software unable to install software due to PackageKit daemon SIGABRTing
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: libsolv
Version: 25
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-24 05:40 UTC by Nate Graham
Modified: 2017-12-12 10:23 UTC
CC List: 19 users

Fixed In Version: libsolv-0.6.25-1.fc25 libsolv-0.6.25-1.fc24
Clone Of:
Environment:
Last Closed: 2017-12-12 10:23:16 UTC
Type: Bug
Embargoed:


Attachments
gnome-software stuck forever at its loading screen (14.31 KB, image/png)
2016-11-24 05:40 UTC, Nate Graham
Backtrace (43.21 KB, text/plain)
2016-11-24 21:32 UTC, Nate Graham
Debugged the crash with gdb (49.77 KB, text/plain)
2016-11-24 22:01 UTC, Nate Graham
packagekitd running under valgrind (24.43 KB, text/plain)
2017-01-11 16:04 UTC, Nate Graham
packagekitd coredump (2.95 MB, application/x-core)
2017-01-25 23:26 UTC, Nate Graham
packagekitd core (12.91 MB, application/x-xz)
2017-01-27 15:48 UTC, Nate Graham
/var/cache/PackageKit/25/hawkey/@System.solv (5.30 MB, application/octet-stream)
2017-03-03 19:24 UTC, Jonathan Ryshpan
/var/cache/PackageKit/25/hawkey/@System.solv (2.72 MB, application/octet-stream)
2017-03-15 09:28 UTC, Henrique Menezes
@System.solv (2.00 MB, application/x-bzip)
2017-04-19 17:48 UTC, Serge Droz

Description Nate Graham 2016-11-24 05:40:04 UTC
Created attachment 1223520 [details]
gnome-software stuck forever at its loading screen

Description of problem:
gnome-software hangs at load because the PackageKit daemon is crashing.



Version-Release number of selected component (if applicable):
Fedora 25



How reproducible:
100% reproducible for me. I was using gnome-software normally for two days, but starting today this is happening 100% of the time. Not sure what changed to trigger this.



Steps to Reproduce:
1. check to make sure that the PackageKit service is running:
 systemctl status packagekit.service -l
● packagekit.service - PackageKit Daemon
   Loaded: loaded (/usr/lib/systemd/system/packagekit.service; static; vendor preset: disabled)
   Active: active (running) since Wed 2016-11-23 22:14:04 MST; 3min 26s ago
 Main PID: 18290 (packagekitd)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/packagekit.service
           └─18290 /usr/libexec/packagekitd
Nov 23 22:14:04 spectre systemd[1]: Starting PackageKit Daemon...
Nov 23 22:14:04 spectre PackageKit[18290]: daemon start
Nov 23 22:14:04 spectre systemd[1]: Started PackageKit Daemon.
Nov 23 22:14:04 spectre PackageKit[18290]: uid 1000 is trying to obtain org.freedesktop.packagekit.system-sources-ref
Nov 23 22:14:04 spectre PackageKit[18290]: uid 1000 obtained auth for org.freedesktop.packagekit.system-sources-refre
Nov 23 22:14:04 spectre packagekitd[18290]: BDB2053 Freeing read locks for locker 0x13: 18059/140196001058112
Nov 23 22:14:04 spectre packagekitd[18290]: BDB2053 Freeing read locks for locker 0x15: 18059/140196001058112
Nov 23 22:14:23 spectre PackageKit[18290]: refresh-cache transaction /6_ecabedbb from uid 1000 finished with success 

Cool, looks like it's running.

2. Open gnome-software



Actual results:
gnome-software gets stuck at "Software catalog is being loaded". If you check on the PackageKit service, you'll see that it's crashed:

$ systemctl status packagekit.service -l
● packagekit.service - PackageKit Daemon
   Loaded: loaded (/usr/lib/systemd/system/packagekit.service; static; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2016-11-23 22:30:04 MST; 4s ago
  Process: 18604 ExecStart=/usr/libexec/packagekitd (code=killed, signal=ABRT)
 Main PID: 18604 (code=killed, signal=ABRT)

Nov 23 22:30:03 spectre packagekitd[18604]: 7fea25fdd000-7fea25fde000 r--p 00025000 fd:00 4203032                    
Nov 23 22:30:03 spectre packagekitd[18604]: 7fea25fde000-7fea25fdf000 rw-p 00026000 fd:00 4203032                    
Nov 23 22:30:03 spectre packagekitd[18604]: 7fea25fdf000-7fea25fe0000 rw-p 00000000 00:00 0
Nov 23 22:30:03 spectre packagekitd[18604]: 7ffeaef59000-7ffeaef7a000 rw-p 00000000 00:00 0                          
Nov 23 22:30:03 spectre packagekitd[18604]: 7ffeaefcb000-7ffeaefcd000 r--p 00000000 00:00 0                          
Nov 23 22:30:03 spectre packagekitd[18604]: 7ffeaefcd000-7ffeaefcf000 r-xp 00000000 00:00 0                          
Nov 23 22:30:03 spectre packagekitd[18604]: ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  
Nov 23 22:30:04 spectre systemd[1]: packagekit.service: Main process exited, code=killed, status=6/ABRT
Nov 23 22:30:04 spectre systemd[1]: packagekit.service: Unit entered failed state.
Nov 23 22:30:04 spectre systemd[1]: packagekit.service: Failed with result 'signal'.
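
For anyone checking whether they are hitting the same crash, the signal can be pulled out of the status output shown above. This is only a rough sketch: `killed_by_signal` is a made-up helper name, and it assumes the "(code=killed, signal=...)" wording that systemctl printed in this report.

```shell
#!/bin/sh
# Sketch: read `systemctl status` output on stdin and print the name of the
# signal that killed the main process, if any.
# killed_by_signal is a hypothetical helper, not part of systemd or PackageKit.
killed_by_signal() {
    # Match lines such as "Main PID: 18604 (code=killed, signal=ABRT)"
    # and keep only the signal name; report the first match.
    sed -n 's/.*(code=killed, signal=\([A-Z0-9]*\)).*/\1/p' | head -n 1
}
```

Example use: `systemctl status packagekit.service -l | killed_by_signal` should print ABRT for the failure shown here and nothing while the daemon is running normally.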




Expected results:
gnome-software should load properly and allow me to install software, and PackageKit should not crash.



Additional info:
Happy to collect more info, since this is 100% reproducible for me and seemingly not going away. Hardware is a 2016 HP Spectre x360 (Kaby Lake).

Comment 1 Nate Graham 2016-11-24 21:32:33 UTC
Created attachment 1224028 [details]
Backtrace

Comment 2 Nate Graham 2016-11-24 22:01:55 UTC
Created attachment 1224057 [details]
Debugged the crash with gdb

Comment 3 Nate Graham 2016-12-15 04:08:10 UTC
This same issue has also been reported here: https://bugs.freedesktop.org/show_bug.cgi?id=99083

I can confirm that the same thing happens to me from the CLI:

$ pkcon install Thunar
Resolving                     [=========================]         
Testing changes               [                         ] (0%)  The daemon crashed mid-transaction!

Comment 4 Nate Graham 2016-12-29 17:28:32 UTC
Not fixed with PackageKit-1.1.5-0.1.20161221.fc25:

$ pkcon install Thunar
Resolving                     [=========================]         
Testing changes               [=                        ] (5%)  The daemon crashed mid-transaction!

Comment 5 srinu 2017-01-03 05:19:39 UTC
Friends, I am also encountering the freeze of gnome-software. Can anyone suggest a solution?

Comment 6 Igor Gnatenko 2017-01-09 12:30:27 UTC
Actually:

malloc(): memory corruption


Richard, any idea what caused this?

Comment 7 Richard Hughes 2017-01-11 09:47:03 UTC
(In reply to Igor Gnatenko from comment #6)
> malloc(): memory corruption
> Richard, any idea what caused this?

Without the logs of packagekitd running under valgrind, no.

Comment 8 Nate Graham 2017-01-11 16:04:54 UTC
Created attachment 1239519 [details]
packagekitd running under valgrind

Here's the output of packagekitd running under valgrind, as requested. Looks like this is the issue:


valgrind: m_mallocfree.c:303 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 101856, hi = 109289737790854.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.

Not sure if "fix[ing] any invalid writes reported by Memcheck" is something I would do or something you would do.

Comment 9 Marco Kundt 2017-01-20 18:26:31 UTC
Same here. I've been reporting every PackageKit crash through ABRT, because PackageKit is not usable at all. It worked for about 3 days, and since then the PackageKit daemon crashes on every operation in GNOME Software (and from the shell too). I can't check for updates, or install or remove packages from the repositories. For me it's not a big deal because I mostly use dnf, but for users who aren't at home in the shell it's a big problem. I freshly reinstalled my system (because removing the configs and uninstalling and reinstalling PackageKit and GNOME Software didn't help), but after some time the problems occurred again.

Comment 10 Nate Graham 2017-01-20 20:14:20 UTC
Richard, is there anything else you need from me on this? I'm happy to provide any information that would be useful.

Is this not the right place for this bug? Should I file another one at freedesktop.org?

Comment 11 Kalev Lember 2017-01-22 12:37:30 UTC
Looking at the valgrind log from comment #8, it looks like it's libsolv doing an out of bounds write in repo_write/traverse_dirs.

Comment 12 Michael Schröder 2017-01-25 10:12:29 UTC
Can you please attach a core file?

Comment 13 Nate Graham 2017-01-25 23:26:25 UTC
Created attachment 1244511 [details]
packagekitd coredump

Comment 14 Michael Schröder 2017-01-26 10:48:43 UTC
That seems to be a coredump from 'vim'...

Comment 15 Kalev Lember 2017-01-26 11:21:26 UTC
Thanks for looking at this, Michael. If it helps, there are a few ABRT filed bug reports with crashes in the same functions and they all have good backtraces, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1405832 and https://bugzilla.redhat.com/show_bug.cgi?id=1404468 (look at the "backtrace" attachment).

Comment 16 Michael Schröder 2017-01-26 13:30:28 UTC
I need a core so I can access the data to reproduce this.

Comment 17 Nate Graham 2017-01-26 15:13:17 UTC
Looks like I attached the wrong core. Trying again...

Comment 18 Nate Graham 2017-01-26 17:03:53 UTC
The right one was too big to upload to this bug report, so you can download it from here: hommelscitadel.com/wp-content/uploads/2017/01/coredump

Comment 19 Michael Schröder 2017-01-27 10:44:40 UTC
Doesn't work for me. 'file coredump' says "error reading (Invalid argument)" and gdb prints '"coredump" is not a core dump: File format not recognized'...

Comment 20 Nate Graham 2017-01-27 15:48:49 UTC
Created attachment 1245164 [details]
packagekitd core

Okay, looks like something got corrupted during the upload or something. I'm re-uploading it in an archive.

Comment 21 Nate Graham 2017-01-27 15:50:53 UTC
If I download and un-archive the core, it looks good to me now:

$ file '/home/nate/coredump' 
/home/nate/coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/libexec/packagekitd', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/libexec/packagekitd', platform: 'x86_64'


Thanks for your patience, Michael!
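
A quick way to catch a corrupted upload like the earlier one, before attaching it, is to look at the ELF header directly. The following is a minimal sketch, assuming a little-endian core such as the x86_64 one here; `looks_like_core` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sanity-check a file before uploading it as a core dump.
# An ELF file starts with the bytes 7f 45 4c 46 ("\177ELF"), and the
# 16-bit e_type field at offset 16 is 4 (ET_CORE) for core files.
# Assumes little-endian byte order (true for x86_64).
# looks_like_core is a hypothetical helper, not an existing tool.
looks_like_core() {
    # First four bytes as hex, whitespace removed.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    # Two bytes at offset 16 (e_type), low byte first.
    etype=$(od -An -tx1 -j16 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ] && [ "$etype" = "0400" ]
}
```

Example use: `looks_like_core ~/coredump && echo "ELF core"`. This only detects a damaged or non-core file; it would not have caught the core from the wrong program, so the `file` output above (which names the originating binary) is still the check to run for that.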

Comment 22 Michael Schröder 2017-01-30 14:26:58 UTC
Ok, from looking at the core I found that the directory data for the "@System" repo is bad. There's a bogus NULL entry in it which trips the logic in the repository write code.

There are not many places in the code where directories are added; I checked them all and I don't see how this could have happened. So here's a little test:
Packagekit should store a file called @System.solv somewhere in /var/cache.
Do you still see coredumps if you remove this file so that packagekit needs to rebuild its cache?

Comment 23 Jan Niklas Hasse 2017-01-30 14:30:11 UTC
Removing /var/cache/PackageKit/25/hawkey/@System.solv fixed the crashes for me :)

Comment 24 Michael Schröder 2017-01-30 15:08:22 UTC
Good to hear! Now I just need to find out what caused the bad data in the first place...

Comment 25 Nate Graham 2017-01-30 15:25:40 UTC
There's also a lot of temp-file-looking things in there:

$ find /var/cache/PackageKit/25/hawkey/ | grep -i System | wc -l
120
$ find /var/cache/PackageKit/25/hawkey/ | grep -i System | head -n 5
/var/cache/PackageKit/25/hawkey/@System.solv.NndZva
/var/cache/PackageKit/25/hawkey/@System.solv
/var/cache/PackageKit/25/hawkey/@System.solv.KWL0wF
/var/cache/PackageKit/25/hawkey/@System.solv.Y05sdi
/var/cache/PackageKit/25/hawkey/@System.solv.TtcMrX


I can also confirm that deleting /var/cache/PackageKit/25/hawkey/@System.solv resolves the issue of packagekitd coredumping.

In addition to figuring out how the bad data got in there, perhaps packagekitd should better sanitize its inputs.

Comment 26 Michael Schröder 2017-01-30 15:39:48 UTC
Those are leftovers from the crashes. When writing a cache file, it first writes into a tmp file and then renames it to the final destination. The writing part was where packagekit crashed.
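
The write-then-rename pattern described here can be illustrated in shell. This is a sketch of the general technique only, not the actual libsolv or PackageKit code; `write_cache_atomically` is a hypothetical name, and the suffix format is whatever `mktemp` produces.

```shell
#!/bin/sh
# Sketch: write new cache contents to a temp file next to the target,
# then rename it into place. A rename within one filesystem is atomic,
# so readers see either the old file or the complete new one.
# write_cache_atomically is a hypothetical helper, not a PackageKit or
# libsolv API.
write_cache_atomically() {
    target=$1
    # Temp file in the same directory, so the final mv is a plain rename.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    # New contents come from stdin; remove the partial file on failure.
    if cat > "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Example use: `printf 'data\n' | write_cache_atomically /tmp/example/@System.solv`. If the writer dies between creating the temp file and the rename, as packagekitd did here, the temp file is never cleaned up, which matches the @System.solv.XXXXXX leftovers listed in comment 25.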

Comment 27 Kalev Lember 2017-01-30 18:52:28 UTC
mls, I wonder if there's a good workaround we could do in PackageKit to deal with existing installations that have broken @System.solv? Maybe unlink it every time packagekitd starts, just to be on the safe side?

Or are the libsolv side fixes sufficient here? (I noticed https://github.com/openSUSE/libsolv/commit/3a8f2216aeec9126968ea3d99872f839548a6d65 which I guess is for this crash?)

Comment 28 Kodiak Firesmith 2017-01-31 01:34:42 UTC
Removing @System.solv fixes this on my multi-upgraded (23 -> 24 -> 25) system.   

Thanks for posting the work-around, I never would have figured that out on my own.

Comment 29 Jan Niklas Hasse 2017-01-31 07:39:19 UTC
FWIW I'm running a fresh 25 install.

Comment 30 Michael Schröder 2017-01-31 12:39:10 UTC
Ok, I've got a theory to prove. Nate, can you please create an attachment with the /var/cache/PackageKit/25/hawkey/@System.solv and /var/lib/rpm/Packages files?

Comment 31 Nate Graham 2017-01-31 15:30:33 UTC
Available at http://hommelscitadel.com/wp-content/uploads/2017/requested_files.tar.xz since it was too large to attach here. Note: the @System.solv I've included there is the one that works, not the bad broken one (it was deleted yesterday).

Comment 32 Michael Schröder 2017-02-01 10:52:42 UTC
Thanks! The bad package leading to the corrupt entry seems to be gone now, though.

Anyway, I did a couple of changes to libsolv:
1) it will now reject solv files that have bad directory entries. This will make packagekit rebuild the cache so this problem should be gone. (commit 64ea54c31ec396531faac2a86b5c0c1c056b59b2)
2) I added a guard so that illegal directory entries can no longer be added. (commit 3a8f2216aeec9126968ea3d99872f839548a6d65)

I also cleaned up some of the code a bit, but that should not make a difference.

If somebody still has a @System.solv file that leads to a crash, could you please attach it to this bug? I'm still a little bit nervous because I couldn't find the real root cause of the illegal directory entry. Thanks!

Comment 33 Kalev Lember 2017-02-01 11:05:03 UTC
Awesome, thanks for the fixes, Michael! Do you have plans for an upstream libsolv release or should we backport patches?

Comment 34 Michael Schröder 2017-02-07 12:22:59 UTC
I just released libsolv version 0.6.25 which contains those fixes.

Comment 35 Fedora Update System 2017-02-07 12:48:02 UTC
libsolv-0.6.25-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-2810889d00

Comment 36 Fedora Update System 2017-02-07 12:48:19 UTC
libsolv-0.6.25-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-8bfcc055a1

Comment 37 Nate Graham 2017-02-07 14:38:23 UTC
Excellent!

Comment 38 Kalev Lember 2017-02-07 15:41:02 UTC
Thanks mls!

Comment 39 Fedora Update System 2017-02-08 02:48:19 UTC
libsolv-0.6.25-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-8bfcc055a1

Comment 40 Fedora Update System 2017-02-09 09:20:23 UTC
libsolv-0.6.25-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-2810889d00

Comment 41 Fedora Update System 2017-02-09 20:24:42 UTC
libsolv-0.6.25-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 42 Fedora Update System 2017-02-09 20:50:45 UTC
libsolv-0.6.25-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 43 Jonathan Ryshpan 2017-03-03 18:57:25 UTC
It looks like I'm seeing the same problem.  libsolv-0.6.25-1.fc25 was supposed to cure it; my system is running libsolv-0.6.26-1.fc25 but the problem persists.  Here's a brief bash log:

$ pkcon get-updates
Getting updates               [=========================]         
Querying                      [                     ==  ]         The daemon crashed mid-transaction!
$ rpm -q libsolv
libsolv-0.6.26-1.fc25.x86_64

Comment 44 Nate Graham 2017-03-03 19:04:21 UTC
Jonathan, can you attach your /var/cache/PackageKit/25/hawkey/@System.solv file?

Comment 45 Jonathan Ryshpan 2017-03-03 19:24:06 UTC
Created attachment 1259702 [details]
/var/cache/PackageKit/25/hawkey/@System.solv

As requested by Nate Graham in Comment #44

BTW: What kind of file is this?  What is its function?

Comment 46 Nate Graham 2017-03-03 20:15:07 UTC
I'm re-opening this for investigation, since we now have a case in the wild with the supposedly-fixed libsolv.

Comment 47 Henrique Menezes 2017-03-15 09:28:32 UTC
Created attachment 1263236 [details]
/var/cache/PackageKit/25/hawkey/@System.solv

I'm experiencing the same issue as Jonathan, also with the exact same bash output.

Attaching my @System.solv file, maybe it'll help?

Comment 48 Patrick R 2017-03-22 20:20:56 UTC
Confirmed here....
I started experiencing the problems when I was installing codecs from the "Add-ons" Categories
Now I can't install anything via the GUI

Per what previous people have posted I get...

$ pkcon get-updates
Getting updates               [=========================]         
Querying                      [              ==         ]         The daemon crashed mid-transaction!
$ rpm -q libsolv
libsolv-0.6.26-1.fc25.x86_64

Anyone have any suggestions on how to get this working at least temporarily?

Comment 49 Kodiak Firesmith 2017-03-23 10:56:25 UTC
(In reply to Patrick R from comment #48)
> Anyone have any suggestions on how to get this working at least temporarily?

Hi Patrick,
You can work around this by removing any of these files:
/var/cache/PackageKit/25/hawkey/@System.solv*   *BUT*  before you do that, please consider just moving them into /tmp and attaching them to this ticket for further investigation - I remember some of the devs were looking for examples of these files to inspect and find commonalities.
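
The move-instead-of-delete suggestion could look roughly like this. It is a sketch under assumptions: `stash_system_solv` is a made-up helper name, the destination directory is arbitrary, and on a real system it would have to be run as root.

```shell
#!/bin/sh
# Sketch: move @System.solv and its leftover temp files out of the cache
# directory instead of deleting them, so they can be attached to the bug.
# stash_system_solv is a hypothetical helper, not a PackageKit command.
stash_system_solv() {
    cache_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    moved=0
    for f in "$cache_dir"/@System.solv*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        mv -- "$f" "$dest_dir"/ || return 1
        moved=$((moved + 1))
    done
    # Print how many files were moved.
    echo "$moved"
}
```

Example use: `stash_system_solv /var/cache/PackageKit/25/hawkey /tmp/solv-stash`. Other repositories' cache files in the same directory are left alone, and with @System.solv gone PackageKit has to rebuild that cache on its next run, as noted in comment 22.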

Comment 50 Serge Droz 2017-04-19 17:48:10 UTC
Created attachment 1272693 [details]
@System.solv

Comment 51 Fedora End Of Life 2017-11-16 18:31:47 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 52 Fedora End Of Life 2017-12-12 10:23:16 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

