Bug 1183010 - packagekit either fails to complete transaction very often, or even breaks rpm database indices
Summary: packagekit either fails to complete transaction very often, or even breaks rp...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: PackageKit
Version: 21
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-01-16 12:26 UTC by Kamil Páral
Modified: 2015-01-20 23:10 UTC (History)
9 users (show)

Fixed In Version: libhif-0.1.8-1.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-20 21:03:10 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
issue 1 - journal (140.24 KB, text/plain)
2015-01-16 12:27 UTC, Kamil Páral
no flags Details
rpm -qa output (46.93 KB, text/plain)
2015-01-16 12:27 UTC, Kamil Páral
no flags Details
issue 2 - journal (129.97 KB, text/plain)
2015-01-16 14:51 UTC, Kamil Páral
no flags Details
issue 3 - journal (146.85 KB, text/plain)
2015-01-16 15:31 UTC, Kamil Páral
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1181501 0 unspecified CLOSED offline update failed in an unexpected way - Failed to find curl;7.37.0-12.fc21;x86_64;updates 2021-02-22 00:41:40 UTC

Internal Links: 1181501

Description Kamil Páral 2015-01-16 12:26:36 UTC
Description of problem:
When verifying bug 1181501, I stumbled upon other unexpected packagekit errors. I played with the system for a while, and found out that roughly 50% of all my packagekit install or update operations fail with some kind of transaction error. When I wanted to reproduce it properly and gather all important logs, I saw a different error and it seems my rpm database got completely corrupted.

I have no idea whether all those errors have a common root cause or not, I'll report everything I see in this report and we can dissect it with the help of developers.

This is the corruption issue (issue 1):

[kparal@localhost ~]$ pkcon install htop
Resolving                     [=========================]         
Starting                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Waiting for authentication    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
[kparal@localhost ~]$ pkcon remove htop
Resolving                     [=========================]         
Querying                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Removing                      [=========================]         
Waiting for authentication    [=========================]         
Finished                      [=========================]         
Fatal error: Error -1 running transaction
[kparal@localhost ~]$ pkcon remove htop
Resolving                     [=========================]         
Finished                      [=========================]         
Removing                      [=========================]         
Waiting for authentication    [=========================]         
Finished                      [=========================]         
Fatal error: failed to find package htop
[kparal@localhost ~]$ pkcon install htop
Resolving                     [=========================]         
Loading cache                 [=========================]         
Finished                      [=========================]         
Fatal error: htop-1.0.3-4.fc21.x86_64 is already installed
[kparal@localhost ~]$ sudo yum remove htop
error: db5 error(-30973) from dbenv->open: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

Error: rpmdb open failed
[kparal@localhost ~]$ rpm -q htop
htop-1.0.3-4.fc21.x86_64
[kparal@localhost ~]$ 



Version-Release number of selected component (if applicable):
fully updated F21, with libhif-0.1.7-3.fc21 installed
PackageKit-1.0.3-4.fc21.x86_64
rpm-4.12.0.1-4.fc21.x86_64
libhif-0.1.7-3.fc21.x86_64
hawkey-0.5.2-1.fc21.x86_64
librepo-1.7.11-1.fc21.x86_64
dnf-0.6.3-2.fc21.noarch
yum-3.4.3-153.fc21.noarch
libdb-5.3.28-9.fc21.x86_64

How reproducible:
not sure about this particular corruption issue, but if I run pkcon install and pkcon remove in a loop, I receive an error after just a few rounds, reliably. I always reset VM to the last snapshot and try again, to always run on a fresh and non-broken system.

Steps to Reproduce:
1. install clean F21 in a VM
2. fully update, include libhif-0.1.7-3.fc21 to have the latest and greatest stuff (note: I have seen similar errors even with older libhif)
3. make a snapshot
4. pick some package, e.g. htop (it's small)
5. run pkcon install htop
6. run pkcon remote htop
7. repeat steps 5 -6 in a loop until you get an error (that should be under a minute)

Comment 1 Kamil Páral 2015-01-16 12:27:16 UTC
Created attachment 980832 [details]
issue 1 - journal

Comment 2 Kamil Páral 2015-01-16 12:27:35 UTC
Created attachment 980833 [details]
rpm -qa output

Comment 3 Kamil Páral 2015-01-16 12:44:39 UTC
I just reproduced this exactly same problem (issue 1; I guess I'm being very lucky today) on a completely different computer while using virtualbox instead of libvirt/kvm for my VM. That rules out hardware issues or virtualization issues with the host.

Comment 4 Kamil Páral 2015-01-16 14:50:50 UTC
This is a different type of error - cannot create index (issue 2):

[kparal@localhost ~]$ C=1; while echo "Run $C ..." && pkcon install htop && pkcon remove htop; do C=$(($C + 1)); done
Run 1 ...
Resolving                     [=========================]         
Starting                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Waiting for authentication    [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
Fatal error: cannot create index for htop;1.0.3-4.fc21;x86_64;installed
[kparal@localhost ~]$ 

This is a very common issue, if PK fails, this is a very frequent error.

Comment 5 Kamil Páral 2015-01-16 14:51:20 UTC
Created attachment 980853 [details]
issue 2 - journal

Comment 6 Kamil Páral 2015-01-16 15:17:53 UTC
I have tried to bisect recent version of packages and find when it started to occur. This is my reproducer:

C=1; while echo "Run $C ..." && pkcon install htop && pkcon remove htop; do C=$(($C + 1)); done

Usually I see an error in the first 1-4 runs. If I see 20 transactions without an error, I consider that setup working.

F21 GOLD works. It contains:
dnf-0.6.1-1.fc21.noarch
hawkey-0.5.1-1.fc21.x86_64
libdb-5.3.28-9.fc21.x86_64
libhif-0.1.6-1.fc21.x86_64
librepo-1.7.5-2.fc21.x86_64
libsolv-0.6.4-3.fc21.x86_64
PackageKit-1.0.3-2.fc21.x86_64
rpm-4.12.0.1-3.fc21.x86_64
yum-3.4.3-153.fc21.noarch

F21 fully updated today doesn't work. It contains:
dnf-0.6.3-2.fc21.noarch
hawkey-0.5.2-1.fc21.x86_64
libdb-5.3.28-9.fc21.x86_64
libhif-0.1.7-1.fc21.x86_64
librepo-1.7.11-1.fc21.x86_64
libsolv-0.6.4-3.fc21.x86_64
PackageKit-1.0.3-4.fc21.x86_64
rpm-4.12.0.1-4.fc21.x86_64
yum-3.4.3-153.fc21.noarch

The minimal change to trigger the bug is to take F21 GOLD and update these packages:
libhif-0.1.6-1.fc21.x86_64 -> libhif-0.1.7-1.fc21.x86_64
librepo-1.7.5-2.fc21.x86_64 -> librepo-1.7.11-1.fc21.x86_64
PackageKit-1.0.3-2.fc21.x86_64 -> PackageKit-1.0.3-4.fc21.x86_64

If I update just libhif or just PackageKit, I don't see the bug. Only when both of them are updated, I see the bug. I need to update librepo as well because otherwise PackageKit crashes instantly on boot.

Or you can do it the other way round, take fully updated F21 and downgrade just PackageKit or just libhif. In both cases I suddenly can't reproduce the bug.

So it really seems to be related to the combination of PackageKit 1.0.3-4 + libhif 0.1.7-1. That doesn't mean that the bug can't originate from a completely different package, but these updates seem to trigger it while the previous ones didn't.

Comment 7 Kamil Páral 2015-01-16 15:30:52 UTC
And another very common error - Error -1 running transaction (issue 3):

[kparal@localhost ~]$ C=1; while echo "Run $C ..." && pkcon install htop && pkcon remove htop; do C=$(($C + 1)); done
Run 1 ...
<snip>
Run 2 ...
Resolving                     [=========================]         
Loading cache                 [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Waiting for authentication    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
Fatal error: Error -1 running transaction
[kparal@localhost ~]$

Comment 8 Kamil Páral 2015-01-16 15:31:48 UTC
Created attachment 980872 [details]
issue 3 - journal

Comment 9 Kamil Páral 2015-01-16 18:31:25 UTC
Richard created a new libhif which should hopefully fix this. It's in this repo:
http://people.freedesktop.org/~hughsient/fedora/

Unfortunately PackageKit fails to start with it, even when updated from Richard's repo:
packagekitd[1297]: Failed to load the backend: opening module hif failed : /usr/lib64/packagekit-backend/libpk_backend_hif.so: undefined symbol: hif_rc_to_gerror

So, waiting for Richard to fix it.

Comment 10 Richard Hughes 2015-01-16 19:27:41 UTC
Repo updated, sorry for the brain-fart.

Comment 11 DO NOT USE account not monitored (old adamwill) 2015-01-16 19:36:19 UTC
Still bad with libhif-0.1.8-0.112.20150116git.fc21.x86_64 , using kparal's reproducer: failed on run 3 with the 'Fatal error: cannot create index for htop;1.0.3-4.fc21;x86_64;install' error.

Comment 13 Kamil Páral 2015-01-19 08:07:31 UTC
I still see the same issue:

[kparal@localhost ~]$ C=1; while echo "Run $C ..." && pkcon install htop && pkcon remove htop; do C=$(($C + 1)); done
Run 1 ...
Resolving                     [=========================]         
Starting                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Waiting for authentication    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
Resolving                     [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Removing                      [=========================]         
Waiting for authentication    [=========================]         
Requesting data               [=========================]         
Finished                      [=========================]         
Run 2 ...
Resolving                     [=========================]         
Loading cache                 [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
Fatal error: cannot create index for htop;1.0.3-4.fc21;x86_64;installed
[kparal@localhost ~]$ rpm -q libhif
libhif-0.1.8-0.116.20150117git.fc21.x86_64
[kparal@localhost ~]$ 

In journal:
Jan 19 09:04:20 localhost.localdomain PackageKit[1297]: uid 1000 is trying to obtain org.freedesktop.packagekit.
Jan 19 09:04:20 localhost.localdomain PackageKit[1297]: new install-packages transaction /49_dbcdbcac scheduled 
Jan 19 09:04:20 localhost.localdomain PackageKit[1297]: uid 1000 obtained auth for org.freedesktop.packagekit.pa
Jan 19 09:04:23 localhost.localdomain packagekitd[1297]: mapping pread: Bad file descriptor
Jan 19 09:04:23 localhost.localdomain PackageKit[1297]: in /49_dbcdbcac for install-packages package htop;1.0.3-
Jan 19 09:04:23 localhost.localdomain PackageKit[1297]: install-packages transaction /49_dbcdbcac from uid 1000

Comment 14 Kamil Páral 2015-01-19 08:09:50 UTC
Note, I updated just libhif from Richard's utopia repo, not PackageKit (that seems to have broken deps with libappstream-glib).

Comment 15 Kamil Páral 2015-01-19 08:34:56 UTC
I've found out that matching libappstream-glib is built in rawhide:
http://koji.fedoraproject.org/koji/buildinfo?buildID=604510
so I used it and updated PackageKit as well. Unfortunately it still fails for me:

Run 7 ...
Resolving                     [=========================]         
Loading cache                 [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         
Fatal error: cannot create index for htop;1.0.3-4.fc21;x86_64;installed
[kparal@localhost ~]$ rpm -q libhif
libhif-0.1.8-0.116.20150117git.fc21.x86_64
[kparal@localhost ~]$ rpm -q PackageKit
PackageKit-1.0.4-0.1851.20150117git.fc21.x86_64
[kparal@localhost ~]$ 

Journal says:
Jan 19 09:30:23 localhost.localdomain PackageKit[1298]: new install-packages transaction /101_bacdaeea scheduled from uid 1000
Jan 19 09:30:23 localhost.localdomain PackageKit[1298]: uid 1000 obtained auth for org.freedesktop.packagekit.package-install
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: mapping pread: Bad file descriptor
Jan 19 09:30:25 localhost.localdomain PackageKit[1298]: in /101_bacdaeea for install-packages package htop;1.0.3-4.fc21;x86_64;fedora was installing for uid 1000
Jan 19 09:30:25 localhost.localdomain PackageKit[1298]: install-packages transaction /101_bacdaeea from uid 1000 finished with failed after 1848ms
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: rpmdb: BDB0151 fsync: Unknown error -1
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: db5 error(-1) from db->close: BDB0092 Unknown error: -1
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: rpmdb: BDB0151 fsync: Unknown error -1
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: rpmdb: BDB0164 close: Bad file descriptor
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: rpmdb: Packages: Bad file descriptor
Jan 19 09:30:25 localhost.localdomain packagekitd[1298]: error: db5 error(-1) from db->close: BDB0092 Unknown error: -1
Jan 19 09:30:25 localhost.localdomain polkitd[681]: Unregistered Authentication Agent for unix-process:4765:7321 (system bus name :1.129, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

Comment 16 Richard Hughes 2015-01-19 17:21:00 UTC
commit 783f1f9dc8aa904a14ba24ac2c337d19f64e2bbc
Author: Richard Hughes <richard>
Date:   Mon Jan 19 15:03:40 2015 +0000

    Do not attempt to read from a closed file descriptor
    
    This fixes a hard-to-trigger and even-harder-to-debug crash when installing
    and then removing the same package. This will fix errors in the journal like:
    
        error: rpmdb: Packages: Bad file descriptor
    
    Part of the yumdb core design is that the package 'key' is the SHA hash of the
    rpmdb package header. By setting this during the transaction we do not have to
    open the rpmdb when writing out the yumdb entry.
    
    It appears that hy_sack_load_system_repo() -> repo_add_repodata() ->
    repodata_empty() -> repodata_freedata() -> close() calls on the fd that we're
    trying to share between different threads. The pread() and pwrite() system
    calls were chosen in libsolv to make this possible, so we're not trying to
    do anything unsupported and crazy.
    
    We've got the rpm 'Header' already during the transaction, so it's easy to fix
    by just extracting it there rather than looking it up once the transaction has
    completed.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1183010

Comment 17 Fedora Update System 2015-01-19 19:00:15 UTC
PackageKit-1.0.4-1.fc21,libhif-0.1.8-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/PackageKit-1.0.4-1.fc21,libhif-0.1.8-1.fc21

Comment 18 Adam Williamson 2015-01-19 22:03:04 UTC
With the latest bits - from #c17 - I can no longer trigger any crashes or outright error messages, but it seems after a while PK gets confused about the current status of the package:

Run 3 ...
Resolving                     [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Waiting in queue              [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing packages           [=========================]         
Resolving                     [=========================]         
Querying                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Removing                      [=========================]         
Waiting for authentication    [=========================]         
Finished                      [=========================]         
Run 4 ...
Resolving                     [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Fatal error: htop-1.0.3-4.fc21.x86_64 is already installed
[root@localhost test]# rpm -q htop
package htop is not installed

that is, run 3's install/uninstall cycle completed, but then run 4's 'install' command claimed htop was already installed; rpm -q shows it is not. This doesn't look like it needs to hold up the update, but still seems to be not perfectly fixed yet.

Comment 19 Michael Catanzaro 2015-01-20 01:01:49 UTC
That's a serious issue, but I suspect it's unrelated to this bug. PackageKit has occasionally had trouble understanding what's installed and what isn't for quite some time.

Comment 20 Richard Hughes 2015-01-20 15:16:16 UTC
(In reply to Adam Williamson (Red Hat) from comment #18)
> that is, run 3's install/uninstall cycle completed, but then run 4's
> 'install' command claimed htop was already installed

So, this looks to be some kind of cache-invalidation issue. After a remove/install PackageKit is supposed to invalidate any sack cache with @system included. Can anyone that can repduce this issue please grab a:

sudo /usr/libexec/packagekitd --verbose --disable-timer --backend=hif

trace -- I then need the *whole* log right to the start.

Also, we should probably open a new bug for this issue, it's unrelated to any of the fixes. Thanks.

Richard

Comment 21 Fedora Update System 2015-01-20 21:03:10 UTC
libhif-0.1.8-1.fc21, PackageKit-1.0.4-1.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Adam Williamson 2015-01-20 23:10:13 UTC
For reference of anyone with broken offline updates, you can install the update with:

su -c 'yum update --advisory=FEDORA-2015-0921'

or:

sudo yum update --advisory=FEDORA-2015-0921

At present the update is syncing to mirrors so it may not be immediately available; if the above doesn't install anything, wait a few hours and try again. Sorry again for the trouble.


Note You need to log in before you can comment on or make changes to this bug.