Bug 1642796

Summary: PackageKit terminated before end of offline update: TransactionItem state is not set (when any multiarch package is installed)
Product: [Fedora] Fedora Reporter: Petr Schindler <pschindl>
Component: libdnfAssignee: rpm-software-management
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: urgent    
Version: 29CC: arcadiy, awilliam, bbigby64, davidmenhur, dmach, excieve, jmracek, jonathan, keramidasceid, klember, kparal, lruzicka, mail, mattdm, michal.jnn, mikhail.zabaluev, mluscon, rdieter, rhughes, robatino, rpm-software-management, sgallagh, smparrish
Target Milestone: ---Keywords: CommonBugs, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: RejectedBlocker https://fedoraproject.org/wiki/Common_F29_bugs#libdnf-crash-offline-updates
Fixed In Version: libdnf-0.22.0-8.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1649291 (view as bug list) Environment:
Last Closed: 2018-11-09 06:03:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1574713, 1649291    
Attachments:
Description Flags
output of journalctl -b -1 -a
none
output of journalctl -b -1 -a
none
#dnf history
none
#dnf history info last
none
/var/lib/dnf/history.sqlite
none
log of latest failed offline updates session none

Description Petr Schindler 2018-10-25 06:07:31 UTC
Created attachment 1497303 [details]
output of journalctl -b -1 -a

Description of problem:
I tested new build of gnome-software (https://bodhi.fedoraproject.org/updates/FEDORA-2018-b5a870dcff). I enabled updates testing, g-s showed some updates -> Download -> Restart&Update. Update didn't end properly and the system stuck (last message from packagekit was 97% done).

After I forced restart I looked to the journal and found this (time 16:45:40):
packagekitd[1014]: terminate called after throwing an instance of 'std::runtime_error'
packagekitd[1014]:   what():  TransactionItem state is not set: alsa-lib-1.1.6-3.fc29.x86_64

You can find the whole output attached.

Version-Release number of selected component (if applicable):
PackageKit-1.1.11-1.fc29.x86_64

How reproducible:
Not sure

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Kalev Lember 2018-10-25 06:46:33 UTC
This is something going wrong in libdnf. Is there anything in coredumpctl to get a backtrace of the crash? Reassigning to libdnf.

Also proposing as a F29 final blocker, as this prevents system updates from working.

Comment 2 Kamil Páral 2018-10-25 09:33:56 UTC
This seems the same error as in bug 1629340, which was fixed in:
dnf-3.5.1-1.fc29
dnf-plugins-core-3.0.3-1.fc29
libdnf-0.19.1-3.fc29

And Petr originally had:
dnf-3.6.1-2.fc29
dnf-plugins-core-3.0.4-1.fc29
libdnf-0.20.0-1.fc29

which seems fine. So why is this still happening?

Also please note that none of us have seen this yet and we don't have a reproducer for this.

Comment 3 Kamil Páral 2018-10-25 09:36:01 UTC
Correcting Petr's mistake.

Comment 4 Jaroslav Mracek 2018-10-25 11:35:20 UTC
I believe that the issue is solved with libdnf-0.22.0-6. Please can anyone reproduce it with latest libdnf and after restarting of PackageKit?

Comment 5 Matthew Miller 2018-10-25 11:55:23 UTC
I just updated my system and saw the same thing. I didn't update to latest libdnf, though. So I can confirm the original but not necessarily the fix.

Comment 6 Stephen Gallagher 2018-10-25 12:32:09 UTC
(In reply to Kamil Páral from comment #2)
> This seems the same error as in bug 1629340, which was fixed in:
> dnf-3.5.1-1.fc29
> dnf-plugins-core-3.0.3-1.fc29
> libdnf-0.19.1-3.fc29
> 
> And Petr originally had:
> dnf-3.6.1-2.fc29
> dnf-plugins-core-3.0.4-1.fc29
> libdnf-0.20.0-1.fc29
> 
> which seems fine. So why is this still happening?
> 
> Also please note that none of us have seen this yet and we don't have a
> reproducer for this.

It's the same error message, but looking at alsa-lib, it doesn't seem like it has any Obsoletes: specified at all, which was the source of that earlier bug. So I doubt it's the same issue.

Comment 7 Kamil Páral 2018-10-25 13:21:52 UTC
alsa-lib is mentioned probably because it's the first update alphabetically

Comment 8 Kamil Páral 2018-10-26 09:47:23 UTC
Discussed during 2018-10-25 Go/NoGo meeting [1]:

#agreed 1642796 - RejectedBlocker - Testers cannot reproduce the issue with up-to-date systems

[1] https://meetbot.fedoraproject.org/teams/f29-final-go_no_go-meeting/f29-final-go_no_go-meeting.2018-10-25-17.03.html

Comment 9 Mikhail Zabaluev 2018-11-02 06:30:40 UTC
This has occurred to me when trying to update packages after upgrading to F29:

Nov 02 05:53:18 packagekitd[947]: terminate called after throwing an instance of 'std::runtime_error'
Nov 02 05:53:18 packagekitd[947]:   what():  TransactionItem state is not set: cairo-1.15.14-1.fc29.x86_64

Comment 10 Mikhail Zabaluev 2018-11-02 06:37:32 UTC
(In reply to Mikhail Zabaluev from comment #9)
> This has occurred to me when trying to update packages after upgrading to
> F29:

libdnf is at 0.22.0-6.fc29 and was at that release before the upgrade.

Comment 11 Jaroslav Mracek 2018-11-02 08:08:59 UTC
Please can you provide following information:

Can you attach outputs from:
"dnf history"
"dnf history info <number>" where number should be the transaction performed by packagekit.

Please can you provide a DB - ls /var/lib/dnf/history.sqlite ?

Comment 12 Petr Schindler 2018-11-02 08:25:27 UTC
The same problem happened to me again yesterday. I will attach information asked in comment 11.

Version: 
libdnf-0.22.0-6.fc29.x86_64

Comment 13 Petr Schindler 2018-11-02 08:26:04 UTC
Created attachment 1500406 [details]
output of journalctl -b -1 -a

Comment 14 Petr Schindler 2018-11-02 08:28:51 UTC
Created attachment 1500411 [details]
#dnf history

Comment 15 Petr Schindler 2018-11-02 08:29:20 UTC
Created attachment 1500420 [details]
#dnf history info last

Comment 16 Petr Schindler 2018-11-02 08:29:56 UTC
Created attachment 1500421 [details]
/var/lib/dnf/history.sqlite

Comment 17 Matthew Miller 2018-11-02 17:02:17 UTC
This is still happening to me too, with 
libdnf-0.22.0-6.fc29.x86_64

Comment 18 Fedora Blocker Bugs Application 2018-11-02 17:04:16 UTC
Proposed as a Blocker for 30-beta by Fedora user mattdm using the blocker tracking app because:

 Updates aren't functional.

Comment 19 Adam Williamson 2018-11-03 17:30:26 UTC
I just managed to reproduce this error in dnf by accident while testing something else.

The scenario was quite elaborate, though most parts of it probably aren't relevant, but here it is:

1. I'm inside a mock.
2. Inside the mock is a filesystem image, loopback mounted as /mnt/sysimage .
3. From the mock root, I'm doing this over and over:
   dnf --installroot /mnt/sysimage/ reinstall kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64.rpm

One time, I had replaced an executable used by kernel-core's scriptlets, but forgot to mark the replacement as executable, so the scriptlets failed:

Reinstalled: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64
  Running scriptlet: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64                                                  2/2 
/var/tmp/rpm-tmp.9MeBzi: line 1: /bin/kernel-install: Permission denied
error: %preun(kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64) scriptlet failed, exit status 126

Error in PREUN scriptlet in rpm package kernel-core
error: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64: erase failed
/var/tmp/rpm-tmp.bVoMOd: line 1: /bin/kernel-install: Permission denied
warning: %posttrans(kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64) scriptlet failed, exit status 126

Error in POSTTRANS scriptlet in rpm package kernel-core

When I then fixed this (made the replacement file executable then ran the reinstall again), I hit this error:

Traceback (most recent call last):
  File "/usr/bin/dnf", line 58, in <module>
    main.user_main(sys.argv[1:], exit_code=True)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 179, in user_main
    errcode = main(args)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 64, in main
    return _main(base, args, cli_class, option_parser_class)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 99, in _main
    return cli_run(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 123, in cli_run
    ret = resolving(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 154, in resolving
    base.do_transaction(display=displays)
  File "/usr/lib/python3.7/site-packages/dnf/cli/cli.py", line 235, in do_transaction
    tid = super(BaseCli, self).do_transaction(display)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 858, in do_transaction
    tid = self._run_transaction(cb=cb)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 1008, in _run_transaction
    self._verify_transaction(cb.verify_tsi_package)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 1046, in _verify_transaction
    self.history.end(rpmdbv, 0)
  File "/usr/lib/python3.7/site-packages/dnf/db/history.py", line 498, in end
    bool(return_code)
  File "/usr/lib64/python3.7/site-packages/libdnf/transaction.py", line 758, in endTransaction
    return _transaction.Swdb_endTransaction(self, dtEnd, rpmdbVersionEnd, state)
RuntimeError: TransactionItem state is not set: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64

So - I wonder if this bug can occur on the *next* dnf transaction involving a package whose scriptlets failed in a previous transaction?

Comment 20 Matthew Miller 2018-11-05 20:57:00 UTC
It's definitely not just the *next* transaction. This happens to me every time with offline updates, and online dnf transactions in between seem to be fine.

Will attach journal from last update. (Ends with me hitting ctrl-alt-del)

Comment 21 Matthew Miller 2018-11-05 20:58:05 UTC
Created attachment 1502215 [details]
log of latest failed offline updates session

Comment 22 Artem Hluvchynskyi 2018-11-05 22:04:47 UTC
Same happens to me with every offline update after going 28->29. The package in "TransactionItem state is not set" error is always different. Directly using dnf (update or install) doesn't cause this currently.

Comment 23 Petr Schindler 2018-11-06 07:08:57 UTC
I can confirm that this happens to me in every offline update. Dnf works fine.

Comment 24 Daniel Mach 2018-11-06 14:02:52 UTC
I wrote a patch for the "TransactionItem state is not set" issue:
https://github.com/rpm-software-management/libdnf/pull/627
I's fixing PackageKit.

reproducer on F29:
$ dnf distro-sync
# it happens when there are multilib packages installed (same name, different arch)
$ dnf install alsa-lib.i686 alsa-lib.x86_64
$ koji download-build --arch=x86_64 --arch=i686 alsa-lib-1.1.7-1.fc29
$ rpm -Uvh ./alsa-lib-1.1.7-1.fc29.i686.rpm ./alsa-lib-1.1.7-1.fc29.x86_64.rpm --force
$ systemctl restart packagekit
$ pkcon update
$ systemctl status packagekit



jmracek is working on a fix for problem reported in comment#19.
It's a different issue than the one originally reported:
When a transaction fails in the middle (reboot, sigkill, problem with dnf/libdnf), rpmdb ends up with duplicate records.
It's a broken state of rpmdb that you cannot achieve under normal conditions.
The history database doesn't handle this situation correctly, it always changes the state of the first occurence of the NEVRA and that leads to "TransactionItem state is not set" issues for the duplicates.
Next run always passes as the duplicates were removed from the rpmdb during the previous run.

Comment 25 Jaroslav Mracek 2018-11-06 15:35:26 UTC
I created a patch for problem described in Comment 19 (https://github.com/rpm-software-management/dnf/pull/1260).

Comment 26 Matthew Miller 2018-11-06 18:00:23 UTC
We should update https://fedoraproject.org/wiki/Common_F29_bugs#libdnf-crash-offline-updates. It definitely isn't fixed in the libdnf version listed there. And we should probably mention the "97%" thing.

A couple of questions.

1. Daniel, it sounds like this is a problem with multi-arch handling? Or am I misunderstanding your comment?

2. How did this regression happen? How did we not catch it earlier?

3. I've been hitting ctrl-alt-delete once I'm at the stuck point. Um, everything seems fine. What's that last 3%?

Comment 27 Adam Williamson 2018-11-06 18:13:37 UTC
The case where the db had duplicate entries was actually already filed before:

https://bugzilla.redhat.com/show_bug.cgi?id=1627534

but it wound up getting closed as a dupe when apparently it shouldn't have been. For that case, perhaps we should re-open that bug, and mark Jaroslav's patch as fixing that bug, and keep this bug strictly for the multiarch case?

Comment 28 Adam Williamson 2018-11-06 18:19:39 UTC
Well, actually, the initial report there was of the known grub2 case that got fixed, then Zbigniew added a comment which turned out to be the 'duplicate entry' case...so for clarity, I've just filed a new bug for that case:

https://bugzilla.redhat.com/show_bug.cgi?id=1647144

let's keep this one strictly for the multilib issue. Thanks!

Comment 29 Adam Williamson 2018-11-06 20:39:21 UTC
I updated the commonbugs entry also.

I suspect the answer to Matt's 2) is at least partly "no-one is testing multilib updates as a matter of course"...

Comment 30 Matthew Miller 2018-11-06 21:05:09 UTC
(In reply to Adam Williamson from comment #29)
> I suspect the answer to Matt's 2) is at least partly "no-one is testing
> multilib updates as a matter of course"...

Ouch. Can we add "multilib updates work" as a test case (and possibly a release criterion)?

Comment 31 Adam Williamson 2018-11-06 22:39:48 UTC
we...could, I think it might be a bit complex though. The trick with update tests is you need updates; the tests can't usually just rely on there being some updates of the required kind, we have to dummy them up. For the formal update test case and the openQA implementation of it I currently use a dummy python3-kickstart package (which is obviously noarch). I could try replacing that or augmenting it with a dummy multiarch package, but I'm not sure how easy it'd be off the bat, I'd have to take a look at it...

Comment 32 Daniel Mach 2018-11-07 09:06:00 UTC
Matthew,
We'd like make sure your problem is fixed completely.
Could you provide us data from your system that would allow
us to clone it and test the code in an indentical environment?

`rpm -qa` output
`dnf check` output
/etc/dnf/*
/etc/yum.repos.d/*
/var/lib/rpm/*
/var/lib/dnf/history.sqlite


(In reply to Matthew Miller from comment #26)
To your questions:

> A couple of questions.
> 
> 1. Daniel, it sounds like this is a problem with multi-arch handling? Or am
> I misunderstanding your comment?
Correct.
As Adam mentioned in comment#30, this would probably deserve running a test or two for both dnf and PackageKit as release criteria.

> 
> 2. How did this regression happen? How did we not catch it earlier?
I believe the bug was there since we introduced the new history database in F29 Rawhide, which was in June. It looks like most Rawhide and Branched users use dnf rather than PackageKit and that's why it wasn't discovered earlier.

> 
> 3. I've been hitting ctrl-alt-delete once I'm at the stuck point. Um,
> everything seems fine. What's that last 3%?
When transaction is finished, a check is performed that all transaction items are processed. The check failed and it failed after everything was on disk already. This means that after you hit ctrl-alt-delete and reboot, your system should be all fine.

Comment 33 Kalev Lember 2018-11-07 13:28:01 UTC
*** Bug 1647435 has been marked as a duplicate of this bug. ***

Comment 34 Jaroslav Mracek 2018-11-07 16:48:59 UTC
I tested the patches according to conditions from Comments 13 to 16 and without the patches and it exhibit the issue and after applying them the issue disappear. The test was performed with Gnome software and transactions after reboot. Therefore I believe that the problem is fixed. I also try to test the issue from Comment 19 and it looks like that the problem is also fixed with DNF patch. The patches are part of DNF-4.0.4-2 and libdnf-0.22.0-8 and I am creating a bodhi update for Fedora 29. Please could anyone verify both case?

Comment 35 Fedora Update System 2018-11-07 16:49:33 UTC
libdnf-0.22.0-8.fc29 dnf-4.0.4-2.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-32186e8871

Comment 36 Fedora Update System 2018-11-08 05:20:31 UTC
dnf-4.0.4-2.fc29, libdnf-0.22.0-8.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-32186e8871

Comment 37 Petr Schindler 2018-11-08 14:21:51 UTC
The problem is fixed with libdnf-0.22.0-8.fc29. Update finished completely and system rebooted. Great, thanks for fix.

Comment 38 Lukas Ruzicka 2018-11-08 15:10:18 UTC
I can also confirm, that this is fixed. Thanks.

Comment 39 Fedora Update System 2018-11-09 06:03:34 UTC
dnf-4.0.4-2.fc29, libdnf-0.22.0-8.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 40 Kalev Lember 2018-11-13 07:06:32 UTC
*** Bug 1649124 has been marked as a duplicate of this bug. ***

Comment 41 Jaroslav Mracek 2018-11-26 11:41:46 UTC
*** Bug 1652996 has been marked as a duplicate of this bug. ***

Comment 42 Daniel Mach 2019-03-01 19:49:40 UTC
*** Bug 1648741 has been marked as a duplicate of this bug. ***

Comment 43 Daniel Mach 2019-03-08 15:29:05 UTC
*** Bug 1680975 has been marked as a duplicate of this bug. ***