Bug 1642796
Summary: PackageKit terminated before end of offline update: TransactionItem state is not set (when any multiarch package is installed)
Product: [Fedora] Fedora
Reporter: Petr Schindler <pschindl>
Component: libdnf
Assignee: rpm-software-management
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: urgent
Version: 29
CC: arcadiy, awilliam, bbigby64, davidmenhur, dmach, excieve, jmracek, jonathan, keramidasceid, klember, kparal, lruzicka, mail, mattdm, michal.jnn, mikhail.zabaluev, mluscon, rdieter, rhughes, robatino, rpm-software-management, sgallagh, smparrish
Target Milestone: ---
Keywords: CommonBugs, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: RejectedBlocker https://fedoraproject.org/wiki/Common_F29_bugs#libdnf-crash-offline-updates
Fixed In Version: libdnf-0.22.0-8.fc29
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1649291
Environment:
Last Closed: 2018-11-09 06:03:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1574713, 1649291
Attachments:

Description
Petr Schindler
2018-10-25 06:07:31 UTC
This is something going wrong in libdnf. Is there anything in coredumpctl to get a backtrace of the crash? Reassigning to libdnf.

Also proposing as a F29 final blocker, as this prevents system updates from working.

This seems the same error as in bug 1629340, which was fixed in:
dnf-3.5.1-1.fc29
dnf-plugins-core-3.0.3-1.fc29
libdnf-0.19.1-3.fc29

And Petr originally had:
dnf-3.6.1-2.fc29
dnf-plugins-core-3.0.4-1.fc29
libdnf-0.20.0-1.fc29

which seems fine. So why is this still happening?

Also please note that none of us have seen this yet and we don't have a reproducer for this.

Correcting Petr's mistake.

I believe that the issue is solved with libdnf-0.22.0-6. Please can anyone reproduce it with latest libdnf and after restarting of PackageKit?

I just updated my system and saw the same thing. I didn't update to latest libdnf, though. So I can confirm the original but not necessarily the fix.

(In reply to Kamil Páral from comment #2)
> This seems the same error as in bug 1629340, which was fixed in:
> dnf-3.5.1-1.fc29
> dnf-plugins-core-3.0.3-1.fc29
> libdnf-0.19.1-3.fc29
>
> And Petr originally had:
> dnf-3.6.1-2.fc29
> dnf-plugins-core-3.0.4-1.fc29
> libdnf-0.20.0-1.fc29
>
> which seems fine. So why is this still happening?
>
> Also please note that none of us have seen this yet and we don't have a
> reproducer for this.

It's the same error message, but looking at alsa-lib, it doesn't seem like it has any Obsoletes: specified at all, which was the source of that earlier bug. So I doubt it's the same issue.

alsa-lib is probably mentioned because it's the first update alphabetically.

Discussed during 2018-10-25 Go/NoGo meeting [1]:

#agreed 1642796 - RejectedBlocker - Testers cannot reproduce the issue with up-to-date systems

[1] https://meetbot.fedoraproject.org/teams/f29-final-go_no_go-meeting/f29-final-go_no_go-meeting.2018-10-25-17.03.html

This has occurred to me when trying to update packages after upgrading to F29:

Nov 02 05:53:18 packagekitd[947]: terminate called after throwing an instance of 'std::runtime_error'
Nov 02 05:53:18 packagekitd[947]: what(): TransactionItem state is not set: cairo-1.15.14-1.fc29.x86_64

(In reply to Mikhail Zabaluev from comment #9)
> This has occurred to me when trying to update packages after upgrading to
> F29:

libdnf is at 0.22.0-6.fc29 and was at that release before the upgrade.

Could you please provide the following information? Please attach the output of:
"dnf history"
"dnf history info <number>"
where <number> is the transaction performed by PackageKit. Could you also provide the history database, /var/lib/dnf/history.sqlite?

The same problem happened to me again yesterday. I will attach the information asked for in comment 11.
Version: libdnf-0.22.0-6.fc29.x86_64

Created attachment 1500406
output of journalctl -b -1 -a

Created attachment 1500411
#dnf history

Created attachment 1500420
#dnf history info last

Created attachment 1500421
/var/lib/dnf/history.sqlite

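For anyone sifting through a saved journal (for example the output of journalctl -b -1 -a attached above), a small script can list which packages the error names. This is only a sketch: the pattern is taken from the two packagekitd lines quoted in this report, and the helper name find_unset_items is made up for illustration.

```python
import re

# Pattern based on the message quoted in this report; the package
# identifier is whatever non-space text follows the colon.
ERROR_RE = re.compile(r"TransactionItem state is not set: (\S+)")


def find_unset_items(journal_text):
    """Return the package identifiers named in 'state is not set' errors, in order."""
    return ERROR_RE.findall(journal_text)


# Sample input: the two journal lines quoted earlier in this report.
sample = (
    "Nov 02 05:53:18 packagekitd[947]: terminate called after throwing "
    "an instance of 'std::runtime_error'\n"
    "Nov 02 05:53:18 packagekitd[947]: what(): TransactionItem state is "
    "not set: cairo-1.15.14-1.fc29.x86_64\n"
)

print(find_unset_items(sample))
```

Running it over journals from several failed offline updates would show whether the named package changes from run to run.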
This is still happening to me too, with libdnf-0.22.0-6.fc29.x86_64

Proposed as a Blocker for 30-beta by Fedora user mattdm using the blocker tracking app because:

Updates aren't functional.

I just managed to reproduce this error in dnf by accident while testing something else. The scenario was quite elaborate, though most parts of it probably aren't relevant, but here it is:

1. I'm inside a mock.
2. Inside the mock is a filesystem image, loopback mounted as /mnt/sysimage .
3. From the mock root, I'm doing this over and over:

dnf --installroot /mnt/sysimage/ reinstall kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64.rpm

One time, I had replaced an executable used by kernel-core's scriptlets, but forgot to mark the replacement as executable, so the scriptlets failed:

  Reinstalled: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64
  Running scriptlet: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64 2/2
/var/tmp/rpm-tmp.9MeBzi: line 1: /bin/kernel-install: Permission denied
error: %preun(kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64) scriptlet failed, exit status 126
Error in PREUN scriptlet in rpm package kernel-core
error: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64: erase failed
/var/tmp/rpm-tmp.bVoMOd: line 1: /bin/kernel-install: Permission denied
warning: %posttrans(kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64) scriptlet failed, exit status 126
Error in POSTTRANS scriptlet in rpm package kernel-core

When I then fixed this (made the replacement file executable then ran the reinstall again), I hit this error:

Traceback (most recent call last):
  File "/usr/bin/dnf", line 58, in <module>
    main.user_main(sys.argv[1:], exit_code=True)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 179, in user_main
    errcode = main(args)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 64, in main
    return _main(base, args, cli_class, option_parser_class)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 99, in _main
    return cli_run(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 123, in cli_run
    ret = resolving(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 154, in resolving
    base.do_transaction(display=displays)
  File "/usr/lib/python3.7/site-packages/dnf/cli/cli.py", line 235, in do_transaction
    tid = super(BaseCli, self).do_transaction(display)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 858, in do_transaction
    tid = self._run_transaction(cb=cb)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 1008, in _run_transaction
    self._verify_transaction(cb.verify_tsi_package)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 1046, in _verify_transaction
    self.history.end(rpmdbv, 0)
  File "/usr/lib/python3.7/site-packages/dnf/db/history.py", line 498, in end
    bool(return_code)
  File "/usr/lib64/python3.7/site-packages/libdnf/transaction.py", line 758, in endTransaction
    return _transaction.Swdb_endTransaction(self, dtEnd, rpmdbVersionEnd, state)
RuntimeError: TransactionItem state is not set: kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64

So - I wonder if this bug can occur on the *next* dnf transaction involving a package whose scriptlets failed in a previous transaction?

It's definitely not just the *next* transaction. This happens to me every time with offline updates, and online dnf transactions in between seem to be fine. Will attach journal from last update. (Ends with me hitting ctrl-alt-del)

Created attachment 1502215
log of latest failed offline updates session

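The bug summary ties this crash to systems where a multiarch package is installed, that is, the same package name installed for more than one architecture. A rough way to check whether a machine is in that situation is to group installed packages by name and look for names with several architectures. The sketch below assumes its input is one "name arch" pair per line, such as the output of rpm -qa --queryformat '%{NAME} %{ARCH}\n'; the function name multiarch_names is hypothetical and the sample data is made up.

```python
from collections import defaultdict


def multiarch_names(name_arch_lines):
    """Map each package name to its sorted arches, keeping only names with more than one arch."""
    arches = defaultdict(set)
    for line in name_arch_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, arch = parts
        arches[name].add(arch)
    return {name: sorted(found) for name, found in arches.items() if len(found) > 1}


# Made-up sample standing in for real rpm output.
sample = [
    "alsa-lib x86_64",
    "alsa-lib i686",
    "bash x86_64",
    "python3-kickstart noarch",
]

print(multiarch_names(sample))
```

An empty result suggests the multiarch condition does not apply to that system, though it says nothing about other causes of the same error message.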
Same happens to me with every offline update after going 28->29. The package in the "TransactionItem state is not set" error is always different. Directly using dnf (update or install) doesn't cause this currently.

I can confirm that this happens to me in every offline update. Dnf works fine.

I wrote a patch for the "TransactionItem state is not set" issue:
https://github.com/rpm-software-management/libdnf/pull/627
It's fixing PackageKit.

Reproducer on F29:
$ dnf distro-sync
# it happens when there are multilib packages installed (same name, different arch)
$ dnf install alsa-lib.i686 alsa-lib.x86_64
$ koji download-build --arch=x86_64 --arch=i686 alsa-lib-1.1.7-1.fc29
$ rpm -Uvh ./alsa-lib-1.1.7-1.fc29.i686.rpm ./alsa-lib-1.1.7-1.fc29.x86_64.rpm --force
$ systemctl restart packagekit
$ pkcon update
$ systemctl status packagekit

jmracek is working on a fix for the problem reported in comment#19. It's a different issue than the one originally reported:
When a transaction fails in the middle (reboot, sigkill, problem with dnf/libdnf), rpmdb ends up with duplicate records. It's a broken state of rpmdb that you cannot achieve under normal conditions. The history database doesn't handle this situation correctly: it always changes the state of the first occurrence of the NEVRA, and that leads to "TransactionItem state is not set" issues for the duplicates. The next run always passes, as the duplicates were removed from the rpmdb during the previous run.

I created a patch for the problem described in Comment 19 (https://github.com/rpm-software-management/dnf/pull/1260).

We should update https://fedoraproject.org/wiki/Common_F29_bugs#libdnf-crash-offline-updates. It definitely isn't fixed in the libdnf version listed there. And we should probably mention the "97%" thing.

A couple of questions.

1. Daniel, it sounds like this is a problem with multi-arch handling? Or am I misunderstanding your comment?

2. How did this regression happen? How did we not catch it earlier?

3. I've been hitting ctrl-alt-delete once I'm at the stuck point. Um, everything seems fine. What's that last 3%?

The case where the db had duplicate entries was actually already filed before:

https://bugzilla.redhat.com/show_bug.cgi?id=1627534

but it wound up getting closed as a dupe when apparently it shouldn't have been. For that case, perhaps we should re-open that bug, and mark Jaroslav's patch as fixing that bug, and keep this bug strictly for the multiarch case?

Well, actually, the initial report there was of the known grub2 case that got fixed, then Zbigniew added a comment which turned out to be the 'duplicate entry' case...so for clarity, I've just filed a new bug for that case:

https://bugzilla.redhat.com/show_bug.cgi?id=1647144

let's keep this one strictly for the multilib issue.

Thanks! I updated the commonbugs entry also.

I suspect the answer to Matt's 2) is at least partly "no-one is testing multilib updates as a matter of course"...

(In reply to Adam Williamson from comment #29)
> I suspect the answer to Matt's 2) is at least partly "no-one is testing
> multilib updates as a matter of course"...

Ouch. Can we add "multilib updates work" as a test case (and possibly a release criterion)?

we...could, I think it might be a bit complex though. The trick with update tests is you need updates; the tests can't usually just rely on there being some updates of the required kind, we have to dummy them up. For the formal update test case and the openQA implementation of it I currently use a dummy python3-kickstart package (which is obviously noarch). I could try replacing that or augmenting it with a dummy multiarch package, but I'm not sure how easy it'd be off the bat, I'd have to take a look at it...

Matthew,
We'd like to make sure your problem is fixed completely. Could you provide us with data from your system that would allow us to clone it and test the code in an identical environment?

`rpm -qa` output
`dnf check` output
/etc/dnf/*
/etc/yum.repos.d/*
/var/lib/rpm/*
/var/lib/dnf/history.sqlite

(In reply to Matthew Miller from comment #26)
To your questions:

> A couple of questions.
>
> 1. Daniel, it sounds like this is a problem with multi-arch handling? Or am
> I misunderstanding your comment?

Correct. As Adam mentioned in comment#30, this would probably deserve running a test or two for both dnf and PackageKit as release criteria.

>
> 2. How did this regression happen? How did we not catch it earlier?

I believe the bug has been there since we introduced the new history database in F29 Rawhide, which was in June. It looks like most Rawhide and Branched users use dnf rather than PackageKit, and that's why it wasn't discovered earlier.

>
> 3. I've been hitting ctrl-alt-delete once I'm at the stuck point. Um,
> everything seems fine. What's that last 3%?

When the transaction is finished, a check is performed that all transaction items have been processed. The check failed, and it failed after everything was already on disk. This means that after you hit ctrl-alt-delete and reboot, your system should be all fine.

*** Bug 1647435 has been marked as a duplicate of this bug. ***

I tested the patches under the conditions from Comments 13 to 16: without the patches the issue appears, and after applying them it disappears. The test was performed with GNOME Software and transactions after reboot, so I believe that the problem is fixed.

I also tested the issue from Comment 19, and it looks like that problem is fixed by the DNF patch as well.

The patches are part of dnf-4.0.4-2 and libdnf-0.22.0-8, and I am creating a Bodhi update for Fedora 29. Could anyone please verify both cases?

libdnf-0.22.0-8.fc29 dnf-4.0.4-2.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-32186e8871

dnf-4.0.4-2.fc29, libdnf-0.22.0-8.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-32186e8871

The problem is fixed with libdnf-0.22.0-8.fc29. The update finished completely and the system rebooted. Great, thanks for the fix.

I can also confirm that this is fixed. Thanks.

dnf-4.0.4-2.fc29, libdnf-0.22.0-8.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 1649124 has been marked as a duplicate of this bug. ***

*** Bug 1652996 has been marked as a duplicate of this bug. ***

*** Bug 1648741 has been marked as a duplicate of this bug. ***

*** Bug 1680975 has been marked as a duplicate of this bug. ***
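The duplicate-record explanation given in this report (the state is always changed on the first occurrence of a NEVRA, then an end-of-transaction check finds an item with no state) can be illustrated with a toy model. To be clear, this is not libdnf or dnf code: the Item class and all function names are invented for illustration, and the second marking function is just one conceivable alternative for contrast, not a description of what the linked pull requests actually changed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Toy stand-in for a transaction item (hypothetical, not the libdnf type)."""
    nevra: str
    state: Optional[str] = None  # None models "state is not set"


def mark_first_match(items, nevra, state="DONE"):
    """Behaviour described in the report: always update the first item with this NEVRA."""
    for item in items:
        if item.nevra == nevra:
            item.state = state
            return


def mark_first_unset_match(items, nevra, state="DONE"):
    """Illustrative alternative: update the first matching item that has no state yet."""
    for item in items:
        if item.nevra == nevra and item.state is None:
            item.state = state
            return


def end_transaction(items):
    """Final check: every item must have a state, otherwise raise like the reported error."""
    for item in items:
        if item.state is None:
            raise RuntimeError("TransactionItem state is not set: " + item.nevra)


def run(mark):
    nevra = "kernel-core-4.20.0-0.rc0.git8.1.fc30.x86_64"
    items = [Item(nevra), Item(nevra)]  # duplicate records for one NEVRA
    for _ in items:  # one "package processed" notification per record
        mark(items, nevra)
    try:
        end_transaction(items)
        return "ok"
    except RuntimeError as exc:
        return str(exc)


print(run(mark_first_match))        # second duplicate is never marked
print(run(mark_first_unset_match))  # both duplicates get a state
```

In this model the failure only shows up in the final check, after all the per-package work has already happened, which matches the observation that systems were fine after rebooting from the stuck point.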