Bug 1631533 - dnf and packagekitd can crash when libdnf swdb is locked by another process
Summary: dnf and packagekitd can crash when libdnf swdb is locked by another process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libdnf
Version: 29
Hardware: All
OS: Linux
high
high
Target Milestone: ---
Assignee: Daniel Mach
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker https://fedoraproject...
: 1608686 1630488 1632822 1634674 1636886 1639965 1641261 1676440 1702423 1708594 1709198 1710086 1716315 1730193 1730416 1741097 (view as bug list)
Depends On:
Blocks: 1632521
TreeView+ depends on / blocked
 
Reported: 2018-09-20 19:39 UTC by Adam Williamson
Modified: 2019-08-31 01:38 UTC (History)
37 users (show)

Fixed In Version: libdnf-0.31.0-6.fc29
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1632521 (view as bug list)
Environment:
Last Closed: 2019-08-31 01:38:26 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
backtrace of the crash (15.40 KB, text/plain)
2018-09-20 19:39 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2018-09-20 19:39:49 UTC
Created attachment 1485270 [details]
backtrace of the crash

packagekitd seems to crash when it tries to do something while another process is locking the libdnf swdb. I saw this twice in a VM while I was trying to debug *another* crash with abrt - presumably abrt was locking the swdb while installing debugging symbols.

Here's journal messages from around the time of the crash:

Sep 20 12:03:07 localhost-live.happyassassin.net gnome-software[9072]: Only 3 apps for recent list, hiding
Sep 20 12:03:07 localhost-live.happyassassin.net packagekitd[10758]: terminate called after throwing an instance of 'SQLite3::LibException'
Sep 20 12:03:07 localhost-live.happyassassin.net packagekitd[10758]:   what():  Exec failed: database is locked
Sep 20 12:03:07 localhost-live.happyassassin.net audit[10758]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:rpm_t:s0 pid=10758 comm="packagekitd" exe="/usr/libexec/packagekitd" sig=6 res=1
Sep 20 12:03:07 localhost-live.happyassassin.net systemd[1]: Started Process Core Dump (PID 12002/UID 0).
Sep 20 12:03:07 localhost-live.happyassassin.net audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@2-12002-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 20 12:03:07 localhost-live.happyassassin.net gnome-software[9072]: couldn't parse execption 'org.freedesktop.DBus.Error.NoReply', please report
Sep 20 12:03:07 localhost-live.happyassassin.net gnome-software[9072]: not handling error failed for action search-provides: Message recipient disconnected from message bus without replying
Sep 20 12:03:07 localhost-live.happyassassin.net audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=packagekit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Sep 20 12:03:07 localhost-live.happyassassin.net systemd[1]: packagekit.service: Main process exited, code=dumped, status=6/ABRT
Sep 20 12:03:07 localhost-live.happyassassin.net systemd[1]: packagekit.service: Failed with result 'core-dump'.

I will attach the backtrace of the crash.

Comment 1 Adam Williamson 2018-09-20 19:42:10 UTC
Looks like it crashed here:

https://github.com/hughsie/PackageKit/blob/master/backends/dnf/pk-backend-dnf.c#L471

Comment 2 Adam Williamson 2018-09-20 19:43:38 UTC
https://retrace.fedoraproject.org/faf/reports/2315896/ is the FAF for this. I couldn't get abrt to report it properly, had to generate the backtrace via coredumpctl.

Comment 3 Adam Williamson 2018-09-20 19:48:45 UTC
I can reproduce broadly the same crash on demand by running 'dnf -y install (something)' in one tab, and 'pkcon -y install (something else)' in another tab. So long as the timing is right, packagekitd crashes.

Comment 4 Kalev Lember 2018-09-20 19:57:06 UTC
Reassigning to libdnf. Looks like it's crashing somewhere in swdb code.

Comment 5 Adam Williamson 2018-09-20 20:00:14 UTC
PackageKit seems to do some locking via libdnf in pk_backend_repo_set_data_thread , but we don't go through that in this backtrace, AFAICS. We go through pk_transaction_run -> pk_backend_get_update_detail -> pk_backend_job_set_context , where the crash happens. Nothing in that chain does any locking AFAICS.

Comment 6 Adam Williamson 2018-09-20 20:07:33 UTC
note the PK code here is like two years old, so the thing that changed here seems to be the introduction of swdb. If libdnf wants consumers to do some locking around calls that now use swdb, that should've been communicated, I guess.

Comment 7 Adam Williamson 2018-09-20 20:28:02 UTC
To be specific, the libdnf-side bits here all basically date to e2f2862bedd0c278ed9781624ddc6861686a4760 , "[swdb]: C++ implementation with SWIG bindings."

Comment 8 Adam Williamson 2018-09-20 20:34:39 UTC
Aha, I was able to produce a very similar crash in dnf on another try, using a combination of commands suggested by George Goffe in https://bugzilla.redhat.com/show_bug.cgi?id=1596827#c32 . I ran `dnf -y reinstall libreoffice*` in one tab, and `dnf upgrade` in another tab. The `dnf upgrade` crashed:

[root@localhost-live tmp]# dnf upgrade
Last metadata expiration check: 1:59:58 ago on Thu 20 Sep 2018 11:32:14 AM PDT.
Traceback (most recent call last):
  File "/usr/bin/dnf", line 58, in <module>
    main.user_main(sys.argv[1:], exit_code=True)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 179, in user_main
    errcode = main(args)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 64, in main
    return _main(base, args, cli_class, option_parser_class)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 99, in _main
    return cli_run(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 123, in cli_run
    ret = resolving(cli, base)
  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 146, in resolving
    base.resolve(cli.demands.allow_erasing)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 736, in resolve
    self._transaction = self._goal2transaction(goal)
  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 633, in _goal2transaction
    ts.add_install(pkg, obs, reason)
  File "/usr/lib/python3.7/site-packages/dnf/db/group.py", line 256, in add_install
    ti_new = self.new(new, libdnf.transaction.TransactionItemAction_INSTALL, reason)
  File "/usr/lib/python3.7/site-packages/dnf/db/group.py", line 219, in new
    rpm_item = self._pkg_to_swdb_rpm_item(pkg)
  File "/usr/lib/python3.7/site-packages/dnf/db/group.py", line 210, in _pkg_to_swdb_rpm_item
    rpm_item = self.history.swdb.createRPMItem()
  File "/usr/lib/python3.7/site-packages/dnf/db/history.py", line 291, in swdb
    self._swdb = libdnf.transaction.Swdb(self.dbpath)
  File "/usr/lib64/python3.7/site-packages/libdnf/transaction.py", line 729, in __init__
    this = _transaction.new_Swdb(*args)
RuntimeError: C++ std::exception: Exec failed: database is locked

it definitely seems like libdnf is missing some locking around swdb use, here.

Comment 9 Adam Williamson 2018-09-20 20:39:37 UTC
Proposing this as a Final blocker as a violation of "The installed system must be able appropriately to install, remove, and update software with the default console tool for the relevant software type (e.g. default console package manager). This includes downloading of packages to be installed/updated." I'm gonna argue this is basic enough to constitute a violation of "appropriately", there: multiple libdnf-using-things should not be able to crash each other over a database lock like this.

Comment 10 Geoffrey Marr 2018-09-24 19:35:15 UTC
Discussed during the 2018-09-24 blocker review meeting: [1]

The decision to classify this bug as a "RejectedBlocker" was made as this doesn't seem to actually cause any permanent problems or impair the system's ability to install software, it's just a transient ugly error. If we find it can cause permanent issues we'll revisit the bug and go from there.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2018-09-24/f29-blocker-review.2018-09-24-16.05.txt

Comment 11 Adam Williamson 2018-10-12 15:39:13 UTC
Note, this seems to be happening in real life quite a lot - msuchy just posted this link to devel@:

https://retrace.fedoraproject.org/faf/reports/?opsysreleases=20&opsysreleases=127&component_names=dnf%2Cdnf-plugins-core&associate=__None&first_occurrence_daterange=2018-09-29%3A2018-10-12&last_occurrence_daterange=&order_by=last_occurrence

all the ones there with `__init__` listed as the crash function seem to be the same as this (the `__init__` in question, in each case, is libdnf/transaction.py __init__, they're all crashing at this same place).

Comment 12 Kamil Páral 2018-10-16 08:54:02 UTC
*** Bug 1636886 has been marked as a duplicate of this bug. ***

Comment 13 Kamil Páral 2018-10-16 08:54:42 UTC
*** Bug 1608686 has been marked as a duplicate of this bug. ***

Comment 14 kxra 2018-10-16 23:36:05 UTC
I just got this at the very end of an upgrade after downloading, installing, cleaning up, and verifying. Here's the output: 

The downloaded packages were saved in cache until the next successful transaction.

You can remove cached packages by executing 'dnf clean packages'.

Traceback (most recent call last):

  File "/usr/bin/dnf", line 58, in <module>

    main.user_main(sys.argv[1:], exit_code=True)

  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 179, in user_main

    errcode = main(args)

  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 64, in main

    return _main(base, args, cli_class, option_parser_class)

  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 99, in _main

    return cli_run(cli, base)

  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 123, in cli_run

    ret = resolving(cli, base)

  File "/usr/lib/python3.7/site-packages/dnf/cli/main.py", line 154, in resolving

    base.do_transaction(display=displays)

  File "/usr/lib/python3.7/site-packages/dnf/cli/cli.py", line 235, in do_transaction

    tid = super(BaseCli, self).do_transaction(display)

  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 847, in do_transaction

    # put back our depcheck callback

  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 997, in _run_transaction

    try:

  File "/usr/lib/python3.7/site-packages/dnf/base.py", line 1035, in _verify_transaction

    

  File "/usr/lib/python3.7/site-packages/dnf/db/history.py", line 508, in end

    self.swdb.initTransaction()

  File "/usr/lib/python3.7/site-packages/dnf/db/history.py", line 291, in swdb

    self._swdb = libdnf.transaction.Swdb(self.dbpath)

  File "/usr/lib64/python3.7/site-packages/libdnf/transaction.py", line 729, in __init__

    this = _transaction.new_Swdb(*args)

RuntimeError: Exec failed: database is locked

Comment 15 Daniel Mach 2018-10-17 08:54:01 UTC
Which dnf, libdnf and PackageKit versions did you use?
Have you restarted PackageKit after libdnf upgrade to libdnf-0.22.0-1.fc29 ?

Is it reproducible with following builds?
libdnf-0.22.0-1.fc29.x86_64
dnf-4.0.4-1.fc29.noarch

Comment 16 Niki Guldbrand 2018-10-19 00:47:20 UTC
Similar problem has been detected:

happened during an while running 'dnf update' and there was a kernel update in that transaction

reporter:       libreport-2.9.6
cmdline:        /usr/bin/python3 /bin/dnf -y install --disablerepo=* /tmp/akmods.DZ8oVlsM/results/kmod-xtables-addons-4.18.14-300.fc29.x86_64-3.1-1.fc29.x86_64.rpm
crash_function: __init__
exception_type: RuntimeError
executable:     /bin/dnf
interpreter:    python3-3.7.0-9.fc29.x86_64
kernel:         4.18.10-300.fc29.x86_64
package:        dnf-3.6.1-2.fc29
reason:         transaction.py:729:__init__:RuntimeError: Exec failed: database is locked
runlevel:       N 5
type:           Python3
uid:            0

Comment 17 Mars 2018-11-12 07:49:25 UTC
Similar problem has been detected:

I typing into tilix terminal:

sudo dnf upgrade

This problem is coming at home too... but just once.
After installing new updates i not meet with this dnf bug. 

I think work well.

The true problem is gnome-software who's dont like to show me the new updates. I like to read package changes, and gnome-software 
show this informations and urls... but i dont like to update from gnome-sofware because 
 - not found updates, 
 - want to reboot my system
 - show tothing if i staring upgrade

I would like to see what doing my applicaions without starting from a private terminal... 

I love Fedora <3 

reporter:       libreport-2.9.6
cmdline:        /usr/bin/python3 /usr/bin/dnf upgrade
crash_function: __init__
exception_type: RuntimeError
executable:     /usr/bin/dnf
interpreter:    python3-3.7.1-2.fc29.x86_64
kernel:         4.18.17-300.fc29.x86_64
package:        dnf-4.0.4-2.fc29
reason:         transaction.py:729:__init__:RuntimeError: Exec failed: database is locked
runlevel:       N 5
type:           Python3
uid:            0

Comment 18 yuhan 2019-04-01 15:18:37 UTC
Similar problem has been detected:

trying to update by running sudo dnf upgrade --refresh

reporter:       libreport-2.10.0
cmdline:        /usr/bin/python3 /usr/bin/dnf upgrade --refresh
crash_function: __init__
exception_type: RuntimeError
executable:     /usr/bin/dnf
interpreter:    python3-3.7.2-5.fc29.x86_64
kernel:         5.0.4-200.fc29.x86_64
package:        dnf-4.2.2-1.fc29
reason:         transaction.py:729:__init__:RuntimeError: Exec failed: database is locked
runlevel:       N 5
type:           Python3
uid:            0

Comment 19 Daniel Mach 2019-06-02 04:27:12 UTC
*** Bug 1630488 has been marked as a duplicate of this bug. ***

Comment 20 Daniel Mach 2019-06-02 04:27:29 UTC
*** Bug 1634674 has been marked as a duplicate of this bug. ***

Comment 21 Daniel Mach 2019-06-02 04:27:38 UTC
*** Bug 1639965 has been marked as a duplicate of this bug. ***

Comment 22 Daniel Mach 2019-06-02 04:27:55 UTC
*** Bug 1676440 has been marked as a duplicate of this bug. ***

Comment 23 Daniel Mach 2019-06-02 04:28:44 UTC
*** Bug 1708594 has been marked as a duplicate of this bug. ***

Comment 24 Daniel Mach 2019-06-02 04:28:57 UTC
*** Bug 1709198 has been marked as a duplicate of this bug. ***

Comment 25 Daniel Mach 2019-06-02 04:29:08 UTC
*** Bug 1710086 has been marked as a duplicate of this bug. ***

Comment 26 Daniel Mach 2019-06-02 04:29:55 UTC
*** Bug 1641261 has been marked as a duplicate of this bug. ***

Comment 27 Daniel Mach 2019-06-02 04:37:35 UTC
The minimal reproducer is:

#!/usr/bin/python3

from libdnf.transaction import *
from multiprocessing import Process

def f():
    while 1:
        db = Swdb("history.sqlite")
        db.closeDatabase()

if __name__ == '__main__':
    for num in range(10):
        Process(target=f).start()


The root cause is setting busy timeout after executing pragmas.
This causes pragmas fail on locked database, because they are regular statements
and it doesn't matter they don't manipulate with tables.


PR: https://github.com/rpm-software-management/libdnf/pull/741
Needs to be backported to f29, f30 and rawhide

Comment 28 Daniel Mach 2019-06-03 08:56:16 UTC
*** Bug 1716315 has been marked as a duplicate of this bug. ***

Comment 29 Daniel Mach 2019-06-03 09:02:43 UTC
*** Bug 1702423 has been marked as a duplicate of this bug. ***

Comment 30 Daniel Mach 2019-06-03 09:03:48 UTC
*** Bug 1632822 has been marked as a duplicate of this bug. ***

Comment 31 Marek Blaha 2019-07-22 11:46:48 UTC
*** Bug 1730193 has been marked as a duplicate of this bug. ***

Comment 32 Marek Blaha 2019-07-22 11:47:10 UTC
*** Bug 1730416 has been marked as a duplicate of this bug. ***

Comment 33 Fedora Update System 2019-08-14 07:23:41 UTC
FEDORA-2019-d4b6ede072 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-d4b6ede072

Comment 34 Fedora Update System 2019-08-16 20:12:07 UTC
dnf-4.2.5-4.fc29, dnf-plugins-extras-4.0.4-2.fc29, libdnf-0.31.0-6.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-d4b6ede072

Comment 35 Jaroslav Mracek 2019-08-19 11:22:40 UTC
*** Bug 1741097 has been marked as a duplicate of this bug. ***

Comment 36 Fedora Update System 2019-08-31 01:38:26 UTC
dnf-4.2.5-4.fc29, dnf-plugins-extras-4.0.4-2.fc29, libdnf-0.31.0-6.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.