Bug 1627954 - Crash in libdnf hy_repo_get_string while auto-refreshing updates (I think)
Summary: Crash in libdnf hy_repo_get_string while auto-refreshing updates (I think)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: PackageKit
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-12 00:32 UTC by Adam Williamson
Modified: 2020-11-03 22:33 UTC
CC: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-11-03 22:33:23 UTC
Type: Bug
Embargoed:


Attachments
backtrace of the crash (14.69 KB, text/plain)
2018-09-12 00:35 UTC, Adam Williamson
abrt crash problem directory (inc. coredump) (16.34 MB, application/octet-stream)
2018-09-13 16:59 UTC, Adam Williamson

Description Adam Williamson 2018-09-12 00:32:23 UTC
One of the openQA Rawhide tests failed today:

https://openqa.fedoraproject.org/tests/278254

That test boots a cleanly installed system to the desktop, then switches to a console, removes python3-kickstart (if installed), installs a dummy low-versioned python3-kickstart (to ensure an update will be available), switches back to the desktop, runs GNOME Software, checks for updates, and installs them.

...only in this case, when it ran GNOME Software, it just got stuck displaying "Software catalog is being loaded", so the test failed.

Looking more closely, packagekitd appears to have crashed shortly after it first started. It was restarted later, but the earlier crash is likely what's relevant here:

Sep 11 11:02:06 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com systemd[1]: Started PackageKit Daemon.
Sep 11 11:02:53 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com PackageKit[1094]: what-provides transaction /15_eaabeaca from uid 1000 finished with failed after 3379ms
Sep 11 11:02:54 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com systemd[1]: packagekit.service: Main process exited, code=killed, status=11/SEGV
Sep 11 11:02:54 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com systemd[1]: packagekit.service: Failed with result 'signal'.
Sep 11 11:04:24 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com systemd[1]: Starting PackageKit Daemon...
Sep 11 11:04:24 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com PackageKit[2705]: daemon start
Sep 11 11:04:24 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com systemd[1]: Started PackageKit Daemon.
Sep 11 11:04:27 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com PackageKit[2705]: resolve transaction /17_dbbbcdec from uid 1000 finished with cancelled after 2687ms
Sep 11 11:04:27 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com PackageKit[2705]: resolve transaction /19_bbdcacdc from uid 1000 finished with cancelled after 437ms
Sep 11 11:04:30 ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com PackageKit[2705]: resolve transaction /20_aabdaaac from uid 1000 finished with failed after 2466ms

In terms of what the test does, the crash happened shortly after the test initially logged into a clean desktop. It happened *before* the test got to messing around with python3-kickstart at a console (the test was still busy verifying it was actually at a clean desktop when the crash happened, which it does by hitting 'super' every second and seeing if it sees the overview app grid icon). So I think it basically happened while the automatic 'check if any updates are available right after login' thing was going on.

Here's the backtrace of the relevant thread from the crash. I'll attach the full backtrace:

Thread 1 (Thread 0x7fc1d60cb700 (LWP 1896)):
#0  hy_repo_get_string (repo=0x0, which=which@entry=6)
    at /usr/src/debug/libdnf-0.17.0-2.fc29.x86_64/libdnf/hy-repo.cpp:234
No locals.
#1  0x00007fc1dc572858 in (anonymous namespace)::readModuleMetadataFromRepo (
    install_root=0x7fc1dd847b1c <operator new(unsigned long)+28> "H\205\300t\a[\303\017\037D", platformModule=0x0, moduleDefaults=..., 
    modulePackages=0x7fc1c80a7c60, sack=0x5589091a3c60)
    at /usr/src/debug/libdnf-0.17.0-2.fc29.x86_64/libdnf/dnf-sack.cpp:2196
        hyRepo = <optimized out>
        modules_fn = <optimized out>
        yamlContent = ""
        pool = <optimized out>
        r = <optimized out>
        id = 1
        pool = <optimized out>
        r = <optimized out>
        id = <optimized out>
        hyRepo = <optimized out>
        modules_fn = <optimized out>
        yamlContent = <optimized out>
        exception = <optimized out>
        except = <optimized out>
#2  dnf_sack_filter_modules_v2(_DnfSack*, ModulePackageContainer*, char const**, char const*, char const*, bool) ()
    at /usr/src/debug/libdnf-0.17.0-2.fc29.x86_64/libdnf/dnf-sack.cpp:2354
        defaultsDirPath = 0x7fc1c8027e80 "/etc/dnf/modules.defaults.d/"
        priv = <optimized out>
        moduleDefaults = {
          prioritizer = std::shared_ptr<_ModulemdPrioritizer> (use count 1, weak count 0) = {get() = 0x7fc1c8001ec0}, defaults = std::map with 0 elements}
        defaultStreams = std::map with 231 elements = {
          [""] = ""<error reading variable: Cannot access memory at address 0x6448ec8348d38958>...}
        nevraTuple = std::tuple containing = {
          [1] = std::vector of length -4389590794240, capacity 112223571 = {
            <error reading variable nevraTuple (Cannot access memory at address 0x7fc100000000)>
#3  0x00007fc1dc573aed in dnf_sack_filter_modules ()
    at /usr/src/debug/libdnf-0.17.0-2.fc29.x86_64/libdnf/dnf-sack.cpp:2327
        hotfixRepos = std::vector of length 1, capacity 1 = {0x0}
#4  0x00007fc1dc704049 in dnf_utils_create_sack_for_filters ()
    at pk-backend-dnf.c:803
        ret = <optimized out>
        flags = <optimized out>
        cache_item = <optimized out>
        state_local = <optimized out>
        backend = <optimized out>
        job_data = 0x55890934f760
        priv = 0x558909103f40
        cache_key = 0x7fc1c8026eb0 "DnfSack::release_ver[30]::filelists|remote"
        install_root = 0x7fc1c8023250 "/"
        solv_dir = 0x7fc1c8023380 "/var/cache/PackageKit/30/hawkey"
        sack = 0x5589091a3c60
#5  0x00007fc1dc709b4c in pk_backend_search_thread () at pk-backend-dnf.c:1003
        ret = <optimized out>
        db = <optimized out>
        state_local = <optimized out>
        installs = 0x0
        pkglist = 0x0
        query = 0x0
        job_data = 0x55890934f760
        filters = 2
        search_tmp = 0x0
        error = 0x0
        sack = 0x0
        search = 0x0
        __func__ = "pk_backend_search_thread"
#6  0x00005589076dd05e in pk_backend_job_thread_setup (
    thread_data=0x55890930ed40) at pk-backend-job.c:813
        helper = 0x55890930ed40
#7  0x00007fc1eaf1748a in g_thread_proxy (data=0x5589090a06d0) at gthread.c:784
        thread = 0x5589090a06d0
        __func__ = "g_thread_proxy"
#8  0x00007fc1eadc858e in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140470496573184, 
                -3652607583065171130, 140733855716206, 140733855716207, 
                140733855716336, 140470496570880, 3660574204564118342, 
                3660442951183145798}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
              canceltype = 0}}}
        not_first_call = <optimized out>
#9  0x00007fc1eacf71b3 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment 1 Adam Williamson 2018-09-12 00:32:53 UTC
Leaving assigned to PK for now, but CCing libdnf folks.

Comment 2 Adam Williamson 2018-09-12 00:33:24 UTC
"nevraTuple = std::tuple containing = {
          [1] = std::vector of length -4389590794240, capacity 112223571 = {
            <error reading variable nevraTuple (Cannot access memory at address 0x7fc100000000)>" looks odd.
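
For the record, an absurd length like that usually means GDB is reconstructing the vector from clobbered (or not-yet-valid) frame slots: the libstdc++ pretty-printer just computes (end - begin) / sizeof(element) from whatever raw pointers it finds. A tiny illustration with made-up pointer values (not taken from this coredump):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // Pretend these are the raw begin/end pointers GDB found in the frame.
    std::uintptr_t begin = 0x7fc100000000ULL;
    std::uintptr_t end   = 0x6448ec8348d38958ULL;
    // GDB's "length" is just pointer arithmetic, so over garbage pointers
    // it prints garbage lengths.
    std::ptrdiff_t len = static_cast<std::ptrdiff_t>(end - begin) / 8;
    std::printf("apparent length: %td\n", len);
    return 0;
}

So the odd values may be debugger confusion over a half-destroyed frame rather than direct evidence of heap corruption.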

Comment 3 Adam Williamson 2018-09-12 00:35:38 UTC
Created attachment 1482515 [details]
backtrace of the crash

Comment 4 Adam Williamson 2018-09-12 00:36:28 UTC
Note this crash doesn't appear to happen *all* the time - I restarted the test and it worked OK.

Comment 5 Adam Williamson 2018-09-12 00:40:45 UTC
Same crash in a different test here:

https://openqa.fedoraproject.org/tests/278449

That test exercises update notifications, and it looks like it hit the same crash.

Comment 6 Adam Williamson 2018-09-12 00:42:06 UTC
CCing sgallagh as this looks like it's in modularity bits...

Comment 7 Stephen Gallagher 2018-09-13 14:06:27 UTC
I'm going to need a proper coredump and/or consistent reproduction steps for this. I've been trying to reproduce it locally, but I'm having no luck.

Comment 8 Adam Williamson 2018-09-13 16:50:51 UTC
The backtrace was already generated from a coredump; what more are you going to do with it? You can download it from the 'Assets' tab of the openQA test if you need it, though:

https://openqa.fedoraproject.org/tests/278254/file/desktop_update_graphical-spoolabrt.tar.gz (the abrt problem directory, including the coredump)

Comment 9 Stephen Gallagher 2018-09-13 16:53:04 UTC
(In reply to Adam Williamson from comment #8)
> The backtrace was already generated from a coredump, what more are you going
> to do with it? You can download it from the 'Assets' tab of the openQA test
> if you need it, though:
> 
> https://openqa.fedoraproject.org/tests/278254/file/desktop_update_graphical-spoolabrt.tar.gz
> (the abrt problem directory, including the coredump)

Since the issue appears to be due to memory corruption, I want to inspect the variables to try to track down what got freed (or was never initialized). The backtrace doesn't give me enough to go on, unfortunately.

Thanks for the link; I'll load that up and get back to you.

Comment 10 Adam Williamson 2018-09-13 16:59:15 UTC
Created attachment 1483108 [details]
abrt crash problem directory (inc. coredump)

Attaching the abrt problem directory tarball for posterity (it'll get cleaned up from openQA in a while).

Comment 11 Stephen Gallagher 2018-09-19 12:14:39 UTC
Is this issue still happening on Rawhide? For whatever reason, that coredump was useless: even when I had the matching debuginfo and debugsource packages, GDB pointed at memory areas that didn't match the extracted coredump.

I haven't been able to reproduce the issue, but I'm wondering if the recent libdnf changes might have fixed it for us.

Comment 12 Ben Cotton 2019-08-13 16:49:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 13 Ben Cotton 2020-11-03 17:18:39 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not 
able to fix it before Fedora 31 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Adam Williamson 2020-11-03 22:33:23 UTC
Don't think we've seen this issue lately.

