Bug 1614511

Summary: anaconda crashes mysteriously during package installation in graphical mode (but not text mode, and not non-package-based installs)
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: anaconda Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhide CC: anaconda-maint-list, dmach, jonathan, kellin, mkolman, robatino, rstrode, vanmeeuwen+fedora, vponcova, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Last Closed: 2018-08-13 15:09:21 UTC Type: Bug
Bug Blocks: 1517011    
Attachments:
Description Flags
journal dump that contains the stack trace none
more detailed trace of the GTK+ thread from gdb none

Description Adam Williamson 2018-08-09 18:43:41 UTC
We finally got a Rawhide compose:

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20180808.n.1/

but most openQA tests of it fail:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Fedora-Rawhide-20180808.n.1&groupid=1

It seems that in every install that actually uses RPM packages (so all DVD and network installs, but *not* dvd-ostree or live installs) and that runs graphically, anaconda crashes early in package installation. Interestingly, the one text install test did *not* crash.

Unfortunately, we don't seem to be able to get either a traceback or a core dump from anaconda - all we get in the logs is "systemd-coredump[2368]: Cannot store coredump of 1367 (anaconda): No space left on device".

This may well be in dnf or libdnf, but filing on anaconda at least for now.

This is obviously a Beta blocker as it prevents graphical install from any DVD or network install image.

Comment 1 Martin Kolman 2018-08-09 18:45:14 UTC
Adding Dan Mach to CC as this is very likely a DNF related issue.

Comment 2 Martin Kolman 2018-08-09 19:00:51 UTC
Created attachment 1474811 [details]
journal dump that contains the stack trace

Managed to extract the traceback from one of the kickstart test VMs. All the VMs run with the inst.debug and debug=1 boot options, which could be what's needed to get the traceback.

Comment 3 Adam Williamson 2018-08-09 19:20:44 UTC
I also managed to get a dump, and also a core file (by doing hideous hacks to systemd-coredump). Core file is quite large so can't attach here, have sent it to Martin. 

Martin noticed that one common feature of all the traces is actually in GTK+, not to do with packaging at all - they all seem to have a thread with this trace:

                                                              Stack trace of thread 1575:
                                                              #0  0x00007f3e16af053f raise (libc.so.6)
                                                              #1  0x00007f3e16ada895 abort (libc.so.6)
                                                              #2  0x00007f3e08d2fde3 n/a (libglib-2.0.so.0)
                                                              #3  0x00007f3e08d8b0e2 g_assertion_message_error (libglib-2.0.so.0)
                                                              #4  0x00007f3dfd3b3e45 ensure_surface_for_gicon (libgtk-3.so.0)
                                                              #5  0x00007f3dfd3b4557 gtk_icon_helper_load_surface (libgtk-3.so.0)
                                                              #6  0x00007f3dfd3b4634 gtk_icon_helper_ensure_surface.part.4 (libgtk-3.so.0)
                                                              #7  0x00007f3dfd3b4828 _gtk_icon_helper_get_size (libgtk-3.so.0)
                                                              #8  0x00007f3dfd3c8161 gtk_image_get_content_size (libgtk-3.so.0)
                                                              #9  0x00007f3dfd329b57 gtk_css_custom_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #10 0x00007f3dfd32dea9 gtk_css_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #11 0x00007f3dfd3c8957 gtk_image_get_preferred_width (libgtk-3.so.0)
                                                              #12 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #13 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #14 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #15 0x00007f3dfd2d8d52 gtk_box_get_content_size (libgtk-3.so.0)
                                                              #16 0x00007f3dfd329b57 gtk_css_custom_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #17 0x00007f3dfd32dea9 gtk_css_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #18 0x00007f3dfd2d9837 gtk_box_get_preferred_width (libgtk-3.so.0)
                                                              #19 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #20 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #21 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #22 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #23 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #24 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #25 0x00007f3dfd3acfad gtk_grid_request_run (libgtk-3.so.0)
                                                              #26 0x00007f3dfd3ad1d5 gtk_grid_get_size (libgtk-3.so.0)
                                                              #27 0x00007f3dfd329b57 gtk_css_custom_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #28 0x00007f3dfd32dea9 gtk_css_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #29 0x00007f3dfd3ab237 gtk_grid_get_preferred_width (libgtk-3.so.0)
                                                              #30 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #31 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #32 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #33 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #34 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #35 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #36 0x00007f3dfd2d8d52 gtk_box_get_content_size (libgtk-3.so.0)
                                                              #37 0x00007f3dfd329b57 gtk_css_custom_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #38 0x00007f3dfd32dea9 gtk_css_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #39 0x00007f3dfd2d9837 gtk_box_get_preferred_width (libgtk-3.so.0)
                                                              #40 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #41 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #42 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #43 0x00007f3dfd2d3ba9 gtk_bin_get_preferred_width (libgtk-3.so.0)
                                                              #44 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #45 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #46 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #47 0x00007f3dfd49986a gtk_stack_measure (libgtk-3.so.0)
                                                              #48 0x00007f3dfd329b57 gtk_css_custom_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #49 0x00007f3dfd32dea9 gtk_css_gadget_get_preferred_size (libgtk-3.so.0)
                                                              #50 0x00007f3dfd49a58b gtk_stack_get_preferred_width (libgtk-3.so.0)
                                                              #51 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #52 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #53 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #54 0x00007f3dfd2d3ba9 gtk_bin_get_preferred_width (libgtk-3.so.0)
                                                              #55 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #56 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #57 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #58 0x00007f3dfd5566d2 gtk_window_get_preferred_width (libgtk-3.so.0)
                                                              #59 0x00007f3dfd492901 gtk_widget_query_size_for_orientation (libgtk-3.so.0)
                                                              #60 0x00007f3dfd493130 gtk_widget_compute_size_for_orientation (libgtk-3.so.0)
                                                              #61 0x00007f3dfd49321e gtk_widget_get_preferred_width (libgtk-3.so.0)
                                                              #62 0x00007f3dfd49354c _gtk_widget_get_preferred_size_and_baseline (libgtk-3.so.0)
                                                              #63 0x00007f3dfd557e24 gtk_window_compute_configure_request (libgtk-3.so.0)

So we think this may actually be in GTK+, not in dnf bits.

Comment 4 Adam Williamson 2018-08-09 19:35:26 UTC
<halfline> adamw: well g_assertion_message_error means an assertion failed
<halfline> looking in ensure_surface_for_gicon there's only one assertion
<halfline>       destination = gtk_icon_theme_load_icon (icon_theme,
<halfline>                                               "image-missing",
<halfline>                                               width,
<halfline>                                               flags | GTK_ICON_LOOKUP_USE_BUILTIN | GTK_ICON_LOOKUP_GENERIC_FALLBACK,
<halfline>                                               &error);
<halfline>       /* We include this image as resource, so we always have it available or
<halfline>        * the icontheme code is broken */
<halfline>       g_assert_no_error (error);
<halfline> so there's probably two bugs
<halfline> 1) anaconda has a misspelled icon name somewhere
<halfline> and so it's trying to use image-missing as a fallback
<halfline> 2) the fallback code doesn't work anymore for some reason
<halfline> i think glib ships with a program to extract gresources from binaries
<halfline> lemme look
<adamw> that's very helpful, thanks
<halfline> ╎❯ gresource list /usr/lib64/libgtk-3.so |grep missing

Comment 5 Adam Williamson 2018-08-10 00:15:35 UTC
I thought https://github.com/rhinstaller/anaconda/commit/8eecff2cf971f2ffe9e2286d33e07f5e030de9e7 was a possible suspect here, but I tested an image containing an anaconda scratch build with a patch to revert that commit, and it still crashes. :/

I think next I'll try a GTK+ build that's patched to do some more logging during icon loading.

Comment 6 Adam Williamson 2018-08-10 05:54:11 UTC
I've got a version of the GTK+ thread backtrace with more data, from the coredump. Will attach. Unfortunately the error string is optimized out, but one thing I notice is some of the values way back in the trace seem quite...*odd*, like this:

#70 gtk_window_compute_configure_request (window=window@entry=0x55dd014e2290, request=request@entry=0x7ffef4135ab0, geometry=geometry@entry=0x7ffef4135ad0, flags=flags@entry=0x7ffef4135aa8) at gtkwindow.c:9524
        priv = 0x55dd014e2030
        new_geometry = {min_width = 1, min_height = 0, max_width = -1473846153, max_height = 32724, base_width = 199, base_height = 0, width_inc = 20664336, height_inc = 21981, min_aspect = 6.9529314086768607e-310, max_aspect = 4.9406564584124654e-324, win_gravity = 4094910384}
        new_flags = <optimized out>
        w = 0
        h = 0
        pos = <optimized out>
        parent_widget = <optimized out>
        info = <optimized out>
        screen = 0x55dd01278080
        x = 0
        y = 0
        __func__ = "gtk_window_compute_configure_request"

dunno if that's significant, but...-1473846153 ? 6.9529314086768607e-310 ?!
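
One way to check whether those values mean anything is to look at their raw bit patterns. The sketch below assumes a little-endian x86_64 layout in which `max_width`/`max_height` and `width_inc`/`height_inc` each share one 8-byte slot. That is an assumption about the struct layout, not something checked against the GTK+ source, and the helper names are made up for illustration.

```python
import struct


def join_int32_pair(low, high):
    """Reassemble two adjacent 32-bit ints (little-endian order) into one 64-bit value."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def double_bits(value):
    """Return the raw 64-bit pattern that encodes a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


if __name__ == "__main__":
    # Values copied from the new_geometry dump in frame #70.
    print("max_width/max_height:", hex(join_int32_pair(-1473846153, 32724)))
    print("width_inc/height_inc:", hex(join_int32_pair(20664336, 21981)))
    print("min_aspect bits:     ", hex(double_bits(6.9529314086768607e-310)))
    print("max_aspect bits:     ", hex(double_bits(4.9406564584124654e-324)))
```

The two integer pairs come out as 0x7fd4a826e477 and 0x55dd013b5010. The first looks like a shared-library address and the second like a heap address close to `window=0x55dd014e2290`. The upper half of `min_aspect` is 0x7ffe, the same prefix as the stack addresses in that frame, and `max_aspect` is simply bit pattern 1. That is consistent with `new_geometry` holding leftover stack contents that have not been initialized yet at this point, so the odd numbers are probably not a sign of corruption, though this does not prove it.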

Comment 7 Adam Williamson 2018-08-10 05:56:20 UTC
Created attachment 1474882 [details]
more detailed trace of the GTK+ thread from gdb

Comment 8 Martin Kolman 2018-08-10 10:19:33 UTC
I have a theory about what might be causing the crash, or at least influencing it.

Recently, during the switch to the install_specs() API introduced in DNF 3.1, we made this change:

-       process = multiprocessing.Process(target=do_transaction,
+       process = multiprocessing.dummy.Process(target=do_transaction,

(https://github.com/rhinstaller/anaconda/commit/2249ae942c1b8bf0689ef0b368166637e572138c#diff-8243efa96c7d27bd2521b0f903157473R1013)

This effectively runs the DNF transaction in the Anaconda process (dummy Process is just a thread) instead of in a separate sub-process.

Why could this be an issue? Well, at least in the past with yum, we noticed that sometimes during the package installation transaction either yum or DNF would chroot the whole process for a period of time, which would result in issues such as icons not loading, as the Anaconda process would look for them in the install root instead of in the installation image root.

So we put the yum transaction to a separate process, fixing the issue.

Fast forward to the present: while switching Anaconda to the install_specs() API, which both makes it possible to install modules and lets us drop a bunch of code from Anaconda for stuff that should always have been handled by DNF, we hit an issue. The transaction would crash with a "foreign key error" when DNF tried to access one of its SQLite databases. After looking at it, the DNF team proposed just dropping the separate process for the transaction, which fixed the issue; the separate process seemed no longer necessary anyway.

According to the traceback, the crash seems to happen when GTK attempts to load some icon or other resource during a package installation transaction, which looks awfully similar to the issue we had back then with yum. I can easily imagine that if the chroot-the-whole-process-during-transaction issue still persists with DNF, then GTK will try to load something, only for it to vanish into thin air due to the chroot, leading to a crash.

This would also explain why this only happens during a package transaction (DNF or RPM doing chroot) & why it was not happening before (we only switched to the dummy process together with switch to the install_specs() API).
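
The difference between the two `Process` classes can be sketched without root privileges by using `os.chdir()` as a stand-in for `chroot()`, since both change state that every thread in a process shares. `fake_transaction` and `cwd_after` are hypothetical names for this illustration, not Anaconda or DNF code, and the example forces the `fork` start method, so it only runs on Unix-like systems.

```python
import multiprocessing
import multiprocessing.dummy
import os
import tempfile

# Real sub-process class; "fork" is requested explicitly (Unix only).
ForkProcess = multiprocessing.get_context("fork").Process


def fake_transaction(directory):
    # Stand-in for a transaction that changes process-wide filesystem state.
    os.chdir(directory)


def cwd_after(process_class):
    """Run fake_transaction via process_class; return True if the caller's cwd changed."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        worker = process_class(target=fake_transaction, args=(scratch,))
        worker.start()
        worker.join()
        changed = os.getcwd() != start
        os.chdir(start)  # restore before the temporary directory is removed
    return changed


if __name__ == "__main__":
    # multiprocessing.dummy.Process is a thread, so the chdir leaks into the caller.
    print("dummy.Process leaks chdir:", cwd_after(multiprocessing.dummy.Process))
    # A forked sub-process has its own cwd, so the caller is unaffected.
    print("forked Process leaks chdir:", cwd_after(ForkProcess))
```

If the same holds for a chroot done by RPM or DNF during the transaction, the GTK+ threads in the Anaconda process would see the new root with the dummy `Process` but not with a real sub-process.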

Comment 10 Martin Kolman 2018-08-10 13:29:13 UTC
Just in case I've created a PR to revert to the old behavior of running the DNF transaction in a sub-process, which should isolate the chrooting RPM/DNF are doing:

https://github.com/rhinstaller/anaconda/pull/1571

Unfortunately (see the first comment in the PR) this first requires DNF to be fixed not to crash if the transaction is run in a sub-process.

Comment 11 Ray Strode [halfline] 2018-08-10 18:09:07 UTC
a quick hack to fix this might be to add some C code that does something like

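```
#define _GNU_SOURCE /* for unshare() and CLONE_FS */
#include <sched.h>
#include <stdbool.h>

/* Unshare root/cwd/umask so a chroot() in this thread stays in this thread. */
bool
install_chroot_countermeasures (void)
{
    return unshare (CLONE_FS) == 0;
}
```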

and then call it from the top of `do_transaction()` 

but maybe the problem with your revert is anaconda is trying to use dnf from two or more different processes at the same time?

Comment 12 Adam Williamson 2018-08-10 18:24:19 UTC
Sorry, some stuff was posted only on IRC, not here: we actually have this working now, there's a libdnf change that makes anaconda's "run it as a subprocess" approach work again:

https://github.com/rpm-software-management/libdnf/pull/546

I tested a custom ISO build with that libdnf patch and the anaconda reversion patch, it worked (completed an install). I've done 'official' builds of libdnf and anaconda with the patches now. I asked nirik and/or mboddu to fire a new Rawhide compose, but of course they're at Flock so might take a while to respond.

Comment 13 Adam Williamson 2018-08-13 15:09:21 UTC
Confirmed fixed in recent composes.