Description of problem:
Happens during openQA upgrade tests; these involve upgrading a system from F29 or F30 to F31 *before* g-i-s has run, so g-i-s runs for the first time *after* the upgrade has completed. When the upgrade has completed and we boot the upgraded system, it seems g-i-s crashes in this way.

Version-Release number of selected component:
gnome-initial-setup-3.33.2-1.fc31

Additional info:
reporter:        libreport-2.10.0
backtrace_rating: 4
cmdline:         /usr/libexec/gnome-initial-setup --existing-user
crash_function:  __freelocale
executable:      /usr/libexec/gnome-initial-setup
journald_cursor: s=75140769fa4c44a8a68910d0e147b588;i=3fec;b=188b0b3d393e42d182357eb76fb05abf;m=4e38c1d;t=58a6ca463a107;x=e7ec9e87372d6c06
kernel:          5.2.0-0.rc2.git1.2.fc31.x86_64
rootdir:         /
runlevel:        N 5
type:            CCpp
uid:             1000

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 __freelocale at freelocale.c:44
 #1 welcome at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:128
 #2 fill_stack at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:171
 #3 gis_welcome_widget_constructed at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:189
 #4 g_object_new_internal at ../gobject/gobject.c:1867
 #6 _gtk_builder_construct at gtkbuilder.c:718
 #7 builder_construct at gtkbuilderparser.c:139
 #9 end_element at gtkbuilderparser.c:1075
 #10 emit_end_element at ../glib/gmarkup.c:1093
 #11 g_markup_parse_context_parse at ../glib/gmarkup.c:1643
Created attachment 1576777 File: backtrace
Created attachment 1576778 File: cgroup
Created attachment 1576779 File: core_backtrace
Created attachment 1576780 File: cpuinfo
Created attachment 1576781 File: dso_list
Created attachment 1576782 File: environ
Created attachment 1576783 File: exploitable
Created attachment 1576784 File: limits
Created attachment 1576785 File: maps
Created attachment 1576786 File: mountinfo
Created attachment 1576787 File: open_fds
Created attachment 1576788 File: proc_pid_status
Problem seems to be that locale is null, I think?
This looks a lot like it's probably caused by this commit by mcatanzaro: https://gitlab.gnome.org/GNOME/gnome-initial-setup/commit/83696b544e241293233fa7eb09a0113d722a89e9 so, CCing him.
I think https://bugzilla.redhat.com/show_bug.cgi?id=1715891 may have caused this. This bug seems to have shown up first in the Fedora-Rawhide-20190603.n.0 tests, and that was the compose in which Rawhide went from glibc-2.29.9000-21.fc31 to glibc-2.29.9000-23.fc31 .
Two problems:

(a) My code is broken for assuming newlocale() succeeds without checking the return value. I will change the code to check the return value and print an appropriate error message when the call fails.

(b) GNOME expects all available locales to be installed. It appears the crash is caused by the Chinese locale disappearing between June 2 rawhide (last good) and June 3 rawhide (first bad). We currently have no plans to change gnome-initial-setup, gnome-control-center, or other applications that offer language selection widgets (e.g. Epiphany) so that they function properly in the absence of particular locales. If locales are missing, I think they would generally just disappear from the list of locales available for selection, but certain locales, including Chinese, are hardcoded to be presented first, and we have no plans to change that at this time.
BTW to be clear: I will fix (a), but someone who knows how locale installation is supposed to work will have to comment on (b). We don't have any code to install locales in gnome-initial-setup, or (to the best of my knowledge) gnome-control-center, etc.
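For illustration only, here is a minimal sketch of the kind of return-value check described in (a). It is not the actual patch: the helper name `locale_is_usable` is hypothetical, and the choice of `LC_MESSAGES_MASK` is an assumption about which category the welcome widget needs. The point is simply that freelocale() is only called on a handle that newlocale() actually returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from gnome-initial-setup): returns 1 if
 * locale_id can be loaded, 0 otherwise.
 */
int
locale_is_usable (const char *locale_id)
{
  /* Assumption: only the messages category matters for this check. */
  locale_t locale = newlocale (LC_MESSAGES_MASK, locale_id, (locale_t) 0);

  if (locale == (locale_t) 0)
    {
      /* newlocale() failed, e.g. because the locale data is not installed.
       * Report it and bail out instead of passing a null handle onward. */
      fprintf (stderr, "Failed to create locale %s: %s\n",
               locale_id, strerror (errno));
      return 0;
    }

  /* Only a valid handle is ever handed to freelocale(). */
  freelocale (locale);
  return 1;
}
```

A caller could use a check like this to skip any locale whose data cannot be loaded, rather than crashing when the locale archive is missing.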
Note, this is only happening on the *upgrade* tests. g-i-s is running OK in fresh install tests, e.g. https://openqa.fedoraproject.org/tests/408461 . I'll try and figure out what it is about upgrading specifically that causes this problem.
Huh, OK, so the cause seems relatively simple: somehow, on upgrade, /usr/lib/locale/locale-archive just goes completely missing.

I just recreated what the openQA test in question does manually: installed Fedora 30 Workstation (from the Everything netinst), then upgraded to Rawhide using dnf system-upgrade . I checked the upgraded system, and it simply has no /usr/lib/locale/locale-archive file at all. `rpm -V glibc-all-langpacks` shows:

missing /usr/lib/locale/locale-archive

I don't know *why* this happens, yet.
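As a rough programmatic equivalent of that manual check, a small sketch in C follows. The helper name `path_is_readable` is hypothetical; it only uses the standard access() call and says nothing about whether the archive contents are valid.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if path exists and is readable by the
 * calling process, 0 otherwise (printing the reason to stderr).
 */
int
path_is_readable (const char *path)
{
  if (access (path, R_OK) == 0)
    return 1;

  /* On the upgraded system described above, this would be expected to
   * report ENOENT for the locale archive. */
  fprintf (stderr, "%s: %s\n", path, strerror (errno));
  return 0;
}
```

Calling `path_is_readable ("/usr/lib/locale/locale-archive")` on the upgraded system would be expected to return 0, matching the `rpm -V` result.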
(In reply to Adam Williamson from comment #18)
> Note, this is only happening on the *upgrade* tests. g-i-s is running OK in
> fresh install tests, e.g. https://openqa.fedoraproject.org/tests/408461 .
> I'll try and figure out what it is about upgrading specifically that causes
> this problem.

Do you have glibc-all-langpacks installed?

Only if you have glibc-all-langpacks do you also have /usr/lib/locale/locale-archive, which is the mmap'able copy of all the installed locales, ready for use by the process.

The change we made was to copy in a complete version of this file without processing or filtering it in any way. You shouldn't be able to tell the difference.

Are you running processes during %post? Has rpm somehow not upgraded /usr/lib/locale/locale-archive? Can you get us a copy of that file for your system?
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1716710 .
(In reply to Adam Williamson from comment #19)
> Huh, OK, so the cause seems relatively simple: somehow, on upgrade,
> /usr/lib/locale/locale-archive just goes completely missing.
>
> I just recreated what the openQA test in question does manually: installed
> Fedora 30 Workstation (from the Everything netinst), then upgraded to
> Rawhide using dnf system-upgrade . I checked the upgraded system, and it
> simply has no /usr/lib/locale/locale-archive file at all. `rpm -V
> glibc-all-langpacks` shows:
>
> missing /usr/lib/locale/locale-archive
>
> I don't know *why* this happens, yet.

This may be a transitional issue we didn't consider.

So the old glibc will have a %postun to remove /usr/lib/locale/locale-archive. The old glibc used to recreate locale-archive via %posttrans. The new glibc will install locale-archive normally as a normal installed file.

It may be the case that the old glibc's %postun is deleting the new glibc's locale-archive. Then the new glibc doesn't do anything to reinstall it.
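To make the suspected ordering concrete, here is a toy model in C. It does not run rpm or any real scriptlets; it only mimics the hypothesis above with a temporary file standing in for locale-archive. The function name and the temporary path are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Toy model of the hypothesis, not real packaging logic.
 * Returns 1 if the stand-in archive still exists after the simulated
 * upgrade, 0 if it is gone, -1 if the temporary file could not be created.
 */
int
archive_survives_upgrade (int old_postun_removes_archive)
{
  char path[] = "/tmp/locale-archive-XXXXXX";  /* stand-in path */
  int fd = mkstemp (path);
  int survives;

  /* Step 1: the new package's payload is written to disk. */
  if (fd < 0)
    return -1;
  close (fd);

  /* Step 2: the old package's %postun runs afterwards and removes the
   * same path, which now holds the new package's file. */
  if (old_postun_removes_archive)
    unlink (path);

  /* Step 3: nothing recreates the file, so check what is left. */
  survives = (access (path, F_OK) == 0);
  if (survives)
    unlink (path);  /* clean up the stand-in file */
  return survives;
}
```

With the removal step enabled the file is gone at the end, which is the state `rpm -V` reported; with it disabled the file survives.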
I wasn't setting a 'depends on' because the crash in g-i-s indicates a bug regardless of the *reason* why the locale couldn't be loaded, just for the record...
I don't think we need to track this separately since the crash will go away as soon as bug #1716710 is fixed, but for the record, I've submitted a fix in https://gitlab.gnome.org/GNOME/gnome-initial-setup/merge_requests/39.