Description of problem:
Happens during openQA upgrade tests; these involve upgrading a system from F29 or F30 to F31 *before* g-i-s has run, so g-i-s runs for the first time *after* the upgrade has completed. When the upgrade has completed and we boot the upgraded system, it seems g-i-s crashes in this way.
Version-Release number of selected component:
cmdline: /usr/libexec/gnome-initial-setup --existing-user
runlevel: N 5
Thread no. 1 (10 frames)
#0 __freelocale at freelocale.c:44
#1 welcome at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:128
#2 fill_stack at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:171
#3 gis_welcome_widget_constructed at ../gnome-initial-setup/pages/language/gis-welcome-widget.c:189
#4 g_object_new_internal at ../gobject/gobject.c:1867
#6 _gtk_builder_construct at gtkbuilder.c:718
#7 builder_construct at gtkbuilderparser.c:139
#9 end_element at gtkbuilderparser.c:1075
#10 emit_end_element at ../glib/gmarkup.c:1093
#11 g_markup_parse_context_parse at ../glib/gmarkup.c:1643
Created attachments 1576777 through 1576788 (12 files).
Problem seems to be that locale is null, I think?
This looks a lot like it's probably caused by this commit by mcatanzaro:
so, CCing him.
I think https://bugzilla.redhat.com/show_bug.cgi?id=1715891 may have caused this. This bug seems to have shown up first in the Fedora-Rawhide-20190603.n.0 tests, and that was the compose in which Rawhide went from glibc-2.29.9000-21.fc31 to glibc-2.29.9000-23.fc31 .
(a) My code is broken because it assumes newlocale() succeeds and never checks the return value. I will change the code to check the return value and print an appropriate error message when the call fails.
(b) GNOME expects all available locales to be installed. It appears the crash is caused by the Chinese locale disappearing between June 2 rawhide (last good) and June 3 rawhide (first bad). We currently have no plans to change gnome-initial-setup, gnome-control-center, and other applications that offer language selection widgets (e.g. Epiphany) for them to function properly in the absence of particular locales. If locales are missing I think they would generally just disappear from the list of locales that are available for selection, but certain locales are hardcoded to be presented first, including Chinese, and we have no plans to change that at this time.
BTW to be clear: I will fix (a), but someone who knows how locale installation is supposed to work will have to comment on (b). We don't have any code to install locales in gnome-initial-setup, or (to the best of my knowledge) gnome-control-center, etc.
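For reference, here is a rough sketch of the kind of check described in (a). It is not the actual gnome-initial-setup code: try_locale() is a hypothetical helper, and the LC_MESSAGES_MASK category is an assumption. The point is only that the handle from newlocale() is tested before it is passed to uselocale() or freelocale(), since the backtrace suggests a null handle reached freelocale().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the real gnome-initial-setup function.
 * Returns 0 if locale_id could be loaded and used, -1 otherwise. */
static int
try_locale (const char *locale_id)
{
  /* newlocale() returns (locale_t) 0 and sets errno when the locale
   * cannot be loaded, e.g. because its data is not installed. */
  locale_t locale = newlocale (LC_MESSAGES_MASK, locale_id, (locale_t) 0);
  if (locale == (locale_t) 0)
    {
      fprintf (stderr, "Failed to create locale %s: %s\n",
               locale_id, strerror (errno));
      return -1;
    }

  /* Switch the calling thread to the new locale, do the work,
   * then restore the previous locale before freeing the handle. */
  locale_t old_locale = uselocale (locale);
  /* ... look up translated strings here ... */
  uselocale (old_locale);
  freelocale (locale);
  return 0;
}
```

With a check like this, a caller can skip a language whose locale is unavailable instead of crashing.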
Note, this is only happening on the *upgrade* tests. g-i-s is running OK in fresh install tests, e.g. https://openqa.fedoraproject.org/tests/408461 . I'll try and figure out what it is about upgrading specifically that causes this problem.
Huh, OK, so the cause seems relatively simple: somehow, on upgrade, /usr/lib/locale/locale-archive just goes completely missing.
I just recreated what the openQA test in question does manually: installed Fedora 30 Workstation (from the Everything netinst), then upgraded to Rawhide using dnf system-upgrade . I checked the upgraded system, and it simply has no /usr/lib/locale/locale-archive file at all. `rpm -V glibc-all-langpacks` shows:
missing /usr/lib/locale/locale-archive
I don't know *why* this happens, yet.
(In reply to Adam Williamson from comment #18)
> Note, this is only happening on the *upgrade* tests. g-i-s is running OK in
> fresh install tests, e.g. https://openqa.fedoraproject.org/tests/408461 .
> I'll try and figure out what it is about upgrading specifically that causes
> this problem.
Do you have glibc-all-langpacks installed?
Only if you have glibc-all-langpacks do you also have /usr/lib/locale/locale-archive which is the mmap'able copy of all the installed locales ready for use by the process.
The change we made was to copy in a complete version of this file without processing or filtering it in any way.
You shouldn't be able to tell the difference.
Are you running processes during %post?
Has rpm somehow not upgraded /usr/lib/locale/locale-archive?
Can you get us a copy of that file for your system?
Filed https://bugzilla.redhat.com/show_bug.cgi?id=1716710 .
(In reply to Adam Williamson from comment #19)
> Huh, OK, so the cause seems relatively simple: somehow, on upgrade,
> /usr/lib/locale/locale-archive just goes completely missing.
> I just recreated what the openQA test in question does manually: installed
> Fedora 30 Workstation (from the Everything netinst), then upgraded to
> Rawhide using dnf system-upgrade . I checked the upgraded system, and it
> simply has no /usr/lib/locale/locale-archive file at all. `rpm -V
> glibc-all-langpacks` shows:
> missing /usr/lib/locale/locale-archive
> I don't know *why* this happens, yet.
This may be a transitional issue we didn't consider.
So the old glibc will have a %postun to remove /usr/lib/locale/locale-archive.
The old glibc used to recreate locale-archive via %posttrans.
The new glibc will install locale-archive normally as a normal installed file.
It may be the case that the old glibc's %postun is deleting the new glibc's locale-archive.
Then the new glibc doesn't do anything to reinstall it.
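To make the suspected ordering concrete, here is a toy model of it. It relies on the usual RPM upgrade order, in which the new package's files are installed before the old package's %postun runs. Everything else is an assumption for illustration: archive_survives_upgrade() and its parameter are hypothetical names, and the model does not reflect the real scriptlet contents.

```c
#include <stdbool.h>

/* Toy model of the suspected upgrade sequence; not real RPM code.
 * something_recreates_archive is a hypothetical switch for "some later
 * step (such as a %posttrans) puts locale-archive back". */
static bool
archive_survives_upgrade (bool something_recreates_archive)
{
  /* State before the upgrade: the archive exists on the old system. */
  bool archive_present = true;

  /* 1. The new package's files are unpacked, including locale-archive. */
  archive_present = true;

  /* 2. The old package's %postun runs afterwards and removes the path,
   *    which now holds the new package's copy. */
  archive_present = false;

  /* 3. The file only comes back if a later step recreates it. */
  if (something_recreates_archive)
    archive_present = true;

  return archive_present;
}
```

Under this model the archive is gone after the upgrade unless something runs after the old %postun to restore it, which would match the `rpm -V` result.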
I wasn't setting a 'depends on' because the crash in g-i-s indicates a bug regardless of the *reason* why the locale couldn't be loaded, just for the record...
I don't think we need to track this separately since the crash will go away as soon as bug #1716710 is fixed, but for the record, I've submitted a fix in https://gitlab.gnome.org/GNOME/gnome-initial-setup/merge_requests/39.