Bug 1262040

Summary: build-locale-archive exits on SIGABRT at install-time if provided a --install-langs value other than "all"
Product: [Fedora] Fedora Reporter: David Shea <dshea>
Component: glibcAssignee: Carlos O'Donell <codonell>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: arjun, codonell, dshea, dustymabe, fweimer, jakub, law, loic.yhuel, mfabian, pfrankli, robatino, siddhesh
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 2.22-3.fc23 glibc-2.21-8.fc22 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-04 22:49:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1170821    
Attachments:
Description Flags
build-locale-archive-install.trace
none
build-locale-archive-postinstall.trace
none
Script to run a kickstart using virt-install none

Description David Shea 2015-09-10 16:58:36 UTC
Description of problem:
When installing glibc-common to a new system and using _install_langs, build-locale-archive fails when invoked from the triggerin script.

I modified the rpm to add --verbose to the arguments, and when I installed glibc-common as part of an anaconda install, it ran:

/usr/sbin/build-locale-archive --verbose --install-langs en

and I got the following output:

using default input archive file.
using default output archive file.

followed by an exit on signal 6.

It appears to work fine when %{_install_langs} expands to "all".


Version-Release number of selected component (if applicable):
glibc-common-2.22.90-3.fc24.x86_64


How reproducible:
always


Steps to Reproduce:
1. kickstart install using %packages --instLangs=en

Comment 1 David Shea 2015-09-11 14:44:50 UTC
The SIGABRT may be a red herring. I just tried again in an effort to get a core file, and while build-locale-archive exited with status 0, it still didn't install any locales. localedef --list-archive on the installed system does not list anything.

Comment 2 Carlos O'Donell 2015-09-11 15:09:56 UTC
(In reply to David Shea from comment #1)
> The SIGABRT may be a red herring. I just tried again in an effort to get a
> core file, and while build-locale-archive exited with status 0, it still
> didn't install any locales. localedef --list-archive on the installed system
> does not list anything.

Right, a core file is exactly what we would have needed to see what was going on and at which point it failed.

You could have a bad disk, bad filesystem, and we need to narrow that down. If localedef --list-archive shows no locales, then you have no locales.

What's the size of /usr/lib/locale/locale-archive? Is that the only file in /usr/lib/locale?

You can do the following to fix the system:
- Download glibc-common rpm.
- Unpack RPM.
- Copy locale-archive.tmpl to /usr/lib/locale/
- Run /usr/sbin/build-locale-archive again with the arguments you want.

This should do exactly what was done by kickstart and we can see what goes wrong?

Comment 3 David Shea 2015-09-11 15:12:46 UTC
(In reply to Carlos O'Donell from comment #2)
> (In reply to David Shea from comment #1)
> > The SIGABRT may be a red herring. I just tried again in an effort to get a
> > core file, and while build-locale-archive exited with status 0, it still
> > didn't install any locales. localedef --list-archive on the installed system
> > does not list anything.
> 
> Right, a core file is exactly what we would have needed to see what was
> going on and at which point it failed.
> 
> You could have a bad disk, bad filesystem, and we need to narrow that down.
> If localedef --list-archive shows no locales, then you have no locales.

Doubtful, as it's all virt.

> What's the size of /usr/lib/locale/locale-archive? Is that the only file in
> /usr/lib/locale?

/usr/lib/locale/locale-archive is 103860 bytes. There is also the 0-byte locale-archive.tmpl.

> 
> You can do the following to fix the system:
> - Download glibc-common rpm.
> - Unpack RPM.
> - Copy locale-archive.tmpl to /usr/lib/locale/
> - Run /usr/sbin/build-locale-archive again with the arguments you want.

That works fine. This has something to do with the state of the system as glibc is being installed for the first time.
 
> This should do exactly what was done by kickstart and we can see what goes
> wrong?

Comment 4 Carlos O'Donell 2015-09-11 15:16:17 UTC
(In reply to David Shea from comment #3)
> > You can do the following to fix the system:
> > - Download glibc-common rpm.
> > - Unpack RPM.
> > - Copy locale-archive.tmpl to /usr/lib/locale/
> > - Run /usr/sbin/build-locale-archive again with the arguments you want.
> 
> That works fine. This has something to do with the state of the system as
> glibc is being installed for the first time.

After glibc-common is installed the /usr/lib/locale/locale-archive.tmpl file contains all of the prebuilt locales. Then build-locale-archive copies the required locales from the template to the final file /usr/lib/locale/locale-archive. After which it truncates the template file to size zero to save space and indicate it's done the copying.

We've never seen a SIGABRT like this during any of our testing. Getting a core file of that SGIABRT would be the only way we're going to learn something about the failure mode :-(

Comment 5 Dusty Mabe 2015-09-11 16:06:47 UTC
(In reply to Carlos O'Donell from comment #4)
> 
> We've never seen a SIGABRT like this during any of our testing. Getting a
> core file of that SGIABRT would be the only way we're going to learn
> something about the failure mode :-(

I think there is a bigger problem we should focus on rather than the SIGABRT, which is that there are no locales in the archive after kickstart/anaconda if you provide the --instLangs argument with any value other than 'all' to %packages (i.e. '%packages --instLangs=en').

This is an easy reproducer: all that is needed is a kickstart for F23 with '%packages --instLangs=en' and you should observe the problem.

I worked with David first on this issue and he did some testing in Anaconda to verify the value of the macro is properly getting set and passed to /usr/sbin/build-locale-archive. Is it possible it is something else within build-locale-archive that is causing this (i.e. dependecy on environment variables, etc)?

Comment 6 David Shea 2015-09-11 16:07:58 UTC
Like I said in comment 3, it doesn't appear to abort anymore. I don't know why it did in the first place. The version of glibc is unchanged, but other packages from rawhide may have been upgraded. I don't know how that would affect the operation of build-locale-archive, but whatever the problem is seems to have something to do with the state of the system when build-locale-archive is run.

I built a statically-linked copy of strace and patched that into the post install scripts. I will attach the results.

Comment 7 David Shea 2015-09-11 16:08:42 UTC
Created attachment 1072603 [details]
build-locale-archive-install.trace

The output of strace from running build-locale-archive from the triggerin script.

Comment 8 David Shea 2015-09-11 16:10:22 UTC
Created attachment 1072604 [details]
build-locale-archive-postinstall.trace

This one is from running build-locale-archive in the installed system. Post-install, I chrooted to /mnt/sysimage, copied locale-archive.tmpl from the rpm to /usr/lib/locale, and ran build-locale-archive with the same arguments as the trigger script.

Comment 9 Carlos O'Donell 2015-09-11 18:32:03 UTC
(In reply to David Shea from comment #7)
> Created attachment 1072603 [details]
> build-locale-archive-install.trace
> 
> The output of strace from running build-locale-archive from the triggerin
> script.

This doesn't show a SIGABRT thought? Is it reproducible?

Could you please provide exact set of steps to reproduce the failure during kickstart, including which kickstart file you used, and how you ran it.

Comment 10 Dusty Mabe 2015-09-11 19:15:12 UTC
Created attachment 1072619 [details]
Script to run a kickstart using virt-install

Comment 11 David Shea 2015-09-11 19:15:49 UTC
(In reply to Carlos O'Donell from comment #9)
> (In reply to David Shea from comment #7)
> > Created attachment 1072603 [details]
> > build-locale-archive-install.trace
> > 
> > The output of strace from running build-locale-archive from the triggerin
> > script.
> 
> This doesn't show a SIGABRT thought? Is it reproducible?

Hard to say, since I had to cram so many wrapper programs in there to get output and status codes and the like, since the rpm's lua interpreter is so limited. As I said, it went away at some point. Anaconda runs with MALLOC_CHECK_=2 and MALLOC_PERTRUB_ set, so it could have been a bad memory access somewhere.

> Could you please provide exact set of steps to reproduce the failure during
> kickstart, including which kickstart file you used, and how you ran it.

Here's one: https://github.com/dashea/anaconda/blob/instlangs-tests/tests/kickstart_tests/packages-instlangs-1.ks, and I ran it by booting an installer with inst.ks=http://path/to/a/similar/kickstart.

But how about instead of trying to unravel the debugging I've added and going around in a circle again, I'll just give you a fix.

diff --git a/build-locale-archive.c b/build-locale-archive.c
index a506876..d901fc9 100644
--- a/build-locale-archive.c
+++ b/build-locale-archive.c
@@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah,
                /* Add one for "_" and one for the null terminator.  */
                size_t len = strlen (install_langs_list[i]) + 2;
                char *install_lang = (char *)xmalloc (len);
-                strncpy (install_lang, install_langs_list[i], len - 2);
-               install_lang[len - 1] = '\0';
+                strncpy (install_lang, install_langs_list[i], len - 1);
                 if (strchr (install_lang, '_') == NULL)
                   strcat (install_lang, "_");
                 if (strncmp (name, install_lang, strlen (install_lang)) == 0)


Adding the null byte to install_lang was off by one. For example, if install_langs_list[i] is en_US, then we'll end up with:

len = 7
strncpy(install_lang, "en_US", 5), resulting in en_US with no \0
install_lang[6] = \0, which is two bytes after the 'S'

Since len is already set to strlen() + 2 and allocates enough space, just let strncpy copy over the \0 as well and it works.

Comment 12 Dusty Mabe 2015-09-11 19:17:25 UTC
(In reply to Carlos O'Donell from comment #9)

> 
> This doesn't show a SIGABRT thought? Is it reproducible?

The SIGABRT isn't really the problem. The empty archive is. Let's concentrate on that.

> 
> Could you please provide exact set of steps to reproduce the failure during
> kickstart, including which kickstart file you used, and how you ran it.

See attachment: https://bugzilla.redhat.com/attachment.cgi?id=1072619

If you run that (you need to modify the variables at the top of the script) what you end up with is a VM with an empty archive.

Comment 13 Carlos O'Donell 2015-09-11 21:01:47 UTC
(In reply to David Shea from comment #11)
> diff --git a/build-locale-archive.c b/build-locale-archive.c
> index a506876..d901fc9 100644
> --- a/build-locale-archive.c
> +++ b/build-locale-archive.c
> @@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah,
>                 /* Add one for "_" and one for the null terminator.  */
>                 size_t len = strlen (install_langs_list[i]) + 2;
>                 char *install_lang = (char *)xmalloc (len);
> -                strncpy (install_lang, install_langs_list[i], len - 2);
> -               install_lang[len - 1] = '\0';
> +                strncpy (install_lang, install_langs_list[i], len - 1);
>                  if (strchr (install_lang, '_') == NULL)
>                    strcat (install_lang, "_");
>                  if (strncmp (name, install_lang, strlen (install_lang)) ==
> 0)

Awesome! Thanks for finding this. The key here is that VM's often leave dirtier pages than normal systems where these bytes just might be zero, which is why it only fails in a VM.

I'll rewrite this a bit actually to make it clearer what we're doing.

Comment 14 Carlos O'Donell 2015-09-16 13:19:11 UTC
I think the change should be something like this:

diff --git a/build-locale-archive.c b/build-locale-archive.c
index a506876..55edcde 100644
--- a/build-locale-archive.c
+++ b/build-locale-archive.c
@@ -311,9 +311,9 @@ fill_archive (struct locarhandle *tmpl_ah,
             for (i = 0; i < install_langs_count; i++)
               {
                /* Add one for "_" and one for the null terminator.  */
-               size_t len = strlen (install_langs_list[i]) + 2;
-               char *install_lang = (char *)xmalloc (len);
-                strncpy (install_lang, install_langs_list[i], len - 2);
+               size_t len = strlen (install_langs_list[i]) + 1;
+               char *install_lang = (char *)xmalloc (len + 1);
+                strncpy (install_lang, install_langs_list[i], len);
                install_lang[len - 1] = '\0';
                 if (strchr (install_lang, '_') == NULL)
                   strcat (install_lang, "_");
---

Where we make it clear that LEN is length plus null, and then we add one for the "_" we *may* add to the string. Then the use of len in `install_lang[len - 1] = '\0' is correct and doesn't need changing.

Comment 15 Carlos O'Donell 2015-09-16 15:27:32 UTC
Even simpler. Just admit that you've already run strlen on install_langs_list[i] and the strncpy is no safer than strcpy, and this is a single threaded process with unlikely sources of corruption for that character array.

diff --git a/build-locale-archive.c b/build-locale-archive.c
index a506876..5fc93ff 100644
--- a/build-locale-archive.c
+++ b/build-locale-archive.c
@@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah,
                /* Add one for "_" and one for the null terminator.  */
                size_t len = strlen (install_langs_list[i]) + 2;
                char *install_lang = (char *)xmalloc (len);
-                strncpy (install_lang, install_langs_list[i], len - 2);
-               install_lang[len - 1] = '\0';
+                strcpy (install_lang, install_langs_list[i]);
                 if (strchr (install_lang, '_') == NULL)
                   strcat (install_lang, "_");
                 if (strncmp (name, install_lang, strlen (install_lang)) == 0)
---

Comment 16 Carlos O'Donell 2015-09-17 16:27:15 UTC
This is now fixed in Fedora Rawhide (0457f649e3fe6299efe384da13dfc923bbe65707)

This is now fixed in F23 (26f4d282402dc86c2b20f42fd63b11e63ce3fc74)

Comment 17 Carlos O'Donell 2015-09-17 18:39:06 UTC
*** Bug 1242691 has been marked as a duplicate of this bug. ***

Comment 18 Fedora Update System 2015-09-17 21:20:28 UTC
glibc-2.22-3.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16130

Comment 19 Fedora Update System 2015-09-19 00:20:55 UTC
glibc-2.22-3.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.\nIf you want to test the update, you can install it with \n su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16130

Comment 20 Fedora Update System 2015-09-21 10:47:39 UTC
glibc-2.22-3.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2015-09-29 11:25:41 UTC
glibc-2.21-8.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-6eb4b16455

Comment 22 Fedora Update System 2015-10-04 01:38:04 UTC
glibc-2.21-8.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update glibc'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-6eb4b16455

Comment 23 Fedora Update System 2015-10-04 22:49:30 UTC
glibc-2.21-8.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.