Bug 1262040
Summary: | build-locale-archive exits on SIGABRT at install-time if provided a --install-langs value other than "all" | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | David Shea <dshea> | ||||||||
Component: | glibc | Assignee: | Carlos O'Donell <codonell> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | unspecified | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | rawhide | CC: | arjun.is, codonell, dshea, dustymabe, fweimer, jakub, law, loic.yhuel, mfabian, pfrankli, robatino, siddhesh | ||||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | 2.22-3.fc23 glibc-2.21-8.fc22 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2015-10-04 22:49:33 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1170821 | ||||||||||
Attachments: |
|
Description
David Shea
2015-09-10 16:58:36 UTC
The SIGABRT may be a red herring. I just tried again in an effort to get a core file, and while build-locale-archive exited with status 0, it still didn't install any locales. localedef --list-archive on the installed system does not list anything. (In reply to David Shea from comment #1) > The SIGABRT may be a red herring. I just tried again in an effort to get a > core file, and while build-locale-archive exited with status 0, it still > didn't install any locales. localedef --list-archive on the installed system > does not list anything. Right, a core file is exactly what we would have needed to see what was going on and at which point it failed. You could have a bad disk, bad filesystem, and we need to narrow that down. If localedef --list-archive shows no locales, then you have no locales. What's the size of /usr/lib/locale/locale-archive? Is that the only file in /usr/lib/locale? You can do the following to fix the system: - Download glibc-common rpm. - Unpack RPM. - Copy locale-archive.tmpl to /usr/lib/locale/ - Run /usr/sbin/build-locale-archive again with the arguments you want. This should do exactly what was done by kickstart and we can see what goes wrong? (In reply to Carlos O'Donell from comment #2) > (In reply to David Shea from comment #1) > > The SIGABRT may be a red herring. I just tried again in an effort to get a > > core file, and while build-locale-archive exited with status 0, it still > > didn't install any locales. localedef --list-archive on the installed system > > does not list anything. > > Right, a core file is exactly what we would have needed to see what was > going on and at which point it failed. > > You could have a bad disk, bad filesystem, and we need to narrow that down. > If localedef --list-archive shows no locales, then you have no locales. Doubtful, as it's all virt. > What's the size of /usr/lib/locale/locale-archive? Is that the only file in > /usr/lib/locale? /usr/lib/locale/locale-archive is 103860 bytes. There is also the 0-byte locale-archive.tmpl. > > You can do the following to fix the system: > - Download glibc-common rpm. > - Unpack RPM. > - Copy locale-archive.tmpl to /usr/lib/locale/ > - Run /usr/sbin/build-locale-archive again with the arguments you want. That works fine. This has something to do with the state of the system as glibc is being installed for the first time. > This should do exactly what was done by kickstart and we can see what goes > wrong? (In reply to David Shea from comment #3) > > You can do the following to fix the system: > > - Download glibc-common rpm. > > - Unpack RPM. > > - Copy locale-archive.tmpl to /usr/lib/locale/ > > - Run /usr/sbin/build-locale-archive again with the arguments you want. > > That works fine. This has something to do with the state of the system as > glibc is being installed for the first time. After glibc-common is installed the /usr/lib/locale/locale-archive.tmpl file contains all of the prebuilt locales. Then build-locale-archive copies the required locales from the template to the final file /usr/lib/locale/locale-archive. After which it truncates the template file to size zero to save space and indicate it's done the copying. We've never seen a SIGABRT like this during any of our testing. Getting a core file of that SGIABRT would be the only way we're going to learn something about the failure mode :-( (In reply to Carlos O'Donell from comment #4) > > We've never seen a SIGABRT like this during any of our testing. Getting a > core file of that SGIABRT would be the only way we're going to learn > something about the failure mode :-( I think there is a bigger problem we should focus on rather than the SIGABRT, which is that there are no locales in the archive after kickstart/anaconda if you provide the --instLangs argument with any value other than 'all' to %packages (i.e. '%packages --instLangs=en'). This is an easy reproducer: all that is needed is a kickstart for F23 with '%packages --instLangs=en' and you should observe the problem. I worked with David first on this issue and he did some testing in Anaconda to verify the value of the macro is properly getting set and passed to /usr/sbin/build-locale-archive. Is it possible it is something else within build-locale-archive that is causing this (i.e. dependecy on environment variables, etc)? Like I said in comment 3, it doesn't appear to abort anymore. I don't know why it did in the first place. The version of glibc is unchanged, but other packages from rawhide may have been upgraded. I don't know how that would affect the operation of build-locale-archive, but whatever the problem is seems to have something to do with the state of the system when build-locale-archive is run. I built a statically-linked copy of strace and patched that into the post install scripts. I will attach the results. Created attachment 1072603 [details]
build-locale-archive-install.trace
The output of strace from running build-locale-archive from the triggerin script.
Created attachment 1072604 [details]
build-locale-archive-postinstall.trace
This one is from running build-locale-archive in the installed system. Post-install, I chrooted to /mnt/sysimage, copied locale-archive.tmpl from the rpm to /usr/lib/locale, and ran build-locale-archive with the same arguments as the trigger script.
(In reply to David Shea from comment #7) > Created attachment 1072603 [details] > build-locale-archive-install.trace > > The output of strace from running build-locale-archive from the triggerin > script. This doesn't show a SIGABRT thought? Is it reproducible? Could you please provide exact set of steps to reproduce the failure during kickstart, including which kickstart file you used, and how you ran it. Created attachment 1072619 [details]
Script to run a kickstart using virt-install
(In reply to Carlos O'Donell from comment #9) > (In reply to David Shea from comment #7) > > Created attachment 1072603 [details] > > build-locale-archive-install.trace > > > > The output of strace from running build-locale-archive from the triggerin > > script. > > This doesn't show a SIGABRT thought? Is it reproducible? Hard to say, since I had to cram so many wrapper programs in there to get output and status codes and the like, since the rpm's lua interpreter is so limited. As I said, it went away at some point. Anaconda runs with MALLOC_CHECK_=2 and MALLOC_PERTRUB_ set, so it could have been a bad memory access somewhere. > Could you please provide exact set of steps to reproduce the failure during > kickstart, including which kickstart file you used, and how you ran it. Here's one: https://github.com/dashea/anaconda/blob/instlangs-tests/tests/kickstart_tests/packages-instlangs-1.ks, and I ran it by booting an installer with inst.ks=http://path/to/a/similar/kickstart. But how about instead of trying to unravel the debugging I've added and going around in a circle again, I'll just give you a fix. diff --git a/build-locale-archive.c b/build-locale-archive.c index a506876..d901fc9 100644 --- a/build-locale-archive.c +++ b/build-locale-archive.c @@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah, /* Add one for "_" and one for the null terminator. */ size_t len = strlen (install_langs_list[i]) + 2; char *install_lang = (char *)xmalloc (len); - strncpy (install_lang, install_langs_list[i], len - 2); - install_lang[len - 1] = '\0'; + strncpy (install_lang, install_langs_list[i], len - 1); if (strchr (install_lang, '_') == NULL) strcat (install_lang, "_"); if (strncmp (name, install_lang, strlen (install_lang)) == 0) Adding the null byte to install_lang was off by one. For example, if install_langs_list[i] is en_US, then we'll end up with: len = 7 strncpy(install_lang, "en_US", 5), resulting in en_US with no \0 install_lang[6] = \0, which is two bytes after the 'S' Since len is already set to strlen() + 2 and allocates enough space, just let strncpy copy over the \0 as well and it works. (In reply to Carlos O'Donell from comment #9) > > This doesn't show a SIGABRT thought? Is it reproducible? The SIGABRT isn't really the problem. The empty archive is. Let's concentrate on that. > > Could you please provide exact set of steps to reproduce the failure during > kickstart, including which kickstart file you used, and how you ran it. See attachment: https://bugzilla.redhat.com/attachment.cgi?id=1072619 If you run that (you need to modify the variables at the top of the script) what you end up with is a VM with an empty archive. (In reply to David Shea from comment #11) > diff --git a/build-locale-archive.c b/build-locale-archive.c > index a506876..d901fc9 100644 > --- a/build-locale-archive.c > +++ b/build-locale-archive.c > @@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah, > /* Add one for "_" and one for the null terminator. */ > size_t len = strlen (install_langs_list[i]) + 2; > char *install_lang = (char *)xmalloc (len); > - strncpy (install_lang, install_langs_list[i], len - 2); > - install_lang[len - 1] = '\0'; > + strncpy (install_lang, install_langs_list[i], len - 1); > if (strchr (install_lang, '_') == NULL) > strcat (install_lang, "_"); > if (strncmp (name, install_lang, strlen (install_lang)) == > 0) Awesome! Thanks for finding this. The key here is that VM's often leave dirtier pages than normal systems where these bytes just might be zero, which is why it only fails in a VM. I'll rewrite this a bit actually to make it clearer what we're doing. I think the change should be something like this: diff --git a/build-locale-archive.c b/build-locale-archive.c index a506876..55edcde 100644 --- a/build-locale-archive.c +++ b/build-locale-archive.c @@ -311,9 +311,9 @@ fill_archive (struct locarhandle *tmpl_ah, for (i = 0; i < install_langs_count; i++) { /* Add one for "_" and one for the null terminator. */ - size_t len = strlen (install_langs_list[i]) + 2; - char *install_lang = (char *)xmalloc (len); - strncpy (install_lang, install_langs_list[i], len - 2); + size_t len = strlen (install_langs_list[i]) + 1; + char *install_lang = (char *)xmalloc (len + 1); + strncpy (install_lang, install_langs_list[i], len); install_lang[len - 1] = '\0'; if (strchr (install_lang, '_') == NULL) strcat (install_lang, "_"); --- Where we make it clear that LEN is length plus null, and then we add one for the "_" we *may* add to the string. Then the use of len in `install_lang[len - 1] = '\0' is correct and doesn't need changing. Even simpler. Just admit that you've already run strlen on install_langs_list[i] and the strncpy is no safer than strcpy, and this is a single threaded process with unlikely sources of corruption for that character array. diff --git a/build-locale-archive.c b/build-locale-archive.c index a506876..5fc93ff 100644 --- a/build-locale-archive.c +++ b/build-locale-archive.c @@ -313,8 +313,7 @@ fill_archive (struct locarhandle *tmpl_ah, /* Add one for "_" and one for the null terminator. */ size_t len = strlen (install_langs_list[i]) + 2; char *install_lang = (char *)xmalloc (len); - strncpy (install_lang, install_langs_list[i], len - 2); - install_lang[len - 1] = '\0'; + strcpy (install_lang, install_langs_list[i]); if (strchr (install_lang, '_') == NULL) strcat (install_lang, "_"); if (strncmp (name, install_lang, strlen (install_lang)) == 0) --- This is now fixed in Fedora Rawhide (0457f649e3fe6299efe384da13dfc923bbe65707) This is now fixed in F23 (26f4d282402dc86c2b20f42fd63b11e63ce3fc74) *** Bug 1242691 has been marked as a duplicate of this bug. *** glibc-2.22-3.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16130 glibc-2.22-3.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.\nIf you want to test the update, you can install it with \n su -c 'yum --enablerepo=updates-testing update glibc'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16130 glibc-2.22-3.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report. glibc-2.21-8.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-6eb4b16455 glibc-2.21-8.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update glibc' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-6eb4b16455 glibc-2.21-8.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report. |