Bug 1714888

Summary: glibc: Account for size of locale-archive in rpm package.
Product: Red Hat Enterprise Linux 7
Reporter: Karel Srot <ksrot>
Component: glibc
Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA
QA Contact: qe-baseos-tools-bugs
Severity: high
Priority: high
Version: 7.6
CC: ashankar, codonell, dj, fweimer, mnewsome, pfrankli, pmatilai, skolosov
Target Milestone: rc
Keywords: Patch
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.17-304.el7
Doc Type: Bug Fix
Doc Text:
Cause: Before a glibc update, the package manager does not verify that sufficient disk space is available to complete the update.
Consequence: If a system is low on disk space, installation of a glibc update may terminate abnormally, potentially leaving the system in an unusable state.
Fix: The size of the generated glibc locale file is recorded in the glibc RPM file, so that the package manager can take it into account during its pre-update checks.
Result: For future updates, in most cases the package manager will not start the update when insufficient disk space is available, and reports an error instead.
Clones: 1725131
Last Closed: 2020-03-31 19:08:32 UTC
Type: Bug
Bug Depends On: 1725131    
Bug Blocks: 1710255    

Description Karel Srot 2019-05-29 06:12:21 UTC
Description of problem:

Please see also bug 1491786 for more background.


If the root filesystem is low on space, yum starts the update but runs out of space while installing the RPMs. Yum then fails, leaving the system in a corrupted state.


Version-Release number of selected component (if applicable):
 yum-3.4.3-150.el7.noarch
 kernel-3.10.0-514.6.2.el7.x86_64 (RHEL 7.3)

How reproducible:
-

Steps to Reproduce:
1. Build a server with our standard layout, including a 3G root filesystem
2. Create a 1.5G file in /usr
	dd if=/dev/urandom of=/usr/bigfile bs=1024k count=1500
3. Update the server:
    yum clean all && yum update -y


Actual results:
RPM installation then fails, leaving packages broken and in an inconsistent state. The server is then broken; processes can no longer run because dependent libraries are not installed or are installed in inconsistent versions.

Expected results:
Yum transaction check accurately determines if there is enough space before installing packages.





~~~~~~~~~~~~~~~~~~Additional info~~~~~~~~~~~~~~~~~~
On a freshly installed RHEL7.3 machine:

~~~
# df -B1 /
Filesystem                      1B-blocks      Used  Available Use% Mounted on
/dev/mapper/rhel-root          3210215424 971796480 2238418944  31% /
#
~~~

Calculating the size of /usr/bigfile that leaves roughly 237 MB free:

~~~
# echo $((2238418944-237072384))
2001346560
# echo $((2001346560/1024/1024))
1908
# dd if=/dev/urandom of=/usr/bigfile bs=1024k count=1908
1908+0 records in
1908+0 records out
2000683008 bytes (2.0 GB) copied, 104.975 s, 19.1 MB/s
# df -B1 /
Filesystem             1B-blocks       Used Available Use% Mounted on
/dev/mapper/rhel-root 3210215424 2972479488 237735936  93% /
#
~~~
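The arithmetic above can be reproduced in a few lines. This is a sketch: `dd_count_mib` is a hypothetical helper, and the two input numbers are copied from the transcript (the df "Available" value and the amount of free space the reporter aimed to leave).

```python
MIB = 1024 * 1024  # dd block size used in the transcript (bs=1024k)


def dd_count_mib(available_bytes: int, leave_free_bytes: int) -> int:
    """Hypothetical helper: number of whole 1 MiB blocks dd should write so
    that at least `leave_free_bytes` remain available afterwards."""
    to_fill = available_bytes - leave_free_bytes
    return to_fill // MIB  # truncating division, like $((x/1024/1024)) in bash


available = 2238418944   # "Available" column from the first df -B1
target_free = 237072384  # free space the reporter aimed to leave

count = dd_count_mib(available, target_free)
print(count)                      # 1908
print(count * MIB)                # 2000683008, matching dd's "bytes copied"
print(available - count * MIB)    # 237735936, matching the second df -B1
```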

Leaving a little less free space:

~~~
# dd if=/dev/urandom of=/usr/bigfile bs=1024k count=1909
1909+0 records in
1909+0 records out
2001731584 bytes (2.0 GB) copied, 104.584 s, 19.1 MB/s
# df -B1 /
Filesystem             1B-blocks       Used Available Use% Mounted on
/dev/mapper/rhel-root 3210215424 2973528064 236687360  93% /
#
~~~

Trying to reproduce:

~~~
# yum update -y
... snip ...
Install   10 Packages (+31 Dependent packages)
Upgrade  199 Packages

Total download size: 214 M
... snip ...

  Updating   : bash-4.2.46-28.el7.x86_64                 13/444
  Updating   : nss-softokn-freebl-3.28.3-6.el7.x86_64    14/444
  Updating   : glibc-common-2.17-196.el7.x86_64          15/444
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
  Updating   : glibc-2.17-196.el7.x86_64                 16/444
Error unpacking rpm package glibc-2.17-196.el7.x86_64
warning: /etc/nsswitch.conf created as /etc/nsswitch.conf.rpmnew
error: unpacking of archive failed on file /lib64/libc-2.17.so;5989c491: cpio: write
  Updating   : libstdc++-4.8.5-16.el7.x86_64             17/444
Error unpacking rpm package libstdc++-4.8.5-16.el7.x86_64
error: glibc-2.17-196.el7.x86_64: install failed
error: unpacking of archive failed on file /usr/lib64/libstdc++.so.6.0.19;5989c491: cpio: lsetfilecon
  Updating   : pcre-8.32-17.el7.x86_64                   18/444
Error unpacking rpm package pcre-8.32-17.el7.x86_64
error: libstdc++-4.8.5-16.el7.x86_64: install failed
error: unpacking of archive failed on file /usr/lib64/libpcre.so.1;5989c491: cpio: symlink
  Updating   : libselinux-2.5-11.el7.x86_64              19/444
error: pcre-8.32-17.el7.x86_64: install failed
warning: %triggerin(cronie-1.4.11-14.el7_2.1.x86_64) scriptlet failed, signal 11
... snip ...
#
~~~


----------

 Panu Matilainen 2019-05-28 12:52:08 UTC

Looking at the original report:
>  Updating   : glibc-common-2.17-196.el7.x86_64          15/444
> /usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
> /usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
>  Updating   : glibc-2.17-196.el7.x86_64

The culprit here is probably the unaccounted-for glibc locale-archive. It's a big file which is generated by install-time scriptlets, but appears as zero-size to rpm:

$ rpm -qplv glibc-common-2.17-292.el7.x86_64.rpm |grep /locale-archive
-rw-r--r--    1 root    root                        0 Apr 30 11:24 /usr/lib/locale/locale-archive
-rw-r--r--    1 root    root                106065680 Apr 30 11:06 /usr/lib/locale/locale-archive.tmpl

Post-install, the two files have more or less swapped sizes: .tmpl is 0 and locale-archive is roughly the original .tmpl size. Presumably, though, there is a moment during which this space requirement is doubled, and rpm has no way to know about or account for that.
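The theory can be made concrete with the sizes from the listing above. This is a simplified model, not rpm's accounting code: it assumes rpm plans only for the sizes recorded in the package header, and it assumes (as the comment suggests, but does not measure) that at the worst moment of the scriptlet a full-size template and an equally large locale-archive coexist on disk. The function names are hypothetical.

```python
# Sizes recorded in the glibc-common-2.17-292 header (from rpm -qplv above).
recorded = {
    "/usr/lib/locale/locale-archive": 0,
    "/usr/lib/locale/locale-archive.tmpl": 106065680,
}


def accounted_bytes(sizes: dict) -> int:
    """Simplified model: the space rpm can plan for is the sum of recorded sizes."""
    return sum(sizes.values())


def peak_bytes(sizes: dict) -> int:
    """Assumed worst case during the install scriptlet: the template is still
    full size while a locale-archive of about the same size has been written."""
    tmpl = sizes["/usr/lib/locale/locale-archive.tmpl"]
    return tmpl + tmpl


shortfall = peak_bytes(recorded) - accounted_bytes(recorded)
print(accounted_bytes(recorded))  # 106065680
print(peak_bytes(recorded))       # 212131360
print(shortfall)                  # 106065680 bytes rpm never planned for
```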

If my theory is right and this is the culprit, the only way to address this that I can see is to have glibc account for the maximum temporary disk space, by making locale-archive a %ghost that is touch'ed to the locale-archive.tmpl size so that rpm knows about the space requirement (this is similar to what is done for the kernel initramfs, only in this case it's just a temporary need).


----------------

 Kyle Walker 2019-05-28 16:59:41 UTC

Looking at the glibc-common post-install script behaviour, I believe that warrants a separate bug as well. The utility keeps /usr/lib/locale/locale-archive.tmpl open and at its original size while it writes /usr/lib/locale/locale-archive, and it only truncates the template after the new locale-archive file is complete. This essentially doubles the on-disk size during the update, and the file is quite large to begin with:

    # ls -lh /usr/lib/locale/locale-archive*
    -rw-r--r--. 1 root root    0 May 28 12:49 /usr/lib/locale/locale-archive
    -rw-r--r--. 1 root root 102M Apr 30 03:55 /usr/lib/locale/locale-archive.tmpl

    
    # strace -Tttfv -e trace=file /usr/sbin/build-locale-archive --install-langs all 
    <snip>
    12:51:08.196574 open("/usr/lib/locale/locale-archive.tmpl", O_RDONLY) = 4 <0.000033>
    12:51:08.196830 unlink("/usr/lib/locale/locale-archive") = 0 <0.000113>
    12:51:08.197311 open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3 <0.000053>
    12:51:08.199092 open("/usr/lib/locale/locale-archive", O_RDWR) = -1 ENOENT (No such file or directory) <0.000033>
    12:51:08.199336 open("/usr/lib/locale/locale-archive.w6dqgn", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 <0.000246>
    12:51:08.200928 link("/usr/lib/locale/locale-archive.w6dqgn", "/usr/lib/locale/locale-archive") = 0 <0.000054>
    12:51:08.201082 unlink("/usr/lib/locale/locale-archive.w6dqgn") = 0 <0.000043>
    12:51:08.204415 open("/usr/share/locale/locale.alias", O_RDONLY) = 5 <0.000096>
    12:51:08.205877 open("/usr/share/locale/locale.alias", O_RDONLY) = 5 <0.000034>
    <snip>
    12:51:08.492411 open("/usr/lib/locale/locale-archive.pRsTnl", O_RDWR|O_CREAT|O_EXCL, 0600) = 5 <0.000201>
    12:51:10.146028 rename("/usr/lib/locale/locale-archive.pRsTnl", "/usr/lib/locale/locale-archive") = 0 <0.000119>
    <snip>
    12:51:10.206147 open("/usr/share/locale/locale.alias", O_RDONLY) = 3 <0.000052>
    12:51:10.212070 truncate("/usr/lib/locale/locale-archive.tmpl", 0) = 0 <0.015600>
    12:51:10.227770 execve("/usr/sbin/tzdata-update", ["/usr/sbin/tzdata-update"], []) = -1 ENOENT (No such file or directory) <0.000058>
    12:51:10.227993 +++ exited with 0 +++
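The sequence visible in the strace (keep the template, write a fresh archive under a unique temporary name, rename it over locale-archive, and only then truncate the template) can be mimicked to show why usage peaks at twice the template size. This is a Python illustration, not build-locale-archive's code; `simulate_rebuild` and the 1 MiB size are arbitrary choices, and it measures apparent file sizes rather than allocated blocks.

```python
import os
import tempfile


def simulate_rebuild(size: int) -> tuple[int, int]:
    """Illustrative model of the strace above. Returns (peak, final): the total
    apparent size of the locale files at the worst moment and at the end."""
    with tempfile.TemporaryDirectory() as d:
        tmpl = os.path.join(d, "locale-archive.tmpl")
        archive = os.path.join(d, "locale-archive")
        with open(tmpl, "wb") as f:  # stands in for the packaged template
            f.write(b"\0" * size)

        def total() -> int:
            return sum(os.path.getsize(p) for p in (tmpl, archive)
                       if os.path.exists(p))

        # Write the new archive under a unique temporary name first
        # (cf. locale-archive.pRsTnl, opened with O_CREAT|O_EXCL).
        fd, tmp = tempfile.mkstemp(prefix="locale-archive.", dir=d)
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size)
        peak = total() + os.path.getsize(tmp)  # template is still full size
        os.replace(tmp, archive)               # rename into place
        os.truncate(tmpl, 0)                   # template emptied only at the end
        return peak, total()


peak, final = simulate_rebuild(1024 * 1024)
print(peak, final)  # 2097152 1048576
```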


However, I think this is about as far as we can take this particular problem. In the grand scheme of things, yum relies on rpm to report whether there are going to be any problems, and rpm can't account for all the ways that scriptlets can inflate a package's disk footprint. The Fedora packaging guidelines don't even mention this aspect of writing scriptlets, though it might be something we want to add to that guidance and start looking for in our own content.


Version-Release number of selected component (if applicable):


How reproducible:
always, when the available disk space is low


Steps to Reproduce:
1. see above

Actual results:
package installation fails leaving the system in a corrupted state

Expected results:
The glibc RPM package declares the disk space the transaction needs more accurately, so that rpm can take it into account in the pre-install check.

Comment 2 Florian Weimer 2019-05-29 06:49:44 UTC
This looks predominantly like an RPM bug.  It should not continue the update if the disk is reported as full, and should be able to complete the installation after the system administrator has freed up some disk space.  After all, the pre-transaction check suffers from a TOCTOU race, so it's not completely reliable.
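The TOCTOU point can be illustrated with a minimal free-space check. `enough_space` is a hypothetical helper, not an rpm or yum function; it uses Python's POSIX-only `os.statvfs`. Its answer describes the filesystem only at the instant of the call, so later writes can still fail with ENOSPC and must be handled regardless.

```python
import os


def enough_space(path: str, needed_bytes: int) -> bool:
    """Hypothetical pre-transaction check: compare the space available to
    unprivileged users on the filesystem holding `path` with `needed_bytes`.
    The result is only valid at the moment statvfs() runs (time of check);
    another process or a scriptlet may consume the space before time of use."""
    st = os.statvfs(path)
    available = st.f_bavail * st.f_frsize
    return available >= needed_bytes


# Example: would both locale files at full size fit right now?
needed = 212131360
print(enough_space("/", needed))
```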

It's news to me that %ghost files contribute to transaction size.  Is this documented somewhere?

Comment 3 Panu Matilainen 2019-05-29 07:48:01 UTC
I don't know whether it's documented anywhere; so many dark corners like this are not. In any case, it does. The kernel initramfs case convinced me that it *must* be that way.

Technically, rpm is of course in a position to stop the transaction, but the ability to signal the admin and then continue would require a new interface that is somehow mandated on all API clients.

Comment 4 Florian Weimer 2019-05-29 10:06:08 UTC
Presumably, the recommended change looks like this:

diff --git a/glibc.spec b/glibc.spec
index ebf161b4..c501b9de 100644
--- a/glibc.spec
+++ b/glibc.spec
@@ -3441,7 +3441,7 @@ $olddir/build-%{target}/elf/ld.so \
     --prefix ${RPM_BUILD_ROOT} --add-to-archive \
     *_*
 rm -rf *_*
-mv locale-archive{,.tmpl}
+cp locale-archive{,.tmpl}
 popd
 %endif

I have a test build with this change, glibc-2.17-292.el7.bz1714888.0.

Comment 8 Florian Weimer 2019-05-29 12:17:09 UTC
I missed another file truncation, so the patch should look like this:

diff --git a/glibc.spec b/glibc.spec
index ebf161b4..5176f5c4 100644
--- a/glibc.spec
+++ b/glibc.spec
@@ -3441,7 +3441,7 @@ $olddir/build-%{target}/elf/ld.so \
     --prefix ${RPM_BUILD_ROOT} --add-to-archive \
     *_*
 rm -rf *_*
-mv locale-archive{,.tmpl}
+cp locale-archive{,.tmpl}
 popd
 %endif
 
@@ -3906,10 +3906,6 @@ touch $RPM_BUILD_ROOT/var/{db,run}/nscd/{passwd,group,hosts,services}
 touch $RPM_BUILD_ROOT/var/run/nscd/{socket,nscd.pid}
 %endif
 
-%ifnarch %{auxarches}
-> $RPM_BUILD_ROOT/%{_prefix}/lib/locale/locale-archive
-%endif
-
 mkdir -p $RPM_BUILD_ROOT/var/cache/ldconfig
 > $RPM_BUILD_ROOT/var/cache/ldconfig/aux-cache

Comment 9 Panu Matilainen 2019-05-29 12:19:55 UTC
Okay, this should be good to go (but I don't have an easy reproducer for the issue itself):

$ rpm -qplv glibc-common-2.17-292.el7.bz1714888.1.x86_64.rpm |grep /locale-archive
-rw-r--r--    1 root     root                106065680 May 29 13:54 /usr/lib/locale/locale-archive
-rw-r--r--    1 root     root                106065680 May 29 13:54 /usr/lib/locale/locale-archive.tmpl

Comment 10 Karel Srot 2019-05-29 13:14:04 UTC
Hi,
the change seems to be effective

[root@ts ~]# dd if=/dev/zero of=/dev/mychroot bs=1M count=200
[root@ts ~]# losetup /dev/loop0 /dev/mychroot
[root@ts ~]# mkfs.ext4 /dev/loop0
[root@ts ~]# mount /dev/loop0 /tmp/chroot/

new glibc:

[root@ts ~]# rpm -iv --root /tmp/chroot/ --test --nodeps glibc-common-2.17-292.el7.bz1714888.1.x86_64.rpm 
Preparing packages...
	installing package glibc-common-2.17-292.el7.bz1714888.1.x86_64 needs 55MB on the / filesystem

old glibc:

[root@ts ~]# rpm -iv --root /tmp/chroot/ --test --nodeps glibc-common-2.17-292.el7.x86_64.rpm 
Preparing packages...
[root@ts ~]#

Comment 12 Florian Weimer 2019-08-01 08:47:41 UTC
Panu, I can't get this to work with rpm-4.11.3-40.el7.x86_64:

# rpm -qplv glibc-common-2.17-304.el7.x86_64.rpm | grep /locale-archive
-rw-r--r--    1 root    root                106180368 Aug  1 02:47 /usr/lib/locale/locale-archive
-rw-r--r--    1 root    root                106180368 Aug  1 02:47 /usr/lib/locale/locale-archive.tmpl

So the %ghost size record is there.  (It's not a hard-link ghost file, either. 8-p)

I have 167 MiB available:

# df -h /usr
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhel_nnn--n420--01-root   50G   50G  167M 100% /

RPM testing passes:

# rpm -Uv --test glibc-2.17-304.el7.x86_64.rpm glibc-common-2.17-304.el7.x86_64.rpm 
Preparing packages...

Installation fails:

# rpm -Uv glibc-2.17-304.el7.x86_64.rpm glibc-common-2.17-304.el7.x86_64.rpm 
Preparing packages...
glibc-common-2.17-304.el7.x86_64
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
glibc-2.17-304.el7.x86_64
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
/usr/sbin/build-locale-archive: cannot add to locale archive: No such file or directory
glibc-common-2.17-292.el7.x86_64
glibc-2.17-292.el7.x86_64

Unless I'm missing something, RPM doesn't take the %ghost size into account in this version, so we cannot fix this bug.
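The expectation here can be checked with simple arithmetic: the two recorded sizes together exceed the 167 MiB that df reports as free. This naive sum ignores factors that may matter to rpm, such as space freed by replacing the old package's files, rounding to block size, and files whose fate makes rpm skip them, so it only shows why the passing `--test` was surprising, not what rpm should have computed.

```python
MIB = 1024 * 1024

# locale-archive and locale-archive.tmpl in glibc-common-2.17-304 (rpm -qplv).
recorded = [106180368, 106180368]
available_mib = 167  # "Avail" from df -h (powers of 1024)

needed_mib = sum(recorded) / MIB
print(round(needed_mib, 1))        # 202.5
print(needed_mib > available_mib)  # True
```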

Comment 13 Panu Matilainen 2019-08-01 09:29:47 UTC
All rpm versions in existence should take the %ghost size into account, but there are always any number of other factors at play. The original bug 1491786 is one possibility (so it might be interesting to test with rpm >= 4.11.3-36.el7), but with tight margins all sorts of rounding to block size etc. come into play, so coming up with a reliable reproducer can be hard. Try giving it a bit more room to maneuver.
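The rounding mentioned here can be illustrated generically: free-space checks that work in filesystem blocks round each file up to whole blocks, and many small files make that overhead add up. `blocks_needed` is a hypothetical helper assuming a 4096-byte block size; it is not taken from rpm's source.

```python
def blocks_needed(size: int, block_size: int = 4096) -> int:
    """Round a file size up to whole filesystem blocks (ceiling division)."""
    return (size + block_size - 1) // block_size


# Recorded locale-archive size in glibc-common-2.17-304 (Comment 12).
size = 106180368
print(blocks_needed(size))                # 25923
print(blocks_needed(size) * 4096 - size)  # 240 bytes lost to rounding
print(blocks_needed(1))                   # 1: even a 1-byte file costs a block
```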

Comment 14 Florian Weimer 2019-08-01 09:31:31 UTC
This may have something to do with the other change we are making at the same time:

-%attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %config(missingok,noreplace) %{_prefix}/lib/locale/locale-archive
+%attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %{_prefix}/lib/locale/locale-archive

I get FA_SKIP from rpmfiDecideFateIndex for this file, which I believe will exempt it from size accounting.
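A toy model of the effect described here: if a file's decided action is "skip", it contributes nothing to size accounting, so the %ghost size recorded in the header no longer helps. The `Action` enum only loosely echoes rpm's FA_* naming; `Action` and `accounted_bytes` are hypothetical simplifications, not rpm's implementation.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical stand-ins loosely named after rpm's file actions."""
    CREATE = "create"
    SKIP = "skip"


def accounted_bytes(files: dict[str, tuple[int, Action]]) -> int:
    """Sum recorded sizes, ignoring files whose action is SKIP."""
    return sum(size for size, action in files.values()
               if action is not Action.SKIP)


size = 106180368  # recorded size of both locale files in glibc-2.17-304
files = {
    "/usr/lib/locale/locale-archive.tmpl": (size, Action.CREATE),
    "/usr/lib/locale/locale-archive": (size, Action.CREATE),
}
print(accounted_bytes(files))  # 212360736: the doubled size is planned for

files["/usr/lib/locale/locale-archive"] = (size, Action.SKIP)
print(accounted_bytes(files))  # 106180368: the %ghost size is lost again
```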

Comment 15 Florian Weimer 2019-08-01 09:45:53 UTC
Yes, I think it's the %config change that interacts with this.  The size accounting works for future updates, but not an update which combines both changes.

Comment 16 Florian Weimer 2019-08-01 09:47:15 UTC
Maybe we should move bug 1717512 to a z-stream update to compensate for this?  But there will always be some customers that will get both changes at the same time.

Comment 17 Panu Matilainen 2019-08-01 10:06:35 UTC
Nah, it's nowhere near worth bothering with z-stream. This is a rare corner-case issue at best and there'll always be cases where we fail when margins are tight no matter how many tweaks like this we do. It's good to have it fixed though.

I suppose the %config change could affect the outcome but can't say for sure offhand.

Comment 19 Sergey Kolosov 2019-10-03 17:16:22 UTC
Verified: the glibc-common rpm size is now calculated with double the size of the /usr/lib/locale/locale-archive.tmpl file.

Comment 21 errata-xmlrpc 2020-03-31 19:08:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0989