Bug 1213602
| Summary: | Yum/rpm should work inside a container backed by overlayfs | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Eric Paris <eparis> | |
| Component: | yum-utils | Assignee: | Valentina Mukhamedzhanova <vmukhame> | |
| Status: | CLOSED ERRATA | QA Contact: | Karel Srot <ksrot> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.2 | CC: | avi.miller, david.jones74, dhowells, ekuric, eparis, ffesti, fweimer, greg.martyn, ikent, jafshar, james.antill, jeder, jherrman, jzeleny, ksrot, lkardos, matthew, mgoldman, packaging-team-maint, perfbz, podvody, sct, vgoyal, viggiani, vmukhame, walters, zsvetlik | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | yum-utils-1.1.31-33.el7 | Doc Type: | Bug Fix | |
| Doc Text: | When using yum inside a container with the OverlayFS service as storage, yum terminated unexpectedly with an "Rpmdb checksum is invalid" error. This update adds the yum-plugin-ovl sub-package, which ensures that yum works as expected in the described scenario. | Story Points: | --- | |
| Clone Of: | 1212533 | |||
| : | 1216064 1260977 (view as bug list) | Environment: | ||
| Last Closed: | 2015-11-19 12:10:38 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1212533 | |||
| Bug Blocks: | 1216064, 1259679, 1260977, 1260979 | |||
| 
        
Description
Eric Paris 2015-04-20 21:37:49 UTC
        
> This behavior is 'correct,' documented, and expected in overlayfs.
Well, the overlayfs docs are not the right place to look up basic file system semantics. Maybe you guys should have a look into the POSIX standard from time to time...
Assuming that it is not sufficient to only clean up some yum caches from the base image, an intermediate solution could be to lift the files in question (/var/lib/rpm + maybe some yum files) to the overlay FS by touching them or, if this is not enough, opening them RW as part of the spin-up of the container. Having such a mechanism will come in handy for fixing all the other applications that run into the same trap.
I wish I could say that I am surprised about this issue. For years, or maybe even decades, no one has been able to create a union, overlay, you-name-it file system that actually works. This is just another instance of not having solved the actual problem at hand. Are there no capable file system developers in the kernel, or are they just too busy to turn their attention to such a trivial problem?
Anyway, I am actually hesitant to fix this in rpm or yum and to become complicit in enabling the use of a non-POSIX-compliant file system for our OS and removing the immediate need for a proper solution: fixing overlayfs or any of the other non-working implementations. Or finally giving up the idea as impractical or impossible.
Had a short look at the error message and the code producing it. It looks like this is triggered by yum's cache of the rpmdb. Removing the files in /var/lib/yum/rpmdb-indexes/ from the base image should make the immediate error go away. Also, the error message is surrounded by an if __debug__: line, so switching off debugging for yum should also make the error go away and trigger the recreation of the cache.
I am switching this bug report over to yum for now, as that's where the error happens. Fixing the images is probably the easier way of getting rid of this (unless docker lacks the means to do so).
(In reply to Florian Festi from comment #2)
> > This behavior is 'correct,' documented, and expected in overlayfs.
>
> Well, the overlayfs docs is not the right place to look up basic file system
> semantics. May be you guys should have a look into the POSIX standard from
> time to time...
Overlayfs provides a restricted subset of the POSIX standards because implementing the same level as the real filesystems is very difficult. Consider the following:
    fd1 = open("rpmdb", O_RDONLY);
    fd2 = open("rpmdb", O_RDWR);
    fstat(fd1, &st1);
    fstat(fd2, &st2);
With overlayfs, not only can you not guarantee that fd2 corresponds to the same file as fd1, you also cannot expect st1.{st_dev,st_ino} == st2.{st_dev,st_ino}.
The true difficulty in the kernel is making the latter work. Every time you opened a R/O file, you would have to allocate an inode number on the upper (destination) fs, and this would then have to be *persistent* such that each time the same R/O file was opened, you got the same inode number. This would require some serious fudging of the filesystem code, and the upper fs could then be horribly broken by accidentally mismatching the upper and lower layers in an overlay.
Further, you would also have a potential problem if someone altered the lower fs in such a way that a file got deleted and a new one created with the same inode number.
Yes, some filesystems have a generation number, but not all.
(In reply to Florian Festi from comment #3)
> Had a short look at the error message and the code producing it. It looks
> like this is triggered by yum's cache of the rpmdb. Removing the files in
> /var/lib/yum/rpmdb-indexes/ from the base image should make the immediate
> error go away. Also the error message is surrounded by an if __debug__:
> line. So switching off debugging for yum should also make the error go away
> and trigger the recreating of the cache.
>
> I switch this bugreport over to yum for now - as that's where error happens.
> Fixing the images is probably the easier way of getting rid of this (Unless
> docker misses the means to do so)
I tried to remove the files in /var/lib/yum/rpmdb-indexes/; docker build fails with [1]. This is when I removed only /var/lib/yum/rpmdb-indexes/version; the lstat error is always there if any of the files is removed (or all of them).
Providing /etc/yum.conf with debuglevel=0 inside the container did not help; the error message I got when executing docker build is [2]. Please note that in this case I did not remove any of the files in /var/lib/yum/rpmdb-indexes/.
Thank you,
Elvir
[1]
lstat /mnt/docker/overlay/bc61310a8ce35ab13a7d18ca67066c5ceeb35222656840c285f160325a2161cc/merged/var/lib/yum/rpmdb-indexes/version: no such file or directory
time="2015-04-28T07:16:22-04:00" level=info msg="-job build() = ERR (1)"
INFO[0009] lstat /mnt/docker/overlay/bc61310a8ce35ab13a7d18ca67066c5ceeb35222656840c285f160325a2161cc/merged/var/lib/yum/rpmdb-indexes/version: no such file or directory
[2]
time="2015-04-28T07:25:23-04:00" level=info msg="-job release_interface(7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61) = OK (0)"
lstat /mnt/docker/overlay/7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61/merged/var/lib/yum/rpmdb-indexes/conflicts: no such file or directory
time="2015-04-28T07:25:23-04:00" level=info msg="-job build() = ERR (1)"
INFO[0111] lstat /mnt/docker/overlay/7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61/merged/var/lib/yum/rpmdb-indexes/conflicts: no such file or directory
(In reply to Jan Zeleny from comment #7)
> Note that yum is obsolete in Fedora as of F22 so the right component to file
> such a bug against is dnf, assuming the bug is still there.
Fedora rawhide clone is: https://bugzilla.redhat.com/show_bug.cgi?id=1216064
Elvir tried both workarounds. Clearing existing needinfo, but ...
This issue prevents us from properly testing docker builds on overlay-backed docker storage. Need to know if this can be fixed in yum for RHEL 7.2 (see c#4). Thanks.
Sorry, looks like the error message had misled me into thinking that the caches get mixed up. Looks like the problem actually is the rpmdb, which yum still has opened read-only. Executing "touch /var/lib/rpm/*" right at the beginning makes the error go away for me.
So we are back to my first comment: the easiest way of fixing this is just executing "touch /var/lib/rpm/*" on container spin-up. While I am not the yum maintainer and the decision is not mine, I kinda have doubts that yum wants to change the way it handles the rpmdb, especially in the 7.2 time frame. If touching the rpmdb is for some reason too complicated for docker, or wasting the disk space is unwanted, you can consider shipping a yum plug-in that touches the rpmdb files.
(In reply to Florian Festi from comment #11)
> Sorry, looks like the error message had misled me into thinking that the
> caches get mixed up. Looks like the problem actually is the rpmdb which yum
> still has opened read-only. Executing "touch /var/lib/rpm/*" right at the
> beginning makes the error go away for me.
>
> So we are back to my first comment: Easiest way of fixing this is just
> executing "touch /var/lib/rpm/*" on container spin-up.
> While I am not the yum maintainer and the decision is not mine I kinda have
> doubt that yum wants to change the way it handles the rpmdb - especially in
> the 7.2 time frame. If touching the rpmdb is for some reason to complicated
> for docker or wasting the disk space is unwanted you can consider shipping
> a yum plug-in that touches the rpmdb files.
You wrote container spin-up, but what I described and tested in #9 is what I see when executing 'docker build', so that is before I can spin up a container.
However, I tried the advice from #11, and it ends with [1]. I added RUN touch /var/lib/rpm/* as the first command in the Dockerfile. The packages I am trying to install during the image build are listed below, and I am using the rhel7 base image:
    FROM registry.access.redhat.com/redhat/rhel7
    RUN yum clean all; time yum install -y bc blktrace btrfs-progs ethtool findutils gcc gdb git glibc-common glibc-utils gnuplot httpd hwloc iotop iproute iputils less pciutils ltrace mailx man-db netsniff-ng net-tools numactl numactl-devel passwd procps-ng psmisc screen strace tar tcpdump vim-enhanced wget xauth which
Which docker version do you have, and which Dockerfile leads to a successful build of a docker image when docker is started with the -s overlay option?
Thank you, kind regards,
Elvir
[1]
time="2015-04-29T11:45:04-04:00" level=info msg="+job log(die, 994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58, 03071466b791)"
time="2015-04-29T11:45:04-04:00" level=info msg="-job log(die, 994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58, 03071466b791) = OK (0)"
time="2015-04-29T11:45:04-04:00" level=info msg="+job release_interface(994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58)"
time="2015-04-29T11:45:04-04:00" level=info msg="-job release_interface(994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58) = OK (0)"
lstat /mnt/docker/overlay/994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58/merged/usr/lib64/libgcc_s-4.8.2-20140120.so.1: no such file or directory
time="2015-04-29T11:45:04-04:00" level=info msg="-job build() = ERR (1)"
time="2015-04-29T11:45:04-04:00" level=info msg="lstat /mnt/docker/overlay/994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58/merged/usr/lib64/libgcc_s-4.8.2-20140120.so.1: no such file or directory"
#3 and though #9 is most likely a red herring.
I have not built a docker image. I followed the reproducer instructions in the original bug report and was able to reproduce the error. After having set things up for OverlayFS, I instead did:
docker run -ti fedora /bin/bash
touch /var/lib/rpm/*
yum install -y quilt
The latter two commands were executed in the container. This made the "Rpmdb checksum is invalid: dCDPT(pkg checksums): groff-base.x86_64 0:1.22.2-11.fc21 - u" error go away. This means that elevating the rpmdb into the upper layer fixes this issue. No idea how to do this for your docker build use case, though.
I am using Fedora 21 both as host and as container. 
> rpm -q docker-io
docker-io-1.6.0-2.git3eac457.fc21.x86_64
Given the nature of the issue the exact version of docker and the OS should actually not matter.
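The inode mismatch described earlier in this thread (a read-only and a read-write descriptor for the same path reporting different st_dev/st_ino) can be probed with a short script. This is only a minimal sketch: the helper name same_inode is made up for illustration, and whether the mismatch actually shows up depends on the kernel and overlayfs in use. Note that opening the file read-write is itself what lifts it into the upper layer, so running the check on a lower-layer file changes the state it is probing.

```python
import os


def same_inode(path):
    """Open `path` read-only, then read-write, and compare the two fstat results.

    Returns True when both descriptors report the same (st_dev, st_ino) pair,
    which is what an ordinary filesystem is expected to do. `same_inode` is a
    hypothetical helper name used only for this illustration.
    """
    fd_ro = os.open(path, os.O_RDONLY)
    try:
        # The read-write open happens while the read-only descriptor is
        # still held, mirroring the sequence described in the thread.
        fd_rw = os.open(path, os.O_RDWR)
        try:
            st_ro = os.fstat(fd_ro)
            st_rw = os.fstat(fd_rw)
        finally:
            os.close(fd_rw)
    finally:
        os.close(fd_ro)
    return (st_ro.st_dev, st_ro.st_ino) == (st_rw.st_dev, st_rw.st_ino)
```

Inside a container one might call same_inode("/var/lib/rpm/Packages") as root before any touch has been done; a False result would match the behaviour reported here.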
I think telling people to do `touch /var/lib/rpm/*` in their Dockerfiles today makes sense. Even if this bug was fixed *today*, we'd still have RHEL6 base images, not to mention probably years of people running 7.0/7.1 images. That said, it seems really worth having someone who knows this part of the yum codebase make it overlayfs compatible.
Upstream commits:
commit 617d2d90a553f9e5bc4dfd9ab2f9c194b956fcab
Author: Valentina Mukhamedzhanova <vmukhame>
Date:   Thu Jun 25 12:53:39 2015 +0200
    ovl plugin: get rpmdbpath from conduit
commit 1555cfa6465e6e31515a86f097c8993d89c0085e
Author: Valentina Mukhamedzhanova <vmukhame>
Date:   Thu Jun 25 12:30:13 2015 +0200
    ovl plugin: fix indentation
commit 0c0b029122b476c269a4b560d9be558e69e054ae
Author: Valentina Mukhamedzhanova <vmukhame>
Date:   Thu Jun 25 12:09:52 2015 +0200
    Add plugin for overlayfs issue workaround. Patch by Pavel Odvody. BZ#1213602
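For illustration, the "touch the rpmdb files" workaround discussed in this thread can be sketched in Python as follows. This is not the code from the commits listed above; it is a minimal sketch that assumes, as reported in the comments, that opening a file for writing is enough to lift it into the upper layer. The function name copy_up_files is hypothetical.

```python
import os


def copy_up_files(directory):
    """Open every regular file in `directory` for appending, then close it.

    Nothing is written, so file contents stay unchanged. The assumption
    (taken from the discussion, not verified here) is that the write-mode
    open makes overlayfs copy the file into the upper layer.
    Returns the list of paths that were touched, in sorted order.
    """
    touched = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and special files
        with open(path, "ab"):
            pass  # open for write and close again, without writing
        os.utime(path, None)  # update timestamps, similar to touch(1)
        touched.append(path)
    return touched
```

Called as copy_up_files("/var/lib/rpm") before yum opens the rpmdb, this would be roughly equivalent to the touch /var/lib/rpm/* command suggested above.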
I am having problems building a container when yum is used. This is the same issue mentioned in comment #9. I have opened a new ticket for this issue: #1256299.
yum-utils-1.1.31-28.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2129.html
Is this going to be fixed for docker images that rely on CentOS 6, or will those images remain unusable if the docker server uses the "overlay" storage driver?
For me, this is not working in Red Hat 7.2 "latest". I had to implement the workaround: RUN touch /var/lib/rpm/* in my Dockerfile, before "yum install" or similar.
Note that the workaround must be run prior to yum install and in the same build layer. Tested with the CentOS 6.6 image.