Bug 1213602 - Yum/rpm should work inside a container backed by overlayfs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: yum-utils
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Valentina Mukhamedzhanova
QA Contact: Karel Srot
Depends On: 1212533
Blocks: 1216064 1259679 1260977 1260979
Reported: 2015-04-20 17:37 EDT by Eric Paris
Modified: 2017-10-07 19:00 EDT
CC List: 26 users

See Also:
Fixed In Version: yum-utils-1.1.31-33.el7
Doc Type: Bug Fix
Doc Text:
When yum was used inside a container with OverlayFS as the storage backend, yum terminated unexpectedly with an "Rpmdb checksum is invalid" error. This update adds the yum-plugin-ovl sub-package, which ensures that yum works as expected in the described scenario.
Story Points: ---
Clone Of: 1212533
Clones: 1216064 1260977
Environment:
Last Closed: 2015-11-19 07:10:38 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Eric Paris 2015-04-20 17:37:49 EDT
See also Bug #1212533

https://github.com/docker/docker/issues/10180

Summary:
      * You cannot use overlayfs as the storage driver for docker and
        use yum. Confirmed on both rawhide (kernel 4.0) and RHEL7.

Reproducer:
      * edit /etc/sysconfig/docker-storage to have
        --storage-driver=overlay
      * setenforce 0 (SELinux is known not to work with overlay)
      * docker run -ti fedora /bin/bash
      * yum install -y quilt
      * yum fails with: Rpmdb checksum is invalid: dCDPT(pkg checksums):
        groff-base.x86_64 0:1.22.2-11.fc21 - u

Slight bit of technical depth:
      * fd1 = open("rpmdb", O_RDONLY)
      * fd2 = open("rpmdb", O_RDWR)
      * write to fd2
      * read from fd1
      * Get old data back
      * Because fd1 is still a handle to the underlying (lower-layer)
        file, while opening fd2 read-write triggered a copy-up
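The access pattern above can be reproduced with a short Python sketch. Python is used because yum is written in it. The function name is hypothetical and does not come from yum or rpm. On a POSIX-conforming filesystem, the read through the read-only descriptor returns the newly written bytes. On overlayfs as described in this report, it may return the old lower-layer contents instead.

```python
import os
import tempfile


def read_after_write_via_second_fd(path: str, new_data: bytes) -> bytes:
    """Hypothetical reproducer of the double-open pattern described above.

    Opens `path` read-only (fd1) and then read-write (fd2), overwrites the
    start of the file through fd2 and reads it back through fd1.
    """
    fd1 = os.open(path, os.O_RDONLY)
    fd2 = os.open(path, os.O_RDWR)  # per the report, overlayfs copies up here
    try:
        os.write(fd2, new_data)      # write at offset 0 via the RW descriptor
        os.lseek(fd1, 0, os.SEEK_SET)
        return os.read(fd1, len(new_data))  # read back via the RO descriptor
    finally:
        os.close(fd1)
        os.close(fd2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "rpmdb")
        with open(path, "wb") as f:
            f.write(b"old-data")
        # b'new-data' on a POSIX-conforming filesystem; overlayfs as described
        # in this report could return b'old-data' for a lower-layer file.
        print(read_after_write_via_second_fd(path, b"new-data"))
```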

This behavior is 'correct', documented, and expected in overlayfs. We would like yum/rpm not to double-open files like this, or at least to open the read-only descriptor only after the read-write one.
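The second option requested above, opening the read-write descriptor first, can be sketched as follows. This is a hypothetical helper, not code from yum or rpm. It assumes that opening a file for writing on overlayfs copies it up, so a read-only open made afterwards resolves to the upper-layer copy.

```python
import os
import tempfile
from typing import Tuple


def open_rw_then_ro(path: str) -> Tuple[int, int]:
    """Hypothetical helper: open `path` read-write first, then read-only.

    Assumption: on overlayfs the read-write open copies a lower-layer file
    up, so the read-only open that follows refers to the same upper-layer
    file. Returns (ro_fd, rw_fd); the caller must close both.
    """
    rw_fd = os.open(path, os.O_RDWR)    # copy-up, if any, happens here
    ro_fd = os.open(path, os.O_RDONLY)  # opened after the copy-up
    return ro_fd, rw_fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "rpmdb")
        with open(path, "wb") as f:
            f.write(b"old-data")
        ro_fd, rw_fd = open_rw_then_ro(path)
        try:
            os.write(rw_fd, b"new-data")
            print(os.read(ro_fd, 8))  # b'new-data'
        finally:
            os.close(ro_fd)
            os.close(rw_fd)
```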
Comment 2 Florian Festi 2015-04-21 10:51:46 EDT
> This behavior is 'correct,' documented, and expected in overlayfs.

Well, the overlayfs docs are not the right place to look up basic file system semantics. Maybe you guys should have a look at the POSIX standard from time to time...

Assuming that it is not sufficient to only clean up some yum caches from the base image, an intermediate solution could be to copy the files in question (/var/lib/rpm + maybe some yum files) up into the upper layer of the overlay FS. This could be done by touching them or, if that is not enough, by opening them RW as part of the container spin-up. Such a mechanism would also come in handy for fixing all the other applications that run into the same trap.

I wish I could say that I am surprised by this issue. For years, maybe even decades, no one has been able to create a union, overlay, or you-name-it file system that actually works. This is just another instance of not having solved the actual problem at hand. Are there no capable file system developers in the kernel, or are they just too busy to turn their attention to such a trivial problem?

Anyway, I am actually hesitant to fix this in rpm or yum. Doing so would make us complicit in enabling a non-POSIX-compliant file system for our OS, and it would remove the immediate need for a proper solution: fixing overlayfs or any of the other (non-working) implementations, or finally giving up the idea as impractical or impossible.
Comment 3 Florian Festi 2015-04-22 05:09:51 EDT
I had a short look at the error message and the code producing it. It looks like this is triggered by yum's cache of the rpmdb. Removing the files in /var/lib/yum/rpmdb-indexes/ from the base image should make the immediate error go away. Also, the error message is guarded by an "if __debug__:" line, so switching off debugging for yum should also make the error go away and trigger recreation of the cache.

I am switching this bug report over to yum for now, as that is where the error happens. Fixing the images is probably the easier way of getting rid of this (unless docker lacks the means to do so).
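In Python code such as yum, "if __debug__:" refers to the interpreter's built-in constant. That constant is True unless Python runs with -O (or PYTHONOPTIMIZE is set), and no yum.conf option controls it. The minimal sketch below shows the guarded branch being skipped under -O. The snippet text and function name are illustrative only and do not come from yum.

```python
import os
import subprocess
import sys

# Illustrative stand-in for a debug-only check guarded by `if __debug__:`.
SNIPPET = (
    "if __debug__:\n"
    "    print('debug-only check runs')\n"
    "else:\n"
    "    print('debug-only check skipped')\n"
)


def run_snippet(optimize: bool) -> str:
    """Run SNIPPET in a fresh interpreter, with or without -O."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # make the -O flag the only control
    args = [sys.executable]
    if optimize:
        args.append("-O")  # -O makes __debug__ False, so the branch is skipped
    args += ["-c", SNIPPET]
    result = subprocess.run(args, capture_output=True, text=True,
                            check=True, env=env)
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_snippet(optimize=False))  # debug-only check runs
    print(run_snippet(optimize=True))   # debug-only check skipped
```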
Comment 8 David Howells 2015-04-27 11:47:11 EDT
(In reply to Florian Festi from comment #2)
> > This behavior is 'correct,' documented, and expected in overlayfs.
> 
> Well, the overlayfs docs is not the right place to look up basic file system
> semantics. May be you guys should have a look into the POSIX standard from
> time to time...

Overlayfs implements only a restricted subset of POSIX semantics, because matching the level of conformance of the real filesystems is very difficult.  Consider the following:

      fd1 = open("rpmdb", O_RDONLY);
      fd2 = open("rpmdb", O_RDWR);
      fstat(fd1, &st1);
      fstat(fd2, &st2);

With overlayfs, not only can you not guarantee fd2 to correspond to the same file as fd1, you cannot expect st1.{st_dev,st_ino} == st2.{st_dev,st_ino}.
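The identity check described above can be written with os.fstat in Python. The helper name is hypothetical. On an ordinary filesystem, a read-only open and a read-write open of the same path report the same (st_dev, st_ino) pair. The comment explains why overlayfs cannot promise this. The standard library's os.path.samestat performs the same comparison on two stat results.

```python
import os
import tempfile


def same_file(fd1: int, fd2: int) -> bool:
    """Hypothetical helper: do two descriptors report the same (st_dev, st_ino)?"""
    st1 = os.fstat(fd1)
    st2 = os.fstat(fd2)
    return (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "rpmdb")
        open(path, "wb").close()
        fd1 = os.open(path, os.O_RDONLY)
        fd2 = os.open(path, os.O_RDWR)
        try:
            # True on an ordinary filesystem; the comment above explains why
            # overlayfs cannot guarantee this for a lower-layer file.
            print(same_file(fd1, fd2))
        finally:
            os.close(fd1)
            os.close(fd2)
```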

The real difficulty in the kernel is making the latter work.  Every time you opened an R/O file, you would have to allocate an inode number on the upper (destination) fs. That number would then have to be *persistent*, so that each time the same R/O file was opened, you got the same inode number.  This would require some serious fudging of the filesystem code, and the upper fs could then be horribly broken by accidentally mismatching the upper and lower layers in an overlay.

Further, you would also have a potential problem if someone altered the lower fs in such a way that a file got deleted and a new one created with the same inode number.  Yes, some filesystems have a generation number, but not all.
Comment 9 Elvir Kuric 2015-04-28 07:45:55 EDT
(In reply to Florian Festi from comment #3)
> Had a short look at the error message and the code producing it. It looks
> like this is triggered by yum's cache of the rpmdb. Removing the files in
> /var/lib/yum/rpmdb-indexes/ from the base image should make the immediate
> error go away. Also the error message is surrounded by an if __debug__:
> line. So switching off debugging for yum should also make the error go away
> and trigger the recreating of the cache.
> 
> I switch this bugreport over to yum for now - as that's where error happens.
> Fixing the images is probably the easier way of getting rid of this (Unless
> docker misses the means to do so)

I tried to remove the files in /var/lib/yum/rpmdb-indexes/, and docker build fails with [1]. That output is from removing only /var/lib/yum/rpmdb-indexes/version; the lstat error appears whenever any (or all) of the files are removed.


Providing an /etc/yum.conf with debuglevel=0 inside the container did not help; the error message I got when executing docker build is [2]. Please note that in this case I did not remove any of the files in /var/lib/yum/rpmdb-indexes/.

thank you 

Elvir 

[1] 

lstat /mnt/docker/overlay/bc61310a8ce35ab13a7d18ca67066c5ceeb35222656840c285f160325a2161cc/merged/var/lib/yum/rpmdb-indexes/version: no such file or directory
time="2015-04-28T07:16:22-04:00" level=info msg="-job build() = ERR (1)" 
INFO[0009] lstat /mnt/docker/overlay/bc61310a8ce35ab13a7d18ca67066c5ceeb35222656840c285f160325a2161cc/merged/var/lib/yum/rpmdb-indexes/version: no such file or directory 




[2] 
time="2015-04-28T07:25:23-04:00" level=info msg="-job release_interface(7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61) = OK (0)" 
lstat /mnt/docker/overlay/7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61/merged/var/lib/yum/rpmdb-indexes/conflicts: no such file or directory
time="2015-04-28T07:25:23-04:00" level=info msg="-job build() = ERR (1)" 
INFO[0111] lstat /mnt/docker/overlay/7dbf2ebdc0db5d6aa059ca047cff381a5393b43cddc0505c5f7e575815293b61/merged/var/lib/yum/rpmdb-indexes/conflicts: no such file or directory
Comment 10 Jeremy Eder 2015-04-28 08:40:33 EDT
(In reply to Jan Zeleny from comment #7)
> Note that yum is obsolete in Fedora as of F22 so the right component to file
> such a bug against is dnf, assuming the bug is still there.

Fedora rawhide clone is:
https://bugzilla.redhat.com/show_bug.cgi?id=1216064

Elvir tried both workarounds.  Clearing existing needinfo, but ...

This issue prevents us from properly testing docker builds on overlay-backed docker storage.
Need to know if this can be fixed in yum for RHEL 7.2 (see c#4).  Thanks.
Comment 11 Florian Festi 2015-04-29 10:26:07 EDT
Sorry, it looks like the error message misled me into thinking that the caches get mixed up. The problem actually seems to be the rpmdb, which yum still has open read-only. Executing "touch /var/lib/rpm/*" right at the beginning makes the error go away for me.

So we are back to my first comment: the easiest way of fixing this is just executing "touch /var/lib/rpm/*" on container spin-up. I am not the yum maintainer and the decision is not mine, but I rather doubt that yum wants to change the way it handles the rpmdb, especially in the 7.2 time frame. If touching the rpmdb is for some reason too complicated for docker, or wasting the disk space is unwanted, you could consider shipping a yum plug-in that touches the rpmdb files.
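Here is a Python sketch of the "touch /var/lib/rpm/*" workaround, for example as the core of the kind of yum plug-in suggested above. The function name is hypothetical, and this is not the code of the ovl plugin that was eventually shipped. It assumes, as this comment reports, that updating the timestamps of the rpmdb files is enough to copy them up into the upper layer.

```python
import os
from typing import List


def touch_files(directory: str) -> List[str]:
    """Hypothetical helper: update atime/mtime of each regular file in `directory`.

    Mirrors `touch /var/lib/rpm/*`: on overlayfs, changing the metadata of
    a lower-layer file copies it up, so later read-only and read-write
    opens see the same file. Returns the sorted list of touched paths.
    """
    touched = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):      # skip subdirectories and the like
            os.utime(path, None)      # set atime/mtime to now, like touch(1)
            touched.append(path)
    return touched
```

It would be called as touch_files("/var/lib/rpm") before yum opens the rpmdb, by a user with permission to modify those files.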
Comment 12 Elvir Kuric 2015-04-29 11:50:53 EDT
(In reply to Florian Festi from comment #11)
> Sorry, looks like the error message had misled me into thinking that the
> caches get mixed up. Looks like the problem actually is the rpmdb which yum
> still has opened read-only. Executing "touch /var/lib/rpm/*" right at the
> beginning makes the error go away for me.
> 
> So we are back to my first comment: Easiest way of fixing this is just
> executing "touch /var/lib/rpm/*" on container spin-up. While I am not the
> yum maintainer and the decision is not mine I kinda have doubt that yum
> wants to change the way it handles the rpmdb - especially in the 7.2 time
> frame. If touching the rpmdb is for some reason to complicated for docker or
> wasting the disk space is unwanted you can consider shipping a yum plug-in
> that touches the rpmdb files.

You wrote "container spin-up", but what I wrote and tested in #9 is what I see when executing 'docker build', which is before I can spin up a container.

However, I tried the advice from #11, and it ends with [1]. I added RUN touch /var/lib/rpm/* as the first command in the Dockerfile.

The packages I am trying to install during the image build are listed below; I am using the rhel7 base image:

FROM registry.access.redhat.com/redhat/rhel7

RUN yum clean all; time yum install -y bc blktrace btrfs-progs ethtool findutils gcc gdb git glibc-common glibc-utils gnuplot httpd hwloc iotop iproute iputils less pciutils ltrace mailx man-db netsniff-ng net-tools numactl numactl-devel passwd procps-ng psmisc screen strace tar tcpdump vim-enhanced wget xauth which

What docker version do you have, and what Dockerfile led to a successful build of the docker image when docker is started with the -s overlay option?

thank you 

kind regards, 

Elvir 

 

[1] 

time="2015-04-29T11:45:04-04:00" level=info msg="+job log(die, 994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58, 03071466b791)" 
time="2015-04-29T11:45:04-04:00" level=info msg="-job log(die, 994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58, 03071466b791) = OK (0)" 
time="2015-04-29T11:45:04-04:00" level=info msg="+job release_interface(994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58)" 
time="2015-04-29T11:45:04-04:00" level=info msg="-job release_interface(994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58) = OK (0)" 
lstat /mnt/docker/overlay/994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58/merged/usr/lib64/libgcc_s-4.8.2-20140120.so.1: no such file or directory
time="2015-04-29T11:45:04-04:00" level=info msg="-job build() = ERR (1)" 
time="2015-04-29T11:45:04-04:00" level=info msg="lstat /mnt/docker/overlay/994cdd6f69f282b3a28c91001a55684b2ce1f481ed8a9e57acbb62b726d9ef58/merged/usr/lib64/libgcc_s-4.8.2-20140120.so.1: no such file or directory"
Comment 13 Florian Festi 2015-04-29 13:13:53 EDT
#3, and thus #9, is most likely a red herring.

I have not built a docker image. I followed the reproducer instructions in the original bug report and was able to reproduce the error. After setting things up for OverlayFS, I instead ran:

docker run -ti fedora /bin/bash
touch /var/lib/rpm/*
yum install -y quilt

The latter two commands were executed in the container. This made the "Rpmdb checksum is invalid: dCDPT(pkg checksums): groff-base.x86_64 0:1.22.2-11.fc21 - u" error go away, which means that copying the rpmdb up into the upper layer fixes this issue. I have no idea how to do this for your docker build use case, though.

I am using Fedora 21 both as host and as container. 
> rpm -q docker-io
docker-io-1.6.0-2.git3eac457.fc21.x86_64
Given the nature of the issue the exact version of docker and the OS should actually not matter.
Comment 14 Colin Walters 2015-05-04 13:21:21 EDT
I think telling people to do `touch /var/lib/rpm/*` in their Dockerfiles today makes sense.  Even if this bug were fixed *today*, we'd still have RHEL6 base images, not to mention probably years of people running 7.0/7.1 images.

That said, it seems well worth having someone who knows this part of the yum codebase make it overlayfs-compatible.
Comment 19 Valentina Mukhamedzhanova 2015-06-25 08:18:31 EDT
Upstream commits:

commit 617d2d90a553f9e5bc4dfd9ab2f9c194b956fcab
Author: Valentina Mukhamedzhanova <vmukhame@redhat.com>
Date:   Thu Jun 25 12:53:39 2015 +0200

    ovl plugin: get rpmdbpath from conduit

commit 1555cfa6465e6e31515a86f097c8993d89c0085e
Author: Valentina Mukhamedzhanova <vmukhame@redhat.com>
Date:   Thu Jun 25 12:30:13 2015 +0200

    ovl plugin: fix indentation

commit 0c0b029122b476c269a4b560d9be558e69e054ae
Author: Valentina Mukhamedzhanova <vmukhame@redhat.com>
Date:   Thu Jun 25 12:09:52 2015 +0200

    Add plugin for overlayfs issue workaround. Patch by Pavel Odvody. BZ#1213602
Comment 36 Matthew Gyurgyik 2015-08-24 06:21:28 EDT
I am having problems building a container when yum is used. This is the same issue mentioned in comment #9. I have opened a new ticket for this issue: #1256299.
Comment 49 Fedora Update System 2015-10-17 19:20:59 EDT
yum-utils-1.1.31-28.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
Comment 50 errata-xmlrpc 2015-11-19 07:10:38 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2129.html
Comment 51 Jamshid Afshar 2015-12-08 21:57:23 EST
Is this going to be fixed for docker images that rely on CentOS 6, or will those images remain unusable if the docker server uses the "overlay" storage driver?
Comment 52 Mimmus 2016-07-12 05:25:41 EDT
For me, it is not working in Red Hat 7.2 "latest".

I had to implement the workaround:
 RUN touch /var/lib/rpm/*
in my Dockerfile, before "yum install" or similar.
