Red Hat Bugzilla – Bug 1477787
oci-umount RPM prevents /var/lib/docker/containers from being mounted in fluentd pods
Last modified: 2018-04-02 17:10:33 EDT
On a recently upgraded OpenShift 3.5 cluster, the oci-umount package was installed (oci-umount.x86_64 2:1.12.6-48.git0fdc778.el7 @rhel-7-server-extras-rpms). By default its /etc/oci-umount.conf file contains "/var/lib/docker/containers", which effectively prevents the fluentd pods from collecting docker logs when the json-file log driver is in use.
We used this playbook to fix the cluster on the fly.
The expectation of oci-umount was that it would unmount everything under /var/lib/docker/containers/* but since the container is run with
/var/lib/docker itself is a mount point. Thus the volume itself is being unmounted, and is in turn ruining at least 1 of the 2 use cases which were the entire point: to allow our logging to work.
I misspoke in #11. It is
/var/lib/docker/containers/ is a bind mount. Even if oci-umount unmounts it (to remove shm mount points underneath it), fluentd containers should still be able to access json based log files present in /var/lib/docker/containers/ directory.
We designed oci-umount in such a way that the fluentd container could still access logs. So there is more to the story. Can somebody please explain what exactly is going on?
"Even if oci-umount unmounts it [snip], fluentd containers should still be able to access json based log files present in [it]"
That seems logically inconsistent. This is easy to reproduce:
# rpm -q oci-umount
# cat /etc/oci-umount.conf
# docker run -ti --rm -v /var/lib/docker/containers:/var/lib/docker/containers fedora ls /var/lib/docker/containers/
# docker run -ti --rm -v /var/lib/docker:/var/lib/docker fedora ls /var/lib/docker/containers/
oci-umount is unmounting the bind mount so we can not get to the json files inside the bind mount.
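To confirm from inside a container which mounts survived, one option is to read /proc/self/mountinfo, where the fifth field of each line is the mount point. The sketch below is only an illustration: mounts_under is a hypothetical helper name, and the optional second argument (a mountinfo-format file) exists only so the function can be tried against sample data.

```shell
# Hypothetical helper: print every mount point at or below a directory.
# $1 = directory to inspect
# $2 = mountinfo-format file (defaults to the caller's own /proc/self/mountinfo)
mounts_under() {
    dir="${1%/}"
    info="${2:-/proc/self/mountinfo}"
    # Field 5 of each mountinfo line is the mount point.
    awk -v d="$dir" '$5 == d || index($5, d "/") == 1 { print $5 }' "$info"
}
```

Run in the fluentd container as mounts_under /var/lib/docker/containers, an empty result would mean the bind mount itself is gone, which matches the reproduction above.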
Ok, got it. So in that case fluentd can mount /var/lib/docker and things should work?
-v /var/lib/docker:/var/lib/docker and oci-umount will leave /var/lib/docker mountpoint in place and unmount /var/lib/docker/containers?
Yes, it is possible to work around this regression. However, since fluentd has been running successfully for months (years) and is already in production at numerous customers' sites, this regression breaks existing functional systems.
What I don't understand is that they supposedly tested oci-umount and gave us the patch. I believe my patch will not work, since I misunderstood the way we were handling the umount. The current code umounts the mount points listed in /etc/oci-umount.conf wherever they are mounted in the container. The way we are umounting them will also take out all submounts under these mount points.
Since /var/lib/docker/containers is not usually a mount point, we actually made it into one, just so we could get rid of the "ROOTPATH/var/lib/docker/containers/*/shm". This is the way oci-umount was designed.
(In reply to Eric Paris from comment #7)
> Yes, it is possible to work around this regression. However since fluentd
> has been running successfully for months (years) and is already in
> production at numerous customers' sites, this regression breaks existing
> functional systems.
Ok, I have proposed a PR to solve this issue.
Now if one adds the suffix "/*" to a path in /etc/oci-umount.conf, only the submounts of that mount will be unmounted. So in this case /var/lib/docker/containers/* is specified by default, and only the submounts of /var/lib/docker/containers/ will go away while /var/lib/docker/containers/ itself will remain in the container.
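The difference between a plain entry and a "/*" entry can be modelled roughly as below. This is a simplified sketch of the rule as described in this bug, not the oci-umount source: would_unmount is a hypothetical name, and it compares paths directly rather than mapping host paths to container destinations the way the real hook has to.

```shell
# Hypothetical model: would a mount point be unmounted for a given config entry?
# $1 = entry from /etc/oci-umount.conf, $2 = mount point to check
# Returns 0 if the mount would be unmounted, 1 otherwise.
would_unmount() {
    entry="$1"
    mnt="$2"
    case "$entry" in
        */\*)
            # "/*" suffix: only mounts strictly below the base path.
            base="${entry%/\*}"
            case "$mnt" in
                "$base"/?*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
        *)
            # Plain entry: the mount itself and everything below it.
            case "$mnt" in
                "$entry"|"$entry"/*) return 0 ;;
                *) return 1 ;;
            esac
            ;;
    esac
}
```

With the entry "/var/lib/docker/containers/*", the check succeeds for a shm mount below that directory and fails for /var/lib/docker/containers itself, which is the behavior fluentd needs.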
I will also clone this bug so that fluentd can move to volume mounting /var/lib/docker/ instead of /var/lib/docker/containers in future.
We also need to figure out how to synchronize our testing efforts. The oci-umount work was finished quite some time back, and I was under the impression that by now fluentd had been tested and things were working fine. I would prefer to catch these kinds of regressions early. Not sure how to do that though.
PR mentioned in comment 9 has been merged now. Have requested lokesh for a new build.
But I think this will solve the issue on either freshly installed systems or systems which are being upgraded from pre oci-umount version. Anything which has oci-umount already, will have old /etc/oci-umount.conf and upgrade will not replace it with new file. That means new package will continue to unmount /var/lib/docker/containers (and not submounts).
For such configurations, we will have to do the manual operation of adding "/*" at the end of /var/lib/docker/containers in /etc/oci-umount.conf.
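The manual edit could be scripted along these lines. This is a sketch under assumptions, not the playbook mentioned earlier: fix_oci_umount_conf is a hypothetical helper name, and it assumes the entry sits alone on its line (optionally with a trailing slash or trailing whitespace).

```shell
# Hypothetical helper: append "/*" to the /var/lib/docker/containers entry.
# $1 = config file (defaults to /etc/oci-umount.conf)
fix_oci_umount_conf() {
    conf="${1:-/etc/oci-umount.conf}"
    # Only an entry that does not already end in "/*" matches, so running
    # this twice is harmless. The original file is kept as "$conf.bak".
    sed -i.bak -E \
        's|^/var/lib/docker/containers/?[[:space:]]*$|/var/lib/docker/containers/*|' \
        "$conf"
}
```

Other entries in the file are left untouched, and the .bak copy makes it easy to revert if the host turns out to need the old behavior.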
I have tested this with logging and it works. How soon can we get this into 3.7, 3.6, and 3.5?
If this is listed as a config file in rpm, and we fix the default, users who did not edit will get the new default. Users who edited the file by hand will not get the new default. This is behavior I want.
So is it a config, or a config(noreplace) ?
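One way to tell after the fact which case applied on a given host: when rpm declines to overwrite a locally edited %config(noreplace) file, it writes the new default next to it as a .rpmnew file, and when it does overwrite an edited %config file, it saves the old copy as .rpmsave. The sketch below only checks for those leftovers; conf_upgrade_state and its output strings are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: report what an rpm upgrade left behind for a config file.
# $1 = config file (defaults to /etc/oci-umount.conf)
conf_upgrade_state() {
    conf="${1:-/etc/oci-umount.conf}"
    if [ -e "$conf.rpmnew" ]; then
        # Local edits were kept; the new default is in the .rpmnew file.
        echo "kept-local-edits"
    elif [ -e "$conf.rpmsave" ]; then
        # The new default was installed; the old copy is in the .rpmsave file.
        echo "replaced-local-edits"
    else
        # No leftovers: the file was unmodified, or has not been upgraded.
        echo "no-leftovers"
    fi
}
```

A host reporting kept-local-edits would still carry the old entry and would need the manual "/*" change.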
(In reply to Eric Paris from comment #21)
> If this is listed as a config file in rpm, and we fix the default, users who
> did not edit will get the new default. Users who edited the file by hand
> will not get the new default. This is behavior I want.
> So is it a config, or a config(noreplace) ?
Eric is right. I think every user will get the update, doubt any users have actually modified this file.
(In reply to Vivek Goyal from comment #29)
> (In reply to Rich Megginson from comment #27)
> > > Was /etc/oci-umount.conf untouched or modified before upgrade. I just tested
> > > this on F26 and upgrading oci-umount package worked. It now has new
> > > /etc/oci-umount.conf which has "/var/lib/docker/containers/*"
> > I don't know. I didn't personally touch or edit /etc/oci-umount.conf before
> > the upgrade. Perhaps openshift-ansible does?
> > At any rate, I guess the consensus is that this should already be handled by
> > rpm upgrade of the oci-umount package.
> What version of docker you are testing with?
> Looks like in -54, we went back
> to old oci-umount. So if you are testing with -54, things will not work.
silly me, thinking that testing with -54 would be as good as testing with -52 . . .
at any rate, once I changed /etc/oci-umount.conf to use "/var/lib/docker/containers/*", fluentd worked fine.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.