Bug 1477787

Summary: oci-umount RPM prevents /var/lib/docker/containers from being mounted in fluentd pods
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.4
Reporter: Peter Portante <pportant>
Assignee: Vivek Goyal <vgoyal>
QA Contact: atomic-bugs <atomic-bugs>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: amurdaca, anli, aos-bugs, ddarrah, dwalsh, eparis, imcleod, jcantril, jkaur, jokerman, kurktchiev, lsm5, mmccomas, pportant, rmeggins, rromerom, sdodson, tparsons, vgoyal
Target Milestone: rc
Keywords: Extras, OpsBlocker
Target Release: 7.4
Hardware: All
OS: All
Fixed In Version: docker-1.12.6-55.gitc4618fb.el7_4
Last Closed: 2017-09-05 10:35:14 UTC
Type: Bug
Bug Blocks: 1478821

Description Peter Portante 2017-08-03 01:23:01 UTC
On a recently upgraded OpenShift 3.5 cluster, the oci-umount package was installed (oci-umount.x86_64  2:1.12.6-48.git0fdc778.el7  @rhel-7-server-extras-rpms). By default, its /etc/oci-umount.conf file lists "/var/lib/docker/containers", which prevents the fluentd pods from collecting docker logs when the json-file log driver is in use.

We used this playbook [1] to fix the cluster on the fly.

[1] https://github.com/openshift/openshift-ansible-ops/pull/2941

Comment 1 Eric Paris 2017-08-03 13:14:45 UTC
The expectation for oci-umount was that it would unmount everything under /var/lib/docker/containers/*, but since the container is run with

-v /var/lib/docker:/var/lib/docker 

/var/lib/docker itself is a mount point. The volume itself is therefore being unmounted, which in turn breaks at least one of the two use cases that were the entire point: letting our logging work.

Comment 2 Eric Paris 2017-08-03 13:24:10 UTC
I misspoke in comment 1. It is

-v /var/lib/docker/containers:/var/lib/docker/containers

Comment 4 Vivek Goyal 2017-08-07 15:30:02 UTC
/var/lib/docker/containers/ is a bind mount. Even if oci-umount unmounts it (to remove shm mount points underneath it), fluentd containers should still be able to access json based log files present in the /var/lib/docker/containers/ directory.

We designed oci-umount so that the fluentd container could still access the logs. So there is more to the story. Can somebody please explain what exactly is going on?

Comment 5 Eric Paris 2017-08-07 15:53:48 UTC
"Even if oci-umount unmounts it [snip], fluentd containers should still be able to access json based log files present in [it]"

That seems logically inconsistent. This is easy to reproduce:

# rpm -q oci-umount
oci-umount-1.13.1-20.git27e468e.fc26.x86_64
# cat /etc/oci-umount.conf 
/var/lib/docker/overlay2
/var/lib/docker/overlay
/var/lib/docker/devicemapper
/var/lib/docker/containers
/var/lib/docker-latest/overlay2
/var/lib/docker-latest/overlay
/var/lib/docker-latest/devicemapper
/var/lib/docker-latest/containers

# docker run -ti --rm -v /var/lib/docker/containers:/var/lib/docker/containers fedora ls /var/lib/docker/containers/
[nothing]

# docker run -ti --rm -v /var/lib/docker:/var/lib/docker fedora ls /var/lib/docker/containers/
b4c66ccbfbbd4b75e9aa8ad0c30f164a0a4730adad40bc50aa4cd77f771d8918

oci-umount is unmounting the bind mount, so we cannot get to the json files inside it.

Comment 6 Vivek Goyal 2017-08-07 16:04:27 UTC
Ok, got it. So in that case fluentd can mount /var/lib/docker and things should work?

With -v /var/lib/docker:/var/lib/docker, oci-umount will leave the /var/lib/docker mount point in place and unmount /var/lib/docker/containers?

Comment 7 Eric Paris 2017-08-07 16:49:24 UTC
Yes, it is possible to work around this regression. However, since fluentd has been running successfully for months (years) and is already in production at numerous customers' sites, this regression breaks existing functional systems.

Comment 8 Daniel Walsh 2017-08-07 17:11:30 UTC
What I don't understand is that they supposedly tested oci-umount and gave us the patch. I believe my patch will not work, since I misunderstood the way we were handling the unmount. The current code unmounts the mount points listed in /etc/oci-umount.conf wherever they are mounted in the container. The way we unmount them also removes all submounts under these mount points.

Since /var/lib/docker/containers is not usually a mount point, we actually made it into one, just so we could get rid of the "ROOTPATH/var/lib/docker/containers/*/shm" mounts. This is the way oci-umount was designed.
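To see which mount points exist at or below a path such as /var/lib/docker/containers (for example the per-container shm mounts), one can read the mount table directly. This is a minimal sketch, not part of oci-umount: the function name list_mounts_under is hypothetical, and it assumes the standard /proc/self/mountinfo layout in which the fifth field is the mount point. Escaped characters in paths (such as \040 for a space) are not decoded.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of oci-umount.
# list_mounts_under PATH [MOUNTINFO]
#   Prints every mount point equal to PATH or below it, read from a
#   mountinfo-format file (default: /proc/self/mountinfo).
#   Field 5 of each mountinfo line is the mount point.
#   Octal escapes such as \040 are printed as-is, not decoded.
list_mounts_under() {
  local path=$1 info=${2:-/proc/self/mountinfo}
  # Exact match, or prefix match on "PATH/" so that e.g.
  # /var/lib/docker/containers-old is not mistaken for a submount.
  awk -v p="$path" '$5 == p || index($5, p "/") == 1 { print $5 }' "$info"
}
```

Run on the host, list_mounts_under /var/lib/docker/containers would be expected to show the shm submounts that oci-umount is meant to remove. Run inside a container, it would show whether the fluentd bind mount is still present.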

Comment 9 Vivek Goyal 2017-08-08 13:54:15 UTC
(In reply to Eric Paris from comment #7)
> Yes, it is possible to work around this regression. However since fluentd
> has been running successfully for months (years) and is already in
> production at numerous customers' sites, this regression breaks existing
> functional systems.

Ok, I have proposed a PR to solve this issue.

https://github.com/projectatomic/oci-umount/pull/15

With this change, if a path in /etc/oci-umount.conf ends with the suffix "/*", only the submounts of that mount point are unmounted. The default now lists /var/lib/docker/containers/*, so only the submounts of /var/lib/docker/containers/ go away, while /var/lib/docker/containers/ itself remains in the container.
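The matching rule described above can be modeled in a few lines of bash. This is a hypothetical sketch of the semantics only (the function name should_unmount is made up for illustration); it is not the oci-umount source. A plain entry matches the mount point itself and everything below it, while an entry ending in "/*" matches only mounts strictly below the base path.

```bash
#!/usr/bin/env bash
# Hypothetical model of the oci-umount.conf matching rule; not the real hook.
# should_unmount ENTRY MOUNTPOINT
#   Returns 0 (true) if MOUNTPOINT would be unmounted for config line ENTRY.
should_unmount() {
  local entry=$1 mnt=$2
  if [[ $entry == */\* ]]; then
    # Entry ends in "/*": match only mounts strictly below the base path,
    # so the base mount point itself (the fluentd bind mount) survives.
    local base=${entry%/\*}
    [[ $mnt == "$base"/* ]]
  else
    # Plain entry: match the mount point itself and every submount.
    [[ $mnt == "$entry" || $mnt == "$entry"/* ]]
  fi
}

# Compare the old and new default entries against the bind mount and a
# per-container shm submount (abc123 is a made-up container ID).
for entry in '/var/lib/docker/containers' '/var/lib/docker/containers/*'; do
  for mnt in /var/lib/docker/containers /var/lib/docker/containers/abc123/shm; do
    if should_unmount "$entry" "$mnt"; then action=unmount; else action=keep; fi
    printf '%-30s %-42s %s\n' "$entry" "$mnt" "$action"
  done
done
```

With the plain entry, both the bind mount and the shm mount are unmounted, which is the regression reproduced in comment 5. With the "/*" entry, only the shm mount is unmounted.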

I will also clone this bug so that fluentd can move to volume mounting /var/lib/docker/ instead of /var/lib/docker/containers in the future.

We also need to figure out a way to synchronize our testing efforts. The oci-umount work was finished quite some time ago, and I was under the impression that fluentd had been tested by now and things were working fine. I would prefer to catch these kinds of regressions early. Not sure how to do that, though.

Comment 10 Vivek Goyal 2017-08-15 12:34:27 UTC
The PR mentioned in comment 9 has now been merged. I have asked Lokesh for a new build.

However, I think this will only solve the issue on freshly installed systems or on systems being upgraded from a version without oci-umount. Any system that already has oci-umount will keep its old /etc/oci-umount.conf, and the upgrade will not replace it with the new file. That means the new package will continue to unmount /var/lib/docker/containers itself (and not just its submounts).

For such configurations, we will have to manually add "/*" to the end of /var/lib/docker/containers in /etc/oci-umount.conf.
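The manual edit can be scripted with sed. This is a minimal sketch, assuming GNU sed (for in-place editing with a backup suffix) and a config file in which the entry appears alone on its own line, as in the file shown in comment 5. The name fix_oci_umount_conf is a hypothetical helper, not something shipped with oci-umount.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a bare "/var/lib/docker/containers" line into
# "/var/lib/docker/containers/*" so that only its submounts are unmounted.
# fix_oci_umount_conf [CONF]   (CONF defaults to /etc/oci-umount.conf)
fix_oci_umount_conf() {
  local conf=${1:-/etc/oci-umount.conf}
  # Edit in place and keep the previous version as CONF.bak.
  # The ^...$ anchors match only the exact bare entry, so a line that
  # already ends in "/*" (or any other path) is left unchanged, and
  # running the helper twice does not append a second "/*".
  sed -i.bak 's|^/var/lib/docker/containers$|/var/lib/docker/containers/*|' "$conf"
}
```

Run as root with no argument, it edits /etc/oci-umount.conf directly. A /var/lib/docker-latest/containers entry, if present, is deliberately left alone by this sketch and would need the same treatment if docker-latest is in use.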

Comment 20 Rich Megginson 2017-08-23 20:04:20 UTC
I have tested this with logging and it works.  How soon can we get this into 3.7, 3.6, and 3.5?

Comment 21 Eric Paris 2017-08-23 20:15:59 UTC
If this is listed as a config file in the RPM and we fix the default, users who did not edit the file will get the new default. Users who edited the file by hand will not get the new default. This is the behavior I want.

So is it %config or %config(noreplace)?

Comment 22 Rich Megginson 2017-08-23 20:39:24 UTC
(In reply to Eric Paris from comment #21)
> If this is listed as a config file in rpm, and we fix the default, users who
> did not edit will get the new default. Users who edited the file by hand
> will not get the new default. This is behavior I want.
> 
> So is it a config, or a config(noreplace) ?

%config(noreplace) %{_sysconfdir}/oci-umount.conf
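The %config(noreplace) answer matters because RPM handles a config file on upgrade according to whether the file on disk was edited and whether the packaged default changed. The following is a simplified model of that decision, written as a hypothetical bash function: rpm_config_action is not an rpm command, and checksums are represented as plain strings.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of rpm's config-file handling on upgrade.
# rpm_config_action OLD CURRENT NEW NOREPLACE
#   OLD        checksum of the file shipped in the installed package
#   CURRENT    checksum of the file currently on disk
#   NEW        checksum of the file shipped in the new package
#   NOREPLACE  "yes" for %config(noreplace), anything else for plain %config
rpm_config_action() {
  local old=$1 cur=$2 new=$3 noreplace=$4
  if [[ $cur == "$old" || $cur == "$new" ]]; then
    echo replace          # never edited (or already current): new default ends up on disk
  elif [[ $old == "$new" ]]; then
    echo keep             # packaged default unchanged: local edits are kept
  elif [[ $noreplace == yes ]]; then
    echo keep+rpmnew      # edits kept; new default written alongside as .rpmnew
  else
    echo replace+rpmsave  # new default installed; edited file saved as .rpmsave
  fi
}

# The oci-umount case: the packaged default changed, and the file is noreplace.
rpm_config_action old-default old-default new-default yes   # untouched file
rpm_config_action old-default hand-edited new-default yes   # edited file
```

Under this model, an untouched /etc/oci-umount.conf receives the new "/var/lib/docker/containers/*" default on upgrade, while a hand-edited one is left alone and the new default is written to /etc/oci-umount.conf.rpmnew, which matches the behavior Eric asked for.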

Comment 23 Daniel Walsh 2017-08-24 11:46:31 UTC
Eric is right. I think every user will get the update; I doubt any users have actually modified this file.

Comment 32 Rich Megginson 2017-08-24 18:35:14 UTC
(In reply to Vivek Goyal from comment #29)
> (In reply to Rich Megginson from comment #27)
> > > Was /etc/oci-umount.conf untouched or modified before the upgrade? I just tested
> > > this on F26 and upgrading oci-umount package worked. It now has new
> > > /etc/oci-umount.conf which has "/var/lib/docker/containers/*"
> > 
> > I don't know.  I didn't personally touch or edit /etc/oci-umount.conf before
> > the upgrade.  Perhaps openshift-ansible does?
> > 
> > At any rate, I guess the consensus is that this should already be handled by
> > rpm upgrade of the oci-umount package.
> 
> What version of docker are you testing with?

-54

> Looks like in -54, we went back
> to old oci-umount. So if you are testing with -54, things will not work.

Silly me, thinking that testing with -54 would be as good as testing with -52...
At any rate, once I changed /etc/oci-umount.conf to use "/var/lib/docker/containers/*", fluentd worked fine.

Comment 38 errata-xmlrpc 2017-09-05 10:35:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2599