Bug 1200394

Summary: "atomic run --spc" fails in latest docker builds due to problems mapping /run/ into container
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.1
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Keywords: Extras, Regression
Hardware: Unspecified
OS: Unspecified
Reporter: Stephen Tweedie <sct>
Assignee: Daniel Walsh <dwalsh>
QA Contact: Virtualization Bugs <virt-bugs>
CC: cgoern, lsm5, lsu, vgoyal, vpavlin
Fixed In Version: docker-1.5.0-19.el7
Doc Type: Bug Fix
Last Closed: 2015-03-30 13:12:49 UTC
Type: Bug
Bug Blocks: 1188764    

Description Stephen Tweedie 2015-03-10 13:30:14 UTC
Description of problem:
Any "docker run" that maps /run into the container ("docker run -v /run:/run") fails in the latest docker build with an "invalid symlink" error while setting up the container.

Version-Release number of selected component (if applicable):
docker-1.5.0-16.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. # docker run --rm -ti -v /run:/run rhel7.1 bash

Actual results:
setup mount namespace invalid symlink "/var/lib/docker/devicemapper/mnt/5035f5e4f34b4bcf47782e6a5d5c98e251b25cf8e7d9ea05610257e1aa74048d/rootfs/.tmpfsrun/cloud-init/result.json" -> "../../var/lib/cloud/data/result.json"
FATA[0001] Error response from daemon: Cannot start container 5035f5e4f34b4bcf47782e6a5d5c98e251b25cf8e7d9ea05610257e1aa74048d: setup mount namespace invalid symlink "/var/lib/docker/devicemapper/mnt/5035f5e4f34b4bcf47782e6a5d5c98e251b25cf8e7d9ea05610257e1aa74048d/rootfs/.tmpfsrun/cloud-init/result.json" -> "../../var/lib/cloud/data/result.json"

Expected results:
The container should start.

Additional info:
This is also blocking the rsyslog image update; adding logrotate requires us to share /run/syslog.pid between the host and container.

The error mentions cloud-init files, which are symlinks on the rhel7 cloud-image-based VM showing the problem:

# ls -lR /run/cloud-init
/run/cloud-init:
total 0
lrwxrwxrwx. 1 root root 36 Mar 10 09:02 result.json -> ../../var/lib/cloud/data/result.json
lrwxrwxrwx. 1 root root 36 Mar 10 09:02 status.json -> ../../var/lib/cloud/data/status.json

These symlinks in /run/ are broken/dangling within the container, which seems to be triggering the problem.
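
A quick way to see which host symlinks are affected is to resolve each one from the container's point of view. The shell sketch below is an illustration under stated assumptions, not docker's own check: dangling_after_bind is a hypothetical name, GNU realpath is assumed (its -m and -s options normalise ".." lexically), and absolute links met while testing existence are resolved by the host rather than inside the container root, so the result is only an approximation.

```shell
#!/bin/sh
# Hypothetical helper: list symlinks under HOSTDIR that would have no target
# once HOSTDIR is bind-mounted at CTRDIR in a container whose root is ROOT.
# Assumes GNU realpath: -m allows missing components, -s resolves ".."
# lexically instead of following symlinks. Absolute symlinks met by the
# final existence test are resolved by the host, not within ROOT, so the
# result is an approximation.
dangling_after_bind() {
    hostdir=$1 ctrdir=$2 root=$3
    find "$hostdir" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        # Path of this link as seen from inside the container.
        ctrlink="$ctrdir${link#"$hostdir"}"
        case $target in
            /*) resolved=$(realpath -ms "$target") ;;
            *)  resolved=$(realpath -ms "$(dirname "$ctrlink")/$target") ;;
        esac
        # Targets inside the bind mount are backed by the host directory;
        # everything else has to exist in the container's own root.
        case $resolved in
            "$ctrdir" | "$ctrdir"/*) check="$hostdir${resolved#"$ctrdir"}" ;;
            *) check="$root$resolved" ;;
        esac
        [ -e "$check" ] || printf '%s -> %s\n' "$ctrlink" "$target"
    done
}
```

Run as `dangling_after_bind /run /run ROOTFS`, where ROOTFS is an unpacked copy of the image's root filesystem (a placeholder here), it should list both /run/cloud-init links shown above whenever /var/lib/cloud/data is missing from the image.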

Comment 1 Lokesh Mandvekar 2015-03-10 16:45:55 UTC
If it helps, this issue started with 1.5.0-6; 1.5.0-5 works fine. Looking further...

Comment 2 Daniel Walsh 2015-03-10 16:57:23 UTC
Do we see any of this content in the /run directory if we don't do the -v /run:/run?

Comment 3 Stephen Tweedie 2015-03-10 17:09:05 UTC
(In reply to Daniel Walsh from comment #2)
> Do we see any of this content in the /run directory if we don't do the -v
> /run:/run?

No. If you run the full --spc docker command line, omitting only the -v /run:/run, it runs perfectly, but the only thing present in the container's /run is /run/secrets.

Comment 4 Daniel Walsh 2015-03-10 17:55:24 UTC
I have no clue what is going on here, then. We do some special tar/untar handling, but that happens in either case when the tmpfs-on-/run patch is applied.

Comment 5 Vivek Goyal 2015-03-10 18:31:14 UTC
On my old Atomic Host virtual machine, I can bind-mount /run into a test directory, /root/test/test1. This does break the cloud-init/ symlinks, but the bind mount succeeds.

I am not clear about two things.

- Who is checking whether there are broken symlinks.
- What this .tmpfsrun directory is, who is creating it, and why.

Comment 6 Vivek Goyal 2015-03-10 18:43:04 UTC
I can take upstream docker, create a dir /run/test/ and then create a symlink inside that.

# mkdir /run/test
# cd /run/test/
# ln -s ../../root/testfile.txt .

And then run a container.

docker run -ti -v /run:/run fedora /bin/bash

It works just fine. Although there are broken links in /run/test/ inside the container, they do not prevent the bind mount inside the container.

I don't see any .tmpfsrun directory being created with my upstream docker build.

The following does not list any .tmpfsrun, so something is fishy about this directory, and the culprit might be whatever is setting it up.

# ls -al /var/lib/docker/devicemapper/mnt/5dc1f6d57a92d5fcfaaad1cdc531bdd2405e3a66644c478e287fa97db05eb84e/rootfs

Comment 7 Vivek Goyal 2015-03-10 19:22:26 UTC
For the record, I can't reproduce this on my rhel7.1 VM running 
docker-1.5.0-16.el7.x86_64.

The following runs just fine.

docker run -ti -v /run:/run registry.access.redhat.com/rhel7.1 /bin/bash

Comment 8 Stephen Tweedie 2015-03-10 22:28:19 UTC
(In reply to Vivek Goyal from comment #7)
> For the record, I can't reproduce this on my rhel7.1 VM running 
> docker-1.5.0-16.el7.x86_64.
> 
> Following runs just fine.
> 
> docker run -ti -v /run:/run registry.access.redhat.com/rhel7.1 /bin/bash

I don't think an anaconda-installed VM will show this out-of-the-box; the case that triggers it for me is running cloud-init using a pre-installed VM base image.  

If the /run/cloud-init/* symlinks from the initial description are rm'ed, the problem does not reproduce; such symlinks are required to trigger the problem.

I _can_ reproduce this on a 7.1 VM, though, if I add your /run/test/ symlink from comment #6, using docker-1.5.0-16.el7.x86_64.

Comment 13 Vivek Goyal 2015-03-11 14:29:50 UTC
OK, I could reproduce this problem on a RHEL 7 VM with a symlink created in /run as described in comment #6.

In fact, I did not have to launch a rhel7 container; it also reproduces when launching a fedora container.

It did not happen with upstream docker, so either it was fixed upstream or the problem exists only in the docker we are carrying.

Comment 14 Vivek Goyal 2015-03-11 14:36:38 UTC
I suspect that the following patch might be the one causing the problem.

https://github.com/lsm5/docker/commit/23bd8e66de138aecd61d006141accaec660dc7d7


 New implementation of /run support

    This mounts a /run tmpfs into the container, with the initial contents
    copied from the /run in the base image, unless MountRun is set to false
    in the HostConfig.

    Additionally MountRun is always set to false during a docker build, which
    means any setup of /run in a Dockerfile is saved in the image to be copied
    into the final /run tmpfs when a container is started.

    Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl> (github: alexlarsson)

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh> (github: rhatdan)

Comment 16 Daniel Walsh 2015-03-11 18:47:09 UTC
/mnt/5035f5e4f34b4bcf47782e6a5d5c98e251b25cf8e7d9ea05610257e1aa74048d/rootfs/.tmpfsrun/cloud-init/result.json" -> "../../var/lib/cloud

This comes from the /run patch. Basically, when we create a container based on an image, the initial /run is created by taking the contents of the image's /run and mounting a tmpfs over it. The patch tars up the content of /run into a temporary directory in / called /.tmpfsDIR (here /.tmpfsrun, as in the error message) and then mounts the tmpfs; once it is mounted, the contents of /.tmpfsDIR are tarred up and placed back on top of /run. What I think is happening is that the /run volume is being mounted before the tmpfs, which causes the conflict.
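
The copy-up sequence described above can be modelled as a dry run in shell. This is a hypothetical sketch, not the patch itself (which is Go code inside docker): copy_up_run, fake_tmpfs_mount and the MOUNT_TMPFS hook are made-up names, and fake_tmpfs_mount only empties the directory to imitate a fresh `mount -t tmpfs`, which would need root. The stash name .tmpfsrun matches the path in the error message. It does not model the symlink check that actually fails.

```shell
#!/bin/sh
# Hypothetical stand-in for "mount -t tmpfs tmpfs DIR" (which needs root):
# it just leaves DIR empty, which is what a fresh tmpfs looks like.
# WARNING: this deletes DIR's contents; use it only on scratch directories.
fake_tmpfs_mount() {
    rm -rf "$1" && mkdir "$1"
}

# Model of the /run copy-up: stash ROOTFS/run in ROOTFS/.tmpfsrun, mount a
# tmpfs over ROOTFS/run, then copy the stash back on top of the new /run.
# Set MOUNT_TMPFS to a real mount command to replace the dry-run stand-in.
copy_up_run() {
    rootfs=$1
    stash="$rootfs/.tmpfsrun"
    mkdir -p "$stash" || return 1
    # tar stores symlinks as symlinks, so dangling links survive the copy.
    (cd "$rootfs/run" && tar cf - .) | (cd "$stash" && tar xpf -) || return 1
    ${MOUNT_TMPFS:-fake_tmpfs_mount} "$rootfs/run" || return 1
    (cd "$stash" && tar cf - .) | (cd "$rootfs/run" && tar xpf -) || return 1
    rm -rf "$stash"
}
```

Point it only at a scratch directory in dry-run mode, since fake_tmpfs_mount deletes ROOTFS/run. If ROOTFS/run were already a bind mount of the host /run, as with -v /run:/run, the stash step would be expected to copy host content (including the cloud-init links) into .tmpfsrun, which is consistent with the path in the error.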

Comment 17 Daniel Walsh 2015-03-11 18:48:13 UTC
I think we need to change our patch to not mount a tmpfs on /run if /run is being bind-mounted.

Comment 18 Daniel Walsh 2015-03-11 18:56:48 UTC
Lokesh, I just pushed a fix for 1.5.0 that should not attempt to mount a tmpfs on /run if /run is to be volume-mounted.

Can you rebuild docker-1.5?
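
The rule behind the fix (skip the /run tmpfs whenever /run is itself a volume) can be expressed as a small predicate. The shell sketch below is only an illustration: the real change lives in docker's Go code, and the function name, the "true"/"false" MountRun argument and the -v spec parsing (HOST:CONTAINER[:MODE] or a bare CONTAINER path) are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical predicate modelling the fix: mount a tmpfs on /run only when
# MountRun is enabled and no volume is mounted at /run.
# Usage: should_mount_run_tmpfs MOUNTRUN [VOLUME_SPEC...]
#   MOUNTRUN     "true" or "false" (mirrors HostConfig.MountRun)
#   VOLUME_SPEC  "-v" style value: HOST:CONTAINER[:MODE] or a bare CONTAINER
# Exit status 0 means "mount the tmpfs", 1 means "leave /run alone".
should_mount_run_tmpfs() {
    [ "$1" = true ] || return 1
    shift
    for spec in "$@"; do
        case $spec in
            *:*) dest=${spec#*:}; dest=${dest%%:*} ;;  # drop HOST and :MODE
            *)   dest=$spec ;;                         # anonymous volume
        esac
        # Treat "/run/" the same as "/run".
        if [ "$dest" != / ]; then
            dest=${dest%/}
        fi
        if [ "$dest" = /run ]; then
            return 1
        fi
    done
    return 0
}
```

For instance, `should_mount_run_tmpfs true /run:/run` returns 1 (no tmpfs), while `should_mount_run_tmpfs true /var/log:/var/log` returns 0. Volumes below /run, such as /srv:/run/lock, still get the tmpfs in this sketch, since only /run itself is treated as a conflict.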

Comment 19 Stephen Tweedie 2015-03-12 13:47:26 UTC
Confirmed that docker-1.5.0-19.el7.x86_64 fixes this for me.

Comment 21 Luwen Su 2015-03-26 15:20:03 UTC
Moved to VERIFIED with docker-1.5.0-27.el7.x86_64.

# docker run --rm -ti -v /run:/run rhel7 bash
[root@f0efa9797205 /]# exit

Comment 23 errata-xmlrpc 2015-03-30 13:12:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0759.html