Bug 1410118
Summary: Containers fail to start due to /run/secrets mount when run with -v /run/:/run/

Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Target Milestone: rc
Keywords: Extras
Reporter: Pilar Gomez <pgomezmoy>
Assignee: Antonio Murdaca <amurdaca>
QA Contact: atomic-bugs <atomic-bugs>
CC: amurdaca, andrey.arapov, dornelas, dustymabe, dwalsh, eric, lsm5, pasik, pgomezmoy, tsweeney, vgoyal
Last Closed: 2020-06-09 21:02:38 UTC
Type: Bug
Bug Blocks: 1186913
Attachments: Various logs (strace, verbose docker output)
Note: It breaks OpenStack Kolla; that is why we need the /run mount to work :)

Antonio, did we lose this change?

(In reply to Daniel Walsh from comment #7)
> Antonio, did we lose this change?

I don't actually remember making this change anywhere; I'll try to find it.

Hi, I am having the same issue.

Linux: 3.10.0-514.26.2.el7.x86_64
Docker: docker-1.12.6-32.git88a4867.el7.centos.x86_64

[root@kolla ~]# mount | grep secret
[root@kolla ~]# docker run --rm -ti -v /run:/run:shared alpine sh
/ # exit
[root@kolla ~]# mount | grep secret
/dev/mapper/luks-197a2853-54f0-47e4-9b4a-9ea402f5364d on /run/secrets type ext4 (rw,relatime,seclabel,data=ordered)
[root@kolla ~]# docker run --rm -ti -v /run:/run:shared alpine sh
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"rootfs_linux.go:54: mounting \\\\\\\"/var/lib/docker/containers/7d99edd055c1e757a8194373e1b5be42d1b014462f20a2254a56237d8fbf4367/secrets\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay/c96c220e10f301b6172c0defa0f5fc789c5fbda6b83971fdeda18b57efe8d115/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay/c96c220e10f301b6172c0defa0f5fc789c5fbda6b83971fdeda18b57efe8d115/merged/run/secrets\\\\\\\" caused \\\\\\\"no such file or directory\\\\\\\"\\\"\"\n".
[root@kolla ~]# mount | grep secret
/dev/mapper/luks-197a2853-54f0-47e4-9b4a-9ea402f5364d on /run/secrets type ext4 (rw,relatime,seclabel,data=ordered)
[root@kolla ~]# getenforce
Enforcing
[root@kolla ~]# ausearch -m avc
<no matches>

And it works again as soon as I unmount /run/secrets.

I am also running Kolla, which suggested specifying the following MountFlags:

[root@kolla ~]# cat /etc/systemd/system/docker.service.d/kolla.conf
# managed by puppet
[Service]
MountFlags=shared

Kind regards,
Andrey Arapov

Per https://docs.openstack.org/kolla-ansible/latest/quickstart.html:

> Verify that the MountFlags parameter is configured as shared. If you do not
> set the MountFlags option correctly then kolla-ansible will fail to deploy the
> neutron-dhcp-agent container and throws APIError/HTTPError.

The Kolla check: https://github.com/openstack/kolla-ansible/blob/a9113fc4669db9572688b4e41ed90fa9b0a74212/ansible/roles/neutron/tasks/precheck.yml#L30

It seems Red Hat's Docker sets MountFlags=slave, while Kolla wants MountFlags=shared:

[root@kolla ~]# grep -r MountFlags /usr/lib/systemd/
/usr/lib/systemd/system/docker.service:MountFlags=slave
/usr/lib/systemd/system/systemd-udevd.service:MountFlags=slave

Interestingly, according to https://github.com/moby/moby/commit/2aee081cad72352f8b0c37ba0414ebc925b022e8, upstream Docker (moby) used to have MountFlags=slave, but then they decided to remove MountFlags to allow shared mount propagation.

It seems Kolla's "MountFlags=shared" requirement emerged from the issue https://bugs.launchpad.net/kolla/+bug/1546798, filed on 2016-02-17.
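As a side note (these diagnostic commands are an assumption on my part, not taken from the report), the propagation that /run actually ends up with can be inspected directly, which is quicker than grepping unit files:

```shell
# util-linux findmnt can print the propagation flag per mount point;
# Kolla's prechecks expect "shared" for /run:
findmnt -o TARGET,PROPAGATION /run

# What the docker unit pins for its own mount namespace (0 means unset,
# i.e. the systemd default, which permits shared propagation):
command -v systemctl >/dev/null 2>&1 && systemctl show docker -p MountFlags

true  # keep the sketch's exit status clean on hosts without systemd
```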
https://review.openstack.org/#/c/281279/
https://review.openstack.org/#/c/281031/

I have tried omitting the MountFlags setting:

# cat /etc/systemd/system/docker.service.d/kolla.conf
# managed by puppet
[Service]
# remove MountFlags to allow shared mount propagation
MountFlags=
# systemctl show docker -p MountFlags
MountFlags=0

And all 33 Kolla Docker containers started without an error:

[root@kolla puppet]# kolla-ansible deploy -i /etc/kolla/inventory
...
PLAY RECAP ***************************************************************************************************************************************************
kolla.local : ok=261 changed=93 unreachable=0 failed=0

I think we should rather follow upstream Docker and omit setting MountFlags at all.

Kind regards,
Andrey Arapov

Hello, it seems I was just lucky with running kolla-ansible successfully, as I had unmounted all /run/secrets beforehand. So please ignore my previous comment on MountFlags; it does not really solve the original problem with /run/secrets.

In reality, what helped me is setting --enable-secrets=false for dockerd.

Similar issue: https://bugs.launchpad.net/kolla/+bug/1650778

Kind regards,
Andrey Arapov

Hello, if anyone is interested, upstream Docker does not have this issue: https://github.com/moby/moby/issues/34238#issuecomment-319361701

Kind regards,
Andrey Arapov

Upstream Docker does not have the secrets patch, and therefore cannot update/install RHEL packages using the host's credentials. Antonio, I thought this was fixed a long time ago.

Dan, this doesn't sound like the same issue we're talking about. This seems to be a race I once discussed with Vivek (whom I'm CC'ing), but I don't fully remember it.

So what's the correlation of the problem with the "shared" mount? I mean, does this problem happen when I do "-v /run:/run", or only if "-v /run:/run:shared" is passed? Also, I think the first report said that it is a failure of mount, where mount returns "no such file or directory".
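For reference, a sketch of where the --enable-secrets=false workaround mentioned above would go on RHEL's docker packaging: the packaged unit reads daemon options from an environment file (the path and the OPTIONS variable name are assumptions about that packaging; verify both against the docker.service unit on the host):

```
# /etc/sysconfig/docker  (path and variable name assumed; not shown in
# this report). Append the flag to whatever options are already set:
OPTIONS='--enable-secrets=false'
```

After editing, the daemon needs a restart (systemctl restart docker) for the flag to take effect.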
I suspect it is complaining that /run/secrets/ does not exist. If that's the case, one possibility is:

- /run/secrets is created.
- /run/:/run is mounted.
- And when we try to mount on /run/secrets, it's not there.

Not sure how it can happen, though. Is it possible to create a /run/secrets directory on the host and see if that fixes the problem? Also, this report was with docker-1.10. Please upgrade to at least docker-1.12 and see if the problem is still happening.

I'm not able to reproduce the run error with docker-1.13.1-58.git87f2fab.el7, but this does seem to cause the container to fail to be removed:

# rpm -q docker redhat-release-server
docker-1.13.1-58.git87f2fab.el7.x86_64
redhat-release-server-7.5-8.el7.x86_64
# uname -a
Linux docker3.example.com 3.10.0-862.2.3.el7.x86_64 #1 SMP Mon Apr 30 12:37:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
# grep -r MountFlags /usr/lib/systemd/system/*
/usr/lib/systemd/system/systemd-udevd.service:MountFlags=slave
# grep secrets /proc/mounts | wc -l
0
# ls -l /run/secrets
ls: cannot access /run/secrets: No such file or directory
# cat test.sh
for c in $(seq 1 10); do docker run -d --name test$c -ti -v /run:/run:shared rhel7; done
grep secrets /proc/mounts
sleep 10
docker stop $(docker ps -aq --format "{{.Names}}")
docker rm $(docker ps -aq --format "{{.Names}}")
while umount /run/secrets; do echo "rc: $?"; done
docker rm $(docker ps -aq --format "{{.Names}}")
# bash -x test.sh
++ seq 1 10
+ for c in '$(seq 1 10)'
+ docker run -d --name test1 -ti -v /run:/run:shared rhel7
bd3c42328cc34f53078604083b7e56388450f896b54a327f7920d811a4740d28
+ for c in '$(seq 1 10)'
+ docker run -d --name test2 -ti -v /run:/run:shared rhel7
8075a9fe7e330c7f1f6b58708cd5233f3463fc1bcdb5cb6fc230fbc0c9fe8760
+ for c in '$(seq 1 10)'
+ docker run -d --name test3 -ti -v /run:/run:shared rhel7
fb1d55f3bf74c82afa7709057fde4142d244a0e8642721087931a33209d4fd7d
+ for c in '$(seq 1 10)'
+ docker run -d --name test4 -ti -v /run:/run:shared rhel7
1ecf8c6b96a2eb0a7c7a94382c7c657959dd2899cb8b064b233f18f1f3a5ac35
+ for c in '$(seq 1 10)'
+ docker run -d --name test5 -ti -v /run:/run:shared rhel7
57f72696d5a7a80360d66446786881151d7b4e14a254e7046895f5222290af7e
+ for c in '$(seq 1 10)'
+ docker run -d --name test6 -ti -v /run:/run:shared rhel7
e2dbfbb4069f4f1a5e34934fc67ae14ae58dd98eafcb294848ba3a1ec0d45d34
+ for c in '$(seq 1 10)'
+ docker run -d --name test7 -ti -v /run:/run:shared rhel7
d09e4463a366c1354ca967a2dd91052d3fac5fcfb5f58c399b52c3be482d8c26
+ for c in '$(seq 1 10)'
+ docker run -d --name test8 -ti -v /run:/run:shared rhel7
0c2250cfebf46c1699bfe86355e58a7b0d09a9c54d9da418a29bb36c19d2f758
+ for c in '$(seq 1 10)'
+ docker run -d --name test9 -ti -v /run:/run:shared rhel7
362075519e25a753b3ed7c785befc67d358acdc996448f54101a10527c7a1e67
+ for c in '$(seq 1 10)'
+ docker run -d --name test10 -ti -v /run:/run:shared rhel7
ae3e7b0d19822b50e862fe8d8be354693a0e3c5875830835f16d4e61033cc43e
+ grep secrets /proc/mounts
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
+ sleep 10
++ docker ps -aq --format '{{.Names}}'
+ docker stop test10 test9 test8 test7 test6 test5 test4 test3 test2 test1
test10
test9
test8
test7
test6
test5
test4
test3
test2
test1
++ docker ps -aq --format '{{.Names}}'
+ docker rm test10 test9 test8 test7 test6 test5 test4 test3 test2 test1
test10
Error response from daemon: Unable to remove filesystem for 362075519e25a753b3ed7c785befc67d358acdc996448f54101a10527c7a1e67: remove /var/lib/docker/containers/362075519e25a753b3ed7c785befc67d358acdc996448f54101a10527c7a1e67/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for 0c2250cfebf46c1699bfe86355e58a7b0d09a9c54d9da418a29bb36c19d2f758: remove /var/lib/docker/containers/0c2250cfebf46c1699bfe86355e58a7b0d09a9c54d9da418a29bb36c19d2f758/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for d09e4463a366c1354ca967a2dd91052d3fac5fcfb5f58c399b52c3be482d8c26: remove /var/lib/docker/containers/d09e4463a366c1354ca967a2dd91052d3fac5fcfb5f58c399b52c3be482d8c26/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for e2dbfbb4069f4f1a5e34934fc67ae14ae58dd98eafcb294848ba3a1ec0d45d34: remove /var/lib/docker/containers/e2dbfbb4069f4f1a5e34934fc67ae14ae58dd98eafcb294848ba3a1ec0d45d34/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for 57f72696d5a7a80360d66446786881151d7b4e14a254e7046895f5222290af7e: remove /var/lib/docker/containers/57f72696d5a7a80360d66446786881151d7b4e14a254e7046895f5222290af7e/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for 1ecf8c6b96a2eb0a7c7a94382c7c657959dd2899cb8b064b233f18f1f3a5ac35: remove /var/lib/docker/containers/1ecf8c6b96a2eb0a7c7a94382c7c657959dd2899cb8b064b233f18f1f3a5ac35/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for fb1d55f3bf74c82afa7709057fde4142d244a0e8642721087931a33209d4fd7d: remove /var/lib/docker/containers/fb1d55f3bf74c82afa7709057fde4142d244a0e8642721087931a33209d4fd7d/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for 8075a9fe7e330c7f1f6b58708cd5233f3463fc1bcdb5cb6fc230fbc0c9fe8760: remove /var/lib/docker/containers/8075a9fe7e330c7f1f6b58708cd5233f3463fc1bcdb5cb6fc230fbc0c9fe8760/secrets: device or resource busy
Error response from daemon: Unable to remove filesystem for bd3c42328cc34f53078604083b7e56388450f896b54a327f7920d811a4740d28: remove /var/lib/docker/containers/bd3c42328cc34f53078604083b7e56388450f896b54a327f7920d811a4740d28/secrets: device or resource busy
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
+ echo 'rc: 0'
rc: 0
+ umount /run/secrets
umount: /run/secrets: not mounted
++ docker ps -aq --format '{{.Names}}'
+ docker rm test9 test8 test7 test6 test5 test4 test3 test2 test1
test9
test8
test7
test6
test5
test4
test3
test2
test1

If I leave off ":shared" this doesn't happen. Is any of this expected?

We have no plans to ship another version of Docker at this time. RHEL 7 is in its final support stages, where only security fixes will be released. Customers should move to Podman, which is available starting in RHEL 7.6.
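The run above leaves one stacked /run/secrets entry in /proc/mounts per container, and the cleanup loop needs one umount per entry. A small sketch of counting those stacked entries from a /proc/mounts-style listing (the helper function is mine, not part of the report; on a real host you would feed it `cat /proc/mounts`):

```shell
#!/bin/sh
# Count stacked /run/secrets mounts in a /proc/mounts-style listing.
count_stacked() {
    printf '%s\n' "$1" | awk '$2 == "/run/secrets" { n++ } END { print n + 0 }'
}

# Two stacked entries, mirroring the reproducer's output above:
sample='/dev/mapper/rhel_docker3-root /run/secrets xfs rw 0 0
/dev/mapper/rhel_docker3-root /run/secrets xfs rw 0 0'
count=$(count_stacked "$sample")
echo "$count"    # prints 2

# Cleanup on the affected host would then be (as root), as the
# reproducer does: one umount per stacked entry:
#   while umount /run/secrets 2>/dev/null; do :; done
```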
Created attachment 1237188 [details]
Various logs (strace, verbose docker output)

Description of problem:

Docker containers fail to start because of what appears to be a race condition with the secrets mount when /run is mounted as a volume (the -v /run/:/run/:shared option). The source of the problem seems to be the mount of /dev/mapper/centos_cims-root on /run/secrets; there is no problem when mounting other paths under /run/ as volumes when running a container.

Version-Release number of selected component (if applicable):

Base OS: centos-release-7-2.1511.el7.centos.2.10.x86_64 (run with vagrant + virtualbox)
Kernel: reproduced on 3.10.0-327.el7.x86_64 and 4.9.0-1.el7.elrepo.x86_64
Docker: Docker version 1.10.3, build cb079f6-unsupported

How reproducible:

It is a race condition, but it happens often enough to be easy to reproduce.

Steps to Reproduce:

You need to spin up a few containers that mount /run/ on /run/. I am using the centos image in this example. Chances are that you will hit the bug straight away. If it does not happen, removing the containers will most likely make the bug show up, and then you can reproduce it as many times as you need.

1. Pull an image to test. I am using the centos image here:

[root@devcluster2 ~]# docker images
REPOSITORY                                        TAG      IMAGE ID      CREATED       SIZE
10.142.0.1:5031/csms/centos-binary-heat-api-cfn   2.1.0DV  a7a8015f38c7  24 hours ago  716.3 MB
[...]
docker.io/centos                                  latest   67591570dd29  2 weeks ago   191.8 MB

2. Spin up a few containers that mount /run/ on /run/:

[root@devcluster2 ~]# docker run -d --name centos_test -v /run/:/run/:shared 67591570dd29
7d288d8c2a97f122fe6980b1eabab82bee4b8b8c8ca33d6150a9598c55eb3267
[root@devcluster2 ~]# docker run -d --name centos_test2 -v /run/:/run/:shared 67591570dd29
b667747581926860fe6d286ab91c017fb1b8090a9984633a8476084ae0686c66
[root@devcluster2 ~]# docker run -d --name centos_test3 -v /run/:/run/:shared 67591570dd29
2a8aa2284b9d5f22f8352a565ab58c574d3b86b96ebb8863fc78dbdc20721352
[root@devcluster2 ~]# docker run -d --name centos_test4 -v /run/:/run/:shared 67591570dd29
d14afd35180b17703c768b6160c63b23066a682b2f8cb6242917de04eedeb65a
[root@devcluster2 ~]# docker run -d --name centos_test5 -v /run/:/run/:shared 67591570dd29
5218c2c045c12875ee1683e4f4b1c128031a6c23d2131831233843dc54eb201d
[root@devcluster2 ~]# docker run -d --name centos_test6 -v /run/:/run/:shared 67591570dd29
137ad656380af906666793cf9c6bcc2c269590f74d72461d9de0e5208ec07e31

If in any of those runs you see the following output, you are done; that is what we were looking for:

[pgomezmoy@devcluster2 .git]$ sudo docker run -d --name centos_test6 -v /run/:/run/:shared 67591570dd29
0db2d4fd1c121471e04e3e5f3a36c62ff723c81a7ce8ff36764016a32866fe38
docker: Error response from daemon: Cannot start container 0db2d4fd1c121471e04e3e5f3a36c62ff723c81a7ce8ff36764016a32866fe38: [9] System error: could not synchronise with container process.

Otherwise, continue to step 3.

3. Remove the containers. You will most likely see an error in the response from the daemon:

[root@devcluster2 ~]# docker ps -a | grep test | awk '{print $1}' | xargs docker rm -f
137ad656380a
dd7cc6ebc28c
Failed to remove container (5218c2c045c1): Error response from daemon: Unable to remove filesystem for 5218c2c045c12875ee1683e4f4b1c128031a6c23d2131831233843dc54eb201d: readdirent: no such file or directory
Failed to remove container (d14afd35180b): Error response from daemon: Unable to remove filesystem for d14afd35180b17703c768b6160c63b23066a682b2f8cb6242917de04eedeb65a: readdirent: no such file or directory
Failed to remove container (2a8aa2284b9d): Error response from daemon: Unable to remove filesystem for 2a8aa2284b9d5f22f8352a565ab58c574d3b86b96ebb8863fc78dbdc20721352: readdirent: no such file or directory
Failed to remove container (b66774758192): Error response from daemon: Unable to remove filesystem for b667747581926860fe6d286ab91c017fb1b8090a9984633a8476084ae0686c66: readdirent: no such file or directory
Failed to remove container (7d288d8c2a97): Error response from daemon: Unable to remove filesystem for 7d288d8c2a97f122fe6980b1eabab82bee4b8b8c8ca33d6150a9598c55eb3267: readdirent: no such file or directory

Here it is!

4. Now, every time you try to spin up a container with the /run volume, you will see the error:

[pgomezmoy@devcluster2 .git]$ sudo docker run -d --name centos_test6 -v /run/:/run/:shared 67591570dd29
0db2d4fd1c121471e04e3e5f3a36c62ff723c81a7ce8ff36764016a32866fe38
docker: Error response from daemon: Cannot start container 0db2d4fd1c121471e04e3e5f3a36c62ff723c81a7ce8ff36764016a32866fe38: [9] System error: could not synchronise with container process.

Actual results:

After the error happens, no containers that mount /run/ will start.

Expected results:

Containers get started when mounting /run.

Additional info:

This bug was found when running OpenStack Kolla, which mounts /run/ in quite a few containers so that when a container dies and is restarted, it can pick everything up where it left off (hence the reason we need this mount). I attach syslogs and straces. To see the relevant bits in the straces, just grep for secret:

$ grep -R secret

And you will be able to locate this line:

strace_out2.21653:07:04:15.794080 mount("/var/lib/docker/containers/347ba07bee65a46b199b0ad2a826ca65af672eb1381e7f508a9e523b4127302d/secrets", "/var/lib/docker/devicemapper/mnt/ba154316a9d78d58759b18bef10a6ec26629c3bccfa82d3db877f7155ee0aab7/rootfs/run/secrets", 0xc820642788, MS_BIND|MS_REC, NULL) = -1 ENOENT (No such file or directory)

The mount flags are set to shared:

$ cat /etc/systemd/system/docker.service.d/kolla.conf
[Service]
MountFlags=shared

Udev sync is supported:

$ docker info
Containers: 13
 Running: 11
[...]
Udev Sync Supported: true
[...]

SELinux is disabled:

$ getenforce
Disabled
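The ENOENT from mount(2) in the strace line above matches the ordering suspected in the comments: the secrets bind-mount target under the container's rootfs vanishes once the /run volume is mounted over it. A rough model of that ordering, using plain directories instead of real mounts (paths are hypothetical; no root needed):

```shell
#!/bin/sh
# Model of the suspected race: replace-then-bind ordering makes the
# bind-mount target disappear. Plain directories stand in for mounts.
set -e
root=$(mktemp -d)

# 1. the runtime prepares rootfs/run/secrets as the secrets bind target
mkdir -p "$root/rootfs/run/secrets"

# 2. the -v /run/:/run/ volume is mounted over rootfs/run first;
#    modeled here by swapping in a fresh, empty directory
rm -rf "$root/rootfs/run"
mkdir "$root/rootfs/run"

# 3. the later bind mount of <container>/secrets now targets a path
#    that no longer exists, so mount(2) would fail with ENOENT
if [ -e "$root/rootfs/run/secrets" ]; then result=present; else result=missing; fi
echo "$result"    # prints missing

rm -rf "$root"
```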