Bug 1450554 - Error removing mounted layer XXX: failed to remove device XXX is Busy"
Summary: Error removing mounted layer XXX: failed to remove device XXX is Busy"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Vivek Goyal
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-13 01:36 UTC by Eric Paris
Modified: 2020-12-14 08:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-30 15:03:02 UTC
Target Upstream Version:


Attachments
All logs and data in a txt file (41.21 KB, text/plain)
2017-05-13 13:27 UTC, Eric Paris

Description Eric Paris 2017-05-13 01:36:26 UTC
docker-1.12.6-11.el7.x86_64
kernel-3.10.0-514.16.1.el7.x86_64


So it appears the root filesystem of (stopped) container c3a09510e8c4 leaked into running container 39429cb438b3, but I cannot explain how or why. The running container was clearly started with `-v /:/host:rslave`, which we thought meant that an unmount on the host OS would propagate into the container. But that doesn't appear to be happening...


/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver json-file --log-opt max-size=50m --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --add-registry registry.access.redhat.com --insecure-registry registry.qe.openshift.com



[snip from journalctl]
dockerd-current[84297]: time="2017-05-13T01:14:57.034714705Z" level=info msg="{Action=remove, LoginUID=4294967295, PID=84811}"
kernel: device-mapper: thin: Deletion of thin device 137375 failed. 
dockerd-current[84297]: time="2017-05-13T01:14:57.044483116Z" level=error msg="Error removing mounted layer c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12: failed to remove device c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb:Device is Busy"
atomic-openshift-node[84811]: W0513 01:14:57.044837   84811 container_gc.go:252] Failed to remove container "c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12": Error response from daemon: {"message":"Driver devicemapper failed to remove root filesystem c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12: failed to remove device c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb:Device is Busy"}
dockerd-current[84297]: time="2017-05-13T01:14:57.044551086Z" level=error msg="Handler for DELETE /v1.24/containers/c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12?v=1 returned error: Driver devicemapper failed to remove root filesystem c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12: failed to remove device c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb:Device is Busy"
dockerd-current[84297]: time="2017-05-13T01:14:57.044621335Z" level=error msg="Handler for DELETE /v1.24/containers/c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12 returned error: Driver devicemapper failed to remove root filesystem c3a09510e8c4932d1ca7e1c376676e4007c49d66fd0c526572c0182aae00ad12: failed to remove device c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb:Device is Busy"
dockerd-current[84297]: time="2017-05-13T01:14:57.045427689Z" level=info msg="{Action=remove, LoginUID=4294967295, PID=84811}"



./find-busy-mnt.sh c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb
PID	NAME		MNTNS
126873	crond		mnt:[4026533158]
126875	python		mnt:[4026533158]
126932	sleep		mnt:[4026533158]
91633	crond		mnt:[4026533158]
93818	check-pmcd-stat		mnt:[4026533158]
93819	pmpause		mnt:[4026533158]
93870	pmcd		mnt:[4026533158]
93872	pmdaroot		mnt:[4026533158]
93874	pmdaproc		mnt:[4026533158]
93875	pmdaxfs		mnt:[4026533158]
93876	pmdalinux		mnt:[4026533158]
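
The find-busy-mnt.sh script itself is not included in this report. As a rough sketch of what such a script might do (an assumption, not the actual script), the function below scans each process's mountinfo for the device ID and prints the PID, command name, and mount namespace of every process that still sees it. The function name and the optional second argument (an alternate proc root) are hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for find-busy-mnt.sh; the real script is not shown in this report.
# Usage: find_busy_mnt <device-or-mount-id> [proc-root]
find_busy_mnt() {
    local needle=$1 proc=${2:-/proc} dir pid name ns
    printf 'PID\tNAME\tMNTNS\n'
    for dir in "$proc"/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$dir/mountinfo" ] || continue
        # Keep only processes whose mount table mentions the device ID.
        grep -qF -- "$needle" "$dir/mountinfo" 2>/dev/null || continue
        pid=${dir##*/}
        name=$(cat "$dir/comm" 2>/dev/null)
        # The ns/mnt symlink target looks like mnt:[4026533158].
        ns=$(readlink "$dir/ns/mnt" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$pid" "$name" "$ns"
    done
}
```

Run as root on the host, for example `find_busy_mnt c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb`. Processes sharing one MNTNS value live in the same mount namespace, so a single leaked mount shows up once per process in that container.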


# pstree
systemd─┬─NetworkManager─┬─dhclient
        │                └─2*[{NetworkManager}]
        ├─docker-current───7*[{docker-current}]
        ├─dockerd-current─┬─docker-containe─┬─docker-containe─┬─crond─┬─check-pmcd-stat───sleep
        │                 │                 │                 │       └─pmpause
        │                 │                 │                 ├─pmcd───pmdaroot─┬─pmdalinux
        │                 │                 │                 │                 ├─pmdaproc
        │                 │                 │                 │                 └─pmdaxfs
        │                 │                 │                 └─10*[{docker-containe}]
        │                 │                 └─40*[{docker-containe}]
        │                 └─74*[{dockerd-current}]


# docker inspect 39429cb438b3 | jq '.[0] | {"HostConfig.Binds": .HostConfig.Binds, GraphDriver: .GraphDriver, Mounts: .Mounts}'
{
  "HostConfig.Binds": [
    "/etc/localtime:/etc/localtime",
    "/etc/origin/node:/etc/origin/node",
    "/usr/bin/oc:/usr/bin/oc",
    "/usr/bin/oadm:/usr/bin/oadm",
    "/:/host:ro,rslave",
    "/var/cache/yum:/host/var/cache/yum:rw",
    "/etc/openshift_tools:/container_setup:ro",
    "/sys:/sys:ro",
    "/var/lib/docker/volumes-ops/oso-rhel7-host-monitoring/empty_selinux:/sys/fs/selinux",
    "/var/run/docker.sock:/var/run/docker.sock",
    "/var/run/openvswitch:/var/run/openvswitch"
  ],
  "GraphDriver": {
    "Name": "devicemapper",
    "Data": {
      "DeviceId": "137379",
      "DeviceName": "docker-253:3-33596144-6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d",
      "DeviceSize": "107374182400"
    }
  },
  "Mounts": [
    {
      "Source": "/etc/origin/node",
      "Destination": "/etc/origin/node",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/usr/bin/oadm",
      "Destination": "/usr/bin/oadm",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/",
      "Destination": "/host",
      "Mode": "ro,rslave",
      "RW": false,
      "Propagation": "rslave"
    },
    {
      "Source": "/var/cache/yum",
      "Destination": "/host/var/cache/yum",
      "Mode": "rw",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/run/openvswitch",
      "Destination": "/var/run/openvswitch",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/etc/localtime",
      "Destination": "/etc/localtime",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/etc/openshift_tools",
      "Destination": "/container_setup",
      "Mode": "ro",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Source": "/sys",
      "Destination": "/sys",
      "Mode": "ro",
      "RW": false,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/lib/docker/volumes-ops/oso-rhel7-host-monitoring/empty_selinux",
      "Destination": "/sys/fs/selinux",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/var/run/docker.sock",
      "Destination": "/var/run/docker.sock",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    },
    {
      "Source": "/usr/bin/oc",
      "Destination": "/usr/bin/oc",
      "Mode": "",
      "RW": true,
      "Propagation": "rprivate"
    }
  ]
}

Comment 2 Eric Paris 2017-05-13 01:39:11 UTC
I don't know how this is getting leaked in (and not removed). What do you think Vivek?

Comment 3 Eric Paris 2017-05-13 01:45:54 UTC
# docker exec -ti 39429cb438b3 cat /proc/mounts | grep c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb
/dev/mapper/docker-253:3-33596144-c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb /host/var/lib/docker/devicemapper/mnt/c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb xfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c2,c3",relatime,nouuid,attr2,inode64,logbsize=128k,sunit=256,swidth=1024,noquota 0 0


This shows that it is mounted under /host/var/lib/docker/devicemapper/mnt/, which means it came in via this mount:

    {
      "Source": "/",
      "Destination": "/host",
      "Mode": "ro,rslave",
      "RW": false,
      "Propagation": "rslave"
    },

That mount is also clearly rslave....

Comment 4 Eric Paris 2017-05-13 01:49:48 UTC
/me thinks we need a tool like pstree, only for mounts and mount propagation modes. And it needs to be a tool even I can understand   :)

Derek, Justin, this is yet another problem with 'stuck terminating' in dev-preview-stg.

For anyone who wonders what 'dev-preview-stg' means: it is the cluster that we hoped to push live to free-tier customers today, but we had too many problems.

Comment 5 Daniel Walsh 2017-05-13 09:53:48 UTC
How much can you do with findmnt? Would findmnt need to enter different mount namespaces to trace all of the mount points?

Comment 6 Eric Paris 2017-05-13 13:24:58 UTC
Yes, Yes I can.

# ps -ef | grep 91610
root      91610  84305  0 May08 ?        00:00:00 /usr/bin/docker-containerd-shim-current 39429cb438b31c45fe529fe1a8cf9c6b08ce3111bfd06533555661fd19ba9526 /var/run/docker/libcontainerd/39429cb438b31c45fe529fe1a8cf9c6b08ce3111bfd06533555661fd19ba9526 /usr/libexec/docker/docker-runc-current
root      91633  91610  0 May08 ?        00:00:17 /usr/sbin/crond -n -m off
root      93870  91610  0 May08 ?        00:05:43 /usr/libexec/pcp/bin/pmcd -A

# nsenter -t 91610 -m findmnt -o TARGET,FSTYPE,PROPAGATION | grep c769b13124241578f4fb7e7e4
[nothing]

# nsenter -t 91633 -m findmnt -o TARGET,FSTYPE,PROPAGATION
TARGET                                                                                                                                                                 PROPAGATION
/                                                                                                                                                                      private
[snip]
├─/host                                                                                                                                                                private,slave
│ ├─/host/dev                                                                                                                                                          private,slave
│ │ ├─/host/dev/shm                                                                                                                                                    private,slave
│ │ ├─/host/dev/pts                                                                                                                                                    private,slave
│ │ ├─/host/dev/mqueue                                                                                                                                                 private,slave
│ │ └─/host/dev/hugepages                                                                                                                                              private,slave
│ ├─/host/proc                                                                                                                                                         private,slave
│ │ ├─/host/proc/sys/fs/binfmt_misc                                                                                                                                    private,slave
│ │ └─/host/proc/fs/nfsd                                                                                                                                               private,slave
│ ├─/host/sys                                                                                                                                                          private,slave
│ │ ├─/host/sys/kernel/security                                                                                                                                        private,slave
│ │ ├─/host/sys/fs/cgroup                                                                                                                                              private,slave
│ │ │ ├─/host/sys/fs/cgroup/systemd                                                                                                                                    private,slave
│ │ │ ├─/host/sys/fs/cgroup/cpu,cpuacct                                                                                                                                private,slave
│ │ │ ├─/host/sys/fs/cgroup/pids                                                                                                                                       private,slave
│ │ │ ├─/host/sys/fs/cgroup/hugetlb                                                                                                                                    private,slave
│ │ │ ├─/host/sys/fs/cgroup/memory                                                                                                                                     private,slave
│ │ │ ├─/host/sys/fs/cgroup/blkio                                                                                                                                      private,slave
│ │ │ ├─/host/sys/fs/cgroup/perf_event                                                                                                                                 private,slave
│ │ │ ├─/host/sys/fs/cgroup/cpuset                                                                                                                                     private,slave
│ │ │ ├─/host/sys/fs/cgroup/freezer                                                                                                                                    private,slave
│ │ │ ├─/host/sys/fs/cgroup/net_cls,net_prio                                                                                                                           private,slave
│ │ │ └─/host/sys/fs/cgroup/devices                                                                                                                                    private,slave
│ │ ├─/host/sys/fs/pstore                                                                                                                                              private,slave
│ │ ├─/host/sys/kernel/config                                                                                                                                          private,slave
│ │ ├─/host/sys/fs/selinux                                                                                                                                             private,slave
│ │ └─/host/sys/kernel/debug                                                                                                                                           private,slave
│ ├─/host/run                                                                                                                                                          private,slave
│ │ ├─/host/run/user/763                                                                                                                                               private,slave
│ │ ├─/host/run/docker/netns/default                                                                                                                                   private,slave
│ │ ├─/host/run/docker/netns/3fa48b388ff3                                                                                                                              private,slave
│ │ ├─/host/run/docker/netns/b8ff94764a43                                                                                                                              private,slave
│ │ ├─/host/run/docker/netns/c62eb108d8f1                                                                                                                              private,slave
│ │ ├─/host/run/docker/netns/8724a60fa2d9                                                                                                                              private,slave
│ │ ├─/host/run/docker/netns/468a03f57fa9                                                                                                                              private,slave
│ │ ├─/host/run/docker/netns/c47c279af13c                                                                                                                              private,slave
│ │ └─/host/run/docker/netns/6486ec1b465d                                                                                                                              private,slave
│ ├─/host/boot                                                                                                                                                         private,slave
│ └─/host/var                                                                                                                                                          private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/ee3d2c61-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/aggregated-logging-elasticsearch-token-6vtqk private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f960f0d6-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/kibana-proxy                                 private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f54453aa-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/hawkular-cassandra-secrets                   private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/ee3d2c61-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/elasticsearch                                private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/fba0a04b-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/hawkular-token-wvq1x                         private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/eea6c43a-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/aggregated-logging-kibana-token-m1jtx        private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/eea6c43a-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/kibana                                       private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f54453aa-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/cassandra-token-16vcj                        private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f54453aa-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~aws-ebs/pvc-c1fe10ae-1e20-11e7-bee2-0ea1922a9381    private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/a373c7a7-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/registry-token-at20c                         private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/a373c7a7-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/dockersecrets                                private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/a373c7a7-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/dockercerts                                  private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/eea6c43a-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/kibana-proxy                                 private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1c/vol-049d7a9316d0feee6                                           private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/ee3d2c61-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~aws-ebs/pvc-4216403a-db50-11e6-a28c-0ea1922a9381    private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f960f0d6-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/aggregated-logging-kibana-token-m1jtx        private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/f960f0d6-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/kibana                                       private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/fba0a04b-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/hawkular-metrics-secrets                     private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/fba0a04b-34bb-11e7-8754-0eaa067b1713/volumes/kubernetes.io~secret/hawkular-metrics-client-secrets              private,slave
│   ├─/host/var/lib/nfs/rpc_pipefs                                                                                                                                     private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/a16d5d28-2eb4-11e7-a9db-0eaa067b1713/volumes/kubernetes.io~secret/intercom-api-auth                            private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/a16d5d28-2eb4-11e7-a9db-0eaa067b1713/volumes/kubernetes.io~secret/intercom-account-reconciler-token-ieus2      private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/e5087dce-2eb4-11e7-a576-0ee251450653/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-jpdv1       private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/e5087dce-2eb4-11e7-a576-0ee251450653/volumes/kubernetes.io~secret/certs                                        private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/1a31d16e-2eb5-11e7-9452-0ea1922a9381/volumes/kubernetes.io~secret/heapster-secrets                             private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/1a31d16e-2eb5-11e7-9452-0ea1922a9381/volumes/kubernetes.io~secret/hawkular-metrics-certificate                 private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/1a31d16e-2eb5-11e7-9452-0ea1922a9381/volumes/kubernetes.io~secret/hawkular-metrics-account                     private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/pods/1a31d16e-2eb5-11e7-9452-0ea1922a9381/volumes/kubernetes.io~secret/heapster-token-k4bbq                         private,slave
│   ├─/host/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1c/vol-0e5a313767d7bf160                                           private,slave
│   ├─/host/var/lib/docker/devicemapper                                                                                                                                private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/8b12a2cd61ab9858ba387a87a53533626b8526a883e9af63c36c07495d26da4c                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/cffcc7529f69cba901280b0bf708be62200551a2c20906a6d9852c60d832be3c                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/0249eb425393428e728dcae02ac845fd592f9d9dfa04e02a72be76697c79bc4b                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/5ef001a69d34eb72861d1961b9d9e2c2e26a5f7e3cbd9130afdcef2f2301649a                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/c66873c5385aebbae026161415a24e8a91f3b2825da44d0bf961a7e3b1573b03                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/1f9fac7876b3fc38dcd789c1bac03ddc29f17a356c63fafbe92a7b7cc9c4702c                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/9ad31820c36291df278e83f55f0c8353062e27e7ce857ac352e79148db2b39ec                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/19874f25575bee6c11dfbd936b0abf621ed2e94a9771123beb04863332ddf6a5                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/c185928c7d1afd3f0202999ec19eac157572769195ed29417335a279e634e3c6                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/986f24e8f415d798aa3d83380b2facd7f67d548f073608d2a9e029e720c28402                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/0f84bc086382991be5714036c4de22ebc3e7624b0e154ba4f56344183842bd56                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/72eba9b0da7f90c55e105c401d030628bb4d17bacea421211c755bbbdc2be91b                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/33ef62ee5a982631f114825ecab5fd6d77f8de74265f9554516ecc2f4cfc83df                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/9615836764baf3d14ac576edc5eacc0143a0694ebd94800cf4e2bbd6c77264ef                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/c769b13124241578f4fb7e7e4f5c0e3c99fd73ecf92c2fefa9163b40231cd9bb                                                         private
│   │ ├─/host/var/lib/docker/devicemapper/mnt/73954d9ef7ad0b67d7b6b697a23e8784c731057796ca501f7bb569743bc87414                                                         private
│   │ └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d                                                         private
│   │   └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs                                                private
│   │     ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/proc                                         private
│   │     ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/dev                                          private
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/dev/pts                                    private
│   │     │ └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/dev/mqueue                                 private
│   │     ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup                                private
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/systemd                      private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/cpuacct,cpu                  private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/pids                         private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/hugetlb                      private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/memory                       private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/blkio                        private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/perf_event                   private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/cpuset                       private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/freezer                      private,slave
│   │     │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/net_prio,net_cls             private,slave
│   │     │ └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/devices                      private,slave
│   │     ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/container_setup                              private
│   │     └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys                                          private
│   │       ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/kernel/security                        private
│   │       ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup                              private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/systemd                    private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/cpu,cpuacct                private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/pids                       private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/hugetlb                    private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/memory                     private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/blkio                      private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/perf_event                 private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/cpuset                     private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/freezer                    private
│   │       │ ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/net_cls,net_prio           private
│   │       │ └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/cgroup/devices                    private
│   │       ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/pstore                              private
│   │       ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/kernel/config                          private
│   │       ├─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/fs/selinux                             private
│   │       └─/host/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d/rootfs/sys/kernel/debug                           private
│   ├─/host/var/lib/docker/containers/0ed7611c66103b4304d50b86eebb3666e089e996ff22b73ac41b37dd28d7ca2c/shm                                                             private
│   ├─/host/var/lib/docker/containers/85b94fcf3366455ea1a52b9d8930161b16bf2668b15c4e7c074c0f6d19752818/shm                                                             private
│   ├─/host/var/lib/docker/containers/57110e64a742ef0e4f32493c71c6637ce9ad92dd9fcc914010996703a6e41808/shm                                                             private
│   ├─/host/var/lib/docker/containers/8ca045106dbe156cdb30b8a70451f37d14ad199ecb671801ff101c75889f5b2d/shm                                                             private
│   ├─/host/var/lib/docker/containers/0caba83962930a44f22d6a0c92743c78eb1b937ead6d9f2303edfa8a1d37f14d/shm                                                             private
│   ├─/host/var/lib/docker/containers/5125ba88fa08a511c7e98b35b6c4af317ad418953bc5dd2c8b15f56de031d314/shm                                                             private
│   ├─/host/var/lib/docker/containers/143901234b39bb1ff6649d55eea8165ebe4c0e8a13ee184358af91d324142f42/shm                                                             private
│   ├─/host/var/lib/docker/containers/b93217f44ddb4f03c3f0323dc53c6a3463988d10f4bc830862752c2d89ff8222/shm                                                             private
│   └─/host/var/cache/yum                                                                                                                                              private
[snip]

Comment 7 Eric Paris 2017-05-13 13:27:49 UTC
Created attachment 1278413 [details]
All logs and data in a txt file

Comment 8 Eric Paris 2017-05-13 13:31:28 UTC
findmnt basically says that some things under /host are private,slave and most things (not all!) under /host/var/lib/docker/devicemapper are just private and not private,slave.

I have no explanation for why this happens, but it sure explains the undestroyable devicemapper thin pool...

Comment 9 Daniel Walsh 2017-05-14 10:05:13 UTC
Vivek will know best but

The rootfs of each driver is mounted private to prevent it leaking into parent namespace

Which explains

/var/lib/docker/devicemapper

Being private

I think runc mounts parts of the cgroup file system into containers so that the processes inside a container can check their cgroup constraints.

Comment 10 Vivek Goyal 2017-05-15 12:44:05 UTC
/var/lib/docker/devicemapper is mounted "private" by docker so that any further mount points by docker are not seen by other applications. These are docker private mounts and docker does not want other applications/containers to see these mounts.

Comment 11 Vivek Goyal 2017-05-15 12:47:53 UTC
How about if we try to do a recursive unmount of the /var/lib/docker/ directory at container startup? (similar to the fluentd solution).

Would it make sense to enhance runc so that a user can pass a list of Unmounts to be done?

Unmounts {
   Source: /var/lib/docker/
}

Comment 12 Vivek Goyal 2017-05-15 12:54:25 UTC
Given that /var/lib/docker/ itself can be a mount point, and containers like fluentd look at some data inside /var/lib/docker/, we can probably make it a little more fine-grained.

Unmounts {
   Source: /var/lib/docker/containers
   Source: /var/lib/docker/devicemapper
   Source: /var/lib/docker/overlay2
}

I think this leaked mount point will be an issue with overlay2 as well? Directory removal will fail with -EBUSY (at least until rhel7.3 and in the default configuration of rhel 7.4).

Comment 13 Vivek Goyal 2017-05-15 13:17:52 UTC
In the long term, I think from rhel7.4 onwards this problem should be solved automatically, because when a directory is removed on the host, the associated mount point
will be removed from containers.

Comment 14 Eric Paris 2017-05-15 13:27:37 UTC
@vivek, wouldn't actually paying attention to rslave have worked? you get private,slave? So nothing new would show up and umount outside would clean up inside? Why not respect the user's request?

I'd also love to understand why the same containers do not show up under both /var/lib/docker/devicemapper/ and /var/lib/docker/containers ? What may have been special about these?

Comment 15 Vivek Goyal 2017-05-15 13:55:46 UTC
(In reply to Eric Paris from comment #14)
> @vivek, wouldn't actually paying attention to rslave have worked? you get
> private,slave? So nothing new would show up and umount outside would clean
> up inside? Why not respect the user's request?

rslave will only work if the original mount point is shared, right? Or maybe slave as well (not sure).

rslave can't change property of a mount point which is "private" to begin with. 

IOW, if a mount point is shared originally, then you can create a copy of it
and mark new point "slave" so that it receives further mount/unmount updates
from original mount point. But if original mount point is "private", then
rslave can't make it propagate. So docker's rslave semantics are not broken as
such.

docker is trying to protect itself by hiding its container mount points hoping
that this will reduce leaks of mount points and various complex configurations
which can result from keeping it shared.

This change was introduced by Alexander Larsson originally, and I think he
had some performance concerns as well because too many propagations were
happening.

If this is critical, we could try introducing a docker option to not
mark the graph driver parent directory private and let it inherit the property
from its parent. I am not sure what the result will be, though. We might see performance issues, or mount points showing up unexpectedly, or some interesting
hang/deadlock situations. This would be an experimental thing, to see whether it works reasonably well or not.

> 
> I'd also love to understand why the same containers do not show up under
> both /var/lib/docker/devicemapper/ and /var/lib/docker/containers ? What may
> have been special about these?

/var/lib/docker/containers/ is supposed to have only some specialized mount points. All the container rootfs mount points are under /var/lib/docker/devicemapper/. (or /var/lib/docker/overlay/ if graph driver is overlay).

Comment 16 Vivek Goyal 2017-05-15 14:01:08 UTC
I think a good example would be.

Say a user does "unshare -m --propagation=unchanged bash". Now if docker starts a container, its rootfs will not propagate to "bash". And that's one way we reduce the leaking of container mount points. It's a security concern as well, right? We don't want applications running in other mount namespaces to see container mount points.

Comment 17 Vivek Goyal 2017-05-15 14:25:26 UTC
Eric,

Once docker has started, change the propagation property of /var/lib/docker/devicemapper manually.

mount --make-rshared /var/lib/docker/devicemapper

Then let your workload run and see how it goes. I think if you do this manually you should see container rootfs mount points as "private,slave" inside the container.

Comment 18 Eric Paris 2017-05-15 16:13:57 UTC
root      84297      1  3 May08 ?        05:56:26 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver json-file --log-opt max-size=50m --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker_vg-docker--pool --storage-opt dm.use_deferred_removal=true --add-registry registry.access.redhat.com --insecure-registry registry.qe.openshift.com


# nsenter -t 84297 -m findmnt -o TARGET,PROPAGATION 
TARGET                                                                                                                                                          PROPAGATION
/                                                                                                                                                               private,slave
└─/var                                                                                                                                                          private,slave
  ├─/var/lib/docker/devicemapper                                                                                                                                private
  │ └─/var/lib/docker/devicemapper/mnt/6b37df1d86aef17cbc866069997c60ac9a834bd6a352a84616233bebf281ae0d                                                         private


Almost every mount inside docker was private,slave. Except, as we can see, /var/lib/docker/devicemapper which is just private. And that privateness percolated into the container...
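
The same PROPAGATION labels can be derived from /proc/<pid>/mountinfo directly. Below is a minimal Python sketch, not anything docker or findmnt ships: the function names propagation and plain_private_under are made up for illustration, the label format is assumed to mirror what findmnt printed above, and the sample lines (device names, mount IDs, the EXAMPLE directory) are hypothetical.

```python
def propagation(mountinfo_line):
    """Return (mount_point, label) for one /proc/<pid>/mountinfo line."""
    fields = mountinfo_line.split()
    mount_point = fields[4]
    # Optional fields sit between the mount options (index 5) and the "-" separator.
    optional = fields[6:fields.index("-", 6)]
    tags = {f.split(":", 1)[0] for f in optional}
    label = "shared" if "shared" in tags else "private"
    if "master" in tags:          # receives propagation from a peer group
        label += ",slave"
    if "unbindable" in tags:
        label += ",unbindable"
    return mount_point, label


def plain_private_under(mountinfo_text, prefix):
    """List mount points at or below prefix that are neither shared nor slave."""
    prefix = prefix.rstrip("/")
    result = []
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        mount_point, label = propagation(line)
        inside = mount_point == prefix or mount_point.startswith(prefix + "/")
        if inside and label == "private":
            result.append(mount_point)
    return result


# Hypothetical sample, shaped after the findmnt output in this comment.
SAMPLE = """\
61 60 253:0 / / rw,relatime master:1 - xfs /dev/mapper/example-root rw
62 61 253:1 / /var rw,relatime master:2 - xfs /dev/mapper/example-var rw
63 62 253:1 /lib/docker/devicemapper /var/lib/docker/devicemapper rw,relatime - xfs /dev/mapper/example-var rw
64 63 253:5 / /var/lib/docker/devicemapper/mnt/EXAMPLE rw,relatime - xfs /dev/mapper/example-thin rw
"""

if __name__ == "__main__":
    for sample_line in SAMPLE.splitlines():
        print(propagation(sample_line))
    print(plain_private_under(SAMPLE, "/var/lib/docker"))
```

Run against the real /proc/84297/mountinfo, a helper like plain_private_under would be expected to single out the devicemapper mounts, since those are the ones with neither a shared: nor a master: tag.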

Comment 19 Vivek Goyal 2017-05-15 19:04:25 UTC
Ok, we discussed it during standup. One of the ideas mrunal mentioned was to try to unmount those using hooks, and everybody liked it.

So Dan Walsh has kindly agreed to look into writing a hook to unmount all mount points under /var/lib/docker/devicemapper directory during container startup.

If this works well, I think we could extend it to unmount mount points
under /var/lib/docker/containers/*/shm as well to solve the fluentd issue
we are seeing.

Comment 20 Daniel Walsh 2017-05-16 12:14:48 UTC
First pass.

https://github.com/rhatdan/oci-umount

As Vivek points out, we really want to recursively umount any volume under a specified path.
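
To picture what unmounting everything under a specified path involves, here is a rough Python sketch. It is not the oci-umount code (that lives in the repository linked above); mounts_under is a hypothetical helper that only computes an order from mountinfo-style text and unmounts nothing. It assumes children have to go before their parents and it ignores mounts stacked on the same mount point.

```python
def depth(path):
    """Number of path components, so "/" is 0 and "/var" is 1."""
    return len([c for c in path.split("/") if c])


def mounts_under(mountinfo_text, top):
    """Return mount points at or below `top`, deepest first."""
    top = top.rstrip("/") or "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]  # fifth field of mountinfo is the mount point
        if top == "/" or mount_point == top or mount_point.startswith(top + "/"):
            found.append(mount_point)
    # Children must be unmounted before their parents.
    return sorted(found, key=depth, reverse=True)


# Hypothetical sample; EXAMPLE stands in for a real container ID.
SAMPLE = """\
61 60 253:0 / / rw - xfs /dev/mapper/example-root rw
62 61 253:1 / /var rw - xfs /dev/mapper/example-var rw
63 62 253:1 /lib/docker/devicemapper /var/lib/docker/devicemapper rw - xfs /dev/mapper/example-var rw
64 63 253:5 / /var/lib/docker/devicemapper/mnt/EXAMPLE rw - xfs /dev/mapper/example-thin rw
65 64 0:40 / /var/lib/docker/devicemapper/mnt/EXAMPLE/rootfs/sys rw - sysfs sysfs rw
66 62 0:41 / /var/lib/docker/containers/EXAMPLE/shm rw - tmpfs shm rw
"""

if __name__ == "__main__":
    for mount_point in mounts_under(SAMPLE, "/var/lib/docker/devicemapper"):
        print(mount_point)
```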

Comment 21 Vivek Goyal 2017-05-16 12:52:55 UTC
Dan, I think doing lazy mount on top mount should be enough. 

mount -l /var/lib/docker/devicemapper

Comment 22 Vivek Goyal 2017-05-16 12:53:28 UTC
What I meant was unmount.

umount -l /var/lib/docker/devicemapper/

Comment 23 Vivek Goyal 2017-05-16 13:12:39 UTC
May 16 09:10:36 vm8-f26 dockerd[16198]: time="2017-05-16T09:10:36.689341776-04:00" level=error msg="Handler for POST /v1.26/containers/27ed9da02516e0056d8f7458d8213964d7f1ccc034d1b54d37e476718e09c329/start returned error: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:334: running prestart hook 3 caused \\\"error running hook: exit status 1, stdout: , stderr: \\\"\"\n"

Comment 24 Vivek Goyal 2017-05-16 13:17:41 UTC
When do these hooks run? This will work only if these hooks run after pivot_root(). If we run it before pivot_root(), then container rootfs we have prepared in /var/lib/docker/overlay/... will be lost too?

Comment 25 Daniel Walsh 2017-05-16 14:21:14 UTC
I don't know.  Mrunal?

Comment 26 Vivek Goyal 2017-05-16 14:25:36 UTC
Also, what's the root filesystem of the hook process? Is it same rootfs as container process rootfs? I think following two scenarios should work.

- Either we run these hooks before doing volume mounts for container process and rootfs is same as docker process. So when we have finished unmounts and then runc goes on to do "-v /:/host", then by that time these mounts have disappeared and only a subset of mounts will be mounted on container rootfs.

- Or we run these hooks from the container rootfs after pivot_root().

Comment 27 Vivek Goyal 2017-05-16 14:33:04 UTC
Also this should be called from mount namespace of container process.

Comment 28 Mrunal Patel 2017-05-16 14:45:45 UTC
The hooks are called right before the pivot_root. At that point we have access to all the mounts and can still bind mount additional directories from the host. Otherwise a lot of use cases are blocked if it is moved after.

Comment 29 Mrunal Patel 2017-05-16 15:07:47 UTC
We can still use the create/start split to get to the mount namespace after pivot root. It would mean more code to add supporting hooks between create/start in docker, though.

Comment 30 Vivek Goyal 2017-05-16 19:19:46 UTC
Discussed this during post scrum. Here is the brief summary.

- Prestart hooks should work.
- We need to use nsenter to enter the container process's mount namespace.
- We need to prefix the rootfs path to the mount point.
- We also need to prefix the target of the volume mount. So the final path will look something like this:

  $rootfs/$target/$path_on_host

- Determining $rootfs is easy. Calculating $target is not that simple.

- To make it generic, I think we will have to do following.

  1. Search for $path_on_host in the list of volumes and match it against source.
  2. If it matches, find associated $target and use that value.
  3. break
  4. If not, then parse /proc/self/mountinfo and determine parent mount of $path_on_host.
  5. path_on_host = parent_mount
  6. Go to step 1.
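
The lookup loop above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the hook's code: resolve_in_container, parent_mount and is_under are hypothetical names, volumes is assumed to be a dict of normalized source to target paths, mount_points stands in for the mount point column parsed from /proc/self/mountinfo, and /run/rootfs is a made-up rootfs. Joining the remainder of the original path after the matched source is my reading of the $rootfs/$target/$path_on_host formula.

```python
import posixpath


def is_under(path, ancestor):
    """True if path equals ancestor or lies below it."""
    if ancestor == "/":
        return path.startswith("/")
    return path == ancestor or path.startswith(ancestor + "/")


def parent_mount(path, mount_points):
    """Longest mount point that is a strict ancestor of path, or None."""
    candidates = [mp for mp in mount_points if mp != path and is_under(path, mp)]
    return max(candidates, key=len, default=None)


def resolve_in_container(path_on_host, volumes, mount_points, rootfs):
    """Map a host path to where it appears under the container rootfs."""
    original = posixpath.normpath(path_on_host)
    candidate = original
    while candidate is not None:
        target = volumes.get(candidate)  # steps 1-2: match against a volume source
        if target is not None:
            # step 3: stop; keep the part of the original path below the source
            rel = posixpath.relpath(original, candidate)
            joined = posixpath.join(rootfs, target.lstrip("/"), rel)
            return posixpath.normpath(joined)
        # steps 4-6: retry with the parent mount of the current candidate
        candidate = parent_mount(candidate, mount_points)
    return None  # no volume exposes this path inside the container


if __name__ == "__main__":
    volumes = {"/": "/host"}  # i.e. -v /:/host
    mount_points = ["/", "/var", "/var/lib/docker/devicemapper"]
    print(resolve_in_container("/var/lib/docker/devicemapper",
                               volumes, mount_points, "/run/rootfs"))
```

With -v /:/host the walk goes from /var/lib/docker/devicemapper to /var to /, which is the only volume source, so the path is found under /host below the rootfs.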

Comment 31 Vivek Goyal 2017-05-16 19:24:08 UTC
If above works, we should be able to use this to solve shm sharing issue as well.

- Modify docker to make /var/lib/docker/containers/ a mount point.

  (mount --bind /var/lib/docker/containers /var/lib/docker/containers)

- And then call "umount -l /var/lib/docker/containers" using above hook. That
  should unmount all the /var/lib/docker/containers/*/shm mount points as well
  from container's mount namespace.

- But this will only solve the issue of unintentional sharing of mount points. If
  /var/lib/docker/containers/container_A/shm has been mounted at say
  /dev/shm/foo inside container B, that means, we will have to make changes to
  docker to make sure shm mount points are moved out of container directory and
  are refcounted.

Comment 32 Daniel Walsh 2017-05-19 12:03:32 UTC
Vivek and I have been playing around with a new tool, oci-umount, that unmounts mount points that have leaked into a container.  This tool can run as an OCI hook, like oci-systemd-hook and oci-register-machine.

So far we have seen some good results although we have to get a little hacky with the shm mount points. 

But we have run a container with all content under /var/lib/docker/DRIVERS and /var/lib/docker/containers unmounted before the container starts.

The version Vivek has right now ONLY works with /:/host, so we are working to make this more flexible.

Comment 33 Eric Paris 2017-05-19 21:51:55 UTC
Thank you both. This is hugely impactful to us, as every cluster hosted by Red Hat mounts /:/host and every cluster that installs logging on prem does the same.

Comment 34 Daniel Walsh 2017-05-23 16:20:56 UTC
We now have patches merged into projectatomic/docker and are reviewing oci-umount.  Once we have these built we will need openshift to run tests on them.

Comment 35 Daniel Walsh 2017-06-30 15:03:02 UTC
We are now shipping oci-umount so this should be fixed...

