Created attachment 1258623 [details]
node log

Description of problem:
Could not detach an EBS volume after restarting the atomic-openshift-node service. I found this issue in the containerized env.

Version-Release number of selected component (if applicable):
openshift v3.5.0.35
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Create a pvc
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pvc.json
2. Create a pod
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
3. After the pod is running, restart the atomic-openshift-node service
systemctl restart atomic-openshift-node
4. Delete the pod and pvc
5. PV stays in Released status

Actual results:
PV stays in Released status; the volume could not be detached.

Expected results:
PV is deleted and the volume is unmounted.

Additional info:
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: I0301 04:00:51.626011 16462 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92: /:
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: I0301 04:00:51.626026 16462 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 is not a mount point
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: W0301 04:00:51.626032 16462 util.go:84] Warning: "/var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92" is not a mountpoint, deleting
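For convenience, the steps above as a command sequence (a sketch; the pod/pvc names depend on the test files, and "mypod"/"ebsc" here are placeholders):

oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pvc.json
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
# wait until the pod is Running, then on the node:
systemctl restart atomic-openshift-node
# back on the client:
oc delete pod mypod
oc delete pvc ebsc
oc get pv   # the PV stays in Released status instead of being deleted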
Created attachment 1258624 [details] master log
Looking at the logs:

Mar 01 03:28:06 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:06.767846 1 namespace_controller_utils.go:352] namespace controller - deleteAllContent - namespace: ywn61, ...
Mar 01 03:28:07 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:07.442888 1 namespace_controller_utils.go:167] namespace controller - deleteCollection - namespace: ywn61, gvr: { v1 persistentvolumeclaims} ...
Mar 01 03:28:08 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:08.163615 1 namespace_controller_utils.go:167] namespace controller - deleteCollection - namespace: ywn61, gvr: { v1 pods}

So my hypothesis is: the pods were somehow deleted by deleting their namespace, and the order of deleting items in the namespace is wrong: the PVCs that are still in use are deleted before the pods that use them, so the volume detach fails.
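If the ordering hypothesis is right, it should be visible from the API. A hedged way to check (assuming a test namespace named ywn61 as in the logs, and two terminals on a client with oc):

# terminal 1: watch PVC deletions
oc get pvc -n ywn61 -w
# terminal 2: watch pod deletions
oc get pods -n ywn61 -w
# then delete the namespace and compare the order of deletion events
oc delete namespace ywn61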
I parsed through the log messages for pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 (vol-094673f29dd5b5c81): https://paste.fedoraproject.org/paste/YBxE1QvxmuQ3Gg1rWHltXl5M1UNdIGYhyRLivL9gydE= It appears the directory "/var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92" is unmounted successfully at first (it is seen as "not a mount point" from the first unmount attempt onward), but every attempt to remove the directory fails with "device or resource busy." After 6 minutes a force detach is triggered, but while waiting for the volume to detach, we see its state go from detaching to busy.
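When a path reports "not a mount point" yet cannot be removed, some other mount namespace usually still holds it as a mount point. A hedged way to find the holder on the node (run as root; the PVC name is taken from the log above):

# list every process whose mount namespace still references the volume directory
grep -l pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 /proc/*/mountinfo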
FYI, the '=' is the last character of the URL and Bugzilla doesn't like it: https://paste.fedoraproject.org/paste/YBxE1QvxmuQ3Gg1rWHltXl5M1UNdIGYhyRLivL9gydE=?
Could not reproduce on the same version of OSE. The only difference is that I don't restart the node using systemd; I just kill it. I tried three times, varying the time between the node restart, the deletions, and so on, but for me the detach always succeeded. I also tried deleting the PVC before the pod and it worked fine, so that's probably not it. @chaoyang, can you confirm this is always reproducible and give more precise steps if necessary? How long do you wait between each step, etc.? Thanks. Here's a set of my logs to compare, if needed:

master: https://paste.fedoraproject.org/paste/KLLfeWEaRZFdM3Rvs-x8615M1UNdIGYhyRLivL9gydE=
node pre-restart: https://paste.fedoraproject.org/paste/U1Y4dTIdbi5CGcPeKaeoFF5M1UNdIGYhyRLivL9gydE=
node post-restart: https://paste.fedoraproject.org/paste/-vZSTrsV616AU~~11mhne15M1UNdIGYhyRLivL9gydE=
Created attachment 1258933 [details]
logs with successful detach, following the same repro steps on the same version

Sorry for the noise; pastebin truncated my logs, so I'm attaching them instead.
The openshift node container is run with:

* "-v /:/rootfs:ro" because nsenter_mount.go expects the host's "/" to be at "/rootfs" inside the container
* "-v /var/lib/origin:/var/lib/origin:rslave" because the container needs the mounts/unmounts it performs in the host namespace (using nsenter mount) to propagate back down to it

If you restart the node while a volume is mounted at /var/lib/origin/example, you will again bind mount "/:/rootfs" as usual, and you will unfortunately also end up mounting "/var/lib/origin/example:/rootfs/var/lib/origin/example". This is the source of the problem: when you later unmount /var/lib/origin/example, that unmount doesn't propagate down to /rootfs/var/lib/origin/example, so you've got a /rootfs/var/lib/origin/example mountpoint keeping the device busy. Best explanation I could do; still thinking of a solution, will continue tomorrow :) See the sketch below.
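To make the effect concrete, here is a minimal sketch that reproduces the "device or resource busy" symptom in a single mount namespace, using a loopback file as a stand-in for the EBS device and /mnt/rootfs as a stand-in for the container's /rootfs (both names are illustrative; run as root):

# scratch "device", mounted where the node would mount a volume
truncate -s 64M /tmp/disk.img
mkfs.ext4 -q /tmp/disk.img
mkdir -p /var/lib/origin/example
mount -o loop /tmp/disk.img /var/lib/origin/example

# simulate the node container's view of the host: a recursive bind of /
# with private propagation (docker's default)
mkdir -p /mnt/rootfs
mount --rbind / /mnt/rootfs
mount --make-rprivate /mnt/rootfs

# unmount the volume on the "host" side; the private copy under
# /mnt/rootfs does not receive the unmount event
umount /var/lib/origin/example
findmnt /mnt/rootfs/var/lib/origin/example   # still mounted

# the directory is still a mount point in the bind tree, so removing it
# fails exactly like in the node log
rmdir /var/lib/origin/example   # rmdir: Device or resource busy

# cleanup
umount -R /mnt/rootfs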
Created attachment 1259350 [details] output of 'mount' at various stages in the bug repro
Hemant, what happens if you add ":rslave" to your "-v /:/rootfs/" in the node service file (/etc/systemd/system/atomic-openshift-node.service iirc)?
Ok, I was able to reproduce with docker 1.12. As suspected, the default propagation is rprivate. I added rslave to the volume option in /etc/systemd/system/origin-node.service and confirmed that this addresses the issue. Now, whether it introduces another issue is a different question. The volume is ro from inside the container, so the only difference is that mount events on the host propagate to /rootfs in the origin node container. I can't think of a case in which one wouldn't want that. I opened a PR to change the systemd service file: https://github.com/openshift/origin/pull/13327
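For reference, the change amounts to adding rslave to the /rootfs bind in the docker run line of the node service file (an abbreviated sketch, not the full unit; the complete command has many more -v options):

# /etc/systemd/system/origin-node.service, ExecStart docker run line
-  -v /:/rootfs:ro \
+  -v /:/rootfs:ro,rslave \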
https://github.com/openshift/openshift-ansible/pull/3728 backported to release-1.5
I am still able to reproduce this issue on
openshift v3.5.2
kubernetes v1.5.2+43a9be4
etcd 3.1.0

The installer fix is already in:

/usr/bin/docker-current run --name atomic-openshift-node --rm --privileged \
  --net=host --pid=host --env-file=/etc/sysconfig/atomic-openshift-node \
  -v /:/rootfs:ro,rslave \
  -e CONFIG_FILE=/etc/origin/node/node-config.yaml -e OPTIONS=--loglevel=5 \
  -e HOST=/rootfs -e HOST_ETC=/host-etc \
  -v /var/lib/origin:/var/lib/origin:rslave \
  -v /etc/origin/node:/etc/origin/node \
  -v /etc/origin/cloudprovider:/etc/origin/cloudprovider \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v /run:/run -v /sys:/sys:rw -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  -v /usr/bin/docker:/usr/bin/docker:ro \
  -v /var/lib/docker:/var/lib/docker \
  -v /lib/modules:/lib/modules \
  -v /etc/origin/openvswitch:/etc/openvswitch \
  -v /etc/origin/sdn:/etc/openshift-sdn \
  -v /var/lib/cni:/var/lib/cni \
  -v /etc/systemd/system:/host-etc/systemd/system \
  -v /var/log:/var/log -v /dev:/dev \
  --volume=/usr/bin/docker-current:/usr/bin/docker-current:ro \
  --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro \
  openshift3/node:v3.5.2

Steps (in containerized OCP):
1. Create a dynamic PVC and pod
2. Restart the atomic-openshift-node service
3. Delete the pod and PVC
4. List PVs: the PVs could not be deleted, and on the nodes the volumes are not unmounted.

oc get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS     CLAIM        REASON    AGE
pvc-33cfc2ab-0fa0-11e7-88a9-0e7333781f58   1Gi        RWO           Delete          Released   jhou/ebsc              41m
pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58   1Gi        RWO           Delete          Released   jhou1/ebsc             26m

mount | grep ebs
/dev/xvdbb on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0d6fd4531ed1d7c57 type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvdbc on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5 type ext4 (rw,relatime,seclabel,data=ordered)
Attempt to unmount vol-0026bc4e3288e3cd5 (/dev/xvdbc):

Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.173294 11535 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5" (spec.Name: "pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58") from pod "4240c473-0fa2-11e7-88a9-0e7333781f58" (UID: "4240c473-0fa2-11e7-88a9-0e7333781f58").
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.173402 11535 nsenter_mount.go:183] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target,fstype --noheadings --first-only --target /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58]
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.179508 11535 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58: /sys:
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.179525 11535 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58 is not a mount point
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: W0323 06:43:38.179531 11535 util.go:84] Warning: "/var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58" is not a mountpoint, deleting
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: E0323 06:43:38.179592 11535 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5\" (\"4240c473-0fa2-11e7-88a9-0e7333781f58\")" failed. No retries permitted until 2017-03-23 06:45:38.179565672 -0400 EDT (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5" (volume.spec.Name: "pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58") pod "4240c473-0fa2-11e7-88a9-0e7333781f58" (UID: "4240c473-0fa2-11e7-88a9-0e7333781f58") with: remove /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58: device or resource busy

On the host I can see that nothing is mounted on [...]/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58, but it cannot be deleted.

/etc/systemd/system/atomic-openshift-node.service contains rslave in these bind mounts:

docker run ... -v /:/rootfs:ro,rslave ... -v /var/lib/origin:/var/lib/origin:rslave

I got lost in /proc/<openshift node>/mountinfo; it has 356 lines.
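One way to cut through a 356-line mountinfo is to filter for the volume in question; for example (assuming root on the node and that pidof openshift resolves to the node process):

# every mount namespace that still references the stuck volume
grep -l vol-0026bc4e3288e3cd5 /proc/*/mountinfo
# the node container's view only
grep vol-0026bc4e3288e3cd5 /proc/$(pidof openshift)/mountinfo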
Created attachment 1265760 [details] /proc/1/mountinfo cat /proc/1/mountinfo on the host
Created attachment 1265761 [details] /proc/<openshift node>/mountinfo cat /proc/`pidof openshift`/mountinfo (i.e. openshift-node view of mounts)
There are two mount points at play here:

The plugin mount (EBS block device to plugins mount point) in /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws

The volume mount (bind mount from the plugins mount to the pod's volumes directory) in openshift.local.volumes/pods/<uuid>/volumes/kubernetes.io~aws-ebs

The plugin mount still exists in the host and also, therefore, in the container:

267 60 202:13824 / /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5 rw,relatime shared:134 - ext4 /dev/xvdbc rw,seclabel,data=ordered

The volume mount no longer exists in the host, but does in the container:

/var/lib/docker/devicemapper/mnt/aa87d538248aee4226b782898d2f50ed40f3f7f7c1f0e57238683e5d26df9cd4/rootfs/var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58 rw,relatime - ext4 /dev/xvdbc rw,seclabel,data=ordered

This is the cause of the umount failure: the umount of the volume in the host is not propagating to the container. I'm not sure why the container still has this mount point, though, if /rootfs and /var/lib/docker in the container are rslave. Very strange.
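Note the telltale detail in the two mountinfo lines above: the first carries a shared:134 peer group, the second has no propagation field at all, i.e. it is private. A quick way to see this without reading raw mountinfo is findmnt's PROPAGATION column (assuming util-linux findmnt on the node, run as root):

# on the host: the plugin mount should show "shared"
findmnt -o TARGET,PROPAGATION,SOURCE /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5
# inside the node container's mount namespace: a private volume mount here
# explains why the host umount never arrives
nsenter --mount=/proc/$(pidof openshift)/ns/mnt -- findmnt -o TARGET,PROPAGATION | grep 'kubernetes.io~aws-ebs'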
Ok so this is trickier than I first realized.

The graphdriver mount points under /var/lib/docker/<graphdriver> are mounted on the _host_ as private. This means that even though we mount /rootfs and /var/lib/docker in the node container with the rslave option in the docker run command, they end up recursively mounted private in the container, because docker can't bind mount as shared a source that is private; a private mount, by definition, doesn't send or receive mount propagation events.

While mount activity is done in the host mount namespace via nsenter by the node process, there are still parts of the node code that inspect /var/lib/docker for statistics and to figure out the current state, so /var/lib/docker needs to be mounted in the container. But if you do that, you will encounter the issue in this bug: the /var/lib/docker/<graphdriver> mount points will be private, and any container mount points that exist when the node starts will be held by the node container even after they are unmounted in the host mount namespace.

The only solution I can think of is a docker option for overriding the graphdriver mount point propagation to be shared. This is not a bug in Kubernetes or Origin, but rather a side effect of the choice to mount graphdriver mount points as private. The original change I recommended is still a good idea, as there can be other mount points under /rootfs that should receive mount propagation events, but it doesn't address this issue.

In light of this, I'm assigning to Containers to continue the discussion on what can be done. What are the downsides of setting shared on the graphdriver mount points?
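For completeness, a way to confirm the private propagation of the graphdriver mounts from the host (assuming the devicemapper graphdriver as in this report, and util-linux findmnt):

# graphdriver mounts on the host show PROPAGATION=private
findmnt -o TARGET,PROPAGATION | grep devicemapper
# equivalently, the raw mountinfo optional-fields column carries no shared:N
# peer group for private mounts
grep devicemapper /proc/1/mountinfo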
*** Bug 1459006 has been marked as a duplicate of this bug. ***
We have now implemented the oci-umount plugin, which unmounts /var/lib/docker/<graphdriver>/ mount points from inside the container when the container starts. So try the latest docker; hopefully that resolves the issue.
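A quick way to verify the hook did its job on a given node (a sketch, assuming the containerized node is running under the name atomic-openshift-node as in the docker run command above, and a devicemapper graphdriver):

# should print nothing if oci-umount removed the graphdriver mounts from the
# node container's mount namespace at container start
docker exec atomic-openshift-node grep devicemapper /proc/self/mountinfo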
QE, can you test whether this issue is fixed? I think we have a clone of this bug where QE said it is fixed. If that's the case, this bug should be marked verified as well, but I will let QE do that.
This passed on
oc v3.5.5.31.2
kubernetes v1.5.2+43a9be4

docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

After the pod is running, delete the pod and pvc; the pv is also deleted successfully.