Created attachment 1258623 [details]
node log

Description of problem:
Could not detach an EBS volume after restarting the atomic-openshift-node service. I found this issue in the containerized env.

Version-Release number of selected component (if applicable):
openshift v3.5.0.35
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Create a pvc
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pvc.json
2. Create a pod
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
3. After the pod is running, restart the atomic-openshift-node service
systemctl restart atomic-openshift-node
4. Delete the pod and pvc
5. PV stays in Released status

Actual results:
PV stays in Released status; the volume could not be detached.

Expected results:
PV is deleted and the volume is unmounted.

Additional info:
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: I0301 04:00:51.626011 16462 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92: /:
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: I0301 04:00:51.626026 16462 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 is not a mount point
Mar 1 04:00:51 ip-172-18-15-152 atomic-openshift-node: W0301 04:00:51.626032 16462 util.go:84] Warning: "/var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92" is not a mountpoint, deleting
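For convenience, the steps above as a command sequence (a sketch; the pod/pvc names depend on the test files, and "mypod"/"ebsc" here are placeholders):

oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pvc.json
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
# wait until the pod is Running, then on the node:
systemctl restart atomic-openshift-node
# back on the client:
oc delete pod mypod
oc delete pvc ebsc
oc get pv   # the PV stays in Released status instead of being deleted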
Created attachment 1258624 [details] master log
Looking at the logs:

Mar 01 03:28:06 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:06.767846 1 namespace_controller_utils.go:352] namespace controller - deleteAllContent - namespace: ywn61, ...
Mar 01 03:28:07 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:07.442888 1 namespace_controller_utils.go:167] namespace controller - deleteCollection - namespace: ywn61, gvr: { v1 persistentvolumeclaims} ...
Mar 01 03:28:08 ip-172-18-14-41.ec2.internal docker[17082]: I0301 08:28:08.163615 1 namespace_controller_utils.go:167] namespace controller - deleteCollection - namespace: ywn61, gvr: { v1 pods}

So my hypothesis is: the pods were somehow deleted by deleting their namespace, and the order of deleting items in the namespace is wrong: the PVCs that are still in use are deleted before the pods that use them, so the volume detach fails.
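If the ordering hypothesis is right, it should be visible from the API. A hedged way to check (assuming a test namespace named ywn61 as in the logs, and two terminals on a client with oc):

# terminal 1: watch PVC deletions
oc get pvc -n ywn61 -w
# terminal 2: watch pod deletions
oc get pods -n ywn61 -w
# then delete the namespace and compare the order of deletion events
oc delete namespace ywn61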
I parsed through the log messages for pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 (vol-094673f29dd5b5c81): https://paste.fedoraproject.org/paste/YBxE1QvxmuQ3Gg1rWHltXl5M1UNdIGYhyRLivL9gydE= It appears the directory "/var/lib/origin/openshift.local.volumes/pods/9f98cc56-fe58-11e6-84b7-0edc552e3d92/volumes/kubernetes.io~aws-ebs/pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92" is unmounted successfully at first (it is seen as "not a mount point" from the first unmount attempt onward), but every attempt to remove the directory fails with "device or resource busy." After 6 minutes a force detach is triggered, but while waiting for the volume to detach, we see its state go from detaching to busy.
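When a path reports "not a mount point" yet cannot be removed, some other mount namespace usually still holds it as a mount point. A hedged way to find the holder on the node (run as root; the PVC name is taken from the log above):

# list every process whose mount namespace still references the volume directory
grep -l pvc-96c19fae-fe58-11e6-84b7-0edc552e3d92 /proc/*/mountinfo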
FYI, the '=' is the last character of the URL and Bugzilla doesn't like it: https://paste.fedoraproject.org/paste/YBxE1QvxmuQ3Gg1rWHltXl5M1UNdIGYhyRLivL9gydE=?
Could not reproduce on the same version of OSE. The only difference is that I don't restart the node using systemd; I just kill it. I tried three times, varying the time between the node restart, the deletions, and so on, but for me the detach always succeeded. I also tried deleting the PVC before the pod and it worked fine, so that's probably not it. @chaoyang, can you confirm this is always reproducible and give more precise steps if necessary? How long do you wait between each step, etc.? Thanks. Here's a set of my logs to compare, if needed:

master: https://paste.fedoraproject.org/paste/KLLfeWEaRZFdM3Rvs-x8615M1UNdIGYhyRLivL9gydE=
node pre-restart: https://paste.fedoraproject.org/paste/U1Y4dTIdbi5CGcPeKaeoFF5M1UNdIGYhyRLivL9gydE=
node post-restart: https://paste.fedoraproject.org/paste/-vZSTrsV616AU~~11mhne15M1UNdIGYhyRLivL9gydE=
Created attachment 1258933 [details]
logs with successful detach, following the same repro steps on the same version

Sorry for the noise; pastebin truncated my logs, so I'm attaching them instead.
The openshift node container is run with:

* "-v /:/rootfs:ro" because nsenter_mount.go expects the host's "/" to be at "/rootfs" inside the container
* "-v /var/lib/origin:/var/lib/origin:rslave" because the container needs the mounts/unmounts it performs in the host namespace (using nsenter mount) to propagate back down to it

If you restart the node while a volume is mounted at /var/lib/origin/example, you will again bind mount "/:/rootfs" as usual, and you will unfortunately also end up mounting "/var/lib/origin/example:/rootfs/var/lib/origin/example". This is the source of the problem: when you later unmount /var/lib/origin/example, that unmount doesn't propagate down to /rootfs/var/lib/origin/example, so you've got a /rootfs/var/lib/origin/example mountpoint keeping the device busy. Best explanation I could do; still thinking of a solution, will continue tomorrow :) See the sketch below.
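To make the effect concrete, here is a minimal sketch that reproduces the "device or resource busy" symptom in a single mount namespace, using a loopback file as a stand-in for the EBS device and /mnt/rootfs as a stand-in for the container's /rootfs (both names are illustrative; run as root):

# scratch "device", mounted where the node would mount a volume
truncate -s 64M /tmp/disk.img
mkfs.ext4 -q /tmp/disk.img
mkdir -p /var/lib/origin/example
mount -o loop /tmp/disk.img /var/lib/origin/example

# simulate the node container's view of the host: a recursive bind of /
# with private propagation (docker's default)
mkdir -p /mnt/rootfs
mount --rbind / /mnt/rootfs
mount --make-rprivate /mnt/rootfs

# unmount the volume on the "host" side; the private copy under
# /mnt/rootfs does not receive the unmount event
umount /var/lib/origin/example
findmnt /mnt/rootfs/var/lib/origin/example   # still mounted

# the directory is still a mount point in the bind tree, so removing it
# fails exactly like in the node log
rmdir /var/lib/origin/example   # rmdir: Device or resource busy

# cleanup
umount -R /mnt/rootfs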
Created attachment 1259350 [details] output of 'mount' at various stages in the bug repro
Hemant, what happens if you add ":rslave" to your "-v /:/rootfs/" in the node service file (/etc/systemd/system/atomic-openshift-node.service iirc)?
Ok, I was able to reproduce with docker 1.12. As suspected, the default propagation is rprivate. I added rslave to the volume option in /etc/systemd/system/origin-node.service and confirmed that this addresses the issue. Now, whether it introduces another issue is a different question. The volume is ro from inside the container, so the only difference is that mount events on the host propagate to /rootfs in the origin node container. I can't think of a case in which one wouldn't want that. I opened a PR to change the systemd service file: https://github.com/openshift/origin/pull/13327
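For reference, the change amounts to adding rslave to the /rootfs bind in the docker run line of the node service file (an abbreviated sketch, not the full unit; the complete command has many more -v options):

# /etc/systemd/system/origin-node.service, ExecStart docker run line
-  -v /:/rootfs:ro \
+  -v /:/rootfs:ro,rslave \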
https://github.com/openshift/openshift-ansible/pull/3728 backported to release-1.5
I am still able to reproduce this issue on
openshift v3.5.2
kubernetes v1.5.2+43a9be4
etcd 3.1.0

The installer fix is already in:

/usr/bin/docker-current run --name atomic-openshift-node --rm --privileged \
  --net=host --pid=host --env-file=/etc/sysconfig/atomic-openshift-node \
  -v /:/rootfs:ro,rslave \
  -e CONFIG_FILE=/etc/origin/node/node-config.yaml -e OPTIONS=--loglevel=5 \
  -e HOST=/rootfs -e HOST_ETC=/host-etc \
  -v /var/lib/origin:/var/lib/origin:rslave \
  -v /etc/origin/node:/etc/origin/node \
  -v /etc/origin/cloudprovider:/etc/origin/cloudprovider \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/machine-id:/etc/machine-id:ro \
  -v /run:/run -v /sys:/sys:rw -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  -v /usr/bin/docker:/usr/bin/docker:ro \
  -v /var/lib/docker:/var/lib/docker \
  -v /lib/modules:/lib/modules \
  -v /etc/origin/openvswitch:/etc/openvswitch \
  -v /etc/origin/sdn:/etc/openshift-sdn \
  -v /var/lib/cni:/var/lib/cni \
  -v /etc/systemd/system:/host-etc/systemd/system \
  -v /var/log:/var/log -v /dev:/dev \
  --volume=/usr/bin/docker-current:/usr/bin/docker-current:ro \
  --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro \
  openshift3/node:v3.5.2

Steps (in containerized OCP):
1. Create a dynamic PVC and pod
2. Restart the atomic-openshift-node service
3. Delete the pod and PVC
4. List PVs: the PVs could not be deleted, and on the nodes the volumes are not unmounted.

oc get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS     CLAIM        REASON    AGE
pvc-33cfc2ab-0fa0-11e7-88a9-0e7333781f58   1Gi        RWO           Delete          Released   jhou/ebsc              41m
pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58   1Gi        RWO           Delete          Released   jhou1/ebsc             26m

mount | grep ebs
/dev/xvdbb on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0d6fd4531ed1d7c57 type ext4 (rw,relatime,seclabel,data=ordered)
/dev/xvdbc on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5 type ext4 (rw,relatime,seclabel,data=ordered)
Attempt to unmount vol-0026bc4e3288e3cd5 (/dev/xvdbc):

Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.173294 11535 reconciler.go:189] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5" (spec.Name: "pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58") from pod "4240c473-0fa2-11e7-88a9-0e7333781f58" (UID: "4240c473-0fa2-11e7-88a9-0e7333781f58").
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.173402 11535 nsenter_mount.go:183] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target,fstype --noheadings --first-only --target /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58]
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.179508 11535 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58: /sys:
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: I0323 06:43:38.179525 11535 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58 is not a mount point
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: W0323 06:43:38.179531 11535 util.go:84] Warning: "/var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58" is not a mountpoint, deleting
Mar 23 06:43:38 ip-172-18-14-229.ec2.internal atomic-openshift-node[11475]: E0323 06:43:38.179592 11535 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5\" (\"4240c473-0fa2-11e7-88a9-0e7333781f58\")" failed. No retries permitted until 2017-03-23 06:45:38.179565672 -0400 EDT (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/aws-ebs/aws://us-east-1d/vol-0026bc4e3288e3cd5" (volume.spec.Name: "pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58") pod "4240c473-0fa2-11e7-88a9-0e7333781f58" (UID: "4240c473-0fa2-11e7-88a9-0e7333781f58") with: remove /var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58: device or resource busy

On the host I can see that nothing is mounted on [...]/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58, but it cannot be deleted.

/etc/systemd/system/atomic-openshift-node.service contains rslave in these bind mounts:

docker run ... -v /:/rootfs:ro,rslave ... -v /var/lib/origin:/var/lib/origin:rslave

I got lost in /proc/<openshift node>/mountinfo; it has 356 lines.
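One way to cut through a 356-line mountinfo is to filter for the volume in question; for example (assuming root on the node and that pidof openshift resolves to the node process):

# every mount namespace that still references the stuck volume
grep -l vol-0026bc4e3288e3cd5 /proc/*/mountinfo
# the node container's view only
grep vol-0026bc4e3288e3cd5 /proc/$(pidof openshift)/mountinfo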
Created attachment 1265760 [details] /proc/1/mountinfo cat /proc/1/mountinfo on the host
Created attachment 1265761 [details] /proc/<openshift node>/mountinfo cat /proc/`pidof openshift`/mountinfo (i.e. openshift-node view of mounts)
There are two mount points at play here:

The plugin mount (EBS block device to plugins mount point) in /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws

The volume mount (bind mount from the plugins mount to the pod's volumes directory) in openshift.local.volumes/pods/<uuid>/volumes/kubernetes.io~aws-ebs

The plugin mount still exists in the host and also, therefore, in the container:

267 60 202:13824 / /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5 rw,relatime shared:134 - ext4 /dev/xvdbc rw,seclabel,data=ordered

The volume mount no longer exists in the host, but does in the container:

/var/lib/docker/devicemapper/mnt/aa87d538248aee4226b782898d2f50ed40f3f7f7c1f0e57238683e5d26df9cd4/rootfs/var/lib/origin/openshift.local.volumes/pods/4240c473-0fa2-11e7-88a9-0e7333781f58/volumes/kubernetes.io~aws-ebs/pvc-3e076d0a-0fa2-11e7-88a9-0e7333781f58 rw,relatime - ext4 /dev/xvdbc rw,seclabel,data=ordered

This is the cause of the umount failure: the umount of the volume in the host is not propagating to the container. I'm not sure why the container still has this mount point, though, if /rootfs and /var/lib/docker in the container are rslave. Very strange.
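Note the telltale detail in the two mountinfo lines above: the first carries a shared:134 peer group, the second has no propagation field at all, i.e. it is private. A quick way to see this without reading raw mountinfo is findmnt's PROPAGATION column (assuming util-linux findmnt on the node, run as root):

# on the host: the plugin mount should show "shared"
findmnt -o TARGET,PROPAGATION,SOURCE /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0026bc4e3288e3cd5
# inside the node container's mount namespace: a private volume mount here
# explains why the host umount never arrives
nsenter --mount=/proc/$(pidof openshift)/ns/mnt -- findmnt -o TARGET,PROPAGATION | grep 'kubernetes.io~aws-ebs'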
Ok so this is trickier than I first realized.

The graphdriver mount points under /var/lib/docker/<graphdriver> are mounted on the _host_ as private. This means that even though we mount /rootfs and /var/lib/docker in the node container with the rslave option in the docker run command, they end up recursively mounted private in the container, because docker can't bind mount as shared a source that is private; a private mount, by definition, doesn't send or receive mount propagation events.

While mount activity is done in the host mount namespace via nsenter by the node process, there are still parts of the node code that inspect /var/lib/docker for statistics and to figure out the current state, so /var/lib/docker needs to be mounted in the container. But if you do that, you will encounter the issue in this bug: the /var/lib/docker/<graphdriver> mount points will be private, and any container mount points that exist when the node starts will be held by the node container even after they are unmounted in the host mount namespace.

The only solution I can think of is a docker option for overriding the graphdriver mount point propagation to be shared. This is not a bug in Kubernetes or Origin, but rather a side effect of the choice to mount graphdriver mount points as private. The original change I recommended is still a good idea, as there can be other mount points under /rootfs that should receive mount propagation events, but it doesn't address this issue.

In light of this, I'm assigning to Containers to continue the discussion on what can be done. What are the downsides of setting shared on the graphdriver mount points?
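For completeness, a way to confirm the private propagation of the graphdriver mounts from the host (assuming the devicemapper graphdriver as in this report, and util-linux findmnt):

# graphdriver mounts on the host show PROPAGATION=private
findmnt -o TARGET,PROPAGATION | grep devicemapper
# equivalently, the raw mountinfo optional-fields column carries no shared:N
# peer group for private mounts
grep devicemapper /proc/1/mountinfo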
*** Bug 1459006 has been marked as a duplicate of this bug. ***
We have now implemented the oci-umount plugin, which unmounts /var/lib/docker/<graphdriver>/ mount points from inside the container when the container starts. So try the latest docker; hopefully that resolves the issue.
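A quick way to verify the hook did its job on a given node (a sketch, assuming the containerized node is running under the name atomic-openshift-node as in the docker run command above, and a devicemapper graphdriver):

# should print nothing if oci-umount removed the graphdriver mounts from the
# node container's mount namespace at container start
docker exec atomic-openshift-node grep devicemapper /proc/self/mountinfo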
QE, can you test whether this issue is fixed? I think we have a clone of this bug where QE said it is fixed. If that's the case, this bug should be marked verified as well, but I will let QE do that.
This passed on
oc v3.5.5.31.2
kubernetes v1.5.2+43a9be4

docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

After the pod is running, delete the pod and pvc; the pv is also deleted successfully.