Bug 1434599 - [3.4] Could not detach ebs volume after restart atomic-openshift-node service
Summary: [3.4] Could not detach ebs volume after restart atomic-openshift-node service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Vivek Goyal
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On: 1427807
Blocks:
Reported: 2017-03-21 21:40 UTC by Scott Dodson
Modified: 2017-07-11 10:47 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1427807
Environment:
Last Closed: 2017-07-11 10:47:38 UTC
Target Upstream Version:
Embargoed:
lxia: needinfo-


Links
System: Red Hat Product Errata
ID: RHBA-2017:1640
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.5 and 3.4 bug fix update
Last Updated: 2017-07-11 14:47:16 UTC

Comment 1 Scott Dodson 2017-03-21 21:41:29 UTC
https://github.com/openshift/openshift-ansible/pull/3729 to update installer for 3.4

Comment 4 Johnny Liu 2017-03-30 08:47:50 UTC
Verified this bug with openshift-ansible-3.4.74-1.git.0.6542413.el7.noarch. The installer PR is already merged, and installation and the sti build are running well. Note that this verification only validates the installer part.


# ps -ef|grep node|grep slave
root      18548      1  0 04:03 ?        00:00:10 /usr/bin/docker-current run --name atomic-openshift-node --rm --privileged --net=host --pid=host --env-file=/etc/sysconfig/atomic-openshift-node -v /:/rootfs:ro,rslave -e CONFIG_FILE=/etc/origin/node/node-config.yaml -e OPTIONS=--loglevel=5 -e HOST=/rootfs -e HOST_ETC=/host-etc -v /var/lib/origin:/var/lib/origin:rslave -v /etc/origin/node:/etc/origin/node -v /etc/localtime:/etc/localtime:ro -v /etc/machine-id:/etc/machine-id:ro -v /run:/run -v /sys:/sys:rw -v /usr/bin/docker:/usr/bin/docker:ro -v /var/lib/docker:/var/lib/docker -v /lib/modules:/lib/modules -v /etc/origin/openvswitch:/etc/openvswitch -v /etc/origin/sdn:/etc/openshift-sdn -v /var/lib/cni:/var/lib/cni -v /etc/systemd/system:/host-etc/systemd/system -v /var/log:/var/log -v /dev:/dev --volume=/usr/bin/docker-current:/usr/bin/docker-current:ro --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro openshift3/node:v3.4.1.12

@Scott, according to Bug #1427807, it seems the installer PR is not enough to fix the "detach ebs volume" issue. Do we really mean to use this bug to track the "detach ebs volume" issue, or is this bug only used to track the installer update?

If it is the former, QE has to move this bug to "ASSIGNED" status; if it is the latter, please re-attach this bug to the installer advisory and move it back to "ON_QA" status.
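The installer-side check in this comment comes down to confirming that the node container was started with slave mount propagation on the host mounts. A minimal sketch of that check follows, assuming the command line has been copied out of the ps output; the function names and the shortened sample are hypothetical and not part of openshift-ansible.

```python
import shlex


def volume_mounts(command_line):
    """Return (source, target, options) for each -v / --volume flag."""
    args = shlex.split(command_line)
    mounts = []
    i = 0
    while i < len(args):
        arg = args[i]
        spec = None
        if arg in ("-v", "--volume") and i + 1 < len(args):
            # Flag and value are separate words: -v /src:/dst:opts
            spec = args[i + 1]
            i += 1
        elif arg.startswith("--volume="):
            # Flag and value are joined: --volume=/src:/dst:opts
            spec = arg[len("--volume="):]
        if spec is not None:
            parts = spec.split(":")
            source = parts[0]
            target = parts[1] if len(parts) > 1 else parts[0]
            options = parts[2].split(",") if len(parts) > 2 else []
            mounts.append((source, target, options))
        i += 1
    return mounts


def slave_mounts(command_line):
    """Return the host paths mounted with slave or rslave propagation."""
    return [
        source
        for source, _target, options in volume_mounts(command_line)
        if "rslave" in options or "slave" in options
    ]


# Shortened from the ps output above (illustration only).
SAMPLE = (
    "/usr/bin/docker-current run --name atomic-openshift-node --rm "
    "-v /:/rootfs:ro,rslave -v /var/lib/origin:/var/lib/origin:rslave "
    "-v /etc/origin/node:/etc/origin/node "
    "--volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro "
    "openshift3/node:v3.4.1.12"
)
```

With the sample above, slave_mounts(SAMPLE) lists "/" and "/var/lib/origin", which are the two mounts the installer change is expected to mark rslave. As the rest of this comment notes, that only validates the installer part.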

Comment 9 Jhon Honce 2017-06-14 16:01:19 UTC
Please retest with 7.3.6; the addition of the oci-umount plugin should resolve this issue.  Thank you.

https://github.com/projectatomic/oci-umount

Comment 11 Chao Yang 2017-06-23 08:13:23 UTC
The ebs volume is still in "Released" status on 7.3.6. The version is:
Linux ip-172-18-0-127.ec2.internal 3.10.0-514.25.2.el7.x86_64 #1 SMP Tue May 30 02:42:10 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Comment 12 Jhon Honce 2017-06-27 16:27:09 UTC
Which version of the docker rpm was used in #11?

Comment 13 Praveen Varma 2017-06-28 04:04:48 UTC
@Vivek - We have a situation here with regard to the errata - https://errata.devel.redhat.com/advisory/29143 where the release date is tomorrow (29th June) and the customer has been waiting for this for quite some time. The customer has also escalated this several times, and Mustafa, Sudhir, Satish and a lot of others from senior management are directly involved in getting the issues taken care of for the customer. I just received an update from Xiaoli Tan that if these bugs are fixed today, we could still have the timely release tomorrow.

Thanks,
Praveen
Escalation Manager

Comment 14 Liang Xia 2017-06-28 06:09:18 UTC
# oc version
oc v3.4.1.44
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-9-101.ec2.internal
openshift v3.4.1.44
kubernetes v1.4.0+776c994

===============================================================================

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

===============================================================================

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

===============================================================================

# uname -a
Linux ip-172-18-9-101.ec2.internal 3.10.0-514.26.1.el7.x86_64 #1 SMP Tue Jun 20 01:16:02 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

===============================================================================

# oc describe pv pvc-90480aca-5bc5-11e7-8e36-0e21840058f8
Name:		pvc-90480aca-5bc5-11e7-8e36-0e21840058f8
Labels:		failure-domain.beta.kubernetes.io/region=us-east-1
		failure-domain.beta.kubernetes.io/zone=us-east-1d
StorageClass:	
Status:		Failed
Claim:		default/pvc1
Reclaim Policy:	Delete
Access Modes:	RWO
Capacity:	1Gi
Message:	Delete of volume "pvc-90480aca-5bc5-11e7-8e36-0e21840058f8" failed: error deleting EBS volumes: VolumeInUse: Volume vol-0010b9fca128a96ce is currently attached to i-0df6b59f16561f59b
		status code: 400, request id: 
Source:
    Type:	AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:	aws://us-east-1d/vol-0010b9fca128a96ce
    FSType:	ext4
    Partition:	0
    ReadOnly:	false
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  6m		6m		1	{persistentvolume-controller }			Warning		VolumeFailedDelete	Delete of volume "pvc-90480aca-5bc5-11e7-8e36-0e21840058f8" failed: error deleting EBS volumes: VolumeInUse: Volume vol-0010b9fca128a96ce is currently attached to i-0df6b59f16561f59b
		status code: 400, request id: 

===============================================================================
On the node (after deleting the pvc/pod):
# df | grep aws ; mount | grep aws
/dev/xvdbb               999320    2564    927944   1% /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0010b9fca128a96ce
/dev/xvdbb on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1d/vol-0010b9fca128a96ce type ext4 (rw,relatime,seclabel,data=ordered)

==============================================================================
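The two symptoms captured in this comment (the VolumeInUse message on the PV and the mount still present on the node) can be pulled out of the command output mechanically. The sketch below is only an illustration under that assumption: the function names are hypothetical, the regular expression is fitted to the message text shown above, and mount points containing spaces are not handled.

```python
import re

# Fitted to the message text in the `oc describe pv` output above.
VOLUME_IN_USE = re.compile(
    r"VolumeInUse: Volume (vol-[0-9a-f]+) is currently attached to (i-[0-9a-f]+)"
)


def parse_volume_in_use(message):
    """Return (volume_id, instance_id), or None if the message does not match."""
    match = VOLUME_IN_USE.search(message)
    return match.groups() if match else None


def lingering_ebs_mounts(mount_output):
    """Return (device, mount_point) for aws-ebs plugin mounts in `mount` output."""
    found = []
    for line in mount_output.splitlines():
        fields = line.split()
        # `mount` prints: <device> on <mount point> type <fstype> (<options>)
        if (
            len(fields) >= 5
            and fields[1] == "on"
            and fields[3] == "type"
            and "/kubernetes.io/aws-ebs/mounts/" in fields[2]
        ):
            found.append((fields[0], fields[2]))
    return found
```

A non-empty result from lingering_ebs_mounts after the pod and pvc are deleted corresponds to the failing state shown in this comment; an empty result would be expected once the volume has been unmounted.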

Comment 15 Liang Xia 2017-06-28 06:15:15 UTC
Steps used in comment #14:

1. Set up a CONTAINERISED cluster.
2. Create a pvc using the file below, with the name changed to pvc1.
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
3. Create a pod using the file below, with claimName changed to pvc1.
https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pod.yaml
4. After the pod is running, restart the atomic-openshift-node service.
systemctl restart atomic-openshift-node
5. Delete the pod and the pvc.
6. Check the pv status.
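The steps above can be written down as an ordered command plan, which makes the sequence easier to repeat across docker versions. This is a sketch only: the local file names and the pod name are hypothetical placeholders (the thread does not give them), the plan is only built and printed, not executed, and waiting for the pod to reach Running before the restart is left to the operator.

```python
import shlex


def reproduction_commands(pvc_file, pod_file, pod_name, pvc_name="pvc1"):
    """Build the command sequence for steps 2 to 6 as argv lists."""
    return [
        ["oc", "create", "-f", pvc_file],                   # step 2
        ["oc", "create", "-f", pod_file],                   # step 3
        # Step 4: run only after the pod is reported as Running.
        ["systemctl", "restart", "atomic-openshift-node"],
        ["oc", "delete", "pod", pod_name],                  # step 5
        ["oc", "delete", "pvc", pvc_name],                  # step 5
        ["oc", "get", "pv"],                                # step 6
    ]


def format_plan(commands):
    """Render the plan as shell-quoted lines for review before running."""
    return "\n".join(" ".join(shlex.quote(a) for a in cmd) for cmd in commands)


# Hypothetical names: local copies of the test files and the pod they define.
PLAN = reproduction_commands("pvc.yaml", "pod.yaml", "mypod")
```

Printing format_plan(PLAN) gives one command per line. The systemctl line has to run on the node that hosts the pod, while the oc commands can run from any client logged in to the cluster.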

Comment 17 Vivek Goyal 2017-06-28 12:30:45 UTC
(In reply to Liang Xia from comment #14)
>  Package version: docker-1.12.6-28.git1398f24.el7.x86_64

This docker version did not have the oci-umount plugin. So if oci-umount was supposed to solve this issue, then the bug was tested with the wrong docker version.

The latest version seems to be docker-2:1.12.6-32.git88a4867. Try with that and see whether the problem is fixed.
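The version question raised here can be checked mechanically against the "Package version" line of `docker version`. The sketch below assumes, based only on this thread, that 1.12.6-32 is the first build to retest with. It is a simplified comparison that ignores the epoch and the git/dist suffix, not a full RPM version compare, and the names are hypothetical.

```python
import re

# Matches e.g. docker-1.12.6-28.git1398f24.el7.x86_64 and
# docker-2:1.12.6-32.git88a4867 (optional epoch before the version).
PACKAGE = re.compile(r"^docker-(?:(\d+):)?(\d+(?:\.\d+)*)-(\d+)\.")

# Assumption from this thread: the build suggested for the retest.
RETEST_BASELINE = ((1, 12, 6), 32)


def parse_docker_package(package):
    """Return ((major, minor, patch, ...), release) for a docker package name."""
    match = PACKAGE.match(package)
    if match is None:
        raise ValueError("unrecognised docker package name: %r" % package)
    _epoch, version, release = match.groups()
    return tuple(int(part) for part in version.split(".")), int(release)


def meets_retest_baseline(package):
    """True if the package is at or above the build suggested for the retest."""
    return parse_docker_package(package) >= RETEST_BASELINE
```

Applied to the package from comment 14 (docker-1.12.6-28.git1398f24.el7.x86_64) this returns False, and for docker-2:1.12.6-32.git88a4867 it returns True.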

Comment 18 Vivek Goyal 2017-06-28 12:35:45 UTC
(In reply to Praveen Varma from comment #13)
> @Vivek- We have a situation here with regards to the errata -
> https://errata.devel.redhat.com/advisory/29143 where the release date is
> tomorrow (29th June) and the customer is looking for this for quite some
> time. Customer also escalated this several times and Mustafa, Sudhir, Satish
> and a lot of others from the senior management is directly involved to get
> the issues taken care for the customer. Just received an update from Xiaoli
> Tan that if these bugs are fixed today, we could still have the timely
> release tomorrow.

Praveen, this bug has just been assigned to me and I don't even know whether this is a docker issue.

Anyway, I implemented the oci-umount plugin and it has been suggested that it might fix this issue. QE tested this with a docker version which did not have the oci-umount plugin. So this issue needs to be retested with docker version docker-2:1.12.6-32.git88a4867 to see whether it is fixed.

This seems to be an openshift errata. I am not sure which docker version is being used with this build, so I can't say whether this errata will fix the issue.

IMO, let QE first test the latest docker version and see whether that fixes the issue. If it does, then we need to figure out how this docker version makes it into openshift (if that is not already happening).

I think somebody from the openshift team needs to provide details on what docker version will be used with this errata release.

Comment 19 Scott Dodson 2017-06-28 15:28:54 UTC
Per IRC discussion, we're looking for QE to verify whether or not the problem goes away in docker-1.12.6-32, marking ON_QA

Comment 20 Chao Yang 2017-06-29 03:26:34 UTC
This passed on the version below:
The PV is deleted after I delete the pod and pvc.

[root@ip-172-18-1-168 ~]# oc version
oc v3.4.1.44
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-1-168.ec2.internal
openshift v3.4.1.44
kubernetes v1.4.0+776c994

[root@ip-172-18-1-168 ~]# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-32.git88a4867.el7.x86_64
 Go version:      go1.7.6
 Git commit:      88a4867/1.12.6
 Built:           Mon Jun 19 17:26:57 2017
 OS/Arch:         linux/amd64

Comment 22 errata-xmlrpc 2017-07-11 10:47:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1640

