Bug 1463844 - EBS Volume Can't Be Reused After Node Shutdown
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Hemant Kumar
QA Contact: chaoyang
Reported: 2017-06-21 17:27 EDT by Steven Walter
Modified: 2018-03-28 10:05 EDT
CC: 2 users

Last Closed: 2018-03-28 10:05:01 EDT
Type: Bug


External Trackers:
Red Hat Product Errata RHBA-2018:0489 (Last Updated: 2018-03-28 10:05 EDT)

Description Steven Walter 2017-06-21 17:27:05 EDT
Description of problem:

Created a pod in a project; the pod claims a PVC backed by an EBS volume, and everything works.

The node where the pod was running with the EBS volume attached was stopped for testing; the pod tries to start on another node in the cluster, but the EBS volume was not detached from the stopped node, so the pod will not start.
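For context, a rough way to see the stuck state being described, assuming the AWS CLI is configured against the same account (the volume ID below is a hypothetical placeholder):

oc get pods -o wide
# the replacement pod typically sits in ContainerCreating on the new node,
# waiting for the volume to attach
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[0].Attachments[].{Instance:InstanceId,State:State}'
# the attachment still points at the stopped instance, with State "attached"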

Version-Release number of selected component (if applicable):
3.5

How reproducible:
Unconfirmed

Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1427807, but none of the associated log messages appear in this environment, so it seems like a potentially different issue. Uploading logs and other data shortly.
Comment 3 Hemant Kumar 2017-06-22 14:00:51 EDT
Stopping or shutting down an instance does not detach EBS volumes from the node, so the user will have to terminate the node to use the same EBS volumes on other nodes.

Also, merely stopping the atomic-openshift-node process does not automatically kill the containers it started.

So IMO this doesn't look like a bug. If the customer wants to test HA, they will have to terminate the node.
Comment 4 Hemant Kumar 2017-06-22 14:03:28 EDT
So to clarify, there are only 3 ways to detach EBS volumes from a node (example commands follow the list):

1. Manually drain the node, which will migrate all running pods from it. This will also detach the EBS volumes.
2. Delete/terminate the pod using the EBS volume. This will automatically detach the associated volume.
3. Terminate the node altogether from the AWS console. This will detach all EBS volumes attached to it, and those volumes can then be used elsewhere.
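Roughly, those three options map to commands along these lines (node, pod, and instance names are hypothetical placeholders):

# 1. Drain the node; running pods are migrated and their EBS volumes detached
oc adm drain <node-name> --ignore-daemonsets

# 2. Delete the pod that is using the EBS-backed PVC; the volume is detached automatically
oc delete pod <pod-name>

# 3. Terminate the EC2 instance (CLI equivalent of the console action); all attached EBS volumes are released
aws ec2 terminate-instances --instance-ids <instance-id>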
Comment 5 Bradley Childs 2017-06-22 14:25:54 EDT
As Hemant says, the reported behavior is 'working as designed' (for now). The proper recovery in these scenarios was unclear, and we intentionally left it to the admin to perform recovery on the storage after node failure.

However, if the scheduler is willing to re-schedule the pod, I think we are obligated to automate corrective steps for the storage. I've created a Trello card here:

https://trello.com/c/xgyqYpYa/522-volumes-are-not-detached-if-a-node-goes-down

As this is 'functioning as intended' and there are workarounds, I'm going to set the priority to 'low' so it doesn't block releases, and I will schedule the RFE with PM.
Comment 6 Steven Walter 2017-06-22 14:34:51 EDT
OK, that makes sense to me. I have informed the customer as well; if it's OK, can we keep this bug open to track this as an RFE instead?
Comment 7 Hemant Kumar 2017-11-16 10:38:45 EST
I am working on fixing this.
Comment 8 Hemant Kumar 2017-11-16 11:19:49 EST
upstream PR - https://github.com/kubernetes/kubernetes/pull/55893
Comment 9 Hemant Kumar 2017-12-04 20:41:25 EST
Opened a PR for backporting to OpenShift 3.8:

https://github.com/openshift/origin/pull/17544
Comment 10 Hemant Kumar 2018-01-18 17:56:19 EST
Fix merged in 3.9.
Comment 12 chaoyang 2018-01-23 02:26:56 EST
This failed on:
[root@ip-172-18-8-172 ~]# oc version
oc v3.9.0-0.22.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-8-172.ec2.internal:8443
openshift v3.9.0-0.22.0
kubernetes v1.9.1+a0ce1bc657

Steps are as below:
1. oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
2. oc create -f https://raw.githubusercontent.com/chao007/v3-testfiles/master/persistent-volumes/ebs/dynamic-provisioning-pvc.json
3. oc volume dc/ruby-ex --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc1
4. After the pod is running, shut down the node the pod is scheduled to.
5. The EBS volume is detached from the shut-down node and attached to another node.

[root@ip-172-18-8-172 ~]# oc get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
ruby-ex-2-nj46n   1/1       Running   0          33m       10.129.0.11   ip-172-18-8-172.ec2.internal
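One way to confirm the volume itself moved, assuming the AWS CLI is available (the PV name and volume ID are placeholders to fill in from the actual cluster):

oc get pvc ebsc1                                   # find the bound PV
oc get pv <pv-name> -o yaml | grep volumeID        # underlying EBS volume ID (spec.awsElasticBlockStore.volumeID)
aws ec2 describe-volumes --volume-ids <volume-id> \
    --query 'Volumes[0].Attachments[].InstanceId'  # should now report the new node's instance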
Comment 15 errata-xmlrpc 2018-03-28 10:05:01 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
