Bug 1463844
| Summary: | EBS Volume Can't Be Reused After Node Shutdown | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
| Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.2.1 | CC: | aos-bugs, bchilds |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:05:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Steven Walter
2017-06-21 21:27:05 UTC
Stopping or shutting down an instance does not detach EBS volumes from a node, so the user will have to terminate the node to use the same EBS volumes on other nodes. Also, merely stopping the atomic-openshift-node process does not automatically kill the containers it started. So IMO this doesn't look like a bug. If the customer wants to test HA, they will have to terminate the node.

To clarify, there are only three ways to detach EBS volumes from a node (a command-level sketch of these workarounds is included at the end of this report):

1. Manually drain the node, which will migrate all running pods off it. This will also detach the EBS volumes.
2. Delete/terminate the pod using the EBS volume. This will automatically detach the associated volume.
3. Terminate the node altogether from the AWS console. This will detach all EBS volumes attached to it, and those volumes can be used elsewhere.

As Hemant says, the reported behavior is 'working as designed' (for now). The proper recovery in these scenarios was unclear, and we intentionally left it to the admin to perform recovery on the storage after node failure. However, if the scheduler is willing to re-schedule the pod, I think we are obligated to automate corrective steps for the storage. I've created a Trello card here: https://trello.com/c/xgyqYpYa/522-volumes-are-not-detached-if-a-node-goes-down

As this is 'functioning as intended' and there are workarounds, I'm going to set the priority to 'low' so it doesn't block releases, and will schedule the RFE with PM.

Ok, that makes sense to me. I have informed the customer as well; if it's ok, we can keep this bug open to track that as an RFE instead?

I am working on fixing this. Upstream PR: https://github.com/kubernetes/kubernetes/pull/55893

Opened PR for backporting to OpenShift 3.8: https://github.com/openshift/origin/pull/17544

Fix merged in 3.9.

This is verified on:

    [root@ip-172-18-8-172 ~]# oc version
    oc v3.9.0-0.22.0
    kubernetes v1.9.1+a0ce1bc657
    features: Basic-Auth GSSAPI Kerberos SPNEGO
    Server https://ip-172-18-8-172.ec2.internal:8443
    openshift v3.9.0-0.22.0
    kubernetes v1.9.1+a0ce1bc657

Steps are as below:

1. oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
2. oc create -f https://raw.githubusercontent.com/chao007/v3-testfiles/master/persistent-volumes/ebs/dynamic-provisioning-pvc.json
3. oc volume dc/ruby-ex --add --type=persistentVolumeClaim --mount-path=/opt1 --name=v1 --claim-name=ebsc1
4. After the pod is running, shut down the node the pod is scheduled to.
5. The EBS volume is detached from the shut-down node and attached to another node (a sketch for confirming the new attachment is also included at the end of this report).

    [root@ip-172-18-8-172 ~]# oc get pods -o wide
    NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
    ruby-ex-2-nj46n   1/1       Running   0          33m       10.129.0.11   ip-172-18-8-172.ec2.internal

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
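For reference, here is a minimal command-level sketch of the three workarounds described in the comments above, assuming a 3.9-era `oc` client and the AWS CLI; the node, pod, and instance identifiers are placeholders, not values from this bug.

```sh
# Sketch of the three manual detach paths (all names/IDs are placeholders).

# 1. Drain the node: its pods are evicted and rescheduled, and their EBS
#    volumes are detached along the way.
oc adm drain <node-name> --ignore-daemonsets --delete-local-data

# 2. Delete the pod that mounts the volume; the attach/detach controller
#    then detaches the associated EBS volume.
oc delete pod <pod-name>

# 3. Terminate the instance from AWS; the EBS volumes attached to it are
#    released and can be attached elsewhere.
aws ec2 terminate-instances --instance-ids <instance-id>
```

Draining is generally the least disruptive of the three, since the pods are rescheduled rather than simply taken down.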
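Similarly, a sketch of how the behavior in verification step 5 might be confirmed on the AWS side; the PV name and volume ID are placeholders, and only the PVC name `ebsc1` comes from the reproduction steps above.

```sh
# Confirm that the volume followed the rescheduled pod after the node shutdown.

# The pod should be rescheduled and Running on a surviving node.
oc get pods -o wide

# Find the PV bound to the claim and the AWS volume ID backing it.
oc get pvc ebsc1 -o jsonpath='{.spec.volumeName}'
oc get pv <pv-name> -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'

# The EBS volume should now report an attachment to the new node's instance.
aws ec2 describe-volumes --volume-ids <volume-id> \
  --query 'Volumes[].Attachments[].{Instance:InstanceId,State:State}'
```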