Description of problem:
Automatic deletion of EBS volumes is failing.

Sep 19 20:46:54 ip-172-31-6-179.ec2.internal atomic-openshift-master-controllers[45543]: I0919 20:46:54.084796 45543 aws_util.go:51] Error deleting EBS Disk volume aws://us-east-1c/vol-1a662bb6: error deleting EBS volumes: VolumeInUse: Volume vol-1a662bb6 is currently attached to i-0e5a5388

This is causing the released volumes to enter Failed state.

[root@dev-preview-int-master-d41bf origin]# oc get pv |grep Failed
pv-aws-cxxkd   5Gi   RWO   Failed   jjj/mongodb                 12h
pv-aws-nj8nc   1Gi   RWO   Failed   jjj/jws-app-mongodb-claim   10h

[root@dev-preview-int-master-d41bf origin]# oc describe pv pv-aws-cxxkd
Name:            pv-aws-cxxkd
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1c
Status:          Failed
Claim:           jjj/mongodb
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        5Gi
Message:         Delete of volume "pv-aws-cxxkd" failed: error deleting EBS volumes: VolumeInUse: Volume vol-5e0f42f2 is currently attached to i-0e5a5388
                 status code: 400, request id:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1c/vol-5e0f42f2
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
No events.

The node log shows that the disk was unmounted from the instance.

Sep 19 09:26:06 ip-172-31-14-248.ec2.internal atomic-openshift-node[2881]: I0919 09:26:06.982740 2881 reconciler.go:139] UnmountVolume operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-5e0f42f2" (spec.Name: "mongodb-data") from pod "bfa6e3bc-7e40-11e6-9392-0a27f5fffde3" (UID: "bfa6e3bc-7e40-11e6-9392-0a27f5fffde3").
Sep 19 09:26:07 ip-172-31-14-248.ec2.internal atomic-openshift-node[2881]: I0919 09:26:06.993698 2881 operation_executor.go:812] UnmountVolume.TearDown succeeded for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-5e0f42f2" (volume.spec.Name: "mongodb-data") pod "bfa6e3bc-7e40-11e6-9392-0a27f5fffde3" (UID: "bfa6e3bc-7e40-11e6-9392-0a27f5fffde3").
Sep 19 09:26:07 ip-172-31-14-248.ec2.internal atomic-openshift-node[2881]: I0919 09:26:07.084966 2881 reconciler.go:284] UnmountDevice operation started for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-5e0f42f2" (spec.Name: "pv-aws-cxxkd")
Sep 19 09:26:07 ip-172-31-14-248.ec2.internal atomic-openshift-node[2881]: I0919 09:26:07.202760 2881 operation_executor.go:890] UnmountDevice succeeded for volume "kubernetes.io/aws-ebs/aws://us-east-1c/vol-5e0f42f2" (spec.Name: "pv-aws-cxxkd").

But the AWS web console shows the volume is still attached to the instance (state: in-use), so it cannot be deleted.

The controller log shows pages of this message:

Sep 19 16:06:24 ip-172-31-6-179.ec2.internal atomic-openshift-master-controllers[5313]: I0919 16:06:24.392142 5313 controller.go:1079] isVolumeReleased[pv-aws-cxxkd]: volume is released
Sep 19 16:06:54 ip-172-31-6-179.ec2.internal atomic-openshift-master-controllers[5313]: I0919 16:06:54.392306 5313 controller.go:1079] isVolumeReleased[pv-aws-cxxkd]: volume is released
Sep 19 16:07:24 ip-172-31-6-179.ec2.internal atomic-openshift-master-controllers[5313]: I0919 16:07:24.393297 5313 controller.go:1079] isVolumeReleased[pv-aws-cxxkd]: volume is released

Version-Release number of selected component (if applicable):
atomic-openshift-3.3.0.31-1.git.0.aede597.el7.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Add this to node-config.yaml:

   kubeletArguments:
     enable-controller-attach-detach:
     - 'true'

2. Create an app containing a PVC: 'oc new-app cakephp-mysql-example -n <project>'.
3. Delete the app and check the PV status. The PV will be "Failed".

Actual results:
The released PV enters Failed state and the backing EBS volume is never detached or deleted.

Expected results:
The PV and its backing EBS volume are deleted automatically.

Additional info:
A hedged sketch for confirming the attachment state from the AWS side follows.
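For reference, one way to confirm from the AWS side that a volume is still attached (a sketch, assuming the AWS CLI is configured for the cluster's account; vol-5e0f42f2 is the volume from the description above):

# Sketch: show the attachment state that the web console reports as "in-use".
aws ec2 describe-volumes --region us-east-1 --volume-ids vol-5e0f42f2 \
  --query 'Volumes[0].{State:State,Attachments:Attachments}'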
Bumping up the severity as this is a blocker for DevPreview upgrade to 3.3. This is currently being investigated by Mark and Stefanie. We will pull in Paul Morie or folks from the storage team once we gather more data.
*** Bug 1377548 has been marked as a duplicate of this bug. ***
I did not reproduce this on OCP.

[root@ip-172-18-13-195 ~]# openshift version
openshift v3.3.0.31
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

1. Create a PVC and pod; after the pod is running, delete the pod and PVC. The PV enters Failed state but is deleted after some time. https://bugzilla.redhat.com/show_bug.cgi?id=1371375
2. Create a PVC and pod; after the pod is running, delete the PVC. The PV becomes Failed and the EBS volume stays "in-use", but after deleting the pod, the PV is deleted and the EBS volume disappears from the web console.
With:
atomic-openshift-node-3.3.0.31-1.git.0.aede597.el7.x86_64
atomic-openshift-master-3.3.0.31-1.git.0.aede597.el7.x86_64

and:
enable-controller-attach-detach:
- 'true'

I wasn't able to reproduce this. I followed the same steps from https://bugzilla.redhat.com/show_bug.cgi?id=1377548:

1. oc new-app mongodb-persistent
2. oc edit pvc mongodb, adding an annotation to make it auto-provision a 5Gi EBS volume (a non-interactive equivalent is sketched after these steps).
3. Verify pods are ready:
   NAME              READY   STATUS    RESTARTS   AGE
   mongodb-1-yon7y   1/1     Running   0          3m
4. oc delete pvc --all
5. Immediately the PV became 'Failed'.
6. oc delete project --all
7. oc get pv
   pvc-2b1259cd-7fc6-11e6-9027-0ed4ad3f44b5   5Gi   RWO   Failed   jhou/mongodb   5m
8. Waited a minute and got the PV again; it was already deleted. The volume was also deleted, which I could validate from the EC2 console.
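The annotation edit in step 2 can also be applied non-interactively; a sketch, reusing the storage-class annotation that appears later in this bug ("foo" is a placeholder class name for whatever the provisioner expects):

# Equivalent to the 'oc edit pvc mongodb' step above:
oc patch pvc mongodb -p '{"metadata":{"annotations":{"volume.alpha.kubernetes.io/storage-class":"foo"}}}'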
When you delete a pod and PVC at the same time, it takes some time for the pod to detach its volumes, and during this period the volume cannot be deleted (hence the error). You may need to wait a few minutes for the volumes to get detached; then they should be deleted automatically. There is a "fix" in https://github.com/kubernetes/kubernetes/pull/31978, which does not mark the volumes as Failed and leaves them Released instead, making the process less confusing - it's a normal operation, not an error. So, be patient and let Kubernetes detach everything; the volume *should* be deleted eventually. Let me know if the volume is still there even after ~5 minutes and provide full master logs, preferably at log level 5.
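For anyone watching a volume go through this cycle, a minimal sketch of a wait loop (pv-aws-cxxkd is the PV from this report; the 5-minute timeout is arbitrary):

# Poll every 10 seconds for up to ~5 minutes; the PV should disappear once
# the detach completes and the periodic delete retry succeeds.
for i in {1..30}; do
  oc get pv pv-aws-cxxkd >/dev/null 2>&1 || { echo "PV deleted"; break; }
  sleep 10
done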
I just did another batch of 20 PV-backed app creates and then deleted all 20 of the projects. This time I saw the controller detach and clean up all the volumes as it should, so it's no longer an urgent issue. I think the behavior I was seeing previously had to do with bz #1377483, where the controllers were offline intermittently and therefore unable to perform the detach operation.

It seems that if the controllers are offline, we miss our chance to get the volumes detached, because the original Failed PVs are still present after more than a day:

[root@dev-preview-int-master-968f8 ~]# oc get pv |grep Failed
pv-aws-4ln9z   1Gi   RWO   Failed   dakinitest60/mysql          1d
pv-aws-9kgnp   1Gi   RWO   Failed   ttt/mongodb                 1d
pv-aws-cxxkd   5Gi   RWO   Failed   jjj/mongodb                 2d
pv-aws-nj8nc   1Gi   RWO   Failed   jjj/jws-app-mongodb-claim   2d

'oc describe' shows a similar message for each of them:

Message: Delete of volume "pv-aws-4ln9z" failed: error deleting EBS volumes: VolumeInUse: Volume vol-4b8ac4e7 is currently attached to i-095a538f

And the controllers attempt to delete the volume, but no longer attempt to detach it:

Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.468300 86332 controller.go:995] deleteVolumeOperation [pv-aws-4ln9z] started
Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.472444 86332 controller.go:1079] isVolumeReleased[pv-aws-4ln9z]: volume is released
Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.472477 86332 controller.go:1086] doDeleteVolume [pv-aws-4ln9z]
Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.615384 86332 controller.go:1017] deletion of volume "pv-aws-4ln9z" failed: Delete of volume "pv-aws-4ln9z" failed: error deleting EBS volumes: VolumeInUse: Volume vol-4b8ac4e7 is currently attached to i-095a538f
Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.615398 86332 controller.go:626] updating updateVolumePhaseWithEvent[pv-aws-4ln9z]: set phase Failed
Sep 21 02:16:28 ip-172-31-6-178.ec2.internal atomic-openshift-master-controllers[86332]: I0921 02:16:28.615415 86332 controller.go:629] updating updateVolumePhaseWithEvent[pv-aws-4ln9z]: phase Failed already set
A newly started controller manager should catch up and continue where the crashed one stopped. It may take some time to gather all the necessary information, but it should detach and delete everything eventually.
Anyway, I'm closing this bug; please reopen it if you're able to reproduce it again (and please gather logs!).
This issue has come up again after installing OCP 3.3.0.33 in prod. Again, we're having a problem where the controllers are intermittently inaccessible. However, despite the controllers going offline, I think there should be some kind of retry logic that will attempt to detach the EBS volumes repeatedly when they get stuck like this. The problem seems to be that if a volume fails to detach initially, the detach operation is never retried. I currently have 54 volumes in prod that are in a Failed state because they're not being detached from their instances. Please let me know what logs would be useful in debugging this.

[root@preview-master-afbb8 ~]# oc get pv |grep Fail
pv-aws-0cr3v   1Gi   RWO   Failed   teste/postgresql                   60d
pv-aws-10ri1   1Gi   RWO   Failed   drivenow/mysql                     29d
pv-aws-15v1s   1Gi   RWO   Failed   development/mysql                  15d
pv-aws-1vbtv   1Gi   RWO   Failed   vdinh-prod-1/eap-app-mysql-claim   4d
pv-aws-34ekb   1Gi   RWO   Failed   mmj2/postgresql                    17d
pv-aws-3rbml   1Gi   RWO   Failed   theblocks-example/mongodb          16d
pv-aws-3xj0z   1Gi   RWO   Failed   naveentest/jenkins                 60d
pv-aws-5n59j   1Gi   RWO   Failed   openshiftnodemongo/mongodb         27d
pv-aws-6rc24   1Gi   RWO   Failed   dbtudo/postgresql                  12d
pv-aws-71jl0   1Gi   RWO   Failed   mydb/postgresql                    29d
pv-aws-83jhj   1Gi   RWO   Failed   my-cloud/pg                        16d
pv-aws-94cu5   1Gi   RWO   Failed   ganadomas/eap-app-mysql-claim      7d
pv-aws-9fwva   1Gi   RWO   Failed   jbibrest/mysql                     101d
pv-aws-b58b1   1Gi   RWO   Failed   mycloud/mysql                      21d
pv-aws-ba3dt   1Gi   RWO   Failed   azizbw/postgresql                  14d
pv-aws-c24x0   1Gi   RWO   Failed   letpressknow/jenkins               11d
pv-aws-cgsxq   1Gi   RWO   Failed   my-project-my/mongodb              61d
pv-aws-etodp   1Gi   RWO   Failed   shift-test/mongodb                 16d
pv-aws-fe49y   1Gi   RWO   Failed   mineduc/eap-app-mysql-claim        5d
pv-aws-g1hed   1Gi   RWO   Failed   yledemo/postgresql                 30d
pv-aws-hsend   1Gi   RWO   Failed   testesomething/jenkins             31d
pv-aws-i5b85   1Gi   RWO   Failed   myflights/mongodb                  98d
pv-aws-j7xqv   1Gi   RWO   Failed   dakinitest2/mysql                  23h
pv-aws-jrha9   1Gi   RWO   Failed   ivinnikov-example/mongodb          26d
pv-aws-k1del   1Gi   RWO   Failed   supertiendas-turpial/mysql         9d
pv-aws-krqm9   1Gi   RWO   Failed   cfg-invoice/mysql                  61d
pv-aws-lbpzv   1Gi   RWO   Failed   zenifyxbot/mongodb                 16d
pv-aws-lhd73   1Gi   RWO   Failed   powelldean-m1/database             68d
pv-aws-lhl1n   1Gi   RWO   Failed   for-sale/mongodb                   30d
pv-aws-m1tbs   1Gi   RWO   Failed   dockertest/mongodb                 31d
pv-aws-m840v   1Gi   RWO   Failed   shirleycharlin-demo/mysql          20d
pv-aws-ma0nx   1Gi   RWO   Failed   fixi/mysql                         65d
pv-aws-nccjm   1Gi   RWO   Failed   hello-openshift/jenkins            11d
pv-aws-ojz2r   1Gi   RWO   Failed   worldportfolio/worldportfolio      15d
pv-aws-pjt2e   1Gi   RWO   Failed   eigh-tk/jenkins                    32d
pv-aws-q1tcw   1Gi   RWO   Failed   sasskinn12/mysql                   25d
pv-aws-qbn0m   1Gi   RWO   Failed   tiendafruta/mysql                  31d
pv-aws-qxf3m   1Gi   RWO   Failed   epiclegends-musicbot/mysql         29d
pv-aws-qxyvx   1Gi   RWO   Failed   fclm-test-newgen/mongodb           2h
pv-aws-rfset   1Gi   RWO   Failed   rates/ratesjobs-mongodb-claim      37d
pv-aws-s1p58   1Gi   RWO   Failed   malodmor/jenkins                   30d
pv-aws-s4d3s   1Gi   RWO   Failed   wemaspace/postgresql               14d
pv-aws-sahyb   1Gi   RWO   Failed   project1wordpress/mysql            29d
pv-aws-siatf   1Gi   RWO   Failed   jw-001/mongodb                     30d
pv-aws-sqxpi   1Gi   RWO   Failed   pjm-test/mariadb                   28d
pv-aws-t2u8l   1Gi   RWO   Failed   staging/mariadb                    28d
pv-aws-tegp0   1Gi   RWO   Failed   restaurante/mysql                  29d
pv-aws-v80fv   1Gi   RWO   Failed   examen/mysql                       62d
pv-aws-wbje7   1Gi   RWO   Failed   job-app-db/mongodb                 14d
pv-aws-wxdw6   1Gi   RWO   Failed   masco/mysql                        29d
pv-aws-x5sd5   1Gi   RWO   Failed   djsispan/postgresql                31d
pv-aws-xk5n0   1Gi   RWO   Failed   nights-out/mysql                   34d
pv-aws-yr6kd   1Gi   RWO   Failed   rockettest/mongodb                 70d
pv-aws-zkc9e   1Gi   RWO   Failed   zhangqi-test/jenkins               33d

[root@preview-master-afbb8 ~]# oc get pv |grep Failed -c
54
Created attachment 1207376 [details]
logs of the failed PVs' state
The detach operation should be retried periodically when the volume is not needed. "Not needed" is the important part - please make sure that all pods that used the volume are terminated. We need controller-manager and kubelet logs (i.e. openshift master + openshift node), with as high a log level as possible. Several GB of logs is better than missing an important message.
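One way to gather those logs (a sketch, assuming the systemd unit names seen elsewhere in this bug; the sysconfig --loglevel knob is my assumption about this setup, adjust to yours):

# On each master:
journalctl -u atomic-openshift-master-controllers > master-controllers.log
# On each node:
journalctl -u atomic-openshift-node > node.log
# To raise verbosity, set e.g. OPTIONS='--loglevel=5' in
# /etc/sysconfig/atomic-openshift-node (and the master equivalent),
# then restart the services before reproducing.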
I have a list of Failed PVs from yesterday and another from today. I took 5 random PVs that appear on both lists to use for testing. These 5 PVs no longer have any associated projects, so there is certainly no PVC or pod remaining. Our PV provisioner pod is marking them Released after some time, and then they become Failed again later, so they're constantly cycling between Failed and Released.

[root@preview-master-afbb8 ~]# date; oc get pv pv-aws-zkc9e pv-aws-yr6kd pv-aws-xk5n0 pv-aws-x5sd5 pv-aws-wbje7
Wed Oct 5 15:26:48 UTC 2016
NAME           CAPACITY   ACCESSMODES   STATUS   CLAIM                  REASON   AGE
pv-aws-zkc9e   1Gi        RWO           Failed   zhangqi-test/jenkins            34d
pv-aws-yr6kd   1Gi        RWO           Failed   rockettest/mongodb              70d
pv-aws-xk5n0   1Gi        RWO           Failed   nights-out/mysql                35d
pv-aws-x5sd5   1Gi        RWO           Failed   djsispan/postgresql             32d
pv-aws-wbje7   1Gi        RWO           Failed   job-app-db/mongodb              15d

[root@preview-master-afbb8 ~]# date; oc get pv pv-aws-zkc9e pv-aws-yr6kd pv-aws-xk5n0 pv-aws-x5sd5 pv-aws-wbje7
Wed Oct 5 15:27:43 UTC 2016
NAME           CAPACITY   ACCESSMODES   STATUS     CLAIM                  REASON   AGE
pv-aws-zkc9e   1Gi        RWO           Released   zhangqi-test/jenkins            34d
pv-aws-yr6kd   1Gi        RWO           Released   rockettest/mongodb              70d
pv-aws-xk5n0   1Gi        RWO           Released   nights-out/mysql                35d
pv-aws-x5sd5   1Gi        RWO           Released   djsispan/postgresql             32d
pv-aws-wbje7   1Gi        RWO           Released   job-app-db/mongodb              15d

[root@preview-master-afbb8 ~]# oc get project zhangqi-test
Error from server: namespaces "zhangqi-test" not found
[root@preview-master-afbb8 ~]# oc get project rockettest
Error from server: namespaces "rockettest" not found
[root@preview-master-afbb8 ~]# oc get project nights-out
Error from server: namespaces "nights-out" not found
[root@preview-master-afbb8 ~]# oc get project djsispan
Error from server: namespaces "djsispan" not found
[root@preview-master-afbb8 ~]# oc get project job-app-db
Error from server: namespaces "job-app-db" not found

Since the pods don't exist, I don't have a way to look up which node they were running on, so I don't think I can provide node logs.
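For the record, the flapping can be captured continuously with a sampling loop rather than repeated manual checks (a sketch; same PVs as above, sampling interval arbitrary):

# Print a timestamped NAME/STATUS sample every 30 seconds.
while true; do
  date
  oc get pv pv-aws-zkc9e pv-aws-yr6kd pv-aws-xk5n0 pv-aws-x5sd5 pv-aws-wbje7 \
    | awk 'NR>1 {print $1, $4}'
  sleep 30
done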
I just talked with Mark Turansky who said he fixed this issue in the latest PV provisioner pod. I'm going to try deploying that in prod and see if it clears up the issue.
With the latest version of the PV provisioner pod (oso-volume-provisioner:v3.3.0.0-4), the PVs are no longer cycling between Failed and Released. They're now consistently in Failed state, 60 of them.
It's been an hour since the latest PV provisioner pod was deployed, and I still see the 60 PVs in Failed state. The vast majority of the projects that used to contain the PVCs for these PVs are gone, which suggests the controller is failing to detach them. In our API logs I'm seeing about 30 requests per second attempting to delete the volumes and failing due to "VolumeInUse", but I've only seen a handful of DetachVolume attempts in the past 3 hours, all of which succeeded. That means it's not attempting to detach the Failed PVs.

[root@preview-master-afbb8 ~]# oc get pv |grep Failed -c
61

[root@preview-master-afbb8 ~]# for project in $(oc get pv |awk '/Failed/ {print $5}'| cut -d/ -f1); do oc get project $project ; done
Error from server: namespaces "teste" not found
Error from server: namespaces "drivenow" not found
Error from server: namespaces "development" not found
Error from server: namespaces "vdinh-prod-1" not found
Error from server: namespaces "mmj2" not found
Error from server: namespaces "hocctnss-project" not found
Error from server: namespaces "theblocks-example" not found
Error from server: namespaces "naveentest" not found
Error from server: namespaces "openshiftnodemongo" not found
Error from server: namespaces "dbtudo" not found
Error from server: namespaces "mydb" not found
NAME       DISPLAY NAME   STATUS
my-cloud   My Cloud       Active
Error from server: namespaces "jenk" not found
Error from server: namespaces "ganadomas" not found
Error from server: namespaces "jbibrest" not found
Error from server: namespaces "mycloud" not found
Error from server: namespaces "azizbw" not found
NAME             DISPLAY NAME     STATUS
pueblosmagicos   PueblosMagicos   Active
Error from server: namespaces "letpressknow" not found
Error from server: namespaces "my-project-my" not found
Error from server: namespaces "deneme" not found
Error from server: namespaces "shift-test" not found
NAME      DISPLAY NAME   STATUS
mineduc   mineduc        Active
Error from server: namespaces "yledemo" not found
Error from server: namespaces "testesomething" not found
Error from server: namespaces "myflights" not found
Error from server: namespaces "dakinitest2" not found
Error from server: namespaces "ivinnikov-example" not found
Error from server: namespaces "supertiendas-turpial" not found
Error from server: namespaces "cfg-invoice" not found
Error from server: namespaces "zenifyxbot" not found
Error from server: namespaces "powelldean-m1" not found
Error from server: namespaces "for-sale" not found
Error from server: namespaces "dockertest" not found
Error from server: namespaces "shirleycharlin-demo" not found
Error from server: namespaces "fixi" not found
Error from server: namespaces "hello-openshift" not found
NAME             DISPLAY NAME    STATUS
worldportfolio   Weltportfolio   Active
Error from server: namespaces "eigh-tk" not found
Error from server: namespaces "sasskinn12" not found
Error from server: namespaces "samwebapp" not found
Error from server: namespaces "tiendafruta" not found
Error from server: namespaces "epiclegends-musicbot" not found
Error from server: namespaces "fclm-test-newgen" not found
Error from server: namespaces "rates" not found
Error from server: namespaces "malodmor" not found
Error from server: namespaces "wemaspace" not found
Error from server: namespaces "project1wordpress" not found
Error from server: namespaces "jw-001" not found
Error from server: namespaces "pjm-test" not found
Error from server: namespaces "staging" not found
Error from server: namespaces "restaurante" not found
Error from server: namespaces "examen" not found
Error from server: namespaces "jy-project" not found
Error from server: namespaces "job-app-db" not found
Error from server: namespaces "masco" not found
Error from server: namespaces "djsispan" not found
Error from server: namespaces "nights-out" not found
NAME       DISPLAY NAME   STATUS
sistcoop                  Active
Error from server: namespaces "rockettest" not found
Error from server: namespaces "zhangqi-test" not found
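For reference, the delete-vs-detach imbalance described above can be quantified straight from the controller journal (a sketch; it assumes the "VolumeInUse" string appears as in the logs quoted earlier, and that detach attempts log a line containing "DetachVolume" - if not, adjust the patterns):

# Failed delete attempts in the last 3 hours:
journalctl -u atomic-openshift-master-controllers --since "-3h" | grep -c 'VolumeInUse'
# Detach attempts in the same window:
journalctl -u atomic-openshift-master-controllers --since "-3h" | grep -c 'DetachVolume'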
Thanks for the logs! They're big but valuable! It seems that there is something wrong between the attach controller and AWS - after a controller manager crash, it does not seem to restore its list of attached volumes correctly. The master crashes every couple of minutes, so it's possible that it does not even have a chance to restore its state. I'll dig into this. If possible, can you try to find a stable reproducer in the meantime? Without gigabytes of logs and with as few steps as possible? /bin/kill might be helpful.
Ok, I'll try reproducing in STG. This seems like a fairly rare case to hit, since even Prod only gets about 2 failures per day. But I'll see what I can do.
I reproduced something that has very similar symptoms and filed https://github.com/kubernetes/kubernetes/issues/34242. Things should be much better if the master does not crash.
Nice! Thanks for that. I have a bug for the master crashes here: https://bugzilla.redhat.com/show_bug.cgi?id=1381745 and I believe the crashes are happening as a result of all the AWS throttling going on here: https://bugzilla.redhat.com/show_bug.cgi?id=1367229 I was able to significantly reduce the amount of AWS throttling by manually detaching about 80 EBS volumes this week. So that allowed the DeleteVolume calls to succeed instead of hammering the API many times per second. Also, because of that, we're down from 100+ controller crashes per day to about 15. So the impact of this particular bug isn't as serious as it was yesterday. I think it would be ok to reduce the severity of this bug.
I tried just about every kill signal out there and I wasn't able to reproduce the exact state of the controller restarts we see in prod (specifically, this):

Oct 04 20:51:33 ip-172-31-10-25.ec2.internal atomic-openshift-master-controllers[110809]: F1004 20:51:33.797506 110809 start_master.go:568] Controller graceful shutdown requested
Oct 04 20:51:34 ip-172-31-10-25.ec2.internal systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a

So I couldn't trigger the graceful shutdown. But shutting off the controllers seemed to work just as well.

1. From the master, I created a bunch of PV-backed apps.
   for num in {0..20}; do oadm new-project dakinitest${num}; oc new-app cakephp-mysql-example -n dakinitest${num}; sleep 1s; done
2. Made sure they were all running.
   for num in {0..20}; do oc get pods -n dakinitest${num}|grep mysql-1|grep -v deploy; done
3. Shut off the controller service across all 3 masters.
   opssh -i -l root -c dev-preview-stg --v3 -t master "systemctl stop atomic-openshift-controllers"
4. Deleted the projects.
   for num in {0..20}; do oc delete project dakinitest${num}; done
5. Started the controllers back up.
   opssh -i -l root -c dev-preview-stg --v3 -t master "systemctl start atomic-openshift-controllers"
6. Then I looked at which PVs had failed, and why they failed.
   oc get pv |awk '/Failed/ {print $1}'|xargs oc describe pv

Any failures always said "error deleting EBS volumes: VolumeInUse". I found I could get more failures when I restarted the controller process a few times instead of leaving it running the whole time after the projects were deleted.
Thanks for the reproducer! A newly started Kubernetes controller manager does not detach volumes for pods that were deleted while the controller manager was down. Talking to upstream, there is no easy solution, and it will take some time to fix; see the link in comment #19. The best thing to do is to not let the manager crash, and to manually detach+delete volumes when it does.
I added a workaround here for any Ops people who need to detach the volumes manually. https://github.com/openshift/openshift-tools/blob/prod/ansible/playbooks/adhoc/detach_failed_volumes.yml
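For a one-off cleanup without the playbook, the same steps can be done with the AWS CLI (a sketch; the volume and PV names are the stuck examples quoted earlier in this bug, substitute your own):

# Detach the stuck volume, wait until AWS reports it available, then delete it:
aws ec2 detach-volume --volume-id vol-4b8ac4e7
aws ec2 wait volume-available --volume-ids vol-4b8ac4e7
aws ec2 delete-volume --volume-id vol-4b8ac4e7
# Finally remove the Failed PV object, since its backing volume is gone:
oc delete pv pv-aws-4ln9z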
Tomas is working on a fix.
I have tested this on the version below with an HA OCP; all the PVs can be deleted without showing "Failed" status.

# oc version
openshift v3.5.5.9
kubernetes v1.5.2+43a9be4

# for num in {0..18}; do oadm new-project pvtest${num}; oc new-app mysql-persistent -n pvtest${num}; oc patch pvc mysql -p '{"metadata":{"annotations":{"volume.alpha.kubernetes.io/storage-class":"foo"}}}' -n pvtest${num}; sleep 1s; done

# oc get pods --all-namespaces=true
NAMESPACE   NAME            READY   STATUS    RESTARTS   AGE
pvtest0     mysql-1-7mn3c   1/1     Running   0          7m
pvtest1     mysql-1-ghcvw   1/1     Running   0          7m
pvtest10    mysql-1-2dqtj   1/1     Running   0          4m
pvtest11    mysql-1-90dvz   1/1     Running   0          4m
pvtest12    mysql-1-bgldj   1/1     Running   0          3m
pvtest13    mysql-1-h2vch   1/1     Running   0          2m
pvtest14    mysql-1-nm6p7   1/1     Running   0          1m
pvtest15    mysql-1-b9qxb   1/1     Running   0          1m
pvtest16    mysql-1-lwsxn   1/1     Running   0          1m
pvtest17    mysql-1-215ws   1/1     Running   0          1m
pvtest18    mysql-1-dhvw1   1/1     Running   0          1m
pvtest2     mysql-1-5frfp   1/1     Running   0          7m
pvtest3     mysql-1-44wc3   1/1     Running   0          6m
pvtest4     mysql-1-gjcs5   1/1     Running   0          6m
pvtest5     mysql-1-1jwmx   1/1     Running   0          6m
pvtest6     mysql-1-rq3tq   1/1     Running   0          4m
pvtest7     mysql-1-bkxtl   1/1     Running   0          4m
pvtest8     mysql-1-09s44   1/1     Running   0          4m
pvtest9     mysql-1-qlqhq   1/1     Running   0          4m

# systemctl stop atomic-openshift-master-controllers.service   (on each master)

# for num in {0..18}; do oc delete project pvtest${num}; done

# oc get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS     CLAIM            REASON   AGE
pvc-2cfac421-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest6/mysql             12m
pvc-2ea2bf46-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest7/mysql             12m
pvc-30526917-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest8/mysql             12m
pvc-31f67d17-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest9/mysql             12m
pvc-33a1d848-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest10/mysql            12m
pvc-355c78ad-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest11/mysql            12m
pvc-36ffc5f2-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest12/mysql            12m
pvc-7bf096bc-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest13/mysql            10m
pvc-7d9829f8-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest14/mysql            10m
pvc-7f4681a0-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest15/mysql            10m
pvc-80e55303-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest16/mysql            10m
pvc-82a5fe68-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest17/mysql            10m
pvc-84453982-2a48-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest18/mysql            10m
pvc-c6a365a4-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest0/mysql             15m
pvc-c82e4c1f-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest1/mysql             15m
pvc-c9bd786f-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest2/mysql             15m
pvc-cb56b8f0-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest3/mysql             15m
pvc-cd011db3-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest4/mysql             15m
pvc-cea4b0f4-2a47-11e7-b646-0ea5d1077080   1Gi        RWO           Delete          Released   pvtest5/mysql             15m

# systemctl start atomic-openshift-master-controllers.service   (on each master)

After waiting several minutes, all the PVs were deleted without showing "error deleting EBS volumes: VolumeInUse".
I moved this bug to ON_QA assuming this bug was for Online. Moving it back to MODIFIED for someone to chime in with the proper process for the status transition for OCP bugs.
This bug is fixed and I have already verified it in comment 32. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1425