Created attachment 1400102
master log

Description of problem:
When a project is deleted, some dynamically provisioned PVs are shown as having failed to delete, although the underlying EBS volumes have actually been deleted, as confirmed in the AWS web console.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.48.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-168.ec2.internal:8443
openshift v3.9.0-0.48.0
kubernetes v1.9.1+a0ce1bc657

How reproducible:
20%

Steps to Reproduce:
1. Create a new project named 8hdv3.
2. Create 10 dynamic PVCs.
3. Create 10 pods.
4. After the 10 pods are running, restart the atomic-openshift-master-controllers service 3 times.
5. Delete the project.
6. Two of the PVs fail to be deleted:

pvc-6896077e-1878-11e8-ba71-0e63240230b4   1Gi   RWO   Delete   Failed   8hdv3/dynamic-pvc-4                      gp2   18h
pvc-73682406-1841-11e8-ba71-0e63240230b4   1Gi   RWO   Delete   Bound    openshift-ansible-service-broker/etcd   gp2   1d
pvc-b3ac9313-1878-11e8-ba71-0e63240230b4   1Gi   RWO   Delete   Failed   8hdv3/dynamic-pvc-7                      gp2   18h

oc describe pv pvc-b3ac9313-1878-11e8-ba71-0e63240230b4
Name:            pvc-b3ac9313-1878-11e8-ba71-0e63240230b4
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1d
Annotations:     kubernetes.io/createdby=aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller=yes
                 pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
StorageClass:    gp2
Status:          Failed
Claim:           8hdv3/dynamic-pvc-7
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Message:         Error deleting EBS volume "vol-085a251610877ae5d" since volume is currently attached to "i-06e16431113508ecd"
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-085a251610877ae5d
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:          <none>

7. Check the above EBS volume in the AWS web console: the volume does not exist.

Actual results:
Two of the PVs failed to be deleted.

Expected results:
All PVs used by the 10 PVCs should be deleted.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Created attachment 1400103
node log
From the log, it seems that something (the old controller?) deleted the EBS volume but did not delete the associated PV. The new controller then failed to delete the volume because it no longer existed. The cloud provider should be idempotent and report success in this case.

Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: E0223 10:58:00.751176 123899 aws.go:2225] Error describing volume "vol-085a251610877ae5d": "error querying ec2 for volume \"vol-085a251610877ae5d\": \"error listing AWS volumes: \\\"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\\\n\\\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\\\"\""
Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: E0223 10:58:00.751192 123899 aws.go:2211] error querying ec2 for volume "vol-085a251610877ae5d": "error listing AWS volumes: \"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\n\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\""
Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: I0223 10:58:00.751200 123899 aws_util.go:57] Error deleting EBS Disk volume aws://us-east-1d/vol-085a251610877ae5d: error querying ec2 for volume "vol-085a251610877ae5d": "error listing AWS volumes: \"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\n\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\""
Upstream PR: https://github.com/kubernetes/kubernetes/pull/60490
1.10 PR: https://github.com/openshift/origin/pull/18856
1.9 PR: https://github.com/openshift/origin/pull/18878
Hi Jan,

I tested this issue with the steps below:
1. Create a new project named 8hdv3.
2. Create 10 dynamic PVCs.
3. Create 10 pods.
4. After the 10 pods are running, restart the atomic-openshift-master-controllers service 3 times.
5. Delete the project.

I found that the PVs entered Failed status for some seconds and then were deleted. I am not sure whether this is a problem or by design. Could you confirm?

[root@ip-172-18-9-130 ~]# oc describe pv pvc-a4b2509e-58e8-11e8-9cbc-0e566b69876a
Name:            pvc-a4b2509e-58e8-11e8-9cbc-0e566b69876a
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1d
Annotations:     kubernetes.io/createdby=aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller=yes
                 pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    gp2
Status:          Failed
Claim:           2yow9/dynamic-pvc-1
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Error deleting EBS volume "vol-0c692a1ffaa4fa50b" since volume is currently attached to "i-0d92bac96c2736e25"
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-0c692a1ffaa4fa50b
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:
  Type     Reason              Age  From                         Message
  ----     ------              ---- ----                         -------
  Warning  VolumeFailedDelete  1m   persistentvolume-controller  Error deleting EBS volume "vol-0c692a1ffaa4fa50b" since volume is currently attached to "i-0d92bac96c2736e25"
> I found the pv entered into failed status for some seconds and then were deleted.

This is OK and expected, as long as the PV is deleted within a short time (say, 1 minute). It would be a bug if it stayed in Failed status forever.
Verification passed. Thanks to Jan for confirming.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816