Bug 1548628

Summary: PV displayed as failed to delete after deleting a project, although the EBS volume was in fact deleted
Product: OpenShift Container Platform
Reporter: Chao Yang <chaoyang>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: medium
Priority: medium
Version: 3.9.0
CC: aos-bugs, aos-storage-staff, bchilds, jsafrane, wsun
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2018-07-30 19:09:51 UTC
Type: Bug
Attachments:
  Description   Flags
  master log    none
  node log      none

Description Chao Yang 2018-02-24 03:51:28 UTC
Created attachment 1400102
master log

Description of problem:
When a project was deleted, some of its dynamically provisioned PVs were reported as failed to delete, but checking the AWS web console showed that the underlying EBS volumes had in fact been deleted.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.48.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-168.ec2.internal:8443
openshift v3.9.0-0.48.0
kubernetes v1.9.1+a0ce1bc657

How reproducible:
20%

Steps to Reproduce:
1. Create a new project named 8hdv3.
2. Create 10 dynamically provisioned PVCs.
3. Create 10 pods.
4. After all 10 pods are running, restart the atomic-openshift-master-controllers service 3 times.
5. Delete the project.
6. Two of the PVs fail to be deleted:

pvc-6896077e-1878-11e8-ba71-0e63240230b4   1Gi        RWO            Delete           Failed    8hdv3/dynamic-pvc-4                     gp2                      18h
pvc-73682406-1841-11e8-ba71-0e63240230b4   1Gi        RWO            Delete           Bound     openshift-ansible-service-broker/etcd   gp2                      1d
pvc-b3ac9313-1878-11e8-ba71-0e63240230b4   1Gi        RWO            Delete           Failed    8hdv3/dynamic-pvc-7                     gp2                      18h

oc describe pv pvc-b3ac9313-1878-11e8-ba71-0e63240230b4
Name:            pvc-b3ac9313-1878-11e8-ba71-0e63240230b4
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1d
Annotations:     kubernetes.io/createdby=aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller=yes
                 pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
StorageClass:    gp2
Status:          Failed
Claim:           8hdv3/dynamic-pvc-7
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Message:         Error deleting EBS volume "vol-085a251610877ae5d" since volume is currently attached to "i-06e16431113508ecd"
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-085a251610877ae5d
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

7. Check the EBS volume above in the AWS web console: the volume no longer exists.

Actual results:
Two of the PVs failed to be deleted.

Expected results:
All PVs bound to the 10 PVCs should be deleted.

Comment 1 Chao Yang 2018-02-24 03:55:10 UTC
Created attachment 1400103
node log

Comment 2 Jan Safranek 2018-02-26 09:08:19 UTC
From the log it seems that something (an old controller instance?) deleted the EBS volume but did not delete the associated PV. The new controller then failed to delete the volume because it no longer existed.

The cloud provider should be idempotent and report success in this case.

Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: E0223 10:58:00.751176  123899 aws.go:2225] Error describing volume "vol-085a251610877ae5d": "error querying ec2 for volume \"vol-085a251610877ae5d\": \"error listing AWS volumes: \\\"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\\\n\\\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\\\"\""
Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: E0223 10:58:00.751192  123899 aws.go:2211] error querying ec2 for volume "vol-085a251610877ae5d": "error listing AWS volumes: \"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\n\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\""
Feb 23 10:58:00 ip-172-18-14-168 atomic-openshift-master-controllers: I0223 10:58:00.751200  123899 aws_util.go:57] Error deleting EBS Disk volume aws://us-east-1d/vol-085a251610877ae5d: error querying ec2 for volume "vol-085a251610877ae5d": "error listing AWS volumes: \"InvalidVolume.NotFound: The volume 'vol-085a251610877ae5d' does not exist.\\n\\tstatus code: 400, request id: cb327c80-9f10-4462-905b-8c75878fb86c\""
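The idempotency argument above can be illustrated with a minimal Go sketch. It is not the code from the upstream PR: `volumeAPI`, `fakeAPI`, `deleteVolume` and `errVolumeNotFound` are hypothetical names, and a real AWS provider would detect the `InvalidVolume.NotFound` error code returned by EC2 rather than compare against a sentinel error. Treating "not found" as success lets a restarted controller finish deleting a PV whose volume an earlier controller instance already removed, while an attached volume is still refused.

```go
package main

import (
	"errors"
	"fmt"
)

// errVolumeNotFound is a hypothetical stand-in for the EC2
// "InvalidVolume.NotFound" error seen in the controller log.
var errVolumeNotFound = errors.New("InvalidVolume.NotFound")

// volumeAPI is a hypothetical, minimal view of the cloud volume API.
type volumeAPI interface {
	// DescribeVolume returns the instance the volume is attached to ("" if detached).
	DescribeVolume(id string) (attachedTo string, err error)
	DeleteVolume(id string) error
}

// deleteVolume deletes a volume idempotently: a volume that is already gone
// is reported as successfully deleted instead of as an error.
func deleteVolume(api volumeAPI, id string) error {
	attachedTo, err := api.DescribeVolume(id)
	if errors.Is(err, errVolumeNotFound) {
		// Already deleted, e.g. by a previous controller instance.
		return nil
	}
	if err != nil {
		return fmt.Errorf("describing volume %q: %w", id, err)
	}
	if attachedTo != "" {
		return fmt.Errorf("volume %q is currently attached to %q", id, attachedTo)
	}
	// The volume may disappear between Describe and Delete; that is success too.
	if err := api.DeleteVolume(id); err != nil && !errors.Is(err, errVolumeNotFound) {
		return fmt.Errorf("deleting volume %q: %w", id, err)
	}
	return nil
}

// fakeAPI is an in-memory stand-in used only for this example.
type fakeAPI struct {
	volumes map[string]string // volume ID -> attached instance ID ("" if detached)
}

func (f *fakeAPI) DescribeVolume(id string) (string, error) {
	attachedTo, ok := f.volumes[id]
	if !ok {
		return "", errVolumeNotFound
	}
	return attachedTo, nil
}

func (f *fakeAPI) DeleteVolume(id string) error {
	if _, ok := f.volumes[id]; !ok {
		return errVolumeNotFound
	}
	delete(f.volumes, id)
	return nil
}

func main() {
	api := &fakeAPI{volumes: map[string]string{"vol-a": ""}}
	fmt.Println(deleteVolume(api, "vol-a")) // first delete removes the volume
	fmt.Println(deleteVolume(api, "vol-a")) // repeated delete also succeeds
}
```

Both calls print `<nil>`: the second delete finds nothing to remove and reports success, which is the behavior the PV controller needs after a restart.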

Comment 3 Jan Safranek 2018-02-27 10:00:00 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/60490

Comment 4 Jan Safranek 2018-03-06 15:45:01 UTC
1.10 PR: https://github.com/openshift/origin/pull/18856

Comment 5 Jan Safranek 2018-03-07 17:56:58 UTC
1.9 PR: https://github.com/openshift/origin/pull/18878

Comment 7 Chao Yang 2018-05-16 09:28:45 UTC
Hi Jan,

I tested this issue with the following steps:
1. Create a new project named 8hdv3.
2. Create 10 dynamically provisioned PVCs.
3. Create 10 pods.
4. After all 10 pods are running, restart the atomic-openshift-master-controllers service 3 times.
5. Delete the project.

I found that the PVs entered the Failed status for a few seconds and then were deleted.
I am not sure whether this is a bug or by design.
Could you help confirm?


[root@ip-172-18-9-130 ~]# oc describe pv pvc-a4b2509e-58e8-11e8-9cbc-0e566b69876a
Name:            pvc-a4b2509e-58e8-11e8-9cbc-0e566b69876a
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1d
Annotations:     kubernetes.io/createdby=aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller=yes
                 pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    gp2
Status:          Failed
Claim:           2yow9/dynamic-pvc-1
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         Error deleting EBS volume "vol-0c692a1ffaa4fa50b" since volume is currently attached to "i-0d92bac96c2736e25"
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-0c692a1ffaa4fa50b
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:
  Type     Reason              Age   From                         Message
  ----     ------              ----  ----                         -------
  Warning  VolumeFailedDelete  1m    persistentvolume-controller  Error deleting EBS volume "vol-0c692a1ffaa4fa50b" since volume is currently attached to "i-0d92bac96c2736e25"

Comment 8 Jan Safranek 2018-05-16 09:41:00 UTC
> I found the pv entered into failed status for some seconds and the were deleted.


This is OK and expected, as long as the PV is deleted within a short time (say, under 1 minute). It would be a bug if the PV stayed in this state forever.

Comment 9 Chao Yang 2018-05-17 03:43:45 UTC
Verified: the test passed.
Thanks to Jan for confirming.

Comment 11 errata-xmlrpc 2018-07-30 19:09:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816