Bug 1690031

Summary: Error deleting EBS volume "x" since volume is currently attached to "y"
Product: OpenShift Container Platform Reporter: Corey Daley <cdaley>
Component: Storage Assignee: aos-storage-staff <aos-storage-staff>
Storage sub component: Storage QA Contact: Qin Ping <piqin>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: medium CC: aos-bugs, aos-storage-staff, chaoyang, hekumar, hripps, jokerman, jsafrane, lxia, mmccomas, nagrawal, rgudimet, wking
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-04 11:12:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Occurrences of this error in CI from 2019-03-19T12:28Z to 2019-03-21T14:53Z (flags: none)

Description Corey Daley 2019-03-18 15:51:19 UTC
Description of problem:
Error deleting EBS volume "x" since volume is currently attached to "y"


How reproducible:
Sporadic

Actual results:
Failed the ResourceQuota tests

Expected results:
Should pass

Additional info:
The ResourceQuota [sig-scheduling] tests have failed 3 times in the last 4 days:

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5801

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5772

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5740

Comment 1 W. Trevor King 2019-03-21 23:40:34 UTC
Created attachment 1546728 [details]
Occurrences of this error in CI from 2019-03-19T12:28Z to 2019-03-21T14:53Z

Generated with [1]:

$ deck-build-log-plot 'Error deleting EBS volume .* since volume is currently attached'

This error currently appears in 184 of the 816 *-e2e-aws* failures across our whole CI system over the past 48 hours.

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log
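
For reference, a minimal Go sketch of the same kind of search, assuming the job logs have been mirrored locally; the ./build-logs directory layout and build-log.txt file names are assumptions, and this is not the deck-build-log tooling itself:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

func main() {
	// the same pattern passed to deck-build-log-plot above
	pattern := regexp.MustCompile(`Error deleting EBS volume .* since volume is currently attached`)
	root := "./build-logs" // assumed local mirror of the CI build logs

	matched, total := 0, 0
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || filepath.Base(path) != "build-log.txt" {
			return nil
		}
		total++
		if data, readErr := os.ReadFile(path); readErr == nil && pattern.Match(data) {
			matched++
		}
		return nil
	})
	fmt.Printf("%d of %d runs contain the warning\n", matched, total)
}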

Comment 2 Seth Jennings 2019-03-27 21:22:12 UTC
Sending to Storage since they handle EBS attachment and PV management, but I don't think this is the reason for any failure.  I think it is just that the attach/detach controller has not yet detached the EBS volume from the instance.

Comment 3 Matthew Wong 2019-03-27 21:59:54 UTC
The warnings are unrelated and happen to be interspersed with the test failures. Detach and Delete operations are performed asynchronously by two separate controllers, so the message
> Error deleting EBS volume "x" since volume is currently attached to "y"
just means that the PV controller responsible for deleting "x" attempted to do so before the attach-detach controller had successfully detached "x" from "y".

Note also that some of these failing tests don't involve PVs at all, e.g. "ResourceQuota should create a ResourceQuota and capture the life of a pod".

I am not sure what other team would be best equipped to look into these failures.
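
To make the race concrete, here is a minimal Go sketch, not the actual Kubernetes controllers: one goroutine plays the attach-detach controller and detaches the volume after a delay, while a retry loop plays the PV controller and keeps attempting deletion, logging the same kind of message each time it is too early. The instance ID is taken from the logs; everything else is illustrative.

package main

import (
	"fmt"
	"sync"
	"time"
)

// in-memory stand-in for the cloud provider's view of one EBS volume
type volume struct {
	mu         sync.Mutex
	attachedTo string
}

func (v *volume) detach() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.attachedTo = ""
}

func (v *volume) delete() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.attachedTo != "" {
		return fmt.Errorf("volume is currently attached to %q", v.attachedTo)
	}
	return nil
}

func main() {
	v := &volume{attachedTo: "i-0ae5f32d0a0298f99"}

	// "attach-detach controller": detaches the volume after some delay
	go func() {
		time.Sleep(2 * time.Second)
		v.detach()
	}()

	// "PV controller": retries deletion until the detach has completed
	for {
		if err := v.delete(); err != nil {
			fmt.Println("warning: error deleting EBS volume:", err) // the event seen in CI logs
			time.Sleep(500 * time.Millisecond)
			continue
		}
		fmt.Println("volume deleted")
		return
	}
}

The point is that the warning is expected while the two loops converge; only a delete that never succeeds would indicate a real problem.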

Comment 4 Matthew Wong 2019-03-27 22:03:17 UTC
Since the timeout error is not very descriptive, here is what the code says:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L75
"// wait for resource quota status to show the expected used resources value"
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L1513

I don't know which component updates ResourceQuota statuses.
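
For what it's worth, the wait in question is an ordinary poll on the quota's status. Below is a hedged sketch of that shape using client-go; the function name, interval, timeout, and comparison are assumptions, not the upstream e2e helper:

package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForUsed re-reads the ResourceQuota until status.used matches the expected
// values or the timeout expires -- the step whose timeout produces the terse
// e2e failure. Interval and timeout here are illustrative.
func waitForUsed(c kubernetes.Interface, ns, name string, expected corev1.ResourceList) error {
	return wait.Poll(2*time.Second, time.Minute, func() (bool, error) {
		rq, err := c.CoreV1().ResourceQuotas(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return equality.Semantic.DeepEqual(rq.Status.Used, expected), nil
	})
}

func main() {
	fmt.Println("wire waitForUsed up to a real clientset to use it")
}

A timeout here only tells us that status.used never converged, not which controller failed to update it.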

Comment 7 Ben Parees 2019-07-30 00:28:28 UTC
Bump.  This continues to be a recurring failure in our 4.2 release stream.  The failure may be benign, but it makes our error rates noisy and makes it hard to judge whether we have a stable product.

Comment 9 W. Trevor King 2019-07-31 17:03:10 UTC
These failures consume some of our "failed deletion attempt" quota, increasing the chance that AWS throttling causes noticeable issues.  But I don't have any specific runs I can link that demonstrate the connection.

Comment 12 Jan Safranek 2019-12-13 13:59:59 UTC
We've fixed most of the API throttling as part of bug #1698829; it should be much better now. In this bug we focus on the warning event sent by the PV controller:

W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is currently attached to "i-0ae5f32d0a0298f99"

Moved the corresponding event from Warning to Normal, as it is part of normal operation:
https://github.com/kubernetes/kubernetes/pull/86250
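
Roughly, the shape of that change is sketched below (hedged; the event reason string and helper name are illustrative, see the linked PR for the real diff): the message is still recorded, but with corev1.EventTypeNormal instead of corev1.EventTypeWarning, since retrying the delete until the detach finishes is normal operation.

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// reportStillAttached records the "volume is currently attached" message against
// the PV. Before the fix it was a Warning event; after it, a Normal one.
func reportStillAttached(recorder record.EventRecorder, pv *corev1.PersistentVolume, nodeName string) {
	recorder.Event(pv, corev1.EventTypeNormal, "VolumeDelete", // reason string is illustrative
		"Error deleting EBS volume since volume is currently attached to "+nodeName)
}

func main() {}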

Comment 15 Chao Yang 2020-02-05 09:23:28 UTC
Created 100 volumes and did not hit this issue.
Updating the bug status to VERIFIED on:
version   4.4.0-0.nightly-2020-02-04-171905   True        False         171m    Cluster version is 4.4.0-0.nightly-2020-02-04-171905

Comment 17 errata-xmlrpc 2020-05-04 11:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581