Bug 1690031 - Error deleting EBS volume "x" since volume is currently attached to "y" [NEEDINFO]
Summary: Error deleting EBS volume "x" since volume is currently attached to "y"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: Jan Safranek
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-18 15:51 UTC by Corey Daley
Modified: 2020-05-04 11:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:12:48 UTC
Target Upstream Version:
lxia: needinfo? (chaoyang)


Attachments
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T14:53Z (322.59 KB, image/svg+xml)
2019-03-21 23:40 UTC, W. Trevor King


Links
System ID Priority Status Summary Last Updated
Github openshift origin pull 24311 None closed Bug 1690031: UPSTREAM: 86250: AWS: Don't report deletion of attached volume as warning 2020-08-07 09:48:46 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-04 11:13:13 UTC

Description Corey Daley 2019-03-18 15:51:19 UTC
Description of problem:
Error deleting EBS volume "x" since volume is currently attached to "y"


How reproducible:
Sporadic

Actual results:
Failed the ResourceQuota tests

Expected results:
Should pass

Additional info:
Has failed the ResourceQuota [sig-scheduling] tests 3 times in the last 4 days

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5801

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5772

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5740

Comment 1 W. Trevor King 2019-03-21 23:40:34 UTC
Created attachment 1546728
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T14:53Z

Generated with [1]:

$ deck-build-log-plot 'Error deleting EBS volume .* since volume is currently attached'

This error is currently in 184 out of 816 *-e2e-aws* failures across our whole CI system over the past 48 hours.

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log
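For readers without the plotting script, the same pattern can be checked against a saved build log with a few lines of Python. This is only a sketch: `count_matches` and the sample log lines are hypothetical, and it does not reproduce anything deck-build-log-plot does beyond applying the regular expression.

```python
import re

# Same pattern as the one passed to deck-build-log-plot above.
PATTERN = re.compile(
    r'Error deleting EBS volume .* since volume is currently attached'
)


def count_matches(lines):
    """Return how many log lines contain the EBS deletion message."""
    return sum(1 for line in lines if PATTERN.search(line))


# Hypothetical sample lines, for illustration only.
sample = [
    'W persistentvolume/pvc-1 Error deleting EBS volume "vol-a" '
    'since volume is currently attached to "i-b"',
    'I pod/foo Started container',
]
print(count_matches(sample))
```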

Comment 2 Seth Jennings 2019-03-27 21:22:12 UTC
Sending to Storage since they handle EBS attachment and PV management, but I don't think this is the reason for any failure.  I think it is just that the attach/detach controller has not yet removed the EBS volume from the instance.

Comment 3 Matthew Wong 2019-03-27 21:59:54 UTC
The warnings are unrelated and happen to be interspersed with the test failures. Detach and Delete operations are done async by two separate controllers so the message
> Error deleting EBS volume "x" since volume is currently attached to "y"
just means that the pv controller responsible for deleting "x" attempted to do so before the attach-detach controller successfully detached "x" from "y."

Note also that some of these failing tests don't involve PVs, e.g. ResourceQuota should create a ResourceQuota and capture the life of a pod.

I am not sure what other team would be best equipped to look into these failures.
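The ordering described above can be illustrated with a toy model. This is a sketch only: `FakeVolume`, `try_delete`, `detach` and `delete_with_retry` are hypothetical names, not the controllers' code, and the real controllers run concurrently rather than in one loop.

```python
class VolumeInUseError(Exception):
    """Raised when a delete is attempted on a still-attached volume."""


class FakeVolume:
    """Stand-in for an EBS volume; not a real AWS object."""

    def __init__(self, volume_id, attached_to=None):
        self.volume_id = volume_id
        self.attached_to = attached_to
        self.deleted = False


def try_delete(volume):
    """Models the deleting side: fails while the volume is attached."""
    if volume.attached_to is not None:
        raise VolumeInUseError(
            'Error deleting EBS volume "%s" since volume is currently '
            'attached to "%s"' % (volume.volume_id, volume.attached_to)
        )
    volume.deleted = True


def detach(volume):
    """Models the detaching side finishing its work."""
    volume.attached_to = None


def delete_with_retry(volume, detach_after_attempt, max_attempts=5):
    """Retry deletion; the detach lands after the given failed attempt."""
    messages = []
    for attempt in range(1, max_attempts + 1):
        try:
            try_delete(volume)
            return messages
        except VolumeInUseError as err:
            messages.append(str(err))
        if attempt == detach_after_attempt:
            detach(volume)
    return messages


vol = FakeVolume("x", attached_to="y")
for message in delete_with_retry(vol, detach_after_attempt=2):
    print(message)
print("deleted:", vol.deleted)
```

In this model the message is emitted on each attempt made before the detach completes, and the deletion then succeeds on a later retry without any other intervention.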

Comment 4 Matthew Wong 2019-03-27 22:03:17 UTC
Since the timeout error is not very descriptive, here is what the code says:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L75
"// wait for resource quota status to show the expected used resources value"
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L1513

I don't know which component updates resource quota statuses.

Comment 7 Ben Parees 2019-07-30 00:28:28 UTC
bump.  This continues to be a recurring failure in our 4.2 release stream.  The failure may be benign, but it makes our error rates noisy and makes it difficult to understand if we have a stable product.

Comment 9 W. Trevor King 2019-07-31 17:03:10 UTC
These failures consume some of our "failed deletion attempt" quotas, increasing the chance that AWS throttling causes noticeable issues.  But I don't have any specific runs I can link demonstrating that connection.

Comment 12 Jan Safranek 2019-12-13 13:59:59 UTC
We've fixed most of the API throttling as part of bug #1698829, so it should be much better now. In this bug we focus on the warning event sent by the PV controller:

W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is currently attached to "i-0ae5f32d0a0298f99"

Moved corresponding event from Warning to Normal, as it is part of normal operation:
https://github.com/kubernetes/kubernetes/pull/86250
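The effect of that change can be sketched as a small classification step. This is an illustration in Python rather than the Go code from the pull request: `event_type_for_delete_error` is a hypothetical helper, and matching on the message text is an assumption made for the example, not necessarily how the PV controller detects the condition.

```python
import re

# Assumption for this sketch: recognise the case by its message text.
IN_USE = re.compile(r'since volume is currently attached')


def event_type_for_delete_error(message):
    """Return "Normal" for the still-attached case, "Warning" otherwise."""
    return "Normal" if IN_USE.search(message) else "Warning"


attached = (
    'Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is '
    'currently attached to "i-0ae5f32d0a0298f99"'
)
print(event_type_for_delete_error(attached))
print(event_type_for_delete_error("some other deletion failure"))
```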

Comment 15 Chao Yang 2020-02-05 09:23:28 UTC
Created 100 volumes and did not hit this issue.
Updating the bug status to VERIFIED on:
version   4.4.0-0.nightly-2020-02-04-171905   True        False         171m    Cluster version is 4.4.0-0.nightly-2020-02-04-171905

Comment 17 errata-xmlrpc 2020-05-04 11:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

