1690031 – Error deleting EBS volume "x" since volume is currently attached to "y"


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	low
Target Milestone:	---
Target Release:	4.4.0
Assignee:	aos-storage-staff@redhat.com
QA Contact:	Qin Ping
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-03-18 15:51 UTC by Corey Daley
Modified:	2021-05-07 08:46 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-04 11:12:48 UTC
Target Upstream Version:
Embargoed:

Attachments
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T14:53Z (322.59 KB, image/svg+xml) 2019-03-21 23:40 UTC, W. Trevor King	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 24311	0	None	closed	Bug 1690031: UPSTREAM: 86250: AWS: Don't report deletion of attached volume as warning	2020-08-07 09:48:46 UTC
Red Hat Product Errata	RHBA-2020:0581	0	None	None	None	2020-05-04 11:13:13 UTC

Description Corey Daley 2019-03-18 15:51:19 UTC

Description of problem:
Error deleting EBS volume "x" since volume is currently attached to "y"


How reproducible:
Sporadic

Actual results:
Failed the ResourceQuota tests

Expected results:
Should pass

Additional info:
Has failed the ResourceQuota [sig-scheduling] tests 3 times in the last 4 days

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5801

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5772

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5740

Comment 1 W. Trevor King 2019-03-21 23:40:34 UTC

Created attachment 1546728
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T14:53Z

Generated with [1]:

$ deck-build-log-plot 'Error deleting EBS volume .* since volume is currently attached'

This error is currently in 184 out of 816 *-e2e-aws* failures across our whole CI system over the past 48 hours.

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log
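
As a rough illustration of the kind of search described in this comment, the sketch below counts how many build logs contain the error, using the same regular expression. It is not the deck-build-log-plot script itself; the function name `matching_fraction`, the job names, and the sample log lines are made up for the example.

```python
import re

# Same pattern that was passed to deck-build-log-plot above.
PATTERN = re.compile(
    r"Error deleting EBS volume .* since volume is currently attached"
)


def matching_fraction(logs):
    """Count logs containing the error.

    logs: mapping of job name -> build log text (hypothetical input shape).
    Returns (matching logs, total logs, fraction matching).
    """
    matches = sum(1 for text in logs.values() if PATTERN.search(text))
    total = len(logs)
    return matches, total, (matches / total if total else 0.0)


# Made-up sample logs: one with the error, one without.
logs = {
    "job-a": 'W persistentvolume/pvc-1 Error deleting EBS volume "vol-1" '
             'since volume is currently attached to "i-1"',
    "job-b": "timed out waiting for resource quota status",
}
result = matching_fraction(logs)
print(result)

# The ratio reported in this comment, as a percentage.
reported = f"{184 / 816:.1%}"
print(reported)  # 22.5%
```

In other words, roughly 22.5% of the failing *-e2e-aws* jobs in that window contained the message, which says how widespread the message is but not whether it causes the failures.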

Comment 2 Seth Jennings 2019-03-27 21:22:12 UTC

Sending to Storage since they handle EBS attachment and PV management, but I don't think this is the reason for any failure.  I think it is just that the attach/detach controller has not yet removed the EBS volume from the instance.

Comment 3 Matthew Wong 2019-03-27 21:59:54 UTC

The warnings are unrelated and happen to be interspersed with the test failures. Detach and Delete operations are done async by two separate controllers so the message
> Error deleting EBS volume "x" since volume is currently attached to "y"
just means that the pv controller responsible for deleting "x" attempted to do so before the attach-detach controller successfully detached "x" from "y."
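
The ordering described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the Kubernetes controller code: `FakeCloud`, `VolumeInUseError`, and `delete_with_retry` are hypothetical names, and the real controllers run concurrently rather than through a callback.

```python
class VolumeInUseError(Exception):
    """Raised when a delete is attempted on a volume that is still attached."""


class FakeCloud:
    """Hypothetical in-memory stand-in for the EBS API (not the AWS SDK)."""

    def __init__(self):
        # volume ID -> instance ID it is attached to, or None if detached
        self.attachments = {}

    def detach(self, volume):
        self.attachments[volume] = None

    def delete(self, volume):
        instance = self.attachments[volume]
        if instance is not None:
            raise VolumeInUseError(
                f'Error deleting EBS volume "{volume}" since volume '
                f'is currently attached to "{instance}"'
            )
        del self.attachments[volume]


def delete_with_retry(cloud, volume, between_attempts, max_attempts=5):
    """Retry deletion the way a periodically resyncing controller would.

    between_attempts stands in for whatever the other controller does
    while this one waits for its next attempt.
    Returns the transient messages seen before the delete succeeded.
    """
    messages = []
    for _ in range(max_attempts):
        try:
            cloud.delete(volume)
            return messages
        except VolumeInUseError as err:
            messages.append(str(err))
            between_attempts()
    raise TimeoutError(
        f"volume {volume} still attached after {max_attempts} attempts"
    )


cloud = FakeCloud()
cloud.attachments["x"] = "y"
# The detach only happens after the first delete attempt has already failed.
seen = delete_with_retry(cloud, "x", between_attempts=lambda: cloud.detach("x"))
print(seen)
```

In this sketch the first delete attempt produces the message once, the detach then completes, and the second attempt succeeds, so the message marks a transient ordering issue rather than a lasting failure.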

Note also that some of these failing tests don't involve PVs, e.g. ResourceQuota should create a ResourceQuota and capture the life of a pod.

I am not sure what other team would be best equipped to look into these failures.

Comment 4 Matthew Wong 2019-03-27 22:03:17 UTC

Since the timeout error is not very descriptive, here is what the code says:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L75
"// wait for resource quota status to show the expected used resources value"
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L1513

I don't know which component updates resource quota statuses.

Comment 7 Ben Parees 2019-07-30 00:28:28 UTC

bump.  This continues to be a recurring failure in our 4.2 release stream.  The failure may be benign, but it makes our error rates noisy and makes it difficult to understand if we have a stable product.

Comment 9 W. Trevor King 2019-07-31 17:03:10 UTC

These failures consume some of our "failed deletion attempt" quotas, increasing the chance that AWS throttling causes noticeable issues.  But I don't have any specific runs I can link demonstrating that connection.

Comment 12 Jan Safranek 2019-12-13 13:59:59 UTC

We've fixed most of the API throttling as part of bug #1698829, so it should be much better now. In this bug we focus on the warning event sent by the PV controller:

W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is currently attached to "i-0ae5f32d0a0298f99"

Moved corresponding event from Warning to Normal, as it is part of normal operation:
https://github.com/kubernetes/kubernetes/pull/86250
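
The idea of the change can be sketched as follows. This is a simplified illustration, not the code from the pull request: the `Event` dataclass, the `DeletedVolumeInUseError` class, the `event_for_delete_error` function, and the reason strings are assumptions made for the example.

```python
from dataclasses import dataclass


class DeletedVolumeInUseError(Exception):
    """Hypothetical marker for 'delete attempted while still attached'."""


@dataclass
class Event:
    type: str      # "Normal" or "Warning"
    reason: str    # illustrative reason string
    message: str


def event_for_delete_error(err):
    """Pick the event type for a failed volume deletion.

    A volume that is still attached just needs more time, so it gets a
    Normal event; any other error is still reported as a Warning.
    """
    if isinstance(err, DeletedVolumeInUseError):
        return Event("Normal", "VolumeDelete", str(err))
    return Event("Warning", "VolumeFailedDelete", str(err))


in_use = event_for_delete_error(DeletedVolumeInUseError(
    'Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is '
    'currently attached to "i-0ae5f32d0a0298f99"'
))
other = event_for_delete_error(RuntimeError("access denied"))
print(in_use.type, other.type)
```

With a split like this, CI tooling that scans for Warning events no longer flags the expected retry, while genuine deletion failures remain visible.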

Comment 15 Chao Yang 2020-02-05 09:23:28 UTC

Created 100 volumes and did not hit this issue.
Updating the bug status to verified on:
version   4.4.0-0.nightly-2020-02-04-171905   True        False         171m    Cluster version is 4.4.0-0.nightly-2020-02-04-171905

Comment 17 errata-xmlrpc 2020-05-04 11:12:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
