Bug 1690031

Summary: Error deleting EBS volume "x" since volume is currently attached to "y"
Product: OpenShift Container Platform Reporter: Corey Daley <cdaley>
Component: Storage Assignee: aos-storage-staff <aos-storage-staff>
Storage sub component: Storage QA Contact: Qin Ping <piqin>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: medium CC: aos-bugs, aos-storage-staff, chaoyang, hekumar, hripps, jokerman, jsafrane, lxia, mmccomas, nagrawal, rgudimet, wking
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-04 11:12:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Occurrences of this error in CI from 2019-03-19T12:28Z to 2019-03-21T14:53Z (flags: none)

Description Corey Daley 2019-03-18 15:51:19 UTC
Description of problem:
Error deleting EBS volume "x" since volume is currently attached to "y"


How reproducible:
Sporadic

Actual results:
Failed the ResourceQuota tests

Expected results:
Should pass

Additional info:
The ResourceQuota [sig-scheduling] tests have failed 3 times in the last 4 days:

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5801

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5772

https://prow.k8s.io/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5740

Comment 1 W. Trevor King 2019-03-21 23:40:34 UTC
Created attachment 1546728 [details]
Occurrences of this error in CI from 2019-03-19T12:28Z to 2019-03-21T14:53Z

Generated with [1]:

$ deck-build-log-plot 'Error deleting EBS volume .* since volume is currently attached'

This error currently appears in 184 of the 816 *-e2e-aws* failures across our whole CI system over the past 48 hours.

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log
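
For reference, a minimal Go sketch of the same kind of search, assuming the job logs have been mirrored locally; the ./build-logs directory layout and build-log.txt file names are assumptions, and this is not the deck-build-log tooling itself:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

func main() {
	// the same pattern passed to deck-build-log-plot above
	pattern := regexp.MustCompile(`Error deleting EBS volume .* since volume is currently attached`)
	root := "./build-logs" // assumed local mirror of the CI build logs

	matched, total := 0, 0
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || filepath.Base(path) != "build-log.txt" {
			return nil
		}
		total++
		if data, readErr := os.ReadFile(path); readErr == nil && pattern.Match(data) {
			matched++
		}
		return nil
	})
	fmt.Printf("%d of %d runs contain the warning\n", matched, total)
}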

Comment 2 Seth Jennings 2019-03-27 21:22:12 UTC
Sending to Storage since they handle EBS attachment and PV management, but I don't think this is the reason for any failure.  I think it is just that the attach/detach controller has not yet detached the EBS volume from the instance.

Comment 3 Matthew Wong 2019-03-27 21:59:54 UTC
The warnings are unrelated and happen to be interspersed with the test failures. Detach and Delete operations are performed asynchronously by two separate controllers, so the message
> Error deleting EBS volume "x" since volume is currently attached to "y"
just means that the PV controller responsible for deleting "x" attempted to do so before the attach-detach controller had successfully detached "x" from "y".

Note also that some of these failing tests don't involve PVs at all, e.g. "ResourceQuota should create a ResourceQuota and capture the life of a pod".

I am not sure what other team would be best equipped to look into these failures.
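
To make the race concrete, here is a minimal Go sketch, not the actual Kubernetes controllers: one goroutine plays the attach-detach controller and detaches the volume after a delay, while a retry loop plays the PV controller and keeps attempting deletion, logging the same kind of message each time it is too early. The instance ID is taken from the logs; everything else is illustrative.

package main

import (
	"fmt"
	"sync"
	"time"
)

// in-memory stand-in for the cloud provider's view of one EBS volume
type volume struct {
	mu         sync.Mutex
	attachedTo string
}

func (v *volume) detach() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.attachedTo = ""
}

func (v *volume) delete() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.attachedTo != "" {
		return fmt.Errorf("volume is currently attached to %q", v.attachedTo)
	}
	return nil
}

func main() {
	v := &volume{attachedTo: "i-0ae5f32d0a0298f99"}

	// "attach-detach controller": detaches the volume after some delay
	go func() {
		time.Sleep(2 * time.Second)
		v.detach()
	}()

	// "PV controller": retries deletion until the detach has completed
	for {
		if err := v.delete(); err != nil {
			fmt.Println("warning: error deleting EBS volume:", err) // the event seen in CI logs
			time.Sleep(500 * time.Millisecond)
			continue
		}
		fmt.Println("volume deleted")
		return
	}
}

The point is that the warning is expected while the two loops converge; only a delete that never succeeds would indicate a real problem.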

Comment 4 Matthew Wong 2019-03-27 22:03:17 UTC
Since the timeout error is not very descriptive, here is what the code says:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L75
"// wait for resource quota status to show the expected used resources value"
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/scheduling/resource_quota.go#L1513

I don't know which component updates ResourceQuota statuses.
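
For what it's worth, the wait in question is an ordinary poll on the quota's status. Below is a hedged sketch of that shape using client-go; the function name, interval, timeout, and comparison are assumptions, not the upstream e2e helper:

package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForUsed re-reads the ResourceQuota until status.used matches the expected
// values or the timeout expires -- the step whose timeout produces the terse
// e2e failure. Interval and timeout here are illustrative.
func waitForUsed(c kubernetes.Interface, ns, name string, expected corev1.ResourceList) error {
	return wait.Poll(2*time.Second, time.Minute, func() (bool, error) {
		rq, err := c.CoreV1().ResourceQuotas(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return equality.Semantic.DeepEqual(rq.Status.Used, expected), nil
	})
}

func main() {
	fmt.Println("wire waitForUsed up to a real clientset to use it")
}

A timeout here only tells us that status.used never converged, not which controller failed to update it.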

Comment 7 Ben Parees 2019-07-30 00:28:28 UTC
Bump.  This continues to be a recurring failure in our 4.2 release stream.  The failure may be benign, but it makes our error rates noisy and makes it hard to judge whether we have a stable product.

Comment 9 W. Trevor King 2019-07-31 17:03:10 UTC
These failures consume some of our "failed deletion attempt" quota, increasing the chance that AWS throttling causes noticeable issues.  But I don't have any specific runs I can link that demonstrate the connection.

Comment 12 Jan Safranek 2019-12-13 13:59:59 UTC
We've fixed most of the API throttling as part of bug #1698829; it should be much better now. In this bug we focus on the warning event sent by the PV controller:

W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is currently attached to "i-0ae5f32d0a0298f99"

Moved the corresponding event from Warning to Normal, as it is part of normal operation:
https://github.com/kubernetes/kubernetes/pull/86250
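
Roughly, the shape of that change is sketched below (hedged; the event reason string and helper name are illustrative, see the linked PR for the real diff): the message is still recorded, but with corev1.EventTypeNormal instead of corev1.EventTypeWarning, since retrying the delete until the detach finishes is normal operation.

package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// reportStillAttached records the "volume is currently attached" message against
// the PV. Before the fix it was a Warning event; after it, a Normal one.
func reportStillAttached(recorder record.EventRecorder, pv *corev1.PersistentVolume, nodeName string) {
	recorder.Event(pv, corev1.EventTypeNormal, "VolumeDelete", // reason string is illustrative
		"Error deleting EBS volume since volume is currently attached to "+nodeName)
}

func main() {}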

Comment 15 Chao Yang 2020-02-05 09:23:28 UTC
Created 100 volumes and did not hit this issue.
Updating the bug status to VERIFIED on:
version   4.4.0-0.nightly-2020-02-04-171905   True        False         171m    Cluster version is 4.4.0-0.nightly-2020-02-04-171905

Comment 17 errata-xmlrpc 2020-05-04 11:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581