Description of problem:
Error deleting EBS volume "x" since volume is currently attached to "y"
Failed the ResourceQuota tests
Has failed the ResourceQuota [sig-scheduling] tests 3 times in the last 4 days
Created attachment 1546728 [details]
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T14:53Z
Generated with:
$ deck-build-log-plot 'Error deleting EBS volume .* since volume is currently attached'
This error currently appears in 184 of 816 *-e2e-aws* failures (about 23%) across our whole CI system over the past 48 hours.
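For anyone who wants to check a single build log locally, here is a minimal sketch that applies the same pattern line by line. It is not how deck-build-log-plot works internally (that tool is not shown here); the helper name count_matches is hypothetical, and the sample line is the event quoted in this bug.

```python
import re

# Same pattern as passed to deck-build-log-plot in this report.
PATTERN = re.compile(
    r"Error deleting EBS volume .* since volume is currently attached"
)


def count_matches(log_text: str) -> int:
    """Count log lines containing the EBS deletion error (hypothetical helper)."""
    return sum(1 for line in log_text.splitlines() if PATTERN.search(line))


# One matching event line from this bug plus one unrelated line.
sample = (
    'W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 '
    'Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is '
    'currently attached to "i-0ae5f32d0a0298f99"\n'
    "I some unrelated log line\n"
)

matches = count_matches(sample)
```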
Sending to Storage since they handle EBS attachment and PV management, but I don't think this is the reason for any failure. I think it is just that the attach/detach controller has not yet removed the EBS volume from the instance.
The warnings are unrelated and happen to be interspersed with the test failures. Detach and Delete operations are done asynchronously by two separate controllers, so the message
> Error deleting EBS volume "x" since volume is currently attached to "y"
just means that the PV controller responsible for deleting "x" attempted to do so before the attach-detach controller had successfully detached "x" from "y".
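To illustrate the ordering described above, here is a toy model, not the actual controller code. The ToyCloud class, the VolumeInUse exception, and the reconcile loop are all hypothetical names; the only thing taken from this bug is the message text. The deleter retries on every tick, and the detach only lands at a later tick, so the first attempts each produce the message and the deletion still succeeds in the end.

```python
class VolumeInUse(Exception):
    """Raised by the toy cloud when a delete hits an attached volume."""


class ToyCloud:
    """Hypothetical stand-in for the cloud API; tracks one attached volume."""

    def __init__(self):
        self.volumes = {"x"}
        self.attachments = {"x": "y"}  # volume -> instance

    def detach(self, vol):
        self.attachments.pop(vol, None)

    def delete(self, vol):
        inst = self.attachments.get(vol)
        if inst is not None:
            raise VolumeInUse(
                f'Error deleting EBS volume "{vol}" since volume is '
                f'currently attached to "{inst}"'
            )
        self.volumes.discard(vol)


def reconcile(cloud, vol, detach_at_tick, max_ticks=5):
    """Retry deletion each tick; the detach arrives at detach_at_tick."""
    events = []
    for tick in range(max_ticks):
        if tick == detach_at_tick:
            cloud.detach(vol)  # the detaching side catches up here
        try:
            cloud.delete(vol)  # the deleting side simply tries again
        except VolumeInUse as err:
            events.append(str(err))  # recorded as an event, then retried
            continue
        break  # deletion succeeded
    return events


cloud = ToyCloud()
events = reconcile(cloud, "x", detach_at_tick=2)
```

In this run the delete fails on ticks 0 and 1, then succeeds on tick 2 once the volume is detached, which is why the message on its own does not indicate a lost volume.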
Note also that some of these failing tests don't involve PVs, e.g. ResourceQuota should create a ResourceQuota and capture the life of a pod.
I am not sure what other team would be best equipped to look into these failures.
Since the timeout error is not very descriptive, here is what the code says:
"// wait for resource quota status to show the expected used resources value"
I don't know which component updates resource quota statuses.
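For context on why the failure surfaces only as a timeout: a wait like the one in that comment is presumably a poll-until-match loop. The sketch below is an assumption about the general shape, not the e2e test's code; wait_for_used, get_used, and the "pods" key are hypothetical names used for illustration.

```python
import time


def wait_for_used(get_used, expected, timeout=5.0, interval=0.1):
    """Poll get_used() until every expected key matches, or time out.

    get_used is any callable returning a dict of used resources
    (hypothetical; stands in for reading the quota status).
    Returns True on a match, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        used = get_used()
        if all(used.get(key) == value for key, value in expected.items()):
            return True
        time.sleep(interval)
    return False


# Fake status source: reports the expected usage only from the third read on.
reads = {"n": 0}


def fake_status():
    reads["n"] += 1
    return {"pods": "1"} if reads["n"] >= 3 else {"pods": "0"}


ok = wait_for_used(fake_status, {"pods": "1"}, timeout=2.0, interval=0.01)
```

If whatever updates the status never reports the expected values, a loop like this can only return a generic timeout, which matches the undescriptive error seen here.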
bump. This continues to be a recurring failure in our 4.2 release stream. The failure may be benign, but it makes our error rates noisy and makes it difficult to understand if we have a stable product.
These failures consume some of our "failed deletion attempt" quotas, increasing the chance that AWS throttling causes noticeable issues. But I don't have any specific runs I can link demonstrating that connection.
We've fixed most of the API throttling as part of bug #1698829, so it should be much better now. In this bug we focus on the warning event sent by the PV controller:
W persistentvolume/pvc-4697ff0a-dd8f-44b0-8ce4-26a504215483 Error deleting EBS volume "vol-0f67031cbcacc5515" since volume is currently attached to "i-0ae5f32d0a0298f99"
Moved corresponding event from Warning to Normal, as it is part of normal operation:
Created 100 volumes and did not hit this issue.
Updating the bug status to verified on:
version 4.4.0-0.nightly-2020-02-04-171905 True False 171m Cluster version is 4.4.0-0.nightly-2020-02-04-171905
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.