Bug 1737788

Summary: CSI volumes are not detached on RequestLimitExceeded errors
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: high
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, aos-storage-staff, nagrawal
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-16 06:34:52 UTC
Type: Bug

Description Jan Safranek 2019-08-06 08:42:42 UTC
Description of problem:
When the AWS EBS CSI driver returns a "RequestLimitExceeded" error from ControllerUnpublishVolume (i.e. "detach"), csi-external-attacher marks the volume as detached even though the detach failed:

I0805 16:05:08.948172       1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I0805 16:05:08.948182       1 connection.go:181] GRPC request: {"node_id":"i-0eb51d1f90e270d91","volume_id":"vol-0766d9cc3c1966b86"}
I0805 16:05:15.323947       1 connection.go:183] GRPC response: {}
I0805 16:05:15.324406       1 connection.go:184] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
        status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324422       1 csi_handler.go:369] Detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6" with error rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
        status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324459       1 util.go:70] Marking as detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6"

The volume remains attached to the node, and the external-attacher never retries the detach.
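The expected behavior can be sketched in Go, the language csi-external-attacher is written in. This is a minimal illustration under stated assumptions, not the actual external-attacher code or the change in the pull request filed below: `VolumeAttachment`, `Detacher`, `syncDetach`, and the fake drivers are hypothetical names. The point it shows is that any error from ControllerUnpublishVolume, including the RequestLimitExceeded throttling error in the log above, must leave the volume marked as attached and request a retry.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeAttachment is a HYPOTHETICAL, stripped-down stand-in for the
// Kubernetes VolumeAttachment object, holding only what this sketch needs.
type VolumeAttachment struct {
	Name     string
	VolumeID string
	NodeID   string
	Attached bool // flipping this to false stands in for "Marking as detached"
}

// Detacher abstracts the ControllerUnpublishVolume call to the CSI driver
// (hypothetical interface, not the external-attacher API).
type Detacher interface {
	Detach(volumeID, nodeID string) error
}

// syncDetach performs one detach attempt and reports whether to retry.
func syncDetach(d Detacher, va *VolumeAttachment) (bool, error) {
	if err := d.Detach(va.VolumeID, va.NodeID); err != nil {
		// The driver could not confirm the detach (e.g. AWS throttling), so
		// the volume may still be attached: keep Attached=true and requeue.
		return true, fmt.Errorf("detach %s from %s: %w", va.VolumeID, va.NodeID, err)
	}
	// Only a successful call shows the volume is gone from the node.
	va.Attached = false
	return false, nil
}

// throttlingDriver is a fake driver that always fails like the AWS API in the log.
type throttlingDriver struct{}

func (throttlingDriver) Detach(volumeID, nodeID string) error {
	return errors.New("RequestLimitExceeded: Request limit exceeded.")
}

// okDriver is a fake driver whose detach always succeeds.
type okDriver struct{}

func (okDriver) Detach(volumeID, nodeID string) error { return nil }

func main() {
	va := &VolumeAttachment{
		Name:     "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6",
		VolumeID: "vol-0766d9cc3c1966b86",
		NodeID:   "i-0eb51d1f90e270d91",
		Attached: true,
	}

	requeue, err := syncDetach(throttlingDriver{}, va)
	fmt.Println(requeue, va.Attached, err != nil) // true true true

	requeue, err = syncDetach(okDriver{}, va)
	fmt.Println(requeue, va.Attached, err) // false false <nil>
}
```

With the throttling fake the object stays attached and is requeued; only the successful retry marks it detached.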

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-08-05-143844

How reproducible:
rarely

Steps to Reproduce:
1. Run a pod with an AWS EBS volume provisioned by the CSI driver.
2. Delete the pod and hope that AWS returns a RequestLimitExceeded response to the detach call.

Actual results:
1. The volume is still attached to the node (as seen in the AWS console).
2. The VolumeAttachment Kubernetes object is deleted.

Expected results:
1. The volume is detached from the node (after a while).
2. The VolumeAttachment Kubernetes object is deleted.

Comment 1 Jan Safranek 2019-08-06 08:43:21 UTC
Filed https://github.com/kubernetes-csi/external-attacher/pull/165

Comment 4 Chao Yang 2019-08-21 02:47:44 UTC
Created 400 PVCs, volumes, and pods, but did not hit the AWS "RequestLimitExceeded" error.
Marking this bug as verified. Will reopen it if the issue happens again.

Comment 5 errata-xmlrpc 2019-10-16 06:34:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922