Description of problem:

The master controllers pod keeps trying to detach a Cinder volume that no longer exists, spamming the logs with the messages below [0]. The PV "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" is already gone from "oc get pv", and the Cinder volume "cccccccc-cccc-cccc-cccc-cccccccccccc" no longer appears in "openstack volume list". The PV and volume were removed and detached 10 days ago without any errors.

[0] master controllers logs
~~~
W0719 16:33:55.048164 1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
E0719 16:33:55.126275 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:35:57.126225996 +0900 JST m=+3102241.249061221 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
W0719 16:35:57.202790 1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
E0719 16:35:57.728152 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:37:59.72812061 +0900 JST m=+3102363.850955837 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
~~~

The node is configured with the "volumes.kubernetes.io/controller-managed-attach-detach=true" annotation, and the cloud provider is RHOSP 13. The PV was dynamically provisioned through the Cinder provisioner.

Version-Release number of selected component (if applicable):
OpenStack RHOSP 13
OpenShift v3.11.98
Kubernetes v1.11.0+d4cacc0

How reproducible:
This issue cannot be reproduced on demand; it occurs only in the customer's environment.

Steps to Reproduce:
1.
2.
3.

Actual results:
The master controllers keep trying to detach a removed Cinder volume through the PV, generating a flood of "Resource not found" error messages.

Expected results:
No detach attempts, and no errors, for removed volumes.

Additional info:
All failure-domain.beta.kubernetes.io/region and zone labels are the same across the OCP cluster.
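The controller's failing lookup can be reproduced outside the cluster to confirm the volume really is gone from Cinder. A minimal sketch, assuming the v1 gophercloud API and the usual OS_* environment credentials (neither confirmed by this report; the volume ID is the one from the logs above):

~~~
package main

import (
	"fmt"
	"log"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
	"github.com/gophercloud/gophercloud/openstack/blockstorage/v3/volumes"
)

func main() {
	// Authenticate from OS_* environment variables.
	opts, err := openstack.AuthOptionsFromEnv()
	if err != nil {
		log.Fatal(err)
	}
	provider, err := openstack.AuthenticatedClient(opts)
	if err != nil {
		log.Fatal(err)
	}
	client, err := openstack.NewBlockStorageV3(provider, gophercloud.EndpointOpts{})
	if err != nil {
		log.Fatal(err)
	}

	// The volume ID from the controller logs.
	id := "cccccccc-cccc-cccc-cccc-cccccccccccc"
	_, err = volumes.Get(client, id).Extract()
	if _, notFound := err.(gophercloud.ErrDefault404); notFound {
		// This 404 is what surfaces as "Resource not found" in the logs,
		// so the detach operation can never succeed.
		fmt.Println("volume is gone from Cinder; detach will fail forever")
		return
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("volume still exists in Cinder")
}
~~~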
I can see that Cinder detach is not idempotent; it fails when detaching an already detached volume.
Filed https://github.com/kubernetes/kubernetes/pull/80518 upstream.
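For illustration, here is a minimal sketch of the idempotent behavior the fix aims for: treat a "not found" volume as already detached so the attach/detach controller's retry loop can terminate. The function and error names are hypothetical, not the actual upstream code:

~~~
package main

import (
	"errors"
	"fmt"
)

// errVolumeNotFound stands in for the "Resource not found" error the
// Cinder client returns when the volume ID no longer exists.
var errVolumeNotFound = errors.New("Resource not found")

// getVolumeAttachment is a hypothetical lookup against the Cinder API.
// Here the volume has been deleted out of band, so the lookup fails.
func getVolumeAttachment(volumeID string) (attached bool, err error) {
	return false, errVolumeNotFound
}

// detachDisk treats a missing volume as already detached instead of
// returning an error, which is what makes the operation idempotent.
func detachDisk(volumeID, nodeName string) error {
	attached, err := getVolumeAttachment(volumeID)
	if errors.Is(err, errVolumeNotFound) {
		// Idempotent path: the volume is gone, so there is nothing
		// left to detach; report success instead of retrying forever.
		fmt.Printf("volume %s not found, assuming it is detached from %s\n",
			volumeID, nodeName)
		return nil
	}
	if err != nil {
		return err
	}
	if !attached {
		return nil // already detached; nothing to do
	}
	// ... issue the real Cinder detach call here ...
	return nil
}

func main() {
	err := detachDisk("cccccccc-cccc-cccc-cccc-cccccccccccc", "worker01.ocp.example.com")
	if err != nil {
		fmt.Println("detach failed:", err)
	}
}
~~~

Without the early return on the not-found error, every reconciler pass re-queues the detach with backoff, which is exactly the log spam described above.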
Verified as passed on 4.2.0-0.nightly-2019-08-27-072819:
1. Create a PVC and a pod.
2. Detach and delete the volume from the OpenStack web console.
3. Delete the pod.
4. Check the logs in the openshift-kube-controller-manager namespace.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922