Bug 1732193 - Master controllers try to detach a removed cinder volume every 2 minutes with "Resource not found" message.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jan Safranek
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-23 01:57 UTC by Daein Park
Modified: 2019-10-16 06:30 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:30:44 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift origin pull 23491 None closed Bug 1732193: UPSTREAM: 80518: Fix detachment of deleted volumes 2020-01-31 05:59:22 UTC
Red Hat Product Errata RHBA-2019:2922 None None None 2019-10-16 06:30:59 UTC

Description Daein Park 2019-07-23 01:57:18 UTC
Description of problem:

The master controllers pod keeps trying to detach a removed Cinder volume and floods its log with the messages shown in [0].
The PV "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" no longer appears in "oc get pv", and the Cinder volume "cccccccc-cccc-cccc-cccc-cccccccccccc" no longer appears in "openstack volume list". Both the PV and the volume were detached and removed 10 days ago without any errors.

  [0] master controllers logs
    ~~~
    W0719 16:33:55.048164       1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
    E0719 16:33:55.126275       1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:35:57.126225996 +0900 JST m=+3102241.249061221 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
    W0719 16:35:57.202790       1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
    E0719 16:35:57.728152       1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:37:59.72812061 +0900 JST m=+3102363.850955837 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
    ~~~

  The node is configured with the "volumes.kubernetes.io/controller-managed-attach-detach=true" annotation, and the cloud provider is RHOSP 13.
  Additionally, the PV was dynamically provisioned by a provisioner configured for Cinder.

Version-Release number of selected component (if applicable):

  OpenStack RHOSP 13
  OpenShift v3.11.98
  Kubernetes v1.11.0+d4cacc0

How reproducible:

  The issue cannot be reproduced on demand; it occurs in the customer's environment.

Actual results:

  The master controllers keep trying to detach a removed Cinder volume referenced by the PV, and flood the log with "Resource not found" errors.

Expected results:

  No detach attempts and no error messages for a volume that has already been removed.

Additional info:

  The failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone values are the same throughout the OCP cluster.

Comment 2 Jan Safranek 2019-07-24 08:41:01 UTC
I can see that Cinder detach is not idempotent: it fails when detaching a volume that is already detached.

Comment 3 Jan Safranek 2019-07-24 12:30:56 UTC
Filed https://github.com/kubernetes/kubernetes/pull/80518 upstream.

Comment 9 Chao Yang 2019-08-28 08:00:14 UTC
Verified on 4.2.0-0.nightly-2019-08-27-072819:
1. Create a PVC and a pod.
2. Detach and delete the volume from the OpenStack web console.
3. Delete the pod.
4. Check the logs in openshift-kube-controller-manager.

Comment 10 errata-xmlrpc 2019-10-16 06:30:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

