1732193 – Master controllers try to detach a removed cinder volume every 2 minutes with "Resource not found" message.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Jan Safranek
QA Contact:	Chao Yang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-23 01:57 UTC by Daein Park
Modified:	2019-10-16 06:30 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:30:44 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 23491	0	None	closed	Bug 1732193: UPSTREAM: 80518: Fix detachment of deleted volumes	2021-01-28 10:30:27 UTC
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:30:59 UTC

Description Daein Park 2019-07-23 01:57:18 UTC

Description of problem:

The master controllers pod keeps trying to detach a removed Cinder volume and floods its log with the messages shown below [0].
The PV "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" no longer appears in "oc get pv", and the Cinder volume "cccccccc-cccc-cccc-cccc-cccccccccccc" no longer appears in "openstack volume list". The PV and the volume were detached and removed 10 days ago without any errors.

[0] master controllers logs
~~~
W0719 16:33:55.048164 1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
E0719 16:33:55.126275 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:35:57.126225996 +0900 JST m=+3102241.249061221 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
W0719 16:35:57.202790 1 reconciler.go:231] attacherDetacher.DetachVolume started for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" (UniqueName: "kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc") on node "worker01.ocp.example.com" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
E0719 16:35:57.728152 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\"" failed. No retries permitted until 2019-07-19 16:37:59.72812061 +0900 JST m=+3102363.850955837 (durationBeforeRetry 2m2s). Error: "DetachVolume.Detach failed for volume \"pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\" (UniqueName: \"kubernetes.io/cinder/cccccccc-cccc-cccc-cccc-cccccccccccc\") on node \"worker01.ocp.example.com\" : error occurred getting volume by ID: cccccccc-cccc-cccc-cccc-cccccccccccc, err: Resource not found"
~~~

The node is annotated with "volumes.kubernetes.io/controller-managed-attach-detach=true", and the cloud provider is RHOSP 13.
In addition, the PV was dynamically provisioned by a provisioner configured for Cinder.

Version-Release number of selected component (if applicable):

OpenStack RHOSP 13
OpenShift v3.11.98
Kubernetes v1.11.0+d4cacc0

How reproducible:

This issue cannot be reproduced at will; it is occurring in the customer's environment.

Steps to Reproduce:
1.
2.
3.

Actual results:

The master controllers repeatedly try to detach the removed Cinder volume that backed the deleted PV, flooding the log with "Resource not found" errors.

Expected results:

No detach attempts or error messages for a volume that has already been removed.

Additional info:

The failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone labels are the same across the OCP cluster.

Comment 2 Jan Safranek 2019-07-24 08:41:01 UTC

I can see that Cinder detach is not idempotent: it fails when asked to detach a volume that is already detached.
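
To illustrate what idempotent would mean here, the sketch below treats "volume not found" and "volume not attached to this node" as success, so repeating a detach is harmless. It is an illustration of the general pattern under stated assumptions, not the upstream change: volumeAPI, fakeAPI, errNotFound, and detachIdempotent are all hypothetical names and do not correspond to the real cloud provider code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the backend's
// "Resource not found" error.
var errNotFound = errors.New("Resource not found")

// volumeAPI is a hypothetical, minimal view of a volume backend.
type volumeAPI interface {
	// AttachedTo returns the node the volume is attached to ("" if none),
	// or errNotFound if the volume does not exist.
	AttachedTo(volumeID string) (string, error)
	Detach(volumeID, node string) error
}

// detachIdempotent detaches volumeID from node. A missing volume, or one
// that is not attached to node, counts as already detached.
func detachIdempotent(api volumeAPI, volumeID, node string) error {
	attachedTo, err := api.AttachedTo(volumeID)
	if errors.Is(err, errNotFound) {
		return nil // volume is gone, so there is nothing left to detach
	}
	if err != nil {
		return fmt.Errorf("looking up volume %s: %w", volumeID, err)
	}
	if attachedTo != node {
		return nil // not attached to this node
	}
	return api.Detach(volumeID, node)
}

// fakeAPI is an in-memory backend used only for this demonstration.
type fakeAPI struct{ attached map[string]string }

func (f *fakeAPI) AttachedTo(id string) (string, error) {
	node, ok := f.attached[id]
	if !ok {
		return "", errNotFound
	}
	return node, nil
}

func (f *fakeAPI) Detach(id, node string) error {
	if _, ok := f.attached[id]; !ok {
		return errNotFound
	}
	f.attached[id] = ""
	return nil
}

func main() {
	api := &fakeAPI{attached: map[string]string{"vol-1": "worker01"}}
	fmt.Println(detachIdempotent(api, "vol-1", "worker01"))    // first detach
	fmt.Println(detachIdempotent(api, "vol-1", "worker01"))    // repeated detach
	fmt.Println(detachIdempotent(api, "vol-gone", "worker01")) // deleted volume
}
```

With this shape, a controller that retries a detach after the volume has been deleted gets a success result and can stop retrying, rather than failing on every attempt.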

Comment 3 Jan Safranek 2019-07-24 12:30:56 UTC

Filed https://github.com/kubernetes/kubernetes/pull/80518 upstream.

Comment 9 Chao Yang 2019-08-28 08:00:14 UTC

Verified: passed on 4.2.0-0.nightly-2019-08-27-072819.
1. Create a PVC and a pod.
2. Detach and delete the volume from the OpenStack web console.
3. Delete the pod.
4. Check the logs in openshift-kube-controller-manager.

Comment 10 errata-xmlrpc 2019-10-16 06:30:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
