Bug 1869618

Summary: One of the diskmaker-discovery pods is not deleted after deleting the localvolumediscovery instance
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Local Storage Operator
Version: 4.6
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Status: CLOSED ERRATA
Type: Bug
Reporter: Chao Yang <chaoyang>
Assignee: Santosh Pillai <sapillai>
QA Contact: Chao Yang <chaoyang>
CC: aos-bugs, jsafrane, sapillai
Last Closed: 2020-10-27 16:29:05 UTC

Description Chao Yang 2020-08-18 11:24:19 UTC
Description of problem:
One of the diskmaker-discovery pods is not deleted after deleting the localvolumediscovery instance

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-16-072105
local-storage-operator.4.6.0-202008111832.p0 

How reproducible:
1 time

Steps to Reproduce:
1. Deploy LSO.
2. Create a LocalVolumeDiscovery instance (see the example CR after this list).
3. Verify that all required pods are running.
4. Delete the LocalVolumeDiscovery instance.
5. One of the diskmaker-discovery pods is not deleted.
6. Re-create the LocalVolumeDiscovery instance; three new diskmaker-discovery pods start running alongside the leftover one.
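For reference, a LocalVolumeDiscovery CR along the following lines reproduces the setup in step 2. The CR name matches the "auto-discover-devices" request visible in the operator log below; the nodeSelector is illustrative (my assumption, not copied from the affected cluster) and can be omitted to discover devices on all nodes:

oc apply -f - <<'EOF'
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeDiscovery
metadata:
  name: auto-discover-devices
  namespace: openshift-local-storage
spec:
  # Illustrative selector: limit discovery to Linux nodes
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
              - linux
EOF

Step 4 then amounts to:

oc delete localvolumediscovery auto-discover-devices -n openshift-local-storage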

oc get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
diskmaker-discovery-6b7gq                 1/1     Running   0          11m     10.128.2.156   ip-10-0-137-29.us-east-2.compute.internal    <none>           <none>
diskmaker-discovery-fl4kq                 1/1     Running   0          25h     10.131.0.21    ip-10-0-178-125.us-east-2.compute.internal   <none>           <none>
diskmaker-discovery-k2ztb                 1/1     Running   0          11m     10.131.0.38    ip-10-0-178-125.us-east-2.compute.internal   <none>           <none>
diskmaker-discovery-nzrb7                 1/1     Running   0          11m     10.129.2.33    ip-10-0-206-82.us-east-2.compute.internal    <none>           <none>


$ oc get pod -n openshift-local-storage --show-labels
NAME                                      READY   STATUS    RESTARTS   AGE     LABELS
diskmaker-discovery-6b7gq                 1/1     Running   0          57m     app=diskmaker-discovery,controller-revision-hash=7cd6787664,pod-template-generation=3
diskmaker-discovery-fl4kq                 1/1     Running   0          26h     app=diskmaker-discovery,controller-revision-hash=7587f56bd,pod-template-generation=1
diskmaker-discovery-k2ztb                 1/1     Running   0          58m     app=diskmaker-discovery,controller-revision-hash=7cd6787664,pod-template-generation=3
diskmaker-discovery-nzrb7                 1/1     Running   0          57m     app=diskmaker-discovery,controller-revision-hash=7cd6787664,pod-template-generation=3
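Note that the leftover pod (diskmaker-discovery-fl4kq) still carries pod-template-generation=1 and the old controller-revision-hash, while the re-created pods are at generation 3, so the stale pod is identifiable by its labels. One way to surface those labels as columns (standard oc flags, not part of the original triage):

oc get pods -n openshift-local-storage -l app=diskmaker-discovery \
  -L controller-revision-hash,pod-template-generation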

oc logs pod/local-storage-operator-68f4dd987f-wcp5m 
{"level":"error","ts":1597740923.8552113,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"localvolumediscovery-controller","request":"openshift-local-storage/auto-discover-devices","error":"running 2 out of 3 discovery daemons","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/local-storage-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
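The "running 2 out of 3 discovery daemons" reconcile error suggests the controller expects one running discovery daemon per selected node (3 here) and is not counting the stale generation-1 pod as a current replica. A quick way to compare the daemonset's desired/ready/updated counts, assuming the daemonset is named diskmaker-discovery after its pods (a guess, not confirmed in this report):

oc get daemonset diskmaker-discovery -n openshift-local-storage \
  -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady} {.status.updatedNumberScheduled}{"\n"}'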


Actual results:
Pod diskmaker-discovery-fl4kq is not deleted when the localvolumediscovery instance is deleted.

Expected results:
Pod diskmaker-discovery-fl4kq should be deleted when the localvolumediscovery instance is deleted.
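If a cluster is already in this state, the stale pod can be removed by hand (standard oc command, shown for illustration; it works around the symptom, not the reconcile bug):

oc delete pod diskmaker-discovery-fl4kq -n openshift-local-storage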

Comment 1 Santosh Pillai 2020-08-20 11:35:33 UTC
@Chao 
I was not able to reproduce this issue. In my case, all the daemonset pods and LocalVolumeDiscoveryResults were deleted successfully on deleting the LocalVolumeDiscovery CR.
Can you share the cluster where this issue was reproduced, if you still have it?
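For anyone re-testing: once the LocalVolumeDiscovery CR is deleted, both of the following should come back empty (standard oc queries, added here for illustration):

oc get pods -n openshift-local-storage -l app=diskmaker-discovery
oc get localvolumediscoveryresults -n openshift-local-storage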

Comment 2 Santosh Pillai 2020-08-30 02:35:16 UTC
@Chao - Any updates on this?

Comment 3 Chao Yang 2020-08-31 01:06:09 UTC
I only hit this once.
I cannot reproduce it right now.

Comment 4 Santosh Pillai 2020-09-07 09:58:06 UTC
@Chao. Still not able to reproduce this bug. Moving it back to ON_QA. Let me know if it's still happening.

Comment 5 Chao Yang 2020-09-08 09:11:12 UTC
Updating the BZ status since the issue could not be reproduced.

Comment 8 errata-xmlrpc 2020-10-27 16:29:05 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196