Bug 1784678

Summary: OCP 4.2.12: upi on baremetal - openshift-nfd namespace disappears with nfd pods several hours after deploying Node Feature Discovery operator from operatorHub
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: Node Feature Discovery Operator
Assignee: Carlos Eduardo Arango Gutierrez <carangog>
Status: CLOSED DUPLICATE
QA Contact: Walid A. <wabouham>
Severity: high
Priority: unspecified
Version: 4.2.z
CC: carangog, eparis, mifiedle, sejug, wsun, zkosic
Keywords: Regression
Target Release: 4.2.z
Hardware: x86_64
OS: Linux
Last Closed: 2020-02-28 13:35:14 UTC
Type: Bug

Description Walid A. 2019-12-18 03:55:20 UTC
Description of problem:
This is a UPI bare-metal install of OCP 4.2.12 on OpenStack (staging cluster). After deploying the Node Feature Discovery (NFD) operator from OperatorHub in the OCP console and creating an instance of that operator, NFD deploys successfully and the nfd pods in the openshift-nfd namespace are created. All nodes get the NFD-specific labels.

After letting the cluster sit for several hours (at least 6-8 hours, or overnight), the openshift-nfd namespace disappeared along with all the nfd pods.

The NFD operator still shows as installed, and the operator pod in the namespace I created is still running.

Version-Release number of selected component (if applicable):
Server Version: 4.2.12
Kubernetes Version: v1.14.6+32dc4a0


How reproducible:
Seen twice

Steps to Reproduce:
1. UPI bare-metal cluster install of OCP 4.2.12 on OpenStack: 3 master and 3 worker nodes.
2. From the OCP console, logged in as the kubeadmin user with the kubeadmin-password, create a new project called test-nfd.
3. From the console, go to Operators -> OperatorHub, search for "NFD", click the Node Feature Discovery operator icon, then click Install. Choose:
- Installation namespace: the namespace you created (test-nfd)
- Update channel: 4.2
- Approval strategy: Automatic
4. Create an instance of that operator.

Actual results:
The NFD operator is deployed successfully in the test-nfd namespace and is running. A new namespace called "openshift-nfd" is created along with the nfd-master and nfd-worker pods. NFD labels are applied successfully to all nodes. The only problem is that after several hours (6-8 hours), the "openshift-nfd" namespace disappears along with all the nfd pods.

$  oc get pods -n openshift-nfd
No resources found.
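To pin down when the namespace disappears (the report only narrows it to 6-8 hours or overnight), a polling loop like the one below could be left running against the cluster. This is a minimal sketch and is not part of the original report. The function name `watch_namespace` and the 300-second default interval are hypothetical, and the sketch assumes `oc` is already logged in to the cluster. It treats any non-zero exit from `oc get namespace` as "gone", so an API outage would also end the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): poll a namespace until
# `oc get namespace` stops succeeding, then print a UTC timestamp so the
# disappearance can be matched against must-gather / audit logs.
# NOTE: any non-zero exit from oc (NotFound, API outage, expired login) ends the loop.
watch_namespace() {
  local ns="${1:-openshift-nfd}"   # namespace to watch
  local interval="${2:-300}"       # seconds between checks (assumed default: 5 minutes)
  while oc get namespace "$ns" >/dev/null 2>&1; do
    sleep "$interval"
  done
  echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') namespace $ns no longer found"
}
```

Hypothetical usage: `watch_namespace openshift-nfd 60`. The printed timestamp bounds the disappearance to within one polling interval, which narrows the window to search in the must-gather logs.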

Expected results:
The openshift-nfd namespace should persist, and the nfd-master and nfd-worker pods in it should stay running:

$  oc get pods -n openshift-nfd
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-bzsb5   1/1     Running   0          71s
nfd-master-dstjq   1/1     Running   0          71s
nfd-master-t84wf   1/1     Running   0          71s
nfd-worker-2c6bw   1/1     Running   2          72s
nfd-worker-glb55   1/1     Running   2          72s
nfd-worker-tsnj5   1/1     Running   2          72s

Additional info:
A link to the must-gather logs and the output of various oc commands is provided in the next comment.

Comment 2 Walid A. 2019-12-20 20:45:38 UTC
Hitting the same issue on an AWS IPI-installed OCP 4.2.1 cluster.
A link to the logs will be in the next comment.

Comment 6 Carlos Eduardo Arango Gutierrez 2020-02-06 13:30:16 UTC
Assigned to Eduardo Arango -> cherry-pick fix from master

Comment 7 Zvonko Kosic 2020-02-28 13:35:14 UTC

*** This bug has been marked as a duplicate of bug 1805394 ***