Bug 1775849

Summary: NFD pods disappear after cluster upgrade
Product: OpenShift Container Platform
Reporter: Eric Matysek <ematysek>
Component: Node Feature Discovery Operator
Assignee: Zvonko Kosic <zkosic>
Status: CLOSED ERRATA
QA Contact: Eric Matysek <ematysek>
Severity: high
Priority: unspecified
Version: 4.3.0
CC: eparis, mifiedle, mpatel, nagrawal, sejug, wabouham
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Clones: 1782948, 1785307
Last Closed: 2020-01-23 11:13:48 UTC
Type: Bug
Bug Depends On: 1778904, 1782948
Bug Blocks: 1785307, 1805394

Description Eric Matysek 2019-11-22 23:25:55 UTC
Description of problem:
NFD pods disappear after cluster upgrade
upgrading from 4.2.7 to 4.3.0-0.nightly-2019-11-21-122827

Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Deploy 4.2.7 cluster
2. git clone https://github.com/openshift/cluster-nfd-operator
3. make deploy
4. oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-11-21-122827 --force --allow-explicit-upgrade

Actual results:
before upgrade:
[ematysek@jump ~]$ oc get all -n openshift-nfd
NAME                   READY   STATUS    RESTARTS   AGE
pod/nfd-master-gvj7n   1/1     Running   0          32s
pod/nfd-master-kmtx8   1/1     Running   0          32s
pod/nfd-master-wg47s   1/1     Running   0          32s
pod/nfd-worker-csdwm   1/1     Running   2          33s
pod/nfd-worker-nxqrv   1/1     Running   2          33s
pod/nfd-worker-qnjxn   1/1     Running   2          33s

after upgrade:
[ematysek@jump cluster-nfd-operator]$ oc get all -n openshift-nfd
No resources found in openshift-nfd namespace.


Expected results:
NFD pods should still exist
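The expected result can be checked mechanically instead of by eye. The function below is a minimal sketch: `check_nfd_pods` is a hypothetical helper (not part of NFD or `oc`), and it assumes the table layout shown in this report, with NAME in the first column (optionally `pod/`-prefixed, as in `oc get all`) and STATUS in the third, plus the `nfd-master-`/`nfd-worker-` pod name prefixes seen above.

```shell
# check_nfd_pods: read `oc get pods` (or `oc get all`) output on stdin.
# Prints how many nfd-master and nfd-worker pods are Running, and exits 0
# only if there is at least one of each; exits 1 otherwise (e.g. empty namespace).
check_nfd_pods() {
  awk '
    # Strip the optional "pod/" prefix from the NAME column.
    { name = $1; sub(/^pod\//, "", name) }
    # Count Running pods by name prefix; the header row never matches.
    name ~ /^nfd-master-/ && $3 == "Running" { masters++ }
    name ~ /^nfd-worker-/ && $3 == "Running" { workers++ }
    END {
      printf "running nfd-master: %d, nfd-worker: %d\n", masters, workers
      exit !(masters > 0 && workers > 0)
    }
  '
}
```

For example, `oc get pods -n openshift-nfd | check_nfd_pods` exits non-zero in the post-upgrade state reported above, because the namespace is empty and nothing reaches the pipe.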

Additional info:

Comment 2 Zvonko Kosic 2019-12-04 14:59:53 UTC
(In reply to Eric Matysek from comment #0)

You were testing master against a specific OpenShift release, and master will not always work with a given release. Please install NFD from OperatorHub on OCP 4.2 and test the upgrade path from there.

Comment 4 Eric Matysek 2019-12-05 22:32:03 UTC
Verified this works with the NFD version in public OperatorHub as well

Comment 5 Eric Matysek 2019-12-05 22:34:46 UTC
And by works I mean the bug is present... sorry for the typo

Comment 6 Zvonko Kosic 2019-12-06 14:11:01 UTC
Can you provide any logs or events?

oc get events -n openshift-nfd
oc get events -n openshift-nfd-operator

Is the operator still running?

oc logs -f <operator-name> -n openshift-nfd-operator

What is the status of the DaemonSets in openshift-nfd?

oc describe ds -n openshift-nfd
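The checks requested above can be bundled into one function so the combined output can be attached to the bug in a single paste. This is a sketch under stated assumptions: `collect_nfd_diagnostics` is a hypothetical name, `-f` is dropped from `oc logs` so the function terminates, and because the comment leaves `<operator-name>` as a placeholder, the function logs every pod it finds in `openshift-nfd-operator` rather than guessing a name.

```shell
# collect_nfd_diagnostics: run the diagnostics requested in comment 6 in one pass.
# Namespaces come from the report; operator pod names are discovered, not assumed.
collect_nfd_diagnostics() {
  echo "== events: openshift-nfd"
  oc get events -n openshift-nfd
  echo "== events: openshift-nfd-operator"
  oc get events -n openshift-nfd-operator
  # `-o name` prints entries like "pod/<name>", which `oc logs` accepts directly.
  # Logs are printed once (no -f) so the function returns.
  for pod in $(oc get pods -n openshift-nfd-operator -o name); do
    echo "== logs: $pod"
    oc logs "$pod" -n openshift-nfd-operator
  done
  echo "== daemonsets: openshift-nfd"
  oc describe ds -n openshift-nfd
}
```

Usage: `collect_nfd_diagnostics > nfd-diagnostics.txt 2>&1`, then attach the file.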

Comment 7 Zvonko Kosic 2019-12-12 14:49:44 UTC
Can you also try to deploy NFD via OLM, and then do the upgrade? 

OLM is responsible for updating day-2 operators; deploying an operator manually also means updating it manually.

Comment 8 Zvonko Kosic 2019-12-12 16:55:16 UTC
This might fix the issue: https://github.com/openshift/cluster-nfd-operator/pull/45

Comment 10 Mike Fiedler 2019-12-13 13:46:36 UTC
Verification blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1778904

Comment 11 Eric Matysek 2019-12-19 16:15:38 UTC
PR 49 is not merged into the release-4.2 branch, so it has no effect on a 4.2.x cluster.

Comment 12 Mrunal Patel 2020-01-03 19:09:26 UTC
https://github.com/openshift/cluster-nfd-operator/pull/51 for release-4.2

Comment 13 Eric Paris 2020-01-07 01:09:52 UTC
I don't know why this BZ is talking about 4.2. This is a 4.3 BZ. I see that this fix is supposedly merged and that 1778904 is VERIFIED, so I'm moving this back to ON_QA.

Comment 14 Eric Matysek 2020-01-08 22:52:47 UTC
Upgraded successfully without the NFD pods disappearing!

[ematysek@jump ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-01-08-181129   True        False         13m     Cluster version is 4.3.0-0.nightly-2020-01-08-181129

[ematysek@jump ~]$ oc get pods
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-cb57h   1/1     Running   0          14m
nfd-master-pv9c7   1/1     Running   0          14m
nfd-master-rs8tz   1/1     Running   0          14m
nfd-worker-khrhj   1/1     Running   2          14m
nfd-worker-wnwlv   1/1     Running   2          14m
nfd-worker-wxzq7   1/1     Running   2          14m
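Because `oc adm upgrade` returns before the upgrade itself completes, a pod check is only meaningful once the cluster version has settled, as in the `oc get clusterversion` output above. A minimal sketch, assuming the column order shown there (NAME, VERSION, AVAILABLE, PROGRESSING); `upgrade_settled` is a hypothetical helper, and rows where VERSION is blank would shift the columns and are not handled.

```shell
# upgrade_settled: read `oc get clusterversion` output on stdin.
# Prints the current version and exits 0 only when AVAILABLE is True and
# PROGRESSING is False; exits 1 if no "version" row is found or it is still busy.
upgrade_settled() {
  awk '
    # Skip the header; the ClusterVersion object is named "version".
    NR > 1 && $1 == "version" {
      found = 1
      ver = $2
      ok = ($3 == "True" && $4 == "False")
    }
    END {
      if (!found) exit 1
      print ver
      exit !ok
    }
  '
}
```

For example, `oc get clusterversion | upgrade_settled && oc get pods -n openshift-nfd` lists the NFD pods only after the upgrade has finished.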

Comment 16 errata-xmlrpc 2020-01-23 11:13:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062