Bug 1775849 - NFD pods disappear after cluster upgrade
Summary: NFD pods disappear after cluster upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Feature Discovery Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Zvonko Kosic
QA Contact: Eric Matysek
URL:
Whiteboard:
Depends On: 1778904 1782948
Blocks: 1785307 1805394
 
Reported: 2019-11-22 23:25 UTC by Eric Matysek
Modified: 2020-02-27 15:56 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1782948 1785307
Environment:
Last Closed: 2020-01-23 11:13:48 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub openshift/cluster-nfd-operator pull 49 (closed): "Bug 1775849: [release-4.3] Added more Watcher for secondary resource" (last updated 2020-03-04 17:09:01 UTC)
- Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:14:20 UTC)

Description Eric Matysek 2019-11-22 23:25:55 UTC
Description of problem:
NFD pods disappear after a cluster upgrade from 4.2.7 to 4.3.0-0.nightly-2019-11-21-122827.

Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Deploy 4.2.7 cluster
2. git clone https://github.com/openshift/cluster-nfd-operator
3. make deploy
4. oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-11-21-122827 --force --allow-explicit-upgrade
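The steps above can be collected into a single sketch. This assumes an existing 4.2.7 cluster and a logged-in `oc` session with cluster-admin rights; the repository URL and release image are taken verbatim from this report.

```shell
# Sketch of the reproduction path from the report; assumes `oc` is
# already logged in to a 4.2.7 cluster with cluster-admin rights.
git clone https://github.com/openshift/cluster-nfd-operator
cd cluster-nfd-operator
make deploy   # installs the operator manually (not via OLM)

# Confirm the NFD pods are running before the upgrade.
oc get all -n openshift-nfd

# Force the upgrade to the nightly build named in the report.
oc adm upgrade \
  --to-image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-11-21-122827 \
  --force --allow-explicit-upgrade

# After the upgrade completes, check again; the reported bug is that
# this now prints "No resources found in openshift-nfd namespace."
oc get all -n openshift-nfd
```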

Actual results:
before upgrade:
[ematysek@jump ~]$ oc get all -n openshift-nfd
NAME                   READY   STATUS    RESTARTS   AGE
pod/nfd-master-gvj7n   1/1     Running   0          32s
pod/nfd-master-kmtx8   1/1     Running   0          32s
pod/nfd-master-wg47s   1/1     Running   0          32s
pod/nfd-worker-csdwm   1/1     Running   2          33s
pod/nfd-worker-nxqrv   1/1     Running   2          33s
pod/nfd-worker-qnjxn   1/1     Running   2          33s

after upgrade:
[ematysek@jump cluster-nfd-operator]$ oc get all -n openshift-nfd
No resources found in openshift-nfd namespace.


Expected results:
NFD pods should still exist

Additional info:

Comment 2 Zvonko Kosic 2019-12-04 14:59:53 UTC
(In reply to Eric Matysek from comment #0)
> Description of problem:
> NFD pods disappear after cluster upgrade
> upgrading from 4.2.7 to 4.3.0-0.nightly-2019-11-21-122827
> [...]
You were testing the master branch against a specific OpenShift release; master will not always work. Please install NFD from OperatorHub on OCP 4.2 and retry the upgrade path.

Comment 4 Eric Matysek 2019-12-05 22:32:03 UTC
Verified this works with the NFD version in public OperatorHub as well

Comment 5 Eric Matysek 2019-12-05 22:34:46 UTC
And by works I mean the bug is present... sorry for the typo

Comment 6 Zvonko Kosic 2019-12-06 14:11:01 UTC
Can you provide any logs or events?

oc get events -n openshift-nfd
oc get events -n openshift-nfd-operator

Is the operator still running?

oc logs -f <operator-name> -n openshift-nfd-operator

What is the status of the DaemonSets in openshift-nfd?

oc describe ds -n openshift-nfd

Comment 7 Zvonko Kosic 2019-12-12 14:49:44 UTC
Can you also try to deploy NFD via OLM, and then do the upgrade?

OLM is responsible for updating day-2 operators; deploying manually also means updating manually.
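A minimal sketch of the OLM-managed install suggested here. The package name, channel, and catalog source below are assumptions, not taken from this report; the actual values should be checked with `oc get packagemanifests -n openshift-marketplace` before applying.

```shell
# Hypothetical OLM install of NFD. The package name ("nfd"), channel
# ("4.2"), and source ("redhat-operators") are assumptions; verify them
# against `oc get packagemanifests -n openshift-marketplace` first.
oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd-operator-group
  namespace: openshift-nfd-operator
spec:
  targetNamespaces:
  - openshift-nfd-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd-operator
spec:
  name: nfd
  channel: "4.2"
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
```

With a Subscription in place, OLM owns the operator's lifecycle and can carry it across cluster upgrades, which is the point of the suggestion above.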

Comment 8 Zvonko Kosic 2019-12-12 16:55:16 UTC
This might fix the issue: https://github.com/openshift/cluster-nfd-operator/pull/45
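Per the PR titles in this bug, the fix adds watches for the operator's secondary resources, so a reconcile fires when an operand such as a DaemonSet disappears. A rough smoke test of that behavior on a cluster running a build with the fix (a sketch; the DaemonSet name assumes the default NFD objects shown earlier in this report):

```shell
# Sketch: with the secondary-resource watch in place, deleting an
# operand should trigger a reconcile that recreates it.
oc delete ds nfd-worker -n openshift-nfd
sleep 30
oc get ds -n openshift-nfd   # nfd-worker should have been recreated
```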

Comment 10 Mike Fiedler 2019-12-13 13:46:36 UTC
Verification blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1778904

Comment 11 Eric Matysek 2019-12-19 16:15:38 UTC
PR 49 is not merged into the release-4.2 branch, so it has no effect on a 4.2.x cluster

Comment 12 Mrunal Patel 2020-01-03 19:09:26 UTC
https://github.com/openshift/cluster-nfd-operator/pull/51 for release-4.2

Comment 13 Eric Paris 2020-01-07 01:09:52 UTC
I don't know why this BZ is talking about 4.2.  This is a 4.3 BZ. I see that this fix is supposedly merged and that 1778904 is VERIFIED. So I'm moving this back ON_QA.

Comment 14 Eric Matysek 2020-01-08 22:52:47 UTC
Upgraded successfully without the NFD pods disappearing!

[ematysek@jump ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-01-08-181129   True        False         13m     Cluster version is 4.3.0-0.nightly-2020-01-08-181129

[ematysek@jump ~]$ oc get pods
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-cb57h   1/1     Running   0          14m
nfd-master-pv9c7   1/1     Running   0          14m
nfd-master-rs8tz   1/1     Running   0          14m
nfd-worker-khrhj   1/1     Running   2          14m
nfd-worker-wnwlv   1/1     Running   2          14m
nfd-worker-wxzq7   1/1     Running   2          14m

Comment 16 errata-xmlrpc 2020-01-23 11:13:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

