Bug 1753871 - NFD operand pods crash looping after install from OperatorHub: version not set!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Feature Discovery Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Zvonko Kosic
QA Contact: Walid A.
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-09-20 06:12 UTC by Mike Fiedler
Modified: 2019-10-16 06:41 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:41:39 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift-nfd events (13.55 KB, text/plain)
2019-09-20 06:12 UTC, Mike Fiedler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:41:54 UTC

Description Mike Fiedler 2019-09-20 06:12:46 UTC
Created attachment 1617031
openshift-nfd events

Description of problem:

After installing NFD from OperatorHub and creating a NodeFeatureDiscovery CR from OperatorHub, all of the operand pods are crash looping with this message:

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.
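
The log message points at a Go link-time variable that was left empty in the image build. As a rough illustration of that mechanism (not the NFD source), the sketch below keeps a `version` variable in package `main` and refuses to start when it is empty. The variable placement, the `checkVersion` helper, and the `v0.0.0-example` tag are assumptions made for this example; the real variable is the one named in the message, `sigs.k8s.io/node-feature-discovery/pkg/version.version`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// version is a stand-in for the variable named in the log message.
// It is meant to be filled in by the linker, for example:
//   go build -ldflags "-X main.version=v0.0.0-example" .
// Without that flag it keeps its zero value, the empty string.
var version string

// checkVersion (hypothetical helper) reports an error when no version
// string was injected at link time.
func checkVersion(v string) error {
	if v == "" {
		return errors.New("version not set")
	}
	return nil
}

func main() {
	// Exit non-zero when the build did not set the version, which is the
	// kind of startup failure that shows up as a crash loop in a pod.
	if err := checkVersion(version); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("version:", version)
}
```

Running this with plain `go run .` exits with status 1, while `go run -ldflags "-X main.version=v0.0.0-example" .` prints the injected value. That matches the symptom here: a binary built without the flag fails at startup no matter how it is deployed.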

CR:

apiVersion: v1
items:
- apiVersion: nfd.openshift.io/v1alpha1
  kind: NodeFeatureDiscovery
  metadata:
    creationTimestamp: "2019-09-20T05:48:58Z"
    generation: 1
    name: nfd-master-server
    namespace: openshift-nfd
    resourceVersion: "821742"
    selfLink: /apis/nfd.openshift.io/v1alpha1/namespaces/openshift-nfd/nodefeaturediscoveries/nfd-master-server
    uid: 56ed42a1-db6a-11e9-a186-0202847169d6
  spec:
    namespace: openshift-nfd
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


 


Version-Release number of selected component (if applicable):

NFD builds:

ose-cluster-nfd-operator:v4.2.0-201909192219
ose-node-feature-discovery:v4.2.0-201909151553


How reproducible: Always


Steps to Reproduce:
1. Set up cluster to use redhat-operators-art as the operator source (https://docs.google.com/document/d/1t81RSsZbUoGO4r5OgJ1bqAESKt2fM25MvV6pcgQUPSk/edit?usp=sharing)
2. oc adm new-project openshift-nfd
3. Go to OperatorHub and install NFD into the openshift-nfd project
4. Click the NFD operator, open the Node Feature Discovery tab, and create a NodeFeatureDiscovery CR in the openshift-nfd namespace
5. oc get pods -n openshift-nfd

Actual results:

NAME                            READY   STATUS             RESTARTS   AGE
nfd-master-42prh                0/1     Error              9          21m
nfd-master-dmlrn                0/1     CrashLoopBackOff   4          117s
nfd-master-sdklp                0/1     CrashLoopBackOff   8          21m
nfd-operator-6bb75fcddc-htm9g   1/1     Running            0          21m
nfd-worker-4b7f8                0/1     CrashLoopBackOff   8          21m
nfd-worker-gbmcx                0/1     Error              9          21m
nfd-worker-tdlmn                0/1     CrashLoopBackOff   8          21m
nfd-worker-vlb72                0/1     CrashLoopBackOff   8          21m
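
For anyone triaging a similar listing, a small filter over the default `oc get pods` table makes the failing operands easier to spot. This is a minimal sketch that assumes the five-column layout shown above (NAME, READY, STATUS, RESTARTS, AGE); the `podStatus` type and `notRunning` function are names invented for this example, and the sample rows are copied from the listing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// podStatus holds the columns of interest from one table row.
type podStatus struct {
	Name     string
	Status   string
	Restarts string
}

// notRunning parses the default `oc get pods` table and returns the
// rows whose STATUS column is anything other than "Running".
func notRunning(table string) []podStatus {
	var out []podStatus
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Skip blank lines, short lines, and the header row.
		if len(f) < 5 || f[0] == "NAME" {
			continue
		}
		if f[2] != "Running" {
			out = append(out, podStatus{Name: f[0], Status: f[2], Restarts: f[3]})
		}
	}
	return out
}

func main() {
	// Sample rows taken from the listing in this report.
	table := `NAME                            READY   STATUS             RESTARTS   AGE
nfd-master-42prh                0/1     Error              9          21m
nfd-operator-6bb75fcddc-htm9g   1/1     Running            0          21m
nfd-worker-4b7f8                0/1     CrashLoopBackOff   8          21m`

	for _, p := range notRunning(table) {
		fmt.Printf("%s: %s (%s restarts)\n", p.Name, p.Status, p.Restarts)
	}
}
```

Splitting on whitespace is enough here because none of the status values in this listing contain spaces; a more robust tool would read the pod objects as JSON instead of parsing the table.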

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.

Expected results:  Operands running

Comment 4 Walid A. 2019-09-20 21:49:16 UTC
Following the steps and the doc in the description, I was able to successfully install the NFD operator and the NFD operand from OperatorHub with the latest images:
- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909192219

$ oc get pods -n new-nfd-operator
NAME                            READY   STATUS    RESTARTS   AGE
nfd-operator-6f84bc4556-5vhcq   1/1     Running   0          54m

$ oc get pods -n openshift-nfd
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-5gvh5   1/1     Running   0          54m
nfd-master-gb88t   1/1     Running   0          35m
nfd-master-stphc   1/1     Running   0          54m
nfd-worker-2kp7d   1/1     Running   0          54m
nfd-worker-ckxq4   1/1     Running   1          54m
nfd-worker-plz5p   1/1     Running   1          54m

$ oc describe node | grep feature
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:

Comment 5 Walid A. 2019-09-20 22:11:05 UTC
Correction:  I had listed the wrong tag for the ose-cluster-nfd-operator image in my previous comment.

These are the correct image build versions that were used in the verification steps:

- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909201019

Comment 6 Mike Fiedler 2019-09-23 03:37:31 UTC
In my environment, the worker pods always restart twice before reaching the Running state. Is that expected?

Comment 7 Walid A. 2019-09-23 12:02:42 UTC
Yes, that is expected. The nfd-workers will restart 1-2 times because the nfd-masters are not yet ready and serving when the workers start.

Comment 8 errata-xmlrpc 2019-10-16 06:41:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

