Bug 1753871

Summary: NFD operand pods crash looping after install from OperatorHub: version not set!
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Node Feature Discovery Operator
Assignee: Zvonko Kosic <zkosic>
Status: CLOSED ERRATA
QA Contact: Walid A. <wabouham>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: sejug, wabouham
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:41:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: openshift-nfd events; Flags: none

Description Mike Fiedler 2019-09-20 06:12:46 UTC
Created attachment 1617031 [details]
openshift-nfd events

Description of problem:

After installing NFD from OperatorHub and creating a NodeFeatureDiscovery CR from OperatorHub, all of the operand pods are crash looping with this message:

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.

CR:

apiVersion: v1
items:
- apiVersion: nfd.openshift.io/v1alpha1
  kind: NodeFeatureDiscovery
  metadata:
    creationTimestamp: "2019-09-20T05:48:58Z"
    generation: 1
    name: nfd-master-server
    namespace: openshift-nfd
    resourceVersion: "821742"
    selfLink: /apis/nfd.openshift.io/v1alpha1/namespaces/openshift-nfd/nodefeaturediscoveries/nfd-master-server
    uid: 56ed42a1-db6a-11e9-a186-0202847169d6
  spec:
    namespace: openshift-nfd
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Version-Release number of selected component (if applicable):

NFD builds:

ose-cluster-nfd-operator:v4.2.0-201909192219
ose-node-feature-discovery:v4.2.0-201909151553


How reproducible: Always


Steps to Reproduce:
1. Set up cluster to use redhat-operators-art as the operator source (https://docs.google.com/document/d/1t81RSsZbUoGO4r5OgJ1bqAESKt2fM25MvV6pcgQUPSk/edit?usp=sharing)
2. oc adm new-project openshift-nfd
3. Go to OperatorHub and install NFD into the openshift-nfd project
4. Click on the NFD operator, go to the Node Feature Discovery tab, and create a NodeFeatureDiscovery CR in the openshift-nfd namespace
5. oc get pods -n openshift-nfd

Actual results:

NAME                            READY   STATUS             RESTARTS   AGE
nfd-master-42prh                0/1     Error              9          21m
nfd-master-dmlrn                0/1     CrashLoopBackOff   4          117s
nfd-master-sdklp                0/1     CrashLoopBackOff   8          21m
nfd-operator-6bb75fcddc-htm9g   1/1     Running            0          21m
nfd-worker-4b7f8                0/1     CrashLoopBackOff   8          21m
nfd-worker-gbmcx                0/1     Error              9          21m
nfd-worker-tdlmn                0/1     CrashLoopBackOff   8          21m
nfd-worker-vlb72                0/1     CrashLoopBackOff   8          21m
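
For anyone scripting this check, the table above can be filtered mechanically. The following is only a sketch in Go: the type podStatus and the function unhealthyPods are hypothetical names, and it assumes the default column layout of `oc get pods` (NAME, READY, STATUS, RESTARTS, AGE) as shown in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// podStatus holds the first four columns of one row of `oc get pods` output.
// (Hypothetical helper type for this sketch.)
type podStatus struct {
	Name     string
	Ready    string
	Status   string
	Restarts string
}

// unhealthyPods parses the default table printed by `oc get pods` and returns
// every pod whose STATUS column is anything other than "Running".
func unhealthyPods(table string) []podStatus {
	var bad []podStatus
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Skip blank or truncated lines and the header row.
		if len(f) < 4 || f[0] == "NAME" {
			continue
		}
		if f[2] != "Running" {
			bad = append(bad, podStatus{Name: f[0], Ready: f[1], Status: f[2], Restarts: f[3]})
		}
	}
	return bad
}

func main() {
	// Three rows copied from the output in this report.
	out := `NAME                            READY   STATUS             RESTARTS   AGE
nfd-master-42prh                0/1     Error              9          21m
nfd-operator-6bb75fcddc-htm9g   1/1     Running            0          21m
nfd-worker-4b7f8                0/1     CrashLoopBackOff   8          21m`
	for _, p := range unhealthyPods(out) {
		fmt.Printf("%s: %s (restarts: %s)\n", p.Name, p.Status, p.Restarts)
	}
}
```

The sketch treats any status other than Running as unhealthy, which is enough for this case but would also flag pods in other states such as Completed.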

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.

Expected results:  Operands running
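
For context on the log message: it asks for a version string to be injected at link time with -ldflags "-X ...". The following is a rough sketch of that general Go pattern, not the NFD source. The variable name version in package main, the helper checkVersion, and the placeholder value "undefined" are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// version is meant to be overwritten by the linker, for example:
//
//	go build -ldflags "-X main.version=$(git describe --tags --dirty --always)"
//
// Without that flag it keeps this placeholder value (an assumption of this sketch).
var version = "undefined"

// checkVersion returns an error when the binary was built without a version.
// (Hypothetical helper for this sketch.)
func checkVersion(v string) error {
	if v == "" || v == "undefined" {
		return errors.New("version not set")
	}
	return nil
}

func main() {
	if err := checkVersion(version); err != nil {
		// A binary that exits non-zero at this point is restarted by
		// Kubernetes over and over; this sketch only prints the failure.
		fmt.Println("startup check failed:", err)
		return
	}
	fmt.Println("version:", version)
}
```

Built with a plain `go build`, the sketch reports the failed startup check. Built with `go build -ldflags "-X main.version=v4.2.0"`, it prints the version instead. An image whose build omits the flag would fail such a check on every start, which is consistent with the Error and CrashLoopBackOff states listed under Actual results.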

Comment 4 Walid A. 2019-09-20 21:49:16 UTC
Following the steps and doc in the description, I was able to successfully install the NFD operator and NFD operand from OperatorHub with the latest images:
- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909192219

$ oc get pods -n new-nfd-operator
NAME                            READY   STATUS    RESTARTS   AGE
nfd-operator-6f84bc4556-5vhcq   1/1     Running   0          54m

$ oc get pods -n openshift-nfd
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-5gvh5   1/1     Running   0          54m
nfd-master-gb88t   1/1     Running   0          35m
nfd-master-stphc   1/1     Running   0          54m
nfd-worker-2kp7d   1/1     Running   0          54m
nfd-worker-ckxq4   1/1     Running   1          54m
nfd-worker-plz5p   1/1     Running   1          54m

$ oc describe node | grep feature
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.HLE=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-cpuid.RTM=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
                    nfd.node.kubernetes.io/feature-labels:

Comment 5 Walid A. 2019-09-20 22:11:05 UTC
Correction:  I had listed the wrong tag for the ose-cluster-nfd-operator image in my previous comment.

These are the correct image build versions that were used in the verification steps:

- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909201019

Comment 6 Mike Fiedler 2019-09-23 03:37:31 UTC
In my env the worker pods always restart twice before reaching the Running state.  Is that expected?

Comment 7 Walid A. 2019-09-23 12:02:42 UTC
Yes, that is expected. The nfd-worker pods will have 1-2 restarts because the nfd-master pods are not yet ready to serve.

Comment 8 errata-xmlrpc 2019-10-16 06:41:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922