Created attachment 1617031
openshift-nfd events

Description of problem:
After installing NFD from OperatorHub and creating a NodeFeatureDiscovery CR from OperatorHub, all of the operand pods are crashlooping with this message:

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.

CR:

apiVersion: v1
items:
- apiVersion: nfd.openshift.io/v1alpha1
  kind: NodeFeatureDiscovery
  metadata:
    creationTimestamp: "2019-09-20T05:48:58Z"
    generation: 1
    name: nfd-master-server
    namespace: openshift-nfd
    resourceVersion: "821742"
    selfLink: /apis/nfd.openshift.io/v1alpha1/namespaces/openshift-nfd/nodefeaturediscoveries/nfd-master-server
    uid: 56ed42a1-db6a-11e9-a186-0202847169d6
  spec:
    namespace: openshift-nfd
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Version-Release number of selected component (if applicable):
NFD builds:
ose-cluster-nfd-operator:v4.2.0-201909192219
ose-node-feature-discovery:v4.2.0-201909151553

How reproducible:
Always

Steps to Reproduce:
1. Set up cluster to use redhat-operators-art as the operator source (https://docs.google.com/document/d/1t81RSsZbUoGO4r5OgJ1bqAESKt2fM25MvV6pcgQUPSk/edit?usp=sharing)
2. oc adm new-project openshift-nfd
3. Go to OperatorHub and install NFD into the openshift-nfd project
4. Click on the NFD operator, go to the Node Feature Discovery tab, and create a NodeFeatureDiscovery CR in the openshift-nfd namespace
5. oc get pods -n openshift-nfd

Actual results:

NAME                            READY   STATUS             RESTARTS   AGE
nfd-master-42prh                0/1     Error              9          21m
nfd-master-dmlrn                0/1     CrashLoopBackOff   4          117s
nfd-master-sdklp                0/1     CrashLoopBackOff   8          21m
nfd-operator-6bb75fcddc-htm9g   1/1     Running            0          21m
nfd-worker-4b7f8                0/1     CrashLoopBackOff   8          21m
nfd-worker-gbmcx                0/1     Error              9          21m
nfd-worker-tdlmn                0/1     CrashLoopBackOff   8          21m
nfd-worker-vlb72                0/1     CrashLoopBackOff   8          21m

# oc logs nfd-master-42prh
2019/09/20 05:54:37 version not set! Set -ldflags "-X sigs.k8s.io/node-feature-discovery/pkg/version.version=`git describe --tags --dirty --always`" during build or run.

Expected results:
Operands running
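For readers unfamiliar with the mechanism behind the log message above, here is a minimal sketch of link-time version stamping in Go. It is not the NFD source: the placeholder value "undefined", the function name checkVersion, and the use of package main instead of a pkg/version package are all assumptions made for illustration. Only the -X linker flag and the "version not set" text come from the log.

```go
package main

import (
	"errors"
	"fmt"
)

// version is meant to be overridden at link time, for example:
//
//	go build -ldflags "-X main.version=$(git describe --tags --dirty --always)"
//
// The log in this report points at a variable in pkg/version; "main" is used
// here only to keep the sketch in a single file (assumption).
var version = "undefined"

// checkVersion returns an error while v still holds the placeholder value,
// which is the situation an image built without the -X flag would be in.
func checkVersion(v string) error {
	if v == "" || v == "undefined" {
		return errors.New("version not set")
	}
	return nil
}

func main() {
	// "v0.0.0-example" is a hypothetical stamped value used for comparison.
	for _, v := range []string{version, "v0.0.0-example"} {
		if err := checkVersion(v); err != nil {
			// A real binary would typically exit non-zero at this point,
			// which Kubernetes surfaces as Error / CrashLoopBackOff.
			fmt.Printf("%q: %v\n", v, err)
			continue
		}
		fmt.Printf("%q: ok\n", v)
	}
}
```

If this reading is right, the crash loop is a build problem rather than a cluster configuration problem, which matches the fix arriving as a rebuilt image.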
Following the steps and doc in the description, I was able to successfully install the NFD operator and the NFD operand from OperatorHub with the latest images:
- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909192219

$ oc get pods -n new-nfd-operator
NAME                            READY   STATUS    RESTARTS   AGE
nfd-operator-6f84bc4556-5vhcq   1/1     Running   0          54m

$ oc get pods -n openshift-nfd
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-5gvh5   1/1     Running   0          54m
nfd-master-gb88t   1/1     Running   0          35m
nfd-master-stphc   1/1     Running   0          54m
nfd-worker-2kp7d   1/1     Running   0          54m
nfd-worker-ckxq4   1/1     Running   1          54m
nfd-worker-plz5p   1/1     Running   1          54m

$ oc describe node | grep feature
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.MPX=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1d0f.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
nfd.node.kubernetes.io/feature-labels:
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.MPX=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1d0f.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
nfd.node.kubernetes.io/feature-labels:
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.MPX=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-hardware_multithreading=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-80.11.1.el8_0.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1d0f.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.2
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=2
nfd.node.kubernetes.io/feature-labels:
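The grep check above can also be expressed programmatically. The following is a rough sketch, assuming the node labels have already been read into a Go map (for instance from a client library or parsed JSON); the helper name featureLabels and the sample hostname label are hypothetical, and only the feature.node.kubernetes.io/ prefix is taken from the output in this comment.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// featurePrefix is the label namespace seen in the output above.
const featurePrefix = "feature.node.kubernetes.io/"

// featureLabels returns the "key=value" pairs whose key starts with
// featurePrefix, sorted so the result is stable across runs.
func featureLabels(labels map[string]string) []string {
	var out []string
	for k, v := range labels {
		if strings.HasPrefix(k, featurePrefix) {
			out = append(out, k+"="+v)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Sample data: two labels copied from the report, one unrelated label
	// with a hypothetical value.
	labels := map[string]string{
		"feature.node.kubernetes.io/cpu-cpuid.AVX2":       "true",
		"feature.node.kubernetes.io/system-os_release.ID": "rhcos",
		"kubernetes.io/hostname":                          "worker-0",
	}
	for _, l := range featureLabels(labels) {
		fmt.Println(l)
	}
}
```

An empty result for a worker node would suggest that the nfd-worker on that node has not yet reported to nfd-master.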
Correction: I had listed the wrong tag for the ose-cluster-nfd-operator image in my previous comment. These are the correct image build versions that were used in the verification steps:
- openshift/ose-node-feature-discovery:v4.2.0-201909201019
- openshift/ose-cluster-nfd-operator:v4.2.0-201909201019
In my env the worker pods are always restarting twice before going to Running state. Is that expected?
Yes, that is expected. The nfd-worker pods will restart 1-2 times because the nfd-master pods are not yet ready to serve when the workers first start.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922