+++ This bug was initially created as a clone of Bug #1986305 +++

Description of problem:

When deploying and upgrading the NFD operator in a disconnected environment, the operand image always has to be edited manually. If the user removes the image from the operand section, the config is applied on the cluster, but the operator fails to create the DaemonSet with the following error:

2021-07-26T16:15:50.238Z	ERROR	controller-runtime.manager.controller.nodefeaturediscovery	Reconciler error	{"reconciler group": "nfd.openshift.io", "reconciler kind": "NodeFeatureDiscovery", "name": "nfd-instance", "namespace": "openshift-nfd", "error": "DaemonSet.apps \"nfd-master\" is invalid: spec.template.spec.containers[0].image: Required value"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:267
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/src/github.com/openshift/cluster-nfd-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99

How reproducible:
100%

Steps to Reproduce:
1. Deploy the NFD operator.
2. Deploy the NodeFeatureDiscovery CR without an image in the operand section.

Expected results:
If the image field is empty, use the environment variable that already exists in the nfd-operator deployment. This picks up the operand image that was built for the specific bundle version and mirrored from the bundle. It also allows switching to new images when the operator is upgraded, without requiring the user to check the CSV for related images and update the config. The operator deployment already carries the needed variable:

        - name: OPERATOR_NAME
          value: cluster-nfd-operator
        - name: NODE_FEATURE_DISCOVERY_IMAGE
          value: registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:5d9be97edd55051934bc9129fed627ef24211e8c1758e52253f64d0eeffadf3b
        - name: OPERATOR_CONDITION_NAME
          value: node-feature-discovery-operator.v4.8.
        image: registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:1d1446af00893668d34452f367914ea2f43ad93a693ee7bf58905208b558ec79
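A minimal sketch of the requested fallback behavior, assuming the operator simply prefers the operand image from the CR and otherwise reads NODE_FEATURE_DISCOVERY_IMAGE from its own environment. The operandImage helper below is illustrative only and is not the actual cluster-nfd-operator code:

package main

import (
	"fmt"
	"os"
)

// operandImage returns the image to use for the nfd-master/nfd-worker
// DaemonSets: the value from the NodeFeatureDiscovery CR if it is set,
// otherwise the image injected into the operator Deployment via the
// NODE_FEATURE_DISCOVERY_IMAGE environment variable (which is mirrored
// together with the bundle on disconnected clusters).
func operandImage(crImage string) (string, error) {
	if crImage != "" {
		return crImage, nil
	}
	if img := os.Getenv("NODE_FEATURE_DISCOVERY_IMAGE"); img != "" {
		return img, nil
	}
	return "", fmt.Errorf("operand image not set in the CR and NODE_FEATURE_DISCOVERY_IMAGE is empty")
}

func main() {
	// Empty CR image, as in this bug report.
	img, err := operandImage("")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using operand image:", img)
}

With a fallback like this, the DaemonSet always gets a non-empty spec.template.spec.containers[0].image, so the "Required value" reconciler error above would not occur.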
Verified on OCP 4.8.0-0.nightly-2021-08-17-004424 on an AWS IPI cluster. The NFD NodeFeatureDiscovery instance was created when the CR was deployed from a YAML file with oc apply -f, with the image name removed.

# cat 010_namespace.yml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd

# oc apply -f 010_namespace.yml

Created a new NFD bundle from the release-4.8 branch of a clone of the NFD operator GitHub repo:

# git clone https://github.com/openshift/cluster-nfd-operator.git
# cd cluster-nfd-operator
# git checkout release-4.8
# export REGISTRY_AUTH_FILE=/root/<pull-secret.json>
# ORG=<username> PULLPOLICY=Always IMAGE_REGISTRY='quay.io/<username>' IMAGE_PUSH_CMD='podman push' IMAGE_BUILD_CMD='podman build' make image
# podman tag <nfd-operator-image-id> quay.io/<username>/cluster-nfd-operator:4.8.5
# podman login -u <username> quay.io
# podman push quay.io/<username>/cluster-nfd-operator:4.8.5
# export VERSION=4.8.5 ORG=<username> BUNDLE_IMG='quay.io/<username>/nfd-operator-bundle:4.8.5' PULLPOLICY=Always IMAGE_REGISTRY='quay.io/<username>' IMAGE_PUSH_CMD='podman push' IMAGE_BUILD_CMD='podman build' make bundle bundle-build bundle-push VERSION=4.8.5

operator-sdk run bundle "quay.io/<username>/nfd-operator-bundle:4.8.5" -n openshift-nfd
INFO[0006] Successfully created registry pod: quay-io-<username>-nfd-operator-bundle-4-8-5
INFO[0006] Created CatalogSource: node-feature-discovery-operator-catalog
INFO[0006] OperatorGroup "operator-sdk-og" created
INFO[0006] Created Subscription: node-feature-discovery-operator-v4-8-5-sub
INFO[0010] Approved InstallPlan install-wwvfz for the Subscription: node-feature-discovery-operator-v4-8-5-sub
INFO[0010] Waiting for ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" to reach 'Succeeded' phase
INFO[0010] Waiting for ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" to appear
INFO[0017] Found ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" phase: Pending
INFO[0018] Found ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" phase: InstallReady
INFO[0019] Found ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" phase: Installing
INFO[0049] Found ClusterServiceVersion "openshift-nfd/node-feature-discovery-operator.v4.8.5" phase: Succeeded
INFO[0049] OLM has successfully installed "node-feature-discovery-operator.v4.8.5"

oc get clusterserviceversion.operators.coreos.com/node-feature-discovery-operator.v4.8.5 -n openshift-nfd -o json | jq -r '.metadata.annotations."alm-examples"' | jq .[0] | jq --arg ns openshift-nfd '.metadata.namespace = $ns' | jq --arg ns openshift-nfd '.spec.namespace = $ns' > nfd_cr_4.8.5.json

# remove the image name from nfd_cr_4.8.5.json and rename the CR file

cat nfd_cr_4.8.5.json_removed_operand_image
{
  "apiVersion": "nfd.openshift.io/v1",
  "kind": "NodeFeatureDiscovery",
  "metadata": {
    "name": "nfd-instance",
    "namespace": "openshift-nfd"
  },
  "spec": {
    "customConfig": {
      "configData": "# - name: \"more.kernel.features\"\n# matchOn:\n# - loadedKMod: [\"example_kmod3\"]\n# - name: \"more.features.by.nodename\"\n# value: customValue\n# matchOn:\n# - nodename: [\"special-.*-node-.*\"]\n"
    },
    "instance": "",
    "operand": {
      "imagePullPolicy": "Always",
      "namespace": "openshift-nfd"
    },
    "workerConfig": {
      "configData": "core:\n# labelWhiteList:\n# noPublish: false\n sleepInterval: 60s\n# sources: [all]\n# klog:\n# addDirHeader: false\n# alsologtostderr: false\n# logBacktraceAt:\n# logtostderr: true\n# skipHeaders: false\n# stderrthreshold: 2\n# v: 0\n# vmodule:\n## NOTE: the following options are not dynamically run-time configurable\n## and require a nfd-worker restart to take effect after being changed\n# logDir:\n# logFile:\n# logFileMaxSize: 1800\n# skipLogHeaders: false\nsources:\n# cpu:\n# cpuid:\n## NOTE: whitelist has priority over blacklist\n# attributeBlacklist:\n# - \"BMI1\"\n# - \"BMI2\"\n# - \"CLMUL\"\n# - \"CMOV\"\n# - \"CX16\"\n# - \"ERMS\"\n# - \"F16C\"\n# - \"HTT\"\n# - \"LZCNT\"\n# - \"MMX\"\n# - \"MMXEXT\"\n# - \"NX\"\n# - \"POPCNT\"\n# - \"RDRAND\"\n# - \"RDSEED\"\n# - \"RDTSCP\"\n# - \"SGX\"\n# - \"SSE\"\n# - \"SSE2\"\n# - \"SSE3\"\n# - \"SSE4.1\"\n# - \"SSE4.2\"\n# - \"SSSE3\"\n# attributeWhitelist:\n# kernel:\n# kconfigFile: \"/path/to/kconfig\"\n# configOpts:\n# - \"NO_HZ\"\n# - \"X86\"\n# - \"DMI\"\n pci:\n deviceClassWhitelist:\n - \"0200\"\n - \"03\"\n - \"12\"\n deviceLabelFields:\n# - \"class\"\n - \"vendor\"\n# - \"device\"\n# - \"subsystem_vendor\"\n# - \"subsystem_device\"\n# usb:\n# deviceClassWhitelist:\n# - \"0e\"\n# - \"ef\"\n# - \"fe\"\n# - \"ff\"\n# deviceLabelFields:\n# - \"class\"\n# - \"vendor\"\n# - \"device\"\n# custom:\n# - name: \"my.kernel.feature\"\n# matchOn:\n# - loadedKMod: [\"example_kmod1\", \"example_kmod2\"]\n# - name: \"my.pci.feature\"\n# matchOn:\n# - pciId:\n# class: [\"0200\"]\n# vendor: [\"15b3\"]\n# device: [\"1014\", \"1017\"]\n# - pciId :\n# vendor: [\"8086\"]\n# device: [\"1000\", \"1100\"]\n# - name: \"my.usb.feature\"\n# matchOn:\n# - usbId:\n# class: [\"ff\"]\n# vendor: [\"03e7\"]\n# device: [\"2485\"]\n# - usbId:\n# class: [\"fe\"]\n# vendor: [\"1a6e\"]\n# device: [\"089a\"]\n# - name: \"my.combined.feature\"\n# matchOn:\n# - pciId:\n# vendor: [\"15b3\"]\n# device: [\"1014\", \"1017\"]\n# loadedKMod : [\"vendor_kmod1\", \"vendor_kmod2\"]\n"
    },
    "namespace": "openshift-nfd"
  }
}

# oc apply -f nfd_cr_4.8.5.json_removed_operand_image

# oc get pods -n openshift-nfd
NAME                                                              READY   STATUS      RESTARTS   AGE
f0641f817e848f07b6508f21783cf46079edac611f247dc064a30e7d8d4r2q9   0/1     Completed   0          100m
nfd-controller-manager-6dc6c6dc5c-v6fmk                           2/2     Running     0          100m
nfd-master-fwwbf                                                  1/1     Running     0          36m
nfd-master-lh628                                                  1/1     Running     0          36m
nfd-master-wfmz8                                                  1/1     Running     0          36m
nfd-worker-cgqnh                                                  1/1     Running     0          36m
nfd-worker-klps4                                                  1/1     Running     0          36m
nfd-worker-xnm7j                                                  1/1     Running     0          36m
quay-io-<username>-nfd-operator-bundle-4-8-5                      1/1     Running     0          100m

# oc describe pod/nfd-worker-cgqnh -n openshift-nfd
Name:         nfd-worker-cgqnh
Namespace:    openshift-nfd
Priority:     0
Node:         ip-10-0-193-61.us-east-2.compute.internal/10.0.193.61
Start Time:   Tue, 17 Aug 2021 20:51:13 +0000
Labels:       app=nfd-worker
              controller-revision-hash=669869d9b9
              pod-template-generation=1
Annotations:  openshift.io/scc: nfd-worker
Status:       Running
IP:           10.0.193.61
IPs:
  IP:           10.0.193.61
Controlled By:  DaemonSet/nfd-worker
Containers:
  nfd-worker:
    Container ID:  cri-o://ab8d1ddc97d4f13dcea1b88538dc2870dcba5abb88f13a33bb2da7f7ffb39f2b
    Image:         quay.io/openshift/origin-node-feature-discovery:4.8
    Image ID:      quay.io/openshift/origin-node-feature-discovery@sha256:2cd02ec6e65e19b26d2e79898ab0f535916cf1f3e0bc894c1190370100b9719f
    Port:          <none>
    Host Port:     <none>
    Command:
      nfd-worker
    Args:
      --server=nfd-master:$(NFD_MASTER_SERVICE_PORT)
    State:          Running
      Started:      Tue, 17 Aug 2021 20:51:22 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /etc/kubernetes/node-feature-discovery from nfd-worker-config (rw)
      /etc/kubernetes/node-feature-discovery/custom.d/custom-rules from custom-config (ro)
      /etc/kubernetes/node-feature-discovery/features.d from nfd-features (rw)
      /etc/kubernetes/node-feature-discovery/source.d from nfd-hooks (rw)
      /host-boot from host-boot (ro)
      /host-etc/os-release from host-os-release (ro)
      /host-sys from host-sys (rw)
      /host-usr/lib from host-usr-lib (ro)
      /host-usr/src from host-usr-src (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c9drg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  host-boot:
    Type:          HostPath (bare host directory volume)
    Path:          /boot
    HostPathType:
  host-os-release:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/os-release
    HostPathType:
  host-sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  host-usr-lib:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib
    HostPathType:
  host-usr-src:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/src
    HostPathType:
  nfd-hooks:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/node-feature-discovery/source.d
    HostPathType:
  nfd-features:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/node-feature-discovery/features.d
    HostPathType:
  nfd-worker-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nfd-worker
    Optional:  false
  custom-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nfd-worker
    Optional:  false
  kube-api-access-c9drg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  36m   default-scheduler  Successfully assigned openshift-nfd/nfd-worker-cgqnh to ip-10-0-193-61.us-east-2.compute.internal
  Normal  Pulling    36m   kubelet            Pulling image "quay.io/openshift/origin-node-feature-discovery:4.8"
  Normal  Pulled     36m   kubelet            Successfully pulled image "quay.io/openshift/origin-node-feature-discovery:4.8" in 8.066785052s
  Normal  Created    36m   kubelet            Created container nfd-worker
  Normal  Started    36m   kubelet            Started container nfd-worker
#
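A quick way to double-check which operand image the operator actually rendered into the DaemonSet, in addition to the pod description above, is to read it straight from the DaemonSet spec (the exact jsonpath expression here is illustrative; the nfd-worker DaemonSet and openshift-nfd namespace are the ones shown above):

# oc get daemonset/nfd-worker -n openshift-nfd -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

With the operand image removed from the CR, this should print the image the operator fell back to (quay.io/openshift/origin-node-feature-discovery:4.8 in this verification run).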
OpenShift engineering has decided NOT to ship 4.8.6 on 8/23 due to the following issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1995785

All of the fixes that were part of it will now be included in 4.8.7 on 8/30.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3249