Description of problem:
The scenario is to install the NFD operator via OperatorHub on ppc64le with 4.7 nightly builds. The operator installs successfully, but two issues appear afterwards:
1. When creating the NodeFeatureDiscovery, the three Operand fields (Image, Image pull policy, and Namespace) are empty.
2. We need the right ppc64le image to create the operand. I tried the image quay.io/openshift/origin-node-feature-discovery:4.7 from https://github.com/openshift/cluster-nfd-operator/blob/master/manifests/olm-catalog/4.7/nfd.v4.7.0.clusterserviceversion.yaml#L30, but it does not have Power support.

Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-08-053006
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-08-053006
Kubernetes Version: v1.20.0+6313d1d

How reproducible:
Always

Steps to Reproduce:
1. Install the NFD operator via OperatorHub. The operator installs successfully.
2. Once it is installed, create a NodeFeatureDiscovery from the Node Feature Discovery tab.

Actual Results:
The Operand fields for Image, Image pull policy, and Namespace are empty. I tried entering the image from https://github.com/openshift/cluster-nfd-operator/blob/master/manifests/olm-catalog/4.7/nfd.v4.7.0.clusterserviceversion.yaml#L30; however, this image is x86-only and does not work on ppc64le.
The config looks like this:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  creationTimestamp: '2021-01-08T14:10:51Z'
  generation: 2
  name: example
  namespace: openshift-operators
  resourceVersion: '91308'
  uid: 05a427f2-b55f-43eb-bbb5-25853a1ee595
spec:
  operand:
    image: 'quay.io/openshift/origin-node-feature-discovery:4.7'
    imagePullPolicy: Always
    namespace: node-feature-discovery-operator

# oc get csv
NAME                        DISPLAY                  VERSION                 REPLACES   PHASE
nfd.4.7.0-202012212130.p0   Node Feature Discovery   4.7.0-202012212130.p0              Succeeded

The pods are failing because the NFD image quay.io/openshift/origin-node-feature-discovery:4.7 is not supported on Power.

# oc get pods -A | grep nfd
openshift-operators   nfd-master-ctfsz               0/1   CrashLoopBackOff   757   2d19h
openshift-operators   nfd-master-fc5pd               0/1   CrashLoopBackOff   797   2d19h
openshift-operators   nfd-master-nkzxl               0/1   CrashLoopBackOff   798   2d19h
openshift-operators   nfd-operator-59d645bb4-fgxlt   1/1   Running            1     2d22h
openshift-operators   nfd-worker-rlwd9               0/1   CrashLoopBackOff   757   2d19h
openshift-operators   nfd-worker-xpphq               0/1   CrashLoopBackOff   756   2d19h

# oc logs nfd-master-nkzxl -n openshift-operators
standard_init_linux.go:219: exec user process caused: exec format error

Expected Results:
The Operand fields should be pre-populated with an image that matches the cluster architecture. Please provide a ppc64le image that can be used in these fields.
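The `exec format error` in the nfd-master log is what the kernel reports (ENOEXEC) when a container's entrypoint binary was built for a different CPU architecture, which fits an x86_64-only image running on ppc64le. One way to check an image before using it as the operand is to inspect its raw manifest, for example the JSON printed by `skopeo inspect --raw docker://<image>`, and see whether it is a manifest list with a linux/ppc64le entry. The Go sketch below assumes that JSON is already at hand; the `supportsArch` helper and the sample manifest are hypothetical and only illustrate the check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestList models only the fields needed from a Docker manifest list
// or OCI image index. Single-arch manifests have no "manifests" array.
type manifestList struct {
	Manifests []struct {
		Platform struct {
			Architecture string `json:"architecture"`
			OS           string `json:"os"`
		} `json:"platform"`
	} `json:"manifests"`
}

// supportsArch reports whether the raw manifest JSON (hypothetically taken
// from `skopeo inspect --raw`) lists a linux image for arch, e.g. "ppc64le".
func supportsArch(raw []byte, arch string) (bool, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return false, err
	}
	for _, m := range ml.Manifests {
		if m.Platform.OS == "linux" && m.Platform.Architecture == arch {
			return true, nil
		}
	}
	// A plain single-arch manifest also ends up here: its architecture is
	// recorded in the image config blob, not in this JSON, so report false.
	return false, nil
}

func main() {
	// Hypothetical manifest list that only contains an amd64 entry.
	raw := []byte(`{"schemaVersion":2,"manifests":[
	  {"platform":{"architecture":"amd64","os":"linux"}}]}`)
	ok, err := supportsArch(raw, "ppc64le")
	fmt.Println(ok, err) // false <nil>
}
```

Treating a single-arch manifest as "not supported" is a deliberately conservative choice: such an image can only ever run on the one architecture recorded in its config.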
Created attachment 1746247: nfd-operand-screenshot.jpg
A specific procedure is required for testing OLM operators prior to GA:
https://docs.engineering.redhat.com/display/MULTIARCH/How+To+Test+Red+Hat+ART+Operators

The community images are not supported by Red Hat on any architecture and, in most cases, are only available for x86_64. Attempting to use them would invalidate testing.
Making Yaakov's comment public so that the Power team can use the link for testing.
We have followed the instructions. The `nfd-operator` pod is Running, but that alone does NOT perform any feature discovery. You need to "configure" it by creating an instance after installation. When you try to do that, the page asks you to fill in name, labels, namespace, etc., and also the "Image" to pull and run. I don't know whether that field is supposed to be pre-filled, as 4.6 didn't ask for anything like it. I believe this bug should be reopened and looked at by the NFD team, including building and publishing multi-arch images if that has not already been done. Thanks.
Re-opening the bug based on Hiro's comments in https://bugzilla.redhat.com/show_bug.cgi?id=1914869#c4.
After discussing this bug with the Power testing team who opened it, I am setting it as "Blocker+" because it blocks an NFD regression test case executed by the Multi-Arch Power team. If the NFD Operator team believes otherwise, please let us know and make the appropriate changes.
The actual issue is that in the default UI deployment of NFD, the master image is not set to the NODE_FEATURE_DISCOVERY_IMAGE value supplied in the operator's environment. If you add it manually, via the GUI or YAML, NFD appears to deploy and work as expected. Full discussion is here: https://coreos.slack.com/archives/C0138QKKYTU/p1610468952274000?thread_ts=1610371907.258200&cid=C0138QKKYTU

This is a regression in UI functionality compared with 4.6.
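The expected behaviour described in the comment above amounts to a simple defaulting rule: an explicit operand image wins, and an empty one falls back to the image injected into the operator through NODE_FEATURE_DISCOVERY_IMAGE. The Go sketch below illustrates that rule under stated assumptions; `OperandSpec` and `operandImage` are hypothetical stand-ins, not the cluster-nfd-operator source, and only the environment variable name comes from this report.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// OperandSpec is a hypothetical, trimmed-down view of spec.operand in a
// NodeFeatureDiscovery custom resource.
type OperandSpec struct {
	Image           string
	ImagePullPolicy string
	Namespace       string
}

// operandImage picks the image for nfd-master/nfd-worker: an explicit
// spec.operand.image wins; otherwise it falls back to the image supplied in
// the operator's environment. lookupEnv is injected (os.LookupEnv in real
// use) so the logic can be exercised without touching the process env.
func operandImage(spec OperandSpec, lookupEnv func(string) (string, bool)) (string, error) {
	if spec.Image != "" {
		return spec.Image, nil
	}
	if img, ok := lookupEnv("NODE_FEATURE_DISCOVERY_IMAGE"); ok && img != "" {
		return img, nil
	}
	return "", errors.New("no operand image: spec.operand.image is empty and NODE_FEATURE_DISCOVERY_IMAGE is unset")
}

func main() {
	img, err := operandImage(OperandSpec{}, os.LookupEnv)
	fmt.Println(img, err)
}
```

Because the fallback comes from the operator's own environment, an arch-correct image set there by the release tooling would flow through to the operand without anyone typing it into the UI.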
Created attachment 1748425: nfd-operand-4.7.0-202101161147.p0
The Operand fields (Image, Image Pull Policy, and Namespace) still appear empty with the 4.7.0-202101161147.p0 version of NFD. See screenshot attachment 1748425 for reference.

# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Kubernetes Version: v1.20.0+d9c52cc
# oc get packagemanifest | grep nfd
nfd   Red Hat Operators v4.7 Stage   19m
# oc get csv | grep nfd
nfd.4.7.0-202101161147.p0   Node Feature Discovery   4.7.0-202101161147.p0   Succeeded
# oc get pods -A | grep nfd
openshift-operators   nfd-operator-7c46664675-mvps2   1/1   Running   0   6m6s
Created attachment 1749519: NFD-operand-nfd.4.7.0-202101210137.p
Have re-deployed NFD with version nfd.4.7.0-202101210137.p. The fields are populated now, but the pre-populated image "quay.io/openshift/origin-node-feature-discovery:4.7" is not multi-arch. See screenshot attachment 1749519 for reference. We need the pre-populated image to match the cluster architecture.

Cluster build details:
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-21-052650
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-21-052650
Kubernetes Version: v1.20.0+91b6da5
# oc get packagemanifest | grep nfd
nfd   Red Hat Operators v4.7 Stage   8h
# oc get csv | grep nfd
nfd.4.7.0-202101210137.p0   Node Feature Discovery   4.7.0-202101210137.p0   Succeeded
Verified on AWS nightly build `4.7.0-0.nightly-2021-01-22-104107`: the NodeFeatureDiscovery instance fields are pre-populated as expected. NFD CSV version: nfd.4.7.0-202101230053.p0
When a release is GA, the production versions of operators are pulled from the Red Hat registry (e.g. registry.redhat.io/openshift4/ose-cluster-nfd-operator: https://catalog.redhat.com/software/containers/openshift4/ose-cluster-nfd-operator/5d9e23f1bed8bd2245d9378c?container-tabs=overview). These ART-built images support all Red Hat-supported architectures. Before GA, you can manually use an image from https://brewweb.engineering.redhat.com/brew/search?match=glob&type=build&terms=node-feature-discovery-container-*4.7*
@pdsilva, do you have an OCP environment on ppc64le to verify the fix? Thanks.
ART provides the multi-arch team with builds that point directly to the brew registry for testing. The latest builds should have been pointing to registry.redhat.io. Does this only start happening when the builds get pushed to stage?
The problem here is that the operator image appears to set the default NFD operand image to the quay.io origin URL instead of the corresponding registry.redhat.io address. It never used to do this; it has always pointed directly to the registry.redhat.io image for the release in question. This appears to be a bug. I'm having Hiro attempt to reproduce it with the CFC stage index image: https://docs.engineering.redhat.com/display/CFC/Test

If this fails, this operator may go live populating the operand image with the upstream/origin image. That would not be detected on x86, but the image doesn't work on other architectures.
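A tester could catch this regression mechanically by classifying the pre-populated operand image by registry prefix, flagging the quay.io origin image rather than noticing it only when pods crash on a non-x86 cluster. The Go sketch below is a hypothetical helper; the two prefixes are taken from image names in this bug, and a real policy (brew or stage registries during pre-GA testing, mirrored registries) would need its own list.

```go
package main

import (
	"fmt"
	"strings"
)

// productionPrefixes: registries treated as acceptable for a GA operand
// image. An assumption based on the images named in this bug.
var productionPrefixes = []string{"registry.redhat.io/"}

// communityPrefixes: upstream/origin locations that are unsupported and,
// in most cases, x86_64-only. Also an assumption drawn from this bug.
var communityPrefixes = []string{"quay.io/openshift/origin-"}

// classifyImage labels an image reference "production", "community", or
// "unknown" purely by prefix; it does not contact any registry.
func classifyImage(ref string) string {
	for _, p := range productionPrefixes {
		if strings.HasPrefix(ref, p) {
			return "production"
		}
	}
	for _, p := range communityPrefixes {
		if strings.HasPrefix(ref, p) {
			return "community"
		}
	}
	return "unknown"
}

func main() {
	for _, ref := range []string{
		"quay.io/openshift/origin-node-feature-discovery:4.7",
		"registry.redhat.io/openshift4/ose-node-feature-discovery:v4.7.0",
	} {
		fmt.Println(classifyImage(ref), ref)
	}
}
```

A prefix check is cheap but only as good as its lists; combining it with an architecture check on the image manifest gives stronger evidence that the operand will actually run.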
The latest staging index image in the 4.7 channel appears to be from 12/21/20 and does not include the fix above.
Have verified NFD installation on OCP 4.7.0-rc.1 on Power with the staging OperatorSource. The installation is successful, and the operand image shows registry.redhat.io/openshift4/ose-node-feature-discovery:v4.7.0. I am currently using the 202102130115.p0 image from the brew registry, "registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:417e5b1e0c60f67b82462e3bd7a678ae913968b5908b6b717f706ed34520d071", and with it the pods are in the Running state.

# oc version
Client Version: 4.7.0-rc.1
Server Version: 4.7.0-rc.1
Kubernetes Version: v1.20.0+ba45583
# oc get csv | grep nfd
nfd.4.7.0-202102111715.p0   Node Feature Discovery   4.7.0-202102111715.p0   Succeeded
# oc get pods -A | grep nfd
openshift-operators   nfd-master-44nvn                1/1   Running   0   49m
openshift-operators   nfd-master-5lhhq                1/1   Running   0   47m
openshift-operators   nfd-master-dnnb7                1/1   Running   0   48m
openshift-operators   nfd-operator-65955df6f4-gfp94   1/1   Running   0   36h
openshift-operators   nfd-worker-ct9tr                1/1   Running   0   45m
openshift-operators   nfd-worker-g5wtp                1/1   Running   0   22m
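The before and after pod listings in this bug (CrashLoopBackOff with 0/1 ready versus Running with 1/1 ready) can be checked mechanically when re-verifying on each architecture. The Go sketch below parses text in the column layout of `oc get pods -A` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE) and returns the pods that are not fully ready; the `notReady` helper and the reliance on that exact column order are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// notReady scans `oc get pods -A` style output (assumed column order:
// NAMESPACE NAME READY STATUS RESTARTS AGE) and returns "name (status,
// ready)" for each pod whose READY count is incomplete or whose STATUS is
// not Running. Only the first four columns are used.
func notReady(output string) []string {
	var bad []string
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 || f[0] == "NAMESPACE" {
			continue // skip blank lines and the header row
		}
		name, ready, status := f[1], f[2], f[3]
		parts := strings.SplitN(ready, "/", 2)
		if len(parts) != 2 || parts[0] != parts[1] || status != "Running" {
			bad = append(bad, fmt.Sprintf("%s (%s, %s)", name, status, ready))
		}
	}
	return bad
}

func main() {
	out := `openshift-operators nfd-master-nkzxl 0/1 CrashLoopBackOff 798 2d19h
openshift-operators nfd-operator-59d645bb4-fgxlt 1/1 Running 1 2d22h`
	fmt.Println(notReady(out)) // [nfd-master-nkzxl (CrashLoopBackOff, 0/1)]
}
```

Parsing the human-readable table keeps the sketch dependency-free; for anything beyond a quick check, `oc get pods -o json` gives structured container statuses that do not depend on column layout.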
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 extras and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5635
The needinfo requests on this closed bug have been removed because they have been unresolved for 500 days.