Bug 1914869 - OCP 4.7 NFD - Operand configuration options for NodeFeatureDiscovery are empty, no supported image for ppc64le
Summary: OCP 4.7 NFD - Operand configuration options for NodeFeatureDiscovery are empt...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Feature Discovery Operator
Version: 4.7
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Carlos Eduardo Arango Gutierrez
QA Contact: pdsilva
URL:
Whiteboard:
Depends On: 1927489
Blocks:
Reported: 2021-01-11 11:35 UTC by pdsilva
Modified: 2023-09-15 00:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1927489
Environment:
Last Closed: 2021-02-24 15:01:39 UTC
Target Upstream Version:
Embargoed:


Attachments
nfd-operand-screenshot.jpg (61.25 KB, image/jpeg)
2021-01-11 12:32 UTC, pdsilva
nfd-operand-4.7.0-202101161147.p0 (103.31 KB, image/jpeg)
2021-01-18 10:16 UTC, pdsilva
NFD-operand-nfd.4.7.0-202101210137.p (31.45 KB, image/jpeg)
2021-01-21 20:28 UTC, pdsilva


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-nfd-operator pull 121 | 0 | None | closed | [release-4.7] Bug 1914869: fix alm-example | 2021-02-17 18:19:03 UTC
Github | openshift cluster-nfd-operator pull 129 | 0 | None | closed | Bug 1914869: Fix alm-example json parsing | 2021-02-15 15:59:51 UTC
Github | openshift cluster-nfd-operator pull 137 | 0 | None | closed | Bug 1914869: replace latest for 4.7, so is easier to replace images by ART | 2021-02-15 15:59:51 UTC
Red Hat Product Errata | RHSA-2020:5635 | 0 | None | None | None | 2021-02-24 15:02:57 UTC

Description pdsilva 2021-01-11 11:35:10 UTC
Description of problem:
The scenario is installing the NFD operator via OperatorHub on ppc64le with 4.7 nightly builds. The operator installs successfully, but two issues appear afterwards:

1. When creating the NodeFeatureDiscovery instance, the three Operand fields (Image, Image pull policy and Namespace) are empty.

2. A ppc64le image is needed to create the operand. I tried the image quay.io/openshift/origin-node-feature-discovery:4.7 from https://github.com/openshift/cluster-nfd-operator/blob/master/manifests/olm-catalog/4.7/nfd.v4.7.0.clusterserviceversion.yaml#L30 but it does not have Power support.
 
Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-08-053006
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-08-053006
Kubernetes Version: v1.20.0+6313d1d


How reproducible: Always


Steps to Reproduce:

1. Install the NFD operator via OperatorHub. Installation of the operator is successful.

2. Once it is installed, create NodeFeatureDiscovery from the Node Feature Discovery tab.

Actual Results:

The Operand fields for the Image, Image pull policy and Namespace are empty.
I tried entering the image from https://github.com/openshift/cluster-nfd-operator/blob/master/manifests/olm-catalog/4.7/nfd.v4.7.0.clusterserviceversion.yaml#L30. However, this image is x86 only and does not work on ppc64le.

The config looks like this:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  creationTimestamp: '2021-01-08T14:10:51Z'
  generation: 2
  name: example
  namespace: openshift-operators
  resourceVersion: '91308'
  uid: 05a427f2-b55f-43eb-bbb5-25853a1ee595
spec:
  operand:
    image: 'quay.io/openshift/origin-node-feature-discovery:4.7'
    imagePullPolicy: Always
    namespace: node-feature-discovery-operator

# oc get csv
NAME                        DISPLAY                  VERSION                 REPLACES   PHASE
nfd.4.7.0-202012212130.p0   Node Feature Discovery   4.7.0-202012212130.p0              Succeeded


The pods are failing because the NFD image quay.io/openshift/origin-node-feature-discovery:4.7 is not supported on Power.

# oc get pods -A | grep nfd
openshift-operators                                nfd-master-ctfsz                                                  0/1     CrashLoopBackOff   757        2d19h
openshift-operators                                nfd-master-fc5pd                                                  0/1     CrashLoopBackOff   797        2d19h
openshift-operators                                nfd-master-nkzxl                                                  0/1     CrashLoopBackOff   798        2d19h
openshift-operators                                nfd-operator-59d645bb4-fgxlt                                      1/1     Running            1          2d22h
openshift-operators                                nfd-worker-rlwd9                                                  0/1     CrashLoopBackOff   757        2d19h
openshift-operators                                nfd-worker-xpphq                                                  0/1     CrashLoopBackOff   756        2d19h


# oc logs nfd-master-nkzxl  -n openshift-operators
standard_init_linux.go:219: exec user process caused: exec format error
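
An "exec format error" like the one above typically means the container's binary was built for a different CPU architecture. One way to check an image before using it is to look at its manifest list, for example the raw JSON printed by a tool such as `skopeo inspect --raw`. The following is a minimal sketch, assuming the standard manifest-list layout (`manifests[].platform.architecture`). The function name and the sample data are hypothetical and do not reflect the real manifest of any image mentioned in this bug.

```python
import json


def supported_architectures(raw_manifest: str) -> set[str]:
    """Return the architectures named in a manifest list.

    A single-architecture image manifest has no "manifests" key,
    so the result is an empty set in that case.
    """
    doc = json.loads(raw_manifest)
    return {
        entry["platform"]["architecture"]
        for entry in doc.get("manifests", [])
        if "platform" in entry
    }


# Hypothetical sample data for illustration only.
sample = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:bbb", "platform": {"architecture": "ppc64le", "os": "linux"}},
    ],
})

if __name__ == "__main__":
    print("ppc64le" in supported_architectures(sample))
```

An empty result suggests a single-architecture image, in which case the architecture has to be read from the image config instead.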


Expected Results:
The Operand fields should be pre-populated with an image that matches the cluster architecture. Please provide a ppc64le image to be used in these fields.

Comment 1 pdsilva 2021-01-11 12:32:19 UTC
Created attachment 1746247 [details]
nfd-operand-screenshot.jpg

Comment 2 Yaakov Selkowitz 2021-01-11 13:47:45 UTC
A specific procedure is required for testing OLM operators prior to GA:

https://docs.engineering.redhat.com/display/MULTIARCH/How+To+Test+Red+Hat+ART+Operators

The community images are not supported by Red Hat on any architecture, and in most cases are only available for x86_64.  Attempting to use them would invalidate testing.

Comment 3 Dan Li 2021-01-11 14:04:08 UTC
Making Yaakov's comment un-private so that the Power team could leverage the link for testing.

Comment 4 Hiro Miyamoto 2021-01-11 18:10:22 UTC
We followed those instructions. The `nfd-operator` pod is Running, but on its own it does not perform any feature discovery: you need to configure it by creating an instance after installation. When you try to do that, the page asks you to fill in the name, label, namespace, etc., and also the "Image" to pull and run. I don't know whether that is supposed to be pre-filled, as 4.6 didn't ask for such a thing.

I believe this bug should be reopened and looked at by the NFD team, and multi-arch images should be built and posted, if that has not already been done. Thanks.

Comment 5 pdsilva 2021-01-12 11:35:47 UTC
Re-opening the bug based on Hiro's comments in https://bugzilla.redhat.com/show_bug.cgi?id=1914869#c4.

Comment 6 Dan Li 2021-01-12 15:21:41 UTC
After discussing this bug with the Power testing team who opened this bug, I am setting this bug as a "Blocker+" as the bug is blocking an NFD regression test case executed by the Multi-Arch Power team; however, if the NFD Operator team believes otherwise, please feel free to let us know and make the appropriate changes.

Comment 8 Jeremy Poulin 2021-01-12 16:45:47 UTC
The issue here is actually that in the default UI deployment of NFD, the master image is not set to the NODE_FEATURE_DISCOVERY_IMAGE supplied in the operator environment.
If you add it in manually, via the GUI or yaml, nfd appears to deploy and work as expected.

Full discussion is here:
https://coreos.slack.com/archives/C0138QKKYTU/p1610468952274000?thread_ts=1610371907.258200&cid=C0138QKKYTU

This is a regression in terms of the UI functionality from 4.6.
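
The defaulting behaviour described in this comment could be sketched roughly as follows. This is an illustration only, not the operator's actual code: the function name `apply_operand_defaults` is hypothetical, and the only details taken from this report are the `NODE_FEATURE_DISCOVERY_IMAGE` environment variable and the `spec.operand.image` field.

```python
import copy
import os


def apply_operand_defaults(cr: dict, env=os.environ) -> dict:
    """Return a copy of the custom resource with an empty operand image
    filled in from NODE_FEATURE_DISCOVERY_IMAGE, if that variable is set.

    An image that the user already supplied is left untouched.
    """
    result = copy.deepcopy(cr)
    operand = result.setdefault("spec", {}).setdefault("operand", {})
    if not operand.get("image"):
        image = env.get("NODE_FEATURE_DISCOVERY_IMAGE")
        if image:
            operand["image"] = image
    return result


if __name__ == "__main__":
    # Example values; the image string is the one reported later in this bug.
    example_env = {
        "NODE_FEATURE_DISCOVERY_IMAGE":
            "registry.redhat.io/openshift4/ose-node-feature-discovery:v4.7.0"
    }
    empty_cr = {"spec": {"operand": {"image": "", "imagePullPolicy": "Always"}}}
    print(apply_operand_defaults(empty_cr, example_env)["spec"]["operand"]["image"])
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.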

Comment 10 pdsilva 2021-01-18 10:16:07 UTC
Created attachment 1748425 [details]
nfd-operand-4.7.0-202101161147.p0

Comment 11 pdsilva 2021-01-18 10:22:30 UTC
The Operand fields (Image, Image Pull Policy and Namespace) still appear empty with the 4.7.0-202101161147.p0 version of NFD. See screenshot attachment 1748425 for reference.

# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-18-024748
Kubernetes Version: v1.20.0+d9c52cc

# oc get packagemanifest | grep nfd
nfd                                 Red Hat Operators v4.7 Stage   19m

# oc get csv  | grep nfd
nfd.4.7.0-202101161147.p0   Node Feature Discovery   4.7.0-202101161147.p0              Succeeded

# oc get pods -A | grep nfd
openshift-operators                                nfd-operator-7c46664675-mvps2                                     1/1     Running     0          6m6s

Comment 13 pdsilva 2021-01-21 20:28:34 UTC
Created attachment 1749519 [details]
NFD-operand-nfd.4.7.0-202101210137.p

Comment 14 pdsilva 2021-01-21 20:35:01 UTC
I have re-deployed NFD with version nfd.4.7.0-202101210137.p0. The fields are now populated, but the image provided, "quay.io/openshift/origin-node-feature-discovery:4.7", is not multi-arch. See screenshot attachment 1749519 for reference. We need this field pre-populated with an image that matches the architecture.

Cluster build details:
# oc version
Client Version: 4.7.0-0.nightly-ppc64le-2021-01-21-052650
Server Version: 4.7.0-0.nightly-ppc64le-2021-01-21-052650
Kubernetes Version: v1.20.0+91b6da5

# oc get packagemanifest | grep nfd
nfd                                 Red Hat Operators v4.7 Stage   8h

# oc get csv | grep nfd
nfd.4.7.0-202101210137.p0   Node Feature Discovery   4.7.0-202101210137.p0              Succeeded

Comment 15 Walid A. 2021-01-23 15:56:21 UTC
Verified on AWS nightly build `4.7.0-0.nightly-2021-01-22-104107` the NodeFeatureDiscoveries instance fields are pre-populated as expected.

nfd csv version:  nfd.4.7.0-202101230053.p0

Comment 16 Carlos Eduardo Arango Gutierrez 2021-01-25 20:10:50 UTC
When a release is GA, the production version of Operators is pulled from the Red Hat registry (e.g. registry.redhat.io/openshift4/ose-cluster-nfd-operator): https://catalog.redhat.com/software/containers/openshift4/ose-cluster-nfd-operator/5d9e23f1bed8bd2245d9378c?container-tabs=overview

These ART-built images support all of the architectures that Red Hat supports for multi-arch.

Before that, you can manually use an image from https://brewweb.engineering.redhat.com/brew/search?match=glob&type=build&terms=node-feature-discovery-container-*4.7*

Comment 17 Walid A. 2021-02-06 04:55:18 UTC
@pdsilva, do you have an OCP environment on ppc64le to verify the fix?
Thanks.

Comment 18 Jeremy Poulin 2021-02-10 18:31:39 UTC
ART provides the multi-arch team builds that point directly to the brew registry for testing.
The latest builds should have been pointing to registry.redhat.io. Does this only start happening when the builds get pushed to stage?

Comment 19 Jeremy Poulin 2021-02-10 19:16:39 UTC
The problem here is that the operator appears to populate the default NFD image with the quay origin URL instead of the corresponding registry.redhat.io address. It never used to do this: it has always pointed directly to the registry.redhat.io image corresponding to the release in question. This appears to be a bug.

I'm having Hiro attempt to reproduce the bug with the CFC stage index image.
https://docs.engineering.redhat.com/display/CFC/Test

If this fails, the operator may go live populating the operand image field with the upstream/origin image. That is not something you would detect on x86, but that image doesn't work for multi-arch.

Comment 20 Jeremy Poulin 2021-02-10 19:42:58 UTC
The latest staging index image in the 4.7 channel appears to be from 12/21/20 and does not include the fix above.
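
Whether a given CSV predates another build can be estimated from its name. The sketch below assumes that names such as `nfd.4.7.0-202012212130.p0` embed a build timestamp in `YYYYMMDDHHMM` form. That reading is inferred from the version strings in this report and is not a documented guarantee. The helper name `csv_build_time` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: nfd.<x.y.z>-<YYYYMMDDHHMM>.p<n>
CSV_NAME = re.compile(r"^nfd\.(?P<version>\d+\.\d+\.\d+)-(?P<stamp>\d{12})\.p\d+$")


def csv_build_time(name: str) -> datetime:
    """Extract the assumed build timestamp from an NFD CSV name."""
    match = CSV_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised CSV name: {name!r}")
    return datetime.strptime(match.group("stamp"), "%Y%m%d%H%M")


if __name__ == "__main__":
    first_reported = csv_build_time("nfd.4.7.0-202012212130.p0")
    fields_populated = csv_build_time("nfd.4.7.0-202101210137.p0")
    # True when the first CSV in this report is older than the build
    # in which the Operand fields were seen populated.
    print(first_reported < fields_populated)
```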

Comment 23 pdsilva 2021-02-18 05:00:00 UTC
Have verified NFD installation on OCP 4.7.0-rc.1 on Power with the staging OperatorSource. The installation is successful and the operand image shows registry.redhat.io/openshift4/ose-node-feature-discovery:v4.7.0. 

For now I have used the 202102130115.p0 image from the brew registry, "registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:417e5b1e0c60f67b82462e3bd7a678ae913968b5908b6b717f706ed34520d071", and with it the pods are in the Running state.

# oc version
Client Version: 4.7.0-rc.1
Server Version: 4.7.0-rc.1
Kubernetes Version: v1.20.0+ba45583

# oc get csv | grep nfd
nfd.4.7.0-202102111715.p0   Node Feature Discovery   4.7.0-202102111715.p0              Succeeded

# oc get pods -A | grep nfd
openshift-operators                                nfd-master-44nvn                                          1/1     Running             0          49m
openshift-operators                                nfd-master-5lhhq                                          1/1     Running             0          47m
openshift-operators                                nfd-master-dnnb7                                          1/1     Running             0          48m
openshift-operators                                nfd-operator-65955df6f4-gfp94                             1/1     Running             0          36h
openshift-operators                                nfd-worker-ct9tr                                          1/1     Running             0          45m
openshift-operators                                nfd-worker-g5wtp                                          1/1     Running             0          22m
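
Listings like the one above can also be checked mechanically rather than by eye. The following is a minimal sketch that assumes the column order printed by `oc get pods -A` in this report (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name `pods_not_running` is hypothetical.

```python
def pods_not_running(listing: str, name_filter: str = "nfd") -> list[str]:
    """Return namespace/name for matching pods whose STATUS is not Running.

    Assumes whitespace-separated columns in the order
    NAMESPACE NAME READY STATUS RESTARTS AGE.
    """
    failing = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines and an optional header row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if name_filter in name and status != "Running":
            failing.append(f"{namespace}/{name}")
    return failing


if __name__ == "__main__":
    # Two rows copied from the original description of this bug.
    before_fix = """\
openshift-operators  nfd-master-ctfsz              0/1  CrashLoopBackOff  757  2d19h
openshift-operators  nfd-operator-59d645bb4-fgxlt  1/1  Running           1    2d22h
"""
    print(pods_not_running(before_fix))
```

An empty list for the listing in this comment would match the reported result that all NFD pods are Running.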

Comment 25 errata-xmlrpc 2021-02-24 15:01:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 extras and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5635

Comment 26 Red Hat Bugzilla 2023-09-15 00:58:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

