Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2039358

Summary: The cluster-nfd-operator controller does not honor NodeFeatureDiscovery CR and operand's namespace
Product: OpenShift Container Platform
Component: Node Feature Discovery Operator
Version: 4.10
Target Release: 4.10.0
Reporter: aleskandro <adistefa>
Assignee: Carlos Eduardo Arango Gutierrez <carangog>
QA Contact: aleskandro <adistefa>
CC: adistefa, aos-bugs, sejug
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: All
OS: Unspecified
Keywords: Reopened
Type: Bug
Last Closed: 2022-03-10 15:56:48 UTC

Description aleskandro 2022-01-11 15:21:24 UTC
Description of problem:

When a user deploys the Node Feature Discovery operator in a namespace other than "openshift-nfd", the controller does not honor the namespace of the operands it creates.

Even though the resources owned by the operand work correctly and the nodes are labeled, the status of the NodeFeatureDiscovery resource is never reconciled.

Version-Release number of selected component (if applicable): verified on 4.9.0-202112142229


How reproducible:


Steps to Reproduce 1 (CLI, see the YAML below):
0. Have a fresh cluster ready
1. Create a custom namespace (let us assume "custom-ns")
2. Create an OperatorGroup resource
3. Create the Subscription resource to install the nfd-operator
4. Wait for a successful installation in the custom namespace
5. Create a NodeFeatureDiscovery resource in the "custom-ns" namespace with the operand's namespace also set to "custom-ns".


Steps to Reproduce 2 (UI):
0. Have a fresh cluster ready
1. Create a namespace (let us assume "custom-ns")
2. Go to the OperatorHub page, search for "nfd", and click Install
3. Set the Installation Mode to "A specific namespace on the cluster"
4. Set the installed namespace to "custom-ns"
5. Submit
6. Wait for the operator to be installed
7. Go to the NodeFeatureDiscoveries page
8. Add a new NodeFeatureDiscovery resource

Note: In the YAML view, the operand's namespace (operand.namespace) defaults to "openshift-nfd". Should it instead default to the same namespace as metadata.namespace? In both cases, the resources are currently deployed in the custom-ns namespace.


Actual results:

The DaemonSets and their pods are correctly instantiated, and the nodes are labeled. 

However, the status of the NodeFeatureDiscovery instance is Degraded.

Status:
  Conditions:
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Message:               NFDWorkerServiceAccountDegraded
    Reason:                NFDWorkerServiceAccountDegraded
    Status:                True
    Type:                  Degraded

Expected results:

The DaemonSets and their pods are correctly instantiated, and the nodes are labeled. 

The status of the NodeFeatureDiscovery instance matches the one reported for an installation in the openshift-nfd namespace:

Status:
  Conditions:
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                False
    Type:                  Degraded


Additional info:

It seems that the "openshift-nfd" namespace is hard-coded in https://github.com/openshift/cluster-nfd-operator/blob/f284c3e332c0dfe88c78893ed3a2eb03f9c991f2/controllers/nodefeaturediscovery_status.go#L23

Comment 1 aleskandro 2022-01-11 15:22:40 UTC
YAML to deploy from CLI and reproduce:

apiVersion: v1
kind: Namespace
metadata:
  name: custom-ns
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-nfd-
  name: openshift-nfd
  namespace: custom-ns
spec:
  targetNamespaces:
  - custom-ns
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: custom-ns
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace


NodeFeatureDiscovery resource used for verification:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: custom-ns
spec:
  customConfig:
    configData: |
  instance: ''
  operand:
    image: >-
      registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:c46a6b1b03b2f05debeaa1c6a7a016121afa7829759b38574e288c3a639f8f6d
    imagePullPolicy: Always
    namespace: custom-ns
  servicePort: 12000
  workerConfig:
    configData: |
      core:
        sleepInterval: 60s
      sources:
        pci:
          deviceClassWhitelist:
            - "0200"
            - "03"
            - "12"
          deviceLabelFields:
            - "vendor"

Comment 2 Carlos Eduardo Arango Gutierrez 2022-01-28 15:50:21 UTC
For security reasons we have decided to deprecate the namespace field on the CRD; NFD components will only be deployed in the same namespace as the operator.

Comment 3 aleskandro 2022-01-28 17:11:08 UTC
Hello Eduardo, it makes sense. Does it mean there is already a pull request to disable this possibility at least in the OperatorHub UI?

Comment 4 Carlos Eduardo Arango Gutierrez 2022-01-28 17:34:13 UTC
Yes, see https://github.com/openshift/cluster-nfd-operator/blob/master/api/v1/nodefeaturediscovery_types.go#L69
The namespace has been removed as a configuration option for the operand.

Comment 5 aleskandro 2022-01-28 18:45:47 UTC
This bug was about using a custom namespace for the operator, not the operand.
In the steps to reproduce, I placed the operator and the operand in the same namespace precisely to avoid the case of different namespaces.

The nodes were correctly labeled, but the operator's controller did not update the conditions of the operand, because it was looking it up in the openshift-nfd namespace rather than the custom one.

Deprecating the operand's namespace field does not guarantee that the controller looks in the correct namespace (even if it is the same one it is running in) when updating the operand's status.


In fact, at the time this bug was raised (see the link in the additional info), the namespace used by the controller was hard-coded to "openshift-nfd".


To make it clear for users: do we support using a namespace other than openshift-nfd? Has the hard-coded namespace been fixed?

Comment 6 Carlos Eduardo Arango Gutierrez 2022-01-28 19:36:17 UTC
Sorry for the confusion

Yes, this has also been addressed. With the most recent changes on the master branch, soon to be 4.10, this issue should be gone.
In the documentation we will stress the recommendation to still use an "openshift-nfd" namespace dedicated to the NFD operator, but https://github.com/openshift/cluster-nfd-operator/pull/234 should fix your concern.

Comment 7 aleskandro 2022-01-28 21:48:58 UTC
OK, thanks for the clarification. Should this bug be set to "Modified" instead of "notabug"?

Comment 8 Carlos Eduardo Arango Gutierrez 2022-01-28 22:02:23 UTC
Let's move it to QE for review, good call.

Comment 9 aleskandro 2022-02-01 14:09:19 UTC
Tested on ocp 4.10.0-0.nightly-arm64-2022-01-31-185520 (bare metal env) with NFD version 4.10.0-202201310820.

Both the operator and the operand are now installable in a namespace other than "openshift-nfd". The nodes are correctly labeled.

Moving the bug to Verified.

aleskandro@dujour /tmp % oc describe NodeFeatureDiscovery/nfd-instance -n custom-ns-2

Status:
  Conditions:
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                False
    Type:                  Degraded
Events:                    <none>


oc describe node master....
Name:               master....
Roles:              master,worker
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.AES=true
                    feature.node.kubernetes.io/cpu-cpuid.ASIMD=true
                    feature.node.kubernetes.io/cpu-cpuid.ASIMDDP=true
                    feature.node.kubernetes.io/cpu-cpuid.ASIMDHP=true
                    feature.node.kubernetes.io/cpu-cpuid.ASIMDRDM=true
                    feature.node.kubernetes.io/cpu-cpuid.ATOMICS=true
                    feature.node.kubernetes.io/cpu-cpuid.CPUID=true
                    feature.node.kubernetes.io/cpu-cpuid.CRC32=true
                    feature.node.kubernetes.io/cpu-cpuid.DCPOP=true
                    feature.node.kubernetes.io/cpu-cpuid.EVTSTRM=true
                    feature.node.kubernetes.io/cpu-cpuid.FP=true
                    feature.node.kubernetes.io/cpu-cpuid.FPHP=true
                    feature.node.kubernetes.io/cpu-cpuid.LRCPC=true
                    feature.node.kubernetes.io/cpu-cpuid.PMULL=true
                    feature.node.kubernetes.io/cpu-cpuid.SHA1=true
                    feature.node.kubernetes.io/cpu-cpuid.SHA2=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=false
                    feature.node.kubernetes.io/iommu-enabled=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                    feature.node.kubernetes.io/kernel-selinux.enabled=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-305.34.2.el8_4.aarch64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/memory-numa=true
                    feature.node.kubernetes.io/network-sriov.capable=true
                    feature.node.kubernetes.io/pci-1a03.present=true
                    feature.node.kubernetes.io/pci-8086.present=true
                    feature.node.kubernetes.io/pci-8086.sriov.capable=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.OSTREE_VERSION=410.84.202201311003-0
                    feature.node.kubernetes.io/system-os_release.RHEL_VERSION=8.4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.10
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=10

Comment 12 errata-xmlrpc 2022-03-10 15:56:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.3 extras update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0057