Bug 2039358
| Summary: | The cluster-nfd-operator controller does not honor NodeFeatureDiscovery CR and operand's namespace | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | aleskandro <adistefa> |
| Component: | Node Feature Discovery Operator | Assignee: | Carlos Eduardo Arango Gutierrez <carangog> |
| Status: | CLOSED ERRATA | QA Contact: | aleskandro <adistefa> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.10 | CC: | adistefa, aos-bugs, sejug |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.10.0 | ||
| Hardware: | All | ||
| OS: | Unspecified | ||
| Last Closed: | 2022-03-10 15:56:48 UTC | Type: | Bug |
YAML to deploy from CLI and reproduce:
apiVersion: v1
kind: Namespace
metadata:
  name: custom-ns
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-nfd-
  name: openshift-nfd
  namespace: custom-ns
spec:
  targetNamespaces:
    - custom-ns
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: custom-ns
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Verified node feature discovery resource:
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: custom-ns
spec:
  customConfig:
    configData: |
  instance: ''
  operand:
    image: >-
      registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:c46a6b1b03b2f05debeaa1c6a7a016121afa7829759b38574e288c3a639f8f6d
    imagePullPolicy: Always
    namespace: custom-ns
    servicePort: 12000
  workerConfig:
    configData: |
      core:
        sleepInterval: 60s
      sources:
        pci:
          deviceClassWhitelist:
            - "0200"
            - "03"
            - "12"
          deviceLabelFields:
            - "vendor"
For security reasons we have decided to deprecate the namespace on the CRD. NFD components will only be deployed in the same namespace as the operator.

Hello Eduardo, it makes sense. Does it mean there is already a pull request to disable this possibility, at least in the OperatorHub UI?

Yes, see https://github.com/openshift/cluster-nfd-operator/blob/master/api/v1/nodefeaturediscovery_types.go#L69. The namespace has been removed as a configuration option for the operand.

This bug was about using a custom namespace for the operator, not the operand. In the steps to reproduce, I placed the operand and the operator in the same namespace so that the case of different namespaces would not come into play. The nodes were correctly labeled, but the operator's controller did not update the conditions of the operand, because it was trying to look it up in the openshift-nfd namespace, not the custom one. Deprecating the operand's namespace field does not mean that the controller looks in the correct namespace (even if it is the one it is running in) to update the status of the operand. In fact, at the time this bug was raised, the namespace of the controller was hard-coded to "openshift-nfd" (see the link in the additional info). To make it clear for users: do we support using a namespace other than openshift-nfd? Has the hard-coded namespace been fixed?

Sorry for the confusion. Yes, this has also been addressed. With the most recent changes on the master branch, soon to be 4.10, this issue should be gone. In the documentation we will stress the recommendation to still have an "openshift-nfd" namespace just for the NFD operator, but https://github.com/openshift/cluster-nfd-operator/pull/234 should fix your concern.

OK, thanks for the clarification. Should you set this bug to "Modified" instead of "notabug"?

Let's move it to QE for revision, good call.

Tested on OCP 4.10.0-0.nightly-arm64-2022-01-31-185520 (bare metal environment) with NFD version 4.10.0-202201310820.
Both the operator and the operand are now installable on a namespace different than "openshift-nfd". The nodes are correctly labeled.
Moving the bug to Verified.
aleskandro@dujour /tmp % oc describe NodeFeatureDiscovery/nfd-instance -n custom-ns-2
Status:
  Conditions:
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-02-01T14:03:43Z
    Last Transition Time:  2022-02-01T14:03:43Z
    Status:                False
    Type:                  Degraded
Events:  <none>
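The four conditions reported above can be compared against the expected steady state (Available and Upgradeable True, Progressing and Degraded False) in a few lines of Go. This is only a sketch: the `condition` type and the `healthy` helper are hypothetical names made up for illustration and are not part of the operator's API.

```go
package main

import "fmt"

// condition is a hypothetical, minimal stand-in for one entry of
// .status.conditions as printed by `oc describe`.
type condition struct {
	Type   string
	Status string
}

// healthy reports whether all four condition types are present and carry
// the values listed under "Expected results" in this report.
func healthy(conds []condition) bool {
	want := map[string]string{
		"Available":   "True",
		"Upgradeable": "True",
		"Progressing": "False",
		"Degraded":    "False",
	}
	seen := map[string]bool{}
	for _, c := range conds {
		expected, known := want[c.Type]
		if !known {
			continue // ignore condition types this sketch does not know
		}
		if c.Status != expected {
			return false
		}
		seen[c.Type] = true
	}
	return len(seen) == len(want)
}

func main() {
	// Values from the verification run in custom-ns-2.
	verified := []condition{
		{Type: "Available", Status: "True"},
		{Type: "Upgradeable", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "False"},
	}
	// Values from the "Actual results" of the original report.
	reported := []condition{
		{Type: "Available", Status: "False"},
		{Type: "Upgradeable", Status: "False"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "True"},
	}
	fmt.Println(healthy(verified)) // prints "true"
	fmt.Println(healthy(reported)) // prints "false"
}
```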
oc describe node master....
Name: master....
Roles: master,worker
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
feature.node.kubernetes.io/cpu-cpuid.AES=true
feature.node.kubernetes.io/cpu-cpuid.ASIMD=true
feature.node.kubernetes.io/cpu-cpuid.ASIMDDP=true
feature.node.kubernetes.io/cpu-cpuid.ASIMDHP=true
feature.node.kubernetes.io/cpu-cpuid.ASIMDRDM=true
feature.node.kubernetes.io/cpu-cpuid.ATOMICS=true
feature.node.kubernetes.io/cpu-cpuid.CPUID=true
feature.node.kubernetes.io/cpu-cpuid.CRC32=true
feature.node.kubernetes.io/cpu-cpuid.DCPOP=true
feature.node.kubernetes.io/cpu-cpuid.EVTSTRM=true
feature.node.kubernetes.io/cpu-cpuid.FP=true
feature.node.kubernetes.io/cpu-cpuid.FPHP=true
feature.node.kubernetes.io/cpu-cpuid.LRCPC=true
feature.node.kubernetes.io/cpu-cpuid.PMULL=true
feature.node.kubernetes.io/cpu-cpuid.SHA1=true
feature.node.kubernetes.io/cpu-cpuid.SHA2=true
feature.node.kubernetes.io/cpu-hardware_multithreading=false
feature.node.kubernetes.io/iommu-enabled=true
feature.node.kubernetes.io/kernel-config.NO_HZ=true
feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
feature.node.kubernetes.io/kernel-selinux.enabled=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-305.34.2.el8_4.aarch64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/memory-numa=true
feature.node.kubernetes.io/network-sriov.capable=true
feature.node.kubernetes.io/pci-1a03.present=true
feature.node.kubernetes.io/pci-8086.present=true
feature.node.kubernetes.io/pci-8086.sriov.capable=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.OSTREE_VERSION=410.84.202201311003-0
feature.node.kubernetes.io/system-os_release.RHEL_VERSION=8.4
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.10
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=10
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.3 extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0057
Description of problem:
When a user deploys the Node Feature Discovery operator in a namespace other than "openshift-nfd", the controller does not honor the future operands' namespace. Even though the resources owned by the operand work correctly and the nodes are labeled, the status of the NodeFeatureDiscovery resource does not reconcile.

Version-Release number of selected component (if applicable):
verified on 4.9.0-202112142229

How reproducible:

Steps to Reproduce 1 (CLI, see the YAML in this report):
0. Have a fresh cluster ready
1. Create a custom namespace (let us assume "custom-ns")
2. Create an OperatorGroup resource
3. Create the Subscription resource to install the nfd-operator
4. Wait for a successful installation in the custom namespace
5. Create a NodeFeatureDiscovery resource in the "custom-ns" namespace with the operand's namespace set to "custom-ns" too.

Steps to Reproduce 2 (UI):
0. Have a fresh cluster ready
1. Create a namespace (let us assume "custom-ns")
2. Go to the OperatorHub page, search for nfd and click on Install
3. Set the Installation Mode to "A specific namespace on the cluster"
4. Set the installed namespace to "custom-ns"
5. Submit
6. Wait for the operator to be installed
7. Go to the NodeFeatureDiscoveries page
8. Add a new NodeFeatureDiscovery resource

Note: In the YAML view, the operand's namespace (operand.namespace) is "openshift-nfd" by default. Should it be set to the same namespace as metadata.namespace? In both cases, currently, the resources are deployed in the custom-ns namespace.

Actual results:
The DaemonSets and their pods are correctly instantiated, and the nodes are labeled. However, the status of the NodeFeatureDiscovery instance is Degraded.
Status:
  Conditions:
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-01-11T13:25:38Z
    Last Transition Time:  2022-01-11T13:25:38Z
    Message:               NFDWorkerServiceAccountDegraded
    Reason:                NFDWorkerServiceAccountDegraded
    Status:                True
    Type:                  Degraded

Expected results:
The DaemonSets and their pods are correctly instantiated, and the nodes are labeled. The status of the NodeFeatureDiscovery instance matches that of an installation in the openshift-nfd namespace:
Status:
  Conditions:
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-01-11T14:55:58Z
    Last Transition Time:  2022-01-11T14:55:58Z
    Status:                False
    Type:                  Degraded

Additional info:
It seems that the "openshift-nfd" namespace is hard-coded in https://github.com/openshift/cluster-nfd-operator/blob/f284c3e332c0dfe88c78893ed3a2eb03f9c991f2/controllers/nodefeaturediscovery_status.go#L23