Bug 1898510

Summary: NFD worker pods not scheduled on a 3 node master/worker cluster
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node Feature Discovery Operator
Assignee: Carlos Eduardo Arango Gutierrez <carangog>
Status: CLOSED ERRATA
QA Contact: Walid A. <wabouham>
Severity: high
Docs Contact:
Priority: high
Version: 4.7
CC: sejug
Target Milestone: ---
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-01 15:16:13 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1897346
Bug Blocks:

Description OpenShift BugZilla Robot 2020-11-17 11:40:35 UTC
+++ This bug was initially created as a clone of Bug #1897346 +++

Description of problem:

On a 4.6 converged master/worker cluster, NFD workers are not deployed.
We tolerate all taints and repel the NFD worker from master nodes.
In the converged master/worker case, we also need to make sure that the NFD worker is scheduled on nodes that carry both the master and worker labels.
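
The scheduling rule described above can be illustrated with a minimal Python sketch. This is not the operator's code: the function names `repel_master_rule` and `worker_label_rule` are hypothetical, and the sketch only models the decision "should an NFD worker land on this node?" from the node's role labels. The label keys are the standard `node-role.kubernetes.io/master` and `node-role.kubernetes.io/worker` keys.

```python
MASTER = "node-role.kubernetes.io/master"
WORKER = "node-role.kubernetes.io/worker"


def repel_master_rule(labels):
    """Behavior as described in the report: any node labeled master is excluded."""
    return MASTER not in labels


def worker_label_rule(labels):
    """Intended behavior (assumed): schedule wherever the worker label is present,
    regardless of whether the master label is also set."""
    return WORKER in labels


# Hypothetical nodes: role labels have empty values on OpenShift nodes.
converged = {MASTER: "", WORKER: ""}
master_only = {MASTER: ""}
worker_only = {WORKER: ""}

for name, labels in [("converged", converged),
                     ("master_only", master_only),
                     ("worker_only", worker_only)]:
    print(name, repel_master_rule(labels), worker_label_rule(labels))
```

The two rules agree on master-only and worker-only nodes and differ only on the converged node, which is the case this bug is about.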

Comment 3 Walid A. 2021-01-25 22:04:11 UTC
Verified on OCP 4.6.0-0.nightly-2021-01-25-060359.

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h3m
cloud-credential                           4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h51m
cluster-autoscaler                         4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
config-operator                            4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
console                                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h5m
csi-snapshot-controller                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h33m
dns                                        4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h42m
etcd                                       4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h41m
image-registry                             4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h33m
ingress                                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h32m
insights                                   4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
kube-apiserver                             4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h37m
kube-controller-manager                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h36m
kube-scheduler                             4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h38m
kube-storage-version-migrator              4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h12m
machine-api                                4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h37m
machine-approver                           4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h42m
machine-config                             4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h41m
marketplace                                4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h4m
monitoring                                 4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h32m
network                                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
node-tuning                                4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
openshift-apiserver                        4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h33m
openshift-controller-manager               4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h31m
openshift-samples                          4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h32m
operator-lifecycle-manager                 4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h42m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h42m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h5m
service-ca                                 4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h43m
storage                                    4.6.0-0.nightly-2021-01-25-060359   True        False         False      5h11m

# oc get nodes
NAME                                         STATUS   ROLES           AGE     VERSION
ip-10-0-136-167.us-east-2.compute.internal   Ready    master,worker   5h45m   v1.19.0+1833054
ip-10-0-173-7.us-east-2.compute.internal     Ready    master,worker   5h45m   v1.19.0+1833054
ip-10-0-209-236.us-east-2.compute.internal   Ready    master,worker   5h45m   v1.19.0+1833054

# oc get pods -n test-nfd
NAME                            READY   STATUS    RESTARTS   AGE
nfd-master-52dj6                1/1     Running   0          15m
nfd-master-bcldh                1/1     Running   0          15m
nfd-master-jwp2d                1/1     Running   0          15m
nfd-operator-64f66f8476-6swsh   1/1     Running   0          18m
nfd-worker-fvkbs                1/1     Running   1          15m
nfd-worker-qhzvv                1/1     Running   1          15m
nfd-worker-znm5w                1/1     Running   0          15m

# oc describe node | grep feature
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/custom-rdma.available=true
                    feature.node.kubernetes.io/kernel-selinux.enabled=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.40.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.6
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=6
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/custom-rdma.available=true
                    feature.node.kubernetes.io/kernel-selinux.enabled=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.40.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.6
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=6
                    nfd.node.kubernetes.io/feature-labels:
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                    feature.node.kubernetes.io/cpu-cpuid.MPX=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/custom-rdma.available=true
                    feature.node.kubernetes.io/kernel-selinux.enabled=true
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.40.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.major=4
                    feature.node.kubernetes.io/kernel-version.minor=18
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/pci-1d0f.present=true
                    feature.node.kubernetes.io/storage-nonrotationaldisk=true
                    feature.node.kubernetes.io/system-os_release.ID=rhcos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=4.6
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=6
                    nfd.node.kubernetes.io/feature-labels:
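
The check performed by hand above (one running nfd-worker pod per worker-capable node) could be scripted roughly as follows. This is a sketch under assumptions: it parses the plain-text tables pasted from `oc get nodes` and `oc get pods -n test-nfd`, the helper names `parse_table` and `workers_cover_nodes` are hypothetical, and it only compares counts. Confirming which pod runs on which node would need the `-o wide` output, which is not shown in this report.

```python
def parse_table(text):
    """Split whitespace-separated `oc get` output into a list of row dicts.
    Assumes no column value contains spaces, which holds for the output above."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def workers_cover_nodes(nodes_text, pods_text):
    """Return True when the number of Running nfd-worker pods equals the
    number of nodes whose ROLES column includes 'worker'."""
    nodes = parse_table(nodes_text)
    pods = parse_table(pods_text)
    worker_nodes = [n for n in nodes if "worker" in n["ROLES"].split(",")]
    running = [p for p in pods
               if p["NAME"].startswith("nfd-worker-") and p["STATUS"] == "Running"]
    return len(running) == len(worker_nodes)


# Sample input copied from the verification output in this comment.
NODES = """
NAME                                         STATUS   ROLES           AGE     VERSION
ip-10-0-136-167.us-east-2.compute.internal   Ready    master,worker   5h45m   v1.19.0+1833054
ip-10-0-173-7.us-east-2.compute.internal     Ready    master,worker   5h45m   v1.19.0+1833054
ip-10-0-209-236.us-east-2.compute.internal   Ready    master,worker   5h45m   v1.19.0+1833054
"""

PODS = """
NAME                            READY   STATUS    RESTARTS   AGE
nfd-master-52dj6                1/1     Running   0          15m
nfd-master-bcldh                1/1     Running   0          15m
nfd-master-jwp2d                1/1     Running   0          15m
nfd-operator-64f66f8476-6swsh   1/1     Running   0          18m
nfd-worker-fvkbs                1/1     Running   1          15m
nfd-worker-qhzvv                1/1     Running   1          15m
nfd-worker-znm5w                1/1     Running   0          15m
"""

print(workers_cover_nodes(NODES, PODS))
```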

Comment 5 errata-xmlrpc 2021-02-01 15:16:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.15 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0238