Description of problem:
Sometimes the DU node does not get the performance profile configuration applied and the MachineConfigPool stays stuck in Updating.

Version-Release number of selected component (if applicable):
OCP 4.9.6
PAO 4.9.0

How reproducible:
Not every time, approximately 1 in 5 attempts

Steps to Reproduce:
1. Deploy DU node via ZTP process from http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.9
2. Wait for OCP to finish deployment
3. Wait for the policies to get created and applied

Actual results:
The performance profile gets created but its configuration is not applied to the node:

perf profile:
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true

ssh core.lab.eng.rdu2.redhat.com -6 'cat /proc/cmdline'
BOOT_IMAGE=(hd2,gpt3)/ostree/rhcos-6837dc5ee75f6f61a4949e5954648bce575363916ef26b0b7002cfbd40a9cb8d/vmlinuz-4.18.0-305.25.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/6837dc5ee75f6f61a4949e5954648bce575363916ef26b0b7002cfbd40a9cb8d/0 ip=ens2f0:dhcp6 root=UUID=b75e1774-5260-42d9-ad5d-de3db9890cdc rw rootflags=prjquota intel_iommu=on iommu=pt

Expected results:
Configuration specified in the performance profile gets applied to the node.
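The mismatch between the profile and the running kernel can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper name `missing_kernel_args` is hypothetical, and it only compares the `additionalKernelArgs` entries against the captured `/proc/cmdline`. Any other arguments the operator may derive from the profile are not checked here.

```python
def missing_kernel_args(cmdline: str, expected: list[str]) -> list[str]:
    """Return the expected arguments that are absent from the kernel command line.

    Hypothetical helper for illustration; it compares whole whitespace-separated
    tokens, so "idle=poll" does not match "idle=polling".
    """
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


# additionalKernelArgs taken from the performance profile in this report.
expected_args = ["idle=poll", "rcupdate.rcu_normal_after_boot=0"]

# /proc/cmdline as captured from the affected node in this report.
observed_cmdline = (
    "BOOT_IMAGE=(hd2,gpt3)/ostree/rhcos-6837dc5ee75f6f61a4949e5954648bce"
    "575363916ef26b0b7002cfbd40a9cb8d/vmlinuz-4.18.0-305.25.1.el8_4.x86_64 "
    "random.trust_cpu=on console=tty0 console=ttyS0,115200n8 "
    "ignition.platform.id=metal "
    "ostree=/ostree/boot.1/rhcos/6837dc5ee75f6f61a4949e5954648bce"
    "575363916ef26b0b7002cfbd40a9cb8d/0 "
    "ip=ens2f0:dhcp6 root=UUID=b75e1774-5260-42d9-ad5d-de3db9890cdc "
    "rw rootflags=prjquota intel_iommu=on iommu=pt"
)

missing = missing_kernel_args(observed_cmdline, expected_args)
print("missing kernel args:", missing)
```

On the command line quoted above, both `additionalKernelArgs` entries are reported as missing, which matches the symptom described. The kernel in `BOOT_IMAGE` also carries no `rt` marker in its version string, although this sketch does not check for that.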
Additional info:
Setup is stuck on:

oc get nodes,mcp
NAME                                        STATUS                     ROLES           AGE     VERSION
node/sno.kni-qe-1.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   master,worker   4h22m   v1.22.1+d8c4430

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-90fe2b00c7185b2de24b103db4a32ec4   False     True       False      1              0                   0                     0                      4h21m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-31197fc6da09ee3f662ba1f19a8f0dda   True      False      False      0              0                   0                     0                      4h21m
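When reproducing this repeatedly, it can help to flag the affected pool from saved `oc get mcp` output rather than by eye. The sketch below is an assumption-laden illustration, not tooling from this report: the function name `updating_pools` is hypothetical, it parses the plain-text table by whitespace, and it skips any row whose column count differs from the header (for example, a pool with an empty CONFIG field). It reports pools that are still updating at the time of the snapshot; whether a pool is actually stuck still depends on how long it has been in that state.

```python
def updating_pools(table: str) -> list[str]:
    """Return names of pools with UPDATED=False and UPDATING=True.

    Hypothetical helper for illustration; expects the whitespace-separated
    table printed by `oc get mcp`, header line first.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    updated_i = header.index("UPDATED")
    updating_i = header.index("UPDATING")

    pools = []
    for line in lines[1:]:
        cols = line.split()
        if len(cols) != len(header):
            continue  # row has an empty field; column positions are unreliable
        if cols[updated_i] == "False" and cols[updating_i] == "True":
            pools.append(cols[name_i])
    return pools


# MachineConfigPool rows as captured in this report.
mcp_output = """
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
machineconfigpool.machineconfiguration.openshift.io/master rendered-master-90fe2b00c7185b2de24b103db4a32ec4 False True False 1 0 0 0 4h21m
machineconfigpool.machineconfiguration.openshift.io/worker rendered-worker-31197fc6da09ee3f662ba1f19a8f0dda True False False 0 0 0 0 4h21m
"""

result = updating_pools(mcp_output)
print("pools still updating:", result)
```

For the output above, only the master pool is flagged, consistent with the node remaining in `Ready,SchedulingDisabled` more than four hours after deployment.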
The issue was reproduced again today.
*** Bug 2021534 has been marked as a duplicate of this bug. ***
*** Bug 2022665 has been marked as a duplicate of this bug. ***
*** Bug 2015305 has been marked as a duplicate of this bug. ***
Hi Marius Cornea, could you help verify this bug? I have assigned QA to you. Thanks.
Verified on a 4.10 DU node deployed via ZTP process with sriov-network-operator.4.10.0-202201210948:

[root@sno core]# grep -Ri 'reqReboot true' /var/log/pods/openshift-sriov-network-operator*
[root@sno core]#
*** Bug 2016600 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056