Description of problem:

When deploying a DU node via the ZTP process, the performance profile configuration does not get applied. The performance operator gets installed and the performance profile is created, but the kernel arguments from the performance profile do not show up in /proc/cmdline on the node.

Version-Release number of selected component (if applicable):

OCP 4.9.6
PAO 4.9.0

How reproducible:

Not always

Steps to Reproduce:

1. Deploy DU siteconfig and policygentemplate from: http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.9
2. Wait for OCP to get deployed
3. Wait for policies to get created
4. Wait for performance profile configuration to get applied

Actual results:

The performance profile gets created:

oc get performanceprofile openshift-node-performance-profile -o yaml

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-11-09T13:00:22Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "49126"
  uid: bd8be560-501b-4c15-9bdb-4f5db3bc8ccc
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
status:
  conditions:
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Available
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile

The node and the MCP are reported as ready:

oc get nodes,mcp

NAME                                        STATUS   ROLES           AGE    VERSION
node/sno.kni-qe-1.lab.eng.rdu2.redhat.com   Ready    master,worker   122m   v1.22.1+d8c4430

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-e334d40b29c0b1a0f088c8850b36f926   True      False      False      1              1                   1                     0                      120m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-e0fab5388ce3af89a9fcc8ab58c78a27   True      False      False      0              0                   0                     0                      120m

Checking /proc/cmdline:

ssh -6 core.lab.eng.rdu2.redhat.com 'cat /proc/cmdline'

BOOT_IMAGE=(hd2,gpt3)/ostree/rhcos-fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/vmlinuz-4.18.0-305.25.1.rt7.97.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/0 ip=ens2f0:dhcp6 root=UUID=28b9c4b9-52a8-453e-a21b-bad28a92ec11 rw rootflags=prjquota intel_iommu=on iommu=pt

Expected results:

/proc/cmdline includes the performance profile configuration.

Additional info:

Attaching must-gather
Note that the node did switch to the real-time kernel:

[kni ~]$ ssh -6 core.lab.eng.rdu2.redhat.com 'uname -r'
4.18.0-305.25.1.rt7.97.el8_4.x86_64
The MCP reports as updated, but the kernel arguments from the rendered MachineConfig are not applied. It looks like the MCO doesn't actually check kernel arguments when validating.
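The kind of check that appears to be missing can be sketched in a few lines. This is only an illustration, not the MCO's actual validation code: the helper name missing_kargs is hypothetical, the cmdline string is abridged from the output in this report, and the expected list is a subset of the arguments from the rpm-ostree command in this report.

```python
def missing_kargs(expected, cmdline):
    """Return the expected kernel arguments that are absent from cmdline.

    Arguments are compared as whole whitespace-separated tokens, so
    "hugepages=32" does not match "hugepagesz=1G".
    """
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


# Kernel arguments on the node before the manual fix (abridged from this report).
CMDLINE = (
    "random.trust_cpu=on console=tty0 console=ttyS0,115200n8 "
    "ignition.platform.id=metal ip=ens2f0:dhcp6 rw rootflags=prjquota "
    "intel_iommu=on iommu=pt"
)

# A subset of the arguments the performance profile should have produced.
EXPECTED = [
    "intel_iommu=on",
    "iommu=pt",
    "isolcpus=managed_irq,2-23,26-47",
    "nohz_full=2-23,26-47",
    "default_hugepagesz=1G",
    "hugepages=32",
    "idle=poll",
]

if __name__ == "__main__":
    for arg in missing_kargs(EXPECTED, CMDLINE):
        print("missing:", arg)
```

On a live node the cmdline string would come from reading /proc/cmdline instead of the hard-coded sample.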
After manually running rpm-ostree kargs and rebooting the node, the cmdline got updated:

rpm-ostree kargs \
  --append=skew_tick=1 \
  --append=nohz=on \
  --append=rcu_nocbs=2-23,26-47 \
  --append=tuned.non_isolcpus=03000003 \
  --append=intel_pstate=disable \
  --append=nosoftlockup \
  --append=tsc=nowatchdog \
  --append=intel_iommu=on \
  --append=iommu=pt \
  --append=isolcpus=managed_irq,2-23,26-47 \
  --append=systemd.cpu_affinity=0,1,24,25 \
  --append=default_hugepagesz=1G \
  --append=hugepagesz=1G \
  --append=hugepages=32 \
  --append=idle=poll \
  --append=rcupdate.rcu_normal_after_boot=0 \
  --append=nohz_full=2-23,26-47
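For anyone repeating this workaround, the command line can be generated from a list of arguments rather than typed by hand. A minimal sketch, assuming the list of arguments is already known (for example, copied from the rendered MachineConfig); the helper name kargs_command is hypothetical, and only the --append flag used in this report is relied on:

```python
import shlex


def kargs_command(args):
    """Build an rpm-ostree argv that appends each kernel argument in args."""
    return ["rpm-ostree", "kargs"] + [f"--append={arg}" for arg in args]


if __name__ == "__main__":
    # Two of the arguments from this report, as an example.
    cmd = kargs_command(["idle=poll", "nohz_full=2-23,26-47"])
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command; running it and rebooting the node remain manual steps.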
It seems that in my environment, when this happens, a reboot should have started but didn't?

[root@master-0 core]# last reboot | grep reboot
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 22:14   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:33   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:13 - 21:31  (00:17)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:50 - 21:11  (00:20)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:34 - 20:48  (00:13)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:28 - 20:32  (00:04)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 19:57 - 20:25  (00:28)
reboot   system boot  4.18.0-305.19.1. Thu Nov 11 19:54 - 19:55  (00:00)
*** This bug has been marked as a duplicate of bug 2021151 ***