Bug 2021534

Summary: performance profile configuration does not get applied during DU node configuration via ZTP deployment
Product: OpenShift Container Platform
Reporter: Marius Cornea <mcornea>
Component: Telco Edge
Assignee: Ian Miller <imiller>
Telco Edge sub component: ZTP
QA Contact: yliu1
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: unspecified
CC: achernet, browsell, jerzhang, keyoung
Version: 4.9
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-11-24 13:50:19 UTC
Type: Bug

Description Marius Cornea 2021-11-09 14:43:40 UTC
Description of problem:

When deploying a DU node via the ZTP process, the performance profile configuration does not get applied. We can see that the performance operator gets installed and the performance profile is created, but the performance profile configuration does not get set on the node: the expected kernel arguments are missing from /proc/cmdline.

Version-Release number of selected component (if applicable):
OCP 4.9.6
PAO 4.9.0

How reproducible:
Not always

Steps to Reproduce:
1. Deploy DU siteconfig and policygentemplate from:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.9
2. Wait for OCP to get deployed
3. Wait for policies to get created
4. Wait for performance profile configuration to get applied

Actual results:

The performance profile gets created:

oc get performanceprofile openshift-node-performance-profile -o yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-11-09T13:00:22Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "49126"
  uid: bd8be560-501b-4c15-9bdb-4f5db3bc8ccc
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
status:
  conditions:
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Available
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile


The node and MCP are reported as ready:

oc get nodes,mcp
NAME                                        STATUS   ROLES           AGE    VERSION
node/sno.kni-qe-1.lab.eng.rdu2.redhat.com   Ready    master,worker   122m   v1.22.1+d8c4430

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-e334d40b29c0b1a0f088c8850b36f926   True      False      False      1              1                   1                     0                      120m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-e0fab5388ce3af89a9fcc8ab58c78a27   True      False      False      0              0                   0                     0                      120m


Checking /proc/cmdline:

ssh -6 core.lab.eng.rdu2.redhat.com  'cat /proc/cmdline'

BOOT_IMAGE=(hd2,gpt3)/ostree/rhcos-fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/vmlinuz-4.18.0-305.25.1.rt7.97.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/0 ip=ens2f0:dhcp6 root=UUID=28b9c4b9-52a8-453e-a21b-bad28a92ec11 rw rootflags=prjquota intel_iommu=on iommu=pt


Expected results:
/proc/cmdline includes performance profile configuration
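
To make the gap concrete, here is a minimal sketch that compares a kernel command line against a list of expected arguments. The expected list is copied from the manual rpm-ostree workaround in comment 5; the observed string is an abridged copy of the /proc/cmdline output above (BOOT_IMAGE, ostree and root entries left out). The function and variable names are hypothetical, and this is not how the MCO or the performance operator checks anything.

```python
from __future__ import annotations

# Kernel arguments the performance profile was expected to add
# (taken from the manual rpm-ostree workaround in comment 5).
EXPECTED_KARGS = [
    "skew_tick=1", "nohz=on", "rcu_nocbs=2-23,26-47",
    "tuned.non_isolcpus=03000003", "intel_pstate=disable", "nosoftlockup",
    "tsc=nowatchdog", "intel_iommu=on", "iommu=pt",
    "isolcpus=managed_irq,2-23,26-47", "systemd.cpu_affinity=0,1,24,25",
    "default_hugepagesz=1G", "hugepagesz=1G", "hugepages=32",
    "idle=poll", "rcupdate.rcu_normal_after_boot=0", "nohz_full=2-23,26-47",
]

# Abridged copy of the /proc/cmdline reported in this bug.
OBSERVED = (
    "random.trust_cpu=on console=tty0 console=ttyS0,115200n8 "
    "ignition.platform.id=metal ip=ens2f0:dhcp6 rw rootflags=prjquota "
    "intel_iommu=on iommu=pt"
)


def missing_kargs(cmdline: str, expected: list[str]) -> list[str]:
    """Return the expected arguments that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


if __name__ == "__main__":
    for arg in missing_kargs(OBSERVED, EXPECTED_KARGS):
        print("missing:", arg)
```

With the values from this bug, only intel_iommu=on and iommu=pt are found; the other 15 arguments are reported as missing.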

Additional info:
Attaching must-gather

Comment 2 Marius Cornea 2021-11-09 14:52:39 UTC
Note that the node did switch to the real-time kernel:

[kni ~]$ ssh -6 core.lab.eng.rdu2.redhat.com  'uname -r'

4.18.0-305.25.1.rt7.97.el8_4.x86_64

Comment 4 Marius Cornea 2021-11-09 16:40:28 UTC
The MCP reports as updated, but the kernelArguments from the rendered machine config are not applied. It looks like the MCO doesn't actually check kernel arguments when validating.

Comment 5 Marius Cornea 2021-11-09 16:54:34 UTC
After manually running rpm-ostree kargs and rebooting the node, the cmdline got updated:

rpm-ostree kargs --append=skew_tick=1 --append=nohz=on --append=rcu_nocbs=2-23,26-47 --append=tuned.non_isolcpus=03000003 --append=intel_pstate=disable --append=nosoftlockup --append=tsc=nowatchdog --append=intel_iommu=on --append=iommu=pt --append=isolcpus=managed_irq,2-23,26-47 --append=systemd.cpu_affinity=0,1,24,25 --append=default_hugepagesz=1G --append=hugepagesz=1G --append=hugepages=32 --append=idle=poll --append=rcupdate.rcu_normal_after_boot=0 --append=nohz_full=2-23,26-47
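
For anyone reproducing the workaround on different hardware: the CPU-related values above appear to follow from the profile's reserved set (0-1,24-25). systemd.cpu_affinity is that set written out CPU by CPU, and tuned.non_isolcpus looks like the same set as a hexadecimal bitmask. That is my reading of the values in this bug, not something confirmed here. A rough sketch of the arithmetic, with hypothetical helper names and an assumed padding to whole 32-bit words:

```python
from __future__ import annotations


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as '0-1,24-25' into [0, 1, 24, 25]."""
    cpus: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def cpu_mask(cpus: list[int]) -> str:
    """Render a CPU set as a hex bitmask (bit N set means CPU N is in the set).

    Padding to whole 32-bit words is an assumption made for this sketch.
    """
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    words = max(cpus, default=0) // 32 + 1
    return format(value, "0{}x".format(words * 8))


reserved = parse_cpu_list("0-1,24-25")   # spec.cpu.reserved from the profile
isolated = parse_cpu_list("2-23,26-47")  # spec.cpu.isolated from the profile

print(",".join(map(str, reserved)))  # 0,1,24,25
print(cpu_mask(reserved))            # 03000003
```

Bits 0 and 1 give 0x3 and bits 24 and 25 give 0x03000000, which adds up to the 03000003 used in the command. The isolated and reserved sets together cover CPUs 0-47 with no overlap.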

Comment 7 yliu1 2021-11-12 15:55:04 UTC
In my environment, it seems that when this happens, a reboot should have started but didn't:

[root@master-0 core]# last reboot | grep reboot
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 22:14   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:33   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:13 - 21:31  (00:17)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:50 - 21:11  (00:20)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:34 - 20:48  (00:13)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:28 - 20:32  (00:04)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 19:57 - 20:25  (00:28)
reboot   system boot  4.18.0-305.19.1. Thu Nov 11 19:54 - 19:55  (00:00)

Comment 12 Ken Young 2021-11-24 13:50:19 UTC

*** This bug has been marked as a duplicate of bug 2021151 ***