2021534 – performance profile configuration does not get applied during DU node configuration via ZTP deployment

Keywords:
Status:	CLOSED DUPLICATE of bug 2021151
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Telco Edge
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Ian Miller
QA Contact:	yliu1
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-11-09 14:43 UTC by Marius Cornea
Modified:	2021-11-24 13:50 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-11-24 13:50:19 UTC
Target Upstream Version:
Embargoed:

Description Marius Cornea 2021-11-09 14:43:40 UTC

Description of problem:

When deploying a DU node via the ZTP process, the performance profile configuration does not get applied. The performance operator gets installed and the performance profile is created, but the performance profile configuration does not get set on the node, as seen in /proc/cmdline.

Version-Release number of selected component (if applicable):
OCP 4.9.6
PAO 4.9.0

How reproducible:
Not always

Steps to Reproduce:
1. Deploy DU siteconfig and policygentemplate from:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/kni-qe-1-4.9
2. Wait for OCP to get deployed
3. Wait for policies to get created
4. Wait for performance profile configuration to get applied

Actual results:

The performance profile gets created:

oc get performanceprofile openshift-node-performance-profile -o yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-11-09T13:00:22Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "49126"
  uid: bd8be560-501b-4c15-9bdb-4f5db3bc8ccc
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
status:
  conditions:
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Available
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2021-11-09T13:45:32Z"
    lastTransitionTime: "2021-11-09T13:45:32Z"
    status: "False"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile


The node and MCP are reported as ready:

oc get nodes,mcp
NAME                                        STATUS   ROLES           AGE    VERSION
node/sno.kni-qe-1.lab.eng.rdu2.redhat.com   Ready    master,worker   122m   v1.22.1+d8c4430

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-e334d40b29c0b1a0f088c8850b36f926   True      False      False      1              1                   1                     0                      120m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-e0fab5388ce3af89a9fcc8ab58c78a27   True      False      False      0              0                   0                     0                      120m


Checking /proc/cmdline:


ssh -6 core.lab.eng.rdu2.redhat.com  'cat /proc/cmdline'

BOOT_IMAGE=(hd2,gpt3)/ostree/rhcos-fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/vmlinuz-4.18.0-305.25.1.rt7.97.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/fd51e58f0999d0e649cf0d06681c73be663d8533e7c4c82c04d25828d9cc0e74/0 ip=ens2f0:dhcp6 root=UUID=28b9c4b9-52a8-453e-a21b-bad28a92ec11 rw rootflags=prjquota intel_iommu=on iommu=pt


Expected results:
/proc/cmdline includes performance profile configuration

Additional info:
Attaching must-gather

Comment 2 Marius Cornea 2021-11-09 14:52:39 UTC

Note that the node did switch to the real-time kernel:

[kni ~]$ ssh -6 core.lab.eng.rdu2.redhat.com  'uname -r'

4.18.0-305.25.1.rt7.97.el8_4.x86_64

Comment 4 Marius Cornea 2021-11-09 16:40:28 UTC

The MCP reports as updated, but the kernel arguments from the rendered MachineConfig are not applied. It looks like the MCO does not actually check kernel arguments when validating.
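
One way to confirm this kind of mismatch is to compare the kernel arguments listed in the rendered MachineConfig against the running command line. The following is a minimal sketch and not part of the original report: the function name `missing_kargs` is hypothetical, and the commented usage assumes the expected arguments are read from the `spec.kernelArguments` field of the rendered MachineConfig named in the `oc get mcp` output.

```shell
# Print every expected kernel argument that is absent from a kernel command line.
# Usage: missing_kargs "<cmdline string>" karg1 karg2 ...
missing_kargs() {
    # Pad with spaces so each argument can be matched as a whole word.
    cmdline=" $1 "
    shift
    for karg in "$@"; do
        case "$cmdline" in
            *" $karg "*) ;;                  # present, nothing to report
            *) printf '%s\n' "$karg" ;;      # missing from the command line
        esac
    done
}

# Example usage (assumption: oc access and a shell on the node):
#   expected=$(oc get mc rendered-master-e334d40b29c0b1a0f088c8850b36f926 \
#       -o jsonpath='{.spec.kernelArguments[*]}')
#   missing_kargs "$(cat /proc/cmdline)" $expected
```

An empty result would mean every expected argument is on the command line. Whole-word matching is used so that, for example, `nohz=on` is not mistaken for `nohz_full=...`.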

Comment 5 Marius Cornea 2021-11-09 16:54:34 UTC

After manually running rpm-ostree kargs and rebooting the node, the cmdline got updated:

rpm-ostree  kargs --append=skew_tick=1 --append=nohz=on --append=rcu_nocbs=2-23,26-47 --append=tuned.non_isolcpus=03000003 --append=intel_pstate=disable --append=nosoftlockup --append=tsc=nowatchdog --append=intel_iommu=on --append=iommu=pt --append=isolcpus=managed_irq,2-23,26-47 --append=systemd.cpu_affinity=0,1,24,25 --append=default_hugepagesz=1G --append=hugepagesz=1G --append=hugepages=32 --append=idle=poll --append=rcupdate.rcu_normal_after_boot=0 --append=nohz_full=2-23,26-47
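
Note that the /proc/cmdline captured in the description already contains intel_iommu=on and iommu=pt, so appending the full list unconditionally adds them a second time. As a rough sketch of a more careful workaround (the helper name `append_flags_for_missing` is hypothetical and not from the original report), the `--append` flags can be generated only for arguments that are not yet on the command line:

```shell
# Print one --append=<karg> flag per line for each kernel argument that is
# not already present in the given kernel command line.
# Usage: append_flags_for_missing "<cmdline string>" karg1 karg2 ...
append_flags_for_missing() {
    # Pad with spaces so each argument is matched as a whole word.
    current=" $1 "
    shift
    for karg in "$@"; do
        case "$current" in
            *" $karg "*) ;;                               # already set, skip it
            *) printf '%s%s\n' '--append=' "$karg" ;;     # needs to be appended
        esac
    done
}

# Example usage on the node (assumption: run as root, reboot afterwards):
#   rpm-ostree kargs $(append_flags_for_missing "$(cat /proc/cmdline)" \
#       skew_tick=1 nohz=on intel_iommu=on iommu=pt idle=poll)
```

This only works around the symptom on a single node; it does not address why the rendered kernel arguments were not applied in the first place.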

Comment 7 yliu1 2021-11-12 15:55:04 UTC

In my environment, when this happens, it seems a reboot should have started but didn't:

[root@master-0 core]# last reboot | grep reboot
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 22:14   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:33   still running
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 21:13 - 21:31  (00:17)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:50 - 21:11  (00:20)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:34 - 20:48  (00:13)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 20:28 - 20:32  (00:04)
reboot   system boot  4.18.0-305.25.1. Thu Nov 11 19:57 - 20:25  (00:28)
reboot   system boot  4.18.0-305.19.1. Thu Nov 11 19:54 - 19:55  (00:00)

Comment 12 Ken Young 2021-11-24 13:50:19 UTC


*** This bug has been marked as a duplicate of bug 2021151 ***
