2000997 – Change to additionalKernelArgs does not take effect on node

Keywords:
Status:	CLOSED DUPLICATE of bug 1999608
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Performance Addon Operator
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Martin Sivák
QA Contact:	Gowrishankar Rajaiyan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-09-03 13:57 UTC by Ian Miller
Modified:	2021-11-29 10:01 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-09-08 17:09:08 UTC
Target Upstream Version:
Embargoed:

Description Ian Miller 2021-09-03 13:57:47 UTC

Description of problem:
After the initial PerformanceProfile was applied to the system and all changes had taken effect, I modified the "additionalKernelArgs" setting to add a second argument. The new argument was not applied to the system (I waited 30 minutes).

The initial performance profile:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: perfprofile-policy
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 2-39,42-79
    reserved: 0-1,40-41
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 10
      node: 0
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false


The updated version:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: perfprofile-policy
spec:
  additionalKernelArgs:
  - idle=poll
  - rcupdate.rcu_normal_after_boot=0
  cpu:
    isolated: 2-39,42-79
    reserved: 0-1,40-41
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 10
      node: 0
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false


Version-Release number of selected component (if applicable): 4.9


How reproducible: Always


Steps to Reproduce:
1. Apply initial PerformanceProfile
2. Wait for node to reconcile and all changes to take effect
3. Update additionalKernelArgs

Actual results:
Kernel command line:
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-4b83116ea516b3615355459ce879612839cfa52edea6851de9a01b6b9eb4eff4/vmlinuz-4.18.0-305.17.1.rt7.89.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/4b83116ea516b3615355459ce879612839cfa52edea6851de9a01b6b9eb4eff4/0 ip=eno1:dhcp root=UUID=b91d7af0-b528-45fd-98f0-0a66bea1fd1d rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=2-39,42-79 tuned.non_isolcpus=00000300,00000003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-39,42-79 systemd.cpu_affinity=0,1,40,41 default_hugepagesz=1G idle=poll nohz_full=2-39,42-79   

Expected results:
Kernel command line includes:
 rcupdate.rcu_normal_after_boot=0
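
The expected result above could be checked with something like the following. This is a minimal sketch, assuming the kernel command line has already been read from /proc/cmdline on the node; the helper name missing_kernel_args is hypothetical and not part of any OpenShift tooling, and the sample command line is abbreviated from the one reported under "Actual results".

```python
from __future__ import annotations


def missing_kernel_args(cmdline: str, expected: list[str]) -> list[str]:
    """Return the expected arguments that are absent from the kernel command line."""
    # Kernel arguments are whitespace-separated tokens; compare them as whole tokens.
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]


# Abbreviated from the reported command line (assumption: the full line would
# be read from /proc/cmdline on the affected node).
actual = "skew_tick=1 nohz=on default_hugepagesz=1G idle=poll nohz_full=2-39,42-79"
expected = ["idle=poll", "rcupdate.rcu_normal_after_boot=0"]

print(missing_kernel_args(actual, expected))
```

With the sample values this reports rcupdate.rcu_normal_after_boot=0 as the only missing argument, which matches the behaviour described in this report.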

Comment 1 Martin Sivák 2021-09-03 14:27:45 UTC

@Ian, can you double check whether the MachineConfig owned by your Performance Profile got updated?

Comment 2 Ian Miller 2021-09-03 15:26:15 UTC

(In reply to Martin Sivák from comment #1)
> @Ian, can you double check whether the MachineConfig owned by your
> Performance Profile got updated?

When I edit the PerformanceProfile to add/remove additionalKernelArgs, I do not see any changes made to the MC "50-performance-perfprofile-policy" on the node. Looking at the contents of that MC, the kernelArguments field is null.
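
A check like the one described above can be scripted. This is only a sketch, assuming the MachineConfig has been exported as JSON (for example with "oc get machineconfig 50-performance-perfprofile-policy -o json"); the helper name machineconfig_kernel_args is hypothetical, and the sample document is a trimmed stand-in, not the real object from this cluster.

```python
from __future__ import annotations

import json


def machineconfig_kernel_args(mc_json: str) -> list[str]:
    """Return spec.kernelArguments from a MachineConfig JSON document, or [] if unset or null."""
    mc = json.loads(mc_json)
    spec = mc.get("spec") or {}
    return spec.get("kernelArguments") or []


# Trimmed, illustrative document (assumption: the real object has many more fields).
sample = json.dumps(
    {
        "kind": "MachineConfig",
        "metadata": {"name": "50-performance-perfprofile-policy"},
        "spec": {"kernelArguments": None},
    }
)

print(machineconfig_kernel_args(sample))
```

A null kernelArguments field, as reported here, comes back as an empty list.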

Comment 3 Ian Miller 2021-09-03 18:26:48 UTC

After more investigation: edits to the PAO profile immediately update the tuned config file:

oc exec -n openshift-cluster-node-tuning-operator tuned-z87df -- cat /etc/tuned/openshift-node-performance-perfprofile-policy/tuned.conf

from: cmdline_additionalArg=+ idle=poll 
to: cmdline_additionalArg=+ idle=poll rcupdate.rcu_normal_after_boot=0 

However, this change does not take effect. Perhaps this is a duplicate of 1998247?

As a workaround, deleting the node-tuning operator's tuned-xxxxx pod causes the updated configuration to take effect.
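
The mismatch described in this comment (tuned.conf updated, kernel command line unchanged) can be detected mechanically. This is a rough sketch, assuming the tuned.conf text and the node's /proc/cmdline have already been captured as strings; both helper names are hypothetical, and the leading "+" on the cmdline_additionalArg value is simply dropped here without asserting its exact semantics.

```python
from __future__ import annotations


def additional_args_from_tuned_conf(conf_text: str) -> list[str]:
    """Extract the arguments listed on the cmdline_additionalArg line, if any."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cmdline_additionalArg":
            # Drop the leading "+" marker seen in the reported file, then split on whitespace.
            return value.strip().lstrip("+").split()
    return []


def rendered_but_not_booted(conf_text: str, cmdline: str) -> list[str]:
    """Arguments present in tuned.conf that are missing from the kernel command line."""
    booted = set(cmdline.split())
    return [arg for arg in additional_args_from_tuned_conf(conf_text) if arg not in booted]


# The "to:" line quoted in this comment, and an abbreviated command line (assumption).
conf = "cmdline_additionalArg=+ idle=poll rcupdate.rcu_normal_after_boot=0 \n"
cmdline = "default_hugepagesz=1G idle=poll nohz_full=2-39,42-79"

print(rendered_but_not_booted(conf, cmdline))
```

A non-empty result means an argument has reached the tuned profile but not the running kernel, which is the state reported in this bug before the tuned pod was deleted.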

Comment 4 Jiří Mencák 2021-09-08 13:12:44 UTC

Hi Ian,

(In reply to Ian Miller from comment #3)
> However this change does not take effect. Perhaps this is a duplicate of
> 1998247?

Probably not.  Can I have your cluster version (oc get clusterversion)?  Thank you.

Comment 6 Ian Miller 2021-09-08 13:32:40 UTC

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.9     True        False         6d9h    Cluster version is 4.8.9

Comment 7 Jiří Mencák 2021-09-08 13:39:41 UTC

(In reply to Ian Miller from comment #6)
> oc get clusterversion
> NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.8.9     True        False         6d9h    Cluster version is
> 4.8.9

Thank you.  Not sure why this BZ says 4.9 then.  1998247 is not fixed in 4.8 yet, but I need to see if this is the same bug.
A minimum reproducer (without PAO in the picture) would help.

Comment 10 Jiří Mencák 2021-09-08 17:09:08 UTC

This is a duplicate of 1998247/1999608.
Verified in Ian's environment.  This is already fixed in 4.9.  The fix for 4.8 depends on
https://bugzilla.redhat.com/show_bug.cgi?id=1999608
i.e.
https://github.com/openshift/cluster-node-tuning-operator/pull/268 merging.
Thank you for giving me access to your environment, Ian.  Closing.

*** This bug has been marked as a duplicate of bug 1999608 ***