Description of problem:

When the Tuned profile is updated, the Node Tuning Operator (NTO) does not apply the changes in the profile.

Version-Release number of selected component (if applicable):

[root@dell-r640-028 performance]# oc version
Client Version: 4.7.0-fc.3
Server Version: 4.7.0-fc.3
Kubernetes Version: v1.20.0+d9c52c

How reproducible:

1. Set up OCP 4.7.

2. Install and set up the performance addon operator.

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  additionalKernelArgs:
  - nosmt
  cpu:
    isolated: "2-3"
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      node: 0
      count: 1
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

3. Create a Tuned profile as shown below. (In this profile we are disabling stalld.)

[root@dell-r640-028 performance]# cat disable_stalld.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-example-performanceprofile
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch

4. Apply the above profile.

5. Modify the Tuned profile performance-patch: update the include mentioned in the profile. The profile named in the include parameter above does not exist, so the include has to be changed to name the right Tuned profile.

[root@dell-r640-028 performance]# oc get Tuned
NAME                                     AGE
default                                  143m
openshift-node-performance-performance   25m
performance-patch                        28m
rendered                                 143m

6. Modified the Tuned profile to specify the right profile.
$ cat disable_stalld.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-performance
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch

[root@dell-r640-028 performance]# oc apply -f disable_stalld.yaml
tuned.tuned.openshift.io/performance-patch configured

Check any changes in tuned.

[root@dell-r640-028 performance]# oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-node-tuning-operator-674966bd95-dkltc   1/1     Running   0          66m
tuned-6d9v4                                     1/1     Running   0          146m
tuned-8j54t                                     1/1     Running   0          138m
tuned-8mh25                                     1/1     Running   0          146m
tuned-dp2gp                                     1/1     Running   0          138m
tuned-jqfw9                                     1/1     Running   0          138m
tuned-sgv76                                     1/1     Running   0          146m

[root@dell-r640-028 performance]# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-fd44a5696af050011856431fbb3b2c3b       True      False      False      3              3                   3                     0                      147m
worker       rendered-worker-0e4354cac64e3253ee87d7aeb3449782       True      False      False      1              1                   1                     0                      147m
worker-cnf   rendered-worker-cnf-dc8fe15e9eaa459be86d35da3d6c8701   True      False      False      2              2                   2                     0                      33m

[root@dell-r640-028 performance]# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-fd44a5696af050011856431fbb3b2c3b       True      False      False      3              3                   3                     0                      147m
worker       rendered-worker-0e4354cac64e3253ee87d7aeb3449782       True      False      False      1              1                   1                     0                      147m
worker-cnf   rendered-worker-cnf-dc8fe15e9eaa459be86d35da3d6c8701   True      False      False      2              2                   2                     0                      33m

Actual results:

Once the Tuned profile is modified, NTO doesn't seem to update the changes.

Expected results:

NTO should update the changes in the Tuned profile.
Additional info:

Logs:

I0125 13:06:21.489811    3534 tuned.go:462] sending HUP to PID 5550
2021-01-25 13:06:21,490 INFO     tuned.daemon.daemon: stopping tuning
2021-01-25 13:06:21,511 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-01-25 13:06:21,524 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-01-25 13:06:21,524 INFO     tuned.daemon.daemon: Using 'performance-patch' profile
2021-01-25 13:06:21,525 INFO     tuned.profiles.loader: loading profile: performance-patch
2021-01-25 13:06:21,525 ERROR    tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
I0125 13:09:33.689001    3534 tuned.go:291] extracting Tuned profiles
I0125 13:09:33.848530    3534 tuned.go:325] recommended Tuned profile performance-patch content unchanged
2021-01-25 13:16:13,332 INFO     tuned.daemon.controller: terminating controller
E0125 13:17:13.785882    4556 reflector.go:127] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E0125 13:17:13.785882    4556 reflector.go:127] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I0125 13:17:15.089012    4556 tuned.go:274] disabling system tuned...
I0125 13:17:15.433528    4556 tuned.go:852] started events processor
I0125 13:17:15.434670    4556 tuned.go:895] started controller
I0125 13:17:15.435280    4556 tuned.go:369] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile performance-patch
I0125 13:17:15.435352    4556 tuned.go:291] extracting Tuned profiles
I0125 13:17:15.675014    4556 tuned.go:325] recommended Tuned profile performance-patch content changed
I0125 13:17:16.594504    4556 tuned.go:595] active profile () != recommended profile (performance-patch)
I0125 13:17:16.594601    4556 tuned.go:382] starting tuned...
2021-01-25 13:17:16,752 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-01-25 13:17:16,762 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-01-25 13:17:16,762 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-01-25 13:17:16,763 INFO     tuned.daemon.daemon: Using 'performance-patch' profile
2021-01-25 13:17:16,764 INFO     tuned.profiles.loader: loading profile: performance-patch
2021-01-25 13:17:16,765 ERROR    tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
2021-01-25 13:17:16,766 INFO     tuned.daemon.controller: starting controller
I0125 13:37:48.121788    4556 tuned.go:291] extracting Tuned profiles
I0125 13:37:48.282621    4556 tuned.go:325] recommended Tuned profile performance-patch content changed
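For anyone triaging a similar report, the profile that TuneD failed to find can be pulled out of the tuned pod logs with a small filter. This is only a sketch: the function name `missing_tuned_profiles` is made up for illustration, and it assumes the error wording stays the same as in the log lines quoted above.

```shell
# Hypothetical helper (not part of oc or NTO).
# Reads tuned pod logs on stdin and prints, once each, the name of every
# profile reported in a "Cannot find profile '<name>'" ERROR line.
missing_tuned_profiles() {
  sed -n "s/.*ERROR.*Cannot find profile '\([^']*\)'.*/\1/p" | sort -u
}
```

Possible usage, with a pod name taken from `oc get pods` in the operator namespace:

oc logs tuned-6d9v4 -n openshift-cluster-node-tuning-operator | missing_tuned_profiles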
A workaround is to delete the NTO (tuned) pods running on the worker-cnf nodes; the updated Tuned profile then gets applied.
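The pod deletion described here could be scripted roughly as follows. This is a sketch under assumptions: the function name `restart_tuned_pods` is invented for illustration, the operand pods are matched by the `tuned-` name prefix seen in the `oc get pods` output of this report, and the pods are expected to be recreated automatically after deletion, as the workaround implies.

```shell
# Hypothetical helper: delete the tuned pods on every node that carries
# the given role label (defaults to the worker-cnf label used in this bug).
restart_tuned_pods() (
  role_label=${1:-node-role.kubernetes.io/worker-cnf}
  ns=openshift-cluster-node-tuning-operator
  for node in $(oc get nodes -l "$role_label" -o name); do
    # "oc get -o name" prints node/<name>; strip the prefix for the field selector.
    for pod in $(oc get pods -n "$ns" -o name \
                   --field-selector "spec.nodeName=${node#node/}" |
                 grep '^pod/tuned-'); do
      oc delete -n "$ns" "$pod"
    done
  done
)
```

Calling `restart_tuned_pods` with no argument targets the worker-cnf nodes; the operator pod itself is left alone because its name does not start with `tuned-`.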
Created attachment 1750525
NTO logs from pods running on worker-cnf node.
Another way of dealing with it is deleting the tuned CR and recreating it properly.
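A minimal sketch of that second workaround, assuming the corrected manifest is the `disable_stalld.yaml` file from the description; the function name `recreate_tuned_cr` is hypothetical.

```shell
# Hypothetical helper: delete the Tuned CR described by a manifest and
# create it again from the same (already corrected) file.
recreate_tuned_cr() (
  manifest=${1:?usage: recreate_tuned_cr <manifest.yaml>}
  # --ignore-not-found keeps the delete from failing if the CR is already gone.
  oc delete -f "$manifest" --ignore-not-found &&
    oc create -f "$manifest"
)
```

For example: `recreate_tuned_cr disable_stalld.yaml`. Unlike `oc apply`, this forces the object to be created anew, which is what the comment above reports as working on the affected version.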
From the Tuned Pod logs I can see you're missing the `openshift-node-performance-example-performanceprofile` profile. It also doesn't show in your `oc get Tuned` output. Is it created before you instantiate disable_stalld.yaml ?
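One way to check this up front is to compare the `include=` targets of a Tuned CR against the profile names defined in the Tuned CRs of the namespace. The sketch below is an assumption-laden illustration: `undefined_includes` is a made-up name, only plain comma-separated `include=` lines are handled, and profiles shipped inside the tuned image (for example under /usr/lib/tuned) are not visible to it, so those would be reported as undefined even though they exist.

```shell
# Hypothetical helper: print each include= target of the named Tuned CR
# that is not defined as a profile in any Tuned CR of the NTO namespace.
undefined_includes() (
  cr=${1:?usage: undefined_includes <tuned-cr-name>}
  ns=openshift-cluster-node-tuning-operator
  # Space-separated names of all profiles defined by Tuned CRs.
  defined=$(oc get tuned -n "$ns" -o jsonpath='{.items[*].spec.profile[*].name}')
  oc get tuned "$cr" -n "$ns" -o jsonpath='{.spec.profile[*].data}' |
    sed -n 's/^[[:space:]]*include[[:space:]]*=[[:space:]]*//p' |
    tr ',' '\n' |
    while read -r name; do
      case " $defined " in
        *" $name "*) ;;          # defined by some Tuned CR
        *) echo "$name" ;;       # not found among the Tuned CRs
      esac
    done
)
```

For the original `performance-patch` CR in this report, `undefined_includes performance-patch` would be expected to name the missing `openshift-node-performance-example-performanceprofile` profile.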
OK, I think I know what you mean now and this is a known issue. It is planned to be fixed in 4.8 and the fix is already included here: https://github.com/openshift/cluster-node-tuning-operator/pull/188
> From the Tuned Pod logs I can see you're missing the
> `openshift-node-performance-example-performanceprofile` profile. It also
> doesn't show in your `oc get Tuned` output. Is it created before you
> instantiate disable_stalld.yaml ?

Yes, openshift-node-performance-example-performanceprofile is missing, so we modified the Tuned profile to provide the right profile. But after updating the profile, NTO still doesn't get updated.
(In reply to Niranjan Mallapadi Raghavender from comment #6)
> Yes openshift-node-performance-example-performanceprofile is missing , So we
> modified the tuned profile to provide the right profile. But after updating
> the profile.
> NTO still doesn't get updated.

Understood, and thanks for the clarification. This is a known issue which I was planning to address in 4.8 with the PR I mentioned above. It might be worth, however, backporting part of this PR to address the issue in 4.7 (and maybe even earlier) already. Thank you.
Cluster version: 4.7.0-0.nightly-2021-01-29-094746

# Get worker node and NTO pod on this node
node=$(oc get nodes | grep -m 1 worker | cut -f 1 -d ' ') && echo $node
pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide | grep $node | cut -d ' ' -f 1) && echo $pod

# Label the node:
oc label node $node node-role.kubernetes.io/worker-cnf=

# Log in to the web console
# Operators -> Operator Hub -> Performance Addon Operator -> Install

# Add the performance profile:
oc create -f- <<EOF
apiVersion: performance.openshift.io/v1
kind: PerformanceProfile
metadata:
  name: performance
  namespace: openshift-operators
spec:
  additionalKernelArgs:
  - nosmt
  cpu:
    isolated: "1"
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      node: 0
      count: 1
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
EOF

# New tuned is created

# Create and wait for mcp:
oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    worker-cnf: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-cnf]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
EOF

# Check tuned profile on worker-cnf node
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"openshift-node-performance-performance"

# Check logs
oc logs $pod
2021-01-29 17:38:10,226 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node-performance-performance' applied

# Check node - the new openshift-node-performance-performance profile sets vm.stat_interval = 10:
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 10

# Create performance-patch tuned, including a non-existent profile
oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-example-performanceprofile
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch
EOF

# Check the tuned profile once again - new profile
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"performance-patch"

# Check logs (missing profile as expected)
oc logs $pod
2021-01-29 17:43:21,554 ERROR    tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
2021-01-29 17:43:21,554 INFO     tuned.daemon.controller: starting controller

# Check node (the value is 1, not 10 as previously, because profile openshift-node-performance-performance is not included)
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 1

# Update performance-patch profile, including the correct profile:
oc edit tuned performance-patch
include=openshift-node-performance-example-performanceprofile -> include=openshift-node-performance-performance

# Check tuned profile on worker-cnf node
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"performance-patch"

# Check logs once again
oc logs $pod
2021-01-29 18:56:39,999 INFO     tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
2021-01-29 18:56:39,999 INFO     tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
2021-01-29 18:56:40,001 INFO     tuned.daemon.daemon: static tuning from profile 'performance-patch' applied

# Check value on node - the included profile's value was applied
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 10
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633