+++ This bug was initially created as a clone of Bug #1919970 +++

Description of problem:
When the Tuned profile is updated, the Node Tuning Operator (NTO) does not pick up and apply the changes in the profile.

Version-Release number of selected component (if applicable):

[root@dell-r640-028 performance]# oc version
Client Version: 4.7.0-fc.3
Server Version: 4.7.0-fc.3
Kubernetes Version: v1.20.0+d9c52c

How reproducible:

1. Set up OCP 4.7.

2. Install and set up the performance addon operator:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  additionalKernelArgs:
  - nosmt
  cpu:
    isolated: "2-3"
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      node: 0
      count: 1
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

3. Create a Tuned profile as shown below. (In this profile we are disabling stalld.)

[root@dell-r640-028 performance]# cat disable_stalld.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-example-performanceprofile
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch

4. Apply the above profile.

5. Modify the Tuned profile performance-patch by updating its include parameter. The profile named in the include parameter above does not exist, so the Tuned profile has to be updated to reference the right one.

[root@dell-r640-028 performance]# oc get Tuned
NAME                                     AGE
default                                  143m
openshift-node-performance-performance   25m
performance-patch                        28m
rendered                                 143m

6. Modify the Tuned profile to specify the right profile.
$ cat disable_stalld.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-performance
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch

[root@dell-r640-028 performance]# oc apply -f disable_stalld.yaml
tuned.tuned.openshift.io/performance-patch configured

Check for any changes in tuned:

[root@dell-r640-028 performance]# oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-node-tuning-operator-674966bd95-dkltc   1/1     Running   0          66m
tuned-6d9v4                                     1/1     Running   0          146m
tuned-8j54t                                     1/1     Running   0          138m
tuned-8mh25                                     1/1     Running   0          146m
tuned-dp2gp                                     1/1     Running   0          138m
tuned-jqfw9                                     1/1     Running   0          138m
tuned-sgv76                                     1/1     Running   0          146m

[root@dell-r640-028 performance]# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-fd44a5696af050011856431fbb3b2c3b       True      False      False      3              3                   3                     0                      147m
worker       rendered-worker-0e4354cac64e3253ee87d7aeb3449782       True      False      False      1              1                   1                     0                      147m
worker-cnf   rendered-worker-cnf-dc8fe15e9eaa459be86d35da3d6c8701   True      False      False      2              2                   2                     0                      33m

Running `oc get mcp` a second time returned identical output.

Actual results:
Once the Tuned profile is modified, NTO does not appear to pick up the changes.

Expected results:
NTO should apply the changes made to the Tuned profile.

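The failure in this report starts from an include= line that names a profile which does not exist. The following is a minimal shell sketch of a check that could be run before `oc apply`. The function names included_profiles and missing_includes are hypothetical, and the sketch assumes (as is the case in this report) that the included profile has the same name as one of the objects listed by `oc get Tuned`. It only catches the bad reference; it does not address the NTO reload problem itself.

```shell
# Print the profile name referenced by each include= line in a Tuned manifest.
included_profiles() {
  sed -n '/^[[:space:]]*include[[:space:]]*=/{s/^[[:space:]]*include[[:space:]]*=[[:space:]]*//;s/[[:space:]]*$//;p;}' "$1"
}

# Print every included profile that is NOT in the newline-separated list "$2".
# Prints nothing when all references resolve.
missing_includes() {
  manifest="$1"
  known="$2"
  included_profiles "$manifest" | while IFS= read -r name; do
    printf '%s\n' "$known" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Possible use against a live cluster (not executed here):
#   known=$(oc get tuned -n openshift-cluster-node-tuning-operator \
#             --no-headers -o custom-columns=NAME:.metadata.name)
#   missing_includes disable_stalld.yaml "$known"
```

With the first version of disable_stalld.yaml and the `oc get Tuned` listing from step 5, this would print openshift-node-performance-example-performanceprofile; with the corrected manifest it would print nothing.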
Additional info:

Logs:

I0125 13:06:21.489811 3534 tuned.go:462] sending HUP to PID 5550
2021-01-25 13:06:21,490 INFO tuned.daemon.daemon: stopping tuning
2021-01-25 13:06:21,511 INFO tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-01-25 13:06:21,524 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-01-25 13:06:21,524 INFO tuned.daemon.daemon: Using 'performance-patch' profile
2021-01-25 13:06:21,525 INFO tuned.profiles.loader: loading profile: performance-patch
2021-01-25 13:06:21,525 ERROR tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
I0125 13:09:33.689001 3534 tuned.go:291] extracting Tuned profiles
I0125 13:09:33.848530 3534 tuned.go:325] recommended Tuned profile performance-patch content unchanged
2021-01-25 13:16:13,332 INFO tuned.daemon.controller: terminating controller
E0125 13:17:13.785882 4556 reflector.go:127] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E0125 13:17:13.785882 4556 reflector.go:127] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I0125 13:17:15.089012 4556 tuned.go:274] disabling system tuned...
I0125 13:17:15.433528 4556 tuned.go:852] started events processor
I0125 13:17:15.434670 4556 tuned.go:895] started controller
I0125 13:17:15.435280 4556 tuned.go:369] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile performance-patch
I0125 13:17:15.435352 4556 tuned.go:291] extracting Tuned profiles
I0125 13:17:15.675014 4556 tuned.go:325] recommended Tuned profile performance-patch content changed
I0125 13:17:16.594504 4556 tuned.go:595] active profile () != recommended profile (performance-patch)
I0125 13:17:16.594601 4556 tuned.go:382] starting tuned...
2021-01-25 13:17:16,752 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-01-25 13:17:16,762 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-01-25 13:17:16,762 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-01-25 13:17:16,763 INFO tuned.daemon.daemon: Using 'performance-patch' profile
2021-01-25 13:17:16,764 INFO tuned.profiles.loader: loading profile: performance-patch
2021-01-25 13:17:16,765 ERROR tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
2021-01-25 13:17:16,766 INFO tuned.daemon.controller: starting controller
I0125 13:37:48.121788 4556 tuned.go:291] extracting Tuned profiles
I0125 13:37:48.282621 4556 tuned.go:325] recommended Tuned profile performance-patch content changed

--- Additional comment from Niranjan Mallapadi Raghavender on 2021-01-25 13:56:16 UTC ---

The workaround is to delete the NTO pods running on the worker-cnf nodes; the updated Tuned profile then gets applied.

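As a rough illustration of that workaround, here is a shell sketch that picks out the tuned-* pods scheduled on a given set of nodes so that they can be deleted. The helper name select_tuned_pods is hypothetical; the sketch assumes the operand pods are named tuned-<suffix> as in the `oc get pods` output above, and that the affected nodes carry the node-role.kubernetes.io/worker-cnf label used in the PerformanceProfile. The cluster commands are left as comments because they depend on a live cluster.

```shell
NS=openshift-cluster-node-tuning-operator

# Read "POD NODE" pairs on stdin and print the tuned-* pods whose node
# is one of the node names passed as arguments.
select_tuned_pods() {
  awk -v nodes="$*" '
    BEGIN { n = split(nodes, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
    $1 ~ /^tuned-/ && ($2 in want) { print $1 }
  '
}

# Possible use against a live cluster (not executed here):
#   nodes=$(oc get nodes -l node-role.kubernetes.io/worker-cnf= \
#             -o jsonpath='{.items[*].metadata.name}')
#   oc get pods -n "$NS" --no-headers \
#       -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | select_tuned_pods $nodes \
#     | while IFS= read -r pod; do oc delete pod -n "$NS" "$pod"; done
```

Filtering by node keeps the tuned pods on master and plain worker nodes untouched, and the name filter leaves the cluster-node-tuning-operator pod alone.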
--- Additional comment from Niranjan Mallapadi Raghavender on 2021-01-25 14:03:27 UTC ---

--- Additional comment from Yanir Quinn on 2021-01-25 14:10:30 UTC ---

Another way of dealing with it is to delete the Tuned CR and recreate it properly.

--- Additional comment from on 2021-01-25 14:27:17 UTC ---

From the Tuned Pod logs I can see you're missing the `openshift-node-performance-example-performanceprofile` profile. It also doesn't show in your `oc get Tuned` output. Is it created before you instantiate disable_stalld.yaml?

--- Additional comment from on 2021-01-25 14:50:56 UTC ---

OK, I think I know what you mean now and this is a known issue. It is planned to be fixed in 4.8 and the fix is already included here:
https://github.com/openshift/cluster-node-tuning-operator/pull/188

--- Additional comment from Niranjan Mallapadi Raghavender on 2021-01-25 15:01:20 UTC ---

> From the Tuned Pod logs I can see you're missing the
> `openshift-node-performance-example-performanceprofile` profile. It also
> doesn't show in your `oc get Tuned` output. Is it created before you
> instantiate disable_stalld.yaml ?

Yes, openshift-node-performance-example-performanceprofile is missing, so we modified the Tuned profile to provide the right profile. But after updating the profile, NTO still doesn't get updated.

--- Additional comment from on 2021-01-25 15:11:10 UTC ---

(In reply to Niranjan Mallapadi Raghavender from comment #6)
> Yes openshift-node-performance-example-performanceprofile is missing , So we
> modified the tuned profile to provide the right profile. But after updating
> the profile.
> NTO still doesn't get updated.

Understood, and thanks for the clarification. This is a known issue which I was planning to address in 4.8 with the PR I mentioned above. It might be worth, however, backporting part of this PR to address the issue in 4.7 (and maybe even earlier) already. Thank you.

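The delete-and-recreate workaround mentioned by Yanir Quinn above could be scripted roughly as follows. This is only a sketch: the function name recreate_tuned and the OC variable are hypothetical conveniences (OC lets the binary be swapped out for a dry run), and the manifest is assumed to be the corrected disable_stalld.yaml.

```shell
# Binary used to talk to the cluster; override for a dry run, e.g. OC=echo.
OC="${OC:-oc}"

# Delete the Tuned CR described by the manifest, then create it again.
# --ignore-not-found keeps the delete from failing if the CR is already gone.
recreate_tuned() {
  manifest="$1"
  "$OC" delete -f "$manifest" --ignore-not-found &&
    "$OC" apply -f "$manifest"
}

# Possible use (not executed here):
#   recreate_tuned disable_stalld.yaml
```

Setting OC=echo before calling recreate_tuned prints the two commands instead of running them, which is a cheap way to review what would be sent to the cluster.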
Cluster version: 4.6.0-0.nightly-2021-02-22-141201
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.19 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0634