Bug 1919970
Summary: | NTO does not update when the tuned profile is updated. | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Niranjan Mallapadi Raghavender <mniranja> | ||||
Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> | ||||
Status: | CLOSED ERRATA | QA Contact: | Simon <skordas> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 4.7 | CC: | grajaiya, kquinn, sejug, yquinn | ||||
Target Milestone: | --- | ||||||
Target Release: | 4.7.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Cause:
openshift-tuned does not handle failures to apply a Tuned profile.
Consequence:
When an invalid Tuned profile is created, the openshift-tuned supervisor process may ignore future profile updates (and fail to apply the updated profile).
Fix:
Keep state information about Tuned profile application success or failure.
Result:
openshift-tuned will recover from profile application failures on receiving new valid profiles.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1920525 | Environment: | |||||
Last Closed: | 2021-02-24 15:55:53 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1920525 | ||||||
Attachments: |
|
Description
Niranjan Mallapadi Raghavender
2021-01-25 13:41:21 UTC
A workaround is to delete the NTO pods running on the worker-cnf nodes; the updated Tuned profile then gets applied.

Created attachment 1750525
NTO logs from pods running on worker-cnf node.
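The pod-deletion workaround above can be scripted. The following is a minimal shell sketch, not a tested procedure: the function name `restart_tuned_pods_on_cnf_nodes` and the `rt_` variables are made up for illustration, the namespace is the one used in the `oc` commands elsewhere in this report, and it assumes the deleted pods are recreated automatically by their controller.

```shell
# Namespace used by the oc commands in this report.
NTO_NS=openshift-cluster-node-tuning-operator

# Hypothetical helper: delete every pod in $NTO_NS that is scheduled on a
# node labelled node-role.kubernetes.io/worker-cnf.
# Assumption: the pods are recreated by their controller after deletion.
restart_tuned_pods_on_cnf_nodes() {
  for rt_node in $(oc get nodes -l node-role.kubernetes.io/worker-cnf= -o name); do
    rt_node=${rt_node#node/}   # "-o name" prints "node/<name>"
    for rt_pod in $(oc get pods -n "$NTO_NS" --field-selector "spec.nodeName=$rt_node" -o name); do
      oc delete -n "$NTO_NS" "$rt_pod"
    done
  done
}

# Usage (against a live cluster):
#   restart_tuned_pods_on_cnf_nodes
```

Note that this matches all pods of the namespace on those nodes, not only the per-node Tuned pods.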
Another way of dealing with it is to delete the Tuned CR and recreate it properly.

From the Tuned Pod logs I can see you're missing the `openshift-node-performance-example-performanceprofile` profile. It also doesn't show in your `oc get Tuned` output. Is it created before you instantiate disable_stalld.yaml?

OK, I think I know what you mean now, and this is a known issue. It is planned to be fixed in 4.8 and the fix is already included here: https://github.com/openshift/cluster-node-tuning-operator/pull/188

> From the Tuned Pod logs I can see you're missing the
> `openshift-node-performance-example-performanceprofile` profile. It also
> doesn't show in your `oc get Tuned` output. Is it created before you
> instantiate disable_stalld.yaml?

Yes, openshift-node-performance-example-performanceprofile is missing, so we modified the Tuned profile to include the right profile. But after updating the profile, NTO still doesn't get updated.

(In reply to Niranjan Mallapadi Raghavender from comment #6)
> Yes, openshift-node-performance-example-performanceprofile is missing, so we
> modified the Tuned profile to include the right profile. But after updating
> the profile, NTO still doesn't get updated.

Understood, and thanks for the clarification. This is a known issue which I was planning to address in 4.8 with the PR I mentioned above. It might be worth, however, backporting part of this PR to address the issue in 4.7 (and maybe even earlier) already. Thank you.
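The idea named in the Doc Text ("keep state information about Tuned profile application success or failure") can be illustrated with a small shell sketch. This is not the operator's code and makes no claim about how PR 188 implements it: the function names, the state-file format, and the checksum-based change detection are all assumptions chosen for illustration.

```shell
# Hypothetical state file format: "<checksum> <status>",
# where <status> is "ok" or "failed".

# record_result CHECKSUM STATUS STATE_FILE
# Remember the outcome of the most recent attempt to apply a profile.
record_result() {
  printf '%s %s\n' "$1" "$2" > "$3"
}

# needs_reload CHECKSUM STATE_FILE
# Exit 0 when the profile should be (re)applied: there is no recorded state,
# the profile content changed, or the previous attempt failed.
needs_reload() {
  nr_old_sum=""
  nr_old_status=""
  if [ -r "$2" ]; then
    read -r nr_old_sum nr_old_status < "$2"
  fi
  [ "$1" != "$nr_old_sum" ] || [ "$nr_old_status" != "ok" ]
}
```

With this kind of bookkeeping, a failed attempt is recorded rather than forgotten, so a later valid profile is still picked up instead of being ignored.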
Cluster version: 4.7.0-0.nightly-2021-01-29-094746

# Get a worker node and the NTO pod on this node
node=$(oc get nodes | grep -m 1 worker | cut -f 1 -d ' ') && echo $node
pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide | grep $node | cut -d ' ' -f 1) && echo $pod

# Label the node:
oc label node $node node-role.kubernetes.io/worker-cnf=

# Log in to the web console
# Operators -> Operator Hub -> Performance Addon Operator -> Install

# Add a performance profile:
oc create -f- <<EOF
apiVersion: performance.openshift.io/v1
kind: PerformanceProfile
metadata:
  name: performance
  namespace: openshift-operators
spec:
  additionalKernelArgs:
  - nosmt
  cpu:
    isolated: "1"
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      node: 0
      count: 1
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
EOF
# New tuned is created

# Create and wait for mcp:
oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    worker-cnf: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-cnf]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
EOF

# Check tuned profile on worker-cnf node
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"openshift-node-performance-performance"

# Check logs
oc logs $pod
2021-01-29 17:38:10,226 INFO tuned.daemon.daemon: static tuning from profile 'openshift-node-performance-performance' applied

# Check node - the new openshift-node-performance-performance profile sets vm.stat_interval = 10:
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 10

# Create the performance-patch Tuned, which includes a non-existent profile
oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-example-performanceprofile
      [service]
      service.stalld=stop,disable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-cnf"
    priority: 19
    profile: performance-patch
EOF

# Check the tuned profile once again - new profile
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"performance-patch"

# Check logs (missing profile, as expected)
oc logs $pod
2021-01-29 17:43:21,554 ERROR tuned.daemon.daemon: Cannot set initial profile. No tunings will be enabled: Cannot load profile(s) 'performance-patch': Cannot find profile 'openshift-node-performance-example-performanceprofile' in '['/etc/tuned', '/usr/lib/tuned']'.
2021-01-29 17:43:21,554 INFO tuned.daemon.controller: starting controller

# Check node (the value is 1, not 10 as before, because profile openshift-node-performance-performance is not included)
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 1

# Update the performance-patch profile to include the correct profile:
oc edit tuned performance-patch
# Change: include=openshift-node-performance-example-performanceprofile -> include=openshift-node-performance-performance

# Check tuned profile on worker-cnf node
oc get profiles $node -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"
"performance-patch"

# Check logs once again
oc logs $pod
2021-01-29 18:56:39,999 INFO tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
2021-01-29 18:56:39,999 INFO tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
2021-01-29 18:56:40,001 INFO tuned.daemon.daemon: static tuning from profile 'performance-patch' applied

# Check value on node - the value from the included profile is applied
oc debug node/$node -- chroot /host sysctl vm.stat_interval
Starting pod/skordas129-smst4-worker-a-92ll8copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
vm.stat_interval = 10

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633 |
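The log checks in the verification steps can be automated by classifying the daemon log lines. Below is a rough shell sketch under stated assumptions: the function name `last_profile_outcome` is hypothetical, and the two message patterns are simply the ones that appear in the logs quoted in this report, so other TuneD versions may word them differently.

```shell
# Hypothetical helper: read TuneD daemon log lines on stdin and print the
# outcome of the most recent profile load seen in the log:
#   "applied <profile>", "failed", or "unknown".
last_profile_outcome() {
  lpo_result=unknown
  while IFS= read -r lpo_line || [ -n "$lpo_line" ]; do
    case "$lpo_line" in
      *"static tuning from profile '"*"' applied"*)
        # Cut everything up to the opening quote, then from the closing quote.
        lpo_name=${lpo_line#*"static tuning from profile '"}
        lpo_name=${lpo_name%%"'"*}
        lpo_result="applied $lpo_name"
        ;;
      *"Cannot load profile(s)"*)
        lpo_result=failed
        ;;
    esac
  done
  echo "$lpo_result"
}

# Usage (against a live cluster):
#   oc logs "$pod" | last_profile_outcome
```

Because later lines override earlier ones, a log that first reports the load error and later reports the profile as applied is classified as applied, which matches the recovery this verification expects.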