Bug 2036303
| Summary: | The tuned profile goes into degraded status and ksm.service is displayed in the log. | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Aaron Park <aapark> | |
| Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> | |
| Status: | CLOSED ERRATA | QA Contact: | liqcui | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.9 | CC: | aos-bugs, dagray | |
| Target Milestone: | --- | |||
| Target Release: | 4.9.z | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 2037036 | Environment: | ||
| Last Closed: | 2022-01-17 08:07:31 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 2037036 | |||
| Bug Blocks: | ||||
Verified Result:

[ocpadmin@ec2-18-217-45-133 ~]$ oc get nodes
NAME                                                         STATUS   ROLES    AGE   VERSION
liqcui-oc4903-x4dvl-master-0.c.openshift-qe.internal         Ready    master   24m   v1.22.3+e790d7f
liqcui-oc4903-x4dvl-master-1.c.openshift-qe.internal         Ready    master   24m   v1.22.3+e790d7f
liqcui-oc4903-x4dvl-master-2.c.openshift-qe.internal         Ready    master   24m   v1.22.3+e790d7f
liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal   Ready    worker   13m   v1.22.3+e790d7f
liqcui-oc4903-x4dvl-worker-b-fl7pn.c.openshift-qe.internal   Ready    worker   13m   v1.22.3+e790d7f
liqcui-oc4903-x4dvl-worker-c-2d4zg.c.openshift-qe.internal   Ready    worker   16m   v1.22.3+e790d7f

[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2022-01-10-045851   True        False         19m     Cluster version is 4.9.0-0.nightly-2022-01-10-045851

[ocpadmin@ec2-18-217-45-133 ~]$ oc label no liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal node-role.kubernetes.io/worker-rt=
node/liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal labeled

[ocpadmin@ec2-18-217-45-133 ~]$ oc create -f- <<EOF
> apiVersion: tuned.openshift.io/v1
> kind: Tuned
> metadata:
>   name: openshift-cpu-partitioning
>   namespace: openshift-cluster-node-tuning-operator
> spec:
>   profile:
>   - data: |
>       [main]
>       summary=Custom OpenShift cpu-partitioning profile
>       include=openshift-node,cpu-partitioning
>       [variables]
>       # {isolated,no_balance}_cores take a list of ranges; e.g. isolated_cores=2,4-7
>       isolated_cores=1
>       no_balance_cores=1
>       [bootloader]
>       # set empty values to disable RHEL initrd setting in cpu-partitioning
>       initrd_remove_dir=
>       initrd_dst_img=
>       initrd_add_dir=
>     name: openshift-cpu-partitioning
>
>   recommend:
>   - match:
>     - label: node-role.kubernetes.io/worker-rt
>     priority: 20
>     profile: openshift-cpu-partitioning
> EOF
tuned.tuned.openshift.io/openshift-cpu-partitioning created

[ocpadmin@ec2-18-217-45-133 ~]$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.liqcui-oc4903.qe.gcp.devcluster.openshift.com:6443".

[ocpadmin@ec2-18-217-45-133 ~]$ oc get po -o wide|grep liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal
tuned-fq9jf   1/1   Running   0   29m   10.0.128.2   liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal   <none>   <none>

[ocpadmin@ec2-18-217-45-133 ~]$ oc get profile
NAME                                                         TUNED                        APPLIED   DEGRADED   AGE
liqcui-oc4903-x4dvl-master-0.c.openshift-qe.internal         openshift-control-plane      True      False      35m
liqcui-oc4903-x4dvl-master-1.c.openshift-qe.internal         openshift-control-plane      True      False      35m
liqcui-oc4903-x4dvl-master-2.c.openshift-qe.internal         openshift-control-plane      True      False      35m
liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal   openshift-cpu-partitioning   True      True       29m
liqcui-oc4903-x4dvl-worker-b-fl7pn.c.openshift-qe.internal   openshift-node               True      False      29m
liqcui-oc4903-x4dvl-worker-c-2d4zg.c.openshift-qe.internal   openshift-node               True      False      30m

[ocpadmin@ec2-18-217-45-133 ~]$ oc logs tuned-fq9jf | grep ksm.service
[ocpadmin@ec2-18-217-45-133 ~]$ oc logs tuned-fq9jf | tail -15
2022-01-10 10:00:19,575 INFO tuned.daemon.daemon: starting tuning
2022-01-10 10:00:19,579 INFO tuned.plugins.base: instance cpu: assigning devices cpu2, cpu3, cpu1, cpu0
2022-01-10 10:00:19,580 INFO tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2022-01-10 10:00:19,583 WARNING tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2022-01-10 10:00:19,585 INFO tuned.plugins.base: instance disk: assigning devices sda
2022-01-10 10:00:19,587 INFO tuned.plugins.base: instance net: assigning devices ens4
2022-01-10 10:00:19,594 INFO tuned.plugins.plugin_cpu: setting new cpu latency 0
2022-01-10 10:00:19,597 ERROR tuned.plugins.plugin_sysctl: Failed to set sysctl parameter 'kernel.nmi_watchdog' to '0': [Errno 524] Unknown error 524
2022-01-10 10:00:19,597 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
2022-01-10 10:00:19,711 INFO tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '0 2 3' in the '/etc/systemd/system.conf'
2022-01-10 10:00:19,741 INFO tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
2022-01-10 10:00:19,881 INFO tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
E0110 10:00:19.882539 2566 tuned.go:776] unable to sync(daemon/) requeued (4)
E0110 10:00:19.882576 2566 tuned.go:776] unable to sync(daemon/) requeued (5)
2022-01-10 10:00:19,882 INFO tuned.daemon.daemon: static tuning from profile 'openshift-cpu-partitioning' applied

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.15 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0110
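The DEGRADED check in the verification above can be scripted. The following is a minimal Python sketch, assuming the plain-text output of `oc get profile` has already been captured as a string. The helper name `degraded_profiles` is hypothetical and not part of any OpenShift tooling; the sample rows are copied from the output above.

```python
from typing import List


def degraded_profiles(table: str) -> List[str]:
    """Return the NAME of every row whose DEGRADED column reads 'True'.

    Columns are located by header name, so the extra NAMESPACE column
    printed by `oc get profile -A` does not shift the lookup.
    """
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    degraded_idx = header.index("DEGRADED")

    result = []
    for row in lines[1:]:
        cols = row.split()
        if len(cols) > degraded_idx and cols[degraded_idx] == "True":
            result.append(cols[name_idx])
    return result


# Rows taken from the `oc get profile` output in this report.
SAMPLE = """\
NAME TUNED APPLIED DEGRADED AGE
liqcui-oc4903-x4dvl-master-0.c.openshift-qe.internal openshift-control-plane True False 35m
liqcui-oc4903-x4dvl-worker-a-xmdcs.c.openshift-qe.internal openshift-cpu-partitioning True True 29m
liqcui-oc4903-x4dvl-worker-b-fl7pn.c.openshift-qe.internal openshift-node True False 29m
"""

if __name__ == "__main__":
    for node in degraded_profiles(SAMPLE):
        print(node)
```

On this sample the sketch reports only the worker-a node. In the log above, that node's only ERROR line comes from `tuned.plugins.plugin_sysctl` (the `kernel.nmi_watchdog` failure); the `grep ksm.service` command returned nothing.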
Description of problem:
When the Tuned profile is updated, the profile is applied to the node but remains DEGRADED.

Version-Release number of selected component (if applicable):
$ omg get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.12    True        False         38m     Error while reconciling 4.9.12: the cluster operator insights is degraded

How reproducible:

Steps to Reproduce:
1. Install and set up the Performance Addon Operator

[root@bastion1 dk]# oc get performanceprofiles.performance.openshift.io performance -oyaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-11-02T10:18:56Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: performance
  resourceVersion: "9172819"
  uid: 931a600a-7e9a-499d-9e08-f99abbdd90ed
spec:
  cpu:
    isolated: 4-39,44-79
    reserved: 0-3,40-43
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      node: 0
      size: 1G
    - count: 32
      node: 1
      size: 1G
  nodeSelector:
    node-role.kubernetes.io/sys: ""
  numa:
    topologyPolicy: restricted

2. Create a Tuned profile

[root@bastion1 smile]# cat tuned_sysctl_socket_buffer_profile.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: sysctl-socket-buffer
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Set rmem_default,rmem_max,wmem_default,wmem_max
      include=openshift-node
      [sysctl]
      net.core.rmem_default = 2097152
      net.core.rmem_max = 2097152
      net.core.wmem_default = 2097152
      net.core.wmem_max = 2097152
    name: openshift-sysctl
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "sys"
    priority: 20
    profile: openshift-sysctl

3. The Tuned profile is degraded

[root@bastion1 dk]# oc get profile -A
NAMESPACE                                NAME                         TUNED                     APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master01.ss2.samsung.local   openshift-control-plane   True      False      65d
openshift-cluster-node-tuning-operator   master02.ss2.samsung.local   openshift-control-plane   True      False      64d
openshift-cluster-node-tuning-operator   master03.ss2.samsung.local   openshift-control-plane   True      False      65d
openshift-cluster-node-tuning-operator   worker01.ss2.samsung.local   openshift-sysctl-oam      True      True       61d
openshift-cluster-node-tuning-operator   worker02.ss2.samsung.local   openshift-sysctl-oam      True      False      61d
openshift-cluster-node-tuning-operator   worker03.ss2.samsung.local   openshift-sysctl-oam      True      True       61d
openshift-cluster-node-tuning-operator   worker04.ss2.samsung.local   openshift-sysctl-oam      True      False      61d
openshift-cluster-node-tuning-operator   worker05.ss2.samsung.local   openshift-sysctl-sys      True      False      61d
openshift-cluster-node-tuning-operator   worker06.ss2.samsung.local   openshift-sysctl-sys      True      True       61d
openshift-cluster-node-tuning-operator   worker07.ss2.samsung.local   openshift-sysctl-sys      True      False      61d
openshift-cluster-node-tuning-operator   worker08.ss2.samsung.local   openshift-sysctl-sys      True      False      61d
openshift-cluster-node-tuning-operator   worker09.ss2.samsung.local   openshift-sysctl-call     True      False      34d
openshift-cluster-node-tuning-operator   worker10.ss2.samsung.local   openshift-sysctl-call     True      True       34d
openshift-cluster-node-tuning-operator   worker11.ss2.samsung.local   openshift-sysctl-call2    True      False      6d20h
openshift-cluster-node-tuning-operator   worker12.ss2.samsung.local   openshift-sysctl-call2    True      False      6d20h

Actual results:
1) The Profile status reports an error
--
$ omg get profile worker10.ss2.samsung.local -o yaml
~
status:
  bootcmdline: skew_tick=1 nohz=on rcu_nocbs=4-27,32-55 tuned.non_isolcpus=f000000f intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,4-27,32-55 systemd.cpu_affinity=0,1,2,3,28,29,30,31 default_hugepagesz=1G +
  conditions:
  - lastTransitionTime: '2021-12-29T03:30:22Z'
    message: Tuned profile applied.
    reason: AsExpected
    status: 'True'
    type: Applied
  - lastTransitionTime: '2021-12-29T03:30:22Z'
    message: Tuned daemon issued one or more error message(s) during profile application.
    reason: TunedError
    status: 'True'
    type: Degraded
  tunedProfile: openshift-sysctl-call
--

2) Error log in the tuned pod
--
$ omg logs tuned-zzgm5
~
2021-12-29T03:30:24.027172311Z 2021-12-29 03:30:24,027 INFO tuned.plugins.plugin_cpu: setting new cpu latency 2
2021-12-29T03:30:24.033503757Z 2021-12-29 03:30:24,033 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-12-29T03:30:24.528353891Z 2021-12-29 03:30:24,528 INFO tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '0 1 2 3 28 29 30 31' in the '/etc/systemd/system.conf'
2021-12-29T03:30:25.007818601Z 2021-12-29 03:30:25,007 INFO tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
2021-12-29T03:30:25.535868718Z 2021-12-29 03:30:25,535 ERROR tuned.plugins.plugin_script: script '/usr/lib/tuned/cpu-partitioning/script.sh' error output: 'Unit ksm.service does not exist, proceeding anyway.
2021-12-29T03:30:25.535868718Z Unit ksmtuned.service does not exist, proceeding anyway.'
2021-12-29T03:30:25.536893772Z 2021-12-29 03:30:25,536 INFO tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
2021-12-29T03:30:25.537422292Z E1229 03:30:25.537398 16277 tuned.go:776] unable to sync(daemon/) requeued (6)
2021-12-29T03:30:25.537499978Z E1229 03:30:25.537479 16277 tuned.go:776] unable to sync(daemon/) requeued (7)
2021-12-29T03:30:25.537575410Z 2021-12-29 03:30:25,537 INFO tuned.daemon.daemon: static tuning from profile 'openshift-sysctl-call' applied

Expected results:
The Tuned profile's DEGRADED status is False.

Additional info:
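The Degraded condition above carries the reason TunedError, which its message describes as the daemon issuing one or more error messages during profile application. To see which messages those are, the ERROR entries can be pulled out of the tuned pod log. The following is a minimal Python sketch; the function name `tuned_errors` and the regular expression are assumptions for illustration, derived only from the log format shown in this report.

```python
import re
from typing import List, Tuple

# Matches e.g. "... ERROR tuned.plugins.plugin_script: <message>".
# The pattern is an assumption based on the log lines in this report.
ERROR_RE = re.compile(r"\bERROR\s+(tuned\.[\w.]+):\s*(.*)")


def tuned_errors(log: str) -> List[Tuple[str, str]]:
    """Return (logger, message) pairs for every ERROR line in a tuned log."""
    found = []
    for line in log.splitlines():
        match = ERROR_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Lines copied from the `omg logs tuned-zzgm5` output in this report.
SAMPLE_LOG = """\
2021-12-29T03:30:25.007818601Z 2021-12-29 03:30:25,007 INFO tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
2021-12-29T03:30:25.535868718Z 2021-12-29 03:30:25,535 ERROR tuned.plugins.plugin_script: script '/usr/lib/tuned/cpu-partitioning/script.sh' error output: 'Unit ksm.service does not exist, proceeding anyway.
2021-12-29T03:30:25.535868718Z Unit ksmtuned.service does not exist, proceeding anyway.'
2021-12-29T03:30:25.536893772Z 2021-12-29 03:30:25,536 INFO tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
2021-12-29T03:30:25.537422292Z E1229 03:30:25.537398 16277 tuned.go:776] unable to sync(daemon/) requeued (6)
"""

if __name__ == "__main__":
    for logger, message in tuned_errors(SAMPLE_LOG):
        print(logger, "->", message)
```

The sketch matches only lines carrying the literal ERROR level, so the operator's own `E1229 ... tuned.go` lines and the continuation line of a multi-line message are not reported as separate entries.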