Description of problem:

There are cases where the TuneD daemon appears to be stuck during application of a profile (see rhbz#2013940). NTO does not restart the TuneD daemon when profile application is taking too long.

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. Create a profile that will take too long for NTO to apply. For example:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:inf}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

Actual results:
Profile application is never restarted/retried.

Expected results:
Profile application should be restarted/retried.

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/282
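For illustration only, below is a minimal Go sketch of the kind of restart-on-timeout loop the linked PR introduces. This is not the actual cluster-node-tuning-operator code; the TuneD binary path, the flag-free invocation, the "applied" channel and the doubling backoff are assumptions based on the verification log further down.

package main

import (
	"log"
	"os/exec"
	"time"
)

// runWithTimeout starts the TuneD daemon and waits for either the profile to
// be reported as applied or the timeout to expire. On timeout the daemon is
// killed so that the caller can retry with a larger window.
func runWithTimeout(tunedPath string, timeout time.Duration, applied <-chan struct{}) bool {
	cmd := exec.Command(tunedPath) // illustrative invocation; real flags omitted
	if err := cmd.Start(); err != nil {
		log.Printf("failed to start TuneD: %v", err)
		return false
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case <-applied:
		return true // profile applied in time; leave the daemon running
	case <-time.After(timeout):
		log.Printf("timeout (%s) to apply TuneD profile; restarting TuneD daemon", timeout)
		_ = cmd.Process.Kill()
		<-done // reap the terminated child
		return false
	case err := <-done:
		log.Printf("TuneD exited before the profile was applied: %v", err)
		return false
	}
}

func main() {
	// In the real operand this channel would be signalled once TuneD reports
	// that the profile has been applied; it is left open here for brevity.
	applied := make(chan struct{})
	timeout := 60 * time.Second
	for !runWithTimeout("/usr/sbin/tuned", timeout, applied) {
		timeout *= 2 // grow the window on every retry (60s, 120s, ...)
	}
	log.Print("TuneD profile applied")
}

A real implementation also has to avoid leaking the child process and to reset the backoff once a profile eventually applies; see the linked PR for the actual behaviour.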
Fixed in 4.10.0-0.nightly-2021-10-27-230233 and above.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-27-230233   True        False         20m     Cluster version is 4.10.0-0.nightly-2021-10-27-230233

$ oc get no
NAME                                                          STATUS   ROLES    AGE   VERSION
jmencak-7r5lg-master-0.c.openshift-gce-devel.internal         Ready    master   35m   v1.22.1+674f31e
jmencak-7r5lg-master-1.c.openshift-gce-devel.internal         Ready    master   35m   v1.22.1+674f31e
jmencak-7r5lg-master-2.c.openshift-gce-devel.internal         Ready    master   35m   v1.22.1+674f31e
jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal   Ready    worker   27m   v1.22.1+674f31e
jmencak-7r5lg-worker-b-dd727.c.openshift-gce-devel.internal   Ready    worker   27m   v1.22.1+674f31e

$ oc label no jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal profile=
node/jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal labeled

$ oc get po -o wide|grep jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal
tuned-pnl8x   1/1   Running   0   28m   10.0.128.2   jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal   <none>   <none>

$ cat stuck.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

$ oc create -f stuck.yaml

$ oc logs tuned-pnl8x | tail -n 28
I1028 06:37:13.201963 2182 tuned.go:1229] previous application of TuneD profile failed; change detected, scheduling full restart in 1s
2021-10-28 06:37:13,299 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:37:13,303 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:37:13,304 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:37:13,304 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:37:13,305 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1028 06:37:14.202848 2182 tuned.go:1211] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1028 06:37:14.205003 2182 tuned.go:508] error waiting for tuned: signal: terminated
I1028 06:37:14.205213 2182 tuned.go:441] starting tuned...
2021-10-28 06:37:14,327 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:37:14,332 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:37:14,333 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:37:14,333 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:37:14,334 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1028 06:38:14.205888 2182 tuned.go:1211] timeout (120) to apply TuneD profile; restarting TuneD daemon
E1028 06:38:14.207821 2182 tuned.go:508] error waiting for tuned: signal: terminated
I1028 06:38:14.208077 2182 tuned.go:441] starting tuned...
2021-10-28 06:38:14,339 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:38:14,343 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:38:14,344 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:38:14,344 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:38:14,345 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
2021-10-28 06:39:26,351 INFO tuned.daemon.controller: starting controller
2021-10-28 06:39:26,351 INFO tuned.daemon.daemon: starting tuning
2021-10-28 06:39:26,352 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1028 06:39:26.365143 2182 tuned.go:995] updated Profile jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal stalld=<nil>, bootcmdline:
I1028 06:39:26.365402 2182 tuned.go:428] written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-profile-stuck
I1028 06:39:26.476307 2182 tuned.go:719] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload

$ oc get profile
NAME                                                          TUNED                     APPLIED   DEGRADED   AGE
jmencak-7r5lg-master-0.c.openshift-gce-devel.internal         openshift-control-plane   True      False      48m
jmencak-7r5lg-master-1.c.openshift-gce-devel.internal         openshift-control-plane   True      False      48m
jmencak-7r5lg-master-2.c.openshift-gce-devel.internal         openshift-control-plane   True      False      48m
jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal   openshift-profile-stuck   True      False      42m
jmencak-7r5lg-worker-b-dd727.c.openshift-gce-devel.internal   openshift-node            True      False      42m
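In the log above, the profile uses sleep:72 rather than sleep:inf, so after two timeout-triggered restarts (the timeout doubling from 60 to 120 seconds), the 72-second sleep completes and TuneD prints the "static tuning from profile ... applied" line. As a hypothetical illustration of turning that log line into the "applied" signal used in the sketch earlier in this report (again, not the actual tuned.go code; the program and its name are made up):

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// waitForApplied blocks until a line containing TuneD's "applied" message for
// the given profile appears on r, or r is exhausted.
func waitForApplied(r io.Reader, profile string) bool {
	needle := fmt.Sprintf("static tuning from profile '%s' applied", profile)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), needle) {
			return true
		}
	}
	return false
}

func main() {
	// Example usage against a saved container log:
	//   oc logs tuned-pnl8x | ./wait-for-applied
	if waitForApplied(os.Stdin, "openshift-profile-stuck") {
		fmt.Println("profile applied")
		return
	}
	fmt.Println("profile not applied")
	os.Exit(1)
}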
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056