+++ This bug was initially created as a clone of Bug #2017488 +++

+++ This bug was initially created as a clone of Bug #2017427 +++

Description of problem:
There are cases where the TuneD daemon appears to get stuck while applying a profile (see rhbz#2013940). NTO does not restart the TuneD daemon when profile application takes too long.

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. Create a profile that takes too long for NTO to apply. For example:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:inf}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

Actual results:
Profile application is never restarted or retried.

Expected results:
Profile application should be restarted/retried.

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/282
Fixed in 4.8.0-0.nightly-2021-11-03-171325 and above.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-11-03-171325   True        False         7h37m   Cluster version is 4.8.0-0.nightly-2021-11-03-171325

$ oc project openshift-cluster-node-tuning-operator
$ oc get po -o wide|grep worker-a
tuned-d6s6j   1/1   Running   0   7h51m   10.0.128.3   jmencak-hcp9p-worker-a-7rzvf.c.openshift-gce-devel.internal   <none>   <none>

$ oc label no jmencak-hcp9p-worker-a-7rzvf.c.openshift-gce-devel.internal profile=
$ cat stuck.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

$ oc create -f stuck.yaml
$ oc logs -f tuned-d6s6j | tail -n17
I1104 15:45:46.249348    2398 tuned.go:542] reloading tuned...
I1104 15:45:46.249354    2398 tuned.go:545] sending HUP to PID 3628
2021-11-04 15:45:46,249 INFO     tuned.daemon.daemon: stopping tuning
2021-11-04 15:45:46,266 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-04 15:45:46,313 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-04 15:45:46,314 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-04 15:45:46,314 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
E1104 15:46:46.249925    2398 tuned.go:1128] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1104 15:46:56.252435    2398 tuned.go:479] error waiting for tuned: signal: killed
I1104 15:46:56.252578    2398 tuned.go:429] starting tuned...
I1104 15:46:56.268933    2398 tuned.go:917] updated Profile jmencak-hcp9p-worker-a-7rzvf.c.openshift-gce-devel.internal stalld=<nil>, bootcmdline:
I1104 15:46:56.269286    2398 tuned.go:416] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
2021-11-04 15:46:56,371 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-04 15:46:56,377 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-04 15:46:56,377 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-04 15:46:56,378 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-04 15:46:56,379 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck

In 4.8, no exponential backoff was implemented, but the profile application times out after 60 seconds and is retried.

QE, please acknowledge the fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.20 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4574