Bug 2017427
| Summary: | NTO does not restart TuneD daemon when profile application is taking too long | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jiří Mencák <jmencak> | |
| Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> | |
| Status: | CLOSED ERRATA | QA Contact: | liqcui | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.10 | CC: | aos-bugs, dagray, liqcui | |
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2029436 | Environment: | |
| Last Closed: | 2022-03-10 16:22:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2017488, 2029436 | |||
Fixed in 4.10.0-0.nightly-2021-10-27-230233 and above.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2021-10-27-230233 True False 20m Cluster version is 4.10.0-0.nightly-2021-10-27-230233
$ oc get no
NAME STATUS ROLES AGE VERSION
jmencak-7r5lg-master-0.c.openshift-gce-devel.internal Ready master 35m v1.22.1+674f31e
jmencak-7r5lg-master-1.c.openshift-gce-devel.internal Ready master 35m v1.22.1+674f31e
jmencak-7r5lg-master-2.c.openshift-gce-devel.internal Ready master 35m v1.22.1+674f31e
jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal Ready worker 27m v1.22.1+674f31e
jmencak-7r5lg-worker-b-dd727.c.openshift-gce-devel.internal Ready worker 27m v1.22.1+674f31e
$ oc label no jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal profile=
node/jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal labeled
$ oc get po -o wide|grep jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal
tuned-pnl8x 1/1 Running 0 28m 10.0.128.2 jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal <none> <none>
$ cat stuck.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck
$ oc create -f stuck.yaml
$ oc logs tuned-pnl8x | tail -n 28
I1028 06:37:13.201963 2182 tuned.go:1229] previous application of TuneD profile failed; change detected, scheduling full restart in 1s
2021-10-28 06:37:13,299 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:37:13,303 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:37:13,304 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:37:13,304 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:37:13,305 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1028 06:37:14.202848 2182 tuned.go:1211] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1028 06:37:14.205003 2182 tuned.go:508] error waiting for tuned: signal: terminated
I1028 06:37:14.205213 2182 tuned.go:441] starting tuned...
2021-10-28 06:37:14,327 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:37:14,332 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:37:14,333 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:37:14,333 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:37:14,334 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1028 06:38:14.205888 2182 tuned.go:1211] timeout (120) to apply TuneD profile; restarting TuneD daemon
E1028 06:38:14.207821 2182 tuned.go:508] error waiting for tuned: signal: terminated
I1028 06:38:14.208077 2182 tuned.go:441] starting tuned...
2021-10-28 06:38:14,339 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-10-28 06:38:14,343 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-10-28 06:38:14,344 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-10-28 06:38:14,344 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-10-28 06:38:14,345 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
2021-10-28 06:39:26,351 INFO tuned.daemon.controller: starting controller
2021-10-28 06:39:26,351 INFO tuned.daemon.daemon: starting tuning
2021-10-28 06:39:26,352 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1028 06:39:26.365143 2182 tuned.go:995] updated Profile jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal stalld=<nil>, bootcmdline:
I1028 06:39:26.365402 2182 tuned.go:428] written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-profile-stuck
I1028 06:39:26.476307 2182 tuned.go:719] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload
$ oc get profile
NAME TUNED APPLIED DEGRADED AGE
jmencak-7r5lg-master-0.c.openshift-gce-devel.internal openshift-control-plane True False 48m
jmencak-7r5lg-master-1.c.openshift-gce-devel.internal openshift-control-plane True False 48m
jmencak-7r5lg-master-2.c.openshift-gce-devel.internal openshift-control-plane True False 48m
jmencak-7r5lg-worker-a-tlq29.c.openshift-gce-devel.internal openshift-profile-stuck True False 42m
jmencak-7r5lg-worker-b-dd727.c.openshift-gce-devel.internal openshift-node True False 42m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:
There are cases where the TuneD daemon seems to get stuck while applying a profile (see rhbz#2013940). NTO does not restart the TuneD daemon when profile application is taking too long.

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. Create a profile that will take too long to get applied by NTO. For example:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:inf}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

Actual results:
Profile application will never be restarted/retried.

Expected results:
Profile application should be restarted/retried.

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/282
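
For readers who want to see the expected restart/retry behaviour in isolation, here is a minimal Python sketch. It is not the operator's code (NTO is written in Go, and the actual fix is in the pull request linked above). The function name `run_with_restarts` and its parameters are hypothetical, and the doubling of the timeout after each restart is only an assumption inferred from the "timeout (60)" and "timeout (120)" messages in the verification log in this report.

```python
import subprocess
import sys


def run_with_restarts(cmd, initial_timeout, max_attempts=5):
    """Run cmd; kill and restart it whenever it exceeds the current timeout.

    The timeout doubles after every restart. That policy is an assumption
    based on the 60 -> 120 progression seen in the log, not on the
    operator's source code.
    Returns (succeeded, attempts_used, timeouts_tried).
    """
    timeout = initial_timeout
    tried = []
    for attempt in range(1, max_attempts + 1):
        tried.append(timeout)
        try:
            subprocess.run(cmd, timeout=timeout, check=True)
            return True, attempt, tried
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed the child at this point.
            print(f"timeout ({timeout}) to apply profile; restarting")
            timeout *= 2
    return False, max_attempts, tried


if __name__ == "__main__":
    # Stand-in for a slow profile: a child process that sleeps for 1 second,
    # playing the role of v=${f:exec:sleep:72} at a much smaller scale.
    slow = [sys.executable, "-c", "import time; time.sleep(1.0)"]
    print(run_with_restarts(slow, initial_timeout=0.5))
```

With a growing timeout, a profile that is merely slow (such as the `sleep:72` variant used for verification) eventually gets applied, while one that never finishes (`sleep:inf`) keeps being restarted until the attempt limit is reached. The attempt limit itself is an addition for this sketch so that the loop terminates.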