Bug 2017488
| Summary: | NTO does not restart TuneD daemon when profile application is taking too long | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> |
| Status: | CLOSED ERRATA | QA Contact: | liqcui |
| Severity: | high | Priority: | high |
| Version: | 4.10 | CC: | aos-bugs, dagray |
| Target Release: | 4.9.z | Last Closed: | 2021-11-10 21:03:03 UTC |
| Hardware: | Unspecified | OS: | Unspecified |
| Bug Depends On: | 2017427, 2029436 | Bug Blocks: | 2018053 |

Description
OpenShift BugZilla Robot
2021-10-26 15:30:17 UTC
Fixed in 4.9.0-0.nightly-2021-10-30-120753 and above. QE, please confirm so we can unblock the 4.8 backport.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-10-30-120753 True False 88m Cluster version is 4.9.0-0.nightly-2021-10-30-120753
$ oc get no
NAME STATUS ROLES AGE VERSION
jmencak-fh99x-master-0.c.openshift-gce-devel.internal Ready master 104m v1.22.0-rc.0+a44d0f0
jmencak-fh99x-master-1.c.openshift-gce-devel.internal Ready master 105m v1.22.0-rc.0+a44d0f0
jmencak-fh99x-master-2.c.openshift-gce-devel.internal Ready master 104m v1.22.0-rc.0+a44d0f0
jmencak-fh99x-worker-a-kkhfc.c.openshift-gce-devel.internal Ready worker 97m v1.22.0-rc.0+a44d0f0
jmencak-fh99x-worker-b-hxdms.c.openshift-gce-devel.internal Ready worker 97m v1.22.0-rc.0+a44d0f0
$ oc label no jmencak-fh99x-worker-a-kkhfc.c.openshift-gce-devel.internal profile=
node/jmencak-fh99x-worker-a-kkhfc.c.openshift-gce-devel.internal labeled
$ cat stuck.yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck
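
A note on why this profile gets "stuck": as far as I understand it, TuneD's built-in `exec` function runs an external command while the profile's variables are expanded, so `v=${f:exec:sleep:72}` makes profile loading block for about 72 seconds. The sketch below only illustrates how such an expression breaks down into a command line. `parse_exec_call` is a hypothetical helper written for this note, not TuneD's actual parser.

```python
import re


def parse_exec_call(expr: str) -> list[str]:
    """Split a TuneD-style ${f:exec:...} expression into an argv list.

    Hypothetical helper for illustration only; TuneD's real function
    expansion handles more cases (escaping, nesting, other functions).
    """
    match = re.fullmatch(r"\$\{f:exec:(.+)\}", expr)
    if match is None:
        raise ValueError(f"not an exec function call: {expr!r}")
    # Arguments are colon-separated: command first, then its parameters.
    return match.group(1).split(":")


argv = parse_exec_call("${f:exec:sleep:72}")
print(argv)  # the command and its argument, i.e. `sleep 72`
```

In other words, loading this profile cannot finish sooner than the `sleep 72` call returns, which is what the timeout handling in the operator has to cope with.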
$ oc create -f stuck.yaml
$ oc project openshift-cluster-node-tuning-operator
$ oc get po -o wide|grep worker-a
tuned-kkvr9 1/1 Running 0 101m 10.0.128.3 jmencak-fh99x-worker-a-kkhfc.c.openshift-gce-devel.internal <none> <none>
$ oc logs tuned-kkvr9 | tail -n28
I1102 11:59:12.416986 2274 tuned.go:1229] previous application of TuneD profile failed; change detected, scheduling full restart in 1s
2021-11-02 11:59:12,518 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-11-02 11:59:12,523 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-02 11:59:12,523 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-02 11:59:12,524 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-02 11:59:12,524 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1102 11:59:13.417860 2274 tuned.go:1211] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1102 11:59:13.419970 2274 tuned.go:508] error waiting for tuned: signal: terminated
I1102 11:59:13.420128 2274 tuned.go:441] starting tuned...
2021-11-02 11:59:13,538 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-11-02 11:59:13,543 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-02 11:59:13,543 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-02 11:59:13,544 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-02 11:59:13,544 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
E1102 12:00:13.420158 2274 tuned.go:1211] timeout (120) to apply TuneD profile; restarting TuneD daemon
E1102 12:00:13.421876 2274 tuned.go:508] error waiting for tuned: signal: terminated
I1102 12:00:13.421965 2274 tuned.go:441] starting tuned...
2021-11-02 12:00:13,532 INFO tuned.daemon.application: dynamic tuning is globally disabled
2021-11-02 12:00:13,537 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-02 12:00:13,538 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-02 12:00:13,538 INFO tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-02 12:00:13,539 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck
2021-11-02 12:01:25,544 INFO tuned.daemon.controller: starting controller
2021-11-02 12:01:25,544 INFO tuned.daemon.daemon: starting tuning
2021-11-02 12:01:25,546 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1102 12:01:25.558914 2274 tuned.go:428] written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-profile-stuck
I1102 12:01:25.559183 2274 tuned.go:995] updated Profile jmencak-fh99x-worker-a-kkhfc.c.openshift-gce-devel.internal stalld=<nil>, bootcmdline:
I1102 12:01:25.682873 2274 tuned.go:719] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload
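
To double-check the timing in the log above, the TuneD daemon timestamps can be compared directly. This is a small sketch that only uses lines copied from the log. `tuned_ts` is a helper name made up for this note, and it handles only the TuneD daemon line format (`YYYY-MM-DD HH:MM:SS,mmm`), not the operator's klog lines.

```python
from datetime import datetime


def tuned_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a TuneD log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


# Attempt that was interrupted by the operator restarting the daemon.
killed_load = "2021-11-02 11:59:13,544 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck"
next_start = "2021-11-02 12:00:13,532 INFO tuned.daemon.application: dynamic tuning is globally disabled"

# Final attempt that was allowed to run to completion.
final_load = "2021-11-02 12:00:13,539 INFO tuned.profiles.loader: loading profile: openshift-profile-stuck"
applied = "2021-11-02 12:01:25,546 INFO tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied"

killed_after = (tuned_ts(next_start) - tuned_ts(killed_load)).total_seconds()
applied_after = (tuned_ts(applied) - tuned_ts(final_load)).total_seconds()

print(f"interrupted attempt ran for ~{killed_after:.1f}s")
print(f"final attempt applied after ~{applied_after:.1f}s")
```

The interrupted attempt ran for roughly 60 seconds before the daemon was restarted, while the final attempt took roughly 72 seconds (matching `sleep 72`) and was not interrupted. That is consistent with the timeout having grown between restarts, as the `timeout (60)` and `timeout (120)` messages suggest.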
Verified in my environment as well; the bug is now fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.9.6 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4119