Bug 2020518 - NTO does not restart TuneD daemon when profile application is taking too long
Summary: NTO does not restart TuneD daemon when profile application is taking too long
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.7.z
Assignee: Jiří Mencák
QA Contact: liqcui
URL:
Whiteboard:
: 2013940 (view as bug list)
Depends On: 2018053
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-11-05 07:35 UTC by Jiří Mencák
Modified: 2021-12-15 12:56 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2018053
Environment:
Last Closed: 2021-12-01 13:35:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-node-tuning-operator pull 287 0 None open Bug 2020518: tuned: add timeout and restarts 2021-11-05 07:47:18 UTC
Red Hat Product Errata RHBA-2021:4802 0 None None None 2021-12-01 13:35:43 UTC

Comment 2 liqcui 2021-11-15 05:01:40 UTC
Verified Result:
[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-08-025909   True        False         25m     Cluster version is 4.7.0-0.nightly-2021-11-08-025909
[ocpadmin@ec2-18-217-45-133 ~]$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.liqcui-ocaws47z.qe.devcluster.openshift.com:6443".
[ocpadmin@ec2-18-217-45-133 ~]$ oc get pods -o wide | grep ip-10-0-130-247.us-east-2.compute.internal
tuned-xd6cx                                    1/1     Running   0          42m   10.0.130.247   ip-10-0-130-247.us-east-2.compute.internal   <none>           <none>
[ocpadmin@ec2-18-217-45-133 nto]$ cat stuck.yaml 
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

[ocpadmin@ec2-18-217-45-133 nto]$  oc label no ip-10-0-130-247.us-east-2.compute.internal profile=
node/ip-10-0-130-247.us-east-2.compute.internal labeled

[ocpadmin@ec2-18-217-45-133 nto]$ oc create -f stuck.yaml
tuned.tuned.openshift.io/openshift-profile-stuck created
[ocpadmin@ec2-18-217-45-133 nto]$ oc logs -f tuned-xd6cx
E1115 03:45:09.179778    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E1115 03:45:09.180594    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I1115 03:45:10.788676    1993 tuned.go:923] started events processor
I1115 03:45:10.789536    1993 tuned.go:966] started controller
I1115 03:45:10.789604    1993 tuned.go:308] extracting Tuned profiles
I1115 03:46:51.705191    1993 tuned.go:291] disabling system tuned...
I1115 03:46:51.722193    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 03:46:51.942691    1993 tuned.go:633] re-applying profile () as the previous application did not complete
I1115 03:46:51.942741    1993 tuned.go:399] starting tuned...
2021-11-15 03:46:52,092 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 03:46:52,107 INFO     tuned.daemon.daemon: Using 'openshift-node' profile
2021-11-15 03:46:52,108 INFO     tuned.profiles.loader: loading profile: openshift-node
2021-11-15 03:46:52,165 INFO     tuned.daemon.controller: starting controller
2021-11-15 03:46:52,165 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 03:46:52,177 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-11-15 03:46:52,183 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-11-15 03:46:52,186 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-11-15 03:46:52,187 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-11-15 03:46:52,190 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-11-15 03:46:52,223 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-11-15 03:46:52,231 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
I1115 04:28:05.996296    1993 tuned.go:308] extracting Tuned profiles
I1115 04:28:06.198980    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:28:52.174948    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
I1115 04:28:52.908622    1993 tuned.go:639] active profile (openshift-node) != recommended profile (openshift-profile-stuck)
I1115 04:28:52.908665    1993 tuned.go:495] reloading tuned...
I1115 04:28:52.908673    1993 tuned.go:498] sending HUP to PID 6768
2021-11-15 04:28:52,908 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 04:28:52,956 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 04:28:52,965 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 04:28:52,966 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 04:28:52,967 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
I1115 04:29:39.594877    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:39.714649    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 04:29:53.009679    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:53.194057    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:29:53.194302    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
2021-11-15 04:30:04,976 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 04:30:04,978 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1115 04:30:05.923355    1993 tuned.go:650] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload








^C
[ocpadmin@ec2-18-217-45-133 nto]$ oc get profiles
NAME                                         AGE
ip-10-0-130-247.us-east-2.compute.internal   48m
ip-10-0-158-17.us-east-2.compute.internal    53m
ip-10-0-180-236.us-east-2.compute.internal   53m
ip-10-0-182-244.us-east-2.compute.internal   47m
ip-10-0-203-105.us-east-2.compute.internal   48m
ip-10-0-216-59.us-east-2.compute.internal    53m
[ocpadmin@ec2-18-217-45-133 nto]$ oc logs -f tuned-xd6cx
E1115 03:45:09.179778    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E1115 03:45:09.180594    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I1115 03:45:10.788676    1993 tuned.go:923] started events processor
I1115 03:45:10.789536    1993 tuned.go:966] started controller
I1115 03:45:10.789604    1993 tuned.go:308] extracting Tuned profiles
I1115 03:46:51.705191    1993 tuned.go:291] disabling system tuned...
I1115 03:46:51.722193    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 03:46:51.942691    1993 tuned.go:633] re-applying profile () as the previous application did not complete
I1115 03:46:51.942741    1993 tuned.go:399] starting tuned...
2021-11-15 03:46:52,092 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 03:46:52,107 INFO     tuned.daemon.daemon: Using 'openshift-node' profile
2021-11-15 03:46:52,108 INFO     tuned.profiles.loader: loading profile: openshift-node
2021-11-15 03:46:52,165 INFO     tuned.daemon.controller: starting controller
2021-11-15 03:46:52,165 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 03:46:52,177 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-11-15 03:46:52,183 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-11-15 03:46:52,186 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-11-15 03:46:52,187 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-11-15 03:46:52,190 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-11-15 03:46:52,223 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-11-15 03:46:52,231 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
I1115 04:28:05.996296    1993 tuned.go:308] extracting Tuned profiles
I1115 04:28:06.198980    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:28:52.174948    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
I1115 04:28:52.908622    1993 tuned.go:639] active profile (openshift-node) != recommended profile (openshift-profile-stuck)
I1115 04:28:52.908665    1993 tuned.go:495] reloading tuned...
I1115 04:28:52.908673    1993 tuned.go:498] sending HUP to PID 6768
2021-11-15 04:28:52,908 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 04:28:52,956 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 04:28:52,965 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 04:28:52,966 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 04:28:52,967 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
I1115 04:29:39.594877    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:39.714649    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 04:29:53.009679    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:53.194057    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:29:53.194302    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
2021-11-15 04:30:04,976 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 04:30:04,978 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1115 04:30:05.923355    1993 tuned.go:650] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload


Compared the log with 4.8/4.9/4.10, no timeout and restart.

Comment 3 Jiří Mencák 2021-11-15 08:45:34 UTC
Thank you for looking Liquan!

The fix (https://github.com/openshift/cluster-node-tuning-operator/pull/287) merged 5 days ago on 2021-11-10, so your observation this is not fixed in
4.7.0-0.nightly-2021-11-08-025909 is correct.

I've tested on 4.7.0-0.nightly-2021-11-12-230709 (red build) and it is fixed there:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-12-230709   True        False         13m     Cluster version is 4.7.0-0.nightly-2021-11-12-230709


I1115 08:35:43.253765    2368 tuned.go:531] sending HUP to PID 3070
2021-11-15 08:35:43,253 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 08:35:43,271 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 08:35:43,280 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 08:35:43,280 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 08:35:43,281 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
E1115 08:36:43.253936    2368 tuned.go:1055] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1115 08:36:53.255773    2368 tuned.go:465] error waiting for tuned: signal: killed
I1115 08:36:53.255884    2368 tuned.go:415] starting tuned...
2021-11-15 08:36:53,368 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 08:36:53,372 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 08:36:53,373 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 08:36:53,373 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 08:36:53,374 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck

Comment 8 errata-xmlrpc 2021-12-01 13:35:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.38 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4802

Comment 9 Jiří Mencák 2021-12-15 12:56:05 UTC
*** Bug 2013940 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.