Bug 2020518

Summary:	NTO does not restart TuneD daemon when profile application is taking too long
Product:	OpenShift Container Platform	Reporter:	Jiří Mencák <jmencak>
Component:	Node Tuning Operator	Assignee:	Jiří Mencák <jmencak>
Status:	CLOSED ERRATA	QA Contact:	liqcui
Severity:	high	Docs Contact:
Priority:	high
Version:	4.10	CC:	aos-bugs, cback, dagray, liqcui, openshift-bugzilla-robot, skordas
Target Milestone:	---
Target Release:	4.7.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	2018053	Environment:
Last Closed:	2021-12-01 13:35:22 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2018053
Bug Blocks:

Comment 2 liqcui 2021-11-15 05:01:40 UTC

Verified Result:
[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-08-025909   True        False         25m     Cluster version is 4.7.0-0.nightly-2021-11-08-025909
[ocpadmin@ec2-18-217-45-133 ~]$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.liqcui-ocaws47z.qe.devcluster.openshift.com:6443".
[ocpadmin@ec2-18-217-45-133 ~]$ oc get pods -o wide | grep ip-10-0-130-247.us-east-2.compute.internal
tuned-xd6cx                                    1/1     Running   0          42m   10.0.130.247   ip-10-0-130-247.us-east-2.compute.internal   <none>           <none>
[ocpadmin@ec2-18-217-45-133 nto]$ cat stuck.yaml 
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-stuck
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=OpenShift profile stuck
      [variables]
      v=${f:exec:sleep:72}
    name: openshift-profile-stuck
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile-stuck

[ocpadmin@ec2-18-217-45-133 nto]$  oc label no ip-10-0-130-247.us-east-2.compute.internal profile=
node/ip-10-0-130-247.us-east-2.compute.internal labeled

[ocpadmin@ec2-18-217-45-133 nto]$ oc create -f stuck.yaml
tuned.tuned.openshift.io/openshift-profile-stuck created
[ocpadmin@ec2-18-217-45-133 nto]$ oc logs -f tuned-xd6cx
E1115 03:45:09.179778    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E1115 03:45:09.180594    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I1115 03:45:10.788676    1993 tuned.go:923] started events processor
I1115 03:45:10.789536    1993 tuned.go:966] started controller
I1115 03:45:10.789604    1993 tuned.go:308] extracting Tuned profiles
I1115 03:46:51.705191    1993 tuned.go:291] disabling system tuned...
I1115 03:46:51.722193    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 03:46:51.942691    1993 tuned.go:633] re-applying profile () as the previous application did not complete
I1115 03:46:51.942741    1993 tuned.go:399] starting tuned...
2021-11-15 03:46:52,092 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 03:46:52,107 INFO     tuned.daemon.daemon: Using 'openshift-node' profile
2021-11-15 03:46:52,108 INFO     tuned.profiles.loader: loading profile: openshift-node
2021-11-15 03:46:52,165 INFO     tuned.daemon.controller: starting controller
2021-11-15 03:46:52,165 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 03:46:52,177 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-11-15 03:46:52,183 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-11-15 03:46:52,186 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-11-15 03:46:52,187 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-11-15 03:46:52,190 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-11-15 03:46:52,223 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-11-15 03:46:52,231 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
I1115 04:28:05.996296    1993 tuned.go:308] extracting Tuned profiles
I1115 04:28:06.198980    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:28:52.174948    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
I1115 04:28:52.908622    1993 tuned.go:639] active profile (openshift-node) != recommended profile (openshift-profile-stuck)
I1115 04:28:52.908665    1993 tuned.go:495] reloading tuned...
I1115 04:28:52.908673    1993 tuned.go:498] sending HUP to PID 6768
2021-11-15 04:28:52,908 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 04:28:52,956 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 04:28:52,965 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 04:28:52,966 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 04:28:52,967 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
I1115 04:29:39.594877    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:39.714649    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 04:29:53.009679    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:53.194057    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:29:53.194302    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
2021-11-15 04:30:04,976 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 04:30:04,978 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1115 04:30:05.923355    1993 tuned.go:650] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload








^C
[ocpadmin@ec2-18-217-45-133 nto]$ oc get profiles
NAME                                         AGE
ip-10-0-130-247.us-east-2.compute.internal   48m
ip-10-0-158-17.us-east-2.compute.internal    53m
ip-10-0-180-236.us-east-2.compute.internal   53m
ip-10-0-182-244.us-east-2.compute.internal   47m
ip-10-0-203-105.us-east-2.compute.internal   48m
ip-10-0-216-59.us-east-2.compute.internal    53m
[ocpadmin@ec2-18-217-45-133 nto]$ oc logs -f tuned-xd6cx
E1115 03:45:09.179778    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Profile: failed to list *v1.Profile: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/profiles?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E1115 03:45:09.180594    1993 reflector.go:138] github.com/openshift/cluster-node-tuning-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.Tuned: failed to list *v1.Tuned: Get "https://172.30.0.1:443/apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I1115 03:45:10.788676    1993 tuned.go:923] started events processor
I1115 03:45:10.789536    1993 tuned.go:966] started controller
I1115 03:45:10.789604    1993 tuned.go:308] extracting Tuned profiles
I1115 03:46:51.705191    1993 tuned.go:291] disabling system tuned...
I1115 03:46:51.722193    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 03:46:51.942691    1993 tuned.go:633] re-applying profile () as the previous application did not complete
I1115 03:46:51.942741    1993 tuned.go:399] starting tuned...
2021-11-15 03:46:52,092 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 03:46:52,106 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 03:46:52,107 INFO     tuned.daemon.daemon: Using 'openshift-node' profile
2021-11-15 03:46:52,108 INFO     tuned.profiles.loader: loading profile: openshift-node
2021-11-15 03:46:52,165 INFO     tuned.daemon.controller: starting controller
2021-11-15 03:46:52,165 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 03:46:52,177 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-11-15 03:46:52,183 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-11-15 03:46:52,186 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-11-15 03:46:52,187 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-11-15 03:46:52,190 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-11-15 03:46:52,223 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-11-15 03:46:52,231 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
I1115 04:28:05.996296    1993 tuned.go:308] extracting Tuned profiles
I1115 04:28:06.198980    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:28:52.174948    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
I1115 04:28:52.908622    1993 tuned.go:639] active profile (openshift-node) != recommended profile (openshift-profile-stuck)
I1115 04:28:52.908665    1993 tuned.go:495] reloading tuned...
I1115 04:28:52.908673    1993 tuned.go:498] sending HUP to PID 6768
2021-11-15 04:28:52,908 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 04:28:52,956 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 04:28:52,965 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 04:28:52,966 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 04:28:52,967 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
I1115 04:29:39.594877    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:39.714649    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-node
I1115 04:29:53.009679    1993 tuned.go:308] extracting Tuned profiles
I1115 04:29:53.194057    1993 tuned.go:342] recommended Tuned profile openshift-node content unchanged
I1115 04:29:53.194302    1993 tuned.go:386] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-profile-stuck
2021-11-15 04:30:04,976 INFO     tuned.daemon.daemon: starting tuning
2021-11-15 04:30:04,978 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile-stuck' applied
I1115 04:30:05.923355    1993 tuned.go:650] active and recommended profile (openshift-profile-stuck) match; profile change will not trigger profile reload


Compared the log with 4.8/4.9/4.10, no timeout and restart.

Comment 3 Jiří Mencák 2021-11-15 08:45:34 UTC

Thank you for looking Liquan!

The fix (https://github.com/openshift/cluster-node-tuning-operator/pull/287) merged 5 days ago on 2021-11-10, so your observation this is not fixed in
4.7.0-0.nightly-2021-11-08-025909 is correct.

I've tested on 4.7.0-0.nightly-2021-11-12-230709 (red build) and it is fixed there:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-11-12-230709   True        False         13m     Cluster version is 4.7.0-0.nightly-2021-11-12-230709


I1115 08:35:43.253765    2368 tuned.go:531] sending HUP to PID 3070
2021-11-15 08:35:43,253 INFO     tuned.daemon.daemon: stopping tuning
2021-11-15 08:35:43,271 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-11-15 08:35:43,280 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 08:35:43,280 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 08:35:43,281 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck
E1115 08:36:43.253936    2368 tuned.go:1055] timeout (60) to apply TuneD profile; restarting TuneD daemon
E1115 08:36:53.255773    2368 tuned.go:465] error waiting for tuned: signal: killed
I1115 08:36:53.255884    2368 tuned.go:415] starting tuned...
2021-11-15 08:36:53,368 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2021-11-15 08:36:53,372 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2021-11-15 08:36:53,373 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-11-15 08:36:53,373 INFO     tuned.daemon.daemon: Using 'openshift-profile-stuck' profile
2021-11-15 08:36:53,374 INFO     tuned.profiles.loader: loading profile: openshift-profile-stuck

Comment 8 errata-xmlrpc 2021-12-01 13:35:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.38 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4802

Comment 9 Jiří Mencák 2021-12-15 12:56:05 UTC

*** Bug 2013940 has been marked as a duplicate of this bug. ***