1928614 – NTO may fail to disable stalld when relying on Tuned '[service]' plugin


Summary: NTO may fail to disable stalld when relying on Tuned '[service]' plugin

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node Tuning Operator
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.7.z
Assignee:	Jiří Mencák
QA Contact:	Simon
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1928618
Depends On:	1926903
Blocks:

Reported:	2021-02-15 07:22 UTC by OpenShift BugZilla Robot
Modified:	2021-04-05 13:55 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-04-05 13:55:11 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-node-tuning-operator pull 215	0	None	open	Bug 1928614: Keep ignition units in sync with [service] plugin.	2021-03-04 11:28:48 UTC
Red Hat Product Errata	RHSA-2021:1005	0	None	None	None	2021-04-05 13:55:33 UTC

Comment 1 Jiří Mencák 2021-02-18 10:27:53 UTC

*** Bug 1928618 has been marked as a duplicate of this bug. ***

Comment 4 Simon 2021-03-29 17:29:07 UTC

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-27-082615   True        False         51m     Cluster version is 4.7.0-0.nightly-2021-03-27-082615

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas32947.qe.devcluster.openshift.com:6443".

$ node=$(oc get nodes | grep -m 1 worker | cut -f 1 -d ' ') && echo $node
ip-10-0-147-104.us-east-2.compute.internal
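The `grep -m 1 worker | cut -f 1 -d ' '` pipeline above takes the first line containing the string "worker" anywhere, including inside a hostname or in a role such as "worker-rt" on its own. A minimal sketch of a stricter selection, assuming the default `oc get nodes` table layout where ROLES is the third column; `first_worker_node` is a hypothetical helper, not part of oc. A label selector (`oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}'`) avoids text parsing altogether.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): print the NAME of the first node
# whose ROLES column (field 3) contains the exact role "worker".
# A hostname containing "worker", or a node whose only role is "worker-rt",
# is not matched. Reads plain `oc get nodes` table output on stdin.
first_worker_node() {
  awk 'NR > 1 {
         n = split($3, roles, ",")
         for (i = 1; i <= n; i++) {
           if (roles[i] == "worker") { print $1; exit }
         }
       }'
}

# Usage (assumes a logged-in oc session):
#   node=$(oc get nodes | first_worker_node) && echo "$node"
```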

$ pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide | grep $node | cut -d ' ' -f 1) && echo $pod
tuned-46ms5
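Similarly, `grep $node` matches the node name as a substring of any line, so it could also return the operator pod or a pod on a node whose name merely contains `$node`. A minimal sketch that only accepts `tuned-*` pods and requires an exact column match; `tuned_pod_on_node` is a hypothetical helper. Alternatively, `oc get pods -n openshift-cluster-node-tuning-operator --field-selector spec.nodeName="$node" -o name` lets the API server do the filtering.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): print the name of the tuned pod
# scheduled on node $1. Only pods named "tuned-*" are considered, and the
# node name must equal a whole column value (not just a substring).
# Reads `oc get pods -o wide` table output on stdin.
tuned_pod_on_node() {
  awk -v node="$1" '
    NR > 1 && $1 ~ /^tuned-/ {
      for (i = 2; i <= NF; i++) {
        if ($i == node) { print $1; exit }
      }
    }'
}

# Usage (assumes a logged-in oc session and $node already set):
#   pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide \
#         | tuned_pod_on_node "$node") && echo "$pod"
```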

$ oc label node $node node-role.kubernetes.io/worker-rt=
node/ip-10-0-147-104.us-east-2.compute.internal labeled

$ oc get nodes
NAME                                         STATUS   ROLES              AGE   VERSION
ip-10-0-147-104.us-east-2.compute.internal   Ready    worker,worker-rt   71m   v1.20.0+bafe72f
ip-10-0-154-13.us-east-2.compute.internal    Ready    master             77m   v1.20.0+bafe72f
ip-10-0-161-142.us-east-2.compute.internal   Ready    master             77m   v1.20.0+bafe72f
ip-10-0-186-22.us-east-2.compute.internal    Ready    worker             72m   v1.20.0+bafe72f
ip-10-0-214-168.us-east-2.compute.internal   Ready    worker             72m   v1.20.0+bafe72f
ip-10-0-219-33.us-east-2.compute.internal    Ready    master             77m   v1.20.0+bafe72f


$ oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
 name: worker-rt
 labels:
   worker-rt: ""
spec:
 machineConfigSelector:
   matchExpressions:
     - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
 nodeSelector:
   matchLabels:
     node-role.kubernetes.io/worker-rt: ""
EOF
machineconfigpool.machineconfiguration.openshift.io/worker-rt created

# stalld enabled
$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
 name: openshift-realtime
 namespace: openshift-cluster-node-tuning-operator
spec:
 profile:
 - data: |
     [main]
     summary=Custom OpenShift realtime profile
     include=openshift-node,realtime
     [variables]
     # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
     isolated_cores=1
     #isolate_managed_irq=Y
     not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}
     [bootloader]
     cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}
     [service]
     service.stalld=start,enable
   name: openshift-realtime

 recommend:
 - machineConfigLabels:
     machineconfiguration.openshift.io/role: "worker-rt"
   priority: 20
   profile: openshift-realtime
EOF
tuned.tuned.openshift.io/openshift-realtime created

$ oc get tuned
NAME                 AGE
default              76m
openshift-realtime   23s
rendered             76m

$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-79a3dc29d96e9d9386b4b39b6067fd32      True      False      False      3              3                   3                     0                      80m
worker      rendered-worker-9db7a4d01330d47cb1b56c2e0bf5be30      True      False      False      2              2                   2                     0                      80m
worker-rt   rendered-worker-rt-8c07700b24d5949d3e634d8a235cc600   True      False      False      1              1                   1                     0                      2m56s
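Before checking the node it helps to confirm that the worker-rt pool has finished rolling out the rendered config. A minimal sketch, assuming the `oc get mcp` column layout shown above (UPDATED is the third column); `mcp_is_updated` is a hypothetical helper. `oc wait mcp/worker-rt --for=condition=Updated --timeout=10m` should achieve the same without parsing, since MachineConfigPools expose an `Updated` condition.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): exit 0 only if the pool named $1
# reports UPDATED=True (column 3). A missing pool counts as not updated.
# Reads `oc get mcp` table output on stdin.
mcp_is_updated() {
  awk -v pool="$1" '
    NR > 1 && $1 == pool { status = $3 }
    END { if (status == "True") exit 0; exit 1 }'
}

# Usage: poll until the worker-rt pool reports UPDATED=True.
#   until oc get mcp | mcp_is_updated worker-rt; do sleep 30; done
```

Note that UPDATED may still read True for a short time right after a new rendered config is created, before the pool starts updating, so a single check immediately after `oc create` is not conclusive.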

$ oc logs $pod
...
2021-03-29 16:49:54,444 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-realtime' applied
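The tuned pod log keeps every "static tuning from profile ... applied" line since the pod started, so after editing the Tuned CR it is the last such line that matters. A minimal sketch that extracts it, assuming the message format shown above; `last_applied_profile` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): print the profile name from the most
# recent "static tuning from profile '<name>' applied" line on stdin.
# Earlier matching lines (profiles applied before) are discarded by tail.
last_applied_profile() {
  sed -n "s/.*static tuning from profile '\([^']*\)' applied.*/\1/p" | tail -n 1
}

# Usage (assumes $pod is set to the tuned pod on the target node):
#   oc logs "$pod" | last_applied_profile
```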

$ oc debug node/$node
Creating debug namespace/openshift-debug-node-rrtlh ...
Starting pod/ip-10-0-147-104us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.147.104
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww | grep stalld
root        3583  0.5  0.0   8180  2928 ?        Ss   16:49   0:00 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root        4837  0.0  0.0   9184  1080 pts/0    S+   16:52   0:00 grep stalld
sh-4.4# exit
exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-rrtlh ...

# stalld is running - as expected!
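`ps auxww | grep stalld` always prints at least the `grep stalld` line itself, so the result has to be read by eye. A minimal sketch of a non-interactive check, assuming standard `ps auxww` output where field 11 is the COMMAND; `stalld_running` is a hypothetical helper. `oc debug node/"$node" -- chroot /host systemctl is-active stalld` is a more direct alternative that reports the systemd unit state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if `ps auxww` output on stdin contains a
# process whose executable (field 11) is named "stalld", with or without
# a leading path. The "grep stalld" line has "grep" in field 11, so it is
# not counted.
stalld_running() {
  awk '$11 ~ /(^|\/)stalld$/ { found = 1 } END { exit !found }'
}

# Usage from the workstation, without an interactive debug shell
# (assumes oc debug can schedule a pod on $node):
#   oc debug node/"$node" -- chroot /host ps auxww | stalld_running && echo running
```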

# Disabling stalld 

$ oc edit tuned openshift-realtime 
# edit service.stalld=start,enable -> service.stalld=stop,disable
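The `oc edit` step can also be scripted, which makes repeated enable/disable cycles easier to reproduce. A minimal sketch, assuming the `service.stalld=` line appears once in the profile data; `set_stalld_directive` is a hypothetical helper, and the pipeline relies on `oc get -o yaml | oc replace -f -`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the value of the "service.stalld=" line of a
# Tuned manifest read on stdin with $1 (e.g. "stop,disable"), keeping the
# original indentation. $1 is inserted into the sed replacement unescaped,
# so it must not contain "/", "&" or "\".
set_stalld_directive() {
  sed -E "s/^([[:space:]]*service\.stalld=).*/\1$1/"
}

# Usage: non-interactive equivalent of the edit above
# (assumes a logged-in oc session):
#   oc get tuned openshift-realtime -n openshift-cluster-node-tuning-operator -o yaml \
#     | set_stalld_directive stop,disable \
#     | oc replace -f -
```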

$ oc debug node/$node
Creating debug namespace/openshift-debug-node-xpv4v ...
Starting pod/ip-10-0-147-104us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.147.104
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww | grep stalld
root        4023  0.0  0.0   9184  1084 pts/0    S+   16:56   0:00 grep stalld
sh-4.4# exit
exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-xpv4v ...

# stalld is no longer running - as expected!

# Re-enabling stalld
$ oc edit tuned openshift-realtime 
# edit service.stalld=stop,disable -> service.stalld=start,enable

$ oc debug node/$node
Creating debug namespace/openshift-debug-node-qwk9q ...
Starting pod/ip-10-0-147-104us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.147.104
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww | grep stalld
root        3665  0.8  0.0   7964  2788 ?        Ss   16:59   0:00 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root        3927  0.0  0.0   9184   968 pts/0    S+   16:59   0:00 grep stalld
sh-4.4# exit
exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-qwk9q ...

# stalld enabled - as expected!


# Across several further enable/disable cycles, stalld was started and stopped correctly each time.

Comment 6 errata-xmlrpc 2021-04-05 13:55:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.5 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1005
