That is weird. I see NTO should be using this to start stalld:

ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

And that should be setting FIFO:10. Can you double-check the stalld systemd unit that is present on the node?
Oh.. I think I know what happened. RHCOS 8.4 includes stalld and systemd picked up the unit shipped with it instead of the unit that NTO creates. Jirka: Where do you install the systemd unit that you install via NTO?
(In reply to Martin Sivák from comment #1)
> That is weird. I see NTO should be using this to start stalld:
>
> ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR
> $BP $BR $BD $THRESH $LOGGING $FG $PF

Where do you see this, Martine?

sh-4.4# grep ExecStart= /usr/lib/systemd/system/stalld.service
ExecStart=/usr/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

NTO no longer ships the stalld unit files as of:
https://github.com/openshift/cluster-node-tuning-operator/pull/226

The CoreOS-shipped stalld.service file is now used and that one seems to be missing the "/usr/bin/chrt -f 10" as pointed out by Brent.
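For anyone repeating this check on another node, the comparison above can be sketched as a small shell helper. This is only an illustration: the function name `unit_sets_fifo` is hypothetical, and it assumes the unit launches stalld through `chrt -f <priority>` (or the equivalent `--fifo` long option) directly on the ExecStart line.

```shell
#!/bin/sh
# Hypothetical helper (not part of NTO or stalld): succeeds if the unit
# file's ExecStart line starts its command through `chrt -f <priority>`,
# i.e. requests SCHED_FIFO for the service's main process.
unit_sets_fifo() {
    unit_file=$1
    grep -Eq '^ExecStart=.*chrt[[:space:]]+(-f|--fifo)[[:space:]]+[0-9]+' "$unit_file"
}

# Example use on a node (path taken from the grep output above):
# unit_sets_fifo /usr/lib/systemd/system/stalld.service \
#     && echo "FIFO wrapper present" || echo "FIFO wrapper missing"
```

Run against the CoreOS-shipped unit quoted above, this would report the wrapper as missing, since that ExecStart line calls /usr/bin/stalld directly.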
(In reply to Martin Sivák from comment #3)
> Jirka: Where do you install the systemd unit that you install via NTO?

Again, NTO no longer installs any systemd stalld unit files; it relies on the CoreOS-provided ones.
I found it here: https://github.com/openshift/cluster-node-tuning-operator/blob/master/pkg/tuned/host_payload.go#L80
Fixed in 4.9.0-0.nightly-2021-06-18-002931 and above. The next OCP 4.8 nightly should also have the fix as https://github.com/openshift/cluster-node-tuning-operator/pull/237 merged a while ago.
Thanks Jirka!
Verifying the bug fix on:

oc version
Client Version: 4.8.0-0.nightly-2021-06-22-192915
Server Version: 4.8.0-0.nightly-2021-06-22-192915
Kubernetes Version: v1.21.0-rc.0+120883f

oc get csv
NAME                                DISPLAY                      VERSION   REPLACES   PHASE
performance-addon-operator.v4.8.0   Performance Addon Operator   4.8.0                Succeeded

Verify that stalld now runs as SCHED_FIFO:

ps -ef | grep stalld
root 7294 1 0 14:16 ? 00:00:00 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid

systemctl status stalld
# Write a pidfile
# ex: PF=--pidfile /run/stalld.pid
Environment=PF="--pidfile /run/stalld.pid"
ExecStartPre=/usr/local/bin/throttlectl.sh off
ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING >
ExecStopPost=/usr/local/bin/throttlectl.sh on
Restart=always
User=root

As shown above, the stalld binary from NTO is now used, and it is started under the FIFO scheduler (chrt's FIFO flag is -f) with priority 10.
Following up on comment 14: retrieving the scheduling attributes of the stalld PID gives:

chrt -ap 7294
pid 7294's current scheduling policy: SCHED_FIFO
pid 7294's current scheduling priority: 10

This confirms that the scheduling policy is SCHED_FIFO with priority 10.
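If this verification needs to be repeated across nodes, the chrt check can be scripted. The following is a minimal sketch, not an official tool: the function name `sched_matches` is hypothetical, and it assumes chrt prints the "scheduling policy" and "scheduling priority" lines in the format shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: reads `chrt -p <pid>` output on stdin and succeeds
# only if the reported policy and priority match the expected values.
# With `chrt -ap` (all threads) the last thread listed is the one compared.
sched_matches() {
    want_policy=$1
    want_prio=$2
    awk -v pol="$want_policy" -v prio="$want_prio" '
        /scheduling policy/   { got_pol = $NF }   # last field, e.g. SCHED_FIFO
        /scheduling priority/ { got_prio = $NF }  # last field, e.g. 10
        END { exit !(got_pol == pol && got_prio == prio) }
    '
}

# Example use on a node (assumes pidof is available and a single stalld process):
# chrt -p "$(pidof stalld)" | sched_matches SCHED_FIFO 10 && echo "stalld is FIFO:10"
```

Parsing the text output keeps the check dependency-free, at the cost of relying on chrt's output wording staying the same.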
PR link: https://github.com/openshift-kni/performance-addon-operators/pull/674