Bug 1972701 - Stalld not running as fifo
Summary: Stalld not running as fifo
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Martin Sivák
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On: 1973237
Blocks: 1970940
Reported: 2021-06-16 12:46 UTC by browsell
Modified: 2022-08-26 14:52 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-26 14:52:10 UTC
Target Upstream Version:
Embargoed:


Comment 1 Martin Sivák 2021-06-16 12:55:18 UTC
That is weird. I see NTO should be using this to start stalld:

ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

And that should be setting FIFO:10.

Can you double check the stalld systemd unit that is present on the node?
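
One way to double check is to find the unit file systemd actually loaded and see whether its ExecStart goes through chrt. This is only a minimal sketch: it assumes the unit is named stalld.service, and the helper name execstart_uses_fifo is hypothetical, not part of NTO or stalld.

```shell
# Hypothetical helper: succeeds if the ExecStart= line of the given unit
# file launches its command through "chrt -f" / "chrt --fifo" (SCHED_FIFO).
execstart_uses_fifo() {
    grep -Eq '^ExecStart=.*chrt[[:space:]]+(-f|--fifo)[[:space:]]' "$1"
}

# On a node (assumes systemd manages stalld as stalld.service):
#   unit=$(systemctl show -p FragmentPath --value stalld.service)
#   if execstart_uses_fifo "$unit"; then
#       echo "FIFO wrapper present in $unit"
#   else
#       echo "FIFO wrapper missing in $unit"
#   fi
```

The FragmentPath property shows which file systemd loaded, which helps when more than one stalld.service exists on the node.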

Comment 3 Martin Sivák 2021-06-16 13:37:38 UTC
Oh.. I think I know what happened. RHCOS 8.4 includes stalld and systemd picked up the unit shipped with it instead of the unit that NTO creates.

Jirka: Where do you install the systemd unit that you install via NTO?

Comment 4 Jiří Mencák 2021-06-16 14:07:32 UTC
(In reply to Martin Sivák from comment #1)
> That is weird. I see NTO should be using this to start stalld:
> 
> ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR
> $BP $BR $BD $THRESH $LOGGING $FG $PF

Where do you see this, Martin?
sh-4.4# grep ExecStart= /usr/lib/systemd/system/stalld.service 
ExecStart=/usr/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

NTO no longer ships the stalld unit files as of:
https://github.com/openshift/cluster-node-tuning-operator/pull/226

The CoreOS-shipped stalld.service file is now used and that one seems to be missing the "/usr/bin/chrt -f 10" as pointed out by Brent.

Comment 5 Jiří Mencák 2021-06-16 14:09:55 UTC
(In reply to Martin Sivák from comment #3)
> Jirka: Where do you install the systemd unit that you install via NTO?

Again, NTO no longer installs any systemd stalld unit files; it relies on the CoreOS-provided ones.

Comment 12 Jiří Mencák 2021-06-18 06:44:15 UTC
Fixed in 4.9.0-0.nightly-2021-06-18-002931 and above.  The next OCP 4.8 nightly should also have the fix as
https://github.com/openshift/cluster-node-tuning-operator/pull/237 merged a while ago.

Comment 13 Martin Sivák 2021-06-18 08:50:07 UTC
Thanks Jirka!

Comment 14 Shereen Haj Makhoul 2021-06-24 08:57:03 UTC
Verifying the bug fix on:

oc version 
Client Version: 4.8.0-0.nightly-2021-06-22-192915
Server Version: 4.8.0-0.nightly-2021-06-22-192915
Kubernetes Version: v1.21.0-rc.0+120883f

oc get csv 
NAME                                DISPLAY                      VERSION   REPLACES   PHASE
performance-addon-operator.v4.8.0   Performance Addon Operator   4.8.0                Succeeded

Verify that stalld now runs as SCHED_FIFO:

ps -ef | grep stalld
root        7294       1  0 14:16 ?        00:00:00 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid


systemctl status stalld
# Write a pidfile
# ex: PF=--pidfile /run/stalld.pid
Environment=PF="--pidfile /run/stalld.pid"

ExecStartPre=/usr/local/bin/throttlectl.sh off
ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld --systemd $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING >
ExecStopPost=/usr/local/bin/throttlectl.sh on
Restart=always
User=root

As can be seen above, the stalld binary from NTO is now used, and it runs under the FIFO scheduler (the chrt flag for FIFO is -f) with priority 10.

Comment 15 Shereen Haj Makhoul 2021-06-24 11:00:47 UTC
Following up on comment 14:

Retrieving the scheduling attributes of the stalld pid, we get:
chrt -ap 7294
pid 7294's current scheduling policy: SCHED_FIFO
pid 7294's current scheduling priority: 10

This confirms that the scheduling policy is SCHED_FIFO.
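
The same check can be scripted. This is only a minimal sketch, assuming the chrt output format shown above; the helper name policy_from_chrt is hypothetical.

```shell
# Hypothetical helper: read "chrt -p <pid>" output on stdin and print the
# scheduling policy name (e.g. SCHED_FIFO) from the first matching line.
policy_from_chrt() {
    sed -n 's/^pid [0-9]*.s current scheduling policy: \(SCHED_[A-Z_]*\).*$/\1/p' | head -n 1
}

# On a node (assumes a single stalld process is running):
#   policy=$(chrt -p "$(pidof stalld)" | policy_from_chrt)
#   [ "$policy" = "SCHED_FIFO" ] && echo "stalld is SCHED_FIFO" || echo "stalld is $policy"
```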

Comment 16 Shereen Haj Makhoul 2021-06-29 14:01:18 UTC
PR link: https://github.com/openshift-kni/performance-addon-operators/pull/674

