Bug 1892459 - NTO-shipped stalld needs to use FIFO for boosting.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: jmencak
QA Contact: Simon
URL:
Whiteboard:
Depends On: 1892457
Blocks:
Reported: 2020-10-28 20:14 UTC by jmencak
Modified: 2020-11-16 14:38 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1892457
Environment:
Last Closed: 2020-11-16 14:37:43 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 170 | 0 | None | closed | Bug 1892459: [release-4.6] Set scheduling policy to SCHED_FIFO to stalld and lower threshold. | 2020-11-24 16:41:50 UTC
Github | openshift cluster-node-tuning-operator pull 171 | 0 | None | closed | Bug 1892459: [release-4.6] Ship the latest version of stalld. | 2020-11-24 16:41:50 UTC
Red Hat Product Errata | RHBA-2020:4987 | 0 | None | None | None | 2020-11-16 14:38:00 UTC

Description jmencak 2020-10-28 20:14:20 UTC
+++ This bug was initially created as a clone of Bug #1892457 +++

Description of problem:
Currently, the NTO-shipped stalld daemon cannot operate without the
SCHED_DEADLINE scheduling policy. The latest stalld version can. Include the latest stalld version in NTO.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1887568#c26

--- Additional comment from  on 2020-10-28 20:09:23 UTC ---

Upstream PRs
https://github.com/openshift/cluster-node-tuning-operator/pull/168
https://github.com/openshift/cluster-node-tuning-operator/pull/169

Comment 3 Simon 2020-11-05 19:50:23 UTC
$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-05-134712   True        False         4h37m   Cluster version is 4.6.0-0.nightly-2020-11-05-134712

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas511a.qe.devcluster.openshift.com:6443".

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-154-69.us-east-2.compute.internal    Ready    worker   4h58m   v1.19.0+9f84db3
ip-10-0-159-2.us-east-2.compute.internal     Ready    master   5h4m    v1.19.0+9f84db3
ip-10-0-181-86.us-east-2.compute.internal    Ready    master   5h4m    v1.19.0+9f84db3
ip-10-0-182-196.us-east-2.compute.internal   Ready    worker   4h53m   v1.19.0+9f84db3
ip-10-0-202-160.us-east-2.compute.internal   Ready    worker   5h      v1.19.0+9f84db3
ip-10-0-205-149.us-east-2.compute.internal   Ready    master   5h4m    v1.19.0+9f84db3

$ node=ip-10-0-154-69.us-east-2.compute.internal
$ echo $node
ip-10-0-154-69.us-east-2.compute.internal

$ oc label node $node node-role.kubernetes.io/worker-rt=
node/ip-10-0-154-69.us-east-2.compute.internal labeled

$ oc create -f- <<EOF
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfigPool
> metadata:
>   name: worker-rt
>   labels:
>     worker-rt: ""
> spec:
>   machineConfigSelector:
>     matchExpressions:
>       - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
>   nodeSelector:
>     matchLabels:
>       node-role.kubernetes.io/worker-rt: ""
> EOF
machineconfigpool.machineconfiguration.openshift.io/worker-rt created
$ oc create -f- <<EOF
> apiVersion: tuned.openshift.io/v1
> kind: Tuned
> metadata:
>   name: openshift-realtime
>   namespace: openshift-cluster-node-tuning-operator
> spec:
>   profile:
>   - data: |
>       [main]
>       summary=Custom OpenShift realtime profile
>       include=openshift-node,realtime
>       [variables]
>       # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
>       isolated_cores=1
>       #isolate_managed_irq=Y
>       not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}
>       [bootloader]
>       cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}
>       [service]
>       service.stalld=start,enable
>     name: openshift-realtime
>
>   recommend:
>   - machineConfigLabels:
>       machineconfiguration.openshift.io/role: "worker-rt"
>     priority: 20
>     profile: openshift-realtime
> EOF
tuned.tuned.openshift.io/openshift-realtime created

$ oc get nodes
NAME                                         STATUS                     ROLES              AGE     VERSION
ip-10-0-154-69.us-east-2.compute.internal    Ready,SchedulingDisabled   worker,worker-rt   5h      v1.19.0+9f84db3
ip-10-0-159-2.us-east-2.compute.internal     Ready                      master             5h5m    v1.19.0+9f84db3
ip-10-0-181-86.us-east-2.compute.internal    Ready                      master             5h5m    v1.19.0+9f84db3
ip-10-0-182-196.us-east-2.compute.internal   Ready                      worker             4h55m   v1.19.0+9f84db3
ip-10-0-202-160.us-east-2.compute.internal   Ready                      worker             5h1m    v1.19.0+9f84db3
ip-10-0-205-149.us-east-2.compute.internal   Ready                      master             5h5m    v1.19.0+9f84db3

$ oc get mcp
NAME        CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-f578e5fdbe575539b1e14c36f757e432   True      False      False      3              3                   3                     0                      5h5m
worker      rendered-worker-7da090ee233da82a7c774564fa964a72   True      False      False      2              2                   2                     0                      5h5m
worker-rt                                                      False     True       False      1              0

$ oc get nodes && oc get mcp
NAME                                         STATUS   ROLES              AGE     VERSION
ip-10-0-154-69.us-east-2.compute.internal    Ready    worker,worker-rt   5h11m   v1.19.0+9f84db3
ip-10-0-159-2.us-east-2.compute.internal     Ready    master             5h17m   v1.19.0+9f84db3
ip-10-0-181-86.us-east-2.compute.internal    Ready    master             5h17m   v1.19.0+9f84db3
ip-10-0-182-196.us-east-2.compute.internal   Ready    worker             5h6m    v1.19.0+9f84db3
ip-10-0-202-160.us-east-2.compute.internal   Ready    worker             5h13m   v1.19.0+9f84db3
ip-10-0-205-149.us-east-2.compute.internal   Ready    master             5h17m   v1.19.0+9f84db3
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-f578e5fdbe575539b1e14c36f757e432      True      False      False      3              3                   3                     0                      5h16m
worker      rendered-worker-7da090ee233da82a7c774564fa964a72      True      False      False      2              2                   2                     0                      5h16m
worker-rt   rendered-worker-rt-6e1c08ca08fdfaccf8e7995f6899680b   True      False      False      1              1                   1                     0                      12m
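
The wait between the two `oc get mcp` listings above can be scripted instead of polled by hand. This is a minimal sketch, assuming the column layout shown in this comment (NAME, CONFIG, UPDATED, UPDATING, ...); `pool_updated` is a hypothetical helper name, not part of `oc`. While a pool is still rolling out, its CONFIG column is empty and the fields shift left, so the helper also requires a `rendered-` config name before trusting the UPDATED and UPDATING columns.

```shell
# Hypothetical helper: reads `oc get mcp` output on stdin and succeeds only
# if the named pool has a rendered config, UPDATED=True and UPDATING=False.
# Assumes the column order NAME CONFIG UPDATED UPDATING shown above.
pool_updated() {
    awk -v pool="$1" '
        $1 == pool && $2 ~ /^rendered-/ && $3 == "True" && $4 == "False" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage against a live cluster (not run here):
#   until oc get mcp | pool_updated worker-rt; do sleep 30; done
```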

$ oc debug node/$node
Starting pod/ip-10-0-154-69us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.154.69
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww | grep stalld
root        3472  0.4  0.0   7440  2616 ?        Ss   19:40   0:02 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root       10359  0.0  0.0   9180   980 pts/0    S+   19:49   0:00 grep stalld
sh-4.4# grep ExecStart /host/etc/systemd/system/stalld.service 
ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF
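
The `ExecStart` line above is what confirms the fix: stalld is launched through `chrt -f 10`, that is, under SCHED_FIFO at priority 10. The same check can be sketched as a reusable shell function. `uses_fifo` is a hypothetical name, and the pattern assumes the `chrt -f <priority>` (or `--fifo`) form shown in the unit file above.

```shell
# Hypothetical helper: succeeds if the given systemd ExecStart line wraps its
# command in `chrt -f <priority>` (or the long form `chrt --fifo <priority>`).
uses_fifo() {
    printf '%s\n' "$1" |
        grep -Eq '^ExecStart=.*chrt[[:space:]]+(-f|--fifo)[[:space:]]+[0-9]+[[:space:]]'
}

# Usage from the node debug shell (not run here):
#   uses_fifo "$(grep ExecStart /host/etc/systemd/system/stalld.service)" \
#       && echo "stalld is started under SCHED_FIFO"
```

On a live node, `chrt -p <pid>` run against the stalld PID reports the policy and priority actually in effect, which cross-checks the unit file.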

Comment 6 errata-xmlrpc 2020-11-16 14:37:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.4 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4987

