Bug 1892457 - NTO-shipped stalld needs to use FIFO for boosting.
Summary: NTO-shipped stalld needs to use FIFO for boosting.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks: 1892459
 
Reported: 2020-10-28 20:06 UTC by Jiří Mencák
Modified: 2021-02-24 15:29 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1892459
Environment:
Last Closed: 2021-02-24 15:28:37 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 168 | 0 | None | closed | Bug 1892457: Ship the latest version of stalld | 2021-01-07 19:57:48 UTC
Github | openshift cluster-node-tuning-operator pull 169 | 0 | None | closed | Bug 1892457: Set scheduling policy to SCHED_FIFO to stalld and lower threshold | 2021-01-07 19:57:51 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:29:05 UTC

Description Jiří Mencák 2020-10-28 20:06:10 UTC
Description of problem:
Currently, the NTO-shipped stalld daemon cannot operate without the
Deadline scheduler policy. The latest stalld version allows this. Include the latest stalld version in NTO.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1887568#c26

Comment 2 Simon 2020-10-29 15:47:56 UTC
Regression pass.

Comment 3 Scott Dodson 2020-10-29 19:28:01 UTC
My intent in moving this back and retitling the PRs was to ensure that this made it back into the normal process. Let's leave this MODIFIED so that the normal automation ensures it gets linked to the 4.7 errata.

Comment 5 Jiří Mencák 2020-11-03 08:11:50 UTC
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          68m     Unable to apply 4.7.0-0.nightly-2020-11-03-062304: the cluster operator image-registry has not yet successfully rolled out

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.jmencak.gcp.devcluster.openshift.com:6443".

$ oc get no
NAME                                                          STATUS   ROLES    AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready    worker   56m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready    worker   56m   v1.19.0+74d9cb5

$ oc label no jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal node-role.kubernetes.io/worker-rt=
node/jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal labeled

$ oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    worker-rt: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""
EOF

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-realtime
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift realtime profile
      include=openshift-node,realtime
      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
      #isolate_managed_irq=Y
      not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}
      [bootloader]
      cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}
      [service]
      service.stalld=start,enable
    name: openshift-realtime

  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-rt"
    priority: 20
    profile: openshift-realtime
EOF

$ oc get no
NAME                                                          STATUS                     ROLES              AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready,SchedulingDisabled   worker,worker-rt   59m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready                      worker             59m   v1.19.0+74d9cb5

$ oc get mcp
NAME        CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-6c73da35252d96e7767394716b7009bc   True      False      False      3              3                   3                     0                      67m
worker      rendered-worker-ab6f82757a950d2a151998b692b0f6be   True      False      False      1              1                   1                     0                      67m
worker-rt                                                      False     True       False      1              0                   0                     0                      53s

$ oc get no
NAME                                                          STATUS   ROLES              AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready    worker,worker-rt   68m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready    worker             68m   v1.19.0+74d9cb5

$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-6c73da35252d96e7767394716b7009bc      True      False      False      3              3                   3                     0                      78m
worker      rendered-worker-ab6f82757a950d2a151998b692b0f6be      True      False      False      1              1                   1                     0                      78m
worker-rt   rendered-worker-rt-8f093c3d15d6f63bf35876befedb9bdf   True      False      False      1              1                   1                     0                      11m

$ oc debug no/jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal
Creating debug namespace/openshift-debug-node-zh4v7 ...
Starting pod/jmencak-lmmc8-worker-a-vvgj4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.2
If you don't see a command prompt, try pressing enter.
sh-4.4# ps auxww|grep stalld
root        3740  0.5  0.0   8296  2744 ?        Ss   07:59   0:02 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root        8874  0.0  0.0   9180  1064 pts/0    S+   08:05   0:00 grep stalld

The threshold is now 20 s (-t 20).
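
To check the threshold without reading the ps output by eye, the value after -t can be pulled out of the stalld command line. This is a minimal sketch, assuming -t is the threshold flag (as the command line above suggests); stalld_threshold is a hypothetical helper name, not part of stalld or NTO.

```shell
# stalld_threshold: read a stalld command line on stdin and print
# the argument that follows -t (assumed here to be the threshold).
stalld_threshold() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-t") { print $(i + 1); exit } }'
}

# On the node (inside the debug pod), /proc/<pid>/cmdline is NUL-separated,
# so convert the separators to spaces first:
#   tr '\0' ' ' < /proc/"$(pidof stalld)"/cmdline | stalld_threshold
```

Fed the command line shown in the ps output above, the helper prints the number that follows -t.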

sh-4.4#  grep ExecStart /host/etc/systemd/system/stalld.service 
ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

stalld is now started through chrt with SCHED_FIFO (-f) at priority 10.
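
The unit file only shows how stalld is launched; the policy actually in effect for the running process can be read back with chrt -p. This is a minimal sketch, assuming the usual util-linux output format ("pid N's current scheduling policy: ..."); sched_policy is a hypothetical helper name.

```shell
# sched_policy: read `chrt -p PID` output on stdin and print only
# the policy name from the "scheduling policy" line.
sched_policy() {
    awk -F': ' '/scheduling policy/ { print $2; exit }'
}

# On the node (inside the debug pod, after `chroot /host`):
#   chrt -p "$(pidof stalld)" | sched_policy
```

If the ExecStart line above is in effect, this should report SCHED_FIFO for the stalld PID.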

Comment 6 Simon 2020-11-03 17:34:47 UTC
Verified on version 4.7.0-0.nightly-2020-11-03-111352.

Comment 9 errata-xmlrpc 2021-02-24 15:28:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

