Bug 1892457
Summary: | NTO-shipped stalld needs to use FIFO for boosting. | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jiří Mencák <jmencak> | |
Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> | |
Status: | CLOSED ERRATA | QA Contact: | Simon <skordas> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.6 | CC: | sejug | |
Target Milestone: | --- | |||
Target Release: | 4.7.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1892459 | Environment: | |
Last Closed: | 2021-02-24 15:28:37 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1892459 |
Description
Jiří Mencák
2020-10-28 20:06:10 UTC
Upstream PRs
https://github.com/openshift/cluster-node-tuning-operator/pull/168
https://github.com/openshift/cluster-node-tuning-operator/pull/169

Regression pass.

My intent by moving this back and retitling the PRs was to ensure that this made it back into normal process. Let's leave this MODIFIED so that the normal automation ensures it gets linked to the 4.7 errata.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          68m     Unable to apply 4.7.0-0.nightly-2020-11-03-062304: the cluster operator image-registry has not yet successfully rolled out

$ oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.jmencak.gcp.devcluster.openshift.com:6443".

$ oc get no
NAME                                                          STATUS   ROLES    AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready    master   65m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready    worker   56m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready    worker   56m   v1.19.0+74d9cb5

$ oc label no jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal node-role.kubernetes.io/worker-rt=
node/jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal labeled

$ oc create -f- <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    worker-rt: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""
EOF

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-realtime
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift realtime profile
      include=openshift-node,realtime
      [variables]
      # isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
      isolated_cores=1
      #isolate_managed_irq=Y
      not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}
      [bootloader]
      cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}
      [service]
      service.stalld=start,enable
    name: openshift-realtime
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-rt"
    priority: 20
    profile: openshift-realtime
EOF

$ oc get no
NAME                                                          STATUS                     ROLES              AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready                      master             69m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready,SchedulingDisabled   worker,worker-rt   59m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready                      worker             59m   v1.19.0+74d9cb5

$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-6c73da35252d96e7767394716b7009bc      True      False      False      3              3                   3                     0                      67m
worker      rendered-worker-ab6f82757a950d2a151998b692b0f6be      True      False      False      1              1                   1                     0                      67m
worker-rt                                                         False     True       False      1              0                   0                     0                      53s

$ oc get no
NAME                                                          STATUS   ROLES              AGE   VERSION
jmencak-lmmc8-master-0.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-master-1.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-master-2.c.openshift-gce-devel.internal         Ready    master             78m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal   Ready    worker,worker-rt   68m   v1.19.0+74d9cb5
jmencak-lmmc8-worker-b-drvv9.c.openshift-gce-devel.internal   Ready    worker             68m   v1.19.0+74d9cb5

$ oc get mcp
NAME        CONFIG                                                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master      rendered-master-6c73da35252d96e7767394716b7009bc      True      False      False      3              3                   3                     0                      78m
worker      rendered-worker-ab6f82757a950d2a151998b692b0f6be      True      False      False      1              1                   1                     0                      78m
worker-rt   rendered-worker-rt-8f093c3d15d6f63bf35876befedb9bdf   True      False      False      1              1                   1                     0                      11m

$ oc debug no/jmencak-lmmc8-worker-a-vvgj4.c.openshift-gce-devel.internal
Creating debug namespace/openshift-debug-node-zh4v7 ...
Starting pod/jmencak-lmmc8-worker-a-vvgj4copenshift-gce-develinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.2
If you don't see a command prompt, try pressing enter.

sh-4.4# ps auxww|grep stalld
root        3740  0.5  0.0   8296  2744 ?        Ss   07:59   0:02 /usr/local/bin/stalld -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid
root        8874  0.0  0.0   9180  1064 pts/0    S+   08:05   0:00 grep stalld

Threshold changed to 20s.

sh-4.4# grep ExecStart /host/etc/systemd/system/stalld.service
ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF

Using chrt with priority 10 and SCHED_FIFO.

Verified on v: 4.7.0-0.nightly-2020-11-03-111352

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
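
The verification above reads the scheduling policy off the ExecStart line by eye. As a rough sketch of how that check could be scripted, the helper below parses such a line and reports the policy and priority that chrt is asked to apply. The function name execstart_sched is hypothetical (it is not part of NTO or stalld), and it only understands the short chrt flags -f, -r and -o. On a live node, `chrt -p <pid>` from util-linux reports the policy a running process actually has.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not shipped with NTO
# or stalld): print the scheduling policy and priority that a systemd
# ExecStart= line requests through chrt.
execstart_sched() {
    # Drop the "ExecStart=" key, leaving only the command line.
    cmd=${1#ExecStart=}
    # Split into the first three words; the here-document avoids glob
    # expansion and leaves literal "$VAR" references in the unit untouched.
    read -r bin flag prio rest <<EOF
$cmd
EOF
    case "$bin" in
        */chrt|chrt) ;;
        *) echo "no-chrt"; return 1 ;;
    esac
    case "$flag" in
        -f) policy=SCHED_FIFO ;;
        -r) policy=SCHED_RR ;;
        -o) policy=SCHED_OTHER ;;
        *) echo "unknown-policy"; return 1 ;;
    esac
    echo "$policy $prio"
}

# The ExecStart line from the verification transcript in this bug.
line='ExecStart=/usr/bin/chrt -f 10 /usr/local/bin/stalld $CLIST $AGGR $BP $BR $BD $THRESH $LOGGING $FG $PF'
execstart_sched "$line"   # prints: SCHED_FIFO 10
```

The parser is deliberately strict: anything that does not start with chrt, or that uses a flag it does not recognize, is reported as a failure instead of being guessed at.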