1. Install ocp-4.7.7 and pao from prod (pao-4.7.2-1)

[root@dell-r640-015 performance]# /root/img-nvr.sh f31fcaa57d82c50cd2a4b10ca0420940b7d2c24edee4a199b5161bd24c9783a7
NVR=v4.7.2-1

2. Check the tuned profile:

[root@dell-r640-015 performance]# oc describe tuned/openshift-node-performance-performance
Name:         openshift-node-performance-performance
Namespace:    openshift-cluster-node-tuning-operator
Labels:       <none>
Annotations:  <none>
API Version:  tuned.openshift.io/v1
Kind:         Tuned
Metadata:
  Creation Timestamp:  2021-04-26T07:07:44Z
  Generation:          1
  Managed Fields:
    API Version:  tuned.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"a61ec883-ec20-4862-a44e-71afd641e657"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:profile:
        f:recommend:
      f:status:
    Manager:    performance-operator
    Operation:  Update
    Time:       2021-04-26T07:07:44Z
  Owner References:
    API Version:           performance.openshift.io/v2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  performance
    UID:                   a61ec883-ec20-4862-a44e-71afd641e657
  Resource Version:        61672
  Self Link:               /apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds/openshift-node-performance-performance
  UID:                     2fc4d332-c6a6-4e27-9324-5e5121cd47fd
Spec:
  Profile:
    Data:  [main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
include=openshift-node,cpu-partitioning

# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.

[variables]
# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
isolated_cores=1-3
not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

[cpu]
force_latency=cstate.id:1|3     # latency-performance (override)
governor=performance            # latency-performance
energy_perf_bias=performance    # latency-performance
min_perf_pct=100                # latency-performance

# Comment the stalld service section to prevent stalld installation
# until bugs https://bugzilla.redhat.com/show_bug.cgi?id=1912118 and
# https://bugzilla.redhat.com/show_bug.cgi?id=1903302 will be fixed
#[service]
#service.stalld=stop,disable

3. Upgrade to ocp-4.7.8

oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:7456516a64edf63268522565cf00dc581f1d7ad22355ffab8157a9e106cf607f --allow-explicit-upgrade --force
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release@sha256:7456516a64edf63268522565cf00dc581f1d7ad22355ffab8157a9e106cf607f

[root@dell-r640-015 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.7     True        True          7s      Working towards registry.ci.openshift.org/ocp/release@sha256:7456516a64edf63268522565cf00dc581f1d7ad22355ffab8157a9e106cf607f: downloading update

[root@dell-r640-015 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.8     True        False         44s     Cluster version is 4.7.8

[root@dell-r640-015 ~]# oc get nodes
NAME                             STATUS   ROLES               AGE     VERSION
ocp47-master-0.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2
ocp47-master-1.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2
ocp47-master-2.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2
ocp47-worker-0.demo.lab.shanks   Ready    worker,worker-cnf   4h22m   v1.20.0+7d0a2b2
ocp47-worker-1.demo.lab.shanks   Ready    worker,worker-cnf   4h22m   v1.20.0+7d0a2b2
ocp47-worker-2.demo.lab.shanks   Ready    worker              4h20m   v1.20.0+7d0a2b2

4. Upgrade the Performance Addon Operator to 4.7.3-1

[root@dell-r640-015 performance]# oc get csv
NAME                                DISPLAY                      VERSION   REPLACES                            PHASE
performance-addon-operator.v4.7.3   Performance Addon Operator   4.7.3     performance-addon-operator.v4.7.2   Succeeded

[root@dell-r640-015 ~]# oc get nodes -o wide
NAME                             STATUS   ROLES               AGE     VERSION           INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                        CONTAINER-RUNTIME
ocp47-master-0.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2   192.168.122.17    <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64          cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
ocp47-master-1.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2   192.168.122.233   <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64          cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
ocp47-master-2.demo.lab.shanks   Ready    master              4h34m   v1.20.0+7d0a2b2   192.168.122.220   <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64          cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
ocp47-worker-0.demo.lab.shanks   Ready    worker,worker-cnf   4h22m   v1.20.0+7d0a2b2   192.168.122.26    <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.rt7.77.el8_3.x86_64   cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
ocp47-worker-1.demo.lab.shanks   Ready    worker,worker-cnf   4h22m   v1.20.0+7d0a2b2   192.168.122.241   <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.rt7.77.el8_3.x86_64   cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8
ocp47-worker-2.demo.lab.shanks   Ready    worker              4h20m   v1.20.0+7d0a2b2   192.168.122.183   <none>        Red Hat Enterprise Linux CoreOS 47.83.202104161442-0 (Ootpa)   4.18.0-240.22.1.el8_3.x86_64          cri-o://1.20.2-6.rhaos4.7.gitf1d5201.el8

5. Check the tuned profile and verify stalld is enabled

[root@dell-r640-015 performance]# oc describe tuned/openshift-node-performance-performance
Name:         openshift-node-performance-performance
Namespace:    openshift-cluster-node-tuning-operator
Labels:       <none>
Annotations:  <none>
API Version:  tuned.openshift.io/v1
Kind:         Tuned
Metadata:
  Creation Timestamp:  2021-04-26T07:07:44Z
  Generation:          3
  Managed Fields:
    API Version:  tuned.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"a61ec883-ec20-4862-a44e-71afd641e657"}:
            .:
            f:apiVersion:
            f:blockOwnerDeletion:
            f:controller:
            f:kind:
            f:name:
            f:uid:
      f:spec:
        .:
        f:profile:
        f:recommend:
      f:status:
    Manager:    performance-operator
    Operation:  Update
    Time:       2021-04-26T07:07:44Z
  Owner References:
    API Version:           performance.openshift.io/v2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  performance
    UID:                   a61ec883-ec20-4862-a44e-71afd641e657
  Resource Version:        212538
  Self Link:               /apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds/openshift-node-performance-performance
  UID:                     2fc4d332-c6a6-4e27-9324-5e5121cd47fd
Spec:
  Profile:
    Data:  [main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
include=openshift-node,cpu-partitioning

# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.

[variables]
# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
isolated_cores=4-46
not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

[cpu]
force_latency=cstate.id:1|3     # latency-performance (override)
governor=performance            # latency-performance
energy_perf_bias=performance    # latency-performance
min_perf_pct=100                # latency-performance

[service]
service.stalld=start,enable

[root@ocp47-worker-0 ~]# systemctl status stalld
● stalld.service - Stall Monitor
   Loaded: loaded (/etc/systemd/system/stalld.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2021-04-26 13:52:55 UTC; 1h 16min ago
 Main PID: 10639 (stalld)
    Tasks: 1 (limit: 205142)
   Memory: 4.2M
      CPU: 2min 10.809s
   CGroup: /system.slice/stalld.service
           └─10639 /usr/local/bin/stalld --systemd -p 1000000000 -r 10000 -d 3 -t 20 --log_syslog --log_kmsg --foreground --pidfile /run/stalld.pid

Apr 26 13:52:48 ocp47-worker-0.demo.lab.shanks systemd[1]: Starting Stall Monitor...
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks systemd[1]: Started Stall Monitor.
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: dl_runtime is shorter than 1ms, setting HRTICK
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: boosted pid 0 using SCHED_DEADLINE
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: using SCHED_DEADLINE for boosting
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: initial config_buffer_size set to 535500
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: detected new task format
Apr 26 13:52:55 ocp47-worker-0.demo.lab.shanks stalld[10639]: sched_debug is getting larger, increasing the buffer to 1071000
Apr 26 14:48:07 ocp47-worker-0.demo.lab.shanks stalld[10639]: sched_debug is getting larger, increasing the buffer to 2142000
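
The profile comparison between steps 2 and 5 comes down to one setting: whether an uncommented service.stalld line is present and what it is set to. A minimal sketch of that check follows. The function name stalld_state is hypothetical (it is not part of oc, tuned, or the operator), and the script only recognizes the two values that appear in the profiles above.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Reads a tuned profile (INI-style text) on stdin and reports the stalld setting:
#   "enabled"  for service.stalld=start,enable   (exit status 0)
#   "disabled" for service.stalld=stop,disable   (exit status 1)
#   "unset"    when no uncommented service.stalld line exists (exit status 1)
stalld_state() {
    # Drop commented-out lines, extract the value up to any whitespace or
    # trailing comment, and keep the last assignment found.
    value=$(grep -v '^[[:space:]]*#' \
        | sed -n 's/^[[:space:]]*service\.stalld=\([^[:space:]#]*\).*/\1/p' \
        | tail -n 1)
    case "$value" in
        start,enable) echo enabled ;;
        stop,disable) echo disabled; return 1 ;;
        "")           echo unset; return 1 ;;
        *)            echo "unknown: $value"; return 1 ;;
    esac
}

# Possible usage against a live cluster (assumes the profile text is stored
# under .spec.profile[0].data of the Tuned object):
#   oc get tuned openshift-node-performance-performance \
#     -n openshift-cluster-node-tuning-operator \
#     -o jsonpath='{.spec.profile[0].data}' | stalld_state
```

Fed the step 2 profile, where the [service] section is commented out, this would report "unset"; fed the step 5 profile it would report "enabled". It only inspects the profile text, so the systemctl status check on the worker node is still needed to confirm that the service is actually running.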
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.8 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1349