Bug 2061676 - cnf-tests: oslat container lack of cpu-quota.crio.io annotation
Summary: cnf-tests: oslat container lack of cpu-quota.crio.io annotation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: CNF Platform Validation
Version: 4.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Talor Itzhak
QA Contact: Nikita
URL:
Whiteboard:
Depends On:
Blocks: 2068917
Reported: 2022-03-08 09:58 UTC by Franck Baudin
Modified: 2022-08-26 15:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The oslat container spec lacked the cpu-quota.crio.io: "disable" annotation, resulting in high latency.
Consequence: The cpu-quota.crio.io: "disable" annotation was missing from the pod definition when the pod was created.
Fix: The annotation is now added during pod creation.
Result: The cpu-quota.crio.io: "disable" annotation appears in the oslat pod's spec.
Clone Of:
Environment:
Last Closed: 2022-08-26 15:29:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni performance-addon-operators pull 859 | 0 | None | open | Bug 2061676: latency tests: disable cpu cfs quota | 2022-03-08 10:54:03 UTC
Red Hat Bugzilla | 2031010 | 1 | unspecified | CLOSED | CPU throttling on pods with isolated and pinned CPUs | 2023-01-20 10:38:27 UTC

Description Franck Baudin 2022-03-08 09:58:20 UTC
Description of problem:

The oslat container spec lacks the cpu-quota.crio.io: "disable" annotation, which results in high latency.

Why this annotation is needed is explained in https://docs.openshift.com/container-platform/4.9/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html#disabling-cpu-cfs-quota_cnf-master


Version-Release number of selected component (if applicable):
OCP 4.9.9

How reproducible:
100% 

Steps to Reproduce:

sudo podman run  --network=host -ti -v ${HOME}/bin:/kubeconfig:Z -e KUBECONFIG=/kubeconfig/kubeconfig -e DISCOVERY_MODE=true  -e LATENCY_TEST_RUN=true -e PERF_TEST_PROFILE=dpdk-ready -e LATENCY_TEST_CPUS=16 -e LATENCY_TEST_RUNTIME=600 -e OSLAT_MAXIMUM_LATENCY=20 registry.redhat.io/openshift4/cnf-tests-rhel8:v4.9 /usr/bin/test-run.sh -ginkgo.focus="oslat"

Actual results:

$ oc describe pod -n performance-addon-operators-testing  `oc get pods -n performance-addon-operators-testing -o jsonpath='{.items[0].metadata.name}'` | grep crio
Annotations:  cpu-load-balancing.crio.io: disable
              irq-load-balancing.crio.io: disable
## cpu-quota.crio.io: "disable" is missing !! ###
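
The manual grep above can be turned into a small check. This is only a sketch: the function name missing_crio_annotations is hypothetical and not part of cnf-tests, and it assumes the annotations are rendered as "<key>: disable", as in the oc describe output shown here.

```shell
# Hypothetical helper (not part of cnf-tests): reads `oc describe pod`
# output on stdin and prints each expected CRI-O annotation that is absent.
missing_crio_annotations() {
  described=$(cat)
  for key in cpu-load-balancing.crio.io cpu-quota.crio.io irq-load-balancing.crio.io; do
    # Fixed-string match, assuming the "<key>: disable" rendering of oc describe.
    if ! printf '%s\n' "$described" | grep -Fq "${key}: disable"; then
      printf '%s\n' "$key"
    fi
  done
}

# Demo with the two annotation lines reported above.
printf '%s\n' \
  'Annotations:  cpu-load-balancing.crio.io: disable' \
  '              irq-load-balancing.crio.io: disable' |
  missing_crio_annotations
# prints: cpu-quota.crio.io
```

Against a live cluster, the output of the oc describe pod command above could be piped into the function instead of the sample text.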

$ oc logs -n performance-addon-operators-testing  `oc get pods -n performance-addon-operators-testing -o jsonpath='{.items[0].metadata.name}'` -f
I0308 09:37:48.971225       1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-df759d438ed374e2238a31f97545c06b6a51975b2972a6d398b877f566d76f5c/vmlinuz-4.18.0-305.28.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/df759d438ed374e2238a31f97545c06b6a51975b2972a6d398b877f566d76f5c/0 ip=eno1:dhcp root=UUID=5819802b-6075-4af0-8468-a6cd3b9c7db0 rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=2-17,38-53,20-35,56-71 tuned.non_isolcpus=00c00030,000c0003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-17,38-53,20-35,56-71 systemd.cpu_affinity=0,1,36,37,18,19,54,55 default_hugepagesz=1G hugepagesz=1G hugepages=64 +
I0308 09:37:48.971423       1 node.go:44] Environment information: kernel version 4.18.0-305.28.1.el8_4.x86_64
I0308 09:37:48.971450       1 main.go:53] Running the oslat command with arguments [--duration 600 --rtprio 1 --cpu-list 3-9,38-45 --cpu-main-thread 2]
I0308 09:47:50.084654       1 main.go:59] Succeeded to run the oslat command: oslat V 2.10
Total runtime: 		600 seconds
Thread priority: 	SCHED_FIFO:1
CPU list: 		3-9,38-45
CPU for main thread: 	2
Workload: 		no
Workload mem: 		0 (KiB)
Preheat cores: 		15

Pre-heat for 1 seconds...
Test starts...
Test completed.

        Core:	 3 4 5 6 7 8 9 38 39 40 41 42 43 44 45
    CPU Freq:	 2298 2293 2293 2293 2293 2293 2293 2293 2293 2293 2293 2293 2293 2293 2293 (Mhz)
    001 (us):	 19427311688 19467885180 19445758996 19458233054 18609634261 19446794675 19446123672 20354621104 19450099527 19465985167 19445350459 19456170545 18602833145 19447259790 19446452552
    002 (us):	 893 3512 28173 26947 49793 31188 25682 231144 39335 17936 42307 18886 56956 61376 40357
    003 (us):	 219030 594964 570980 571073 548440 566566 571254 312408 559770 570037 557072 557232 541246 538165 558365
    004 (us):	 302116 2308 1560 2690 2506 2890 3773 55151 2095 12449 1146 24328 2182 986 1826
    005 (us):	 77241 6 5 19 12 25 19 1833 102 106 5 146 22 2 6
    006 (us):	 1441 2 2 6 1 3 6 17 38 6 0 16 3 1 3
    007 (us):	 28 1 5 2 1 8 9 6 66 9 9 2 21 19 10
    008 (us):	 5 0 10 2 6 27 14 6 66 21 27 2 40 34 10
    009 (us):	 9 0 9 6 9 30 19 11 17 18 23 3 39 23 13
    010 (us):	 12 0 14 8 10 24 11 18 8 23 22 8 29 17 10
    011 (us):	 14 1 11 13 8 10 7 20 5 15 12 4 19 8 10
    012 (us):	 1 0 12 6 8 10 0 13 4 7 3 5 11 1 8
    013 (us):	 0 0 7 10 4 6 2 6 6 5 3 9 12 0 9
    014 (us):	 2 0 3 8 3 6 1 3 1 8 1 2 10 1 5
    015 (us):	 1 0 2 2 2 0 0 2 0 4 1 2 7 1 0
    016 (us):	 0 0 0 3 0 1 0 2 2 3 0 11 10 0 0
    017 (us):	 1 0 1 1 0 3 0 1 0 0 0 5 10 0 0
    018 (us):	 0 0 0 0 0 1 0 0 1 5 0 8 8 0 0
    019 (us):	 0 0 1 0 0 2 0 0 0 2 0 0 3 1 0
    020 (us):	 0 0 0 0 0 1 0 0 0 0 0 1 5 0 0
    021 (us):	 0 0 0 0 0 0 0 0 0 0 0 1 5 0 0
    022 (us):	 0 0 0 0 0 0 0 1 0 0 0 0 2 0 0
    023 (us):	 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
    024 (us):	 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    025 (us):	 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
    026 (us):	 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
    027 (us):	 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    028 (us):	 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
    029 (us):	 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0
    030 (us):	 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    031 (us):	 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    032 (us):	 0 0 0 0 0 0 0 0 0 1 0 3 0 0 0 (including overflows)
     Minimum:	 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (us)
     Average:	 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
     Maximum:	 17 11 19 17 15 20 14 22 18 34 15 112 23 19 14 (us)
     Max-Min:	 16 10 18 16 14 19 13 21 17 33 14 111 22 18 13 (us)
    Duration:	 599.089 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 600.396 (sec)
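
To see which cores exceed the configured limit without reading the table by eye, the Core and Maximum rows can be paired up. This is a sketch under assumptions: cores_over_threshold is a hypothetical name, and treating only values strictly greater than the limit as violations is an assumption, not necessarily the comparison cnf-tests applies to OSLAT_MAXIMUM_LATENCY.

```shell
# Hypothetical helper: reads oslat output on stdin and prints "<core> <max_us>"
# for every core whose Maximum exceeds the limit given as the first argument.
# Assumption: "exceeds" means strictly greater than the limit.
cores_over_threshold() {
  awk -v limit="$1" '
    $1 == "Core:"    { for (i = 2; i <= NF; i++) core[i] = $i }
    $1 == "Maximum:" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+$/ && $i + 0 > limit + 0)   # skip the trailing "(us)"
          print core[i], $i
    }
  '
}

# Demo with the Core and Maximum rows from the run above.
cores_over_threshold 20 <<'EOF'
        Core:  3 4 5 6 7 8 9 38 39 40 41 42 43 44 45
     Maximum:  17 11 19 17 15 20 14 22 18 34 15 112 23 19 14 (us)
EOF
# prints:
# 38 22
# 40 34
# 42 112
# 43 23
```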


Expected results:

Maximum latency below 20 us on every core, and the cpu-quota.crio.io annotation present on the pod:
$ oc describe pod -n performance-addon-operators-testing  `oc get pods -n performance-addon-operators-testing -o jsonpath='{.items[0].metadata.name}'` | grep crio
Annotations:  cpu-load-balancing.crio.io: disable
              irq-load-balancing.crio.io: disable
              cpu-quota.crio.io: disable

The missing code line is here (thanks Martin!): https://github.com/openshift-kni/performance-addon-operators/blob/master/functests/4_latency/latency.go#L383

Comment 1 Shereen Haj Makhoul 2022-03-24 16:00:06 UTC
Verification:

cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 
PAO: 4.10.1
OCP: 4.10.6

Steps:
=====
podman run  -v $KUBECONFIG:/root/kubeconfig:Z -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.11.0-7 -e PERF_TEST_PROFILE=performance -e ROLE_WORKER_CNF=worker-cnf -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=180 -e MAXIMUM_LATENCY=2000000 -e DISCOVERY_MODE=true registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 usr/bin/test-run.sh -ginkgo.focus="Latency\ Test" -ginkgo.v


Output:
=======
From the description of each pod:

oslat:
Name:         oslat-77tkn
Namespace:    performance-addon-operators-testing
Priority:     0
Node:         ocp-worker-0.demo.lab.shajmakh/192.168.122.204
Start Time:   Thu, 24 Mar 2022 17:42:33 +0200
Labels:       <none>
Annotations:  cpu-load-balancing.crio.io: disable
              cpu-quota.crio.io: disable
              irq-load-balancing.crio.io: disable


cyclictest:
Name:         cyclictest-rkm2j
Namespace:    performance-addon-operators-testing
Priority:     0
Node:         ocp-worker-0.demo.lab.shajmakh/192.168.122.204
Start Time:   Thu, 24 Mar 2022 17:47:25 +0200
Labels:       <none>
Annotations:  cpu-load-balancing.crio.io: disable
              cpu-quota.crio.io: disable
              irq-load-balancing.crio.io: disable


hwlatdetect:

Name:         hwlatdetect-p7z5h
Namespace:    performance-addon-operators-testing
Priority:     0
Node:         ocp-worker-0.demo.lab.shajmakh/192.168.122.204
Start Time:   Thu, 24 Mar 2022 17:51:31 +0200
Labels:       <none>
Annotations:  cpu-load-balancing.crio.io: disable
              cpu-quota.crio.io: disable
              irq-load-balancing.crio.io: disable

As can be seen above, the cpu-quota.crio.io: disable annotation is now present on the oslat, cyclictest, and hwlatdetect pods.
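
The same verification could be scripted for several pods at once. The sketch below is illustrative only: report_cpu_quota is a hypothetical name, and it assumes the input is one or more concatenated oc describe pod listings in which each pod starts with a "Name:" line at the beginning of the line, as in the output above.

```shell
# Hypothetical helper: reads concatenated `oc describe pod` listings on stdin
# and prints "<pod> present" or "<pod> missing" for the cpu-quota annotation.
# Assumption: each pod section starts with an unindented "Name:" line.
report_cpu_quota() {
  awk '
    /^Name:/ {
      if (name != "") print name, (found ? "present" : "missing")
      name = $2; found = 0
    }
    index($0, "cpu-quota.crio.io: disable") { found = 1 }
    END { if (name != "") print name, (found ? "present" : "missing") }
  '
}

# Demo with a trimmed version of the oslat listing above.
printf '%s\n' \
  'Name:         oslat-77tkn' \
  'Annotations:  cpu-load-balancing.crio.io: disable' \
  '              cpu-quota.crio.io: disable' \
  '              irq-load-balancing.crio.io: disable' |
  report_cpu_quota
# prints: oslat-77tkn present
```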

