Bug 2055267
| Summary: | OSLAT runner uses both sibling threads causing latency spikes | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | CNF Platform Validation | Assignee: | Talor Itzhak <titzhak> |
| Status: | CLOSED ERRATA | QA Contact: | Dwaine Gonyier <dgonyier> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | aos-bugs, dgonyier, mniranja, shajmakh, titzhak |
| Target Milestone: | --- | | |
| Target Release: | 4.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-12-06 13:00:39 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2051443 | | |
| Bug Blocks: | | | |
Description
OpenShift BugZilla Robot
2022-02-16 14:42:14 UTC
Verification:
cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.10.10-6
The machine has Hyper-Threading (HT) enabled. The sibling list below shows that CPU 0 shares a core with CPU 40:
sh-4.4# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,40
lscpu confirms two threads per core:
sh-4.4# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
...
CPU(s) = Socket(s) * Core(s) per socket * Thread(s) per core = 2 * 20 * 2 = 80
Configure the performance profile (PP) as follows:
```
spec:
  cpu:
    isolated: "2-39,42-80"
    reserved: "0,1,40,41"
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker: ""
```
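A quick way to sanity-check the isolated and reserved sets against the online CPUs reported by lscpu is sketched below. This is only an illustration: parse_cpu_list and check_profile are hypothetical helper names, not part of the operator or of cnf-tests. With the values quoted above, the check reports that the isolated list names CPU 80 while lscpu lists 0-79 as online; whether that is a typo in this comment or in the applied profile cannot be told from the report.

```python
from __future__ import annotations


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a kernel-style CPU list such as "0,1,40-41" into a set of ints."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_profile(isolated: str, reserved: str, online: str) -> dict[str, set[int]]:
    """Hypothetical helper: compare isolated/reserved sets with the online CPUs."""
    iso, res, onl = (parse_cpu_list(s) for s in (isolated, reserved, online))
    return {
        "overlap": iso & res,            # CPUs listed in both sets
        "offline": (iso | res) - onl,    # CPUs listed but not online
        "unassigned": onl - (iso | res), # online CPUs in neither set
    }


# Values copied from the profile and the lscpu output in this comment.
report = check_profile("2-39,42-80", "0,1,40,41", "0-79")
print(report)
```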
Run oslat as follows, requesting 6 CPUs:
podman run --net=host -v /home/kni/clusterconfigs/auth:/kc:z -e KUBECONFIG=/kc/kubeconfig -e IMAGE_REGISTRY=registry.hlxcl6.lab.eng.tlv2.redhat.com:5000/ -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.10.10-6 -e LATENCY_TEST_RUN=true -e DISCOVERY_MODE=true -e ROLE_WORKER_CNF=worker -e LATENCY_TEST_RUNTIME=100 -e LATENCY_TEST_CPUS=6 -e MAXIMUM_LATENCY=10000000 registry.hlxcl6.lab.eng.tlv2.redhat.com:5000/openshift4-cnf-tests:v4.10.10-6 /usr/bin/test-run.sh -ginkgo.focus="oslat"
Trying to pull registry.hlxcl6.lab.eng.tlv2.redhat.com:5000/openshift4-cnf-tests:v4.10.10-6...
Getting image source signatures
...
Will run 1 of 173 specs
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSLog file created at: 2022/11/22 12:00:21
Running on machine: oslat-d56kq
Binary: Built with gc go1.17.12 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1122 12:00:21.564781 1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd1,gpt3)/ostree/rhcos-b594aea28251da3b472da2adba0a57d5fcf82c28c87897a88eb26e6db542b18b/vmlinuz-4.18.0-425.3.1.rt7.213.el8.x86_64 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/b594aea28251da3b472da2adba0a57d5fcf82c28c87897a88eb26e6db542b18b/0 ip=dhcp root=UUID=10f5e35f-f897-41e5-ab22-a4e811b9d4eb rw rootflags=prjquota boot=UUID=97cd8797-d71c-4b73-a9f3-5b8f41e0a473
I1122 12:00:21.564989 1 node.go:46] Environment information: kernel version 4.18.0-425.3.1.rt7.213.el8.x86_64
I1122 12:00:21.565010 1 main.go:58] Running the oslat command with arguments [--duration 100 --rtprio 1 --cpu-list 3-4,43-44 --cpu-main-thread 2]
I1122 12:02:02.603193 1 main.go:64] Succeeded to run the oslat command: oslat V 1.10
Total runtime: 100 seconds
Thread priority: SCHED_FIFO:1
CPU list: 3-4,43-44
CPU for main thread: 2
Workload: no
Workload mem: 0 (KiB)
Preheat cores: 4
Pre-heat for 1 seconds...
Test starts...
Test completed.
Core: 3 4 43 44
CPU Freq: 2392 2392 2398 2392 (Mhz)
001 (us): 3334852986 3331365521 3335019579 3334277484
002 (us): 86 88 71 79
003 (us): 10595 310 11890 37009
004 (us): 63866 37955 65905 51737
005 (us): 18945 44141 15939 5529
006 (us): 1077 11262 770 235
007 (us): 87 794 67 64
008 (us): 45 63 46 46
009 (us): 55 41 46 53
010 (us): 32 57 30 43
011 (us): 578 424 728 439
012 (us): 870 1031 723 1014
013 (us): 3166 2057 3255 2454
014 (us): 191 1346 99 885
015 (us): 102 73 106 88
016 (us): 118 103 248 195
017 (us): 3709 2250 3637 2267
018 (us): 872 2370 806 2250
019 (us): 11 16 14 13
020 (us): 37 18 32 32
021 (us): 25 23 24 30
022 (us): 4 35 7 3
023 (us): 5 6 5 3
024 (us): 2 3 2 2
025 (us): 1 3 1 2
026 (us): 3 3 3 4
027 (us): 4 2 4 2
028 (us): 4 5 4 6
029 (us): 3 4 3 3
030 (us): 0 0 0 0
031 (us): 0 0 0 0
032 (us): 109 109 109 109 (including overflows)
Minimum: 1 1 1 1 (us)
Average: 1.002 1.002 1.002 1.002 (us)
Maximum: 50167 50168 50058 50184 (us)
Max-Min: 50166 50167 50057 50183 (us)
Duration: 100.095 100.095 99.845 100.095 (sec)
------------------------------
• [SLOW TEST:173.351 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:62
with the oslat image
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:132
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:157
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSI1122 12:03:21.567281 27 request.go:665] Waited for 1.171689031s due to client-side throttling, not priority and fairness, request: GET:https://api.hlxcl6.lab.eng.tlv2.redhat.com:6443/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
JUnit report was created: /junit.xml/cnftests-junit.xml
Ran 1 of 173 Specs in 215.647 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 172 Skipped
The pod logs are as follows:
oc logs oslat-d56kq
I1122 12:00:21.564781 1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd1,gpt3)/ostree/rhcos-b594aea28251da3b472da2adba0a57d5fcf82c28c87897a88eb26e6db542b18b/vmlinuz-4.18.0-425.3.1.rt7.213.el8.x86_64 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/b594aea28251da3b472da2adba0a57d5fcf82c28c87897a88eb26e6db542b18b/0 ip=dhcp root=UUID=10f5e35f-f897-41e5-ab22-a4e811b9d4eb rw rootflags=prjquota boot=UUID=97cd8797-d71c-4b73-a9f3-5b8f41e0a473
I1122 12:00:21.564989 1 node.go:46] Environment information: kernel version 4.18.0-425.3.1.rt7.213.el8.x86_64
I1122 12:00:21.565010 1 main.go:58] Running the oslat command with arguments [--duration 100 --rtprio 1 --cpu-list 3-4,43-44 --cpu-main-thread 2]
CPU 2 runs the main thread, and its sibling, CPU 42, is left out of the test.
--cpu-list contains 4 CPUs (3-4 and 43-44), so in total the pod accounts for 6 CPUs.
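The split described above can be reproduced with a short sketch. It is not the cnf-tests runner code. It assumes the pod was allotted CPUs 2-4 and 42-44 (inferred from the arguments in the log) and that sibling threads are offset by 40, as the 0,40 pair suggests. split_oslat_cpus and format_cpu_list are hypothetical names.

```python
def format_cpu_list(cpus):
    """Compress CPU ids into a kernel-style list, e.g. [3, 4, 43, 44] -> "3-4,43-44"."""
    cpus = sorted(cpus)
    if not cpus:
        return ""
    ranges = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def split_oslat_cpus(allowed, siblings):
    """Pick the lowest CPU for the main thread and drop its siblings from the test list."""
    main = min(allowed)
    main_siblings = siblings[main] - {main}
    test_cpus = set(allowed) - {main} - main_siblings
    return main, sorted(main_siblings), sorted(test_cpus)


# Assumption: 80 logical CPUs, sibling pairs are (n, n + 40) for n in 0..39.
siblings = {cpu: {cpu % 40, cpu % 40 + 40} for cpu in range(80)}
# Assumption: the 6 CPUs allotted to the oslat pod.
allowed = [2, 3, 4, 42, 43, 44]

main, unused, test_cpus = split_oslat_cpus(allowed, siblings)
print(f"--cpu-list {format_cpu_list(test_cpus)} --cpu-main-thread {main}")
print(f"unused sibling(s) of main thread: {unused}")
```

Under these assumptions the first printed line matches the --cpu-list and --cpu-main-thread arguments shown in the pod log.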
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:8760