+++ This bug was initially created as a clone of Bug #1979554 +++ Description of problem: Latency tests runs on V1.9 oslat version , while 4.8 cnf-tests image uses the latest version with modified tool options , and one of the used options in our tests is deleted so oslat tests fails to run because the no longer supported option --runtime. Version-Release number of selected component (if applicable): 4.8 How reproducible: always Steps to Reproduce: 1.pull cnf-tests image http://registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51 podman run -d -ti --privileged --entrypoint=/bin/bash registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51 2. run only latency tests : podman run -v $KUBECONFIG:/root/kubeconfig:Z -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.8.0-51 -e PERF_TEST_PROFILE=performance -e ROLE_WORKER_CNF=worker-cnf -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=10 -e OSLAT_MAXIMUM_LATENCY=20 -e DISCOVERY_MODE=true registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51 usr/bin/test-run.sh -ginkgo.focus="\[performance\]\ Latency\ Test" 3. Actual results: ================================================= -The test fails with the following error : Unexpected error: <*errors.errorString | 0xc0003dcc50>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/openshift-kni/performance-addon- operators/functests/4_latency/latency.go:138 ================================================== -Display the logs of the test's pod in the test's namespace : oc logs oslat-rt7xn -n performance-addon-operators-testing I0706 10:43:16.165323 1 main.go:83] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-3c8ec880c490645b99e01bb052821294c500c2478a715a00dacdffe87b0825f6/vmlinuz-4.18.0-305.3.1.rt7.75.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/3c8ec880c490645b99e01bb052821294c500c2478a715a00dacdffe87b0825f6/0 root=UUID=8e7cc1f7-cbe0-432c-aa91-0b3528ccc7a3 rw rootflags=prjquota skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=fffffff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 default_hugepagesz=1G hugepagesz=1G hugepages=1 + I0706 10:43:16.165512 1 main.go:89] Environment information: kernel version 4.18.0-305.3.1.rt7.75.el8_4.x86_64 I0706 10:43:16.165560 1 main.go:52] Running the oslat command with arguments [--runtime 10 --rtprio 1 --cpu-list 2 --cpu-main-thread 1] F0706 10:43:16.166889 1 main.go:55] failed to run oslat command; out: /usr/bin/oslat: unrecognized option '--runtime' oslat V 1.10 Usage: oslat <options> This is an OS latency detector by running busy loops on specified cores. Please run this tool using root. Available options: -b, --bucket-size Specify the number of the buckets (4-1024) -B, --bias Add a bias to all the buckets using the estimated mininum -c, --cpu-list Specify CPUs to run on, e.g. '1,3,5,7-15' -C, --cpu-main-thread Specify which CPU the main thread runs on. Default is cpu0. -D, --duration Specify test duration, e.g., 60, 20m, 2H (m/M: minutes, h/H: hours, d/D: days) -f, --rtprio Using SCHED_FIFO priority (1-99) -m, --workload-mem Size of the memory to use for the workload (e.g., 4K, 1M). Total memory usage will be this value multiplies 2*N, because there will be src/dst buffers for each thread, and N is the number of processors for testing. -s, --single-preheat Use a single thread when measuring latency at preheat stage NOTE: please make sure the CPU frequency on all testing cores are locked before using this parmater. If you don't know how to lock the freq then please don't use this parameter. -T, --trace-threshold Stop the test when threshold triggered (in us), print a marker in ftrace and stop ftrace too. -v, --version Display the version of the software. -w, --workload Specify a kind of workload, default is no workload (options: no, memmove) -z, --zero-omit Don't display buckets in the output histogram if all zeros. ; err: exit status 1 ================================================== -Describe the test's pod : oc describe pod -n performance-addon-operators-testing Name: oslat-rt7xn Namespace: performance-addon-operators-testing Priority: 0 Node: ocp48-worker-0.demo.lab.shajmakh/192.168.122.114 Start Time: Tue, 06 Jul 2021 13:43:13 +0300 Labels: <none> Annotations: cpu-load-balancing.crio.io: true k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.133.2.90" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.133.2.90" ], "default": true, "dns": {} }] openshift.io/scc: node-exporter Status: Pending IP: IPs: <none> Containers: oslat-runner: Container ID: Image: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51 Image ID: Port: <none> Host Port: <none> Command: /usr/bin/oslat-runner Args: -logtostderr=false -alsologtostderr=true -log_file=/host/oslat.log -runtime=10 State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Limits: cpu: 2 memory: 1Gi Requests: cpu: 2 memory: 1Gi Environment: <none> Mounts: /host from logs (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9dtx5 (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: logs: Type: HostPath (bare host directory volume) Path: /var/log HostPathType: Directory kube-api-access-9dtx5: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Guaranteed Node-Selectors: kubernetes.io/hostname=ocp48-worker-0.demo.lab.shajmakh node-role.kubernetes.io/worker-cnf= Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 2s default-scheduler Successfully assigned performance-addon-operators-testing/oslat-rt7xn to ocp48-worker-0.demo.lab.shajmakh Normal AddedInterface 0s multus Add eth0 [10.133.2.90/23] from openshift-sdn Normal Pulled 0s kubelet Container image "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51" already present on machine [root@ocp-edge41 ~]# oc describe pod -n performance-addon-operators-testing Name: oslat-rt7xn Namespace: performance-addon-operators-testing Priority: 0 Node: ocp48-worker-0.demo.lab.shajmakh/192.168.122.114 Start Time: Tue, 06 Jul 2021 13:43:13 +0300 Labels: <none> Annotations: cpu-load-balancing.crio.io: true k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.133.2.90" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.133.2.90" ], "default": true, "dns": {} }] openshift.io/scc: node-exporter Status: Failed IP: 10.133.2.90 IPs: IP: 10.133.2.90 Containers: oslat-runner: Container ID: cri-o://5ddab4e60b74d341a9fd4e35f8daf83c25f692b5d552849a462f488d302259a1 Image: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51 Image ID: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests@sha256:690a2921183a44f7bb184b29cb5775d040b2ca0a45f226f7f177abab01bf2753 Port: <none> Host Port: <none> Command: /usr/bin/oslat-runner Args: -logtostderr=false -alsologtostderr=true -log_file=/host/oslat.log -runtime=10 State: Terminated Reason: Error Exit Code: 255 Started: Tue, 06 Jul 2021 13:43:16 +0300 Finished: Tue, 06 Jul 2021 13:43:16 +0300 Ready: False Restart Count: 0 Limits: cpu: 2 memory: 1Gi Requests: cpu: 2 memory: 1Gi Environment: <none> Mounts: /host from logs (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9dtx5 (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: logs: Type: HostPath (bare host directory volume) Path: /var/log HostPathType: Directory kube-api-access-9dtx5: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Guaranteed Node-Selectors: kubernetes.io/hostname=ocp48-worker-0.demo.lab.shajmakh node-role.kubernetes.io/worker-cnf= Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 4s default-scheduler Successfully assigned performance-addon-operators-testing/oslat-rt7xn to ocp48-worker-0.demo.lab.shajmakh Normal AddedInterface 2s multus Add eth0 [10.133.2.90/23] from openshift-sdn Normal Pulled 2s kubelet Container image "registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.8.0-51" already present on machine Normal Created 1s kubelet Created container oslat-runner Normal Started 1s kubelet Started container oslat-runner Expected results: the test should pass with output similar to the following : running /usr/bin//validationsuite -ginkgo.focus=\[performance\]\[config\]|\[performance\]\ Latency\ Test I1204 08:14:28.107369 7 request.go:621] Throttling request took 1.043625033s, request: GET:https://api.hlxcl1.lab.eng.tlv2.redhat.com:6443/apis/discovery.k8s.io/v1beta1?timeout=32s Running Suite: CNF Features e2e validation ========================================== Random Seed: 1607069666 Will run 0 of 19 specs SSSSSSSSSSSSSSSSSSS Ran 0 of 19 Specs in 0.000 seconds SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 19 Skipped PASS Discovery mode enabled, skipping setup running /usr/bin//cnftests -ginkgo.focus=\[performance\]\[config\]|\[performance\]\ Latency\ Test I1204 08:14:34.757317 41 request.go:621] Throttling request took 1.049226619s, request: GET:https://api.hlxcl1.lab.eng.tlv2.redhat.com:6443/apis/network.openshift.io/v1?timeout=32s Running Suite: CNF Features e2e integration tests ================================================= Random Seed: 1607069671 Will run 1 of 79 specs SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSI1204 08:15:15.082544 41 nodes.go:88] found mcd machine-config-daemon-mwt5r for node helix13.lab.eng.tlv2.redhat.com I1204 08:15:30.576061 41 utils.go:23] run command 'oc [exec -i -n openshift-machine-config-operator -c machine-config-daemon --request-timeout 30 machine-config-daemon-mwt5r -- cat /rootfs/var/log/oslat.log]' (err=<nil>): stdout=Log file created at: 2020/12/04 08:15:02 Running on machine: oslat-v9wrb Binary: Built with gc go1.13.8 for linux/amd64 Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg I1204 08:15:02.748539 1 main.go:83] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-1c166408d7b3b896d9717812661af1720a6e1848d2834702ee41bae53b1e059e/vmlinuz-4.18.0-193.31.1.rt13.81.el8_2.ocptestv2.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/1c166408d7b3b896d9717812661af1720a6e1848d2834702ee41bae53b1e059e/0 ignition.platform.id=openstack root=UUID=910678ff-f77e-4a7d-8d53-86f2ac47a823 rw skew_tick=1 nohz=on rcu_nocbs=2-15 tuned.non_isolcpus=000000ff,ffff0003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-15 systemd.cpu_affinity=0,1,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 default_hugepagesz=1G nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 nosmt tsc=reliable I1204 08:15:02.748763 1 main.go:89] Environment information: kernel version 4.18.0-193.31.1.rt13.81.el8_2.ocptestv2.x86_64 I1204 08:15:02.748781 1 main.go:52] Running the oslat command with arguments [--runtime 10 --rtprio 1 --cpu-list 4,6,8,10,12,14,16,18,20,22,24,26 --cpu-main-thread 2] I1204 08:15:13.850656 1 main.go:58] Succeeded to run the oslat command: Version: 1.90 Total runtime: 10 seconds Thread priority: SCHED_FIFO:1 CPU list: 4,6,8,10,12,14,16,18,20,22,24,26 CPU for main thread: 2 Workload: no Workload mem: 0 (KiB) Preheat cores: 12 Pre-heat for 1 seconds... Test starts... Test completed. Core: 4 6 8 10 12 14 16 18 20 22 24 26 CPU Freq: 2493 2493 2493 2493 2493 2493 2493 2493 2493 2493 2493 2493 (Mhz) 001 (us): 377237726 377235726 379730454 381527298 381500295 379760223 377375690 381527971 381478523 377253998 381391212 381512049 002 (us): 6977 7694 8727 8724 8676 8980 8282 8412 8390 8533 8728 8458 003 (us): 3165 2458 1435 1438 1489 1179 1874 1739 1757 1633 1436 1700 004 (us): 20 15 4 4 1 6 10 14 18 1 3 8 005 (us): 0 0 0 0 0 0 0 10 6 0 0 0 006 (us): 0 0 0 0 0 0 0 3 10 0 0 0 007 (us): 0 0 0 0 0 0 0 20 37 0 0 0 008 (us): 0 0 0 0 0 0 0 5 17 0 0 0 009 (us): 0 0 0 0 0 0 0 0 2 0 0 0 010 (us): 0 0 0 0 0 0 0 2 0 0 0 0 011 (us): 0 0 0 0 0 0 0 0 0 0 0 0 012 (us): 0 0 0 0 0 0 0 0 1 0 0 0 013 (us): 0 0 0 0 0 0 0 0 0 0 0 0 014 (us): 0 0 0 0 0 0 0 0 1 0 0 0 015 (us): 0 0 0 0 0 0 0 0 0 0 0 0 016 (us): 0 0 0 0 0 0 0 1 0 0 0 0 017 (us): 0 0 0 0 0 0 0 0 0 0 0 0 018 (us): 0 0 0 0 0 0 0 0 0 0 0 0 019 (us): 0 0 0 0 0 0 0 0 0 0 0 0 020 (us): 0 0 0 0 0 0 0 0 0 0 0 0 021 (us): 0 0 0 0 0 0 0 0 0 0 0 0 022 (us): 0 0 0 0 0 0 0 0 0 0 0 0 023 (us): 0 0 0 0 0 0 0 1 0 0 0 0 024 (us): 0 0 0 0 0 0 0 0 0 0 0 0 025 (us): 0 0 0 0 0 0 0 0 0 0 0 0 026 (us): 0 0 0 0 0 0 0 0 0 0 0 0 027 (us): 0 0 0 0 0 0 0 0 0 0 0 0 028 (us): 0 0 0 0 0 0 0 0 0 0 0 0 029 (us): 0 0 0 0 0 0 0 0 0 0 0 0 030 (us): 0 0 0 0 0 0 0 0 0 0 0 0 031 (us): 0 0 0 0 0 0 0 0 0 0 0 0 032 (us): 0 0 0 0 0 0 0 0 0 0 0 0 (including overflows) Minimum: 1 1 1 1 1 1 1 1 1 1 1 1 (us) Average: 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us) Maximum: 4 4 4 4 4 4 4 23 14 4 4 4 (us) Max-Min: 3 3 3 3 3 3 3 22 13 3 3 3 (us) Duration: 10.001 10.001 10.001 10.001 10.001 10.001 10.001 10.001 10.001 10.001 10.001 10.001 (sec) ------------------------------ • [SLOW TEST:52.186 seconds] [performance] Latency Test /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:75 with the oslat image /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:126 should succeed /go/src/github.com/openshift-kni/cnf-features-deploy/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:127 ------------------------------ SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS JUnit report was created: /junit.xml/cnftests-junit.xml Ran 1 of 79 Specs in 76.357 seconds SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 78 Skipped PASS Additional info: --- Additional comment from shajmakh on 2021-07-06 12:05:18 UTC --- Detailed versions : - oc version Client Version: 4.8.0-0.nightly-2021-06-22-192915 Server Version: 4.8.0-0.nightly-2021-06-29-033219 Kubernetes Version: v1.21.0-rc.0+766a5fe - oc get csv NAME DISPLAY VERSION REPLACES PHASE performance-addon-operator.v4.8.0 Performance Addon Operator 4.8.0 Succeeded [root@ocp-edge41 ~]#
Bug verification ================== Run the test using the updated cnf-tests image "cnf-tests-container-v4.7.5-7": podman run -v $KUBECONFIG:/root/kubeconfig:Z -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.7.5-7 -e PERF_TEST_PROFILE=performance -e ROLE_WORKER_CNF=worker-cnf -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=20 -e OSLAT_MAXIMUM_LATENCY=20 -e DISCOVERY_MODE=true registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.7.5-7 usr/bin/test-run.sh -ginkgo.focus="\[performance\]\ Latency\ Test" Test result: ============ I0802 10:52:34.644938 1 main.go:52] Running the oslat command with arguments [--duration 20 --rtprio 1 --cpu-list 4 --cpu-main-thread 3] I0802 10:52:55.667521 1 main.go:58] Succeeded to run the oslat command: oslat V 1.10 ... 032 (us): 30 (including overflows) Minimum: 1 (us) Average: 1.001 (us) Maximum: 389 (us) Max-Min: 388 (us) Duration: 19.988 (sec) ------------------------------ • [SLOW TEST:62.901 seconds] [performance] Latency Test /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:75 with the oslat image /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:126 should succeed /remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:127 ------------------------------ S JUnit report was created: /junit.xml/cnftests-junit.xml Ran 1 of 89 Specs in 84.175 seconds SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 88 Skipped PASS =================================================================== Summary : As one can notice from the above output, the test ran properly on the expected oslat version (1.10) & oslat command was executed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Updated Red Hat OpenShift Enterprise ASYNC Stream container images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2987