Bug 2009660
| Summary: | cyclictest run missing -m argument | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marius Cornea <mcornea> |
| Component: | CNF Platform Validation | Assignee: | Federico Paolinelli <fpaoline> |
| Status: | CLOSED ERRATA | QA Contact: | Nikita <nkononov> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | aos-bugs, dgonyier, titzhak |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | cnf-tests-container-v4.10.0-43 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 19:34:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2020931 | | |
Note: The bug environment appears to be an SNO cluster. The fix was verified on a normal (virtual) cluster.
The --mlockall parameter was added to the cyclictest command line.
### OCP version
[root@ocp-edge52 performance]# oc version
Client Version: 4.10.0-fc.3
Server Version: 4.10.0-fc.3
Kubernetes Version: v1.23.0+2135ac2
### PAO build
oc describe -n openshift-performance-addon-operator pod/performance-operator-5f87f8d694-fjbdq | grep sha256
registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:f8cbad44f91f5bfeb285ce7daa4e61dffc2500a11983861b6911f2b6d2f19abf
Image: registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:f8cbad44f91f5bfeb285ce7daa4e61dffc2500a11983861b6911f2b6d2f19abf
[root@ocp-edge52 performance]# skopeo inspect --tls-verify=false docker://registry-proxy.engineering.redhat.com/rh-osbs/openshift4-performance-addon-rhel8-operator@sha256:f8cbad44f91f5bfeb285ce7daa4e61dffc2500a11983861b6911f2b6d2f19abf | grep '"Digest"\|"url"'
"Digest": "sha256:f8cbad44f91f5bfeb285ce7daa4e61dffc2500a11983861b6911f2b6d2f19abf",
"url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift4/performance-addon-rhel8-operator/images/v4.10.0-27",
### cnf-test build
openshift4-cnf-tests:v4.10.0-43
### podman command
[root@ocp-edge52 performance]# cat ./cyclic_alt.sh
podman container rm cnf-container-tests
podman run --name cnf-container-tests \
--net=host \
-v /root/.kcli/clusters/ocp410/auth:/kubeconfig:Z \
-v /tmp/junit:/junit:Z \
-v /tmp/report:/report:Z \
-e KUBECONFIG=/kubeconfig/kubeconfig \
-e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs/ \
-e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.10.0-43 \
-e LATENCY_TEST_RUN=true \
-e LATENCY_TEST_RUNTIME=120 \
-e LATENCY_TEST_CPUS=2 \
-e OSLAT_MAXIMUM_LATENCY=3000 \
-e LATENCY_TEST_DELAY=65 \
-e CYCLICTEST_MAXIMUM_LATENCY=5200 \
-e HWLATDETECT_MAXIMUM_LATENCY=5300 \
-e ROLE_WORKER_CNF=worker-cnf \
-e PERF_TEST_PROFILE=performance \
-e DISCOVERY_MODE=true \
registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.10.0-43 \
/usr/bin/test-run.sh \
-ginkgo.focus="\[performance\]\ Latency\ Test" \
--junit /junit -ginkgo.v
# You may replace the -ginkgo.focus value with \[performance\]\ Latency\ Test to
# execute all the tools, since hwlatdetect had the same bug:
# -ginkgo.focus="cyclictest" -ginkgo.v
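The -ginkgo.focus value is a regular expression that Ginkgo matches against each spec's description. The sketch below previews which specs a given pattern would select. It uses grep -E as a stand-in for Ginkgo's Go regexp engine, which is only an approximation because the two dialects differ in corner cases. The first three spec names come from the run output in this comment; the fourth is a hypothetical unrelated spec added for contrast, and `match_focus` is an illustrative helper, not part of the test suite.

```shell
#!/usr/bin/env bash
# Preview which spec descriptions a focus pattern would select.
# grep -E approximates Ginkgo's Go regexp matching (assumption: simple patterns only).

specs='[performance] Latency Test with the oslat image should succeed
[performance] Latency Test with the cyclictest image should succeed
[performance] Latency Test with the hwlatdetect image should succeed
[sriov] hypothetical unrelated spec should succeed'

# match_focus PATTERN: print every spec whose description matches PATTERN.
match_focus() {
  printf '%s\n' "$specs" | grep -E -- "$1"
}

echo "Focus on all latency tools:"
match_focus '\[performance\] Latency Test'

echo "Focus on cyclictest only:"
match_focus 'cyclictest'
```

The broad pattern selects all three latency specs, while `cyclictest` narrows the run to a single spec.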
### Run latency tests on normal cluster (virtual)
[root@ocp-edge52 performance]# time bash ./cyclic_alt.sh
a05523f7e37bb30791819979dbca962154536d4b2ac8ac4846c9341a3e5197bf
running /usr/bin/validationsuite -ginkgo.focus=\[performance\]\ Latency\ Test --junit /junit -ginkgo.v
I0202 17:43:05.978463 8 request.go:668] Waited for 1.044139781s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp410.demo.lab.dgonyier:6443/apis/cloudcredential.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e validation
==========================================
Random Seed: 1643823784
Will run 0 of 48 specs
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
JUnit report was created: /junit/validation_junit.xml
Ran 0 of 48 Specs in 0.000 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 48 Skipped
PASS
You're using deprecated Ginkgo functionality:
=============================================
Ginkgo 2.0 is under active development and will introduce (a small number of) breaking changes.
To learn more, view the migration guide at https://github.com/onsi/ginkgo/blob/v2/docs/MIGRATING_TO_V2.md
To comment, chime in at https://github.com/onsi/ginkgo/issues/711
You are using a custom reporter. Support for custom reporters will likely be removed in V2. Most users were using them to generate junit or teamcity reports and this functionality will be merged into the core reporter. In addition, Ginkgo 2.0 will support emitting a JSON-formatted report that users can then manipulate to generate custom reports.
If this change will be impactful to you please leave a comment on https://github.com/onsi/ginkgo/issues/711
Learn more at: https://github.com/onsi/ginkgo/blob/v2/docs/MIGRATING_TO_V2.md#removed-custom-reporters
To silence deprecations that can be silenced set the following environment variable:
ACK_GINKGO_DEPRECATIONS=1.16.4
Discovery mode enabled, skipping setup
running /usr/bin/cnftests -ginkgo.focus=\[performance\]\ Latency\ Test --junit /junit -ginkgo.v
I0202 17:43:08.697027 44 request.go:668] Waited for 1.042978123s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp410.demo.lab.dgonyier:6443/apis/network.openshift.io/v1?timeout=32s
Running Suite: CNF Features e2e integration tests
=================================================
Random Seed: 1643823787
Will run 3 of 157 specs
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[performance] Latency Test with the oslat image
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:153
STEP: Waiting two minutes to download the latencyTest image
STEP: Checking actual CPUs number for the running pod
STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
Feb 2 17:46:28.762: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:46:29.621: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:46:29.830: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Log file created at: 2022/02/02 17:43:21
Running on machine: oslat-2f7dg
Binary: Built with gc go1.16.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0202 17:43:21.563413 1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/vmlinuz-4.18.0-305.34.2.rt7.107.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/0 root=UUID=43fee01c-54ae-4c3c-be5d-a47a6904dbba rw rootflags=prjquota boot=UUID=65c2f6bd-6a74-495e-8cc1-4c14a2619f5c skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=00000ff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11 default_hugepagesz=1G hugepagesz=1G hugepages=1 +
I0202 17:43:21.563708 1 node.go:44] Environment information: kernel version 4.18.0-305.34.2.rt7.107.el8_4.x86_64
I0202 17:44:26.564121 1 main.go:53] Running the oslat command with arguments [--duration 120 --rtprio 1 --cpu-list 2 --cpu-main-thread 1]
I0202 17:46:27.633822 1 main.go:59] Succeeded to run the oslat command: oslat V 2.10
Total runtime: 120 seconds
Thread priority: SCHED_FIFO:1
CPU list: 2
CPU for main thread: 1
Workload: no
Workload mem: 0 (KiB)
Preheat cores: 1
Pre-heat for 1 seconds...
Test starts...
Test completed.
Core: 2
CPU Freq: 2096 (Mhz)
001 (us): 3877192008
002 (us): 23084
003 (us): 4934
004 (us): 63506
005 (us): 52065
006 (us): 68939
007 (us): 37551
008 (us): 7931
009 (us): 1720
010 (us): 539
011 (us): 27897
012 (us): 11267
013 (us): 8881
014 (us): 222
015 (us): 259
016 (us): 39414
017 (us): 8547
018 (us): 1062
019 (us): 123
020 (us): 182
021 (us): 378
022 (us): 298
023 (us): 100
024 (us): 24
025 (us): 16
026 (us): 9
027 (us): 9
028 (us): 9
029 (us): 6
030 (us): 4
031 (us): 6
032 (us): 18 (including overflows)
Minimum: 1 (us)
Average: 1.001 (us)
Maximum: 1052 (us)
Max-Min: 1051 (us)
Duration: 119.941 (sec)
Feb 2 17:46:30.008: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:47:30.517: [ERROR]: timed out waiting for the condition
• [SLOW TEST:258.650 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:62
with the oslat image
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:128
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:153
------------------------------
[performance] Latency Test with the cyclictest image
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:197
STEP: Waiting two minutes to download the latencyTest image
STEP: Checking actual CPUs number for the running pod
STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
Feb 2 17:50:43.012: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:50:43.186: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:50:43.430: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Log file created at: 2022/02/02 17:47:35
Running on machine: cyclictest-c8svn
Binary: Built with gc go1.16.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0202 17:47:35.180137 1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/vmlinuz-4.18.0-305.34.2.rt7.107.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/0 root=UUID=43fee01c-54ae-4c3c-be5d-a47a6904dbba rw rootflags=prjquota boot=UUID=65c2f6bd-6a74-495e-8cc1-4c14a2619f5c skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=00000ff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11 default_hugepagesz=1G hugepagesz=1G hugepages=1 +
I0202 17:47:35.180373 1 node.go:44] Environment information: kernel version 4.18.0-305.34.2.rt7.107.el8_4.x86_64
I0202 17:48:40.181339 1 main.go:51] running the cyclictest command with arguments [-D 120 -p 95 -t 2 -a 1-2 -h 30 -i 1000 --mlockall --quiet]
I0202 17:50:40.785066 1 main.go:57] succeeded to run the cyclictest command: # /dev/cpu_dma_latency set to 0us
# Histogram
000000 000000 000000
000001 000000 000000
000002 000000 000000
000003 000000 000000
000004 000007 000000
000005 000074 000000
000006 000183 000000
000007 000272 000003
000008 000306 000015
000009 000335 000042
000010 000310 000082
000011 000332 000146
000012 000481 000177
000013 000488 000187
000014 000518 000202
000015 000604 000245
000016 000611 000272
000017 000626 000298
000018 000652 000330
000019 000683 000398
000020 000637 000414
000021 000658 000345
000022 000725 000431
000023 000723 000422
000024 000694 000482
000025 000704 000508
000026 000764 000450
000027 001055 000431
000028 002831 000414
000029 007397 000424
# Total: 000022670 000006718
# Min Latencies: 00004 00007
# Avg Latencies: 00033 00054
# Max Latencies: 04823 04966
# Histogram Overflows: 97316 113269
# Histogram Overflow at cycle number:
# Thread 0: 00014 00015 00018 00019 00020 00021 00022 00023 00024 00025 00026 00027 00028 00029 00030 00031 00032 00033 00034 00035 00036 00037 00038 00039 00041 00042 00043 00044 00045 00046 # 97286 others
# Thread 1: 00000 00001 00002 00003 00004 00005 00006 00007 00008 00009 00010 00011 00012 00013 00014 00015 00016 00017 00018 00019 00020 00021 00022 00023 00024 00025 00026 00027 00028 00029 # 113239 others
Feb 2 17:50:43.667: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:51:44.380: [ERROR]: timed out waiting for the condition
• [SLOW TEST:253.289 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:62
with the cyclictest image
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:185
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:197
------------------------------
[performance] Latency Test with the hwlatdetect image
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:236
STEP: Waiting two minutes to download the latencyTest image
STEP: Checking actual CPUs number for the running pod
STEP: Waiting another two minutes to give enough time for the cluster to move the pod to Succeeded phase
Feb 2 17:54:59.376: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:54:59.605: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Log file created at: 2022/02/02 17:51:48
Running on machine: hwlatdetect-vdq87
Binary: Built with gc go1.16.6 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0202 17:51:48.525040 1 node.go:37] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/vmlinuz-4.18.0-305.34.2.rt7.107.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/769653a88b3fa5277f7ee22aa792f3130a868aa89527e2666bd517a9ebd50598/0 root=UUID=43fee01c-54ae-4c3c-be5d-a47a6904dbba rw rootflags=prjquota boot=UUID=65c2f6bd-6a74-495e-8cc1-4c14a2619f5c skew_tick=1 nohz=on rcu_nocbs=1-3 tuned.non_isolcpus=00000ff1 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,1-3 systemd.cpu_affinity=0,4,5,6,7,8,9,10,11 default_hugepagesz=1G hugepagesz=1G hugepages=1 +
I0202 17:51:48.525670 1 node.go:44] Environment information: kernel version 4.18.0-305.34.2.rt7.107.el8_4.x86_64
I0202 17:52:53.526113 1 main.go:50] running the hwlatdetect command with arguments [/usr/bin/hwlatdetect --threshold 5300 --hardlimit 5300 --duration 120 --window 10000000us --width 950000us]
I0202 17:54:56.786751 1 main.go:56] succeeded to run the hwlatdetect command: hwlatdetect: test duration 120 seconds
detector: tracer
parameters:
Latency threshold: 5300us
Sample window: 10000000us
Sample width: 950000us
Non-sampling period: 9050000us
Output File: None
Starting test
test finished
Max Latency: Below threshold
Samples recorded: 0
Samples exceeding threshold: 0
Feb 2 17:54:59.778: [INFO]: found mcd machine-config-daemon-hh8xd for node ocp410-worker-0.demo.lab.dgonyier
Feb 2 17:56:00.268: [ERROR]: timed out waiting for the condition
• [SLOW TEST:256.151 seconds]
[performance] Latency Test
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:62
with the hwlatdetect image
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:228
should succeed
/remote-source/app/vendor/github.com/openshift-kni/performance-addon-operators/functests/4_latency/latency.go:236
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSI0202 17:56:28.561846 44 request.go:668] Waited for 1.156106585s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp410.demo.lab.dgonyier:6443/apis/operator.openshift.io/v1alpha1?timeout=32s
I0202 17:56:38.761896 44 request.go:668] Waited for 11.356020753s due to client-side throttling, not priority and fairness, request: GET:https://api.ocp410.demo.lab.dgonyier:6443/apis/operators.coreos.com/v2?timeout=32s
JUnit report was created: /junit/cnftests-junit.xml
Ran 3 of 157 Specs in 839.672 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 154 Skipped
PASS
real 14m6.618s
user 0m1.774s
sys 0m0.956s
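The cyclictest summary in the log above reports per-thread maximum latencies of 04823 and 04966 against CYCLICTEST_MAXIMUM_LATENCY=5200. A minimal sketch of that comparison follows, assuming the threshold applies to each value on the "# Max Latencies:" line in microseconds. `check_max_latency` is a hypothetical helper for reading saved output and is not how the cnf-tests suite evaluates results.

```shell
#!/usr/bin/env bash
# check_max_latency THRESHOLD_US
# Reads cyclictest output on stdin. Succeeds only if a "# Max Latencies:" line
# is present and every per-thread value on it is at or below the threshold.
check_max_latency() {
  awk -v limit="$1" '
    /^# Max Latencies:/ {
      found = 1
      # Fields 1-3 are "#", "Max", "Latencies:"; values start at field 4.
      for (i = 4; i <= NF; i++) {
        if ($i + 0 > limit + 0) { bad = 1 }
      }
    }
    END { exit ((found && !bad) ? 0 : 1) }
  '
}

# Summary lines copied from the cyclictest run in this comment.
summary='# Min Latencies: 00004 00007
# Avg Latencies: 00033 00054
# Max Latencies: 04823 04966'

if printf '%s\n' "$summary" | check_max_latency 5200; then
  echo "within threshold"
else
  echo "threshold exceeded"
fi
```

Treating a missing summary line as a failure avoids reporting success on truncated output.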
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:0640
Description of problem:
Currently, when running cyclictest, we are not passing the -m argument, which is important to prevent memory from getting paged out.

Version-Release number of selected component (if applicable):
"vcs-ref": "b46a47f7c4d3007b8150551c771cc8460fba9dfa",
"vcs-type": "git",
"vcs-url": "https://github.com/openshift-kni/cnf-features-deploy",

How reproducible:
100%

Steps to Reproduce:
podman run --name cnf-container-tests \
--net=host \
-v /home/kni/cnf_tests_mcornea:/kubeconfig:Z \
-e KUBECONFIG=/kubeconfig/kubeconfig \
-e IMAGE_REGISTRY=registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/ \
-e CNF_TESTS_IMAGE=cnf-tests \
-e LATENCY_TEST_RUN=true \
-e LATENCY_TEST_RUNTIME=3600 \
-e LATENCY_TEST_CPUS=16 \
-e OSLAT_MAXIMUM_LATENCY=10 \
-e CYCLICTEST_MAXIMUM_LATENCY=20 \
-e HWLATDETECT_MAXIMUM_LATENCY=10 \
-e ROLE_WORKER_CNF=master \
-e PERF_TEST_PROFILE=openshift-node-performance-profile \
-e DISCOVERY_MODE=true \
quay.io/openshift-kni/cnf-tests \
/usr/bin/test-run.sh \
-ginkgo.focus="\[performance\]\[config\]|\[performance\]\ Latency\ Test\ with\ the\ cyclictest" \
--junit /junit -ginkgo.v

Actual results:
running the cyclictest command with arguments [-D 3600 -p 1 -t 16 -a 2-9,26-33 -h 30 -i 1000 --quiet]

Expected results:
-m argument passed to cyclictest command

Additional info:
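The expected result can be checked mechanically against the argument list that the test logs. In cyclictest, -m and --mlockall are the short and long forms of the same option, which locks current and future memory allocations. The sketch below scans an argument string for either form, using the argument list from this description and the one logged by the fixed build. `has_mlockall` is a hypothetical helper written for illustration and assumes the arguments contain no shell glob characters.

```shell
#!/usr/bin/env bash
# has_mlockall ARGS
# Succeeds if the space-separated argument string contains -m or --mlockall
# as a standalone word.
has_mlockall() {
  local arg
  # Intentional word splitting: ARGS is a flat, space-separated list.
  for arg in $1; do
    case "$arg" in
      -m|--mlockall) return 0 ;;
    esac
  done
  return 1
}

# Argument lists copied from the cyclictest log lines in this bug.
before='-D 3600 -p 1 -t 16 -a 2-9,26-33 -h 30 -i 1000 --quiet'
after='-D 120 -p 95 -t 2 -a 1-2 -h 30 -i 1000 --mlockall --quiet'

for args in "$before" "$after"; do
  if has_mlockall "$args"; then
    echo "memory locked:     $args"
  else
    echo "memory NOT locked: $args"
  fi
done
```

Matching whole words rather than substrings keeps unrelated options that merely contain the letter m from producing a false positive.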