Bug 2051540 - cnf-tests - Add mainaffinity arg to cyclictest runner
Summary: cnf-tests - Add mainaffinity arg to cyclictest runner
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: CNF Platform Validation
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Talor Itzhak
QA Contact: Nikita
URL:
Whiteboard:
Depends On:
Blocks: 2064585
Reported: 2022-02-07 13:02 UTC by Talor Itzhak
Modified: 2022-08-26 15:19 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The CNF tests cyclictest runner should pass the --mainaffinity argument, which tells cyclictest which CPU its main thread should run on.
Consequence: The cyclictest runner did not pass the --mainaffinity argument.
Fix: Added the --mainaffinity argument to the cyclictest runner.
Result: The --mainaffinity argument is passed to the cyclictest command.
Clone Of:
Environment:
Last Closed: 2022-08-26 15:19:30 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni cnf-features-deploy pull 1043 | 0 | None | open | Bug 2051540: cnf-tests:cyclictest: remove smi option | 2022-03-28 14:20:49 UTC
Github | openshift-kni cnf-features-deploy pull 971 | 0 | None | open | Bug 2051540: cnf-tests: filter main CPU's thread siblings under the cyclic test | 2022-02-16 14:01:21 UTC

Description Talor Itzhak 2022-02-07 13:02:31 UTC
Description of problem:

The CNF tests cyclictest runner should pass the --mainaffinity argument, which tells cyclictest which CPU its main thread should run on.
In addition, the runner should exclude the sibling thread of the main thread's CPU from the CPU list passed to the measurement threads, in order to avoid a noisy neighbor.
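
The CPU selection described above can be sketched as follows. This is only an illustration in Python, not the code of the actual runner: the function names (parse_cpu_list, format_cpu_list, read_siblings, split_main_cpu) are hypothetical, picking the lowest CPU as the main CPU is an assumption, and the sibling mapping in the demo (CPU n paired with CPU n+24) is invented for the example. On a real node the siblings would come from the sysfs file thread_siblings_list.

```python
def parse_cpu_list(spec):
    """Expand a Linux CPU list such as '2-9,26-33' into sorted CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpu_list(cpus):
    """Collapse CPU ids back into range notation, e.g. [3, 4, 5, 7] -> '3-5,7'."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current range
        else:
            ranges.append([cpu, cpu])  # start a new range
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def read_siblings(cpu):
    """Read the hyperthread siblings of a CPU from sysfs (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def split_main_cpu(cpus, siblings_of):
    """Pick a main CPU and drop it and its siblings from the worker list.

    Assumption: the lowest CPU in the list is used for the main thread.
    """
    ordered = sorted(set(cpus))
    if not ordered:
        raise ValueError("empty CPU list")
    main = ordered[0]
    excluded = {main} | set(siblings_of(main))
    workers = [c for c in ordered if c not in excluded]
    return main, workers


if __name__ == "__main__":
    # Hypothetical topology for the demo: CPU n shares a core with CPU n + 24.
    main, workers = split_main_cpu(parse_cpu_list("2-9,26-33"),
                                   lambda c: [c, c + 24])
    print("--mainaffinity", main, "--affinity", format_cpu_list(workers))
```

On a real node, read_siblings would be passed as the siblings_of argument instead of the lambda.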

Version-Release number of selected component (if applicable):
N/A

How reproducible:
100%

Steps to Reproduce:
podman run --name cnf-container-tests  \
  --net=host  \
  -v /home/kni/cnf_tests_mcornea:/kubeconfig:Z  \
  -e KUBECONFIG=/kubeconfig/kubeconfig \
  -e IMAGE_REGISTRY=registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/ \
  -e CNF_TESTS_IMAGE=cnf-tests \
  -e LATENCY_TEST_RUN=true \
  -e LATENCY_TEST_RUNTIME=3600 \
  -e LATENCY_TEST_CPUS=16 \
  -e OSLAT_MAXIMUM_LATENCY=10 \
  -e CYCLICTEST_MAXIMUM_LATENCY=20 \
  -e HWLATDETECT_MAXIMUM_LATENCY=10 \
  -e ROLE_WORKER_CNF=master \
  -e PERF_TEST_PROFILE=openshift-node-performance-profile \
  -e DISCOVERY_MODE=true \
  quay.io/openshift-kni/cnf-tests \
  /usr/bin/test-run.sh  \
  -ginkgo.focus="\[performance\]\[config\]|\[performance\]\ Latency\ Test\ with\ the\ cyclictest" \
  --junit /junit -ginkgo.v

Actual results:
running the cyclictest command with arguments [-D 3600 -p 1 -t 16 -a 2-9,26-33 -h 30 -i 1000 -m --quiet]

Expected results:
--mainaffinity argument passed to cyclictest command

Additional info:

Comment 4 Shereen Haj Makhoul 2022-03-24 15:55:00 UTC
Verification:

Version:
cnf-tests:
PAO: 
OCP: 

Steps:
=====
podman run  -v $KUBECONFIG:/root/kubeconfig:Z -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.11.0-7 -e PERF_TEST_PROFILE=performance -e ROLE_WORKER_CNF=worker-cnf -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=180 -e MAXIMUM_LATENCY=2000000 -e DISCOVERY_MODE=true registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 usr/bin/test-run.sh -ginkgo.focus="Latency\ Test" -ginkgo.v


Output:
=======
[root@ocp-edge41 ~]# oc logs cyclictest-rkm2j 
I0324 15:47:28.479536       1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-79e8986e248003d39a9f173ce26b2312789ea61a26ea4c31dfc883c2fd2039c7/vmlinuz-4.18.0-305.40.2.rt7.113.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/79e8986e248003d39a9f173ce26b2312789ea61a26ea4c31dfc883c2fd2039c7/0 root=UUID=db85d03c-0f10-4d1b-bd85-272498f3a22a rw rootflags=prjquota boot=UUID=c12817f7-bf16-4581-beb9-b5c16ff447ef skew_tick=1 nohz=on rcu_nocbs=3-6 tuned.non_isolcpus=ffffff87 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,3-6 systemd.cpu_affinity=0,1,2,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 + +
I0324 15:47:28.479858       1 node.go:46] Environment information: kernel version 4.18.0-305.40.2.rt7.113.el8_4.x86_64
I0324 15:47:28.479895       1 main.go:64] running the cyclictest command with arguments [--duration 180 --priority 95 --threads 2 --affinity 4-5 --histogram 30 --interval 1000 --mlockall --mainaffinity 3 --smi --quiet]

mainaffinity is passed with value 3. Verified successfully.
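
The manual check above can also be scripted. The following is a minimal sketch, assuming the pod log keeps the "running the cyclictest command with arguments [...]" format shown in the output; parse_cyclictest_args is a hypothetical helper, not part of cnf-tests. It treats any token that does not start with "-" as the value of the preceding flag, so it would misread negative values.

```python
import re


def parse_cyclictest_args(log_line):
    """Turn the logged argument list into a {flag: value or None} dict."""
    match = re.search(r"with arguments \[(.*?)\]", log_line)
    if match is None:
        raise ValueError("no cyclictest argument list found in log line")
    tokens = match.group(1).split()
    args = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        # A following token that is not itself a flag is taken as the value.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            args[flag] = tokens[i + 1]
            i += 2
        else:
            args[flag] = None
            i += 1
    return args


if __name__ == "__main__":
    line = ("I0324 15:47:28.479895       1 main.go:64] running the cyclictest "
            "command with arguments [--duration 180 --priority 95 --threads 2 "
            "--affinity 4-5 --histogram 30 --interval 1000 --mlockall "
            "--mainaffinity 3 --smi --quiet]")
    args = parse_cyclictest_args(line)
    print("mainaffinity:", args.get("--mainaffinity"))
```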

Comment 5 Shereen Haj Makhoul 2022-03-24 15:56:44 UTC
Above verification was conducted on

cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 
PAO: 4.10.1
OCP: 4.10.6

Comment 6 Bart Wensley 2022-03-24 19:29:00 UTC
I just found out that cyclictest does not yet support the --smi option on newer processors (e.g. Ice Lake). It fails with this error:
FATAL: SMI counter is not supported on this processor

Supporting newer processor models will require an update to the rt-tests package. Until that is done, the change that added --smi should probably be backed out, or this test won't work on Ice Lake processors.

Comment 7 Shereen Haj Makhoul 2022-04-18 15:01:34 UTC
Verification on an Intel Ice Lake processor:

Cluster: SNO with Intel 6338N CPU
cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-12
OCP: 4.10.8
PAO: 4.10.2
(registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:6aae6c329965efb2d83c3aa2a311db7c77a69a3d4853c51b2002646d6b7859f2)

Steps:
run the image focusing on the cyclictest:
podman run  -v $KUBECONFIG:/root/kubeconfig:Z --net=host -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/ -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.11.0-12 -e PERF_TEST_PROFILE=openshift-node-performance-profile -e ROLE_WORKER_CNF=master -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=10 -e MAXIMUM_LATENCY=2000000 -e DISCOVERY_MODE=true registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/openshift4-cnf-tests:v4.11.0-12 usr/bin/test-run.sh -ginkgo.focus="with\ the\ cyclictest\ image" -ginkgo.v

Note: the cluster used for this verification is a disconnected IPv6 cluster, so the image had to be mirrored into a registry that is reachable from the cluster (registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs):
oc image mirror registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-12 registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/openshift4-cnf-tests:v4.11.0-12

cyclictest output:

Running on machine: cyclictest-w2zz2
Binary: Built with gc go1.17.5 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0418 14:44:55.107721       1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd5,gpt3)/ostree/rhcos-f8d78958809527530cd0bb680a8223427e36991881224da64750edb085470033/vmlinuz-4.18.0-305.40.2.rt7.113.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/f8d78958809527530cd0bb680a8223427e36991881224da64750edb085470033/0 ip=ens1f0:dhcp6 root=UUID=ecb1e9b3-17ef-4f9f-83b6-73b7c9761d92 rw rootflags=prjquota crashkernel=256M skew_tick=1 nohz=on rcu_nocbs=2-31,34-63 tuned.non_isolcpus=00000003,00000003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63 systemd.cpu_affinity=0,1,32,33 default_hugepagesz=1G hugepagesz=1G hugepages=32 idle=poll rcupdate.rcu_normal_after_boot=0 nohz_full=2-31,34-63 intel_iommu=on iommu=pt
I0418 14:44:55.107935       1 node.go:46] Environment information: kernel version 4.18.0-305.40.2.rt7.113.el8_4.x86_64
I0418 14:44:55.107977       1 main.go:63] running the cyclictest command with arguments [--duration 10 --priority 95 --threads 57 --affinity 3-31,35-62 --histogram 30 --interval 1000 --mlockall --mainaffinity 2 --quiet]               <-------------------------------
I0418 14:45:05.204696       1 main.go:69] succeeded to run the cyclictest command: # /dev/cpu_dma_latency set to 0us



The output above shows that --mainaffinity is now passed and that, on this Ice Lake processor, the --smi flag is no longer passed.

