Bug 2051540

Summary: cnf-tests - Add mainaffinity arg to cyclictest runner
Product: OpenShift Container Platform
Reporter: Talor Itzhak <titzhak>
Component: CNF Platform Validation
Assignee: Talor Itzhak <titzhak>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikita <nkononov>
Severity: medium
Priority: unspecified
Version: 4.9
CC: aos-bugs, bwensley, shajmakh, vlaad
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The CNF tests cyclictest runner should pass the --mainaffinity argument, which tells cyclictest which CPU its main thread should run on.
Consequence: The cyclictest runner did not pass the --mainaffinity argument.
Fix: Added the --mainaffinity argument to the cyclictest runner.
Result: The --mainaffinity argument is passed to the cyclictest command.
Last Closed: 2022-08-26 15:19:30 UTC
Type: Bug
Bug Blocks: 2064585

Description Talor Itzhak 2022-02-07 13:02:31 UTC
Description of problem:

The CNF tests cyclictest runner should pass the --mainaffinity argument, which tells cyclictest which CPU its main thread should run on.
In addition, the runner should exclude the sibling thread of the main thread's CPU from the CPU list, to avoid a noisy neighbor.
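
The requested behavior can be sketched as follows. This is a minimal illustration in Python, not the runner's actual code: the helper names are hypothetical, the choice of the lowest CPU as the main-thread CPU is an assumption, and the sibling lookup is injected by the caller (on Linux it could be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list).

```python
def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as "2-9,26-33" into sorted ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpu_list(cpus):
    """Collapse CPU ids back into range notation, e.g. [3, 4, 5, 8] -> "3-5,8"."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current range
        else:
            ranges.append([cpu, cpu])    # start a new range
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def cyclictest_affinity_args(cpu_spec, siblings_of):
    """Build the thread/affinity arguments for cyclictest.

    siblings_of is a caller-supplied function (hypothetical) that returns
    the hyperthread siblings of a CPU, including the CPU itself.
    """
    cpus = parse_cpu_list(cpu_spec)
    main_cpu = cpus[0]  # assumption: the lowest CPU hosts the main thread
    excluded = {main_cpu} | set(siblings_of(main_cpu))
    measurement = [cpu for cpu in cpus if cpu not in excluded]
    return [
        "--threads", str(len(measurement)),
        "--affinity", format_cpu_list(measurement),
        "--mainaffinity", str(main_cpu),
    ]


# Example topology (made up for illustration): CPU 2 and CPU 26 share a core.
example_siblings = lambda cpu: {2: [2, 26]}.get(cpu, [cpu])
print(cyclictest_affinity_args("2-9,26-33", example_siblings))
```

With the CPU list from the actual results below and the made-up topology above, the main thread lands on CPU 2 and both CPU 2 and its sibling are removed from the measurement set.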

Version-Release number of selected component (if applicable):
N/A

How reproducible:
100%

Steps to Reproduce:
podman run --name cnf-container-tests  \
  --net=host  \
  -v /home/kni/cnf_tests_mcornea:/kubeconfig:Z  \
  -e KUBECONFIG=/kubeconfig/kubeconfig \
  -e IMAGE_REGISTRY=registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/ \
  -e CNF_TESTS_IMAGE=cnf-tests \
  -e LATENCY_TEST_RUN=true \
  -e LATENCY_TEST_RUNTIME=3600 \
  -e LATENCY_TEST_CPUS=16 \
  -e OSLAT_MAXIMUM_LATENCY=10 \
  -e CYCLICTEST_MAXIMUM_LATENCY=20 \
  -e HWLATDETECT_MAXIMUM_LATENCY=10 \
  -e ROLE_WORKER_CNF=master \
  -e PERF_TEST_PROFILE=openshift-node-performance-profile \
  -e DISCOVERY_MODE=true \
  quay.io/openshift-kni/cnf-tests \
  /usr/bin/test-run.sh  \
  -ginkgo.focus="\[performance\]\[config\]|\[performance\]\ Latency\ Test\ with\ the\ cyclictest" \
  --junit /junit -ginkgo.v

Actual results:
running the cyclictest command with arguments [-D 3600 -p 1 -t 16 -a 2-9,26-33 -h 30 -i 1000 -m --quiet]

Expected results:
--mainaffinity argument passed to cyclictest command

Additional info:

Comment 4 Shereen Haj Makhoul 2022-03-24 15:55:00 UTC
Verification:

Version:
cnf-tests:
PAO: 
OCP: 

Steps:
=====
podman run  -v $KUBECONFIG:/root/kubeconfig:Z -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry-proxy.engineering.redhat.com/rh-osbs -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.11.0-7 -e PERF_TEST_PROFILE=performance -e ROLE_WORKER_CNF=worker-cnf -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=180 -e MAXIMUM_LATENCY=2000000 -e DISCOVERY_MODE=true registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 usr/bin/test-run.sh -ginkgo.focus="Latency\ Test" -ginkgo.v


Output:
=======
[root@ocp-edge41 ~]# oc logs cyclictest-rkm2j 
I0324 15:47:28.479536       1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-79e8986e248003d39a9f173ce26b2312789ea61a26ea4c31dfc883c2fd2039c7/vmlinuz-4.18.0-305.40.2.rt7.113.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ostree=/ostree/boot.1/rhcos/79e8986e248003d39a9f173ce26b2312789ea61a26ea4c31dfc883c2fd2039c7/0 root=UUID=db85d03c-0f10-4d1b-bd85-272498f3a22a rw rootflags=prjquota boot=UUID=c12817f7-bf16-4581-beb9-b5c16ff447ef skew_tick=1 nohz=on rcu_nocbs=3-6 tuned.non_isolcpus=ffffff87 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,3-6 systemd.cpu_affinity=0,1,2,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
I0324 15:47:28.479858       1 node.go:46] Environment information: kernel version 4.18.0-305.40.2.rt7.113.el8_4.x86_64
I0324 15:47:28.479895       1 main.go:64] running the cyclictest command with arguments [--duration 180 --priority 95 --threads 2 --affinity 4-5 --histogram 30 --interval 1000 --mlockall --mainaffinity 3 --smi --quiet]

mainaffinity is passed with value 3. Verified successfully.
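
The manual check above could also be scripted. The following is a small sketch, assuming the runner keeps logging its arguments in the "running the cyclictest command with arguments [...]" form shown in the output; the function names are hypothetical.

```python
import re

# Matches the argument list that the runner logs before starting cyclictest.
ARGS_RE = re.compile(r"running the cyclictest command with arguments \[(.*?)\]")


def cyclictest_args_from_log(log_text):
    """Return the logged cyclictest arguments as a list, or None if not found."""
    match = ARGS_RE.search(log_text)
    if match is None:
        return None
    return match.group(1).split()


def option_value(args, name):
    """Return the value that follows option `name`, or None if it is absent
    or is a bare flag with no value."""
    if name not in args:
        return None
    index = args.index(name)
    if index + 1 < len(args) and not args[index + 1].startswith("-"):
        return args[index + 1]
    return None


log_line = (
    "I0324 15:47:28.479895       1 main.go:64] running the cyclictest command "
    "with arguments [--duration 180 --priority 95 --threads 2 --affinity 4-5 "
    "--histogram 30 --interval 1000 --mlockall --mainaffinity 3 --smi --quiet]"
)
args = cyclictest_args_from_log(log_line)
print(option_value(args, "--mainaffinity"))
```

A caller could feed it the output of `oc logs` for the cyclictest pod and fail the verification when `option_value(args, "--mainaffinity")` returns None.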

Comment 5 Shereen Haj Makhoul 2022-03-24 15:56:44 UTC
Above verification was conducted on

cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-7 
PAO: 4.10.1
OCP: 4.10.6

Comment 6 Bart Wensley 2022-03-24 19:29:00 UTC
I just found out that cyclictest does not yet support the --smi option on newer processors (e.g. Ice Lake). It fails with this error:
FATAL: SMI counter is not supported on this processor

Supporting newer processor models will require an update to the rt-tests package. Until that is done, the change that added --smi should probably be backed out, or this test won't work on Ice Lake processors.
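
One possible way to tolerate processors without SMI counter support is sketched below. This is only an illustration, not the fix that was applied: the `run` callable and the helper names are hypothetical, and the only detail taken from this report is the error message text.

```python
# Error text reported above for processors without SMI counter support.
SMI_UNSUPPORTED = "SMI counter is not supported on this processor"


def without_flag(args, flag):
    """Return a copy of the argument list with every occurrence of flag removed."""
    return [arg for arg in args if arg != flag]


def run_with_smi_fallback(args, run):
    """Run cyclictest once and retry without --smi if SMI is unsupported.

    `run` is a caller-supplied function (hypothetical) that takes an argument
    list and returns a tuple (exit_code, combined_output).
    """
    code, output = run(args)
    if code != 0 and "--smi" in args and SMI_UNSUPPORTED in output:
        return run(without_flag(args, "--smi"))
    return code, output
```

Injecting `run` keeps the retry logic testable without a cyclictest binary; a real caller would wrap whatever mechanism it uses to start the process.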

Comment 7 Shereen Haj Makhoul 2022-04-18 15:01:34 UTC
Verification on an Intel Ice Lake processor:

Cluster: SNO with Intel 6338N CPU
cnf-tests: registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-12
OCP: 4.10.8
PAO: 4.10.2
(registry.redhat.io/openshift4/performance-addon-rhel8-operator@sha256:6aae6c329965efb2d83c3aa2a311db7c77a69a3d4853c51b2002646d6b7859f2)

Steps:
run the image focusing on the cyclictest:
podman run  -v $KUBECONFIG:/root/kubeconfig:Z --net=host -e KUBECONFIG=/root/kubeconfig -e IMAGE_REGISTRY=registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/ -e CNF_TESTS_IMAGE=openshift4-cnf-tests:v4.11.0-12 -e PERF_TEST_PROFILE=openshift-node-performance-profile -e ROLE_WORKER_CNF=master -e LATENCY_TEST_RUN=true -e LATENCY_TEST_RUNTIME=10 -e MAXIMUM_LATENCY=2000000 -e DISCOVERY_MODE=true registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/openshift4-cnf-tests:v4.11.0-12 usr/bin/test-run.sh -ginkgo.focus="with\ the\ cyclictest\ image" -ginkgo.v

(Note: the cluster on which this was verified is a disconnected IPv6 cluster, so the image had to be mirrored into a registry that is reachable from the cluster, registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs:
oc image mirror registry-proxy.engineering.redhat.com/rh-osbs/openshift4-cnf-tests:v4.11.0-12 registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/rh-osbs/openshift4-cnf-tests:v4.11.0-12)

cyclictest output:

Running on machine: cyclictest-w2zz2
Binary: Built with gc go1.17.5 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0418 14:44:55.107721       1 node.go:39] Environment information: /proc/cmdline: BOOT_IMAGE=(hd5,gpt3)/ostree/rhcos-f8d78958809527530cd0bb680a8223427e36991881224da64750edb085470033/vmlinuz-4.18.0-305.40.2.rt7.113.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/f8d78958809527530cd0bb680a8223427e36991881224da64750edb085470033/0 ip=ens1f0:dhcp6 root=UUID=ecb1e9b3-17ef-4f9f-83b6-73b7c9761d92 rw rootflags=prjquota crashkernel=256M skew_tick=1 nohz=on rcu_nocbs=2-31,34-63 tuned.non_isolcpus=00000003,00000003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63 systemd.cpu_affinity=0,1,32,33 default_hugepagesz=1G hugepagesz=1G hugepages=32 idle=poll rcupdate.rcu_normal_after_boot=0 nohz_full=2-31,34-63 intel_iommu=on iommu=pt
I0418 14:44:55.107935       1 node.go:46] Environment information: kernel version 4.18.0-305.40.2.rt7.113.el8_4.x86_64
I0418 14:44:55.107977       1 main.go:63] running the cyclictest command with arguments [--duration 10 --priority 95 --threads 57 --affinity 3-31,35-62 --histogram 30 --interval 1000 --mlockall --mainaffinity 2 --quiet]               <-------------------------------
I0418 14:45:05.204696       1 main.go:69] succeeded to run the cyclictest command: # /dev/cpu_dma_latency set to 0us



From the above output, --mainaffinity is now passed, and since the test is running on an Ice Lake processor, the --smi flag is no longer passed.