Description of problem:

When creating several pods with the following:

  "annotations": {
    "irq-load-balancing.crio.io": "disable",
    "cpu-quota.crio.io": "disable"
  }
  ...
  "spec": {
    "runtimeClassName": "performance-performance",
    ...
    "resources": {
      "requests": { "cpu": "2", "memory": "2048Mi" },
      "limits": { "cpu": "2", "memory": "2048Mi" }
    }

interrupts are still being delivered to the CPUs allocated for these pods.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create several guaranteed pods that request IRQs to be disabled on their CPUs.
2. Run a network-intensive workload, such as uperf, and collect periodic snapshots of /proc/interrupts (a minimal collection loop is sketched below).

Actual results:
Device interrupts (network adapter) increase on CPUs that are allocated to the pods. /proc/irq/default_smp_affinity does have what appears to be a correct mask (several CPUs not present in this mask, correlating to the CPUs used for the guaranteed pods), but interrupts are still delivered to these CPUs.

Expected results:
Interrupts should not be delivered to the CPUs that are masked out of default_smp_affinity.

Additional info:
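For reference, step 2 of the reproducer can be done with a loop like this on the worker node (a minimal sketch; the interval and log path are arbitrary choices, not part of the original report):

  # run on the worker node while the uperf workload is active
  while true; do
    date +%s >> /tmp/interrupts.log        # timestamp each snapshot
    cat /proc/interrupts >> /tmp/interrupts.log
    sleep 10
  done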
I wonder where the cpu-quota annotation came from. There is at least a docs bug here; the annotations should say:

  cpu-load-balancing.crio.io: "disable"
  irq-load-balancing.crio.io: "disable"

Compare https://docs.openshift.com/container-platform/4.7/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html#disabling_interrupt_processing_for_individual_pods_cnf-master and https://docs.openshift.com/container-platform/4.7/scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.html#performance-addon-operator-disabling-cpu-load-balancing-for-dpdk_cnf-master
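For clarity, here is roughly what the corrected metadata would look like in the pod spec from the description (a sketch only; just the annotation keys change, the rest is as reported):

  {
    "metadata": {
      "annotations": {
        "cpu-load-balancing.crio.io": "disable",
        "irq-load-balancing.crio.io": "disable"
      }
    },
    "spec": {
      "runtimeClassName": "performance-performance",
      ...
    }
  }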
Andrew, we need some more information:
- the full Performance Profile plus at least the tuned logs,
- or the full must-gather with the PAO bits.

Could this, for example, be a case of https://bugzilla.redhat.com/show_bug.cgi?id=1846767 ?
Btw, make sure you watch the interrupt counts only during the guaranteed pod's execution. While the perftool pod is running, you can check the interrupt affinity, for example using a debug image we have: quay.io/marsik/debug-tools. Locally I execute it using "podman run quay.io/marsik/debug-tools knit irqaff"; in OCP you will have to play with Pod definitions and permissions, I guess, or you might try podman directly on the node.
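For running it directly on a node, something along these lines may work (a sketch only; the node name worker-1 is a placeholder, and I have not verified that the image runs unmodified in this context):

  oc debug node/worker-1 -- chroot /host podman run --rm quay.io/marsik/debug-tools knit irqaff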
Here is the process for validating that the interrupts are occurring on the wrong CPUs.

First, here is how I get the CPUs used by the pods (this is grepping from a test result; I will document the procedure to run the test in another comment):

[root@amt-f33 ~]# cd /var/lib/crucible/run/latest/run/iterations/iteration-1/sample-1/client/
[root@amt-f33 client]# grep "current affinity list:" */uperf-client-stderrout.txt
10/uperf-client-stderrout.txt:pid 363's current affinity list: 10,50
11/uperf-client-stderrout.txt:pid 364's current affinity list: 8,48
12/uperf-client-stderrout.txt:pid 364's current affinity list: 14,54
13/uperf-client-stderrout.txt:pid 364's current affinity list: 12,52
14/uperf-client-stderrout.txt:pid 364's current affinity list: 4,44
15/uperf-client-stderrout.txt:pid 364's current affinity list: 2,42
16/uperf-client-stderrout.txt:pid 364's current affinity list: 6,46
1/uperf-client-stderrout.txt:pid 373's current affinity list: 28,68
2/uperf-client-stderrout.txt:pid 364's current affinity list: 24,64
3/uperf-client-stderrout.txt:pid 364's current affinity list: 26,66
4/uperf-client-stderrout.txt:pid 364's current affinity list: 20,60
5/uperf-client-stderrout.txt:pid 364's current affinity list: 22,62
6/uperf-client-stderrout.txt:pid 364's current affinity list: 16,56
7/uperf-client-stderrout.txt:pid 364's current affinity list: 18,58
8/uperf-client-stderrout.txt:pid 364's current affinity list: 30,70
9/uperf-client-stderrout.txt:pid 364's current affinity list: 32,72

After a run is complete, we can get the summary info, which includes the result and the metrics available, as well as the time range (important for correlating with the metrics query after this):

$ crucible console /bin/bash
crucible-container [root@amt-f33:cdmq]$ ./get-result-summary.sh --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9
run-id: 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9
tags: irq=bal kernel=4.18.0-240.10.1.el8_3.x86_64 mtu=1400 osruntime=pod pao=yes pao_noirq=yes pod_qos=static pods-per-worker=16 rcos=47.83.202102090044-0 sdn=OVNKubernetes topo=internode userenv=stream worker_pairs=1
metrics:
  source: procstat
    types: interrupts-sec
  source: mpstat
    types: Busy-CPU NonBusy-CPU
  source: sar-net
    types: L2-Gbps
  source: sar-mem
    types: Page-faults-sec KB-Paged-in-sec KB-Paged-out-sec Pages-freed-sec Pages-swapped-in-sec Pages-swapped-out-sec VM-Efficiency kswapd-scanned-pages-sec reclaimed-pages-sec scanned-pages-sec
  source: uperf
    types: Gbps round-trip-usec transactions-sec
  source: sar-scheduler
    types: Context-switches-sec Load-Average-01m Load-Average-05m Load-Average-15m Process-List-Size Run-Queue-Length
  source: sar-io
    types: IO-Blocked-Tasks
  source: sar-tasks
    types: Processes-created-sec
iterations:
  iteration-id: 269E7E56-7C2F-11EB-9E76-056DDD521EE9
  params: duration=60 nthreads=64 protocol=tcp rsize=1024 server-ifname=eth0 test-type=rr wsize=64
  period-id: 26A37E10-7C2F-11EB-9E76-056DDD521EE9
  periodRange: {"begin":1614782277123,"end":1614782337380}
  result: (transactions-sec) samples: 107600.00 mean: 107600.00 stddev: 0.00 stddevpct: 0.00

Now that we have the CPUs used for the pods and the time range, we can query for the interrupts-sec metric, starting with all interrupts for worker node 1:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9 --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>",
  "values": {
"<worker>-<1>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "3.377e+5" } ] }, "breakouts": [ "core", "cpu", "desc", "die", "irq", "package", "thread", "type" ] } So we had an average of 337700 interrupts/sec. but that is all interrupts on all cpus. Let's pick a cpu we know was used for a pod: crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9 --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4 { "name": "procstat", "type": "interrupts-sec", "label": "<cstype>-<csid>-<cpu>", "values": { "<worker>-<1>-<4>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "4529" } ] }, "breakouts": [ "core", "desc", "die", "irq", "package", "thread", "type" ] } So, we have 4529 interrupts/sec during this time period, but we don't know what they are for. Let's break it out by interrupt type: crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9 --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type { "name": "procstat", "type": "interrupts-sec", "label": "<cstype>-<csid>-<cpu>-<type>", "values": { "<worker>-<1>-<4>-<IR-PCI-MSI>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "3292" } ], "<worker>-<1>-<4>-<DMAR-MSI>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<IR-IO-APIC>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<Hyper-V>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<Machine>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], <skipping several in this output> "<worker>-<1>-<4>-<Threshold>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ] }, "breakouts": [ "core", "desc", "die", "irq", "package", "thread" ] } So, IR-PCI-MSI has 3292 interrupts/sec. These are device interrupts. We can see what interrupt specifically this is: crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9 --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type=IR-PCI-MSI,desc { "name": "procstat", "type": "interrupts-sec", "label": "<cstype>-<csid>-<cpu>-<type>-<desc>", "values": { "<worker>-<1>-<4>-<IR-PCI-MSI>-<PCIe>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<IR-PCI-MSI>-<ahci[0000:00:11.5]>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp8@pci:0000:5e:00.0>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp8@pci:0000:5e:00.1>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "0.000" } ], "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "3089" } <hundreds of other metrics skipped here to keep this short> ], "breakouts": [ "core", "die", "irq", "package", "thread" ] } From the output, we can see that the IRQ for mlx5_comp9@pci:0000:5e:00.0 has 3089 interrupts/sec. 
We can get the IRQ number with the following:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9 --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type=IR-PCI-MSI,desc=mlx5_comp9@pci:0000:5e:00.0,irq
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>-<cpu>-<type>-<desc>-<irq>",
  "values": {
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>-<348>": [ { "begin": 1614782277123, "end": 1614782337380, "value": "3089" } ]
  },
  "breakouts": [ "core", "die", "package", "thread" ]
}

In the metric name we have <worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>-<348>, and the 348 corresponds to <irq> in the label <cstype>-<csid>-<cpu>-<type>-<desc>-<irq>.

Now we can check some collected data to see what the smp_affinity_list is for that IRQ:

[root@amt-f33 ~]# cd /var/lib/crucible/run/latest/
[root@amt-f33 latest]# jq '."run-id"' run/rickshaw-run.json   # confirm this is the same run
"5C6928EE-7C2D-11EB-A19B-4F38DD521EE9"
[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/348/smp_affinity_list
4
[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity
feaa,aaaaabfe,aaaaaaab
[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity | tr a-z A-Z | sed -e s/,//g
FEAAAAAAABFEAAAAAAAB
[root@amt-f33 latest]# cpumask=`cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity | tr a-z A-Z | sed -e s/,//g`
[root@amt-f33 latest]# echo "ibase=16; obase=2; $cpumask" | bc
11111110101010101010101010101010101010111111111010101010101010101010\
101010101011

CPU 4 is set to 0 in the default_smp_affinity mask, so it looks like PAO is setting the mask correctly, but maybe crio is not updating irqbalance.
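As an aside, here is a generic helper for testing a single CPU bit in a comma-separated hex cpumask like default_smp_affinity; this is not part of the original procedure, just a convenience sketch:

  check_cpu() {
    # usage: check_cpu <cpu> <mask>, e.g. check_cpu 4 feaa,aaaaabfe,aaaaaaab
    local cpu=$1 mask=${2//,/}                       # strip the commas
    local idx=$(( cpu / 4 ))                         # hex digit holding this CPU's bit, counted from the right
    local digit=${mask:$(( ${#mask} - 1 - idx )):1}
    echo $(( (0x$digit >> (cpu % 4)) & 1 ))          # 1 = CPU allowed, 0 = masked out
  }
  check_cpu 4 feaa,aaaaabfe,aaaaaaab                 # prints 0, matching the bc output above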
Created attachment 1760445 [details] oc-get-PerformanceProfile-performance.yaml
Created attachment 1760448 [details] pod-spec.json
The systemd irqbalance service reads its environment file:

  EnvironmentFiles=/etc/sysconfig/irqbalance (ignore_errors=no)

which has:

  IRQBALANCE_BANNED_CPUS=00000000

The above setting overrides the outcome of:

  IRQBALANCE_BANNED_CPUS="0000,00000000,55500000" irqbalance --oneshot

the first time the irqbalance daemon kicks in again. A possible solution is to synchronize the daemon with the env variable of the irqbalance --oneshot call, or to edit /etc/sysconfig/irqbalance and restart the irqbalance service instead of making the irqbalance --oneshot call.
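A minimal sketch of that second option, reusing the example mask from above (in practice the mask would have to be derived from the CPUs assigned to the guaranteed pods):

  # persist the banned-CPU mask and restart the daemon so it takes effect
  sed -i 's/^IRQBALANCE_BANNED_CPUS=.*/IRQBALANCE_BANNED_CPUS="0000,00000000,55500000"/' /etc/sysconfig/irqbalance
  systemctl restart irqbalance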
The CRI-O backport was merged into 1.20 https://github.com/cri-o/cri-o/pull/4656
Andrew: We believe the issue is fixed in the latest OCP, can you retest in your lab?
Version:

[root@bkr-hv03 uperf]# /root/img-nvr.sh
[root@bkr-hv03 uperf]# oc version
Client Version: 4.8.0-rc.1
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe

PAO version: NVR=v4.8.0-41

Ran Crucible tests, which run uperf tests that generate network traffic, which in turn generates a lot of interrupts for the network devices.

1. Git clone https://github.com/perftool-incubator/crucible-examples and install crucible using the instructions from: https://github.com/perftool-incubator/crucible

2. Modify the run.sh uperf settings:

topo="internode"
scale_out_factor=1
pod_qos=static # static = guaranteed pod, burstable = default pod qos
ocphost=bkr-hv03.dsal.lab.eng.bos.redhat.com
k8susr=root
annotations=`/bin/pwd`/no-irq-annotations.json
runtimeClassNameOpt=",runtimeClassName:performance-performance"
irq="bal"

3. Execute run.sh, which creates guaranteed pods with the annotations:

[root@bkr-hv03 uperf]# ./run.sh
Using annotations: /root/crucible-examples/uperf/no-irq-annotations.json
Generating --bench-params from --mv-params...
podman run --pull=missing -i --name crucible-multiplex --rm -e CRUCIBLE_HOME=/opt/crucible -e TOOLBOX_HOME=/opt/crucible/subprojects/core/toolbox --mount=type=bind,source=/var/lib/containers,destination=/var/lib/containers --mount=type=bind,source=/root,destination=/root --mount=type=bind,source=/opt/crucible/config/.bashrc,destination=/root/.bashrc --mount=type=bind,source=/home,destination=/home --mount=type=bind,source=/var/lib/crucible,destination=/var/lib/crucible --mount=type=bind,source=/opt/crucible,destination=/opt/crucible --privileged --ipc=host --pid=host --net=host --security-opt=label=disable --workdir=/root/crucible-examples/uperf quay.io/crucible/controller:latest /opt/crucible/subprojects/core/multiplex/multiplex.py --input /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/config/mv-params.json > /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/config/bench-params.json
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-73cca0e52f0f7c5ef03136d02c9110bd71d127d61daefecc9b8528034d6d879c.scope: no such file or directory
Checking for redis...appears to be running
Preparing to run uperf
Confirming the endpoints will satisfy the benchmark-client and benchmark-server requirements
There will be 1 client(s) and 1 server(s)
Building test execution order
Image was found at quay.io/crucible/client-server:36fa1c44f05f0de835c534a398132800
Deploying endpoints
Roadblock: Thu Jul 1 13:45:52 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-deploy
Endpoint created
followers: master-1 master-2 master-3 worker-1 worker-2
Roadblock: Thu Jul 1 13:47:09 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-script-start
Roadblock: Thu Jul 1 13:47:13 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-get-data
Roadblock: Thu Jul 1 13:47:19 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-collect-sysinfo
Roadblock: Thu Jul 1 13:48:17 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-start-tools
Roadblock: Thu Jul 1 13:49:25 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-start-tests
Running tests:
Iteration 1 sample 1 (test 1 of 1) attempt number 1
Roadblock: Thu Jul 1 13:49:39 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:infra-start
Roadblock: Thu Jul 1 13:49:47 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:server-start
Roadblock: Thu Jul 1 13:49:53 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:endpoint-start
Roadblock: Thu Jul 1 13:50:10 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:client-start
found new timeout value: 480
Assigning new timeout with padding for next roadblock: 480
Roadblock: Thu Jul 1 13:50:39 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:client-stop
Roadblock: Thu Jul 1 13:56:02 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:endpoint-stop
Roadblock: Thu Jul 1 13:56:09 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:server-stop
Sample 1 completed successfully with 0 failed attempts (0 total sample failures for this iteration)
Roadblock: Thu Jul 1 13:56:18 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:infra-stop
Roadblock: Thu Jul 1 13:56:27 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-stop-tests
Roadblock: Thu Jul 1 13:56:34 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-stop-tools
Roadblock: Thu Jul 1 13:56:41 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-send-data
Roadblock: Thu Jul 1 13:58:55 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-script-stop
Roadblock: Thu Jul 1 14:00:01 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-move-data
Roadblock: Thu Jul 1 14:00:26 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-finish
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-c3acd60d83cec032d9d89ba4d5a761aa2ea9de8afd4b77ace72b5ab3646c96db.scope: no such file or directory
Checking for httpd...appears to be running
Checking for elasticsearch...appears to be running
Launching a post-process job for each iteration x sample x [client|server] for uperf
Waiting for 2 post-processing jobs to complete
Post-processing complete
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-55994ffda65e81cd78d69447965f352cfa1861ebb59bbf3109e7cbb01322336b.scope: no such file or directory
Launching a post-process job for each tool * each collector
Working on tool dir tool-data/worker/2
Working on tool dir tool-data/worker/1
Working on tool dir tool-data/master/3
Working on tool dir tool-data/master/2
Working on tool dir tool-data/master/1
Waiting for 20 post-processing jobs to complete
Post-processing complete
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-abaf09d9a8cd36c9eebef772cd77d57a774ed3ee211abd26bad6e15764c6899e.scope: no such file or directory
Benchmark result is in /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1
Adding cluster settings
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   227  100   125  100   102  20833  17000 --:--:-- --:--:-- --:--:-- 37833
{"acknowledged":true,"persistent":{"action":{"auto_create_index":"false"},"search":{"max_buckets":"1000000"}},"transient":{}}
Creating templates and indices
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-be202161d0216d2ef4c29dbc8378ec1267854207d00e2102078aec92dc36f73b.scope: no such file or directory
Exporting from /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/run/rickshaw-run.json to elasticsearch documents and POSTing to localhost:9200
Run ID: A4B641A6-DA72-11EB-8D30-6350289F27D1
Indexing of tool data for worker-2 starting
Waiting for 31 indexing jobs to complete
Indexing of tool data for worker-2 complete
Indexing of tool data for worker-1 starting
Waiting for 31 indexing jobs to complete
Indexing of tool data for worker-1 complete
Indexing of tool data for master-3 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-3 complete
Indexing of tool data for master-2 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-2 complete
Indexing of tool data for master-1 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-1 complete
Indexing of benchmark data starting
Indexing of benchmark data complete
Indexing to ES complete
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-a22ec92a9081c28731c5d3984953ccb2214b15f92579c0fd179aa10d75e884b2.scope: no such file or directory
Benchmark result now in elastic, localhost:9200
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-c535c3191826979f731131a6cb3c33e6e21ac3a0a168851177cf0ccddd0e2dff.scope: no such file or directory
Generating benchmark summary report
run-id: A4B641A6-DA72-11EB-8D30-6350289F27D1
tags: irq=bal kernel=4.18.0-305.3.1.rt7.75.el8_4.x86_64 mtu=1400 osruntime=pod pods-per-worker=1 rcos=48.84.202106231817-0 scale_out_factor=1 sdn=OVNKubernetes topo=internode userenv=stream
metrics:
  source: procstat
    types: interrupts-sec
  source: mpstat
    types: Busy-CPU NonBusy-CPU
  source: ovs
    types: Gbps packets-sec dpctl-mem conntrack
  source: sar-net
    types: L2-Gbps packets-sec errors-sec
  source: sar-scheduler
    types: IO-Blocked-Tasks Load-Average-01m Load-Average-05m Load-Average-15m Process-List-Size Run-Queue-Length
  source: sar-mem
    types: Page-faults-sec KB-Paged-in-sec KB-Paged-out-sec Pages-freed-sec
  source: sar-tasks
    types: Context-switches-sec Processes-created-sec
  source: uperf
    types: Gbps round-trip-usec transactions-sec
iterations:
  iteration-id: C375E1D0-DA74-11EB-97DA-C371289F27D1
  params: duration=300 ifname=eth0 nthreads=64 protocol=tcp rsize=64 test-type=rr wsize=64
  primary-period name: measurement
  samples:
    sample-id: C37936AA-DA74-11EB-97DA-C371289F27D1
    primary period-id: C37A7AE2-DA74-11EB-97DA-C371289F27D1
    period range: begin: 1625147441817 end: 1625147741339
  result: (transactions-sec) samples: 142300.00 mean: 142300.00 min: 142300.00 max: 142300.00 stddev: NaN stddevpct: NaN
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-70f34b5b20e3b699521ad890aee5068707453a99cc32ed486ac3e3a286f503f5.scope: no such file or directory
Benchmark summary is complete and can be found in:
/var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/run/result-summary.txt

The above creates uperf client and server pods on 2 worker nodes, which have the annotations below:

<snip>
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {
      "cpu-quota.crio.io": "disable",
      "irq-load-balancing.crio.io": "disable",
      "k8s.ovn.org/pod-networks": "{\"default\":{\"ip_addresses\":[\"10.131.0.70/23\"],\"mac_address\":\"0a:58:0a:83:00:46\",\"gateway_ips\":[\"10.131.0.1\"],\"ip_address\":\"10.131.0.70/23\",\"gateway_ip\":\"10.131.0.1\"}}"
    },
</snip>

Pod spec:

<snip>
"image": "quay.io/crucible/client-server:36fa1c44f05f0de835c534a398132800",
"imagePullPolicy": "Always",
"name": "client-1",
"resources": {
  "limits": { "cpu": "70", "memory": "2Gi" },
  "requests": { "cpu": "70", "memory": "2Gi" }
},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
  {
    "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    "name": "kube-api-access-gcp6r",
    "readOnly": true
  }
]
}
</snip>

4. We check the CPUs used by the pods:

[root@bkr-hv03 uperf]# grep allowed /var/lib/crucible/run/latest/run/client-server/logs/client-1.txt
Cpus_allowed:   57ff,fffffc57,fffffffc
Cpus_allowed_list:      2-34,36,38,42-74,76,78
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
[root@bkr-hv03 uperf]# grep allowed /var/lib/crucible/run/latest/run/client-server/logs/server-1.txt
Cpus_allowed:   57ff,fffffc57,fffffffc
Cpus_allowed_list:      2-34,36,38,42-74,76,78
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
5. Now verify that the interrupts are landing only on the CPUs not used by the above pods:

[root@bkr-hv03 uperf]# crucible get metric --run A4B641A6-DA72-11EB-8D30-6350289F27D1 --period C37A7AE2-DA74-11EB-97DA-C371289F27D1 --source procstat --type interrupts-sec --breakout cstype=worker,csid,type=IR-PCI-MSI,cpu --filter gt:0 | grep worker
"<worker>-<2>-<IR-PCI-MSI>-<0>": [
"<worker>-<2>-<IR-PCI-MSI>-<40>": [
"<worker>-<2>-<IR-PCI-MSI>-<1>": [
"<worker>-<2>-<IR-PCI-MSI>-<35>": [
"<worker>-<2>-<IR-PCI-MSI>-<37>": [
"<worker>-<2>-<IR-PCI-MSI>-<41>": [
"<worker>-<2>-<IR-PCI-MSI>-<75>": [
"<worker>-<2>-<IR-PCI-MSI>-<77>": [
"<worker>-<2>-<IR-PCI-MSI>-<79>": [
"<worker>-<1>-<IR-PCI-MSI>-<0>": [
"<worker>-<1>-<IR-PCI-MSI>-<40>": [
"<worker>-<1>-<IR-PCI-MSI>-<1>": [
"<worker>-<1>-<IR-PCI-MSI>-<35>": [
"<worker>-<1>-<IR-PCI-MSI>-<37>": [
"<worker>-<1>-<IR-PCI-MSI>-<39>": [
"<worker>-<1>-<IR-PCI-MSI>-<41>": [
"<worker>-<1>-<IR-PCI-MSI>-<77>": [
"<worker>-<1>-<IR-PCI-MSI>-<79>": [
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-f1e85fbb77b2ea947bea7d6e5707c37d695786fbb1ba255f5fc4a1ad884cec9b.scope: no such file or directory

As we can see, the CPUs taking interrupts are not among the CPUs used by the guaranteed pods (2-34, 36, 38, 42-74, 76, 78). Marking this verified.
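As a quick cross-check of the two lists, here is a sketch that tests whether any IRQ-active CPU falls inside the pods' Cpus_allowed_list (both lists are copied from the outputs above; only the script itself is new):

  irq_cpus="0 1 35 37 39 40 41 75 77 79"   # CPUs reported by the breakout above
  pod_list="2-34,36,38,42-74,76,78"        # Cpus_allowed_list from step 4
  for c in $irq_cpus; do
    # expand the comma-separated ranges and test membership of $c
    if echo "$pod_list" | tr ',' '\n' | \
       awk -F- -v c="$c" '(NF==1 && $1==c) || (NF==2 && c>=$1 && c<=$2){f=1} END{exit !f}'; then
      echo "overlap: CPU $c is in the pod CPU set"
    fi
  done
  # no output means no overlap, i.e. IRQs stayed off the guaranteed-pod CPUs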
(In reply to Martin Sivák from comment #12) > Andrew: We believe the issue is fixed in the latest OCP, can you retest in > your lab? I worked with Niranjan in his lab with 2 bare-metal workers, and we were able to verify that opting out of interrupts works.
Thank you both Niranjan and Andrew!