Bug 1934630 - Requesting no-irqs for guaranteed pod not effective
Summary: Requesting no-irqs for guaranteed pod not effective
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Martin Sivák
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-03 15:21 UTC by Andrew Theurer
Modified: 2022-08-26 14:52 UTC
CC: 7 users

Fixed In Version: cri-o-1.20.2, openshift 4.7.4
Doc Type: Bug Fix
Doc Text:
Cause: CRI-O used irqbalance in --oneshot mode to set interrupt masks while the irqbalance system service was running. The two conflicted and set different masks. Consequence: Dynamic interrupt masks were unreliable (or not applied at all), and low latency was compromised. Fix: CRI-O now accounts for the running irqbalance service. Result: The dynamic interrupt mask handling works as expected.
Clone Of:
Environment:
Last Closed: 2022-08-26 14:52:26 UTC
Target Upstream Version:
Embargoed:


Attachments
oc-get-PerformanceProfile-performance.yaml (2.77 KB, text/plain)
2021-03-03 17:44 UTC, Andrew Theurer
pod-spec.json (1.90 KB, text/plain)
2021-03-03 17:47 UTC, Andrew Theurer


Links
System ID Private Priority Status Summary Last Updated
Github cri-o cri-o pull 4441 0 None closed handle irqbalance service 2021-03-11 14:16:41 UTC
Github cri-o cri-o pull 4656 0 None closed Backport irqbalance handling onto 1.20 2021-03-16 14:43:15 UTC

Description Andrew Theurer 2021-03-03 15:21:39 UTC
Description of problem:

When creating several pods with the following:

"annotations": {
  "irq-load-balancing.crio.io": "disable",
  "cpu-quota.crio.io": "disable"
}

...

"spec": {
    "runtimeClassName": "performance-performance"
...

"resources": {
    "requests": {
        "cpu": "2",
        "memory": "2048Mi"
    },
    "limits": {
        "cpu": "2",
        "memory": "2048Mi"
    }
}

Interrupts are still being delivered to CPUs allocated for these pods.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. create several guaranteed pods with the IRQ-disabling annotations shown above
2. run a network intensive workload, such as uperf, and collect periodic snapshots of /proc/interrupts


Actual results:
Device interrupts (from the network adapter) increase on CPUs allocated to the pods.  /proc/irq/default_smp_affinity does contain what appears to be a correct mask (the CPUs missing from the mask correlate with the CPUs used by the guaranteed pods), but interrupts are still delivered to those CPUs.

Expected results:
Interrupts should not be delivered to the CPUs that are excluded from the default_smp_affinity mask.


Additional info:

Comment 2 Martin Sivák 2021-03-03 16:12:29 UTC
Andrew, we need some more information.

- The full Performance Profile + at least tuned logs,
- or the full must gather with PAO bits.

Could this, for example, be a case of https://bugzilla.redhat.com/show_bug.cgi?id=1846767 ?

Comment 4 Martin Sivák 2021-03-03 16:28:49 UTC
By the way, make sure you watch the interrupt counts only during the guaranteed pod's execution. While the perftool pod is running, you can check the interrupt affinity using a debug pod we have: quay.io/marsik/debug-tools. Locally I execute it with "podman run quay.io/marsik/debug-tools knit irqaff"; in OCP you will have to play with Pod definitions and permissions, or you can try podman directly on the node.
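To watch per-CPU interrupt counts over a run, periodic /proc/interrupts snapshots can be diffed. A minimal Python sketch (a hypothetical helper, not part of the debug-tools image) that extracts one CPU's counts from a snapshot:

```python
def interrupts_for_cpu(snapshot: str, cpu: int) -> dict:
    """Parse /proc/interrupts text and return {irq_label: count}
    for a single CPU column. Diff two snapshots to see deltas."""
    lines = snapshot.strip().splitlines()
    col = lines[0].split().index(f"CPU{cpu}")  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        try:
            counts[fields[0].rstrip(":")] = int(fields[1 + col])
        except (IndexError, ValueError):
            continue  # summary rows (ERR, MIS) may have fewer columns
    return counts
```

Taking a snapshot before and after the benchmark window and subtracting the dictionaries gives the per-IRQ delta for each pod CPU.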

Comment 5 Andrew Theurer 2021-03-03 17:29:03 UTC
Here is the process for validating that the interrupts are occurring on the wrong CPUs.

First, how I get the CPUs used by the pods (grepping from a test result; the procedure for running the test is documented in another comment):

[root@amt-f33 ~]# cd /var/lib/crucible/run/latest/run/iterations/iteration-1/sample-1/client/
[root@amt-f33 client]# grep "current affinity list:" */uperf-client-stderrout.txt
10/uperf-client-stderrout.txt:pid 363's current affinity list: 10,50
11/uperf-client-stderrout.txt:pid 364's current affinity list: 8,48
12/uperf-client-stderrout.txt:pid 364's current affinity list: 14,54
13/uperf-client-stderrout.txt:pid 364's current affinity list: 12,52
14/uperf-client-stderrout.txt:pid 364's current affinity list: 4,44
15/uperf-client-stderrout.txt:pid 364's current affinity list: 2,42
16/uperf-client-stderrout.txt:pid 364's current affinity list: 6,46
1/uperf-client-stderrout.txt:pid 373's current affinity list: 28,68
2/uperf-client-stderrout.txt:pid 364's current affinity list: 24,64
3/uperf-client-stderrout.txt:pid 364's current affinity list: 26,66
4/uperf-client-stderrout.txt:pid 364's current affinity list: 20,60
5/uperf-client-stderrout.txt:pid 364's current affinity list: 22,62
6/uperf-client-stderrout.txt:pid 364's current affinity list: 16,56
7/uperf-client-stderrout.txt:pid 364's current affinity list: 18,58
8/uperf-client-stderrout.txt:pid 364's current affinity list: 30,70
9/uperf-client-stderrout.txt:pid 364's current affinity list: 32,72

After a run is complete, we can get the summary info, which includes the result and the available metrics, as well as the time range (important for correlating with the metrics queries that follow):

$ crucible console /bin/bash
crucible-container [root@amt-f33:cdmq]$ ./get-result-summary.sh --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9

run-id: 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9
  tags: irq=bal kernel=4.18.0-240.10.1.el8_3.x86_64 mtu=1400 osruntime=pod pao=yes pao_noirq=yes pod_qos=static pods-per-worker=16 rcos=47.83.202102090044-0 sdn=OVNKubernetes topo=internode userenv=stream worker_pairs=1
  metrics:
    source: procstat
      types: interrupts-sec
    source: mpstat
      types: Busy-CPU NonBusy-CPU
    source: sar-net
      types: L2-Gbps
    source: sar-mem
      types: Page-faults-sec KB-Paged-in-sec KB-Paged-out-sec Pages-freed-sec Pages-swapped-in-sec Pages-swapped-out-sec VM-Efficiency kswapd-scanned-pages-sec reclaimed-pages-sec scanned-pages-sec
    source: uperf
      types: Gbps round-trip-usec transactions-sec
    source: sar-scheduler
      types: Context-switches-sec Load-Average-01m Load-Average-05m Load-Average-15m Process-List-Size Run-Queue-Length
    source: sar-io
      types: IO-Blocked-Tasks
    source: sar-tasks
      types: Processes-created-sec
  iterations:
    iteration-id: 269E7E56-7C2F-11EB-9E76-056DDD521EE9
      params: duration=60 nthreads=64 protocol=tcp rsize=1024 server-ifname=eth0 test-type=rr wsize=64
period-id: 26A37E10-7C2F-11EB-9E76-056DDD521EE9
periodRange: {"begin":1614782277123,"end":1614782337380}
      result: (transactions-sec) samples: 107600.00 mean: 107600.00 stddev: 0.00 stddevpct: 0.00

Now that we have the CPUs used for the pods and the time range, we can query for interrupts-sec metric, starting with all interrupts for worker node 1:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9  --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>",
  "values": {
    "<worker>-<1>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "3.377e+5"
      }
    ]
  },
  "breakouts": [
    "core",
    "cpu",
    "desc",
    "die",
    "irq",
    "package",
    "thread",
    "type"
  ]
}

So we had an average of 337700 interrupts/sec, but that is all interrupts on all CPUs.  Let's pick a CPU we know was used by a pod:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9  --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>-<cpu>",
  "values": {
    "<worker>-<1>-<4>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "4529"
      }
    ]
  },
  "breakouts": [
    "core",
    "desc",
    "die",
    "irq",
    "package",
    "thread",
    "type"
  ]
}

So, we have 4529 interrupts/sec during this time period, but we don't know what they are for.  Let's break it out by interrupt type:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9  --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>-<cpu>-<type>",
  "values": {
    "<worker>-<1>-<4>-<IR-PCI-MSI>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "3292"
      }
    ],
    "<worker>-<1>-<4>-<DMAR-MSI>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<IR-IO-APIC>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<Hyper-V>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<Machine>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],

    <skipping several in this output>

    "<worker>-<1>-<4>-<Threshold>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ]
  },
  "breakouts": [
    "core",
    "desc",
    "die",
    "irq",
    "package",
    "thread"
  ]
}

So, IR-PCI-MSI accounts for 3292 interrupts/sec.  These are device interrupts.  We can see specifically which interrupt this is:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9  --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type=IR-PCI-MSI,desc
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>-<cpu>-<type>-<desc>",
  "values": {
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<PCIe>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<ahci[0000:00:11.5]>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp8@pci:0000:5e:00.0>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp8@pci:0000:5e:00.1>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "0.000"
      }
    ],
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "3089"
      }

 <hundreds of other metrics skipped here to keep this short>

    ],
  "breakouts": [
    "core",
    "die",
    "irq",
    "package",
    "thread"
  ]
}

From the output, we can see that the IRQ for mlx5_comp9@pci:0000:5e:00.0 has 3089 interrupts/sec.  We can get the IRQ number with the following:

crucible-container [root@amt-f33:cdmq]$ node get-metric-data.js --run 5C6928EE-7C2D-11EB-A19B-4F38DD521EE9  --source procstat --type interrupts-sec --url localhost:9200 --begin 1614782277123 --end 1614782337380 --resolution 1 --breakout cstype=worker,csid=1,cpu=4,type=IR-PCI-MSI,desc=mlx5_comp9@pci:0000:5e:00.0,irq
{
  "name": "procstat",
  "type": "interrupts-sec",
  "label": "<cstype>-<csid>-<cpu>-<type>-<desc>-<irq>",
  "values": {
    "<worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>-<348>": [
      {
        "begin": 1614782277123,
        "end": 1614782337380,
        "value": "3089"
      }
    ]
  },
  "breakouts": [
    "core",
    "die",
    "package",
    "thread"
  ]
}

In the metric name, we have <worker>-<1>-<4>-<IR-PCI-MSI>-<mlx5_comp9@pci:0000:5e:00.0>-<348>, and the 348 corresponds to <irq> in the label, <cstype>-<csid>-<cpu>-<type>-<desc>-<irq>.

Now we can check some collected data to see what the smp_affinity_list is for that IRQ:

[root@amt-f33 ~]# cd /var/lib/crucible/run/latest/
[root@amt-f33 latest]# jq '."run-id"' run/rickshaw-run.json  #confirm this is the same run
"5C6928EE-7C2D-11EB-A19B-4F38DD521EE9"
[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/348/smp_affinity_list
4

[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity
feaa,aaaaabfe,aaaaaaab

[root@amt-f33 latest]# cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity | tr a-z A-Z | sed -e s/,//g
FEAAAAAAABFEAAAAAAAB
[root@amt-f33 latest]# cpumask=`cat run/tool-data/worker/1/procstat/proc-irq/proc/irq/default_smp_affinity | tr a-z A-Z | sed -e s/,//g`
[root@amt-f33 latest]# echo "ibase=16; obase=2; $cpumask" | bc
11111110101010101010101010101010101010111111111010101010101010101010\
101010101011

CPU4 is set to 0 in the default_smp_affinity mask, so it looks like PAO is setting the mask correctly, but maybe CRI-O is not updating irqbalance.
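The hex-mask decoding above can also be done without bc. A minimal Python sketch (an illustrative helper, not part of the test harness) that checks whether a CPU's bit is set in the comma-separated /proc mask format:

```python
def cpu_in_mask(mask: str, cpu: int) -> bool:
    """True if `cpu` is set in a comma-separated hex cpumask,
    e.g. the contents of /proc/irq/default_smp_affinity."""
    return bool(int(mask.replace(",", ""), 16) >> cpu & 1)
```

For the mask above, cpu_in_mask("feaa,aaaaabfe,aaaaaaab", 4) returns False, matching the bc output showing CPU4 excluded.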

Comment 7 Andrew Theurer 2021-03-03 17:44:44 UTC
Created attachment 1760445 [details]
oc-get-PerformanceProfile-performance.yaml

Comment 8 Andrew Theurer 2021-03-03 17:47:16 UTC
Created attachment 1760448 [details]
pod-spec.json

Comment 10 Marcel Apfelbaum 2021-03-10 19:08:46 UTC
The systemd irqbalance service reads /etc/sysconfig/irqbalance:
    EnvironmentFiles=/etc/sysconfig/irqbalance (ignore_errors=no)
which has:
    IRQBALANCE_BANNED_CPUS=00000000

That setting overrides the outcome of:
    IRQBALANCE_BANNED_CPUS="0000,00000000,55500000" irqbalance --oneshot
the first time the irqbalance daemon kicks in again.

A possible solution is to synchronize the daemon with the environment variable of the irqbalance --oneshot call, or to edit /etc/sysconfig/irqbalance and restart the irqbalance service instead of calling irqbalance --oneshot.
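For illustration, building the IRQBALANCE_BANNED_CPUS value that would be written to /etc/sysconfig/irqbalance can be sketched as follows (a hypothetical helper, not the actual fix, which landed in the cri-o PRs linked above):

```python
def banned_cpus_mask(cpus, words=3):
    """Format a collection of CPU numbers as a comma-separated hex
    cpumask (32-bit words, most significant first), i.e. the
    IRQBALANCE_BANNED_CPUS format."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return ",".join(
        f"{(value >> (32 * i)) & 0xFFFFFFFF:08x}"
        for i in reversed(range(words))
    )
```

Banning the isolated CPUs this way keeps the service's persistent configuration consistent with what the --oneshot invocation computed.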

Comment 11 Martin Sivák 2021-03-16 14:43:17 UTC
The CRI-O backport was merged into 1.20 https://github.com/cri-o/cri-o/pull/4656

Comment 12 Martin Sivák 2021-04-01 14:03:56 UTC
Andrew: We believe the issue is fixed in the latest OCP, can you retest in your lab?

Comment 13 Niranjan Mallapadi Raghavender 2021-07-01 14:39:01 UTC
Version:
[root@bkr-hv03 uperf]# /root/img-nvr.sh 

[root@bkr-hv03 uperf]# oc version
Client Version: 4.8.0-rc.1
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe

Pao version:
NVR=v4.8.0-41




Ran Crucible, which runs uperf tests that generate network traffic and, in turn, a lot of interrupts for the network devices.

1. Git clone https://github.com/perftool-incubator/crucible-examples and install crucible using instructions from:
https://github.com/perftool-incubator/crucible

2. Modify the run.sh 
Uperf settings:
topo="internode" 
scale_out_factor=1 
pod_qos=static # static = guaranteed pod, burstable = default pod QoS
ocphost=bkr-hv03.dsal.lab.eng.bos.redhat.com
k8susr=root
annotations=`/bin/pwd`/no-irq-annotations.json
runtimeClassNameOpt=",runtimeClassName:performance-performance"
irq="bal"

3. Execute run.sh, which creates guaranteed pods with the annotations:

[root@bkr-hv03 uperf]# ./run.sh                                                                                                                                                                                    
Using annotations: /root/crucible-examples/uperf/no-irq-annotations.json                                                                                                                                           
Generating --bench-params from --mv-params...                                                                                                                                                                      
podman run --pull=missing -i --name crucible-multiplex --rm -e CRUCIBLE_HOME=/opt/crucible -e TOOLBOX_HOME=/opt/crucible/subprojects/core/toolbox --mount=type=bind,source=/var/lib/containers,destination=/var/lib
/containers --mount=type=bind,source=/root,destination=/root --mount=type=bind,source=/opt/crucible/config/.bashrc,destination=/root/.bashrc --mount=type=bind,source=/home,destination=/home --mount=type=bind,sou
rce=/var/lib/crucible,destination=/var/lib/crucible --mount=type=bind,source=/opt/crucible,destination=/opt/crucible --privileged --ipc=host --pid=host --net=host --security-opt=label=disable --workdir=/root/cru
cible-examples/uperf quay.io/crucible/controller:latest /opt/crucible/subprojects/core/multiplex/multiplex.py --input /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106
231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/config/mv-params.json > /var/lib/crucible/run/uperf-2021-07-01_09:45:47
--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker:1,scale_out_factor:1/config/bench-params.json 
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-73cca0e52f0f7c5ef03136d02c9110bd71d127d61daefecc9b8528034d6d879c.scope: no such file or directory                                                     
Checking for redis...appears to be running                                                                                                                                                                         
Preparing to run uperf                                                                                                                                                                                             
Confirming the endpoints will satisfy the benchmark-client and benchmark-server requirements                                                                                                                       
There will be 1 client(s) and 1 server(s)                                                                                                                                                                          
Building test execution order                                                                                                                                                                                      
Image was found at quay.io/crucible/client-server:36fa1c44f05f0de835c534a398132800:                                                                                                                                
Deploying endpoints                                                                                                                                                                                                
Roadblock: Thu Jul  1 13:45:52 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-deploy                                                                                
Endpoint created followers: master-1 master-2 master-3 worker-1 worker-2                                                                                                                                           
Roadblock: Thu Jul  1 13:47:09 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-script-start                                                                     
Roadblock: Thu Jul  1 13:47:13 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-get-data                                                                         
Roadblock: Thu Jul  1 13:47:19 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-collect-sysinfo                                                                  
Roadblock: Thu Jul  1 13:48:17 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-start-tools                                                                      
Roadblock: Thu Jul  1 13:49:25 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-start-tests                                                                      
Running tests:                                                                                                                                                                                                     
Iteration 1 sample 1 (test 1 of 1) attempt number 1                                                                                                                                                                
Roadblock: Thu Jul  1 13:49:39 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:infra-start                                                                              
Roadblock: Thu Jul  1 13:49:47 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:server-start                                                                             
Roadblock: Thu Jul  1 13:49:53 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:endpoint-start                                                                           
Roadblock: Thu Jul  1 13:50:10 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:client-start                                                                             
found new timeout value: 480                                                                                                                                                                                       
Assigning new timeout with padding for next roadblock: 480                                                                                                                                                         
Roadblock: Thu Jul  1 13:50:39 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:client-stop                                                                              
Roadblock: Thu Jul  1 13:56:02 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:endpoint-stop                                                                            
Roadblock: Thu Jul  1 13:56:09 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:server-stop                                                                              
Sample 1 completed successfully with 0 failed attempts (0 total sample failures for this iteration)                                                                                                                
Roadblock: Thu Jul  1 13:56:18 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:1-1-1:infra-stop                                                                               
Roadblock: Thu Jul  1 13:56:27 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-stop-tests                                                                       
Roadblock: Thu Jul  1 13:56:34 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-stop-tools                                                                       
Roadblock: Thu Jul  1 13:56:41 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-send-data                                                                        
Roadblock: Thu Jul  1 13:58:55 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:client-server-script-stop                                                                      
Roadblock: Thu Jul  1 14:00:01 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-move-data                                                                             
Roadblock: Thu Jul  1 14:00:26 UTC 2021 role: leader attempt number: 1 uuid: 1:A4B641A6-DA72-11EB-8D30-6350289F27D1:endpoint-finish                                                                                
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-c3acd60d83cec032d9d89ba4d5a761aa2ea9de8afd4b77ace72b5ab3646c96db.scope: no such file or directory                                                     
Checking for httpd...appears to be running                                                                                                                                                                         
Checking for elasticsearch...appears to be running                                                                                                                                                                 
Launching a post-process job for each iteration x sample x [client|server] for uperf 
Waiting for 2 post-processing jobs to complete                                                                                                                                                                     
Post-processing complete                                                                                                                                                                                           
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-55994ffda65e81cd78d69447965f352cfa1861ebb59bbf3109e7cbb01322336b.scope: no such file or directory                                                     
Launching a post-process job for each tool * each collector                                                                                                                                                        
Working on tool dir tool-data/worker/2                                                                             
Working on tool dir tool-data/worker/1                                      
Working on tool dir tool-data/master/3
Working on tool dir tool-data/master/2
Working on tool dir tool-data/master/1
Waiting for 20 post-processing jobs to complete
Post-processing complete
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-abaf09d9a8cd36c9eebef772cd77d57a774ed3ee211abd26bad6e15764c6899e.scope: no such file or directory
Benchmark result is in /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:in
ternode,pods-per-worker:1,scale_out_factor:1
Adding cluster settings
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   227  100   125  100   102  20833  17000 --:--:-- --:--:-- --:--:-- 37833
{"acknowledged":true,"persistent":{"action":{"auto_create_index":"false"},"search":{"max_buckets":"1000000"}},"transient":{}}Creating templates and indices
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-be202161d0216d2ef4c29dbc8378ec1267854207d00e2102078aec92dc36f73b.scope: no such file or directory
Exporting from /var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,
pods-per-worker:1,scale_out_factor:1/run/rickshaw-run.json to elasticsearch documents and POSTing to localhost:9200
Run ID: A4B641A6-DA72-11EB-8D30-6350289F27D1
Indexing of tool data for worker-2 starting
Waiting for 31 indexing jobs to complete
Indexing of tool data for worker-2 complete
Indexing of tool data for worker-1 starting
Waiting for 31 indexing jobs to complete
Indexing of tool data for worker-1 complete
Indexing of tool data for master-3 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-3 complete
Indexing of tool data for master-2 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-2 complete
Indexing of tool data for master-1 starting
Waiting for 23 indexing jobs to complete
Indexing of tool data for master-1 complete
Indexing of benchmark data starting
Indexing of benchmark data complete
Indexing to ES complete
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-a22ec92a9081c28731c5d3984953ccb2214b15f92579c0fd179aa10d75e884b2.scope: no such file or directory
Benchmark result now in elastic, localhost:9200
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-c535c3191826979f731131a6cb3c33e6e21ac3a0a168851177cf0ccddd0e2dff.scope: no such file or directory
Generating benchmark summary report

run-id: A4B641A6-DA72-11EB-8D30-6350289F27D1
  tags: irq=bal kernel=4.18.0-305.3.1.rt7.75.el8_4.x86_64 mtu=1400 osruntime=pod pods-per-worker=1 rcos=48.84.202106231817-0 scale_out_factor=1 sdn=OVNKubernetes topo=internode userenv=stream
  metrics:
    source: procstat
      types: interrupts-sec
    source: mpstat
      types: Busy-CPU NonBusy-CPU
    source: ovs
      types: Gbps packets-sec dpctl-mem conntrack
    source: sar-net
      types: L2-Gbps packets-sec errors-sec
    source: sar-scheduler
      types: IO-Blocked-Tasks Load-Average-01m Load-Average-05m Load-Average-15m Process-List-Size Run-Queue-Length
    source: sar-mem
      types: Page-faults-sec KB-Paged-in-sec KB-Paged-out-sec Pages-freed-sec
    source: sar-tasks
      types: Context-switches-sec Processes-created-sec
    source: uperf
      types: Gbps round-trip-usec transactions-sec
  iterations:
    iteration-id: C375E1D0-DA74-11EB-97DA-C371289F27D1
      params: duration=300 ifname=eth0 nthreads=64 protocol=tcp rsize=64 test-type=rr wsize=64
      primary-period name: measurement
      samples:
        sample-id: C37936AA-DA74-11EB-97DA-C371289F27D1
          primary period-id: C37A7AE2-DA74-11EB-97DA-C371289F27D1
          period range: begin: 1625147441817 end: 1625147741339
        result: (transactions-sec) samples: 142300.00 mean: 142300.00 min: 142300.00 max: 142300.00 stddev: NaN stddevpct: NaN
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-70f34b5b20e3b699521ad890aee5068707453a99cc32ed486ac3e3a286f503f5.scope: no such file or directory
Benchmark summary is complete and can be found in:
/var/lib/crucible/run/uperf-2021-07-01_09:45:47--sdn:OVNKubernetes,mtu:1400,rcos:48.84.202106231817-0,kernel:4.18.0-305.3.1.rt7.75.el8_4.x86_64,irq:bal,userenv:stream,osruntime:pod,topo:internode,pods-per-worker
:1,scale_out_factor:1/run/result-summary.txt

The above creates uperf client and server pods on 2 worker nodes, which have the annotations below:

<snip>

{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "annotations": {
            "cpu-quota.crio.io": "disable",
            "irq-load-balancing.crio.io": "disable",
            "k8s.ovn.org/pod-networks": "{\"default\":{\"ip_addresses\":[\"10.131.0.70/23\"],\"mac_address\":\"0a:58:0a:83:00:46\",\"gateway_ips\":[\"10.131.0.1\"],\"ip_address\":\"10.131.0.70/23\",\"gateway_ip\":\"10.131.0.1\"}}"
        },

</snip>

Pod Spec:

<snip> 
                "image": "quay.io/crucible/client-server:36fa1c44f05f0de835c534a398132800",
                "imagePullPolicy": "Always",
                "name": "client-1",
                "resources": {
                    "limits": {
                        "cpu": "70",
                        "memory": "2Gi"
                    },
                    "requests": {
                        "cpu": "70",
                        "memory": "2Gi"
                    }
                },
                "terminationMessagePath": "/dev/termination-log",
                "terminationMessagePolicy": "File",
                "volumeMounts": [
                    {
                        "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                        "name": "kube-api-access-gcp6r",
                        "readOnly": true
                    }
                ]
            }
</snip>

4. We check the CPUs used by the pods:

[root@bkr-hv03 uperf]# grep allowed /var/lib/crucible/run/latest/run/client-server/logs/client-1.txt
Cpus_allowed:   57ff,fffffc57,fffffffc
Cpus_allowed_list:      2-34,36,38,42-74,76,78
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000
00,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
[root@bkr-hv03 uperf]# grep allowed /var/lib/crucible/run/latest/run/client-server/logs/server-1.txt
Cpus_allowed:   57ff,fffffc57,fffffffc
Cpus_allowed_list:      2-34,36,38,42-74,76,78
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000
00,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
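The Cpus_allowed_list ranges above can be expanded into a set for comparison against the CPUs seen taking interrupts (a minimal Python sketch, hypothetical helper name):

```python
def parse_cpu_list(cpu_list: str) -> set:
    """Expand a kernel CPU list like '2-34,36,38' into a set of CPU numbers."""
    cpus = set()
    for part in cpu_list.split(","):
        lo, _, hi = part.partition("-")          # 'a-b' -> range, 'a' -> single
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus
```

Any interrupt CPU that falls outside parse_cpu_list("2-34,36,38,42-74,76,78") is a housekeeping CPU, which is the expected behavior.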

5. Now verify that interrupts are not being delivered to the CPUs used by the above pods:


[root@bkr-hv03 uperf]# crucible get metric --run A4B641A6-DA72-11EB-8D30-6350289F27D1 --period C37A7AE2-DA74-11EB-97DA-C371289F27D1 --source procstat --type interrupts-sec --breakout cstype=worker,csid,type=IR-P
CI-MSI,cpu --filter gt:0 | grep worker
    "<worker>-<2>-<IR-PCI-MSI>-<0>": [
    "<worker>-<2>-<IR-PCI-MSI>-<40>": [
    "<worker>-<2>-<IR-PCI-MSI>-<1>": [
    "<worker>-<2>-<IR-PCI-MSI>-<35>": [
    "<worker>-<2>-<IR-PCI-MSI>-<37>": [
    "<worker>-<2>-<IR-PCI-MSI>-<41>": [
    "<worker>-<2>-<IR-PCI-MSI>-<75>": [
    "<worker>-<2>-<IR-PCI-MSI>-<77>": [
    "<worker>-<2>-<IR-PCI-MSI>-<79>": [
    "<worker>-<1>-<IR-PCI-MSI>-<0>": [
    "<worker>-<1>-<IR-PCI-MSI>-<40>": [
    "<worker>-<1>-<IR-PCI-MSI>-<1>": [
    "<worker>-<1>-<IR-PCI-MSI>-<35>": [
    "<worker>-<1>-<IR-PCI-MSI>-<37>": [
    "<worker>-<1>-<IR-PCI-MSI>-<39>": [
    "<worker>-<1>-<IR-PCI-MSI>-<41>": [
    "<worker>-<1>-<IR-PCI-MSI>-<77>": [
    "<worker>-<1>-<IR-PCI-MSI>-<79>": [
WARN[0000] lstat /sys/fs/cgroup/devices/machine.slice/libpod-f1e85fbb77b2ea947bea7d6e5707c37d695786fbb1ba255f5fc4a1ad884cec9b.scope: no such file or directory

                 

As we can see, the CPUs receiving interrupts are not among the CPUs used by the guaranteed pods. Marking this verified.

Comment 14 Andrew Theurer 2021-07-01 14:44:04 UTC
(In reply to Martin Sivák from comment #12)
> Andrew: We believe the issue is fixed in the latest OCP, can you retest in
> your lab?

I worked with Niranjan in his lab with 2 bare-metal workers, and we were able to verify that opting out of interrupts works.

Comment 15 Martin Sivák 2021-07-01 15:19:08 UTC
Thank you both Niranjan and Andrew!

