Bug 1720042 - RT: tuned IRQ affinity conflict due to irqbalance service
Summary: RT: tuned IRQ affinity conflict due to irqbalance service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tuned
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: Robin Hack
URL:
Whiteboard:
Depends On:
Blocks: 1672377 1780577
 
Reported: 2019-06-13 02:51 UTC by Pei Zhang
Modified: 2023-02-12 17:29 UTC (History)
CC List: 13 users

Fixed In Version: tuned-2.11.0-9.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-29 19:36:52 UTC
Target Upstream Version:
Embargoed:
pezhang: needinfo-


Attachments
tuned.log (13.04 KB, text/plain), 2019-06-26 07:22 UTC, Peter Xu


Links
Github: redhat-performance/tuned pull 194 (closed), "Apply irqbalance service tuning for realtime-virtual-{host|guest} profiles", last updated 2021-01-14 02:18:49 UTC
Red Hat Issue Tracker: RHELPLAN-53501, last updated 2023-02-12 17:29:55 UTC
Red Hat Product Errata: RHBA-2020:3884, last updated 2020-09-29 19:37:18 UTC

Description Pei Zhang 2019-06-13 02:51:28 UTC
Description of problem:
In 1h cyclictest testing in a KVM-RT guest, sometimes there is a latency spike (53us).


Version-Release number of selected component (if applicable):
libvirt-4.5.0-22.el7.x86_64
kernel-rt-3.10.0-1055.rt56.1015.el7.x86_64
tuned-2.11.0-5.el7.noarch
qemu-kvm-rhev-2.12.0-32.el7.x86_64


How reproducible:
7/10


Steps to Reproduce:
1. Install a RHEL7.7 RT host

2. Boot a RHEL7.7 RT guest with 10vCPUs

3. Start kernel compiling in host with $twice_of_host_housekeeping_cores

4. Start kernel compiling in guest with $twice_of_guest_housekeeping_cores

5. Start 1h cyclictest in guest 
# taskset -c 1 /home/nfv-virt-rt-kvm/tools/cyclictest -m -n -q -p95 -D 1h -h60 -t 1 -a 1 --notrace

6. Steps 3, 4, and 5 run at the same time. The max latency exceeds 40us, as shown below.

# Min Latencies: 00008 00010 00010 00010 00010 00010 00010 00010
# Avg Latencies: 00011 00010 00010 00010 00011 00011 00010 00010
# Max Latencies: 00047 00053 00016 00016 00015 00047 00016 00017

Actual results:
The 1h cyclictest max latency exceeds 40us.

Expected results:
The 24h cyclictest max latency should be <= 40us.

Additional info:
1. I think this is probably a regression starting with kernel-rt-3.10.0-1052.rt56.1012.el7.x86_64; this is the first recent version in which we began to hit the spike.

https://beaker.engineering.redhat.com/recipes/6952403#tasks 

24h cyclictest result:
# Min Latencies: 00004
# Avg Latencies: 00015
# Max Latencies: 00047

2. In latest runs, it can easily reproduce this spike, but now 100% reproduced. Below are the runs (all tested with 1h cyclictest) in which we hit this spike:

https://beaker.engineering.redhat.com/recipes/6992357#tasks 
https://beaker.engineering.redhat.com/recipes/6988238#tasks
https://beaker.engineering.redhat.com/recipes/6988234#tasks

Note: For all detailed data, please check the bottom of the "taskout.log".

Comment 3 Pei Zhang 2019-06-13 10:15:20 UTC
(In reply to Pei Zhang from comment #0)
[...]
> 2. In latest runs, it can easily reproduce this spike, but now 100%
                                                             ^^^
Sorry, typo here, should be "not 100%".

Comment 4 Pei Zhang 2019-06-13 11:10:32 UTC
Today I tried to double-confirm the regression kernel-rt version, but kernel-rt-3.10.0-1050.rt56.1010 also hit a spike, as shown below; this version worked well in past testing.

Versions:
libvirt-4.5.0-22.el7.x86_64
tuned-2.11.0-5.el7.noarch
kernel-rt-3.10.0-1050.rt56.1010.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7.x86_64

Results:
# Min Latencies: 00007 00009 00009 00010 00009 00009 00009 00009
# Avg Latencies: 00010 00010 00009 00010 00009 00009 00009 00009
# Max Latencies: 00050 00045 00016 00015 00015 00015 00015 00015


I'm trying to locate which component causes this spike (maybe it's not kernel-rt). I will update the results once I have any findings.

Comment 6 Pei Zhang 2019-06-14 14:27:18 UTC
Update:

After downgrading the kernel-rt, tuned, qemu-kvm, and libvirt versions (these versions worked well in past 24h testing, as listed below), we still hit spikes.

Versions:
libvirt-4.5.0-22.el7.x86_64
tuned-2.11.0-5.el7.noarch
kernel-rt-3.10.0-1050.rt56.1010.el7.x86_64
qemu-kvm-rhev-2.12.0-29.el7.x86_64

Beaker job:
https://beaker.engineering.redhat.com/recipes/6992357#tasks

I'm not sure whether it's a hardware issue (broken hardware?) or a tree issue (some other component?). I'll try different servers and will update the results soon.

Comment 7 Pei Zhang 2019-06-19 09:34:26 UTC
Update:

The spike still exists in the automation testing with 1h cyclictest after I tried the following:

1. Removed the image copy step: both host and guest have enough free memory, so it's not a memory shortage issue.
2. Restored the image size from 50G to 20G: so it's not an image size issue.
3. Removed the kernel compiling step in both host and guest: so it's not a memory/IO/CPU stress issue.
4. Tested the same scenario on 3 servers, and each server may hit the spike: so it's not a hardware issue.

Mostly the spike is on CPU1 and CPU2 in the guest (scenario: single VM with multiple vCPUs).


I guess there might be something in the automation that causes this spike; however, I still cannot find it so far.

After talking with Peter, I'll do the 24h cyclictest runs manually. Meanwhile, I'll continue to debug the automation scripts.

I'll update the manual testing results soon.

Comment 8 Pei Zhang 2019-06-24 02:31:58 UTC
Update:

This issue cannot be reproduced in 24h manual testing. 2/2 PASS.

== Versions ==
3.10.0-1057.rt56.1017.el7.x86_64
tuned-2.11.0-5.el7.noarch
qemu-kvm-rhev-2.12.0-32.el7.x86_64
libvirt-4.5.0-22.el7.x86_64


== 24h Run 1 ==
(1)Single VM with 1 rt vCPU:
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00021

(2)Single VM with 8 rt vCPUs:
# Min Latencies: 00007 00010 00010 00010 00010 00010 00010 00010
# Avg Latencies: 00012 00011 00011 00011 00011 00011 00011 00011
# Max Latencies: 00040 00033 00030 00026 00028 00026 00027 00027

(3)Multiple VMs each with 1 rt vCPU:
- VM1
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00031

- VM2
# Min Latencies: 00007
# Avg Latencies: 00010
# Max Latencies: 00027

- VM3
# Min Latencies: 00009
# Avg Latencies: 00011
# Max Latencies: 00028

- VM4
# Min Latencies: 00007
# Avg Latencies: 00019
# Max Latencies: 00025


== 24h run 2 ==
(1)Single VM with 1 rt vCPU:
# Min Latencies: 00007
# Avg Latencies: 00010
# Max Latencies: 00029

(2)Single VM with 8 rt vCPUs:
# Min Latencies: 00008 00010 00010 00010 00010 00010 00010 00010
# Avg Latencies: 00011 00010 00010 00010 00010 00010 00010 00010
# Max Latencies: 00030 00027 00035 00027 00027 00026 00027 00032

(3)Multiple VMs each with 1 rt vCPU:
- VM1
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00030

- VM2
# Min Latencies: 00008
# Avg Latencies: 00011
# Max Latencies: 00030

- VM3
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00033

- VM4
# Min Latencies: 00009
# Avg Latencies: 00011
# Max Latencies: 00031


Note: 
In the above 2 runs, we ran the automation scripts manually (same testing steps as usual) and the latency looks good. However, when running the automation scripts via a beaker job, we can still hit the spike in 1h cyclictest testing.

So I guess the beaker process is possibly doing something that causes the spike.

Comment 9 Pei Zhang 2019-06-24 08:55:29 UTC
Update:

After comparing the beaker job with the manually run scripts, we noticed differences in the serial and acpi IRQ threads:

For the beaker job, the serial and acpi IRQ threads receive interrupts on isolated CPU1, and their affinity list is the isolated cores 1,3,5,7,9,11,13,15,17,19. (We hit the spike with the beaker job. With the beaker job, we reboot the host after installing the kernel-rt packages, applying tuned, and setting up hugepages.)

# cat /proc/interrupts

            CPU0       CPU1       CPU2     ...       CPU19
   0:        517          0          0     ...          0  IR-IO-APIC-edge      timer
   4:       1119        643          0     ...          0  IR-IO-APIC-edge      serial
   8:          1          0          0     ...          0  IR-IO-APIC-edge      rtc0
   9:         27       3177          0     ...          0  IR-IO-APIC-fasteoi   acpi
...


[root@dell-per430-10 ~]# ps aux | grep serial
root      2991  0.0  0.0      0     0 ?        S    02:29   0:00 [irq/4-serial]
...

[root@dell-per430-10 ~]# taskset -cp 2991
pid 2991's current affinity list: 1,3,5,7,9,11,13,15,17,19

[root@dell-per430-10 ~]# ps aux | grep acpi 
root       208  0.0  0.0      0     0 ?        S    02:29   0:00 [irq/9-acpi]
...

[root@dell-per430-10 ~]# taskset -cp 208
pid 208's current affinity list: 1,3,5,7,9,11,13,15,17,19


However, when I run manually, the serial and acpi affinity list is 0,2,4,6,8,10. (The latency results look good. I do this testing after shutting down and booting the RT host, since I re-used the RT host setup after the beaker job finished.)

[root@dell-per430-09 ~]# cat /proc/interrupts
            CPU0       CPU1       CPU2      ...  CPU19   
   0:        518          0          0      ...     0  IR-IO-APIC-edge      timer
   4:       1105         19          0      ...     0  IR-IO-APIC-edge      serial
   8:          1          0          0      ...     0  IR-IO-APIC-edge      rtc0
   9:        122          0          0      ...     0  IR-IO-APIC-fasteoi   acpi
...

[root@dell-per430-09 ~]# ps aux | grep serial
root      2639  0.0  0.0      0     0 ?        S    01:54   0:00 [irq/4-serial]
...

[root@dell-per430-09 ~]# taskset -cp 2639
pid 2639's current affinity list: 0,2,4,6,8,10

[root@dell-per430-09 ~]# ps aux | grep acpi
root       208  0.0  0.0      0     0 ?        S    01:54   0:00 [irq/9-acpi]

[root@dell-per430-09 ~]# taskset -cp 208
pid 208's current affinity list: 0,2,4,6,8,10
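
For quick comparison between the two hosts, a minimal sketch (assuming bash; not part of the original procedure) that lists every irq/ kernel thread together with its affinity in one pass:

for pid in $(pgrep 'irq/'); do
    printf '%-18s ' "$(cat /proc/$pid/comm)"   # thread name, e.g. irq/4-serial
    taskset -cp "$pid"                         # prints the current affinity list
done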


More additional info:
1. 
# lscpu (in rt host)
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               2297.378
BogoMIPS:              4594.75
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear spec_ctrl intel_stibp flush_l1d

2. Host isolated cores: 1,3,5,7,9,11,13,15,17,19,12,14,16,18
# cat /proc/cmdline (In rt host)
BOOT_IMAGE=/vmlinuz-3.10.0-1058.rt56.1018.el7.x86_64 root=/dev/mapper/rhel_dell--per430--10-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per430-10/root rd.lvm.lv=rhel_dell-per430-10/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on kvm-intel.vmentry_l1d_flush=never skew_tick=1 isolcpus=1,3,5,7,9,11,13,15,17,19,12,14,16,18 intel_pstate=disable nosoftlockup nohz=on nohz_full=1,3,5,7,9,11,13,15,17,19,12,14,16,18 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,12,14,16,18
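
A minimal sketch, assuming bash on the running host, to pull only the isolation-related parameters out of /proc/cmdline so the isolcpus, nohz_full, and rcu_nocbs lists can be compared at a glance:

tr ' ' '\n' < /proc/cmdline | grep -E '^(isolcpus|nohz_full|rcu_nocbs)='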

Comment 10 Luiz Capitulino 2019-06-25 18:32:26 UTC
I think we want to understand why this is happening. For example,
could this be a tuned issue?

Comment 11 Peter Xu 2019-06-26 04:22:00 UTC
(In reply to Luiz Capitulino from comment #10)
> I think we want to understand why this is happening. For example,
> could this be a tuned issue?

I agree.  Actually Pei is looking into what beaker has done to us, and I am trying to understand how these irqs are configured besides beaker.

I have recently been reading tuned, and I see that it should operate on IRQ SMP affinity whenever the "isolated_cores" parameter is provided in the profile config, so tuned should already move all these IRQ threads to the housekeeping CPUs explicitly, am I right? (I see this is done in _set_irq_affinity() in tuned/plugins/plugin_scheduler.py; I still cannot fully digest how the tuned daemon works there, but it looks that way because _isolated_cores() will finally call it.)

And if so, I would assume the same happens after a system reboot (by the tuned service, as long as we have it enabled), so logically the IRQs should be fine. That matches our manual test results where we don't have beaker. From that point of view, it seems tuned is working correctly.

Instead, I'm curious whether beaker runs any specific services/scripts after the tuned service that screw up the correct IRQ affinities.

Comment 12 Pei Zhang 2019-06-26 05:19:50 UTC
Hi Peter, Luiz,

After testing, I can confirm that the serial/acpi affinity issue is determined by tuned and that it affects the latency.

Their affinity defaults to 1,3,5,7,9,11,13,15,17,19 on my server; after applying tuned via [1], their affinity should change to the housekeeping cores, which should be 0,2,4,6,8,10.

For the beaker job, after executing [1], it mostly fails to change the affinity to the housekeeping cores in my recent testing, even though the tuned process is running fine on the host. However, I've found a workaround [2] that solves this issue in the beaker job: the affinity changes from the default value to the housekeeping cores. With this workaround, the 1h latency looks good. 3/3 pass.

So from my side, I'll continue to find out why tuned in the beaker job (mostly) cannot switch the affinity to the housekeeping cores the first time [1] is executed.

[1] # tuned-adm profile realtime-virtual-host
[2] Execute [1] -> reboot the host -> execute [1] again -> start KVM-RT testing.
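
Spelled out as commands, a sketch of the workaround [2] (assuming the profile name from [1] and the irq/4-serial thread used earlier as the check):

tuned-adm profile realtime-virtual-host     # [1], first application
systemctl reboot                            # reboot the host
tuned-adm profile realtime-virtual-host     # [1] again after the reboot
tuned-adm active                            # confirm the profile is applied
taskset -cp "$(pgrep irq/4-serial)"         # affinity should now be 0,2,4,6,8,10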

Thank you.

Best regards,

Pei

Comment 13 Peter Xu 2019-06-26 07:22:04 UTC
Created attachment 1584635 [details]
tuned.log

tuned log under /var/log/tuned/tuned.log

Comment 14 Peter Xu 2019-06-26 07:29:04 UTC
I've uploaded a tuned log from when the problem happened on the beaker system, and I verified Pei's finding that tuned does not apply the affinities correctly but that it can be fixed by a service restart.

Some timestamps for the log:

(1) at 02:15 - the host boots up, now the pinning is wrong:

[root@dell-per430-09 ~]# pgrep -fa serial                                                                                               
2950 irq/4-serial                                                                                                                       
[root@dell-per430-09 ~]# taskset -pc 2950                                                                                               
pid 2950's current affinity list: 1,3,5,7,9,11,13,15,17,19                                                                              

(2) at 03:01 - I restarted tuned.  After that, the pinning is correct:

[root@dell-per430-09 tuned]# systemctl restart tuned                
[root@dell-per430-09 tuned]# taskset -pc 2950                        
pid 2950's current affinity list: 0,2,4,6,8,10

I do see some errors. Should we convert some of the errors into warnings if they're not going to block tuned from continuing?  Meanwhile, I'm still trying to figure out what has really gone wrong.

Comment 15 Peter Xu 2019-06-27 07:52:31 UTC
Digging through tuned didn't help me a lot... after enabling tuned debugging (appending -D with tuned daemon) we can still see that the IRQ affinities were correctly set via the tuned logs.  So tuned is behaving good.

Finally this trick helped us to identify who's the culprit:

  ExecStartPre=/usr/bin/trace-cmd start -p function -l irq_affinity_proc_write -l irq_affinity_list_proc_write
  (appended to tuned.serivce to start tracing before tuned start; Pei helped me to inject this into beaker)

And with that, after re-invoking a beaker RT testcase we can see this in trace buffer:

  irqbalance-1717  [002] .......    41.573727: irq_affinity_proc_write <-proc_reg_write                                                                                  

So the problem seems to be simply that we forgot to disable irqbalance on the beaker machines...

I assume this is not a bug in our real-time portfolio but a bug in the automation script: we should disable the irqbalance service for RT tests, otherwise it will conflict with tuned.

Pei will fix up our automation script and see whether the issue goes away for good. If it works for us, then we can close this bug as NOTABUG.
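
For reference, a sketch of wiring the trace-cmd line above in before tuned starts, plus the proposed fix of keeping irqbalance off on RT test machines (the drop-in path and file name are chosen here for illustration):

# Drop-in so the function tracer is armed before tuned starts
# (mirrors the ExecStartPre line quoted above).
mkdir -p /etc/systemd/system/tuned.service.d
cat > /etc/systemd/system/tuned.service.d/trace.conf <<'EOF'
[Service]
ExecStartPre=/usr/bin/trace-cmd start -p function -l irq_affinity_proc_write -l irq_affinity_list_proc_write
EOF
systemctl daemon-reload
systemctl restart tuned

# Proposed automation fix: keep irqbalance off on RT test hosts.
systemctl stop irqbalance
systemctl disable irqbalance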

Comment 16 Peter Xu 2019-06-27 08:02:28 UTC
Or, better: should tuned try to detect irqbalance (or any other tool that might conflict with it)?

Also, tuned could at least try to detect any unexpected configuration change so it can log some errors. After all, we've got a daemon there, so verifying the configuration should be doable too.

All of these could be overkill, though... Luiz, do you think any of them would be a valid TODO for tuned?

Comment 18 Luiz Capitulino 2019-06-27 19:43:09 UTC
(In reply to Peter Xu from comment #15)
> Digging through tuned didn't help me a lot... after enabling tuned debugging
> (appending -D with tuned daemon) we can still see that the IRQ affinities
> were correctly set via the tuned logs.  So tuned is behaving good.
> 
> Finally this trick helped us to identify who's the culprit:
> 
>   ExecStartPre=/usr/bin/trace-cmd start -p function -l
> irq_affinity_proc_write -l irq_affinity_list_proc_write
>   (appended to tuned.serivce to start tracing before tuned start; Pei helped
> me to inject this into beaker)
> 
> And with that, after re-invoking a beaker RT testcase we can see this in
> trace buffer:
> 
>   irqbalance-1717  [002] .......    41.573727: irq_affinity_proc_write
> <-proc_reg_write                                                            
> 
> 
> So the problem seems to be simply that we forgot to disable irqbalance on
> the beaker machines...

Wonderfully done, Peter!! What a find!

I think there are two action items:

1. For the immediate term, we should add an entry to our KVM-RT guides about disabling
   (or not installing) irqbalance. I'll follow up about the guides by email so that you
   can make the change.

2. Since the KVM-RT tuned profiles are separate rpm packages, I'm wondering if we should
   add a Conflicts on irqbalance. But we have to find out what the implications of doing
   this are. It might even be something we want to do only for RHEL8. In any case, I think
   the best way to move forward with this second option is to start a downstream thread
   to see if anyone has objections.

I'm lowering the importance of the bug, since we found the root cause and have a workaround.

Comment 19 Pei Zhang 2019-07-01 02:01:37 UTC
After disabling irqbalance, the 24h cyclictest max latency looks good:

==Versions==
kernel-rt-3.10.0-1059.rt56.1019.el7.x86_64
tuned-2.11.0-5.el7.noarch
qemu-kvm-rhev-2.12.0-33.el7.x86_64
libvirt-4.5.0-23.el7.x86_64

==Results==
(1)Single VM with 1 rt vCPU:
# Min Latencies: 00007
# Avg Latencies: 00010
# Max Latencies: 00020

(2)Single VM with 8 rt vCPUs:
# Min Latencies: 00008 00010 00010 00010 00010 00010 00010 00010
# Avg Latencies: 00011 00010 00010 00010 00011 00010 00010 00010
# Max Latencies: 00020 00017 00039 00027 00029 00027 00026 00028

(3)Multiple VMs each with 1 rt vCPU:
- VM1
# Min Latencies: 00007
# Avg Latencies: 00010
# Max Latencies: 00029

- VM2
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00032

- VM3
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00029

- VM4
# Min Latencies: 00007
# Avg Latencies: 00011
# Max Latencies: 00032

Comment 20 Luiz Capitulino 2019-07-03 14:34:25 UTC
On further investigation, we found out that this issue has always existed (i.e., it is not a regression) and that the solution should be implemented in tuned. Peter Xu will follow up with more details.

Comment 21 Peter Xu 2019-07-08 08:26:52 UTC
I've sent a PR to partially fix this problem (as suggested by Luiz):

https://github.com/redhat-performance/tuned/pull/194

I'll wait for the maintainers' feedback to see how we should move forward with this.

Comment 25 Luiz Capitulino 2019-07-12 20:29:44 UTC
Peter, would you just remind me how your fix solves the problem? It is also
a good idea to have it documented here.

Comment 26 Peter Xu 2019-07-13 00:51:28 UTC
(In reply to Luiz Capitulino from comment #25)
> Peter, would you just remind me how your fix solves the problem? It is also
> a good idea to have it documented here.

Hi, Luiz,

The solution was to let the two realtime tuned profiles use IRQBALANCE_BANNED_CPUS (just like the cpu-partitioning profile) to tell the irqbalance service that it should not schedule any interrupt on our realtime cores.

Thanks,
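
For illustration, a minimal sketch (assuming bash and the isolated-core list of the test host) of how such a banned-CPUs mask is built: one bit per banned CPU, printed as hex. irqbalance picks the variable up from /etc/sysconfig/irqbalance.

isolated="1 3 5 7 9 11 12 13 14 15 16 17 18 19"   # host isolated cores
mask=0
for c in $isolated; do
    mask=$(( mask | (1 << c) ))                   # set one bit per banned CPU
done
printf 'IRQBALANCE_BANNED_CPUS=%08x\n' "$mask"    # -> 000ffaaa for this list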

Comment 35 Pei Zhang 2020-03-26 08:49:35 UTC
It seems this issue still exists with tuned-2.11.0-9.el7.noarch:

With the irqbalance service enabled, the acpi/serial IRQ threads are still bound to some isolated cores. We expect them to be bound to the housekeeping cores.

(1) The serial IRQ thread is bound to some host isolated cores.
# ps aux | grep serial
root      2938  0.0  0.0      0     0 ?        S    00:23   0:00 [irq/4-serial]

# taskset -cp 2938
pid 2938's current affinity list: 1,3,5,7,9,11,13,15,17,19


(2) The acpi IRQ thread is bound to some host isolated cores.
# ps aux | grep acpi
root       208  0.0  0.0      0     0 ?        S    00:23   0:00 [irq/9-acpi]
# taskset -cp 208
pid 208's current affinity list: 1,3,5,7,9,11,13,15,17,19


More info about host config:

# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-1130.rt56.1101.el7.x86_64 root=/dev/mapper/rhel_dell--per430--11-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per430-11/root rd.lvm.lv=rhel_dell-per430-11/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 default_hugepagesz=1G iommu=pt intel_iommu=on skew_tick=1 isolcpus=1,3,5,7,9,11,13,15,17,19,12,14,16,18 intel_pstate=disable nosoftlockup nohz=on nohz_full=1,3,5,7,9,11,13,15,17,19,12,14,16,18 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,12,14,16,18 spectre_v2=off nopti kvm-intel.vmentry_l1d_flush=never

# systemctl status irqbalance
● irqbalance.service - irqbalance daemon
   Loaded: loaded (/usr/lib/systemd/system/irqbalance.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-03-26 00:23:25 EDT; 4h 15min ago
 Main PID: 1647 (irqbalance)
    Tasks: 1
   CGroup: /system.slice/irqbalance.service
           └─1647 /usr/sbin/irqbalance --foreground

Mar 26 00:23:25 dell-per430-11.lab.eng.pek2.redhat.com systemd[1]: Started irqbalance daemon.

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               2297.261
BogoMIPS:              4594.52
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear spec_ctrl intel_stibp flush_l1d

# cat /etc/sysconfig/irqbalance 
...

IRQBALANCE_BANNED_CPUS=000ffaaa
                       ^^^^^^^^
                      (This represents all isolated cores: 1,3,5,7,9,11,12,13,14,15,16,17,18,19)
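
As a quick check, a sketch (bash) that decodes the mask above back into a CPU list:

mask=0x000ffaaa
for c in $(seq 0 19); do
    (( (mask >> c) & 1 )) && printf '%d,' "$c"    # print CPUs whose bit is set
done; echo
# prints: 1,3,5,7,9,11,12,13,14,15,16,17,18,19,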

Comment 36 Jaroslav Škarvada 2020-03-26 18:10:02 UTC
(In reply to Pei Zhang from comment #35)

Hmm, no idea what to do more on the Tuned side.

Comment 37 Jaroslav Škarvada 2020-03-26 18:15:06 UTC
(In reply to Jaroslav Škarvada from comment #36)
> (In reply to Pei Zhang from comment #35)
> 
> Hmm, no idea what to do more on the Tuned side.

Maybe the irqbalance service needs to be restarted? IIRC we are not doing that in the Tuned profile. In that case, a machine reboot could fix it (just guessing).
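
A sketch of how that guess could be checked on the affected host (PID 2938 is the irq/4-serial thread from comment 35):

systemctl restart irqbalance                      # re-read IRQBALANCE_BANNED_CPUS
grep IRQBALANCE_BANNED_CPUS /etc/sysconfig/irqbalance
taskset -cp 2938                                  # irq/4-serial thread affinity
cat /proc/irq/4/smp_affinity_list                 # IRQ-level affinity for irq 4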

Comment 38 Ondřej Lysoněk 2020-04-23 16:33:14 UTC
Pei, Peter, it looks to me like irqbalance sets the list of banned CPUs automatically, even without Tuned intervention. Please see bug 1784645, comments 11-13. Perhaps you're hitting some other issue?

Comment 39 Peter Xu 2020-04-23 18:11:24 UTC
Hi, Ondřej,

(In reply to Ondřej Lysoněk from comment #38)
> Pei, Peter, it looks to me like irqbalance sets the list of banned CPUs
> automatically, even without Tuned intervention. 

I noticed that you mentioned: "RHEL-7.3: there seems to be a bug - the list of banned CPUs contains one more CPU than is necessary" in bug 1784645 comment 13.  Could that still be a problem?  Do you mean what we've encountered is an irqbalance bug rather than anything to do with tuned?

Thanks,

Comment 40 Ondřej Lysoněk 2020-04-24 08:07:27 UTC
(In reply to Peter Xu from comment #39)
> Hi, Ondřej,
> 
> (In reply to Ondřej Lysoněk from comment #38)
> > Pei, Peter, it looks to me like irqbalance sets the list of banned CPUs
> > automatically, even without Tuned intervention. 
> 
> I noticed that you mentioned: "RHEL-7.3: there seems to be a bug - the list
> of banned CPUs contains one more CPU than is necessary" in bug 1784645
> comment 13.  Could that still be a problem?  Do you mean what we've
> encountered is an irqbalance bug rather than anything to do with tuned?

Hard to say. I haven't looked into what the problem is. I think the particular issue that I ran into does not explain it - in that case, the effective list of banned CPUs is a superset of the expected banned CPUs, which is OK, as far as I understand it.

But of course there could be some other irqbalance bug that I haven't discovered. So it is a possibility.

Comment 44 errata-xmlrpc 2020-09-29 19:36:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3884

