Bug 1797025

Summary: Support "managed_irq" in "isolcpus=" parameter
Product: Red Hat Enterprise Linux 8
Reporter: Peter Xu <peterx>
Component: tuned
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA
QA Contact: Robin Hack <rhack>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.2
CC: fbaudin, jeder, jskarvad, lcapitulino, lmiksik, olysonek, pezhang, rhack
Target Milestone: rc
Keywords: Patch, Upstream
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tuned-2.13.0-6.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-28 16:59:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1783026
Bug Blocks: 1640832

Description Peter Xu 2020-01-31 18:47:29 UTC
Kernel commit:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=11ea68f553e244851d15793a7fa33a97c46d8271

Tuned needs to understand this new sub-parameter and apply it wherever we use isolated_cores.  In other words, we should switch the kernel parameter usage from:

  isolcpus=X-Y

to:

  isolcpus=managed_irq,X-Y

so that the isolated cores (X-Y) are not affected by kernel-managed IRQs, which can cause extra latency spikes.

Comment 1 Peter Xu 2020-01-31 21:21:08 UTC
I've raised a question here which could be a challenge...

https://bugzilla.redhat.com/show_bug.cgi?id=1783026#c29

I think one solution could be that we offer another parameter in tuned, like in realtime/realtime-variables.conf:

# Examples:
#
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# Set this when we want to move kernel managed IRQs out of isolated
# cores.  Note that this requires kernel support.  Please only specify
# this parameter if you are sure that the kernel supports it.
#
# isolate_managed_irq=Y

Then we let the user choose whether to enable this.  I'm not sure whether this is the best way, though.
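
To illustrate the proposal, here is a minimal sketch of how such a variables file could be read. It is not tuned's actual parser: the helper names `parse_variables` and `wants_managed_irq` are hypothetical, and the set of accepted "true" spellings is an assumption. Only the `isolated_cores` and `isolate_managed_irq` keys come from the proposal above.

```python
def parse_variables(text):
    """Parse simple key=value lines, skipping blanks, '#' comments and [sections]."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def wants_managed_irq(values):
    """Assumption: Y/yes/1/true (any case) enable the option; default is off."""
    flag = values.get("isolate_managed_irq", "N")
    return flag.strip().lower() in {"y", "yes", "1", "true"}


sample = """\
# isolated_cores=2,4-7
isolated_cores=2-23
isolate_managed_irq=Y
"""

values = parse_variables(sample)
print(values["isolated_cores"], wants_managed_irq(values))
```

Leaving the option off by default matches the idea that the user opts in only when the kernel is known to support the sub-parameter.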

Comment 2 Peter Xu 2020-03-12 16:55:59 UTC
I just found a side effect of the managed_irq sub-parameter.  When the user specifies isolate_managed_irq=Y, instead of using:

  isolcpus=managed_irq,X-Y

I think we need the additional "domain" flag, as in:

  isolcpus=domain,managed_irq,X-Y

to make sure we keep HK_FLAG_DOMAIN set in the kernel.

Let me explain...

The kernel plays a trick with the isolcpus= parameter: if no sub-parameter is given at all, it applies the default one, which is "domain" (HK_FLAG_DOMAIN).  If some sub-parameter is specified (in our case, managed_irq), it does not apply the default sub-parameter and uses only what is specified.

Before managed_irq, we were using "isolcpus=X-Y", which implies "isolcpus=domain,X-Y".

So with managed_irq, if we want to keep the same behavior as before while also applying the managed IRQ logic, what we really need here is "isolcpus=domain,managed_irq,X-Y".
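
The defaulting rule described above can be modelled in a few lines. This is a simplified Python model of the behavior as explained in this comment, not the kernel code itself; the function name `parse_isolcpus` is hypothetical, and the flag list (nohz, domain, managed_irq) is my assumption about what the kernel accepts at this point.

```python
# Assumed set of sub-parameters accepted in front of the CPU list.
KNOWN_FLAGS = ("nohz", "domain", "managed_irq")


def parse_isolcpus(value):
    """Split an isolcpus= value into (flags, cpu_list).

    Leading "flag," prefixes are consumed one by one.  If none is given,
    the default "domain" flag is applied, as described in the text.
    """
    rest = value
    flags = set()
    while True:
        for flag in KNOWN_FLAGS:
            prefix = flag + ","
            if rest.startswith(prefix):
                flags.add(flag)
                rest = rest[len(prefix):]
                break
        else:
            # No known flag prefix left; the remainder is the CPU list.
            break
    if not flags:
        flags.add("domain")
    return flags, rest


for example in ("2-9", "managed_irq,2-9", "domain,managed_irq,2-9"):
    print(example, "->", parse_isolcpus(example))
```

In this model "managed_irq,2-9" ends up without "domain", which is exactly the side effect in question.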

Verifying this is easy: HK_FLAG_DOMAIN governs the scheduler domains.  So if we only use "isolcpus=managed_irq,X-Y", we should observe that even our bash shell is allowed to run on the isolated cores.  Just log in to any shell and try:

  $ taskset -pc $$

The correct result should not contain any isolated cores.
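
The same check can be scripted. Below is a rough sketch, assuming the isolated set is known as a CPU list string (the "2-9" value is only an example); `parse_cpu_list` and `leaked_cpus` are hypothetical helper names. `os.sched_getaffinity` is available on Linux only, hence the guard.

```python
import os


def parse_cpu_list(spec):
    """Expand a CPU list such as "2,4-7" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        lo, sep, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi if sep else lo) + 1))
    return cpus


def leaked_cpus(affinity, isolated_spec):
    """Return the isolated CPUs that still appear in the given affinity set."""
    return sorted(set(affinity) & parse_cpu_list(isolated_spec))


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    isolated = "2-9"  # example value; use your own isolated_cores setting
    leaked = leaked_cpus(os.sched_getaffinity(0), isolated)
    print("isolated CPUs in current affinity:", leaked or "none")
```

An empty result corresponds to the "correct" taskset output, where no isolated core is listed.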

Pei, you can check on your test machines whether we have this problem when using the managed_irq sub-parameter.

Comment 3 Pei Zhang 2020-03-13 05:13:00 UTC
(In reply to Peter Xu from comment #2)
> I just found a side-effect of managed_irq sub-parameter.  When user
> specified isolate_managed_irq=Y, instead of using:
> 
>   isolcpus=managed_irq,X-Y
> 
> I think we need an extra of:
> 
>   isolcpus=domain,managed_irq,X-Y
> 
> To make sure we keep the HK_FLAG_DOMAIN in kernel.
> 
> Let me explain...
> 
> The kernel played a trick with the isolcpus= parameter in that if there's no
> sub-parameter at all, then it'll apply the default one, which is "domain"
> (HK_FLAG_DOMAIN).  While if there is some sub-parameter specified (in our
> case, managed_irq), then it'll not apply the default sub-parameter but use
> what is specified.
> 
> Before the managed_irq thing, we're using "isolcpus=X-Y" which implies
> "isolcpus=domain,X-Y".
> 
> So, after the managed_irq, if we want to keep the same behavior as before,
> but also apply the managed irq logic, what we really need here is
> "isolcpus=domain,managed_irq,X-Y".
> 
> Verify this is easy: HK_FLAG_DOMAIN governs the schedule domain.  So if
> we're only with "isolcpus=managed_irq,X-Y", we should observe that even our
> bash will be put into the isolation domain.  Just login to any shell, and
> try:
> 
>   $ taskset -pc $$
> 
> The correct result should not contain any isolated cores.
> 
> Pei, you can have a look on your test machines to see whether we have this
> problem after using the managed_irq sub-param.

Peter,

After using "isolcpus=managed_irq,X-Y", it seems to show the correct result.

In guest:

# lscpu
...
NUMA node0 CPU(s):   0-9
...

# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-187.rt13.45.el8bz1779046.x86_64 root=/dev/mapper/rhel_vm--74--105-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto resume=/dev/mapper/rhel_vm--74--105-swap rd.lvm.lv=rhel_vm-74-105/root rd.lvm.lv=rhel_vm-74-105/swap skew_tick=1 isolcpus=2,3,4,5,6,7,8,9 intel_pstate=disable nosoftlockup nohz=on nohz_full=2,3,4,5,6,7,8,9 rcu_nocbs=2,3,4,5,6,7,8,9 default_hugepagesz=1G iommu=pt intel_iommu=on tsc=nowatchdog skew_tick=1 isolcpus=managed_irq,2,3,4,5,6,7,8,9 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=2,3,4,5,6,7,8,9 rcu_nocbs=2,3,4,5,6,7,8,9


# taskset -pc $$
pid 9691's current affinity list: 0,1
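
Note that the command line above contains isolcpus= twice, once without any sub-parameter and once with managed_irq. A small sketch for listing every occurrence is shown here; the helper name `isolcpus_entries` is hypothetical and the sample string is a shortened excerpt of the output above. How the kernel combines repeated occurrences is not established in this thread, so the sketch only reports them.

```python
def isolcpus_entries(cmdline):
    """Return the value of every isolcpus= token, in command-line order."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("isolcpus=")
    ]


# Shortened excerpt of the /proc/cmdline output shown above.
cmdline = (
    "skew_tick=1 isolcpus=2,3,4,5,6,7,8,9 nohz=on "
    "nohz_full=2,3,4,5,6,7,8,9 isolcpus=managed_irq,2,3,4,5,6,7,8,9"
)

for entry in isolcpus_entries(cmdline):
    print("isolcpus=" + entry)
```

On a live system the input would come from reading /proc/cmdline instead of the sample string.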

Comment 4 Peter Xu 2020-03-13 20:15:51 UTC
Good to know it's not a problem downstream. (Time to dig into the reason, but maybe next week :)

We probably still need that for upstream, unless I have missed something...  Maybe this can be confirmed when we work on this bz.  In any case, it shouldn't hurt to add "domain" too, because it should not break older setups: it has been the first sub-parameter and the default one from the very beginning, so it should simply keep the same behavior.

Comment 5 Luiz Capitulino 2020-03-17 16:50:55 UTC
Requesting exception, since a complete solution for bug 1783026
depends on this.

Comment 8 Jaroslav Škarvada 2020-03-17 17:13:26 UTC
I am OK with the 8.2 respin.

Comment 9 Jaroslav Škarvada 2020-03-17 17:16:51 UTC
Is this request about conditional or unconditional addition of the managed_irq? I.e. comment 1?

Comment 11 Peter Xu 2020-03-17 17:58:24 UTC
(In reply to Jaroslav Škarvada from comment #9)
> Is this request about conditional or unconditional addition of the
> managed_irq? I.e. comment 1?

Conditional.  Meanwhile, please have a look at comments 2-4, which I think we'd better still follow for upstream kernels (again, I haven't dug into why downstream isn't affected, but I think it should affect upstream kernels; it would be good if someone else could verify this too)...

So I think this is the summary:

- If "isolate_managed_irq=Y" is specified, then append sub-parameters "managed_irq,domain", as:

  isolcpus=managed_irq,domain,X-Y

  The "domain" flag is mainly for keeping the old behavior, since "isolcpus=X-Y" implicitly means "isolcpus=domain,X-Y".

- If "isolate_managed_irq=N" (default) is specified, then keeping the isolcpus= parameter as is would be fine:

  isolcpus=X-Y
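
The two cases in this summary can be sketched as a small function. This is only an illustration of the rule stated above, not the code from the tuned change; `build_isolcpus` is a hypothetical name.

```python
def build_isolcpus(isolated_cores, isolate_managed_irq=False):
    """Build the isolcpus= kernel argument for the two cases in the summary.

    With isolate_managed_irq enabled, both "managed_irq" and "domain" are
    put in front of the CPU list; otherwise the plain form is kept.
    """
    if isolate_managed_irq:
        return "isolcpus=managed_irq,domain," + isolated_cores
    return "isolcpus=" + isolated_cores


print(build_isolcpus("2-9", isolate_managed_irq=True))
print(build_isolcpus("2-9"))
```

Keeping the plain form when the option is off means kernels without managed_irq support never see the new sub-parameter.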

Thanks,

Comment 13 Jaroslav Škarvada 2020-03-20 17:06:35 UTC
https://github.com/redhat-performance/tuned/pull/255

Comment 25 errata-xmlrpc 2020-04-28 16:59:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1883