Description of problem:
When CPU isolation is in use, for example via the cpu-partitioning tuned profile, isolated CPUs can still be interrupted by kernel IPIs initiated from non-isolated CPUs. This can happen in many different ways; a few instances have been diagnosed with the rt-trace-bpf tool:
caused by NetworkManager:
64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
smp_call_function_many_cond+0x1
smp_call_function+0x39
on_each_cpu+0x2a
flush_tlb_kernel_range+0x7b
__purge_vmap_area_lazy+0x70
_vm_unmap_aliases.part.42+0xdf
change_page_attr_set_clr+0x16a
set_memory_ro+0x26
bpf_int_jit_compile+0x2f9
bpf_prog_select_runtime+0xc6
bpf_prepare_filter+0x523
sk_attach_filter+0x13
sock_setsockopt+0x92c
__sys_setsockopt+0x16a
__x64_sys_setsockopt+0x20
do_syscall_64+0x87
entry_SYSCALL_64_after_hwframe+0x65
caused by the mgag200 kernel module:
238903.096535737 kworker/0:1 0 88579 smp_call_function_many_cond (cpu=0, func=do_flush_tlb_all)
smp_call_function_many_cond+0x1
smp_call_function+0x39
on_each_cpu+0x2a
flush_tlb_kernel_range+0x48
__purge_vmap_area_lazy+0x70
free_vmap_area_noflush+0xf2
remove_vm_area+0x93
__vunmap+0x59
drm_gem_shmem_vunmap+0x6d
mgag200_handle_damage+0x62
mgag200_simple_display_pipe_update+0x69
drm_atomic_helper_commit_planes+0xb3
drm_atomic_helper_commit_tail+0x26
commit_tail+0xc6
drm_atomic_helper_commit+0x103
drm_atomic_helper_dirtyfb+0x20e
drm_fb_helper_damage_work+0x228
process_one_work+0x18f
worker_thread+0x30
kthread+0x15d
ret_from_fork+0x1f
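Both traces converge on the same underlying mechanism: lazy vmap purging ends in flush_tlb_kernel_range(), which on x86 broadcasts the flush to every online CPU via on_each_cpu(), with no regard for CPU isolation. A simplified sketch of that sender-side path, condensed from upstream arch/x86/mm/tlb.c (details may differ in this exact kernel build):

/* Condensed from arch/x86/mm/tlb.c (upstream); illustrative only. */
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	/* Large ranges fall back to a full TLB flush on every CPU. */
	if (end == TLB_FLUSH_ALL ||
	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
		/* IPIs every online CPU -- isolated or not. */
		on_each_cpu(do_flush_tlb_all, NULL, 1);
	} else {
		struct flush_tlb_info info;

		info.start = start;
		info.end = end;
		/* Same problem: the target mask is effectively
		 * cpu_online_mask, so isolated CPUs are hit too. */
		on_each_cpu(do_kernel_range_flush, &info, 1);
	}
}

This matches the two callbacks seen in the traces above: do_kernel_range_flush for the NetworkManager/BPF case and do_flush_tlb_all for the mgag200 case.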
Tracing on the isolated CPUs shows preemptions such as this:
58118.769286 | 18) <...>-128143 | | smp_call_function_interrupt() {
58118.769286 | 18) <...>-128143 | | irq_enter() {
58118.769287 | 18) <...>-128143 | 0.101 us | preempt_count_add();
58118.769288 | 18) <...>-128143 | 0.968 us | }
58118.769288 | 18) <...>-128143 | | generic_smp_call_function_single_interrupt() {
58118.769289 | 18) <...>-128143 | | flush_smp_call_function_queue() {
58118.769289 | 18) <...>-128143 | | do_flush_tlb_all() {
58118.769290 | 18) <...>-128143 | 0.453 us | native_flush_tlb_global();
58118.769291 | 18) <...>-128143 | 1.439 us | }
58118.769292 | 18) <...>-128143 | 2.402 us | }
58118.769292 | 18) <...>-128143 | 3.223 us | }
58118.769292 | 18) <...>-128143 | | irq_exit() {
58118.769293 | 18) <...>-128143 | 0.077 us | preempt_count_sub();
58118.769294 | 18) <...>-128143 | 0.201 us | idle_cpu();
58118.769295 | 18) <...>-128143 | | tick_nohz_irq_exit() {
58118.769295 | 18) <...>-128143 | 0.164 us | ktime_get();
58118.769296 | 18) <...>-128143 | | __tick_nohz_full_update_tick() {
58118.769296 | 18) <...>-128143 | 0.079 us | check_tick_dependency();
58118.769297 | 18) <...>-128143 | 0.074 us | check_tick_dependency();
58118.769298 | 18) <...>-128143 | 0.070 us | check_tick_dependency();
58118.769299 | 18) <...>-128143 | 0.101 us | check_tick_dependency();
58118.769300 | 18) <...>-128143 | 1.458 us | tick_nohz_next_event();
58118.769302 | 18) <...>-128143 | 0.082 us | tick_nohz_stop_tick();
58118.769303 | 18) <...>-128143 | 6.229 us | }
58118.769303 | 18) <...>-128143 | 8.124 us | }
58118.769303 | 18) <...>-128143 | + 10.872 us | }
58118.769304 | 18) <...>-128143 | + 17.471 us | }
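On the receive side, the IPI lands in smp_call_function_interrupt(), which drains the per-CPU call-function queue and runs the requested callback (here do_flush_tlb_all()) in hard-IRQ context on the interrupted CPU. A condensed sketch of that path, based on upstream kernel/smp.c (again illustrative, not this exact build):

/* Condensed from kernel/smp.c (upstream); illustrative only. */
static void flush_smp_call_function_queue(bool warn_cpu_offline)
{
	struct llist_head *head = this_cpu_ptr(&call_single_queue);
	struct llist_node *entry = llist_del_all(head);
	call_single_data_t *csd, *csd_next;

	entry = llist_reverse_order(entry);

	/* Every queued callback -- here do_flush_tlb_all() -- runs
	 * in hard-IRQ context on the interrupted (isolated) CPU. */
	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
		smp_call_func_t func = csd->func;
		void *info = csd->info;

		csd_unlock(csd);
		func(info);
	}
}

The 17.471 us total in the trace is this queue drain plus the irq_enter()/irq_exit() and nohz tick bookkeeping around it, which is exactly the latency spike the isolated workload observes.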
Version-Release number of selected component (if applicable):
4.18.0-348.12.2.rt7.143.el8_5.x86_64
How reproducible:
Easily
Steps to Reproduce:
1. Boot the system using an RT kernel and the cpu-partitioning tuned profile
2. Run a workload that measures latency, such as oslat, on the isolated CPUs
3. Trace the kernel activity on the isolated CPUs while the workload is running
Actual results:
Latency spikes caused by IPI processing are observed on the isolated CPUs, even though there is no need to handle the IPI at that moment.
Expected results:
No needless IPI processing should occur on the isolated CPUs -- for example, for a 100% userspace workload such as oslat, there is no need to enter the kernel and service the IPI until a necessary kernel entry occurs anyway (i.e. a system call, timer interrupt, etc.).
Additional info:
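To illustrate the deferral described under Expected results, here is a minimal, purely hypothetical sketch -- all identifiers such as deferred_tlb_flush and kernel_entry_check_deferred_flush are invented for illustration, and this is not a proposed patch. The idea: instead of IPIing isolated CPUs immediately, record the pending flush per CPU and consume it at the next natural kernel entry.

/* HYPOTHETICAL sketch only -- these identifiers are invented and do
 * not come from any real kernel tree. */
static DEFINE_PER_CPU(bool, deferred_tlb_flush);

void flush_tlb_kernel_range_deferred(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		if (housekeeping_cpu(cpu, HK_FLAG_TICK)) {
			/* Housekeeping CPUs take the IPI as before. */
			smp_call_function_single(cpu, do_flush_tlb_all,
						 NULL, 1);
		} else {
			/* Isolated CPUs: record the flush instead of
			 * interrupting a 100% userspace workload. */
			per_cpu(deferred_tlb_flush, cpu) = true;
		}
	}
}

/* Would run on every kernel entry (syscall, interrupt, exception)
 * before any vmalloc-area access can happen. */
void kernel_entry_check_deferred_flush(void)
{
	if (this_cpu_read(deferred_tlb_flush)) {
		this_cpu_write(deferred_tlb_flush, false);
		__flush_tlb_all();
	}
}

A real implementation would have to guarantee the pending flush is consumed before any access to the affected kernel mappings, which is what makes this hard in practice; the sketch above only shows the intended behavior for the oslat-style, 100% userspace case.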