Bug 2061524 - unnecessary kernel IPIs break through cpu isolation
Summary: unnecessary kernel IPIs break through cpu isolation
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Assignee: Valentin Schneider
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-07 18:35 UTC by Karl Rister
Modified: 2023-08-08 13:14 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-114746 0 None None None 2022-03-07 18:39:27 UTC

Description Karl Rister 2022-03-07 18:35:36 UTC
Description of problem:

When CPU isolation is used, such as via the cpu-partitioning tuned profile, it is possible for isolated CPUs to be interrupted by kernel IPIs initiated from non-isolated CPUs.  This can happen in many different ways; a few have been diagnosed using the rt-trace-bpf tool:

caused by NetworkManager:
64359.052209596    NetworkManager       0    1405     smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
        smp_call_function_many_cond+0x1
        smp_call_function+0x39
        on_each_cpu+0x2a
        flush_tlb_kernel_range+0x7b
        __purge_vmap_area_lazy+0x70
        _vm_unmap_aliases.part.42+0xdf
        change_page_attr_set_clr+0x16a
        set_memory_ro+0x26
        bpf_int_jit_compile+0x2f9
        bpf_prog_select_runtime+0xc6
        bpf_prepare_filter+0x523
        sk_attach_filter+0x13
        sock_setsockopt+0x92c
        __sys_setsockopt+0x16a
        __x64_sys_setsockopt+0x20
        do_syscall_64+0x87
        entry_SYSCALL_64_after_hwframe+0x65
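The NetworkManager trace above starts from an ordinary setsockopt() call that attaches a classic BPF socket filter. As a hedged illustration (not part of the report), the same sk_attach_filter -> bpf_prog_select_runtime path can be exercised from unprivileged userspace; the SO_ATTACH_FILTER constant and struct layouts below are the standard Linux ones, and the cross-CPU flush only occurs when the BPF JIT is enabled (net.core.bpf_jit_enable=1), since it is setting the JITed image read-only that ends in flush_tlb_kernel_range():

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26  # Linux value from <asm-generic/socket.h>

# One-instruction classic BPF program: BPF_RET | BPF_K, accept 0xffff bytes.
# struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
insns = struct.pack("HBBI", 0x06, 0, 0, 0xFFFF)
buf = ctypes.create_string_buffer(insns, len(insns))

# struct sock_fprog { unsigned short len; struct sock_filter *filter; }
# (native struct packing inserts the padding the kernel expects)
fprog = struct.pack("HL", 1, ctypes.addressof(buf))

# Attaching the filter triggers sk_attach_filter() -> bpf_prepare_filter()
# in the kernel -- the top of the stack shown in the trace above.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
s.close()
```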

caused by the mgag200 kernel module:
238903.096535737   kworker/0:1          0    88579    smp_call_function_many_cond (cpu=0, func=do_flush_tlb_all)
        smp_call_function_many_cond+0x1
        smp_call_function+0x39
        on_each_cpu+0x2a
        flush_tlb_kernel_range+0x48
        __purge_vmap_area_lazy+0x70
        free_vmap_area_noflush+0xf2
        remove_vm_area+0x93
        __vunmap+0x59
        drm_gem_shmem_vunmap+0x6d
        mgag200_handle_damage+0x62
        mgag200_simple_display_pipe_update+0x69
        drm_atomic_helper_commit_planes+0xb3
        drm_atomic_helper_commit_tail+0x26
        commit_tail+0xc6
        drm_atomic_helper_commit+0x103
        drm_atomic_helper_dirtyfb+0x20e
        drm_fb_helper_damage_work+0x228
        process_one_work+0x18f
        worker_thread+0x30
        kthread+0x15d
        ret_from_fork+0x1f

Tracing on the isolated CPUs shows preemptions such as this:

58118.769286 |   18)  <...>-128143  |               |  smp_call_function_interrupt() {
58118.769286 |   18)  <...>-128143  |               |    irq_enter() {
58118.769287 |   18)  <...>-128143  |   0.101 us    |      preempt_count_add();
58118.769288 |   18)  <...>-128143  |   0.968 us    |    }
58118.769288 |   18)  <...>-128143  |               |    generic_smp_call_function_single_interrupt() {
58118.769289 |   18)  <...>-128143  |               |      flush_smp_call_function_queue() {
58118.769289 |   18)  <...>-128143  |               |        do_flush_tlb_all() {
58118.769290 |   18)  <...>-128143  |   0.453 us    |          native_flush_tlb_global();
58118.769291 |   18)  <...>-128143  |   1.439 us    |        }
58118.769292 |   18)  <...>-128143  |   2.402 us    |      }
58118.769292 |   18)  <...>-128143  |   3.223 us    |    }
58118.769292 |   18)  <...>-128143  |               |    irq_exit() {
58118.769293 |   18)  <...>-128143  |   0.077 us    |      preempt_count_sub();
58118.769294 |   18)  <...>-128143  |   0.201 us    |      idle_cpu();
58118.769295 |   18)  <...>-128143  |               |      tick_nohz_irq_exit() {
58118.769295 |   18)  <...>-128143  |   0.164 us    |        ktime_get();
58118.769296 |   18)  <...>-128143  |               |        __tick_nohz_full_update_tick() {
58118.769296 |   18)  <...>-128143  |   0.079 us    |          check_tick_dependency();
58118.769297 |   18)  <...>-128143  |   0.074 us    |          check_tick_dependency();
58118.769298 |   18)  <...>-128143  |   0.070 us    |          check_tick_dependency();
58118.769299 |   18)  <...>-128143  |   0.101 us    |          check_tick_dependency();
58118.769300 |   18)  <...>-128143  |   1.458 us    |          tick_nohz_next_event();
58118.769302 |   18)  <...>-128143  |   0.082 us    |          tick_nohz_stop_tick();
58118.769303 |   18)  <...>-128143  |   6.229 us    |        }
58118.769303 |   18)  <...>-128143  |   8.124 us    |      }
58118.769303 |   18)  <...>-128143  | + 10.872 us   |    }
58118.769304 |   18)  <...>-128143  | + 17.471 us   |  }
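To quantify the cost of these interruptions across a long function_graph capture, the reported durations can be pulled out of trace lines like the ones above. A hypothetical post-processing sketch (not part of the report):

```python
import re

# function_graph durations appear as e.g. "0.453 us" or "+ 17.471 us";
# the "+" marker on slow entries is not part of the number itself.
DURATION_RE = re.compile(r"([\d.]+) us")

def durations_us(trace_text):
    """Return every duration (in microseconds) reported in the trace."""
    return [float(m.group(1)) for m in DURATION_RE.finditer(trace_text)]

# Two lines taken from the trace above.
trace = """\
58118.769290 |   18)  <...>-128143  |   0.453 us    |          native_flush_tlb_global();
58118.769304 |   18)  <...>-128143  | + 17.471 us   |  }
"""
print(max(durations_us(trace)))  # prints 17.471
```

Here the outermost smp_call_function_interrupt() entry dominates, matching the 17.471 us total shown in the trace.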




Version-Release number of selected component (if applicable):

4.18.0-348.12.2.rt7.143.el8_5.x86_64


How reproducible:

Easily


Steps to Reproduce:
1.  Boot the system using an RT kernel and the cpu-partitioning tuned profile
2.  Run a workload that measures latency, such as oslat, on the isolated CPUs
3.  Trace the kernel activity on the isolated CPUs while the workload is running
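For step 3, tracing tools that take a CPU mask (e.g. trace-cmd record -M, or taskset for pinning the workload) need the isolated CPU list -- the same kernel-style list syntax used by the profile's isolated_cores setting -- converted to hex. A minimal sketch of that conversion (hypothetical helper, not from the report):

```python
def cpulist_to_mask(cpulist):
    """Expand a kernel-style CPU list such as '2-5,8' into the hex
    bitmask form accepted by mask-taking tools."""
    mask = 0
    for part in cpulist.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            for cpu in range(int(lo), int(hi) + 1):
                mask |= 1 << cpu
        else:
            mask |= 1 << int(part)
    return hex(mask)

print(cpulist_to_mask("2-5,8"))  # prints 0x13c
```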


Actual results:

Latency spikes caused by IPI processing are observed on the isolated CPUs, even though there is no need to handle the IPI at that moment.


Expected results:

No needless IPI processing should occur on the isolated CPUs -- for example, for a 100% userspace workload such as oslat there is no need to enter the kernel and service the IPI until a necessary kernel entry occurs anyway (e.g. a system call or timer interrupt).


Additional info:

