Bug 1697531
Summary: x86 kprobe optimization causes rcu hang
Product: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Keywords: Reopened
Reporter: William Cohen <wcohen>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: airlied, bskeggs, fche, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami.hiramatsu, mchehab, mjg59, smakarov, steved, wcohen
Type: Bug
Last Closed: 2022-07-11 15:48:36 UTC
Description
William Cohen
2019-04-08 15:28:53 UTC
Created attachment 1553655 [details]
uprobes_onthefly.c
C code for the reproducer.
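The attachment itself is not inlined here. Judging by the later comments, at least in the perf-based reproducer the probes never target this program; it only needs to keep the CPUs busy, so a rough stand-in (hypothetical, not the actual attachment) could be a per-CPU busy loop:

    # Hypothetical stand-in for uprobes_onthefly.x: the later comments suggest
    # that sustained CPU load is what matters, so one shell busy loop per CPU
    # may play the same role as the C reproducer.
    for i in $(seq "$(nproc)"); do
        ( while :; do :; done ) &
    done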
Created attachment 1553656 [details]
systemtap part of the reproducer
The systemtap script that triggers the rcu hangs in the kernel.
Created attachment 1553946 [details]
Further simplification of the systemtap reproducer
The reproducer has been further reduced to probing the functions in kernel/workqueue.c. I suspect that the kprobe optimization code uses some of the functions in kernel/workqueue.c to schedule RCU work, and kprobes firing on those functions while the kprobes are being optimized is causing the problem.
Note that there needs to be some load on the machine to trigger the problem:
./uprobes_onthefly.x &
sudo stap pr24416j.stp -D STP_ON_THE_FLY_INTERVAL=100000 --suppress-handler-errors -w -DDEBUG_STP_ON_THE_FLY -DMAXMAPENTRIES=100000 -DSTP_NO_OVERLOAD
After 10 minutes the stall detector message appears in the dmesg output:
[ 2069.970075] INFO: rcu_tasks detected stalls on tasks:
[ 2069.970077] 000000001d2aeff6: .. nvcsw: 1/1 holdout: 1 idle_cpu: -1/0
[ 2069.970079] uprobes_onthefl R running task 0 9118 7230 0x00000000
[ 2069.970081] Call Trace:
[ 2069.970092] ? __schedule+0x253/0x860
[ 2069.970093] schedule+0x28/0x70
[ 2069.970095] exit_to_usermode_loop+0x57/0xe0
[ 2069.970097] prepare_exit_to_usermode+0x59/0x60
[ 2069.970098] retint_user+0x8/0x8
[ 2069.970100] RIP: 0033:0x401113
[ 2069.970101] Code: 2f 00 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 55 48 89 e5 89 7d ec 8b 45 ec 89 45 fc <8b> 45 ec 8d 14 00 8b 45 fc 01 d0 5d c3 55 48 89 e5 48 83 ec 10 89
[ 2069.970101] RSP: 002b:00007ffd57a03150 EFLAGS: 00000213 ORIG_RAX: ffffffffffffff13
[ 2069.970102] RAX: 00000000176b92c1 RBX: 0000000000000000 RCX: 00007fb087306738
[ 2069.970103] RDX: 0000000064f261d6 RSI: 00007ffd57a03258 RDI: 00000000176b92c1
[ 2069.970107] RBP: 00007ffd57a03150 R08: 00007fb087307da0 R09: 00007fb087307da0
[ 2069.970107] R10: 0000000000000000 R11: 00007fb0872c8188 R12: 0000000000401020
[ 2069.970108] R13: 00007ffd57a03250 R14: 0000000000000000 R15: 0000000000000000
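On kernels built with CONFIG_OPTPROBES (and with debugfs mounted), whether kprobe jump optimization is actually in play can be checked while the probes are registered, for example:

    # 1 means kprobe jump optimization is globally enabled.
    cat /proc/sys/debug/kprobes-optimization
    # Entries marked [OPTIMIZED] are the probes converted to jump probes.
    sudo grep OPTIMIZED /sys/kernel/debug/kprobes/list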
Created attachment 1554362 [details]
Reproducer that only uses perf commands and load on the system
systemtap can be eliminated from the reproducer. After 10 minutes of running the attached script with the commands below, it produces an "INFO: rcu_tasks detected stalls on tasks:" message in the dmesg output.
$./uprobes_onthefly.x &
$ sudo ./perf_pr24416c.sh
Added new events:
probe:workqueue_apply_unbound_cpumask (on workqueue_apply_unbound_cpumask)
probe:workqueue_congested (on workqueue_congested)
probe:workqueue_offline_cpu (on workqueue_offline_cpu)
probe:workqueue_online_cpu (on workqueue_online_cpu)
probe:workqueue_prepare_cpu (on workqueue_prepare_cpu)
probe:workqueue_set_max_active (on workqueue_set_max_active)
probe:workqueue_set_unbound_cpumask (on workqueue_set_unbound_cpumask)
probe:workqueue_sysfs_register (on workqueue_sysfs_register)
probe:workqueue_sysfs_unregister (on workqueue_sysfs_unregister)
probe:wqattrs_equal (on wqattrs_equal)
probe:wqattrs_hash (on wqattrs_hash)
probe:wq_barrier_func (on wq_barrier_func)
probe:wq_clamp_max_active (on wq_clamp_max_active)
probe:wq_cpumask_show (on wq_cpumask_show)
probe:wq_cpumask_store (on wq_cpumask_store)
probe:wq_device_release (on wq_device_release)
probe:wq_nice_show (on wq_nice_show)
probe:wq_nice_store (on wq_nice_store)
probe:wq_numa_show (on wq_numa_show)
probe:wq_numa_store (on wq_numa_store)
probe:wq_pool_ids_show (on wq_pool_ids_show)
probe:wq_sysfs_prep_attrs (on wq_sysfs_prep_attrs)
probe:wq_unbound_cpumask_show (on wq_unbound_cpumask_show)
probe:wq_unbound_cpumask_store (on wq_unbound_cpumask_store)
probe:wq_worker_comm (on wq_worker_comm)
probe:wq_worker_last_func (on wq_worker_last_func)
probe:wq_worker_sleeping (on wq_worker_sleeping)
probe:wq_worker_waking_up (on wq_worker_waking_up)
You can now use it in all perf tools, such as:
perf record -e probe:wq_worker_waking_up -aR sleep 1
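The script itself is only available as an attachment; judging from the output above it presumably does something along these lines (a rough sketch with a shortened function list, not the actual perf_pr24416c.sh):

    # Rough sketch of what perf_pr24416c.sh appears to do: register a
    # kprobe-based perf event on functions from kernel/workqueue.c, keep the
    # events counting for a while under load, then remove them.  Only a few of
    # the probed functions are shown here.
    for f in workqueue_congested wqattrs_equal wqattrs_hash wq_worker_sleeping; do
        perf probe --add "$f"
    done
    perf stat -a -e probe:wqattrs_hash -e probe:wqattrs_equal sleep 600
    perf probe --del 'probe:*'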
Hello,

Can you share the kernel config for the latest 5.1.0-rc3 based one?

Also,
- Are you sure that this doesn't happen on older kernels?
- Can you reproduce this without running uprobes_onthefly.x?
- Can you share /sys/kernel/debug/kprobes/list?
- I guess there are some ftrace-based probes involved on the list. Can you reproduce it without those ftrace-based probes (marked as [FTRACE])?

I would like to solve this issue if I can reproduce it, and I need your help. Thank you.

Hello,

An important note is that for systemtap, disabling the kprobes optimization eliminated the problem, so the problem seems to be related to kprobe optimization on x86. Is it possible that some of those workqueue functions might need to be blacklisted from probing by kprobes?

> Can you share the kernel config for the latest 5.1.0-rc3 based one?

These are just the stock Fedora rawhide kernels. The problem is seen in the latest rawhide kernel, 5.1.0-0.rc5.git2.1.fc31.x86_64. You should be able to get /boot/config-5.1.0-0.rc5.git2.1.fc31.x86_64 from https://kojipkgs.fedoraproject.org//packages/kernel/5.1.0/0.rc4.git2.1.fc31/x86_64/kernel-core-5.1.0-0.rc4.git2.1.fc31.x86_64.rpm

> Are you sure that this doesn't happen on older kernels?

I have observed the systemtap test getting stuck on older 4.x kernels, but I don't have a specific version that it started in.

> Can you reproduce this without running uprobes_onthefly.x?

There is something about keeping the processor busy that triggers the problem. Note that in the latest reproducer the perf probes are not probing the application itself; it is the load keeping the processor busy that triggers the problem.

> Can you share /sys/kernel/debug/kprobes/list?

After the reproducer triggered the problem and the "perf stat" has not exited:

# cat /sys/kernel/debug/kprobes/list
ffffffffb310c310 k wq_sysfs_prep_attrs+0x0 [DISABLED][FTRACE]
ffffffffb310dd56 k alloc_workqueue+0x126 [DISABLED]
ffffffffb310da20 k wq_unbound_cpumask_store+0x0 [DISABLED][FTRACE]
ffffffffb310b400 k workqueue_set_max_active+0x0 [DISABLED][FTRACE]
ffffffffb310cd40 k wq_worker_comm+0x0 [DISABLED][FTRACE]
ffffffffb3106a00 k wq_barrier_func+0x0 [DISABLED][FTRACE]
ffffffffb310ba67 k alloc_unbound_pwq+0x137 [DISABLED]
ffffffffb310d1c0 k workqueue_offline_cpu+0x0 [DISABLED][FTRACE]
ffffffffb3107150 k wq_unbound_cpumask_show+0x0 [DISABLED][FTRACE]
ffffffffb3107270 k wq_cpumask_show+0x0 [DISABLED][FTRACE]
ffffffffb310b432 k workqueue_set_max_active+0x32 [DISABLED]
ffffffffb310ce70 k workqueue_online_cpu+0x0 [DISABLED][FTRACE]
ffffffffb310d8a0 k workqueue_set_unbound_cpumask+0xf0 [OPTIMIZED]
ffffffffb310ce00 k workqueue_prepare_cpu+0x0 [DISABLED][FTRACE]
ffffffffb3109680 k workqueue_congested+0x0 [FTRACE]
ffffffffb3107210 k wq_numa_show+0x0 [DISABLED][FTRACE]
ffffffffb3109d40 k wq_pool_ids_show+0x0 [DISABLED][FTRACE]
ffffffffb310daf0 k workqueue_sysfs_register+0x0 [DISABLED][FTRACE]
ffffffffb310b560 k wq_worker_waking_up+0x0 [DISABLED][FTRACE]
ffffffffb310b5f0 k wq_worker_last_func+0x0 [DISABLED][FTRACE]
ffffffffb310c490 k wq_cpumask_store+0x0 [DISABLED][FTRACE]
ffffffffb310c5b0 k wq_nice_store+0x0 [DISABLED][FTRACE]
ffffffffb310b953 k alloc_unbound_pwq+0x23 [DISABLED]
ffffffffb310d7b0 k workqueue_set_unbound_cpumask+0x0 [DISABLED][FTRACE]
ffffffffb3106670 k wq_device_release+0x0 [DISABLED][FTRACE]
ffffffffb310c390 k wq_numa_store+0x0 [DISABLED][FTRACE]
ffffffffb31072d0 k wq_nice_show+0x0 [DISABLED][FTRACE]
ffffffffb310cc2f k destroy_workqueue+0x13f [DISABLED]
ffffffffb310b590 k wq_worker_sleeping+0x0 [DISABLED][FTRACE]

For a run of perf_pr24416c.sh without uprobes_onthefly.x creating load:

# cat /sys/kernel/debug/kprobes/list
ffffffffbc10ce00 k workqueue_prepare_cpu+0x0 [FTRACE]
ffffffffbc109680 k workqueue_congested+0x0 [FTRACE]
ffffffffbc107210 k wq_numa_show+0x0 [FTRACE]
ffffffffbc109d40 k wq_pool_ids_show+0x0 [FTRACE]
ffffffffbc10daf0 k workqueue_sysfs_register+0x0 [FTRACE]
ffffffffbc10b560 k wq_worker_waking_up+0x0 [FTRACE]
ffffffffbc10b5f0 k wq_worker_last_func+0x0 [FTRACE]
ffffffffbc10c5b0 k wq_nice_store+0x0 [FTRACE]
ffffffffbc10c490 k wq_cpumask_store+0x0 [FTRACE]
ffffffffbc10b953 k alloc_unbound_pwq+0x23 [OPTIMIZED]
ffffffffbc10d7b0 k workqueue_set_unbound_cpumask+0x0 [FTRACE]
ffffffffbc106670 k wq_device_release+0x0 [FTRACE]
ffffffffbc10c390 k wq_numa_store+0x0 [FTRACE]
ffffffffbc1072d0 k wq_nice_show+0x0 [FTRACE]
ffffffffbc10cc2f k destroy_workqueue+0x13f [OPTIMIZED]
ffffffffbc10b590 k wq_worker_sleeping+0x0 [FTRACE]
ffffffffbc10c310 k wq_sysfs_prep_attrs+0x0 [FTRACE]
ffffffffbc10dd56 k alloc_workqueue+0x126 [DISABLED]
ffffffffbc10da20 k wq_unbound_cpumask_store+0x0 [FTRACE]
ffffffffbc10b400 k workqueue_set_max_active+0x0 [FTRACE]
ffffffffbc10cd40 k wq_worker_comm+0x0 [FTRACE]
ffffffffbc106a00 k wq_barrier_func+0x0 [FTRACE]
ffffffffbc10ba67 k alloc_unbound_pwq+0x137 [OPTIMIZED]
ffffffffbc10d1c0 k workqueue_offline_cpu+0x0 [FTRACE]
ffffffffbc107150 k wq_unbound_cpumask_show+0x0 [FTRACE]
ffffffffbc107270 k wq_cpumask_show+0x0 [FTRACE]
ffffffffbc10b432 k workqueue_set_max_active+0x32 [OPTIMIZED]
ffffffffbc10ce70 k workqueue_online_cpu+0x0 [FTRACE]
ffffffffbc10d8a0 k workqueue_set_unbound_cpumask+0xf0 [OPTIMIZED]

> I guess there are some ftrace-based probes involved on the list. Can you reproduce it without those ftrace-based probes (marked as [FTRACE])?

Based on the /sys/kernel/debug/kprobes/list output, do you have suggestions on how to further subset perf_pr24416c.sh?

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.

Fedora 29 has now been rebased to 5.2.9-100.fc29. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30. If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

This can still be triggered on rawhide with a reduced set of perf probe points with the following versions of perf and the kernel:

perf-5.5.0-0.rc6.git0.1.fc32.x86_64
kernel-5.5.0-0.rc6.git3.1.fc32.x86_64

$ ./uprobes_onthefly.x &
[1] 8853
[wcohen@localhost ~]$ sudo ./perf_pr24416e.sh
[sudo] password for wcohen:
Added new events:
probe:workqueue_apply_unbound_cpumask (on workqueue_apply_unbound_cpumask)
probe:workqueue_sysfs_unregister (on workqueue_sysfs_unregister)
probe:wqattrs_equal (on wqattrs_equal)
probe:wqattrs_hash (on wqattrs_hash)
You can now use it in all perf tools, such as:
perf record -e probe:wqattrs_hash -aR sleep 1
...
Eventually the following appears in the dmesg output:

[ 1352.887527] INFO: task kworker/0:0:8628 blocked for more than 122 seconds.
[ 1352.887535] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1352.887536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1352.887538] kworker/0:0 D14256 8628 2 0x80004000
[ 1352.887546] Workqueue: events kprobe_optimizer
[ 1352.887548] Call Trace:
[ 1352.887558] ? __schedule+0x377/0x9c0
[ 1352.887560] ? wait_for_completion+0x10d/0x170
[ 1352.887571] ? wait_for_completion+0x10d/0x170
[ 1352.887573] schedule+0x5f/0xd0
[ 1352.887576] schedule_timeout+0x289/0x3c0
[ 1352.887581] ? wait_for_completion+0x10d/0x170
[ 1352.887584] ? mark_held_locks+0x2d/0x80
[ 1352.887586] ? _raw_spin_unlock_irq+0x24/0x40
[ 1352.887589] ? wait_for_completion+0x10d/0x170
[ 1352.887591] wait_for_completion+0x12c/0x170
[ 1352.887594] ? wake_up_q+0xa0/0xa0
[ 1352.887599] __wait_rcu_gp+0x157/0x170
[ 1352.887603] synchronize_rcu_tasks+0x49/0xa0
[ 1352.887605] ? rcu_read_lock_any_held+0xb0/0xb0
[ 1352.887607] ? __bpf_trace_rcu_utilization+0x10/0x10
[ 1352.887613] ? wait_for_completion+0x48/0x170
[ 1352.887616] kprobe_optimizer+0x12d/0x2c0
[ 1352.887620] process_one_work+0x272/0x5a0
[ 1352.887626] worker_thread+0x50/0x3b0
[ 1352.887631] kthread+0x106/0x140
[ 1352.887633] ? process_one_work+0x5a0/0x5a0
[ 1352.887635] ? kthread_park+0x90/0x90
[ 1352.887638] ret_from_fork+0x3a/0x50
[ 1352.887649] INFO: task kworker/0:2:8883 blocked for more than 122 seconds.
[ 1352.887651] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1352.887652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1352.887653] kworker/0:2 D14296 8883 2 0x80004000
[ 1352.887658] Workqueue: events netstamp_clear
[ 1352.887659] Call Trace:
[ 1352.887663] ? __schedule+0x377/0x9c0
[ 1352.887668] schedule+0x5f/0xd0
[ 1352.887671] schedule_preempt_disabled+0x11/0x20
[ 1352.887673] __mutex_lock+0x33e/0x9e0
[ 1352.887678] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1352.887684] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1352.887686] arch_jump_label_transform_apply+0x1e/0x50
[ 1352.887689] static_key_enable_cpuslocked+0x59/0x90
[ 1352.887691] static_key_enable+0x16/0x20
[ 1352.887693] process_one_work+0x272/0x5a0
[ 1352.887699] worker_thread+0x50/0x3b0
[ 1352.887704] kthread+0x106/0x140
[ 1352.887705] ? process_one_work+0x5a0/0x5a0
[ 1352.887707] ? kthread_park+0x90/0x90
[ 1352.887710] ret_from_fork+0x3a/0x50
[ 1352.887721] Showing all locks held in the system:
[ 1352.887725] 1 lock held by khungtaskd/18:
[ 1352.887726] #0: ffffffff9c6959e0 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x16f
[ 1352.887736] 6 locks held by kworker/0:0/8628:
[ 1352.887737] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1352.887740] #1: ffffb3770456fe50 ((optimizing_work).work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1352.887743] #2: ffffffff9c7543b0 (kprobe_mutex){+.+.}, at: kprobe_optimizer+0x19/0x2c0
[ 1352.887746] #3: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: kprobe_optimizer+0x1e/0x2c0
[ 1352.887749] #4: ffffffff9c662730 (text_mutex){+.+.}, at: kprobe_optimizer+0x2c/0x2c0
[ 1352.887752] #5: ffffffff9c74bab0 (module_mutex){+.+.}, at: kprobe_optimizer+0x3a/0x2c0
[ 1352.887755] 5 locks held by kworker/0:2/8883:
[ 1352.887756] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1352.887759] #1: ffffb37701073e50 (netstamp_work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1352.887762] #2: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0xe/0x20
[ 1352.887764] #3: ffffffff9c775230 (jump_label_mutex){+.+.}, at: static_key_enable_cpuslocked+0x34/0x90
[ 1352.887767] #4: ffffffff9c662730 (text_mutex){+.+.}, at: arch_jump_label_transform_apply+0x1e/0x50
[ 1352.887777] =============================================
[ 1475.767538] INFO: task kworker/0:0:8628 blocked for more than 245 seconds.
[ 1475.767541] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1475.767542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1475.767544] kworker/0:0 D14256 8628 2 0x80004000
[ 1475.767558] Workqueue: events kprobe_optimizer
[ 1475.767560] Call Trace:
[ 1475.767572] ? __schedule+0x377/0x9c0
[ 1475.767574] ? wait_for_completion+0x10d/0x170
[ 1475.767579] ? wait_for_completion+0x10d/0x170
[ 1475.767584] schedule+0x5f/0xd0
[ 1475.767587] schedule_timeout+0x289/0x3c0
[ 1475.767592] ? wait_for_completion+0x10d/0x170
[ 1475.767595] ? mark_held_locks+0x2d/0x80
[ 1475.767597] ? _raw_spin_unlock_irq+0x24/0x40
[ 1475.767601] ? wait_for_completion+0x10d/0x170
[ 1475.767603] wait_for_completion+0x12c/0x170
[ 1475.767606] ? wake_up_q+0xa0/0xa0
[ 1475.767611] __wait_rcu_gp+0x157/0x170
[ 1475.767615] synchronize_rcu_tasks+0x49/0xa0
[ 1475.767633] ? rcu_read_lock_any_held+0xb0/0xb0
[ 1475.767635] ? __bpf_trace_rcu_utilization+0x10/0x10
[ 1475.767641] ? wait_for_completion+0x48/0x170
[ 1475.767644] kprobe_optimizer+0x12d/0x2c0
[ 1475.767648] process_one_work+0x272/0x5a0
[ 1475.767654] worker_thread+0x50/0x3b0
[ 1475.767659] kthread+0x106/0x140
[ 1475.767661] ? process_one_work+0x5a0/0x5a0
[ 1475.767663] ? kthread_park+0x90/0x90
[ 1475.767666] ret_from_fork+0x3a/0x50
[ 1475.767677] INFO: task kworker/0:2:8883 blocked for more than 245 seconds.
[ 1475.767679] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1475.767680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1475.767681] kworker/0:2 D14296 8883 2 0x80004000
[ 1475.767686] Workqueue: events netstamp_clear
[ 1475.767687] Call Trace:
[ 1475.767691] ? __schedule+0x377/0x9c0
[ 1475.767696] schedule+0x5f/0xd0
[ 1475.767699] schedule_preempt_disabled+0x11/0x20
[ 1475.767700] __mutex_lock+0x33e/0x9e0
[ 1475.767705] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1475.767712] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1475.767714] arch_jump_label_transform_apply+0x1e/0x50
[ 1475.767717] static_key_enable_cpuslocked+0x59/0x90
[ 1475.767719] static_key_enable+0x16/0x20
[ 1475.767721] process_one_work+0x272/0x5a0
[ 1475.767727] worker_thread+0x50/0x3b0
[ 1475.767732] kthread+0x106/0x140
[ 1475.767733] ? process_one_work+0x5a0/0x5a0
[ 1475.767735] ? kthread_park+0x90/0x90
[ 1475.767738] ret_from_fork+0x3a/0x50
[ 1475.767749] Showing all locks held in the system:
[ 1475.767753] 1 lock held by khungtaskd/18:
[ 1475.767755] #0: ffffffff9c6959e0 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x16f
[ 1475.767765] 6 locks held by kworker/0:0/8628:
[ 1475.767766] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1475.767769] #1: ffffb3770456fe50 ((optimizing_work).work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1475.767771] #2: ffffffff9c7543b0 (kprobe_mutex){+.+.}, at: kprobe_optimizer+0x19/0x2c0
[ 1475.767774] #3: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: kprobe_optimizer+0x1e/0x2c0
[ 1475.767777] #4: ffffffff9c662730 (text_mutex){+.+.}, at: kprobe_optimizer+0x2c/0x2c0
[ 1475.767780] #5: ffffffff9c74bab0 (module_mutex){+.+.}, at: kprobe_optimizer+0x3a/0x2c0
[ 1475.767784] 5 locks held by kworker/0:2/8883:
[ 1475.767785] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1475.767788] #1: ffffb37701073e50 (netstamp_work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1475.767791] #2: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0xe/0x20
[ 1475.767793] #3: ffffffff9c775230 (jump_label_mutex){+.+.}, at: static_key_enable_cpuslocked+0x34/0x90
[ 1475.767796] #4: ffffffff9c662730 (text_mutex){+.+.}, at: arch_jump_label_transform_apply+0x1e/0x50
[ 1475.767801] =============================================
[ 1598.647646] INFO: task kworker/0:0:8628 blocked for more than 368 seconds.
[ 1598.647649] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1598.647650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1598.647652] kworker/0:0 D14256 8628 2 0x80004000
[ 1598.647660] Workqueue: events kprobe_optimizer
[ 1598.647662] Call Trace:
[ 1598.647671] ? __schedule+0x377/0x9c0
[ 1598.647673] ? wait_for_completion+0x10d/0x170
[ 1598.647677] ? wait_for_completion+0x10d/0x170
[ 1598.647680] schedule+0x5f/0xd0
[ 1598.647689] schedule_timeout+0x289/0x3c0
[ 1598.647694] ? wait_for_completion+0x10d/0x170
[ 1598.647697] ? mark_held_locks+0x2d/0x80
[ 1598.647699] ? _raw_spin_unlock_irq+0x24/0x40
[ 1598.647702] ? wait_for_completion+0x10d/0x170
[ 1598.647704] wait_for_completion+0x12c/0x170
[ 1598.647707] ? wake_up_q+0xa0/0xa0
[ 1598.647712] __wait_rcu_gp+0x157/0x170
[ 1598.647716] synchronize_rcu_tasks+0x49/0xa0
[ 1598.647718] ? rcu_read_lock_any_held+0xb0/0xb0
[ 1598.647720] ? __bpf_trace_rcu_utilization+0x10/0x10
[ 1598.647727] ? wait_for_completion+0x48/0x170
[ 1598.647730] kprobe_optimizer+0x12d/0x2c0
[ 1598.647733] process_one_work+0x272/0x5a0
[ 1598.647740] worker_thread+0x50/0x3b0
[ 1598.647745] kthread+0x106/0x140
[ 1598.647746] ? process_one_work+0x5a0/0x5a0
[ 1598.647748] ? kthread_park+0x90/0x90
[ 1598.647751] ret_from_fork+0x3a/0x50
[ 1598.647763] INFO: task kworker/0:2:8883 blocked for more than 368 seconds.
[ 1598.647764] Not tainted 5.5.0-0.rc6.git3.1.fc32.x86_64 #1
[ 1598.647765] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1598.647766] kworker/0:2 D14296 8883 2 0x80004000
[ 1598.647771] Workqueue: events netstamp_clear
[ 1598.647772] Call Trace:
[ 1598.647776] ? __schedule+0x377/0x9c0
[ 1598.647782] schedule+0x5f/0xd0
[ 1598.647784] schedule_preempt_disabled+0x11/0x20
[ 1598.647786] __mutex_lock+0x33e/0x9e0
[ 1598.647791] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1598.647798] ? arch_jump_label_transform_apply+0x1e/0x50
[ 1598.647800] arch_jump_label_transform_apply+0x1e/0x50
[ 1598.647802] static_key_enable_cpuslocked+0x59/0x90
[ 1598.647804] static_key_enable+0x16/0x20
[ 1598.647806] process_one_work+0x272/0x5a0
[ 1598.647812] worker_thread+0x50/0x3b0
[ 1598.647817] kthread+0x106/0x140
[ 1598.647819] ? process_one_work+0x5a0/0x5a0
[ 1598.647821] ? kthread_park+0x90/0x90
[ 1598.647823] ret_from_fork+0x3a/0x50
[ 1598.647864] Showing all locks held in the system:
[ 1598.647867] 1 lock held by khungtaskd/18:
[ 1598.647868] #0: ffffffff9c6959e0 (rcu_read_lock){....}, at: debug_show_all_locks+0x15/0x16f
[ 1598.647876] 6 locks held by kworker/0:0/8628:
[ 1598.647877] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1598.647881] #1: ffffb3770456fe50 ((optimizing_work).work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1598.647884] #2: ffffffff9c7543b0 (kprobe_mutex){+.+.}, at: kprobe_optimizer+0x19/0x2c0
[ 1598.647887] #3: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: kprobe_optimizer+0x1e/0x2c0
[ 1598.647890] #4: ffffffff9c662730 (text_mutex){+.+.}, at: kprobe_optimizer+0x2c/0x2c0
[ 1598.647893] #5: ffffffff9c74bab0 (module_mutex){+.+.}, at: kprobe_optimizer+0x3a/0x2c0
[ 1598.647914] 5 locks held by kworker/0:2/8883:
[ 1598.647915] #0: ffff8d18bd43a348 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1598.647918] #1: ffffb37701073e50 (netstamp_work){+.+.}, at: process_one_work+0x1e9/0x5a0
[ 1598.647936] #2: ffffffff9c653e50 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0xe/0x20
[ 1598.647939] #3: ffffffff9c775230 (jump_label_mutex){+.+.}, at: static_key_enable_cpuslocked+0x34/0x90
[ 1598.647942] #4: ffffffff9c662730 (text_mutex){+.+.}, at: arch_jump_label_transform_apply+0x1e/0x50
[ 1598.647946] =============================================

Created attachment 1656407 [details]
Script to setup kprobe monitoring points and use them.
This update to the script just sets perf probes that use kprobes and avoids using any ftrace probe points. At some point kprobes attempts to optimize the probes, and the RCU locking issues reported in the dmesg output appear.
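Since disabling the kprobes optimization was reported above to make the systemtap variant of the problem go away, the same check can be made with this perf-based reproducer by toggling the optimization globally (a sketch, assuming a kernel with CONFIG_OPTPROBES; clearing the knob also unoptimizes already-optimized probes):

    # Disable kprobe jump optimization for a test run, then restore it.
    echo 0 | sudo tee /proc/sys/debug/kprobes-optimization
    # ... re-run the reproducer here and watch dmesg ...
    echo 1 | sudo tee /proc/sys/debug/kprobes-optimization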
Re-closing since F29 reached EOL on 2019-11-26. If this is still an issue, please update the version. If you cannot update the version, needinfo me.