Description of problem: Repeatedly running #lvm2-testsuite --only lvcreate-raid.sh will eventually hang, causing the following kernel messages: [ 1722.976156] INFO: task lvm:212260 blocked for more than 122 seconds. [ 1722.982531] Not tainted 5.14.0-294.bz2183556.el9.x86_64 #1 [ 1722.988544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1722.996378] task:lvm state:D stack: 0 pid:212260 ppid:211591 flags:0x00000002 [ 1723.004817] Call Trace: [ 1723.007278] <TASK> [ 1723.009387] __schedule+0x248/0x630 [ 1723.012896] schedule+0x5a/0xc0 [ 1723.016051] md_write_start.part.0+0x197/0x240 [ 1723.020512] ? cpuacct_percpu_seq_show+0x10/0x10 [ 1723.025141] raid5_make_request+0x8c/0x3e0 [raid456] [ 1723.030144] ? sched_show_numa+0xf0/0xf0 [ 1723.034076] md_handle_request+0x135/0x1e0 [ 1723.038186] raid_map+0x25/0x40 [dm_raid] [ 1723.042220] __map_bio+0x46/0x1f0 [dm_mod] [ 1723.046365] dm_split_and_process_bio+0x1b8/0x460 [dm_mod] [ 1723.051863] dm_submit_bio+0x4d/0x130 [dm_mod] [ 1723.056323] __submit_bio+0x89/0x130 [ 1723.059921] __submit_bio_noacct+0x81/0x1f0 [ 1723.064114] submit_bio_wait+0x54/0xb0 [ 1723.067878] blkdev_issue_zeroout+0x12f/0x200 [ 1723.072245] blkdev_common_ioctl+0x6aa/0x9f0 [ 1723.076526] blkdev_ioctl+0x116/0x280 [ 1723.080200] __x64_sys_ioctl+0x8a/0xc0 [ 1723.083964] do_syscall_64+0x5c/0x90 [ 1723.087549] ? vfs_write+0x178/0x280 [ 1723.091138] ? vfs_write+0x178/0x280 [ 1723.094726] ? syscall_exit_work+0x11a/0x150 [ 1723.099008] ? syscall_exit_to_user_mode+0x12/0x30 [ 1723.103809] ? do_syscall_64+0x69/0x90 [ 1723.107569] ? do_syscall_64+0x69/0x90 [ 1723.111321] ? do_syscall_64+0x69/0x90 [ 1723.115073] ? sysvec_apic_timer_interrupt+0x3c/0x90 [ 1723.120041] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1723.125102] RIP: 0033:0x7fe8e3e3ec6b [ 1723.128690] RSP: 002b:00007ffdb9ed8668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1723.136264] RAX: ffffffffffffffda RBX: 00005595c908f3f8 RCX: 00007fe8e3e3ec6b [ 1723.143406] RDX: 00007ffdb9ed8690 RSI: 000000000000127f RDI: 0000000000000004 [ 1723.143410] RBP: 0000000000000008 R08: 0000000000000000 R09: 00007ffdb9ed7067 [ 1723.143412] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdb9ed8690 [ 1723.143413] R13: 00005595caa21590 R14: 0000000000001000 R15: 00005595cab6a028 [ 1723.143417] </TASK> [ 1723.143419] INFO: task mdX_raid5:212303 blocked for more than 123 seconds. [ 1723.143421] Not tainted 5.14.0-294.bz2183556.el9.x86_64 #1 [ 1723.143422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1723.143423] task:mdX_raid5 state:D stack: 0 pid:212303 ppid: 2 flags:0x00004000 [ 1723.143426] Call Trace: [ 1723.143427] <TASK> [ 1723.143428] __schedule+0x248/0x630 [ 1723.143431] schedule+0x5a/0xc0 [ 1723.143433] raid5d+0x44a/0x660 [raid456] [ 1723.143441] ? cpuacct_percpu_seq_show+0x10/0x10 [ 1723.143444] md_thread+0xab/0x160 [ 1723.143446] ? cpuacct_percpu_seq_show+0x10/0x10 [ 1723.143448] ? super_90_load+0x70/0x70 [ 1723.143450] kthread+0xe0/0x100 [ 1723.143455] ? kthread_complete_and_exit+0x20/0x20 [ 1723.143457] ret_from_fork+0x22/0x30 [ 1723.143465] </TASK> Version-Release number of selected component (if applicable): kernel-5.14.0-285.el9 and later. How reproducible: Always reproducible on kernel-5.14.0-285.el9 and later. Cannot reproduce on rhel-9.2 or rhel-8 kernels. Steps to Reproduce: # while lvm2-testsuite --only lvcreate-raid.sh ; do :; done Actual results: Hang with kernel error messages Expected results: Test always completes successfully Additional Information: md_write_start() is scheduling at: wait_event(mddev->sb_wait, !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) || mddev->suspended); raid5d() is scheduling at: wait_event_lock_irq(mddev->sb_wait, !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), conf->device_lock);
The first c9s commit where I can reproduce this is: commit 0990e82f887de26e61813c33cae08d76138239da Author: Juri Lelli <juri.lelli> Date: Tue Aug 2 11:07:41 2022 +0200 printk: Bring back the RT bits. Bugzilla: https://bugzilla.redhat.com/2171995 Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git commit b576ab7eadbf1835eddfa5a47eea0a57d3f5b74f Author: Sebastian Andrzej Siewior <bigeasy> Date: Tue Jul 19 20:08:01 2022 +0200 printk: Bring back the RT bits. This is a revert of the commits: | 07a22b61946f0 Revert "printk: add functions to prefer direct printing" | 5831788afb17b Revert "printk: add kthread console printers" | 2d9ef940f89e0 Revert "printk: extend console_lock for per-console locking" | 007eeab7e9f03 Revert "printk: remove @console_locked" | 05c96b3713aa2 Revert "printk: Block console kthreads when direct printing will be required" | 20fb0c8272bbb Revert "printk: Wait for the global console lock when the system is going down" which is needed for the atomic consoles which are used on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy> Signed-off-by: Juri Lelli <juri.lelli>
CCing Juri Lelli, since the bisect is pointing to a Kernel-RT commit.