Description of problem:
New changes in the 4.3-rc kernel cause the lvm2 test suite test 'snapshot-merge' to eventually hang the 'dm' subsystem. It gets to a state where a plain call of 'dmsetup status' hangs in the kernel.

To reproduce the issue, run the 'snapshot-merge' test many times; it seems to trigger at some point (hard to say whether 100 problem-free runs make a kernel 'ok') - but I'd say that so far this is my 'estimate' for a 'bisect' check.

Hung tasks:

pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=5 idle: 28076 31863 24250
INFO: task kworker/0:0:31015 blocked for more than 120 seconds.
      Tainted: G        W       4.3.0-0.rc2.git1.1.fc24.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:0     D ffff8800779d7a18 12384 31015      2 0x00000080
Workqueue: kcopyd do_work
 ffff8800157dbb48 0000000000000082 0000000000000282 0000000000000000
 ffff8800055bb100 ffff8800754c3100 ffff8800157dc000 ffff88005551a000
 0000000000000000 0000000000000378 0000000000000001 ffff8800157dbb60
Call Trace:
 [<ffffffff818599ba>] schedule+0x3a/0x90
 [<ffffffff816b4b13>] snapshot_merge_next_chunks+0x2c3/0x3f0
 [<ffffffff810f8040>] ? wake_atomic_t_function+0x70/0x70
 [<ffffffff816b52ad>] merge_callback+0x23d/0x270
 [<ffffffff816b5070>] ? snapshot_merge_resume+0x40/0x40
 [<ffffffff816adc79>] run_complete_job+0x69/0xc0
 [<ffffffff816add2a>] process_jobs+0x5a/0xf0
 [<ffffffff816adc10>] ? drop_pages+0x40/0x40
 [<ffffffff816ade0c>] do_work+0x4c/0xa0
 [<ffffffff810c7480>] process_one_work+0x230/0x6a0
 [<ffffffff810c73e9>] ? process_one_work+0x199/0x6a0
 [<ffffffff810c793e>] worker_thread+0x4e/0x450
 [<ffffffff810c78f0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810c78f0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810ce991>] kthread+0x101/0x120
 [<ffffffff81103fc9>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffff810ce890>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff818607af>] ret_from_fork+0x3f/0x70
 [<ffffffff810ce890>] ? kthread_create_on_node+0x250/0x250
[96240.285052] 2 locks held by kworker/0:0/31015:
[96240.287231]  #0:  ("kcopyd"){++++..}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
[96240.290512]  #1:  ((&kc->kcopyd_work)){+.+...}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
[96240.294036] INFO: task systemd-udevd:28056 blocked for more than 120 seconds.
[96240.296869]       Tainted: G        W       4.3.0-0.rc2.git1.1.fc24.x86_64 #1
[96240.299748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[96240.303116] systemd-udevd   D ffff880077bd7a18 13296 28056    499 0x00000084
[96240.306208]  ffff88004f2938b0 0000000000000086 ffff88004f293860 ffff880000000000
[96240.309513]  ffff880073c08000 ffff8800752c8000 ffff88004f294000 ffffffff00000000
[96240.312901]  ffff880055518400 ffff8800752c8000 fffffffe00000001 ffff88004f2938c8
[96240.316717] Call Trace:
[96240.319027]  [<ffffffff818599ba>] schedule+0x3a/0x90
[96240.322139]  [<ffffffff8185e1a0>] rwsem_down_write_failed+0x230/0x420
[96240.325627]  [<ffffffff8185dfc6>] ? rwsem_down_write_failed+0x56/0x420
[96240.329147]  [<ffffffff81025ac9>] ? sched_clock+0x9/0x10
[96240.332092]  [<ffffffff810e351a>] ? sched_clock_cpu+0x8a/0xb0
[96240.335122]  [<ffffffff81427053>] call_rwsem_down_write_failed+0x13/0x20
[96240.338025]  [<ffffffff8185d818>] ? down_write+0x98/0xc0
[96240.340564]  [<ffffffff816b569f>] ? snapshot_map+0x5f/0x440
[96240.343075]  [<ffffffff816b569f>] snapshot_map+0x5f/0x440
[96240.345540]  [<ffffffff816a244a>] __map_bio+0x3a/0x1a0
[96240.347918]  [<ffffffff816a4353>] __split_and_process_bio+0x313/0x4e0
[96240.350643]  [<ffffffff816a4071>] ? __split_and_process_bio+0x31/0x4e0
[96240.352995]  [<ffffffff816a47f2>] dm_make_request+0x92/0x140
[96240.355058]  [<ffffffff813e2c96>] generic_make_request+0xd6/0x110
[96240.357142]  [<ffffffff813e2d46>] submit_bio+0x76/0x170
[96240.359089]  [<ffffffff811f0bec>] ? lru_cache_add+0x1c/0x50
[96240.361090]  [<ffffffff812b5b22>] mpage_readpages+0x122/0x160
[96240.363106]  [<ffffffff812ae9f0>] ? I_BDEV+0x20/0x20
[96240.365011]  [<ffffffff812ae9f0>] ? I_BDEV+0x20/0x20
[96240.366916]  [<ffffffff81238f07>] ? alloc_pages_current+0x107/0x1a0
[96240.368994]  [<ffffffff811dd035>] ? __page_cache_alloc+0x135/0x170
[96240.371033]  [<ffffffff812af3ed>] blkdev_readpages+0x1d/0x20
[96240.373023]  [<ffffffff811ee5bd>] __do_page_cache_readahead+0x29d/0x320
[96240.375138]  [<ffffffff811ee490>] ? __do_page_cache_readahead+0x170/0x320
[96240.377269]  [<ffffffff811eeb4d>] force_page_cache_readahead+0x3d/0x70
[96240.379325]  [<ffffffff811eebc2>] page_cache_sync_readahead+0x42/0x50
[96240.381395]  [<ffffffff811df066>] generic_file_read_iter+0x4e6/0x5f0
[96240.383448]  [<ffffffff81173c6b>] ? __audit_syscall_entry+0xab/0xf0
[96240.385484]  [<ffffffff810e3595>] ? local_clock+0x15/0x20
[96240.387417]  [<ffffffff812b0005>] blkdev_read_iter+0x35/0x40
[96240.389802]  [<ffffffff8126b81c>] __vfs_read+0xcc/0x100
[96240.392086]  [<ffffffff8126c106>] vfs_read+0x86/0x130
[96240.394372]  [<ffffffff8126ce88>] SyS_read+0x58/0xd0
[96240.396630]  [<ffffffff81860432>] entry_SYSCALL_64_fastpath+0x12/0x76
[96240.399167] 2 locks held by systemd-udevd/28056:
[96240.401331]  #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
[96240.404618]  #1:  (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440

Version-Release number of selected component (if applicable):
4.3.0-0.rc2.git1.1.fc24.x86_64 (but seen already with an rc0 kernel)

How reproducible:

Steps to Reproduce:
1. for i in $(seq 1 100) ; do make check_local T=snapshot-merge VERBOSE=1 ; done
2. see if it hangs....
3.

Actual results:

Expected results:

Additional info:
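The reproduce step as a runnable loop - a minimal sketch, assuming a built lvm2 source checkout (the path is an assumption):

  cd lvm2/test || exit 1           # assumed location of the lvm2 checkout
  for i in $(seq 1 100); do
      # stop looping as soon as a run hangs/fails so the state is preserved
      make check_local T=snapshot-merge VERBOSE=1 || break
  done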
# dmsetup table
LVMTEST27297vg-LV1_snap_1-cow: 0 21504 linear 253:2 122880
LVMTEST27297pv1: 0 204800 linear 7:0 2048
LVMTEST27297vg-LV1_snap-cow: 0 19456 linear 253:2 103424
LVMTEST27297vg-LV1_snap_1: 0 101376 snapshot 253:4 253:7 P 8
LVMTEST27297vg-LV1-real: 0 101376 linear 253:2 2048

# dmsetup table --inactive
LVMTEST27297vg-LV1_snap_1-cow:
LVMTEST27297pv1:
LVMTEST27297vg-LV1_snap-cow:
LVMTEST27297vg-LV1_snap_1:
LVMTEST27297vg-LV1-real:

# dmsetup info -c
LVMTEST27297vg-LV1_snap_1-cow 253   7 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6zfdZdfLAIGc0RIWjuLOnvpRfGDHykWgF-cow
LVMTEST27297pv1               253   2 L--w    3    1      0 TEST-LVMTEST27297pv1
LVMTEST27297vg-LV1_snap-cow   253   5 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6EuM8qBu0dZJ5VrVj6SEN5LGu0T86v0TY-cow
LVMTEST27297vg-LV1_snap_1     253   8 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6zfdZdfLAIGc0RIWjuLOnvpRfGDHykWgF
LVMTEST27297vg-LV1-real       253   4 L--w    2    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6y7WtfrhwiWASXkPeVAuTvWbjlqOcnhUU-real

# dmsetup status
LVMTEST27297vg-LV1_snap_1-cow: 0 21504 linear
LVMTEST27297pv1: 0 204800 linear
LVMTEST27297vg-LV1_snap-cow: 0 19456 linear
...hangs here...
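The lock dump in the next comment looks like SysRq output (note bash/28317 sitting in __handle_sysrq there). For reference, a sketch of how to gather it on a lockdep-enabled debug kernel, assuming sysrq is permitted:

  echo 1 > /proc/sys/kernel/sysrq   # allow all SysRq functions
  echo w > /proc/sysrq-trigger      # 'w': dump blocked (uninterruptible) tasks
  echo d > /proc/sysrq-trigger      # 'd': show all locks held (needs CONFIG_LOCKDEP)
  dmesg | tail -n 200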
Showing all locks held in the system:
2 locks held by kworker/0:0/31015:
 #0:  ("kcopyd"){++++..}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
 #1:  ((&kc->kcopyd_work)){+.+...}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
2 locks held by systemd-udevd/28056:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440
2 locks held by dmsetup/28270:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&md->suspend_lock#2){+.+...}, at: [<ffffffff816a4bb2>] __dm_destroy+0xd2/0x2c0
2 locks held by dmsetup/28301:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&s->lock){++++..}, at: [<ffffffff816b3102>] snapshot_status+0x82/0x1b0
3 locks held by bash/28317:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff8126f179>] __sb_start_write+0xc9/0x110
 #1:  (rcu_read_lock){......}, at: [<ffffffff81521085>] __handle_sysrq+0x5/0x240
 #2:  (tasklist_lock){.+.+..}, at: [<ffffffff81101af0>] debug_show_all_locks+0x50/0x1e0
possibly related to this dm-devel report: https://www.redhat.com/archives/dm-devel/2015-October/msg00000.html
(In reply to Zdenek Kabelac from comment #2)
> Showing all locks held in the system:
...
> 2 locks held by systemd-udevd/28056:
> #0: (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
> #1: (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440
> 2 locks held by dmsetup/28270:
> #0: (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
> #1: (&md->suspend_lock#2){+.+...}, at: [<ffffffff816a4bb2>]
> __dm_destroy+0xd2/0x2c0
> 2 locks held by dmsetup/28301:
> #0: (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
> #1: (&s->lock){++++..}, at: [<ffffffff816b3102>] snapshot_status+0x82/0x1b0

Given the above locks that are held, and a dmsetup task being in __dm_destroy, it really _does_ look like the following report/fix is relevant:

(In reply to Zdenek Kabelac from comment #3)
> possibly related to this dm-devel report:
>
> https://www.redhat.com/archives/dm-devel/2015-October/msg00000.html
Josh,

Could you help me build a test Fedora rawhide kernel that includes this fix that I've staged in linux-dm.git's 'for-next' branch?

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=2a708cff93f1845b9239bc7d6310aef54e716c6a

Zdenek works best with kernels he can just install. I'd like to be able to arm him with a kernel that _should_ fix the issue he has reported in this BZ.

Thanks!
I've started a scratch build of today's rawhide kernel with the patch Mike pointed to included. You will be able to find it here when the build completes: http://koji.fedoraproject.org/koji/taskinfo?taskID=11297054
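Once the task finishes, one way to install a scratch build like this - a sketch assuming the koji CLI is available (task ID taken from the link above; the exact set of subpackages may vary):

  koji download-task 11297054 --arch=x86_64
  sudo dnf install ./kernel*.rpm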
So, a few more comments - my bare-metal hw seems unable to reproduce this issue (either with or without this patch) - at least not in 100 loops - so only our 'qemu' VM looks like a 'reliable' block-catcher so far.

So as of now - it still blocks even with the kernel from comment 6.

All the test does is create a snapshot and merge it back while it is being read in parallel by a udev rule - when the right timing is met, it blocks.

Another looping thread was noticed:

kworker/1:2     D ffff880077bd7a18 10904    87      2 0x00000000
Workqueue: kcopyd do_work
 ffff880073d47a78 0000000000000082 ffff880077bce2c0 0000000000000000
 ffff880076d18000 ffff880073ccb100 ffff880073d48000 ffff880073d47ab0
 ffff880077bce2c0 ffff880077bce2c0 0000000000000000 ffff880073d47a90
Call Trace:
 [<ffffffff8185a05a>] schedule+0x3a/0x90
 [<ffffffff8185ec50>] schedule_timeout+0x180/0x330
 [<ffffffff8112c780>] ? detach_if_pending+0x190/0x190
 [<ffffffff8112cfd8>] msleep+0x38/0x40
 [<ffffffff816b387a>] __check_for_conflicting_io+0x6a/0x70
 [<ffffffff816b5fa6>] pending_complete+0x1f6/0x2a0
 [<ffffffff816b617e>] commit_callback+0xe/0x10
 [<ffffffff816b75f8>] persistent_commit_exception+0x98/0x130
 [<ffffffff816b6082>] complete_exception+0x32/0x40
 [<ffffffff816b6116>] copy_callback+0x86/0xe0
 [<ffffffff816b6090>] ? complete_exception+0x40/0x40
 [<ffffffff816adf59>] run_complete_job+0x69/0xc0
 [<ffffffff816ae00a>] process_jobs+0x5a/0xf0
 [<ffffffff816adef0>] ? drop_pages+0x40/0x40
 [<ffffffff816ae0ec>] do_work+0x4c/0xa0
 [<ffffffff810c7470>] process_one_work+0x230/0x6a0
 [<ffffffff810c73d9>] ? process_one_work+0x199/0x6a0
 [<ffffffff810c792e>] worker_thread+0x4e/0x450
 [<ffffffff810c78e0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810c78e0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810ce981>] kthread+0x101/0x120
 [<ffffffff81103fb9>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffff810ce880>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff81860e6f>] ret_from_fork+0x3f/0x70
 [<ffffffff810ce880>] ? kthread_create_on_node+0x250/0x250

This has triggered Mikulas' bell - he has already been trying to fix a similar problem here:

http://www.redhat.com/archives/dm-devel/2014-May/msg00089.html

(That patch addresses bios queued on current->bio_list in generic_make_request: they are only submitted once the recursion unwinds, so a process that blocks first can deadlock anything waiting for those bios to complete.)

The patch was not yet accepted - so let's try another kernel with this patch included.
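For anyone without the lvm2 test harness, a rough standalone sketch of the kind of race involved - hypothetical VG/LV names, and a plain dd reader standing in for the udev-triggered scan (the real test drives this through the harness and udev rules, so timing will differ):

  vg=testvg                                       # hypothetical VG with free space
  lvcreate -L 48M -n LV1 "$vg"
  lvcreate -s -L 16M -n LV1_snap "$vg/LV1"
  dd if=/dev/urandom of="/dev/$vg/LV1_snap" bs=64k count=64  # dirty the snapshot
  lvconvert --merge "$vg/LV1_snap" &              # merge snapshot back into the origin
  dd if="/dev/$vg/LV1" of=/dev/null bs=64k        # read the origin during the merge
  wait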
Created attachment 1079459 [details]
[PATCH] block: flush queued bios when the process blocks

4.3-rc3 rebase of Mikulas' original patch:
http://www.redhat.com/archives/dm-devel/2014-May/msg00089.html
Josh,

Sorry to bug you again.. could I trouble you to build on the kernel you provided in comment#6 by applying the patch I attached in comment#8?

Thanks.
OK, here is another build with both patches included: http://koji.fedoraproject.org/koji/taskinfo?taskID=11309613
So far so good - it has already passed hundreds of runs - I'll keep it running for a while - but it looks like the issue is fixed this time.
We never picked anything up in the official Fedora kernels for this. Is there a set of accepted patches upstream we should be looking at?
(In reply to Josh Boyer from comment #12)
> We never picked anything up in the official Fedora kernels for this. Is
> there a set of accepted patches upstream we should be looking at?

No, Jens ignored me (again)... still no fix included upstream :(
Still hanging with 4.10-rc (quite annoying)
(From 4.10-rc7 & snapshot-merge.sh)

INFO: task kworker/0:4:20663 blocked for more than 120 seconds.
      Not tainted 4.10.0-0.rc7.git1.1.fc26.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:4     D12864 20663      2 0x00000000
Workqueue: kcopyd do_work
Call Trace:
 __schedule+0x28c/0xae0
 schedule+0x3d/0x90
 snapshot_merge_next_chunks+0x28b/0x3b0
 ? finish_wait+0x90/0x90
 merge_callback+0x1b9/0x2b0
 ? snapshot_merge_resume+0x40/0x40
 run_complete_job+0x69/0xb0
 process_jobs+0x84/0x140
 ? drop_pages+0x40/0x40
 do_work+0x4c/0xa0
 process_one_work+0x260/0x750
 ? process_one_work+0x1db/0x750
 worker_thread+0x4e/0x4a0
 ? process_one_work+0x750/0x750
 kthread+0x12c/0x150
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x31/0x40

Showing all locks held in the system:
2 locks held by khungtaskd/26:
 #0:  (rcu_read_lock){......}, at: [<ffffffff8c1a2ffb>] watchdog+0x9b/0x5c0
 #1:  (tasklist_lock){......}, at: [<ffffffff8c1147ad>] debug_show_all_locks+0x3d/0x1a0
2 locks held by kworker/0:4/20663:
 #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
3 locks held by kworker/0:1/23433:
 #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #2:  (&s->lock){......}, at: [<ffffffff8c7866a8>] pending_complete+0x188/0x2d0
2 locks held by systemd-udevd/5240:
 #0:  (&md->io_barrier){......}, at: [<ffffffff8c772fc5>] dm_get_live_table+0x5/0xc0
 #1:  (&s->lock){......}, at: [<ffffffff8c7860dd>] snapshot_map+0x5d/0x450

=============================================

INFO: task systemd-udevd:5240 blocked for more than 120 seconds.
      Not tainted 4.10.0-0.rc7.git1.1.fc26.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd   D12944  5240    487 0x00000100
Call Trace:
 __schedule+0x28c/0xae0
 ? rwsem_down_write_failed+0x2c7/0x4b0
 schedule+0x3d/0x90
 rwsem_down_write_failed+0x2cc/0x4b0
 ? rwsem_down_write_failed+0x65/0x4b0
 ? mempool_alloc_slab+0x1d/0x30
 call_rwsem_down_write_failed+0x17/0x30
 ? snapshot_map+0x5d/0x450
 down_write+0x90/0xc0
 ? snapshot_map+0x5d/0x450
 snapshot_map+0x5d/0x450
 __map_bio+0x3a/0x1d0
 __split_and_process_bio+0x2b2/0x510
 ? __split_and_process_bio+0x31/0x510
 ? generic_make_request+0xf2/0x2e0
 dm_make_request+0x7f/0x120
 generic_make_request+0xff/0x2e0
 submit_bio+0x73/0x150
 ? __radix_tree_replace+0x50/0x110
 do_mpage_readpage+0x671/0x830
 ? I_BDEV+0x20/0x20
 ? lru_cache_add+0x3a/0x80
 mpage_readpages+0x1c0/0x220
 ? I_BDEV+0x20/0x20
 ? I_BDEV+0x20/0x20
 ? alloc_pages_current+0x8f/0x1c0
 ? __page_cache_alloc+0x146/0x190
 blkdev_readpages+0x1d/0x20
 __do_page_cache_readahead+0x2ea/0x3e0
 ? __do_page_cache_readahead+0xe7/0x3e0
 ? find_get_entry+0x1bc/0x280
 force_page_cache_readahead+0xb8/0x120
 page_cache_sync_readahead+0x43/0x50
 generic_file_read_iter+0x6a5/0xa40
 blkdev_read_iter+0x35/0x40
 __vfs_read+0xe5/0x150
 vfs_read+0xa3/0x180
 SyS_read+0x58/0xc0
 do_syscall_64+0x6c/0x1f0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7ff21eacca50
RSP: 002b:00007ffc017397e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000561afd64f7a0 RCX: 00007ff21eacca50
RDX: 0000000000040000 RSI: 0000561afd64f7c8 RDI: 0000000000000007
RBP: 0000561afd5ee3c0 R08: 000000000006ca50 R09: 0000000000000000
R10: 0000561afd64f7b8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000040000 R14: 0000561afd5ee410 R15: 0000000000040000

Showing all locks held in the system:
2 locks held by khungtaskd/26:
 #0:  (rcu_read_lock){......}, at: [<ffffffff8c1a2ffb>] watchdog+0x9b/0x5c0
 #1:  (tasklist_lock){......}, at: [<ffffffff8c1147ad>] debug_show_all_locks+0x3d/0x1a0
2 locks held by kworker/0:4/20663:
 #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
3 locks held by kworker/0:1/23433:
 #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 #2:  (&s->lock){......}, at: [<ffffffff8c7866a8>] pending_complete+0x188/0x2d0
2 locks held by systemd-udevd/5240:
 #0:  (&md->io_barrier){......}, at: [<ffffffff8c772fc5>] dm_get_live_table+0x5/0xc0
 #1:  (&s->lock){......}, at: [<ffffffff8c7860dd>] snapshot_map+0x5d/0x450
This should be fixed by upstream commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 ("dm: flush queued bios when process blocks to avoid deadlock"), so I'm closing this bug.
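To check whether a given kernel tree contains the fix - a sketch assuming a local kernel git clone (path is an assumption):

  git -C ~/src/linux merge-base --is-ancestor \
      d67a5f4b5947aba4bfe9a80a2b86079c215ca755 HEAD \
      && echo "fix present" || echo "fix missing"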