Bug 1267650

Summary: Snapshot merge is freezing with 4.3-rc kernel
Product: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Zdenek Kabelac <zkabelac>
Assignee: Mike Snitzer <msnitzer>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, bmarzins, bmr, dwysocha, gansalmon, heinzm, itamar, jonathan, jwboyer, kernel-maint, lvm-team, madhu.chinakonda, mchehab, mpatocka, msnitzer, prajnoha, prockai, zkabelac
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 1268996, 1268997
Last Closed: 2017-07-13 03:04:26 UTC

Attachments:
[PATCH] block: flush queued bios when the process blocks (flags: none)

Description Zdenek Kabelac 2015-09-30 15:04:17 UTC
Description of problem:

New changes in the 4.3-rc kernel cause the lvm2 test suite test 'snapshot-merge' to eventually hang the 'dm' subsystem.
It gets to a state where even a plain call of 'dmsetup status' hangs in the kernel.

To reproduce the issue, run the 'snapshot-merge' test many times; it seems to trigger at some point (it is hard to say whether 100 problem-free runs make a kernel 'ok'), but so far that is my estimate for a 'bisect' check.

Hang tasks:

pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=5 idle: 28076 31863 24250
INFO: task kworker/0:0:31015 blocked for more than 120 seconds.
      Tainted: G        W       4.3.0-0.rc2.git1.1.fc24.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:0     D ffff8800779d7a18 12384 31015      2 0x00000080
Workqueue: kcopyd do_work
 ffff8800157dbb48 0000000000000082 0000000000000282 0000000000000000
 ffff8800055bb100 ffff8800754c3100 ffff8800157dc000 ffff88005551a000
 0000000000000000 0000000000000378 0000000000000001 ffff8800157dbb60
Call Trace:
 [<ffffffff818599ba>] schedule+0x3a/0x90
 [<ffffffff816b4b13>] snapshot_merge_next_chunks+0x2c3/0x3f0
 [<ffffffff810f8040>] ? wake_atomic_t_function+0x70/0x70
 [<ffffffff816b52ad>] merge_callback+0x23d/0x270
 [<ffffffff816b5070>] ? snapshot_merge_resume+0x40/0x40
 [<ffffffff816adc79>] run_complete_job+0x69/0xc0
 [<ffffffff816add2a>] process_jobs+0x5a/0xf0
 [<ffffffff816adc10>] ? drop_pages+0x40/0x40
 [<ffffffff816ade0c>] do_work+0x4c/0xa0
 [<ffffffff810c7480>] process_one_work+0x230/0x6a0
 [<ffffffff810c73e9>] ? process_one_work+0x199/0x6a0
 [<ffffffff810c793e>] worker_thread+0x4e/0x450
 [<ffffffff810c78f0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810c78f0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810ce991>] kthread+0x101/0x120
 [<ffffffff81103fc9>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffff810ce890>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff818607af>] ret_from_fork+0x3f/0x70
 [<ffffffff810ce890>] ? kthread_create_on_node+0x250/0x250
[96240.285052] 2 locks held by kworker/0:0/31015:
[96240.287231]  #0:  ("kcopyd"){++++..}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
[96240.290512]  #1:  ((&kc->kcopyd_work)){+.+...}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
[96240.294036] INFO: task systemd-udevd:28056 blocked for more than 120 seconds.
[96240.296869]       Tainted: G        W       4.3.0-0.rc2.git1.1.fc24.x86_64 #1
[96240.299748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[96240.303116] systemd-udevd   D ffff880077bd7a18 13296 28056    499 0x00000084
[96240.306208]  ffff88004f2938b0 0000000000000086 ffff88004f293860 ffff880000000000
[96240.309513]  ffff880073c08000 ffff8800752c8000 ffff88004f294000 ffffffff00000000
[96240.312901]  ffff880055518400 ffff8800752c8000 fffffffe00000001 ffff88004f2938c8
[96240.316717] Call Trace:
[96240.319027]  [<ffffffff818599ba>] schedule+0x3a/0x90
[96240.322139]  [<ffffffff8185e1a0>] rwsem_down_write_failed+0x230/0x420
[96240.325627]  [<ffffffff8185dfc6>] ? rwsem_down_write_failed+0x56/0x420
[96240.329147]  [<ffffffff81025ac9>] ? sched_clock+0x9/0x10
[96240.332092]  [<ffffffff810e351a>] ? sched_clock_cpu+0x8a/0xb0
[96240.335122]  [<ffffffff81427053>] call_rwsem_down_write_failed+0x13/0x20
[96240.338025]  [<ffffffff8185d818>] ? down_write+0x98/0xc0
[96240.340564]  [<ffffffff816b569f>] ? snapshot_map+0x5f/0x440
[96240.343075]  [<ffffffff816b569f>] snapshot_map+0x5f/0x440
[96240.345540]  [<ffffffff816a244a>] __map_bio+0x3a/0x1a0
[96240.347918]  [<ffffffff816a4353>] __split_and_process_bio+0x313/0x4e0
[96240.350643]  [<ffffffff816a4071>] ? __split_and_process_bio+0x31/0x4e0
[96240.352995]  [<ffffffff816a47f2>] dm_make_request+0x92/0x140
[96240.355058]  [<ffffffff813e2c96>] generic_make_request+0xd6/0x110
[96240.357142]  [<ffffffff813e2d46>] submit_bio+0x76/0x170
[96240.359089]  [<ffffffff811f0bec>] ? lru_cache_add+0x1c/0x50
[96240.361090]  [<ffffffff812b5b22>] mpage_readpages+0x122/0x160
[96240.363106]  [<ffffffff812ae9f0>] ? I_BDEV+0x20/0x20
[96240.365011]  [<ffffffff812ae9f0>] ? I_BDEV+0x20/0x20
[96240.366916]  [<ffffffff81238f07>] ? alloc_pages_current+0x107/0x1a0
[96240.368994]  [<ffffffff811dd035>] ? __page_cache_alloc+0x135/0x170
[96240.371033]  [<ffffffff812af3ed>] blkdev_readpages+0x1d/0x20
[96240.373023]  [<ffffffff811ee5bd>] __do_page_cache_readahead+0x29d/0x320
[96240.375138]  [<ffffffff811ee490>] ? __do_page_cache_readahead+0x170/0x320
[96240.377269]  [<ffffffff811eeb4d>] force_page_cache_readahead+0x3d/0x70
[96240.379325]  [<ffffffff811eebc2>] page_cache_sync_readahead+0x42/0x50
[96240.381395]  [<ffffffff811df066>] generic_file_read_iter+0x4e6/0x5f0
[96240.383448]  [<ffffffff81173c6b>] ? __audit_syscall_entry+0xab/0xf0
[96240.385484]  [<ffffffff810e3595>] ? local_clock+0x15/0x20
[96240.387417]  [<ffffffff812b0005>] blkdev_read_iter+0x35/0x40
[96240.389802]  [<ffffffff8126b81c>] __vfs_read+0xcc/0x100
[96240.392086]  [<ffffffff8126c106>] vfs_read+0x86/0x130
[96240.394372]  [<ffffffff8126ce88>] SyS_read+0x58/0xd0
[96240.396630]  [<ffffffff81860432>] entry_SYSCALL_64_fastpath+0x12/0x76
[96240.399167] 2 locks held by systemd-udevd/28056:
[96240.401331]  #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
[96240.404618]  #1:  (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440


Version-Release number of selected component (if applicable):
4.3.0-0.rc2.git1.1.fc24.x86_64
(but seen already with rc0 kernel)

How reproducible:


Steps to Reproduce:
1. for i in $(seq 1 100) ; do make check_local T=snapshot-merge VERBOSE=1 ; done
2. See whether it hangs at some iteration (a hedged variant of the loop, with a timeout, is sketched below).
3.
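
A hedged variant of step 1 that stops the loop on the first run exceeding 30 minutes (the timeout value is an arbitrary assumption; a task stuck in the kernel survives the timeout and still needs manual inspection):

for i in $(seq 1 100) ; do
    timeout 1800 make check_local T=snapshot-merge VERBOSE=1 || break
done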

Actual results:


Expected results:


Additional info:

Comment 1 Zdenek Kabelac 2015-09-30 15:07:44 UTC
# dmsetup table

LVMTEST27297vg-LV1_snap_1-cow: 0 21504 linear 253:2 122880
LVMTEST27297pv1: 0 204800 linear 7:0 2048
LVMTEST27297vg-LV1_snap-cow: 0 19456 linear 253:2 103424
LVMTEST27297vg-LV1_snap_1: 0 101376 snapshot 253:4 253:7 P 8
LVMTEST27297vg-LV1-real: 0 101376 linear 253:2 2048


# dmsetup  table --inactive

LVMTEST27297vg-LV1_snap_1-cow: 
LVMTEST27297pv1: 
LVMTEST27297vg-LV1_snap-cow: 
LVMTEST27297vg-LV1_snap_1: 
LVMTEST27297vg-LV1-real: 


# dmsetup info -c

LVMTEST27297vg-LV1_snap_1-cow 253   7 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6zfdZdfLAIGc0RIWjuLOnvpRfGDHykWgF-cow 
LVMTEST27297pv1               253   2 L--w    3    1      0 TEST-LVMTEST27297pv1                                                     
LVMTEST27297vg-LV1_snap-cow   253   5 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6EuM8qBu0dZJ5VrVj6SEN5LGu0T86v0TY-cow 
LVMTEST27297vg-LV1_snap_1     253   8 L--w    1    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6zfdZdfLAIGc0RIWjuLOnvpRfGDHykWgF     
LVMTEST27297vg-LV1-real       253   4 L--w    2    1      0 LVM-JXizlk4JWakaXqChMRYmHdA9IZnMihQ6y7WtfrhwiWASXkPeVAuTvWbjlqOcnhUU-real

# dmsetup  status

LVMTEST27297vg-LV1_snap_1-cow: 0 21504 linear 
LVMTEST27297pv1: 0 204800 linear 
LVMTEST27297vg-LV1_snap-cow: 0 19456 linear 
...hangs here...

Comment 2 Zdenek Kabelac 2015-09-30 15:39:32 UTC
Showing all locks held in the system:
2 locks held by kworker/0:0/31015:
 #0:  ("kcopyd"){++++..}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
 #1:  ((&kc->kcopyd_work)){+.+...}, at: [<ffffffff810c73e9>] process_one_work+0x199/0x6a0
2 locks held by systemd-udevd/28056:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440
2 locks held by dmsetup/28270:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&md->suspend_lock#2){+.+...}, at: [<ffffffff816a4bb2>] __dm_destroy+0xd2/0x2c0
2 locks held by dmsetup/28301:
 #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>] dm_get_live_table+0x5/0xb0
 #1:  (&s->lock){++++..}, at: [<ffffffff816b3102>] snapshot_status+0x82/0x1b0
3 locks held by bash/28317:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff8126f179>] __sb_start_write+0xc9/0x110
 #1:  (rcu_read_lock){......}, at: [<ffffffff81521085>] __handle_sysrq+0x5/0x240
 #2:  (tasklist_lock){.+.+..}, at: [<ffffffff81101af0>] debug_show_all_locks+0x50/0x1e0
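
For reference, a lock dump like the one above can be requested on demand (with lockdep enabled) via the sysrq 'd' trigger; the bash/28317 entry holding __handle_sysrq above suggests that is how this one was obtained:

echo d > /proc/sysrq-trigger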

Comment 3 Zdenek Kabelac 2015-10-01 12:08:23 UTC
possibly related to this dm-devel report:

https://www.redhat.com/archives/dm-devel/2015-October/msg00000.html

Comment 4 Mike Snitzer 2015-10-01 18:37:21 UTC
(In reply to Zdenek Kabelac from comment #2)
> Showing all locks held in the system:
...
> 2 locks held by systemd-udevd/28056:
>  #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
>  #1:  (&s->lock){++++..}, at: [<ffffffff816b569f>] snapshot_map+0x5f/0x440

> 2 locks held by dmsetup/28270:
>  #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
>  #1:  (&md->suspend_lock#2){+.+...}, at: [<ffffffff816a4bb2>]
> __dm_destroy+0xd2/0x2c0

> 2 locks held by dmsetup/28301:
>  #0:  (&md->io_barrier){......}, at: [<ffffffff816a46b5>]
> dm_get_live_table+0x5/0xb0
>  #1:  (&s->lock){++++..}, at: [<ffffffff816b3102>] snapshot_status+0x82/0x1b0

Given the above locks that are held, and a dmsetup task being in __dm_destroy, it really _does_ look like the following report/fix is relevant:

(In reply to Zdenek Kabelac from comment #3)
> possibly related to this dm-devel report:
> 
> https://www.redhat.com/archives/dm-devel/2015-October/msg00000.html

Comment 5 Mike Snitzer 2015-10-01 18:42:30 UTC
Josh,

Could you help me build a test Fedora rawhide kernel that includes this fix, which I've staged in linux-dm.git's 'for-next' branch?
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=2a708cff93f1845b9239bc7d6310aef54e716c6a
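
For reference, a minimal sketch of pulling that staged commit into a local kernel tree for a test build (the clone URLs are assumed from the cgit link above; this is not the Fedora scratch-build procedure being requested here):

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git remote add dm https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
git fetch dm for-next
git cherry-pick 2a708cff93f1845b9239bc7d6310aef54e716c6a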

Zdenek works best with kernels he can just install, so I'd like to arm him with a kernel that _should_ fix the issue he has reported in this BZ.

Thanks!

Comment 6 Josh Boyer 2015-10-01 19:15:50 UTC
I've started a scratch build of today's rawhide kernel with the patch Mike pointed to included.  You will be able to find it here when the build completes:

http://koji.fedoraproject.org/koji/taskinfo?taskID=11297054

Comment 7 Zdenek Kabelac 2015-10-02 14:20:21 UTC
A few more comments: my bare-metal hardware seems unable to reproduce this issue (with or without the patch), at least not in 100 loops, so only our qemu VM looks like a 'reliable' block-catcher so far.

As of now, it still blocks even with the kernel from comment 6.

All the test does is create a snapshot and merge it back while the origin is being read in parallel by a udev rule; when the right timing is met, it blocks (a hedged sketch of the sequence is below).

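A hedged sketch of that sequence; the volume names and sizes are made-up assumptions, and the real steps live in the lvm2 test suite's snapshot-merge test:

# create an origin and a snapshot (illustrative names/sizes)
lvcreate -L 64M -n LV1 vg
lvcreate -s -L 16M -n LV1_snap vg/LV1
# start the merge in the background...
lvconvert --merge vg/LV1_snap &
# ...while reading the origin in parallel (stand-in for the udev-triggered read)
dd if=/dev/vg/LV1 of=/dev/null bs=1M &
wait
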
Another looping thread was noticed:

kworker/1:2     D ffff880077bd7a18 10904    87      2 0x00000000
Workqueue: kcopyd do_work
 ffff880073d47a78 0000000000000082 ffff880077bce2c0 0000000000000000
 ffff880076d18000 ffff880073ccb100 ffff880073d48000 ffff880073d47ab0
 ffff880077bce2c0 ffff880077bce2c0 0000000000000000 ffff880073d47a90
Call Trace:
 [<ffffffff8185a05a>] schedule+0x3a/0x90
 [<ffffffff8185ec50>] schedule_timeout+0x180/0x330
 [<ffffffff8112c780>] ? detach_if_pending+0x190/0x190
 [<ffffffff8112cfd8>] msleep+0x38/0x40
 [<ffffffff816b387a>] __check_for_conflicting_io+0x6a/0x70
 [<ffffffff816b5fa6>] pending_complete+0x1f6/0x2a0
 [<ffffffff816b617e>] commit_callback+0xe/0x10
 [<ffffffff816b75f8>] persistent_commit_exception+0x98/0x130
 [<ffffffff816b6082>] complete_exception+0x32/0x40
 [<ffffffff816b6116>] copy_callback+0x86/0xe0
 [<ffffffff816b6090>] ? complete_exception+0x40/0x40
 [<ffffffff816adf59>] run_complete_job+0x69/0xc0
 [<ffffffff816ae00a>] process_jobs+0x5a/0xf0
 [<ffffffff816adef0>] ? drop_pages+0x40/0x40
 [<ffffffff816ae0ec>] do_work+0x4c/0xa0
 [<ffffffff810c7470>] process_one_work+0x230/0x6a0
 [<ffffffff810c73d9>] ? process_one_work+0x199/0x6a0
 [<ffffffff810c792e>] worker_thread+0x4e/0x450
 [<ffffffff810c78e0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810c78e0>] ? process_one_work+0x6a0/0x6a0
 [<ffffffff810ce981>] kthread+0x101/0x120
 [<ffffffff81103fb9>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffff810ce880>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff81860e6f>] ret_from_fork+0x3f/0x70
 [<ffffffff810ce880>] ? kthread_create_on_node+0x250/0x250


This has rung Mikulas' bell - he has already been trying to fix a similar problem here: http://www.redhat.com/archives/dm-devel/2014-May/msg00089.html

The patch has not yet been accepted upstream, so let's try another kernel with this patch included.

Comment 8 Mike Snitzer 2015-10-02 14:29:49 UTC
Created attachment 1079459 [details]
[PATCH] block: flush queued bios when the process blocks

4.3-rc3 rebase of Mikulas' original patch:
http://www.redhat.com/archives/dm-devel/2014-May/msg00089.html

Comment 9 Mike Snitzer 2015-10-02 14:31:30 UTC
Josh,

Sorry to bug you again, but could I trouble you to build on the kernel you provided in comment #6 by applying the patch I attached in comment #8?

Thanks.

Comment 10 Josh Boyer 2015-10-02 14:51:02 UTC
OK, here is another build with both patches included:

http://koji.fedoraproject.org/koji/taskinfo?taskID=11309613

Comment 11 Zdenek Kabelac 2015-10-02 17:21:57 UTC
So far so good - it has already passed hundreds of runs. I'll keep it running for a while, but it looks like the issue is fixed this time.

Comment 12 Josh Boyer 2015-10-29 12:35:48 UTC
We never picked anything up in the official Fedora kernels for this.  Is there a set of accepted patches upstream we should be looking at?

Comment 13 Mike Snitzer 2015-12-11 17:04:22 UTC
(In reply to Josh Boyer from comment #12)
> We never picked anything up in the official Fedora kernels for this.  Is
> there a set of accepted patches upstream we should be looking at?

No, Jens ignored me (again)... still no fix included upstream :(

Comment 14 Zdenek Kabelac 2016-12-23 18:05:09 UTC
Still crashing  with 4.10-rc
(quite annoying)

Comment 15 Zdenek Kabelac 2017-02-12 17:56:57 UTC
(From  4.10-rc7   & snapshot-merge.sh)

 INFO: task kworker/0:4:20663 blocked for more than 120 seconds.
       Not tainted 4.10.0-0.rc7.git1.1.fc26.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/0:4     D12864 20663      2 0x00000000
 Workqueue: kcopyd do_work
 Call Trace:
  __schedule+0x28c/0xae0
  schedule+0x3d/0x90
  snapshot_merge_next_chunks+0x28b/0x3b0
  ? finish_wait+0x90/0x90
  merge_callback+0x1b9/0x2b0
  ? snapshot_merge_resume+0x40/0x40
  run_complete_job+0x69/0xb0
  process_jobs+0x84/0x140
  ? drop_pages+0x40/0x40
  do_work+0x4c/0xa0
  process_one_work+0x260/0x750
  ? process_one_work+0x1db/0x750
  worker_thread+0x4e/0x4a0
  ? process_one_work+0x750/0x750
  kthread+0x12c/0x150
  ? kthread_create_on_node+0x60/0x60
  ret_from_fork+0x31/0x40
 
 Showing all locks held in the system:
 2 locks held by khungtaskd/26:
  #0:  (rcu_read_lock){......}, at: [<ffffffff8c1a2ffb>] watchdog+0x9b/0x5c0
  #1:  (tasklist_lock){......}, at: [<ffffffff8c1147ad>] debug_show_all_locks+0x3d/0x1a0
 2 locks held by kworker/0:4/20663:
  #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 3 locks held by kworker/0:1/23433:
  #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #2:  (&s->lock){......}, at: [<ffffffff8c7866a8>] pending_complete+0x188/0x2d0
 2 locks held by systemd-udevd/5240:
  #0:  (&md->io_barrier){......}, at: [<ffffffff8c772fc5>] dm_get_live_table+0x5/0xc0
  #1:  (&s->lock){......}, at: [<ffffffff8c7860dd>] snapshot_map+0x5d/0x450

 =============================================

 INFO: task systemd-udevd:5240 blocked for more than 120 seconds.
       Not tainted 4.10.0-0.rc7.git1.1.fc26.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 systemd-udevd   D12944  5240    487 0x00000100
 Call Trace:
  __schedule+0x28c/0xae0
  ? rwsem_down_write_failed+0x2c7/0x4b0
  schedule+0x3d/0x90
  rwsem_down_write_failed+0x2cc/0x4b0
  ? rwsem_down_write_failed+0x65/0x4b0
  ? mempool_alloc_slab+0x1d/0x30
  call_rwsem_down_write_failed+0x17/0x30
  ? snapshot_map+0x5d/0x450
  down_write+0x90/0xc0
  ? snapshot_map+0x5d/0x450
  snapshot_map+0x5d/0x450
  __map_bio+0x3a/0x1d0
  __split_and_process_bio+0x2b2/0x510
  ? __split_and_process_bio+0x31/0x510
  ? generic_make_request+0xf2/0x2e0
  dm_make_request+0x7f/0x120
  generic_make_request+0xff/0x2e0
  submit_bio+0x73/0x150
  ? __radix_tree_replace+0x50/0x110
  do_mpage_readpage+0x671/0x830
  ? I_BDEV+0x20/0x20
  ? lru_cache_add+0x3a/0x80
  mpage_readpages+0x1c0/0x220
  ? I_BDEV+0x20/0x20
  ? I_BDEV+0x20/0x20
  ? alloc_pages_current+0x8f/0x1c0
  ? __page_cache_alloc+0x146/0x190
  blkdev_readpages+0x1d/0x20
  __do_page_cache_readahead+0x2ea/0x3e0
  ? __do_page_cache_readahead+0xe7/0x3e0
  ? find_get_entry+0x1bc/0x280
  force_page_cache_readahead+0xb8/0x120
  page_cache_sync_readahead+0x43/0x50
  generic_file_read_iter+0x6a5/0xa40
  blkdev_read_iter+0x35/0x40
  __vfs_read+0xe5/0x150
  vfs_read+0xa3/0x180
  SyS_read+0x58/0xc0
  do_syscall_64+0x6c/0x1f0
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7ff21eacca50
 RSP: 002b:00007ffc017397e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000561afd64f7a0 RCX: 00007ff21eacca50
 RDX: 0000000000040000 RSI: 0000561afd64f7c8 RDI: 0000000000000007
 RBP: 0000561afd5ee3c0 R08: 000000000006ca50 R09: 0000000000000000
 R10: 0000561afd64f7b8 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000040000 R14: 0000561afd5ee410 R15: 0000000000040000
 
 Showing all locks held in the system:
 2 locks held by khungtaskd/26:
  #0:  (rcu_read_lock){......}, at: [<ffffffff8c1a2ffb>] watchdog+0x9b/0x5c0
  #1:  (tasklist_lock){......}, at: [<ffffffff8c1147ad>] debug_show_all_locks+0x3d/0x1a0
 2 locks held by kworker/0:4/20663:
  #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
 3 locks held by kworker/0:1/23433:
  #0:  ("kcopyd"){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #1:  ((&kc->kcopyd_work)){......}, at: [<ffffffff8c0d4d2b>] process_one_work+0x1db/0x750
  #2:  (&s->lock){......}, at: [<ffffffff8c7866a8>] pending_complete+0x188/0x2d0
 2 locks held by systemd-udevd/5240:
  #0:  (&md->io_barrier){......}, at: [<ffffffff8c772fc5>] dm_get_live_table+0x5/0xc0
  #1:  (&s->lock){......}, at: [<ffffffff8c7860dd>] snapshot_map+0x5d/0x450

Comment 16 Mikuláš Patočka 2017-07-13 03:04:26 UTC
This should be fixed in the upstream commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755, so I'm closing this bug.
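
To check which upstream releases contain that commit, something like the following works against a full clone of Linus' tree:

git tag --contains d67a5f4b5947aba4bfe9a80a2b86079c215ca755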