Bug 1277589 - [RFE] Mirror reading is blocked by region synchronization (mirror_flush)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-03 15:32 UTC by Zdenek Kabelac
Modified: 2019-10-23 20:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-23 20:04:02 UTC
Type: Bug
Embargoed:



Description Zdenek Kabelac 2015-11-03 15:32:04 UTC
Description of problem:

While investigating another issue, I noticed what is IMHO an unnecessary delay for 'read()' operations on a mirror.

Example: create a mirror from 2 legs, where the master leg is on a 'fast' device and the secondary leg is on a 'dm-delay'-ed device. This exposes a problem where e.g. a blkid scan on such a mirror basically waits until the full region gets in sync.

Likely relates to this code in mirror_map():

	/*
	 * If region is not in-sync queue the bio.
	 */
	if (!r || (r == -EWOULDBLOCK)) {
		if (rw == READA)
			return -EWOULDBLOCK;

		queue_bio(ms, bio, rw);
		return DM_MAPIO_SUBMITTED;
	}

While I'm aware the old 'mirror' target is seen as obsolete, if a fix that allows serving the available data wouldn't be too invasive, it would be nice to have it.



Here are the stack traces of the 2 blocked tasks at the moment blkid is blocked:

sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/0:0     D ffff88013a9d5cc0     0  1562      2 0x00000080
Workqueue: kmirrord do_mirror [dm_mirror]
 ffff8801126778f8 0000000000000092 ffff880100000000 ffff8800b77da780
 ffff88012960cf00 ffff880112678000 ffff88013a9d5cc0 7fffffffffffffff
 ffff880112677aa0 ffff880112677c00 ffff880112677910 ffffffff81604c6d
Call Trace:
 [<ffffffff81604c6d>] schedule+0x3d/0x90
 [<ffffffff81609909>] schedule_timeout+0x2f9/0x490
 [<ffffffff810aece9>] ? mark_held_locks+0x79/0xa0
 [<ffffffff810e69bb>] ? ktime_get+0x6b/0x120
 [<ffffffff810aee3d>] ? trace_hardirqs_on_caller+0x12d/0x1b0
 [<ffffffff810aeecd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8112635f>] ? __delayacct_blkio_start+0x1f/0x30
 [<ffffffff816040c4>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81605684>] wait_for_completion_io+0xb4/0x120
 [<ffffffff8108a330>] ? wake_up_q+0x60/0x60
 [<ffffffffa0e1b630>] sync_io+0xe0/0x140 [dm_mod]
 [<ffffffff810afd25>] ? __lock_acquire+0xb95/0x1b80
 [<ffffffff81605606>] ? wait_for_completion_io+0x36/0x120
 [<ffffffffa0e1b862>] dm_io+0x1d2/0x200 [dm_mod]
 [<ffffffffa0e1af70>] ? bio_next_page+0x20/0x20 [dm_mod]
 [<ffffffffa0e1afd0>] ? km_get_page+0x60/0x60 [dm_mod]
 [<ffffffffa0e42dc2>] mirror_flush+0xc2/0x130 [dm_mirror]
 [<ffffffffa0e35a84>] disk_flush+0x34/0x150 [dm_log]
 [<ffffffff81172e2e>] ? mempool_kfree+0xe/0x10
 [<ffffffff81173279>] ? mempool_free+0x29/0x80
 [<ffffffffa0e3b75a>] dm_rh_update_states+0x2ea/0x300 [dm_region_hash]
 [<ffffffffa0e436d1>] do_mirror+0xb1/0x270 [dm_mirror]
 [<ffffffff81076b54>] process_one_work+0x204/0x6e0
 [<ffffffff81076ac3>] ? process_one_work+0x173/0x6e0
 [<ffffffff81077271>] worker_thread+0x241/0x4b0
 [<ffffffff8107d5b6>] ? kthread+0xd6/0x120
 [<ffffffff81083ef5>] ? preempt_count_sub+0xa5/0xf0
 [<ffffffff81077030>] ? process_one_work+0x6e0/0x6e0
 [<ffffffff8107d5e1>] kthread+0x101/0x120
 [<ffffffff8107d4e0>] ? kthread_create_on_node+0x250/0x250
 [<ffffffff8160bdbf>] ret_from_fork+0x3f/0x70
 [<ffffffff8107d4e0>] ? kthread_create_on_node+0x250/0x250
systemd-udevd   D ffff88013abd5cc0     0  2710    335 0x00000080
 ffff8800404f7bc8 0000000000000096 ffff880100000000 ffff880134cb4f00
 ffff8800413f4f00 ffff8800404f8000 ffff88013abd5cc0 7fffffffffffffff
 0000000000000082 ffffffff816054c0 ffff8800404f7be0 ffffffff81604c6d
Call Trace:
 [<ffffffff816054c0>] ? bit_wait+0x50/0x50
 [<ffffffff81604c6d>] schedule+0x3d/0x90
 [<ffffffff81609909>] schedule_timeout+0x2f9/0x490
 [<ffffffff8112635f>] ? __delayacct_blkio_start+0x1f/0x30
 [<ffffffff810aee3d>] ? trace_hardirqs_on_caller+0x12d/0x1b0
 [<ffffffff810aeecd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8112635f>] ? __delayacct_blkio_start+0x1f/0x30
 [<ffffffff816054c0>] ? bit_wait+0x50/0x50
 [<ffffffff816040c4>] io_schedule_timeout+0xa4/0x110
 [<ffffffff816054f5>] bit_wait_io+0x35/0x50
 [<ffffffff816052cb>] __wait_on_bit_lock+0x4b/0xa0
 [<ffffffff8116f5ec>] __lock_page_killable+0x9c/0xa0
 [<ffffffff810a3640>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff81171baf>] generic_file_read_iter+0x42f/0x5d0
 [<ffffffff81090857>] ? sched_clock_local+0x17/0x80
 [<ffffffff81225a35>] blkdev_read_iter+0x35/0x40
 [<ffffffff811e319a>] __vfs_read+0xaa/0xe0
 [<ffffffff811e3a26>] vfs_read+0x86/0x130
 [<ffffffff811e4709>] SyS_read+0x49/0xb0
 [<ffffffff8160ba1b>] entry_SYSCALL_64_fastpath+0x16/0x73
Sched Debug Version: v0.11, 4.3.0-rc7-00033-g27ba239 #31



Version-Release number of selected component (if applicable):
4.3 kernel

How reproducible:


Steps to Reproduce:
1. Use lvcreate to create a mirror LV with one PV on a device with delayed sectors (e.g. a dm-delay device).
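
The step above could be scripted roughly as follows. This is only a sketch under assumptions, not the exact setup used for this report: the device arguments, the names 'delayed', 'vg_test' and 'lv_mirror', the 64M size and the 500 ms delay are all hypothetical, and a third device is assumed so the on-disk mirror log has somewhere to go. Nothing is touched unless the script is called with 'run'.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. All names, sizes and the delay value
# are assumptions for illustration, not values taken from the report.

# Print a dm-delay table line:
#   <start> <length> delay <device> <offset> <delay_ms>
delay_table() {
    dev=$1; sectors=$2; delay_ms=$3
    printf '0 %s delay %s 0 %s\n' "$sectors" "$dev" "$delay_ms"
}

# Build the delayed device and the mirror. Needs root and three scratch
# block devices: fast leg, slow leg, and one for the mirror log.
repro() {
    fast=$1; slow=$2; log=$3
    # Size of the slow device in 512-byte sectors.
    sectors=$(blockdev --getsz "$slow") || return 1
    # dmsetup reads the table from stdin when no --table is given.
    delay_table "$slow" "$sectors" 500 | dmsetup create delayed || return 1
    pvcreate "$fast" /dev/mapper/delayed "$log" || return 1
    vgcreate vg_test "$fast" /dev/mapper/delayed "$log" || return 1
    # Old 'mirror' segment type with one extra copy. Which PV ends up
    # holding which leg is up to the allocator; verify afterwards.
    lvcreate --type mirror -m 1 -L 64M -n lv_mirror vg_test \
        "$fast" /dev/mapper/delayed "$log" || return 1
    lvs -a -o +devices vg_test
    # Probe the LV while the initial resync is still running.
    blkid -p /dev/vg_test/lv_mirror
}

# Only touch real devices when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    repro "$2" "$3" "$4"
fi
```

Running the final blkid call under 'time' while the resync is in progress should show whether the probe is held up.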

Actual results:
A read from the non-delayed master leg is heavily impacted by waiting on region sync (mirror flush).

Expected results:
The read is not so heavily influenced by the mirror flush.

Additional info:

Comment 1 Jonathan Earl Brassow 2019-10-23 20:04:02 UTC
Won't add new features for the 'mirror' target. The RAID target already has a concept of --writemostly / --writebehind that could also be useful here.
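
For illustration only, the RAID alternative mentioned above might look like the commands printed by this sketch. The VG/LV names, device paths, the 64M size and the write-behind value of 256 are hypothetical; the script only prints the commands and does not run them.

```shell
#!/bin/sh
# Prints (does not execute) a possible raid1 setup that marks the slow
# leg as write-mostly. All names and values are assumptions.
writemostly_cmds() {
    vg=$1; lv=$2; fast=$3; slow=$4
    # raid1 LV with one extra copy across the fast and slow PVs.
    echo "lvcreate --type raid1 -m 1 -L 64M -n $lv $vg $fast $slow"
    # Ask raid1 to avoid reading from the slow leg where possible.
    echo "lvchange --writemostly $slow $vg/$lv"
    # Allow a number of outstanding writes to the write-mostly leg.
    echo "lvchange --writebehind 256 $vg/$lv"
}

writemostly_cmds vg_test lv_raid /dev/sdX /dev/mapper/delayed
```

Whether this avoids the delay described in this report has not been verified here.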

