473273 – Increase readahead for LV if LVM over MD RAID device

Summary: Increase readahead for LV if LVM over MD RAID device

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	lvm2
Version:	11
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Milan Broz
QA Contact:	Fedora Extras Quality Assurance

Reported:	2008-11-27 13:28 UTC by Milan Broz
Modified:	2013-03-01 04:07 UTC
CC List:	19 users
Fixed In Version:	2.02.48-1.fc11
Last Closed:	2009-07-19 10:32:01 UTC

Description Milan Broz 2008-11-27 13:28:07 UTC

+++ This bug was initially created as a clone of Bug #232843 +++

(similar to bug #450922 but this time for LVM over MD RAID)

Reproducing the same steps (from comment #37 of the original bug), on the same hardware, on kernel-2.6.27.5-117.fc10.x86_64 produces the same pitifully slow LVM-over-MD results until I manually increase the readahead of each LV to match the MD device.

# default readaheads
[root@nano ~]# blockdev --getra /dev/sda5 /dev/md3 /dev/vgr5/home
256
3072
256

# raw bottom-layer disk device speed is ~106 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/sda5 of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 40.6367 s, 106 MB/s

# raw middle-layer MD device read rate is ~289 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/md3 of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 14.8595 s, 289 MB/s

# raw top-layer LVM device read rate is only ~82 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/vgr5/home of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 52.1008 s, 82.4 MB/s


# increase the LV readahead to match the underlying MD readahead:
[root@nano ~]# blockdev --setra 3072 /dev/vgr5/home

# raw LVM device now comes very close to matching the MD device @ 288 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/vgr5/home of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 14.9148 s, 288 MB/s


So it seems I'll still have to add "blockdev --setra 4096 /dev/dm-*" to /etc/rc.local as a workaround.
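The rc.local line above applies one fixed value (4096) to every device-mapper device. A hypothetical, more targeted variant is sketched below; the function name inherit_dm_readahead and the sysfs-based approach are illustrative assumptions, not part of lvm2 or of this report. For each /sys/block/dm-N it reads the readahead of the devices listed under slaves/ and raises the dm device to the largest of them, which is roughly the behaviour comment 4 later describes for lvm2 2.02.47.

```shell
#!/bin/bash
# Hypothetical rc.local helper (not part of lvm2): give every device-mapper
# device the largest readahead found among its underlying devices, using
# the /sys/block/dm-N/slaves/ links. Values are 512-byte sectors, the unit
# of blockdev --getra/--setra. Needs root. Assumes sysfs device names match
# the /dev node names (true for sdX and mdN).
inherit_dm_readahead() {
    local sysfs=${1:-/sys}   # sysfs root; overridable for testing
    local dm slave name ra cur max
    for dm in "$sysfs"/block/dm-*; do
        [ -d "$dm/slaves" ] || continue      # also skips an unmatched glob
        name=${dm##*/}
        max=0
        for slave in "$dm"/slaves/*; do
            [ -e "$slave" ] || continue      # empty slaves/ directory
            ra=$(blockdev --getra "/dev/${slave##*/}") || continue
            if (( ra > max )); then
                max=$ra
            fi
        done
        cur=$(blockdev --getra "/dev/$name") || continue
        # Only ever raise the value; never lower a readahead set elsewhere.
        if (( max > cur )); then
            blockdev --setra "$max" "/dev/$name"
        fi
    done
}
```

As a boot-time workaround, the function could be pasted into /etc/rc.local followed by a plain call to inherit_dm_readahead.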

Comment 1 Jason Farrell 2008-11-27 23:39:49 UTC

Ah, thanks Milan. Tracking both bugs now.

Comment 2 Alasdair Kergon 2009-04-28 13:26:33 UTC

Milan - is this resolved now, or do you envisage any further tweaks?

Comment 3 Milan Broz 2009-04-28 20:20:23 UTC

No, it is not completely resolved yet: readahead is automatically increased only for LVM stripes, not for MD devices. But adding that optimisation should not be too complicated: we already read the chunk size from sysfs, so it only needs to also read the number of MD member devices and use the same formula to calculate the readahead.
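To illustrate the kind of formula meant here, the hypothetical shell function below mirrors the rule the kernel's raid0, raid5/6 and raid10 drivers apply when an array starts: readahead should cover at least two full stripes, i.e. 2 × data disks × chunk size, and is never set below the existing default. This is a sketch of the MD-side calculation, not lvm2's actual code. The chunk size of the arrays in this report is not stated; 256 KiB is an assumption that happens to reproduce all four md readahead values listed in comment 8 (256, 2048, 4096, 3072).

```shell
#!/bin/bash
# Hypothetical helper: approximate MD readahead in 512-byte sectors.
#   $1 = RAID level (raid0, raid1, raid5, raid6, raid10)
#   $2 = number of member devices
#   $3 = chunk size in KiB
#   $4 = near copies for raid10 (default 2)
md_readahead() {
    local level=$1 disks=$2 chunk_kib=$3 copies=${4:-2}
    local default=256                         # 128 KiB, as seen on sda5/md0
    local chunk_sectors=$(( chunk_kib * 2 ))  # 1 KiB = 2 sectors of 512 bytes
    local ra
    case $level in
        raid0)  ra=$(( 2 * disks * chunk_sectors )) ;;
        raid5)  ra=$(( 2 * (disks - 1) * chunk_sectors )) ;;  # one parity disk
        raid6)  ra=$(( 2 * (disks - 2) * chunk_sectors )) ;;  # two parity disks
        raid10) ra=$(( 2 * disks * chunk_sectors / copies )) ;;
        *)      ra=$default ;;                # raid1 etc.: no striping
    esac
    # Readahead is only ever raised, never lowered below the default.
    if (( ra < default )); then
        ra=$default
    fi
    echo "$ra"
}
```

For example, md_readahead raid5 4 256 prints 3072, the value reported for /dev/md3.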

(But an administrator can easily work around this by persistently setting the readahead manually using lvchange -r.)
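A minimal sketch of that persistent workaround, assuming every LV in the VG should simply take the readahead of the single MD device underneath it. The helper name set_vg_readahead is hypothetical; lvs --noheadings -o lv_name and lvchange --readahead (the long form of -r) are standard lvm2 options. Because lvchange stores the value in the VG metadata, it is reapplied at activation, unlike a value set with blockdev --setra.

```shell
#!/bin/bash
# Hypothetical helper: persistently set the readahead of every LV in a VG
# to the current readahead of the given underlying device.
#   $1 = volume group name, $2 = underlying device (e.g. an MD array)
# Values are 512-byte sectors. Needs root.
set_vg_readahead() {
    local vg=$1 pv=$2 ra lv
    ra=$(blockdev --getra "$pv") || return 1
    # lvs pads names with spaces; unquoted word splitting strips them.
    for lv in $(lvs --noheadings -o lv_name "$vg"); do
        lvchange --readahead "$ra" "$vg/$lv" || return 1
    done
}
```

With the layout from comment 9, set_vg_readahead vgr10 /dev/md1 would apply md1's readahead to all LVs in vgr10.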

Comment 4 Milan Broz 2009-05-27 15:15:20 UTC

The upstream lvm2 2.02.47 code now inherits readahead from the underlying device; this change should solve this bug.

There is already a build for rawhide (probably not visible in the repo before the F11 release).

Comment 5 Jason Farrell 2009-05-27 21:02:49 UTC

Looks like lvm2-2.02.45-4.fc11 is what'll land in F11 final. Looking forward to the subsequent fix so a workaround isn't needed. Thanks.

Comment 6 Bug Zapper 2009-06-09 09:57:29 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Fedora Update System 2009-07-03 09:29:55 UTC

lvm2-2.02.48-1.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/lvm2-2.02.48-1.fc11

Comment 8 Jason Farrell 2009-07-04 01:46:15 UTC

I have just updated to lvm2-2.02.48-1.fc11 and am very happy to report that this problem looks to be almost fixed. I say almost because *most*, but not all, of my underlying device readaheads are now properly inherited by LVs at boot (which will negate the need for an rc.local "blockdev --setra NNNN /dev/dm*" workaround to maximize throughput).

The problem seems to be only with mdraid level 10 (raid0 and 5 are good). The reported readahead on my raid10 /dev/md1 is 2048, but none of the LVs on top of it have inherited it, remaining at 256.


[root@nano ~]# cat /etc/mdadm.conf
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=53f9fef2:6c86b573:6e4d8dc5:5f17557a
ARRAY /dev/md1 level=raid10 num-devices=4 UUID=db962e4c:3eea0be4:f2551a68:5227bb7b
ARRAY /dev/md2 level=raid0 num-devices=4 UUID=3646f3df:080b1adc:a9da9d8b:3167acae
ARRAY /dev/md3 level=raid5 num-devices=4 UUID=ed922e99:8c1c49bb:06c75cec:bd7b7a53
[root@nano ~]# for i in /dev/md?; do printf "%4d %s\n" $(blockdev --getra $i) $i; done
 256 /dev/md0
2048 /dev/md1
4096 /dev/md2
3072 /dev/md3
[root@nano ~]# vgs
  VG    #PV #LV #SN Attr   VSize   VFree
  vgr0    1   2   0 wz--n- 200.00G 10.00G
  vgr10   1   7   0 wz--n-  80.00G  9.34G
  vgr5    1   5   0 wz--n-   1.48T 51.88G
[root@nano ~]# for i in /dev/mapper/vgr*; do printf "%4d %s\n" $(blockdev --getra $i) $i; done
4096 /dev/mapper/vgr0-0safe
4096 /dev/mapper/vgr0-0tmp
 256 /dev/mapper/vgr10-butter
 256 /dev/mapper/vgr10-rootf10
 256 /dev/mapper/vgr10-rootf11
 256 /dev/mapper/vgr10-rootf11rc1
 256 /dev/mapper/vgr10-rootf9
 256 /dev/mapper/vgr10-swap
 256 /dev/mapper/vgr10-swapcrypt
3072 /dev/mapper/vgr5-0safe--backup
3072 /dev/mapper/vgr5-archive--nobu
3072 /dev/mapper/vgr5-foo
3072 /dev/mapper/vgr5-home
3072 /dev/mapper/vgr5-repo

Comment 9 Jason Farrell 2009-07-04 01:56:14 UTC

The 'vgs' above should've been a 'pvs' to better show the mapping...

[root@nano ~]# pvs
  PV         VG    Fmt  Attr PSize   PFree
  /dev/md1   vgr10 lvm2 a-    80.00G  9.34G
  /dev/md2   vgr0  lvm2 a-   200.00G 10.00G
  /dev/md3   vgr5  lvm2 a-     1.48T 51.88G

Comment 10 Milan Broz 2009-07-04 07:06:18 UTC

(In reply to comment #8)
> The problem seems to be only with mdraid level 10 (raid0 and 5 are good). The
> reported readahead on my raid10 /dev/md1 is 2048, but none of the LVs on top of
> it have inherited it, remaining at 256.

Interesting, it works for me with mdraid10 (in fact, with any underlying device).

Could you please deactivate one LV on the affected VG and then post the output of

"lvchange -a y vgr10/<somelv> -vvvv"?

Also, the metadata for vgr10 could be useful here (it should be backed up in /etc/lvm/backup/vgr10; alternatively, simply run lvmdump).

Comment 11 Jason Farrell 2009-07-04 14:54:22 UTC

FIXED.

The obvious error (in hindsight) on my part was forgetting to run mkinitrd after updating lvm2, so the initrd still contained the old bin/lvm, which is what activates my vgr10 root LV at boot time.

Thanks Milan - and thanks everyone else. I dare say this old bug is now closed.

Comment 12 Fedora Update System 2009-07-11 17:15:40 UTC

lvm2-2.02.48-1.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update lvm2'.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7468

Comment 13 Fedora Update System 2009-07-19 10:31:16 UTC

lvm2-2.02.48-1.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
