+++ This bug was initially created as a clone of Bug #232843 +++

(Similar to bug #450922, but this time for LVM over MD RAID.)

Reproducing the same steps (from comment #37 above), on the same hardware, on kernel-2.6.27.5-117.fc10.x86_64 produces the same pitifully slow LVM-over-MD results, until I manually increase the readahead of each LV to match the MD.

# default readaheads
[root@nano ~]# blockdev --getra /dev/sda5 /dev/md3 /dev/vgr5/home
256
3072
256

# raw bottom-layer disk device read rate is ~106 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/sda5 of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 40.6367 s, 106 MB/s

# raw middle-layer MD device read rate is ~289 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/md3 of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 14.8595 s, 289 MB/s

# raw top-layer LVM device read rate is only ~83 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/vgr5/home of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 52.1008 s, 82.4 MB/s

# increase the LV readahead to match the underlying MD readahead:
[root@nano ~]# blockdev --setra 3072 /dev/vgr5/home

# raw LVM device now comes very close to matching the MD device @ 288 MB/s
[root@nano ~]# sync ; echo 1 > /proc/sys/vm/drop_caches ; dd if=/dev/vgr5/home of=/dev/null bs=4096 count=$((2**30*4/4096))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 14.9148 s, 288 MB/s

So it seems I'll still have to add "blockdev --setra 4096 /dev/dm-*" to /etc/rc.local as a workaround.
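For anyone wanting that interim workaround scripted, a minimal sketch of the rc.local loop follows. This is my sketch, not the reporter's actual script: DRY_RUN and the hardcoded device list are illustrative placeholders; in a real /etc/rc.local you would glob /dev/dm-* and let blockdev run for real.

```shell
# Sketch of the /etc/rc.local workaround: bump readahead on every
# device-mapper node. With DRY_RUN=1 it only prints the commands; the
# device list below is a hardcoded stand-in for the /dev/dm-* glob.
DRY_RUN=1
set_ra() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo blockdev --setra "$1" "$2"
    else
        blockdev --setra "$1" "$2"
    fi
}
for dev in /dev/dm-0 /dev/dm-1 /dev/dm-2; do
    set_ra 4096 "$dev"
done
```

Note this sets a single fixed value for every LV; matching each LV to its own underlying MD (as done by hand above) would need a per-device lookup.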
Ah - thanks Milan. tracking both bugs now.
Milan - is this resolved now, or do you envisage any further tweaks?
No, it is not yet completely resolved - readahead is not automatically increased for MD devices (only for LVM stripes). But it should not be too complicated to add that optimisation: we are already reading the chunk size from sysfs, so we need only also read the number of MD member devices and use the same formula to calculate readahead. (In the meantime, an administrator can easily work around this by setting readahead persistently with "lvchange -r".)
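As a rough illustration of that formula (this is my reading of how the LVM stripe calculation would translate to MD - not confirmed against the lvm2 source - with the chunk size and disk count hardcoded from the 4-disk RAID5 /dev/md3 in this report):

```shell
# Assumed formula: readahead (in 512-byte sectors) = chunk size * number
# of data disks. Values are hardcoded from /dev/md3 above; on a live
# system the chunk size comes from /sys/block/md3/md/chunk_size (which
# reports bytes, so divide by 1024 for KiB).
chunk_kb=512                  # assumed 512 KiB chunk size
ndisks=4                      # total member disks
ndata=$((ndisks - 1))         # RAID5: one disk's worth of parity
ra=$((chunk_kb * 2 * ndata))  # KiB -> 512-byte sectors, times data disks
echo "$ra"
```

With these inputs the result is 3072 sectors, which matches the readahead reported for /dev/md3 in the transcript above.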
The upstream lvm2 2.02.47 code now inherits readahead from the underlying device; this change should solve this bug. There is already a build for rawhide (probably not visible in the repo before the F11 release).
Looks like lvm2-2.02.45-4.fc11 is what'll land in F11 final. Looking forward to the subsequent fix so a workaround isn't needed. Thanks.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
lvm2-2.02.48-1.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/lvm2-2.02.48-1.fc11
I have just updated to lvm2-2.02.48-1.fc11 and am very happy to report that this problem looks to be almost fixed. I say almost because *most*, but not all, of my underlying device readaheads are now properly inherited by LVs at boot (which will negate the need for an rc.local "blockdev --setra NNNN /dev/dm*" workaround to maximize throughput). The problem seems to be only with mdraid level 10 (raid0 and 5 are good). The reported readahead on my raid10 /dev/md1 is 2048, but none of the LVs on top of it have inherited it, remaining at 256.

[root@nano ~]# cat /etc/mdadm.conf
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=53f9fef2:6c86b573:6e4d8dc5:5f17557a
ARRAY /dev/md1 level=raid10 num-devices=4 UUID=db962e4c:3eea0be4:f2551a68:5227bb7b
ARRAY /dev/md2 level=raid0 num-devices=4 UUID=3646f3df:080b1adc:a9da9d8b:3167acae
ARRAY /dev/md3 level=raid5 num-devices=4 UUID=ed922e99:8c1c49bb:06c75cec:bd7b7a53

[root@nano ~]# for i in /dev/md?; do printf "%4d %s\n" $(blockdev --getra $i) $i; done
 256 /dev/md0
2048 /dev/md1
4096 /dev/md2
3072 /dev/md3

[root@nano ~]# vgs
  VG    #PV #LV #SN Attr   VSize   VFree
  vgr0    1   2   0 wz--n- 200.00G 10.00G
  vgr10   1   7   0 wz--n-  80.00G  9.34G
  vgr5    1   5   0 wz--n-   1.48T 51.88G

[root@nano ~]# for i in /dev/mapper/vgr*; do printf "%4d %s\n" $(blockdev --getra $i) $i; done
4096 /dev/mapper/vgr0-0safe
4096 /dev/mapper/vgr0-0tmp
 256 /dev/mapper/vgr10-butter
 256 /dev/mapper/vgr10-rootf10
 256 /dev/mapper/vgr10-rootf11
 256 /dev/mapper/vgr10-rootf11rc1
 256 /dev/mapper/vgr10-rootf9
 256 /dev/mapper/vgr10-swap
 256 /dev/mapper/vgr10-swapcrypt
3072 /dev/mapper/vgr5-0safe--backup
3072 /dev/mapper/vgr5-archive--nobu
3072 /dev/mapper/vgr5-foo
3072 /dev/mapper/vgr5-home
3072 /dev/mapper/vgr5-repo
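The failing case above can be boiled down to a simple comparison. A hypothetical sanity check (values hardcoded from the raid10 listings; on a live system both numbers would come from "blockdev --getra"):

```shell
# Hypothetical inheritance check: an LV's readahead should be at least
# that of its underlying PV. Values are hardcoded from the raid10 case
# above; live values would come from $(blockdev --getra <dev>).
pv_ra=2048   # /dev/md1 (raid10)
lv_ra=256    # e.g. /dev/mapper/vgr10-rootf11
if [ "$lv_ra" -lt "$pv_ra" ]; then
    echo "readahead not inherited: LV=$lv_ra < PV=$pv_ra"
fi
```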
The 'vgs' above should've been a 'pvs' to better show the mapping...

[root@nano ~]# pvs
  PV       VG    Fmt  Attr PSize   PFree
  /dev/md1 vgr10 lvm2 a-    80.00G  9.34G
  /dev/md2 vgr0  lvm2 a-   200.00G 10.00G
  /dev/md3 vgr5  lvm2 a-     1.48T 51.88G
(In reply to comment #8)
> The problem seems to be only with mdraid level 10 (raid0 and 5 are good). The
> reported readahead on my raid10 /dev/md1 is 2048, but none of the LVs on top of
> it have inherited it, remaining at 256.

Interesting - it works for me for mdraid10 (actually for any underlying device). Please can you deactivate one LV in the affected VG and then post the output of "lvchange -a y vgr10/<somelv> -vvvv"? Also, the metadata for vgr10 could be useful here (it should be backed up in /etc/lvm/backup/vgr10, or simply run lvmdump).
FIXED. The obvious error (in hindsight) on my part was forgetting to run mkinitrd after updating lvm2, so that the new bin/lvm - which is responsible for activating my vgr10 root LV at boot time - gets packed into the initrd. Thanks Milan - and thanks everyone else. I dare say this old bug is now closed.
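For anyone hitting the same symptom after an lvm2 update on an LVM root: the missing step was just rebuilding the initrd. A sketch of that step (F11-era mkinitrd; the kernel version here is a hypothetical placeholder - on a live system use "$(uname -r)" and run as root without the leading echo):

```shell
# Sketch: rebuild the initrd after an lvm2 update so the new bin/lvm is
# packed in. KVER is a hypothetical placeholder; use "$(uname -r)" on a
# real system. The echo makes this a dry run.
KVER=2.6.29.4-167.fc11.x86_64
echo mkinitrd -f "/boot/initrd-${KVER}.img" "${KVER}"
```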
lvm2-2.02.48-1.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update lvm2'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-7468
lvm2-2.02.48-1.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.