Users are experiencing much lower performance from RAID1 created with LVM than with MD. Tests show this is due to the small default regionsize chosen for RAID1 in LVM. The user can choose a larger regionsize when creating their RAID LV, but cannot change it later. This bug is for changing the default regionsize.
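For scale: the write-intent bitmap tracks dirty state per region, so the regionsize directly determines how many regions an LV is divided into and thus how often the bitmap needs updating. A back-of-the-envelope sketch of that arithmetic (illustrative only, not LVM code; the 128G LV matches the test LVs below):

```python
def regions(lv_bytes, region_bytes):
    """Number of regions the write-intent bitmap must track (ceiling division)."""
    return -(-lv_bytes // region_bytes)

KiB = 1 << 10
MiB = 1 << 20
GiB = 1 << 30

# A 128 GiB raid1 LV, as in the SATA tests in comment #2:
print(regions(128 * GiB, 512 * KiB))  # 512 KiB regions -> 262144 regions
print(regions(128 * GiB, 128 * MiB))  # 128 MiB regions ->    1024 regions
```

A larger regionsize shrinks the bitmap and the number of dirty/clean transitions, at the cost of resynchronising more data per dirty region after a crash.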
For example: https://www.redhat.com/archives/linux-lvm/2016-November/msg00003.html
Might be true for the related use case reported on linux-lvm:

" I am experiencing a dramatic degradation of the sequential write speed on a raid1 LV that resides on two USB-3 connected harddisks (UAS enabled), compared to parallel access to both drives without raid or compared to MD raid:
- parallel sequential writes to LVs on both disks: 140 MB/s per disk
- sequential write to MD raid1 without bitmap: 140 MB/s
- sequential write to MD raid1 with bitmap: 48 MB/s
- sequential write to LVM raid1: 17 MB/s !! "

Using SATA disks for PVs, this is not reproducible:

[root@o ~]# lvcreate --nosync -R512K -m1 -nr -y -L128G ssd_host
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/ssd_host/r if=/dev/zero oflag=direct iflag=fullblock bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.48014 s, 240 MB/s
[root@o ~]# lvremove -y ssd_host/r
  Logical volume "r" successfully removed
[root@o ~]# lvcreate --nosync -R128M -m1 -nr -y -L128G ssd_host
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/ssd_host/r if=/dev/zero oflag=direct iflag=fullblock bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.48841 s, 239 MB/s

So the user's USB stack plays a role, which is a niche use case for RAID.
Set up a 2-PV VG on USB transport. Getting worse performance with a larger raid1 regionsize on it (seqwrite with 512k regionsize: 109 MB/s, with 64m regionsize: 83.8 MB/s). It is thus not conclusive that a larger regionsize is always better:

[root@o ~]# pvs | grep usb
  /dev/sdh   usb_vg lvm2 a--  238.47g 238.47g
  /dev/sdi   usb_vg lvm2 a--  238.47g 238.47g
[root@o ~]# vgs usb_vg
  VG     #PV #LV #SN Attr   VSize   VFree
  usb_vg   2   0   0 wz--n- 476.95g 476.95g
[root@o ~]# lvcreate --nosync -R512k -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# lvs -ao+regionsize,devices usb_vg
  LV  VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Region  Devices
  r   usb_vg Rwi-a-r--- 1.00g                                  100.00              512.00k r_rimage_0(0),r_rimage_1(0)
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 2.46946 s, 109 MB/s
[root@o ~]# lvremove -y usb_vg
  Logical volume "r" successfully removed
[root@o ~]# lvcreate --nosync -R64m -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# lvs -ao+regionsize,devices usb_vg
  LV  VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Region  Devices
  r   usb_vg Rwi-a-r--- 1.00g                                  100.00              64.00m  r_rimage_0(0),r_rimage_1(0)
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 3.20161 s, 83.8 MB/s
Optimized the USB config by attaching the USB disks to separate controllers to reduce the bandwidth variations found in the test as of comment #3; this led to little throughput variation across regionsizes. We need the exact commands used to get the user's results in comment #2 documented.

[root@o ~]# lvcreate --nosync -R512k -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.64452 s, 163 MB/s
[root@o ~]# lvremove -y usb_vg
  Logical volume "r" successfully removed
[root@o ~]# lvcreate --nosync -R64m -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.73097 s, 155 MB/s
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.66767 s, 161 MB/s
[root@o ~]# lvremove -y usb_vg
  Logical volume "r" successfully removed
[root@o ~]# lvcreate --nosync -R64k -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.62996 s, 165 MB/s
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero^C
[root@o ~]# lvremove -y usb_vg
  Logical volume "r" successfully removed
[root@o ~]# lvcreate --nosync -R8k -m1 -nr -y -L1G usb_vg
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "r" created.
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.67379 s, 160 MB/s
[root@o ~]# dd of=/dev/usb_vg/r bs=256M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1.65624 s, 162 MB/s
(In reply to Heinz Mauelshagen from comment #4)

The user's commands are in the initial mail "[linux-lvm] very slow sequential writes on lvm raid1 (bitmap?)" dated 11/7/2016.

I'm not seeing significant throughput variations for sequential writes with MD (without and with bitmap), nor, for the latter, when varying the bitmap chunk size (aka lvm regionsize):

[root@o ~]# mdadm -C /dev/md/r --bitmap=none -l 1 -n 2 /dev/sd[bh]
mdadm: /dev/sdb appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:49:47 2016
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: /dev/sdh appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:49:47 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/r started.
[root@o ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md127 : active raid1 sdh[1] sdb[0]
      249928000 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.0% (170112/249928000) finish=24.4min speed=170112K/sec

unused devices: <none>
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.851073 s, 158 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.825584 s, 163 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.830841 s, 162 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.856382 s, 157 MB/s
[root@o ~]# mdadm -S /dev/md/r
mdadm: stopped /dev/md/r
[root@o ~]# mdadm -C /dev/md/r --bitmap-chunk=524288 --bitmap=internal -l 1 -n 2 /dev/sd[bh]
mdadm: /dev/sdb appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:50:04 2016
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: /dev/sdh appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:50:04 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/r started.
[root@o ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md127 : active raid1 sdh[1] sdb[0]
      249928000 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.0% (184192/249928000) finish=22.5min speed=184192K/sec
      bitmap: 1/1 pages [4KB], 524288KB chunk

unused devices: <none>
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.882936 s, 152 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.855478 s, 157 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.877999 s, 153 MB/s
[root@o ~]# mdadm -S /dev/md/r
mdadm: stopped /dev/md/r
[root@o ~]# mdadm -C /dev/md/r --bitmap-chunk=512 --bitmap=internal -l 1 -n 2 /dev/sd[bh]
mdadm: /dev/sdb appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:50:54 2016
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: /dev/sdh appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Nov 17 14:50:54 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/r started.
[root@o ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md127 : active raid1 sdh[1] sdb[0]
      249928000 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.1% (265280/249928000) finish=15.6min speed=265280K/sec
      bitmap: 239/239 pages [956KB], 512KB chunk

unused devices: <none>
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.87625 s, 153 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.839441 s, 160 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.830878 s, 162 MB/s
[root@o ~]# dd of=/dev/md/r bs=128M count=1 iflag=fullblock oflag=direct if=/dev/zero
1+0 records in
1+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.825255 s, 163 MB/s
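md's bitmap chunk is the analogue of lvm's regionsize, and the number of bitmap chunks follows from the same division. A quick sketch of why the 512KB-chunk run reports so many more bitmap pages than the 524288KB-chunk run (plain arithmetic; assumes --bitmap-chunk is in KiB, as /proc/mdstat reports it):

```python
import math

def bitmap_chunks(array_kib, chunk_kib):
    """Number of write-intent bitmap chunks covering an md array."""
    return math.ceil(array_kib / chunk_kib)

# 249928000 KiB array, as shown in /proc/mdstat above:
print(bitmap_chunks(249928000, 524288))  # -> 477 chunks (tiny bitmap)
print(bitmap_chunks(249928000, 512))     # -> 488141 chunks
```

With so few chunks at 524288KB, almost any write pattern touches already-dirty chunks, which is consistent with the small throughput differences seen here on SATA.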
In my test, when I use a 200ms write delay on the secondary leg (simulating a disk with very large seek latency), there is an observable speed-up. So it likely depends on how quickly you can sync the _rmeta (bitmap) writes on your attached USB storage. If the USB device is quite slow on seeks but still very fast at streaming writes, increasing the region size and thereby reducing the frequency of bitmap updates leads to a significant speedup. I.e. in my case, a very small test case on slower hw (T61) with a 200ms dm-delay device:

512K regionsize and 64M write -> ~67MB/s
32M regionsize and 64M write -> ~128MB/s

So I guess we need to know the disk types in use in the user's case. Is there some USB storage available these days giving large streaming bandwidth but poor seek access on small sector writes? IMHO, I'd guess some flash MicroSD cards might be giving this bad performance behavior. Assuming we need to query the user for this.
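The delayed-leg effect fits a simple model: effective bandwidth ≈ data / (streaming time + flushes × per-flush latency), where the number of synchronous bitmap flushes shrinks as the regionsize grows. A hedged sketch (the 150 MiB/s raw bandwidth and 5 ms flush latency are assumed values for illustration, not measurements, and this is not dm-raid's actual accounting):

```python
import math

MiB = 1 << 20

def effective_bw(data, raw_bw, region, flush_latency):
    # Worst case: one synchronous bitmap flush per region crossed.
    flushes = math.ceil(data / region)
    return data / (data / raw_bw + flushes * flush_latency)

# 64 MiB written at an assumed 150 MiB/s raw, 5 ms per bitmap flush:
small = effective_bw(64 * MiB, 150 * MiB, 512 * (1 << 10), 0.005)
large = effective_bw(64 * MiB, 150 * MiB, 32 * MiB, 0.005)
print(small / MiB, large / MiB)  # fewer flushes -> higher effective bandwidth
```

With these assumed numbers the model gives roughly 60 vs 147 MiB/s, in the same ballpark as the ~67 vs ~128 MB/s measured with the dm-delay device; the point is the shape of the tradeoff, not the exact figures.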
I think the reason is the random IO caused by the bitmap updates and the poor seek times of my 5000 RPM harddisks. I did my tests with bs=1M oflag=direct, while Heinz used bs=1G oflag=direct. The latter leads to far fewer bitmap updates (>1000 vs 60 for 1G of data). I'd expect those bitmap updates to cause two seeks each. This random IO is, of course, very expensive, especially on slow 5000 RPM disks...

I've recorded some tests with blktrace. The results can be downloaded from http://leo.kloburg.at/tmp/lvm-raid1-bitmap/

# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 9, If 0, Class=Mass Storage, Driver=uas, 5000M
        |__ Port 2: Dev 8, If 0, Class=Mass Storage, Driver=uas, 5000M
# readlink -f /sys/class/block/sd[bc]/device/
/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4.2/2-4.2:1.0/host2/target2:0:0/2:0:0:0
/sys/devices/pci0000:00/0000:00:14.0/usb2/2-4/2-4.1/2-4.1:1.0/host3/target3:0:0/3:0:0:0
# echo noop > /sys/block/sdb/queue/scheduler
# echo noop > /sys/block/sdc/queue/scheduler
# pvcreate /dev/sdb3
# pvcreate /dev/sdc3
# vgcreate vg_t /dev/sd[bc]3
# lvcreate --type raid1 -m 1 -L30G --regionsize=512k --nosync -y -n lv_t vg_t

# ---------- regionsize 512k, dd bs=1M oflag=direct
# blktrace -d /dev/sdb3 -d /dev/sdc3 -d /dev/vg_t/lv_t -D raid1-512k-reg-direct-bs-1M/
# dd if=/dev/zero of=/dev/vg_t/lv_t bs=1M count=1000 oflag=direct
1048576000 bytes (1,0 GB) copied, 55,7425 s, 18,8 MB/s

Device:  rrqm/s wrqm/s  r/s   w/s  rkB/s    wkB/s    avgrq-sz avgqu-sz await  r_await w_await svctm %util
sdb3     0,00   0,00    0,00  54,00 0,00    18504,00 685,33   0,14     2,52   0,00    2,52    1,70  9,20
sdc3     0,00   0,00    0,00  54,00 0,00    18504,00 685,33   0,14     2,52   0,00    2,52    1,67  9,00
dm-9     0,00   0,00    0,00  18,00 0,00    18432,00 2048,00  1,00     54,06  0,00    54,06   55,39 99,70

# ---------- regionsize 512k, dd bs=1G oflag=direct (like Heinz Mauelshagen's test)
# blktrace -d /dev/sdb3 -d /dev/sdc3 -d /dev/vg_t/lv_t -D raid1-512k-reg-direct-bs-1G/
# dd if=/dev/zero of=/dev/vg_t/lv_t bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1,1 GB) copied, 7,3139 s, 147 MB/s

Device:  rrqm/s wrqm/s  r/s   w/s    rkB/s  wkB/s     avgrq-sz avgqu-sz await  r_await w_await svctm %util
sdb3     0,00   0,00    0,00  306,00 0,00   156672,00 1024,00  135,47   441,34 0,00    441,34  3,27  100,00
sdc3     0,00   0,00    0,00  302,00 0,00   154624,00 1024,00  129,46   421,76 0,00    421,76  3,31  100,00
dm-9     0,00   0,00    0,00  0,00   0,00   0,00      0,00     648,81   0,00   0,00    0,00    0,00  100,00

# ---------- regionsize 512k, dd bs=1M conv=fsync
# blktrace -d /dev/sdb3 -d /dev/sdc3 -d /dev/vg_t/lv_t -D raid1-512k-reg-fsync-bs-1M/
# dd if=/dev/zero of=/dev/vg_t/lv_t bs=1M count=1000 conv=fsync
1048576000 bytes (1,0 GB) copied, 7,75605 s, 135 MB/s

Device:  rrqm/s wrqm/s    r/s   w/s    rkB/s  wkB/s     avgrq-sz avgqu-sz await  r_await w_await svctm %util
sdb3     0,00   21971,00  0,00  285,00 0,00   145920,00 1024,00  141,99   540,75 0,00    540,75  3,51  100,00
sdc3     0,00   21971,00  0,00  310,00 0,00   158720,00 1024,00  106,86   429,35 0,00    429,35  3,23  100,00
dm-9     0,00   0,00      0,00  0,00   0,00   0,00      0,00     24561,60 0,00   0,00    0,00    0,00  100,00
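The bs=1M vs bs=1G difference comes down to how many synchronous bitmap updates the write pattern forces. With O_DIRECT the writes are serialized, so each new 1M write typically lands in a region already marked clean again and must first persist a mark-dirty update; a single 1G write dirties everything in one go. A rough lower-bound sketch (illustrative only; the measured counts also include periodic region clears):

```python
MiB = 1 << 20
GiB = 1 << 30

def min_bitmap_updates(total_bytes, write_size):
    # With O_DIRECT, writes are serialized; each write typically finds its
    # region already cleaned, so it needs at least one mark-dirty bitmap
    # flush before proceeding.  Clears add roughly as much again.
    return total_bytes // write_size

print(min_bitmap_updates(1000 * MiB, 1 * MiB))  # -> 1000 (the bs=1M case)
print(min_bitmap_updates(1 * GiB, 1 * GiB))     # -> 1    (the bs=1G case)
```

Each such update is a small synchronous write to a fixed on-disk location, i.e. a seek on rotating media, which matches the ~19 MB/s vs ~147 MB/s blktrace results above.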
See https://bugzilla.redhat.com/show_bug.cgi?id=1392947 for the upstream enhancement to change the region size on existing RaidLVs.
Upstream commit 5ae7a016b8e5796d36cf491345b1cf8e43ec9ea5