I am having issues with LVM lvmraid RAID1. I am running Linux 4.19.16-1 and lvm2 2.03.02-1 on Debian 10 (buster). My desktop system has two 500GB disks, an HDD and an SSD. Each disk has a partition containing a LUKS volume; each LUKS volume contains an LVM PV; the two PVs form a single VG, which contains a swap LV and a rootfs LV. The swap LV is backed by the HDD PV only, while the rootfs LV is a RAID1 LV backed by both PVs.

I often see random sets of processes go into D state for an extended period. It feels like all I/O is stalled, since the disk activity light does not come on at all. I do not know whether these issues are new in this kernel version, because I only recently switched to this setup from a single-HDD configuration. Sometimes switching to a different virtual console causes the I/O to start again; sometimes logging in over SSH does; sometimes I just have to wait several minutes until I/O resumes.

Recently I split the HDD from the RAID1 array (using `lvconvert --yes --splitmirrors 1 --trackchanges`) and that has completely prevented the I/O stalls for several days. Of course, I don't really want to keep running in this configuration, since it leaves me exposed to SSD failure.

I wonder whether the latency difference between the two devices is causing the Linux mq-deadline I/O scheduler to behave suboptimally. Is there a better way to investigate this than the naive script below, which dumps kernel stacks for processes in D state? I wanted it to print only processes that have been in D state for more than 5 seconds, but the stime field in /proc/$PID/stat only reports cumulative kernel time, not time since the last state transition. I also tried looking in dmesg, but the warnings about blocked tasks do not appear even with /proc/sys/kernel/hung_task_timeout_secs set to 5 seconds.
```
# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                           8:0    0 465.8G  0 disk
├─sda1                        8:1    0   243M  0 part
│ └─md127                     9:127  0   242M  0 raid1 /boot
├─sda2                        8:2    0     1K  0 part
└─sda5                        8:5    0 465.5G  0 part
  └─sda5_crypt              253:9    0 465.5G  0 crypt
    ├─hostname-root_rmeta_1 253:4    0     4M  0 lvm
    │ └─hostname-root       253:7    0 449.9G  0 lvm   /
    └─hostname-root_rimage_1
                            253:6    0 449.9G  0 lvm
      └─hostname-root       253:7    0 449.9G  0 lvm   /
sdb                           8:48   0 465.8G  0 disk
├─sdb1                        8:49   0   243M  0 part
│ └─md127                     9:127  0   242M  0 raid1 /boot
├─sdb2                        8:50   0     1K  0 part
└─sdb5                        8:53   0 465.5G  0 part
  └─sdb5_crypt              253:0    0 465.5G  0 crypt
    ├─hostname-root_rmeta_0 253:1    0     4M  0 lvm
    │ └─hostname-root       253:7    0 449.9G  0 lvm   /
    ├─hostname-root_rimage_0
    │                       253:2    0 449.9G  0 lvm
    │ └─hostname-root       253:7    0 449.9G  0 lvm   /
    └─hostname-swap         253:8    0  15.6G  0 lvm   [SWAP]

# lvs -a
  LV              VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            hostname rwi-aor--- 449.91g                                    100.00
  [root_rimage_0] hostname iwi-aor--- 449.91g
  [root_rimage_1] hostname iwi-aor--- 449.91g
  [root_rmeta_0]  hostname ewi-aor---   4.00m
  [root_rmeta_1]  hostname ewi-aor---   4.00m
  swap            hostname -wi-ao---- <15.60g

# head /sys/block/sd{a,b}/queue/scheduler
==> /sys/block/sda/queue/scheduler <==
[mq-deadline] none

==> /sys/block/sdb/queue/scheduler <==
[mq-deadline] none

# cat `which dump-d-state-process-stacks`
#!/bin/bash
while sleep 0.1 ; do
    grep -l State:.D /proc/*/status 2> /dev/null |
        sed 's_/proc/__;s_/status__' |
        xargs -I _ bash -c '
            ret=0
            link=$(readlink /proc/_/exe) || ret=$?
            if [ $ret -eq 0 ] ; then
                echo START PROCESS -------------------------------------------------
                date
                echo $link
                tr "\0" " " < /proc/_/cmdline
                cat /proc/_/stack
                echo END PROCESS ----------------------------------------------------
            fi
        '
done
```
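In case it helps anyone debugging something similar: since /proc/$PID/stat has no "time entered current state" field, one way to get the "more than 5 seconds in D state" filter is to sample the process list and remember when each PID was first observed in D state. A sketch of that bookkeeping (the `report_d_state` function name, the PROC_ROOT argument and the threshold are my own illustrative choices, not anything from /proc itself):

```shell
#!/bin/bash
# Sketch: approximate "time continuously in D state" by sampling, since
# the kernel does not export a per-state entry timestamp.

declare -A d_since   # pid -> timestamp when first observed in D state

# report_d_state PROC_ROOT NOW THRESHOLD
# Print "pid seconds" for every PID continuously in D state >= THRESHOLD.
report_d_state() {
    local root=$1 now=$2 threshold=$3 status pid
    declare -A seen=()
    for status in "$root"/[0-9]*/status; do
        [ -e "$status" ] || continue
        grep -q '^State:[[:space:]]*D' "$status" 2> /dev/null || continue
        pid=${status%/status}
        pid=${pid##*/}
        seen[$pid]=1
        : "${d_since[$pid]:=$now}"   # record the first sighting only
        if (( now - d_since[$pid] >= threshold )); then
            echo "$pid $(( now - d_since[$pid] ))"
        fi
    done
    # Forget PIDs that left D state so their clock restarts next time.
    for pid in "${!d_since[@]}"; do
        [ -n "${seen[$pid]:-}" ] || unset "d_since[$pid]"
    done
}

# Example driver against the real /proc, sampling twice a second:
#   while :; do report_d_state /proc "$(date +%s)" 5; sleep 0.5; done
```

The obvious caveat is that a process bouncing in and out of D state between samples is missed, but for stalls lasting minutes that should not matter.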
I'm not sure what is happening here, and I haven't heard of this being reported before. Instead of splitting the RAID1, you could try the 'writemostly' and 'writebehind' options (see lvmraid(7)).
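For reference, something like the following should do it. The PV and LV names are taken from the lsblk/lvs output above (assuming sdb is the HDD, since swap lives on it), and the 8192 value is only an example; check lvmraid(7) and lvchange(8) before running this on your setup:

```shell
# Mark the HDD-backed PV write-mostly so reads are served from the SSD
# leg (":y" sets the flag, ":n" clears it, ":t" toggles it):
lvchange --writemostly /dev/mapper/sdb5_crypt:y hostname/root

# Optionally bound how many writes the write-mostly device may lag
# behind (8192 is just an example value, not a recommendation):
lvchange --writebehind 8192 hostname/root

# Verify: write-mostly images show "w" in the lvs attr field.
lvs -a -o name,attr hostname
```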
I switched the HDD to writemostly some months ago and that resolved the issue for me.
(In reply to Paul Wise (Debian) from comment #2)
> I switched the HDD to writemostly some months ago and that resolved the
> issue for me.

Closing as of this comment. FWIW, it could also be a disk controller issue throttling I/O.
writemostly seems like a workaround rather than a fix; surely RAID1 should behave sanely on disks of differing latency without requiring it?