Bug 1672496

| Summary: | lvmraid RAID1: I/O stalls with HDD+SSD | | |
|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | Paul Wise (Debian) <pabs3> |
| Component: | device-mapper | Assignee: | LVM and device-mapper development team <lvm-team> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, thornber, zkabelac |
| Target Milestone: | --- | Flags: | rule-engine: lvm-technical-solution? |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-09-06 12:23:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Comments:

I'm not sure what is happening here and I haven't heard of this reported before. Instead of splitting the raid1, you could try 'writebehind' and 'writemostly' options (see lvmraid(7)).

I switched the HDD to writemostly some months ago and that resolved the issue for me.

(In reply to Paul Wise (Debian) from comment #2)
> I switched the HDD to writemostly some months ago and that resolved the
> issue for me.

Closing as of this comment. FWIW: could be another disk controller issue throttling I/O.

writemostly seems like a workaround rather than a fix; surely RAID1 should work sanely on disks of differing latency without requiring that?
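
For reference, the writemostly/writebehind tuning mentioned above is applied per-PV with lvchange, as described in lvmraid(7). A minimal sketch, assuming the HDD-backed PV is the LUKS mapping /dev/mapper/sdb5_crypt and the mirrored LV is hostname/root as in the layout below; the device path and the write-behind count are illustrative, not taken from the report:

```sh
# Mark the HDD leg write-mostly: normal reads are served from the other
# (SSD) leg, while writes continue to go to both mirror images.
lvchange --writemostly /dev/mapper/sdb5_crypt:y hostname/root

# Optionally bound how many writes may be queued to the write-mostly
# device before writes become synchronous again.
lvchange --writebehind 1024 hostname/root

# Verify: the write-mostly image shows 'w' in its lvs attributes.
lvs -a -o name,attr,devices hostname
```

If an image has already been split off with `--trackchanges`, it can first be merged back with `lvconvert --merge` on the tracked sub-LV (e.g. hostname/root_rimage_N, whichever image was split).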

Description:

I am having issues with LVM lvmraid RAID1. I am using Linux 4.19.16-1 and lvm2 2.03.02-1 on Debian 10 (buster). I have a desktop system with two 500GB disks, a HDD and an SSD. Each disk has a partition containing a LUKS volume; the LUKS volumes contain LVM PVs, which form a single VG, which contains a swap LV and a rootfs LV. The swap LV is backed by the HDD PV and the rootfs LV by both PVs.

I often get random sets of processes going into D state for an extended period. It feels like all I/O is stalled, as the disk light doesn't go on at all. I do not know if these issues are new in the Linux version I use, since I only recently switched to this setup from having just the HDD. Sometimes switching to a different virtual console causes the I/O to start again. Sometimes logging in over SSH causes the I/O to start again. Sometimes I just need to wait some minutes until the I/O starts again.

Recently I split the HDD from the RAID1 array (using `lvconvert --yes --splitmirrors 1 --trackchanges`) and this has completely prevented the I/O stalls for several days. Of course, I don't really want to keep running in this configuration in case the SSD fails.

I'm wondering if the difference in latency between the two devices is causing the Linux mq-deadline I/O scheduler to display suboptimal I/O behaviour. I'm also wondering if there is any better way to investigate this than the naive script below, which dumps kernel stacks for processes in D state. I wanted to make it print only processes that have been in D state for more than 5 seconds, but the stime item in /proc/$PID/stat only seems to report cumulative kernel time rather than time since the last state transition. I tried looking in dmesg, but the warnings about blocked tasks do not seem to appear even when /proc/sys/kernel/hung_task_timeout_secs is set to 5 seconds.

```
# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                            8:0    0 465.8G  0 disk
├─sda1                         8:1    0   243M  0 part
│ └─md127                      9:127  0   242M  0 raid1 /boot
├─sda2                         8:2    0     1K  0 part
└─sda5                         8:5    0 465.5G  0 part
  └─sda5_crypt               253:9    0 465.5G  0 crypt
    ├─hostname-root_rmeta_1  253:4    0     4M  0 lvm
    │ └─hostname-root        253:7    0 449.9G  0 lvm   /
    └─hostname-root_rimage_1 253:6    0 449.9G  0 lvm
      └─hostname-root        253:7    0 449.9G  0 lvm   /
sdb                            8:48   0 465.8G  0 disk
├─sdb1                         8:49   0   243M  0 part
│ └─md127                      9:127  0   242M  0 raid1 /boot
├─sdb2                         8:50   0     1K  0 part
└─sdb5                         8:53   0 465.5G  0 part
  └─sdb5_crypt               253:0    0 465.5G  0 crypt
    ├─hostname-root_rmeta_0  253:1    0     4M  0 lvm
    │ └─hostname-root        253:7    0 449.9G  0 lvm   /
    ├─hostname-root_rimage_0 253:2    0 449.9G  0 lvm
    │ └─hostname-root        253:7    0 449.9G  0 lvm   /
    └─hostname-swap          253:8    0  15.6G  0 lvm   [SWAP]

# lvs -a
  LV              VG       Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            hostname rwi-aor--- 449.91g                                    100.00
  [root_rimage_0] hostname iwi-aor--- 449.91g
  [root_rimage_1] hostname iwi-aor--- 449.91g
  [root_rmeta_0]  hostname ewi-aor---   4.00m
  [root_rmeta_1]  hostname ewi-aor---   4.00m
  swap            hostname -wi-ao---- <15.60g

# head /sys/block/sd{a,b}/queue/scheduler
==> /sys/block/sda/queue/scheduler <==
[mq-deadline] none

==> /sys/block/sdb/queue/scheduler <==
[mq-deadline] none
```

```
# cat `which dump-d-state-process-stacks`
#!/bin/bash
# Poll frequently for tasks in D (uninterruptible sleep) state and dump
# their executable path, command line and kernel stack.
while sleep 0.1 ; do
    grep -l State:.D /proc/*/status 2> /dev/null |
    sed 's_/proc/__;s_/status__' |
    xargs -I _ bash -c '
        ret=0
        link=$(readlink /proc/_/exe) || ret=$?
        # Skip kernel threads and tasks that exited between the grep and here.
        if [ $ret -eq 0 ] ; then
            echo START PROCESS -------------------------------------------------
            date
            echo $link
            tr "\0" " " < /proc/_/cmdline
            cat /proc/_/stack
            echo END PROCESS ----------------------------------------------------
        fi
    '
done
```
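
On the "more than 5 seconds in D state" point: since /proc/$PID/stat does not record how long a task has been in its current state, one option is to track that in the monitoring loop itself by remembering when each PID was first seen in D state. A minimal sketch of that idea; the 5-second threshold, 1-second poll interval, and output format are illustrative choices, not from the report:

```
#!/bin/bash
# Only dump tasks that have stayed in D state for at least $threshold
# seconds, by remembering when each PID was first seen in D state.
threshold=5   # seconds in D state before a task gets dumped
interval=1    # polling interval in seconds

# first_seen[pid] = epoch time the PID was first observed in D state
declare -a first_seen

while sleep "$interval" ; do
    now=$(date +%s)
    for status in /proc/[0-9]*/status ; do
        pid=${status#/proc/} ; pid=${pid%/status}
        if grep -q '^State:.D' "$status" 2> /dev/null ; then
            : "${first_seen[$pid]:=$now}"
            if (( now - first_seen[pid] >= threshold )) ; then
                echo "PID $pid in D state for >= ${threshold}s:"
                readlink "/proc/$pid/exe"
                tr '\0' ' ' < "/proc/$pid/cmdline" ; echo
                cat "/proc/$pid/stack" 2> /dev/null
                echo ----------------------------------------------------
            fi
        else
            # Left D state (or exited): forget it so the timer restarts.
            unset "first_seen[$pid]"
        fi
    done
done
```

Because this only polls, a task that leaves and re-enters D state between samples restarts its timer; for stalls lasting minutes, as described above, that limitation should not matter much.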