Hide Forgot
Description of problem: If a drive fails in such a way that it is still present in the system, but gives out I/O errors, LVM with mirror LVs running lvmetad cannot recover from it. Since no event in udev is issued, lvmetad is not aware of the failure. The repair of such a failed mirror did not complete even by running: lvconvert --repair (called both from dmeventd and on cmd line) Version-Release number of selected component (if applicable): lvm2-2.02.105-14.el7.x86_64 device-mapper-1.02.84-14.el7.x86_64 How reproducible: 100% Steps to Reproduce: Get a device to fail with I/O errors, but still being visible in the system. (unmap iscsi drive assigned to the client on the storage server for example) Actual results: Mirror gets I/O errors, lvmetad is not aware of the failure. If some write is attempted it will hang and deadlock, since the repair will never really start or complete (will be stuck at 99.8%) Expected results: The same behavior as without lvmetad - the repair should complete based on policies chosen in lvm.conf. Additional info:
Similar thing can be observed with kernel's fault injection feature (echo 1 > /sys/block/.../make_it_fail). However, in this case, the lvconvert --repair does not hang as in the case with real HW Nenad described above, but lvconvert just finished wihtout actually repairing the failed mirror (it does finish with the mirror repaired in case lvmetad is not used). I'll post some logs shortly...
Created attachment 887498 [details] Verbose logs of mirror repair attempt with lvmetad running. Attaching a file with verbose logs of lvconvert -vvvv --repair with lvmetad running.
Created attachment 887554 [details] Logs from using failure injection (lvconvert not looping, but failing to repair) Also, ataching logs from alternative way of hitting the problem with lvmetad+lvconvert --repair on failed mirror leg. The leg shot down by failure injection: echo 1 > /sys/block/sda/make_it_fail. Contains logs from both lvconvert --repair with lvmetad (which fails to repair the mirror) and lvconvert --repari without using lvmetad which does the reapir correctly. There's one difference though form what's seen on real HW failure - in case of simulating the problem via failure injection, the lvconvert does not loop, but just reports "successful finish" despite the mirror is not repaired. Anyway, this can also help in identifying the problem with mirror repair + lvmetad.
Created attachment 887567 [details] Log of lvconvert with lvmetad (real HW) To avoid other influences such aas attempts at auto-repair, I have disabled dmeventd and monitoring of LVM. This debug log is the first repair attempt, done manually on command line. It loops in the end as mentioned by Peter.
Created attachment 887568 [details] Log of lvconvert without lvmetad (real HW) Debug output of lvconvert without lvmetad which finishes successfully.
Created attachment 888382 [details] Reproducer with logs using failure injection (lvconvert looping and failing to repair) (In reply to Peter Rajnoha from comment #4) > There's one difference though form what's seen on real HW failure - in case > of simulating the problem via failure injection, the lvconvert does not > loop, but just reports "successful finish" despite the mirror is not > repaired. Anyway, this can also help in identifying the problem with mirror > repair + lvmetad. And now I've managed to make the lvconvert to loop infinitely when using lvmetad and when injecting the mirror failure. Attaching complete logs with reproducer (both for working case without lvmetad and non-working case with lvmetad). The lvconvert looped at 93.8% in this test and never finished the repair.
When pvscan --cache is run just after the mirror failure and before anything tries to write to the mirror, the subsequent lvconvert --repair works correctly then. So it seems that lvconvert is just doing wrong decision based on outdated information.
This is a duplicate of bug 1085553, but i'd rather allow this bug to stay open and dependent on 1085553 since it is not immediately obvious.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This is pretty weird actually. We have a test which covers a very similar scenario, and even when I explicitly disable lvmetad notifications, lvconvert --repair still works OK (cf. lvconvert-repair-transient.sh). I'll try to find the crucial difference and make a test for the failing case also. Might be related to *which* legs are failing...
*** Bug 1083285 has been marked as a duplicate of this bug. ***
Petr, it is possible the recovery is doing nothing? I have seen few cases where mirror with core log was "repaired" and reported as success by dmeventd but the missing PV was left as one of the legs as lvconvert somehow did not see any missing devices. I suggest moving code re-enabling the devices only after `vgreduce --removemissing`. And moving wait before vgreduce to check the mirror was actually repaired. From another instance of the duplicate Bug 1083285: Apr 1 08:46:47 node01 lvm[4441]: Log device 253:5 has failed (D). Apr 1 08:46:47 node01 lvm[4441]: Device failure in revolution_9-mirror_1. Apr 1 08:46:47 node01 lvm[4441]: /dev/sdh1: read failed after 0 of 512 at 34348662784: Input/output error Apr 1 08:46:47 node01 lvm[4441]: /dev/sdh1: read failed after 0 of 512 at 34348728320: Input/output error Apr 1 08:46:47 node01 lvm[4441]: /dev/sdh1: read failed after 0 of 512 at 0: Input/output error Apr 1 08:46:47 node01 lvm[4441]: /dev/sdh1: read failed after 0 of 512 at 4096: Input/output error Apr 1 08:46:47 node01 lvm[4441]: intermediate VG write failed. ^ seems this is the common signature of the failures Apr 1 08:46:47 node01 lvm[4441]: Mirror log status: 1 of 1 images failed. Apr 1 08:46:47 node01 lvm[4441]: Trying to up-convert to 3 images, 1 logs. Apr 1 08:46:47 node01 lvm[4441]: Unable to create LV mirror_1_mlog in Volume Group revolution_9: name already in use. Apr 1 08:46:47 node01 lvm[4441]: Failed to create mirror log. Apr 1 08:46:47 node01 lvm[4441]: Trying to up-convert to 2 images, 1 logs. Apr 1 08:46:47 node01 lvm[4441]: intermediate VG write failed. Apr 1 08:46:47 node01 lvm[4441]: Trying to up-convert to 2 images, 0 logs. Apr 1 08:46:47 node01 lvm[4441]: WARNING: Failed to replace 1 of 3 images in volume mirror_1 Apr 1 08:46:47 node01 lvm[4441]: WARNING: Failed to replace 1 of 1 logs in volume mirror_1 Apr 1 08:46:47 node01 lvm[4441]: Repair of mirrored device revolution_9-mirror_1 finished successfully. ^ I have not noticed this before but it is possible the lvconvert is returning success while failing
Yes, this seems to be the culprit: #device/dev-io.c:106 /dev/sdb1: read failed after 0 of 1024 at 4096: Input/output error #device/dev-io.c:249 <backtrace> #format_text/format-text.c:330 <backtrace> #format_text/format-text.c:597 <backtrace> #metadata/metadata.c:2668 <backtrace> #metadata/mirror.c:1041 intermediate VG write failed. #metadata/mirror.c:1145 <backtrace> This clearly hints at a problem in recognizing that the VG is partial and that partial write errors should be accepted in vg_write (as long as *some* MDAs write OK). However, I can't reproduce this so far. I have a very similar test, complete with IO errors, the only difference I can think of is that my PVs are DM error targets instead of broken iSCSI devices. (...) So it turns out the problem in my non-reproducibility is this: #/home/mornfall/dev/lvm2/git-upstream/lib/filters/filter-persistent.c:292 @TESTDIR@/dev/mapper/@PREFIX@pv4: Skipping unusable device When I comment out device_is_usable in the persistent filter, I get: #/home/mornfall/dev/lvm2/git-upstream/lib/metadata/mirror.c:1041 intermediate VG write failed. I.e. device_is_usable is being so smart that it fools our tests. Anyway, I can fix this now that I can reproduce the problem. PS: The recovery in our upstream test certainly works, the mirror goes from ADAD+A to AA+A just fine (same for DDAA+A -> AA+A), both when the device goes away completely and when the device is replaced by an error device.
Should work upstream now, although we don't have a regression test at the moment, due to abovementioned issues. I have tested this manually (with device_is_usable disabled) and it seems to work OK. I'll try to find a non-invasive way to make this more testable. The associated upstream commit is dba6dec661c6a0193d9d713c6fd0b9b008c0f26a.
I am still having issues with this one. Even though the errors are not completely the same, I am not getting the device replaced and mirror repaired. It is just stuck at 0% Will try to get more info...
Ok this is now fixed in the newest build: Jan 20 16:55:50 tardis-03 lvm[2085]: /dev/sdb1: read failed after 0 of 2048 at 0: Input/output error Jan 20 16:55:50 tardis-03 lvm[2085]: No PV label found on /dev/sdb1. Jan 20 16:55:50 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:55:58 tardis-03 lvm[2085]: Mirror status: 1 of 2 images failed. Jan 20 16:55:58 tardis-03 lvm[2085]: Trying to up-convert to 2 images, 1 logs. Jan 20 16:55:59 tardis-03 lvm[2085]: Monitoring mirror device vg-mirror_lv for events. Jan 20 16:55:59 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:55:59 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 0.0% Jan 20 16:56:14 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:56:14 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 21.7% Jan 20 16:56:29 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:56:29 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 34.2% Jan 20 16:56:44 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:56:44 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 43.1% Jan 20 16:56:59 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:56:59 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 51.7% Jan 20 16:57:14 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:57:14 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 60.4% Jan 20 16:57:29 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:57:29 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 69.4% Jan 20 16:57:44 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:57:44 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 78.1% Jan 20 16:57:59 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:57:59 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 86.6% Jan 20 16:58:14 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:58:14 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 95.6% Jan 20 16:58:29 tardis-03 lvm[2085]: WARNING: Device for PV YvhIyh-heiq-ugUa-JJkQ-zmG9-ds0c-46wfVY not found or rejected by a filter. Jan 20 16:58:29 tardis-03 lvm[2085]: vg/mirror_lv: Converted: 100.0% Jan 20 16:58:44 tardis-03 lvm[2085]: Repair of mirrored device vg-mirror_lv finished successfully. Jan 20 16:58:44 tardis-03 lvm[2085]: vg-mirror_lv is now in-sync. Jan 20 16:58:45 tardis-03 lvm[2085]: No longer monitoring mirror device vg-mirror_lv for events. Although there is a spam of "WARNING" messages... Marking VERIFIED with: 3.10.0-223.el7.x86_64 lvm2-2.02.114-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 lvm2-libs-2.02.114-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 lvm2-cluster-2.02.114-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 device-mapper-1.02.92-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 device-mapper-libs-1.02.92-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 device-mapper-event-1.02.92-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 device-mapper-event-libs-1.02.92-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015 device-mapper-persistent-data-0.4.1-2.el7 BUILT: Wed Nov 12 19:39:46 CET 2014 cmirror-2.02.114-6.el7 BUILT: Tue Jan 20 14:49:01 CET 2015
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0513.html