On a RHEL 5.5 system, I created 4 loopback devices, created 4 dm-linear devices that each map a whole loopback device, and created a volume group on top of them. I created a mirror in the volume group and activated it. I then reloaded the primary leg with the dm-error target and tried to read from and write to the mirror. The reads and writes were redirected to the secondary leg, but dmeventd did nothing (it is supposed to remove the failing leg). lvconvert --repair did nothing either: it claimed "The mirror is consistent, nothing to repair.", but the status line clearly shows the 'D' status for a write error.

See this:

# dmsetup table
loop4: 0 131072 linear 7:4 0
vg2-test_long_mimage_1: 0 65536 linear 253:1 128
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear 253:2 128
vg2-test_long_mlog: 0 4096 linear 253:0 20864
loop2: 0 131072 linear 7:2 0
vg2-test_long: 0 65536 mirror disk 3 253:4 1024 block_on_error 2 253:5 0 253:6 0
loop1: 0 131072 linear 7:1 0

# dmsetup status
loop4: 0 131072 linear
vg2-test_long_mimage_1: 0 65536 linear
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear
vg2-test_long_mlog: 0 4096 linear
loop2: 0 131072 linear
vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A
loop1: 0 131072 linear

# lvconvert --repair vg2/test_long
/dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
/dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
The mirror is consistent, nothing to repair.
rpm -q lvm2 lvm2-cluster device-mapper ?
package lvm2-cluster is not installed
device-mapper-1.02.32-1.el5
device-mapper-1.02.32-1.el5
(I upgraded it with yum just before reporting, and the upgraded version doesn't work either.)
lvm2 version: lvm2-2.02.46-8.el5_4.2
Created attachment 375498
strace -f of dmeventd

This is strace -f of dmeventd. Notice the repeated
select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
[ that's where nothing was happening ], then
<... ioctl resumed> , 0x1c5703c0) = 0
[ the event happened: I replaced the primary leg with dm-error and wrote to the mirror ]
... then the long list of calls, as dmeventd is scanning devices.
... and then it goes to sleep with
ioctl(7, DM_DEV_WAIT <unfinished ...>
without fixing the mirror
... and the repeated selects again ...
Created attachment 375500
syslog of dmeventd

Syslog. At 18:00:22 the device was activated. At 18:00:48 I simulated the error. Dmeventd scans all the devices and does nothing.
> device-mapper-1.02.32-1.el5
> lvm2-2.02.46-8.el5_4.2

Please retest it with 2.02.56 and dm 1.02.29 (I sent info about the testing packages to the lvm-team list). The dmeventd mirror handling changed completely (the --repair change).
I upgraded to device-mapper-event-1.02.39-1.el5.x86_64.rpm, device-mapper-1.02.39-1.el5.x86_64.rpm and lvm2-2.02.56-1.el5.x86_64.rpm, and it still doesn't work. The log message differs:

Dec 2 19:19:37 schizoid lvm[30078]: Monitoring mirror device vg2-test_long for events
Dec 2 19:19:37 schizoid lvm[30078]: vg2-test_long is now in-sync
Dec 2 19:20:24 schizoid lvm[30078]: Mirror device, 253:5, has failed.
Dec 2 19:20:24 schizoid lvm[30078]: Device failure in vg2-test_long
Dec 2 19:20:24 schizoid lvm[30078]: Failed to remove faulty devices in vg2-test
Manual execution of lvconvert --repair vg2/test_long doesn't work either:

/dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
/dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
The mirror is consistent, nothing to repair.

even though "dmsetup status" shows there is an error:

vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A
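For readers unfamiliar with the mirror status line, here is a minimal Python sketch that splits it into its fields and picks out the failed leg. The field order is inferred from the line quoted in this report (leg count, leg devices, synced/total regions, one numeric field, one health character per leg, then the log fields). The health letter meanings are taken from the comments in the kernel's dm-raid1 source as I understand them. The function names parse_mirror_status and failed_legs are made up for this example.

```python
# Sketch only: decode one dm-mirror line from `dmsetup status`.
# Assumed layout, inferred from the line quoted in this report:
#   <name>: <start> <length> mirror <nr_legs> <dev>... <synced>/<total>
#           <n> <health chars> <log fields...>

# Per-leg health characters (meanings as I understand dm-raid1).
HEALTH = {
    "A": "alive, no failures",
    "D": "dead, a write failure left the leg out of sync",
    "S": "synchronization failure",
    "R": "read failure",
    "U": "unknown",
}


def parse_mirror_status(line):
    """Return a dict describing a mirror status line (hypothetical helper)."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror status line: %r" % line)
    nr_legs = int(fields[3])
    devs = fields[4:4 + nr_legs]
    synced, total = (int(x) for x in fields[4 + nr_legs].split("/"))
    # One numeric field sits between the sync ratio and the health string.
    health = fields[4 + nr_legs + 2]
    if len(health) != nr_legs:
        raise ValueError("expected one health character per leg")
    return {
        "name": name.strip(),
        "legs": list(zip(devs, health)),
        "synced": synced,
        "total": total,
        "log": fields[4 + nr_legs + 3:],
    }


def failed_legs(status):
    """Devices whose health character is anything other than 'A'."""
    return [dev for dev, h in status["legs"] if h != "A"]


status = parse_mirror_status(
    "vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A"
)
for dev, h in status["legs"]:
    print(dev, h, HEALTH.get(h, "unrecognized"))
print("failed:", failed_legs(status))
```

On the line from this report, the first leg (253:5) carries 'D' and the second (253:6) carries 'A', which matches the syslog message "Mirror device, 253:5, has failed."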
Just FYI: if you have this mapping (offset 0):

loop4: 0 131072 linear 7:4 0

the system will find the failed PV on the underlying device (/dev/loop4), so this will not work anyway. Just add some offset there, so that the signature is not directly visible at the expected place in /dev/loop. That's probably why it says "The mirror is consistent, nothing to repair." (But it still fails even with this change; this time it is probably a real bug in the code ;-)
I fixed it by filtering "r|/dev/loop.*|" in lvm.conf, so it really was caused by lvm finding the real loop devices and using them instead of the "dm-linear" devices. Milan, is it OK to set the bug as Invalid? Or do you see any other problem that needs to be investigated?
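To illustrate why that filter entry hides the loop devices but not the dm-linear devices stacked on them, here is a small Python sketch that emulates the documented first-match semantics of the lvm.conf filter list: each pattern is prefixed with 'a' (accept) or 'r' (reject), the first pattern that matches decides, and a path that matches nothing is accepted. This is only an emulation under assumptions. LVM uses its own regex engine and also considers every alias of a device, and Python's re module stands in for it here. The helper names parse_filter and accepts are invented for this example.

```python
import re

# Sketch only: emulate first-match accept/reject filtering in the style
# of the lvm.conf "filter" setting. Not LVM's actual implementation.


def parse_filter(patterns):
    """Turn entries like 'r|/dev/loop.*|' into (accept, regex) pairs."""
    rules = []
    for p in patterns:
        if len(p) < 3 or p[0] not in ("a", "r") or p[-1] != p[1]:
            raise ValueError("malformed filter entry: %r" % p)
        # p[1] is the delimiter; the regex sits between the delimiters.
        rules.append((p[0] == "a", re.compile(p[2:-1])))
    return rules


def accepts(rules, path):
    """First matching rule decides; unmatched paths are accepted."""
    for accept, rx in rules:
        if rx.search(path):  # unanchored match, as an assumption
            return accept
    return True


rules = parse_filter(["r|/dev/loop.*|"])
for path in ("/dev/loop3", "/dev/mapper/loop3", "/dev/mapper/vg2-test_long"):
    print(path, "accepted" if accepts(rules, path) else "rejected")
```

With this single entry, /dev/loop3 is rejected, while /dev/mapper/loop3 does not contain the substring "/dev/loop" and is therefore still scanned.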
Thanks to this bug I found a much more serious problem on a cluster when handling a local VG (but with cluster locking), so please do not close it yet; I'll close it later :)