Description of problem:
I've tried this case many times with single-node mirrors and it always appears to work, but when I tried it on a cmirror with the latest rpms, it failed to down-convert the cmirror to a linear volume. This causes the volume group and mirror to appear corrupted, and also causes I/O errors when attempting I/O to the mirror.

[root@link-02 ~]# lvs -a -o +devices
  LV                VG Attr   LSize   Origin Snap% Move Log         Copy%  Devices
  mirror            vg mwi-a- 100.00M                   mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] vg iwi-ao 100.00M                                      /dev/sdh1(0)
  [mirror_mimage_1] vg iwi-ao 100.00M                                      /dev/sda1(0)
  [mirror_mlog]     vg lwi-ao   4.00M                                      /dev/sdb1(0)

# Here I turned off /dev/sdh

[root@link-02 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 104792064: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 1296954228736: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 1024 at 259384082432: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 512 at 259384082432: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 512 at 259384082432: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 1024 at 259384082432: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh3: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh5: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 'KaBiI2-y4zs-4EPW-sPPS-X5uZ-gWLt-7zTctH'.
  Couldn't find all physical volumes for volume group vg.
  Volume group "vg" not found

# I then attempted a write to the mirror to force the down-convert, but it didn't work, so I attempted a write from all nodes in the cluster. lvm still reports this volume group and mirror as corrupted.

Version-Release number of selected component (if applicable):
2.6.9-43.ELsmp
lvm2-cluster-2.02.18-1.el4
lvm2-2.02.18-1.el4
device-mapper-1.02.14-2.el4
cmirror-1.0.1-0
cmirror-kernel-smp-2.6.9-18.6
I tried that again with GFS on top of the 2 cmirrors, with I/O going to them from all nodes in the cluster. After the primary leg failure I again saw this issue, and all the I/O to those mirrors ended up deadlocking. This bug needs to be on the blocker list for cmirrors.
Devel ACK for blocker-beta for cluster 4.5
Was the cluster mirror in-sync? Did you get any core dumps? (There have been changes made to dmeventd - check to ensure that it is running.)
The mirrors have always been in-sync before attempting the failure case. I didn't see any core dumps, and I have no reason to believe that dmeventd wasn't running.
It appears that dmeventd is running right up until a write is attempted to the failed mirror, at which time it stops for some reason.
where "for some reason" == seg fault

try:
  sysctl kernel.core_pattern=/tmp/core
when you start up. Then your core files will appear as /tmp/core.*
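A minimal sketch of that setup, for anyone reproducing this. It assumes root, that dmeventd gets started from a shell that inherits the raised core limit, and that kernel.core_uses_pid is set so the kernel appends the PID to the pattern. `CORE_PREFIX`, `enable_cores`, and `list_cores` are names made up for illustration, not part of any package.

```shell
# Illustrative prefix; matches the core_pattern suggested above.
CORE_PREFIX=${CORE_PREFIX:-/tmp/core}

# Point the kernel at the prefix and lift the core size limit for this
# shell and its children. Needs root. On older procps the -w flag is
# required when writing a value.
enable_cores() {
    sysctl -w kernel.core_pattern="$CORE_PREFIX"
    ulimit -c unlimited
}

# Print every core file matching "<prefix>.*", one per line.
# Prints nothing (and still returns 0) when there are none.
list_cores() {
    prefix=${1:-$CORE_PREFIX}
    for f in "$prefix".*; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Run `enable_cores` once before starting the daemon, reproduce the failed write, then run `list_cores` to see whether dmeventd left anything behind.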
I am also seeing cmirror creation issues now, due to what also appears to be dmeventd failures. Changing title of this bug and bumping priority.

[root@link-08 ~]# lvcreate -m 1 -L 1G -n mirror vg /dev/sda1:0-2000 /dev/sdb1:0-2000 /dev/sdh1:0-100
  Error locking on node link-07: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-04: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-02: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Error locking on node link-08: vg-mirror: event registration failed: libdevmapper-event-lvm2mirror.so LVM-RgEPrNphjR3yUmyDl8At2ccc3MBgeEYiq09j2Q60abIsnd8liJdela6Z05Sotbss 65280 0
  Failed to activate new LV.
Just a note that QA is still seeing this issue and that this is blocking any extensive cmirror testing.

dmeventd:
[...]
select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout)
select(6, [5], NULL, NULL, {1, 0}) = ? ERESTARTNOHAND (To be restarted)
PANIC: attached pid 3107 exited
Process 3107 detached

lvm2-cluster-2.02.20-1.el4
lvm2-2.02.20-1.el4
device-mapper-1.02.16-1.el4
2.6.9-43.ELsmp
The locking model in lvm.conf changed. Woulda been nice to know 10 days ago. :)
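For anyone else chasing this: a quick way to see which locking model a node is actually configured for is to read `locking_type` from lvm.conf. The sketch below is a hypothetical helper (`lvm_locking_type` is not part of LVM) and assumes the usual /etc/lvm/lvm.conf path. Which value is correct depends on the installed lvm2-cluster version, so compare the result against the example lvm.conf shipped with the package rather than trusting any number quoted here.

```shell
# Print the active (uncommented) locking_type value from an
# lvm.conf-style file. Defaults to /etc/lvm/lvm.conf (assumed path).
# If the setting appears more than once, the last occurrence is printed.
lvm_locking_type() {
    conf=${1:-/etc/lvm/lvm.conf}
    # Lines starting with '#' do not match, so commented-out
    # settings are ignored.
    sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1
}
```

Running `lvm_locking_type` on every node in the cluster and comparing the output should show quickly whether one of them was left on the old setting after an upgrade.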