Description of problem:
If you kill a node doing I/O to a mirror at the same time you kill the
in-sync primary mirror leg, it appears that the primary leg will become
out of sync again, and thus failing it results in a corrupt mirror.

Scenario: Kill primary leg of synced 2 leg mirror

****** Mirror hash info for this scenario ******
* name:    fail_primary_synced_2_legs
* sync:    1
* disklog: 1
* failpv:  /dev/sdg1
* legs:    2
* pvs:     /dev/sdg1 /dev/sdc1 /dev/sdb1
************************************************

Creating mirror on link-07...
qarsh root@link-07 lvcreate -m 1 -n fail_primary_synced_2_legs -L 800M helter_skelter /dev/sdg1:0-500 /dev/sdc1:0-500 /dev/sdb1:0-50
Creating gfs on top of mirror on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-08...
Mounting gfs on link-08...

Waiting for mirror to sync
Verifying that the mirror is fully synced, currently at
...18.50% ...28.00% ...38.00% ...47.50% ...57.50% ...67.00% ...76.50% ...86.50% ...96.00% ...100.00%

Disabling device sdg on link-02
Disabling device sdg on link-07
Disabling device sdg on link-08

Attempting I/O to cause mirror down conversion on link-07
[KILL LINK-07]

Didn't receive heartbeat for 120 seconds
couldn't write to cmirror filesystem
device-mapper: Read failure on mirror: Trying different device.
scsi2 (0:7): rejecting I/O to offline device
device-mapper: Unable to read from primary mirror during recovery
device-mapper: All mirrors of 253:3 have failed.
device-mapper: recovery failed on region 352
scsi2 (0:7): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
GFS: fsid=LINK_128:gfs.2: fatal: I/O error
GFS: fsid=LINK_128:gfs.2:   block = 104432
GFS: fsid=LINK_128:gfs.2:   function = gfs_dreread
GFS: fsid=LINK_128:gfs.2:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-73/smp/src/gfs/dio.c, line = 576
GFS: fsid=LINK_128:gfs.2:   time = 1188238095
scsi2 (0:7): rejecting I/O to offline device
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
device-mapper: A read failure occurred on a mirror device.
device-mapper: Unable to retry read.
GFS: fsid=LINK_128:gfs.2: about to withdraw from the cluster
GFS: fsid=LINK_128:gfs.2: waiting for outstanding I/O
GFS: fsid=LINK_128:gfs.2: telling LM to withdraw

[root@link-02 ~]# dmsetup ls
helter_skelter-fail_primary_synced_2_legs_mimage_1   (253, 4)
helter_skelter-fail_primary_synced_2_legs_mimage_0   (253, 3)
helter_skelter-fail_primary_synced_2_legs_mlog       (253, 2)
helter_skelter-fail_primary_synced_2_legs            (253, 5)
VolGroup00-LogVol01                                  (253, 1)
VolGroup00-LogVol00                                  (253, 0)

[root@link-02 ~]# dmsetup status
helter_skelter-fail_primary_synced_2_legs_mimage_1: 0 1638400 linear
helter_skelter-fail_primary_synced_2_legs_mimage_0: 0 1638400 linear
helter_skelter-fail_primary_synced_2_legs_mlog: 0 8192 linear
helter_skelter-fail_primary_synced_2_legs: 0 1638400 mirror 2 253:3 253:4 1599/1600 1 AA 3 clustered_disk 253:2 A
VolGroup00-LogVol01: 0 4063232 linear
VolGroup00-LogVol00: 0 73728000 linear

[root@link-02 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           2   2 run        U-1,10,2
[1 3]
DLM Lock Space:  "clvmd"                             6   5 run        -
[1 3]
DLM Lock Space:  "clustered_log"                     9   6 run        -
[1 3]
DLM Lock Space:  "gfs"                              10   7 run        -
[1 3]
GFS Mount Group: "gfs"                              11   8 recover 4  -
[1 3]

[root@link-08 ~]# cman_tool services
Service          Name                              GID LID State      Code
Fence Domain:    "default"                           2   2 run        U-1,10,2
[1 3]
DLM Lock Space:  "clvmd"                             6   5 run        -
[1 3]
DLM Lock Space:  "clustered_log"                     9   6 run        -
[3 1]
DLM Lock Space:  "gfs"                              10   7 run        S-10,200,0
[1 3]
GFS Mount Group: "gfs"                              11   8 recover 2  -
[1 3]

Version-Release number of selected component (if applicable):
2.6.9-56.ELsmp
cmirror-kernel-2.6.9-33.2
lvm2-2.02.27-1.el4

How reproducible:
This is reproducible.
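For anyone reading the dmsetup status output above: the interesting fields
on the mirror line are 1599/1600 (one region still out-of-sync) and "AA"
(both legs still marked alive). A minimal sketch for pulling those out,
assuming the device name from this report:

#!/bin/sh
# Minimal sketch: report mirror region sync counts and leg health flags.
# The dm device name below is taken from this report; adjust for your VG/LV.
DM=helter_skelter-fail_primary_synced_2_legs

dmsetup status "$DM" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+\/[0-9]+$/) {                # "in-sync/total" regions
            split($i, r, "/")
            printf "regions in sync: %s/%s\n", r[1], r[2]
            printf "leg health:      %s\n", $(i + 2)  # e.g. "AA" = both legs alive
            exit (r[1] == r[2] ? 0 : 1)               # nonzero if resync incomplete
        }
}'

The exit status is nonzero while any region is out-of-sync, which makes it
easy to script a wait-for-sync loop around this.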
I think the problem here is that when you kill a node, there _are_ regions
that are out-of-sync. Thus, if you kill the primary leg, the mirror will
complain. I'm not sure what to do about this yet.

The same problem should exist on single-machine mirrors if you do the
following (sketched below):
0) do I/O to a mirror
1) kill the machine
2) kill the primary leg of the mirror before the mirror has a chance to
   resync after reboot

If there is no clean solution, our only option may be to shut down the
mirror and forcibly convert it to linear.
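Roughly, as commands -- the VG/LV and device names are placeholders
borrowed from the scenario in the description, and the sysrq reboot stands
in for killing the machine:

# 0) create a mirror and do I/O to it (names and devices are assumptions)
lvcreate -m 1 -n repro_mirror -L 800M helter_skelter /dev/sdg1 /dev/sdc1 /dev/sdb1
dd if=/dev/zero of=/dev/helter_skelter/repro_mirror bs=1M count=400 &

# 1) kill the machine while writes are in flight (immediate reboot, no sync)
echo b > /proc/sysrq-trigger

# --- after reboot, before the mirror finishes resyncing ---

# 2) kill the primary leg while out-of-sync regions remain
lvs -o +copy_percent helter_skelter/repro_mirror     # confirm Copy% < 100
echo offline > /sys/block/sdg/device/state           # fail the primary leg

# the forcible down-conversion mentioned above would be something like:
lvconvert -m 0 helter_skelter/repro_mirror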
Jon, isn't this possibly similar to BA's setup, where one leg of a mirror
and one of the cluster nodes are attached to one power supply, and the
other leg and node are attached to another? There you could potentially
lose a primary leg and a node at the same time due to one switch failing.
Or does this bug require a pause between killing the leg and killing a
machine?
It's a little different in that case because of the way failover happens
in HA LVM. With HA LVM, if the machine dies and everything fails over, and
the disk then fails during the window where the mirror is resyncing, you
might be able to hit something like this. IOW, the primary device needs to
die while the mirror is active but not in-sync (see the sketch below).
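A minimal sketch of hitting that window, assuming made-up VG/LV and disk
names -- after failover, check the copy percent and pull the primary leg's
disk while it is still below 100:

# Sketch only: fail the primary leg mid-resync. All names are assumptions.
pct=$(lvs --noheadings -o copy_percent helter_skelter/ha_mirror | tr -d ' ')
if [ "${pct%%.*}" -lt 100 ]; then
    # mirror is active but not in-sync -- the dangerous window
    echo offline > /sys/block/sdg/device/state
else
    echo "mirror already in sync; window missed"
fi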
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Making this a dup of 395341 (going to the newer bug because it has better
recreation information).

*** This bug has been marked as a duplicate of 395341 ***