Description of problem:
This may be a straight clvmd bug. If you fail a leg of a cmirror on only a
subset of the cluster, you're left with a mirror that is deadlocked attempting
to recover, as well as other deadlocked clvmd commands.

[THIS NODE HAD THE LEG FAILURE]
[root@link-02 ~]# lvs -a -o +devices
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  LV                                    VG             Attr   LSize   Origin Snap%  Move Log                             Copy%  Devices
  LogVol00                              VolGroup00     -wi-ao  35.19G                                                           /dev/hda2(0)
  LogVol01                              VolGroup00     -wi-ao   1.94G                                                           /dev/hda2(1126)
  fail_primary_synced_2_legs            helter_skelter mwi-so 800.00M                    fail_primary_synced_2_legs_mlog 90.50  fail_primary_synced_2_legs_mimage_0(0),fail_primary_synced_2_legs_mimage_1(0)
  [fail_primary_synced_2_legs_mimage_0] helter_skelter iwi-so 800.00M
  [fail_primary_synced_2_legs_mimage_1] helter_skelter iwi-so 800.00M                                                           /dev/sdg1(0)
  [fail_primary_synced_2_legs_mlog]     helter_skelter lwi-so   4.00M                                                           /dev/sdb1(0)

[THIS NODE STILL SEES THE LEG]
[root@link-04 ~]# lvs -a -o +devices
  Volume group "helter_skelter" inconsistent
  Inconsistent metadata copies found - updating to use version 6
  LV                                    VG             Attr   LSize   Origin Snap%  Move Log                             Copy%  Devices
  LogVol00                              VolGroup00     -wi-ao  72.44G                                                           /dev/hda2(0)
  LogVol01                              VolGroup00     -wi-ao   1.94G                                                           /dev/hda2(2318)
  fail_primary_synced_2_legs            helter_skelter mwi-a- 800.00M                    fail_primary_synced_2_legs_mlog 90.50  fail_primary_synced_2_legs_mimage_0(0),fail_primary_synced_2_legs_mimage_1(0)
  [fail_primary_synced_2_legs_mimage_0] helter_skelter iwi-ao 800.00M
  [fail_primary_synced_2_legs_mimage_1] helter_skelter iwi-ao 800.00M                                                           /dev/sdg1(0)
  [fail_primary_synced_2_legs_mlog]     helter_skelter lwi-ao   4.00M                                                           /dev/sdb1(0)

Any pvs command gets stuck in a read.

Version-Release number of selected component (if applicable):
2.6.9-55.ELlargesmp
cmirror-kernel-2.6.9-32.0
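For context, the leg failure on link-02 was injected by offlining the underlying SCSI device through sysfs on that one node only (the qarshd entry in the logs below shows the command). A minimal reproduction sketch, assuming /dev/sda backs the failed mimage_0 leg as in the lvs output above:

# Run on link-02 ONLY, so just this node loses the leg; the other
# cluster members continue to see /dev/sda1 as healthy.
[root@link-02 ~]# echo offline > /sys/block/sda/device/state

# (hypothetical sanity check) a direct read of the leg from this node
# should now fail, with "rejecting I/O to offline device" in the kernel log
[root@link-02 ~]# dd if=/dev/sda1 of=/dev/null bs=512 count=1

Once the mirror then takes writes from any node, recovery wedges as described above.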
More info...

link-04 (node that still sees the leg):
[...]
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 492/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 493/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 494/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 495/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 496/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 825/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 826/SjTXMEG6

link-02 (node that doesn't see the leg):
May 25 15:29:42 link-02 qarshd[6394]: Running cmdline: echo offline > /sys/block/sda/device/state
May 25 15:29:42 link-02 qarshd[6394]: That's enough
scsi0 (0:1): rejecting I/O to offline device
May 25 15:29:51 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device
May 25 15:29:51 link-02 kernel: dm-cmirror: LOG INFO:
May 25 15:29:51 link-02 kernel: dm-cmirror:   uuid: LVM-ZcfTPEokTadP8VK8Czcm4aEia6yh6BUpdesI0PhGLu3eiY9jf0xaqHf0SjTXMEG6
May 25 15:29:51 link-02 kernel: dm-cmirror:   uuid_ref    : 1
May 25 15:29:51 link-02 kernel: dm-cmirror:   region_count: 1600
May 25 15:29:51 link-02 kernel: dm-cmirror:   sync_count  : 0
May 25 15:29:51 link-02 kernel: dm-cmirror:   sync_search : 0
May 25 15:29:51 link-02 kernel: dm-cmirror:   in_sync     : YES
May 25 15:29:51 link-02 kernel: dm-cmirror:   suspended   : NO
May 25 15:29:51 link-02 kernel: dm-cmirror:   server_id   : 2
May 25 15:29:51 link-02 kernel: dm-cmirror:   server_valid: YES
May 25 15:29:51 link-02 lvm[5480]: No longer monitoring mirror device helter_skelter-fail_primary_synced_2_legs for events
May 25 15:29:51 link-02 lvm[5480]: Unlocking memory
May 25 15:29:51 link-02 lvm[5480]: memlock_count dec to 0
May 25 15:29:51 link-02 lvm[5480]: Dumping persistent device cache to /etc/lvm/.cache
May 25 15:29:51 link-02 lvm[5480]: Locking /etc/lvm/.cache (F_WRLCK, 1)
May 25 15:29:51 link-02 lvm[5480]: Unlocking fd 8
May 25 15:29:51 link-02 lvm[5480]: Wiping internal VG cache
May 25 15:29:51 link-02 kernel: dm-cmirror: Performing flush to work around bug 235040
May 25 15:29:51 link-02 kernel: dm-cmirror: Log flush complete
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_MASTER_LEAVING(13): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 0
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 57005
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_SELECTION(11): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 1
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_MASTER_ASSIGN(12): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 1
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 1
May 25 15:30:12 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 15:30:12 link-02 kernel: dm-cmirror:   starter     : 3
May 25 15:30:12 link-02 kernel: dm-cmirror:   co-ordinator: 3
May 25 15:30:12 link-02 kernel: dm-cmirror:   node_count  : 1
scsi0 (0:1): rejecting I/O to offline device
May 25 16:04:39 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device
May 25 16:04:39 link-02 kernel: dm-cmirror: server_id=dead, server_valid=1, SjTXMEG6
May 25 16:04:39 link-02 kernel: dm-cmirror: trigger = LRT_GET_SYNC_COUNT
May 25 16:04:39 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 16:04:39 link-02 kernel: dm-cmirror:   starter     : 4
May 25 16:04:39 link-02 kernel: dm-cmirror:   co-ordinator: 4
May 25 16:04:39 link-02 kernel: dm-cmirror:   node_count  : 0
scsi0 (0:1): rejecting I/O to offline device
scsi0 (0:1): rejecting I/O to offline device
May 25 16:12:40 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device
Bug 249092 is related to this bug.
This defect is getting a lot of attention from our customers. This is a fairly typical scenario when bridging storage arrays between two datacenters. Does fencing one of the nodes in the cluster allow normal operations to resume? Setting flags to get this into 4.7; we would like a solution much sooner.
Is this the equivalent of split-brain mode from a storage perspective?
This can get more complicated when one subset of nodes sees one device fail and another subset sees a different device fail. I tried this with a 3-node cluster: I failed the primary leg on two nodes (including the mirror master) and the secondary leg on the other. The conversion failed, so I fenced the third node so there would be a consistent storage view. The I/O attempts to that mirror remained deadlocked, however.

  mirror            test Mwi-so 10.00G                    mirror_mlog  0.00  mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] test iwi-so 10.00G
  [mirror_mimage_1] test iwi-so 10.00G                                       /dev/sdb1(0)
  [mirror_mlog]     test lwi-so  4.00M                                       /dev/sdc1(0)

[root@link-02 ~]# dmsetup ls --tree
test-mirror (253:5)
 ├─test-mirror_mimage_1 (253:4)
 │  └─ (8:17)
 ├─test-mirror_mimage_0 (253:3)
 │  └─ (8:1)
 └─test-mirror_mlog (253:2)
    └─ (8:33)

When I tried the downconvert by hand, it failed because the volume group was already "consistent":

[root@link-02 ~]# vgreduce --config devices{ignore_suspended_devices=1} --removemissing test
  /dev/sda1: read failed after 0 of 512 at 145661362176: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Volume group "test" is already consistent
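Since the LVM tools themselves tend to hang here, it can be easier to inspect the wedged mirror through device-mapper directly. A minimal sketch of the kind of inspection one might use, assuming the test/mirror names from the output above (actual output depends on the dm-cmirror state, so none is shown):

# Mirror target status: shows regions in sync and per-device health characters
[root@link-02 ~]# dmsetup status test-mirror

# Table and general device info for the stuck mirror, without going through clvmd
[root@link-02 ~]# dmsetup table test-mirror
[root@link-02 ~]# dmsetup info test-mirror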