Bug 241422

Summary: cmirror/clvmd issues when leg fails on subset of cluster
Product: [Retired] Red Hat Cluster Suite
Component: cmirror
Version: 4
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Cluster QE <mspqa-list>
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
CC: agk, bstevens, ccaulfie, coughlan, dwysocha, jbrassow, kawasaki, mbroz, prockai
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-05-07 20:42:38 UTC

Description Corey Marthaler 2007-05-25 21:16:21 UTC
Description of problem:
This may be a straight clvmd bug.

If you fail a leg of a cmirror on only a subset of the cluster, you're left with
a mirror that is deadlocked attempting to recover, as well as other deadlocked
clvmd commands.
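
For reference, the leg failure in this run was injected by offlining the backing
device on a single node (the qarshd line in comment 1 shows the exact command); a
minimal sketch of that step, with the device name taken from this report, is:

# Run on ONE node only (link-02 here) so that just a subset of the cluster
# sees the failure; /dev/sda1 backs the mirror's primary leg in this run.
echo offline > /sys/block/sda/device/state
# The remaining nodes (link-04 here) still see /dev/sda, leaving the cluster
# with an inconsistent view of the mirror.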

[THIS NODE HAD THE LEG FAILURE]
[root@link-02 ~]# lvs -a -o +devices
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  LV                                    VG             Attr   LSize   Origin Snap%  Move Log                              Copy%  Devices
  LogVol00                              VolGroup00     -wi-ao  35.19G                                                            /dev/hda2(0)
  LogVol01                              VolGroup00     -wi-ao   1.94G                                                            /dev/hda2(1126)
  fail_primary_synced_2_legs            helter_skelter mwi-so 800.00M                    fail_primary_synced_2_legs_mlog   90.50 fail_primary_synced_2_legs_mimage_0(0),fail_primary_synced_2_legs_mimage_1(0)
  [fail_primary_synced_2_legs_mimage_0] helter_skelter iwi-so 800.00M
  [fail_primary_synced_2_legs_mimage_1] helter_skelter iwi-so 800.00M                                                            /dev/sdg1(0)
  [fail_primary_synced_2_legs_mlog]     helter_skelter lwi-so   4.00M                                                            /dev/sdb1(0)


[THIS NODE STILL SEES THE LEG]
[root@link-04 ~]# lvs -a -o +devices
  Volume group "helter_skelter" inconsistent
  Inconsistent metadata copies found - updating to use version 6
  LV                                    VG             Attr   LSize   Origin Snap%  Move Log                              Copy%  Devices
  LogVol00                              VolGroup00     -wi-ao  72.44G                                                            /dev/hda2(0)
  LogVol01                              VolGroup00     -wi-ao   1.94G                                                            /dev/hda2(2318)
  fail_primary_synced_2_legs            helter_skelter mwi-a- 800.00M                    fail_primary_synced_2_legs_mlog   90.50 fail_primary_synced_2_legs_mimage_0(0),fail_primary_synced_2_legs_mimage_1(0)
  [fail_primary_synced_2_legs_mimage_0] helter_skelter iwi-ao 800.00M
  [fail_primary_synced_2_legs_mimage_1] helter_skelter iwi-ao 800.00M                                                            /dev/sdg1(0)
  [fail_primary_synced_2_legs_mlog]     helter_skelter lwi-ao   4.00M                                                            /dev/sdb1(0)

Any pvs command then gets stuck in a read.
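
Not captured in the report, but one way to confirm the hang, assuming standard
procps tooling: the stuck LVM commands sit in uninterruptible sleep (state D)
waiting on the read.

# Hypothetical check: stuck pvs/clvmd processes show up in state D
ps -eo pid,stat,wchan:30,cmd | grep -E 'pvs|clvmd' | grep -v grep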


Version-Release number of selected component (if applicable):
2.6.9-55.ELlargesmp
cmirror-kernel-2.6.9-32.0

Comment 1 Corey Marthaler 2007-05-25 21:17:56 UTC
More info...

link-04 (node that still sees the leg):
[...]
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 492/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 493/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 494/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 495/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 496/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 825/SjTXMEG6
May 25 16:13:22 link-04 kernel: dm-cmirror: Recovery blocked by outstanding write on region 826/SjTXMEG6


link-02 (node doesn't see the leg):
May 25 15:29:42 link-02 qarshd[6394]: Running cmdline: echo offline > /sys/block/sda/device/state
May 25 15:29:42 link-02 qarshd[6394]: That's enough
scsi0 (0:1): rejecting I/O to offline device
May 25 15:29:51 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device
May 25 15:29:51 link-02 kernel: dm-cmirror: LOG INFO:
May 25 15:29:51 link-02 kernel: dm-cmirror:   uuid: LVM-ZcfTPEokTadP8VK8Czcm4aEia6yh6BUpdesI0PhGLu3eiY9jf0xaqHf0SjTXMEG6
May 25 15:29:51 link-02 kernel: dm-cmirror:   uuid_ref    : 1
May 25 15:29:51 link-02 kernel: dm-cmirror:  ?region_count: 1600
May 25 15:29:51 link-02 kernel: dm-cmirror:  ?sync_count  : 0
May 25 15:29:51 link-02 kernel: dm-cmirror:  ?sync_search : 0
May 25 15:29:51 link-02 kernel: dm-cmirror:   in_sync     : YES
May 25 15:29:51 link-02 kernel: dm-cmirror:   suspended   : NO
May 25 15:29:51 link-02 kernel: dm-cmirror:   server_id   : 2
May 25 15:29:51 link-02 kernel: dm-cmirror:   server_valid: YES
May 25 15:29:51 link-02 lvm[5480]: No longer monitoring mirror device helter_skelter-fail_primary_synced_2_legs for events
May 25 15:29:51 link-02 lvm[5480]: Unlocking memory
May 25 15:29:51 link-02 lvm[5480]: memlock_count dec to 0
May 25 15:29:51 link-02 lvm[5480]: Dumping persistent device cache to /etc/lvm/.cache
May 25 15:29:51 link-02 lvm[5480]: Locking /etc/lvm/.cache (F_WRLCK, 1)
May 25 15:29:51 link-02 lvm[5480]: Unlocking fd 8
May 25 15:29:51 link-02 lvm[5480]: Wiping internal VG cache
May 25 15:29:51 link-02 kernel: dm-cmirror: Performing flush to work around bug 235040
May 25 15:29:51 link-02 kernel: dm-cmirror: Log flush complete
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_MASTER_LEAVING(13): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 0
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 57005
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_SELECTION(11): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 1
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 2
May 25 15:30:11 link-02 kernel: dm-cmirror: LRT_MASTER_ASSIGN(12): (SjTXMEG6)
May 25 15:30:11 link-02 kernel: dm-cmirror:   starter     : 2
May 25 15:30:11 link-02 kernel: dm-cmirror:   co-ordinator: 1
May 25 15:30:11 link-02 kernel: dm-cmirror:   node_count  : 1
May 25 15:30:12 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 15:30:12 link-02 kernel: dm-cmirror:   starter     : 3
May 25 15:30:12 link-02 kernel: dm-cmirror:   co-ordinator: 3
May 25 15:30:12 link-02 kernel: dm-cmirror:   node_count  : 1
scsi0 (0:1): rejecting I/O to offline device
May 25 16:04:39 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device
May 25 16:04:39 link-02 kernel: dm-cmirror: server_id=dead, server_valid=1, SjTXMEG6
May 25 16:04:39 link-02 kernel: dm-cmirror: trigger = LRT_GET_SYNC_COUNT
May 25 16:04:39 link-02 kernel: dm-cmirror: LRT_ELECTION(10): (SjTXMEG6)
May 25 16:04:39 link-02 kernel: dm-cmirror:   starter     : 4
May 25 16:04:39 link-02 kernel: dm-cmirror:   co-ordinator: 4
May 25 16:04:39 link-02 kernel: dm-cmirror:   node_count  : 0
scsi0 (0:1): rejecting I/O to offline device
scsi0 (0:1): rejecting I/O to offline device
May 25 16:12:40 link-02 kernel: scsi0 (0:1): rejecting I/O to offline device


Comment 2 Jonathan Earl Brassow 2007-07-27 16:15:42 UTC
Bug 249092 is related to this bug.



Comment 3 Kiersten (Kerri) Anderson 2007-10-12 19:14:42 UTC
This defect is getting a lot of attention from our customers.  This is a fairly
typical scenario when bridging storage arrays across two datacenters.  Does fencing
one of the nodes in the cluster allow normal operations to resume?  Setting flags
to get this into 4.7, but we would like a solution much sooner.

Comment 4 Kiersten (Kerri) Anderson 2007-10-12 20:07:41 UTC
Is this the equivalent of split-brain mode from a storage perspective? 

Comment 5 Corey Marthaler 2007-10-12 20:13:27 UTC
This can get more complicated when one subset sees one device fail and another
subset sees a different device fail. 

I tried this with a three-node cluster: I failed the primary leg on two nodes
(including the mirror master) and failed the secondary leg on the third. The
conversion failed, so I fenced that third node so there would be a consistent
view of the storage. The I/O attempts to that mirror remained deadlocked, however.
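
For reference, the fencing step was along these lines; fence_node is the stock
RHCS fencing CLI, and the node name below is hypothetical:

# Fence the node whose view of the storage diverges, so the surviving nodes
# agree on which leg has failed (node name is hypothetical)
fence_node link-08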

  mirror            test       Mwi-so 10.00G                    mirror_mlog    0.00 mirror_mimage_0(0),mirror_mimage_1(0)
  [mirror_mimage_0] test       iwi-so 10.00G
  [mirror_mimage_1] test       iwi-so 10.00G                                        /dev/sdb1(0)
  [mirror_mlog]     test       lwi-so  4.00M                                        /dev/sdc1(0)


[root@link-02 ~]# dmsetup ls --tree
test-mirror (253:5)
 ├─test-mirror_mimage_1 (253:4)
 │  └─ (8:17)
 ├─test-mirror_mimage_0 (253:3)
 │  └─ (8:1)
 └─test-mirror_mlog (253:2)
    └─ (8:33)

When I tried the downconvert by hand, it failed because the mirror was already
"consistent".

[root@link-02 ~]# vgreduce --config devices{ignore_suspended_devices=1} --removemissing test
  /dev/sda1: read failed after 0 of 512 at 145661362176: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Volume group "test" is already consistent