Bug 228104

Summary: greater than 2 legged cluster mirrors do not down convert when a leg fails
Product: [Retired] Red Hat Cluster Suite
Component: cmirror
Version: 4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
Reporter: Corey Marthaler <cmarthal>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: agk, cfeist, dwysocha, jbrassow, mbroz, prockai
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-08-05 21:42:50 UTC

Description Corey Marthaler 2007-02-09 23:36:03 UTC
Description of problem:
This is essentially the root cause of bz 228067 and bz 228070, and it may be a
single-node mirroring issue rather than a cluster-specific one. Unlike a
two-legged mirror, a mirror with three or more legs is not properly
down-converted when a leg fails (the failed leg is not removed from the
mirror), which causes problems for whatever is currently using the mirror.

Version-Release number of selected component (if applicable):
2.6.9-46.ELsmp
lvm2-2.02.21-1.el4
lvm2-cluster-2.02.21-3.el4

How reproducible:
Every time.

Comment 1 Corey Marthaler 2007-02-09 23:44:38 UTC
Hmm, this appears to work fine with single-node mirroring.


[root@link-07 ~]# lvs -a -o +devices
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  10.70 mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] vg   iwi-ao 10.00G                                       /dev/sdh1(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)


# FAIL /dev/sdh and wait...

[root@link-07 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  13.12 mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)


Comment 2 Corey Marthaler 2007-02-12 19:53:58 UTC
Here's what is actually happening from the user's point of view:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg-cmirror1
                      9.5G   20K  9.5G   1% /mnt/gfs1

[root@link-07 ~]# lvs -a -o +devices
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  71.33 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G                                         /dev/sdh2(0)
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
[root@link-07 ~]# ls -lrt /mnt/gfs1
total 3936
-rw-rw-rw-  1 root root 1000000 Feb 12 08:33 link-02
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-08
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-04
-rw-rw-rw-  1 root root 1000000 Feb 12 14:07 link-07


[FAIL /dev/sdh]


[root@link-07 ~]# ls -lrt /mnt/gfs1
ls: /mnt/gfs1: Input/output error
[root@link-07 ~]# touch /mnt/gfs1/foo
touch: cannot touch `/mnt/gfs1/foo': Input/output error

Filesystem            Size  Used Avail Use% Mounted on
df: `/mnt/gfs1': Input/output error

# The failed leg "remains" in the mirror; [cmirror1_mimage_0] no longer lists a device
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)

All nodes running I/O to GFS on the cmirror lose their connections:
[...]
<xior magic="0xfeed10"><read syscall="readv"><path>/mnt/gfs1/link-07</path><oflags>O_RDONLY</oflags><offset>0</offset><count>163676</count></read></xior>
<xior magic="0xfeed10"><write syscall="write"><path>/mnt/gfs1/link-07</path><oflags>O_RDWR</oflags><offset>0</offset><count>974966</count><pattern>D</pattern></write></xior>
Connection to link-07 closed.


[...]
<xior magic="0xfeed10"><read syscall="read"><path>/mnt/gfs1/link-04</path><oflags>O_RDONLY</oflags><offset>0</offset><count>814950</count></read></xior>
<xior magic="0xfeed10"><write syscall="writev"><path>/mnt/gfs1/link-04</path><oflags>O_RDWR</oflags><offset>0</offset><count>703955</count><pattern>P</pattern></write></xior>
Connection to link-04 closed.

# The sync percentage (Copy%) is stuck at 86.17
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
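The symptom in this comment is a mirror image that stays attached to the mirror with no backing device (cmirror1_mimage_0 above). A minimal Python sketch for flagging that state from a script follows. It assumes the input comes from `lvs --noheadings --separator '|' -a -o lv_name,devices`, and it assumes, based only on the output in this report, that a failed leg shows an empty devices field. The helper name `find_orphaned_images` and the sample text are hypothetical.

```python
"""Hypothetical helper: flag mirror images that no longer list a backing device.

Assumed input: output of
    lvs --noheadings --separator '|' -a -o lv_name,devices <vg>
Assumption (from the lvs output in this report): a leg whose PV has failed
shows an empty devices field.
"""


def find_orphaned_images(lvs_output: str) -> list[str]:
    """Return the names of *_mimage_* sub-LVs whose devices field is empty."""
    orphaned = []
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, devices = line.partition("|")
        # With -a, hidden sub-LVs are printed in brackets, e.g. [cmirror1_mimage_0].
        name = name.strip().strip("[]")
        # Only mirror legs (named <lv>_mimage_<N>) are checked.
        if "_mimage_" in name and not devices.strip():
            orphaned.append(name)
    return orphaned


if __name__ == "__main__":
    # Hypothetical sample shaped after the post-failure lvs output above.
    sample = """\
  cmirror1|cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0]|
  [cmirror1_mimage_1]|/dev/sde1(0)
  [cmirror1_mimage_2]|/dev/sdf1(0)
  [cmirror1_mlog]|/dev/sdg2(0)
"""
    print(find_orphaned_images(sample))  # prints ['cmirror1_mimage_0']
```

On a correctly down-converted mirror, like the one in Comment 1, the failed image disappears from the listing entirely, so the function returns an empty list.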

Comment 3 Jonathan Earl Brassow 2007-02-13 18:51:05 UTC
I propose restricting mirrors to two sides (legs) for 4.5. This would defuse
this bug. Once we agree on that, I'll open an RFE for 4.6 and make this bug
dependent on it.


Comment 4 Corey Marthaler 2007-04-12 18:32:06 UTC
Mirrors with more than two legs have been verified to down-convert correctly
when a leg fails.

Comment 5 Chris Feist 2008-08-05 21:42:50 UTC
Fixed in current release (4.7).