Bug 228104 - greater than 2 legged cluster mirrors do not down convert when a leg fails
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cmirror
Version: 4
Hardware: All Linux
Priority: high
Severity: medium
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Reported: 2007-02-09 18:36 EST by Corey Marthaler
Modified: 2010-01-11 21:02 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-08-05 17:42:50 EDT
Description Corey Marthaler 2007-02-09 18:36:03 EST
Description of problem:
This is basically the root cause of bz 228067 and bz 228070, and it may be a
single-node mirroring issue rather than a cluster-specific one. Unlike a
two-legged mirror, a mirror with three or more legs is not properly down
converted when a leg fails, which causes problems for whatever is currently
using the mirror.

Version-Release number of selected component (if applicable):
2.6.9-46.ELsmp
lvm2-2.02.21-1.el4
lvm2-cluster-2.02.21-3.el4

How reproducible:
every time
Comment 1 Corey Marthaler 2007-02-09 18:44:38 EST
Hmmm, this appears to work just fine in single node mirroring.


[root@link-07 ~]# lvs -a -o +devices
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  10.70 mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] vg   iwi-ao 10.00G                                       /dev/sdh1(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)


# FAIL /dev/sdh and wait...

[root@link-07 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  13.12 mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)
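
The before/after comparison above can be checked mechanically. The following is only a minimal sketch, not part of any LVM tooling: the helper names `legs` and `down_converted` are hypothetical, and the input strings are the Devices fields of the top-level mirror LV copied from the lvs output in this report.

```python
def legs(devices_field):
    """Split a mirror LV's Devices field into its image (leg) names.

    "mirror_mimage_0(0),mirror_mimage_1(0)" -> ["mirror_mimage_0", "mirror_mimage_1"]
    """
    return [entry.split("(")[0] for entry in devices_field.split(",") if entry]


def down_converted(before, after, failed_leg):
    """True if failed_leg was listed before, is gone after, and exactly one leg was dropped."""
    b, a = legs(before), legs(after)
    return failed_leg in b and failed_leg not in a and len(a) == len(b) - 1


# Devices fields copied from the single-node lvs output (4 legs, then 3 legs).
single_before = ("mirror_mimage_0(0),mirror_mimage_1(0),"
                 "mirror_mimage_2(0),mirror_mimage_3(0)")
single_after = "mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)"

# Devices field of the cluster mirror, which is unchanged after /dev/sdh fails.
cluster_before = "cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)"
cluster_after = cluster_before

if __name__ == "__main__":
    print("single node:", down_converted(single_before, single_after, "mirror_mimage_0"))
    print("cluster:    ", down_converted(cluster_before, cluster_after, "cmirror1_mimage_0"))
```

The check compares leg names rather than their order, since lvs lists the surviving images in a different order after the conversion.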
Comment 2 Corey Marthaler 2007-02-12 14:53:58 EST
Here's what's actually happening from the user's point of view...

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg-cmirror1
                      9.5G   20K  9.5G   1% /mnt/gfs1

[root@link-07 ~]# lvs -a -o +devices
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  71.33 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G                                         /dev/sdh2(0)
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
[root@link-07 ~]# ls -lrt /mnt/gfs1
total 3936
-rw-rw-rw-  1 root root 1000000 Feb 12 08:33 link-02
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-08
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-04
-rw-rw-rw-  1 root root 1000000 Feb 12 14:07 link-07


[FAIL /dev/sdh]


[root@link-07 ~]# ls -lrt /mnt/gfs1
ls: /mnt/gfs1: Input/output error
[root@link-07 ~]# touch /mnt/gfs1/foo
touch: cannot touch `/mnt/gfs1/foo': Input/output error

Filesystem            Size  Used Avail Use% Mounted on
df: `/mnt/gfs1': Input/output error

# The leg "remains" in the mirror
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)

All the nodes running I/O to GFS on cmirror lose their connection to the machine:
[...]
<xior magic="0xfeed10"><read syscall="readv"><path>/mnt/gfs1/link-07</path><oflags>O_RDONLY</oflags><offset>0</offset><count>163676</count></read></xior>
<xior magic="0xfeed10"><write syscall="write"><path>/mnt/gfs1/link-07</path><oflags>O_RDWR</oflags><offset>0</offset><count>974966</count><pattern>D</pattern></write></xior>
Connection to link-07 closed.


[...]
<xior magic="0xfeed10"><read syscall="read"><path>/mnt/gfs1/link-04</path><oflags>O_RDONLY</oflags><offset>0</offset><count>814950</count></read></xior>
<xior magic="0xfeed10"><write syscall="writev"><path>/mnt/gfs1/link-04</path><oflags>O_RDWR</oflags><offset>0</offset><count>703955</count><pattern>P</pattern></write></xior>
Connection to link-04 closed.

# sync % is stuck
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
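
The two symptoms above (a leg that stays in the mirror with no backing device, and a Copy% that never advances) could be flagged by a small script. This is a rough sketch under stated assumptions: it expects lvs output in a delimited three-field form, as might be produced by something like `lvs -a --noheadings --separator '|' -o lv_name,copy_percent,devices`, which is not the invocation used in this report. The function names are hypothetical, and the sample text is the cmirror1 data from this comment rewritten into that delimited form.

```python
SEP = "|"  # assumed field separator; not the format shown in this report


def parse_lvs(text):
    """Map LV name -> (copy_percent, devices) from delimited lvs output.

    Lines that do not have exactly three fields (blank lines, "read failed"
    warnings) are skipped. Brackets around hidden LV names are stripped.
    """
    rows = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(SEP)]
        if len(fields) != 3:
            continue
        name, copy_pct, devices = fields
        rows[name.strip("[]")] = (copy_pct, devices)
    return rows


def orphaned_legs(rows, lv):
    """Legs still listed under mirror `lv` whose own Devices field is empty."""
    legs = [d.split("(")[0] for d in rows[lv][1].split(",") if d]
    return [leg for leg in legs if leg in rows and rows[leg][1] == ""]


def sync_stalled(samples, lv):
    """True if Copy% of `lv` is identical across all samples and below 100."""
    pcts = [float(rows[lv][0]) for rows in samples]
    return len(pcts) > 1 and len(set(pcts)) == 1 and pcts[0] < 100.0


# cmirror1 state after /dev/sdh failed, rewritten into the assumed format.
SAMPLE = """\
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  cmirror1|86.17|cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0]||
  [cmirror1_mimage_1]||/dev/sde1(0)
  [cmirror1_mimage_2]||/dev/sdf1(0)
  [cmirror1_mlog]||/dev/sdg2(0)
"""

if __name__ == "__main__":
    first = parse_lvs(SAMPLE)
    second = parse_lvs(SAMPLE)  # the later lvs run showed the same values
    print("orphaned legs:", orphaned_legs(first, "cmirror1"))
    print("sync stalled: ", sync_stalled([first, second], "cmirror1"))
```

Two identical samples are only weak evidence of a stall; in practice the samples would need to be taken far enough apart for a healthy resync to have made progress.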
Comment 3 Jonathan Earl Brassow 2007-02-13 13:51:05 EST
I propose restricting mirrors to 2 sides for 4.5. This would defuse this bug.
Once we agree on that, I'll open an RFE for 4.6 and make this bug dependent on
it.
Comment 4 Corey Marthaler 2007-04-12 14:32:06 EDT
Mirrors with more than 2 legs have been verified to down convert during leg
failures.
Comment 5 Chris Feist 2008-08-05 17:42:50 EDT
Fixed in current release (4.7).