Bug 228104
| Summary: | greater than 2 legged cluster mirrors do not down convert when a leg fails | ||
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 4 | CC: | agk, cfeist, dwysocha, jbrassow, mbroz, prockai |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-08-05 21:42:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Corey Marthaler
2007-02-09 23:36:03 UTC
Hmmm, this appears to work just fine in single node mirroring.

[root@link-07 ~]# lvs -a -o +devices
  LV                VG Attr   LSize  Origin Snap% Move Log         Copy% Devices
  mirror            vg mwi-ao 10.00G                   mirror_mlog 10.70 mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] vg iwi-ao 10.00G                                     /dev/sdh1(0)
  [mirror_mimage_1] vg iwi-ao 10.00G                                     /dev/sda1(0)
  [mirror_mimage_2] vg iwi-ao 10.00G                                     /dev/sdb1(0)
  [mirror_mimage_3] vg iwi-ao 10.00G                                     /dev/sdc1(0)
  [mirror_mlog]     vg lwi-ao  4.00M                                     /dev/sdd1(0)

# FAIL /dev/sdh and wait...

[root@link-07 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                VG Attr   LSize  Origin Snap% Move Log         Copy% Devices
  mirror            vg mwi-ao 10.00G                   mirror_mlog 13.12 mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)
  [mirror_mimage_1] vg iwi-ao 10.00G                                     /dev/sda1(0)
  [mirror_mimage_2] vg iwi-ao 10.00G                                     /dev/sdb1(0)
  [mirror_mimage_3] vg iwi-ao 10.00G                                     /dev/sdc1(0)
  [mirror_mlog]     vg lwi-ao  4.00M                                     /dev/sdd1(0)

Here's what's actually happening from the user's viewpoint in the cluster mirror case...
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg-cmirror1
9.5G 20K 9.5G 1% /mnt/gfs1
[root@link-07 ~]# lvs -a -o +devices
LV VG Attr LSize Origin Snap% Move Log Copy%
Devices
cmirror1 vg mwi-ao 10.00G cmirror1_mlog 71.33
cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
[cmirror1_mimage_0] vg iwi-ao 10.00G
/dev/sdh2(0)
[cmirror1_mimage_1] vg iwi-ao 10.00G
/dev/sde1(0)
[cmirror1_mimage_2] vg iwi-ao 10.00G
/dev/sdf1(0)
[cmirror1_mlog] vg lwi-ao 4.00M
/dev/sdg2(0)
[root@link-07 ~]# ls -lrt /mnt/gfs1
total 3936
-rw-rw-rw- 1 root root 1000000 Feb 12 08:33 link-02
-rw-rw-rw- 1 root root 1000000 Feb 12 09:00 link-08
-rw-rw-rw- 1 root root 1000000 Feb 12 09:00 link-04
-rw-rw-rw- 1 root root 1000000 Feb 12 14:07 link-07
[FAIL /dev/sdh]
[root@link-07 ~]# ls -lrt /mnt/gfs1
ls: /mnt/gfs1: Input/output error
[root@link-07 ~]# touch /mnt/gfs1/foo
touch: cannot touch `/mnt/gfs1/foo': Input/output error
Filesystem Size Used Avail Use% Mounted on
df: `/mnt/gfs1': Input/output error
# The leg "remains" in the mirror
[root@link-08 ~]# lvs -a -o +devices
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
LV VG Attr LSize Origin Snap% Move Log Copy%
Devices
cmirror1 vg mwi-ao 10.00G cmirror1_mlog 86.17
cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
[cmirror1_mimage_0] vg iwi-ao 10.00G
[cmirror1_mimage_1] vg iwi-ao 10.00G
/dev/sde1(0)
[cmirror1_mimage_2] vg iwi-ao 10.00G
/dev/sdf1(0)
[cmirror1_mlog] vg lwi-ao 4.00M
/dev/sdg2(0)
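The symptom above (a failed leg that stays listed in the mirror with no backing device) can be spotted mechanically. What follows is only a rough sketch, assuming the wrapped `lvs -a -o +devices` layout shown in this report, where the Devices value lands on the line after each LV. The names `images_without_devices` and `SAMPLE` are made up for illustration and are not part of LVM or cmirror.

```python
import re

# Matches a hidden mirror image sub-LV such as "[cmirror1_mimage_0]".
IMAGE_RE = re.compile(r"^\[(\S+_mimage_\d+)\]")
# Matches a device entry such as "/dev/sde1(0)".
DEVICE_RE = re.compile(r"^/dev/\S+\(\d+\)$")

# Output copied from the report (after /dev/sdh was failed).
SAMPLE = """\
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  LV VG Attr LSize Origin Snap% Move Log Copy%
  Devices
  cmirror1 vg mwi-ao 10.00G cmirror1_mlog 86.17
  cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg iwi-ao 10.00G
  [cmirror1_mimage_1] vg iwi-ao 10.00G
  /dev/sde1(0)
  [cmirror1_mimage_2] vg iwi-ao 10.00G
  /dev/sdf1(0)
  [cmirror1_mlog] vg lwi-ao 4.00M
  /dev/sdg2(0)
"""


def images_without_devices(lvs_output):
    """Return mirror image LV names that have no /dev/... entry listed."""
    missing = []
    pending = None  # image LV still waiting for its device line
    for raw in lvs_output.splitlines():
        line = raw.strip()
        match = IMAGE_RE.match(line)
        if match:
            if pending is not None:
                missing.append(pending)  # previous image never got a device
            # Unwrapped output carries the device on the same line.
            has_device = any(DEVICE_RE.match(tok) for tok in line.split())
            pending = None if has_device else match.group(1)
        elif pending is not None:
            if not DEVICE_RE.match(line):
                missing.append(pending)
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing


print(images_without_devices(SAMPLE))
```

A non-empty result on a mirror that should have down converted would point at the behavior reported here.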
All the nodes running I/O to GFS on cmirror lose their connection to the machine:
[...]
<xior magic="0xfeed10"><read
syscall="readv"><path>/mnt/gfs1/link-07</path><oflags>O_RDONLY</oflags><offset>0</offset><count>163676</count></read></xior>
<xior magic="0xfeed10"><write
syscall="write"><path>/mnt/gfs1/link-07</path><oflags>O_RDWR</oflags><offset>0</offset><count>974966</count><pattern>D</pattern></write></xior>
Connection to link-07 closed.
[...]
<xior magic="0xfeed10"><read
syscall="read"><path>/mnt/gfs1/link-04</path><oflags>O_RDONLY</oflags><offset>0</offset><count>814950</count></read></xior>
<xior magic="0xfeed10"><write
syscall="writev"><path>/mnt/gfs1/link-04</path><oflags>O_RDWR</oflags><offset>0</offset><count>703955</count><pattern>P</pattern></write></xior>
Connection to link-04 closed.
# sync % is stuck
[root@link-08 ~]# lvs -a -o +devices
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
LV VG Attr LSize Origin Snap% Move Log Copy%
Devices
cmirror1 vg mwi-ao 10.00G cmirror1_mlog 86.17
cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
[cmirror1_mimage_0] vg iwi-ao 10.00G
[cmirror1_mimage_1] vg iwi-ao 10.00G
/dev/sde1(0)
[cmirror1_mimage_2] vg iwi-ao 10.00G
/dev/sdf1(0)
[cmirror1_mlog] vg lwi-ao 4.00M
/dev/sdg2(0)
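The stuck sync percentage (86.17 in both listings) could be detected by sampling Copy% over time. This is a hedged sketch only: `copy_percent`, `sync_is_stuck`, and `SAMPLE_LINE` are hypothetical helpers, and the parsing assumes the Origin and Snap% columns are empty for the mirror LV, as they are in this report.

```python
import re

# A Copy% value as printed by lvs, e.g. "86.17".
PERCENT_RE = re.compile(r"^\d+\.\d+$")

# Top-level mirror line copied from the report.
SAMPLE_LINE = "  cmirror1 vg mwi-ao 10.00G cmirror1_mlog 86.17"


def copy_percent(lvs_output, lv_name):
    """Return Copy% for lv_name, or None if the LV is not found."""
    for raw in lvs_output.splitlines():
        tokens = raw.split()
        if len(tokens) > 4 and tokens[0] == lv_name:
            # Skip LV, VG, Attr, LSize; take the first percent-like token.
            for tok in tokens[4:]:
                if PERCENT_RE.match(tok):
                    return float(tok)
    return None


def sync_is_stuck(samples):
    """True if at least two samples are identical and below 100%."""
    return len(samples) >= 2 and len(set(samples)) == 1 and samples[-1] < 100.0


first = copy_percent(SAMPLE_LINE, "cmirror1")
second = copy_percent(SAMPLE_LINE, "cmirror1")
print(sync_is_stuck([first, second]))
```

In practice the samples would come from separate `lvs` runs spaced far enough apart for resynchronization to make progress; the interval is left to the caller.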
I propose setting a restriction that mirrors are limited to 2 sides for 4.5. This would defuse this bug. Once we agree on that, I'll open an RFE for 4.6 and make this bug dependent on that.
Greater than 2 legged mirrors have been verified to down convert during leg failures.
Fixed in current release (4.7).