Bug 441970 - RHEL5 cmirror tracker: filesystem is missing after 'successful' device failure iteration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: TestBlocker
Duplicates: 444983
Depends On:
Blocks: 444983
 
Reported: 2008-04-10 17:52 EDT by Corey Marthaler
Modified: 2010-01-11 21:07 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 16:25:43 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None

Description Corey Marthaler 2008-04-10 17:52:33 EDT
Description of problem:
After executing a successful device failure iteration on a 3 legged mirror, my
script was unable to umount the gfs filesystem because it appeared to be missing. 

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
/sbin/umount.gfs: there isn't a GFS filesystem on
/dev/mapper/helter_skelter-syncd_primary_3legs_1
/sbin/umount.gfs: there isn't a GFS filesystem on
/dev/mapper/helter_skelter-syncd_primary_3legs_1
couldn't umount /mnt/syncd_primary_3legs_1 on taft-01

[root@taft-02 tmp]# gfs_tool sb /dev/helter_skelter/syncd_primary_3legs_1 all
gfs_tool: there isn't a GFS filesystem on /dev/helter_skelter/syncd_primary_3legs_1

[root@taft-02 tmp]# lvs -a -o +devices
  LV                               VG             Attr   LSize   Origin Snap%  Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  66.19G                                                              /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   1.94G                                                              /dev/sda2(2118)
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M                    syncd_primary_3legs_1_mlog 100.00         syncd_primary_3legs_1_mimage_0(0),syncd_primary_3legs_1_mimage_1(0),syncd_primary_3legs_1_mimage_2(0)
  [syncd_primary_3legs_1_mimage_0] helter_skelter iwi-ao 800.00M                                                              /dev/sdg1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M                                                              /dev/sdh1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M                                                              /dev/sde1(0)
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                              /dev/sdf1(0)
[root@taft-02 tmp]# dmsetup ls
helter_skelter-syncd_primary_3legs_1_mimage_0   (253, 3)
helter_skelter-syncd_primary_3legs_1_mlog       (253, 2)
helter_skelter-syncd_primary_3legs_1    (253, 6)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
helter_skelter-syncd_primary_3legs_1_mimage_2   (253, 5)
helter_skelter-syncd_primary_3legs_1_mimage_1   (253, 4)
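
If this happens again it would also be worth grabbing the live mirror table and
status for the top-level device, e.g. (sketch):

dmsetup table helter_skelter-syncd_primary_3legs_1    # mirror target line: legs, log, region size
dmsetup status helter_skelter-syncd_primary_3legs_1   # per-leg health and sync state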


I'll try to reproduce this and gather more info, as this bug, if real, could be
very serious. This may be a GFS issue rather than a cmirror issue.
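
One way to tell the two apart next time would be to read the superblock region
off the mirror device directly. A rough sketch, assuming the GFS superblock sits
64KB into the device and uses the usual big-endian 0x01161970 magic (both
assumptions, not verified against the gfs source here):

dd if=/dev/helter_skelter/syncd_primary_3legs_1 bs=64k skip=1 count=1 2>/dev/null | od -A d -t x1 | head -5
# if the 01 16 19 70 magic bytes show up at the start of the dump, the on-disk
# data is intact and the problem is more likely in the device mapping; if not,
# the superblock really was clobbered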


Version-Release number of selected component (if applicable):
2.6.18-88.el5
lvm2-2.02.32-3.el5
lvm2-cluster-2.02.32-4.el5
openais-0.80.3-15.el5
gfs-utils-0.1.16-2.el5
kmod-gfs-0.1.23-3.el5
Comment 1 Corey Marthaler 2008-04-22 10:00:15 EDT
I was able to reproduce this issue.

[...]
Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
/sbin/umount.gfs: there isn't a GFS filesystem on
/dev/mapper/helter_skelter-syncd_primary_2legs_1
/sbin/umount.gfs: there isn't a GFS filesystem on
/dev/mapper/helter_skelter-syncd_primary_2legs_1
couldn't umount /mnt/syncd_primary_2legs_1 on taft-01

[root@taft-01 tmp]# gfs_tool sb /dev/helter_skelter/syncd_primary_2legs_1 all
gfs_tool: there isn't a GFS filesystem on /dev/helter_skelter/syncd_primary_2legs_1

2.6.18-90.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
openais-0.80.3-15.el5
gfs-utils-0.1.17-1.el5
kmod-gfs-0.1.23-3.el5
Comment 2 Jonathan Earl Brassow 2008-05-02 10:32:06 EDT
"After executing a successful device failure iteration on a 3 legged mirror, my
script was unable to umount the gfs filesystem because it appeared to be missing."

... but the rest of your data in comment #1 shows that the mirror still has 3
legs and a log.  What gives?  Wouldn't one of the legs be missing if the fault
handling was successful?
Comment 3 Corey Marthaler 2008-05-02 14:19:00 EDT
Jon, that is because by that point in the test case, everything had been put back
together the way it was originally. IOW, the failed device was once again
pvcreated, extended into the VG, and the cmirror converted back to the way it
was. At that point the test is tearing everything down in order to create a new
cmirror set and try it all again, and it's that cleanup that spots the "wth,
where did my gfs go?"
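
For context, the per-iteration restore amounts to roughly this (a sketch only;
/dev/sdh1 stands in for whichever device was failed, the script picks it
dynamically):

pvcreate /dev/sdh1                                    # re-initialize the failed device
vgextend helter_skelter /dev/sdh1                     # add it back into the VG
lvconvert -m 2 helter_skelter/syncd_primary_3legs_1   # convert back up to the 3-way cmirror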

I'll also try to add a couple of gfs checks earlier in the test case, before the
teardown, to verify that it isn't disappearing earlier, though the test already
does gfs I/O checks before and after most operations, so...
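
Something like this after each step would catch it earlier (sketch, assuming
gfs_tool exits non-zero when it can't find the superblock):

gfs_tool sb /dev/helter_skelter/syncd_primary_3legs_1 all > /dev/null 2>&1 \
    || echo "GFS superblock missing after this step"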
Comment 4 Corey Marthaler 2008-07-23 17:10:51 EDT
*** Bug 444983 has been marked as a duplicate of this bug. ***
Comment 5 Corey Marthaler 2008-07-25 11:24:34 EDT
Just a note that this is still reproducible using the same helter_skelter test
case, scenario: Kill primary leg of synced 2 leg mirror(s).
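
Roughly, that scenario amounts to the following (a sketch only; the size, journal
count, cluster name TAFT, and the sdg device are illustrative, and the real
harness injects the failure and drives recovery automatically):

lvcreate -m 1 -L 800M -n syncd_primary_2legs_1 helter_skelter
mkfs.gfs -p lock_dlm -t TAFT:syncd_primary_2legs_1 -j 4 /dev/helter_skelter/syncd_primary_2legs_1
mount -t gfs /dev/helter_skelter/syncd_primary_2legs_1 /mnt/syncd_primary_2legs_1
# start the I/O load (collie/xdoio), then fail the primary leg, e.g.:
echo offline > /sys/block/sdg/device/state
# wait for cmirror/dmeventd to repair, restore the device, convert back,
# then unmount -- which is where the missing GFS superblock shows up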
Comment 6 Corey Marthaler 2008-08-22 10:35:34 EDT
Another note that I'm still seeing this issue; however, it takes quite a few
helter_skelter iterations before it shows up.

There is probably no reason for this to block the beta, but it should still be
fixed for the RC.
Comment 7 Kiersten (Kerri) Anderson 2008-09-19 10:31:46 EDT
Adding blocker flag for rc.
Comment 8 Corey Marthaler 2008-09-22 12:06:23 EDT
Just an FYI that this issue still appears in the latest cmirror rpms:

2.6.18-110.el5

lvm2-2.02.39-2.el5    BUILT: Wed Jul  9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5    BUILT: Thu Jul  3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5    BUILT: Thu Jul  3 03:22:29 CDT 2008
cmirror-1.1.25-1.el5    BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008
Comment 10 Jonathan Earl Brassow 2008-09-29 17:33:25 EDT
The following commit should have fixed this issue:

commit 85d1423ec47e48ab844088ebaf4157327b928ae9
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Fri Sep 19 16:19:02 2008 -0500

    dm-log-clustered/clogd: Fix off-by-one error and compilation errors

    Needed to tweek included header files to make dm-log-clustered compile
    again.

    Found an off-by-one error that was causing mirror corruption in the
    case where the primary mirror device was killed in a mirror.

Assuming the build date on the RPMs implies that this check-in was included, I need to examine the possibility of some of the other scenarios I put forward for corruption.
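
A quick way to sanity-check that assumption (sketch; the changelog wording may
not match the commit summary exactly):

rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}  built %{BUILDTIME:date}\n' cmirror kmod-cmirror
rpm -q --changelog kmod-cmirror | head -20    # look for an entry mentioning the off-by-one fix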
Comment 13 errata-xmlrpc 2009-01-20 16:25:43 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html
