Description of problem:
After executing a successful device failure iteration on a 3-legged mirror, my script was unable to unmount the GFS filesystem because it appeared to be missing.

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_3legs_1
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_3legs_1
couldn't umount /mnt/syncd_primary_3legs_1 on taft-01

[root@taft-02 tmp]# gfs_tool sb /dev/helter_skelter/syncd_primary_3legs_1 all
gfs_tool: there isn't a GFS filesystem on /dev/helter_skelter/syncd_primary_3legs_1

[root@taft-02 tmp]# lvs -a -o +devices
  LV                               VG             Attr   LSize   Origin Snap% Move Log                        Copy%  Convert Devices
  LogVol00                         VolGroup00     -wi-ao  66.19G                                                             /dev/sda2(0)
  LogVol01                         VolGroup00     -wi-ao   1.94G                                                             /dev/sda2(2118)
  syncd_primary_3legs_1            helter_skelter mwi-ao 800.00M                  syncd_primary_3legs_1_mlog 100.00         syncd_primary_3legs_1_mimage_0(0),syncd_primary_3legs_1_mimage_1(0),syncd_primary_3legs_1_mimage_2(0)
  [syncd_primary_3legs_1_mimage_0] helter_skelter iwi-ao 800.00M                                                             /dev/sdg1(0)
  [syncd_primary_3legs_1_mimage_1] helter_skelter iwi-ao 800.00M                                                             /dev/sdh1(0)
  [syncd_primary_3legs_1_mimage_2] helter_skelter iwi-ao 800.00M                                                             /dev/sde1(0)
  [syncd_primary_3legs_1_mlog]     helter_skelter lwi-ao   4.00M                                                             /dev/sdf1(0)

[root@taft-02 tmp]# dmsetup ls
helter_skelter-syncd_primary_3legs_1_mimage_0   (253, 3)
helter_skelter-syncd_primary_3legs_1_mlog       (253, 2)
helter_skelter-syncd_primary_3legs_1            (253, 6)
VolGroup00-LogVol01                             (253, 1)
VolGroup00-LogVol00                             (253, 0)
helter_skelter-syncd_primary_3legs_1_mimage_2   (253, 5)
helter_skelter-syncd_primary_3legs_1_mimage_1   (253, 4)

I'll try to reproduce this and gather more info, as this bug, if true, could be very serious. This may be a GFS issue instead of cmirror.

Version-Release number of selected component (if applicable):
2.6.18-88.el5
lvm2-2.02.32-3.el5
lvm2-cluster-2.02.32-4.el5
openais-0.80.3-15.el5
gfs-utils-0.1.16-2.el5
kmod-gfs-0.1.23-3.el5
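A quick way to rule out gfs_tool itself is to read the superblock straight off the device. A sketch, assuming (from memory) that the GFS1 superblock sits 64 KB into the device and begins with the big-endian magic 0x01161970 -- treat both the offset and the magic here as assumptions:

  # dump the first bytes at the assumed superblock offset (sector 128 = 64 KB)
  dd if=/dev/helter_skelter/syncd_primary_3legs_1 bs=512 skip=128 count=1 2>/dev/null | od -A d -t x1 | head -2

If the dump doesn't start with the bytes 01 16 19 70, the superblock really has been clobbered on disk, rather than gfs_tool just failing to find it.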
I was able to reproduce this issue.

[...]

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_2legs_1
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_2legs_1
couldn't umount /mnt/syncd_primary_2legs_1 on taft-01

[root@taft-01 tmp]# gfs_tool sb /dev/helter_skelter/syncd_primary_2legs_1 all
gfs_tool: there isn't a GFS filesystem on /dev/helter_skelter/syncd_primary_2legs_1

2.6.18-90.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
openais-0.80.3-15.el5
gfs-utils-0.1.17-1.el5
kmod-gfs-0.1.23-3.el5
"After executing a successful device failure iteration on a 3 legged mirror, my script was unable to umount the gfs filesystem because it appeared to be missing." ... but the rest of your data in comment #1 shows that there are still 3 legs and a log to the mirror. What gives? Wouldn't one of the legs be missing if the fault handling was successful?
Jon, that is because by that point in the test case, everything had been put back together the way it was originally. IOW, the failed device was once again pvcreated, extended back into the VG, and the cmirror converted back to its original layout. At that point the test is tearing everything down in order to create a new cmirror set and try it all again, and it's that clean up that spots the "wth, where did my gfs go?" problem. I'll try to add a couple of GFS checks earlier in the test case, before the tear down, to verify that it isn't disappearing earlier, though the test already does GFS I/O checks before and after most operations, so...
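For reference, that rebuild step boils down to something like the following (a sketch with a placeholder device name; the real test derives the device from whichever leg was failed):

  # re-initialize the device that was failed (placeholder name)
  pvcreate /dev/sde1
  # put it back into the volume group
  vgextend helter_skelter /dev/sde1
  # convert the mirror back up to 3 legs (-m 2 = two extra copies)
  lvconvert -m 2 helter_skelter/syncd_primary_3legs_1

So by the time the teardown runs, lvs looks exactly like it did before the failure, which is why comment #1 shows a fully populated mirror.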
*** Bug 444983 has been marked as a duplicate of this bug. ***
Just a note that this is still reproducible using the same helter_skelter test case, scenario: "Kill primary leg of synced 2 leg mirror(s)".
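For anyone trying this outside the harness, the device failure itself can be simulated along these lines (a sketch; sdg is a placeholder for whichever disk backs the primary leg, and the harness may fail devices by a different mechanism):

  # take the disk backing the primary leg offline to simulate the failure
  echo offline > /sys/block/sdg/device/state
  # ...let the I/O load hit the mirror and the fault handling run...
  # bring the disk back before rebuilding the mirror
  echo running > /sys/block/sdg/device/state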
Another note that I'm still seeing this issue; however, it now takes quite a few helter_skelter iterations before it shows up. There is probably no reason for this to block beta, but it should still be fixed for the RC.
Adding blocker flag for RC.
Just an FYI that this issue still appears in the latest cmirror rpms:

2.6.18-110.el5
lvm2-2.02.39-2.el5           BUILT: Wed Jul 9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5   BUILT: Thu Jul 3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5  BUILT: Thu Jul 3 03:22:29 CDT 2008
cmirror-1.1.25-1.el5         BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008
The following commit should have fixed this issue:

commit 85d1423ec47e48ab844088ebaf4157327b928ae9
Author: Jonathan Brassow <jbrassow>
Date:   Fri Sep 19 16:19:02 2008 -0500

    dm-log-clustered/clogd: Fix off-by-one error and compilation errors

    Needed to tweek included header files to make dm-log-clustered
    compile again.

    Found an off-by-one error that was causing mirror corruption in the
    case where the primary mirror device was killed in a mirror.

Assuming that the build dates on the RPMs imply that this check-in was included, I need to examine the possibility of some of the other scenarios I put forward for corruption.
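If it helps anyone verifying builds, one way to check whether a given set of installed RPMs carries this change is to grep the package changelog (assuming the fix was noted there, and guessing that kmod-cmirror is the package shipping the clustered log code):

  # look for the off-by-one fix in the package changelog
  rpm -q --changelog kmod-cmirror | grep -i -A2 'off-by-one'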
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html