Bug 441970
Summary: | RHEL5 cmirror tracker: filesystem is missing after 'successful' device failure iteration | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.2 | CC: | agk, bstevens, ccaulfie, dwysocha, edamato, heinzm, jbrassow, mbroz, syeghiay |
Target Milestone: | rc | Keywords: | TestBlocker |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2009-01-20 21:25:43 UTC | ||
Bug Blocks: | 444983 |
Description
Corey Marthaler
2008-04-10 21:52:33 UTC
I was able to reproduce this issue.

[...]

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_2legs_1
/sbin/umount.gfs: there isn't a GFS filesystem on /dev/mapper/helter_skelter-syncd_primary_2legs_1
couldn't umount /mnt/syncd_primary_2legs_1 on taft-01

[root@taft-01 tmp]# gfs_tool sb /dev/helter_skelter/syncd_primary_2legs_1 all
gfs_tool: there isn't a GFS filesystem on /dev/helter_skelter/syncd_primary_2legs_1

2.6.18-90.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
openais-0.80.3-15.el5
gfs-utils-0.1.17-1.el5
kmod-gfs-0.1.23-3.el5

"After executing a successful device failure iteration on a 3 legged mirror, my script was unable to umount the gfs filesystem because it appeared to be missing."

... but the rest of your data in comment #1 shows that there are still 3 legs and a log to the mirror. What gives? Wouldn't one of the legs be missing if the fault handling was successful?

Jon, that is because by that time in the test case, everything had been put back together like it was originally. IOW, the failed device was once again pvcreated, extended into the vg, and the cmirror converted back to the way it was. At that point the test is tearing everything down in order to create a new cmirror set and try it all again, but it's that cleanup that spots the "wth where did my gfs go?"

I'll try to add a couple of gfs checks earlier in the test case, before the teardown, to verify that the filesystem isn't disappearing earlier. The test already does gfs I/O checks before and after most operations, though, so...

*** Bug 444983 has been marked as a duplicate of this bug. ***

Just a note that this is still reproducible using the same helter_skelter test case.
Scenario: Kill primary leg of synced 2 leg mirror(s).

Another note that I'm still seeing this issue; however, it takes running helter_skelter for quite a few iterations before it shows up.
There is probably no reason for this to block beta; however, it should still be fixed for the RC. Adding blocker flag for rc.

Just an FYI that this issue still appears in the latest cmirror rpms:

2.6.18-110.el5
lvm2-2.02.39-2.el5 BUILT: Wed Jul 9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5 BUILT: Thu Jul 3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5 BUILT: Thu Jul 3 03:22:29 CDT 2008
cmirror-1.1.25-1.el5 BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5 BUILT: Fri Sep 19 16:27:33 CDT 2008

The following commit should have fixed this issue:

commit 85d1423ec47e48ab844088ebaf4157327b928ae9
Author: Jonathan Brassow <jbrassow>
Date: Fri Sep 19 16:19:02 2008 -0500

    dm-log-clustered/clogd: Fix off-by-one error and compilation errors

    Needed to tweek included header files to make dm-log-clustered compile again. Found an off-by-one error that was causing mirror corruption in the case where the primary mirror device was killed in a mirror.

Assuming that the build date on the RPMs implies that this check-in was included, I need to examine the possibility of some of the other scenarios I put forward for corruption.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html
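
For anyone wanting to add the earlier GFS check discussed in the comments above (verifying the superblock before the teardown step), here is a minimal sketch in Python. It is not taken from the helter_skelter test. The function names are hypothetical, and it assumes that `gfs_tool sb <device> all` exits non-zero or prints the "there isn't a GFS filesystem on" message shown in this report when the superblock cannot be found.

```python
import subprocess

# Message fragment copied from the umount.gfs / gfs_tool output in this report.
MISSING_MARKER = "there isn't a GFS filesystem on"


def output_reports_missing_gfs(output):
    """Return True if any line of the tool output carries the 'missing GFS' message."""
    return any(MISSING_MARKER in line for line in output.splitlines())


def gfs_superblock_present(device):
    """Run `gfs_tool sb <device> all` and report whether a GFS superblock was found.

    Assumption (not confirmed by the report): a failed lookup yields a
    non-zero exit status and/or the message matched above.
    """
    result = subprocess.run(
        ["gfs_tool", "sb", device, "all"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    combined = result.stdout + result.stderr
    return result.returncode == 0 and not output_reports_missing_gfs(combined)
```

Calling `gfs_superblock_present("/dev/helter_skelter/syncd_primary_2legs_1")` after each failure and recovery step, rather than only at teardown, would narrow down which operation loses the filesystem.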